Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Poisson splitting rule #495

Merged
merged 12 commits into from
Jun 11, 2024
Merged

Add Poisson splitting rule #495

merged 12 commits into from
Jun 11, 2024

Conversation

lorentzenchr
Copy link
Contributor

What does this PR do?

This PR implements a splitrule = "poisson" with the additional option poisson.tau to deal with pure nodes that have y = 0.

References

Solves #433.

Further info

The Poisson splitrule is based on the Poisson deviance, but after some arithmetics that makes the split rule computation faster.

The option poisson.tau takes action if a terminal node has y=0. It then estimates the value of that node as alpha * 0 + (1-alpha) * mean(parent node) and alpha = samples(node)*mean(parent node) / (poisson.tau + samples(node)*mean(parent node)). The larger the value of poisson.tau the closer the prediction to the parent node's mean. Rpart does it similar.
An alternative would have been (or for the future?) to give an option like "minimum sum of responses per node".

@lorentzenchr
Copy link
Contributor Author

@mnwright It would be really great if you could have a look. Feedback is very warmly welcome.

R/ranger.R Outdated Show resolved Hide resolved
@@ -2,7 +2,7 @@ library(ranger)
context("ranger_quantreg")

rf.quant <- ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE,
keep.inbag = TRUE, num.trees = 50)
keep.inbag = TRUE, num.trees = 100, seed = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for the change? Or just a mistake commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change, I got

Error in ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE, keep.inbag = TRUE,  : 
  Error: Too few trees for out-of-bag quantile regression.

It also depends on the seed. This change solved that on my machine. I should have done a separate commit.

tests/testthat/test_poissonsplit.R Show resolved Hide resolved
src/TreeRegression.cpp Show resolved Hide resolved
src/TreeRegression.cpp Show resolved Hide resolved
src/TreeRegression.cpp Show resolved Hide resolved
@mnwright
Copy link
Member

Looks great, thanks!

What I have to do before merge (notes to myself):

  • Add to pure C++ version (help etc.)
  • Read and understand the splitting rule

@lorentzenchr
Copy link
Contributor Author

Thank you for the fast review!

@lorentzenchr
Copy link
Contributor Author

lorentzenchr commented Mar 25, 2020

I refactored the Poisson splitting rule a bit to follow findBestSplitValueSmallQ instead of findBestSplitValueBeta. This gave me a huge speedup (up to 20x). It is now roughly half as fast as the variance splitting rule.

I assume, a similar speedup could be possible for the beta splitting rule?

@mnwright
Copy link
Member

I assume, a similar speedup could be possible for the beta splitting rule?

Yes, that should be possible.

@lorentzenchr
Copy link
Contributor Author

@mnwright I merged master and still think this would be nice to have.

@mnwright
Copy link
Member

Sorry for the long silence. I still think this is useful. Let's try to merge it for the next release.

@mnwright
Copy link
Member

Looks good, I think we are ready to merge?

@lorentzenchr
Copy link
Contributor Author

Yes, why not just merge?

@mnwright mnwright merged commit 10b73fd into imbs-hl:master Jun 11, 2024
8 checks passed
@mnwright
Copy link
Member

Merged 🎉

@lorentzenchr lorentzenchr deleted the poisson branch June 11, 2024 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants