min.node.size. #102

Closed
mutian-niu opened this issue Aug 10, 2016 · 1 comment

Comments

mutian-niu commented Aug 10, 2016

Hello Marvin @mnwright,

I really appreciate your efforts in writing this package; it is really fast!

I just have a quick question about min.node.size. The default min.node.size for regression is 5; however, in some RF models that we fit, I found that the average node size is smaller than 5 (I have 15000 observations and the approximate average tree size is around 4000, so about 3 observations per terminal node). I looked at the R code, and it seems that min.node.size is set to 0 if not specified. Could you please check the setting of min.node.size?

By the way, I was wondering if there is a way to compute the tree size of the model? Thank you!!

Sincerely,
Mutian

Member

mnwright commented Aug 12, 2016

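min.node.size is evaluated before splitting, not after: it decides whether a node may be split at all, but it does not constrain the size of the resulting child nodes, so nodes smaller than min.node.size can occur.
This is also described in the help for ranger():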

Note that for classification and regression nodes with size smaller than min.node.size can occur, like in original Random Forest. For survival all nodes contain at least min.node.size samples.

Note the difference for survival forests. The 0 in the R code is mapped to response-specific default values later in C++.
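
To make the "checked before splitting" behaviour concrete, here is a toy base-R sketch. It is not ranger's implementation: the function name `leaf_sizes`, the argument `min_node_size`, and the exact stopping rule (`n <= min_node_size` makes a node terminal) are assumptions chosen for illustration. It shows that a node of 6 observations with a threshold of 5 is still split, and that one of its children ends up with a single observation.

```r
# Toy illustration only, NOT ranger's code.
# Assumed rule: a node becomes terminal when it holds at most
# min_node_size observations; otherwise it is split, and the
# children may have any size.
leaf_sizes <- function(y, min_node_size = 5) {
  n <- length(y)
  if (n <= min_node_size || length(unique(y)) == 1) {
    return(n)  # terminal node: report its size
  }
  y <- sort(y)
  # Try every cut point and keep the one with the smallest
  # summed squared error over the two children.
  sse <- sapply(seq_len(n - 1), function(k) {
    left <- y[1:k]
    right <- y[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  k <- which.min(sse)
  c(leaf_sizes(y[1:k], min_node_size),
    leaf_sizes(y[(k + 1):n], min_node_size))
}

# Six observations, one of them far from the rest.
sizes <- leaf_sizes(c(1, 10, 11, 12, 13, 14), min_node_size = 5)
print(sizes)  # terminal node sizes: 1 and 5
```

The size check happens on the parent (6 observations, so it is split), while the resulting child of size 1 is never compared against the threshold before it is created.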

To get the number of nodes per tree you could use
sapply(rf$forest$split.varIDs, length)
where rf is a ranger object with write.forest = TRUE.
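
A slightly fuller sketch of the same idea, assuming the ranger package is installed and using the built-in `iris` data purely as a stand-in. The variable names (`n_nodes`, `n_terminal`, `avg_per_terminal`) are made up for this example. Since every split produces exactly two children, a tree with N nodes has (N + 1) / 2 terminal nodes, which gives a rough average terminal node size.

```r
library(ranger)

# Small example forest; write.forest = TRUE keeps the trees in the object.
rf <- ranger(Sepal.Length ~ ., data = iris, num.trees = 10,
             min.node.size = 5, write.forest = TRUE)

# Total number of nodes (internal + terminal) in each tree.
n_nodes <- sapply(rf$forest$split.varIDs, length)

# Each split creates exactly two children, so a tree with N nodes
# has (N + 1) / 2 terminal nodes.
n_terminal <- (n_nodes + 1) / 2

# Rough average number of in-bag samples per terminal node, assuming the
# default bootstrap that draws nrow(data) samples per tree.
avg_per_terminal <- nrow(iris) / n_terminal

summary(n_nodes)
summary(avg_per_terminal)
```

Note that the node count includes internal nodes, so dividing the number of observations by the total node count understates the average terminal node size by roughly a factor of two.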
