I've noticed more than once that growforest tends to output the first trees relatively fast and slows down at the end, when there are around 5 or 6 trees missing (out of 100).
What I believe is happening is that the recursion sometimes keeps going for a really long time regardless of the depth. I haven't seen it go into an infinite loop or a stack overflow, but I suppose that's possible if my interpretation is correct.
At one point last year I remember having seen this and I made a change to my local copy in which I broke out of the recursion when the depth was some high number that almost never happened (100 thousand or 1 million, can't remember) and it worked, even though it was very ugly. Applyforest was happy with the .sf file generated, nothing seemed to be wrong.
This time around I'd like to understand what's happening better to see if there is a better solution. I've only been experimenting with combinations of -oob, -progress and -vet so far, so there might be flags already to help with this, I'm not sure.
I have code to add a max depth parameter that I'll push soon as part of an overhaul to boosting (which is often done with "stumps" or other simple trees).
I've definitely noticed some straggler issues like you're describing; there may be a few things going on.
A lot of it is because parallelism is done on a per tree level, so as the number of trees left drops below the number of cores in use, the rate at which trees finish drops off. Moving to parallelism on a per tree or per feature evaluated in split searching would speed this up and allow parallel boosting, but would require some sort of task queue, so I haven't done it (yet).
I usually use a relatively large value of leaf size to limit model complexity (tree depth can be used for the same thing though slightly different) as this will both combat overfitting and result in faster training. The default settings are probably best for data sets that are small by modern standards. Smaller values of mTry will also limit tree depth as the tree will stop when it can't find a good split.
Definitely let me know if you run across a case where you believe tree growth should stop when it isn't.