Sometimes growforest runs for a long time in the last few trees #50

Closed
vdemario opened this issue Jun 9, 2015 · 2 comments

Comments

@vdemario
Contributor

vdemario commented Jun 9, 2015

I've noticed more than once that growforest tends to output the first trees relatively quickly and then slows down at the end, when there are around 5 or 6 trees left (out of 100).

What I believe is happening is that the recursion sometimes keeps going for a really long time regardless of the depth. I haven't seen it go into an infinite loop or a stack overflow, but I suppose that's possible if my interpretation is correct.

At one point last year I remember seeing this, and I made a change to my local copy that broke out of the recursion when the depth reached some high number that almost never happened (100 thousand or 1 million, I can't remember). It worked, even though it was very ugly. Applyforest was happy with the generated .sf file; nothing seemed to be wrong.
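
Roughly, the hack looked something like this (a minimal sketch from memory; growNode, split, and depthLimit are illustrative names, not CloudForest's actual internals):

```go
package main

// Illustrative sketch only: a hard depth guard bolted onto a recursive
// tree grower. Names and structure are hypothetical, not CloudForest's
// actual code.

const depthLimit = 1000000 // "some high number that almost never happened"

// growNode recursively splits cases until a stopping rule fires or the
// depth guard trips.
func growNode(cases []int, depth int, split func([]int) (left, right []int, ok bool)) {
	if depth > depthLimit {
		return // ugly, but it stops runaway recursion
	}
	left, right, ok := split(cases)
	if !ok || len(left) == 0 || len(right) == 0 {
		return // no useful split found; this node becomes a leaf
	}
	growNode(left, depth+1, split)
	growNode(right, depth+1, split)
}

func main() {
	// Toy usage: a splitter that never finds a split, so recursion stops
	// at the first node.
	growNode([]int{1, 2, 3}, 0, func(c []int) ([]int, []int, bool) {
		return nil, nil, false
	})
}
```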

This time around I'd like to understand better what's happening, to see if there is a better solution. I've only been experimenting with combinations of -oob, -progress and -vet so far, so there might already be flags to help with this; I'm not sure.

@ryanbressler
Owner

I have code to add a max depth parameter that I'll push soon as part of an overhaul to boosting (which is often done with "stumps" or other simple trees).

I've definitely noticed some straggler issues like the ones you're describing; there may be a few things going on.

A lot of it is because parallelism is done at the per-tree level, so as the number of trees left drops below the number of cores in use, the rate at which trees finish drops off. Moving to parallelism per feature evaluated in split searching would speed this up and allow parallel boosting, but it would require some sort of task queue, so I haven't done it (yet).
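
To illustrate (a toy sketch, not the actual growforest code), per-tree parallelism amounts to handing whole trees to a fixed pool of workers, so the last few slow trees leave most cores idle:

```go
package main

import "sync"

// Toy sketch of per-tree parallelism: each worker pulls whole trees off
// a channel, so once fewer trees remain than workers, the slowest trees
// dominate wall-clock time. Names here are illustrative only.
func growTrees(nTrees, nWorkers int, growTree func(i int)) {
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				growTree(i) // one whole tree per task; a straggler ties up its core
			}
		}()
	}
	for i := 0; i < nTrees; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
}

func main() {
	growTrees(100, 4, func(i int) { /* grow tree i */ })
}
```

A per-feature task queue would instead let all cores keep working on whichever trees are still unfinished.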

I usually use a relatively large leaf size to limit model complexity (tree depth can be used for the same thing, though the effect is slightly different), as this both combats overfitting and results in faster training. The default settings are probably best for data sets that are small by modern standards. Smaller values of mTry will also limit tree depth, since a tree stops growing when it can't find a good split.
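
As a rough example of what I mean (flag names here are from memory, so check growforest -h for the exact spelling and defaults):

```
growforest -train train.fm -target B:MyTarget -rfpred forest.sf \
  -nTrees 100 -leafSize 50 -mTry 10
```

The right -leafSize and -mTry values depend on the data set; larger leaves and smaller mTry both tend to produce shallower trees.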

Definitely let me know if you run across a case where you believe tree growth should stop but it isn't.

@vdemario
Contributor Author

vdemario commented Mar 9, 2016

-maxDepth has been on master since September, so I'm going to close this issue. Thanks.

vdemario closed this as completed Mar 9, 2016