Warm-up/pre-training configuration for the AdaptiveRandomForest #1493
-
Hi all,

For the AdaptiveRandomForest, is there a warm-up/pre-training configuration where the algorithm could benefit from processing an entire dataset initially, instead of learning instances one by one? If not, could you give me a general direction on where to look to implement it? After searching previous discussion posts I believe this is not implemented, but I am double-checking.

I know I can pre-train the model by calling learn_one on the initial data (instance by instance). My point is that ARF performance could get close to that of a batch Random Forest if, at an initial stage, it could use all samples at once. Pre-training would be useful when you have a dataset that "mostly" represents the streaming data: instead of starting from scratch in production, you could deploy a well-pre-trained model.

Thanks.
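For reference, here is a minimal sketch of the instance-by-instance warm-up mentioned above, which is the only option currently available. The data (`X_warm`, `y_warm`) and the `ARFClassifier` hyperparameters are illustrative assumptions; it presumes the warm-up data is available as a pandas DataFrame/Series pair.

```python
import pandas as pd
from river import forest, stream

# Illustrative warm-up data: X_warm is a DataFrame of features,
# y_warm a Series of labels (both names are assumptions for this sketch).
X_warm = pd.DataFrame({"x1": [0.1, 0.4, 0.8, 0.3], "x2": [1.2, 0.7, 0.5, 0.9]})
y_warm = pd.Series([0, 0, 1, 1])

model = forest.ARFClassifier(n_models=10, seed=42)

# "Pre-training": replay the historical dataset one instance at a time.
for x, y in stream.iter_pandas(X_warm, y_warm):
    model.learn_one(x, y)

# The warmed-up model is then used as usual on the live stream.
print(model.predict_one({"x1": 0.2, "x2": 1.0}))
```

Multiple shuffled passes over the warm-up set could also be tried, since a single pass gives each Hoeffding tree only one look at every instance.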
-
Hi @kdMoura. No, there is currently no such option available. The way Hoeffding Trees are trained is crucially different from, say, CART trees: there is the whole single-pass philosophy and the emphasis on saving computational resources, and split evaluations also work differently.

That is not to say your idea is invalid; it is valid, and I think it has a lot of potential. I believe it could best be implemented via a tree model converter that takes trained sklearn forests and converts them to River's format. Otherwise, we would need to create a batch-based training method in ARF, which, again, is fundamentally different from the Hoeffding tree framework.
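To make the converter idea concrete, below is a minimal sketch of the sklearn side of such a converter. It only walks a fitted tree's documented internal arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`) and prints the split structure a River tree would have to reproduce; actually rebuilding that structure out of River's Hoeffding tree node classes is the open part of the idea and is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Fit a small batch forest on synthetic data (purely illustrative).
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
rf = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=42).fit(X, y)

def dump_splits(tree, node=0, depth=0):
    """Recursively print the split structure of one fitted sklearn tree.

    These are the pieces of information (feature index, threshold,
    leaf class distribution) a sklearn-to-River converter would need
    to extract and map onto River's tree nodes.
    """
    indent = "  " * depth
    if tree.children_left[node] == -1:  # -1 marks a leaf in sklearn's arrays
        print(f"{indent}leaf: class distribution = {tree.value[node][0]}")
        return
    print(f"{indent}x[{tree.feature[node]}] <= {tree.threshold[node]:.3f}")
    dump_splits(tree, tree.children_left[node], depth + 1)
    dump_splits(tree, tree.children_right[node], depth + 1)

for i, estimator in enumerate(rf.estimators_):
    print(f"--- tree {i} ---")
    dump_splits(estimator.tree_)
```

A full converter would additionally have to decide how to seed each leaf's sufficient statistics so that the converted trees can keep learning online after the switch.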