Warm-up/pre-training configuration for the AdaptiveRandomForest #1493
-
Hi all,

For the AdaptiveRandomForest, is there a warm-up/pre-training configuration where the algorithm could benefit from processing an entire dataset initially, instead of learning instances one by one? If not, could you give me a general direction on where to look to implement it? After searching previous discussion posts I believe this is not implemented, but I am double-checking.

I know I can pre-train the model by calling learn_one on the initial data (instance by instance). My point is that ARF performance could get close to that of a batch Random Forest if, at an initial stage, it could use all samples at once. Pre-training would be useful when you have a dataset that "mostly" represents the streaming data: instead of starting from scratch in production, you could deploy a well-pre-trained model.

Thanks.
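For reference, here is a minimal sketch of the instance-by-instance warm-up mentioned above, which is the only option currently available. The data (`X_warm`, `y_warm`) and the `ARFClassifier` hyperparameters are illustrative assumptions; it presumes the warm-up data is available as a pandas DataFrame/Series pair.

```python
import pandas as pd
from river import forest, stream

# Illustrative warm-up data: X_warm is a DataFrame of features,
# y_warm a Series of labels (both names are assumptions for this sketch).
X_warm = pd.DataFrame({"x1": [0.1, 0.4, 0.8, 0.3], "x2": [1.2, 0.7, 0.5, 0.9]})
y_warm = pd.Series([0, 0, 1, 1])

model = forest.ARFClassifier(n_models=10, seed=42)

# "Pre-training": replay the historical dataset one instance at a time.
for x, y in stream.iter_pandas(X_warm, y_warm):
    model.learn_one(x, y)

# The warmed-up model is then used as usual on the live stream.
print(model.predict_one({"x1": 0.2, "x2": 1.0}))
```

Multiple shuffled passes over the warm-up set could also be tried, since a single pass gives each Hoeffding tree only one look at every instance.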
-
Hi @kdMoura. No, there is currently no such option available. The way Hoeffding Trees are trained is crucially different from, say, CART trees: there is the whole single-pass philosophy and the emphasis on saving computational resources, and split evaluations also work differently.

That is not to say your idea is invalid; it is valid, and I think it has a lot of potential. I believe it could best be implemented via a tree model converter that takes trained sklearn forests and converts them to River's format. Otherwise, we would need to create a batch-based training method in ARF, which, again, is fundamentally different from the Hoeffding tree framework.
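To make the converter idea concrete, below is a minimal sketch of the sklearn side of such a converter. It only walks a fitted tree's documented internal arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`) and prints the split structure a River tree would have to reproduce; actually rebuilding that structure out of River's Hoeffding tree node classes is the open part of the idea and is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Fit a small batch forest on synthetic data (purely illustrative).
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
rf = RandomForestClassifier(n_estimators=3, max_depth=3, random_state=42).fit(X, y)

def dump_splits(tree, node=0, depth=0):
    """Recursively print the split structure of one fitted sklearn tree.

    These are the pieces of information (feature index, threshold,
    leaf class distribution) a sklearn-to-River converter would need
    to extract and map onto River's tree nodes.
    """
    indent = "  " * depth
    if tree.children_left[node] == -1:  # -1 marks a leaf in sklearn's arrays
        print(f"{indent}leaf: class distribution = {tree.value[node][0]}")
        return
    print(f"{indent}x[{tree.feature[node]}] <= {tree.threshold[node]:.3f}")
    dump_splits(tree, tree.children_left[node], depth + 1)
    dump_splits(tree, tree.children_right[node], depth + 1)

for i, estimator in enumerate(rf.estimators_):
    print(f"--- tree {i} ---")
    dump_splits(estimator.tree_)
```

A full converter would additionally have to decide how to seed each leaf's sufficient statistics so that the converted trees can keep learning online after the switch.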