New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC add comments regarding to make a balanced random forest from a BalancedBaggingClassifier #372

Closed
glemaitre opened this Issue Nov 22, 2017 · 14 comments

Comments

Projects
None yet
3 participants
@glemaitre
Member

glemaitre commented Nov 22, 2017

I think that we could add note mentioning that we can achieve a balanced random forest classifier by setting max_features='auto' of the decision tree. I don't think that we should implement a new estimator since scikit-learn is going to do it.

@glemaitre

This comment has been minimized.

Member

glemaitre commented Nov 22, 2017

@chkoar WDYT?

@chkoar

This comment has been minimized.

Member

chkoar commented Nov 22, 2017

I don't have a strong opinion on this. I believe that a robust predictors/ensemble methods module could bring traction to the package. For that reason I would may implement BRF even as a shortcut.

@glemaitre

This comment has been minimized.

Member

glemaitre commented Nov 22, 2017

@chkoar

This comment has been minimized.

Member

chkoar commented Nov 27, 2017

@glemaitre is a PR stalled in scikit-learn where we could contribute to be finished?

@glemaitre

This comment has been minimized.

Member

glemaitre commented Nov 27, 2017

it is already merged in #373

@chkoar

This comment has been minimized.

Member

chkoar commented Nov 27, 2017

I asked because you said .

In that regards this why scikit learn has a PR there.

@glemaitre

This comment has been minimized.

Member

glemaitre commented Nov 27, 2017

@chkoar

This comment has been minimized.

Member

chkoar commented Nov 27, 2017

which one?

@glemaitre

This comment has been minimized.

Member

glemaitre commented Nov 27, 2017

@chkoar

This comment has been minimized.

Member

chkoar commented Dec 26, 2017

We wanted also to balanced at each node instead of tree to see the difference.

@glemaitre in each tree? Does this task need modification in the Cython level?

@glemaitre

This comment has been minimized.

Member

glemaitre commented Dec 26, 2017

@potash

This comment has been minimized.

potash commented Jan 29, 2018

@chkoar, @glemaitre I'm not familiar with imblearn but in sklearn you can balance each tree simply by changing the sample_indices that get passed. This is how I implemented it in scikit-learn/scikit-learn#8732.

So does BalancedBaggingClassifier(max_features="auto") balance at each tree or does it just balance the data once? If the latter, I think it is confusing to call that a "balanced random forest" in the documentation because in the Breiman paper that refers to balancing each tree.

@glemaitre

This comment has been minimized.

Member

glemaitre commented Jan 29, 2018

The implementation is a pipeline of a random under sampler with an estimator. So if you pass an estimator which is a tree, it will balance each subset and then fit a tree on each subset. Therefore, BalancedBaggingClassifier become a BalancedRandomForest with max_feature='auto' as mentioned in the documentation.

@potash

This comment has been minimized.

potash commented Jan 29, 2018

@glemaitre OK thanks for the clarification

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment