DOC add a note on how to make a balanced random forest from a BalancedBaggingClassifier #372

Closed
glemaitre opened this issue Nov 22, 2017 · 14 comments

Comments

@glemaitre glemaitre commented Nov 22, 2017

I think that we could add a note mentioning that we can achieve a balanced random forest classifier by setting max_features='auto' on the decision tree. I don't think that we should implement a new estimator since scikit-learn is going to do it.
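
A minimal sketch of the idea, assuming the imbalanced-learn API of the time (BalancedBaggingClassifier with a base_estimator parameter; illustration only, not the wording of the eventual documentation note):

```python
# Sketch only: a decision tree with random feature selection at each split
# (max_features='auto') wrapped in BalancedBaggingClassifier, which balances
# and bootstraps each subset, approximating a balanced random forest.
from sklearn.tree import DecisionTreeClassifier
from imblearn.ensemble import BalancedBaggingClassifier

brf_like = BalancedBaggingClassifier(
    base_estimator=DecisionTreeClassifier(max_features='auto'),
    n_estimators=100,
    random_state=0,
)
# brf_like.fit(X, y) then behaves much like a balanced random forest.
```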

@glemaitre glemaitre commented Nov 22, 2017

@chkoar WDYT?

@chkoar chkoar commented Nov 22, 2017

I don't have a strong opinion on this. I believe that a robust predictors/ensemble-methods module could bring traction to the package. For that reason I would maybe implement BRF, even just as a shortcut.

@glemaitre glemaitre commented Nov 22, 2017

@chkoar chkoar commented Nov 27, 2017

@glemaitre is there a stalled PR in scikit-learn that we could contribute to and help finish?

@glemaitre glemaitre commented Nov 27, 2017

It is already merged in #373.

@chkoar chkoar commented Nov 27, 2017

I asked because you said:

In that regard, this is why scikit-learn has a PR there.

@glemaitre glemaitre commented Nov 27, 2017

@chkoar chkoar commented Nov 27, 2017

Which one?

@glemaitre glemaitre commented Nov 27, 2017

@chkoar chkoar commented Dec 26, 2017

We also wanted to balance at each node instead of each tree, to see the difference.

@glemaitre in each tree? Does this task need modification at the Cython level?

@glemaitre glemaitre commented Dec 26, 2017

@potash potash commented Jan 29, 2018

@chkoar, @glemaitre I'm not familiar with imblearn, but in sklearn you can balance each tree simply by changing the sample_indices that get passed (a rough sketch of the idea appears at the end of this comment). This is how I implemented it in scikit-learn/scikit-learn#8732.

So does BalancedBaggingClassifier(max_features="auto") balance at each tree, or does it just balance the data once? If the latter, I think it is confusing to call that a "balanced random forest" in the documentation, because in the Breiman paper that term refers to balancing each tree.
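
For illustration, a rough sketch of per-tree balancing via sample indices (this is not the code from scikit-learn/scikit-learn#8732; balanced_indices and fit_balanced_forest are hypothetical helpers):

```python
# Sketch: each tree is fit on its own balanced, bootstrapped set of indices.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_indices(y, rng):
    """Undersample the majority classes so every class contributes equally."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    return np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])

def fit_balanced_forest(X, y, n_estimators=100, seed=0):
    rng = np.random.RandomState(seed)
    trees = []
    for _ in range(n_estimators):
        idx = balanced_indices(y, rng)                        # balance per tree
        boot = rng.choice(idx, size=idx.size, replace=True)   # then bootstrap it
        tree = DecisionTreeClassifier(max_features='auto',
                                      random_state=rng.randint(2**31 - 1))
        tree.fit(X[boot], y[boot])
        trees.append(tree)
    return trees
```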

@glemaitre glemaitre commented Jan 29, 2018

The implementation is a pipeline of a random under-sampler with an estimator. So if you pass an estimator which is a tree, it will balance each subset and then fit a tree on each subset. Therefore, BalancedBaggingClassifier becomes a balanced random forest with max_features='auto', as mentioned in the documentation.
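
Conceptually, each fitted member then looks like the following sketch, assuming imbalanced-learn's Pipeline and RandomUnderSampler (a mirror of the description above, not the internal implementation verbatim):

```python
# Sketch of one ensemble member: the bagger draws a bootstrap subset, the
# under-sampler balances it, and the tree is fit on the balanced subset.
from imblearn.pipeline import make_pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.tree import DecisionTreeClassifier

member = make_pipeline(
    RandomUnderSampler(random_state=0),
    DecisionTreeClassifier(max_features='auto'),
)
# member.fit(X_subset, y_subset) balances the subset first, then grows a tree
# with random feature selection at each split.
```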

@potash potash commented Jan 29, 2018

@glemaitre OK, thanks for the clarification.
