EasyEnsemble should be a meta-estimator #252

Closed
amueller opened this issue Mar 20, 2017 · 6 comments · Fixed by #315
Labels: Status: Blocker · Type: Enhancement

Comments

@amueller (Member) commented Mar 20, 2017

I'm not entirely sure how EasyEnsemble should be used, but I feel like it might be easier if it were a meta-estimator.
Let's say I want to implement a random forest using EasyEnsemble as an estimator. I have no idea how to do that easily. If it were a meta-estimator, I could just do EasyEnsemble(DecisionTreeClassifier(max_features="auto")), which would be nice. Ideally this would set the random seeds, too.

I ended up with this solution, which works, but is verbose and probably inefficient:

from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from imblearn.pipeline import make_pipeline as make_imb_pipeline
from imblearn.under_sampling import RandomUnderSampler

def make_resample_tree(random_state=0):
    # Under-sample the majority class (with replacement), then fit a tree
    tree = make_imb_pipeline(RandomUnderSampler(random_state=random_state, replacement=True),
                             DecisionTreeClassifier(max_features='auto', random_state=random_state))
    return "tree_{}".format(random_state), tree

classifiers = [make_resample_tree(i) for i in range(100)]
resampled_rf = VotingClassifier(classifiers, voting='soft')

[Though it is kinda neat that I can write it down like that ;]
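
For context, here is how the snippet above could be exercised end to end. The synthetic dataset and split below are illustrative assumptions, not part of the original comment:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Illustrative imbalanced data (95%/5% class split); any imbalanced dataset works
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

resampled_rf.fit(X_train, y_train)          # fits all 100 resampled trees
proba = resampled_rf.predict_proba(X_test)  # soft vote: averaged tree probabilities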

@amueller (Member, Author)

A stupid implementation of the general scheme is this:

from sklearn.base import clone
from sklearn.ensemble import VotingClassifier
from imblearn.pipeline import make_pipeline as make_imb_pipeline
from imblearn.under_sampling import RandomUnderSampler

def make_resampled_ensemble(estimator, n_estimators=100):
    estimators = []
    for i in range(n_estimators):
        est = clone(estimator)
        # Give each clone its own seed so the ensemble members differ
        if hasattr(est, "random_state"):
            est.random_state = i
        pipe = make_imb_pipeline(RandomUnderSampler(random_state=i, replacement=True),
                                 est)
        estimators.append(("est_{}".format(i), pipe))
    return VotingClassifier(estimators, voting="soft")

Clearly you could do that more elegantly by actually writing a meta-estimator class...
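
For illustration, a minimal sketch of what such a meta-estimator class could look like, wrapping the same scheme as above. The class name and defaults are assumptions, not the API imbalanced-learn eventually shipped:

from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.ensemble import VotingClassifier
from imblearn.pipeline import make_pipeline as make_imb_pipeline
from imblearn.under_sampling import RandomUnderSampler

class ResampledEnsembleClassifier(BaseEstimator, ClassifierMixin):
    # Hypothetical meta-estimator: fit n_estimators clones of `estimator`,
    # each on an independently under-sampled view of the data.

    def __init__(self, estimator, n_estimators=100):
        self.estimator = estimator
        self.n_estimators = n_estimators

    def fit(self, X, y):
        estimators = []
        for i in range(self.n_estimators):
            est = clone(self.estimator)
            if hasattr(est, "random_state"):
                est.random_state = i
            pipe = make_imb_pipeline(RandomUnderSampler(random_state=i, replacement=True),
                                     est)
            estimators.append(("est_{}".format(i), pipe))
        self.ensemble_ = VotingClassifier(estimators, voting="soft")
        self.ensemble_.fit(X, y)
        return self

    def predict(self, X):
        return self.ensemble_.predict(X)

    def predict_proba(self, X):
        return self.ensemble_.predict_proba(X)

With this, the example from the first comment would read ResampledEnsembleClassifier(DecisionTreeClassifier(max_features="auto")).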

@glemaitre (Member)

You are right. I think this is also the solution proposed in #149.

@glemaitre added the Type: Enhancement label on Mar 20, 2017
@chkoar (Member) commented Mar 20, 2017

@amueller is right. On the other hand, we could preserve the current functionality by renaming this class EasyEnsembleSampler and building an EasyEnsembleClassifier on top of it. I am OK either way. I could do it. PS: both the sampler and the classifier could easily be parallelized.
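
On the parallelization point: each resampled pipeline is fit independently, so joblib can dispatch the fits across cores. A rough sketch, with the dataset again an illustrative assumption:

from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from imblearn.pipeline import make_pipeline as make_imb_pipeline
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

def make_pipe(seed):
    # One under-sample-then-classify pipeline per seed
    return make_imb_pipeline(RandomUnderSampler(random_state=seed, replacement=True),
                             DecisionTreeClassifier(random_state=seed))

# The fits share no state, so they parallelize trivially
fitted = Parallel(n_jobs=-1)(delayed(make_pipe(i).fit)(X, y) for i in range(100))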

@glemaitre (Member) commented Mar 20, 2017 via email

@amueller (Member, Author)

FWIW, I don't have a strong opinion on whether to also keep the current version; people might rely on it.

@glemaitre (Member)

@chkoar I was thinking about that. I think the resampling strategy should accept any under-sampling method, and the classifier could be any classifier as well. You should probably keep this in mind while designing the meta-estimator.
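
To make that concrete, a hypothetical design in which both pieces are injectable. This is a sketch of the idea only, not the signature imbalanced-learn ended up with:

import numpy as np
from scipy.stats import mode
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.tree import DecisionTreeClassifier
from imblearn.under_sampling import RandomUnderSampler

class ParametrizedEasyEnsemble(BaseEstimator, ClassifierMixin):
    # Hypothetical: accepts any imblearn sampler and any sklearn classifier

    def __init__(self, base_estimator=None, sampler=None, n_estimators=10):
        self.base_estimator = base_estimator
        self.sampler = sampler
        self.n_estimators = n_estimators

    def fit(self, X, y):
        base = self.base_estimator if self.base_estimator is not None else DecisionTreeClassifier()
        sampler = self.sampler if self.sampler is not None else RandomUnderSampler()
        self.estimators_ = []
        for i in range(self.n_estimators):
            samp = clone(sampler)
            if hasattr(samp, "random_state"):
                samp.random_state = i
            X_res, y_res = samp.fit_resample(X, y)  # fit_sample in 2017-era imblearn
            est = clone(base)
            if hasattr(est, "random_state"):
                est.random_state = i
            self.estimators_.append(est.fit(X_res, y_res))
        return self

    def predict(self, X):
        # Majority vote across ensemble members
        votes = np.asarray([est.predict(X) for est in self.estimators_])
        return mode(votes, axis=0).mode.ravel()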
