
[MRG] Precompute X_argsorted in AdaBoost #1668

Merged (4 commits) Feb 10, 2013

Conversation

@glouppe (Contributor) commented Feb 9, 2013

This fixes #1667.
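The speed-up under discussion can be sketched as follows: sort each feature column of X once, then reuse the result across all boosting iterations instead of re-sorting inside every tree fit. This is an illustrative reconstruction, not the actual patch; `precompute_argsorted` is a hypothetical helper name (the real 0.13-era tree code accepted an `X_argsorted` fit parameter).

```python
import numpy as np

def precompute_argsorted(X):
    """Return the per-column argsort of X, computed a single time."""
    X = np.asarray(X, dtype=np.float64)
    return np.argsort(X, axis=0).astype(np.int32)

X = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [2.0, 0.5]])

# Computed once up front, then shared by every boosting round.
X_argsorted = precompute_argsorted(X)
# Column 0 sorted order: rows 1, 2, 0; column 1: rows 2, 0, 1.
```

Since the sort order of X never changes between rounds (only the sample weights do), amortizing the O(n log n) per-feature sort over all iterations is where the gain comes from.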

@ogrisel (Member) commented Feb 9, 2013

LGTM.

@amueller (Member) commented Feb 9, 2013

sweet, that was fast :)

@amueller (Member) commented Feb 9, 2013

Can we test somehow that it does the right thing?

@ogrisel (Member) commented Feb 9, 2013

BTW: if we used isinstance(base_estimator, BaseDecisionTree) instead of the inspect trick, that would prevent using a bagging ensemble of trees as the base estimator for AdaBoost, so I am OK with keeping the inspect trick.
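For context, the "inspect trick" amounts to looking at the names of the base estimator's fit() arguments to decide whether a precomputed X_argsorted can be passed through. A minimal sketch, assuming a hypothetical helper name (the 2013-era code used inspect.getargspec; inspect.signature is the modern equivalent):

```python
import inspect

def fit_accepts_argsorted(estimator):
    # Signature-based detection: works for any estimator whose fit()
    # names an X_argsorted argument, but it is fragile if that
    # argument is ever renamed.
    params = inspect.signature(estimator.fit).parameters
    return "X_argsorted" in params

class TreeLike:
    def fit(self, X, y, X_argsorted=None):
        return self

class PlainEstimator:
    def fit(self, X, y):
        return self
```

With these toy classes, fit_accepts_argsorted(TreeLike()) is True while fit_accepts_argsorted(PlainEstimator()) is False, regardless of either class's inheritance.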

@mblondel (Member)

-1 on using inspect. It's a really ugly hack.

@glouppe (Contributor, author) commented Feb 10, 2013

What should be used instead? Indeed, that trick also works for forests of randomized trees.

if isinstance(base_estimator, (BaseDecisionTree, BaseForest)) ?

(I have no strong opinion about that. I will be happy as soon as it speeds things up for trees.)

@mblondel (Member)

@glouppe The solution based on isinstance seems nicer. If you want to allow external estimators, you could also implement a fit_argsorted method and check its presence with if hasattr(base_estimator, "fit_argsorted").
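The hasattr-based dispatch suggested here could look like the sketch below. All names besides fit are hypothetical illustrations of the idea, not scikit-learn API: an estimator opts in by providing a fit_argsorted method, and the boosting loop checks for its presence with no isinstance check and no signature inspection.

```python
import numpy as np

class ArgsortAwareTree:
    """Toy estimator that can reuse a precomputed column-wise argsort."""
    def fit_argsorted(self, X, y, X_argsorted):
        # A real tree would use X_argsorted to avoid re-sorting features.
        self.used_precomputed_ = True
        return self

    def fit(self, X, y):
        self.used_precomputed_ = False
        return self

class PlainTree:
    def fit(self, X, y):
        self.used_precomputed_ = False
        return self

def boost_one_round(estimator, X, y, X_argsorted):
    # Duck typing: any estimator exposing fit_argsorted gets the
    # precomputed sort order; everything else falls back to plain fit.
    if hasattr(estimator, "fit_argsorted"):
        return estimator.fit_argsorted(X, y, X_argsorted)
    return estimator.fit(X, y)

X = np.array([[3.0], [1.0], [2.0]])
y = np.array([0, 1, 0])
X_argsorted = np.argsort(X, axis=0)
aware = boost_one_round(ArgsortAwareTree(), X, y, X_argsorted)
plain = boost_one_round(PlainTree(), X, y, X_argsorted)
```

The design point in the thread: unlike isinstance, this keeps third-party estimators eligible for the optimization without inheriting from a scikit-learn base class.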

@glouppe (Contributor, author) commented Feb 10, 2013

I changed it to isinstance. Actually, forests do not have an X_argsorted fit parameter (I thought they did), hence that optimization cannot be applied to them at the moment. I'll fix that later today or tomorrow.

@amueller (Member)

-1 on isinstance. It relies on inheritance, which we usually try to avoid.
Also -1 on adding another function. Why blow up the API like that?
@mblondel why do you think inspection is an ugly hack? We use it a lot in the basic API (for get_params, for example).

@mblondel (Member)

My gut feeling is that if we have to resort to something like that, something went wrong.
If inspecting the parameters is such a nice idea, why do we force unsupervised estimators to have a y parameter?
Regarding fit_argsorted, if you prefer, it could be considered an internal method and prefixed with a _.

@ogrisel (Member) commented Feb 10, 2013

I am OK with isinstance in the short term, as the X_argsorted optimization can currently only be implemented for DecisionTree, and it is the least hackish / invasive option for now.

Alternatively, I am also OK with _fit_argsorted (private) to avoid a combinatorial explosion of the public API. This optimization is really specific to tree-based models and I don't think it deserves extending the supported public API.

The latter option would require refactoring the BaseDecisionTree class a bit, though.

@amueller (Member)

OK, I guess the private method would be the nicest solution, as it allows the usual duck typing.
@mblondel I think testing whether fit takes a y argument might actually be nice ;) The problem with that kind of inspection is that we test the argument names, which is probably not that stable, i.e. if it is called Y somewhere it would suddenly break :-/

@glouppe (Contributor, author) commented Feb 10, 2013

So basically, can this be merged as is, or shall I now add _fit_argsorted to BaseDecisionTree and BaseForest? (This requires some work, but I can do it sometime this week.)

@amueller (Member)

feel free to merge

glouppe added a commit that referenced this pull request Feb 10, 2013
[MRG] Precompute X_argsorted in AdaBoost
glouppe merged commit b126421 into scikit-learn:master Feb 10, 2013
glouppe deleted the adaboost-tree branch February 10, 2013 13:38
@ogrisel (Member) commented Feb 10, 2013

@glouppe Please open a new issue to implement the _fit_argsorted solution, to make AdaBoost work with both simple trees and ensembles of trees as base estimators.

@GaelVaroquaux (Member)

> BTW: if we used isinstance(base_estimator, BaseDecisionTree) instead of the inspect trick, that would prevent using a bagging ensemble of trees as the base estimator for AdaBoost, so I am OK with keeping the inspect trick.

:). Relying on inheritance is troublesome for software design.

Merging this pull request closes: AdaBoost is slow with trees (#1667)
5 participants