
Automated ensembling techniques #34

Closed
reiinakano opened this issue Jun 10, 2017 · 5 comments

@reiinakano
Owner

reiinakano commented Jun 10, 2017

After working with Xcessiv for a while, I feel there's a need for some way to automate the selection of base learners in an ensemble. I'm unaware of existing techniques for this, so if anyone has suggestions or can point me toward relevant literature, it would be greatly appreciated.

@techscientist

techscientist commented Jun 10, 2017

This would be an awesome idea.

One idea that I have would be to do the following:

  1. Ask the user to select an evaluation metric that he/she wishes to maximize (e.g., accuracy) or minimize.
  2. Ask the user to select the maximum number of top-performing ensembles to keep (a top-k list ranked by performance on that metric).
  3. Then generate random or grid-like combinations of multiple base estimators. Train each one, evaluate it against the metric, and if its performance places it in the top k, add it, maintaining a rolling top-k list of the best-performing ensembles (see the sketch after this list).
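
A minimal sketch of that loop, assuming scikit-learn-style estimators. `VotingClassifier` here stands in for whatever ensembling method is actually used, and the dataset, learner pool, and constants are all illustrative, not part of Xcessiv:

```python
import heapq
import random

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

base_learners = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('nb', GaussianNB()),
    ('knn', KNeighborsClassifier()),
]

K = 3          # size of the rolling top-k list (step 2)
N_TRIALS = 10  # number of random combinations to try
top_k = []     # min-heap of (score, learner names)
rng = random.Random(0)

for _ in range(N_TRIALS):
    # Step 3: sample a random combination of base estimators...
    combo = rng.sample(base_learners, rng.randint(2, len(base_learners)))
    ensemble = VotingClassifier(estimators=combo, voting='soft')
    # ...train and evaluate it against the chosen metric (step 1)...
    score = cross_val_score(ensemble, X, y, cv=3, scoring='accuracy').mean()
    # ...and maintain the rolling top-k list (step 2).
    entry = (score, tuple(name for name, _ in combo))
    if len(top_k) < K:
        heapq.heappush(top_k, entry)
    elif entry > top_k[0]:
        heapq.heapreplace(top_k, entry)

for score, names in sorted(top_k, reverse=True):
    print(f"{score:.4f}  {names}")
```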

In this approach, it is definitely important to let the user quit the automation process while it is running.

How does this sound, @reiinakano ? Maybe this would be good for an initial implementation?

@reiinakano
Owner Author

reiinakano commented Jun 10, 2017

I haven't actually figured out the best way to let a user quit a process manually. Currently, the only way to do that is to forcibly close the terminal running Xcessiv. One good thing about Xcessiv is that it automatically stores the meta-features of each base learner it scores, so calculating the performance of an ensemble is actually quite fast, since the only training you do is for the secondary estimator.
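
A rough illustration of why that's fast, using `cross_val_predict` as a stand-in for the meta-features Xcessiv stores. The helper and variable names are my own, not Xcessiv's internals:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Expensive step, done ONCE per base learner: out-of-fold
# probability predictions serve as the stored "meta-features".
base_learners = {
    'rf': RandomForestClassifier(n_estimators=100, random_state=0),
    'nb': GaussianNB(),
}
meta_features = {
    name: cross_val_predict(est, X, y, cv=5, method='predict_proba')
    for name, est in base_learners.items()
}

# Cheap step, repeated per candidate ensemble: stack the cached
# meta-features and fit only the secondary (meta) estimator.
def score_ensemble(names, secondary):
    stacked = np.hstack([meta_features[n] for n in names])
    return cross_val_score(secondary, stacked, y, cv=5).mean()

print(score_ensemble(['rf', 'nb'], LogisticRegression(max_iter=1000)))
```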

Anyway, I don't think it's necessary to maintain a rolling top-k list of ensembles. Instead, I'd just store everything that was calculated in the database; you can easily sort by whatever metric you want anyway. This is currently what is done when you do Bayesian optimization for the base learners: the list of base learners just auto-updates while the search is running.

I was thinking of doing something along the same lines for stacked ensembles. What I need is a smart algorithm or technique for selecting which base learners should be used and in what combinations. One way people do this is through a greedy approach: iteratively try adding base learners and keep each addition if the target metric improves (a sketch of this is below). Of course, random combinations of base learners might actually be a good approach too, considering that random search is better than grid search for optimizing base learners.
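
A sketch of that greedy loop, in the spirit of Caruana et al.'s ensemble selection. `candidates` and `score_ensemble` are placeholders for a pool of base learner names and a metric-evaluation helper such as the one sketched above, and a higher-is-better metric is assumed:

```python
def greedy_forward_selection(candidates, score_ensemble):
    """Start from an empty ensemble and repeatedly add whichever base
    learner improves the target metric most, stopping when nothing helps."""
    selected, best_score = [], float('-inf')
    remaining = list(candidates)
    while remaining:
        # Try each remaining learner as the next addition.
        trial_scores = {c: score_ensemble(selected + [c]) for c in remaining}
        best_candidate = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_candidate] <= best_score:
            break  # no single addition improves the metric; stop
        best_score = trial_scores[best_candidate]
        selected.append(best_candidate)
        remaining.remove(best_candidate)
    return selected, best_score
```

Because each call to `score_ensemble` would only refit the secondary estimator on cached meta-features, the inner loop stays cheap even with a large candidate pool.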

@techscientist

@reiinakano, that's also a good point. So maybe go with a random approach for now and add more later? Perhaps more developers will add their ideas to this issue and other issues as time goes on.

I also think that a random approach might be better than grid search for now.

@reiinakano
Owner Author

Agreed. I think it's important to settle on some kind of framework so that different exploration methods can be added very easily in the future. I certainly intend to add things other than Bayesian optimization for optimizing base learners in the future, e.g. Hyperband.

Thanks for your inputs! Appreciate them a lot!

@reiinakano
Owner Author

reiinakano commented Jun 23, 2017

Added automated ensembling based on greedy forward model selection in #43; it is included in v0.5.0.
