
Automated ensembling techniques #34

Closed
reiinakano opened this issue Jun 10, 2017 · 5 comments

@reiinakano
Owner

reiinakano commented Jun 10, 2017

After working with Xcessiv for a while, I feel there's a need for some way to automate the selection of base learners in an ensemble. I'm unaware of existing techniques for this, so if anyone has suggestions or can point me toward relevant literature, it would be greatly appreciated.

@techscientist

techscientist commented Jun 10, 2017

This would be an awesome idea.

One idea that I have would be to do the following:

  1. Ask the user to select an evaluation metric that he/she wishes to maximize (e.g., accuracy) or minimize.
  2. Ask the user to select the maximum number of top-performing ensembles to keep (a top-k list ranked by performance on that metric).
  3. Then generate random or grid-like combinations of multiple base estimators. Train each one, evaluate it against the metric, and if its performance places it in the top k, add it, maintaining a rolling top-k list of the best-performing ensembles (see the sketch after this list).
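
A minimal sketch of that loop, assuming scikit-learn-style estimators. `VotingClassifier` here stands in for whatever ensembling method is actually used, and the dataset, learner pool, and constants are all illustrative, not part of Xcessiv:

```python
import heapq
import random

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

base_learners = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('nb', GaussianNB()),
    ('knn', KNeighborsClassifier()),
]

K = 3          # size of the rolling top-k list (step 2)
N_TRIALS = 10  # number of random combinations to try
top_k = []     # min-heap of (score, learner names)
rng = random.Random(0)

for _ in range(N_TRIALS):
    # Step 3: sample a random combination of base estimators...
    combo = rng.sample(base_learners, rng.randint(2, len(base_learners)))
    ensemble = VotingClassifier(estimators=combo, voting='soft')
    # ...train and evaluate it against the chosen metric (step 1)...
    score = cross_val_score(ensemble, X, y, cv=3, scoring='accuracy').mean()
    # ...and maintain the rolling top-k list (step 2).
    entry = (score, tuple(name for name, _ in combo))
    if len(top_k) < K:
        heapq.heappush(top_k, entry)
    elif entry > top_k[0]:
        heapq.heapreplace(top_k, entry)

for score, names in sorted(top_k, reverse=True):
    print(f"{score:.4f}  {names}")
```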

In this approach, it is definitely important to let the user quit the automation process while it is running.

How does this sound, @reiinakano ? Maybe this would be good for an initial implementation?

@reiinakano
Owner Author

reiinakano commented Jun 10, 2017

I haven't actually figured out the best way to let a user quit a process manually. Currently, the only way to do that is to forcibly close the terminal running Xcessiv. One good thing about Xcessiv is that it automatically stores the meta-features of each base learner it scores, so calculating the performance of an ensemble is actually quite fast, since the only training you do is for the secondary estimator.
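
A rough illustration of why that's fast, using `cross_val_predict` as a stand-in for the meta-features Xcessiv stores. The helper and variable names are my own, not Xcessiv's internals:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Expensive step, done ONCE per base learner: out-of-fold
# probability predictions serve as the stored "meta-features".
base_learners = {
    'rf': RandomForestClassifier(n_estimators=100, random_state=0),
    'nb': GaussianNB(),
}
meta_features = {
    name: cross_val_predict(est, X, y, cv=5, method='predict_proba')
    for name, est in base_learners.items()
}

# Cheap step, repeated per candidate ensemble: stack the cached
# meta-features and fit only the secondary (meta) estimator.
def score_ensemble(names, secondary):
    stacked = np.hstack([meta_features[n] for n in names])
    return cross_val_score(secondary, stacked, y, cv=5).mean()

print(score_ensemble(['rf', 'nb'], LogisticRegression(max_iter=1000)))
```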

Anyway, I don't think it's necessary to maintain a rolling top-k list of ensembles. Instead, I'd just store everything that was calculated in the database; you can easily sort by whatever metric you want anyway. This is currently what is done when you do Bayesian optimization for the base learners: the list of base learners just auto-updates while the search is running.

I was thinking of doing something along the same lines for stacked ensembles. What I need is a smart algorithm or technique for selecting which base learners should be used and in what combinations. One way people do this is through a greedy approach: iteratively try adding base learners and keep each addition if the target metric improves (a sketch of this is below). Of course, random combinations of base learners might actually be a good approach too, considering that random search is better than grid search for optimizing base learners.
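
A sketch of that greedy loop, in the spirit of Caruana et al.'s ensemble selection. `candidates` and `score_ensemble` are placeholders for a pool of base learner names and a metric-evaluation helper such as the one sketched above, and a higher-is-better metric is assumed:

```python
def greedy_forward_selection(candidates, score_ensemble):
    """Start from an empty ensemble and repeatedly add whichever base
    learner improves the target metric most, stopping when nothing helps."""
    selected, best_score = [], float('-inf')
    remaining = list(candidates)
    while remaining:
        # Try each remaining learner as the next addition.
        trial_scores = {c: score_ensemble(selected + [c]) for c in remaining}
        best_candidate = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_candidate] <= best_score:
            break  # no single addition improves the metric; stop
        best_score = trial_scores[best_candidate]
        selected.append(best_candidate)
        remaining.remove(best_candidate)
    return selected, best_score
```

Because each call to `score_ensemble` would only refit the secondary estimator on cached meta-features, the inner loop stays cheap even with a large candidate pool.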

@techscientist

@reiinakano, that's also a good point. So maybe go with a random approach for now and add more later? Perhaps more developers will add their ideas to this issue and other issues as time goes on.

I also think that a random approach might be better than grid search for now.

@reiinakano
Owner Author

Agreed. I think it's important to settle on some kind of framework so that different exploration methods can be added very easily in the future. I certainly intend to add things other than Bayesian optimization for optimizing base learners in the future, e.g. Hyperband.

Thanks for your inputs! Appreciate them a lot!

@reiinakano
Owner Author

reiinakano commented Jun 23, 2017

Added automated ensembling based on greedy forward model selection in #43; it is included in v0.5.0.
