Adds a way to create an ensemble out of existing models #129

Merged: Craigacp merged 6 commits into oracle:main from the ensemble-all-the-things branch on Apr 14, 2021

@Craigacp Craigacp commented Apr 7, 2021

Description

This PR adds a method to WeightedEnsembleModel called createEnsembleFromExistingModels, which builds an ensemble from the supplied model list, an ensemble combiner, and optionally a set of model combination weights. It adds a TimestampTrainerProvenance, which is used as the TrainerProvenance for any ensemble created this way. Finally, it adds a couple of tests for VotingCombiner and AveragingCombiner, along with a few tests of the new behaviour.
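To illustrate what a combiner does with the optional weights, here is a minimal sketch of weighted voting and weighted averaging in plain Java. It is not Tribuo's VotingCombiner or AveragingCombiner: the class `CombinerSketch`, its method names, and the use of plain strings and doubles as predictions are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not Tribuo's combiner classes. */
final class CombinerSketch {
    private CombinerSketch() {}

    /** Weighted vote: each model adds its weight to the label it predicts. */
    static String weightedVote(List<String> predictedLabels, float[] weights) {
        if (predictedLabels.size() != weights.length) {
            throw new IllegalArgumentException("One weight is required per prediction");
        }
        // LinkedHashMap keeps first-seen order, so ties go to the earliest label.
        Map<String, Double> tally = new LinkedHashMap<>();
        for (int i = 0; i < predictedLabels.size(); i++) {
            tally.merge(predictedLabels.get(i), (double) weights[i], Double::sum);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : tally.entrySet()) {
            if (entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return best;
    }

    /** Weighted average of numeric predictions, normalised by the weight total. */
    static double weightedAverage(double[] predictions, float[] weights) {
        if (predictions.length != weights.length) {
            throw new IllegalArgumentException("One weight is required per prediction");
        }
        double sum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < predictions.length; i++) {
            sum += weights[i] * predictions[i];
            weightSum += weights[i];
        }
        return sum / weightSum;
    }
}
```

With uniform weights this reduces to a plain majority vote or arithmetic mean; non-uniform weights let a more trusted model outvote the others.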

The ensemble building mechanism validates that there are at least two models, that the number of combination weights equals the number of models, and that the output domains of all the supplied models are the same. We could consider adding a check that the feature domains are the same, or at least overlap, but it's plausible that text models trained using data subsampling have different views of the feature space, so I've left that check for future work.
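The three checks described above can be sketched roughly as follows. This is not the code from the PR: `EnsembleValidationSketch`, the `SketchModel` interface, and the representation of an output domain as a set of label strings are hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.util.List;
import java.util.Set;

/** Illustrative only: hypothetical types, not Tribuo's Model or output domain classes. */
final class EnsembleValidationSketch {
    private EnsembleValidationSketch() {}

    /** Hypothetical stand-in for a trained model; it only exposes its output labels. */
    interface SketchModel {
        Set<String> outputDomain();
    }

    /** Throws IllegalArgumentException if the models and weights cannot form an ensemble. */
    static void validate(List<? extends SketchModel> models, float[] weights) {
        // Check 1: an ensemble needs at least two members.
        if (models.size() < 2) {
            throw new IllegalArgumentException("Need at least two models, found " + models.size());
        }
        // Check 2: one combination weight per model.
        if (weights.length != models.size()) {
            throw new IllegalArgumentException(
                "Found " + models.size() + " models but " + weights.length + " weights");
        }
        // Check 3: every model must predict over the same output domain.
        Set<String> expected = models.get(0).outputDomain();
        for (int i = 1; i < models.size(); i++) {
            if (!expected.equals(models.get(i).outputDomain())) {
                throw new IllegalArgumentException(
                    "Model " + i + " has a different output domain from model 0");
            }
        }
        // Feature domains are deliberately not compared, matching the description above.
    }
}
```

Failing fast with an exception at construction time means a mismatched ensemble is rejected before any prediction is attempted.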

Motivation

It's currently hard in Tribuo to ensemble together models of different types (e.g. an ensemble of a linear model, a decision tree and a kernel SVM). This kind of arbitrary ensembling is occasionally very powerful, but difficult to express via Tribuo's trainer mechanism, so it's simpler to allow users to build custom ensembles from models they already have.

Paper reference

Kuncheva, L. I. (2014). Combining pattern classifiers: methods and algorithms.

@Craigacp Craigacp added the Oracle employee This PR is from an Oracle employee label Apr 13, 2021

@JackSullivan JackSullivan left a comment

Mostly looks good, apart from a few questions/edits.

I'd like to consider doing something better with the provenance than just taking the first model's feature domain. For example, how involved would it be to create a DatasetUnionProvenance? We don't seem to need the actual union here, though.

Co-authored-by: Jack Sullivan <john.t.sullivan@gmail.com>
@Craigacp

Well, the ensemble provenance contains each of the individual models' provenances, and each of those contains its training dataset provenance. So the information is all there, but I needed to pick something to put in the top-level model provenance, and the first one seemed easiest.

@JackSullivan JackSullivan left a comment

Looks good based on our discussion with the above changes.

Co-authored-by: Jack Sullivan <john.t.sullivan@gmail.com>
@JackSullivan JackSullivan left a comment

Looks good to me.

@Craigacp Craigacp merged commit 0adf555 into oracle:main Apr 14, 2021
@Craigacp Craigacp deleted the ensemble-all-the-things branch April 14, 2021 16:49