Adds a way to create an ensemble out of existing models #129
…d arbitrary ensembles from existing models.
…leModel.createEnsembleFromExistingModels.
Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java
Mostly looks good, apart from a few questions/edits.
I'd like to entertain doing something better with the provenance than just taking the first model's feature domain, i.e., how involved would it be to create a DatasetUnionProvenance? We don't seem to need the actual union here, though.
Co-authored-by: Jack Sullivan <john.t.sullivan@gmail.com>
Well, the ensemble provenance contains each of the individual models' provenances, and those each contain their training dataset provenance. So the information is all there, but I needed to pick something to put in the top-level model provenance, and the first one seemed easiest.
Looks good based on our discussion with the above changes.
Co-authored-by: Jack Sullivan <john.t.sullivan@gmail.com>
Looks good to me.
Description
This PR adds a method to `WeightedEnsembleModel` called `createEnsembleFromExistingModels` which builds an ensemble out of the supplied model list, ensemble combiner and, optionally, model combination weights. It adds a `TimestampTrainerProvenance` which is used as the `TrainerProvenance` for any ensembles created this way. Finally, it adds a couple of tests for `VotingCombiner` and `AveragingCombiner` along with a few tests of the new behaviour.

The ensemble building mechanism validates that there are at least two models, that there are equal numbers of models and combination weights, and that the output domains for all the supplied models are the same. We could consider adding a check that the feature domains are the same, or have some overlap, but it's plausible that text models trained using data subsampling have different views of the feature space, so I've left that check for future work.
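To make the validation rules above concrete, here is a minimal, self-contained sketch of the three checks (at least two models, one weight per model, identical output domains) followed by a simple weighted vote. It is not the code from this PR and does not use Tribuo's classes: `SketchModel`, `EnsembleSketch`, `fromExistingModels` and `constant` are hypothetical names invented for illustration, and treating a missing weight array as uniform weights is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative ensemble builder; names and behaviour are assumptions, not Tribuo code. */
final class EnsembleSketch {
    private final List<SketchModel> models;
    private final float[] weights;

    private EnsembleSketch(List<SketchModel> models, float[] weights) {
        // Defensive copies so later changes by the caller cannot alter the ensemble.
        this.models = new ArrayList<>(models);
        this.weights = weights.clone();
    }

    /** Builds an ensemble with uniform weights (an assumed default). */
    static EnsembleSketch fromExistingModels(List<SketchModel> models) {
        float[] uniform = new float[models.size()];
        Arrays.fill(uniform, 1.0f);
        return fromExistingModels(models, uniform);
    }

    /** Builds an ensemble after running the three checks described in the PR text. */
    static EnsembleSketch fromExistingModels(List<SketchModel> models, float[] weights) {
        if (models.size() < 2) {
            throw new IllegalArgumentException("Need at least two models, found " + models.size());
        }
        if (weights.length != models.size()) {
            throw new IllegalArgumentException("Found " + models.size() + " models but "
                    + weights.length + " weights");
        }
        Set<String> domain = models.get(0).outputDomain();
        for (int i = 1; i < models.size(); i++) {
            if (!domain.equals(models.get(i).outputDomain())) {
                throw new IllegalArgumentException("Model " + i + " has a different output domain");
            }
        }
        return new EnsembleSketch(models, weights);
    }

    /** Weighted vote: each model adds its weight to the label it predicts. */
    String predict(double[] features) {
        Map<String, Float> votes = new LinkedHashMap<>();
        for (int i = 0; i < models.size(); i++) {
            votes.merge(models.get(i).predict(features), weights[i], Float::sum);
        }
        String best = null;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Float> e : votes.entrySet()) {
            if (e.getValue() > bestScore) { // ties go to the label that was voted for first
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    /** Hypothetical helper: a model that always predicts the same label. */
    static SketchModel constant(String label, String... domain) {
        Set<String> labels = new HashSet<>(Arrays.asList(domain));
        return new SketchModel() {
            @Override public Set<String> outputDomain() { return labels; }
            @Override public String predict(double[] features) { return label; }
        };
    }

    public static void main(String[] args) {
        SketchModel a = constant("spam", "spam", "ham");
        SketchModel b = constant("ham", "spam", "ham");
        SketchModel c = constant("ham", "spam", "ham");
        double[] x = {1.0};
        // Uniform weights: ham wins two votes to one.
        System.out.println(fromExistingModels(Arrays.asList(a, b, c)).predict(x));
        // Weighting the first model by 3 flips the result to spam (3 vs 2).
        System.out.println(fromExistingModels(Arrays.asList(a, b, c), new float[]{3f, 1f, 1f}).predict(x));
    }
}

/** Hypothetical stand-in for a trained classifier; NOT Tribuo's Model interface. */
interface SketchModel {
    Set<String> outputDomain();          // labels this model can emit
    String predict(double[] features);   // single predicted label
}
```

The sketch deliberately leaves out any feature-domain check, mirroring the decision described above to defer that to future work.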
Motivation
It's currently hard in Tribuo to ensemble together models of different types (e.g. an ensemble of a linear model, a decision tree and a kernel SVM). This kind of arbitrary ensembling is occasionally very powerful, but difficult to express via Tribuo's trainer mechanism, so it's simpler to allow users to build custom ensembles from models they already have.
Paper reference
Kuncheva, L. I. (2014). Combining pattern classifiers: methods and algorithms.