Adds a way to create an ensemble out of existing models #129

Craigacp · 2021-04-07T02:28:19Z

Description

This PR adds a method to WeightedEnsembleModel called createEnsembleFromExistingModels, which builds an ensemble out of the supplied model list, an ensemble combiner and, optionally, model combination weights. It also adds a TimestampTrainerProvenance, which is used as the TrainerProvenance for any ensembles created this way. Finally, it adds a couple of tests for VotingCombiner and AveragingCombiner, along with a few tests of the new behaviour.
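The sketch below shows what a combiner might do with the optional combination weights. It is not Tribuo's VotingCombiner or AveragingCombiner. The class `CombinerSketch` and its methods are hypothetical, and member predictions are modelled as plain label strings and label-to-score maps rather than Tribuo `Prediction` objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of weighted voting and weighted averaging over member
 * predictions. Not Tribuo's combiner implementation.
 */
final class CombinerSketch {
    private CombinerSketch() {}

    /** Each member adds its weight to the label it predicted; the heaviest label wins (ties broken arbitrarily). */
    static String weightedVote(List<String> predictions, float[] weights) {
        if (predictions.size() != weights.length) {
            throw new IllegalArgumentException("Need one weight per prediction");
        }
        Map<String, Double> tally = new HashMap<>();
        for (int i = 0; i < weights.length; i++) {
            tally.merge(predictions.get(i), (double) weights[i], Double::sum);
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : tally.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    /** Weighted mean of the members' label scores, normalised by the total weight. */
    static Map<String, Double> weightedAverage(List<Map<String, Double>> scores, float[] weights) {
        if (scores.size() != weights.length) {
            throw new IllegalArgumentException("Need one weight per member");
        }
        double total = 0.0;
        for (float w : weights) {
            total += w;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("Weights must sum to a positive value");
        }
        Map<String, Double> avg = new HashMap<>();
        for (int i = 0; i < weights.length; i++) {
            for (Map.Entry<String, Double> e : scores.get(i).entrySet()) {
                avg.merge(e.getKey(), weights[i] * e.getValue() / total, Double::sum);
            }
        }
        return avg;
    }
}
```

With equal weights, the vote reduces to a plain majority vote and the average reduces to an unweighted mean of the member scores.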

The ensemble building mechanism validates that there are at least two models, that the number of combination weights equals the number of models, and that all the supplied models have the same output domain. We could consider adding a check that the feature domains are the same, or at least overlap, but it's plausible that text models trained using data subsampling have different views of the feature space, so I've left that check for future work.
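The three checks can be sketched as follows. This is a hypothetical helper (`EnsembleChecks`), not the code in WeightedEnsembleModel: output domains are represented as sets of label names, and a `null` weight array stands in for "no weights supplied".

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the validation described above. Output domains are
 * plain sets of label names; this is not Tribuo's API.
 */
final class EnsembleChecks {
    private EnsembleChecks() {}

    static void validate(List<Set<String>> outputDomains, float[] weights) {
        if (outputDomains.size() < 2) {
            throw new IllegalArgumentException("An ensemble needs at least two models, found " + outputDomains.size());
        }
        // Weights are optional; when supplied there must be exactly one per model.
        if (weights != null && weights.length != outputDomains.size()) {
            throw new IllegalArgumentException("Expected " + outputDomains.size() + " weights, found " + weights.length);
        }
        // Every model must predict over the same set of outputs.
        Set<String> first = outputDomains.get(0);
        for (int i = 1; i < outputDomains.size(); i++) {
            if (!first.equals(outputDomains.get(i))) {
                throw new IllegalArgumentException("Model " + i + " has a different output domain from model 0");
            }
        }
        // Feature domains are deliberately not compared.
    }
}
```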

Motivation

It's currently hard in Tribuo to ensemble together models of different types (e.g. an ensemble of a linear model, a decision tree and a kernel SVM). This kind of arbitrary ensembling is occasionally very powerful, but difficult to express via Tribuo's trainer mechanism, so it's simpler to allow users to build custom ensembles from models they already have.

Paper reference

Kuncheva, L. I. (2014). Combining pattern classifiers: methods and algorithms.

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java

JackSullivan

Mostly looks good, apart from a few questions/edits.

I'd like to entertain doing something better with the provenance than just taking the first model's feature domain, i.e., how involved would it be to create a DatasetUnionProvenance? We don't seem to need the actual union here, though.

Craigacp · 2021-04-13T20:35:06Z

Well, the ensemble provenance contains each of the individual models' provenances, and each of those contains its training dataset provenance. So the information is all there, but I needed to pick something to put in the top-level model provenance, and the first one seemed easiest.
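A rough sketch of the provenance layout described above, assuming simplified stand-in classes (`MemberProvenance`, `EnsembleProvenanceSketch`, both hypothetical) with dataset provenances reduced to strings. Tribuo's real provenance classes are considerably richer than this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a member model's provenance (not a Tribuo class). */
final class MemberProvenance {
    final String modelName;
    final String datasetProvenance; // description of this member's training data

    MemberProvenance(String modelName, String datasetProvenance) {
        this.modelName = modelName;
        this.datasetProvenance = datasetProvenance;
    }
}

/** Hypothetical ensemble provenance: keeps every member provenance and fills the
 *  single top-level dataset slot from the first member. */
final class EnsembleProvenanceSketch {
    final List<MemberProvenance> members;
    final String topLevelDatasetProvenance;

    EnsembleProvenanceSketch(List<MemberProvenance> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("An ensemble needs at least one member provenance");
        }
        this.members = Collections.unmodifiableList(new ArrayList<>(members));
        // Only one dataset fits in the top-level slot, so the first member's is used.
        this.topLevelDatasetProvenance = members.get(0).datasetProvenance;
    }

    /** Every member's training-dataset provenance is still recoverable from the ensemble. */
    List<String> memberDatasetProvenances() {
        List<String> out = new ArrayList<>();
        for (MemberProvenance m : members) {
            out.add(m.datasetProvenance);
        }
        return out;
    }
}
```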

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java

JackSullivan

Looks good with the above changes, based on our discussion.

JackSullivan

Looks good to me.

Craigacp added 3 commits April 5, 2021 22:19

Adding a factory method to WeightedEnsembleModel allowing you to build arbitrary ensembles from existing models.

8745506

Adding tests for VotingCombiner, AveragingCombiner and WeightedEnsembleModel.createEnsembleFromExistingModels.

d0ffa8c

Adding a model serialisation test to AveragingCombinerTest and VotingCombinerTest.

3b46769

Craigacp added the "Oracle employee" label (This PR is from an Oracle employee) Apr 13, 2021

JackSullivan reviewed Apr 13, 2021

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java (outdated)

JackSullivan reviewed Apr 13, 2021

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java (outdated)

JackSullivan reviewed Apr 13, 2021

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java (outdated)

JackSullivan reviewed Apr 13, 2021

Update Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java

da1b239

Co-authored-by: Jack Sullivan <john.t.sullivan@gmail.com>

Docs changes for WeightedEnsembleModel.

86f8da5

JackSullivan reviewed Apr 14, 2021

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java

JackSullivan reviewed Apr 14, 2021

Core/src/main/java/org/tribuo/ensemble/WeightedEnsembleModel.java (outdated)

JackSullivan reviewed Apr 14, 2021

Apply suggestions from code review

2aca521

Co-authored-by: Jack Sullivan <john.t.sullivan@gmail.com>

JackSullivan approved these changes Apr 14, 2021

Craigacp merged commit 0adf555 into oracle:main Apr 14, 2021

Craigacp deleted the ensemble-all-the-things branch April 14, 2021 16:49
