This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Initial implementation of VotingRegressor. #390

Merged: pieths merged 28 commits into microsoft:master from the ensemble branch on Jan 25, 2020.


@pieths (Collaborator) commented Dec 18, 2019

No description provided.

"""
Tells if the predictor depends on other predictors.
"""
return self._has_implicit_predictors
Member

Re: return self._has_implicit_predictors

I didn't get it. What is the purpose of this?

@pieths (Collaborator Author), Dec 30, 2019

This method was added to the predictor base class so that the Pipeline code can determine which predictors contain implicit predictors (the voting ensembles) and generate the graph differently for them.

This avoids hard-coding a check such as if classname == "VotingRegressor" in Pipeline.


In reply to: 361724371
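A minimal sketch of the pattern described above, assuming a boolean flag that the predictor base class sets to False and that ensembles override. Every name here (BasePredictor, LinearModel, VotingRegressor, build_graph) is a hypothetical stand-in rather than the NimbusML source, and exposing the flag as a property is itself an assumption. The point is that the graph-building code branches on the flag instead of on the class name:

```python
# Hypothetical sketch: these classes and build_graph are illustrative only,
# not NimbusML's actual Pipeline or predictor code.


class BasePredictor:
    """Base class for predictors; by default a predictor stands alone."""

    def __init__(self):
        # Default: this predictor does not wrap any other predictors.
        self._has_implicit_predictors = False

    @property
    def has_implicit_predictors(self):
        """
        Tells if the predictor depends on other predictors.
        """
        return self._has_implicit_predictors


class LinearModel(BasePredictor):
    """A plain predictor with no sub-predictors."""


class VotingRegressor(BasePredictor):
    """An ensemble that wraps other predictors (its implicit predictors)."""

    def __init__(self, estimators):
        super().__init__()
        self.estimators = list(estimators)
        # The pipeline reads this flag instead of comparing class names.
        self._has_implicit_predictors = True


def build_graph(learner):
    """Return node names, expanding an ensemble into one node per member."""
    if learner.has_implicit_predictors:
        # One node per wrapped predictor, then a node that combines them.
        nodes = [type(est).__name__ for est in learner.estimators]
        nodes.append(type(learner).__name__ + ":combine")
        return nodes
    return [type(learner).__name__]


print(build_graph(LinearModel()))
print(build_graph(VotingRegressor([LinearModel(), LinearModel()])))
```

With this shape, a new ensemble type only has to set the flag; the graph builder needs no additional class-name checks.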

pass

def test_ensemble_supports_get_fit_info(self):
# TODO: fill this in
@ganik (Member), Dec 27, 2019

Re: TODO: fill this in

Please add one more test where the Pipeline has a few transforms before the VotingClassifier. (Resolved)

Collaborator Author

There is a unit test stub for that on line 91.


In reply to: 361724877

elif label_column:
learner.label_column_name = label_column
elif y is None:
if label_column is None:
@ganik (Member), Dec 27, 2019

this is crazy :)

Collaborator Author

This is the same code that was there before my change. The only modification I made was to unindent this code by 4 spaces to remove an unnecessary if statement.

I will look into what can be done to clean this up, though I might do that in a separate pull request.


In reply to: 361725035

models_regressionensemble


class VotingRegressor(BasePredictor,
@justinormont (Contributor), Dec 31, 2019

I think VotingRegressor may be the wrong name. Voting is one way to combine model outputs for classification. Regression ensembles don't usually vote; they tend to combine outputs with {median, average, weighted average, stacking}, with model-selection options of {outlier removal, diversity, best performance}.

I'd suggest naming it EnsembleRegressor or RegressionEnsembler:

Suggested change
class VotingRegressor(BasePredictor,
class EnsembleRegressor(BasePredictor,
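To make the combiners listed above concrete, here is a small stdlib-only sketch of three of them (average, median, weighted average) applied row by row to sub-model predictions. The function names are hypothetical and not part of NimbusML; stacking is left out because it needs a trained meta-model rather than a fixed formula:

```python
# Illustrative regression combiners: each maps one row of per-model
# predictions to a single score. Names are hypothetical, not NimbusML API.
from statistics import mean, median


def average(preds):
    """Plain mean of the sub-model predictions."""
    return mean(preds)


def weighted_average(preds, weights):
    """Weighted mean; weights need not sum to 1, so divide by their total."""
    total = sum(weights)
    return sum(p * w for p, w in zip(preds, weights)) / total


def combine(rows, combiner):
    """Apply a combiner to every row of sub-model predictions."""
    return [combiner(row) for row in rows]


rows = [[1.0, 2.0, 10.0], [3.0, 3.0, 3.0]]
print(combine(rows, median))
print(combine(rows, lambda r: weighted_average(r, [2, 1, 1])))
```

In the first row, the third model's 10.0 pulls the average up to about 4.33, while the median stays at 2.0, which is why median is often paired with outlier-prone sub-models.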

Contributor

By the way, if you make a stacked-ensemble regression example or unit test, I'd recommend a final model of LogisticRegression with the non-negative option set. The non-negative constraint ensures you don't bet against the sub-models: dotnet/machinelearning#1651 (comment)
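A dependency-free sketch of the non-negative idea, assuming a least-squares meta-model over sub-model predictions (swapped in for the LogisticRegression mentioned above, since the target here is continuous). Projected gradient descent clips any negative weight to zero after each step, so the meta-model can down-weight a sub-model but never bet against it. The function name and the learning-rate and step defaults are illustrative assumptions; a production version would more likely call a dedicated solver such as scipy.optimize.nnls:

```python
# Sketch: stacking meta-model with non-negative weights, fit by projected
# gradient descent on mean squared error. Least squares is used here as a
# stand-in for the LogisticRegression suggested in the comment.


def fit_nonnegative_weights(preds, targets, lr=0.01, steps=5000):
    """preds[i][j] is sub-model j's prediction for row i; returns weights >= 0."""
    n_rows = len(preds)
    n_models = len(preds[0])
    w = [1.0 / n_models] * n_models
    for _ in range(steps):
        # Gradient of the mean squared error with respect to each weight.
        grad = [0.0] * n_models
        for row, y in zip(preds, targets):
            err = sum(wj * pj for wj, pj in zip(w, row)) - y
            for j, pj in enumerate(row):
                grad[j] += 2.0 * err * pj / n_rows
        # Gradient step, then project onto w >= 0 by clipping negatives.
        w = [max(0.0, wj - lr * gj) for wj, gj in zip(w, grad)]
    return w


preds = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
# Target is 0.7 * a + 0.3 * b: both weights are recovered.
print(fit_nonnegative_weights(preds, [1.3, 1.7, 3.3, 3.7]))
# Target is a - 0.5 * b: the unconstrained fit would give b a negative
# weight, but here b's weight is clipped to 0.
print(fit_nonnegative_weights(preds, [0.0, 1.5, 1.0, 2.5]))
```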

Collaborator Author

The name VotingRegressor was chosen to match the scikit-learn class of the same name. See sklearn.ensemble.VotingRegressor.

That said, everyone who has seen these changes has made similar comments about the name, so perhaps it would be best to change it.

NimbusML already has an EnsembleRegressor class, which is being updated to offer the same functionality as this one. This class is only a temporary addition that supports multi-predictor ensembling until those modifications are complete. RegressionEnsembler seems too similar to EnsembleRegressor and could cause confusion.

@justinormont Thanks for your input. I will continue looking for other names.

Contributor

Was there a decision on naming?

@ganik (Member), Jan 24, 2020

We have decided to keep the name, for two reasons:

  1. EnsembleRegressor already exists in NimbusML, where it is restricted to Online Gradient Descent as its only base learner.
  2. It keeps us on par with scikit-learn, where the class is also named VotingRegressor.

Once the existing EnsembleRegressor can take any base learner, we plan to deprecate this VotingRegressor.


In reply to: 370876925

@pieths (Collaborator Author) commented Jan 15, 2020

The cross validation additions should be refactored if/when the Pipeline/EntryPoint/Graph code is refactored.

@pieths changed the title from "[WIP] Initial implementation of VotingRegressor." to "Initial implementation of VotingRegressor." on Jan 17, 2020
@@ -502,21 +508,39 @@ def fit(
implicit_nodes)

# Add learner node
Member

Please describe how the CV data is split when a VotingEnsemble is used.

Combine regression models into an ensemble

:param estimators: The predictors to combine into an ensemble.
:param model_combiner: The combiner used to combine the scores.
Member

Re: :param

Please add a bit more description and point to the sklearn VotingRegressor for more info.
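To illustrate how the two documented parameters interact, here is a toy, stdlib-only sketch. ToyVotingRegressor, MeanModel, and SlopeModel are hypothetical; only the parameter names estimators and model_combiner come from the docstring above. Note that scikit-learn's VotingRegressor takes a list of (name, estimator) pairs and averages their predictions (optionally with weights), whereas this toy takes a bare list and accepts any combiner function:

```python
# Toy sketch of the VotingRegressor idea: fit every estimator on the same
# data, then merge their predictions with a pluggable combiner.
# Nothing here is NimbusML or scikit-learn API.
from statistics import mean


class MeanModel:
    """Trivial regressor that always predicts the training mean."""

    def fit(self, X, y):
        self.value = mean(y)
        return self

    def predict(self, X):
        return [self.value for _ in X]


class SlopeModel:
    """Least-squares line through the origin on the first feature."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        self.slope = sum(x * t for x, t in zip(xs, y)) / sum(x * x for x in xs)
        return self

    def predict(self, X):
        return [self.slope * row[0] for row in X]


class ToyVotingRegressor:
    def __init__(self, estimators, model_combiner=mean):
        self.estimators = estimators          # predictors to combine
        self.model_combiner = model_combiner  # maps a list of scores to one

    def fit(self, X, y):
        # Every member is trained on the same data.
        for est in self.estimators:
            est.fit(X, y)
        return self

    def predict(self, X):
        # Transpose per-model predictions into per-row lists, then combine.
        per_model = [est.predict(X) for est in self.estimators]
        return [self.model_combiner(list(row)) for row in zip(*per_model)]


X, y = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
ens = ToyVotingRegressor([MeanModel(), SlopeModel()]).fit(X, y)
print(ens.predict([[5.0]]))  # mean of 4.0 (MeanModel) and 10.0 (SlopeModel)
```

Passing model_combiner=median or a weighted function swaps the combination rule without touching the members, which is the flexibility the naming discussion above was about.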

@pieths merged commit d08b702 into microsoft:master on Jan 25, 2020
@pieths deleted the ensemble branch on January 25, 2020 00:42
Labels: None yet
Projects: None yet
Development: no linked issues
3 participants