
[MRG+1] Change VotingClassifier estimators by set_params #7674

Merged
12 commits merged into scikit-learn:master on Apr 10, 2017

Conversation

@yl565
Contributor

@yl565 yl565 commented Oct 14, 2016

PR to #7288. Continuation of #7484

yl565 added 2 commits Oct 14, 2016
Use _BaseComposition as base
@jnothman
Member

@jnothman jnothman commented Oct 15, 2016

You shouldn't need to open a new PR for a rebase. I'll try to take a look at this over the coming weeks. Sorry for the slow reviews.

@yl565 yl565 changed the title Change VotingClassifier estimators by set_params [MRG] Change VotingClassifier estimators by set_params Oct 17, 2016
@yl565
Contributor Author

@yl565 yl565 commented Oct 17, 2016

@jnothman, I have just started using git, do you mind letting me know if the following procedure is correct for a rebase (assume origin is already up to date with upstream/master):

git checkout mybranch
git rebase master
git push origin master
@amueller
Member

@amueller amueller commented Oct 17, 2016

Member

@jnothman jnothman left a comment

Sorry this has taken a while to get to

@@ -44,7 +43,8 @@ class VotingClassifier(BaseEstimator, ClassifierMixin, TransformerMixin):
estimators : list of (string, estimator) tuples
Invoking the ``fit`` method on the ``VotingClassifier`` will fit clones
of those original estimators that will be stored in the class attribute
`self.estimators_`.
`self.estimators_`. An estimator can be set to `None` using


@jnothman

jnothman Nov 9, 2016
Member

these should use double-backticks.

isnone = np.array([1 if clf is None else 0
for _, clf in self.estimators])
if isnone.sum() == len(self.estimators):
raise ValueError('All estimators is None. At least one is required'


@jnothman

jnothman Nov 9, 2016
Member

'is None' -> 'are None'

@@ -161,11 +166,19 @@ def fit(self, X, y, sample_weight=None):

self.estimators_ = Parallel(n_jobs=self.n_jobs)(


@jnothman

jnothman Nov 9, 2016
Member

This change to estimators_ needs to be documented under Attributes


return self

@property
def _narej_weights(self):


@jnothman

jnothman Nov 9, 2016
Member

What is narej?


@yl565

yl565 Nov 9, 2016
Author Contributor

Short for rejecting NaN; maybe it could be changed to _rej_nan_weights to be clearer?


@jnothman

jnothman Nov 9, 2016
Member

Confusing because there aren't any NaNs involved. How about _weights_not_none... or just remove this helper
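A helper along the lines jnothman suggests might look like this minimal sketch (the class and names here are illustrative stand-ins, not the code merged in this PR):

```python
class VotingSketch:
    """Toy stand-in for VotingClassifier's weight handling (illustrative only)."""

    def __init__(self, estimators, weights=None):
        self.estimators = estimators  # list of (name, estimator_or_None) tuples
        self.weights = weights

    @property
    def _weights_not_none(self):
        """Return the weights of estimators that have not been set to None."""
        if self.weights is None:
            return None
        return [w for (_, clf), w in zip(self.estimators, self.weights)
                if clf is not None]


clf = VotingSketch([('lr', object()), ('rf', None)], weights=[1, 0.5])
print(clf._weights_not_none)  # [1]
```

The property simply drops each weight whose paired estimator is None, so disabled estimators contribute nothing to the vote.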


__all__ = ['if_delegate_has_method']


class _BaseComposition(six.with_metaclass(ABCMeta, BaseEstimator)):


@jnothman

jnothman Nov 9, 2016
Member

You should be using this in Pipeline and FeatureUnion


@yl565

yl565 Nov 9, 2016
Author Contributor

Do you mean I should also modify Pipeline and FeatureUnion to use _BaseComposition?


eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))
msg = ('All estimators is None. At least one is required'


@jnothman

jnothman Nov 9, 2016
Member

is -> are

eclf1.set_params(voting='soft').fit(X, y)
eclf2.set_params(voting='soft').fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))


@jnothman

jnothman Nov 9, 2016
Member

Please test soft transform. The outputs should differ between the 0-weight and None variants, though...

1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.
@yl565
Contributor Author

@yl565 yl565 commented Nov 13, 2016

@jnothman I've made the requested changes. Also added estimator name validation and tests.

Member

@jnothman jnothman left a comment

LGTM, thanks!

eclf2 = VotingClassifier(estimators=[('rf', clf2), ('nb', clf3)],
voting='soft', weights=[1, 0.5])
eclf2.set_params(rf=None).fit(X1, y1)
assert_array_equal(eclf1.transform(X1), np.array([[[0.7, 0.3], [0.3, 0.7]],


@jnothman

jnothman Nov 16, 2016
Member

Hmm. Looking at this makes me wonder whether we should be multiplying the outputs by the weight. Not an issue for this PR.


@MechCoder

MechCoder Apr 4, 2017
Member

Should we test the output of "hard-voting" and transform as well?

self.voting = voting
self.weights = weights
self.n_jobs = n_jobs

@property
def named_estimators(self):


@jnothman

jnothman Nov 16, 2016
Member

I wish this didn't exist, but I know it's not your problem.


@amueller

amueller Nov 16, 2016
Member

You wish the property didn't exist? Or that wasn't a property but a function?


@jnothman

jnothman Nov 17, 2016
Member

I wish that we had not copied this bad design feature from pipeline!

@jnothman jnothman changed the title [MRG] Change VotingClassifier estimators by set_params [MRG+1] Change VotingClassifier estimators by set_params Nov 16, 2016
@jnothman
Member

@jnothman jnothman commented Jan 9, 2017

Looking for a reviewer...

@jnothman
Member

@jnothman jnothman commented Jan 9, 2017

This also needs merging with updated master.

@yl565
Contributor Author

@yl565 yl565 commented Jan 21, 2017

@jnothman I merged it with updated master. Can you help me check why the AppVeyor build was cancelled?

@jnothman
Member

@jnothman jnothman commented Jan 22, 2017

The build was cancelled because all our AppVeyor jobs got stuck and were cancelled. Don't worry about it.

@prcastro

@prcastro prcastro commented Feb 9, 2017

Anything preventing this from getting merged?

@jnothman
Member

@jnothman jnothman commented Feb 9, 2017

Anything preventing this from getting merged?

A backlog and low reviewer availability. We require two approvals.

@jnothman
Member

@jnothman jnothman commented Mar 8, 2017

Feel like reviewing this, @lesteve? @raghavrv?

Member

@MechCoder MechCoder left a comment

Just some minor comments related to testing.

eclf2.set_params(nb=clf2).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))
assert_array_equal(eclf1.predict_proba(X), eclf2.predict_proba(X))


@MechCoder

MechCoder Apr 4, 2017
Member

Can you directly check

assert_equal(eclf2.estimators[0][1].get_params(), clf1.get_params())
assert_equal(eclf2.estimators[1][1].get_params(), clf2.get_params())


@MechCoder

MechCoder Apr 4, 2017
Member

(It might be possible that two different classifiers give the same predictions)

('nb', clf3)],
voting='hard', weights=[1, 1, 0.5])
eclf2.set_params(rf=None).fit(X, y)
assert_array_equal(eclf1.predict(X), eclf2.predict(X))


@MechCoder

MechCoder Apr 4, 2017
Member

Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()?


@MechCoder

MechCoder Apr 7, 2017
Member

I was suggesting to test the behaviour of ``eclf2.estimators``, ``eclf2.estimators_`` and ``eclf2.get_params()``.

assert_true(dict(eclf2.estimators)["rf"] is None)
assert_true(len(eclf2.estimators_) == 2)
assert_true(all(not isinstance(est, RandomForestClassifier)
                for est in eclf2.estimators_))
assert_true(eclf2.get_params()["rf"] is None)

eclf1.set_params(lr__C=10.0)
eclf2.set_params(nb__max_depth=5)

assert_true(eclf1.estimators[0][1].get_params()['C'] == 10.0)


@MechCoder

MechCoder Apr 4, 2017
Member

Should we also test the get_params() interface of the VotingClassifier directly? More specifically, ``eclf1.get_params()["lr__C"]`` and ``eclf1.get_params()["lr"].get_params()["C"]``? The get_params() interface seems untested.

names, clfs = zip(*self.estimators)
self._validate_names(names)

isnone = np.array([1 if clf is None else 0


@MechCoder

MechCoder Apr 4, 2017
Member

nitpick:

isnone = np.sum([clf is None for _, clf in self.estimators])
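In context, the suggested count feeds the all-None validation a few lines further down. A self-contained sketch of that check, with the corrected error message (the function name here is made up for illustration):

```python
def check_not_all_none(estimators):
    """Raise if every named estimator has been disabled by setting it to None."""
    n_none = sum(clf is None for _, clf in estimators)
    if n_none == len(estimators):
        raise ValueError('All estimators are None. At least one is required.')


check_not_all_none([('lr', object()), ('rf', None)])  # fine: one live estimator
```

A generator expression summed directly reads more naturally than building a 0/1 NumPy array first.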
@MechCoder
Member

@MechCoder MechCoder commented Apr 4, 2017

Extremely sorry for the long wait @yl565! There seem to be two additions in this PR:

  1. Ability to substitute or set a classifier directly using the set_params interface.
  2. Ability to set a classifier to None (disabling it) using set_params.

Given that in master, both these fail silently, should we document these as new features or bug-fixes?
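Both additions come down to the named-estimator set_params behaviour this PR introduces via _BaseComposition. A simplified, hypothetical sketch of that pattern (not sklearn's actual implementation; the class and estimator names are placeholders):

```python
class Composition:
    """Toy version of the named-estimator set_params pattern."""

    def __init__(self, estimators):
        self.estimators = list(estimators)  # list of (name, estimator_or_None)

    def set_params(self, **params):
        names = [name for name, _ in self.estimators]
        for key, value in params.items():
            if key in names:
                # Substitute the named estimator, or disable it with None.
                self.estimators[names.index(key)] = (key, value)
            else:
                # Anything else is an ordinary attribute (weights, n_jobs, ...).
                setattr(self, key, value)
        return self


comp = Composition([('lr', 'logistic'), ('rf', 'forest')])
comp.set_params(lr='svc')   # 1. substitute a classifier
comp.set_params(rf=None)    # 2. disable a classifier
print(comp.estimators)  # [('lr', 'svc'), ('rf', None)]
```

In master, both of these calls failed silently because set_params only knew about ordinary attributes, which is why the feature-vs-bugfix question arises.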

@jnothman
Member

@jnothman jnothman commented Apr 4, 2017

@MechCoder
Member

@MechCoder MechCoder commented Apr 4, 2017

Ah I see why, in that case we should also test that

vclf.set_params(nb=clf2)
assert_false(hasattr(vclf, "nb"))

Also, a weird corner case, but it might be a good idea to also test what happens when one sets ``estimators``, the classifiers by themselves, and the hyperparameters of the classifiers all at once.

@@ -252,17 +270,13 @@ def transform(self, X):
else:
return self._predict(X)

def set_params(self, **params):


@MechCoder

MechCoder Apr 4, 2017
Member

Can you document this?

for key, value in six.iteritems(step.get_params(deep=True)):
out['%s__%s' % (name, key)] = value
return out
return super(VotingClassifier,


@MechCoder

MechCoder Apr 4, 2017
Member

Can you document this?

@yl565
Contributor Author

@yl565 yl565 commented Apr 5, 2017

@MechCoder I added the tests and documentation. Could you please elaborate on the following two of your comments?

Can you also check eclf2.estimators_, eclf2.estimators and eclf2.get_params()

Also, a weird corner case, but it might be a good idea to also test what happens when one sets estimators, the classifiers by themselves and the hyperparameters of the classifiers at once.

def set_params(self, **params):
""" Setting the parameters for the voting classifier
Valid parameter keys can be listed with get_params().


@MechCoder

MechCoder Apr 7, 2017
Member

Can you add this under the get_params heading? I would just say "Get the parameters of the VotingClassifier". In addition, I would also document the parameter deep, saying that setting it to True gets the various classifiers and the parameters of the classifiers as well.

Specific parameters using e.g. set_params(parameter_name=new_value)
Estimators can be removed by setting them to None. In the following
example, the RandomForestClassifier is removed:
clf1 = LogisticRegression()


@MechCoder

MechCoder Apr 7, 2017
Member

I am fairly sure that you have to add this under a side-heading "Examples" so that it renders properly

Parameters
----------
params: keyword arguments
Specific parameters using e.g. set_params(parameter_name=new_value)


@MechCoder

MechCoder Apr 7, 2017
Member

In addition to setting the parameters of the ``VotingClassifier`` (in double backticks), the individual classifiers of the ``VotingClassifier`` can also be set, or replaced by setting them to None.

@MechCoder
Member

@MechCoder MechCoder commented Apr 7, 2017

That is it from me! Please add a whatsnew under Enhancements and rebase properly so that I can merge.

@MechCoder
Member

@MechCoder MechCoder commented Apr 7, 2017

You just need to do an interactive rebase

git rebase -i master

It will spit out errors wherever there are merge conflicts. You would need to check wherever the following block is present.

<<<<<<< HEAD
some code
=======
some code
>>>>>>> master

and decide how you would like to merge the two blocks. Then you do

git add name_of_file
git rebase --continue

and keep continuing.
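For reference, the conflict-free path through that workflow on a throwaway repository (branch and file names below are made up for illustration) looks like:

```shell
# Toy repository to walk through the rebase flow (names are illustrative).
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master      # call the default branch master
git config user.email dev@example.com
git config user.name dev
echo base > a.txt && git add a.txt && git commit -qm 'base'

# Branch off, do some work.
git checkout -q -b mybranch
echo feature > b.txt && git add b.txt && git commit -qm 'feature work'

# Meanwhile master moves on.
git checkout -q master
echo more > c.txt && git add c.txt && git commit -qm 'master moves on'

# Replay the feature commits on top of the updated master.
git checkout -q mybranch
git rebase -q master
git log --oneline
```

When the rebase does hit a conflict, git stops at the offending commit and leaves the conflict markers described above for you to resolve before `git add` and `git rebase --continue`.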

@jnothman
Member

@jnothman jnothman commented Apr 8, 2017

@MechCoder
Member

@MechCoder MechCoder commented Apr 10, 2017

Travis is failing for some cosmetic reasons. Could you fix that?

@yl565
Contributor Author

@yl565 yl565 commented Apr 10, 2017

Thanks @MechCoder

@MechCoder MechCoder merged commit 194c231 into scikit-learn:master Apr 10, 2017
5 checks passed:
ci/circleci: Your tests passed on CircleCI!
codecov/patch: 98.4% of diff hit (target 95.5%)
codecov/project: 95.51% (+<.01%) compared to 7d1e430
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@MechCoder
Member

@MechCoder MechCoder commented Apr 10, 2017

Thank you @yl565!!

massich added a commit to massich/scikit-learn that referenced this pull request Apr 11, 2017
…cikit-learn#7674)

* PR to 7288
Use _BaseComposition as base

*  Fix flakes problem

* Change ``pipeline``, add more tests and other changes
1. Use ``_BaseComposition`` in class ``Pipeline`` and ``FeatureUnion``
2. Add tests of soft voting ``transform`` when one estimator is set to None
3. Add estimator name validation in ``_BaseComposition`` and tests
4. Other requested changes.

* Remove the unused import warn

* Add more test and documentation

* resolve conflict with master

* Add testing cases and modify documentation

* Add to whats_new.rst

*  Fix too many blank lines
@prcastro

@prcastro prcastro commented Apr 20, 2017

🎉

massich added a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
5 participants