TST factorize some common test of ensemble of heterogeneous es… #15387

glemaitre · 2019-10-28T23:07:55Z

Reference Issues/PRs

A follow-up to #15375
Include a test to illustrate the inconsistency between Stacking and Voting

TODO:

Decide what should be the correct API (Probably modify Stacking to report dropped estimator in named_estimators_)

What does this implement/fix? Explain your changes.

We merged #15375 but introduced a regression/inconsistency between Stacking and Voting estimators.
The documentation is not straightforward, but estimators_ does not include the dropped estimators. For Stacking estimators, named_estimators_ exposes the same semantics, while Voting estimators report the dropped estimator with the 'drop' value.
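To make the two semantics concrete, here is a minimal sketch in plain Python. It does not use scikit-learn: the helper names (`fitted_only`, `named_without_dropped`, `named_with_dropped`) are hypothetical, and plain strings stand in for fitted estimators.

```python
# Illustrative sketch only; this is not scikit-learn code.
# Plain strings stand in for fitted estimators.
DROP = "drop"


def fitted_only(estimators):
    """Mimic estimators_: keep only the entries that were not dropped."""
    return [est for _, est in estimators if est != DROP]


def named_without_dropped(estimators):
    """Semantics described above for Stacking*: dropped names are absent."""
    return {name: est for name, est in estimators if est != DROP}


def named_with_dropped(estimators):
    """Semantics described above for Voting*: dropped names map to 'drop'."""
    return {name: est for name, est in estimators}


estimators = [("lr", "fitted_lr"), ("rf", DROP), ("svc", "fitted_svc")]

print(fitted_only(estimators))
print(named_without_dropped(estimators))
print(named_with_dropped(estimators))
```

With the first mapping, looking up `"rf"` raises a `KeyError`; with the second, it returns `'drop'`. That difference is the inconsistency the test is meant to expose.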

Any other comments?

glemaitre · 2019-10-28T23:18:37Z

Looking at the voting tests, it seems that Voting.named_estimators_ is expected to report dropped estimators, meaning that we should change the Stacking estimators instead of the Voting.

ping @jnothman @amueller @NicolasHug @thomasjpfan @qinhanmin2014 Could you confirm that we have to change the Stacking?

jnothman · 2019-10-29T01:30:44Z

By "expected to report dropped estimators" do you mean named_estimators_['dropped'] == 'drop'? This seems reasonable to me? Either way seems reasonable really, but conforming to existing API is best?

thomasjpfan · 2019-10-29T02:52:01Z

I have a preference for the interface in ColumnTransformer, where named_transformers_ maps to 'drop' and transformers_ matches transformers when it comes to 'drop'.

In the case of Stacking* and Voting*, I would prefer estimators_ to contain 'drop' and named_estimators_ to map to 'drop' when the estimator is dropped.

glemaitre · 2019-10-29T09:38:06Z

I did not consider the ColumnTransformer. It would make sense to have similar behaviour.

So we should be able to map 'drop' to a named estimator without breaking the backward compatibility. In this case, we keep #15375 and change the Stacking* to have the same behaviour.

For estimators_, this is more tricky. We would break the back-compatibility to introduce 'drop' in the list of estimators for the Voting* (so we need a change of behaviour there). We can change Stacking* since this is only in master.

So we need to:

1. change named_estimators_ in Stacking*
2. change estimators_ in Stacking* and Voting*, announcing a change of behaviour for the latter.

This PR could focus on 1.

WDYT?

NicolasHug · 2019-10-29T13:39:36Z

I have a preference for the interface in ColumnTransformer, where named_transformers_ maps to 'drop' and transformers_ matches transformers when it comes to 'drop'.

Reading the code, that's not true.

transformers_ ignores the dropped estimators, and named_transformers_ does too:

        return Bunch(**{name: trans for name, trans, _
                        in self.transformers_})

glemaitre · 2019-10-29T13:45:08Z

@NicolasHug Are you sure?

In [5]: import numpy as np 
   ...: from sklearn.compose import ColumnTransformer 
   ...: from sklearn.preprocessing import Normalizer 
   ...: ct = ColumnTransformer( 
   ...:     [("norm1", Normalizer(norm='l1'), [0, 1]), 
   ...:      ("norm2", Normalizer(norm='l1'), slice(2, 4))]) 
   ...: X = np.array([[0., 1., 2., 2.], 
   ...:               [1., 1., 0., 1.]]) 
   ...: # Normalizer scales each row of X to unit norm. A separate scaling 
   ...: # is applied for the two first and two last elements of each 
   ...: # row independently. 
   ...: ct.fit_transform(X)     
   ...:                                                                              
Out[5]: 
array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])

In [6]: ct.set_params(norm1='drop')                                                  
Out[6]: 
ColumnTransformer(n_jobs=None, remainder='drop', sparse_threshold=0.3,
                  transformer_weights=None,
                  transformers=[('norm1', 'drop', [0, 1]),
                                ('norm2', Normalizer(copy=True, norm='l1'),
                                 slice(2, 4, None))],
                  verbose=False)

In [7]: ct.fit_transform(X)                                                          
Out[7]: 
array([[0.5, 0.5],
       [0. , 1. ]])

In [8]: ct.transformers_                                                             
Out[8]: 
[('norm1', 'drop', [0, 1]),
 ('norm2', Normalizer(copy=True, norm='l1'), slice(2, 4, None))]

In [9]: ct.named_transformers_                                                       
Out[9]: {'norm1': 'drop', 'norm2': Normalizer(copy=True, norm='l1')}

norm1 is reported in both even if dropped.

glemaitre · 2019-10-29T13:58:13Z

We also have another issue with estimators_ if we want to make it behave like transformers_.

transformers_ returns a list of tuples (similar to transformers). However, estimators_ is only a list of fitted estimators, not a list of (name, estimator) tuples.
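The difference in shape can be sketched with placeholder data. The values below are hypothetical stand-ins loosely modelled on the session output above, and `names_of` and `dropped_names` are illustrative helpers, not part of any library.

```python
# Illustrative sketch only; strings stand in for fitted objects.
DROP = "drop"

# Shape of transformers_: a list of tuples that keeps names and 'drop' markers.
transformers_like = [
    ("norm1", DROP, [0, 1]),
    ("norm2", "fitted_norm2", slice(2, 4)),
]

# Shape of estimators_: a bare list of fitted objects, with no names.
estimators_like = ["fitted_norm2"]


def names_of(tuples):
    """Recover every name from a list of (name, estimator, ...) tuples."""
    return [name for name, *_ in tuples]


def dropped_names(tuples):
    """Recover the names whose estimator slot holds the 'drop' marker."""
    return [name for name, est, *_ in tuples if est == DROP]


print(names_of(transformers_like))
print(dropped_names(transformers_like))
print(len(estimators_like))
```

The tuple list is self-describing, so a dropped entry can stay in place. The bare list has nowhere to record a name, which is why putting 'drop' into estimators_ would also raise the question of changing its element type.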

NicolasHug · 2019-10-29T14:09:48Z

Sorry on my phone rn so can't double check but both the code comments and the docstrings suggest otherwise. Docstrings say "fitted estimator" which implies the dropped estimators aren't there

glemaitre · 2019-10-29T14:17:59Z

I am confused now. If you refer to transformers_, the documentation mentions the following:

fitted_transformer can be an estimator, ‘drop’, or ‘passthrough’

If you refer to estimators_, then I agree. That's why I am asking whether we should move toward similar semantics (a list of tuples instead of just estimators).

NicolasHug · 2019-10-29T15:47:09Z

Sorry, read too fast. I was confused that a 'drop' qualifies as a "fitted estimator". Ignore my comments ^^

thomasjpfan · 2019-10-30T01:41:04Z

I am okay with changing Stacking* to include dropped estimators.

glemaitre · 2019-10-30T11:33:54Z

So I added the dropped estimator to named_estimators_ for stacking. I factorized the common tests at the same time. I am unsure that there is a need for a what's new entry here, since this is maintenance and changes the behavior of an estimator which is not released yet.

We can have further discussion regarding what to do with estimators_ in Voting* and Stacking* later on. This is not a release-critical issue, apart from the fact that we will have to deprecate in both classes if we want to make a change.
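One way to report dropped estimators in a name mapping is to walk the original (name, estimator) pairs and consume the fitted list only for entries that were not dropped. The sketch below shows that idea in plain Python; `build_named_estimators` is a hypothetical helper and is not necessarily how this PR implements it.

```python
# Illustrative sketch only; strings stand in for estimators.
DROP = "drop"


def build_named_estimators(estimators, fitted):
    """Map every name to its fitted estimator, or to 'drop' if it was dropped.

    `estimators` is the user-supplied list of (name, estimator) pairs and
    `fitted` is the shorter list that excludes the dropped entries.
    """
    named = {}
    fitted_iter = iter(fitted)
    for name, est in estimators:
        # Dropped entries do not consume an item from the fitted list.
        named[name] = DROP if est == DROP else next(fitted_iter)
    return named


estimators = [("lr", "unfitted_lr"), ("rf", DROP), ("svc", "unfitted_svc")]
fitted = ["fitted_lr", "fitted_svc"]

print(build_named_estimators(estimators, fitted))
```

The mapping keeps every user-supplied name, so code that looks up a name does not need to know in advance which estimators were dropped.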

ogrisel · 2019-10-30T14:23:29Z

there is a need for a what's new here since this maintenance and changing the behavior of an estimator which is not released yet.

No need to update the changelog for this PR.

glemaitre · 2019-11-04T10:47:20Z

ping @NicolasHug @jnothman @thomasjpfan

jnothman

I like this, thank you.

sklearn/ensemble/tests/test_common.py

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

thomasjpfan

LGTM

thomasjpfan · 2019-11-06T15:08:50Z

Merged with master to make sure the docs are okay (the lint error comes from the renaming of the flake8_diff.sh file on master).

Will merge when tests pass.

glemaitre · 2019-11-07T14:53:26Z

Thanks for merging

thomasjpfan · 2019-11-07T20:03:49Z

Thank you for the PR! @glemaitre

TST add common test for ensemble of heterogeneous estimators

f5a276c

glemaitre mentioned this pull request Oct 28, 2019

[MRG] BUG Fixes voting named_estiamtor bug #15375

Merged

glemaitre added this to the 0.22 milestone Oct 28, 2019

glemaitre added the Blocker label Oct 28, 2019

glemaitre added 2 commits October 30, 2019 10:48

iter

552797e

iter

84a1d3f

glemaitre changed the title ~~TST add common test for ensemble of heterogeneous estimators~~ TST factorize some common test of ensemble of heterogeneous estimators Oct 30, 2019

PEP8

2059ee8

jnothman approved these changes Nov 5, 2019

sklearn/ensemble/tests/test_common.py (outdated, resolved)

glemaitre and others added 2 commits November 6, 2019 08:57

Update sklearn/ensemble/tests/test_common.py

7492dee

Co-Authored-By: Joel Nothman <joel.nothman@gmail.com>

Update test_common.py

833ed68

thomasjpfan approved these changes Nov 6, 2019


Merge remote-tracking branch 'upstream/master' into pr/15387

7aa5395

thomasjpfan changed the title ~~TST factorize some common test of ensemble of heterogeneous estimators~~ TST factorize some common test of ensemble of heterogeneous es… Nov 6, 2019

thomasjpfan merged commit 1578132 into scikit-learn:master Nov 6, 2019
