
EHN raise error with inconsistant transformer_weight in FeatureUnion #17876

Merged: 18 commits, merged into scikit-learn:master on Aug 25, 2020
Conversation

@Ultramann (Contributor) commented Jul 9, 2020

Reference Issues/PRs

Fixes #17863.

What does this implement/fix? Explain your changes.

Passing a transformer_weights dictionary to FeatureUnion with a key that isn't found in transformer_list causes that transformer to go unweighted. This is the correct, but silent, behavior. This PR warns the user when it happens.

This PR adds a new method, _validate_transformer_weights, called at the end of FeatureUnion.__init__. It loops through all the names in transformer_weights and raises a warning for any name that is not in transformer_list.

@Ultramann Ultramann marked this pull request as draft July 9, 2020 20:35
@Ultramann Ultramann changed the title [WIP] Validate transformer_weight names in FeatureUnion [MRG] Validate transformer_weight names in FeatureUnion Jul 10, 2020
@Ultramann Ultramann marked this pull request as ready for review July 10, 2020 13:48
@Ultramann Ultramann changed the title [MRG] Validate transformer_weight names in FeatureUnion [WIP] Validate transformer_weight names in FeatureUnion Jul 13, 2020
@Ultramann (Contributor, Author):

It looks like the two failed checks might be related to work in #17913. I changed the status of the PR to WIP until @thomasjpfan's fix gets merged and I can run again.

@Ultramann Ultramann changed the title [WIP] Validate transformer_weight names in FeatureUnion [MRG] Validate transformer_weight names in FeatureUnion Jul 15, 2020
@jnothman (Member) left a comment:

Do you think this should be a warning or an error? Are there cases where users may have exploited this behaviour intentionally?

Otherwise LGTM

@Ultramann (Contributor, Author) commented Jul 16, 2020

I think that's a really good question; it was at the back of my mind too, so I'm glad you asked it. I'm struggling to think of a way this could be exploited. I'm also inclined to look at it from the opposite perspective: does not throwing an error violate the contract that FeatureUnion exposes? I don't think so. FeatureUnion says it will weight a transformer by name ("Keys are transformer names" is in the docstring), so if you pass the wrong name the transformer will/should not be weighted. While that points toward a warning, it isn't quite enough of a reason to rule out an error. If you tried to look up the wrong key in a dictionary, you'd get a KeyError, so maybe we want behavior similar to that. I'm curious if you, @jnothman, or anyone else has an opinion.

@Ultramann (Contributor, Author):

@jnothman, having thought about your question a bit more, I'm thinking a warning is the right way to go. The way FeatureUnion's docstring is written makes me think there's no reason to expect an error when a transformer is referenced by the wrong name in the weights argument.

@thomasjpfan (Member) left a comment:

Thank you for the PR @Ultramann !

Can we place the check in fit? I would prefer new code to follow our own guidelines and validate in fit.

@Ultramann (Contributor, Author) commented Jul 26, 2020

@thomasjpfan, I can definitely move the validation to fit. Should I also move _validate_transformers while I'm at it? Maybe not, since that's out of scope for this PR.

Also, regarding @jnothman's question above, are you good with a warning?

Do you think this should be warning or error?

@Ultramann (Contributor, Author):

@thomasjpfan, looking again, I found _validate_transformers called in the _parallel_func method, so I moved _validate_transformer_weights there too.

@thomasjpfan (Member):

Should I also move _validate_transformers while I'm at it?

For this PR, let's leave _validate_transformers alone (in __init__)

@thomasjpfan (Member) left a comment:

Please add an entry to the change log at doc/whats_new/v0.24.rst with tag |Enhancement|. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@thomasjpfan (Member) left a comment:

LGTM!

@Ultramann (Contributor, Author):

@jnothman could you take another look at this?

@glemaitre (Contributor):

I would advocate raising an error, because nobody reads either warnings or the documentation :)
More seriously, the docstring mentions that "Keys are transformer names". We could add that if a key is not a correct name, then we will raise an error. I don't see a use case where it would be meaningful to pass an inappropriate (or unknown) name.

@glemaitre (Contributor):

@thomasjpfan, did you think of any use case?

@glemaitre (Contributor) left a comment:

Otherwise, I am fine with the implementation.

if not self.transformer_weights:
return

transformer_names = set(name for name, _ in self.transformer_list)
@glemaitre (Contributor):

Second thought: I find it weird to make a set here. We should not have two transformers with the same name, should we?

@Ultramann (Contributor, Author):

I agree that we should not have two transformers with the same name. But since transformer_names is used solely to determine membership with if name not in transformer_names on line 878, I thought a set would be more appropriate. Happy to change it to a list if that seems more intuitive.

@glemaitre (Contributor):

Thinking a bit more about it, it will be more efficient with the in operator afterwards. So let's use the set.
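To illustrate the point (toy names, not code from this PR): membership tests with in are average O(1) on a set versus O(n) on a list, and duplicate names would collapse in the set anyway.

```python
# Toy example of the membership check discussed above.
transformer_list = [("pca", "pca_obj"), ("scaler", "scaler_obj")]

# Building a set makes each `name in transformer_names` an O(1) hash
# lookup; a list would scan all names on every check.
transformer_names = {name for name, _ in transformer_list}

print("pca" in transformer_names)      # True
print("missing" in transformer_names)  # False
```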

@thomasjpfan (Member):

@thomasjpfan, did you think of any use case?

I agree raising an error would be better.
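Putting the two decisions together — validate in fit, and raise rather than warn — a rough sketch might look like this (the class name and error message are illustrative stand-ins, not the merged scikit-learn code):

```python
class FeatureUnionSketch:
    """Illustrative stand-in for FeatureUnion's weight validation."""

    def __init__(self, transformer_list, transformer_weights=None):
        self.transformer_list = transformer_list
        self.transformer_weights = transformer_weights

    def _validate_transformer_weights(self):
        # Raise on any weights key that names no transformer.
        if not self.transformer_weights:
            return
        transformer_names = {name for name, _ in self.transformer_list}
        for name in self.transformer_weights:
            if name not in transformer_names:
                raise ValueError(
                    f"Attempting to weight transformer {name!r}, "
                    "but it is not present in transformer_list."
                )

    def fit(self, X, y=None):
        # Validation runs at fit time, per the review guidance above.
        self._validate_transformer_weights()
        # ... fitting of the individual transformers would follow ...
        return self
```

With transformer_weights={"pcaa": 10} and no transformer named "pcaa", fit raises a ValueError instead of silently ignoring the weight.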

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@glemaitre glemaitre changed the title [MRG] Validate transformer_weight names in FeatureUnion EHN raise error with inconsistant transformer_weight in FeatureUnion Aug 25, 2020
@glemaitre (Contributor) left a comment:

I will merge when it is green.

@glemaitre glemaitre merged commit a0a76fc into scikit-learn:master Aug 25, 2020
@glemaitre (Contributor):

Thanks @Ultramann

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
…cikit-learn#17876)

Co-authored-by: Cary Goltermann <cgoltermann@kpmg.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Successfully merging this pull request may close these issues:
FeatureUnion silently ignores unknown transformer weights