EHN raise error with inconsistant transformer_weight in FeatureUnion #17876
It looks like the two failed checks might be related to work in #17913. I changed the status of the PR to WIP until @thomasjpfan's fix gets merged and I can run again.
Do you think this should be warning or error? Are there cases where users may have exploited this behaviour intentionally?
Otherwise LGTM
I think that's a really good question, it was at the back of my mind too, so I'm glad you asked it. I'm struggling to think of a way that this could be exploited. I'm also inclined to think of it from the opposite perspective, is not throwing an error violating the contract that |
@jnothman, having thought about your question a bit more I'm thinking that a warning is the right way to go. The way the |
Thank you for the PR @Ultramann !
Can we place the check in `fit`? I would prefer new code to follow our own guidelines and validate in `fit`.
@thomasjpfan, I can definitely move the validation to Also, regarding @jnothman's question above, are you good with a warning?
@thomasjpfan looking again I found |
For this PR, let's leave |
Please add an entry to the change log at `doc/whats_new/v0.24.rst` with tag `|Enhancement|`. Like the other entries there, please reference this pull request with `:pr:` and credit yourself (and other contributors if applicable) with `:user:`.
LGTM!
@jnothman could you take another look at this?
I would advocate raising an error because nobody reads either the warnings or the documentation :)
@thomasjpfan did you think about any use case?
Otherwise, I am fine with the implementation
    if not self.transformer_weights:
        return

    transformer_names = set(name for name, _ in self.transformer_list)
Second thought: I find it weird to make a set here. We should not have two transformers with the same name, should we?
I agree that we should not have two transformers with the same name. But since `transformer_names` is being used solely to determine membership with `if name not in transformer_names` on line 878, I thought a set would be more appropriate. Happy to change it to a list if that seems more intuitive.
Thinking a bit more about it, it will be more efficient with the `in` operator afterwards. So let's use the set.
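The reasoning in this thread can be illustrated with a small sketch. The step names below are made up for the example; the point is only that a set supports the `in` test directly and would collapse duplicate names, which does not matter here because names are expected to be unique.

```python
# Hypothetical (name, transformer) pairs standing in for transformer_list.
transformer_list = [("pca", None), ("svd", None)]

# Building the set costs one pass; each later `in` test is then a hash
# lookup on average, rather than a scan over a list.
transformer_names = set(name for name, _ in transformer_list)

# Collect the weight keys that do not match any transformer name.
unknown = [name for name in ("pca", "tsne") if name not in transformer_names]
print(unknown)
```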
I agree raising an error would be better.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
I will merge when it is green
Thanks @Ultramann
…cikit-learn#17876) Co-authored-by: Cary Goltermann <cgoltermann@kpmg.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Reference Issues/PRs
Fixes #17863.
What does this implement/fix? Explain your changes.
Passing a `transformer_weights` dictionary to `FeatureUnion` with a key that isn't found in `transformer_list` means that weight is never applied to any transformer. This is correct but silent behavior, so I would like to warn the user when it happens.

This PR adds a new method, `_validate_transformer_weights`, called at the end of `FeatureUnion.__init__`. This method loops through all the names in `transformer_weights`, raising a warning for any name that is not in `transformer_list`.
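The validation described above can be sketched as a standalone function. This is a minimal illustration rather than the code merged in the PR: the function name `check_transformer_weights` and the error message are assumptions, and it raises `ValueError` (the outcome reviewers favored in the conversation) where the description speaks of a warning.

```python
def check_transformer_weights(transformer_list, transformer_weights):
    """Raise if a weight is keyed by a name missing from transformer_list.

    Hypothetical standalone helper; not scikit-learn's actual method.
    """
    # None or an empty dict means there is nothing to validate.
    if not transformer_weights:
        return

    # A set keeps the membership check cheap for each weight key.
    transformer_names = {name for name, _ in transformer_list}
    for name in transformer_weights:
        if name not in transformer_names:
            # Illustrative message; the wording is an assumption.
            raise ValueError(
                f"Attempting to weight transformer {name!r}, "
                "but it is not present in transformer_list."
            )


# Usage: the misspelled key "pcaa" is rejected instead of being ignored.
steps = [("pca", object()), ("svd", object())]
check_transformer_weights(steps, {"pca": 2.0})  # matching key, no error
try:
    check_transformer_weights(steps, {"pcaa": 2.0})
except ValueError as exc:
    print(exc)
```

Raising instead of warning makes a misspelled key fail immediately, which is the trade-off the reviewers discussed.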