Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH add support for 'passthrough' in FeatureUnion #20860

Merged
merged 22 commits into from Sep 14, 2021

Conversation

shubhraneel
Copy link
Contributor

@shubhraneel shubhraneel commented Aug 27, 2021

Reference Issues/PRs

Fixes #20699

What does this implement/fix? Explain your changes.

  • Adds a feature to 'passthrough' to pass the features without a transformation and added test and documentation for it.
  • Changes made in FeatureUnion._validate_transformers() to allow 'passthrough'
  • Changes made in FeatureUnion._iter() to use FunctionTransformer(none) in place of passthrough to represent the identity transformer.
  • Added FeatureUnion._passthrough_function() which returns a FunctionTransformer(none) with get_feature_names set. This separate function is needed as the FunctionTransformer class does not have a get_feature_names function which is needed by FeatureUnion.
  • Wrote a test function test_set_feature_union_passthrough() to test the functionality.

Any other comments?

It may be better to check if more than one transformers are not set to passthrough as having a repetition of the same features is redundant.

@shubhraneel shubhraneel changed the title Passthrough feature union Support 'passthrough' in FeatureUnion Aug 27, 2021
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
@shubhraneel
Copy link
Contributor Author

@amueller I have added these new changes and I removed the get_feature_names from passthrough as it seems unnecessary. If you suggest get_feature_names could be useful for passthrough, then I will implement it.

@shubhraneel
Copy link
Contributor Author

@amueller I am not able to understand how shall I fix this Check Changelog check. Could you please help me regarding this?

@glemaitre
Copy link
Member

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@shubhraneel
Copy link
Contributor Author

@glemaitre thanks. I did that.

@glemaitre glemaitre self-requested a review September 2, 2021 15:04
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the test, we should add a case where we passthrough features but with a weight to check that the features are still transformed by the weight.

sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
assert_array_equal(X, ft.fit_transform(X)[:, -columns:])
assert not record

pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this line

@glemaitre glemaitre changed the title Support 'passthrough' in FeatureUnion ENH add support for 'passthrough' in FeatureUnion Sep 2, 2021
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass. Indeed we probably need additional tests to cover the interaction of "passthrough" with get_feature_names.

Note that this will be impacted by #18444 once merged.

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
sklearn/tests/test_pipeline.py Outdated Show resolved Hide resolved
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubhraneel Would you be able to finish this PR?

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
shubhraneel and others added 4 commits September 8, 2021 21:24
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@shubhraneel
Copy link
Contributor Author

@glemaitre Yes, I will be able to finish this PR

@shubhraneel
Copy link
Contributor Author

@glemaitre I have addressed all the review points. And, I have added the passthrough with transform_weights in test_pipeline too.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @shubhraneel !

Tests look good. Some comments on the implementation.

sklearn/pipeline.py Outdated Show resolved Hide resolved
sklearn/pipeline.py Outdated Show resolved Hide resolved
shubhraneel and others added 2 commits September 13, 2021 09:04
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan merged commit f309ffb into scikit-learn:main Sep 14, 2021
@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Co-authored-by: shubhraneel <shubhraneel@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support "passthrough" in FeatureUnion
5 participants