Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ColumnTransformer input feature name and count validation #14544

Merged
merged 10 commits into from Aug 7, 2019

Conversation

@adrinjalali
Copy link
Member

adrinjalali commented Aug 1, 2019

Fixes #14251.

This raises a deprecation warning if the X at the time of fit and transform don't match in terms of feature names or count. It implements option (1) of #14251 (comment)

@amueller

This comment has been minimized.

Copy link
Member

amueller commented Aug 1, 2019

now you only need to adjust the tests ;)

adrinjalali added 3 commits Aug 2, 2019
@adrinjalali

This comment has been minimized.

Copy link
Member Author

adrinjalali commented Aug 5, 2019

The failing test is test_sag_regressor_computed_correctly, not related.

@NicolasHug do you agree with this version?

@NicolasHug

This comment has been minimized.

Copy link
Contributor

NicolasHug commented Aug 5, 2019

My general thought is that we should have gone for a stricter version of #14237 and just error for any discrepancy match, as a bug fix.

But OK for doing a deprecation. I'm OK with this but we still need to error when n_features_fit < n_features_transform and negative indices are used. This is a bug that isn't caught right now, see #14251 (comment)

@jnothman

This comment has been minimized.

Copy link
Member

jnothman commented Aug 5, 2019

I agree with all of that @NicolasHug. I lean a little more towards a deprecation.

@adrinjalali

This comment has been minimized.

Copy link
Member Author

adrinjalali commented Aug 6, 2019

I added a test for the case where the indexing is negative, I'd appreciate a close review on that part.

@adrinjalali

This comment has been minimized.

Copy link
Member Author

adrinjalali commented Aug 7, 2019

@amueller you fine with the changes?

Copy link
Contributor

NicolasHug left a comment

a few comments, other than that LGTM

X_array_fewer = np.array([[0, 1, 2], ]).T
err_msg = 'Number of features'
with pytest.raises(ValueError, match=err_msg):
ct.transform(X_array_fewer)
with pytest.warns(DeprecationWarning, match=msg):

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Aug 7, 2019

Contributor

Why warn when passing less features? I thought the only scenario we still accept is passing more features, not less?

This will be confusing for users since the message says it should not error until 0.24

with pytest.raises(ValueError, match=err_msg):
tf.transform(X_trans_df)
with pytest.warns(DeprecationWarning, match=warn_msg):

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Aug 7, 2019

Contributor

same here, why warn here since we error as well?


tf = ColumnTransformer([('bycol', Trans(), [0, -1])])
tf.fit(df_extra)
msg = "At least one negative column was used to"

This comment has been minimized.

Copy link
@NicolasHug

NicolasHug Aug 7, 2019

Contributor

can you also check that no warning is raised when n_features matches with negative indexing? Just in case.

@adrinjalali

This comment has been minimized.

Copy link
Member Author

adrinjalali commented Aug 7, 2019

@NicolasHug done :)

Copy link
Contributor

NicolasHug left a comment

Thanks Adrin, LGTM if all goes green

@amueller amueller merged commit da6614f into scikit-learn:master Aug 7, 2019
15 checks passed
15 checks passed
LGTM analysis: C/C++ No code changes detected
Details
LGTM analysis: JavaScript No code changes detected
Details
LGTM analysis: Python No new or fixed alerts
Details
ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: doc Your tests passed on CircleCI!
Details
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
Details
ci/circleci: lint Your tests passed on CircleCI!
Details
scikit-learn.scikit-learn Build #20190807.35 succeeded
Details
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
Details
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
Details
scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas) Linux pylatest_conda_mkl_pandas succeeded
Details
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
Details
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
Details
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
Details
@jnothman jnothman mentioned this pull request Oct 28, 2019
0 of 8 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.