Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Fix KernelCenterer requires square input but doesn't error on non-square #14336

Merged
merged 27 commits into from Jul 30, 2019

Conversation

gdex1
Copy link
Contributor

@gdex1 gdex1 commented Jul 13, 2019

Reference Issues/PRs

Fix #14240

What does this implement/fix? Explain your changes.

KernelCenterer.fit will now raise a ValueError when called on a non-square kernel matrix.

Added pairwise_estimator_convert_X to check_fit2d_1sample and check_transformer_data_not_an_array as described by #14240 (comment).

Added logical check X.T != X to assert_raises statement in _check_transformer. Previously, _check_transformer assumed that X.T had a different number of features than X. However, if pairwise_estimator_convert_X is called where transformer is pairwise, then X.T == X therefore no error ValueError will be raised so the check is invalid.

  • Adding pairwise_estimator_convert_X to check_transformer_general caused two tests to fail through _check_transformer assert_raises statement that "The transformer {} does not raise an error when the number of features in transform is different from the number of features in fit."

Any other comments?

Fitting KernelCenterer on an nxn matrix and transforming a kxk matrix where n!=k is not currently tested by any unit test. Doing this manually raises the errorValueError: operands could not be broadcast together with shapes (k,k) (n,) (k,k). _check_transformer could be modified to transpose X before pairwise_estimator_convert_X is called to make the test work on pairwise estimators. KernelCenterer would need to be modified to give a more useful error message.

@gdex1 gdex1 closed this Jul 13, 2019
@gdex1 gdex1 reopened this Jul 13, 2019
@gdex1
Copy link
Contributor Author

@gdex1 gdex1 commented Jul 13, 2019

Do the proposed changes in the additional comments section seem like the appropriate ways to handle these issues?

I'm new to open source so any feedback on my solutions/code/workflow/PR is welcome. Thanks.

sklearn/utils/estimator_checks.py Outdated Show resolved Hide resolved
@amueller
Copy link
Member

@amueller amueller commented Jul 13, 2019

Thanks. Your proposed solution makes sense but leaves that part of the code untested as I mentioned above so I suggested an alternative.

@gdex1
Copy link
Contributor Author

@gdex1 gdex1 commented Jul 18, 2019

I've made the change mentioned #14240 and added a test to check the change. I am not sure if I structured the test correctly so could someone check it?

I think that possibly some other tests and errors need to be added to KernelCenterer. Am I correct that a new issue would be a better place to discuss it?

@gdex1 gdex1 changed the title [WIP] Fix KernelCenterer requires square input but doesn't error on non-square [MRG] Fix KernelCenterer requires square input but doesn't error on non-square Jul 18, 2019
@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Jul 19, 2019

It's kinda expected than when you add a test to the common tests, it detects issues with a couple of estimators that we were not detecting before. For the ones which are easy to fix, you can fix them, and for the rest, apply the changes you have done here, and we'll see together why the other tests are failing.

@gdex1
Copy link
Contributor Author

@gdex1 gdex1 commented Jul 19, 2019

I think everything is complete now. In total I added:
-Value error when KernelCenterer gets non-square input
-Unit test to check pairwise estimators raise an error on non-square input
-Value error when KNeighborsRegressor(metric='precomputed') get non square input

and modified some tests that gave non-square input to pairwise estimators

@adrinjalali
Copy link
Member

@adrinjalali adrinjalali commented Jul 22, 2019

Otherwise LGTM!

Copy link
Member

@adrinjalali adrinjalali left a comment

LGTM, thanks @gdex1

Copy link
Member

@jnothman jnothman left a comment

This is looking good. The nearest neighbour changes might be duplicated in #10482 but that's fine.

sklearn/utils/tests/test_estimator_checks.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_estimator_checks.py Outdated Show resolved Hide resolved
sklearn/utils/estimator_checks.py Outdated Show resolved Hide resolved
@jnothman
Copy link
Member

@jnothman jnothman commented Jul 24, 2019

Tests failing. I've not checked why

@gdex1
Copy link
Contributor Author

@gdex1 gdex1 commented Jul 24, 2019

@jnothman Hi. If I'm interpreting the failed tests correctly, the problem is that estimator_checks.py imports pytest for pytest.raises as described in #14336 (comment). I am not exactly sure how to fix this as other places in the repository seem to import pytest without causing an error.

Edit: Also I was wondering what the best way to test my code is before I push?

@jnothman
Copy link
Member

@jnothman jnothman commented Jul 25, 2019

Ahh right. For good or bad, we can't use pytest.raises in estimator_checks.py as it is considered public sklearn API and sklearn does not have a formal dependency on pytest (only its test suite does). So please revert to using assert_raise*

sklearn/utils/estimator_checks.py Outdated Show resolved Hide resolved
@gdex1 gdex1 changed the title [MRG] Fix KernelCenterer requires square input but doesn't error on non-square [MRG+1] Fix KernelCenterer requires square input but doesn't error on non-square Jul 26, 2019
Copy link
Member

@jnothman jnothman left a comment

Please add an entry to the change log at doc/whats_new/v0.22.rst under the Changes to Estimator Checks section. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:

@gdex1
Copy link
Contributor Author

@gdex1 gdex1 commented Jul 28, 2019

@jnothman I've added document changes, but I am not completely sure if it is correct. I also noticed that the modules are not in alphabetical order.

@amueller
Copy link
Member

@amueller amueller commented Jul 30, 2019

awesome, thanks!

@amueller amueller merged commit 86c5998 into scikit-learn:master Jul 30, 2019
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants