New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast float to double #9067

Merged
merged 8 commits into from Jun 21, 2017

Conversation

Projects
None yet
7 participants
@raghavrv
Member

raghavrv commented Jun 8, 2017

Refer #8769

This will ensure float32 is not upcast to float64 in `randomized_

cc: @ogrisel @massich @GaelVaroquaux

@massich

This comment has been minimized.

Show comment
Hide comment
@massich

massich Jun 8, 2017

Contributor

LGTM.

@raghavrv should we do something similar here
assert_almost_equal(lr_32.coef_, lr_64.coef_, decimal=5) ?

Contributor

massich commented Jun 8, 2017

LGTM.

@raghavrv should we do something similar here
assert_almost_equal(lr_32.coef_, lr_64.coef_, decimal=5) ?

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jun 9, 2017

Member

Can you expand this PR to add a test to PCA that, with the different solvers, it doesn't upcast float32 to float64.

Member

GaelVaroquaux commented Jun 9, 2017

Can you expand this PR to add a test to PCA that, with the different solvers, it doesn't upcast float32 to float64.

@@ -99,38 +99,50 @@ def test_logsumexp():
assert_array_almost_equal(np.exp(logsumexp(logX, axis=1)), X.sum(axis=1))
def test_randomized_svd_low_rank():
def check_randomized_svd_low_rank(dtype):

This comment has been minimized.

@TomDLT

TomDLT Jun 9, 2017

Member

nitpick: I prefer to keep the word test instead of check.
It is a unit test, not a input check inside the code.

@TomDLT

TomDLT Jun 9, 2017

Member

nitpick: I prefer to keep the word test instead of check.
It is a unit test, not a input check inside the code.

This comment has been minimized.

@raghavrv

raghavrv Jun 9, 2017

Member

It was renamed from test to check so it doesn't run as a separate test by itself. There is a test function below which will use this function to test different dtypes.

@raghavrv

raghavrv Jun 9, 2017

Member

It was renamed from test to check so it doesn't run as a separate test by itself. There is a test function below which will use this function to test different dtypes.

This comment has been minimized.

@TomDLT

TomDLT Jun 9, 2017

Member

It is not run when there is an undefined parameter, isn't it?

@TomDLT

TomDLT Jun 9, 2017

Member

It is not run when there is an undefined parameter, isn't it?

This comment has been minimized.

@raghavrv

raghavrv Jun 9, 2017

Member

Oh I didn't know that... Thanks!

@raghavrv

raghavrv Jun 9, 2017

Member

Oh I didn't know that... Thanks!

This comment has been minimized.

@raghavrv

raghavrv Jun 9, 2017

Member

humm doesn't seem so. both pytest and nose raise error if I rename it back to test. Basically it seems to run whenever the function name has test in it.

@raghavrv

raghavrv Jun 9, 2017

Member

humm doesn't seem so. both pytest and nose raise error if I rename it back to test. Basically it seems to run whenever the function name has test in it.

This comment has been minimized.

@ogrisel

ogrisel Jun 9, 2017

Member

It's a convention in the projet to prefix by check_ helper functions that are called by test_ generators (with the yield syntax).

@ogrisel

ogrisel Jun 9, 2017

Member

It's a convention in the projet to prefix by check_ helper functions that are called by test_ generators (with the yield syntax).

This comment has been minimized.

@TomDLT

TomDLT Jun 9, 2017

Member

ok then

@TomDLT

TomDLT Jun 9, 2017

Member

ok then

@raghavrv raghavrv changed the title from [MRG] ENH Ensure randomized_svd_low_rank doesn't upcast float to double to [MRG] ENH Ensure PCA and randomized_svd_low_rank doesn't upcast float to double Jun 9, 2017

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Jun 9, 2017

Member

@raghavrv should we do something similar here

sorry I didn't respond here. as discussed IRL, yeah this seems cleaner than astype + comparison.

Member

raghavrv commented Jun 9, 2017

@raghavrv should we do something similar here

sorry I didn't respond here. as discussed IRL, yeah this seems cleaner than astype + comparison.

@ogrisel ogrisel changed the title from [MRG] ENH Ensure PCA and randomized_svd_low_rank doesn't upcast float to double to [MRG] ENH Ensure PCA and randomized_svd_low_rank don't upcast float to double Jun 9, 2017

@ogrisel

ogrisel approved these changes Jun 9, 2017

LGTM once the following comments are addressed.

def test_pca_dtype_preservation():
for svd_solver in solver_list:

This comment has been minimized.

@ogrisel

ogrisel Jun 9, 2017

Member

Please use a test generator (with the yield syntax) to generate the checks for different solver names so that we get the solver name in the test failure report.

@ogrisel

ogrisel Jun 9, 2017

Member

Please use a test generator (with the yield syntax) to generate the checks for different solver names so that we get the solver name in the test failure report.

Show outdated Hide outdated sklearn/decomposition/tests/test_pca.py
Show outdated Hide outdated sklearn/decomposition/tests/test_pca.py
Show outdated Hide outdated sklearn/utils/tests/test_extmath.py
@@ -362,7 +362,7 @@ def _fit(self, X):
raise TypeError('PCA does not support sparse input. See '
'TruncatedSVD for a possible alternative.')
X = check_array(X, dtype=[np.float64], ensure_2d=True,
X = check_array(X, dtype=[np.float64, np.float32], ensure_2d=True,

This comment has been minimized.

@ogrisel

ogrisel Jun 9, 2017

Member

@raghavrv this line makes it such that if X is integer based, it would be converted to np.float64 (the first element of the list) on all the platforms.

This answers your question at: https://github.com/scikit-learn/scikit-learn/pull/9067/files#r121114986

@ogrisel

ogrisel Jun 9, 2017

Member

@raghavrv this line makes it such that if X is integer based, it would be converted to np.float64 (the first element of the list) on all the platforms.

This answers your question at: https://github.com/scikit-learn/scikit-learn/pull/9067/files#r121114986

This comment has been minimized.

@amueller

amueller Jun 19, 2017

Member

Should we change this behavior of check_array?

@amueller

amueller Jun 19, 2017

Member

Should we change this behavior of check_array?

This comment has been minimized.

@raghavrv

raghavrv Jun 19, 2017

Member

I think this is helpful. It converts to the dtype as per the given order.

@raghavrv

raghavrv Jun 19, 2017

Member

I think this is helpful. It converts to the dtype as per the given order.

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Jun 12, 2017

Member

@ogrisel Thanks for the review. Have addressed your comments.

Member

raghavrv commented Jun 12, 2017

@ogrisel Thanks for the review. Have addressed your comments.

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Jun 16, 2017

Member

@ogrisel The appveyor failed on some estimator checks. I've updated the checks in a way that only 4 decimal places are checked for float32 data. Could you verify the last commit and see if it is correct?

Member

raghavrv commented Jun 16, 2017

@ogrisel The appveyor failed on some estimator checks. I've updated the checks in a way that only 4 decimal places are checked for float32 data. Could you verify the last commit and see if it is correct?

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Jun 16, 2017

Member

Ah! There was a change to the master which actually addressed the failures. I think CIs should pass after the last commit.

Member

raghavrv commented Jun 16, 2017

Ah! There was a change to the master which actually addressed the failures. I think CIs should pass after the last commit.

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Jun 16, 2017

Member

Yeah. @ogrisel @GaelVaroquaux final review and merge?

Member

raghavrv commented Jun 16, 2017

Yeah. @ogrisel @GaelVaroquaux final review and merge?

@raghavrv raghavrv changed the title from [MRG] ENH Ensure PCA and randomized_svd_low_rank don't upcast float to double to [MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast float to double Jun 16, 2017

@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jun 20, 2017

Member

LGTM once my comment above is addressed.

Member

GaelVaroquaux commented Jun 20, 2017

LGTM once my comment above is addressed.

@raghavrv

This comment has been minimized.

Show comment
Hide comment
@raghavrv

raghavrv Jun 21, 2017

Member

Thanks @GaelVaroquaux. Have fixed it.

Member

raghavrv commented Jun 21, 2017

Thanks @GaelVaroquaux. Have fixed it.

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Jun 21, 2017

Member

OK, good. Merging! Thanks

Member

GaelVaroquaux commented Jun 21, 2017

OK, good. Merging! Thanks

@GaelVaroquaux GaelVaroquaux merged commit 1c1566e into scikit-learn:master Jun 21, 2017

5 checks passed

ci/circleci Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 96.29%)
Details
codecov/project 96.31% (+0.01%) compared to 7238b46
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@raghavrv raghavrv deleted the raghavrv:randomized_svd_dtype branch Jun 23, 2017

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG + 1] ENH Ensure PCA and randomized_svd_low_rank don't upcast flo…
…at to double (#9067)

* ENH Ensure randomized_svd_low_rank doesn't upcast float to double

* ENH ensure PCA does not upcase f32 to f64; (int is upcast to f32)

* ENH ensure that when input is of type int, the output is float32/64

* ENH prefer float64 over float32; Use float64 for int inputs

* Make sure int types are upcasted to float64; Address Olivier's comments

* FIX check only for 4 decimals when dtype is float32

* Fix spurious line removal
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment