
[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division #9492

Merged
merged 8 commits into scikit-learn:master Aug 14, 2017

Conversation

4 participants
@jrbourbeau
Contributor

jrbourbeau commented Aug 4, 2017

Reference Issue

Fixes #9489

What does this implement/fix? Explain your changes.

Currently, the partial_fit method for sklearn.decomposition.IncrementalPCA uses integer division on Python 2 and float division on Python 3 (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/incremental_pca.py#L249). This PR ensures that float division is used on all Python versions by casting the relevant numerator to a floating-point number.

Any other comments?

While casting the numerator to a float fixes issue #9489, I'm not sure if this is the preferred style, or whether adding from __future__ import division would be a better option.
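The two division behaviors can be sketched in a few lines (the variable names below are illustrative, not the ones used in incremental_pca.py):

```python
from __future__ import division  # no-op on Python 3; makes `/` true division on Python 2

n_total_samples = 5  # illustrative counts only
n_samples = 2

# With true division this is 2.5 on both Python 2 and 3; without the
# import (or a float cast), Python 2 would floor it to 2.
ratio = n_total_samples / n_samples
print(ratio)  # 2.5

# Floor division remains available explicitly when it is what you want.
print(n_total_samples // n_samples)  # 2
```

The `__future__` import fixes every `/` in the module at once, while a float cast fixes only the one expression; that trade-off is what the discussion below settles.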

@NelleV


Member

NelleV commented Aug 4, 2017

Hi @jrbourbeau
Thanks a lot for the patch. It looks good overall, but can you add a test?

@NelleV

Member

NelleV commented Aug 4, 2017

(also, it might be more elegant to use the from __future__ import division solution.)

@jnothman

Member

jnothman commented Aug 5, 2017

@jrbourbeau

Contributor

jrbourbeau commented Aug 6, 2017

I would prefer the from __future__ import division route as well. @jnothman, do you think it's safe to assume that the internals of sklearn.decomposition.IncrementalPCA aren't something users will be doing a copy-paste with?

@jrbourbeau

Contributor

jrbourbeau commented Aug 6, 2017

@NelleV, do you have any suggestions for a test to add? I was thinking the original code that caught this problem in issue #9489 might be a good test. Something like

import numpy as np
from sklearn.decomposition import IncrementalPCA

def test_partial_fit_correct_answer():
    A = np.array([[1, 2, 4], [5, 3, 6]])
    B = np.array([[6, 7, 3], [5, 2, 1], [3, 5, 6]])
    C = np.array([[3, 2, 1]])

    ipca = IncrementalPCA(n_components=2)
    ipca.partial_fit(A)
    ipca.partial_fit(B)
    # Known answer is [[-1.48864923, -3.15618645]]
    np.testing.assert_equal(ipca.transform(C), [[-1.48864923, -3.15618645]])

But I'm not sure if hard-coding the correct answer is the way to go, or if one should just simply check for float division (since that's the real problem here) with something like

def test_float_division():
    assert 3/2 == 1.5
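A side note on the test sketched above: `np.testing.assert_equal` demands exact equality, which is brittle for float results, whereas `np.testing.assert_allclose` compares within a tolerance. A minimal illustration (the `computed` array here is a stand-in for the real `ipca.transform(C)` output, not a value taken from scikit-learn):

```python
import numpy as np

# Rounded reprs like -1.48864923 rarely match computed floats exactly,
# so compare with a tolerance instead of exact equality.
expected = np.array([[-1.48864923, -3.15618645]])
computed = expected + 1e-9  # stand-in for ipca.transform(C)

np.testing.assert_allclose(computed, expected, atol=1e-6)
print("match within tolerance")
```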
@jnothman

Member

jnothman commented Aug 9, 2017

A test hard-coding the correct answer, with a comment saying it's a non-regression test for issue #9489, would be great.

@lesteve

Member

lesteve commented Aug 9, 2017

Here is a snippet that can be easily turned into a regression test:

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
A = rng.randn(5, 3) + 2
B = rng.randn(7, 3) + 5

pca = IncrementalPCA(n_components=2)
pca.partial_fit(A)
pca.n_samples_seen_ = float(pca.n_samples_seen_)
pca.partial_fit(B)
print(pca.singular_values_)

pca2 = IncrementalPCA(n_components=2)
pca2.partial_fit(A)
pca2.partial_fit(B)
print(pca2.singular_values_)

With scikit-learn 0.18.2 and Python 2.7, the output is:

[ 7.70269857  3.88561777]
[ 6.64122066  3.88184665]
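For reference, the division being fixed is a scalar one feeding the mean/variance correction inside partial_fit, roughly of the form sqrt(n_seen * n_new / n_total). This is a simplified, illustrative form, not the exact expression in incremental_pca.py:

```python
from __future__ import division  # no-op on Python 3; fixes `/` on Python 2
import math

n_seen, n_new = 5, 7  # sample counts matching the snippet above
n_total = n_seen + n_new

# Without true division, Python 2 floors 35 / 12 to 2 before the sqrt,
# silently skewing the correction factor.
correction = math.sqrt(n_seen * n_new / n_total)
print(correction)
```

Casting `n_samples_seen_` to float, as the snippet above does, sidesteps the floor and is why the two outputs differ on Python 2.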

@jrbourbeau jrbourbeau changed the title from Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division to [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 10, 2017

@lesteve

Member

lesteve commented Aug 10, 2017

Also, you'll want to add an entry to doc/whats_new.rst about your fix. Do mention that it's Python 2 only.

@jrbourbeau

Contributor

jrbourbeau commented Aug 10, 2017

It looks like I added a bunch of other commits I didn't intend to in my most recent push. I was trying to update doc/whats_new.rst with an entry for this PR, but had to rebase on master to get the updates to doc/whats_new.rst that had been added since the feature branch for this PR was made.

I attempted the rebase following the instructions from the contributing guide (https://docs.scipy.org/doc/numpy/dev/gitwash/development_workflow.html#rebasing-on-master), but I must have accidentally included all the other commits as well.

I'm not exactly sure what to do here. Does anyone know of a way to fix this? Or should I open up a new PR? Thanks!!

@jrbourbeau

Contributor

jrbourbeau commented Aug 10, 2017

I made a backup branch before I used git rebase. Would

# reset branch back to the saved point
git reset --hard backup_branch
# push this saved point to the remote branch for this PR
git push -f origin remote_feature_branch

work to revert this PR back to its state before I did the rebase?

@jnothman

Member

jnothman commented Aug 10, 2017

@jrbourbeau

Contributor

jrbourbeau commented Aug 11, 2017

Thanks so much @jnothman!

I want to add an entry to doc/whats_new.rst related to this PR. However, there have been changes made to the current master branch version of doc/whats_new.rst. Namely, a section for version 0.20 has been added. I would like to add an entry for this PR to the version 0.20 section without creating any merge conflicts with the current master branch.

I was wondering how I should go about this. One possible way would be to copy the new version 0.20 section from the current master branch into my feature branch, add an entry for this PR, and then commit it. Is this the preferred way, or is there a different way to go about this?

@jnothman

Member

jnothman commented Aug 11, 2017

jrbourbeau added some commits Aug 11, 2017

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into IncrementalPCA-partial_fit-float-division

Want to get recent changes made to `doc/whats_new.rst` into my feature branch
@jrbourbeau

Contributor

jrbourbeau commented Aug 11, 2017

It worked! Thanks for helping me with that @jnothman

@jrbourbeau jrbourbeau changed the title from [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division to [WIP] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 11, 2017

@jrbourbeau

Contributor

jrbourbeau commented Aug 11, 2017

PR #9526 updated doc/whats_new.rst on master in preparation for the 0.19 release, leading to a merge conflict. All conflicts have now been resolved. Unless there are any other comments, I think this PR is good to go.

To summarize, this PR:

  1. Makes sure the partial_fit method for sklearn.decomposition.IncrementalPCA uses float division on Python 2.
  2. Adds a non-regression test for issue #9489 to sklearn/decomposition/tests/test_incremental_pca.py.
  3. Updates doc/whats_new.rst with an entry for this PR.

@jrbourbeau jrbourbeau changed the title from [WIP] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division to [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 11, 2017

@jnothman

Yes, LGTM. Thanks


@jnothman jnothman changed the title from [MRG] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division to [MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division Aug 14, 2017

@NelleV NelleV merged commit 86d8f18 into scikit-learn:master Aug 14, 2017

5 checks passed

ci/circleci: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 96.17%)
codecov/project: 96.17% (+<.01%) compared to 9c119e6
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@NelleV

Member

NelleV commented Aug 14, 2017

Thanks @jrbourbeau !

@jrbourbeau jrbourbeau deleted the jrbourbeau:IncrementalPCA-partial_fit-float-division branch Aug 14, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division (scikit-learn#9492)

* Ensures that partial_fit uses float division

* Switches to using future division for float division

* Adds non-regression test for issue scikit-learn#9489

* Updates test to remove dependence on a "known answer"

* Updates doc/whats_new.rst with entry for this PR

* Specifies bug fix is for Python 2 versions in doc/whats_new.rst

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division (scikit-learn#9492)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division (scikit-learn#9492)

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] Ensures that partial_fit for sklearn.decomposition.IncrementalPCA uses float division (scikit-learn#9492)