[MRG+1] FIX unstable cumsum #7376

Merged
merged 14 commits into scikit-learn:master on Oct 17, 2016

Conversation

8 participants
@yangarbiter
Contributor

yangarbiter commented Sep 9, 2016

Reference Issue

#7359

What does this implement/fix? Explain your changes.

np.cumsum was reported in #6842 to be numerically unstable when dealing with float32 data or very large arrays of float64 data. This pull request changes those calls to sklearn.utils.extmath.stable_cumsum to address the problem (#7331).
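For context, a minimal sketch of what such a helper can look like; the shipped sklearn.utils.extmath.stable_cumsum differs in details, and the equal_nan handling and warning behaviour shown here reflect later commits in this thread:

```python
import warnings

import numpy as np


def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):
    """Cumulative sum in float64, warning when precision appears lost.

    Illustrative sketch only; not the exact shipped implementation.
    """
    out = np.cumsum(arr, axis=axis, dtype=np.float64)
    expected = np.sum(arr, axis=axis, dtype=np.float64)
    # Compare the last cumulative value along the axis against a direct sum;
    # equal_nan=True keeps all-NaN inputs from tripping the check.
    if not np.allclose(out.take(-1, axis=axis), expected,
                       rtol=rtol, atol=atol, equal_nan=True):
        warnings.warn('cumsum was found to be unstable: its last element '
                      'does not correspond to sum', RuntimeWarning)
    return out
```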

@TomDLT
Member

TomDLT commented Sep 9, 2016

Failure in test_random_choice_csc is due to a call to stable_cumsum(array([nan])), which fails since np.allclose(np.nan, np.nan) = False.
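This is the behaviour in question; passing equal_nan=True to the comparison (as a later commit in this PR does) makes NaN compare equal to itself:

```python
import numpy as np

# By default np.allclose considers NaN unequal to itself, so comparing a
# cumulative sum of array([nan]) against its sum fails spuriously.
print(np.allclose(np.nan, np.nan))                  # False
print(np.allclose(np.nan, np.nan, equal_nan=True))  # True
```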

@yangarbiter yangarbiter changed the title from [WIP] FIX unstable cumsum to [MRG] FIX unstable cumsum Sep 12, 2016

sklearn/utils/extmath.py
- out = np.cumsum(arr, dtype=np.float64)
- expected = np.sum(arr, dtype=np.float64)
- if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
+ if np_version < (1, 9):

@TomDLT
TomDLT Sep 12, 2016
Member

you should add a comment explaining why we skip the check in this case

@@ -844,7 +844,7 @@ def _deterministic_vector_sign_flip(u):
return u
-def stable_cumsum(arr, rtol=1e-05, atol=1e-08):
+def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):

@TomDLT
TomDLT Sep 12, 2016
Member

You should add axis in the docstring

@jnothman
jnothman Sep 12, 2016
Member

axis needs testing.
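A sketch of the kind of test being requested: on well-conditioned data, stable_cumsum should agree with np.cumsum along every axis. The helper is stubbed with np.cumsum below only to keep the sketch self-contained; the real test would import stable_cumsum from sklearn.utils.extmath.

```python
import numpy as np
from numpy.testing import assert_array_almost_equal


def check_stable_cumsum_axis(stable_cumsum):
    # Well-conditioned float64 data: the stable version should match
    # np.cumsum exactly (up to rounding) along each axis.
    x = np.random.RandomState(0).rand(3, 4, 5)
    for axis in (0, 1, 2):
        assert_array_almost_equal(stable_cumsum(x, axis=axis),
                                  np.cumsum(x, axis=axis))


# Stub the helper with plain np.cumsum just to exercise the test body.
check_stable_cumsum_axis(lambda a, axis=None: np.cumsum(a, axis=axis))
```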

@TomDLT TomDLT changed the title from [MRG] FIX unstable cumsum to [MRG+1] FIX unstable cumsum Sep 12, 2016

@TomDLT
Member

TomDLT commented Sep 12, 2016

LGTM

sklearn/datasets/samples_generator.py
@@ -333,7 +334,7 @@ def make_multilabel_classification(n_samples=100, n_features=20, n_classes=5,
generator = check_random_state(random_state)
p_c = generator.rand(n_classes)
p_c /= p_c.sum()
- cumulative_p_c = np.cumsum(p_c)
+ cumulative_p_c = stable_cumsum(p_c)

@jnothman
jnothman Sep 12, 2016
Member

I don't think this adds much value. p_c will always be high precision, and problems of numerical instability in cumulative summing aren't likely to be an issue at the scale of "number of classes". Please apply this change with more discretion. It comes at a (small) cost.

sklearn/utils/random.py
@@ -143,7 +144,7 @@ def choice(a, size=None, replace=True, p=None, random_state=None):
# Actual sampling
if replace:
if p is not None:
- cdf = p.cumsum()
+ cdf = stable_cumsum(p)

@jnothman
jnothman Sep 12, 2016
Member

As a backport, we should leave this file unchanged.

sklearn/mixture/gmm.py
@@ -394,7 +394,7 @@ def sample(self, n_samples=1, random_state=None):
if random_state is None:
random_state = self.random_state
random_state = check_random_state(random_state)
- weight_cdf = np.cumsum(self.weights_)
+ weight_cdf = stable_cumsum(self.weights_)

@jnothman
jnothman Sep 12, 2016
Member

Again, I think this is unlikely to be a problem case. self.weights_.shape is small.

@jnothman jnothman added this to the 0.19 milestone Sep 13, 2016

@amueller
Member

amueller commented Sep 13, 2016

LGTM

@jnothman
Member

jnothman commented Sep 13, 2016

I've not reviewed this fully yet and don't consider it an immediate priority. I think we should use some discretion with the helper as it is a little more expensive.

@NelleV
Member

NelleV commented Sep 21, 2016

I am slightly concerned about this patch, as it does not propose an alternative to cumsum. It is going to raise an error, but the users won't have any way of fixing the problem.
Unless I did not understand?

sklearn/datasets/samples_generator.py
@@ -1196,7 +1196,7 @@ def make_sparse_spd_matrix(dim=1, alpha=0.95, norm_diag=False,
The size of the random matrix to generate.
alpha: float between 0 and 1, optional (default=0.95)

@NelleV
NelleV Sep 21, 2016
Member

It is not from your patch, but do you mind adding a space after the colon here? Otherwise numpydoc won't render it properly.

@yangarbiter
yangarbiter Sep 21, 2016
Contributor

Sure

@jnothman
Member

jnothman commented Sep 21, 2016

It is going to raise an error, but the users won't have any way of fixing the problem.

Perhaps you're right. I think an error is better than a silent failure with indeterminate consequences, but perhaps a warning is better than an error. Frankly, I will be interested to see whether/when these cases occur at all, and then whether we can find a solution.

@amueller
Member

amueller commented Oct 7, 2016

So the approach is merge and wait for the issue tracker to blow up? ;) I mean that's a possibility. Needs a rebase.

@jnothman
Member

jnothman commented Oct 8, 2016

I still haven't taken a good look at these. I'd like to be a bit conservative about it.

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 8, 2016

Is the plan that we are going to raise errors on users? Like @NelleV, I wouldn't find this very useful for the end users. I would much prefer raising a warning. If we want to control for such problems in our test codebase, we could specifically turn this warning into an error (e.g. with warnings.simplefilter) during the tests.
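A sketch of that testing pattern; the warning category and message below are assumptions for illustration, not sklearn's exact ones:

```python
import warnings

# Escalate one warning category to an error, as suggested above, so that
# instability cannot pass silently inside the test suite.
with warnings.catch_warnings():
    warnings.simplefilter('error', RuntimeWarning)
    try:
        warnings.warn('cumsum was found to be unstable', RuntimeWarning)
        outcome = 'no error'
    except RuntimeWarning as exc:
        outcome = 'raised: %s' % exc
print(outcome)  # raised: cumsum was found to be unstable
```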

@jnothman
Member

jnothman commented Oct 13, 2016

I'm happy to change the behaviour to a warning.

@jnothman
Member

jnothman commented Oct 13, 2016

@yangarbiter do you want to incorporate that change?

@yangarbiter
Contributor

yangarbiter commented Oct 13, 2016

Sure. I can do that!

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 14, 2016

I wanted to give my +1 and merge, but there are test failures on both AppVeyor and Travis, with different failures: you seem to have a mixture of tabs and spaces for indentation, and the test needs to be updated to check for the warning rather than the RuntimeError.

@yangarbiter
Contributor

yangarbiter commented Oct 14, 2016

Sorry about that, I've fixed it.

@tguillemot
Contributor

tguillemot commented Oct 17, 2016

I think we have a +3 for that.
@jnothman ok to merge?

@GaelVaroquaux
Member

GaelVaroquaux commented Oct 17, 2016

+1 to merge. Merging. Thanks!

@GaelVaroquaux GaelVaroquaux merged commit fa59873 into scikit-learn:master Oct 17, 2016

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jnothman
Member

jnothman commented Oct 18, 2016

I hadn't actually looked at this in full, so I hope others gave it a proper review. Thanks @yangarbiter.

@yangarbiter
Contributor

yangarbiter commented Oct 18, 2016

Thank you too!

@lesteve
Member

lesteve commented Oct 19, 2016

ConvergenceWarning seems like a slightly strange choice for stable_cumsum, doesn't it? Should we not use a RuntimeWarning like numpy seems to do for overflows?

@jnothman
Member

jnothman commented Oct 19, 2016

ConvergenceWarning should probably be a RuntimeWarning. But yes, I suppose this is a question of numerical stability rather than parameter choice, so RuntimeWarning may be more appropriate.

@yangarbiter
Contributor

yangarbiter commented Oct 19, 2016

Let me fix that.
Do I need to start a new PR?

@lesteve
Member

lesteve commented Nov 22, 2016

Opened #7922 to replace ConvergenceWarning by RuntimeWarning

afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017

[MRG+1] FIX unstable cumsum (#7376)
* FIX unstable cumsum in utils.random

* equal_nan = true for isclose
since numpy < 1.9 sum is as unstable as cumsum, fallback to np.cumsum

* added axis parameter to stable_cumsum

* FIX unstable sumsum in ensemble.weight_boosting and utils.stats

* FIX axis problem in stable_cumsum

* FIX unstable cumsum in mixture.gmm and mixture.dpgmm

* FIX unstable cumsum in cluster.k_means_, decomposition.pca, and manifold.locally_linear

* FIX unstable sumsum in dataset.samples_generator

* added docstring for parameter axis of stable_cumsum

* added comment for why fall back to np.cumsum when np version < 1.9

* remove unneeded stable_cumsum

* added stable_cumsum's axis testing

* FIX numpy docstring for make_sparse_spd_matrix

* change stable_cumsum from error to warning

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+1] FIX unstable cumsum (#7376)

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] FIX unstable cumsum (#7376)

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] FIX unstable cumsum (#7376)