
[MRG+1] FIX unstable cumsum #7376

Merged
merged 14 commits into scikit-learn:master from yangarbiter:cumsum Oct 17, 2016

Conversation

@yangarbiter
Contributor

@yangarbiter yangarbiter commented Sep 9, 2016

Reference Issue

#7359

What does this implement/fix? Explain your changes.

np.cumsum was reported in #6842 to be unstable when dealing with float32 data or very large arrays of float64 data. This pull request changes those calls to sklearn.utils.extmath.stable_cumsum to solve this problem (#7331).
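
For anyone landing here from the issue, a minimal standalone reproduction of the kind of drift being fixed (plain numpy, nothing scikit-learn specific): the last element of a float32 cumsum can move noticeably away from a float64 reference sum, which is exactly the consistency check stable_cumsum performs.

```python
import numpy as np

rng = np.random.RandomState(0)
arr = rng.rand(10 ** 7).astype(np.float32)

last = np.cumsum(arr)[-1]                  # accumulated in float32
reference = np.sum(arr, dtype=np.float64)  # higher-precision reference

print(last, reference)
print(abs(last - reference) / reference)   # relative drift of the last element
```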

@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch from 15fe9d5 to 34d8eef Sep 9, 2016
@TomDLT
Member

@TomDLT TomDLT commented Sep 9, 2016

Failure in test_random_choice_csc is due to a call to stable_cumsum(array([nan])), which fails since np.allclose(np.nan, np.nan) is False.
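
A quick reproduction of that corner case, together with the equal_nan escape hatch that the fix ended up using for np.isclose (see the commit list at the bottom):

```python
import numpy as np

print(np.allclose(np.nan, np.nan))                 # False: NaN never compares equal
print(np.isclose(np.nan, np.nan, equal_nan=True))  # True: NaNs treated as equal
```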

@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch 15 times, most recently from 7198a3f to 64b244b Sep 10, 2016
@yangarbiter yangarbiter changed the title [WIP] FIX unstable cumsum [MRG] FIX unstable cumsum Sep 12, 2016
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        if np_version < (1, 9):


@TomDLT

TomDLT Sep 12, 2016
Member

you should add a comment explaining why we skip the check in this case
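
For illustration, here is one way such a comment could read, in a self-contained sketch; the early-return structure, the hand-rolled np_version tuple, and the warning (rather than the error raised at this point of the PR) are stand-ins, not the PR's exact code.

```python
import warnings
import numpy as np

# Stand-in for scikit-learn's internal parsed numpy version tuple.
np_version = tuple(int(x) for x in np.__version__.split('.')[:2])


def stable_cumsum_sketch(arr, rtol=1e-05, atol=1e-08):
    if np_version < (1, 9):
        # Before numpy 1.9, np.sum does not use pairwise summation, so it is
        # no more accurate than np.cumsum and cannot serve as a reference for
        # the stability check: skip the check and fall back to a plain cumsum.
        return np.cumsum(arr, dtype=np.float64)
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        warnings.warn('cumsum was found to be unstable: its last element '
                      'does not match sum', RuntimeWarning)
    return out
```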

@@ -844,7 +844,7 @@ def _deterministic_vector_sign_flip(u):
return u


def stable_cumsum(arr, rtol=1e-05, atol=1e-08):
def stable_cumsum(arr, axis=None, rtol=1e-05, atol=1e-08):


@TomDLT

TomDLT Sep 12, 2016
Member

You should add axis in the docstring


@jnothman

jnothman Sep 12, 2016
Member

axis needs testing.
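
Something along these lines would cover it (a minimal sketch only: it exercises the new axis argument against np.cumsum on small, well-conditioned float64 data, where the two must agree):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.utils.extmath import stable_cumsum

rng = np.random.RandomState(0)
X = rng.rand(4, 5)

# On small, well-conditioned data the stable version must match np.cumsum
# along every axis.
assert_array_almost_equal(stable_cumsum(X, axis=0), np.cumsum(X, axis=0))
assert_array_almost_equal(stable_cumsum(X, axis=1), np.cumsum(X, axis=1))
```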

@TomDLT TomDLT changed the title [MRG] FIX unstable cumsum [MRG+1] FIX unstable cumsum Sep 12, 2016
@TomDLT
Member

@TomDLT TomDLT commented Sep 12, 2016

LGTM

@@ -333,7 +334,7 @@ def make_multilabel_classification(n_samples=100, n_features=20, n_classes=5,
generator = check_random_state(random_state)
p_c = generator.rand(n_classes)
p_c /= p_c.sum()
cumulative_p_c = np.cumsum(p_c)
cumulative_p_c = stable_cumsum(p_c)


@jnothman

jnothman Sep 12, 2016
Member

I don't think this adds much value. p_c will always be high precision, and problems of numerical instability in cumulative summing aren't likely to be an issue at the scale of "number of classes". Please apply this change with more discretion. It comes at a (small) cost.

@@ -143,7 +144,7 @@ def choice(a, size=None, replace=True, p=None, random_state=None):
# Actual sampling
if replace:
if p is not None:
cdf = p.cumsum()
cdf = stable_cumsum(p)


@jnothman

jnothman Sep 12, 2016
Member

As a backport, we should leave this file unchanged.
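
For context on why a cumulative sum shows up here at all: choice() does weighted sampling by inverse-transform sampling on the CDF built from p. A small standalone illustration of that idea (not the backported code itself):

```python
import numpy as np

rng = np.random.RandomState(0)
p = np.array([0.1, 0.2, 0.3, 0.4])

cdf = np.cumsum(p)
cdf /= cdf[-1]  # guard against the total drifting away from 1.0

# Uniform draws in [0, 1) are mapped to the first index whose cumulative
# probability exceeds them, i.e. to samples weighted by p.
u = rng.random_sample(10)
indices = cdf.searchsorted(u, side='right')
print(indices)
```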

@@ -394,7 +394,7 @@ def sample(self, n_samples=1, random_state=None):
if random_state is None:
random_state = self.random_state
random_state = check_random_state(random_state)
weight_cdf = np.cumsum(self.weights_)
weight_cdf = stable_cumsum(self.weights_)


@jnothman

jnothman Sep 12, 2016
Member

Again, I think this is unlikely to be a problem case. self.weights_.shape is small.

@jnothman jnothman added this to the 0.19 milestone Sep 13, 2016
@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch 2 times, most recently from 63ded1e to d771a26 Sep 13, 2016
@amueller
Member

@amueller amueller commented Sep 13, 2016

LGTM

@jnothman
Member

@jnothman jnothman commented Sep 13, 2016

I've not reviewed this fully yet and don't consider it an immediate priority. I think we should use some discretion with the helper as it is a little more expensive.

@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch from 507b89f to 20e0724 Oct 8, 2016
@jnothman
Member

@jnothman jnothman commented Oct 8, 2016

I still haven't taken a good look at these. I'd like to be a bit conservative about it.

@GaelVaroquaux
Member

@GaelVaroquaux GaelVaroquaux commented Oct 8, 2016

Is the plan that we are going to raise errors on users? Like @NelleV, I wouldn't find this very useful for end users. I would much prefer raising a warning. If we want to control for such problems in our own test codebase, we could specifically turn this warning into an error (e.g. with warnings.simplefilter) during the tests.
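
A minimal sketch of that pattern, assuming a test-suite context (the warning category is a stand-in; which class to use is discussed further below):

```python
import warnings
import numpy as np
from sklearn.utils.extmath import stable_cumsum


def test_cumsum_stays_stable():
    arr = np.random.RandomState(0).rand(1000)
    with warnings.catch_warnings():
        # Only inside our own tests: escalate the stability warning to an
        # error so a silent loss of precision fails the run.
        warnings.simplefilter("error", RuntimeWarning)
        stable_cumsum(arr)
```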

@GaelVaroquaux
Member

@GaelVaroquaux GaelVaroquaux commented Oct 8, 2016

@jnothman
Member

@jnothman jnothman commented Oct 13, 2016

I'm happy to change the behaviour to a warning.

@jnothman
Member

@jnothman jnothman commented Oct 13, 2016

@yangarbiter do you want to incorporate that change?

@yangarbiter
Contributor Author

@yangarbiter yangarbiter commented Oct 13, 2016

Sure. I can do that!


@GaelVaroquaux
Member

@GaelVaroquaux GaelVaroquaux commented Oct 14, 2016

I wanted to give my +1 and merge, but there are test failures both on AppVeyor and on Travis, with different failures: you seem to have a mixture of tabs and spaces for indentation, and the test needs to be upgraded to test for the warning rather than the RuntimeError.
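
On the second point, a hedged sketch of what testing for the warning could look like; the zero tolerances used to force a warning on ordinary float64 data, and the warning class, are assumptions on my side rather than the committed test:

```python
import numpy as np
from sklearn.utils.testing import assert_warns, assert_no_warnings
from sklearn.utils.extmath import stable_cumsum

r = np.random.RandomState(0).rand(100000)

# With zero tolerances any last-element discrepancy triggers the warning;
# with the default tolerances ordinary data should stay quiet.
assert_warns(RuntimeWarning, stable_cumsum, r, rtol=0, atol=0)
assert_no_warnings(stable_cumsum, r)
```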

@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch from eb8dac5 to fb8cd36 Oct 14, 2016
@yangarbiter
Contributor Author

@yangarbiter yangarbiter commented Oct 14, 2016

Sorry about that, I've fixed it.

@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch 2 times, most recently from dabe6f6 to d7d003d Oct 14, 2016
@yangarbiter yangarbiter force-pushed the yangarbiter:cumsum branch from d7d003d to fd9a02e Oct 14, 2016
@tguillemot
Contributor

@tguillemot tguillemot commented Oct 17, 2016

I think we have a +3 for that.
@jnothman OK to merge?

@GaelVaroquaux
Member

@GaelVaroquaux GaelVaroquaux commented Oct 17, 2016

+1 to merge. Merging. Thanks!

@GaelVaroquaux GaelVaroquaux merged commit fa59873 into scikit-learn:master Oct 17, 2016
3 checks passed
ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
@jnothman
Member

@jnothman jnothman commented Oct 18, 2016

I hadn't actually looked at this in full, so I hope others gave it a proper review. Thanks @yangarbiter.


@yangarbiter
Contributor Author

@yangarbiter yangarbiter commented Oct 18, 2016

Thank you too!

@lesteve
Member

@lesteve lesteve commented Oct 19, 2016

ConvergenceWarning seems like a slightly strange choice for stable_cumsum, doesn't it? Should we not use a RuntimeWarning like numpy seems to do for overflows?
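
For reference, the numpy behaviour being alluded to, as a tiny standalone check:

```python
import warnings
import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    np.float32(1e38) * np.float32(10)  # overflows the float32 range
print(caught[0].category)              # <class 'RuntimeWarning'>
```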

@jnothman
Member

@jnothman jnothman commented Oct 19, 2016

ConvergenceWarning should probably be a RuntimeWarning. But yes, I suppose this is a question of numerical stability rather than parameter choice, so RuntimeWarning may be more appropriate.


@yangarbiter
Contributor Author

@yangarbiter yangarbiter commented Oct 19, 2016

Let me fix that.
Do I need to start a new PR?

@lesteve
Member

@lesteve lesteve commented Nov 22, 2016

Opened #7922 to replace ConvergenceWarning with RuntimeWarning.

afiodorov added a commit to unravelin/scikit-learn that referenced this pull request Apr 25, 2017
* FIX unstable cumsum in utils.random

* equal_nan = true for isclose
since numpy < 1.9 sum is as unstable as cumsum, fallback to np.cumsum

* added axis parameter to stable_cumsum

* FIX unstable cumsum in ensemble.weight_boosting and utils.stats

* FIX axis problem in stable_cumsum

* FIX unstable cumsum in mixture.gmm and mixture.dpgmm

* FIX unstable cumsum in cluster.k_means_, decomposition.pca, and manifold.locally_linear

* FIX unstable cumsum in datasets.samples_generator

* added docstring for parameter axis of stable_cumsum

* added comment for why fall back to np.cumsum when np version < 1.9

* remove unneeded stable_cumsum

* added stable_cumsum's axis testing

* FIX numpy docstring for make_sparse_spd_matrix

* change stable_cumsum from error to warning
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017