[MRG + 1] FIX use high precision cumsum and check it is stable enough #7331

Merged
merged 6 commits into scikit-learn:master Sep 9, 2016
6 participants

Conversation
@jnothman
Member

jnothman commented Sep 2, 2016

Fixes #6842. I don't know of a test that will run quickly enough.

@TomDLT TomDLT added this to the 0.18 milestone Sep 2, 2016

@TomDLT

Member

TomDLT commented Sep 2, 2016

Wow even with float64 there are many errors ...
Small mistake in the check.

sklearn/utils/extmath.py
+ """
+ out = np.cumsum(arr, dtype=np.float64)
+ expected = np.sum(arr, dtype=np.float64)
+ if not np.allclose(out, expected):

@TomDLT

TomDLT Sep 2, 2016

Member

if not np.allclose(out[-1], expected):

@jnothman

jnothman Sep 3, 2016

Member

I thought I committed that change :p

sklearn/utils/extmath.py
+ expected = np.sum(arr, dtype=np.float64)
+ if not np.allclose(out, expected):
+ raise RuntimeError('cumsum was found to be unstable: '
+ 'its results do not correspond to sum')
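Assembled, the hunks under review sketch out to something like the helper below. This is only a sketch: the rtol/atol parameters are an assumption here (they mirror np.allclose's defaults, and making them configurable only comes up later in the thread), and the merged signature may differ.

```python
import numpy as np

def stable_cumsum(arr, rtol=1e-05, atol=1e-08):
    """Cumulative sum in float64, checked against np.sum.

    Sketch of the helper under review; rtol/atol as parameters is an
    assumption, mirroring np.allclose's defaults.
    """
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    # Compare only the final element, per the review suggestion above.
    if not np.allclose(out[-1], expected, rtol=rtol, atol=atol):
        raise RuntimeError('cumsum was found to be unstable: '
                           'its results do not correspond to sum')
    return out

# float32 input is accumulated in float64, so the total stays exact
# (10**6 is far below 2**53, float64's exact-integer range).
total = float(stable_cumsum(np.ones(10 ** 6, dtype=np.float32))[-1])
print(total)
```

Accumulating in float64 regardless of input dtype is the actual fix; the allclose check is only a safety net for pathologically large or ill-conditioned inputs.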

@yenchenlin

yenchenlin Sep 2, 2016

Contributor

'its results' -> 'its last element' ?

@jnothman

jnothman Sep 3, 2016

Member

If you say so.

@jnothman

Member

jnothman commented Sep 3, 2016

Sorry for opening a PR and running away :)

@yenchenlin

Contributor

yenchenlin commented Sep 3, 2016

LGTM

@jnothman jnothman added the Bug label Sep 3, 2016

@tguillemot

Contributor

tguillemot commented Sep 5, 2016

Indeed that could be problematic. LGTM

@TomDLT

Member

TomDLT commented Sep 5, 2016

LGTM

@jnothman

Member

jnothman commented Sep 5, 2016

Any ideas of pathological cases to test with quickly?

@TomDLT

Member

TomDLT commented Sep 6, 2016

No idea, except comparing roc_auc_score(y_true, y_pred, sample_weight=sample_weight_32) against roc_auc_score(y_true, y_pred, sample_weight=sample_weight_64). Yet that does not test the bug we want to fix but the correction we propose, so I am not fond of this solution.

@jnothman

Member

jnothman commented Sep 6, 2016

I only think it's an effective test for very large vectors, hence impractical.

@amueller

Member

amueller commented Sep 6, 2016

how long does a test with a large-enough vector run?

@amueller

Member

amueller commented Sep 6, 2016

lgtm

@amueller amueller changed the title from [MRG] FIX use high precision cumsum and check it is stable enough to [MRG + 1] FIX use high precision cumsum and check it is stable enough Sep 6, 2016

@jnothman

Member

jnothman commented Sep 7, 2016

import time
import numpy as np
import pandas as pd

n_trials = 50
all_results = []
for dtype in [np.float32, np.float64]:
    for i in range(3, 8):
        n = 10 ** i
        absdiff, reldiff = [], []
        s = time.time()
        for j in range(n_trials):
            x = np.random.rand(n).astype(dtype)
            a = np.cumsum(x)[-1]
            b = np.sum(x)
            absdiff.append(np.abs(a - b))
            reldiff.append(absdiff[-1] / b)
        all_results.append((dtype.__name__, n, (time.time() - s) / n_trials,
                            np.log10(np.mean(absdiff)), np.log10(np.mean(reldiff))))

pd.DataFrame(all_results, columns=['dtype', 'n', 'time', 'log abs diff', 'log rel diff'])

  dtype         n  time  log abs diff  log rel diff
float32      1000  0.00         -3.84         -6.54
float32     10000  0.00         -2.23         -5.93
float32    100000  0.00         -0.64         -5.34
float32   1000000  0.02          0.71         -4.99
float32  10000000  0.22          2.28         -4.41
float64      1000  0.00        -12.59        -15.28
float64     10000  0.00        -10.96        -14.66
float64    100000  0.00         -9.47        -14.16
float64   1000000  0.03         -7.96        -13.66
float64  10000000  0.23         -6.40        -13.09

np.allclose has default rtol=1e-5, atol=1e-8

Two questions:

  • can we build a non-regression case where the old code would have been broken? possibly: all the absolute diffs above for float32 exceed the atol; for 1e6 samples we get relative diff just greater than allclose's rtol in 0.02s, but this would be slower embedded in the rest of roc_auc_score.
  • can we get test coverage for the error message here? probably not, without having rtol as a parameter to stable_cumsum. This might be worth doing.

So I'll try to build a non-regression test with 1e6 samples.
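The first bullet can be sanity-checked cheaply. A sketch, assuming nothing beyond numpy: for 10**6 float32 values, the gap between a plain float32 cumsum and a float64 reference sum is orders of magnitude above np.allclose's default atol of 1e-8.

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(10 ** 6).astype(np.float32)

# np.cumsum keeps float32 and accumulates sequentially; summing the
# same data in float64 gives a near-exact reference value.
gap = abs(float(np.cumsum(x)[-1]) - float(np.sum(x, dtype=np.float64)))
print(gap > 1e-8)  # the float32 error dwarfs allclose's default atol
```

This matches the table above (log abs diff of about 0.71 for float32 at n = 10**6), and it is why an allclose check against the old float32 cumsum would have failed.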

@jnothman

Member

jnothman commented Sep 8, 2016

A regression test for roc_auc_score turns out to be too slow.

@jnothman

Member

jnothman commented Sep 8, 2016

I've added a direct test of stable_cumsum using configurable atol, rtol

@jnothman

Member

jnothman commented Sep 8, 2016

I suppose the remaining question is: are there other places in the codebase where cumsum is happening and might be over float32 data or very very large arrays?

@lesteve

Member

lesteve commented Sep 8, 2016

One of the Travis errors will go away if you rebase on master, but there is one on Python 2.7 that seems genuine:

======================================================================
FAIL: sklearn.utils.tests.test_extmath.test_stable_cumsum
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/sklearn_build_oldest/scikit-learn/sklearn/utils/tests/test_extmath.py", line 656, in test_stable_cumsum
    stable_cumsum, r, rtol=0, atol=0)
  File "/home/travis/sklearn_build_oldest/scikit-learn/sklearn/utils/testing.py", line 438, in assert_raise_message
    (names, function.__name__))
AssertionError: RuntimeError not raised by stable_cumsum
@jnothman

Member

jnothman commented Sep 8, 2016

One of the Travis errors will go away if you rebase on master, but there is one on Python 2.7 that seems genuine:

Yeah, I was wondering if there'd be some platform that didn't fail my test....

@jnothman

Member

jnothman commented Sep 8, 2016

I assume that means numpy used to have an unstable implementation of sum, rather than that it used to have a stable implementation of cumsum.

@jnothman

Member

jnothman commented Sep 8, 2016

I wonder: Should I only run the test on recent numpy, or just remove it?

@amueller

Member

amueller commented Sep 8, 2016

Hm... testing the error message seems slightly odd since it relies on numpy being broken. I guess having a test with a known correct result that wasn't correct before might be nicer, but not offer the error coverage?

Like

x = np.empty(10 ** 7)
x.fill(1e-7)
np.isclose(stable_cumsum(x)[-1], 1, rtol=0, atol=1e-12)

@jnothman

Member

jnothman commented Sep 8, 2016

It might just be my bedtime, but I'm not sure I get what test you're suggesting.

@amueller

Member

amueller commented Sep 8, 2016

np.isclose(cumsum(x)[-1], 1, rtol=0, atol=1e-12)
fails for np.cumsum currently, but should work for stable_cumsum

@jnothman

Member

jnothman commented Sep 8, 2016

No, I don't think it will work for stable_cumsum in old numpy. I might be wrong though.

@amueller

Member

amueller commented Sep 8, 2016

ah... huh...

@jnothman

Member

jnothman commented Sep 8, 2016

So just drop the test, I suppose.

@jnothman

Member

jnothman commented Sep 8, 2016

And merge the PR?

@amueller

Member

amueller commented Sep 8, 2016

fine with me

@lesteve

Member

lesteve commented Sep 8, 2016

I believe the improved stability of np.sum was done in numpy 1.9 (http://docs.scipy.org/doc/numpy/release.html#better-numerical-stability-for-sum-in-some-cases), and I also quickly checked that in 1.8 cumsum and sum were giving the same wrong result in one of your snippets, but 1.9 was fine. Maybe skip the test for numpy < 1.9?

sklearn/utils/tests/test_extmath.py
@@ -648,6 +650,8 @@ def test_softmax():
def test_stable_cumsum():
+ if np_version < (1, 19):

@lesteve

lesteve Sep 8, 2016

Member

19 -> 9 otherwise we may wait a while until this test gets run ;-).

@jnothman

jnothman Sep 8, 2016

Member

Argh. Making so many errors. Should be in bed. Should take a break from this stuff, too!

@jnothman

Member

jnothman commented Sep 8, 2016

could you please run a check for np 1.9, @lesteve?

@lesteve

Member

lesteve commented Sep 8, 2016

could you please run a check for np 1.9, @lesteve?

Not sure what you exactly mean, but here is what I tried:

import numpy as np

np.random.seed(42)
n_samples = 4 * 10 ** 7
y = np.random.randint(2, size=n_samples)
prediction = np.random.normal(size=n_samples) + y * 0.01
trivial_weight = np.ones(n_samples)

print(np.cumsum(trivial_weight.astype('float32'))[-1])
print(np.sum(trivial_weight.astype('float32')))

Output for numpy 1.8:

1.67772e+07
1.67772e+07

Output for numpy 1.9:

1.67772e+07
4e+07
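The 1.67772e+07 plateau in both outputs is explainable: it is 2**24, the point past which float32 can no longer represent n + 1 distinctly from n, so a float32 running total of ones simply stops growing. A small demonstration, assuming nothing beyond numpy:

```python
import numpy as np

# float32 has a 24-bit significand: above 2**24, adding 1.0 is lost to
# rounding, so a float32 running sum of ones plateaus at 16777216.
limit = np.float32(2 ** 24)
stuck = bool(limit + np.float32(1.0) == limit)

# np.cumsum keeps the float32 dtype and accumulates sequentially, so
# its last element saturates at the same value.
plateau = float(np.cumsum(np.ones(2 ** 24 + 10, dtype=np.float32))[-1])
print(stuck)    # True
print(plateau)  # 16777216.0
```

numpy 1.9's pairwise np.sum avoids this because its partial sums stay small, which is why sum is correct (4e+07) above while cumsum is not.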
@lesteve

Member

lesteve commented Sep 8, 2016

I also made sure that the test was run for numpy 1.9. LGTM, will wait for AppVeyor and then merge.

@lesteve

Member

lesteve commented Sep 9, 2016

Merging, thanks!

@lesteve lesteve merged commit 49d126f into scikit-learn:master Sep 9, 2016

3 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

rsmith54 pushed a commit to rsmith54/scikit-learn that referenced this pull request Sep 14, 2016

[MRG + 1] FIX use high precision cumsum and check it is stable enough (#7331)

* FIX use high precision cumsum and check it is stable enough

TomDLT added a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016

[MRG + 1] FIX use high precision cumsum and check it is stable enough (#7331)

* FIX use high precision cumsum and check it is stable enough

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG + 1] FIX use high precision cumsum and check it is stable enough (#7331)

* FIX use high precision cumsum and check it is stable enough

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 1] FIX use high precision cumsum and check it is stable enough (#7331)

* FIX use high precision cumsum and check it is stable enough