
roc_auc_score computation is wrong for large samples #6842

Closed
arogozhnikov opened this issue May 29, 2016 · 15 comments
arogozhnikov commented May 29, 2016

Hi, we've recently found a strange inconsistency in the behavior of roc_auc_score. After long digging, @tata-antares found out that it is computed wrongly for large data samples if the weights passed are float32.

Minimal example

from sklearn.metrics import roc_auc_score
import numpy
numpy.random.seed(42)

n_samples = 4 * 10 ** 7
y = numpy.random.randint(2, size=n_samples)
prediction = numpy.random.normal(size=n_samples) + y * 0.01
trivial_weight = numpy.ones(n_samples)

roc_auc_score(y, prediction)
# prints 0.50273924526046887

roc_auc_score(y, prediction, sample_weight=trivial_weight.astype('float32'))
# prints nan and a spurious warning:
/moosefs/miniconda/envs/ipython_py2/lib/python2.7/site-packages/sklearn/metrics/ranking.py:530: UndefinedMetricWarning: No negative samples in y_true, false positive value should be meaningless
  UndefinedMetricWarning)

roc_auc_score(y, prediction, sample_weight=trivial_weight.astype('float64'))
#prints 0.50273924526046887

The value with dtype=float64 is correct. In the case we originally investigated, passing float32 weights returned an ordinary-looking number that was very different from the correct one (e.g. 0.58 instead of 0.52), so it was only by chance that we detected the problem.

sklearn == '0.17.1'

nelson-liu commented May 29, 2016

Maybe related to issues #6711 and #3864? Never mind, it's not.

arogozhnikov (Author) commented:
#3864 is due to clipping; this may be related to #6711, but there seem to be no weights involved there.

jnothman commented May 31, 2016

I can confirm getting "UndefinedMetricWarning: No negative samples in y_true, false positive value should be meaningless" but I'm not able to replicate the nan. Is there a reason you did not report the rest of your platform details?

jnothman (Member) commented:
Scratch that. I do get nan. Just wasn't reading output correctly.

jnothman (Member) commented:
This is what it comes down to:

In [25]: np.cumsum(trivial_weight.astype('float32'))[-1]
Out[25]: 16777216.0

In [26]: np.cumsum(trivial_weight.astype('float64'))[-1]
Out[26]: 40000000.0

In [27]: np.sum(trivial_weight.astype('float32'))
Out[27]: 40000000.0

In [28]: np.sum(trivial_weight.astype('float32')).dtype
Out[28]: dtype('float32')

Cumsum is losing precision, even when a float32 sum does not. This might be a numpy bug. If so, it lies in np.add.accumulate:

In [10]: np.add.accumulate(trivial_weight.astype('float32'))[-1]
Out[10]: 16777216.0
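The saturation value 16777216 is exactly 2**24: float32 has a 24-bit significand, so once the running total reaches 2**24, adding 1.0 rounds back to the same number. A minimal demonstration (not from the thread; a small n stands in for the 4e7 samples above):

```python
import numpy as np

# float32 has a 24-bit significand, so 2**24 + 1 is not representable;
# adding 1.0 to 2**24 rounds back down to 2**24.
cap = np.float32(2 ** 24)           # 16777216.0
assert cap + np.float32(1.0) == cap

# That is exactly where a float32 running sum of ones gets stuck:
w = np.ones(2 ** 24 + 10, dtype=np.float32)
print(np.add.accumulate(w)[-1])     # 16777216.0, not 16777226.0
```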

We can solve this here by always passing dtype='float64' to cumsum, or by doing so only when the number of samples is large. There might also be a way to perform the cumsum in batches so as to reduce the issue. In any case, we can detect the error by doing the cumsum and checking whether its final value equals the sum.

jnothman commented May 31, 2016

@arogozhnikov, please let me know if you report a bug to numpy. Feel free to submit a PR here.

and @nelson-liu, this is entirely independent of those other bugs.

@jnothman jnothman added the Bug label May 31, 2016
nelson-liu (Contributor) commented:
@jnothman ah ok, sorry I didn't look super closely into it; I just remembered seeing those other two threads and xref'd them here. thanks for letting me know.

arogozhnikov commented Jun 3, 2016

@jnothman
it's not a bug, it's expected behavior, since this code

x = trivial_weight.astype('float32')
for i in range(len(x) - 1):
    x[i + 1] += x[i]

gives the same (wrong) result.

np.sum is a different story: one can change the order of summation there in any possible way.
As for np.cumsum, fixing this would roughly double the number of computations (even in an optimal implementation), so I don't think the numpy team will do it (at least not until they need multithreading for np.cumsum).
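To illustrate the distinction being drawn: numpy's sum is free to reorder the additions (it uses pairwise summation), so on this input its float32 partial sums stay exactly representable, while a strict left-to-right accumulation saturates. A small sketch, not from the thread, with a reduced sample size:

```python
import numpy as np

n = 2 ** 24 + 10
w = np.ones(n, dtype=np.float32)

# Sequential accumulation (what cumsum must produce) saturates at 2**24:
print(np.add.accumulate(w)[-1])   # 16777216.0

# np.sum combines partial sums in a tree (pairwise summation), so every
# intermediate value here stays an exactly representable float32:
print(np.sum(w))                  # 16777226.0
```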

jnothman commented Jun 4, 2016

I don't find your claim to be true:


In [9]: x = trivial_weight.copy()

In [10]: for i in range(len(x) - 1):
   ....:         x[i + 1] += x[i]
   ....:

In [11]: x[-1]
Out[11]: 40000000.0

jnothman commented Jun 4, 2016

whoops forgot astype. stupid.

jnothman commented Jun 4, 2016

you're right.... though that's really quite nasty.

@amueller amueller added this to the 0.18 milestone Aug 15, 2016
jnothman (Member) commented:
Would we be happy with a fix that performs cumsum in float64?

amueller (Member) commented:
+1 for cumsum in float64.

arogozhnikov commented Aug 31, 2016

Would we be happy with a fix that performs cumsum in float64?

Yes, sounds good enough.

jnothman (Member) commented:
Is there a way to test this? Would we be better off with a (somewhat costly) helper to ensure stability even in the float64 case? Like:

def stable_cumsum(arr):
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.isclose(out[-1], expected):
        raise RuntimeError('cumsum was found to be unstable: '
                           'its last element does not correspond to sum')
    return out
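A self-contained check of a helper along these lines (redefined here so the snippet runs on its own, with a reduced sample size; this is a sketch, not necessarily the final scikit-learn implementation):

```python
import numpy as np

def stable_cumsum(arr):
    # Accumulate in float64 regardless of the input dtype, then verify
    # the final value against an independently computed sum.
    out = np.cumsum(arr, dtype=np.float64)
    expected = np.sum(arr, dtype=np.float64)
    if not np.isclose(out[-1], expected):
        raise RuntimeError('cumsum was found to be unstable: '
                           'its last element does not correspond to sum')
    return out

w = np.ones(2 ** 24 + 10, dtype=np.float32)
print(np.cumsum(w)[-1])        # 16777216.0: the float32 accumulator saturates
print(stable_cumsum(w)[-1])    # 16777226.0: correct total
```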
