scipy.special.smirnov and scipy.special.smirnovi have accuracy & convergence issues #7456
…scipy.stats.ksone) scipy.special.{smirnov,smirnovi} have accuracy & convergence issues incorrect values for scipy.stats.ksone.pdf() Add a smirnovp(), derivative of smirnov(), function. Use smirnovp() in smirnovi() instead of an asymptotic. If the initial estimate for x in smirnovi() is not going to work for Newton-Raphson (because it is on a "bad" part of the curve), do some bisection steps to get a better estimate, one which is on the correct side of the desired value. Replace the relative stopping criteria with that used in np.allclose(). With this in place, the algorithm seems to converge in about 5 iterations (10 function evaluations) for most values of p. (The most I've observed is 18 iterations, 31 functions). Exposing smirnovp() allows its use in scipy.stats.ksone() which addresses the inaccurate pdf(). Requires changes in many cephes infrastructure files, adding doc, exports and tests.
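The strategy described in the commit message above (Newton-Raphson steps, a bisection fallback when a step would leave the bracket, and a stopping test in the style of np.allclose()) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the cephes implementation: the name safeguarded_newton, its parameters and the default tolerances are hypothetical. The test function is the n=2 case, where smirnov(2, x) = 1 - x - x**2 for 0 <= x <= 1/2.

```python
def safeguarded_newton(f, fprime, target, lo, hi,
                       rtol=1e-12, atol=0.0, maxiter=100):
    """Solve f(x) == target for a decreasing f on the bracket [lo, hi].

    Hypothetical helper: Newton-Raphson steps are used while they stay
    inside the bracket, otherwise a bisection step is taken instead.
    Returns (x, number_of_iterations).
    """
    x = 0.5 * (lo + hi)
    for iteration in range(1, maxiter + 1):
        residual = f(x) - target
        if residual == 0.0:
            return x, iteration
        # f is decreasing, so a positive residual puts the root to the right.
        if residual > 0.0:
            lo = x
        else:
            hi = x
        slope = fprime(x)
        x_new = x - residual / slope if slope != 0.0 else float("nan")
        # Reject steps that leave the bracket (this also catches NaN).
        if not (lo <= x_new <= hi):
            x_new = 0.5 * (lo + hi)
        # Same form as np.allclose(): |a - b| <= atol + rtol * |b|.
        if abs(x_new - x) <= atol + rtol * abs(x_new):
            return x_new, iteration
        x = x_new
    raise RuntimeError("no convergence within maxiter iterations")


def sf_n2(x):
    """smirnov(2, x) on [0, 1/2]."""
    return 1.0 - x - x * x


def sf_n2_prime(x):
    """Derivative of sf_n2 with respect to x."""
    return -1.0 - 2.0 * x


root, iterations = safeguarded_newton(sf_n2, sf_n2_prime, 0.61, 0.0, 0.5)
```

Keeping the bracket updated on every iteration is what guarantees progress: a Newton step is only accepted if it lands between the current lower and upper bounds.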
Causes:
In particular, the convergence is neither guaranteed nor quadratic. S4. The algorithm used to compute
This switch in and of itself should result in only small differences. In reality, however, the first algorithm has failed for n much lower than 1013. I.e. the first algorithm, the one for small n, is currently underreporting the probability for some range of values of n and x.
…scipy.stats.ksone) scipy.special.{smirnov,smirnovi} have accuracy & convergence issues incorrect values for scipy.stats.ksone.pdf() Add a smirnovp(), derivative of smirnov(), function. Use smirnovp() in smirnovi() instead of an asymptotic. If the initial estimate for x in smirnovi() is not going to work for Newton-Raphson (because it is on a "bad" part of the curve), perform some bisection steps to get a better estimate, one which is on the correct side of the desired value. Replace the relative stopping criteria with same logic used in np.allclose(). With these changes, the algorithm seems to converge in about 5 NR iterations (10 function evaluations) for most values of p. Exposing smirnovp() allows its use in scipy.stats.ksone() which addresses the inaccurate pdf(). Requires changes in many cephes infrastructure files, adding doc, exports.
* smirnovi_stability: Addresses scipy#7456, scipy#7426, scipy#7455 (scipy.special.smirnov, scipy.stats.ksone) scipy.special.{smirnov,smirnovi} have accuracy & convergence issues incorrect values for scipy.stats.ksone.pdf()
This gets rid of a lot of unused functionality, simplifying the code quite a bit, plus fixes incorrect handling of endianness of output arrays, making the fixes in scipy#7456 unnecessary. For simplicity, some performance is lost in the case of a non-aligned or byteswapped user provided output array.
* smirnovi_stability: Added smirnovc/smirnovci, the CDF for the Smirnov probability distribution. Addresses scipy#7456, scipy#7426, scipy#7455 (scipy.special.smirnov, scipy.stats.ksone) scipy.special.{smirnov,smirnovi} have accuracy & convergence issues; incorrect values for scipy.stats.ksone.pdf()
…ogorov. Addresses scipy#7426, scipy#7455, scipy#7456, scipy#7492. scipy.special.cephes: A rewrite of the algorithms in the Kolmogorov-Smirnov statistics package. Kolmogorov: Improve efficiency of kolmogorov(). Fixed NANs, false convergence, unbounded computation in kolmogi(). Added dedicated functions (kolmogc/kolmogci/kolmogp) to compute the CDF, PPF and PDF. Smirnov: Improve accuracy of smirnov(). Computation suffered from underflow and/or denormalization, resulting in probabilities occasionally off by orders of magnitude. The SF/CDF is now usually within 1 ULP, the PDF within a few ULP. Fixed NANs, convergence failures, false convergence in smirnovi(). Added dedicated functions (smirnovc/smirnovci/smirnovp) to compute the CDF, PPF and PDF. Added additional testing of K-S, removed xfail on previous tests. scipy.stats: ksone and kstwobign classes. Added dedicated _sf, _pdf, _isf methods to call new K-S functionality. Updated _cdf and _ppf methods to make appropriate calls. The accuracy improvements to smirnov() required increased precision in some parts of the computation. Ported most of the dd_real part of the qd package and used it in kolmogorov.c
…ogorov. Addresses scipy#7426, scipy#7455, scipy#7456, scipy#7492. scipy.special.cephes: A rewrite of the algorithms in the Kolmogorov-Smirnov statistics package. Kolmogorov: Improve efficiency of kolmogorov(). Fixed NANs, false convergence, unbounded computation in kolmogi(). Added dedicated functions (kolmogc/kolmogci/kolmogp) to compute the CDF, PPF and PDF. Smirnov: Improve accuracy of smirnov(). Computation suffered from underflow and/or denormalization, resulting in probabilities occasionally off by orders of magnitude. The SF/CDF is now usually within 1 ULP, the PDF within a few ULP. Fixed NANs, convergence failures, false convergence in smirnovi(). Added dedicated functions (smirnovc/smirnovci/smirnovp) to compute the CDF, PPF and PDF. Added additional testing of K-S, removed xfail on previous tests. scipy.stats: ksone and kstwobign classes. Added dedicated _sf, _pdf, _isf methods to call new K-S functionality. Updated _cdf and _ppf methods to make appropriate calls. The accuracy improvements to smirnov() required increased precision in some parts of the computation. Uses the recent port of the dd_real part of the qd package. Allow FuncData testing to specify a list/array of dtypes to apply. FuncData.check(dtypes=[int, float, float]) will treat the first 3 columns as type int, float, float respectively, rather than treating all columns as the underlying type of the data. Needed for testing some C functions that take an int as a parameter.
Fixed by gh-8737
While investigating #7426 and #7455 I noticed 4 related issues with scipy.special.smirnov and scipy.special.smirnovi, the functions which form the basis for scipy.stats.ksone.

S1: scipy.special.smirnovi(n, p) returns NAN for p close to 0
S2: scipy.special.smirnovi(n, p) returns incorrect values for p close to 1
S3: scipy.special.smirnovi(n, p) not monotonic for p close to 1
S4: scipy.special.smirnov(n, p) not monotonic for increasing n

In detail:
S1: scipy.special.smirnovi returns NAN for p close to 0

S1 Summary: Some unexpected NANs are returned, but the non-NAN values that are returned are OK.
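A small harness along the following lines can be used to look for the NANs described in S1. It is only a sketch: nan_inputs and exact_smirnovi_n1 are hypothetical helper names, the grid of p values is an arbitrary choice, and the closed form used as a stand-in relies on smirnov(1, x) = 1 - x, which holds only for n = 1.

```python
import math


def nan_inputs(inverse_sf, n, ps):
    """Return the p values for which inverse_sf(n, p) evaluates to NaN."""
    return [p for p in ps if math.isnan(inverse_sf(n, p))]


def exact_smirnovi_n1(n, p):
    """Exact inverse survival function for n == 1 (smirnov(1, x) = 1 - x)."""
    if n != 1:
        raise ValueError("closed form is only valid for n == 1")
    return 1.0 - p


# Arbitrary grid of probabilities approaching 0: 1e-1, 1e-11, ..., 1e-291.
small_ps = [10.0 ** -k for k in range(1, 301, 10)]

# The exact n == 1 inverse never produces NaN on this grid.
bad = nan_inputs(exact_smirnovi_n1, 1, small_ps)
```

With SciPy installed, passing scipy.special.smirnovi as inverse_sf runs the same scan against the library function; any p values returned are inputs for which NAN was produced.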
S2: scipy.special.smirnovi returns incorrect values for p close to 1

The first few values in the array look fine, the zeroes at the end are troubling, and the values in the middle have lost a lot of accuracy.

S2 Summary: smirnovi(n, p) returns inaccurate non-NAN values as p->1.

S3: scipy.special.smirnovi not monotonic for p close to 1

smirnovi(n, p) should be monotonic in p. [It arises from a probability distribution. (1-smirnov(n, x)) is a probability CDF, hence smirnovi(n, p) should be monotonically decreasing as p->1.] However, it bounces around as p->1.
I've also observed that it can take many iterations to converge.

S3 Summary: Not monotonic, which causes invalid random variate generation for the probability distribution.
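The two properties discussed in S2 and S3 (accuracy as p->1 and monotonicity in p) can be checked with a sketch like the one below. The helper names are hypothetical. The accuracy check is limited to n = 1, where the exact inverse is 1 - p, and the grid p = 1 - 2**-k is chosen so that 1 - p is computed without rounding error.

```python
def relative_errors_n1(inverse_sf, ps):
    """Relative error of inverse_sf(1, p) against the exact value 1 - p."""
    errors = []
    for p in ps:
        exact = 1.0 - p  # exact in floating point for the grid below
        errors.append(abs(inverse_sf(1, p) - exact) / exact)
    return errors


def monotonicity_violations(inverse_sf, n, ps):
    """Return pairs (p_lo, p_hi) of neighbours where the inverse increases.

    ps must be sorted in increasing order; a well-behaved inverse survival
    function is non-increasing in p, so the result should be empty.
    """
    values = [inverse_sf(n, p) for p in ps]
    return [(ps[i], ps[i + 1])
            for i in range(len(ps) - 1)
            if values[i + 1] > values[i]]


# Probabilities approaching 1 from below: 1 - 2**-1, ..., 1 - 2**-52.
ps_near_one = [1.0 - 2.0 ** -k for k in range(1, 53)]


def reference_n1(n, p):
    """Exact inverse for n == 1, used here as a known-good stand-in."""
    return 1.0 - p


errors = relative_errors_n1(reference_n1, ps_near_one)
violations = monotonicity_violations(reference_n1, 1, ps_near_one)
```

Replacing reference_n1 with scipy.special.smirnovi (an assumption that SciPy is available) applies the same checks to the library function; a large relative error or a non-empty violations list would correspond to S2 or S3 respectively.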
S4: smirnov(n, p) not monotonically decreasing for increasing n.

As n increases, smirnov(n, p) should decrease due to its probability interpretation. (The probability of a gap of size x goes down as more samples are observed.) The values do decrease until n=1012, at which point there is a jump up, followed by decreases from then on. It is very obvious when looking at a plot over the whole of [0,1], so it is not just an issue with one particular value of x.

S4 Summary: Discontinuity between n=1012 and 1013.
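For cross-checking values such as those discussed in S4, an exact reference can be computed with rational arithmetic. The sketch below assumes the standard finite-sum (Birnbaum-Tingey) expression for the one-sided statistic, P(D_n+ >= x) = x * sum_{j=0}^{floor(n(1-x))} C(n, j) (x + j/n)^(j-1) (1 - x - j/n)^(n-j); the function names are hypothetical and this is not the code used in cephes.

```python
from fractions import Fraction
from math import comb, floor


def smirnov_sf_exact(n, x):
    """Exact P(D_n+ >= x) as a Fraction, via the finite-sum formula."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(1)
    if x >= 1:
        return Fraction(0)
    total = Fraction(0)
    for j in range(floor(n * (1 - x)) + 1):
        a = x + Fraction(j, n)   # x + j/n, always positive here
        b = 1 - a                # 1 - x - j/n, never negative here
        total += comb(n, j) * a ** (j - 1) * b ** (n - j)
    return x * total


def increases_with_n(x, n_values):
    """Return the n in n_values where the SF is larger than at the previous n."""
    values = [smirnov_sf_exact(n, x) for n in n_values]
    return [n_values[i]
            for i in range(1, len(n_values))
            if values[i] > values[i - 1]]


x = Fraction(3, 10)
sf_values = [smirnov_sf_exact(n, x) for n in (1, 2, 3)]
jumps = increases_with_n(x, [1, 2, 3])
```

Because every term is an exact Fraction, there is no underflow or cancellation, at the price of a cost that grows quickly with n. Converting the result with float() gives a value that can be compared against scipy.special.smirnov(n, x) on either side of the n=1012/1013 boundary.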
Reproducing code example:
The examples above often use n=1, but examples for higher n also exist.
Scipy/Numpy/Python version information: