Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Invalid IndexError from scipy.stats.nbinom.logcdf #19747

Closed
asnelt opened this issue Dec 23, 2023 · 1 comment · Fixed by #19779
Closed

BUG: Invalid IndexError from scipy.stats.nbinom.logcdf #19747

asnelt opened this issue Dec 23, 2023 · 1 comment · Fixed by #19779
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Milestone

Comments

@asnelt
Copy link
Contributor

asnelt commented Dec 23, 2023

Describe your issue.

The logcdf method of nbinom throws an IndexError for certain inputs containing negative entries. For instance, k = [5, -1, 1] triggers the issue. On the other hand, k = [5, -1] works fine and returns -inf for -1 as expected.

I confirmed the issue on a Debian amd64 system and on a Termux arm system.

Reproducing Code Example

from scipy.stats import nbinom
p = 0.5
n = 10
k = [5, -1, 1]
nbinom.logcdf(k, n, p)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/scipy/stats/_distn_infrastructure.py", line 3499, in logcdf
    place(output, cond, self._logcdf(*goodargs))
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/data/com.termux/files/usr/lib/python3.11/site-packages/scipy/stats/_discrete_distns.py", line 348, in _logcdf
    logcdf[cond] = f1(k[cond], n[cond], p[cond])
                               ~^^^^^^
IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 2

SciPy/NumPy/Python version and system information

1.11.4 1.26.2 sys.version_info(major=3, minor=11, micro=6, releaselevel='final', serial=0)
lapack_armpl_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
lapack_ssl2_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/data/data/com.termux/files/usr/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/data/data/com.termux/files/usr/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_armpl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
blas_ssl2_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/data/data/com.termux/files/usr/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/data/data/com.termux/files/usr/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP,ASIMDDP
    not found = ASIMDFHM
@asnelt asnelt added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label Dec 23, 2023
@dschmitz89
Copy link
Contributor

dschmitz89 commented Dec 26, 2023

Strictly speaking, k must be $\geq 0$. For negative $k$, the negative binomial distribution is not defined. From a quick glance though, ScPy's discrete distributions do not check the input of $k$.

The root cause here is that the CDF computation via boost returns nan. This nan is converted to 0 somehow in the public cdf method by the distribution infrastructure. But in the internal _logcdf function the nan from boost causes broadcasting trouble unfortunately.

from scipy.stats._boost import _nbinom_cdf
from scipy.stats import nbinom

k, n, p = -1, 5, 0.5
_nbinom_cdf(k, n, p), nbinom._cdf(k, n, p), nbinom.cdf(k, n, p)
# nan, nan, 0

An additional broadcasting step in the _logcdf function works around this issue:

diff --git a/scipy/stats/_discrete_distns.py b/scipy/stats/_discrete_distns.py
index 3c2352e36..169222855 100644
--- a/scipy/stats/_discrete_distns.py
+++ b/scipy/stats/_discrete_distns.py
@@ -336,9 +336,9 @@ class nbinom_gen(rv_discrete):

     def _logcdf(self, x, n, p):
         k = floor(x)
+        k, n, p = np.broadcast_arrays(k, n, p)
         cdf = self._cdf(k, n, p)
         cond = cdf > 0.5
-
         def f1(k, n, p):
             return np.log1p(-special.betainc(k + 1, n, 1 - p))

Ideally, the distribution infrastructure would figure this out on its own but in future the new infrastructure could keep this in mind for #15928 .

Any strong opinions here from stats people?

@dschmitz89 dschmitz89 added the needs-decision Items that need further discussion before they are merged or closed label Dec 26, 2023
@tylerjereddy tylerjereddy added this to the 1.12.0 milestone Jan 9, 2024
@lucascolley lucascolley removed the needs-decision Items that need further discussion before they are merged or closed label Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants