Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: rv_discrete fails when support is unbounded below #18564

Closed
matteosantama opened this issue May 27, 2023 · 5 comments · Fixed by #18585
Closed

BUG: rv_discrete fails when support is unbounded below #18564

matteosantama opened this issue May 27, 2023 · 5 comments · Fixed by #18585
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Milestone

Comments

@matteosantama
Copy link

matteosantama commented May 27, 2023

Describe your issue.

I am trying to create a discrete distribution on the integers (both positive and negative). The documentation states that rv_discrete takes an "optional" lower bound parameter a. It does not, however, seem to allow for an unbounded distribution.

The rv_discrete class should support unbounded distributions. I recognize this might be a big project, but in the meantime the documentation should be updated to reflect that

  1. The a parameter isn't really optional (an optional in Python allows for the value None). Rather, it has a pre-set default.
  2. a must be bounded, even though b can be unbounded

Reproducing Code Example

from typing import Any

import scipy
import numpy as np


class RV(scipy.stats.rv_discrete):

    def __init__(self) -> None:
        # NOTE: np.nan, None, -np.inf, and float("-inf") all fail
        super().__init__(a=np.nan, b=0)

    def _pmf(self, k: np.ndarray[int], *_: Any) -> np.ndarray[float]:
        raise NotImplementedError


RV().expect(lb=-5, ub=0)

Error message

Depends on what you pass for `a`, but generally a `ValueError`

SciPy/NumPy/Python version and system information

1.10.1 1.24.3 sys.version_info(major=3, minor=11, micro=3, releaselevel='final', serial=0)
Build Dependencies:
  blas:
    detection method: cmake
    found: true
    include directory: unknown
    lib directory: unknown
    name: OpenBLAS
    openblas configuration: unknown
    pc file directory: unknown
    version: 0.3.18
  lapack:
    detection method: cmake
    found: true
    include directory: unknown
    lib directory: unknown
    name: OpenBLAS
    openblas configuration: unknown
    pc file directory: unknown
    version: 0.3.18
Compilers:
  c:
    commands: cc
    linker: ld64
    name: clang
    version: 13.1.6
  c++:
    commands: c++
    linker: ld64
    name: clang
    version: 13.1.6
  cython:
    commands: cython
    linker: cython
    name: cython
    version: 0.29.33
  fortran:
    commands: gfortran
    linker: ld64
    name: gcc
    version: 12.1.0
  pythran:
    include directory: /private/var/folders/_f/lyvxf0v13gs7984d7sf7j83c0000gn/T/pip-build-env-c1_tgrdc/overlay/lib/python3.11/site-packages/pythran
    version: 0.12.1
Machine Information:
  build:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
  cross-compiled: false
  host:
    cpu: aarch64
    endian: little
    family: aarch64
    system: darwin
Python Information:
  path: /private/var/folders/_f/lyvxf0v13gs7984d7sf7j83c0000gn/T/cibw-run-s3_k_ke5/cp311-macosx_arm64/build/venv/bin/python
  version: '3.11'
@matteosantama matteosantama added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label May 27, 2023
@mdhaber
Copy link
Contributor

mdhaber commented May 27, 2023

I'll keep this in mind when I'm working on the infrastructure this summer. This is now tracked in gh-15928.

@josef-pkt
Copy link
Member

josef-pkt commented May 27, 2023

There is already one distribution with unbounded support (all integers)
don't remember name, it's difference of two Poisson distributions

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skellam.html

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.dlaplace.html
has also unbound integer support

@josef-pkt
Copy link
Member

aside: docstring for dlaplace is confusing because formula does not correspond to the literature or R packages

based on PR #2562
this is just a rewriting of the standard formula, but that's not obvious

@mdhaber
Copy link
Contributor

mdhaber commented May 30, 2023

Agreed with #18564 (comment)'; there's not an inherent problem with unbounded support.

import numpy as np
from scipy import stats

class RV(stats.rv_discrete):
  
    def _pmf(self, k):
        return np.tanh(1/2) * np.exp(-abs(k))

rv = RV(a=-np.inf, name='RV', longname='An unbounded discrete random variable')
k = np.arange(-100, 110, 10)
rv.pmf(k)
# array([1.71911093e-44, 3.78659382e-40, 8.34052793e-36, 1.83712353e-31,
#        4.04653386e-27, 8.91308397e-23, 1.96323739e-18, 4.32431813e-14,
#        9.52494453e-10, 2.09800865e-05, 4.62117157e-01, 2.09800865e-05,
#        9.52494453e-10, 4.32431813e-14, 1.96323739e-18, 8.91308397e-23,
#        4.04653386e-27, 1.83712353e-31, 8.34052793e-36, 3.78659382e-40,
#        1.71911093e-44])

The problem is with certain methods, like cdf, that try to use an unbounded end of the support.

def _cdf_single(self, k, *args):
_a, _b = self._get_support(*args)
m = arange(int(_a), k+1)
return np.sum(self._pmf(m, *args), axis=0)

The immediate error is that _a, -np.inf, can't be converted into an int. And of course, it wouldn't be possible to create an array with all integers from negative infinity to k.

The infrastructure would need a more sophisticated algorithm for handling infinite sums (e.g. like mpmath.nsum). I understand that this can be tricky with limited precision arithmetic (e.g. some of the techniques suffer from catastrophic cancellation), but we can probably do better than erroring out due to an undocumented limitation. For well-behaved distributions, we could just start summing terms from the mode outward, for example.

I'd suggest that this can be closed by documenting the limitation in the current infrastructure (and the fact that the support must contain only integers) since the need to handle unbounded distributions is tracked by gh-15928.

@josef-pkt
Copy link
Member

cdf could also add negative numbers, as in expect, AFAIR. However, that still has problems because it's difficult to get a measure or error for the lower tail. (prob of an interval is easier but cdf needs lower tail prob)

same with stats, mean, moments.

@tupui tupui added this to the 1.12.0 milestone May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.stats
Projects
None yet
4 participants