statsmodels.stats.rates.test_poisson_2indep has ZeroDivisionError if count2 is 0 #8313

Anaphory · 2022-06-16T14:25:22Z

Describe the bug

When running statsmodels.stats.rates.test_poisson_2indep on data with count2=0, the function raises a ZeroDivisionError instead of computing the test statistics.

Code Sample, a copy-pastable example if possible

>>> from statsmodels.stats.rates import test_poisson_2indep
/home/gereon/.local/lib/python3.10/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
>>> test_poisson_2indep(1, 1, 0, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gereon/.local/lib/python3.10/site-packages/statsmodels/stats/rates.py", line 140, in test_poisson_2indep
    ratio = rates[0] / rates[1]
ZeroDivisionError: float division by zero
>>> test_poisson_2indep(0, 4, 1, 1)
<class 'statsmodels.stats.base.HolderTuple'>
statistic = -2.0
pvalue = 0.04550026389635839
distribution = 'normal'
method = 'score'
alternative = 'two-sided'
rates = (0.0, 1.0)
ratio = 0.0
ratio_null = 1
tuple = (-2.0, 0.04550026389635839)

Expected Output

I believe the test is symmetric, so

>>> test_poisson_2indep(1, 1, 0, 4)
<class 'statsmodels.stats.base.HolderTuple'>
statistic = -2.0
pvalue = 0.04550026389635839
distribution = 'normal'
method = 'score'
alternative = 'two-sided'
rates = (1.0, 0.0)
ratio = 0.0
ratio_null = 1
tuple = (-2.0, 0.04550026389635839)

Output of `import statsmodels.api as sm; sm.show_versions()`

INSTALLED VERSIONS

Python: 3.10.4.final.0
OS: Linux 5.10.114-1-MANJARO #1 SMP PREEMPT Mon May 9 07:52:55 UTC 2022 x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

statsmodels

Installed: 0.13.2 (/home/gereon/.local/lib/python3.10/site-packages/statsmodels)

Required Dependencies

cython: 0.29.28 (/usr/lib/python3.10/site-packages/Cython)
numpy: 1.22.3 (/usr/lib/python3.10/site-packages/numpy)
scipy: 1.8.1 (/home/gereon/.local/lib/python3.10/site-packages/scipy)
pandas: 1.4.2 (/usr/lib/python3.10/site-packages/pandas)
dateutil: 2.8.2 (/usr/lib/python3.10/site-packages/dateutil)
patsy: 0.5.2 (/home/gereon/.local/lib/python3.10/site-packages/patsy)

Optional Dependencies

matplotlib: 3.5.2 (/usr/lib/python3.10/site-packages/matplotlib)
backend: GTK3Agg
cvxopt: Not installed
joblib: 1.1.0 (/usr/lib/python3.10/site-packages/joblib)

Developer Tools

IPython: 8.3.0 (/usr/lib/python3.10/site-packages/IPython)
jinja2: 3.0.3 (/usr/lib/python3.10/site-packages/jinja2)
sphinx: 4.5.0 (/usr/lib/python3.10/site-packages/sphinx)
pygments: 2.12.0 (/usr/lib/python3.10/site-packages/pygments)
pytest: 7.1.2 (/usr/lib/python3.10/site-packages/pytest)
virtualenv: 20.11.0 (/usr/lib/python3.10/site-packages/virtualenv)

The text was updated successfully, but these errors were encountered:

Anaphory · 2022-06-16T14:30:23Z

The offending line is this one.

statsmodels/statsmodels/stats/rates.py

Line 141 in 96c41cd

rates = (y1 / n1, y2 / n2)

If I am right about it being symmetric, it should boil down to

if count2 == 0:
    return test_poisson_2indep(count2, exposure2, count1, exposure1, ratio_null=1/ratio_null, method=method, alternative=alternative, etest_kwds=etest_kwds)

but there may be things in the one-sided tests in particular that I am forgetting.

josef-pkt · 2022-06-16T15:02:57Z

zero division should be a warning, but Python raises.
We should use numpy in cases like ratio = rates[0] / rates[1]

np.float64(5) / 0
<ipython-input-93-51152f6d3e5e>:1: RuntimeWarning: divide by zero encountered in double_scalars
  np.float64(5) / 0
inf

The problem is still present in my current PR #8166

score test works even for zero counts,

(using my branch)

test_poisson_2indep(1, 1, np.float64(0), 4)
...\statsmodels\statsmodels\stats\rates.py:858: RuntimeWarning: divide by zero encountered in double_scalars
  ratio = rate1 / rate2

<class 'statsmodels.stats.base.HolderTuple'>
statistic = 2.0
pvalue = 0.04550026389635839
distribution = 'normal'
method = 'score'
alternative = 'two-sided'
rates = (1.0, 0.0)
ratio = inf
diff = 1.0
value = None
rates_cmle = None
ratio_null = 1
tuple = (2.0, 0.04550026389635839)

I think the fix should be something like the following which works automatically for scalar and vectorized rates
np.float64(1) * rate1 / rate2

edit
using np.divide is simpler

josef-pkt · 2022-06-16T15:39:16Z

current behavior in #8166
several RuntimeWarnings

It looks like we got continuity with respect to count around zero

Note, nan in exact-cond and cond-midp statistics are by design, not available

count1, n1, count2, n2 = 1, 1, np.float64(0), 4

for compare in ["ratio", "diff"]:
    print(compare)
    for meth in method_names_poisson_2indep["test"][compare]:
        tst = test_poisson_2indep(count1, n1, count2, n2, method=meth,
                                  compare=compare,
                                  value=None, alternative='two-sided')
        tst2 = test_poisson_2indep(count1, n1, count2+1e-10, n2, method=meth,
                                  compare=compare,
                                  value=None, alternative='two-sided')
        print("   %-12s" % meth, tst.tuple, tst2.tuple)

ratio
   wald         (1.0, 0.31731050786291415) (0.9999999999718752, 0.3173105078765248)
   score        (2.0, 0.04550026389635839) (1.99999999985, 0.0455002639125557)
   score-log    (inf, 0.0) (9.764858116912382, 1.5933487650281872e-22)
   wald-log     (nan, nan) (0.00024412145289839742, 0.9998052192637062)
   exact-cond   (nan, 0.19999999999999996) (nan, 0.19999999999999996)
   cond-midp    (nan, 0.09999999999999995) (nan, 0.09999999999223139)
   sqrt         (1.549895138835137, 0.12116668661678107) (1.5498951387621074, 0.12116668663431229)
   etest-score  (2.0, 0.08791618383341977) (1.99999999985, 0.08791618383563313)
   etest-wald   (1.0, 0.5411722584954477) (0.9999999999718752, 0.5411722585188068)

diff
   wald         (1.0, 0.31731050786291415) (0.9999999999718752, 0.3173105078765248)
   score        (2.0, 0.04550026389635839) (1.99999999985, 0.0455002639125557)
   waldccv      (0.8081220356417685, 0.4190203335343159) (0.8081220356199162, 0.4190203335468944)
   etest-score  (2.0, 0.08791618383341977) (1.99999999985, 0.08791618383563313)
   etest-wald   (1.0, 0.5411722584954477) (0.9999999999718752, 0.5411722585188068)

josef-pkt · 2022-06-17T14:00:10Z

wald-log is messier, 3 divisions

elif method in ['wald-log']:
     stat = (np.log(y1 / y2) - np.log(r_d)) / np.sqrt(1 / y1 + 1 / y2)

also, we don't use asarray in the functions

josef-pkt · 2022-06-17T14:07:37Z

asarray it is

It works and it still returns a scalar (nulmpy scalar)

type(tst.pvalue)
numpy.float64

change will be in PR #8166 (fixing main would just cause messy merge conflicts)

josef-pkt added type-bug comp-stats corner-case labels Jun 17, 2022

josef-pkt added this to the 0.14 milestone Jun 17, 2022

josef-pkt closed this as completed in e3ac6b2 Jun 20, 2022

bashtage modified the milestones: 0.14, 0.13.2, 0.13.3 Oct 30, 2022

bashtage modified the milestones: 0.13.3, 0.14 Apr 14, 2023

josef-pkt mentioned this issue Jun 14, 2024

Add Pyodide support and CI jobs for statsmodels #9270

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

statsmodels.stats.rates.test_poisson_2indep has ZeroDivisionError if count2 is 0 #8313

statsmodels.stats.rates.test_poisson_2indep has ZeroDivisionError if count2 is 0 #8313

Anaphory commented Jun 16, 2022

Anaphory commented Jun 16, 2022

josef-pkt commented Jun 16, 2022 •

edited

Loading

josef-pkt commented Jun 16, 2022

josef-pkt commented Jun 17, 2022

josef-pkt commented Jun 17, 2022 •

edited

Loading

statsmodels.stats.rates.test_poisson_2indep has ZeroDivisionError if count2 is 0 #8313

statsmodels.stats.rates.test_poisson_2indep has ZeroDivisionError if count2 is 0 #8313

Comments

Anaphory commented Jun 16, 2022

Describe the bug

Code Sample, a copy-pastable example if possible

Expected Output

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

statsmodels

Required Dependencies

Optional Dependencies

Developer Tools

Anaphory commented Jun 16, 2022

josef-pkt commented Jun 16, 2022 • edited Loading

josef-pkt commented Jun 16, 2022

josef-pkt commented Jun 17, 2022

josef-pkt commented Jun 17, 2022 • edited Loading

Output of `import statsmodels.api as sm; sm.show_versions()`

josef-pkt commented Jun 16, 2022 •

edited

Loading

josef-pkt commented Jun 17, 2022 •

edited

Loading