Error when trying to use an exposure term or offset term with `statsmodels.discrete.count_model.Poisson` #7015

ilevantis · 2020-09-01T19:21:19Z

Describe the bug

Error when trying to use an exposure term or offset term with statsmodels.discrete.count_model.Poisson and statsmodels.discrete.count_model.ZeroInflatedPoisson.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from patsy import dmatrices

data = pd.read_csv('https://gist.githubusercontent.com/sachinsdate/09cfd42b7701c48ec68b04c786786434/raw/4b50e718a83a4b20adcff5552ccea8f1054ce919/fish.csv')

Pmodel1 = smf.glm('FISH_COUNT ~ C(CAMPER)', data=data, family=sm.families.Poisson(), exposure=data['PERSONS']).fit() # this works
Pmodel1.summary()


y_mat, X_mat = dmatrices('FISH_COUNT ~ C(CAMPER)', data, return_type='dataframe')

Pmodel2 = sm.Poisson(endog=y_mat, exog=X_mat, exposure=data['PERSONS']).fit() # this gives an error

ZIPmodel = sm.ZeroInflatedPoisson(endog=y_mat, exog=X_mat, exposure=data['PERSONS'], exog_infl=X_mat, inflation='logit').fit() # this gives an error

Traceback

ValueError                                Traceback (most recent call last)
<ipython-input-8-623bd13d22ad> in <module>
----> 1 Pmodel2 = sm.Poisson(endog=y_mat, exog=X_mat, exposure=data['PERSONS']).fit() # this gives an error

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)
   1104                                              disp=disp,
   1105                                              callback=callback,
-> 1106                                              **kwargs)
   1107 
   1108         if 'cov_type' in kwargs:

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)
    231                              disp=disp,
    232                              callback=callback,
--> 233                              **kwargs)
    234 
    235         return mlefit  # It is up to subclasses to wrap results

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/model.py in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)
    525                                                        callback=callback,
    526                                                        retall=retall,
--> 527                                                        full_output=full_output)
    528 
    529         # NOTE: this is for fit_regularized and should be generalized

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/optimizer.py in _fit(self, objective, gradient, start_params, fargs, kwargs, hessian, method, maxiter, full_output, disp, callback, retall)
    216                             disp=disp, maxiter=maxiter, callback=callback,
    217                             retall=retall, full_output=full_output,
--> 218                             hess=hessian)
    219 
    220         optim_settings = {'optimizer': method, 'start_params': start_params,

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/optimizer.py in _fit_newton(f, score, start_params, fargs, kwargs, disp, maxiter, callback, retall, full_output, hess, ridge_factor)
    314     while (iterations < maxiter and np.any(np.abs(newparams -
    315             oldparams) > tol)):
--> 316         H = np.asarray(hess(newparams))
    317         # regularize Hessian, not clear what ridge factor should be
    318         # keyword option with absolute default 1e-10, see #1847

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/model.py in hess(params, *args)
    507 
    508             def hess(params, *args):
--> 509                 return self.hessian(params, *args) / nobs
    510         else:
    511             def score(params, *args):

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py in hessian(self, params)
   1317         X = self.exog
   1318         L = np.exp(np.dot(X,params) + exposure + offset)
-> 1319         return -np.dot(L*X.T, X)
   1320 
   1321     def hessian_factor(self, params):

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    343         result = arithmetic_op(lvalues, rvalues, op)
    344 
--> 345         return left._construct_result(result, name=res_name)
    346 
    347     wrapper.__name__ = op_name

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/series.py in _construct_result(self, result, name)
   2755         # We do not pass dtype to ensure that the Series constructor
   2756         #  does inference in the case where `result` has object-dtype.
-> 2757         out = self._constructor(result, index=self.index)
   2758         out = out.__finalize__(self)
   2759 

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    312                     if len(index) != len(data):
    313                         raise ValueError(
--> 314                             f"Length of passed values is {len(data)}, "
    315                             f"index implies {len(index)}."
    316                         )

ValueError: Length of passed values is 2, index implies 250.

Fitting a Poisson model with an exposure or offset term works if done through the formula API but trying to do so using the non-fomrula API gives the error above. This is especialyl a problem when the model is not accessible with the formula API e.g. statsmodels.discrete.count_model.ZeroInflatedPoisson.

Output of `import statsmodels.api as sm; sm.show_versions()`

INSTALLED VERSIONS

Python: 3.6.7.final.0
OS: Linux 5.8.0-1-default #1 SMP Tue Aug 4 07:30:59 UTC 2020 (9bc0044) x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

statsmodels

Installed: 0.12.0 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.19.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/numpy)
scipy: 1.5.2 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/scipy)
pandas: 1.1.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas)
dateutil: 2.8.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/dateutil)
patsy: 0.5.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/patsy)

Optional Dependencies

matplotlib: 3.3.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/matplotlib)
backend: module://ipykernel.pylab.backend_inline
cvxopt: Not installed
joblib: Not installed

Developer Tools

IPython: 7.11.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/IPython)
jinja2: Not installed
sphinx: Not installed
pygments: 2.6.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/pygments)
pytest: Not installed
virtualenv: Not installed

The text was updated successfully, but these errors were encountered:

josef-pkt · 2020-09-01T19:56:53Z

thanks for a compete example that can be run.

looks like exposure is missing an asarray so we have pandas DataFrame internally which doesn't work

Using numpy array for exposure exposure=data['PERSONS'].values fits without exception.
(I didn't look at the results itself)

I get a ConvergenceWarning, but the summary says converged. The ConvergenceWarning is likely from an initial model estimate to get better start_params and not the final model.

ilevantis · 2020-09-01T20:45:47Z

Thanks for the quick response!
Converting it to a numpy array with .values fixes it for me. Though I never would have figured that out from the seemingly unrelated messages in the traceback myself!

Presumably a line could be added to the docs that a pandas series does not count as an "array-like" in this case. Although would make even more sense to add code to handle a series in the same way that the smf.glm api is able to.

I only tested these two model types but I would guess that it also applies to other discrete and count models such as NegativeBinomial.

josef-pkt · 2020-09-01T21:15:21Z

Though I never would have figured that out from the seemingly unrelated messages in the traceback myself!

It's actually pretty obvious from the traceback, knowing statsmodels internals.
When computing the hessian, we get a pandas exception. That shouldn't occur because our internals and linalg is supposed to only use numpy.

GLM has an asarray for offset in __init__. model.exposure is an array, not pandas Series, so this also has been converted somewhere.

maybe in the old times, np.log of a pandas Series returned an ndarray. obviously no unit tests for pandas exposure

I think the easiest is to use np.asarray for offset and exposure at the beginning of discrete.CountModel.__init__.

josef-pkt · 2020-09-01T21:16:50Z

If we release a 0.12.1, then fixing this would be a backport candidate.

BUG/ENH: Poisson offset rebased closes #7015 closes #7127

josef-pkt added comp-discrete type-bug labels Sep 1, 2020

josef-pkt added this to the 0.13 milestone Sep 14, 2020

josef-pkt added the prio-elev label Sep 14, 2020

esmucler mentioned this issue Oct 30, 2020

BUG: fix error in Poisson when offset or exposure is a pd.Series #7127

Closed

4 tasks

josef-pkt mentioned this issue Dec 22, 2020

BUG/ENH: Poisson offset rebased #7228

Merged

josef-pkt closed this as completed in #7228 Jan 25, 2021

josef-pkt added a commit that referenced this issue Jan 25, 2021

Merge pull request #7228 from josef-pkt/poisson-offset_rebased

2fdea4f

BUG/ENH: Poisson offset rebased closes #7015 closes #7127

bashtage pushed a commit that referenced this issue Jan 27, 2021

Merge pull request #7228 from josef-pkt/poisson-offset_rebased

7974ab9

BUG/ENH: Poisson offset rebased closes #7015 closes #7127

bashtage modified the milestones: 0.13, 0.12 Jan 27, 2021

bashtage modified the milestones: 0.12, 0.13 Sep 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error when trying to use an exposure term or offset term with `statsmodels.discrete.count_model.Poisson` #7015

Error when trying to use an exposure term or offset term with `statsmodels.discrete.count_model.Poisson` #7015

ilevantis commented Sep 1, 2020

INSTALLED VERSIONS

statsmodels

Required Dependencies

Optional Dependencies

Developer Tools

josef-pkt commented Sep 1, 2020 •

edited

Loading

ilevantis commented Sep 1, 2020

josef-pkt commented Sep 1, 2020

josef-pkt commented Sep 1, 2020

Error when trying to use an exposure term or offset term with statsmodels.discrete.count_model.Poisson #7015

Error when trying to use an exposure term or offset term with statsmodels.discrete.count_model.Poisson #7015

Comments

ilevantis commented Sep 1, 2020

Describe the bug

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

statsmodels

Required Dependencies

Optional Dependencies

Developer Tools

josef-pkt commented Sep 1, 2020 • edited Loading

ilevantis commented Sep 1, 2020

josef-pkt commented Sep 1, 2020

josef-pkt commented Sep 1, 2020

Error when trying to use an exposure term or offset term with `statsmodels.discrete.count_model.Poisson` #7015

Error when trying to use an exposure term or offset term with `statsmodels.discrete.count_model.Poisson` #7015

Output of `import statsmodels.api as sm; sm.show_versions()`

josef-pkt commented Sep 1, 2020 •

edited

Loading