Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when trying to use an exposure term or offset term with statsmodels.discrete.count_model.Poisson #7015

Closed
ilevantis opened this issue Sep 1, 2020 · 4 comments · Fixed by #7228

Comments

@ilevantis
Copy link

Describe the bug

Error when trying to use an exposure term or offset term with statsmodels.discrete.count_model.Poisson and statsmodels.discrete.count_model.ZeroInflatedPoisson.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from patsy import dmatrices

data = pd.read_csv('https://gist.githubusercontent.com/sachinsdate/09cfd42b7701c48ec68b04c786786434/raw/4b50e718a83a4b20adcff5552ccea8f1054ce919/fish.csv')

Pmodel1 = smf.glm('FISH_COUNT ~ C(CAMPER)', data=data, family=sm.families.Poisson(), exposure=data['PERSONS']).fit() # this works
Pmodel1.summary()


y_mat, X_mat = dmatrices('FISH_COUNT ~ C(CAMPER)', data, return_type='dataframe')

Pmodel2 = sm.Poisson(endog=y_mat, exog=X_mat, exposure=data['PERSONS']).fit() # this gives an error

ZIPmodel = sm.ZeroInflatedPoisson(endog=y_mat, exog=X_mat, exposure=data['PERSONS'], exog_infl=X_mat, inflation='logit').fit() # this gives an error
Traceback
ValueError                                Traceback (most recent call last)
<ipython-input-8-623bd13d22ad> in <module>
----> 1 Pmodel2 = sm.Poisson(endog=y_mat, exog=X_mat, exposure=data['PERSONS']).fit() # this gives an error

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)
   1104                                              disp=disp,
   1105                                              callback=callback,
-> 1106                                              **kwargs)
   1107 
   1108         if 'cov_type' in kwargs:

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py in fit(self, start_params, method, maxiter, full_output, disp, callback, **kwargs)
    231                              disp=disp,
    232                              callback=callback,
--> 233                              **kwargs)
    234 
    235         return mlefit  # It is up to subclasses to wrap results

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/model.py in fit(self, start_params, method, maxiter, full_output, disp, fargs, callback, retall, skip_hessian, **kwargs)
    525                                                        callback=callback,
    526                                                        retall=retall,
--> 527                                                        full_output=full_output)
    528 
    529         # NOTE: this is for fit_regularized and should be generalized

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/optimizer.py in _fit(self, objective, gradient, start_params, fargs, kwargs, hessian, method, maxiter, full_output, disp, callback, retall)
    216                             disp=disp, maxiter=maxiter, callback=callback,
    217                             retall=retall, full_output=full_output,
--> 218                             hess=hessian)
    219 
    220         optim_settings = {'optimizer': method, 'start_params': start_params,

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/optimizer.py in _fit_newton(f, score, start_params, fargs, kwargs, disp, maxiter, callback, retall, full_output, hess, ridge_factor)
    314     while (iterations < maxiter and np.any(np.abs(newparams -
    315             oldparams) > tol)):
--> 316         H = np.asarray(hess(newparams))
    317         # regularize Hessian, not clear what ridge factor should be
    318         # keyword option with absolute default 1e-10, see #1847

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/base/model.py in hess(params, *args)
    507 
    508             def hess(params, *args):
--> 509                 return self.hessian(params, *args) / nobs
    510         else:
    511             def score(params, *args):

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels/discrete/discrete_model.py in hessian(self, params)
   1317         X = self.exog
   1318         L = np.exp(np.dot(X,params) + exposure + offset)
-> 1319         return -np.dot(L*X.T, X)
   1320 
   1321     def hessian_factor(self, params):

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/ops/common.py in new_method(self, other)
     63         other = item_from_zerodim(other)
     64 
---> 65         return method(self, other)
     66 
     67     return new_method

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/ops/__init__.py in wrapper(left, right)
    343         result = arithmetic_op(lvalues, rvalues, op)
    344 
--> 345         return left._construct_result(result, name=res_name)
    346 
    347     wrapper.__name__ = op_name

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/series.py in _construct_result(self, result, name)
   2755         # We do not pass dtype to ensure that the Series constructor
   2756         #  does inference in the case where `result` has object-dtype.
-> 2757         out = self._constructor(result, index=self.index)
   2758         out = out.__finalize__(self)
   2759 

~/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    312                     if len(index) != len(data):
    313                         raise ValueError(
--> 314                             f"Length of passed values is {len(data)}, "
    315                             f"index implies {len(index)}."
    316                         )

ValueError: Length of passed values is 2, index implies 250.

Fitting a Poisson model with an exposure or offset term works if done through the formula API but trying to do so using the non-fomrula API gives the error above. This is especialyl a problem when the model is not accessible with the formula API e.g. statsmodels.discrete.count_model.ZeroInflatedPoisson.

Output of import statsmodels.api as sm; sm.show_versions()

INSTALLED VERSIONS

Python: 3.6.7.final.0
OS: Linux 5.8.0-1-default #1 SMP Tue Aug 4 07:30:59 UTC 2020 (9bc0044) x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

statsmodels

Installed: 0.12.0 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/statsmodels)

Required Dependencies

cython: Not installed
numpy: 1.19.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/numpy)
scipy: 1.5.2 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/scipy)
pandas: 1.1.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/pandas)
dateutil: 2.8.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/dateutil)
patsy: 0.5.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/patsy)

Optional Dependencies

matplotlib: 3.3.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/matplotlib)
backend: module://ipykernel.pylab.backend_inline
cvxopt: Not installed
joblib: Not installed

Developer Tools

IPython: 7.11.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/IPython)
jinja2: Not installed
sphinx: Not installed
pygments: 2.6.1 (/home/ilya/miniconda3/envs/stdsci/lib/python3.6/site-packages/pygments)
pytest: Not installed
virtualenv: Not installed

@josef-pkt
Copy link
Member

josef-pkt commented Sep 1, 2020

thanks for a compete example that can be run.

looks like exposure is missing an asarray so we have pandas DataFrame internally which doesn't work

Using numpy array for exposure exposure=data['PERSONS'].values fits without exception.
(I didn't look at the results itself)

I get a ConvergenceWarning, but the summary says converged. The ConvergenceWarning is likely from an initial model estimate to get better start_params and not the final model.

@ilevantis
Copy link
Author

Thanks for the quick response!
Converting it to a numpy array with .values fixes it for me. Though I never would have figured that out from the seemingly unrelated messages in the traceback myself!

Presumably a line could be added to the docs that a pandas series does not count as an "array-like" in this case. Although would make even more sense to add code to handle a series in the same way that the smf.glm api is able to.

I only tested these two model types but I would guess that it also applies to other discrete and count models such as NegativeBinomial.

@josef-pkt
Copy link
Member

Though I never would have figured that out from the seemingly unrelated messages in the traceback myself!

It's actually pretty obvious from the traceback, knowing statsmodels internals.
When computing the hessian, we get a pandas exception. That shouldn't occur because our internals and linalg is supposed to only use numpy.

GLM has an asarray for offset in __init__. model.exposure is an array, not pandas Series, so this also has been converted somewhere.

maybe in the old times, np.log of a pandas Series returned an ndarray. obviously no unit tests for pandas exposure

I think the easiest is to use np.asarray for offset and exposure at the beginning of discrete.CountModel.__init__.

@josef-pkt
Copy link
Member

If we release a 0.12.1, then fixing this would be a backport candidate.

@josef-pkt josef-pkt added this to the 0.13 milestone Sep 14, 2020
josef-pkt added a commit that referenced this issue Jan 25, 2021
BUG/ENH: Poisson offset rebased closes #7015 closes #7127
bashtage pushed a commit that referenced this issue Jan 27, 2021
BUG/ENH: Poisson offset rebased closes #7015 closes #7127
@bashtage bashtage modified the milestones: 0.13, 0.12 Jan 27, 2021
@bashtage bashtage modified the milestones: 0.12, 0.13 Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants