dtype object, glm.fit() gives AttributeError: sqrt #1842

Closed
lafras opened this Issue Jul 22, 2014 · 12 comments


@lafras
lafras commented Jul 22, 2014

I got the following message while trying to fit a glm (in both a standard ubuntu 14.04 package install and a fresh pip install in a virtualenv):

  File "fit.py", line 18, in <module>
    glm.fit()
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py", line 406, in fit
    wls_results = lm.WLS(wlsendog, wlsexog, self.weights).fit()
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 381, in __init__
    weights=weights, hasconst=hasconst)
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 79, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/base/model.py", line 137, in __init__
    self.initialize()
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 84, in initialize
    self.wexog = self.whiten(self.exog)
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 405, in whiten
    return np.sqrt(self.weights)[:,None]*X
AttributeError: sqrt
@josef-pkt
Member

what's type(glm.weights)?

which numpy version? np.__version__

my first guess is that there is something wrong with self.weights, and that we don't have enough checks that it's a valid type.

@josef-pkt
Member

for example:

>>> np.sqrt(np.array(5, object))
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    np.sqrt(np.array(5, object))
AttributeError: sqrt

But I don't know why weights would not be a numeric array in glm.
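A slightly fuller sketch of the same failure, assuming a small made-up `weights` array: the ufunc fails on an object array, and the exact exception type depends on the numpy version (AttributeError on older releases such as the 1.8.1 reported here, TypeError on newer ones), so both are caught. Casting to float first avoids the error.

```python
import numpy as np

# Illustrative data only: integer weights stored in an object array.
weights = np.array([1, 4, 9], dtype=object)

try:
    np.sqrt(weights)
    failed = False
except (AttributeError, TypeError) as err:
    # Older numpy raises AttributeError, newer numpy raises TypeError.
    failed = True
    print(type(err).__name__, err)

# Converting to a numeric dtype first makes the ufunc work.
converted = np.sqrt(weights.astype(float))
print(converted)
```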

@lafras
lafras commented Jul 22, 2014

I put type(glm.weights) directly into the source and it gives <type 'numpy.ndarray'>. The error as reported above follows.

The numpy version is 1.8.1.

@lafras
lafras commented Jul 22, 2014

Incidentally I did find a mention on stackoverflow about using dtype = object.

@josef-pkt
Member

What is glm.weights.dtype?

The easiest would be if you can provide a reproducible example with data.
Otherwise you could show your full calls to GLM and check the types and dtypes of all your data to see where object or something similar might enter the model.

@lafras
lafras commented Jul 22, 2014

If I do type(glm.weights) in my script it says:

Traceback (most recent call last):
  File "hfit.py", line 17, in <module>
    print type(glm.weights)
AttributeError: 'GLM' object has no attribute 'weights'

in source type(self.weights.dtype) gives <type 'numpy.dtype'>

If I do print self.weights.dtype in source I get object.

@lafras
lafras commented Jul 22, 2014

OK. So I'm reading my data from an xlsx file using pandas' read_excel function. Checking the dtype of the read-in data, it is object.

@lafras
lafras commented Jul 22, 2014

Calling the astype(float) method on the data frame columns that I use in my fit fixes the problem.
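A minimal sketch of that workaround, assuming a small hand-built data frame in place of the Excel file (the column names `y` and `x` are made up for illustration):

```python
import pandas as pd

# Stand-in for data that came back from read_excel with object columns.
df = pd.DataFrame({"y": [1, 2, 3], "x": [0.5, 1.5, 2.5]}, dtype=object)
print(df.dtypes)  # both columns report dtype object

# Cast the columns used in the fit to float before building the model.
numeric = df[["y", "x"]].astype(float)
print(numeric.dtypes)
```

Checking `df.dtypes` right after reading the file shows which columns need the cast.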

@josef-pkt
Member

Good to know, but we need to either convert or raise an exception in this case.

Both pandas and numpy have been creating more object arrays recently, and we don't have a check for this, so it breaks at some "random" point with a non-informative message.

see also #1242 #864
and search https://github.com/statsmodels/statsmodels/search?q=check+dtype&ref=cmdform&state=open&type=Issues

@josef-pkt josef-pkt changed the title from glm.fit() gives AttributeError: sqrt to dtype object, glm.fit() gives AttributeError: sqrt Jul 22, 2014
@josef-pkt
Member

object propagates

>>> np.column_stack((np.ones(5, float), np.ones(5, object)))
array([[1.0, 1],
       [1.0, 1],
       [1.0, 1],
       [1.0, 1],
       [1.0, 1]], dtype=object)

so we could also just check endog and exog for dtype after all transformations (patsy, ...)
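A rough sketch of what such a check could look like. `check_numeric` is a hypothetical helper written for this comment, not an existing statsmodels function, and the real fix might convert instead of raising:

```python
import numpy as np

def check_numeric(arr, name):
    """Hypothetical helper: raise early if `arr` does not have a numeric dtype."""
    arr = np.asarray(arr)
    if not np.issubdtype(arr.dtype, np.number):
        raise ValueError(
            "%s has dtype %s; expected a numeric dtype" % (name, arr.dtype)
        )
    return arr

# One object column is enough to turn the whole design matrix into object.
exog = np.column_stack((np.ones(5, float), np.ones(5, object)))

try:
    check_numeric(exog, "exog")
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Run after all transformations, a check like this would fail at model construction with a message naming the offending array, rather than deep inside whiten.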

@josef-pkt josef-pkt added this to the 0.6 milestone Aug 10, 2014
@josef-pkt josef-pkt added the prio-high label Aug 10, 2014
@jseabold jseabold added the FAQ label Sep 20, 2014
@jseabold
Member

Not a blocker I don't think. FAQ fodder. Feature addition that will be addressed when we handle dtypes systematically.

@jseabold jseabold removed this from the 0.6 milestone Sep 20, 2014
@jseabold
Member

Closing as (close enough) duplicate of #880.

@jseabold jseabold closed this Sep 26, 2014