
dtype object, glm.fit() gives AttributeError: sqrt #1842

Closed
lafras opened this issue Jul 22, 2014 · 12 comments

3 participants
@lafras

commented Jul 22, 2014

I got the following error while trying to fit a GLM (with both a standard Ubuntu 14.04 package install and a fresh pip install in a virtualenv):

File "fit.py", line 18, in <module>
    glm.fit()
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/genmod/generalized_linear_model.py", line 406, in fit
    wls_results = lm.WLS(wlsendog, wlsexog, self.weights).fit()
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 381, in __init__
    weights=weights, hasconst=hasconst)
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 79, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/base/model.py", line 137, in __init__
    self.initialize()
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 84, in initialize
    self.wexog = self.whiten(self.exog)
  File "/home/lafras/.virtualenvs/hq/local/lib/python2.7/site-packages/statsmodels/regression/linear_model.py", line 405, in whiten
    return np.sqrt(self.weights)[:,None]*X
AttributeError: sqrt
@josef-pkt

Member

commented Jul 22, 2014

What is type(glm.weights)?

Which numpy version are you using (np.__version__)?

My first guess is that something is wrong with self.weights, and that we don't have enough checks that it has a valid type.

@josef-pkt

Member

commented Jul 22, 2014

For example:

>>> np.sqrt(np.array(5, object))
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    np.sqrt(np.array(5, object))
AttributeError: sqrt

But I don't know why weights would not be a numeric array in glm.
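A minimal sketch of what goes wrong, assuming a recent numpy. For object arrays, a ufunc such as np.sqrt falls back to calling a method of the same name on each element; a plain Python int has no sqrt method, so the call fails. numpy 1.8 reports this as AttributeError: sqrt, as above, while newer releases raise a TypeError instead, so the sketch catches both. Casting to float first avoids the fallback entirely.

```python
import numpy as np

# Numbers stored as Python objects, as can happen with data read via pandas
weights = np.array([1, 4, 9], dtype=object)

try:
    np.sqrt(weights)
except (AttributeError, TypeError) as exc:
    # numpy 1.8 raises AttributeError; newer versions raise TypeError
    print("object dtype fails with", type(exc).__name__)

# With a numeric dtype, np.sqrt uses its float loop and works as expected
print(np.sqrt(weights.astype(float)))
```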

@lafras

Author

commented Jul 22, 2014

I put type(glm.weights) directly into the source and it gives <type 'numpy.ndarray'>; the error reported above then follows.

The numpy version is 1.8.1.

@lafras

Author

commented Jul 22, 2014

Incidentally, I did find a mention on Stack Overflow about using dtype=object.

@josef-pkt

Member

commented Jul 22, 2014

What is glm.weights.dtype?

The easiest would be if you could provide a reproducible example with data.
Otherwise, you could show your full calls to GLM and check the types and dtypes of all your data to see where an object dtype, or something similar, might enter the model.
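One way to do that check, sketched under the assumption that the model inputs are available as arrays; the helper report_dtypes and the example values for endog, exog and weights are hypothetical illustrations, not statsmodels code. It prints the type and dtype of each input and returns the names of any that have object dtype.

```python
import numpy as np

def report_dtypes(**arrays):
    """Hypothetical helper: print type and dtype of each input, return object-dtype names."""
    suspicious = []
    for name, value in arrays.items():
        arr = np.asarray(value)
        print(name, type(value).__name__, arr.dtype)
        if arr.dtype == object:
            suspicious.append(name)
    return suspicious

# Illustrative inputs: only weights holds numbers as Python objects
endog = np.array([0.0, 1.0, 1.0])
exog = np.ones((3, 2))
weights = np.array([1, 2, 3], dtype=object)
print(report_dtypes(endog=endog, exog=exog, weights=weights))  # ['weights']
```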

@lafras

Author

commented Jul 22, 2014

If I print type(glm.weights) in my script, I get:

Traceback (most recent call last):
  File "hfit.py", line 17, in <module>
    print type(glm.weights)
AttributeError: 'GLM' object has no attribute 'weights'

In the source, type(self.weights.dtype) gives <type 'numpy.dtype'>.

If I print self.weights.dtype in the source, I get object.

@lafras

Author

commented Jul 22, 2014

OK. I'm reading my data from an xlsx file using pandas' read_excel function, and checking the dtype of the data read in shows that it is object.

@lafras

Author

commented Jul 22, 2014

Calling the astype(float) method on the data frame columns that I use in my fit fixes the problem.
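A minimal sketch of that workaround, assuming pandas is installed. The DataFrame here is a made-up stand-in for data returned by read_excel, and the column names y and x are hypothetical; the point is only that astype(float) turns object columns into float64 before they reach the model.

```python
import pandas as pd

# Stand-in for read_excel output: numeric values held in object-dtype columns
df = pd.DataFrame({"y": [1, 0, 1, 1], "x": [0.5, 1.5, 2.0, 3.0]}, dtype=object)
print(df.dtypes)  # both columns are object

cols = ["y", "x"]  # hypothetical names of the columns used in the fit
for col in cols:
    # Replace each column with a float64 copy
    df[col] = df[col].astype(float)
print(df.dtypes)  # both columns are now float64
```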

@josef-pkt

Member

commented Jul 22, 2014

Good to know, but we need to either convert the data or raise an exception in this case.

Both pandas and numpy have been creating more object arrays recently, and we don't have a check for this, so it breaks at some "random" point with an uninformative message.

See also #1242 and #864, and this search:
https://github.com/statsmodels/statsmodels/search?q=check+dtype&ref=cmdform&state=open&type=Issues
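A minimal sketch of the "convert or raise" idea, assuming the check runs once on each input array. The function as_numeric_array is hypothetical and not part of statsmodels: it tries to cast object arrays to float and, if that fails, raises an error that names the offending input instead of failing later inside WLS.

```python
import numpy as np

def as_numeric_array(data, name="data"):
    """Hypothetical input check: convert object arrays to float or raise clearly."""
    arr = np.asarray(data)
    if arr.dtype == object:
        try:
            arr = arr.astype(np.float64)
        except (TypeError, ValueError) as exc:
            raise ValueError(
                "%s has dtype object and could not be converted to float; "
                "check the input data (e.g. with astype(float))" % name
            ) from exc
    return arr

print(as_numeric_array(np.array([1, 2, 3], dtype=object), "weights").dtype)  # float64
```

Converting when possible keeps user code like the one in this issue working, while the explicit error covers genuinely non-numeric data such as stray strings from a spreadsheet.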

@josef-pkt josef-pkt changed the title glm.fit() gives AttributeError: sqrt dtype object, glm.fit() gives AttributeError: sqrt Jul 22, 2014

@josef-pkt

Member

commented Jul 25, 2014

The object dtype propagates:

>>> np.column_stack((np.ones(5, float), np.ones(5, object)))
array([[1.0, 1],
       [1.0, 1],
       [1.0, 1],
       [1.0, 1],
       [1.0, 1]], dtype=object)

so we could also just check the dtype of endog and exog after all transformations (patsy, ...).
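Because of this propagation, a single check on the final endog and exog would catch object data regardless of where it entered. A minimal sketch of that idea; has_object_dtype is a hypothetical helper, not statsmodels API, and it assumes the transformations have already produced the final arrays.

```python
import numpy as np

def has_object_dtype(*arrays):
    """Hypothetical check: True if any of the final model arrays has object dtype."""
    return any(np.asarray(a).dtype == object for a in arrays)

endog = np.ones(5)
# A single object column is enough to make the whole stacked exog object
exog = np.column_stack((np.ones(5, float), np.ones(5, object)))
print(exog.dtype)                     # object
print(has_object_dtype(endog, exog))  # True
```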

@josef-pkt josef-pkt added this to the 0.6 milestone Aug 10, 2014

@josef-pkt josef-pkt added the prio-high label Aug 10, 2014

@jseabold jseabold added the FAQ label Sep 20, 2014

@jseabold

Member

commented Sep 20, 2014

Not a blocker, I don't think. It's FAQ fodder, and a feature addition that will be addressed when we handle dtypes systematically.

@jseabold jseabold removed this from the 0.6 milestone Sep 20, 2014

@jseabold

Member

commented Sep 26, 2014

Closing as (close enough) duplicate of #880.

@jseabold jseabold closed this Sep 26, 2014
