anova_lm throws error on models created from api.ols but not formula.api.ols #1855
If I fit a linear regression using the array-based API, I get the following error:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-58bd3b88eadf> in <module>()
----> 1 anova_lm(model.fit())

/usr/local/lib/python3.4/site-packages/statsmodels/stats/anova.py in anova_lm(*args, **kwargs)
    324     if len(args) == 1:
    325         model = args
--> 326         return anova_single(model, **kwargs)
    327
    328     try:

/usr/local/lib/python3.4/site-packages/statsmodels/stats/anova.py in anova_single(model, **kwargs)
     61
     62     response_name = model.model.endog_names
---> 63     design_info = model.model.data.orig_exog.design_info
     64     exog_names = model.model.exog_names
     65     # +1 for resids

/usr/local/lib/python3.4/site-packages/pandas/core/generic.py in __getattr__(self, name)
   1841             return self[name]
   1842         raise AttributeError("'%s' object has no attribute '%s'" %
-> 1843                              (type(self).__name__, name))
   1844
   1845     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'design_info'
This does not occur when I fit the same OLS model with the formula.api.ols method. This is on Python 3.
@jeffmax What's your use case? What's the structure of your design matrix, exog?
To expand a bit on Skipper's answer:
If we don't have information about which columns represent the same underlying explanatory variable, then we can only look at one column at a time. In that case, a type 3 ANOVA is essentially the same as the t-tests in the params table of summary().
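A minimal NumPy sketch of that equivalence (plain least squares, not statsmodels code; the helper names `ols_fit`, `t_stats` and `drop_one_f` and the simulated data are hypothetical): when every column is its own term, the type-3-style test for a column compares the full model with the model that drops only that column, and the resulting F statistic equals the square of that coefficient's t statistic.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit; returns params, residual sum of squares, residual df."""
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ params
    return params, float(resid @ resid), X.shape[0] - X.shape[1]

def t_stats(X, y):
    """Coefficient t statistics, as reported in an OLS params table."""
    params, ssr, df_resid = ols_fit(X, y)
    cov = ssr / df_resid * np.linalg.inv(X.T @ X)
    return params / np.sqrt(np.diag(cov))

def drop_one_f(X, y, j):
    """F statistic for dropping column j alone (restricted vs. full model)."""
    _, ssr_full, df_full = ols_fit(X, y)
    _, ssr_restr, _ = ols_fit(np.delete(X, j, axis=1), y)
    # A single restriction, so the numerator has 1 degree of freedom.
    return (ssr_restr - ssr_full) / (ssr_full / df_full)

# Hypothetical simulated data: constant plus two regressors.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

t = t_stats(X, y)
for j in range(X.shape[1]):
    # The two numbers in each row agree: F for dropping column j equals t_j ** 2.
    print(j, round(drop_one_f(X, y, j), 6), round(t[j] ** 2, 6))
```

Once a variable spans several columns (for example the dummy columns of a categorical variable), the drop-one-column test no longer answers the question about the variable as a whole, which is why the term information from the formula matters.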
similar issues in planned features:
stepwise regression or similar:
anova_lm still has to be extended to other models, where we essentially have a list of models that are compared with different tests.
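For OLS, "a list of models compared with tests" can be sketched as sequential F tests between nested fits. This is a NumPy illustration under stated assumptions, not the statsmodels implementation: `ssr_df` and `compare_nested` are hypothetical helpers, the designs are assumed to be ordered from smallest to largest with each one nesting the previous, and every test uses the residual variance of the largest model as the denominator (a common convention).

```python
import numpy as np

def ssr_df(X, y):
    """Residual sum of squares and residual df of an OLS fit."""
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ params
    return float(resid @ resid), X.shape[0] - X.shape[1]

def compare_nested(designs, y):
    """F tests comparing each model with the previous, smaller one.

    Assumes `designs` is ordered so that each exog matrix nests the one
    before it; the scale comes from the largest (last) model.
    """
    fits = [ssr_df(X, y) for X in designs]
    ssr_big, df_big = fits[-1]
    scale = ssr_big / df_big
    rows = []
    for (ssr_small, df_small), (ssr_large, df_large) in zip(fits, fits[1:]):
        df_diff = df_small - df_large          # number of added columns
        ss_diff = ssr_small - ssr_large        # reduction in residual SS
        rows.append((df_diff, ss_diff, ss_diff / df_diff / scale))
    return rows

# Hypothetical data: x3 is irrelevant, x2 has a small effect.
rng = np.random.default_rng(2)
n = 60
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=n)
const = np.ones(n)
designs = [
    np.column_stack([const, x1]),
    np.column_stack([const, x1, x2]),
    np.column_stack([const, x1, x2, x3]),
]
for df_diff, ss_diff, f in compare_nested(designs, y):
    print(df_diff, round(ss_diff, 4), round(f, 4))
```

Other model families would swap the F test for, e.g., likelihood-ratio tests, which is the extension being discussed here.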
@josef-pkt Thanks for the explanation. I am a student taking a linear regression course where most of the instruction is given in terms of how things are done in Minitab. I use Python a lot at work and want to learn the statistics libraries, so I typically try to reproduce my Minitab results in Python. I don't believe we have gone over the different types of ANOVA or how derived terms are dropped; in fact, the way we are doing this in Minitab, I don't think it has any idea how the variables are constructed. If I want to do a second-order regression, I have to create a new column of data derived from the first-order column (squaring each value and putting the result in a new column), and then the regression is done on those two variables.
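The Minitab workflow described above maps directly onto an array-based fit. A small NumPy sketch (hypothetical, noise-free data so the coefficients are easy to check): the squared values go into their own column of the design matrix, and from the regression's point of view that column is just another regressor; nothing in the array records that it was derived from x.

```python
import numpy as np

# Hypothetical data: an exact quadratic, so the fit should recover 3.0, 0.5, -0.2.
x = np.linspace(0.0, 5.0, 11)
y = 3.0 + 0.5 * x - 0.2 * x ** 2

# Minitab-style: derive a new column by squaring x, then regress on both columns.
x_sq = x ** 2
X = np.column_stack([np.ones_like(x), x, x_sq])  # constant, linear, squared

params, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(params, 6))
```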
Part of the issue may be that this is just an introductory course on regression. We typically use the ANOVA to determine whether all of the exog variables are insignificant.
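That use case is the overall regression F test, which Minitab prints as its regression ANOVA table. A NumPy sketch of the calculation, assuming the constant is in column 0 of X (the helper name `regression_anova` and the data are hypothetical): it tests whether all slope coefficients are zero at once. The p-value would be the upper tail of an F distribution with (df regression, df error) degrees of freedom, e.g. via `scipy.stats.f.sf`; for a fitted statsmodels OLS model, the results object also exposes this statistic as `fvalue` and `f_pvalue`.

```python
import numpy as np

def regression_anova(X, y):
    """Overall ANOVA for OLS, assuming a constant in column 0 of X.

    Tests H0: every slope coefficient (all columns except the constant) is zero.
    """
    n, k = X.shape
    params, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ params
    ss_total = float(((y - y.mean()) ** 2).sum())
    ss_error = float(((y - fitted) ** 2).sum())
    ss_regression = ss_total - ss_error
    df_regression = k - 1          # slopes only, constant excluded
    df_error = n - k
    f_stat = (ss_regression / df_regression) / (ss_error / df_error)
    return {
        "SS regression": ss_regression, "df regression": df_regression,
        "SS error": ss_error, "df error": df_error,
        "F": f_stat,
    }

# Hypothetical simulated data with two regressors.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

table = regression_anova(X, y)
for name, value in table.items():
    print(f"{name:>14}: {value:.4f}")
```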
I first ran into this problem because I was using the regular (array-based) API's OLS for regression, since it was quicker than writing out a formula (and it was the first way I discovered to do it), but I kept getting the error when I tried to run the ANOVA on it. As @jseabold commented, I think it would be helpful if the error pointed the user towards the formula API instead of just showing a DataFrame attribute error.
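A sketch of the kind of check that could produce a friendlier message. `require_formula_results` is a hypothetical helper, not part of statsmodels; the attribute path it probes is copied from the traceback in this issue (`model.model.data.orig_exog.design_info`) and may differ in other statsmodels versions. It uses `getattr` with a default, so a pandas `DataFrame` without `design_info` yields `None` instead of an `AttributeError`.

```python
from types import SimpleNamespace

def require_formula_results(results):
    """Raise a readable error if `results` lacks formula term information.

    Hypothetical helper: the attribute path mirrors the traceback in this
    issue (results.model.data.orig_exog.design_info) and is an assumption.
    """
    model = getattr(results, "model", None)
    data = getattr(model, "data", None)
    orig_exog = getattr(data, "orig_exog", None)
    if getattr(orig_exog, "design_info", None) is None:
        raise TypeError(
            "anova_lm needs term information from a formula; fit the model "
            "with statsmodels.formula.api.ols (e.g. ols('y ~ x1 + x2', data=df)) "
            "instead of the array-based statsmodels.api.OLS."
        )
    return results

# Stand-ins for fitted results (hypothetical mocks, not real statsmodels objects).
formula_like = SimpleNamespace(model=SimpleNamespace(data=SimpleNamespace(
    orig_exog=SimpleNamespace(design_info=object()))))
array_like = SimpleNamespace(model=SimpleNamespace(data=SimpleNamespace(
    orig_exog=SimpleNamespace())))

require_formula_results(formula_like)  # passes silently
try:
    require_formula_results(array_like)
except TypeError as exc:
    print(exc)
```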
@jeffmax Thanks for the explanation.
We don't have the simple ANOVA table associated with a regression. Stata also reports it for linear regression.
Some other models, like the discrete models, have