TST: anova with `-1` noconstant, add tests #1770

Merged
merged 2 commits into from Jul 20, 2014

Projects

None yet

3 participants

@josef-pkt
Member

adds unit test for noconstant case for ANOVA

issue #1213 looks outdated, results in examples are the same as in R without changes

where/when was this fixed

AFAIU: df_model doesn't know anything about a constant
obviously it does now take implicit constants into account see last comment in #1213

@josef-pkt josef-pkt referenced this pull request Jun 18, 2014
Closed

df_diff in anova_lm #1213

@josef-pkt
Member

@jseabold Does this look ok to you. It's your code.

Do you know since when df_model subtracts 1 for an implicit constant, and where this is set?

@coveralls

Coverage Status

Coverage increased (+0.02%) when pulling 99b8217 on josef-pkt:anova_noconstant_1213 into d5c095a on statsmodels:master.

@jseabold
Member

I just used git bisect. It was fixed in d44e12b.

@jseabold jseabold and 1 other commented on an outdated diff Jun 18, 2014
statsmodels/stats/anova.py
@@ -354,6 +354,8 @@ def anova_lm(*args, **kwargs):
table["ssr"] = lmap(getattr, args, ["ssr"]*n_models)
table["df_resid"] = lmap(getattr, args, ["df_resid"]*n_models)
table.ix[1:]["df_diff"] = np.diff(lmap(getattr, args, ["df_model"]*n_models))
+ # issue #1213 if there are problems with constant, but examples are correct
+ #table.ix[1:]["df_diff"] = -np.diff(lmap(getattr, args, ["df_resid"]*n_models))
@jseabold
jseabold Jun 18, 2014 Member

Can you remove this?

@josef-pkt
josef-pkt Jun 18, 2014 Member

I'd like to switch to using df_resid
I don't trust df_model even if it works correctly in these cases.

@jseabold
jseabold Jun 18, 2014 Member

AFAICT df_resid doesn't account for the constant, so would not be correct in this case?

@josef-pkt
josef-pkt Jun 18, 2014 Member

That's the point, the degrees of freedom difference for the test only depends on the number of parameters and is independent of whether one of them is a constant or not, which would be in df_resid.
In constrast, df_model excludes the constant if we can identify hasconst=1. (so we need to watch out that we count all parameters. (df_model + hasconst)_modelA - (df_model + hasconst)_modelB

what I'm not sure about:
how does it work for the first comparison with "no regressors" at all if there is no constant. Given that it's an f_test it would be just that all coefficients are zero, including the constant if there is a constant.
This still assumes that we have a nested sequence of models, so either they all have an explicit or implicit constant or none of them does.

(aside: penalized models have fractional df_resid, but df_model should be adjusted accordingly, so it doesn't add any additional problems)

@coveralls

Coverage Status

Coverage increased (+0.79%) when pulling 106f5df on josef-pkt:anova_noconstant_1213 into d5c095a on statsmodels:master.

josef-pkt added some commits Jun 18, 2014
@josef-pkt josef-pkt TST: anova with `-1` noconstant, add tests 6f7b493
@josef-pkt josef-pkt REF: anova_lm: use diff(df_resid)
8db273a
@josef-pkt
Member

rebased, force pushed, ready for merge

@josef-pkt josef-pkt merged commit 88a4c51 into statsmodels:master Jul 20, 2014

2 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci The Travis CI build passed
Details
@josef-pkt josef-pkt deleted the josef-pkt:anova_noconstant_1213 branch Jul 20, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment