BUG: Nested ANOVA incorrect results -- wrong matrix definition #8506
Comments
Issue #8336 looks similar (wrong number of DOF) but does not mention the all-zero columns or the high condition number.
I just saw the stats.stackexchange question. Do you have the expected output? I never really figured out nested effects and the formulas for them. Trying out some things.
We can keep this issue because it has a complete example. Related possibility:
@josef-pkt yes, I think it does drop the collinear variables. In particular, looking into the DesignMatrix produced by Patsy, I see that some columns are all zeros: these are the interactions between the nested factors (say B) and the levels of A in which they do not appear. I tried dropping them manually, but then statsmodels complains that the DesignInfo no longer matches the DesignMatrix. Is there any way, when I build the model, to specify that collinear columns should be dropped?
Even with this change, it does not perform the nested ANOVA correctly.
Description of the bug
statsmodels.formula.api.ols run on data with nested factors (i.e. factors that are not completely crossed) gives incorrect results.
This problem has been reported at least twice independently
Code Sample
I'll replicate the code from example 2) above:
The interaction term A:B is reported with 33 DOF, which is only right if statsmodels crossed the factors nested in A1 with A2 as well.
Running model.summary() shows the condition number is very high:
which you would expect when all-zero columns are left in the model matrix.
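As a rough illustration of why an all-zero column inflates the condition number, the following sketch uses NumPy only, on a small random matrix that is purely hypothetical and unrelated to the data in this issue. It compares the condition number of a well-behaved design matrix with that of the same matrix after an all-zero column is appended.

```python
import numpy as np

# Hypothetical design matrix: an intercept column plus two random regressors.
rng = np.random.default_rng(0)
n = 12
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# The same matrix with one all-zero column appended, mimicking an
# interaction column for a factor combination that never occurs.
X_zero = np.column_stack([X, np.zeros(n)])

# The 2-norm condition number is the ratio of the largest to the smallest
# singular value; a zero column forces the smallest one to (numerically) zero.
cond_ok = np.linalg.cond(X)
cond_bad = np.linalg.cond(X_zero)

print("without zero column:", cond_ok)
print("with zero column:   ", cond_bad)
```

The second value is either infinite or extremely large, depending on floating-point rounding, which is consistent with a rank-deficient model matrix.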
Requested change
Add automatic detection and removal of all-zero columns (inappropriately crossed factors), or give the user the ability to drop them manually. As statsmodels currently works, any nested model fitted with OLS will give completely erroneous results.