df_model uses a separate svd to get rank of exog
we can use svd/pinv or qr directly to find the rank of exog, but it would need to be moved to fit instead of initialize.
do we need to check jacobian during WLS iteration in GLM and RLM ?
Related: For nonlinear models we know the rank of the Jacobian only in fit, not during __init__
I fixed a minor duplicate, but I see the following cases as easily improvable:
@josef-pkt if this is what you had in mind, I can submit a PR that does all three.
Thanks. That sounds good to me. We've punted on this, but we might also think about switching to QR by default. I can't remember the exact performance comparisons, but I think QR is O(k^3) in the best case and SVD is O(nk^2) for X - n x k. Does that sound right? We also re-use QR decomposition later (if available) for ANOVA.
@guyrt Thanks for looking into this.
Yes, the 3 points is what I had in mind.
A PR would be very good for this. I don't expect we have problems in linear models.
As a later followup, we can look at how it affects other possible subclasses, and classes that reuse the linear model in their estimation.
One difference, after moving df_model to the pinv in fit, is that rank will be based on the whitened exog, wexog. which might be a more appropriate choice if df_model based on exog is not the same as dv_model based on wexog.
maybe there are no cases,
(I was thinking about df_resid and cases where we have reduced number of observation, WLS/RLM when weights are zero, dropping initial conditions in GLSAR. - but that's independent of calculating df_model/rank.)
Switching to QR as default is is a separate issue and not just an internal change it has consequences for the user, that are backwards incompatible. We can switch to it, but it will break with singular matrices which are currently allowed.
More recent scipy (>=0.10 IIRC) have rank revealing pivoting QR.
#566 is issue for QR
I'll put off QR for now, but one thing I would note in pinv's favor is that we still need to compute the svd to get the condition number of exog (computed in RegressionResult).
Ok, I've pushed a set of changes that reduces number of calls to svd or eig from 3 to 1.
Ignore my previous comment about QR: computing the svd of R is very cheap. In particular, since R is square in the number of variables, the complexity of svd does not depend on nobs.
cross ref changes are in PR #1092
Does this PR now also contain the cond changes from the granger PR #1018 ?
This PR does not contain Granger changes. If we get this one merged in first, I can rebase the Granger changes and resolve any conflicts.