streamline linear algebra for linear model #1081

Closed
josef-pkt opened this Issue Sep 12, 2013 · 8 comments

Projects

None yet

3 participants

@josef-pkt
Member

df_model uses a separate svd to get rank of exog

we can use svd/pinv or qr directly to find the rank of exog, but it would need to be moved to fit instead of initialize.

do we need to check jacobian during WLS iteration in GLM and RLM ?

Related: For nonlinear models we know the rank of the Jacobian only in fit, not during __init__

@guyrt
Contributor
guyrt commented Sep 24, 2013

I fixed a minor duplicate, but I see the following cases as easily improvable:

  1. Move degrees of freedom and rank computations to fit, as you suggest. I've done this locally and it does not break tests.
  2. In QR case, use R to compute rank
  3. In pinv case, write our own pinv that returns the svd matrices. This way we can call svd once and use it for both rank and the pseudo-inverse.

@josef-pkt if this is what you had in mind, I can submit a PR that does all three.

@jseabold
Member

Thanks. That sounds good to me. We've punted on this, but we might also think about switching to QR by default. I can't remember the exact performance comparisons, but I think QR is O(k^3) in the best case and SVD is O(nk^2) for X - n x k. Does that sound right? We also re-use QR decomposition later (if available) for ANOVA.

@josef-pkt
Member

@guyrt Thanks for looking into this.

Yes, the 3 points is what I had in mind.

A PR would be very good for this. I don't expect we have problems in linear models.

As a later followup, we can look at how it affects other possible subclasses, and classes that reuse the linear model in their estimation.

One difference, after moving df_model to the pinv in fit, is that rank will be based on the whitened exog, wexog. which might be a more appropriate choice if df_model based on exog is not the same as dv_model based on wexog.
maybe there are no cases,
(I was thinking about df_resid and cases where we have reduced number of observation, WLS/RLM when weights are zero, dropping initial conditions in GLSAR. - but that's independent of calculating df_model/rank.)

Switching to QR as default is is a separate issue and not just an internal change it has consequences for the user, that are backwards incompatible. We can switch to it, but it will break with singular matrices which are currently allowed.

More recent scipy (>=0.10 IIRC) have rank revealing pivoting QR.

@guyrt
Contributor
guyrt commented Sep 24, 2013

#566 is issue for QR

@guyrt
Contributor
guyrt commented Sep 24, 2013

I'll put off QR for now, but one thing I would note in pinv's favor is that we still need to compute the svd to get the condition number of exog (computed in RegressionResult).

@guyrt
Contributor
guyrt commented Sep 26, 2013

Ok, I've pushed a set of changes that reduces number of calls to svd or eig from 3 to 1.

Ignore my previous comment about QR: computing the svd of R is very cheap. In particular, since R is square in the number of variables, the complexity of svd does not depend on nobs.

@josef-pkt
Member

cross ref changes are in PR #1092

Does this PR now also contain the cond changes from the granger PR #1018 ?

@guyrt
Contributor
guyrt commented Sep 26, 2013

This PR does not contain Granger changes. If we get this one merged in first, I can rebase the Granger changes and resolve any conflicts.

@jseabold jseabold closed this in #1092 Oct 24, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment