Fixed duplicate svd in RegressionModel #1092
Thanks, that's a good catch. This should also go into 0.5.1.
I made my comments on the commit instead of the PR (by accident). Looks good from what I have seen so far.
I've addressed all the comments from @josef-pkt in the previous commit. The major update is that df_xxx are now properties. If for some reason you want to update them between init and fit, the behavior hasn't changed.
I wasn't 100% satisfied with the way I was calculating rank in the QR case: I would rather still use a rank helper function for consistency. What I've done is pass a smaller matrix of equal rank to the rank() function so we still get a time savings. [EDIT: this paragraph is incorrect. Please ignore.]
Last thing to check: I'm currently using exog in QR and wexog in pinv. That's the way it was when I started this PR, but it isn't clear to me why we would want to solve with different matrices in those two cases. I think we meant wexog in the QR code path, but I want a +1 before I change that.
Just on the last point: exog is a local alias for self.wexog, when I checked this (confusing naming).
Yeah, I noticed that. The last commit fixes it.
I've rebased this to one commit for clarity.
@josef-pkt Unless you see anything you want to change, this branch is ready.
    if self._df_model is None:
        self._df_model = float(self.rank - self.k_constant)
    if self._df_resid is None:
        self.df_resid = self.nobs - self.rank
OK, I think I'm starting to see how this works.
If fit is called after __init__, then this sets df_model and df_resid, and the extra calculations in the properties are never done.
Sounds OK.
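The flow described in this comment can be sketched roughly as follows. This is a minimal illustration, not the statsmodels implementation: the class `LazyDfModel`, the helper `_compute_rank`, and the counter `rank_calls` are hypothetical names introduced here, and the rank is computed with `np.linalg.matrix_rank` rather than reused from the decomposition that `fit` performs in the PR.

```python
import numpy as np


class LazyDfModel(object):
    """Hypothetical stand-in for a regression model (not the statsmodels class)."""

    def __init__(self, exog, k_constant=1):
        self.exog = np.asarray(exog, dtype=float)
        self.nobs = float(self.exog.shape[0])
        self.k_constant = k_constant
        self.rank = None
        self._df_model = None
        self._df_resid = None
        self.rank_calls = 0  # illustration only: counts rank computations

    def _compute_rank(self):
        # Assumption: a plain rank computation stands in for the rank that
        # the PR obtains from the decomposition used in fit.
        self.rank_calls += 1
        return int(np.linalg.matrix_rank(self.exog))

    @property
    def df_model(self):
        # Only computed if nothing (fit or the user) has set it already.
        if self._df_model is None:
            if self.rank is None:
                self.rank = self._compute_rank()
            self._df_model = float(self.rank - self.k_constant)
        return self._df_model

    @df_model.setter
    def df_model(self, value):
        self._df_model = value

    @property
    def df_resid(self):
        if self._df_resid is None:
            if self.rank is None:
                self.rank = self._compute_rank()
            self._df_resid = self.nobs - self.rank
        return self._df_resid

    @df_resid.setter
    def df_resid(self, value):
        self._df_resid = value

    def fit(self):
        if self.rank is None:
            self.rank = self._compute_rank()
        # Same pattern as the reviewed hunk: fill in the values once, so the
        # property bodies never need to redo the work afterwards.
        if self._df_model is None:
            self._df_model = float(self.rank - self.k_constant)
        if self._df_resid is None:
            self.df_resid = self.nobs - self.rank
        return self
```

Calling `fit` first populates both values with a single rank computation; accessing `df_model` or `df_resid` before `fit` triggers the computation lazily, and assigning to either attribute overrides it.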
I liked it better when it had the original separate commits. Moving milestone to 0.6.
@@ -345,6 +345,29 @@ def isestimable(C, D):
    return True


def extendedpinv(X, rcond=1e-15):
Maybe a different name with pinv in front, so it's closer to a simple replacement in calculations: pinvext, pinvs, ...?
pinv_extended
Looks good and should give us a nice speedup. Ready to merge except for two suggested changes. We will also be able to use the new pinv function in some other places where we can use the additional information about singular values. @guyrt Thank you for this.
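To illustrate why returning the singular values avoids a second decomposition, here is a minimal sketch for real-valued matrices. It is not the function added in this PR: `pinv_with_singular_values` and `rank_from_singular_values` are hypothetical names, and the rank tolerance is assumed to follow the default used by `np.linalg.matrix_rank`.

```python
import numpy as np


def pinv_with_singular_values(x, rcond=1e-15):
    """Return (pseudo-inverse, singular values) from a single SVD.

    Hypothetical sketch for real matrices; not the PR's pinv_extended.
    """
    x = np.asarray(x, dtype=float)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Singular values at or below the cutoff are treated as zero.
    cutoff = rcond * s.max() if s.size else 0.0
    keep = s > cutoff
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # pinv = V * diag(s_inv) * U^T
    pinv = (vt.T * s_inv) @ u.T
    return pinv, s


def rank_from_singular_values(s, shape):
    """Rank from already-computed singular values, with no new decomposition.

    Assumption: same default tolerance as np.linalg.matrix_rank.
    """
    if s.size == 0:
        return 0
    tol = s.max() * max(shape) * np.finfo(s.dtype).eps
    return int(np.sum(s > tol))


# Usage: one SVD yields both the solver matrix and the rank.
x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
x_pinv, sing_vals = pinv_with_singular_values(x)
x_rank = rank_from_singular_values(sing_vals, x.shape)
```

The design point is that the caller keeps the singular values alongside the pseudo-inverse, so later quantities that depend on them (such as the rank) do not need another call to an svd or eigenvalue routine.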
Sorry about the commit squashing. It's the way they do things over in pandas. I've made the two changes. The rank computation in the two df computations now checks whether rank is already defined. I'm not familiar with statsmodels' caching system. Isn't there a ticket about improving it out there somewhere? It's probably worth taking a look at some point.
About the cache decorator: Our results classes are full of But we use it so far only for simple cases when an attribute is stored in a method. In your case, the caching is across methods, and your
Should we use the getter and setter decorators for properties, since Python 2.6 is our minimum version? I prefer them, though it's not a huge deal.
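For readers unfamiliar with the two spellings being compared here, a small sketch follows. The class names `CallStyle` and `DecoratorStyle` are hypothetical and only `df_resid` is shown; the `@name.setter` form has been available since Python 2.6.

```python
class CallStyle(object):
    """Property built with an explicit property(fget, fset) call."""

    def __init__(self):
        self._df_resid = None

    def _get_df_resid(self):
        return self._df_resid

    def _set_df_resid(self, value):
        self._df_resid = value

    df_resid = property(_get_df_resid, _set_df_resid)


class DecoratorStyle(object):
    """Same behavior, written with the getter and setter decorators."""

    def __init__(self):
        self._df_resid = None

    @property
    def df_resid(self):
        return self._df_resid

    @df_resid.setter
    def df_resid(self, value):
        self._df_resid = value
```

Both classes behave identically from the caller's side; the decorator form keeps the getter and setter under one name and avoids leaving helper methods such as `_get_df_resid` on the class.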
Changed name of extendedpinv to pinv_extended
Updated properties using decorators
@jseabold Agreed. Fixed.
Looks good to me. Merge?
+1 on my end.
Yes, good to merge.
ENH: Avoid duplicate svd in RegressionModel
For the future, we need to remember to request an update to the changes file in PRs like this to make release time easier on ourselves. @guyrt if you find the time/desire, could you send a PR to update the I'm imagining this can go under the
ENH: Avoid duplicate svd in RegressionModel
Closes #1081
Removes several duplicate calls to svd or eigenvalue routines. This should cause speedups.
Also removed a related TODO in the RegressionResults