Fixed duplicate svd in RegressionModel #1092

guyrt · 2013-09-24T01:26:01Z

Removes several duplicate calls to svd or eigenvalue routines. This should cause speedups.

Also removed a related TODO in the RegressionResults

coveralls · 2013-09-24T01:36:26Z

Coverage remained the same when pulling 12a4694 on guyrt:streamline-linearalgebra into 16365ed on statsmodels:master.

josef-pkt · 2013-09-24T03:13:46Z

Thanks, that's a good catch

This should also go into 0.5.1

coveralls · 2013-09-26T01:18:01Z

Coverage remained the same when pulling 602295e on guyrt:streamline-linearalgebra into 16365ed on statsmodels:master.

josef-pkt · 2013-09-26T02:34:19Z

made comments in commit instead of PR (by accident)

Looks good from what I have seen so far.
I don't understand the adjustments to the df in the unit tests.

guyrt · 2013-09-26T21:13:56Z

I've addressed all the comments from @josef-pkt in the previous commit.

The major update is that df_xxx are now properties. If for some reason you want to update them between init and fit, the behavior hasn't changed.

I wasn't 100% satisfied with the way i was calculating rank in the QR case: I would rather still use a rank helper function for consistency. What I've done is pass a smaller matrix of equal rank to the rank() function so we still get a time savings.

[EDIT: this paragraph is incorrect. Please ignore. ] Last thing to check: I'm currently using exog in QR and wexog in pinv. That's the way it was when I started this PR, but it isn't clear to me why we would want to be solving with different matrices in those two cases. I think we meant wexog in the QR code path but I want a +1 before I change that.

josef-pkt · 2013-09-26T21:22:26Z

just last point exog is a local alias for self.wexog, when I checked this (confusing naming)

guyrt · 2013-09-26T21:25:43Z

yeah I noticed that. The last commit fixes it.

coveralls · 2013-09-26T21:45:49Z

Coverage remained the same when pulling daae4a8 on guyrt:streamline-linearalgebra into 16365ed on statsmodels:master.

…sion fit.

guyrt · 2013-09-28T21:44:43Z

I've rebased this to one commit for clarity.

coveralls · 2013-09-28T21:52:26Z

Coverage remained the same when pulling 9e78d52 on guyrt:streamline-linearalgebra into 16365ed on statsmodels:master.

guyrt · 2013-10-13T22:36:00Z

@josef-pkt Unless you see anything you want to change, this branch is ready.

josef-pkt · 2013-10-22T23:12:21Z

statsmodels/regression/linear_model.py

+        if self._df_model is None:
+            self._df_model = float(self.rank - self.k_constant)
+        if self._df_resid is None:
+            self.df_resid = self.nobs - self.rank


Ok, I think I'm starting to see how this works.
If fit is called after __init__ then this sets df_model and df_resid and the extra calculations in the properties are never done.
sound ok

josef-pkt · 2013-10-22T23:20:31Z

I liked it better when it had the original separate commits.
I doubt we want to backport this to 0.5.1, there are now changes that might change the behavior too much for a bugfix release.

moving milestone to 0.6

josef-pkt · 2013-10-22T23:42:34Z

statsmodels/tools/tools.py

@@ -345,6 +345,29 @@ def isestimable(C, D):
    return True


+def extendedpinv(X, rcond=1e-15):


maybe a different name with pinv in front so it's closer to a simple replacement in calculations, pinvext pinvs ... ?

pinv_extended

josef-pkt · 2013-10-22T23:56:37Z

Looks good and should give us a nice speedup. Ready to merge except for a two suggested changes.

We will be able to use the new pinv function also in some other places where we can use the additional information about singular values.

@guyrt Thank you for this.
Sorry about the delay. I would prefer if we don't squash commits if they are separate logical units. It took me a bit of time to figure out what the new properties were doing.

guyrt · 2013-10-23T02:55:35Z

Sorry about the commit squashing. It's the way they do things over in pandas.

I've made the two changes. Rank computation in the two df computations now check whether rank is already defined.

I'm not familiar with statsmodels' caching system. Isn't there a ticket about improving it out there somewhere? It's probably worth taking a look at some point.

coveralls · 2013-10-23T03:01:29Z

Coverage remained the same when pulling cdc6830 on guyrt:streamline-linearalgebra into 16365ed on statsmodels:master.

josef-pkt · 2013-10-23T03:32:47Z

about the cache decorator:
It has been working very well for many years and we never made a change to the original version by Pierre GM.
The only open issue that I could find is #886. There are some problem how cache-decorated properties are handled by sphinx. It's not obvious in the generated docs that these use attribute access and are not methods. There were also some problems with overriding them in subclasses, but nothing serious or clear enough for an issue (most likely just the way python behaves).
The cache itself is a dictionary attached as _cache to the instance.

Our results classes are full of @cache_readonly decorated methods to calculate and store attributes on demand (lazily), for example the set of methods in the RegressionResults: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/regression/linear_model.py#L892

But we use it so far only for simple cases when an attribute is stored in a method. In your case, the caching is across methods, and your _df_model check is more explicit and easier to understand.

jseabold · 2013-10-23T10:52:10Z

Should we use the getter and setter decorators for properties since we have 2.6 as minimum Python? I prefer them, though it's not a huge deal.

Changed name of extendedpinv to pinv_extended Updated properties using decorators

guyrt · 2013-10-23T14:06:45Z

@jseabold Agreed. Fixed.

coveralls · 2013-10-23T14:15:31Z

Coverage increased (+0.008%) to 74.286% when pulling 162e0e3 on guyrt:streamline-linearalgebra into 84e7607 on statsmodels:master.

jseabold · 2013-10-23T14:59:09Z

Looks good to me. Merge?

guyrt · 2013-10-23T15:51:32Z

+1 on my end

josef-pkt · 2013-10-23T19:38:58Z

yes, good to merge

ENH: Avoid duplicate svd in RegressionModel

jseabold · 2013-10-24T10:27:42Z

For the future, we need to remember to request an update to the changes file in PRs like this to make release time easier on ourselves. @guyrt if you find the time/desire, could you send a PR to update the docs/source/release/version0.6.rst with a blurb about this change? You can look in version0.5.rst to get an idea of what we're after.

I'm imagining this can go under the Other important new features header.

ENH: Avoid duplicate svd in RegressionModel

josef-pkt mentioned this pull request Sep 26, 2013

streamline linear algebra for linear model #1081

Closed

ENH: Reduce number of duplicate linear algebraic operations in regres…

9e78d52

…sion fit.

josef-pkt reviewed Oct 22, 2013
View reviewed changes

Removed one more rank issue

162e0e3

Changed name of extendedpinv to pinv_extended Updated properties using decorators

jseabold added a commit that referenced this pull request Oct 24, 2013

Merge pull request #1092 from guyrt/streamline-linearalgebra

1bc6576

ENH: Avoid duplicate svd in RegressionModel

jseabold merged commit 1bc6576 into statsmodels:master Oct 24, 2013

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1092 from guyrt/streamline-linearalgebra

294b824

ENH: Avoid duplicate svd in RegressionModel

josef-pkt mentioned this pull request Jun 4, 2017

linear_model code paths #3740

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixed duplicate svd in RegressionModel #1092

Fixed duplicate svd in RegressionModel #1092

guyrt commented Sep 24, 2013

coveralls commented Sep 24, 2013

josef-pkt commented Sep 24, 2013

coveralls commented Sep 26, 2013

josef-pkt commented Sep 26, 2013

guyrt commented Sep 26, 2013

josef-pkt commented Sep 26, 2013

guyrt commented Sep 26, 2013

coveralls commented Sep 26, 2013

guyrt commented Sep 28, 2013

coveralls commented Sep 28, 2013

guyrt commented Oct 13, 2013

josef-pkt Oct 22, 2013

josef-pkt commented Oct 22, 2013

josef-pkt Oct 22, 2013

guyrt Oct 23, 2013

josef-pkt commented Oct 22, 2013

guyrt commented Oct 23, 2013

coveralls commented Oct 23, 2013

josef-pkt commented Oct 23, 2013

jseabold commented Oct 23, 2013

guyrt commented Oct 23, 2013

coveralls commented Oct 23, 2013 •

edited

Loading

jseabold commented Oct 23, 2013

guyrt commented Oct 23, 2013

josef-pkt commented Oct 23, 2013

jseabold commented Oct 24, 2013

		@@ -345,6 +345,29 @@ def isestimable(C, D):
		return True


		def extendedpinv(X, rcond=1e-15):

Fixed duplicate svd in RegressionModel #1092

Fixed duplicate svd in RegressionModel #1092

Conversation

guyrt commented Sep 24, 2013

coveralls commented Sep 24, 2013

josef-pkt commented Sep 24, 2013

coveralls commented Sep 26, 2013

josef-pkt commented Sep 26, 2013

guyrt commented Sep 26, 2013

josef-pkt commented Sep 26, 2013

guyrt commented Sep 26, 2013

coveralls commented Sep 26, 2013

guyrt commented Sep 28, 2013

coveralls commented Sep 28, 2013

guyrt commented Oct 13, 2013

josef-pkt Oct 22, 2013

Choose a reason for hiding this comment

josef-pkt commented Oct 22, 2013

josef-pkt Oct 22, 2013

Choose a reason for hiding this comment

guyrt Oct 23, 2013

Choose a reason for hiding this comment

josef-pkt commented Oct 22, 2013

guyrt commented Oct 23, 2013

coveralls commented Oct 23, 2013

josef-pkt commented Oct 23, 2013

jseabold commented Oct 23, 2013

guyrt commented Oct 23, 2013

coveralls commented Oct 23, 2013 • edited Loading

jseabold commented Oct 23, 2013

guyrt commented Oct 23, 2013

josef-pkt commented Oct 23, 2013

jseabold commented Oct 24, 2013

coveralls commented Oct 23, 2013 •

edited

Loading