Allow matrix structure in covariance matrices to be exploited #1643

Merged
merged 6 commits into from May 20, 2014

Projects

None yet

3 participants

@kshedden
Contributor

This PR provides adds a new method covariance_matrix_solve to the CovStruct class that solves a system of equations whose coefficient matrix is the covariance matrix represented by the CovStruct instance.

Currently, we construct this covariance matrix explicitly, and solve these systems with a general-purpose solver. However, for many of the dependence structures, the linear algebra can be optimized to exploit the structure of the matrix.

We provide a default implementation of covariance_matrix_solve that uses a general-purpose solver.

We override this with optimized methods for the independence, autoregressive, and exchangeable cases.

The speed improvement would be most noticeable when the cluster sizes are large.

Tests for the independence and exchangeable cases match results obtained from R (as did the earlier code).

We don't have a comparator for the autoregressive case, so I added a regression test. It agrees with the results from the previous (non-optimized) implementation.

@coveralls

Coverage Status

Coverage remained the same when pulling 45728d3 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@josef-pkt josef-pkt commented on an outdated diff Apr 28, 2014
statsmodels/genmod/generalized_estimating_equations.py
- vinv_d = spl.cho_solve(vco, dmat)
+ rslt = self.cov_struct.covariance_matrix_solve(expval, i,
+ sdev, (dmat, resid))
+ vinv_d, vinv_resid = tuple(rslt)
@josef-pkt
josef-pkt Apr 28, 2014 Member

rslt might be None
?

@coveralls

Coverage Status

Coverage remained the same when pulling 078eaae on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@josef-pkt josef-pkt commented on an outdated diff Apr 28, 2014
statsmodels/genmod/dependence_structures/covstruct.py
+ x = x[:, None]
+ flatten = True
+ x1 = x / sdev[:, None]
+
+ z0 = np.zeros((1, x.shape[1]))
+ rhs1 = np.concatenate((x[1:,:], z0), axis=0)
+ rhs2 = np.concatenate((z0, x[0:-1,:]), axis=0)
+
+ y = c0*x + c2*rhs1 + c2*rhs2
+ y[0, :] = c1*x[0, :] + c2*x[1, :]
+ y[-1, :] = c1*x[-1, :] + c2*x[-2, :]
+
+ y /= sdev[:, None]
+
+ if flatten:
+ y = y[:,0]
@josef-pkt
josef-pkt Apr 28, 2014 Member

np.squeeze would be safer I think.
np.squeeze will cause shape error later if y.shape[1] > 1
IIUC

@josef-pkt josef-pkt commented on an outdated diff Apr 28, 2014
statsmodels/genmod/dependence_structures/covstruct.py
def covariance_matrix(self, expval, index):
dim = len(expval)
return np.eye(dim, dtype=np.float64), True
+ def covariance_matrix_solve(self, expval, i, sdev, rhs):
@josef-pkt
josef-pkt Apr 28, 2014 Member

I would prefer stdev in similarity to std

@josef-pkt
Member

M-dependent is the only case I really worked my way through it. So I might postpone understanding this and just do a superficial review before merge.
Frank's m-dependent PR #1495 will need an update after this is merged.

@coveralls

Coverage Status

Coverage remained the same when pulling 4bf5117 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 69e1166 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 69e1166 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@kshedden
Contributor

Thanks for the comments. I think I've addressed everything.

The m-dependent structure needs a bit of cleanup. I will work with Frank on that.

@josef-pkt

for consistency with the generic (and readability), i -> index

@josef-pkt
Member

I'm not a big fan of rhs, but I don't see an alternative. Now with the docstring it's easy to understand.
We don't have a good terminology for this kind of linear equations. Linear restrictions use r_matrix and q_matrix or something like this, R b = q.
rhs is a bit confusing, because before the docstring I thought of y = x b and x is the "rhs", not y.

This looks like a clean branch according to the network tree, and can be merged with green button.

@josef-pkt josef-pkt commented on an outdated diff Apr 29, 2014
statsmodels/genmod/dependence_structures/covstruct.py
def covariance_matrix(self, expval, index):
dim = len(expval)
dp = self.dep_params * np.ones((dim, dim), dtype=np.float64)
return dp + (1 - self.dep_params) * np.eye(dim), True
+ def covariance_matrix_solve(self, expval, i, stdev, rhs):
+
+ k = len(expval)
+ c = self.dep_params / (1 - self.dep_params)
+ c /= 1 + self.dep_params * (k - 1)
+
+ rslt = []
+ for x in rhs:
+ if x.ndim == 1:
+ x1 = x / stdev
+ y = x1 / (1 - self.dep_params)
@josef-pkt
josef-pkt Apr 29, 2014 Member

side comment, doesn't need to be changed

in general, I still prefer floating point integers to signal that we are working with floats not integers.
(1. - self.dep_params)

(I still have a left-over habit to hunt for bugs caused by integer division.)

@josef-pkt
Member

one more: can you inherit the docstring for the methods in the subclasses? for update, covariance_matrix_solve, ...
e.g. generalized linear model has this
622: remove_data.__doc__ = base.LikelihoodModelResults.remove_data.__doc__

I mentioned this in Frank's PR

@josef-pkt
Member

I think this is about ready to merge.
Kerby, whenever you think it's ready, I will hit the merge button.

@kshedden
Contributor

Ready to merge now. Thanks for your help.

@josef-pkt josef-pkt merged commit fb2a7de into statsmodels:master May 20, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details
@josef-pkt
Member

merged

@kshedden kshedden deleted the kshedden:gee-linalg-refactor2 branch Jun 9, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment