Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow matrix structure in covariance matrices to be exploited #1643

Merged
merged 6 commits into from May 20, 2014

Conversation

kshedden
Copy link
Contributor

This PR provides adds a new method covariance_matrix_solve to the CovStruct class that solves a system of equations whose coefficient matrix is the covariance matrix represented by the CovStruct instance.

Currently, we construct this covariance matrix explicitly, and solve these systems with a general-purpose solver. However, for many of the dependence structures, the linear algebra can be optimized to exploit the structure of the matrix.

We provide a default implementation of covariance_matrix_solve that uses a general-purpose solver.

We override this with optimized methods for the independence, autoregressive, and exchangeable cases.

The speed improvement would be most noticeable when the cluster sizes are large.

Tests for the independence and exchangeable cases match results obtained from R (as did the earlier code).

We don't have a comparator for the autoregressive case, so I added a regression test. It agrees with the results from the previous (non-optimized) implementation.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 45728d3 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

vinv_d = spl.cho_solve(vco, dmat)
rslt = self.cov_struct.covariance_matrix_solve(expval, i,
sdev, (dmat, resid))
vinv_d, vinv_resid = tuple(rslt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rslt might be None
?

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 078eaae on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

y /= sdev[:, None]

if flatten:
y = y[:,0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.squeeze would be safer I think.
np.squeeze will cause shape error later if y.shape[1] > 1
IIUC

@josef-pkt
Copy link
Member

M-dependent is the only case I really worked my way through it. So I might postpone understanding this and just do a superficial review before merge.
Frank's m-dependent PR #1495 will need an update after this is merged.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 4bf5117 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 69e1166 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 69e1166 on kshedden:gee-linalg-refactor2 into b52bc09 on statsmodels:master.

@kshedden
Copy link
Contributor Author

Thanks for the comments. I think I've addressed everything.

The m-dependent structure needs a bit of cleanup. I will work with Frank on that.

@josef-pkt
Copy link
Member

I'm not a big fan of rhs, but I don't see an alternative. Now with the docstring it's easy to understand.
We don't have a good terminology for this kind of linear equations. Linear restrictions use r_matrix and q_matrix or something like this, R b = q.
rhs is a bit confusing, because before the docstring I thought of y = x b and x is the "rhs", not y.

This looks like a clean branch according to the network tree, and can be merged with green button.

for x in rhs:
if x.ndim == 1:
x1 = x / stdev
y = x1 / (1 - self.dep_params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side comment, doesn't need to be changed

in general, I still prefer floating point integers to signal that we are working with floats not integers.
(1. - self.dep_params)

(I still have a left-over habit to hunt for bugs caused by integer division.)

@josef-pkt
Copy link
Member

one more: can you inherit the docstring for the methods in the subclasses? for update, covariance_matrix_solve, ...
e.g. generalized linear model has this
622: remove_data.__doc__ = base.LikelihoodModelResults.remove_data.__doc__

I mentioned this in Frank's PR

@josef-pkt
Copy link
Member

I think this is about ready to merge.
Kerby, whenever you think it's ready, I will hit the merge button.

@kshedden
Copy link
Contributor Author

Ready to merge now. Thanks for your help.

josef-pkt added a commit that referenced this pull request May 20, 2014
REF: Allow matrix structure in covariance matrices to be exploited
@josef-pkt josef-pkt merged commit fb2a7de into statsmodels:master May 20, 2014
@josef-pkt
Copy link
Member

merged

@kshedden kshedden deleted the gee-linalg-refactor2 branch June 9, 2014 01:55
PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014
REF: Allow matrix structure in covariance matrices to be exploited
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants