ENH: OLS/WLS based on summary statistic, (cov_data) #3901

Open
josef-pkt opened this issue Sep 3, 2017 · 4 comments

Comments

@josef-pkt
Member

(I couldn't find an existing issue, but the idea is old.)

This is similar to #3570, but for the specific case of adding a model class for linear regression based on summary statistics such as the covariance matrix of the data. As a variation, the summary statistic can be a matrix decomposition of exog.

This can include multivariate OLS.
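A minimal sketch of the core computation, assuming the summary statistic is the uncentered moment matrix `zz = z.T @ z` with `z = column_stack([exog, endog])`. The function name `ols_params_from_moments` is hypothetical and not a statsmodels API. Because the normal equations only involve `x'x` and `x'y`, several endog columns can be solved at once, which gives multivariate OLS for free:

```python
import numpy as np


def ols_params_from_moments(zz, k_exog):
    """OLS params from zz = z.T @ z with z = [exog, endog] (hypothetical helper).

    endog may have several columns (multivariate OLS); the first k_exog
    rows/columns of zz belong to exog.
    """
    xx = zz[:k_exog, :k_exog]  # x'x
    xy = zz[:k_exog, k_exog:]  # x'y, one column per endog variable
    # solve the normal equations x'x b = x'y for all endog columns at once
    return np.linalg.solve(xx, xy)


rng = np.random.default_rng(0)
nobs = 200
x = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, 2))])
beta = np.array([[1.0, -1.0], [0.5, 2.0], [-0.3, 0.0]])  # 3 exog x 2 endog
y = x @ beta + 0.1 * rng.normal(size=(nobs, 2))

z = np.column_stack([x, y])
params = ols_params_from_moments(z.T @ z, k_exog=x.shape[1])

# agrees with least squares on the raw data
params_lstsq = np.linalg.lstsq(x, y, rcond=None)[0]
print(np.allclose(params, params_lstsq))  # True
```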

One use case is to use this with a non-standard covariance matrix, e.g. a robust (#3230) or penalized (#3197) cov_data estimate.
For a sparse model it is not clear whether, or when, we need a sparsified covariance or a sparsified inverse covariance.

Similarly, other multivariate methods can be based on a "modified" covariance estimate.

@sunilk747

Hello, I am new to this repo. I would like to work on this issue if nobody else is working on it.

@josef-pkt
Member Author

Hi, AFAIK nobody is working on this.

Just a warning: this requires being familiar with the statsmodels model and results class structure.

The rough idea is to create a new model class that takes as its only argument the cross-product matrix of column_stack((x, y)), where x needs to have the constant as its first column, and then to replicate as much as possible of the RegressionResults class using only the cross-product information.
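A minimal sketch of what such a model could compute from the cross-product matrix alone. The class name `CrossProductOLS` and the dict of results are hypothetical, with attribute names only loosely following RegressionResults; it assumes a single endog column last and the constant first. Putting the constant first is what makes `nobs` recoverable: the `[0, 0]` entry of `z'z` is the sum of ones.

```python
import numpy as np


class CrossProductOLS:
    """Hypothetical minimal model: OLS using only zz = z.T @ z.

    Assumes z = np.column_stack((x, y)) with the constant as the first
    column of x and a single endog variable y as the last column of z.
    """

    def __init__(self, zz):
        self.zz = np.asarray(zz, dtype=float)
        self.k_exog = self.zz.shape[0] - 1
        # the constant column is all ones, so (z'z)[0, 0] == nobs
        self.nobs = self.zz[0, 0]

    def fit(self):
        k = self.k_exog
        xx = self.zz[:k, :k]  # x'x
        xy = self.zz[:k, k]   # x'y
        yy = self.zz[k, k]    # y'y
        params = np.linalg.solve(xx, xy)
        # at the OLS solution the residual sum of squares is y'y - b'x'y
        ssr = yy - params @ xy
        df_resid = self.nobs - k
        scale = ssr / df_resid
        # nonrobust covariance of the params: scale * (x'x)^{-1}
        cov_params = scale * np.linalg.inv(xx)
        return {
            "params": params,
            "ssr": ssr,
            "df_resid": df_resid,
            "scale": scale,
            "cov_params": cov_params,
            "bse": np.sqrt(np.diag(cov_params)),
        }


rng = np.random.default_rng(12345)
n = 100
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = x @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)
z = np.column_stack([x, y])

res = CrossProductOLS(z.T @ z).fit()
resid = y - x @ res["params"]
print(np.isclose(res["ssr"], resid @ resid))  # True
```

Everything here is the nonrobust case; robust or sandwich cov_params would need more than `z'z`.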

@josef-pkt
Member Author

Bump for 0.11.
Maybe something quick, like OLSVectorized in the special linear model #5382.

Even just minimal model and results classes would already be helpful for quick experimentation
(extending this to other linalg decompositions of the moment matrices can be left for later).
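As an illustration of the decomposition variant, here is a sketch assuming the summary statistic is the upper-triangular Cholesky factor R of `zz = z.T @ z` (the function name is hypothetical). Partitioning R = [[R11, r12], [0, r22]] gives `x'x = R11'R11` and `x'y = R11'r12`, so the params solve a triangular system and the residual sum of squares is just `r22**2`. R from `np.linalg.qr(z)` is the same factor up to the signs of its rows, which leaves both results unchanged.

```python
import numpy as np


def ols_from_cholesky(zz):
    """Sketch: OLS params and ssr from the Cholesky factor of zz = z.T @ z.

    Assumes z = np.column_stack((x, y)) with a single endog column last.
    """
    # numpy returns lower-triangular L with zz == L @ L.T, so R = L.T is upper
    r = np.linalg.cholesky(zz).T
    k = r.shape[0] - 1
    r11 = r[:k, :k]  # x'x == r11.T @ r11
    r12 = r[:k, k]   # x'y == r11.T @ r12
    r22 = r[k, k]
    # params = (r11' r11)^{-1} r11' r12 = r11^{-1} r12 (a triangular system)
    params = np.linalg.solve(r11, r12)
    # y'y - b'x'y = (r12'r12 + r22^2) - r12'r12 = r22^2
    ssr = r22 ** 2
    return params, ssr


rng = np.random.default_rng(7)
n = 80
x = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = x @ np.array([0.5, 1.0, -1.0, 2.0]) + rng.normal(size=n)
z = np.column_stack([x, y])

params, ssr = ols_from_cholesky(z.T @ z)
resid = y - x @ params
print(np.allclose(params, np.linalg.lstsq(x, y, rcond=None)[0]),
      np.isclose(ssr, resid @ resid))  # True True
```

`np.linalg.solve` does not exploit the triangular structure; `scipy.linalg.solve_triangular` would.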

@josef-pkt
Member Author

Bump again.

After #8129 we can have outlier-robust OLS based on cov.

Two possible interfaces:

  • moment matrix z.T z, where z is column_stacked [exog, endog]
  • centered moments cov(z) plus mean(z); cov(z) could be a scatter/shape matrix for the OLS params
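A sketch of the centered interface, assuming x here excludes the constant (the function name `ols_from_cov` is hypothetical). The slopes are `Sxx^{-1} Sxy` and the intercept follows from the means. Any scalar factor on cov(z) cancels in the slopes, which is why a scatter/shape matrix is enough for the params; the example checks this by scaling the covariance by 5.

```python
import numpy as np


def ols_from_cov(cov_z, mean_z):
    """Sketch: OLS params from centered moments of z = [x, y], x WITHOUT constant.

    cov_z only needs to be known up to a scalar factor (scatter/shape matrix),
    because the factor cancels in the slope estimate.
    """
    k = cov_z.shape[0] - 1  # number of non-constant regressors
    sxx = cov_z[:k, :k]
    sxy = cov_z[:k, k]
    slopes = np.linalg.solve(sxx, sxy)
    # the intercept follows from the means: mean(y) = const + mean(x) @ slopes
    const = mean_z[k] - mean_z[:k] @ slopes
    return np.concatenate([[const], slopes])


rng = np.random.default_rng(0)
n = 150
x = rng.normal(size=(n, 2))
y = 2.0 + x @ np.array([0.7, -1.2]) + rng.normal(size=n)
z = np.column_stack([x, y])

params = ols_from_cov(np.cov(z, rowvar=False), z.mean(axis=0))
params_shape = ols_from_cov(5.0 * np.cov(z, rowvar=False), z.mean(axis=0))

exog = np.column_stack([np.ones(n), x])
params_lstsq = np.linalg.lstsq(exog, y, rcond=None)[0]
print(np.allclose(params, params_lstsq), np.allclose(params, params_shape))  # True True
```

Plugging in a robust or penalized cov(z) and mean(z) instead of the sample estimates would give the corresponding "modified" params.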

The question is how we compute cov_params, especially the robust scale (residual variance), and possibly sandwich covariances.

One possibility would be to use only the params for outlier-influence checks that look for more than a single outlier, in contrast to the current outlier-influence diagnostics, i.e. to allow a higher breakdown point for outlier and influential-point identification.
(I will open a separate issue.)
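For reference, a sketch of the nonrobust baseline only, assuming `cov_z = np.cov(z, rowvar=False)` (ddof=1) with x excluding the constant, and that nobs is known (the function name is hypothetical). The residual variance comes from the Schur complement `Syy - Sxy' Sxx^{-1} Sxy`, and the uncentered `x'x` including the constant is rebuilt as `[[n, n m'], [n m, (n-1) Sxx + n m m']]`. Unlike the params, this needs the actual scale of cov(z); with a robust scatter or shape matrix the Schur complement is only determined up to the scatter's own scaling, so obtaining a robust scale remains the open question.

```python
import numpy as np


def scale_and_cov_params_from_cov(cov_z, mean_z, nobs):
    """Nonrobust scale and cov_params of [const, slopes] from centered moments.

    Assumes cov_z = np.cov(z, rowvar=False) (ddof=1) of z = [x, y], x without
    constant, so unlike the params this needs the actual scale of cov_z and nobs.
    """
    k = cov_z.shape[0] - 1
    sxx = cov_z[:k, :k]
    sxy = cov_z[:k, k]
    syy = cov_z[k, k]
    slopes = np.linalg.solve(sxx, sxy)
    # Schur complement of sxx: residual variance of y given x, ddof=1 scaling
    ssr = (nobs - 1) * (syy - sxy @ slopes)
    df_resid = nobs - (k + 1)
    scale = ssr / df_resid
    # rebuild the uncentered x'x with constant: [[n, n m'], [n m, (n-1) S + n m m']]
    m = mean_z[:k]
    xx = np.empty((k + 1, k + 1))
    xx[0, 0] = nobs
    xx[0, 1:] = xx[1:, 0] = nobs * m
    xx[1:, 1:] = (nobs - 1) * sxx + nobs * np.outer(m, m)
    cov_params = scale * np.linalg.inv(xx)
    return scale, cov_params


rng = np.random.default_rng(1)
n = 120
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.4, -0.8]) + rng.normal(size=n)
z = np.column_stack([x, y])
scale, cov_params = scale_and_cov_params_from_cov(
    np.cov(z, rowvar=False), z.mean(axis=0), n)

# direct computation on the raw data for comparison
exog = np.column_stack([np.ones(n), x])
params = np.linalg.lstsq(exog, y, rcond=None)[0]
resid = y - exog @ params
scale_direct = resid @ resid / (n - 3)
cov_direct = scale_direct * np.linalg.inv(exog.T @ exog)
print(np.isclose(scale, scale_direct), np.allclose(cov_params, cov_direct))  # True True
```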
