ENH: add OLSVectorized, initial version #5382
base: main
r"""results class for vectorized OLS
"""

@cache_writable()
Why writable? This is used in very few places
return (wendog * wendog).sum(0)

def conf_int(self, alpha=.05, cols=None):
#print('using OLSVectorizedResults.conf_int')
Delete
Examples
--------
>>> import numpy as np
>>> import statsmodels.api as sm
If no examples, remove section
class OLSVectorized(OLS):
_results_class = OLSVectorizedResults
Doing this is better than setting it in init, but to make this work you had to put the model class below the results class, contrary to the pattern. Prettier to follow statespace/regimeswitching precedent and make this a property.
res2_list = self.res2_list

attrs = ['params', 'scale', 'bse', 'tvalues', 'pvalues', 'ssr', 'centered_tss',
Parametrize
Big picture, what’s the difference between VectorizedOLS and SUR? |
It's a special case with limited functionality that can be vectorized. OLSVectorized assumes a common exog and that cov_resid is diagonal or irrelevant for inference, and it provides inference only within each equation, not for the joint params. That is, it is just a computational improvement over running many (univariate endog) OLS in a loop.
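A minimal NumPy sketch of that equivalence, on simulated data; it does not use the `OLSVectorized` class from this PR, and all variable names are illustrative. It shows that one least-squares solve with a 2-D endog gives the same params and per-equation standard errors as a loop over columns.

```python
import numpy as np

rng = np.random.default_rng(0)
nobs, k_exog, k_endog = 50, 3, 4

# Common design matrix with a constant column, shared by all equations.
exog = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, k_exog - 1))])
endog = rng.normal(size=(nobs, k_endog))

# One least-squares solve for all columns of endog at once.
params = np.linalg.lstsq(exog, endog, rcond=None)[0]   # (k_exog, k_endog)
resid = endog - exog @ params
df_resid = nobs - k_exog

# The residual scale differs by column, so bse is a 2-D array.
scale = (resid ** 2).sum(0) / df_resid                 # (k_endog,)
xtx_inv_diag = np.diag(np.linalg.inv(exog.T @ exog))   # (k_exog,)
bse = np.sqrt(np.outer(xtx_inv_diag, scale))           # (k_exog, k_endog)

# Reference: the same quantities from a loop over columns.
for j in range(k_endog):
    p_j = np.linalg.lstsq(exog, endog[:, j], rcond=None)[0]
    r_j = endog[:, j] - exog @ p_j
    s_j = r_j @ r_j / df_resid
    assert np.allclose(params[:, j], p_j)
    assert np.allclose(bse[:, j], np.sqrt(xtx_inv_diag * s_j))
```

Nothing here uses the cross-equation residual covariance, which is why joint hypotheses across equations are out of scope for this approach.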
Application: residual bootstrap, where exog is the same for all bootstrap samples.
"The code implements the multiplier bootstrap efficiently by executing all 10,000 regressions simultaneously, exploiting the fact that the regressors are common across the bootstrap replications." Bruce E. Hansen 2017: Regression Kink With an Unknown Threshold
The bootstrap is used there to get simulated p-values for statistics that are not pivotal.
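As an illustration of why common regressors help, here is a generic multiplier-bootstrap sketch in NumPy. It is not Hansen's code and not part of this PR; the data-generating process, the N(0, 1) multipliers, and all names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(12345)
nobs, n_boot = 100, 1000

x = rng.normal(size=nobs)
exog = np.column_stack([np.ones(nobs), x])
y = 1.0 + 0.5 * x + rng.normal(size=nobs)

# Fit once on the original sample.
pinv_exog = np.linalg.pinv(exog)        # (k_exog, nobs), reused for every replication
params = pinv_exog @ y
resid = y - exog @ params

# Each replication rescales the residuals by i.i.d. N(0, 1) multipliers
# (an assumed choice of multiplier distribution).
multipliers = rng.normal(size=(nobs, n_boot))
y_boot = resid[:, None] * multipliers   # one column per bootstrap replication

# All n_boot regressions in a single matrix product, because exog is common.
params_boot = pinv_exog @ y_boot        # (k_exog, n_boot)
```

The pseudo-inverse is computed once; each additional replication costs only one more column in the matrix product.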
test for joint hypothesis |
Is this still planned for a merge? @josef-pkt
import statsmodels.api as sm
n_rows, n_cols = some_array.shape
X = sm.add_constant(temp)
for j in range(n_cols):
    Y = some_array[:, j]
    model = sm.OLS(Y, X)
    results = model.fit()
    # append results to some variable
This snippet is from a function that is called in another loop many times. It is very slow and I was wondering if there is another way I can speed this loop up?
@zoj613 Yes, the plan is still to merge this. The main open question is what should be in the results class, e.g. vectorized t-test. Your loop can be done in a single regression which should be much faster unless there are memory problems with huge |
X is a 1d array whose size is equal to the number of rows of |
You don't have a constant in your regression? The basic idea is the
The main problem is that the residual scale and standard errors are different for each y, which needs some workarounds to get, for example, pvalues.
If you have only one explanatory variable, then explicit formulas are likely faster. I only looked at multiple regression here. |
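For the single-regressor case, a sketch of the explicit formulas (slope as centered cross-product over centered sum of squares, intercept from the means), assuming a constant is included. The data are simulated and the names `x` and `some_array` only mirror the question above.

```python
import numpy as np

rng = np.random.default_rng(1)
nobs, n_cols = 200, 5
x = rng.normal(size=nobs)                          # single explanatory variable
some_array = 2.0 * x[:, None] + rng.normal(size=(nobs, n_cols))

# Center x and every column of some_array.
x_c = x - x.mean()
y_c = some_array - some_array.mean(0)

# Closed-form simple regression, one slope and intercept per column.
slope = x_c @ y_c / (x_c @ x_c)                    # (n_cols,)
intercept = some_array.mean(0) - slope * x.mean()  # (n_cols,)

# Cross-check against a general least-squares solve with a constant column.
exog = np.column_stack([np.ones(nobs), x])
params = np.linalg.lstsq(exog, some_array, rcond=None)[0]
```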
04aa547 to 5f5f3b3
Thanks for the suggestion. I was able to get a fast solution by just using the matrix OLS formulation and take advantage of numpy's broadcasting rules. The rsquared and pvalues were fairly easy to compute by hand using scipy's t distribution and formulae for the various sums of squares. |
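A sketch of what such a hand-rolled computation might look like; this is not the commenter's actual code. The data are simulated, and the attribute-like names (`ssr`, `centered_tss`, `bse`, `tvalues`, `pvalues`) are borrowed from the test list in this PR for readability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
nobs, k_endog = 60, 3
exog = np.column_stack([np.ones(nobs), rng.normal(size=(nobs, 2))])
true_params = rng.normal(size=(exog.shape[1], k_endog))
endog = exog @ true_params + rng.normal(size=(nobs, k_endog))

# Matrix OLS: one pseudo-inverse applied to all columns of endog.
params = np.linalg.pinv(exog) @ endog              # (k_exog, k_endog)
resid = endog - exog @ params
df_resid = nobs - exog.shape[1]

# Sums of squares and R-squared, one value per column of endog.
ssr = (resid ** 2).sum(0)
centered_tss = ((endog - endog.mean(0)) ** 2).sum(0)
rsquared = 1 - ssr / centered_tss

# Standard errors via broadcasting: (k_exog, 1) * (k_endog,) -> (k_exog, k_endog).
scale = ssr / df_resid
xtx_inv_diag = np.diag(np.linalg.inv(exog.T @ exog))
bse = np.sqrt(xtx_inv_diag[:, None] * scale)

# Two-sided p-values from the t distribution.
tvalues = params / bse
pvalues = 2 * stats.t.sf(np.abs(tvalues), df_resid)
```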
Rebased. All main attributes agree with the loop at rtol 1e-13; I had one round with a failure when agreement was only around 5e-13.
It should be easy to merge if we raise NotImplementedError in wald_test and similar methods. What about other methods, e.g. get_prediction?
This pull request introduces 1 alert when merging 5f5f3b3 into 152e27d - view on LGTM.com new alerts:
t_test has unit tests, so only joint hypotheses are not supported.
No checks and no pandas results wrapper; I might skip that here.
summary works but only has the params table.
Update in my example:
see #2203 adding special_linear_model
see #4771 adding vectorized OLS
This subclasses OLS and RegressionResults, but only a subset of results will work.
The core attributes are working correctly.
Among the methods, I have only checked conf_int and t_test so far.