ENH: add rsquared for statespace model #4734 #6620
Hello @BenjaminLiuPenrose! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-17 23:11:02 UTC |
Codecov Report
@@ Coverage Diff @@
## master #6620 +/- ##
==========================================
- Coverage 85.31% 85.3% -0.01%
==========================================
Files 646 646
Lines 103922 103926 +4
Branches 11311 11311
==========================================
+ Hits 88657 88658 +1
- Misses 12806 12809 +3
Partials 2459 2459
Continue to review full report at Codecov.
|
where should I write the test case to pass the |
Thanks very much for submitting this PR, @BenjaminLiuPenrose! This will be a really nice feature to have. I have a couple of suggestions:
Thanks again! |
Sure. Will work on that |
Note: there was a typo in my code computing the |
@ChadFulton how about now? |
That's great, thanks! I will have a chance to make a couple of minor comments on the code itself hopefully later today. The only major thing remaining is coming up with some unit tests. |
sure;) |
@ChadFulton any updates? |
I have a couple of comments, but the main thing now is to get some unit tests. Are you familiar with unit testing? Do you know of any resources we could test this against, or have you thought about other ways that we could write some tests? |
@ChadFulton guess I will add but what I can do for now is to check that r2 is displayed, but I don't know the 'ground truth' value for each r2 wrt to the also, can you point me to how to extend the r2 metric to the multi-endog variables problem? |
@ChadFulton test case added, it is a trivial case |
@ChadFulton not sure how to print R2 for multivariate case |
This is looking good, thanks!
I agree that this is a tough call. Currently for other output, like the test statistics, I just print the list, but that is not very pretty. One option would be to create a table in the summary output that displays the R2 for each |
I'm now printing it as np array |
any ideas why the ci/appveyor/pr test failed? |
I'm not sure, but it seems like it's unrelated. On a side note - this is looking good, and I'm sorry for the delay - it will probably take me a day or two to get back to this, based on other time commitments. |
sure, np |
@ChadFulton updates? |
I've added some more comments. The only big thing that we need to do before this can be merged is to add unit tests for the `rwdrift` and `seasonal` cases.
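One possible way to get hand-checkable reference values for an `rwdrift` test is sketched below. It assumes the baseline follows Harvey's random-walk-plus-drift definition, R² = 1 - SSE / Σ(Δy_t - mean(Δy))², which this thread does not spell out. The function name and its arguments are hypothetical and are not the PR's API.

```python
def rsquared_rwdrift_sketch(endog, resid):
    """Hypothetical helper: R-squared against a random walk with drift.

    `resid` holds one-step-ahead prediction errors for observations 2..T.
    """
    # First differences of the observed series.
    diffs = [b - a for a, b in zip(endog[:-1], endog[1:])]
    # The drift estimate is the mean first difference.
    drift = sum(diffs) / len(diffs)
    # Sum of squares left unexplained by the baseline model.
    ss_baseline = sum((d - drift) ** 2 for d in diffs)
    sse = sum(r ** 2 for r in resid)
    return 1.0 - sse / ss_baseline


endog = [1, 2, 4, 8, 16]
diffs = [b - a for a, b in zip(endog[:-1], endog[1:])]
drift = sum(diffs) / len(diffs)

# A model that forecasts y[t-1] + drift is the baseline itself.
r2_baseline = rsquared_rwdrift_sketch(endog, [d - drift for d in diffs])
# A model with zero prediction errors explains everything.
r2_perfect = rsquared_rwdrift_sketch(endog, [0.0] * len(diffs))
```

Under this assumption, a model whose forecasts coincide with the baseline should score zero and a model with no prediction error should score one, which gives two anchor values that do not depend on an external package.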
@@ -2894,6 +2894,76 @@ def zvalues(self):
        """
        return self.params / self.bse

    def get_rsquared(self, baseline="rwdrift", **kwargs):
We can add `kwargs` later if necessary, but we should avoid it unless we actually need to capture unknown keyword arguments (it can lead to problems with, e.g., misspelled keyword arguments).
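The misspelled-keyword problem can be illustrated with a minimal sketch. Both functions below are hypothetical stand-ins, not the statsmodels methods: the first accepts `**kwargs` and silently ignores a typo, while the second rejects it.

```python
def get_rsquared_loose(baseline="rwdrift", **kwargs):
    # Hypothetical stand-in: unknown keywords are silently collected in kwargs.
    return baseline


def get_rsquared_strict(baseline="rwdrift"):
    # Hypothetical stand-in: only the documented keyword is accepted.
    return baseline


# The typo "baselin" goes unnoticed, so the default baseline is used.
loose_result = get_rsquared_loose(baselin="mean")

# Without **kwargs, the same typo is reported immediately.
try:
    get_rsquared_strict(baselin="mean")
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)
```

In the loose version the caller asked for the mean baseline and got the default one without any warning, which is the failure mode the review comment describes.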
    endog = np.array([1, 2, 4, 8, 16])
    exog = np.array([1, 2, 4, 8, 16])

    mod = sarimax.SARIMAX(endog, exog, order=(0, 0, 0), trend='c')
I don't think that `trend='c'` is doing anything here, so we may as well remove it.
    exog = np.array([1, 2, 4, 8, 16])

    mod = sarimax.SARIMAX(endog, exog, order=(0, 0, 0), trend='c')
    res = mod.fit(disp=-1)
You don't need to `fit` the model - instead, you could put this after you fit the benchmark model and use:
res = mod.smooth(benchmark_res.params)
def test_summary_rsquared():
    from statsmodels.regression.linear_model import OLS
    endog = np.array([1, 2, 4, 8, 16])
    exog = np.array([1, 2, 4, 8, 16])
This shouldn't be a perfect fitting model, so you should make exog not identical to endog. Also, so that the OLS R^2 matches, exog needs to include a constant column.
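A small pure-Python sketch shows why identical `endog` and `exog` make a weak test: the regression fits perfectly and R² is trivially 1. The helper below is hypothetical, fits a line with an intercept (the role of the constant column), and uses illustrative `exog` values that are not taken from the PR.

```python
def ols_rsquared(x, y):
    """R-squared of the least-squares line y = a + b * x (intercept included)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Closed-form slope and intercept for simple linear regression.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and total sum of squares around the mean.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ssm = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / ssm


endog = [1, 2, 4, 8, 16]

# exog identical to endog: a perfect fit, so the test cannot catch much.
r2_identical = ols_rsquared(endog, endog)

# An exog that differs from endog (illustrative values) gives a
# non-trivial R-squared strictly between 0 and 1.
exog = [1, 2, 3, 4, 5]
r2_distinct = ols_rsquared(exog, endog)
```

A test built on the second case would exercise the `sse` and `ssm` terms with values that actually differ from the degenerate ones.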
    endog = np.array([1, 2, 4, 8, 16])
    exog = np.array([1, 2, 4, 8, 16])

    mod = sarimax.SARIMAX(endog, exog, order=(0, 0, 0), trend='c')
Add `concentrate_scale=True` so you don't have to estimate the variance parameter.
Yeah, I agree. What do you think would be a good test case for |
            rsquared = 1. - sse / ssm
        elif baseline == "seasonal":
            from statsmodels.regression.linear_model import OLS
            from statsmodels.tools.tools import add_constant
You could just use AutoReg which directly supports this model specification at a high level.
statsmodels.tsa.ar_model.AutoReg(endog, 0, trend='c', seasonal=True, period=seasonal)
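For intuition about what such a specification fits: with zero lags, a constant, and a full set of seasonal dummies, the least-squares fitted value for each observation is the sample mean of its season. The sketch below reproduces that in pure Python. The helper name is hypothetical, and whether the PR applies the baseline to levels or to differences is not stated in this thread.

```python
def seasonal_mean_fit(endog, period):
    """Hypothetical helper: fitted values from per-season sample means."""
    sums = [0.0] * period
    counts = [0] * period
    # Accumulate totals and counts for each position in the seasonal cycle.
    for t, y in enumerate(endog):
        sums[t % period] += y
        counts[t % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    # Every observation is fitted by the mean of its own season.
    return [means[t % period] for t in range(len(endog))]


endog = [1, 10, 3, 12, 5, 14]
fitted = seasonal_mean_fit(endog, period=2)

# Sum of squares left unexplained by the seasonal-mean baseline.
ss_seasonal = sum((y - f) ** 2 for y, f in zip(endog, fitted))
```

A value like `ss_seasonal` could then serve as the denominator of a seasonal R², and the same numbers can be used to cross-check whatever the high-level model reports.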
        However, we have implemented `rsquared_rwdrift` and `rsquared_mean`
        It is recommended to use `rsquared_rwdrift`
        """
        return self.get_rsquared('')
Doesn't this produce an error?
If the error is intentional, you should directly raise NotImplementedError here. You can pull the common message outside the class to share it.
    @cache_readonly
    def rsquared_mean(self):
        """
        (float or array) conventional R-squared, 1 - sse/ssm
The docstrings all have the wrong format. Could you please update them to NumPy style?
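For reference, a NumPy-style docstring might look like the sketch below. It is written as a free function so that the `Parameters` and `Returns` sections can be shown. The signature is hypothetical and differs from the cached attribute in the PR, and only the `1 - sse / ssm` formula comes from the diff.

```python
def rsquared_mean(endog, resid):
    """
    Conventional R-squared relative to the sample mean.

    Parameters
    ----------
    endog : sequence of float
        Observed values.
    resid : sequence of float
        Model residuals, aligned with `endog`.

    Returns
    -------
    float
        The value ``1 - sse / ssm``, where ``sse`` is the residual sum of
        squares and ``ssm`` is the sum of squared deviations of `endog`
        from its mean.
    """
    mean = sum(endog) / len(endog)
    sse = sum(r ** 2 for r in resid)
    ssm = sum((y - mean) ** 2 for y in endog)
    return 1.0 - sse / ssm


# Zero residuals give a perfect score.
r2_perfect = rsquared_mean([1, 2, 3], [0, 0, 0])
# Residuals equal to the deviations from the mean give zero.
r2_mean_only = rsquared_mean([1, 2, 3], [-1, 0, 1])
```

The underlined section headers and the `name : type` parameter lines are the parts that documentation tooling for NumPy-style docstrings relies on.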
Overall pretty close and a useful contribution. It would be good to get this across the line. |
NumPy's guide.
Notes:
needed for doc changes.
then show that it is fixed with the new code.
verify your changes are well formatted by running
`flake8`
is installed. This command is also available on Windows using the Windows Subsystem for Linux once `flake8` is installed in the local Linux environment. While passing this test is not required, it is good practice and it helps improve code quality in `statsmodels`.