Add direct proportion option to statistics.linear_regression() #89927
Signature:

    def linear_regression(x, y, /, *, proportional=False):

Additional docstring with example:

    y = slope * x + noise

    >>> y = [3 * x[i] + noise[i] for i in range(5)]
    >>> linear_regression(x, y, proportional=True)  #doctest: +ELLIPSIS
    LinearRegression(slope=3.0244754248461283, intercept=0.0)

See Wikipedia entry for regression without an intercept term:

Compare with the *const* parameter in MS Excel's linest() function:

Compare with the *IncludeConstantBasis* option in Mathematica:
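For readers new to regression through the origin: with the intercept fixed at zero, least squares minimizes sum((y - slope*x)**2), which gives slope = sum(x*y) / sum(x*x). The sketch below illustrates only that formula; it is not the statistics module's implementation, and the names `fit_through_origin` and `Fit` are hypothetical.

```python
from collections import namedtuple
from math import fsum

# Hypothetical result type for this sketch (not statistics.LinearRegression).
Fit = namedtuple("Fit", ["slope", "intercept"])

def fit_through_origin(x, y):
    """Least-squares line forced through (0, 0): y = slope * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sxx = fsum(xi * xi for xi in x)
    if sxx == 0:
        # Covers empty input and x values that are all zero.
        raise ValueError("x must contain at least one nonzero value")
    sxy = fsum(xi * yi for xi, yi in zip(x, y))
    # Setting d/d(slope) of sum((y - slope*x)**2) to zero gives sxy / sxx.
    return Fit(slope=sxy / sxx, intercept=0.0)

x = [1, 2, 3, 4, 5]
y = [3, 6, 9, 12, 15]
print(fit_through_origin(x, y))  # Fit(slope=3.0, intercept=0.0)
```

On Python 3.11 and later, `statistics.linear_regression(x, y, proportional=True)` is the supported way to get this fit; the hand-rolled version is only meant to show where the slope comes from.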
Hi Raymond, I'm conflicted by this. Regression through the origin is clearly a thing which is often desired. In that sense, I'm happy to see it added, and thank you. But on the other hand, this may open a can of worms that I personally don't feel entirely competent to deal with. Are you happy to hold off a few days while I consult with some statistics experts?
https://web.ist.utl.pt/~ist11038/compute/errtheory/,regression/regrthroughorigin.pdf
https://www.theanalysisfactor.com/regression-through-the-origin/
https://pubs.cif-ifc.org/doi/pdf/10.5558/tfc71326-3

But it's not clear how to revise the calculation of R squared, with some methods giving values that are negative or greater than 1.
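To make the R squared concern concrete: for a fit through the origin, one convention divides the residual sum of squares by the centered total sum((y - mean(y))**2), another by the uncentered total sum(y**2). The sketch below uses made-up data with a large true intercept; the centered version goes negative while the uncentered version still looks respectable. The helper names are hypothetical, and neither convention is presented as the correct one.

```python
from math import fsum
from statistics import fmean

def rto_slope(x, y):
    # Least-squares slope with the intercept fixed at zero.
    return fsum(a * b for a, b in zip(x, y)) / fsum(a * a for a in x)

def rto_r_squared(x, y):
    """Return (centered, uncentered) R squared for a fit through the origin."""
    slope = rto_slope(x, y)
    sse = fsum((b - slope * a) ** 2 for a, b in zip(x, y))
    ybar = fmean(y)
    sst_centered = fsum((b - ybar) ** 2 for b in y)
    sst_uncentered = fsum(b * b for b in y)
    return 1 - sse / sst_centered, 1 - sse / sst_uncentered

# Made-up data lying exactly on y = x + 9, far from the origin.
x = [1, 2, 3, 4, 5]
y = [10, 11, 12, 13, 14]
centered, uncentered = rto_r_squared(x, y)
print(round(centered, 3), round(uncentered, 3))  # -6.364 0.899
```

The centered value is negative because the through-origin line leaves more residual variation than simply predicting mean(y); the uncentered value avoids that but measures fit against zero rather than against the mean, so it is not comparable with the usual OLS R squared.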
Sure, I’m happy to wait. My thoughts:
It usually isn't wise to be preachy in the docs, but we could add a suggestion that proportional=True be used only when (0, 0) is known to be in the dataset and when it is in the same neighborhood as the other data points. A reasonable cross-check would be to verify that a plain OLS regression produces an intercept near zero.

    linear_regression(hours_since_poll_started, number_of_respondents, proportional=True)
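The cross-check suggested above (fit an ordinary least-squares line first and see whether its intercept is close to zero) could look like the sketch below. What counts as "near zero" is a judgment call: the tolerance here, a fraction of the largest |y|, is an arbitrary assumption, and a more careful check would look at the intercept's standard error. Both helper names are hypothetical.

```python
from math import fsum
from statistics import fmean

def ols(x, y):
    """Ordinary least squares with an intercept: return (slope, intercept)."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = fsum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = fsum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def intercept_near_zero(x, y, rel_tol=0.05):
    # Hypothetical heuristic: compare |intercept| with the scale of the y values.
    _, intercept = ols(x, y)
    return abs(intercept) <= rel_tol * max(abs(b) for b in y)

print(intercept_near_zero([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))  # True
print(intercept_near_zero([1, 2, 3, 4, 5], [10, 11, 12, 13, 14]))       # False
```

In the first dataset the OLS intercept is about 0.05, so forcing the line through the origin costs little; in the second it is 9, a sign that proportional=True would misrepresent the data.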
Hi Raymond,

I'm satisfied that this should be approved. The code looks good to me.

I don't think there is any need to verify that plain OLS regression ...

Regarding my concern with the coefficient of determination, I don't ...

For the record, an example of the problem can be seen on the last slide ...

The computed r**2 of 1.0 is clearly too high for the RTO line.
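The slide itself isn't reproduced here, so the following is only an assumed illustration of how an r**2 of 1.0 can sit alongside a poor RTO line: if r**2 is taken as the squared Pearson correlation, it describes the best-fitting line *with* an intercept and says nothing about the line forced through the origin. With made-up data lying exactly on y = x + 9, r**2 is 1.0, yet the through-origin line misses one point by more than 6 units. The helper `pearson_r` is written out by hand for this sketch.

```python
from math import fsum, sqrt
from statistics import fmean

def pearson_r(x, y):
    # Sample Pearson correlation coefficient.
    xbar, ybar = fmean(x), fmean(y)
    sxy = fsum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = fsum((a - xbar) ** 2 for a in x)
    syy = fsum((b - ybar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [10, 11, 12, 13, 14]  # exactly y = x + 9

r_squared = pearson_r(x, y) ** 2
# Slope of the least-squares line through the origin: sum(x*y) / sum(x*x).
slope = fsum(a * b for a, b in zip(x, y)) / fsum(a * a for a in x)
worst_miss = max(abs(b - slope * a) for a, b in zip(x, y))

print(r_squared)             # 1.0
print(round(worst_miss, 2))  # 6.55
```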
Thanks for looking at this and giving it some good thought.