Is the r-value outputted by scipy.stats.linregress always the Pearson correlation coefficient? #14416

Closed
veeara282 opened this issue Jul 14, 2021 · 1 comment · Fixed by #14458
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats
Comments

@veeara282
Contributor

The documentation just says that rvalue is the "Correlation coefficient" but there are many different correlation coefficients, not just the Pearson one. However, it also says that the square of rvalue is the coefficient of determination, and according to Wikipedia:

When an intercept is included, then r^2 is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values.[4] If additional regressors are included, R^2 is the square of the coefficient of multiple correlation.

Is the output always the Pearson coefficient, or is it sometimes the coefficient of multiple correlation? We should clarify that in the documentation.

@charlotte12l added the Documentation and scipy.stats labels Jul 16, 2021
@charlotte12l
Contributor

charlotte12l commented Jul 16, 2021

In scipy.stats.linregress, r = ssxym / sqrt(ssxm * ssym), so I think it is the Pearson coefficient.
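
A quick numerical check of this, as a sketch rather than a copy of the SciPy source: the names `ssxm`, `ssym`, and `ssxym` below only mirror the ones in the comment above and are computed here as plain sums of centered products (any common normalization factor cancels in the ratio). The data are synthetic and chosen purely for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

result = stats.linregress(x, y)

# Pearson r written out by hand: centered cross-product over the
# geometric mean of the centered sums of squares.
xm, ym = x.mean(), y.mean()
ssxm = np.sum((x - xm) ** 2)
ssym = np.sum((y - ym) ** 2)
ssxym = np.sum((x - xm) * (y - ym))
r_manual = ssxym / np.sqrt(ssxm * ssym)

# Pearson r from scipy.stats.pearsonr (first element is the statistic).
r_pearson = stats.pearsonr(x, y)[0]

print(np.isclose(result.rvalue, r_manual))
print(np.isclose(result.rvalue, r_pearson))
```

On this data all three values should agree to floating-point precision, which is consistent with rvalue being the Pearson coefficient.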

However, it also says that the square of rvalue is the coefficient of determination, and according to Wikipedia:

When an intercept is included, then r^2 is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values.[4] If additional regressors are included, R^2 is the square of the coefficient of multiple correlation.

Is the output always the Pearson coefficient, or is it sometimes the coefficient of multiple correlation? We should clarify that in the documentation.

According to Wikipedia, section 1.3, "As squared correlation coefficient":

In linear least squares multiple regression with an estimated intercept term, R^2 equals the square of the Pearson correlation coefficient between the observed y and modeled (predicted) f data values of the dependent variable.
In a linear least squares regression with an intercept term and a single explanator, this is also equal to the squared Pearson correlation coefficient of the dependent variable y and explanatory variable x.

scipy.stats.linregress is a linear least-squares regression with an intercept and a single explanatory variable, so the square of rvalue (the Pearson coefficient) is equal to R^2 (the coefficient of determination).
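
To illustrate that last point, here is a minimal sketch comparing `rvalue**2` with the coefficient of determination computed from its residual-based definition, R^2 = 1 - SS_res / SS_tot. The data and the variable names (`ss_res`, `ss_tot`, `r_squared`) are assumptions made for this example only.

```python
import numpy as np
from scipy import stats

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=40)
y = 1.5 * x - 3.0 + rng.normal(scale=2.0, size=40)

fit = stats.linregress(x, y)

# Predictions from the fitted line (intercept plus a single regressor).
y_pred = fit.intercept + fit.slope * x

# Coefficient of determination from its definition.
ss_res = np.sum((y - y_pred) ** 2)    # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1.0 - ss_res / ss_tot

# Squared Pearson correlation between observed and predicted values.
r_obs_pred_sq = stats.pearsonr(y, y_pred)[0] ** 2

print(np.isclose(fit.rvalue ** 2, r_squared))
print(np.isclose(fit.rvalue ** 2, r_obs_pred_sq))
```

Because linregress fits only one explanatory variable, the coefficient of multiple correlation does not arise here; both quantities above should match the square of rvalue.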
