# Week 09 - R-Squared, R, r, TSS, ESS, and RSS

## R Squared

* $SS_{res} = \sum{(y - \hat{y})^2}$
* $SS_{tot} = \sum{(y - \bar{y})^2}$
* $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$

In statistics, the Pearson correlation coefficient, also known as Pearson's r, is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationship or correlation. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1 (as 1 would represent an unrealistically perfect correlation).

https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
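
The definition above (covariance divided by the product of the standard deviations) can be checked by hand. The following is a minimal sketch; the `x` and `y` values are made up for illustration and are not part of the course data.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Sample covariance and sample standard deviations (ddof=1 in both,
# so the normalization factors cancel in the ratio).
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Reference value taken from NumPy's correlation matrix.
r_numpy = np.corrcoef(x, y)[0, 1]

print('r (covariance / product of standard deviations):', r_manual)
print('r (np.corrcoef):', r_numpy)
```

The two printed values should agree up to floating point rounding.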

In statistics, the coefficient of determination, denoted $R^2$ or $r^2$ and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

https://en.wikipedia.org/wiki/Coefficient_of_determination

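* $r$ shows the strength and direction of the linear correlation between x and y
* $R^2$ shows the strength of the model: the proportion of the variance in y that can be explained by X in a linear regression model. In simple linear regression with an intercept, $R^2$ on the fitted data equals $r^2$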

## R Squared, R, and r

* R-Squared: Coefficient of Determination, proportion of the variation in the dependent variable that is predictable from the independent variable(s)
* R: Coefficient of multiple correlation, a measure of how well a given variable can be predicted using a linear function of a set of other variables
* r: Pearson's r, a measure of the linear correlation between two quantitative variables. Correlation is different from regression: it does not assume any sort of dependency between the two variables and is only meant to express their joint variability
* https://www.r-bloggers.com/2022/11/the-coefficient-of-determination-is-it-the-r-squared-or-r-squared/
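
To make the relationship between R and R-Squared concrete, the sketch below computes R as the Pearson correlation between the observed y and the fitted values, then compares its square with R-Squared from the sums of squares. It assumes ordinary least squares with an intercept, evaluated on the same data the model was fitted on; the data and the coefficients `[2.0, -1.0, 0.5]` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations, 3 features, linear signal plus noise.
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Ordinary least squares with an intercept column, solved with lstsq.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# R: Pearson correlation between the observed and the fitted values.
R = np.corrcoef(y, y_hat)[0, 1]

# R-Squared from the residual and total sums of squares.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print('R:', R)
print('R**2:', R ** 2)
print('R-Squared:', r_squared)
```

Under these assumptions `R ** 2` and `r_squared` should match up to rounding. On held-out data, or for a model without an intercept, the two are not guaranteed to agree.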



## The Rs with Simple Linear Regression

In [1]:
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from scipy.stats import pearsonr

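# create dataframe from a synthetic dataset (make_regression has no random_state here, so the values change on every run)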
X, y = make_regression(n_samples=1000, n_features=1, noise=13)
df = pd.DataFrame(data=X, columns=['Feature 1'])
df['y'] = y

# train/test split (75% train, 25% test); random_state=42 keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(df.drop('y', axis=1), df['y'], test_size=.25, random_state=42)

# create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# create and train a second model on the full dataset (no train/test split)
model2 = LinearRegression()
model2.fit(X, y)

# test set prediction results
y_hat = model.predict(X_test)
print('r with df.corr()', df.corr())
pcorr, _ = pearsonr(X.flatten(), y)
print('r with scipy', pcorr)
print('r^2', pcorr**2)
print('R squared (model2 score): ', model2.score(X, y))
print('R squared (model score with X_train and y_train): ', model.score(X_train, y_train))
print('R squared (metric from sklearn of y_test and y_hat)', r2_score(y_test, y_hat))

r with df.corr()            Feature 1         y
Feature 1   1.000000  0.987579
y           0.987579  1.000000
r with scipy 0.9875791397513904
r^2 0.9753125572720964
R squared (model2 score):  0.975312557272096
R squared (model score with X_train and y_train):  0.97584774237287
R squared (metric from sklearn of y_test and y_hat) 0.9735012189411446


## The Rs with Multiple Regression

In [2]:
import pandas as pd
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

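# create dataframe from a synthetic dataset (make_regression has no random_state here, so the values change on every run)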
X, y = make_regression(n_samples=1000, n_features=3, noise=13)
df = pd.DataFrame(data=X, columns=['Feature 1', 'Feature 2', 'Feature 3'])
df['y'] = y

# train/test split (75% train, 25% test); random_state=42 keeps the split reproducible
X_train, X_test, y_train, y_test = train_test_split(df.drop('y', axis=1), df['y'], test_size=.25, random_state=42)

# create and train the model
model = LinearRegression()
model.fit(X_train, y_train)

# test set prediction results
y_hat = model.predict(X_test)
print('R squared (model score): ', model.score(X_train, y_train))
print('R squared (metric from sklearn of y_test and y_hat)', r2_score(y_test, y_hat))

R squared (model score):  0.9657387189006659
R squared (metric from sklearn of y_test and y_hat) 0.9592641536685931


## Total Sum of Squares

According to Wikipedia, if $\bar{y}$ is the mean of the observed data:

$
\bar{y} = \frac{1}{N}\sum(y)
$

then the variability of the data set can be measured with two sums of squares formulas:

The total sum of squares (proportional to the variance of the data):

$
SS_{tot} = \sum(y - \bar{y})^2
$
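
As a small worked example of the total sum of squares, and of the claim that it is proportional to the variance, the sketch below uses five made-up values of y. The population variance (`np.var` with its default `ddof=0`) multiplied by N should reproduce $SS_{tot}$.

```python
import numpy as np

# Hypothetical observations, chosen so the arithmetic is easy to follow.
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

y_bar = y.mean()                    # mean of the observed data: 7.0
ss_tot = np.sum((y - y_bar) ** 2)   # 16 + 4 + 0 + 4 + 16

# Population variance times N gives the same quantity.
ss_tot_from_var = np.var(y) * len(y)

print('y_bar:', y_bar)
print('SS_tot:', ss_tot)
print('N * var(y):', ss_tot_from_var)
```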

## Residual Sum of Squares

The sum of squares of residuals, also called the residual sum of squares:

$
SS_{res} = \sum(y - \hat{y})^2
$

$\hat{y}$ represents our predicted y.

With $SS_{res}$ and $SS_{tot}$, we can compute R-squared:

$
R^2 = 1 - \large{\frac{SS_{res}}{SS_{tot}}}
$

https://en.wikipedia.org/wiki/Residual_sum_of_squares
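
The formula can be written as a short function. This is a minimal sketch: `r_squared` is a hypothetical helper name, and the `y` and `y_hat` values are invented so the sums are easy to verify by hand. On the same inputs it should agree with `r2_score` from `sklearn.metrics`, which the cells above use.

```python
import numpy as np

def r_squared(y, y_hat):
    """R-squared computed as 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted values.
y = [3.0, 5.0, 7.0, 9.0, 11.0]
y_hat = [2.5, 5.5, 7.0, 9.5, 10.5]

# SS_res = 0.25 + 0.25 + 0 + 0.25 + 0.25 = 1.0 and SS_tot = 40.0
print('R squared:', r_squared(y, y_hat))
```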

## Explained Sum of Squares

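A third sum of squares is the explained sum of squares (ESS), which measures how far the predictions fall from the mean of the observed data:

$
ESS = \sum(\hat{y} - \bar{y})^2
$

Writing TSS for $SS_{tot}$ and RSS for $SS_{res}$, a linear regression fitted by ordinary least squares with an intercept satisfies:

$
TSS = RSS + ESS
$

or, equivalently,

$
ESS = TSS - RSS
$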

https://en.wikipedia.org/wiki/Explained_sum_of_squares
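
The three sums of squares can be checked numerically. The sketch below assumes a simple linear regression fitted by ordinary least squares with an intercept (via `np.polyfit`) and evaluated on the data it was fitted on; the data is synthetic and the true slope and intercept are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y depends linearly on x, plus noise.
x = rng.normal(size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=100)

# Least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)       # total sum of squares
rss = np.sum((y - y_hat) ** 2)          # residual sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares

print('TSS:', tss)
print('RSS + ESS:', rss + ess)
print('ESS / TSS:', ess / tss)
print('1 - RSS / TSS:', 1 - rss / tss)
```

Under these assumptions `TSS` matches `RSS + ESS`, so `ESS / TSS` and `1 - RSS / TSS` give the same R-squared. The decomposition is not guaranteed for predictions on a held-out test set.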

## Adjusted R-Squared

The formula for Adjusted R-Squared is:

$
R^2_{adj} = 1 - (1 - R^2)\large{\frac{n-1}{n - p - 1}}
$

or

$
R^2_{adj} = 1 - \large{\frac{\frac{SS_{res}}{df_e}}{\frac{SS_{tot}}{df_t}}}
$

where $df_t = n - 1$ is the degrees of freedom of the estimate of the population variance of the dependent variable, and $df_e = n - p - 1$ is the degrees of freedom of the estimate of the underlying population error variance. Note: $n$ is the number of observations and $p$ is the number of explanatory variables (features), not counting the intercept.

The adjusted $R^2$ can be negative, and its value will always be less than or equal to that of $R^2$. Unlike $R^2$, the adjusted $R^2$ increases only when the increase in $R^2$ (due to the inclusion of a new explanatory variable) is more than one would expect to see by chance. If a set of explanatory variables with a predetermined hierarchy of importance are introduced into a regression one at a time, with the adjusted $R^2$ computed each time, the level at which adjusted $R^2$ reaches a maximum, and decreases afterward, would be the regression with the ideal combination of having the best fit without excess/unnecessary terms.

https://en.wikipedia.org/wiki/Coefficient_of_determination#Adjusted_R2
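
Both forms of the adjusted R-squared formula can be sketched as plain Python functions. The function names are hypothetical helpers, and the sums of squares in the first example are invented values; the second call plugs in the training score printed by the multiple regression cell, taking n = 750 (75% of 1000 samples) and p = 3.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-squared from R-squared, n observations, and p features."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

def adjusted_r_squared_from_ss(ss_res, ss_tot, n, p):
    """Adjusted R-squared from the sums of squares and their degrees of freedom."""
    df_e = n - p - 1   # degrees of freedom of the error variance estimate
    df_t = n - 1       # degrees of freedom of the total variance estimate
    return 1 - (ss_res / df_e) / (ss_tot / df_t)

# Hypothetical values: SS_res = 1.0, SS_tot = 40.0, n = 5, p = 1.
ss_res, ss_tot, n, p = 1.0, 40.0, 5, 1
r2 = 1 - ss_res / ss_tot

adj_from_r2 = adjusted_r_squared(r2, n, p)
adj_from_ss = adjusted_r_squared_from_ss(ss_res, ss_tot, n, p)

print('R squared:', r2)
print('Adjusted R squared (from R squared):', adj_from_r2)
print('Adjusted R squared (from sums of squares):', adj_from_ss)

# Training score from the multiple regression output above.
print('Adjusted R squared (multiple regression, train):',
      adjusted_r_squared(0.9657387189006659, n=750, p=3))
```

The two functions should return the same value, and that value should not exceed the unadjusted R-squared.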