
# Here we are going to implement the R^2 metric and the adjusted R^2

This is for learning purposes.


### Imports

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


### Data Gathering

In [3]:
dataset=pd.read_csv('https://pastebin.com/raw/JFACpGgf')
dataset.head()

|   | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 12.0 |
| 3 | 151.5 | 41.3 | 58.5 | 16.5 |
| 4 | 180.8 | 10.8 | 58.4 | 17.9 |


### Model Creation

In [17]:
linear_regression = LinearRegression()
results = {
    'R-Squared':list()
}


y = dataset['Sales']
y.head()

0    22.1
1    10.4
2    12.0
3    16.5
4    17.9
Name: Sales, dtype: float64

*Now we're going to train the model on selected features.*

We'll try several feature combinations and compare how well each one fits the data.
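The cells that follow try five hand-picked feature sets. As a rough sketch of how every non-empty combination could be listed instead, the snippet below uses `itertools.combinations` from the standard library. The helper name `all_feature_subsets` is hypothetical and not part of the notebook.

```python
from itertools import combinations

def all_feature_subsets(features):
    """Return every non-empty subset of `features` as a list of lists."""
    subsets = []
    for size in range(1, len(features) + 1):
        for combo in combinations(features, size):
            subsets.append(list(combo))
    return subsets

# The three advertising channels used as features in this notebook
feature_sets = all_feature_subsets(['TV', 'Radio', 'Newspaper'])
for fs in feature_sets:
    print(fs)
```

Each list could then be passed as `dataset[fs]` to `LinearRegression.fit`, in the same way the individual cells do.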

In [18]:
# Tv Only 

linear_regression.fit(dataset[['TV']],dataset[['Sales']])

y_pred = linear_regression.predict(dataset[['TV']])
tv_fitness_score = r2_score(y, y_pred)

results['R-Squared'].append(tv_fitness_score)
results

{'R-Squared': [0.8121757029987414]}

In [19]:
#Radio Only

linear_regression.fit(dataset[['Radio']], dataset[['Sales']])
y_pred = linear_regression.predict(dataset[['Radio']])
radio_fitness_score = r2_score(y, y_pred)

results['R-Squared'].append(radio_fitness_score)
results

{'R-Squared': [0.8121757029987414, 0.1222419039947863]}

In [20]:
#Newspaper Only

linear_regression.fit(dataset[['Newspaper']], dataset[['Sales']])
y_pred = linear_regression.predict(dataset[['Newspaper']])
newspaper_fitness_score = r2_score(y, y_pred)

results['R-Squared'].append(newspaper_fitness_score)
results

{'R-Squared': [0.8121757029987414, 0.1222419039947863, 0.024951369862864836]}

**Now we're going to add more features to the model**

In [21]:
# Tv + Radio

features = ['TV','Radio']
linear_regression.fit(dataset[features], dataset['Sales'])

y_pred = linear_regression.predict(dataset[features])
tv_radio_fitness = r2_score(y, y_pred)
results['R-Squared'].append(tv_radio_fitness)
results

{'R-Squared': [0.8121757029987414,
  0.1222419039947863,
  0.024951369862864836,
  0.9025896186081139]}

In [22]:
# TV + Radio + Newspaper

features = ['TV','Radio', 'Newspaper']
linear_regression.fit(dataset[features], dataset['Sales'])

y_pred = linear_regression.predict(dataset[features])
tv_radio_newspp_fitness = r2_score(y, y_pred)
results['R-Squared'].append(tv_radio_newspp_fitness)
results

{'R-Squared': [0.8121757029987414,
  0.1222419039947863,
  0.024951369862864836,
  0.9025896186081139,
  0.9025912899684558]}

## Interpreting Results



In [23]:
index_header = ['TV','Radio','Newspaper','TV + Radio', 'TV + Radio + Newspaper']
r2_df = pd.DataFrame(results, index=index_header).transpose()
display(r2_df)

|   | TV | Radio | Newspaper | TV + Radio | TV + Radio + Newspaper |
|---|---|---|---|---|---|
| R-Squared | 0.812176 | 0.122242 | 0.024951 | 0.90259 | 0.902591 |


In the cell above, you can see how the fit of the model changes as we add features to the regression.

By plain $R^2$, the combination of TV, Radio and Newspaper scores highest. However, the gain over TV + Radio is negligible (0.902590 vs. 0.902591), and $R^2$ never decreases when a feature is added, so this score alone doesn't show that Newspaper helps predict sales.
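A small rise in plain $R^2$ should be read with care: for least squares with an intercept, $R^2$ on the training data can't go down when a column is added, even one that is pure noise. The sketch below illustrates this on synthetic data (an assumption for illustration, not the advertising dataset). It computes $R^2$ by hand as $1 - SS_{res}/SS_{tot}$ and fits with `np.linalg.lstsq` rather than scikit-learn. The helper names `r2_manual` and `ols_predict` are hypothetical.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

def ols_predict(X, y):
    """Fit least squares with an intercept and return in-sample predictions."""
    A = np.column_stack([np.ones(len(X)), X])       # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# Synthetic data: one informative feature and one unrelated noise feature
rng = np.random.default_rng(0)
n = 200
x_useful = rng.normal(size=n)
x_noise = rng.normal(size=n)                        # unrelated to y by construction
y = 3.0 * x_useful + rng.normal(size=n)

r2_base = r2_manual(y, ols_predict(x_useful.reshape(-1, 1), y))
r2_more = r2_manual(y, ols_predict(np.column_stack([x_useful, x_noise]), y))
print(r2_base, r2_more)                             # r2_more is never below r2_base
```

The second value is at least as large as the first even though `x_noise` carries no information about `y`, which is why a penalty for the number of features is useful.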

## Adjusted R-Squared

The formula for adjusted $R^2$ is

$$\bar{R^2} = 1-(1- R^2) \frac{n-1}{n-d-1} $$

where $n$ is the number of observations and $d$ is the number of features.

Here, ($n-1$) is the degrees of freedom of the estimate of the population variance of the dependent (output) variable, whereas ($n-d-1$) is the degrees of freedom of the estimate of the population error variance.

When a feature is added, $\bar{R^2}$ increases only if the increase in $R^2$ is more than one would expect to see by chance.
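As a worked instance of the formula, take the TV-only model from the table above, with $R^2 \approx 0.812176$ and $d = 1$, and assume $n = 200$ observations (the value used in the cells that follow):

$$\bar{R^2} = 1-(1-0.812176)\,\frac{200-1}{200-1-1} = 1 - 0.187824 \cdot \frac{199}{198} \approx 0.811227$$

The factor $\frac{199}{198}$ is close to 1, so with 200 rows and a single feature the adjustment is small. It grows as $d$ increases relative to $n$.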

In [24]:
adjusted_results = {
    "Rbar": list()
}

def adjusted_r2(r2, n, d):
    """Adjusted R^2 for a model with d features fitted on n observations."""
    return 1 - ((1 - r2) * (n - 1) / (n - d - 1))



In [29]:
## TV only
tv_r2 = r2_df.iloc[0,:]['TV']
tv_adj_r2 = adjusted_r2(tv_r2, 200,1)

adjusted_results['Rbar'].append(tv_adj_r2)
adjusted_results


{'Rbar': [0.811227095438129]}

In [30]:
#Radio 
radio_r2 = r2_df.iloc[0,:]['Radio']
radio_adj_r2 = adjusted_r2(radio_r2, 200,1)

adjusted_results['Rbar'].append(radio_adj_r2)
adjusted_results

{'Rbar': [0.811227095438129, 0.11780878229779024]}

In [31]:
#Newspaper 
newspaper_r2 = r2_df.iloc[0,:]['Newspaper']
newspaper_adj_r2 = adjusted_r2(newspaper_r2, 200,1)

adjusted_results['Rbar'].append(newspaper_adj_r2)
adjusted_results

{'Rbar': [0.811227095438129, 0.11780878229779024, 0.02002688183186918]}

In [33]:
## Tv + Radio

tv_radio_r2 = r2_df.iloc[0,:]['TV + Radio']
tv_radio_adj_r2 = adjusted_r2(tv_radio_r2, 200,2)

adjusted_results['Rbar'].append(tv_radio_adj_r2)
adjusted_results

{'Rbar': [0.811227095438129,
  0.11780878229779024,
  0.02002688183186918,
  0.9016006807259628]}

In [35]:
## Tv + Radio + Newspaper

tv_radio_news_r2 = r2_df.iloc[0,:]['TV + Radio + Newspaper']
tv_radio_news_adj_r2 = adjusted_r2(tv_radio_news_r2, 200,3)

adjusted_results['Rbar'].append(tv_radio_news_adj_r2)
adjusted_results

{'Rbar': [0.811227095438129,
  0.11780878229779024,
  0.02002688183186918,
  0.9016006807259628,
  0.9011003403251159]}

## Interpreting Adjusted Results


In [36]:
index_header = ['TV','Radio','Newspaper','TV + Radio', 'TV + Radio + Newspaper']
r2_adj_df = pd.DataFrame(adjusted_results, index=index_header).transpose()
display(r2_adj_df)

|   | TV | Radio | Newspaper | TV + Radio | TV + Radio + Newspaper |
|---|---|---|---|---|---|
| Rbar | 0.811227 | 0.117809 | 0.020027 | 0.901601 | 0.9011 |


## Conclusion
Here are the adjusted $R^2$ values for each of the input sets. Among the single-feature models, TV has the highest adjusted $R^2$ (0.811), so it is the one to choose if we want simple linear regression. Adding Radio as a second independent variable raises the adjusted $R^2$ to 0.902, so this feature adds a lot to the fit. Adding Newspaper lowers it slightly (from 0.9016 to 0.9011), which suggests that Newspaper doesn't improve the model.
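The comparison can be condensed into a few lines. The sketch below re-enters the $R^2$ values printed earlier, applies the same `adjusted_r2` formula, and reports which feature set wins under each metric. The dictionary names are hypothetical, and `n = 200` is the row count used in the cells above.

```python
# R^2 values copied from the notebook output, paired with the feature count d
r2_by_model = {
    'TV': (0.8121757029987414, 1),
    'Radio': (0.1222419039947863, 1),
    'Newspaper': (0.024951369862864836, 1),
    'TV + Radio': (0.9025896186081139, 2),
    'TV + Radio + Newspaper': (0.9025912899684558, 3),
}
n = 200  # number of observations, as used in the cells above

def adjusted_r2(r2, n, d):
    """Adjusted R^2 for a model with d features fitted on n observations."""
    return 1 - ((1 - r2) * (n - 1) / (n - d - 1))

adj_by_model = {
    name: adjusted_r2(r2, n, d) for name, (r2, d) in r2_by_model.items()
}

# Pick the top-scoring feature set under each metric
best_r2 = max(r2_by_model, key=lambda name: r2_by_model[name][0])
best_adj = max(adj_by_model, key=adj_by_model.get)
print(best_r2)   # TV + Radio + Newspaper
print(best_adj)  # TV + Radio
```

The two metrics disagree on the winner: plain $R^2$ favours the largest model, while adjusted $R^2$ prefers TV + Radio once the extra feature is penalised.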