
# Simple Linear Regression

## Estimating the Coefficients

Based on the HarvardX course _Introduction to Data Science with Python_, available on [edX](https://learning.edx.org/course/course-v1:HarvardX+CS109x+1T2022/home).

The book [An Introduction to Statistical Learning](https://www.statlearning.com/) shows how to estimate the coefficients $\beta_0$ and $\beta_1$ using the Advertising dataset.



In [None]:
import pandas as pd

data = pd.read_csv('https://www.statlearning.com/s/Advertising.csv')

data.head()

|   | Unnamed: 0 | TV | radio | newspaper | sales |
|---|---|---|---|---|---|
| 0 | 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 5 | 180.8 | 10.8 | 58.4 | 12.9 |
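The `Unnamed: 0` column holds the row numbers stored in the CSV file itself. A minimal sketch of one way to avoid it is to pass `index_col=0` to `pd.read_csv` so that this column becomes the row index. It assumes the file's first header field is blank, which is consistent with pandas naming the column `Unnamed: 0`. To run offline, it parses a few of the rows shown above from a string instead of downloading the file.

```python
import io

import pandas as pd

# A few rows copied from the output above. The blank first header field is an
# assumption about the file's layout, consistent with the 'Unnamed: 0' column.
csv_text = """,TV,radio,newspaper,sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# Default: the unlabeled row-number column is read as an ordinary column.
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())  # ['Unnamed: 0', 'TV', 'radio', 'newspaper', 'sales']

# index_col=0: the first column is used as the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())  # ['TV', 'radio', 'newspaper', 'sales']
```

In this notebook that would be `pd.read_csv('https://www.statlearning.com/s/Advertising.csv', index_col=0)`. The cells below only use the `TV` and `sales` columns, so they work either way.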


For example, we could predict the number of units of a product sold (`sales`) from the amount of money spent on TV advertising (`TV`).

Assuming the relationship is approximately linear, we can write it as

$ Y \approx \beta_0 + \beta_1 X $

or 

$ \text{sales} \approx \beta_0 + \beta_1 \times \text{TV} $

Given $n$ observations $(x_1, y_1), \ldots, (x_n, y_n)$, the coefficients can be estimated by

$$
\begin{aligned}
\hat{\beta}_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}
\end{aligned}
$$
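To connect the formulas to code, here is a minimal NumPy sketch that evaluates them term by term. The function name `least_squares_coefficients` and the five-point data set are hypothetical, chosen so the sums can be checked by hand: $\bar{x} = 3$, $\bar{y} = 4$, the numerator is $6$ and the denominator is $10$, so $\hat{\beta}_1 = 0.6$ and $\hat{\beta}_0 = 4 - 0.6 \times 3 = 2.2$.

```python
import numpy as np


def least_squares_coefficients(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form simple regression formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()  # x_i - x_bar
    y_dev = y - y.mean()  # y_i - y_bar
    beta1_hat = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
    beta0_hat = y.mean() - beta1_hat * x.mean()
    return beta0_hat, beta1_hat


# Hypothetical toy data: x_bar = 3, y_bar = 4, numerator = 6, denominator = 10.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
toy_b0, toy_b1 = least_squares_coefficients(toy_x, toy_y)
print(round(toy_b0, 6), round(toy_b1, 6))  # 2.2 0.6
```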

These are the _least squares coefficient estimates_ for simple linear regression, where $\bar{y} \equiv \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar{x} \equiv \frac{1}{n} \sum_{i=1}^{n} x_i$ are the sample means.
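"Least squares" names the criterion these estimates optimize: they are the pair $(\hat{\beta}_0, \hat{\beta}_1)$ that minimizes the residual sum of squares $\text{RSS} = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$. The sketch below uses a small hypothetical data set, $x = (1, 2, 3, 4, 5)$ and $y = (2, 4, 5, 4, 5)$, whose estimates work out to $\hat{\beta}_0 = 2.2$ and $\hat{\beta}_1 = 0.6$. It computes the RSS at the estimates and checks that nudging either coefficient raises it. This is a spot check at a few nearby points, not a proof of optimality.

```python
import numpy as np


def rss(intercept, slope, x, y):
    """Residual sum of squares: sum over i of (y_i - intercept - slope * x_i) ** 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (intercept + slope * x)
    return float(np.sum(residuals ** 2))


# Hypothetical toy data; its closed-form estimates are beta0_hat = 2.2, beta1_hat = 0.6.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
best = rss(2.2, 0.6, toy_x, toy_y)
print(round(best, 6))  # 2.4

# Spot check: moving away from the estimates in a few directions raises the RSS.
for d0, d1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]:
    assert rss(2.2 + d0, 0.6 + d1, toy_x, toy_y) > best
```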

For the Advertising data, the least squares fit of sales onto TV gives the estimates

$\hat{\beta}_0 = 7.03$ and $\hat{\beta}_1 = 0.0475$
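As a worked example of using these estimates, take a TV budget of $100$ (an arbitrary value chosen for illustration):

$$
\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 \times 100 \approx 7.03 + 0.0475 \times 100 = 7.03 + 4.75 = 11.78
$$

If, as described in ISLR, TV budgets are recorded in thousands of dollars and sales in thousands of units, this predicts roughly 11,780 units for a budget of 100 thousand dollars, and the slope says that each additional thousand dollars of TV advertising is associated with about 47.5 more units sold.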

Below is a simple implementation that computes $\hat{\beta}_0$ and $\hat{\beta}_1$ from the pandas Series `X` and `Y`.

In [None]:
X = data['TV']
Y = data['sales']

In [None]:
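def beta1(x, y):
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    mean_x = x.mean()
    mean_y = y.mean()
    return ((x - mean_x) * (y - mean_y)).sum() / ((x - mean_x) ** 2).sum()

def beta0(x, y):
    # Intercept: the fitted line passes through the point (mean_x, mean_y)
    return y.mean() - beta1(x, y) * x.mean()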

print('b0 = ', beta0(X, Y))
print('b1 = ', beta1(X, Y))


b0 =  7.0325935491276965
b1 =  0.047536640433019736
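One way to gain confidence in a hand-written estimator is to compare it with a library routine. This minimal sketch assumes NumPy's `np.polyfit` as the reference: with degree 1 it fits a least squares line and returns the slope first and the intercept second. It also checks the equivalent form $\hat{\beta}_1 = \widehat{\text{Cov}}(x, y) / \widehat{\text{Var}}(x)$, where the $n - 1$ factors of the sample covariance and variance cancel. The helper `closed_form` and the toy data are hypothetical stand-ins for `beta0`/`beta1` and `X`/`Y`, so the example runs offline.

```python
import numpy as np
import pandas as pd


def closed_form(x, y):
    """Return (intercept, slope) from the closed-form least squares formulas."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    slope = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
    return y.mean() - slope * x.mean(), slope


# Hypothetical toy data stored as pandas Series, like X and Y in this notebook.
toy_x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
toy_y = pd.Series([2.0, 4.0, 5.0, 4.0, 5.0])

cf_b0, cf_b1 = closed_form(toy_x, toy_y)

# With deg=1, np.polyfit returns coefficients highest power first: [slope, intercept].
pf_b1, pf_b0 = np.polyfit(toy_x, toy_y, 1)

# Slope as sample covariance over sample variance (both use n - 1, which cancels).
cov_b1 = toy_x.cov(toy_y) / toy_x.var()

print(np.allclose([cf_b0, cf_b1], [pf_b0, pf_b1]))  # True
print(np.isclose(cf_b1, cov_b1))                    # True
```

On the notebook's data, `np.polyfit(X, Y, 1)` should return approximately `[0.0475, 7.03]`, the same values as `b1` and `b0` in reverse order.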


In [None]:
b0 = beta0(X, Y)
b1 = beta1(X, Y)

# Fitted (predicted) sales for each observed TV budget
predicted_sales = b0 + b1 * X

predicted_sales

0      17.970775
1       9.147974
2       7.850224
3      14.234395
4      15.627218
         ...    
195     8.848493
196    11.510545
197    15.446579
198    20.513985
199    18.065848
Name: TV, Length: 200, dtype: float64
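The difference between observed and fitted sales, $e_i = y_i - \hat{y}_i$, is called the residual. When the model includes an intercept, the least squares estimates force the residuals to sum to zero and to be orthogonal to $x$ (that is, $\sum_{i} x_i e_i = 0$); these are the first-order conditions for minimizing the sum of squared residuals. The sketch below checks both properties on a small hypothetical data set and recomputes the estimates inline so it runs on its own.

```python
import numpy as np

# Hypothetical toy data.
toy_x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
toy_y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form least squares estimates.
x_dev = toy_x - toy_x.mean()
b1_hat = np.sum(x_dev * (toy_y - toy_y.mean())) / np.sum(x_dev ** 2)
b0_hat = toy_y.mean() - b1_hat * toy_x.mean()

fitted = b0_hat + b1_hat * toy_x  # analogous to the fitted-values cell above
residuals = toy_y - fitted        # e_i = y_i - y_hat_i

# First-order conditions: residuals sum to zero and are orthogonal to x (up to rounding).
print(np.isclose(residuals.sum(), 0.0), np.isclose(np.dot(toy_x, residuals), 0.0))  # True True
```

On the notebook's data, `(Y - (b0 + b1 * X)).sum()` should likewise be zero up to floating-point rounding.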