### Linear Regression  

----- 

Consider the advertising data from ch. 2, and the overall goal of suggesting a marketing plan that will result in high product sales.

some questions we could ask ourselves:

- any relationships between budget and sales 
    - is there even a basis for advertising spend? 
    - look at simple correlations 
- strength of relationship between budget and sales 
    - given a relationship, how much better a predictor of sales is budget than a random guess?
    - maybe use a simple regression analysis 
- media contributions to sales 
    - TV, radio, newspaper effects on sales 
- effect of each medium on sales 
    - give a quantitative answer: for every dollar spent on TV ads, how much do sales go up?
- can we accurately predict future sales? 
    - on average, what is the prediction accuracy with the given features?
- synergy / interaction effects among the features? 
    - is there a best combination of features?  
    
    
linear regression can answer all of these!! 


-----


*simple linear regression* - predict $Y$ based on a single predictor $X$  

$$ Y \approx \beta_0 + \beta_1 X$$ 


in practice, the true $\beta_0$ and $\beta_1$ are unknown, but we can estimate them from $n$ observed data points $(x_1, y_1), \dots, (x_n, y_n)$

different criteria can be used, but the most common is to minimize the *least squares criterion*

let $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ be the prediction of $Y$ for the $i$th value of $X$

then $e_i = y_i - \hat{y}_i$ is the error, or residual, for the $i$th observation: the difference between the actual and predicted values

the residual sum of squares (RSS) is 

$$ RSS = e_1^2 + e_2^2 + \dots + e_n^2$$ 

and, substituting $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ into each residual, this is equivalent to

$$ RSS = (y_1 - \hat{\beta}_0 - \hat{\beta}_1 x_1)^2 + (y_2 - \hat{\beta}_0 - \hat{\beta}_1 x_2)^2 + \dots + (y_n - \hat{\beta}_0 - \hat{\beta}_1 x_n)^2$$ 
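As a quick sanity check on the formula, here is a minimal sketch that evaluates the RSS for any candidate pair of coefficients. The helper name `rss` and the toy data are made up for illustration and are not part of the original notebook.

```python
import numpy as np

def rss(beta_0, beta_1, x, y):
    """Residual sum of squares for the line y_hat = beta_0 + beta_1 * x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (beta_0 + beta_1 * x)   # e_i = y_i - y_hat_i
    return float(np.sum(residuals ** 2))

# toy data, made up for illustration: the points lie exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

print(rss(1.0, 2.0, x, y))   # 0.0: the true line leaves no residuals
print(rss(0.0, 2.0, x, y))   # 4.0: every residual is 1
```

Any other choice of coefficients gives a larger RSS for this data, which is exactly what "minimizing the least squares criterion" selects against.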

the values of $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the RSS are: 


$$ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$ 

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$ 

where $\bar{x}$ and $\bar{y}$ are the sample means 
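These closed-form estimates come from setting the partial derivatives of the RSS with respect to each coefficient to zero. A sketch of the derivation:

$$
\begin{aligned}
\frac{\partial\, RSS}{\partial \hat{\beta}_0} &= -2 \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
\;\Rightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\
\frac{\partial\, RSS}{\partial \hat{\beta}_1} &= -2 \sum_{i=1}^n x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
\;\Rightarrow\; \sum_{i=1}^n x_i \big[(y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})\big] = 0 \\
&\;\Rightarrow\; \hat{\beta}_1 = \frac{\sum_{i=1}^n x_i (y_i - \bar{y})}{\sum_{i=1}^n x_i (x_i - \bar{x})}
= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}
\end{aligned}
$$

The second line substitutes $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$; the last step uses $\sum_i (y_i - \bar{y}) = 0$ and $\sum_i (x_i - \bar{x}) = 0$, so $x_i$ can be replaced by $x_i - \bar{x}$ in both sums.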

let's implement this from scratch as a function

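```python
import numpy as np
import pandas as pd

def calculate_least_square_coeff(X, y):
    """Least squares slope and intercept for simple linear regression.

    X and y are 1-D array-likes of the same length.
    Returns (BETA_1, BETA_0), i.e. (slope, intercept).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # --- beta 1 ----

    # numerator: sum of (x_i - x_bar) * (y_i - y_bar);
    # subtracting the scalar mean broadcasts over the whole array
    x_diff = X - X.mean()
    y_diff = y - y.mean()
    numerator = (x_diff * y_diff).sum()

    # denominator: sum of (x_i - x_bar)^2
    denom = (x_diff ** 2).sum()

    BETA_1 = numerator / denom

    # --- beta 0 ----

    BETA_0 = y.mean() - BETA_1 * X.mean()

    return BETA_1, BETA_0
```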

the function above implements the closed-form equations for $\hat{\beta}_1$ and $\hat{\beta}_0$ in a vectorized fashion: each sum over $i$ is a single NumPy array operation rather than a Python loop
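To make "vectorized" concrete, here is a sketch contrasting an explicit-loop version of the same formulas with an array version. Both function names are hypothetical and the toy data is made up; they are not part of the original notebook.

```python
import numpy as np

def least_squares_loop(x, y):
    """Closed-form estimates computed with explicit Python loops."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    numerator = 0.0
    denominator = 0.0
    for xi, yi in zip(x, y):
        numerator += (xi - x_mean) * (yi - y_mean)
        denominator += (xi - x_mean) ** 2
    beta_1 = numerator / denominator
    beta_0 = y_mean - beta_1 * x_mean
    return beta_1, beta_0   # (slope, intercept), same order as calculate_least_square_coeff

def least_squares_vectorized(x, y):
    """Same formulas, with each sum done as one whole-array NumPy operation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_diff = x - x.mean()
    y_diff = y - y.mean()
    beta_1 = (x_diff * y_diff).sum() / (x_diff ** 2).sum()
    return beta_1, y.mean() - beta_1 * x.mean()

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.0]
print(least_squares_loop(x, y))
print(least_squares_vectorized(x, y))
```

Both return a slope of about 1.98 and an intercept of about 0.05 here; the vectorized version pushes the loop into NumPy's compiled code, which is what typically makes it faster as the number of observations grows.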


below, I simulate some data, check the coefficients against `np.linalg.lstsq`, and compare the runtimes

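```python
yy = np.random.random_sample(100)
# without a size argument, np.random.normal returns ONE scalar: a constant shift, not per-point noise
xx = 3*yy + np.random.normal(0, 0.25)
```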

```python
calculate_least_square_coeff(xx, yy)
```

(0.3333333333333334, -0.08966134038837265)

```python
# design matrix with columns [x, 1], so lstsq returns (slope, intercept)
A = np.vstack([xx, np.ones(len(xx))]).T
m, c = np.linalg.lstsq(A, yy, rcond=None)[0]
```

```python
m, c
```

(0.33333333333333337, -0.08966134038837242)
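In the simulation above, `np.random.normal(0, 0.25)` is called without a `size`, so it returns a single scalar and every point is shifted by the same amount. `xx` is then an exact linear function of `yy`, which is why both methods recover a slope of exactly 1/3. Below is a hedged variant, not the original notebook's code: it draws one noise value per observation, generates the response from the predictor, and assumes a true line with $\beta_0 = 2$ and $\beta_1 = 3$. It also swaps in NumPy's seeded `default_rng` generator for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator so the run is reproducible

n = 100
x = rng.random(n)                        # predictor, uniform on [0, 1)
noise = rng.normal(0.0, 0.25, size=n)    # one independent draw per observation
y = 2.0 + 3.0 * x + noise                # assumed true line: beta_0 = 2, beta_1 = 3

# closed-form least squares estimates
x_diff = x - x.mean()
beta_1 = (x_diff * (y - y.mean())).sum() / (x_diff ** 2).sum()
beta_0 = y.mean() - beta_1 * x.mean()

# reference solution: design matrix with columns [x, 1] -> (slope, intercept)
A = np.column_stack([x, np.ones(n)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

print(beta_1, beta_0)
print(slope, intercept)
```

With real noise the estimates land near, but not exactly on, 3 and 2, and the two methods should still agree with each other to floating-point precision.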

```python
%timeit calculate_least_square_coeff(xx, yy)
```

31.3 µs ± 1.16 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


```python
%timeit np.linalg.lstsq(A, yy, rcond=None)[0]
```

37 µs ± 804 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


wow!!! my vectorized implementation was actually slightly faster (31.3 µs vs. 37 µs per call). why? I should look into this
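One plausible explanation, not verified here: `np.linalg.lstsq` is a general solver for any number of columns. Per the NumPy docs it returns residuals, rank, and singular values alongside the coefficients, and as far as I know it is computed through an SVD-based LAPACK routine. The closed-form version only needs a few sums, and for a 100 × 2 problem fixed per-call overhead likely dominates. The sketch below shows the extra outputs and times both approaches with the standard-library `timeit` module. The `closed_form` helper and the simulated data are hypothetical, and the timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(100)
y = 3.0 * x + rng.normal(0.0, 0.25, size=100)
A = np.column_stack([x, np.ones_like(x)])

# lstsq returns four things, not just the coefficients
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(coef)             # [slope, intercept]
print(residuals)        # RSS, as a length-1 array (full column rank, more rows than columns)
print(rank)             # 2 here: x is not constant, so A has full column rank
print(singular_values)  # singular values of A

def closed_form(x, y):
    """Hypothetical helper: closed-form (slope, intercept) for simple linear regression."""
    x_diff = x - x.mean()
    beta_1 = (x_diff * (y - y.mean())).sum() / (x_diff ** 2).sum()
    return beta_1, y.mean() - beta_1 * x.mean()

# rough per-call timing; results depend on the machine and NumPy/LAPACK build
n_runs = 1_000
t_closed = timeit.timeit(lambda: closed_form(x, y), number=n_runs) / n_runs
t_lstsq = timeit.timeit(lambda: np.linalg.lstsq(A, y, rcond=None), number=n_runs) / n_runs
print(f"closed form: {t_closed * 1e6:.1f} us per call")
print(f"lstsq:       {t_lstsq * 1e6:.1f} us per call")
```

A gap of a few microseconds at this size is also within the range where timing noise matters, so repeating the comparison on larger `n` would be a fairer test.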