# Linear Regression


## What?

Linear regression is a way of modeling the relationship between a scalar dependent variable y and one or more explanatory variables.

In the simplest case, with a single explanatory variable x, the model is a straight line:

$$ F(x) = mx + b $$ 

Let's look at a (contrived) example:

$$ F(x) = (9/5)x + 32$$

This equation gives the relationship between a temperature on the Celsius scale and the same temperature on the Fahrenheit scale. We know it well.

But what if you didn't know it?

Suppose you have two thermometers, one that measures the temperature in Celsius and the other in Fahrenheit. The only catch is that they are not perfect and sometimes mismeasure the temperature.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def convert(temp_in_c):
    
    """Convert a temperature from Celsius to Fahrenheit.
    
    Args:
        temp_in_c: temperature in Celsius (a number or a NumPy array)
    
    Return:
        the temperature in Fahrenheit
    """
    
    return temp_in_c * (9/5) + 32

def measurements():
    
    """
    
    Simulates number_of_days days of temperatures and measures each one with a noisy thermometer
    
    Return:
        A DataFrame containing the actual temperature in Celsius, the actual temperature in Fahrenheit, and the noisy measured temperature in Fahrenheit
    
    """
    
    number_of_days = 100
    
    # the temperature averages 20 C, with a standard deviation of 10 C
    
    average_temperature, standard_deviation = 20, 10
    actual_c = np.random.normal(average_temperature, standard_deviation, number_of_days)
    
    # convert c to f
    
    actual_f = convert(actual_c)
    
    # the thermometer reads 2 F too high on average, with a standard deviation of 2 F
    
    average_jankiness, standard_jankiness = 2, 2
    temperature_deviations = np.random.normal(average_jankiness, standard_jankiness, number_of_days)
    
    measured_f = actual_f + temperature_deviations
    
    return pd.DataFrame({
        'actual_c': actual_c,
        'actual_f': actual_f,
        'measured_f': measured_f,
    })
    
data = measurements()

#plt.scatter(data['actual_c'],data['measured_f'])


In [144]:
from sklearn import linear_model

def do_scikit_learn_regresssion():
    
    """
    
    Uses scikit-learn as a black box to perform linear regression
    
    """
    
    
    regr = linear_model.LinearRegression()
    # scikit-learn expects 2-D arrays: one row per sample, one column per feature
    x = data['actual_c'].values.reshape(-1, 1)
    y = data['measured_f'].values.reshape(-1, 1)
    print(x.sum(), y.sum(), y.sum() / x.sum() - 1.8)
    
    regr.fit(x, y)
    
    return '\n'.join((
        f'Coefficient of {regr.coef_[0][0]} compared to actual {9/5}',
        f'Intercept of {regr.intercept_[0]} compared to actual {32}'
    ))
     
print (do_scikit_learn_regresssion())

2117.47157237 7196.50579862 1.59863160031
Coefficient of 1.76936230300193 compared to actual 1.8
Intercept of 34.499314207949205 compared to actual 32
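
The fitted intercept lands closer to 34 than to 32 because the simulated thermometer reads 2 degrees too high on average, and a constant bias like that is absorbed into the intercept. The sketch below illustrates this under stated assumptions: it reuses the same noise model, but draws 10,000 readings instead of 100 to shrink sampling noise, fixes a seed, and swaps in `np.polyfit` for scikit-learn to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 10_000  # assumption: far more readings than the 100 used above

actual_c = rng.normal(20, 10, n)
bias, spread = 2, 2  # same thermometer error model as in measurements()
measured_f = actual_c * (9 / 5) + 32 + rng.normal(bias, spread, n)

# np.polyfit with deg=1 returns the least-squares [slope, intercept]
slope, intercept = np.polyfit(actual_c, measured_f, 1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, 32 + bias = {32 + bias}")
```

The slope should stay close to 9/5, since the bias does not depend on the temperature.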


## How does Linear Regression Work?

### Linear Regression is an optimization problem

Find the values of m and b that minimize the sum of squared differences between each observed y_i and the line's prediction x_i m + b:

$$ F(m,b) = \frac{1}{2} \sum_i \left(y_i - x_i m - b \right)^2$$
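
To make the objective concrete, here is a minimal sketch that evaluates F(m, b) with NumPy. The function name `squared_error_loss` and the four noise-free data points are assumptions made for illustration; they do not come from the notebook above.

```python
import numpy as np

def squared_error_loss(m, b, xs, ys):
    """Evaluate F(m, b) = 1/2 * sum_i (y_i - x_i * m - b)^2."""
    residuals = ys - xs * m - b
    return 0.5 * np.sum(residuals ** 2)

# Noise-free Celsius/Fahrenheit pairs, so the true line fits them exactly
xs = np.array([0.0, 10.0, 20.0, 30.0])
ys = 1.8 * xs + 32.0

loss_true = squared_error_loss(1.8, 32.0, xs, ys)   # the true parameters
loss_guess = squared_error_loss(2.0, 30.0, xs, ys)  # a nearby wrong guess
print(loss_true, loss_guess)
```

The true parameters give a smaller loss than the wrong guess, which is exactly the property the optimization exploits.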

### How do we find the minimum of a convex function?
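Set each partial derivative equal to zero and solve.

$$ \frac{\partial F(m,b)}{\partial b} = -\sum_i \left(y_i - x_i m - b \right) = 0 \quad \Longrightarrow \quad b = \bar{y} - m \bar{x} $$

Here $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and the $y_i$. Substitute the value calculated for b into the next equation:

$$ \frac{\partial F(m,b)}{\partial m} = -\sum_i x_i \left(y_i - x_i m - b \right) = 0 $$

With $b = \bar{y} - m \bar{x}$ this becomes $\sum_i x_i \left( (y_i - \bar{y}) - m (x_i - \bar{x}) \right) = 0$, which can be solved for m.

### The following equations are obtained:

$$ m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad b = \bar{y} - m \bar{x} $$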
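
As a numerical sanity check of the zero-derivative condition, the sketch below evaluates both partial derivatives of F at a least-squares fit and at a slightly perturbed slope. The helper name `gradient`, the synthetic data, and the use of `np.polyfit` to obtain the fit are assumptions made for illustration.

```python
import numpy as np

def gradient(m, b, xs, ys):
    """Partial derivatives of F(m, b) = 1/2 * sum_i (y_i - x_i * m - b)^2."""
    residuals = ys - xs * m - b
    dF_dm = -np.sum(xs * residuals)
    dF_db = -np.sum(residuals)
    return dF_dm, dF_db

# Synthetic noisy Celsius -> Fahrenheit readings (illustrative data only)
rng = np.random.default_rng(0)
xs = rng.normal(20, 10, 100)
ys = 1.8 * xs + 32 + rng.normal(0, 2, 100)

# np.polyfit with deg=1 returns the least-squares [slope, intercept]
m_hat, b_hat = np.polyfit(xs, ys, 1)

grad_at_fit = gradient(m_hat, b_hat, xs, ys)           # both entries ~ 0
grad_elsewhere = gradient(m_hat + 0.1, b_hat, xs, ys)  # clearly nonzero
print(grad_at_fit, grad_elsewhere)
```

Both derivatives vanish (up to floating-point error) at the fitted parameters and move away from zero as soon as the slope is nudged.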

In [2]:
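def perform_linear_regressions(xs, ys):
    # closed-form least squares: m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    x_mean, y_mean = xs.mean(), ys.mean()
    m = ((xs - x_mean) * (ys - y_mean)).sum() / ((xs - x_mean) ** 2).sum()
    b = y_mean - m * x_mean  # the fitted line passes through the point of means
    return m, b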
