## Linear Regression Algorithm Walkthrough
#### Lew Sears

In [1]:
import pandas as pd
import numpy as np

Linear regression is one of the simplest, yet most important, principles of data science. Simple as it may be, you need a solid understanding of these foundational building blocks before moving on to more powerful ideas like neural networks (which, in a simplistic way, stack many linear regressions). Its general goal is to take a vector of features and output a value, usually a prediction. For example, give the model a set of features describing a house and have it predict the price.

To predict the value of the house, we will create weights that measure how much a change in each feature affects the final target. These weights are trained using features and known prices for houses we already have data on. As you may expect, a linear regression works best with continuous data, but a few binary features aren't the end of the world. For categorical data, best practice is to one-hot encode the column.
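
As a small illustration of one-hot encoding, here is a sketch using `pd.get_dummies`. The housing columns and values are made up for the example and are not part of this notebook's data.

```python
import pandas as pd

# Hypothetical housing data; the column names and values are invented for illustration.
houses = pd.DataFrame({
    "sqft": [1400, 2100, 900],
    "neighborhood": ["north", "south", "north"],
})

# Replace the categorical column with one 0/1 indicator column per category.
encoded = pd.get_dummies(houses, columns=["neighborhood"], dtype=float)
print(encoded.columns.tolist())
```

Each category becomes its own numeric feature, so the regression can learn a separate weight for it.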

In [207]:
class LinearRegression():
    
    def __init__(self, learning_rate):
        '''Initializes the class with a learning rate for the optimization of weights.'''
        self.learning_rate = learning_rate
          
    def fit(self, train_data, train_target):
        '''Input the training data and its respective target values'''
        
        #Convert data to numpy arrays and prepend a column of ones for the constant term
        constant_term = np.ones((len(train_data), 1))
        X = np.concatenate((constant_term, np.array(train_data)), axis = 1)
        y = np.array(train_target)
        #Initialize weights:
        self.weights = np.zeros(X.shape[1])
        self.errors = []
        
        i = 0
        while i < 10:
            predict = np.matmul(self.weights, X)
            errors = y-predict
            sq_errors = errors ** 2 
            self.errors.append(np.sum(sq_errors))
            
            #update weights
            gradient = (-2) * np.dot(X, errors) 
            self.weights = self.weights - self.learning_rate * gradient 
            i += 1
            
        return self
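
For reference, the update step in `fit` is meant to be gradient descent on the sum of squared errors. A sketch of the math, assuming $X$ is the data matrix with the column of ones prepended, $w$ is the weight vector, $y$ is the target vector, and $\eta$ is the learning rate (these symbols are notation introduced here, not the notebook's):

$$
L(w) = \lVert y - Xw \rVert^2, \qquad \nabla_w L(w) = -2\,X^\top (y - Xw), \qquad w \leftarrow w - \eta\,\nabla_w L(w)
$$

With $n$ rows and $k$ columns in $X$, the prediction $Xw$ has length $n$ and the gradient $X^\top (y - Xw)$ has length $k$, matching the shape of $w$.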

#### A simple example

In [208]:
x = pd.DataFrame(np.random.normal(0 , 1, size = (1000,2))*10)

In [209]:
#The target is the first column plus the second column plus 5, with some Gaussian noise.
x[2] = x[0]+x[1]+5+np.random.normal(0 , 2, size = (1000))

In [210]:
lin_model = LinearRegression(0.1)

In [211]:
lin_model.fit(x[[0,1]], x[2])

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 1000 is different from 3)
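
The traceback above points at `np.matmul(self.weights, X)`: `self.weights` has shape `(3,)` and `X` has shape `(1000, 3)`, so the inner dimensions (3 and 1000) do not line up. Swapping the operands gives a `(1000,)` prediction, and the gradient then needs `X.T` so that it comes out with shape `(3,)`. Below is a minimal sketch of a corrected `fit`, not a definitive implementation. The class name `LinearRegressionSketch` and the `n_iter` parameter are assumptions added here, and the gradient is averaged over the rows (mean squared error rather than the summed error) so that a moderate learning rate stays stable.

```python
import numpy as np

class LinearRegressionSketch:
    """Gradient-descent linear regression (illustrative sketch, not the notebook's class)."""

    def __init__(self, learning_rate, n_iter=2000):
        self.learning_rate = learning_rate
        self.n_iter = n_iter  # assumed parameter; the notebook hard-codes 10 steps

    def fit(self, train_data, train_target):
        data = np.asarray(train_data, dtype=float)
        # Prepend a column of ones so weights[0] acts as the constant term.
        X = np.concatenate((np.ones((len(data), 1)), data), axis=1)
        y = np.asarray(train_target, dtype=float)
        self.weights = np.zeros(X.shape[1])
        self.errors = []
        for _ in range(self.n_iter):
            predict = X @ self.weights            # (n, k) @ (k,) -> (n,)
            errors = y - predict
            self.errors.append(np.sum(errors ** 2))
            # Gradient of the mean squared error: (k, n) @ (n,) -> (k,)
            gradient = (-2 / len(y)) * (X.T @ errors)
            self.weights = self.weights - self.learning_rate * gradient
        return self

# Synthetic data shaped like the example above: target = x0 + x1 + 5 + noise.
rng = np.random.default_rng(0)
features = rng.normal(0, 1, size=(1000, 2)) * 10
target = features[:, 0] + features[:, 1] + 5 + rng.normal(0, 2, size=1000)

model = LinearRegressionSketch(learning_rate=0.005).fit(features, target)
print(model.weights)  # expected to land close to [5, 1, 1]
```

The learning rate of 0.005 is chosen for features of this scale; larger values can make the updates overshoot and diverge.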

In [177]:
weights = lin_model.weights

In [205]:
weights.shape

(3,)

In [165]:
lin_model.errors

[955253.2003173728,
 20189912400769.285,
 7.992972540452371e+21,
 3.174535923498148e+30,
 1.2645683443510306e+39,
 5.051271527685712e+47,
 2.022844093586656e+56,
 8.119633351490424e+64,
 3.2661410420549717e+73,
 1.3163585060204298e+82]
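
The squared error above grows by a factor of roughly 4e8 on every step, which is the signature of a learning rate that is too large for the scale of the features. One plausible reading, assuming these errors came from a run using the summed-squared-error gradient with a learning rate of 0.1, can be checked with a small sketch: for that loss the Hessian is `2 * X.T @ X`, and gradient descent is only stable when the learning rate is below `2 / lambda_max`. The data here is regenerated with a fixed seed, so it matches the notebook's example in shape and scale but not in exact values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Same shape and scale as the notebook's example: two features with standard deviation 10.
features = rng.normal(0, 1, size=(1000, 2)) * 10
X = np.concatenate((np.ones((len(features), 1)), features), axis=1)

# Hessian of the summed squared error is 2 * X.T @ X (symmetric, so eigvalsh applies).
lambda_max = np.linalg.eigvalsh(2 * X.T @ X).max()
max_stable_rate = 2 / lambda_max

# Along the steepest direction, each step multiplies the residual by |1 - rate * lambda_max|.
growth_at_0_1 = abs(1 - 0.1 * lambda_max)

print(f"largest Hessian eigenvalue: {lambda_max:.3g}")
print(f"largest stable learning rate: {max_stable_rate:.3g}")
print(f"residual growth per step at rate 0.1: {growth_at_0_1:.3g}")
```

A per-step residual growth on the order of 2e4 corresponds to a squared-error growth on the order of 4e8, consistent with the log above. Scaling the features, averaging the gradient over the rows, or using a much smaller learning rate are the usual remedies.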

In [181]:
npx = np.array(x)

In [206]:
[y.shape, npx.shape]

[(3,), (1000, 3)]

In [182]:
y= np.array([1,2,3])

In [189]:
np.matmul(npx,y)

array([-7.14309990e+01,  2.94349684e+01,  1.53080151e+01, -4.76056669e+00,
       -7.12129971e+01,  6.38179932e+01,  7.93472766e+01, -7.38977006e+01,
        4.72391705e+01,  1.50663305e+02, -4.46133249e+01, -5.00858408e+01,
       -3.14681521e+01,  2.26407116e+01, -7.67465088e+01, -9.16155543e+01,
       -2.52360436e+01, -7.43213383e+00,  4.11158151e+01, -2.80895307e+00,
        4.99092969e+01,  5.05859101e+00,  3.39016295e+01, -9.99031935e+01,
       -1.21415928e+01,  1.05688940e+01,  1.39252026e+00, -7.07389458e+01,
        2.34423661e+01,  4.91648420e+01,  9.51157615e+00, -1.51348416e+01,
        2.11700602e+01, -1.58489365e+01,  1.30938910e+01,  5.84764720e+01,
        2.71661820e+01,  6.30680904e+01, -6.24056189e+01,  1.93249611e+02,
        7.68554795e+01,  2.13316804e+01,  6.80628831e+01,  1.11037870e+02,
        8.19751412e+01, -1.30167023e+01,  1.20877214e+02, -2.65566869e+01,
        3.56845109e+01,  2.57396673e+01,  1.50763954e+02,  1.98463045e+01,
       -7.02391091e+01, -
