## Linear Regression: Python Implementation from Scratch

Linear regression is the most basic and most widely used technique for prediction in regression analysis. In large parts of corporate business analytics, a regression model is the end goal. It is important to understand it well so that you can move forward without getting stuck, and by far the easiest way to do that is to experiment and build things. This notebook starts with basic code to clean the data and put it together, using numpy and pandas. From there we pick up the bricks and build the house ourselves.

Source: https://www.youtube.com/watch?v=4b4MUYve_U8&list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU&index=2

In [1]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

In [2]:
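from basic import *  # local helper module; np and pd were presumably imported through it
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error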

#### Basic Structure

Start with all weights set to zero. We assume that the input is normalised so that every feature lies in a small, comparable range (in this notebook each column is standardised to zero mean and unit standard deviation). This helps gradient descent converge faster and more reliably.

We use gradient descent with a fixed learning rate to update the weights.

We also add a column of ones to the training data, so that the weight on that column acts as the intercept.

The update rule is:

    for each iteration:

        predictions = [input data frame] @ [weights]
        weights = weights - (learning rate / number of rows) * [input data frame].T @ (predictions - true output)
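Written out, this is batch gradient descent on the mean squared error. The short derivation below is a sketch under a few assumptions that the notebook does not state explicitly: the symbols $m$ (number of rows), $\alpha$ (learning rate) and $J$ (cost) are introduced here for illustration, and the factor $\tfrac{1}{2}$ in the cost is a common convention chosen so that it cancels when differentiating.

$$J(w) = \frac{1}{2m}\,\lVert Xw - y \rVert^2$$

$$\nabla_w J(w) = \frac{1}{m}\,X^\top (Xw - y)$$

$$w \leftarrow w - \frac{\alpha}{m}\,X^\top (Xw - y)$$

In the class below, `learning_rate / X.shape[0]` plays the role of $\alpha / m$, and `X.T @ (X @ w - y)` is the summed gradient.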

In [3]:
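class LinearModel():
    def __init__(self, X, y, n_iter=1000, learning_rate=0.01):
        # Store the data as plain float arrays so the matrix products below
        # do not depend on pandas index alignment.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        # One weight per column (including the intercept column), starting at zero.
        self.w = np.zeros((self.X.shape[1], 1))
        # Dividing by the number of rows turns the summed gradient into a mean.
        self.learning_rate = learning_rate/self.X.shape[0]
        self.n_iter = n_iter

    def fit(self):
        for i in range(self.n_iter):
            # Residuals: current predictions minus the true output.
            errors = self.X @ self.w - self.y
            delta = self.learning_rate*(self.X.T @ errors)
            self.w -= delta

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.w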
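To check that this update actually converges, it can help to run the same loop on data where the true weights are known. The block below is a minimal, self-contained sketch and not part of the original notebook: the function `fit_linear_gd`, the synthetic data and the chosen `true_w` are all hypothetical, and the targets are noise-free so the recovered weights should match almost exactly.

```python
import numpy as np


def fit_linear_gd(X, y, n_iter=1000, learning_rate=0.01):
    """Batch gradient descent for least squares.

    X is expected to already contain a column of ones for the intercept.
    (Hypothetical helper mirroring the loop in LinearModel.fit.)
    """
    m, n = X.shape
    w = np.zeros((n, 1))
    for _ in range(n_iter):
        errors = X @ w - y                      # predictions minus true output
        w -= (learning_rate / m) * (X.T @ errors)
    return w


# Synthetic, noise-free data with known weights (assumed values for illustration).
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 2))
X_demo = np.hstack([features, np.ones((200, 1))])   # last column is the intercept
true_w = np.array([[2.0], [-1.0], [0.5]])
y_demo = X_demo @ true_w

w_hat = fit_linear_gd(X_demo, y_demo, n_iter=5000, learning_rate=0.1)
max_abs_error = float(np.max(np.abs(w_hat - true_w)))
print(max_abs_error)
```

Because the features are roughly standardised, a learning rate of 0.1 is comfortably stable here. On unscaled features the same rate could diverge, which is one reason the notebook normalises the input first.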

In [7]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
dataset = load_boston()
X_train = pd.DataFrame(dataset.data, columns=dataset.feature_names)
y_train = dataset.target[:, None]

In [8]:
to_drop = ['DIS', 'INDUS', 'LSTAT', 'NOX', 'RAD', 'TAX', 'ZN']

In [9]:
X_train = X_train.drop(to_drop, axis=1)

In [10]:
X_train = (X_train-X_train.mean())/X_train.std()
X_train['intercept'] = np.ones(X_train.shape[0])
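The two lines above standardise each column and then append the intercept column. The order matters: a constant column has zero standard deviation, so standardising it would produce NaN values. A small sketch on a toy frame shows the effect; the column names `rooms` and `age` and their values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names and values.
df = pd.DataFrame({"rooms": [4.0, 6.0, 8.0], "age": [10.0, 30.0, 80.0]})

# Standardise: subtract the column mean, divide by the column standard deviation.
# pandas uses the sample standard deviation (ddof=1) by default.
scaled = (df - df.mean()) / df.std()

# Add the column of ones only after scaling, so it stays constant at 1.
scaled["intercept"] = np.ones(scaled.shape[0])

col_means = scaled[["rooms", "age"]].mean()
col_stds = scaled[["rooms", "age"]].std()
print(scaled)
```

After this step every feature column has mean 0 and standard deviation 1, and the weight learned for `intercept` is the model's prediction at the feature means.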

In [11]:
lr = LinearModel(X_train, y_train)

In [12]:
lr.fit()

In [14]:
r2_score(y_train, lr.predict(X_train))

0.6368893071818802

In [15]:
sklearn_lin = LinearRegression()
sklearn_lin.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [16]:
sklearn_lin.score(X_train, y_train)

0.6368894212487402

## Conclusion

This notebook built a linear regression model from scratch. It is fast, though slightly slower than the sklearn model, and it is almost equally good at predicting the output: the R² scores on the training data (0.6368893 versus 0.6368894) agree to six decimal places.