# Linear Regression from Scratch:

**Linear regression is a type of supervised machine learning algorithm that computes a linear relationship between an independent variable and a dependent variable.**
- The variable you want to predict is called the **'dependent variable'**.
- The variable you are using to predict the other variable's value is called the **'independent variable'**.

We can find the linear regression solution with two methods:
- i) Closed Form Solution
- ii) Non-Closed Form Solution

**Closed Form Solution:**
In mathematics, an expression is in closed form if it is formed with constants, variables and a finite set of basic functions connected by arithmetic operations (+, −, ×, /, and integer powers) and function composition. For linear regression, the closed-form solution is the Ordinary Least Squares (OLS) solution, so it is also known as the OLS method.
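Where the closed form comes from can be sketched with the standard calculus argument: write the sum of squared errors $S(m, b)$ of the line $\hat{y} = mx + b$, set both partial derivatives to zero, and solve. The notation ($x_i$, $y_i$, $\bar{x}$, $\bar{y}$, $n$ samples) is the same as in the slope and intercept formulas used later in this notebook.

```latex
\begin{aligned}
S(m, b) &= \sum_{i=1}^{n} \left(y_i - (m x_i + b)\right)^2 \\
\frac{\partial S}{\partial b} &= -2\sum_{i=1}^{n} \left(y_i - m x_i - b\right) = 0
  \;\Rightarrow\; b = \bar{y} - m\bar{x} \\
\frac{\partial S}{\partial m} &= -2\sum_{i=1}^{n} x_i\left(y_i - m x_i - b\right) = 0 \\
&\Rightarrow\; \sum_{i=1}^{n} x_i\left((y_i - \bar{y}) - m(x_i - \bar{x})\right) = 0
  \quad \text{(substituting } b = \bar{y} - m\bar{x}\text{)} \\
&\Rightarrow\; \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = m \sum_{i=1}^{n} (x_i - \bar{x})^2
  \quad \text{(since } \textstyle\sum_i \bar{x}(y_i - \bar{y}) = \sum_i \bar{x}(x_i - \bar{x}) = 0\text{)} \\
&\Rightarrow\; m = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
\end{aligned}
```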

**Non-Closed Form Solution:**
Non-closed-form solutions generally have to be computed numerically, often through successive approximation, e.g. gradient descent.
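A rough illustration of the difference: the sketch below fits the same line twice on synthetic data, once with the closed-form OLS formulas and once with plain batch gradient descent on the mean squared error. The data, the `learning_rate` and the iteration count `n_iters` are arbitrary assumptions for this example, not values used elsewhere in the notebook.

```python
import numpy as np

def fit_closed_form(x, y):
    # one-shot OLS solution for slope m and intercept b
    x_mean, y_mean = x.mean(), y.mean()
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - m * x_mean
    return m, b

def fit_gradient_descent(x, y, learning_rate=0.01, n_iters=20000):
    # start at (0, 0) and repeatedly step against the gradient of the MSE
    m, b = 0.0, 0.0
    n = x.shape[0]
    for _ in range(n_iters):
        error = (m * x + b) - y
        grad_m = (2 / n) * np.sum(error * x)  # dMSE/dm
        grad_b = (2 / n) * np.sum(error)      # dMSE/db
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

rng = np.random.default_rng(0)
x = rng.uniform(1, 9, size=50)             # synthetic "hours"
y = 10 * x + 3 + rng.normal(0, 5, 50)      # synthetic "scores" with noise

print(fit_closed_form(x, y))
print(fit_gradient_descent(x, y))
```

Both calls should print (nearly) the same slope and intercept. Gradient descent only gets there because the learning rate is small enough to be stable and the loop runs long enough to converge; the closed form needs neither setting, which is why it is the simpler choice for a single feature.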

In [1]:
# importing libraries
import pandas as pd
import numpy as np

In [2]:
# loading data
data = pd.read_csv('studenthrs.csv')

In [3]:
data.head()

|   | Hours | Scores |
|---|-------|--------|
| 0 | 2.5 | 24 |
| 1 | 5.1 | 47 |
| 2 | 3.2 | 27 |
| 3 | 8.5 | 75 |
| 4 | 3.5 | 30 |


In [4]:
# separate the feature (Hours) from the target (Scores)
X = data.iloc[:, 0].values
y = data.iloc[:, 1].values

In [5]:
X

array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,
       3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8, 2.5,
       3.3, 4.2, 2.5, 1.6, 2.7, 3.6, 5.7, 1.3, 6.2, 5.1, 4.1, 3.4, 6.2,
       5.7, 5.1, 4.2, 4.1, 3.6, 3.4, 3.3, 2.7, 2.5, 1.6, 1.3, 8.9, 7.8,
       7.7, 7.4, 6.9, 6.1, 5.9, 4.8, 4.5, 4.2, 3.8, 3.3, 3.3, 2.7, 2.5,
       2.5, 2.5, 1.9, 1.1, 3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8,
       3.8, 6.9, 7.8, 2.5, 3.3, 4.2])

In [6]:
y

array([24, 47, 27, 75, 30, 20, 88, 60, 81, 25, 85, 62, 41, 42, 17, 95, 30,
       24, 67, 69, 30, 54, 35, 76, 86, 30, 43, 39, 24, 20, 26, 45, 58, 18,
       70, 48, 38, 43, 70, 58, 48, 39, 38, 45, 43, 43, 26, 24, 20, 18, 95,
       86, 85, 69, 76, 67, 62, 54, 41, 39, 35, 42, 43, 30, 30, 30, 24, 24,
       17, 42, 17, 95, 30, 24, 67, 69, 30, 54, 35, 76, 86, 30, 43, 39],
      dtype=int64)

##### Before moving on to the OLS solution, I recommend going through my linear_regrssion_library notebook, which uses the scikit-learn library and makes the steps below easier to follow. In this notebook I build linear regression from scratch and print a few intermediate values for verification, then compare the results with those from that notebook.

### Closed Form Solution (OLS Method):

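The model is the line $\hat{y} = mx + b$, where $m$ is the slope and $b$ is the intercept.

Formula for m is:
- $m = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Formula for b (computed after m, since it depends on it) is:
- $b = \bar{y} - m\bar{x}$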
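To make the two formulas concrete, here is the same arithmetic on just the five rows shown by `data.head()` (Hours 2.5, 5.1, 3.2, 8.5, 3.5 and Scores 24, 47, 27, 75, 30). This is only a hand-check on a tiny subset, so the slope and intercept differ from the values fitted on the full training set.

```python
import numpy as np

# the five rows printed by data.head()
hours = np.array([2.5, 5.1, 3.2, 8.5, 3.5])
scores = np.array([24, 47, 27, 75, 30])

x_mean = hours.mean()   # x̄ = 4.56
y_mean = scores.mean()  # ȳ = 40.6

# m = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)^2
num = np.sum((hours - x_mean) * (scores - y_mean))  # 202.92
den = np.sum((hours - x_mean) ** 2)                 # 23.032
m = num / den

# b = ȳ - m·x̄
b = y_mean - m * x_mean

print(f"m = {m:.4f}, b = {b:.4f}")  # m = 8.8104, b = 0.4248
```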

In [19]:
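class MeraLR:
    def __init__(self):
        self.m = None  # slope
        self.b = None  # intercept
    
    def fit(self, X_train, y_train):
        # compute the means once instead of on every loop iteration
        x_mean = X_train.mean()
        y_mean = y_train.mean()
        
        num = 0  # Σ(xi - x̄)(yi - ȳ)
        den = 0  # Σ(xi - x̄)^2
        
        for i in range(X_train.shape[0]):
            num = num + ((X_train[i] - x_mean) * (y_train[i] - y_mean))
            den = den + ((X_train[i] - x_mean) * (X_train[i] - x_mean))
        
        self.m = num / den
        self.b = y_mean - (self.m * x_mean)
        # print the learned slope and intercept for verification
        print(self.m)
        print(self.b)
            
    def predict(self, X_test):
        # echo the input for verification, then apply y = m·x + b
        print(X_test)
        return self.m * X_test + self.b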
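The loop in `fit` can also be written with NumPy array operations, which compute the same two sums without a Python-level loop. The sketch below is a hypothetical vectorized variant (`MeraLRVectorized` is a name made up for this example, not part of the notebook); it returns `self` instead of printing, and its result can be cross-checked against `np.polyfit` with degree 1, which also fits a least-squares line. The usage data is made up.

```python
import numpy as np

class MeraLRVectorized:
    """Hypothetical vectorized variant of MeraLR (same OLS formulas, no loop)."""

    def __init__(self):
        self.m = None  # slope
        self.b = None  # intercept

    def fit(self, X_train, y_train):
        X_train = np.asarray(X_train, dtype=float)
        y_train = np.asarray(y_train, dtype=float)
        x_dev = X_train - X_train.mean()  # xi - x̄ for every i at once
        y_dev = y_train - y_train.mean()  # yi - ȳ for every i at once
        self.m = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
        self.b = y_train.mean() - self.m * X_train.mean()
        return self

    def predict(self, X_test):
        # works for a single value or a whole array of inputs
        return self.m * np.asarray(X_test, dtype=float) + self.b

# usage on small made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([12.0, 21.0, 33.0, 39.0, 52.0])
model = MeraLRVectorized().fit(x, y)

slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
print(model.m, slope)
print(model.b, intercept)
```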

In [9]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)
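The `X` array printed earlier holds 84 values. With `test_size=0.2`, scikit-learn rounds the test-set size up, so ceil(0.2 × 84) = 17 rows go to the test set and the remaining 67 to the training set, matching the `(67,)` shape reported for `X_train`. The split itself can be sketched from scratch as "shuffle the row indices, then slice". `train_test_split_scratch` is a hypothetical helper, and with the same seed it will not select the same rows as scikit-learn's `random_state=2`.

```python
import math
import numpy as np

def train_test_split_scratch(X, y, test_size=0.2, seed=None):
    """Hypothetical from-scratch split: shuffle indices, then slice off a test set."""
    n_samples = X.shape[0]
    n_test = math.ceil(test_size * n_samples)  # round the test count up
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)        # random order of row indices
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # the same index arrays are used for X and y, so pairs stay aligned
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# 84 made-up samples, the same count as the Hours/Scores data
X_demo = np.linspace(1.0, 9.0, 84)
y_demo = 10 * X_demo + 2
X_tr, X_te, y_tr, y_te = train_test_split_scratch(X_demo, y_demo, test_size=0.2, seed=2)
print(X_tr.shape, X_te.shape)  # (67,) (17,)
```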

In [15]:
X_train.shape

(67,)

In [20]:
# created an object
lr = MeraLR()

In [21]:
# compare with X_train.shape: the (67,) below was printed by a previous version of fit() that printed X_train.shape
lr.fit(X_train, y_train)

(67,)


In [22]:
X_train.shape[0]

67

In [11]:
X_train[0]  # study hours of the first student in the training set

1.6

In [12]:
X_train

array([1.6, 5.1, 2.5, 5.1, 4.8, 6.9, 3.3, 2.5, 1.1, 8.9, 2.7, 7.7, 6.1,
       4.2, 6.1, 4.1, 2.7, 5.7, 7.7, 3.3, 2.7, 7.8, 1.9, 1.9, 3.4, 8.3,
       3.3, 3.3, 3.3, 5.9, 4.5, 6.1, 2.5, 3.2, 7.4, 7.4, 2.5, 9.2, 2.7,
       1.1, 7.8, 1.5, 8.5, 3.8, 4.5, 1.3, 2.5, 6.2, 7.8, 4.1, 3.5, 1.9,
       5.7, 3.4, 2.7, 3.6, 2.7, 2.5, 1.3, 6.2, 5.5, 7.4, 3.6, 3.8, 2.5,
       8.9, 5.1])

In [21]:
# fit() prints the slope (m) and then the intercept (b)
lr.fit(X_train, y_train)

9.685926977495177
4.281268942155847


- These are the same slope and intercept values as those obtained in the linear_reg_library notebook.

In [16]:
# let's test the model
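Testing a regression model usually means comparing its predictions with `y_test` through an error metric. A minimal sketch of two common metrics written from scratch, mean absolute error and R² (`mae` and `r2` are hypothetical helper names; the arrays in the usage lines are made up and are not this notebook's data):

```python
import numpy as np

def mae(y_true, y_pred):
    # mean absolute error: average size of the prediction errors
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    # coefficient of determination: 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# made-up values, only to show the call pattern
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])
print(mae(y_true, y_pred), r2(y_true, y_pred))
```

With the real data, the same calls would take `y_test` and the array returned by `lr.predict(X_test)`.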

In [17]:
X_test

array([5.9, 2.5, 4.8, 2.5, 3.8, 4.2, 1.1, 1.6, 6.9, 3.3, 4.2, 6.9, 2.5,
       4.8, 3.3, 8.9, 4.2])

In [18]:
X_test[0]

5.9

In [22]:
lr.predict(X_test[0])

5.9


61.428238109377396

**For a student who studies 5.9 hours, the predicted 'Score' is 61.42, the same value returned in the linear_reg_library notebook.**
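The prediction is just the line $y = mx + b$ evaluated at x = 5.9 with the slope and intercept that `fit` printed, so it can be checked by hand. Because `predict` only uses arithmetic operators, NumPy broadcasting should also let it score the whole `X_test` array in one call. The sketch below reproduces both steps outside the class, with the parameter values and the `X_test` values copied from the printed output.

```python
import numpy as np

m = 9.685926977495177   # slope printed by lr.fit
b = 4.281268942155847   # intercept printed by lr.fit

# single prediction: y = m·x + b at x = 5.9 hours
score = m * 5.9 + b
print(round(score, 2))  # 61.43

# the same expression broadcasts over a whole array, e.g. every value in X_test
X_test = np.array([5.9, 2.5, 4.8, 2.5, 3.8, 4.2, 1.1, 1.6, 6.9, 3.3, 4.2, 6.9, 2.5,
                   4.8, 3.3, 8.9, 4.2])
y_pred = m * X_test + b
print(y_pred.round(2))
```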