# Ordinary Least Squares

This notebook implements ordinary least squares (OLS) to estimate the parameters of a simple linear regression.

### Importing The Necessary Libraries

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#### Simple linear regression needs only one independent variable, so we use a dataset with a single feature (`YearsExperience`) and a single target (`Salary`).

In [5]:
db = pd.read_csv("Salary_Data.csv")
print(db.shape)
db.head()

(30, 2)


|   | YearsExperience | Salary |
|---|---|---|
| 0 | 1.1 | 39343.0 |
| 1 | 1.3 | 46205.0 |
| 2 | 1.5 | 37731.0 |
| 3 | 2.0 | 43525.0 |
| 4 | 2.2 | 39891.0 |


### Formulae for calculating $\theta_0$ and $\theta_1$
The slope $\theta_1$ is given by:
\begin{align}
        \mathbf{\theta_1} = \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}
    \end{align}
$\theta_0$ is the intercept, calculated as:
\begin{align}
        \mathbf{\theta_0} = \bar{y} - \theta_1 \bar{x}
    \end{align}
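
For readers who want to see where these two formulas come from, here is a brief sketch of the standard derivation. It assumes the usual OLS objective, the sum of squared residuals, which is written here as $J$ (a symbol introduced only for this sketch). Setting both partial derivatives to zero gives:
\begin{align}
J(\theta_0, \theta_1) &= \sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i)^2 \\
\frac{\partial J}{\partial \theta_0} &= -2\sum_{i=1}^n (y_i - \theta_0 - \theta_1 x_i) = 0
\quad\Rightarrow\quad \theta_0 = \bar{y} - \theta_1 \bar{x} \\
\frac{\partial J}{\partial \theta_1} &= -2\sum_{i=1}^n x_i (y_i - \theta_0 - \theta_1 x_i) = 0
\quad\Rightarrow\quad \sum_{i=1}^n x_i\big((y_i - \bar{y}) - \theta_1 (x_i - \bar{x})\big) = 0
\end{align}
Solving the last equation for $\theta_1$ gives $\sum_i x_i (y_i - \bar{y}) \,/\, \sum_i x_i (x_i - \bar{x})$. Because $\sum_i (y_i - \bar{y}) = 0$ and $\sum_i (x_i - \bar{x}) = 0$, each $x_i$ can be replaced by $x_i - \bar{x}$ in both sums without changing their values, which yields the slope formula above.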

### Determining the Parameters with OLS
To find the values of $\theta_0$ and $\theta_1$, we first need the means of X and Y.

In [7]:
X = db["YearsExperience"].values
Y = db["Salary"].values

# Sanity check: every X value must have a matching Y value
if X.shape == Y.shape:
    print("Proceed")
else:
    print("Shapes don't match")

Proceed


### Using NumPy to calculate the means

In [8]:
Xmean = X.mean()
Ymean = Y.mean()

#### Computing the values of $\theta_0$ and $\theta_1$

In [9]:
num, denum = 0, 0

# Accumulate the numerator and denominator of the slope formula
for x, y in zip(X, Y):
    num += (x - Xmean) * (y - Ymean)
    denum += (x - Xmean) ** 2

# Slope
theta1 = num / denum

print(theta1)

# Intercept
theta0 = Ymean - theta1 * Xmean

print(theta0)

9449.962321455077
25792.20019866869
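
The loop above can also be written without explicit iteration. The following is a minimal sketch, not the notebook's own method: `ols_fit` is a hypothetical helper name, and the toy arrays `x_demo` and `y_demo` are made-up values rather than the salary data. It applies the same two formulas with NumPy array operations and cross-checks the result against `np.polyfit` with degree 1, which fits the same least-squares line.

```python
import numpy as np

def ols_fit(x, y):
    """Return (theta0, theta1) from the closed-form OLS formulas (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    theta1 = np.sum(dx * dy) / np.sum(dx ** 2)  # slope
    theta0 = y.mean() - theta1 * x.mean()       # intercept
    return theta0, theta1

# Made-up toy data for illustration only (not Salary_Data.csv)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

theta0_demo, theta1_demo = ols_fit(x_demo, y_demo)

# np.polyfit returns coefficients from highest degree to lowest: [slope, intercept]
slope, intercept = np.polyfit(x_demo, y_demo, deg=1)

print(theta0_demo, theta1_demo)
print(np.isclose(theta1_demo, slope), np.isclose(theta0_demo, intercept))
```

In the notebook itself, calling `ols_fit(X, Y)` before `X` is reshaped should reproduce the `theta0` and `theta1` values printed above.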


### Predicting the values using the computed $\theta_0$ and $\theta_1$

In [11]:
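# Predict a salary for every observation using the fitted line
Ypred = []
for i in X:
    y = theta0 + theta1 * i
    Ypred.append(y)
# Convert the list to a NumPy array once it is filled
Ypred = np.array(Ypred)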

### Testing the Accuracy of the Model with RMSE

In [12]:
mse = 0 
for ypred , y in zip(Ypred, Y):
    mse  += ((ypred - y)**2)/len(X)
print(np.sqrt(mse))

5592.043608760662
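
The root mean squared error computed above, together with the $R^2$ score that appears later in the scikit-learn comparison, can each be expressed in a few lines of NumPy. This is a rough sketch under stated assumptions: the helper names `rmse` and `r2` and the toy arrays are hypothetical, and $R^2$ is taken to be the usual definition $1 - SS_{res}/SS_{tot}$.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up toy values for illustration only
y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_pred_demo = np.array([2.5, 5.5, 6.5, 9.5])

rmse_demo = rmse(y_true_demo, y_pred_demo)
r2_demo = r2(y_true_demo, y_pred_demo)
print(rmse_demo, r2_demo)
```

RMSE is in the same units as the target (salary, in this notebook), while $R^2$ is unitless, so the two are usually reported together.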


**This was the implementation of simple linear regression from scratch. Now let's do the same using the scikit-learn library.**

In [13]:
X = X.reshape((len(X), 1))
print(X.shape)

(30, 1)


In [14]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

# scikit-learn expects a 2-D feature array of shape (n_samples, n_features)
X = X.reshape((len(X), 1))

# Creating Model
reg = LinearRegression()
# Fitting training data
reg = reg.fit(X,Y)
# Y Prediction
Y_pred = reg.predict(X)

# Calculating RMSE and R2 Score
print("R2 Score is : ", r2_score(Y, Y_pred))
print("RMSE is : " , np.sqrt(mean_squared_error(Y, Y_pred)))

R2 Score is :  0.9569566641435086
RMSE is :  5592.043608760662
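
Matching RMSE values suggest that both approaches found the same line, and the fitted parameters can also be compared directly. The sketch below uses made-up toy data, and every name ending in `_demo` is hypothetical. It fits a `LinearRegression` model and reads its `intercept_` and `coef_` attributes, which should agree with the closed-form $\theta_0$ and $\theta_1$ up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data for illustration only (not Salary_Data.csv)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form OLS estimates
dx = x_demo - x_demo.mean()
theta1_demo = np.sum(dx * (y_demo - y_demo.mean())) / np.sum(dx ** 2)
theta0_demo = y_demo.mean() - theta1_demo * x_demo.mean()

# scikit-learn fit; the feature array must be 2-D, hence reshape(-1, 1)
reg_demo = LinearRegression().fit(x_demo.reshape(-1, 1), y_demo)

print(theta0_demo, theta1_demo)
print(reg_demo.intercept_, reg_demo.coef_[0])
```

In the notebook, the equivalent check would be to print `reg.intercept_` and `reg.coef_[0]` next to `theta0` and `theta1`.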


Tada... We did it! The scikit-learn RMSE matches our from-scratch value exactly.