# [Simple Linear Regression with examples using Numpy](https://medium.com/analytics-vidhya/simple-linear-regression-with-example-using-numpy-e7b984f0d15e)

This topic was taken directly from an online article by [Arun Ramji Shanmugam](https://medium.com/@arunramji11)


### What is Linear Regression?

Linear regression is a mathematical technique for predicting future outputs from past data.<br><br>
For example, say you are watching your favourite player in today’s football match. He has a very good track record against this opponent, averaging 2 goals per match. From that simple calculation you might expect him to score 2 or more goals today. What your brain did was compute a simple average, or mean.<br><br>
**average = total goals scored against the opponent / number of matches against the opponent**<br><br>
Linear regression is similar, but instead of taking an average it makes a better statistical guess by using the linear relationship between the input variable (x) and the target variable (y).<br><br>
Note: Linear regression applies only to continuous variables, such as rain vs humidity or heart rate vs running speed.<br><br>
Let’s see how it works by implementing it with NumPy, the popular numerical computing package for Python.
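To make the contrast with a plain average concrete, here is a minimal sketch on made-up numbers. The arrays `hours` and `score` are invented for illustration and do not come from the article's data set. It compares predicting the mean for every input with predicting from a least-squares line fitted by `np.polyfit`.

```python
import numpy as np

# Hypothetical toy data, invented for illustration only
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
score = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Baseline: predict the same average for every input
mean_pred = np.full_like(score, score.mean())

# Linear fit: np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(hours, score, 1)
line_pred = slope * hours + intercept

# Sum of squared errors for each approach
sse_mean = np.sum((score - mean_pred) ** 2)
sse_line = np.sum((score - line_pred) ** 2)
print("SSE using the mean:", sse_mean)
print("SSE using the line:", sse_line)
```

When y really does vary with x, the line leaves a much smaller squared error than the average does.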

### Linear Regression using NumPy
Step 1: Import the packages needed for the computation.

In [None]:
import pandas as pd
import numpy as np

Step 2: Read the input file using the pandas library and take a quick peek at the data.

In [None]:
data = pd.read_csv(r'C:\Users\glenn\Downloads\HappinessAlcoholConsumption.csv')
data.head()

In [None]:
# Total alcohol consumption per capita = beer + spirits + wine
data["Alcohol_PerCapita"] = data["Beer_PerCapita"] + data["Spirit_PerCapita"] + data["Wine_PerCapita"]
data.head()

Step 3: Filter only the required variables


In [None]:
A = data[['Alcohol_PerCapita', 'HappinessScore']]
A.tail()

Step 4: Convert the pandas DataFrame into a NumPy array.

In [None]:
matrix = np.array(A.values,'float')
matrix[0:5,:]    #first 5 rows of data


In [None]:
# Examine the array
print("matrix size: ", matrix.size)
print("matrix shape: ", matrix.shape)

Step 5: Assign the input and target variables, X and y, for further computation.

In [None]:
#Assign input and target variable
X = matrix[:,0]
y = matrix[:,1]
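The difference between `size` and `shape`, and the shape that column slicing returns, can be seen in a small sketch. The array `demo` is a hypothetical stand-in for `matrix`. The point to notice is that `demo[:, 0]` is 1-D with shape `(m,)` and is not a column of shape `(m, 1)`, which matters later when arrays are subtracted.

```python
import numpy as np

# Hypothetical 3-row, 2-column array standing in for `matrix`
demo = np.array([[10.0, 5.1],
                 [20.0, 6.3],
                 [30.0, 7.0]])

col = demo[:, 0]              # first column as a 1-D array
col_2d = col.reshape(-1, 1)   # same values as an explicit column vector

print(demo.size, demo.shape)   # total element count and (rows, columns)
print(col.shape, col_2d.shape)
```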

In [None]:
# Peek at X and y
X
# y


**Step 6:** Feature Normalisation <br>
This is an important step for many ML models: we rescale the input variables to a small, similar range so that later computation, gradient descent in particular, converges faster and more reliably.<br><br>Below is one such feature normalisation technique, which scales the input variable x by its maximum value.

In [None]:
# feature normalisation
# divide the input variable by the maximum value in X
X = X/(np.max(X))

# After scaling, the largest value is 1; non-negative inputs such as these lie between 0 and 1

In [None]:
X
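Dividing by the maximum is only one way to rescale a feature. The sketch below compares it with two other common choices, min-max scaling and standardisation, on invented values. The array `raw` is hypothetical, and the article itself uses only the max-scaling variant.

```python
import numpy as np

# Hypothetical feature values, invented for illustration
raw = np.array([50.0, 200.0, 125.0, 400.0])

# Max scaling (the approach used above): the largest value maps to 1
max_scaled = raw / np.max(raw)

# Min-max scaling: the smallest value maps to 0, the largest to 1
min_max = (raw - raw.min()) / (raw.max() - raw.min())

# Standardisation (z-score): zero mean, unit standard deviation
z_score = (raw - raw.mean()) / raw.std()

print(max_scaled)
print(min_max)
print(z_score)
```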

**Step 7:** Since there is one input variable and one output variable, we can draw a 2D plot to see how the data is distributed. <br> This helps us understand the data and the problem better.

In [None]:
import matplotlib.pyplot as plt
# This will be covered in a later study group; today we are just going to use it

plt.plot(X,y,'bo')
plt.ylabel('Happiness Score')
plt.xlabel('Alcohol consumption')
plt.legend(['Happiness Score'])
plt.title('Alcohol_Vs_Happiness')
plt.grid()
plt.show()

Now it is clear that there is some correlation between alcohol consumption and happiness score: countries that consume more alcohol tend to be happier!<br><br>
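The strength of a relationship seen in a scatter plot can also be put into a number. A minimal sketch, assuming two 1-D arrays `a` and `b` whose values are invented for illustration, uses `np.corrcoef` to compute the Pearson correlation coefficient.

```python
import numpy as np

# Hypothetical values, invented for illustration
a = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
b = np.array([4.0, 5.2, 5.0, 6.1, 6.5])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between a and b
r = np.corrcoef(a, b)[0, 1]
print("Pearson r:", r)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.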

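**Now let’s begin computing the hypothesis**<br>
In linear regression, the hypothesis is the model h(x) = theta0 + theta1 * x that predicts y from x; it is a different concept from statistical hypothesis testing, which the quote below describes.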
<br><br>

>Hypothesis testing is done to confirm our observation about the population using sample data, within the desired error level. Through hypothesis testing, we can determine whether we have enough statistical evidence to conclude if the hypothesis about the population is true or not.

For a detailed explanation of hypothesis testing, check out Brandon Foltz's [Statistics 101 PL09 Hypothesis Testing](https://youtube.com/playlist?list=PLIeGtxpvyG-IZRHcZcOy12jp7ywuRbE7l). Set aside an afternoon.

---

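#### Create function to calculate SSE (Sum of Squared Error)

> SSE is the sum of the squared differences between our hypothesis (the predicted values) and the actual data points. The function below divides the SSE by 2m, so the value it returns is half the mean squared error.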

**Step 8:** Define the function to calculate the cost or SSE.

For a detailed explanation of [Sum of Squared Errors](https://youtu.be/6OvhLPS7rj4), check out Khan Academy. (7 minutes)
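As a reference for the code in this step, the cost can be written out explicitly. This is a sketch in conventional notation, assuming a hypothesis of the form $h_\theta(x) = \theta_0 + \theta_1 x$ and $m$ training examples; the symbols are standard and are not taken from the article.

```latex
h_\theta(x) = \theta_0 + \theta_1 x
\qquad
\mathrm{SSE}(\theta) = \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2
\qquad
J(\theta) = \frac{1}{2m}\,\mathrm{SSE}(\theta)
```

The factor $\frac{1}{2m}$ does not change where the minimum lies; it only rescales the cost.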

In [None]:
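def computecost(x,y,theta):
    # Cost J = (1/(2m)) * sum of squared errors
    m = np.size(y)                 # number of training examples
    y = np.reshape(y, (-1, 1))     # column vector, so (x@theta)-y stays (m, 1) instead of broadcasting to (m, m)
    a = 1/(2*m)
    b = np.sum(((x@theta)-y)**2)
    j = (a)*(b)
    return j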
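A tiny worked example makes the cost easier to check by hand and shows why array shapes matter. The helper `half_mse` and the arrays `x_demo` and `y_demo` are hypothetical names introduced only for this sketch. The last lines show the pitfall: subtracting a 1-D `y` from an `(m, 1)` prediction broadcasts to an `(m, m)` matrix.

```python
import numpy as np

def half_mse(x, y, theta):
    """Cost J = (1 / (2m)) * sum of squared residuals (hypothetical helper)."""
    y = np.reshape(y, (-1, 1))           # force y into an (m, 1) column
    residual = x @ theta - y             # shape (m, 1)
    return float(np.sum(residual ** 2) / (2 * y.size))

# Toy design matrix: a column of ones plus one feature
x_demo = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0]])
y_demo = np.array([1.0, 3.0, 5.0])       # exactly y = 1 + 2x

cost_zero = half_mse(x_demo, y_demo, np.zeros((2, 1)))            # theta = 0
cost_exact = half_mse(x_demo, y_demo, np.array([[1.0], [2.0]]))   # perfect fit
print(cost_zero, cost_exact)

# Pitfall: a 1-D y broadcasts against the (m, 1) prediction,
# giving an (m, m) matrix, so a sum would run over every pair of rows
wrong = x_demo @ np.zeros((2, 1)) - y_demo
print(wrong.shape)
```

With theta = 0 the cost is (1 + 9 + 25) / 6, and with the exact parameters it is 0.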

**Step 9:** Append a term x0 to our existing matrix X for mathematical convenience; x0 should have the value 1 in every row.

In [None]:
# initialise parameters
m = np.size(y)
X = X.reshape(-1, 1)   # column vector; avoids hard-coding the row count (122 in this data set)
# np.hstack concatenates arrays column-wise
x = np.hstack([np.ones_like(X),X])
theta = np.zeros([2,1])
print(theta,'\n',m)
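The reason for the extra column of ones is that it lets a single matrix product compute theta0 + theta1 * x for every row at once. A short sketch, with hypothetical values for `feature` and `theta_demo`, checks that the two forms agree.

```python
import numpy as np

# Hypothetical feature values as a column vector, shape (3, 1)
feature = np.array([0.2, 0.5, 0.9]).reshape(-1, 1)

# Design matrix: a column of ones (x0) next to the feature, shape (3, 2)
design = np.hstack([np.ones_like(feature), feature])

# Hypothetical parameters [theta0, theta1]
theta_demo = np.array([[4.0], [2.0]])

# The matrix product and the explicit formula give the same predictions
via_matmul = design @ theta_demo
via_formula = theta_demo[0] + theta_demo[1] * feature
print(design)
print(np.allclose(via_matmul, via_formula))
```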


In [None]:
print(computecost(x,y,theta))


Gradient descent is an algorithm used to find the optimal parameters ‘theta’ given the following inputs:<br><br>
x: input values<br>
y: output values<br>
Initial_theta: the starting value of theta, in most cases all zeros<br>
alpha: the learning rate, i.e. the size of each step taken toward the optimal value<br>
iteration: the number of iterations to run<br><br>
Understanding [gradient descent](https://youtu.be/sDv4f4s2SB8) in depth requires a bit of calculus, but that is not necessary to implement it and use it for ML problems. Knowing the role of the parameters above is often enough for implementation.
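The role of alpha is easiest to see on a one-variable problem. The sketch below is a toy example that is not part of the article: it minimises f(t) = (t - 3)**2, whose minimum is at t = 3, using three different learning rates. The function name `descend` and its defaults are assumptions made for illustration.

```python
def descend(alpha, steps=50, start=0.0):
    """Minimise f(t) = (t - 3)**2 by gradient descent (toy example)."""
    t = start
    for _ in range(steps):
        grad = 2 * (t - 3)    # derivative of (t - 3)**2
        t = t - alpha * grad  # step against the gradient
    return t

for alpha in (0.001, 0.1, 1.1):
    print(alpha, descend(alpha))
```

A very small alpha barely moves from the starting point in 50 steps, a moderate alpha lands close to 3, and an alpha that is too large overshoots further on every step and diverges.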

---
**Step 10:** Define a function for the gradient descent algorithm.


In [None]:
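def gradient(x,y,theta):
    
    # learning rate: assumed value; the source used 0.00001, which barely
    # moves theta in 2000 steps once the gradient is computed correctly
    alpha = 0.1
    iteration = 2000
    m = np.size(y)                 # number of training examples
    y = np.reshape(y, (-1, 1))     # column vector, so the error stays (m, 1)
    
    # gradient descent algorithm
    J_history = np.zeros([iteration, 1])
    
    for i in range(iteration):
        
        error = (x @ theta) - y    # residuals, shape (m, 1)
        # simultaneous update of both parameters: theta - (alpha/m) * x^T error
        theta = theta - (alpha/m) * (x.T @ error)
        J_history[i] = (1 / (2*m) ) * (np.sum(((x @ theta)-y)**2))   # cost J after this iteration
 
    return theta, J_history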
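One way to sanity-check a gradient descent routine is to run it on synthetic data with known parameters and compare the result with a closed-form least-squares solution. The sketch below assumes made-up data generated as y = 4 + 2x plus noise; the function `gradient_descent`, the value `alpha=0.1` and all variable names are illustrative and are not the article's code.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=2000):
    """Batch gradient descent for linear regression (illustrative sketch)."""
    m = x.shape[0]
    y = np.reshape(y, (-1, 1))               # (m, 1) column
    theta = np.zeros((x.shape[1], 1))
    history = np.zeros(iterations)
    for i in range(iterations):
        error = x @ theta - y                # (m, 1) residuals
        theta = theta - (alpha / m) * (x.T @ error)
        history[i] = np.sum((x @ theta - y) ** 2) / (2 * m)
    return theta, history

# Synthetic data (invented): y = 4 + 2x plus small noise, x in [0, 1]
rng = np.random.default_rng(0)
feature = rng.uniform(0.0, 1.0, size=(100, 1))
target = 4.0 + 2.0 * feature[:, 0] + rng.normal(0.0, 0.1, size=100)
design = np.hstack([np.ones_like(feature), feature])

theta_gd, history = gradient_descent(design, target)
theta_ls, *_ = np.linalg.lstsq(design, target, rcond=None)

print("gradient descent:", theta_gd.ravel())
print("least squares:   ", theta_ls)
```

If the two parameter vectors agree and the cost history decreases steadily, the update rule and the learning rate are behaving as expected.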

Now let’s use the gradient function on our data:

In [None]:
theta , J = gradient(x,y,theta)
print('theta:')
print(theta)
print('J:')
print(J)

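In the original run, the cost after gradient descent was 115.42, much better than the 1941.78 calculated when theta = 0. Those figures were computed with `y` as a 1-D array, which makes `(x @ theta) - y` broadcast to a 122 x 122 matrix; with `y` as a column vector both costs come out smaller, so compare the trend rather than the exact numbers.<br>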

---
**Step 11:** Now let’s plot our line over the data to see how well it fits.

In [None]:
#plot linear fit for our theta
plt.plot(X,y,'bo')
plt.plot(X,x@theta,'-')
plt.axis([0,1,3,7])
plt.ylabel('Happiness Score')
plt.xlabel('Alcohol consumption')
plt.legend(['HAPPY','LinearFit'])
plt.title('Alcohol_Vs_Happiness')
plt.grid()
plt.show()
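Once a line has been fitted, two natural follow-ups are predicting for a new input and measuring the quality of the fit. The sketch below uses invented numbers, with `np.polyfit` standing in for the fitted theta, and shows that a new raw value must be divided by the same scale factor that was used during training. All names here are hypothetical.

```python
import numpy as np

# Hypothetical raw inputs and targets, invented for illustration
raw_x = np.array([100.0, 200.0, 300.0, 400.0])
y_obs = np.array([4.5, 5.0, 5.6, 6.1])

scale = raw_x.max()                  # keep the training-time scale factor
x_norm = raw_x / scale

# Stand-in for the fitted parameters: [slope, intercept]
slope, intercept = np.polyfit(x_norm, y_obs, 1)

# A new raw value must be divided by the SAME scale before predicting
new_raw = 250.0
prediction = intercept + slope * (new_raw / scale)

# R^2: fraction of the variance in y explained by the line
fitted = intercept + slope * x_norm
ss_res = np.sum((y_obs - fitted) ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(prediction, r_squared)
```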