<a href="https://colab.research.google.com/github/mkjubran/MachineLearning/blob/master/Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Clone the Source GitHub Repository 
We need to clone the source files used throughout this tutorial from a GitHub repository.

In [0]:
!rm -rf ./MachineLearning
!git clone https://github.com/mkjubran/MachineLearning.git

# Linear Regression
**Introduction**

In this section, we will develop a technique to estimate the prices of houses based on their area (size). The procedure is as follows: \\
1- collect data about houses, including the area and price of each house, \\
2- use this data to build a model that relates the price of a house to its area, \\
3- use the model to estimate the prices of new houses based on their areas.

**Theory** \\

Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). $^{[1]}$ \\

![alt text](https://drive.google.com/uc?id=1_r9xTDOQbhv42ystuiIzFryatHAwuuyg)

However, many different lines could be used to represent this relation, as shown in the figure below:

![alt text](https://drive.google.com/uc?id=1HvCkHLnCVWP5MHM-YWR_0kqqbvfflSnY)

We need to determine the values of the slope **m** and the intercept **b** of the line $y = mx + b$ that minimize the residual error between the actual and predicted values of the dependent variable (y).\\

![alt text](https://drive.google.com/uc?id=10nLdNVaRfmi5_-tq8LzST39JnrPltzZC)
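For a single independent variable, the values of **m** and **b** that minimize the sum of squared residuals have a well-known closed form. The following is a minimal sketch of that formula; the function name `fit_line_least_squares` and the toy data are assumptions made here for illustration and are not part of the tutorial's files.

```python
import numpy as np

def fit_line_least_squares(x, y):
    """Return the slope m and intercept b that minimize the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance of x and y divided by the variance of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the best-fitting line passes through the point (x_mean, y_mean)
    b = y_mean - m * x_mean
    return m, b

# hypothetical toy data lying exactly on y = 2x + 1
m, b = fit_line_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)
```

Because the toy points lie exactly on a line, the residuals are zero and the recovered slope and intercept match the line that generated them.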



[1] https://en.wikipedia.org/wiki/Linear_regression

**Implementation**

In a previous module, you learned how to extract and collect data. Now, let us assume that the data, which includes the areas and prices of houses, is saved in a csv file called "HousesAreasPrices.csv". \\
To read the data in the file, we will use the pandas library (https://pandas.pydata.org/).

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/HousesAreasPrices.csv")
print(df)

Now, to visualize the data, we will plot the (area, price) pair of each house on a scatter plot. To do this, we need the matplotlib library (https://matplotlib.org/).

In [0]:
import matplotlib.pyplot as plt
plt.scatter(df.area,df.price,color='r', marker='+')
plt.xlabel('Area ($m^2$)',fontsize=20)
plt.ylabel('Prices ($)',fontsize=20)

As can be observed from the plot, a straight line can represent the data reasonably well. So we will use the LinearRegression class in the sklearn library (https://scikit-learn.org/stable/) to derive the best-fitting line (that is, to determine the best coefficient and intercept values) based on the given data.

In [0]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(df[['area']],df.price)
print(reg.coef_) ## print the coefficient
print(reg.intercept_) ## print the intercept
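The coefficient and intercept define the prediction rule price = coefficient × area + intercept. As a cross-check that does not depend on scikit-learn, the sketch below fits the same kind of line with `np.polyfit` and applies the rule by hand. The areas and prices are hypothetical values chosen for illustration, not the contents of the csv file.

```python
import numpy as np

# hypothetical areas and prices, not the contents of the csv file
area = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
price = np.array([110000.0, 165000.0, 210000.0, 245000.0, 310000.0])

# a degree-1 polynomial fit returns the slope first, then the intercept
slope, intercept = np.polyfit(area, price, deg=1)

# apply the prediction rule by hand
predicted = slope * area + intercept
residuals = price - predicted
print(slope, intercept)
print(residuals)
```

For a least-squares line with an intercept, the residuals on the fitted data sum to (numerically) zero, which is a quick sanity check on any fitted coefficient and intercept.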

To visualize the result, we plot the best-fitting line on the scatter plot.

In [0]:
plt.scatter(df.area,df.price,color='r', marker='+')
plt.xlabel('Area ($m^2$)',fontsize=20)
plt.ylabel('Prices ($)',fontsize=20)
plt.plot(df.area,reg.predict(df[['area']]),color='b')

After building the model, we will use it to estimate the prices of a list of houses based on their areas. Let us assume that the areas of a few houses are stored in a csv file called "HousesAreas.csv". We will read the data from the file into a dataframe and apply the area values to the model to determine the estimated prices. Then we will append the estimated prices to the dataframe and store the new dataframe in a new csv file called "PredictHousesAreasPrices.csv".

In [0]:
df2 = pd.read_csv("./MachineLearning/1_Regression/HousesAreas.csv")
p=reg.predict(df2)
df2['price']=p
print(df2)
df2.to_csv('./MachineLearning/1_Regression/PredictHousesAreasPrices.csv',index=False)

We could also plot the estimated prices together with the original data and the best-fitting line on the same figure:

In [0]:
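## DataFrame.append was removed in pandas 2.0, so pd.concat is used to merge the two dataframes
df_merged = pd.concat([df, df2], ignore_index=True)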

plt.scatter(df.area,df.price,color='r', marker='+')
plt.xlabel('Area ($m^2$)',fontsize=20)
plt.ylabel('Prices ($)',fontsize=20)
plt.plot(df_merged.area, reg.predict(df_merged[['area']]),color='b',linestyle='-.',linewidth=0.5)
plt.scatter(df2.area,df2.price,color='m', marker='o')

As can be observed from the figure, the estimated prices of the new houses lie on the best-fitting line.

**Exercise**

Use linear regression to estimate the prices of the houses based on how old the houses are. To complete this exercise, two files are included in the repository: \\
1- HousesAgesPrices.csv: a list of houses' prices and their ages \\
2- HousesAges.csv: a list of ages of houses

# Multiple Regression
**Introduction**

In this section, we will extend the model derived in the Linear Regression section to include more than one independent variable. This method is called Multiple Regression. We will use multiple properties of the house (area, number of rooms, age) to predict its price.

**Theory**

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes, the predictor, explanatory or regressor variables). For example, you could use multiple regression to understand whether exam performance can be predicted based on revision time, test anxiety, lecture attendance and gender. $^{[2]}$

[2] https://statistics.laerd.com/spss-tutorials/multiple-regression-using-spss-statistics.php

**Implementation**

Read the data in the file "HousesPrices.csv" using the pandas library.

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/HousesPrices.csv")
print(df)

In the above table, more than one feature is associated with the price of each house. In multiple (multivariate) regression, there are at least two input independent variables (features). In our case, the features are the area, the number of bedrooms, and the age of the house, and the dependent variable is the price of the house. 


![alt text](https://drive.google.com/uc?id=1a4tq7w_mewJ3gUykvvT5PiOg4rXKsBMJ)


We notice a NaN in the bedrooms column for the house at index 2; this is typically caused by an empty value in the csv file. Thus, we need to process the dataframe to clean the data. In our case, we will replace the NaN value with the median of the other bedroom values in the table.

In [0]:
import math
median_bedrooms = df.bedrooms.median() # median of the number of bedrooms in the dataframe
print(median_bedrooms)
median_bedrooms = math.floor(df.bedrooms.median()) # floor the median so the number of bedrooms is a whole number
print(median_bedrooms)
df.bedrooms = df.bedrooms.fillna(median_bedrooms) # replace the NaN with the floored median value
print(df)
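The same cleaning step can be tried on a small table to see what it does. The sketch below uses a hypothetical dataframe (the column values are assumptions for illustration) with one missing bedrooms entry.

```python
import math
import pandas as pd

# hypothetical table with one missing value in the bedrooms column
toy = pd.DataFrame({"area": [100, 120, 150, 180],
                    "bedrooms": [2, 3, None, 4]})

# Series.median() skips NaN by default, so it uses only 2, 3 and 4
fill_value = math.floor(toy["bedrooms"].median())

# replace the missing entry with the floored median
toy["bedrooms"] = toy["bedrooms"].fillna(fill_value)
print(toy)
```

Taking the floor matters when the median falls between two values (for example 3.5), since a fractional number of bedrooms is not meaningful.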

Now the data is clean. Next, we will use the LinearRegression class in the sklearn library (https://scikit-learn.org/stable/) to derive the best-fitting linear model (that is, to determine the best coefficients and intercept) based on the given data. 


In [0]:
from sklearn import linear_model
regm = linear_model.LinearRegression()
regm.fit(df[['area','bedrooms','age']],df.price)
print(regm.coef_) ## print the coefficients
print(regm.intercept_) ## print the intercept
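With several features, the model is price = c1 × area + c2 × bedrooms + c3 × age + intercept, and fitting it is a least-squares problem. The sketch below solves such a problem directly with `np.linalg.lstsq` as an illustration of what the coefficients and intercept mean. The feature values, `true_coef` and `true_intercept` are hypothetical, and this is not necessarily how scikit-learn computes its solution internally.

```python
import numpy as np

# hypothetical features per house: area, bedrooms, age
X = np.array([
    [100.0, 2.0, 20.0],
    [120.0, 3.0, 15.0],
    [150.0, 3.0, 18.0],
    [180.0, 4.0, 8.0],
    [200.0, 4.0, 30.0],
])
true_coef = np.array([2000.0, 10000.0, -500.0])  # assumed for illustration
true_intercept = 50000.0                         # assumed for illustration
y = X @ true_coef + true_intercept               # noise-free prices

# append a column of ones so that the intercept is estimated as a fourth weight
A = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = solution[:3], solution[3]
print(coef, intercept)
```

Since the hypothetical prices contain no noise, the solver recovers the assumed coefficients and intercept; with real data it returns the values that minimize the squared residuals.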

After building the model, we will use it to estimate the prices of a list of houses based on their features (area, number of bedrooms, age). Let us assume that the features of a few houses are stored in a csv file called "HousesFeatures.csv". We will read the data from the file into a dataframe and apply the feature values to the model to determine the estimated prices. Then we will append the estimated prices to the dataframe and store the new dataframe in a new csv file called "PredictHousesFeaturesPrices.csv".

In [0]:
dfm = pd.read_csv("./MachineLearning/1_Regression/HousesFeatures.csv")
p=regm.predict(dfm)
dfm['price']=p
print(dfm)
dfm.to_csv('./MachineLearning/1_Regression/PredictHousesFeaturesPrices.csv',index=False)

Optional: we could also check the residual error between the actual prices (in 'HousesPrices.csv') and the predicted values. 

In [0]:
ppr=regm.predict(df[['area','bedrooms','age']])
df['predicted_prices']=ppr
df['Residual']=df['price']-df['predicted_prices']
print(df)
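A column of residuals is easier to judge when it is reduced to a few numbers. The following sketch summarizes residuals by their root mean squared error and their largest absolute value; the helper name `summarize_residuals` and the sample numbers are assumptions for illustration.

```python
import numpy as np

def summarize_residuals(actual, predicted):
    """Return the root mean squared error and the largest absolute residual."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = actual - predicted
    rmse = np.sqrt(np.mean(residual ** 2))
    max_abs = np.max(np.abs(residual))
    return rmse, max_abs

# hypothetical actual and predicted prices
rmse, max_abs = summarize_residuals([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(rmse, max_abs)
```

With the dataframe above, the same helper could be called as `summarize_residuals(df['price'], df['predicted_prices'])`.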


**Exercise (1)**
Use multiple regression to estimate the prices of the houses based on houses' features. To complete this exercise, two files are included in the repository: \\
1- HousesPrices_Exercise.csv: a list of houses' prices and their features \\
2- HousesFeatures_Exercise.csv: a list of houses' features

Hint: use the word2number (https://pypi.org/project/word2number/) library to convert number words (e.g. twenty one) to numeric digits (21).

**Exercise (2)**
Use multiple regression to estimate the salary of a person based on experience, test results, and interview score. You are given some data in the "hiring.csv" file included in the repository. You need to propose a salary for the following two persons: 

**Person 1**: nine years of experience, 9 in the test score, and 6 in the interview, \\
**Person 2**: twelve years of experience, 10 in the test score, and 9 in the interview 

# Cost Function and Gradient Descent
In this section, we will learn how to use gradient descent to determine the optimal coefficients and intercept of linear regression.

**Theory**

In Machine Learning (ML), a cost function is a measure of how wrong the model is in terms of its ability to estimate the relationship between the independent variable (x) and the dependent variable (y). This is typically expressed as a difference or distance between the predicted value and the actual value.$^{[1]}$ One common cost function, which will be used in this section, is the mean squared error (MSE), which averages the squared differences between the ground truth ($y_i$) and the estimated value ($\hat{y}_i$). \\

\begin{equation}
\begin{aligned}
MSE=\frac{1}{n} \sum^n_{i=1}{(y_i -\hat{y}_i)^2}   
\end{aligned}
\end{equation}
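The MSE formula translates directly into a few lines of code. This is a minimal sketch; the function name and the sample values are chosen here for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    n = len(y_true)
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n

# squared differences are 0, 1, 0 and 4, so the MSE is 5 / 4
print(mean_squared_error([1, 3, 5, 7], [1, 2, 5, 9]))  # 1.25
```

Squaring the differences makes all errors positive and penalizes large errors more heavily than small ones.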

The cost function (also referred to as loss or error) can be estimated by iteratively running the model to compare estimated predictions against the “ground truth”, that is, the known values of y. The objective of an ML model, therefore, is to find parameters, weights or a structure that minimises the cost function.$^{[1]}$ Gradient descent is an efficient optimization algorithm that attempts to find a local or global minimum of a function. **Gradient descent** enables a model to learn the gradient or direction that the model should take in order to reduce errors (differences between actual y and predicted y).$^{[1]}$

![alt text](https://drive.google.com/uc?id=1fRW5deq8-LDrJcA537TvCOpDvChSk1pa)
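The idea can be illustrated on a function of a single variable before applying it to regression. The sketch below minimizes the hypothetical function $f(w) = (w-3)^2$, whose gradient is $2(w-3)$, by repeatedly stepping against the gradient; the function, starting point and learning rate are assumptions chosen for illustration.

```python
def minimize_quadratic(start, learning_rate, steps):
    """Gradient descent on f(w) = (w - 3)**2, whose minimum is at w = 3."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)             # derivative of f at the current w
        w = w - learning_rate * gradient   # step in the direction that reduces f
    return w

w_final = minimize_quadratic(start=0.0, learning_rate=0.1, steps=50)
print(w_final)  # close to 3
```

Each step shrinks the distance to the minimum by a constant factor (0.8 with these settings), so the estimate approaches 3 steadily without ever needing the minimum to be known in advance.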

In the linear regression case, we need to determine the values of **m** and **b** that minimize the cost function (**MSE**) as shown below. 

![alt text](https://drive.google.com/uc?id=1-djT6TUolxA_C5eDiX0hIAZcD7M68vCn)

We will use the gradient descent to determine the direction and step size to progress from an initial point toward the global minimum of the **MSE**. 
![alt text](https://drive.google.com/uc?id=16MCwuGihzmVaRYziuy5jxvWnE5m1sr_3)

Given a set of data points, below is a visualization of how gradient descent works.$^{[2]}$


![alt text](https://drive.google.com/uc?id=11MmCe-tEwK_SQ-qwwbc4ZY2QpE8GhIL4)


[1] https://towardsdatascience.com/machine-learning-fundamentals-via-linear-regression-41a5d11f5220

[2] https://github.com/mattnedrich/GradientDescentExample/blob/master/gradient_descent_example.gif


**Readings and Resources** \\
1- https://towardsdatascience.com/machine-learning-fundamentals-via-linear-regression-41a5d11f5220

2- https://medium.com/@lachlanmiller_52885/machine-learning-week-1-cost-function-gradient-descent-and-univariate-linear-regression-8f5fe69815fd

**Implementation**

In order to determine the best-fitting line, we need to determine the values of **m** and **b** of the straight line $\hat{y}_i=mx_i+b$ that minimize the MSE. 

\begin{equation}
\begin{aligned}
MSE=J=\frac{1}{n} \sum^n_{i=1}{(y_i -\hat{y}_i)^2}   
\end{aligned}
\end{equation}

So we substitute $\hat{y}_i=mx_i+b$ into the cost function as


\begin{equation}
\begin{aligned}
J=\frac{1}{n} \sum^n_{i=1}{(y_i -(mx_i+b))^2}   
\end{aligned}
\end{equation}

Then, we determine the gradient by taking the partial derivatives of the cost function with respect to **m** and **b** as

\begin{equation}
\begin{aligned}
\frac{\partial J}{\partial m}=\frac{2}{n} \sum^n_{i=1}{(y_i -(mx_i+b)) \times (-x_i)} 
\end{aligned}
\end{equation}

\begin{equation}
\begin{aligned}
\frac{\partial J}{\partial b}=\frac{2}{n} \sum^n_{i=1}{(y_i -(mx_i+b)) \times (-1)} 
\end{aligned}
\end{equation}
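A derived gradient can be checked numerically by comparing it with finite differences of the cost function. The sketch below does this for the MSE of the line $\hat{y}_i=mx_i+b$; the function names, the sample data and the test point are assumptions chosen for illustration.

```python
import numpy as np

def cost(m, b, x, y):
    """MSE of the line y_hat = m * x + b."""
    return np.mean((y - (m * x + b)) ** 2)

def analytic_gradient(m, b, x, y):
    """Partial derivatives of the MSE with respect to m and b."""
    error = y - (m * x + b)
    dm = -2.0 * np.mean(x * error)
    db = -2.0 * np.mean(error)
    return dm, db

def numeric_gradient(m, b, x, y, eps=1e-6):
    """Central finite differences of the cost in m and in b."""
    dm = (cost(m + eps, b, x, y) - cost(m - eps, b, x, y)) / (2 * eps)
    db = (cost(m, b + eps, x, y) - cost(m, b - eps, x, y)) / (2 * eps)
    return dm, db

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(analytic_gradient(0.5, 0.5, x, y))
print(numeric_gradient(0.5, 0.5, x, y))
```

If the two printed pairs agree to several decimal places, the analytic derivatives are consistent with the cost function.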

So now, to implement gradient descent, we start with some initial values of **m** ($m_0$) and **b** ($b_0$) and iteratively modify them according to the gradient and the learning rate ($\lambda$) as follows:

\begin{equation}
\begin{aligned}
m_i = m_{i-1} - \lambda \times \frac{\partial J}{\partial m} 
\end{aligned}
\end{equation}

\begin{equation}
\begin{aligned}
b_i = b_{i-1} - \lambda \times \frac{\partial J}{\partial b} 
\end{aligned}
\end{equation}
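As a worked illustration of a single update, assume the sample data $x=(0,1,2,3)$ and $y=(1,3,5,7)$ with $m_0=b_0=0$ and $\lambda=0.2$ (values chosen here for illustration). All initial predictions are 0 and $n=4$, so the first iteration gives:

\begin{equation}
\begin{aligned}
\frac{\partial J}{\partial m} &= -\frac{2}{4}\left(0 \times 1 + 1 \times 3 + 2 \times 5 + 3 \times 7\right) = -17 \\
\frac{\partial J}{\partial b} &= -\frac{2}{4}\left(1 + 3 + 5 + 7\right) = -8 \\
m_1 &= 0 - 0.2 \times (-17) = 3.4 \\
b_1 &= 0 - 0.2 \times (-8) = 1.6
\end{aligned}
\end{equation}

Both gradients are negative, so both parameters increase, moving from zero toward the line that fits the data.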


In [0]:
import numpy as np
def gradient_descent_basic(x,y,m_curr,b_curr,learning_rate,iterations):
    n = len(x)
    for i in range(iterations):
        y_pred = m_curr * x + b_curr
        
        md = - ( 2 / n ) * sum( x * ( y - y_pred ))
        bd = - ( 2 / n ) * sum(( y - y_pred ))

        m_curr = m_curr - learning_rate * md 
        b_curr = b_curr - learning_rate * bd 

        J = ( 1 / n ) * sum(( y - y_pred )**2)

        print('J = {}, m = {}, b = {}, Iteration = {}'.format(J ,m_curr, b_curr, i ))
    return m_curr,b_curr,i,J

## try the gradient_descent using sample data
x = np.array([0,1,2,3]);
y = np.array([1,3,5,7]); ## y=2x+1

m_curr = 0; b_curr = 0;
gradient_descent_basic(x,y,m_curr,b_curr,0.2,20) ## learning rate = 0.2 and iteration = 20

Let us increase learning rate to 0.5 and see how the gradient descent converges.

In [0]:
## try the gradient_descent with learning rate = 0.5 and iteration = 20
m_curr = 0; b_curr = 0;
gradient_descent_basic(x,y,m_curr,b_curr,0.5,20)

As can be seen, the cost function increases instead of decreasing: the learning rate is too large, so each step overshoots the minimum.

![alt text](https://drive.google.com/uc?id=1Urf6nAJ0-G5miH1EdCk4gCo5ctBtqVTh)

So usually, we start with a small number of iterations and some learning rate, and check whether the cost function is decreasing. Then we increase the learning rate gradually, up to a value just before the cost function starts increasing. Such a value is usually a good learning rate, since it converges in relatively few iterations.
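For a quadratic cost such as the MSE of a straight line, the point at which the cost starts increasing can also be estimated in advance: gradient descent is stable only while the learning rate is below 2 divided by the largest eigenvalue of the matrix of second derivatives (the Hessian) of the cost. The sketch below computes this bound; the function name `max_stable_learning_rate` is an assumption made here, and the data is the small sample used earlier.

```python
import numpy as np

def max_stable_learning_rate(x):
    """Upper bound on the learning rate for MSE gradient descent on y_hat = m * x + b."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # second derivatives of the MSE with respect to (m, b); they do not depend on y
    hessian = (2.0 / n) * np.array([[np.sum(x ** 2), np.sum(x)],
                                    [np.sum(x), n]])
    largest = np.linalg.eigvalsh(hessian).max()
    return 2.0 / largest

bound = max_stable_learning_rate([0, 1, 2, 3])
print(bound)  # roughly 0.24
```

This is consistent with the experiments above: a learning rate of 0.2 lies below the bound and converges, while 0.5 lies above it and diverges.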

Regarding the required number of iterations, you may stop the gradient descent search once the difference in the cost function between successive iterations drops below some threshold (such as 1e-5 or 1e-6). Next we will modify the code to stop when the change in the MSE between successive iterations is less than 1e-5, or when the maximum number of iterations (epochs) is reached.

In [0]:
import numpy as np
import copy
def gradient_descent(x,y,m_curr,b_curr,learning_rate,epochs):
    n = len(x)
    i = 0 
    j_curr = 100000
    while True:
        i= i + 1
        j_before = j_curr
        y_pred = m_curr * x + b_curr
        
        md = - ( 2 / n ) * sum( x * ( y - y_pred ))
        bd = - ( 2 / n ) * sum( y - y_pred )

        m_curr = m_curr - learning_rate * md 
        b_curr = b_curr - learning_rate * bd 

        j_curr = ( 1 / n ) * sum(( y - y_pred )**2)

        if ((abs(j_curr - j_before) < 1e-5) or (i >= epochs)):
          return m_curr,b_curr,i,j_curr

## try the gradient_descent using sample data
x = np.array([0,1,2,3]);
y = np.array([1,3,5,7]); ## y=2x+1

m_curr = 0; b_curr = 0;
gradient_descent(x,y,m_curr,b_curr,0.2,100) ## learning rate = 0.2 and at most 100 iterations

Next, we will solve the original problem (linear regression section) using our gradient descent implementation: \\
1- Read the data in the file "HousesAreasPrices.csv" using the pandas library, \\
2- convert the fields in the data frame to np.arrays, \\
3- then we apply the gradient_descent(x,y) on the np.arrays. \\
Recall, the solution we got in the linear regression section is **m** = 135.78767123 and **b** = 180616.43835616432

Let us begin by determining the learning rate. We will use gradient_descent_basic() to print error while trying different learning rate values.

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/HousesAreasPrices.csv")
print(df)
x=np.array(df.area)
y=np.array(df.price)
## change the learning rate and iterations
m_gd, b_gd, iters, j_curr= gradient_descent_basic(x,y,0,0,0.00000001,20)
print('J= {}, m = {}, b = {}, iterations = {}'.format(j_curr, m_gd, b_gd,iters))

As can be observed, we need a very low learning rate to make sure the error is decreasing. However, such a low learning rate needs many iterations to converge. To deal with this, we apply data scaling: we standardize the independent variable by subtracting its mean and dividing by its standard deviation. 

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/HousesAreasPrices.csv")
print(df)
x=np.array(df.area)
y=np.array(df.price)

## scale (standardize) the independent variable
x_new = (x - np.mean(x)) / np.std(x)

## change the learning rate and iterations
m_gd, b_gd, iters, j_curr= gradient_descent_basic(x_new,y,0,0,0.1,20)
print('J = {}, m = {}, b = {}, iterations = {}'.format(j_curr, m_gd, b_gd,iters))
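One way to see why scaling helps is to compare the curvature of the MSE before and after standardization. The sketch below computes, for raw and standardized inputs, the condition number of the matrix of second derivatives of the MSE and the resulting upper bound on the learning rate (2 divided by the largest eigenvalue). The area values and the helper name `mse_hessian` are hypothetical and chosen for illustration.

```python
import numpy as np

def mse_hessian(x):
    """Second derivatives of the MSE of y_hat = m * x + b with respect to (m, b)."""
    x = np.asarray(x, dtype=float)
    return 2.0 * np.array([[np.mean(x ** 2), np.mean(x)],
                           [np.mean(x), 1.0]])

areas = np.array([2600.0, 3000.0, 3200.0, 3600.0, 4000.0])  # hypothetical values
scaled = (areas - areas.mean()) / areas.std()

results = {}
for name, values in [("raw", areas), ("scaled", scaled)]:
    eigenvalues = np.linalg.eigvalsh(mse_hessian(values))
    condition_number = eigenvalues.max() / eigenvalues.min()
    rate_bound = 2.0 / eigenvalues.max()
    results[name] = (condition_number, rate_bound)
    print(name, "condition number:", condition_number, "learning-rate bound:", rate_bound)
```

With raw areas in the thousands, the bound on the learning rate is tiny and the cost surface is a long narrow valley; after standardization the surface is close to circular, so a much larger learning rate can be used.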

Now, to find the best coefficients, we increase the number of iterations.

In [0]:
## change the learning rate and iterations
m_gd, b_gd, iters, j_curr= gradient_descent(x_new,y,0,0,0.01,20000)
print('J = {}, m = {}, b = {}, iterations = {}'.format(j_curr, m_gd, b_gd,iters))

Notice that only 868 iterations are required before the difference between the MSE of successive iterations drops below 1e-5.

As an in-class exercise, we will compare the error function of the reg.fit method (linear regression section) with that of gradient descent.

In [0]:
m_reg = 135.78767123; ## from linear regression section
b_reg = 180616.43835616432; ## from linear regression section

m_gd ## from gradient descent results in code cell above
b_gd ## from gradient descent results in code cell above

y_actual = np.array(df.price)
y_pred_reg = m_reg * x + b_reg;
y_pred_gd = m_gd * x_new + b_gd;

n=len(y_actual)

J_reg = (1/n)*sum((y_actual - y_pred_reg)**2); ## MSE of the reg.fit line
J_gd = (1/n)*sum((y_actual - y_pred_gd)**2); ## MSE of the gradient descent line

dif = J_reg - J_gd;

print('J_reg = {}, J_gd = {}, Difference = {}'.format(J_reg, J_gd, dif))

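As can be seen, the coefficients are not the same. This is expected, because gradient descent was fitted on the scaled area (x_new), so m_gd and b_gd are expressed in scaled units. However, the difference between the MSE of both methods is very small. Let us plot the reg.fit line and the gradient descent line on the same plot.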

In [0]:
import matplotlib.pyplot as plt
plt.scatter(df.area,df.price,color='r', marker='+')
plt.xlabel('Area ($m^2$)',fontsize=20)
plt.ylabel('Prices ($)',fontsize=20)
plt.plot(df.area,y_pred_reg,color='b',label='reg.fit') ## best fit line using reg.predict
plt.plot(df.area,y_pred_gd,color='g',linestyle='--',linewidth=3,label='Gradient Descent') ## best fit line using gradient descent
plt.legend()

As can be seen, the reg.fit and gradient descent lines coincide.  
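Coefficients fitted on a standardized variable can be converted back to the original units, which makes them directly comparable with a fit on the raw data. Substituting $x_{new} = (x - \mu)/\sigma$ into $\hat{y} = m_{gd}\,x_{new} + b_{gd}$ gives a slope of $m_{gd}/\sigma$ and an intercept of $b_{gd} - m_{gd}\,\mu/\sigma$. The sketch below applies this conversion; the helper name `unscale_line` and the data are hypothetical, and `np.polyfit` stands in for gradient descent to keep the example short.

```python
import numpy as np

def unscale_line(m_scaled, b_scaled, x_mean, x_std):
    """Convert a line fitted on (x - mean) / std back to the units of x."""
    m = m_scaled / x_std
    b = b_scaled - m_scaled * x_mean / x_std
    return m, b

# hypothetical areas and prices
x = np.array([2600.0, 3000.0, 3200.0, 3600.0, 4000.0])
y = np.array([300000.0, 340000.0, 355000.0, 400000.0, 445000.0])
x_scaled = (x - x.mean()) / x.std()

# np.polyfit stands in here for the gradient descent fit on the scaled data
m_scaled, b_scaled = np.polyfit(x_scaled, y, 1)
m, b = unscale_line(m_scaled, b_scaled, x.mean(), x.std())

# fit on the raw data for comparison
m_raw, b_raw = np.polyfit(x, y, 1)
print(m, b)
print(m_raw, b_raw)
```

Applied to the notebook's variables, the call would be `unscale_line(m_gd, b_gd, np.mean(x), np.std(x))`, and the result should be close to the coefficient and intercept from the linear regression section.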

**Exercise**

Modify the gradient descent to be used for multiple linear regression with three independent variables. Then use it to estimate the houses' prices given the area, number of bedrooms, and age discussed in the multiple linear regression section.