<a href="https://colab.research.google.com/github/mohammad0alfares/MachineLearningNotebooks/blob/master/RegressionBasics_Part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basics of Regression Techniques - Part 1

In this tutorial, we will explore the basics of regression techniques. We will discuss Linear Regression, Multiple Regression, the Cost Function, and Gradient Descent. Click [here](https://github.com/mkjubran/MachineLearning/blob/master/1_Regression/LinearRegression.pdf) to download a brief presentation about some of the topics covered in this tutorial.

**By the end of this tutorial**, you will be able to:
-	Recognize the topics to be covered in this tutorial.
-	Master the concepts of linear and multiple (multivariate) regression.
-	Acquire thorough knowledge of the mathematical formulation of cost function and cost function minimization in Machine Learning.
-	Be able to build and apply simple regression models.

**Before Session**:
-	Watch a video about the main steps to build and apply machine learning algorithms (video source 1.1)
-	Watch three videos about linear regression to gain basic knowledge about linear regression, and to learn about modeling problems that can be solved using linear regression (video sources 1.2, 1.3, and 1.4)
-	Read about linear regression and gradient descent (reading sources 1.5 - 1.8)

**Resources:**

1.1	The 7 Steps of Machine Learning (video): https://youtu.be/nKW8Ndu7Mjw

1.2	The linear regression model (video): https://youtu.be/m88h75F3Rl8

1.3	Simple Linear Regression (video): https://www.youtube.com/watch?v=owI7zxCqNY0

1.4	Regression Analysis (video): https://youtu.be/DtOYBxi4AIE

1.5	Linear Regression Explained (reading): https://towardsdatascience.com/linear-regression-explained-d0a1068accb9 

1.6	Linear Regression — Detailed View (reading): https://towardsdatascience.com/linear-regression-detailed-view-ea73175f6e86

1.7	Machine learning fundamentals (I): Cost functions and gradient descent (reading): https://towardsdatascience.com/machine-learning-fundamentals-via-linear-regression-41a5d11f5220

1.8	Cost Function, Gradient Descent and Univariate Linear Regression (reading): https://medium.com/@lachlanmiller_52885/machine-learning-week-1-cost-function-gradient-descent-and-univariate-linear-regression-8f5fe69815fd



**Clone the Source GitHub Repository**

Before we start applying the procedure of this tutorial, we need to clone some source files, to be used throughout this tutorial, from a GitHub repository.

In [0]:
!rm -rf ./MachineLearning
!git clone https://github.com/mkjubran/MachineLearning.git

# Linear Regression
**Introduction**

In this section, we will develop a technique to estimate the spendings of mall customers based on their annual income. This is achieved through the following procedure: \\
1- collect some statistics about mall customers, including their annual income and spendings, \\
2- use this data to build a model that relates the spendings of the customers to their income, \\
3- use the model to estimate the spendings of new customers based on their annual income.

**Implementation**

In a previous module, you learned how to extract and collect data. Now, let us assume that the data, which includes the annual income and spendings of the mall customers, is saved in a csv file called "Mall_Customers_short.csv". \\
To read the data in the file, we will be using the pandas library (https://pandas.pydata.org/).

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/Mall_Customers_short.csv")
df.head()

You could also list values at the end of the dataframe using

In [0]:
df.tail()

You could also print the whole data using

In [0]:
print(df)
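
For readers who do not have the repository at hand, the same `read_csv`, `head`, and `tail` calls can be tried on a small in-memory CSV. This is only a sketch: the column names follow the tutorial file, but the values below are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content; the values are invented for illustration only.
csv_text = """Annual Income (K),Spendings
15,10
20,14
25,17
30,22
35,25
40,29
"""

# read_csv accepts file-like objects as well as file paths.
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.head(3))   # first three rows
print(demo.tail(2))   # last two rows
print(demo.shape)     # (number of rows, number of columns)
```

Both `head()` and `tail()` return five rows when called without an argument.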

Now, to visualize the data, we will plot the pairs (Annual Income (K), Spendings) of each customer on a scatter plot. To do this we need to use the matplotlib library (https://matplotlib.org/).

In [0]:
import matplotlib.pyplot as plt
plt.scatter(df['Annual Income (K)'],df[['Spendings']],color='r', marker='+')
plt.xlabel('Annual Income (K)',fontsize=20)
plt.ylabel('Spendings',fontsize=20)

As can be observed from the plot, a straight line can be used to represent the data. So we will use the LinearRegression class in the sklearn library (https://scikit-learn.org/stable/) to derive the best fitting line (determine the best coefficient and intercept values) based on the given data.

In [0]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(df[['Annual Income (K)']],df[['Spendings']])
print(reg.coef_) ## print the coefficient
print(reg.intercept_) ## print the intercept
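
With a single feature, the least-squares line also has a simple closed form, which helps in interpreting the coefficient and the intercept. The sketch below uses made-up income and spending values (not the tutorial file) and plain NumPy; since `LinearRegression` fits an ordinary least-squares model, its `coef_` and `intercept_` should agree with these formulas up to floating-point error.

```python
import numpy as np

# Hypothetical income (in thousands) and spending values, for illustration only.
income = np.array([15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
spend = np.array([10.0, 14.0, 17.0, 22.0, 25.0, 29.0])

x_mean, y_mean = income.mean(), spend.mean()

# Least-squares slope: co-variation of x and y divided by the variation of x.
slope = np.sum((income - x_mean) * (spend - y_mean)) / np.sum((income - x_mean) ** 2)

# The fitted line always passes through the point (mean of x, mean of y).
intercept = y_mean - slope * x_mean

print(slope, intercept)
```

The slope is the estimated change in spendings per unit of income, and the intercept is the estimated spendings at zero income.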

To visualize the line, we plot the best fitting line on the scatter plot.

In [0]:
plt.scatter(df[['Annual Income (K)']],df[['Spendings']],color='r', marker='+')
plt.xlabel('Annual Income (K)',fontsize=20)
plt.ylabel('Spendings',fontsize=20)
plt.plot(df[['Annual Income (K)']],reg.predict(df[['Annual Income (K)']]),color='b')

After building the model, we will use it to estimate the spendings of a list of new mall customers based on their annual income. Let us assume that the income of the new customers is stored in a csv file called "Mall_Customers_short_new.csv". We will read the data from the file into a dataframe and apply the annual income values to the model to determine the estimated spendings. Then we will append the estimated spendings to the dataframe and store the new dataframe in a new csv file called "Predicted_Mall_Customers_short_new.csv".

In [0]:
df2 = pd.read_csv("./MachineLearning/1_Regression/Mall_Customers_short_new.csv")
p=reg.predict(df2[['Annual Income (K)']]) ## select the feature column explicitly, as in reg.fit
df2['Spendings']=p
print(df2)
df2.to_csv('./MachineLearning/1_Regression/Predicted_Mall_Customers_short_new.csv',index=False)
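
The predict-and-save step can be sketched without scikit-learn or the repository files. The example below is an illustration under stated assumptions: the data is invented, `np.polyfit` stands in for `reg.fit`, and the CSV is written to an in-memory buffer rather than to disk.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical training data and new customers (values invented for illustration).
train = pd.DataFrame({"Annual Income (K)": [15, 20, 25, 30, 35, 40],
                      "Spendings": [10, 14, 17, 22, 25, 29]})
new_customers = pd.DataFrame({"Annual Income (K)": [18, 27, 50]})

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
slope, intercept = np.polyfit(train["Annual Income (K)"], train["Spendings"], 1)

# Prediction is just the line equation applied to each new income value.
new_customers["Spendings"] = slope * new_customers["Annual Income (K)"] + intercept

# Write to an in-memory buffer instead of a file so the sketch has no side effects.
buffer = io.StringIO()
new_customers.to_csv(buffer, index=False)  # index=False omits the row numbers
print(buffer.getvalue())
```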

We could also plot the original data, the estimated spendings of the new customers, and the best fitting line on the same figure as

In [0]:
df_merged = pd.concat([df, df2], ignore_index=True) ## DataFrame.append was removed in pandas 2.0

plt.scatter(df[['Annual Income (K)']],df[['Spendings']],color='r', marker='+')
plt.xlabel('Annual Income (K)',fontsize=20)
plt.ylabel('Spendings',fontsize=20)
plt.plot(df_merged[['Annual Income (K)']], reg.predict(df_merged[['Annual Income (K)']]),color='b',linestyle='-.',linewidth=0.5)
plt.scatter(df2[['Annual Income (K)']],df2[['Spendings']],color='m', marker='o')

As can be observed from the figure, the estimated spendings of the new customers are located on the best fit line

**Exercise 1.1:**

Use linear regression to estimate the spendings of customers based on their age. To complete this exercise, two files are included in the repository: \\
1- Mall_Customer_Age_Spendings.csv: a list of customers ages and spendings \\
2- Mall_Customer_Age.csv: a list of ages of new customers

# Multiple Regression
**Introduction**

In this section, we will extend the model derived in the Linear Regression section to include more than one independent variable. This method is called Multiple Regression. We will use the annual income of a family (Annual Income), the number of workers in the family (Working), and the number of children who are not working in the family (Kids) to estimate the level of spendings.

**Implementation**

Read the data in the file "FamilySpendings.csv" using the pandas library.

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/FamilySpendings.csv")
df.head()

In the above table, there is more than one feature that corresponds to the spendings of each family. In multiple regression, there are at least two input independent variables (features). In our case, the features are Annual Income, Working (number of working family members), and Kids (number of kids in the family). The dependent variable is the spendings of the family.

We notice that the Working feature at index 2 has a value of NaN. This is typically due to an empty value in the csv file. Thus, we need to process the dataframe to clean the data. In our case, we will replace the NaN value with the median of the other Working values in the table, rounded down to a whole number.

In [0]:
import math
median_Working = df.Working.median() # median of the number of working family members in the dataframe
print(median_Working)
median_Working = math.floor(df.Working.median()) # use the math library to round the median down to a whole number
print(median_Working)
df.Working = df.Working.fillna(median_Working) # replace the NaN with the median value
df.head()
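
The cleaning step can be checked on a tiny example. This is a minimal sketch with made-up values, chosen so that the median is not a whole number and the effect of `math.floor` is visible.

```python
import math
import numpy as np
import pandas as pd

# Hypothetical "Working" column with one missing value.
families = pd.DataFrame({"Working": [1, 1, np.nan, 2, 3]})

# median() skips NaN by default, so only the four known values are used.
median_working = families["Working"].median()   # median of [1, 1, 2, 3]
fill_value = math.floor(median_working)         # round down to a whole number

families["Working"] = families["Working"].fillna(fill_value)

print(median_working, fill_value)
print(families["Working"].tolist())
```

Filling with the median rather than the mean keeps the replacement less sensitive to unusually large or small values.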

Now the data is clean. Next, we will use the LinearRegression class in the sklearn library (https://scikit-learn.org/stable/) to derive the best fitting linear model (determine the best coefficients and intercept values) based on the given data. With more than one feature, the model is a plane (or hyperplane) rather than a line.


In [0]:
from sklearn import linear_model
regm = linear_model.LinearRegression()
regm.fit(df[['Annual Income','Working','Kids']],df.Spendings)
print(regm.coef_) ## print the coefficients
print(regm.intercept_) ## print the intercept
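
To see what the fitted coefficients mean, it can help to fit data whose coefficients are known in advance. The sketch below is an illustration only: the feature values and the "true" coefficients are invented, and `np.linalg.lstsq` is used as a stand-in least-squares solver instead of scikit-learn.

```python
import numpy as np

# Hypothetical features: annual income, number of workers, number of kids.
X = np.array([[30000.0, 2, 2],
              [20000.0, 1, 1],
              [45000.0, 2, 0],
              [38000.0, 1, 3],
              [52000.0, 3, 1],
              [27000.0, 1, 0]])

# Synthetic spendings generated from known coefficients (no noise), so the
# fit can be checked against the values used to create the data.
true_coef = np.array([0.5, 1000.0, 1500.0])
true_intercept = 2000.0
y = X @ true_coef + true_intercept

# Append a column of ones so the intercept is estimated as an extra coefficient.
X1 = np.column_stack([X, np.ones(len(X))])
solution, *_ = np.linalg.lstsq(X1, y, rcond=None)

coef, intercept = solution[:3], solution[3]
print(coef, intercept)
```

Each coefficient is the estimated change in spendings when its feature increases by one unit while the other features are held fixed.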

After building the model, we will use it to estimate the spendings of a list of new families based on their features (Annual Income, Working, Kids). Let us assume that the features of a few families are stored in a csv file called "FamilyIncomeWorkingKids.csv". We will read the data from the file into a dataframe and apply the values of the features to the model to determine the estimated spendings. Then we will append the estimated spendings to the dataframe and store the new dataframe in a new csv file called "PredictFamilySpendings.csv".

In [0]:
dfm = pd.read_csv("./MachineLearning/1_Regression/FamilyIncomeWorkingKids.csv")
p=regm.predict(dfm)
dfm['Spendings']=p
dfm.to_csv('./MachineLearning/1_Regression/PredictFamilySpendings.csv',index=False)
dfm.head()

Optional: we could also check the residual error between the actual spendings (in 'FamilySpendings.csv') and the predicted values.

In [0]:
ppr=regm.predict(df[['Annual Income','Working','Kids']])
df['Predicted Spendings']=ppr
df['Residual']=df['Spendings']-df['Predicted Spendings']
df.head()
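
A residual is the actual value minus the predicted value, so a positive residual means the model underestimated. The sketch below uses invented single-feature data and `np.polyfit` as a stand-in for the fitted model. For a least-squares fit that includes an intercept, the residuals sum to zero up to floating-point error.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
y = np.array([10.0, 14.0, 17.0, 22.0, 25.0, 29.0])

slope, intercept = np.polyfit(x, y, 1)   # least-squares line
predicted = slope * x + intercept
residual = y - predicted                 # actual minus predicted

mse = np.mean(residual ** 2)             # mean squared error of the fit

print(residual)
print(residual.sum())   # numerically close to zero
print(mse)
```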

**Exercise 1.2:** \\
Use multiple regression to estimate the spending of a family based on the family (household) income, number of workers in the family, number of kids in the family, and city tax. You are given some data in the "FamilySpendings_exercise.csv" file included in the repository. You need to estimate the spendings of the following families:

*Family One*: Two workers, two kids, an annual income of \$30000, and pay \$3000 taxes for the city.

*Family Two*: One worker, one kid, an annual income of \$20000, and pay \$500 taxes for the city.

*Family Three*: Two workers, two kids, an annual income of \$30000, and you don't know how much taxes the family pays for the city.

# Cost Function and Gradient Descent
In this section, we will learn how to use gradient descent to determine the optimal coefficients and intercept of linear regression.

**Implementation**

In order to determine the best fit line, we need to determine the values of **m** and **b** of the straight line $\hat{y}_i=mx_i+b$ that minimize the mean squared error (MSE).

\begin{equation}
\begin{aligned}
MSE=J=\frac{1}{n} \sum^n_{i=1}{(y_i -\hat{y}_i)^2}   
\end{aligned}
\end{equation}
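
As a small worked instance of this formula, the sketch below evaluates the MSE of one candidate line on four points. The candidate values of `m` and `b` are arbitrary and deliberately not the best ones.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # generated by y = 2x + 1

m, b = 1.5, 1.0                      # hypothetical candidate line
y_hat = m * x + b                    # predictions: [1.0, 2.5, 4.0, 5.5]
squared_errors = (y - y_hat) ** 2    # [0.0, 0.25, 1.0, 2.25]
mse = squared_errors.mean()          # J = (1/n) * sum of squared errors

print(mse)   # 0.875
```

For the line that generated the data (m = 2, b = 1), every error is zero and so is the MSE.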

So we substitute $\hat{y}_i=mx_i+b$ into the cost function as


\begin{equation}
\begin{aligned}
J=\frac{1}{n} \sum^n_{i=1}{(y_i -(mx_i+b))^2}   
\end{aligned}
\end{equation}

Then, we determine the gradient by taking the partial derivative of the cost function with respect to **m** and **b** as

\begin{equation}
\begin{aligned}
\frac{\partial J}{\partial m}=\frac{2}{n} \sum^n_{i=1}{(y_i -(mx_i+b)) \times (-x_i)} 
\end{aligned}
\end{equation}

\begin{equation}
\begin{aligned}
\frac{\partial J}{\partial b}=\frac{2}{n} \sum^n_{i=1}{(y_i -(mx_i+b)) \times (-1)} 
\end{aligned}
\end{equation}
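
One way to gain confidence in the two partial derivatives is to compare them with a numerical approximation. The sketch below is a sanity check rather than part of the original procedure: it evaluates the analytic gradient of $J=\frac{1}{n}\sum_i (y_i-(mx_i+b))^2$ at an arbitrary point and compares it with central finite differences. The function names `cost` and `gradient` are chosen here for illustration.

```python
import numpy as np

def cost(m, b, x, y):
    # Mean squared error of the line y_hat = m * x + b.
    return np.mean((y - (m * x + b)) ** 2)

def gradient(m, b, x, y):
    # Analytic partial derivatives of the cost with respect to m and b.
    residual = y - (m * x + b)
    dJ_dm = (2 / len(x)) * np.sum(residual * (-x))
    dJ_db = (2 / len(x)) * np.sum(residual * (-1))
    return dJ_dm, dJ_db

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

m, b = 0.5, -0.3   # arbitrary point at which to check the gradient
h = 1e-6           # small step for the finite differences

dJ_dm, dJ_db = gradient(m, b, x, y)
num_dm = (cost(m + h, b, x, y) - cost(m - h, b, x, y)) / (2 * h)
num_db = (cost(m, b + h, x, y) - cost(m, b - h, x, y)) / (2 * h)

print(dJ_dm, num_dm)
print(dJ_db, num_db)
```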

So now, to implement gradient descent, we start with some initial values of **m** ($m_0$) and **b** ($b_0$) and iteratively modify them according to the gradient and the learning rate ($\lambda$) as follows:

\begin{equation}
\begin{aligned}
m_i = m_{i-1} - \lambda \times \frac{\partial J}{\partial m} 
\end{aligned}
\end{equation}

\begin{equation}
\begin{aligned}
b_i = b_{i-1} - \lambda \times \frac{\partial J}{\partial b} 
\end{aligned}
\end{equation}
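
A single update can be worked through by hand. The sketch below applies one step of the two update rules to four sample points, starting from $m_0=0$, $b_0=0$ with a learning rate of 0.2; the numbers in the comments follow directly from the formulas.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # generated by y = 2x + 1
n = len(x)

m0, b0 = 0.0, 0.0
learning_rate = 0.2

residual = y - (m0 * x + b0)                 # [1, 3, 5, 7] since the prediction is 0
dJ_dm = -(2 / n) * np.sum(x * residual)      # -(2/4) * 34 = -17
dJ_db = -(2 / n) * np.sum(residual)          # -(2/4) * 16 = -8

m1 = m0 - learning_rate * dJ_dm              # 0 + 0.2 * 17 = 3.4
b1 = b0 - learning_rate * dJ_db              # 0 + 0.2 * 8  = 1.6

print(m1, b1)
```

The first step overshoots the true slope of 2; later iterations correct this as long as the learning rate is small enough.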


Watch this animation to visualize how gradient descent works: https://github.com/mattnedrich/GradientDescentExample/blob/master/gradient_descent_example.gif

In [0]:
import numpy as np
def gradient_descent_basic(x,y,m_curr,b_curr,learning_rate,iterations):
    n = len(x)
    for i in range(iterations):
        y_pred = m_curr * x + b_curr
        
        md = - ( 2 / n ) * sum( x * ( y - y_pred ))
        bd = - ( 2 / n ) * sum(( y - y_pred ))

        m_curr = m_curr - learning_rate * md 
        b_curr = b_curr - learning_rate * bd 

        J = ( 1 / n ) * sum(( y - y_pred )**2)

        print('J = {}, m = {}, b = {}, Iteration = {}'.format(J ,m_curr, b_curr, i ))
    return m_curr,b_curr,i,J

## try the gradient_descent using sample data
x = np.array([0,1,2,3]);
y = np.array([1,3,5,7]); ## y=2x+1

m_curr = 0; b_curr = 0;
gradient_descent_basic(x,y,m_curr,b_curr,0.2,20) ## learning rate = 0.2 and iteration = 20

Let us increase learning rate to 0.5 and see how the gradient descent converges.

In [0]:
## try the gradient_descent with learning rate = 0.5 and iteration = 20
m_curr = 0; b_curr = 0;
gradient_descent_basic(x,y,m_curr,b_curr,0.5,20)

As can be seen, the cost function increases instead of decreasing.

So usually, we start with a small number of iterations and a small learning rate and check that the cost function is decreasing. Then we increase the learning rate gradually until just before the cost function starts increasing. A value slightly below that point is usually a good choice, since it tends to converge in the fewest iterations.
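
This search can be sketched as a small sweep over candidate learning rates. The helper `final_cost` below is a hypothetical, print-free variant of the gradient descent loop written for this illustration; it returns only the cost after a fixed number of iterations so that different rates can be compared.

```python
import numpy as np

def final_cost(x, y, learning_rate, iterations=50):
    # Run plain gradient descent from m = 0, b = 0 and return the final MSE.
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        residual = y - (m * x + b)
        m -= learning_rate * (-(2 / n) * np.sum(x * residual))
        b -= learning_rate * (-(2 / n) * np.sum(residual))
    return np.mean((y - (m * x + b)) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])   # generated by y = 2x + 1

initial_cost = np.mean(y ** 2)       # cost at the starting point m = 0, b = 0

results = {}
for lr in (0.01, 0.05, 0.1, 0.2, 0.5):
    results[lr] = final_cost(x, y, lr)
    status = "decreased" if results[lr] < initial_cost else "increased"
    print('learning rate = {}: final J = {} ({})'.format(lr, results[lr], status))
```

On this data, the rates up to 0.2 reduce the cost, with the larger ones getting closer to zero in the same number of iterations, while 0.5 makes it grow.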

Regarding the required number of iterations, you may stop the gradient descent search once the difference in the cost function between successive iterations drops below some threshold (such as 1e-5 or 1e-6). Next, we will modify the code to stop when the change in the cost (MSE) between successive iterations is less than 1e-5, or when a maximum number of iterations (epochs) is reached.

In [0]:
import numpy as np
import copy
def gradient_descent(x,y,m_curr,b_curr,learning_rate,epochs):
    n = len(x)
    i = 0 
    j_curr = 100000
    while True:
        i= i + 1
        j_before = j_curr
        y_pred = m_curr * x + b_curr
        
        md = - ( 2 / n ) * sum( x * ( y - y_pred ))
        bd = - ( 2 / n ) * sum( y - y_pred )

        m_curr = m_curr - learning_rate * md 
        b_curr = b_curr - learning_rate * bd 

        j_curr = ( 1 / n ) * sum(( y - y_pred )**2)

        if ((abs(j_curr - j_before) < 1e-5) or (i >= epochs)):
          return m_curr,b_curr,i,j_curr

## try the gradient_descent using sample data
x = np.array([0,1,2,3]);
y = np.array([1,3,5,7]); ## y=2x+1

m_curr = 0; b_curr = 0;
gradient_descent(x,y,m_curr,b_curr,0.2,100) ## learning rate = 0.2 and at most 100 iterations

Next, we will determine the coefficient and intercept of the best fitting straight line using linear regression (similar to the first part) and also by using our implementation of gradient descent. We will start by reading the data in the training dataset (.csv file).

In [0]:
import pandas as pd
df = pd.read_csv("./MachineLearning/1_Regression/Mall_Customers_Logitic_short.csv")
df.head()

To plot this data using scatter plot, use:

In [0]:
import matplotlib.pyplot as plt
plt.scatter(df['Annual Income'],df[['Spendings']],color='r', marker='+')
plt.xlabel('Annual Income (K)',fontsize=20)
plt.ylabel('Spendings',fontsize=20)

Let us begin by determining the linear regression coefficients as done in the Linear Regression section.

In [0]:
from sklearn import linear_model
reg = linear_model.LinearRegression()
reg.fit(df[['Annual Income']],df[['Spendings']])
print(reg.coef_) ## print the coefficient
print(reg.intercept_) ## print the intercept

m_reg = reg.coef_
b_reg = reg.intercept_

To plot this line:

In [0]:
plt.scatter(df[['Annual Income']],df[['Spendings']],color='r', marker='+')
plt.xlabel('Annual Income (K)',fontsize=20)
plt.ylabel('Spendings',fontsize=20)
plt.plot(df[['Annual Income']],reg.predict(df[['Annual Income']]),color='b')

Next, we determine the learning rate for gradient descent. We will use gradient_descent_basic() to print the error while trying different learning rate values.

In [0]:
x=np.array(df[['Annual Income']])
y=np.array(df[['Spendings']])
## change the learning rate and iterations
m_gd, b_gd, iters, j_curr= gradient_descent_basic(x,y,0,0,0.0000000001,20)
print('J= {}, m = {}, b = {}, iterations = {}'.format(j_curr, m_gd, b_gd,iters))

As can be observed, we need to use a very low learning rate to make sure the error is decreasing. However, such a low learning rate needs a lot of iterations to converge. To deal with this, we apply data scaling: we standardize the independent variable by subtracting its mean and dividing by its standard deviation.

In [0]:
x=np.array(df[['Annual Income']])
y=np.array(df[['Spendings']])

## scaling the independent random variable
x_new = (x - np.mean(x)) / np.std(x)

## change the learning rate and iterations
m_gd, b_gd, iters, j_curr= gradient_descent_basic(x_new,y,0,0,0.01,20)
print('J = {}, m = {}, b = {}, iterations = {}'.format(j_curr, m_gd, b_gd,iters))
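
The scaling used above is standardization (z-score scaling). A minimal sketch of why it helps, using made-up income values: the second derivative of the cost with respect to m is $\frac{2}{n}\sum_i x_i^2$, so large raw inputs make the cost surface extremely steep in the m direction, which forces a tiny learning rate. After standardization this curvature is 2.

```python
import numpy as np

# Hypothetical annual incomes, for illustration only.
income = np.array([15000.0, 20000.0, 25000.0, 30000.0, 35000.0, 40000.0])

# Standardize: subtract the mean, divide by the standard deviation.
income_scaled = (income - np.mean(income)) / np.std(income)

print(income_scaled.mean(), income_scaled.std())   # approximately 0 and 1

# Curvature of the cost in the m direction: (2/n) * sum(x_i ** 2).
curvature_raw = 2 * np.mean(income ** 2)
curvature_scaled = 2 * np.mean(income_scaled ** 2)

print(curvature_raw, curvature_scaled)
```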

Now, to find the best coefficients, we increase the number of iterations.

In [0]:
## change the learning rate and iterations
m_gd, b_gd, iters, j_curr= gradient_descent(x_new,y,0,0,0.01,20000)
print('With data scaling: J = {}, m = {}, b = {}, iterations = {}'.format(j_curr, m_gd, b_gd,iters))

Notice that only 338 iterations are required before the difference between the MSE of successive iterations drops below 1e-5.

As an in-class exercise, we will compare the error between the reg.fit method (linear regression section) and gradient descent.

In [0]:
## m_reg, b_reg: from Linear Regression (reg.fit) in the code cell above
## m_gd, b_gd: from the gradient descent results in the code cell above

y_actual = np.array(df.Spendings)

y_pred_reg = m_reg * x + b_reg  ## reg.fit was trained on the original x

y_pred_gd = m_gd * x_new + b_gd  ## gradient descent was trained on the scaled x_new

n=len(y_actual)

## MSE of each method (squared errors, matching the cost function J)
J_reg = (1/n)*sum((y_actual - y_pred_reg[:,0])**2)
J_gd = (1/n)*sum((y_actual - y_pred_gd[:,0])**2)

dif = J_reg - J_gd;

print('J_reg = {}, \n\nJ_gd = {}, \n\nDifference = {}'.format(J_reg, J_gd, dif))

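As can be seen, the coefficients are not the same, because gradient descent was fit on the standardized input (x_new) while reg.fit used the original values. However, the difference between the MSE of the two methods is very small. Let us plot the reg.fit line and the gradient descent line on the same plot.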

In [0]:
import matplotlib.pyplot as plt
plt.scatter(df[['Annual Income']],df[['Spendings']],color='r', marker='+')
plt.xlabel('Annual Income',fontsize=20)
plt.ylabel('Spendings',fontsize=20)
plt.plot(df[['Annual Income']],y_pred_reg,color='b',label='reg.fit') ## best fit line using reg.predict
plt.plot(df[['Annual Income']],y_pred_gd,color='g',linestyle='--',linewidth=3,label='Gradient Descent') ## best fit line using gradient descent
plt.legend()

As can be seen, the reg.fit and the gradient descent lines practically coincide.
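
The two lines can coincide even though their coefficients differ, because the gradient descent coefficients refer to the standardized input. Substituting $x_{new}=(x-\mu)/\sigma$ into $\hat{y}=m_{gd}\,x_{new}+b_{gd}$ gives a slope of $m_{gd}/\sigma$ and an intercept of $b_{gd}-m_{gd}\,\mu/\sigma$ in the original units. The sketch below checks this conversion on invented data, with `np.polyfit` standing in for a fully converged gradient descent.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
y = np.array([10.0, 14.0, 17.0, 22.0, 25.0, 29.0])

mu, sigma = x.mean(), x.std()
x_scaled = (x - mu) / sigma

# Fit on the scaled input (stand-in for the gradient descent result).
m_scaled, b_scaled = np.polyfit(x_scaled, y, 1)

# Convert the coefficients back to the original units of x.
m_orig = m_scaled / sigma
b_orig = b_scaled - m_scaled * mu / sigma

# Fit directly on the original input for comparison.
m_direct, b_direct = np.polyfit(x, y, 1)

print(m_scaled, b_scaled)   # coefficients in scaled units
print(m_orig, b_orig)       # converted back to original units
print(m_direct, b_direct)   # direct fit on the original x
```

Note that after standardization the intercept on the scaled input equals the mean of y, since the scaled feature has zero mean.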

**Exercise 1.3 (optional):**

Modify the gradient descent to be used for multiple linear regression with three independent variables. Then use it to estimate the spendings given the annual income, number of persons working in the family, and number of kids discussed in the multiple linear regression section.