Authors: Vo, Huynh Quang Nguyen; Nguyen, Duc Huy.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython import display
from sympy import symbols  # needed for symbols('x') in Section I
from utils import autograder_multivariate_calculus

# Note: This quiz uses an autograder
1. To finish a coding task, first complete the coding cell marked `CODING SESSION`. Then run that cell, followed by the grading cell marked `GRADING SESSION`.


2. If your code is correct, no notification appears. Otherwise, an `AssertionError` is raised and you must fix the code before continuing.

***

As an example, our task is to create the following matrix:
$$
A = \begin{bmatrix} 3 & 4 & 5 \\ 3 & 4 & 5\end{bmatrix}
$$

We enter the following code, then run the cell containing the code:

In [None]:
###
# CODING SESSION
#
A = np.array([[3,4,5],[3,4,5]])

We finally run the cell containing the autograder to finish the task. Again, if there is no error in our code, no notification will appear.

In [None]:
###
# GRADING SESSION
#
autograder_multivariate_calculus.example(A)

However, if we change our code to an incorrect version and run the autograder, an `AssertionError` will appear.

In [None]:
###
# Uncomment the following code to see how the notification appears when our code is wrong.
#

#A = np.array([[1,2,3],[4,5,6]])
#autograder_multivariate_calculus.example(A)
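The internals of the `utils` module are not shown in this notebook. As a rough illustration of how an assertion-based check of this kind could behave, the sketch below defines a hypothetical `check_example` function (an assumption for illustration, not the actual autograder) that stays silent on a correct matrix and raises an `AssertionError` otherwise:

```python
import numpy as np

def check_example(A):
    """Hypothetical stand-in for an assertion-based grader (not the real autograder)."""
    expected = np.array([[3, 4, 5], [3, 4, 5]])
    # Silent when the matrices match; raises AssertionError when they differ.
    assert np.array_equal(A, expected), "A does not match the expected matrix"

# A correct submission produces no output.
check_example(np.array([[3, 4, 5], [3, 4, 5]]))

# An incorrect submission raises AssertionError, caught here only to show the message.
try:
    check_example(np.array([[1, 2, 3], [4, 5, 6]]))
except AssertionError as err:
    print("AssertionError:", err)
```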

# I. Derivatives of a function
In this task, you will use the SymPy library to find the derivative of a given function, then evaluate this derivative at a point:

1. $f(x) = x^{3/2} + \pi x^2 + \sqrt{7}$, $f'(x)$ evaluated at $x = 2$


2. $f(x) = \sin(x)e^{\cos(x)}$, $f'(x)$ evaluated at $x = \pi$


3. $f(x) = e^{(x+1)^2}$, $f'(x)$ evaluated at $x = 1$


4. $f(x) = x^2\cos^3(x)$, $f'(x)$ evaluated at $x = \pi$
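
As a warm-up, here is a minimal sketch of the differentiate-then-evaluate workflow in SymPy. The function `g` is an arbitrary illustration and is deliberately not one of the quiz items; the names `g`, `g_prime`, and `g_prime_at_0` are assumptions made for this example only.

```python
from sympy import symbols, diff, sin, cos

x = symbols('x')

# Illustrative function, deliberately not one of the quiz items.
g = x**3 + sin(x)

# Symbolic derivative with respect to x: 3*x**2 + cos(x)
g_prime = diff(g, x)

# Evaluate the derivative at x = 0 by substitution.
g_prime_at_0 = g_prime.subs(x, 0)

print(g_prime)
print(g_prime_at_0)
```

The result of `subs` is an exact SymPy expression. If a floating-point value is needed, `float(...)` or `.evalf()` converts it.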


In [None]:
##
# CODING SESSION
#
x = symbols('x')

f_1 = ...
eval_1 = ...

f_2 = ...
eval_2 = ...

f_3 = ...
eval_3 = ...

f_4 = ...
eval_4 = ...

In [None]:
##
# GRADING SESSION
#
autograder_multivariate_calculus.derivative(eval_1,eval_2,eval_3,eval_4)

# II. Jacobians and Hessians
In this practice, you will use the SymPy library to find the Jacobian or Hessian of a given function, then evaluate it at a point:

1. $f(x,y,z) = x^2\cos(y) + e^z\sin(y)$, $J[f(x,y,z)]$ evaluated at $(x,y,z) = (\pi,\pi,1)$.


2. $u(x,y) = x^2y - \cos(x)\sin(y)$ and $v(x,y) = e^{x+y}$. The Jacobian of the vector-valued function, $\begin{bmatrix} J[u(x,y)] \\ J[v(x,y)] \end{bmatrix}$, evaluated at $(x,y) = (0,\pi)$.


3. $f(x,y) = x^3\cos(y) - x\sin(y)$, $H[f(x,y)]$ evaluated at $(x,y) = (0,0)$.


4. $f(x,y,z) = xy\cos(z) - \sin(x)e^{y}z^3$, $H[f(x,y,z)]$ evaluated at $(x,y,z) = (0,0,0)$.
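
The following sketch shows one possible way to build and evaluate a Jacobian and a Hessian with SymPy, using `Matrix.jacobian` and `hessian`. The function `g` and the evaluation point are arbitrary choices for illustration and are not among the quiz items.

```python
from sympy import symbols, Matrix, hessian, sin

x, y = symbols('x y')

# Illustrative scalar function, deliberately not one of the quiz items.
g = x**2 * y + sin(y)

# Jacobian of a scalar function: a 1 x 2 row of first partial derivatives.
J = Matrix([g]).jacobian([x, y])

# Hessian: the 2 x 2 matrix of second partial derivatives.
H = hessian(g, [x, y])

# Evaluate both at the point (x, y) = (1, 0) by substitution.
point = {x: 1, y: 0}
J_at_point = J.subs(point)
H_at_point = H.subs(point)

print(J_at_point)
print(H_at_point)
```

For a vector-valued function, stacking the components as `Matrix([u, v]).jacobian([x, y])` yields one row per component.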

In [None]:
##
# CODING SESSION
#

In [None]:
##
# GRADING SESSION
#

# III. Linear Regression with Gradient Descent
1. In statistics, linear regression is a linear approach to modeling the relationship between a response and one or more explanatory variables. Mathematically speaking, given a response variable $y$, an explanatory variable $x$, and an assumed linear relationship between $y$ and $x$, $y=mx+b$, our aim is to find the $m$ and $b$ for which the line best fits the data.


2. Sometimes, the equation for linear regression is written as $y = mx_1 + x_0$ for the sake of programming. If $\mathbf{X}$ is a multidimensional vector, this expression becomes:
$$
y = x_0 + \mathbf{M}\mathbf{X}= x_0 + m_{1}x_{1} + m_{2}x_{2} + m_{3}x_{3} + \dots + m_{n}x_{n}.
$$


3. In Machine Learning, we often call: 
    * $x$: the input data.
    * $y$: the output data.
    * $m$: the weight.
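
The multidimensional form in item 2 can be sketched in NumPy as a dot product. The numerical values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical values chosen only for illustration.
x0 = 1.0                         # intercept term x_0
M = np.array([2.0, -1.0, 0.5])   # weights m_1, m_2, m_3
X = np.array([3.0, 4.0, 2.0])    # inputs  x_1, x_2, x_3

# y = x_0 + m_1*x_1 + m_2*x_2 + m_3*x_3, written as a dot product.
y = x0 + M @ X

# The same value computed term by term.
y_sum = x0 + sum(m_i * x_i for m_i, x_i in zip(M, X))

print(y, y_sum)
```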

***

In this exercise, you will see how to solve a simple linear regression problem with a well-known Machine Learning technique called Gradient Descent. The steps to implement Gradient Descent are as follows: 

1. Initialize $m$ and $b$ with random numbers, or simply set $m=0$, $b=0$.


2. Find $y_{pred} = mx + b$.


3. Update $m$ and $b$ by using $m = m - \alpha\frac{\partial J(m,b)}{\partial m}$ and $b = b - \alpha\frac{\partial J(m,b)}{\partial b}$, where $\alpha$ is the step size. 

In this step, you first need to find the partial derivatives $\frac{\partial J(m,b)}{\partial m}$ and $\frac{\partial J(m,b)}{\partial b}$. $J(m,b)$ is the cost function, which measures the difference between the predicted values and the actual (also known as ground truth) values. Here, our cost function is one half of the mean squared error:

$$
J(m,b) = \frac{1}{2} \times \frac{1}{n}\sum_{i=1}^n (y_{pred,i}-y_i)^2 = \frac{1}{2} \times \frac{1}{n}\sum_{i=1}^n ((mx_i+b)-y_i)^2
$$

The partial derivatives of our cost function are as follows:
> * $\frac{\partial J(m,b)}{\partial m} =  \frac{1}{2} \times \frac{2}{n}\sum_{i=1}^n x_i((mx_i+b)-y_i)=\frac{1}{n}\sum_{i=1}^n x_i((mx_i+b)-y_i)=\frac{1}{n}\sum_{i=1}^n x_i(y_{pred,i}-y_i)$.

> * $\frac{\partial J(m,b)}{\partial b} =  \frac{1}{2} \times \frac{2}{n}\sum_{i=1}^n 1((mx_i+b)-y_i)=\frac{1}{n}\sum_{i=1}^n ((mx_i+b)-y_i)=\frac{1}{n}\sum_{i=1}^n (y_{pred,i}-y_i)$.


4. Repeat from step 2.

<img src="images/gradient_descent.gif" width="80%">

Figure 1: Visualization of the gradient descent algorithm for linear regression. 
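
One way to gain confidence in the partial derivatives used in step 3 is to compare them against a numerical approximation. The sketch below does this with central finite differences on a small made-up dataset. The helper names `cost` and `gradients`, the data, and the step `eps` are all assumptions for illustration.

```python
import numpy as np

def cost(m, b, x, y):
    """One half of the mean squared error, J(m, b)."""
    return 0.5 * np.mean((m * x + b - y) ** 2)

def gradients(m, b, x, y):
    """Analytic partial derivatives of J with respect to m and b."""
    residual = m * x + b - y
    return np.mean(residual * x), np.mean(residual)

# Small made-up dataset, used only for this check.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])
m, b, eps = 0.5, 0.1, 1e-6

dm, db = gradients(m, b, x, y)

# Central finite differences approximate the same derivatives numerically.
dm_num = (cost(m + eps, b, x, y) - cost(m - eps, b, x, y)) / (2 * eps)
db_num = (cost(m, b + eps, x, y) - cost(m, b - eps, x, y)) / (2 * eps)

print(dm, dm_num)
print(db, db_num)
```

If the analytic and numerical values agree to several decimal places, the formulas are very likely implemented correctly.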

In [None]:
##
# Read the .csv file and show the data:
#
data = pd.read_csv("./dataset/HourStudied_vs_ExamScore.csv") 
print(data)

In [None]:
##
# Assign the .csv data to X and y:
# 
X = np.array(data['Hours Studied'])
y = np.array(data['Percentage Score'])
plt.scatter(X, y)
plt.title('Time spent on studying vs. Exam Score')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.show()

In [None]:
## 
# Check the size of dataset:
#
print('Shape of vector X:', X.shape)
print('Shape of vector y:', y.shape)

In [None]:
##
# Reshape X and y into column vectors of shape (n, 1) so that element-wise operations line up:
#
X = X.reshape(X.shape[0],1)
y = y.reshape(y.shape[0],1)
print('Shape of vector X:', X.shape)
print('Shape of vector y:', y.shape)
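
A likely reason for reshaping both vectors is NumPy broadcasting: if only one of them were a column vector, element-wise subtraction would silently produce a matrix instead of a vector of residuals. The toy arrays below are assumptions used only to show the effect.

```python
import numpy as np

X_col = np.array([1.0, 2.0, 3.0]).reshape(3, 1)   # shape (3, 1)
y_flat = np.array([2.0, 4.0, 6.0])                # shape (3,)
y_col = y_flat.reshape(3, 1)                      # shape (3, 1)

# Mismatched shapes broadcast to a 3 x 3 matrix instead of 3 residuals.
mismatched = (2.0 * X_col - y_flat).shape

# Matching column vectors give the intended element-wise residuals.
matched = (2.0 * X_col - y_col).shape

print(mismatched)
print(matched)
```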

In [None]:
##
# Step 1: Initialize m and b:
#
m = 0
b = 0

for i in range(50):   
    display.clear_output(wait= True)

    ##
    # Step 2: Find y_pred = mx + b:
    #
    y_pred = m*X + b
        
    ##
    # Step 3: Update m and b using the Gradient Descent algorithm:
    #
    dm = np.mean((y_pred - y) * X)
    db = np.mean(y_pred - y)
    m = m - 0.005*dm
    b = b - 0.005*db
    
    ##
    # Step 4: Plot and repeat:
    #
    plt.scatter(X, y)
    plt.plot(X, y_pred)
    plt.title('Time spent on studying vs. Exam Score')
    plt.xlabel('Hours Studied')
    plt.ylabel('Exam Score')
    plt.show()

Using the example given above, solve the linear regression problem for the `Xy_dataset.csv` dataset. Then, answer the following questions:

1. Using the exact parameters above ($m = 0$, $b = 0$, `for i in range(50):`, `m = m - 0.005*dm`, `b = b - 0.005*db`), does your code work?


2. For the part `m = m - 0.005*dm` and `b = b - 0.005*db`, try to change $0.005$ to some larger number (like $0.1$). What do you observe? Is it better now?


3. Again, for the part `m = m - 0.005*dm` and `b = b - 0.005*db`, try changing $0.005$ to $0.2$. What do you see? (This effect is called overshooting, and the value `0.005` is called the 'learning rate'. We generally start with a small learning rate, but not too small. The appropriate learning rate depends on the dataset. A more advanced technique, the 'learning rate scheduler', adjusts the learning rate on the fly instead of keeping it fixed, but you don't need to worry about this for now.)


4. For the part `for i in range(50):`, change it to `for i in range(100):`. Also change the learning rate to $0.1$. Does it work better now? (The `50` in `for i in range(50):` is called the number of epochs, or the number of iterations. From now on, instead of writing `for i in range(50):`, we will write `for i in range(epochs):`.)
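
To experiment with the learning rate and the number of epochs without editing the loop each time, the update rule can be wrapped in a function. The sketch below is one possible arrangement, not the required solution: the function name `gradient_descent`, its parameters, and the synthetic data are assumptions, and results on `Xy_dataset.csv` may differ because the suitable learning rate depends on the data.

```python
import numpy as np

def gradient_descent(X, y, learning_rate, epochs):
    """Fit y = m*X + b by gradient descent; return m, b and the cost per epoch."""
    m, b = 0.0, 0.0
    costs = []
    for _ in range(epochs):
        y_pred = m * X + b
        # Record one half of the mean squared error before the update.
        costs.append(0.5 * np.mean((y_pred - y) ** 2))
        dm = np.mean((y_pred - y) * X)
        db = np.mean(y_pred - y)
        m -= learning_rate * dm
        b -= learning_rate * db
    return m, b, costs

# Synthetic, noise-free data (an assumption for illustration, not Xy_dataset.csv).
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3.0 * X + 2.0

for lr in (0.005, 0.1):
    m, b, costs = gradient_descent(X, y, learning_rate=lr, epochs=50)
    print(f"lr={lr}: first cost={costs[0]:.3g}, last cost={costs[-1]:.3g}")
```

Comparing the first and last cost for each learning rate shows whether the updates are shrinking the error or overshooting on this particular synthetic dataset.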

In [None]:
##
# Read the .csv file:
#
data = pd.read_csv("./dataset/Xy_dataset.csv") 

##
# Assign the CSV data to X and y, using the 'X' and 'y' columns:
#
X = np.array(data['X'])
y = np.array(data['y'])
plt.scatter(X, y)
plt.title('X vs. y')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

In [None]:
##
# Check the size of dataset:
#
print('Shape of vector X:', X.shape)
print('Shape of vector y:', y.shape)

##
# Reshape to appropriate size:
#
X = X.reshape(X.shape[0],1)
y = y.reshape(y.shape[0],1)
print('Shape of vector X:', X.shape)
print('Shape of vector y:', y.shape)

In [None]:
## 
# CODING SESSION
# 1.

##
# Step 1: Initialize m and b:
#
m = ...
b = ...

for i in range(...):
    display.clear_output(wait = True)
    ##
    # Step 2: Find y_pred = mx + b:
    #
    ...
    
    ##
    # Step 3: Update m and b using the Gradient Descent algorithm:
    #
    ...
    
    ##
    # Step 4: Plot and repeat
    #
    ...

In [None]:
##
# GRADING SESSION
#
autograder_multivariate_calculus.question1(m,b,X,y)

In [None]:
## 
# CODING SESSION
# 2.

##
# Step 1: Initialize m and b:
#
m = ...
b = ...

for i in range(...):
    display.clear_output(wait = True)
    ##
    # Step 2: Find y_pred = mx + b:
    #
    ...
    
    ##
    # Step 3: Update m and b using the Gradient Descent algorithm:
    #
    ...
    
    ##
    # Step 4: Plot and repeat
    #
    ...

In [None]:
##
# GRADING SESSION
#
autograder_multivariate_calculus.question2(m,b,X,y)

In [None]:
## 
# CODING SESSION
# 3.

##
# Step 1: Initialize m and b:
#
m = ...
b = ...

for i in range(...):
    display.clear_output(wait = True)
    ##
    # Step 2: Find y_pred = mx + b:
    #
    ...
    
    ##
    # Step 3: Update m and b using the Gradient Descent algorithm:
    #
    ...
    
    ##
    # Step 4: Plot and repeat
    #
    ...

In [None]:
##
# GRADING SESSION
#
autograder_multivariate_calculus.question3(m,b,X,y)

In [None]:
## 
# CODING SESSION
# 4.

##
# Step 1: Initialize m and b:
#
m = ...
b = ...

for i in range(...):
    display.clear_output(wait = True)
    ##
    # Step 2: Find y_pred = mx + b:
    #
    ...
    
    ##
    # Step 3: Update m and b using the Gradient Descent algorithm:
    #
    ...
    
    ##
    # Step 4: Plot and repeat
    #
    ...

In [None]:
##
# GRADING SESSION
#
autograder_multivariate_calculus.question4(m,b,X,y)

# Demo: Application of L1 Normalization in Data Preprocessing