## Gradient Descent

Suppose we have a function and its derivative: <br><br>
$f(x) = x^{2}$ &ensp; &ensp; &ensp; $\frac{df}{dx} = 2x$<br><br>

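We can find the value of $x$ that minimizes $f(x)$ by repeatedly applying the update below, where $\alpha$ is the learning rate. For example, starting from $x = 1$ with $\alpha = 0.1$: <br><br>
$x_{new} = x_{old} - \alpha(2x_{old})$ $
\begin{bmatrix}
1 \\
1 - 0.1(2)(1) = 0.8 \\
0.8 - 0.1(2)(0.8) = 0.64 \\
0.64 - 0.1(2)(0.64) = 0.512 \\
\vdots \\
\end{bmatrix}$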

In [None]:
alpha = 0.01      # learning rate
x = 10            # starting point
derivative = []   # f'(x) = 2x at each step
y = []            # f(x) = x**2 at each step

for i in range(1000):
    old_value = x
    y.append(old_value ** 2)
    derivative.append(2 * old_value)
    x = old_value - alpha * 2 * old_value

In [None]:
y[:20]

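Gradient descent depends on the learning rate $\alpha$ and the number of iterations. If $\alpha$ or the number of iterations is too small, the updates stop before reaching the minimum, and a small $\alpha$ also makes convergence slow.

If $\alpha$ is too large, each step overshoots the minimum and the iterates can diverge: for $f(x) = x^{2}$ the update is $x_{new} = (1 - 2\alpha)x_{old}$, which grows in magnitude whenever $\alpha > 1$. Running more iterations than needed does not harm the result, but it wastes computation.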
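The effect of the learning rate can be seen with a minimal sketch on $f(x) = x^{2}$. The helper name `descend` and the three values of `alpha` are illustrative assumptions, chosen only to show a rate that is too small, one that works, and one that is too large.

```python
def descend(x0, alpha, steps):
    """Run gradient descent on f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x  # the derivative of x**2 is 2x
    return x

# Hypothetical settings, picked only to illustrate the three regimes.
for alpha in (0.0001, 0.1, 1.1):
    final_x = descend(x0=10.0, alpha=alpha, steps=100)
    print(f"alpha={alpha}: x after 100 steps = {final_x:.4g}")
```

With the smallest rate, `x` has barely moved from 10 after 100 steps. With `alpha=0.1` it is very close to the minimum at 0. With `alpha=1.1` its magnitude grows at every step.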

## Linear Regression with Gradient Descent

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

### 1. Load Dataset

In [None]:
#create a dataframe
df = pd.read_csv('slr06.csv')
df.head()

In [None]:
#reshape the x_column
raw_X = df["X"].values.reshape(-1, 1)
y = df["Y"].values

In [None]:
raw_X.shape

In [None]:
#plot the values
plt.figure(figsize=(10,6))
plt.plot(raw_X, y, 'o', alpha=0.5)

In [None]:
#preview the data points
raw_X[:5], y[:5]

In [None]:
#prepend a column of ones to raw_X (it multiplies the intercept term)
X = np.concatenate((np.ones((len(raw_X), 1)), raw_X), axis=1)
X[:5]

In [None]:
w = np.random.normal(size=2)  # random initial [intercept, slope]
# alternatively, use a fixed start: w = np.array([5,3]); w plays the role of theta
w

**Note**: w[0] is the estimated intercept and w[1] is the estimated slope. The first column of X is filled with ones so that the prediction can be written as a single dot product of X and w: each row of X is [1, x], so the dot product with w gives w[0] * 1 + w[1] * x, and the intercept passes through unchanged.
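A small sketch of the point made in the note. The arrays `raw_X_demo` and `w_demo` hold hypothetical toy values, not values from the dataset.

```python
import numpy as np

# Hypothetical toy values, not taken from the dataset.
raw_X_demo = np.array([[1.0], [2.0], [3.0]])
w_demo = np.array([5.0, 3.0])  # intercept 5, slope 3

# Prepend a column of ones so the intercept is multiplied by 1.
X_demo = np.concatenate((np.ones((len(raw_X_demo), 1)), raw_X_demo), axis=1)

via_dot = X_demo.dot(w_demo)                             # matrix form
via_formula = w_demo[0] + w_demo[1] * raw_X_demo[:, 0]   # intercept + slope * x

print(via_dot)
print(via_formula)
```

Both lines print the same predictions (8, 11 and 14), which is why the column of ones lets a single dot product handle the intercept and the slope together.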

In [None]:
plt.figure(figsize=(10,5))
y_predict = np.dot(X, w)
plt.plot(raw_X,y,'o', color='blue', alpha=0.5) #raw_X and y are from the dataset we imported
plt.plot(raw_X,y_predict)

In [None]:
y_predict[:10]

### 2. Hypothesis and Cost Function

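Hypothesis function, where $\theta_0$ is the intercept and $\theta_1$ is the slope (`w[0]` and `w[1]` in the code):
$$\large h_{\theta}(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$$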

In [None]:
def hypothesis_function(X, theta):
    """
    input: matrix X and theta values
    output: expected values of y from matrix X and theta values
    """
    return X.dot(theta)

In [None]:
#a vector containing expected values of y from random weight values
# note that this is the same as y_predict values from section 1
h = hypothesis_function(X,w)
h[:10]

Cost function is as follows : 

$$\large J(\theta_0, \theta_1) = \frac{1}{2m} \sum\limits_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})^{2}$$

In [None]:
def cost_function(h, y):
    """
    input: predicted values h (output of the hypothesis function) and observed y-values
    output: the cost, i.e. the sum of squared errors divided by 2m
    """
    return (1/(2*len(y))) * np.sum((h-y)**2)

In [None]:
h = hypothesis_function(X,w)
cost_function(h, y)
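As a quick hand check of the formula, here is a sketch with three hypothetical predictions and targets that are not from the dataset. `cost_function` is repeated so the snippet runs on its own.

```python
import numpy as np

def cost_function(h, y):
    """Sum of squared errors divided by 2m."""
    return (1 / (2 * len(y))) * np.sum((h - y) ** 2)

# Hypothetical values: only the last prediction is off, by 2.
h_demo = np.array([1.0, 2.0, 3.0])
y_demo = np.array([1.0, 2.0, 5.0])

# Squared errors are 0, 0 and 4, so the cost is 4 / (2 * 3).
demo_cost = cost_function(h_demo, y_demo)
print(demo_cost)
```

The result is 4/6, or roughly 0.667. A perfect fit would give a cost of 0.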

### 3. Gradient Descent
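
The update rule used in this section follows from differentiating the cost function $J$ with respect to each parameter. With $h_{\theta}(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$, the factor $\frac{1}{2}$ in $J$ cancels the 2 that comes from the square:

$$\large \frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum\limits_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)})$$

$$\large \frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum\limits_{i=1}^m (h_{\theta}(x^{(i)}) - y^{(i)}) \, x^{(i)}$$

Each iteration then updates both parameters simultaneously, using the same form as the one-variable rule $x_{new} = x_{old} - \alpha(2x_{old})$:

$$\large \theta_j := \theta_j - \alpha \frac{\partial J}{\partial \theta_j}, \quad j = 0, 1$$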

In [None]:
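def gradient_descent(X, y, w, alpha, iterations):
    """
    input: matrix X, y-values, initial weights w, learning rate alpha, number of iterations
    output: final theta, plus the theta and cost values recorded at the start
            and after every 10th iteration
    """
    theta = w
    m = len(y)

    theta_list = [theta.tolist()]
    cost = cost_function(hypothesis_function(X, theta), y)
    cost_list = [cost]

    for i in range(iterations):
        # compute the error once so both parameters are updated from the same theta
        error = hypothesis_function(X, theta) - y
        t0 = theta[0] - (alpha / m) * np.sum(error)
        t1 = theta[1] - (alpha / m) * np.sum(error * X[:, 1])
        theta = np.array([t0, t1])

        # record a snapshot every 10 iterations
        if i % 10 == 0:
            theta_list.append(theta.tolist())
            cost = cost_function(hypothesis_function(X, theta), y)
            cost_list.append(cost)

    return theta, theta_list, cost_list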
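
The two per-parameter updates in `gradient_descent` can also be written as one matrix expression, which generalizes to more than one feature. The sketch below is an alternative formulation, not the notebook's own code. The helper name `gradient_step_vectorized` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def gradient_step_vectorized(X, y, theta, alpha):
    """One gradient descent update for all parameters at once."""
    m = len(y)
    residuals = X.dot(theta) - y        # h - y, shape (m,)
    gradient = X.T.dot(residuals) / m   # one entry per parameter
    return theta - alpha * gradient

# Synthetic data (an assumption; the notebook uses slr06.csv instead).
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 10, size=50)
X_demo = np.column_stack((np.ones_like(x_demo), x_demo))
y_demo = 4.0 + 2.5 * x_demo + rng.normal(scale=0.5, size=50)
theta_demo = np.array([0.0, 0.0])
alpha_demo = 0.01

# Per-parameter form, as in gradient_descent.
m = len(y_demo)
error = X_demo.dot(theta_demo) - y_demo
t0 = theta_demo[0] - (alpha_demo / m) * np.sum(error)
t1 = theta_demo[1] - (alpha_demo / m) * np.sum(error * X_demo[:, 1])

vectorized = gradient_step_vectorized(X_demo, y_demo, theta_demo, alpha_demo)
print(np.allclose([t0, t1], vectorized))
```

The check prints `True`. Because the first column of `X` is all ones, the first entry of `X.T.dot(residuals)` is the plain sum of the errors, and the second entry is the sum of the errors weighted by `x`.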

In [None]:
iterations = 10000
alpha = 0.001

theta, theta_list, cost_list = gradient_descent(X, y, w, alpha, iterations)
cost = cost_function(hypothesis_function(X, theta), y)

print("theta:", theta)
print("cost:", cost)

In [None]:
theta_list = np.array(theta_list)

In [None]:
plt.figure(figsize=(10,5))

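#predictions for every recorded theta, one column per snapshot
y_predict_step = np.dot(X, theta_list.transpose())

plt.plot(raw_X, y, "o", alpha=0.5)
#plot every 100th snapshot (a snapshot is recorded every 10 iterations)
for i in range(0, len(cost_list), 100):
    plt.plot(raw_X, y_predict_step[:, i], label='Snapshot %d' % i)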

plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()