<a href="https://colab.research.google.com/github/jwoonge/ML-algorithms/blob/master/02%20Linear%20Regression%20Visualization%5C02_Linear_regression_Visualization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

20141261 송제웅  
# 02 Visualization of the Gradient Descent Algorithm on a Linear Regression Problem


## - Function definitions
[1] Read the CSV file

In [0]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

def read_csv(path):
    # Use the path argument instead of a hard-coded file name
    data = np.genfromtxt(path, delimiter=',')

    x_data = data[:, 0]
    y_data = data[:, 1]
    return x_data, y_data
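
As a hedged illustration of the input layout `read_csv` expects (two comma-separated columns, x then y), the sketch below parses a CSV from an in-memory string instead of a file. The sample values are made up for illustration; the actual contents of 'data.csv' are not shown in this notebook.

```python
import io
import numpy as np

# Hypothetical sample rows in "x,y" layout (not taken from data.csv)
csv_text = "1.0,2.0\n2.0,4.1\n3.0,5.9\n"

# np.genfromtxt accepts file-like objects as well as paths
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

x_demo = data[:, 0]  # first column: x values
y_demo = data[:, 1]  # second column: y values
print(data.shape, x_demo, y_demo)
```

Note that this column slicing assumes the file has at least two rows, since a single-row file is returned by `np.genfromtxt` as a 1-D array.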

[2] Linear function used to plot result [2]

In [0]:
def linear_function(x_s, a, b):
    y_s = []
    for i in range(len(x_s)):
        y = a*x_s[i] + b
        y_s.append(y)
    return y_s

[3] Objective function $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(x_i)-y_i)^2$ and the linear model $h_\theta(x) = \theta_0 + \theta_1 x$ used to implement it

In [0]:
def linear_model(theta0, theta1, x):
    return theta0 + theta1 * x

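def object_function(theta0, theta1, x_s, y_s):
    # J(theta) = (1/2m) * sum_i (h_theta(x_i) - y_i)^2
    # theta0/theta1 may be scalars or equally shaped arrays (e.g. a meshgrid);
    # each x_s[i] is a scalar, so broadcasting evaluates J for every theta pair.
    ret = 0
    for i in range(len(x_s)):
        ret += (linear_model(theta0, theta1, x_s[i]) - y_s[i])**2
    ret /= 2*(len(x_s))
    return ret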
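
For scalar parameters, the same objective can be computed without an explicit Python loop. The following is a minimal sketch, not the notebook's implementation: `cost_vectorized` and the demo arrays are hypothetical names and values, and the function assumes `theta0` and `theta1` are scalars (it is not meant for the meshgrid case).

```python
import numpy as np

def cost_vectorized(theta0, theta1, x, y):
    """J = (1/2m) * sum((theta0 + theta1*x - y)^2) for scalar theta0, theta1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = theta0 + theta1 * x - y   # h_theta(x_i) - y_i for every sample
    return np.sum(residuals ** 2) / (2 * x.size)

# Hypothetical data lying exactly on y = 1 + 2x
x_demo = np.array([0.0, 1.0, 2.0])
y_demo = np.array([1.0, 3.0, 5.0])

cost_at_fit = cost_vectorized(1.0, 2.0, x_demo, y_demo)     # perfect fit, so J is 0
cost_at_origin = cost_vectorized(0.0, 0.0, x_demo, y_demo)  # (1 + 9 + 25) / 6
print(cost_at_fit, cost_at_origin)
```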

[4] Gradient descent  
Compute the new $\theta$ values from the previous $\theta$ values and the data `x_s`, `y_s`.    

$\theta_0(t+1) = \theta_0(t) - \alpha\frac{1}{m} \sum_{i=1}^m (h_\theta(x_i) - y_i)$  

$\theta_1(t+1) = \theta_1(t) - \alpha\frac{1}{m} \sum_{i=1}^m (h_\theta(x_i) - y_i)\, x_i$
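
These two rules follow from differentiating $J(\theta)$ with $h_\theta(x) = \theta_0 + \theta_1 x$, the form used in `linear_model`. A short derivation sketch:

$\frac{\partial J}{\partial \theta_0} = \frac{1}{2m}\sum_{i=1}^m 2\,(h_\theta(x_i) - y_i)\cdot\frac{\partial h_\theta(x_i)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^m (h_\theta(x_i) - y_i)$

$\frac{\partial J}{\partial \theta_1} = \frac{1}{2m}\sum_{i=1}^m 2\,(h_\theta(x_i) - y_i)\cdot\frac{\partial h_\theta(x_i)}{\partial \theta_1} = \frac{1}{m}\sum_{i=1}^m (h_\theta(x_i) - y_i)\, x_i$

Each step then moves against the gradient: $\theta_j(t+1) = \theta_j(t) - \alpha\,\frac{\partial J}{\partial \theta_j}$ for $j = 0, 1$, where $\alpha$ is the learning rate.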

In [0]:
def gradient_descent(theta0, theta1, x_s, y_s, learning_rate = 0.005):
    update_theta0 = 0
    for i in range(len(x_s)):
        update_theta0 += (linear_model(theta0, theta1, x_s[i]) - y_s[i])/len(x_s)

    update_theta1 = 0
    for i in range(len(x_s)):
        update_theta1 += ((linear_model(theta0, theta1, x_s[i]) - y_s[i]) * x_s[i])/len(x_s)

    theta0_new = theta0 - learning_rate * update_theta0
    theta1_new = theta1 - learning_rate * update_theta1

    return theta0_new, theta1_new
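
One way to gain confidence that the two sums in `gradient_descent` really are the partial derivatives of the objective is a finite-difference check. The sketch below is illustrative only: the helper names, the demo data, and the step size `eps` are assumptions, not part of the notebook.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    # J = (1/2m) * sum((theta0 + theta1*x - y)^2)
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * x.size)

def analytic_gradient(theta0, theta1, x, y):
    # Same sums as the update terms used in the gradient descent step
    residuals = theta0 + theta1 * x - y
    return residuals.mean(), (residuals * x).mean()

def numeric_gradient(theta0, theta1, x, y, eps=1e-4):
    # Central differences approximate dJ/dtheta0 and dJ/dtheta1
    g0 = (cost(theta0 + eps, theta1, x, y) - cost(theta0 - eps, theta1, x, y)) / (2 * eps)
    g1 = (cost(theta0, theta1 + eps, x, y) - cost(theta0, theta1 - eps, x, y)) / (2 * eps)
    return g0, g1

# Hypothetical data; (-30, -30) matches the notebook's starting point
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 2.0, 5.0])

g_analytic = analytic_gradient(-30.0, -30.0, x_demo, y_demo)
g_numeric = numeric_gradient(-30.0, -30.0, x_demo, y_demo)
print(g_analytic, g_numeric)
```

If the analytic and numeric values disagree by more than rounding error, the update terms and the objective are out of sync.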

[5] Boolean function that tests the termination condition (convergence)  
Convergence is declared when the relative change of both theta values between consecutive steps falls below the convergence rate.  

In [0]:
def convergence(theta0, theta1, t, convergence_rate = 0.000001):
    if theta0[t-1]==0 or theta1[t-1]==0:
        return False
    if np.abs((theta0[t]-theta0[t-1])/theta0[t-1]) < convergence_rate:
        if np.abs((theta1[t]-theta1[t-1])/theta1[t-1]) < convergence_rate:
            return True
    return False
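
The relative-change test in `convergence` returns `False` whenever a previous theta value is exactly zero, because the ratio is undefined there. One possible alternative, which is not what this notebook uses, is Python's `math.isclose`, which combines a relative and an absolute tolerance. In the sketch below, `converged_isclose` and the tolerance values are hypothetical choices for illustration.

```python
import math

def converged_isclose(theta0_hist, theta1_hist, t, rel_tol=1e-6, abs_tol=1e-12):
    """True when both parameters changed negligibly between steps t-1 and t."""
    same0 = math.isclose(theta0_hist[t], theta0_hist[t - 1],
                         rel_tol=rel_tol, abs_tol=abs_tol)
    same1 = math.isclose(theta1_hist[t], theta1_hist[t - 1],
                         rel_tol=rel_tol, abs_tol=abs_tol)
    return same0 and same1

# Tiny change in both parameters: treated as converged
tiny_change = converged_isclose([1.0, 1.0000001], [2.0, 2.0000001], 1)
# theta0 still moving by 10%: not converged
large_change = converged_isclose([1.0, 1.1], [2.0, 2.0], 1)
# Previous value exactly zero is handled by the absolute tolerance
zero_case = converged_isclose([0.0, 0.0], [2.0, 2.0], 1)
print(tiny_change, large_change, zero_case)
```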

## - Main

Read the data from the CSV file.

In [0]:
x_data, y_data = read_csv('data.csv')

Set the initial conditions for the loop  
$\theta_0$ and $\theta_1$ both start at -30, and gradient descent (function [4]) runs from there.

In [0]:
t=0
theta0=[-30]
theta1=[-30]
energy = []
energy.append(object_function(theta0[t],theta1[t],x_data, y_data))

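Loop that performs the optimization by gradient descent  
Each iteration is one optimization step.  
The loop stops once convergence (function [5]) is reached, and the step with the smallest objective function value is recorded as `min_t`.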

In [0]:
while True:
    theta0_new, theta1_new = gradient_descent(theta0[t], theta1[t], x_data, y_data)
    theta0.append(theta0_new)
    theta1.append(theta1_new)
    t += 1
    energy.append(object_function(theta0[t],theta1[t],x_data,y_data))
    if convergence(theta0, theta1,t):
        break
min_t = energy.index(min(energy))
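
A simple sanity check for a loop like this is to compare its result with a closed-form least-squares fit such as `np.polyfit`. The sketch below is self-contained and rests on assumptions: it uses synthetic data rather than 'data.csv', a compact vectorized update, a fixed iteration count instead of the convergence test, and a learning rate of 0.05 chosen for this synthetic data.

```python
import numpy as np

# Synthetic data (hypothetical): y = 3 + 1.5x plus small noise
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 5.0, 50)
y_demo = 3.0 + 1.5 * x_demo + rng.normal(scale=0.1, size=x_demo.size)

# Closed-form reference: polyfit returns [slope, intercept] for degree 1
slope, intercept = np.polyfit(x_demo, y_demo, 1)

# Gradient descent from the same starting point as the notebook
t0, t1 = -30.0, -30.0
for _ in range(20000):
    residuals = t0 + t1 * x_demo - y_demo
    t0, t1 = (t0 - 0.05 * residuals.mean(),
              t1 - 0.05 * (residuals * x_demo).mean())

print("gradient descent:", t0, t1)
print("np.polyfit:      ", intercept, slope)
```

With the real data, the analogous comparison would be between `theta0[min_t]`, `theta1[min_t]` and the output of `np.polyfit(x_data, y_data, 1)`.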

## - Result

  [1] Input Points


*   plot the set of points loaded from the 'data.csv' file (in black color)

In [0]:
plt.title("1_input_points")
plt.plot(x_data, y_data, 'k.')
plt.show()

[2] Linear regression result


*   plot the set of points loaded from the 'data.csv' file (in black color)
*   plot a straight line obtained by the optimal linear regression based on the given set of points (in red color)
*   the estimated straight line (linear function) is superimposed on the set of points



In [0]:
x_range = [min(x_data), max(x_data)]
plt.title("2_linear_regression_result")
plt.plot(x_data, y_data, 'k.')
plt.plot(x_range, linear_function(x_range, theta1[min_t], theta0[min_t]), 'r')
plt.show()

[3] Plot the energy surface
*   plot the energy surface ($\theta_0$, $\theta_1$, $J(\theta_0,\theta_1)$) with the range of variables $\theta_0$=[-30:0.1:30] and $\theta_1$=[-30:0.1:30]

In [0]:
theta0_range = np.arange(-30,30,0.1)
theta1_range = np.arange(-30,30,0.1)
theta0_range, theta1_range = np.meshgrid(theta0_range, theta1_range)
J = object_function(theta0_range, theta1_range, x_data, y_data)

fig = plt.figure()
# fig.gca(projection=...) is no longer accepted by recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')
ax.set_title("3_energy_surface")
ax.plot_surface(theta0_range, theta1_range, J, alpha=0.7, cmap=cm.jet)
ax.set_xlabel('theta_0')
ax.set_ylabel('theta_1')
ax.set_zlabel('energy')
ax.view_init(45,45)
plt.show()
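
The call `object_function(theta0_range, theta1_range, x_data, y_data)` works on the meshgrid because each loop iteration combines the 2-D theta arrays with one scalar sample, so NumPy broadcasting fills the whole grid at once. The sketch below makes that broadcasting explicit and also locates the grid minimum. It uses a coarse grid and synthetic data, both of which are assumptions for illustration.

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 1.0 + 2.0 * x_demo

# Coarse parameter grid (the notebook uses -30 to 30 in steps of 0.1)
theta0_axis = np.arange(-5, 5.5, 0.5)
theta1_axis = np.arange(-5, 5.5, 0.5)
T0, T1 = np.meshgrid(theta0_axis, theta1_axis)

# Add a trailing axis so every grid point is paired with every sample
residuals = T0[..., None] + T1[..., None] * x_demo - y_demo
J_demo = np.sum(residuals ** 2, axis=-1) / (2 * x_demo.size)

# Index of the smallest energy value on the grid
row, col = np.unravel_index(np.argmin(J_demo), J_demo.shape)
print(J_demo.shape, T0[row, col], T1[row, col], J_demo[row, col])
```

The grid minimum is only as precise as the grid spacing, which is why the notebook relies on gradient descent rather than the surface itself to find the optimum.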

[4] Plot the gradient descent path on the energy surface
*   plot the energy surface ($\theta_0$, $\theta_1$, $J(\theta_0,\theta_1)$) with the range of variables $\theta_0$=[-30:0.1:30] and $\theta_1$=[-30:0.1:30]
*   plot the energy value with the updated variables $\theta_0(t)$ and $\theta_1(t)$ at each gradient descent step on the energy surface
*   the initial condition is $\theta_0(0)=-30$ and $\theta_1(0)=-30$
*   the gradient descent is performed until the convergence is achieved
*   the gradient descent path is superimposed on the energy surface

In [0]:
fig = plt.figure()
# fig.gca(projection=...) is no longer accepted by recent Matplotlib releases
ax = fig.add_subplot(111, projection='3d')
ax.set_title("4_gradient_descent_path")
ax.plot_surface(theta0_range, theta1_range, J, alpha=0.7, cmap=cm.jet)
ax.set_xlabel('theta_0')
ax.set_ylabel('theta_1')
ax.set_zlabel('energy')

theta0 = theta0[0:min_t+1]
theta1 = theta1[0:min_t+1]
energy = energy[0:min_t+1]
ax.plot(theta0, theta1, energy, c='k', zorder=5)
ax.view_init(45,45)
plt.show()