# Tobig's 14기 2주차 Optimization 과제
### Made by 이지용

# Gradient Descent 구현하기

### 1) "..." 표시되어 있는 빈 칸을 채워주세요  
### 2) 강의내용과 코드에 대해 공부한 내용을 적어서 과제를 채워주세요

In [1]:
import pandas as pd
import numpy as np
import random

In [2]:
data = pd.read_csv('assignment_2.csv')
data.head()

Unnamed: 0,Label,bias,experience,salary
0,1,1,0.7,48000
1,0,1,1.9,48000
2,1,1,2.5,60000
3,0,1,4.2,63000
4,0,1,6.0,76000


## Train Test 데이터 나누기
### 데이터셋을 train/test로 나눠주는 메소드  
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

In [3]:
from sklearn.model_selection import train_test_split

In [4]:
X_train, X_test, y_train, y_test = train_test_split(data.iloc[:, 1:], data.iloc[:, 0], test_size=0.25, random_state = 0)

In [5]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((150, 3), (50, 3), (150,), (50,))

## Scaling  

experience와 salary의 단위, 평균, 분산이 크게 차이나므로 scaler를 사용해 단위를 맞춰줍니다. 

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
bias_train = X_train["bias"]
bias_train = bias_train.reset_index()["bias"]
X_train = pd.DataFrame(scaler.fit_transform(X_train), columns = X_train.columns)
X_train["bias"] = bias_train
X_train.head()

Unnamed: 0,bias,experience,salary
0,1,0.187893,-1.143335
1,1,1.185555,0.043974
2,1,-0.310938,-0.351795
3,1,-1.629277,-1.34122
4,1,-1.3086,0.043974


이때 scaler는 X_train에 fit 해주시고, fit한 scaler를 X_test에 적용시켜줍니다.  
똑같이 X_test에다 fit하면 안돼요!

- cf.) 학습할 때와 동일한 기반 설정으로 동일하게 테스트 데이터를 변환해야 하는 것입니다.  학습 데이터에서 Scale된 데이터를 기반으로 Classifier가 학습이 되었기 때문에 이렇게 학습된 Classifier가 예측을 할 때에도 학습 데이터의 Scale 기준으로 테스트 데이터를 변환 한 뒤 predict해야 합니다.

In [7]:
bias_test = X_test["bias"]
bias_test = bias_test.reset_index()["bias"]
X_test = pd.DataFrame(scaler.transform(X_test), columns = X_test.columns)
X_test["bias"] = bias_test
X_test.head()

Unnamed: 0,bias,experience,salary
0,1,-1.344231,-0.615642
1,1,0.50857,0.307821
2,1,-0.310938,0.571667
3,1,1.363709,1.956862
4,1,-0.987923,-0.747565


In [8]:
# parameter 개수
N = len(X_train.loc[0])
N

3

In [9]:
# 초기 parameter들을 임의로 설정해줍니다.
parameters = np.array([random.random() for i in range(N)])
parameters

array([0.52910875, 0.99883984, 0.76909378])

### * LaTeX   

Jupyter Notebook은 LaTeX 문법으로 수식 입력을 지원하고 있습니다.  
http://triki.net/apps/3466  
https://jjycjnmath.tistory.com/117

## Logistic Function

## $p = {1\over{(1+e^{-z})}}$
## $p = {1\over{(1+e^{-X_i\theta})}}$

In [10]:
X_train.iloc[1]

bias          1.000000
experience    1.185555
salary        0.043974
Name: 1, dtype: float64

In [11]:
parameters

array([0.52910875, 0.99883984, 0.76909378])

In [12]:
def logistic(X, parameters):
    z = 0
    for i in range(len(parameters)):
        z += X[i]*parameters[i]

    p = 1/(1+np.exp(-z))
    
    return p

In [13]:
logistic(X_train.iloc[1], parameters)

0.8515877881305174

- To generate probabilities, logistic regression uses a function that gives outputs between 0 and 1 for all values of X. 

## Object Function

Object Function : 목적함수는 Gradient Descent를 통해 최적화 하고자 하는 함수입니다.  
로지스틱 회귀의 목적함수를 작성해주세요
## $l(p) =-\sum{\{y_ilogp(X_i)+(1-y_i)log(1-p(X_i))\}}$

In [14]:
def cross_entropy_i(X, y, parameters) :
    p = logistic(X, parameters)                            # 위에서 작성한 함수를 활용하세요
    loss = y*np.log(p) + (1-y)*np.log(1-p)
    return loss

In [15]:
def cross_entropy(X_set, y_set, parameters) :
    loss = 0
    for i in range(X_set.shape[0]):
        X = X_set.iloc[i, :]
        y = y_set.iloc[i]
        loss += cross_entropy_i(X, y, parameters)
    return -loss

In [16]:
cross_entropy(X_test, y_test, parameters)

70.23175981302765

## Gradient of Cross Entropy

## ${\partial\over{\partial \theta_j}}l(p)= -\sum{(y_i-p_i)x_{ij}}$

In [17]:
# cross_entropy를 theta_j에 대해 미분한 값을 구하는 함수
def get_gradient_ij_cross_entropy(X, y, parameters, j):
    p = logistic(X, parameters)
    gradient = -((y-p) * X[j])
    
    return gradient

In [18]:
get_gradient_ij_cross_entropy(X_train.iloc[0, :], y_train.iloc[0], parameters, 1)

-0.10156516226073803

- 주의! gradient 계산 시, minus(-) 붙여줘야 함. <br>
$\because$ 기울기의 반대방향으로 가기 때문에 (반대방향일 때 내적이 최소)

## Batch Gradient Descent  

Batch Gradient Descent : "..."  
- 개념: 한 epoch 때 모든 데이터의 gradients 확인하여 parameters update


<img width="430" alt="bgd_sgd" src="https://user-images.githubusercontent.com/40483474/89044499-036fef80-d385-11ea-8c6b-6eb6f60d4c15.png">

In [19]:
def get_gradients_bgd(X_train, y_train, parameters) :
    gradients = [0 for i in range(len(parameters))]
    
    for i in range(X_train.shape[0]):
        X = X_train.iloc[i, :]
        y = y_train.iloc[i]
        for j in range(len(parameters)):
            gradients[j] += get_gradient_ij_cross_entropy(X, y, parameters, j)
            
    return gradients

In [20]:
gradients_bgd = get_gradients_bgd(X_train, y_train, parameters)
gradients_bgd

[44.66696557619928, 23.079461565445126, 52.1408468222716]

## Stochastic Gradient Descent  

Stochastic Gradient Descent : "..." 
- 개념: 한 epoch 때 임의의 데이터 하나의 gradients 확인하여 paramters update

In [21]:
def get_gradients_sgd(X_train, y, parameters) :
    gradients = [0 for i in range(len(parameters))]
    r = int(random.random()*X_train.shape[0])
    X = X_train.iloc[r, :]
    y = y_train.iloc[r]
        
    for j in range(len(parameters)):
        gradients[j] = get_gradient_ij_cross_entropy(X, y, parameters, j)
        
    return gradients

In [22]:
gradients_sgd = get_gradients_sgd(X_train, y_train, parameters)
gradients_sgd

[0.5515482484510512, -0.1125410298789458, -0.08488905277571648]

## Update Parameters  

In [23]:
def update_parameters(parameters, gradients, learning_rate) :
    for i in range(len(parameters)) :
        gradients[i] *= learning_rate
    parameters -= gradients[i]
    return parameters

In [24]:
update_parameters(parameters, gradients_bgd, 0.01)

array([0.00770028, 0.47743137, 0.24768531])

## Gradient Descent  

위에서 작성한 함수들을 조합해서 Gradient Descent를 진행하는 함수를 완성해주세요

learning_rate = 0.01
max_iter = 100000
tolerance = 0.0001

In [25]:
def gradient_descent(X_train, y_train, learning_rate=0.01, max_iter=100000, tolerance=0.0001, optimizer="bgd") :
    count = 1
    point = 100 if optimizer == "bgd" else 10000
    N = len(X_train.iloc[0])
    parameters = np.array([random.random() for i in range(N)])
    gradients = [0 for i in range(N)]
    loss = cross_entropy(X_train, y_train, parameters)

    while count < max_iter :
        
        if optimizer == "bgd" :
            gradients = get_gradients_bgd(X_train, y_train, parameters)
        elif optimizer == "sgd" :
            gradients = get_gradients_sgd(X_train, y_train, parameters)
            # loss, 중단 확인

#         if count % 10 == 0:
#             print("---그라디언트!---")
#             print(gradients)
#             print("---엔트로피!---")
#             print(cross_entropy(X_train, y_train, parameters))
#             print("")
        if count%point == 0 :
            new_loss = cross_entropy(X_train, y_train, parameters)
            print(count, "loss: ",new_loss, "params: ", parameters, "gradients: ", gradients)
            
#             중단 조건
            if abs(new_loss-loss) < tolerance/len(y_train) :
                break
            loss = new_loss
        
        parameters = update_parameters(parameters, gradients, learning_rate)
        count += 1
    return parameters

In [26]:
new_param_bgd = gradient_descent(X_train, y_train)
new_param_bgd

100 loss:  118.03170465346912 params:  [ 0.19692199 -0.30696677 -0.16174474] gradients:  [40.07670561556923, -33.522520413595885, -1.9984014443252818e-15]
200 loss:  118.03170465346912 params:  [ 0.19692199 -0.30696677 -0.16174474] gradients:  [40.07670561556923, -33.522520413595885, -1.9984014443252818e-15]


array([ 0.19692199, -0.30696677, -0.16174474])

In [27]:
cross_entropy(X_test, y_test, new_param_bgd)

37.52586355854357

## Hyper Parameter Tuning

Hyper Parameter들을 매번 다르게 해서 학습을 진행해 보세요. 다른 점들을 발견할 수 있습니다.

In [28]:
new_param_sgd = gradient_descent(X_train, y_train, learning_rate=0.01, max_iter=100000, tolerance=0.0001, optimizer="sgd")
new_param_sgd

10000 loss:  96.6892089182223 params:  [-0.15205674  0.0005528  -0.42129498] gradients:  [-0.40507062637437286, 0.6022404399121228, 0.5165696457593445]
20000 loss:  96.70073304570573 params:  [-0.1125669   0.04004265 -0.38180514] gradients:  [0.5204884125288339, -0.19893047564965333, -0.2861022095812832]
30000 loss:  96.75561131683565 params:  [-0.16898426 -0.01637472 -0.4382225 ] gradients:  [0.4035354623172334, 0.003930065371905254, 0.20407024222544376]
40000 loss:  96.72110674526039 params:  [-0.10709388  0.04551567 -0.37633212] gradients:  [0.42063537318021865, 0.019084174129252035, 0.24046356281902997]
50000 loss:  96.70099640455305 params:  [-0.11248723  0.04012231 -0.38172547] gradients:  [0.35227810048464575, 0.43019708306567617, 0.5034648328860628]
60000 loss:  96.84310042935813 params:  [-0.08602366  0.06658589 -0.35526189] gradients:  [0.4607422337125833, -0.061179235105998146, 0.08104349249790588]
70000 loss:  96.66704021356806 params:  [-0.13910549  0.01350406 -0.40834373]

array([-0.12184227,  0.03076727, -0.39108051])

In [29]:
cross_entropy(X_test, y_test, new_param_sgd)

31.025869575046688

- Q. epoch 100번에 한 번 loss 값 출력되야 하는데, 왜 10000이 나온거지? <br>
-> bgd가 아니면 point 10000으로 설정함

## Predict Label

In [30]:
y_predict = []
for i in range(len(y_test)):
    p = logistic(X_test.iloc[i,:], new_param_bgd)
    if p> 0.5 :
        y_predict.append(1)
    else :
        y_predict.append(0)

## Confusion Matrix

In [31]:
from sklearn.metrics import *
tn, fp, fn, tp = confusion_matrix(y_test, y_predict).ravel()
confusion_matrix(y_test, y_predict)

array([[21, 19],
       [ 6,  4]], dtype=int64)

In [32]:
accuracy = (tn+tp) / (tn+tp+fn+fp)
accuracy

0.5