# Implementing Artificial Neural Networks with Scikit-learn

## Introduction

##  Cite Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

The class <font size="3" color="red">MLPClassifier</font> implements a multi-layer perceptron (MLP) algorithm that trains using backpropagation.

MLP trains on two arrays: array X of size (n_samples, n_features), which holds the training samples represented as floating point feature vectors; and array y of size (n_samples,), which holds the target values (class labels) for the training samples.

## IMPORT
<font size="3" color="red">from sklearn.neural_network import MLPClassifier</font> 
## CLASS
```python
MLPClassifier(hidden_layer_sizes=(100,),
              activation='relu',
              solver='adam',
              alpha=0.0001,
              batch_size='auto',
              learning_rate='constant',
              learning_rate_init=0.001,
              power_t=0.5,
              max_iter=200,
              shuffle=True,
              random_state=None,
              tol=0.0001,
              verbose=False,
              warm_start=False,
              momentum=0.9,
              nesterovs_momentum=True,
              early_stopping=False,
              validation_fraction=0.1,
              beta_1=0.9,
              beta_2=0.999,
              epsilon=1e-08)
```
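The defaults listed in the signature can be checked against the installed library. The following is a minimal sketch that reads them back with `get_params()`; the variable name `defaults` is only illustrative, and newer scikit-learn releases may report additional parameters beyond the ones shown in the signature.

```python
from sklearn.neural_network import MLPClassifier

# Build an estimator with no arguments and read back its default settings.
defaults = MLPClassifier().get_params()

# Print a few of the defaults discussed in this section.
for name in ('hidden_layer_sizes', 'activation', 'solver',
             'alpha', 'learning_rate_init', 'max_iter'):
    print(name, '=', defaults[name])
```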
```C              
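Parameter descriptions

hidden_layer_sizes=(100, 2)  # Number of neurons in each hidden layer; this example has 100 nodes in the first hidden layer and 2 nodes in the second
default (100,)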


activation='relu'  # Activation function for the hidden layers
default relu {identity, logistic, tanh, relu}

# identity, no-op activation, useful to implement linear bottleneck, returns f(x) = x
# logistic, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).
# tanh, the hyperbolic tan function, returns f(x) = tanh(x).
# relu, the rectified linear unit function, returns f(x) = max(0, x)

solver='adam'  # The solver for weight optimization (the method used to minimize the loss).
default adam {lbfgs, adam, sgd}

# 'lbfgs' is an optimizer in the family of quasi-Newton methods.
# 'sgd' refers to stochastic gradient descent.
# 'adam' refers to a stochastic gradient-based optimizer proposed by Diederik Kingma and Jimmy Ba.
# L-BFGS is recommended for small datasets (it builds an approximation of the inverse Hessian rather than computing it exactly).
# Dataset size   Solver
#  small         lbfgs
#  large         adam/sgd

alpha=1e-05  # L2 penalty (regularization term) parameter; L2 regularization lowers the risk of overfitting
default 0.0001

batch_size  # Size of minibatches for stochastic optimizers
default 'auto', which means batch_size = min(200, n_samples)
# Ignored when solver is 'lbfgs'
    
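learning_rate='constant'  # Learning rate schedule for weight updates
default constant {constant, invscaling, adaptive}
# Only used when solver='sgd'.
# constant: the learning rate stays at learning_rate_init throughout training
# invscaling: the learning rate decreases gradually at each time step t, effective_learning_rate = learning_rate_init / pow(t, power_t)
# adaptive: the learning rate stays at learning_rate_init as long as the training loss keeps decreasing. Each time two consecutive epochs fail to decrease the training loss by at least tol (or fail to increase the validation score by at least tol when early_stopping=True), the current learning rate is divided by 5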
    
learning_rate_init=0.001  # Initial learning rate; controls the step size of the weight updates
default 0.001
# Only used when solver='sgd' or 'adam'.

power_t=0.5  # Exponent for the inverse scaling learning rate; used to update the effective learning rate when learning_rate='invscaling'.
default 0.5
# Only used when solver='sgd'.

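max_iter=200  # Maximum number of iterations. Training stops at convergence (determined by tol) or at max_iter, whichever comes first. For 'sgd' and 'adam' this counts epochs.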
default 200


shuffle  # Whether to shuffle the samples in each iteration.
default True {True, False}
# Only used when solver='sgd' or 'adam'.

random_state=1  # Seed for the random number generator
default None

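tol  # Tolerance for the optimization. When two consecutive iterations fail to decrease the loss (or increase the score) by at least tol, training is considered converged and stops, unless learning_rate='adaptive'. Newer scikit-learn releases control the number of iterations with the n_iter_no_change parameter.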
default 0.0001

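verbose=False  # Whether to print progress messages to stdout
default False {True, False}

# MLPClassifier treats verbose as a boolean:
# False (or 0): no output
# True (or any truthy value, such as the verbose=10 used later): prints the loss at each iteration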

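warm_start  # When True, each call to fit reuses the solution of the previous call as initialization. If you want closer monitoring of how the model evolves, you can combine it with your own for loop.
default False {True, False}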


momentum  # Momentum for gradient descent update, between 0 and 1. Only used when solver='sgd'.
default 0.9

nesterovs_momentum  # Whether to use Nesterov's momentum, a refinement of standard momentum.
default True {True, False}

# Only used when solver='sgd' and momentum > 0

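early_stopping
default False {True, False}
# Whether to stop training early when the validation score stops improving.
# If True, 10% of the training data (see validation_fraction) is set aside automatically for validation, and training stops when the validation score fails to improve for two consecutive epochs.
# Only effective when solver='sgd' or 'adam'

validation_fraction  # Proportion of the training data set aside for validation; only used when early_stopping=True.
default 0.1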

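beta_1  # Exponential decay rate for the first-moment estimates in adam
default 0.9, should be in [0, 1)
# Only used when solver='adam'

beta_2  # Exponential decay rate for the second-moment estimates in adam
default 0.999, should be in [0, 1)
# Only used when solver='adam'

epsilon  # Value for numerical stability in adam
default 1e-8
# Only used when solver='adam'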
```
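Several of the formulas in the parameter list can be written out directly. The sketch below uses only the Python standard library to spell out the four activation functions, the `invscaling` schedule, and the `'auto'` batch size rule exactly as they are stated above. The function names are illustrative and are not part of scikit-learn.

```python
import math

# The four activation functions, as defined in the parameter list.
def identity(x):
    return x

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

# invscaling schedule: learning_rate_init / pow(t, power_t)
def invscaling_rate(learning_rate_init, t, power_t=0.5):
    return learning_rate_init / pow(t, power_t)

# batch_size='auto' rule: min(200, n_samples)
def auto_batch_size(n_samples):
    return min(200, n_samples)

for x in (-2.0, 0.0, 2.0):
    print(x, identity(x), logistic(x), tanh(x), relu(x))

# The effective learning rate shrinks as the time step t grows.
for t in (1, 4, 16, 100):
    print(t, invscaling_rate(0.1, t))

print(auto_batch_size(50), auto_batch_size(5000))
```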

## A Simple Example

In [1]:
from sklearn.neural_network import MLPClassifier

In [2]:
X = [[0., 0.], [1., 1.]]
y = [0, 1]

In [3]:
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)

In [4]:
clf.fit(X, y)

MLPClassifier(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1,
              solver='lbfgs')

In [5]:
clf.predict([[2., 2.], [-2.,-2.]])

array([1, 0])

MLP can fit a non-linear model to the training data. clf.coefs_ contains the weight matrices that constitute the model parameters:

In [6]:
[coef.shape for coef in clf.coefs_]

[(2, 5), (5, 2), (2, 1)]

In [7]:
clf.coefs_[0]

array([[-0.14196276, -0.02104562, -0.85522848, -3.51355396, -0.60434709],
       [-0.69744683, -0.9347486 , -0.26422217, -3.35199017,  0.06640954]])

In [8]:
clf.coefs_[1]

array([[ 0.29164405, -0.14147894],
       [ 2.39665167, -0.6152434 ],
       [-0.51650256,  0.51452834],
       [ 4.0186541 , -0.31920293],
       [ 0.32903482,  0.64394475]])

In [9]:
clf.coefs_[2]

array([[-4.53025854],
       [-0.86285329]])
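To see how these weight matrices produce a prediction, the forward pass can be reproduced by hand. The following is a sketch under two assumptions: the hidden layers use the default `relu` activation, and the single output unit of a binary classifier goes through the logistic function. The helper `manual_forward` is hypothetical and not part of scikit-learn; it also uses `clf.intercepts_`, the bias vectors stored alongside `clf.coefs_`.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0., 0.], [1., 1.]])
y = [0, 1]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

def manual_forward(model, inputs):
    """Hypothetical helper: push inputs through the fitted layers by hand."""
    a = np.asarray(inputs, dtype=float)
    last = len(model.coefs_) - 1
    for i, (W, b) in enumerate(zip(model.coefs_, model.intercepts_)):
        z = a @ W + b
        if i < last:
            a = np.maximum(z, 0.0)           # relu on hidden layers (assumed default)
        else:
            with np.errstate(over='ignore'):
                a = 1.0 / (1.0 + np.exp(-z))  # logistic on the binary output unit
    return a.ravel()

queries = np.array([[2., 2.], [-2., -2.]])
p_manual = manual_forward(clf, queries)
p_sklearn = clf.predict_proba(queries)[:, 1]  # probability of class 1
print(np.allclose(p_manual, p_sklearn))
```

If the two assumptions hold for the installed version, the hand-computed probabilities agree with the second column of `predict_proba`.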

In [10]:
clf.predict_proba([[2., 2.], [-2., -2.]]) 

array([[1.96718015e-004, 9.99803282e-001],
       [1.00000000e+000, 2.80288501e-171]])

MLPClassifier supports multi-class classification by applying Softmax as the output function.
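As a reminder of what softmax does, here is a small standard-library sketch that turns a list of raw class scores into probabilities that sum to 1. The scores are made-up values for illustration, and this is not the library's internal implementation.

```python
import math

def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # hypothetical raw outputs for 3 classes
print(probs)
print(probs.index(max(probs)))     # index of the predicted class
```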

Further, the model supports multi-label classification, in which a sample can belong to more than one class. For each class, the raw output passes through the logistic function. Values greater than or equal to 0.5 are rounded to 1, otherwise to 0. For a predicted output of a sample, the indices where the value is 1 represent the assigned classes of that sample:

In [11]:
X = [[0., 0.], [1., 1.]]
y = [[0, 1], [1, 1]]
clf = MLPClassifier(solver='lbfgs', 
                    alpha=1e-5,
                    hidden_layer_sizes=(15, ), 
                    random_state=1)   

In [12]:
clf.fit(X, y)

MLPClassifier(alpha=1e-05, hidden_layer_sizes=(15,), random_state=1,
              solver='lbfgs')

In [13]:
clf.predict([[1., 1.]])

array([[1, 1]])

In [14]:
clf.predict([[0., 0.]])

array([[0, 1]])
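The thresholding rule described above can be sketched in a few lines of plain Python. The raw outputs are made-up numbers and `assign_labels` is a hypothetical helper; the sketch only illustrates the rule as stated in the text, not the library's internal code.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def assign_labels(raw_outputs, threshold=0.5):
    # One logistic probability per class, then round at the threshold.
    probs = [logistic(z) for z in raw_outputs]
    indicator = [1 if p >= threshold else 0 for p in probs]
    # Indices where the indicator is 1 are the assigned classes.
    classes = [i for i, flag in enumerate(indicator) if flag == 1]
    return indicator, classes

indicator, classes = assign_labels([-1.2, 0.0, 3.4])  # hypothetical raw outputs
print(indicator)
print(classes)
```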

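## Predicting Stock Market Movements with an ANN
Load the stock market up/down data prepared in the previous session and use it for prediction.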

In [15]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [16]:
market = pd.read_csv('market.csv')
market.head()

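| Unnamed: 0 | Date | ^TWII | ^AORD | ^AXJO | ^BFX | ^BSESN | ^BUK100P | ^BVSP | ^DJI | ^FCHI | ... | ^MXX | ^N225 | ^NYA | ^NZ50 | ^RUT | ^STOXX50E | ^TA125.TA | ^XAX | 000001.SS | IMOEX.ME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-04-15 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 2015-04-27 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | ... | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 2 | 2015-04-29 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | 2015-05-06 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2015-05-11 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |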


In [17]:
from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(
    market.iloc[:, 2:], market['^TWII'], 
    train_size=0.8, 
    random_state=1)
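In the `market.head()` output above, `^TWII` is the third column (position 2), so `market.iloc[:, 2:]` appears to pass the target in as one of the features. One way to keep the target out of the inputs is to drop it by name. The sketch below demonstrates this on a small synthetic DataFrame, since `market.csv` is not available here; the values, dates, and the name `demo` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the market data (random 0/1 values, invented dates).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Date': [f'2015-05-{d:02d}' for d in range(1, 11)],
    '^TWII': rng.integers(0, 2, size=10).astype(float),
    '^DJI': rng.integers(0, 2, size=10).astype(float),
    '^N225': rng.integers(0, 2, size=10).astype(float),
})

# Drop the non-numeric date and the target so neither ends up in the features.
features = demo.drop(columns=['Date', '^TWII'])
target = demo['^TWII']

train_x, test_x, train_y, test_y = train_test_split(
    features, target,
    train_size=0.8,
    random_state=1)

print(list(train_x.columns))
print(len(train_x), len(test_x))
```

Note that applying the same change to the real data would alter the scores reported in the following cells.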

In [18]:
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, alpha=1e-4,
                    solver='sgd', verbose=10, tol=1e-4, random_state=1,
                    learning_rate_init=.1)

mlp.fit(train_x, train_y)
print("Training set score: %f" % mlp.score(train_x, train_y))
print("Test set score: %f" % mlp.score(test_x, test_y))

Iteration 1, loss = 0.67819994
Iteration 2, loss = 0.66690368
Iteration 3, loss = 0.65170263
Iteration 4, loss = 0.63698058
Iteration 5, loss = 0.61959334
Iteration 6, loss = 0.60526840
Iteration 7, loss = 0.59336309
Iteration 8, loss = 0.58504754
Iteration 9, loss = 0.57799943
Iteration 10, loss = 0.57284179
Iteration 11, loss = 0.56872590
Iteration 12, loss = 0.56472188
Iteration 13, loss = 0.56818511
Iteration 14, loss = 0.55847016
Iteration 15, loss = 0.55556555
Iteration 16, loss = 0.55986945
Iteration 17, loss = 0.55402597
Iteration 18, loss = 0.54808346
Iteration 19, loss = 0.54577976
Iteration 20, loss = 0.54305632
Training set score: 0.736041
Test set score: 0.636364
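The per-iteration losses printed by `verbose` are also kept on the fitted estimator, which makes them easier to inspect or plot afterwards. The sketch below assumes synthetic data from `make_classification` in place of the market data and reads the `loss_curve_` and `n_iter_` attributes after fitting; the names `X_demo`, `y_demo`, and `mlp_demo` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data as a stand-in for the market data.
X_demo, y_demo = make_classification(n_samples=300, n_features=20,
                                     random_state=1)

mlp_demo = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20,
                         solver='sgd', random_state=1,
                         learning_rate_init=0.1)
mlp_demo.fit(X_demo, y_demo)

# One loss value is recorded per iteration (epoch).
print(mlp_demo.n_iter_)
print(mlp_demo.loss_curve_[0], mlp_demo.loss_curve_[-1])
```

With `max_iter=20` the optimizer may stop before converging, in which case scikit-learn emits a `ConvergenceWarning` rather than an error.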




## Changing the Number of Neurons to Improve the Prediction Score

In [19]:
mlp = MLPClassifier(hidden_layer_sizes=(100, 100, 100, 100, 100, 100), 
                    max_iter=100000000,
                    alpha=1e-4,
                    solver='sgd', 
                    tol=1e-4, 
                    random_state=1,
                    learning_rate_init=.1)

mlp.fit(train_x, train_y)
mlp.score(test_x, test_y)

0.6565656565656566

In [20]:
mlp = MLPClassifier(hidden_layer_sizes=(1000, 500, 500, 500, 500), 
                    max_iter=100000000,
                    alpha=1e-4,
                    solver='sgd', 
                    tol=1e-4, 
                    random_state=1,
                    learning_rate_init=.1)

mlp.fit(train_x, train_y)
mlp.score(test_x, test_y)

0.6767676767676768

In [21]:
mlp = MLPClassifier(hidden_layer_sizes=(100, 300, 300, 100, 300, 500), 
                    max_iter=100000000,
                    alpha=1e-4,
                    solver='sgd', 
                    tol=1e-4, 
                    random_state=1,
                    learning_rate_init=.1)

mlp.fit(train_x, train_y)
mlp.score(test_x, test_y)

0.7171717171717171

## Try changing the hidden layers, number of nodes, learning rate, normalization, or activation function to improve your ANN models.

In [25]:
mlp = MLPClassifier(hidden_layer_sizes=(200, 300, 300, 100, 300, 500), 
                    max_iter=100000000,
                    alpha=1e-4,
                    solver='sgd', 
                    tol=1e-4, 
                    random_state=1,
                    learning_rate_init=.1)

mlp.fit(train_x, train_y)
mlp.score(test_x, test_y)

0.6161616161616161
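Trying configurations one at a time and comparing test scores, as in the cells above, tends to tune the model to that particular test split. One possible alternative is a cross-validated grid search over the same kinds of settings, with normalization included as a pipeline step. The sketch below runs on synthetic data from `make_classification`; the grid values and the step names `scale` and `mlp` are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; replace with train_x / train_y for the real task.
X_demo, y_demo = make_classification(n_samples=200, n_features=10,
                                     random_state=1)

# Normalization and the classifier are fitted together inside each fold.
pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(solver='sgd', max_iter=300, random_state=1)),
])

# Small illustrative grid: layer layout, activation, and learning rate.
param_grid = {
    'mlp__hidden_layer_sizes': [(20,), (20, 10)],
    'mlp__activation': ['relu', 'tanh'],
    'mlp__learning_rate_init': [0.01, 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X_demo, y_demo)

print(search.best_params_)
print(search.best_score_)
```

The held-out test set can then be scored once, with `search.score(test_x, test_y)`, after the configuration has been chosen.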