## Assignment
### Use LSTM & CNN models to classify customized candlestick patterns (at least 3 classes)

* All files: candlestick_lstm_R09723057_蔡易辰.py, candlestick_cnn_R09723057_蔡易辰.py
* The scripts load the dataset from a local file path; change the path to your own before running

#### 1. Use LSTM model to classify customized candlestick pattern
* candlestick_lstm_R09723057_蔡易辰.py

In [1]:
from sklearn.metrics import confusion_matrix
import pickle
import keras
from keras.layers import LSTM
from keras.layers import Dense, Activation
from keras.models import Sequential
from keras.optimizers import Adam


def load_pkl(pkl_name):
    # load data from data folder
    with open(pkl_name, 'rb') as f:
        data = pickle.load(f)
    return data

def lstm_preprocess(x_train, x_test, y_train, y_test, n_step, n_input, n_classes):
    x_train = x_train.reshape(-1, n_step, n_input)
    x_test = x_test.reshape(-1, n_step, n_input)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255
    y_train = keras.utils.to_categorical(y_train, n_classes)
    y_test = keras.utils.to_categorical(y_test, n_classes)
    return (x_train, x_test, y_train, y_test)

def lstm_model(n_input, n_step, n_hidden, n_classes):
    model = Sequential()
    model.add(LSTM(n_hidden, batch_input_shape=(None, n_step, n_input), unroll=True))
    model.add(Dense(n_classes))
    model.add(Activation('softmax'))
    return model



def train_lstm(model, x_train, y_train, x_test, y_test, 
        learning_rate, training_iters, batch_size):
    adam = Adam(lr=learning_rate)
    model.summary()
    model.compile(optimizer=adam,
        loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train,
        batch_size=batch_size, epochs=training_iters,
        verbose=1, validation_data=(x_test, y_test))

def print_result(data, x_train, x_test, model):
    # get train & test pred-labels
    train_pred = model.predict_classes(x_train)
    test_pred = model.predict_classes(x_test)
    # get train & test true-labels
    train_label = data['train_label'][:, 0]
    test_label = data['test_label'][:, 0]
    # confusion matrix
    train_result_cm = confusion_matrix(train_label, train_pred, labels=range(9))
    test_result_cm = confusion_matrix(test_label, test_pred, labels=range(9))
    print(train_result_cm, '\n'*2, test_result_cm)

def candlestick_lstm_main(iters):
    # training parameters
    learning_rate = 0.001
    training_iters = iters
    batch_size = 128

    # model parameters
    n_input = 40
    n_step = 10
    n_hidden = 256
    n_classes = 10  # note: the labels only span 9 classes (0-8), so one output unit is unused
    
    # change this path to the dataset location on your machine
    data = load_pkl('C:\\Users\\TsaiYiChen\\Desktop\\ntu_financial_innovation\\label8_eurusd_10bar_1500_500_val200_gaf_culr.pkl')
    x_train, y_train, x_test, y_test = data['train_gaf'], data['train_label'][:, 0], data['test_gaf'], data['test_label'][:, 0]
    x_train, x_test, y_train, y_test = lstm_preprocess(x_train, x_test, y_train, y_test, n_step, n_input, n_classes)

    model = lstm_model(n_input, n_step, n_hidden, n_classes)
    train_lstm(model, x_train, y_train, x_test, y_test, learning_rate, 
               training_iters, batch_size)
    scores = model.evaluate(x_test, y_test, verbose=0)
    print('LSTM test accuracy:', scores[1])
    print_result(data, x_train, x_test, model)

Using TensorFlow backend.
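
The reshaping step in `lstm_preprocess` can be checked in isolation. The sketch below assumes each GAF sample has shape (10, 10, 4), which is the `input_shape` used by the CNN script later in this report; the dummy array and the helper name `to_sequences` are illustrative only and not part of the original scripts.

```python
import numpy as np

def to_sequences(x, n_step=10, n_input=40):
    """Flatten (N, 10, 10, 4) GAF images into (N, n_step, n_input) sequences."""
    return x.reshape(-1, n_step, n_input).astype('float32')

# Dummy data standing in for data['train_gaf'] (assumed shape: N x 10 x 10 x 4).
x = np.arange(2 * 10 * 10 * 4).reshape(2, 10, 10, 4)
seq = to_sequences(x)

# Each time step is one image row: 10 columns x 4 channels = 40 features,
# so pixel (row i, column j, channel c) lands at feature index j * 4 + c.
print(seq.shape)
print(seq[1, 3, 2 * 4 + 1] == x[1, 3, 2, 1])
```

With this layout the LSTM reads the image row by row, treating the 10 rows as 10 time steps.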


##### LSTM - 10 iterations
* The ninth iteration (epoch) gives the best result, with an accuracy of 0.7906
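
The best epoch can also be located programmatically instead of by reading the training log. One option is to keep the `History` object that `model.fit` returns and take the maximum of its validation accuracy. `train_lstm` as written does not return that object, so this is only a sketch under that assumption: the helper `best_epoch` is hypothetical, the numbers are placeholders rather than values from this run, and the key name (`val_accuracy` here) depends on the Keras version.

```python
def best_epoch(history, key='val_accuracy'):
    """Return (1-based epoch, value) of the highest entry in history[key]."""
    values = history[key]
    best_idx = max(range(len(values)), key=values.__getitem__)
    return best_idx + 1, values[best_idx]

# Placeholder values, NOT the accuracies from the run in this report.
fake_history = {'val_accuracy': [0.52, 0.61, 0.70, 0.68, 0.74]}
epoch, acc = best_epoch(fake_history)
print(epoch, acc)
```

With a real run, `fake_history` would be replaced by the `.history` dictionary of the object returned by `model.fit`.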

In [2]:
candlestick_lstm_main(10)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 256)               304128    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2570      
_________________________________________________________________
activation_1 (Activation)    (None, 10)                0         
Total params: 306,698
Trainable params: 306,698
Non-trainable params: 0
_________________________________________________________________
Train on 15000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
LSTM test accuracy: 0.7645999789237976
[[1517  301  280  168   56  264  167  199   48]
 [  27 1347    0  126    0    0    0    0    0]
 [ 177    0 1320    0    3    0    0    0    0]
 [  11   45    0 1317    0    0    
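
The parameter counts in the model summary above can be reproduced by hand. The sketch below uses the standard counting for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and for a dense layer; the helper names are illustrative.

```python
def lstm_params(n_input, n_hidden):
    # 4 gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (n_hidden * (n_input + n_hidden) + n_hidden)

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out

lstm = lstm_params(n_input=40, n_hidden=256)
dense = dense_params(n_in=256, n_out=10)
print(lstm, dense, lstm + dense)  # 304128 2570 306698
```

These match the `lstm_1`, `dense_1`, and total rows of the summary.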

##### LSTM - 50 iterations
* The 31st iteration has the best result, with an accuracy of 0.8752
* In general, training for 50 iterations gives better results than training for 10

In [3]:
candlestick_lstm_main(50)

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 256)               304128    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0         
Total params: 306,698
Trainable params: 306,698
Non-trainable params: 0
_________________________________________________________________
Train on 15000 samples, validate on 5000 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Ep

#### 2. Use CNN model to classify customized candlestick pattern
* candlestick_cnn_R09723057_蔡易辰.py

In [4]:
from sklearn.metrics import confusion_matrix
import numpy as np
import pickle

from keras import backend as K
from keras import optimizers
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, Activation, MaxPool2D


def load_pkl(pkl_name):
    # load data from data folder
    with open(pkl_name, 'rb') as f:
        data = pickle.load(f)
    return data

def get_cnn_model(params):
    model = Sequential()
    model.add(Conv2D(filters=32, kernel_size=(5,5), padding='same', activation='relu', input_shape=(10, 10, 4)))
    model.add(Conv2D(filters=48, kernel_size=(5,5), padding='valid', activation='relu'))
    model.add(Flatten())
    model.add(Dense(256, activation='relu'))
    model.add(Dense(84, activation='relu'))
    model.add(Dense(params['classes'], activation='softmax'))
    return model

def train_model(params, data):
    model = get_cnn_model(params)
    model.compile(loss='categorical_crossentropy', optimizer=params['optimizer'], metrics=['accuracy'])
    hist = model.fit(x=data['train_gaf'], y=data['train_label_arr'],
                     batch_size=params['batch_size'], epochs=params['epochs'], verbose=2)
    return (model, hist)

def print_result(data, model):
    # get train & test pred-labels
    train_pred = model.predict_classes(data['train_gaf'])
    test_pred = model.predict_classes(data['test_gaf'])
    # get train & test true-labels
    train_label = data['train_label'][:, 0]
    test_label = data['test_label'][:, 0]
    # confusion matrix
    train_result_cm = confusion_matrix(train_label, train_pred, labels=range(9))
    test_result_cm = confusion_matrix(test_label, test_pred, labels=range(9))
    print(train_result_cm, '\n'*2, test_result_cm)
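
The notebook does not print a summary for the CNN, so the following is a hand calculation of the feature-map sizes and parameter count implied by `get_cnn_model`. It assumes the Keras defaults of stride 1 and bias terms in every layer; the helper names are illustrative.

```python
def conv2d_out(size, kernel, padding):
    # Stride 1: 'same' keeps the spatial size, 'valid' shrinks it by kernel - 1.
    return size if padding == 'same' else size - kernel + 1

def conv2d_params(kernel, c_in, c_out):
    # One kernel x kernel x c_in filter per output channel, plus a bias each.
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

s1 = conv2d_out(10, 5, 'same')    # first conv keeps 10 x 10
s2 = conv2d_out(s1, 5, 'valid')   # second conv gives 6 x 6
flat = s2 * s2 * 48               # size of the Flatten output

total = (conv2d_params(5, 4, 32) + conv2d_params(5, 32, 48)
         + dense_params(flat, 256) + dense_params(256, 84)
         + dense_params(84, 9))
print(s1, s2, flat, total)  # 10 6 1728 506657
```

Most of the parameters sit in the first dense layer (1728 x 256 weights), not in the convolutions.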

In [5]:
PARAMS = {}
# change this to the dataset path on your machine
PARAMS['pkl_name'] = 'C:\\Users\\TsaiYiChen\\Desktop\\ntu_financial_innovation\\label8_eurusd_10bar_1500_500_val200_gaf_culr.pkl'
PARAMS['classes'] = 9
PARAMS['lr'] = 0.01
PARAMS['epochs'] = 10
PARAMS['batch_size'] = 64
PARAMS['optimizer'] = optimizers.SGD(lr=PARAMS['lr'])

# ---------------------------------------------------------
# load data & keras model
data = load_pkl(PARAMS['pkl_name'])
# train cnn model
model, hist = train_model(PARAMS, data)
# train & test result
scores = model.evaluate(data['test_gaf'], data['test_label_arr'], verbose=0)
print('CNN test accuracy:', scores[1])
print_result(data, model)

Epoch 1/10
 - 7s - loss: 1.5741 - accuracy: 0.4170
Epoch 2/10
 - 6s - loss: 0.7894 - accuracy: 0.7139
Epoch 3/10
 - 6s - loss: 0.5994 - accuracy: 0.7837
Epoch 4/10
 - 6s - loss: 0.5308 - accuracy: 0.8045
Epoch 5/10
 - 6s - loss: 0.4904 - accuracy: 0.8217
Epoch 6/10
 - 6s - loss: 0.4649 - accuracy: 0.8292
Epoch 7/10
 - 6s - loss: 0.4430 - accuracy: 0.8419
Epoch 8/10
 - 6s - loss: 0.4317 - accuracy: 0.8443
Epoch 9/10
 - 6s - loss: 0.4134 - accuracy: 0.8524
Epoch 10/10
 - 6s - loss: 0.4036 - accuracy: 0.8503
CNN test accuracy: 0.8086000084877014
[[2438   61   75   43   73  116  115   11   68]
 [  80 1417    0    2    0    0    1    0    0]
 [ 128    0 1359    0   13    0    0    0    0]
 [ 326   25    0 1088    0    2    0   59    0]
 [ 100    0   22    0 1253    0    5    0  120]
 [  93    2    0    0    0 1400    2    3    0]
 [  93    1    1    0    1    0 1376    0   28]
 [ 727    0    0   88    0   83    0  602    0]
 [  70    0    2    0  117    0   48    0 1263]] 

 [[817  20  29  

- With 10 iterations, the CNN (test accuracy 0.8086) does better than the LSTM (test accuracy 0.7646)
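
The CNN training confusion matrix printed above also shows which classes are hardest. As a rough sketch, per-class recall is the diagonal entry divided by the row sum, since `confusion_matrix` puts true labels on rows and predicted labels on columns. The matrix is copied from the output above; the variable names are illustrative.

```python
# CNN training confusion matrix, copied from the notebook output.
train_cm = [
    [2438,   61,   75,   43,   73,  116,  115,   11,   68],
    [  80, 1417,    0,    2,    0,    0,    1,    0,    0],
    [ 128,    0, 1359,    0,   13,    0,    0,    0,    0],
    [ 326,   25,    0, 1088,    0,    2,    0,   59,    0],
    [ 100,    0,   22,    0, 1253,    0,    5,    0,  120],
    [  93,    2,    0,    0,    0, 1400,    2,    3,    0],
    [  93,    1,    1,    0,    1,    0, 1376,    0,   28],
    [ 727,    0,    0,   88,    0,   83,    0,  602,    0],
    [  70,    0,    2,    0,  117,    0,   48,    0, 1263],
]

# Recall for class i = correctly predicted samples / all samples of class i.
recall = [row[i] / sum(row) for i, row in enumerate(train_cm)]
accuracy = sum(row[i] for i, row in enumerate(train_cm)) / sum(map(sum, train_cm))
worst = min(range(len(recall)), key=recall.__getitem__)

for i, r in enumerate(recall):
    print(f'class {i}: recall {r:.4f}')
print(f'overall accuracy {accuracy:.4f}, weakest class {worst}')
```

Class 7 has by far the lowest recall (602 of 1500), and most of its errors (727) are predicted as class 0.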

## Reference

- https://github.com/pecu/FinTech_CommonWealth_Magazine/tree/master/Financial_Innovation/FiancailVision/HW2_ID_%E5%A7%93%E5%90%8D
- [Keras: a Python-based deep learning library (Chinese documentation)](https://keras-cn.readthedocs.io/en/latest/)
- [Keras documentation](https://keras.io/)