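## **Environment setup**
### **Check the working directory**
This notebook imports custom modules, so the current working directory must be the directory that contains those Python files; otherwise the imports will fail.
- Use `getcwd` from the `os` module to check the working directory.
- Use `chdir` from the `os` module to change the working directory.
- Use `mount` from `google.colab.drive` to connect Google Drive (only when running on Google Colab).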
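Before importing, it can help to confirm that the custom module files are actually in the working directory. The following is a minimal sketch: the helper `missing_modules` is hypothetical, and the file names are inferred from the import statements later in this notebook (in particular, `evaluation.py` is assumed to be a single file rather than a package).

```python
from pathlib import Path

# File names inferred from the imports used later in this notebook (an assumption).
REQUIRED_MODULES = ["evaluation.py", "util.py", "LogisticRegression.py", "svm_model.py"]

def missing_modules(directory, required=REQUIRED_MODULES):
    """Return the required file names that are not present in `directory`."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]

missing = missing_modules(Path.cwd())
if missing:
    print("Missing from the working directory:", missing)
else:
    print("All custom modules found.")
```

If the list is not empty, change the working directory with `os.chdir` before running the import cell.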

In [1]:
import os

# Check the current working directory
print("Current working directory:", os.getcwd())

Current working directory: /content


In [2]:
# Mount Google Drive in the Colab environment
from google.colab import drive
drive.mount('gdrive')

Drive already mounted at gdrive; to attempt to forcibly remount, call drive.mount("gdrive", force_remount=True).


In [3]:
os.chdir("/content/gdrive/MyDrive/Colab Notebooks/cs229")
print("Current working directory:", os.getcwd())

Current working directory: /content/gdrive/MyDrive/Colab Notebooks/cs229


### **Import the required modules**

In [4]:
import warnings
import numpy as np
import pandas as pd
import evaluation as eval
from util import get_clean_data
from LogisticRegression import RegressionModel
from svm_model import SVMModel

## **Define the features and labels of the data**

### **Define features and labels**

In [5]:
# Define the feature columns and the label column first
features = [
    'EMA10','EMA12','EMA20','EMA26','EMA50','EMA100','EMA200',
    'SMA5','SMA10','SMA15','SMA20','SMA50','SMA100','SMA200',
    'EMA10Cross','EMA12Cross','EMA20Cross','EMA26Cross','EMA50Cross','EMA100Cross','EMA200Cross',
    'MACD','Volume','Price',
    'Up-Down5','Up-Down10','Up-Down15','Up-Down20','Up-Down50','Up-Down100',
    'SMA5Cross','SMA10Cross','SMA15Cross','SMA20Cross','SMA50Cross','SMA100Cross','SMA200Cross'
]
regularized_features = [
        'SMA5','SMA15','SMA20','SMA200',
        'EMA10Cross','EMA20Cross','EMA26Cross','EMA50Cross','EMA100Cross','EMA200Cross',
        'MACD','Volume','Price',
        'Up-Down10','Up-Down15','Up-Down50','Up-Down100',
        'SMA5Cross','SMA10Cross','SMA15Cross','SMA20Cross','SMA50Cross','SMA100Cross','SMA200Cross'
]
label = ['Class']

### **Load the data**
- The `get_clean_data()` function is defined in `util.py`.

In [6]:
# Load the data (training, validation, and test sets)
df_train, df_valid, df_test = get_clean_data()

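### **Define X and Y for each dataset**
- Uses the `.values` attribute of a `pandas` DataFrame to get the underlying array.
- Uses the `.ravel()` method of a `numpy` array to flatten the label array to one dimension.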
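To see why `.ravel()` is needed, here is a small illustration on a toy DataFrame. The column names mirror this notebook, but the numbers are made up for the example. Selecting with a list of columns, as in `df[label]`, returns a 2-D array, and `.ravel()` flattens it into the 1-D label vector that classifiers usually expect.

```python
import numpy as np
import pandas as pd

# Toy frame: column names follow this notebook, values are invented.
df = pd.DataFrame({"SMA5": [1.0, 2.0, 3.0], "MACD": [0.1, -0.2, 0.3], "Class": [1, 0, 1]})

X = df[["SMA5", "MACD"]].values  # 2-D feature array, shape (3, 2)
y_2d = df[["Class"]].values      # a list selector keeps two dimensions: shape (3, 1)
y = y_2d.ravel()                 # flattened label vector, shape (3,)

print(X.shape, y_2d.shape, y.shape)
```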

In [7]:
# Define X and Y for the training set
Xtrain = df_train[regularized_features].values
Ytrain = df_train[label].values.ravel()

# Define X and Y for the validation set
Xvalid = df_valid[regularized_features].values
Yvalid = df_valid[label].values.ravel()

# Define X and Y for the test set
Xtest = df_test[regularized_features].values
Ytest = df_test[label].values.ravel()

## **Choose and train the models**

### **Logistic models**
Defined in `LogisticRegression.py`.
- Logistic regression
- Ridge regression
- Lasso regression
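The three variants presumably differ only in the penalty added to the logistic loss: none, L2 (ridge), or L1 (lasso). That reading is an assumption, since `LogisticRegression.py` is not shown here. The sketch below is not the code in that file; the function `penalized_log_loss` and its parameters `penalty` and `lam` are hypothetical names used for illustration.

```python
import numpy as np

def penalized_log_loss(w, X, y, penalty="none", lam=1.0):
    """Mean logistic loss for labels y in {0, 1}, plus an optional penalty on w."""
    z = X @ w
    # log(1 + exp(z)) - y*z is the negative log-likelihood of one sample.
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    if penalty == "l2":    # ridge: sum of squared weights
        loss += lam * np.sum(w ** 2)
    elif penalty == "l1":  # lasso: sum of absolute weights
        loss += lam * np.sum(np.abs(w))
    return float(loss)

# Toy data, invented for the example.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0]])
y_demo = np.array([1.0, 0.0])
w_demo = np.array([2.0, -1.0])

for name in ("none", "l2", "l1"):
    print(name, penalized_log_loss(w_demo, X_demo, y_demo, penalty=name, lam=0.5))
```

The L1 penalty tends to drive some weights to exactly zero, which may be how a reduced list such as `regularized_features` was obtained, although the notebook does not say so.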

In [8]:
logistic = RegressionModel(1)
logistic.train(Xtrain, Ytrain)
print("trained")

trained


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


In [9]:
logistic_ridge = RegressionModel(2)
logistic_ridge.train(Xtrain, Ytrain)
print("trained")

trained


In [10]:
logistic_lasso = RegressionModel(3)
logistic_lasso.train(Xtrain, Ytrain)
print("trained")

trained


### **SVM**
Defined in `svm_model.py`.

Inspecting that file shows that the arguments 1 to 4 select the linear, polynomial, Gaussian radial basis function (RBF), and sigmoid kernels, respectively.

For an introduction to kernels, see:<br>
https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-kernel-%E5%87%BD%E6%95%B8-47c94095171
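For reference, the four kernels can be written as plain functions of two vectors. This is a sketch of the standard textbook forms; the default values of `gamma`, `coef0`, and `degree` below are illustrative assumptions, not the settings used in `svm_model.py`.

```python
import numpy as np

def linear_kernel(x, z):
    """K(x, z) = <x, z>"""
    return float(np.dot(x, z))

def polynomial_kernel(x, z, gamma=1.0, coef0=1.0, degree=3):
    """K(x, z) = (gamma * <x, z> + coef0) ** degree"""
    return float((gamma * np.dot(x, z) + coef0) ** degree)

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

def sigmoid_kernel(x, z, gamma=1.0, coef0=0.0):
    """K(x, z) = tanh(gamma * <x, z> + coef0)"""
    return float(np.tanh(gamma * np.dot(x, z) + coef0))

# Two toy vectors, invented for the example.
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, z))
```

Each kernel measures the similarity of two samples. The RBF kernel, for example, equals 1 for identical points and decays toward 0 as they move apart.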

In [11]:
# Instantiate the models
svm_linear = SVMModel(1)
svm_poly = SVMModel(2)
svm_rbf = SVMModel(3)
svm_sigmoid = SVMModel(4)

In [12]:
# Train the models
svm_linear.train(Xtrain, Ytrain)
print("Finish SVM Linear")

Finish SVM Linear


In [None]:
svm_poly.train(Xtrain, Ytrain)
print("Finish SVM Poly")

[LibSVM]

In [None]:
svm_rbf.train(Xtrain, Ytrain)
print("Finish SVM Rbf")

In [None]:
svm_sigmoid.train(Xtrain, Ytrain)
print("Finish SVM Sigmoid")

## **Prediction**
Call each model's `.predict()` method to make predictions. Depending on the model, the method is defined in `LogisticRegression.py` or `svm_model.py`.

In [None]:
all_pred = {}

# Regression 
# all_pred['logistic_pred'] = np.array(logistic.predict(Xtest)[:,1])
all_pred['logistic_ridge_pred'] = np.array(logistic_ridge.predict(Xtest))
all_pred['logistic_lasso_pred'] = np.array(logistic_lasso.predict(Xtest))

# # SVM
# all_pred['svm_linear_pred'] = np.array(svm_linear.predict(Xvalid))
# all_pred['svm_poly_pred'] = np.array(svm_poly.predict(Xvalid))
# all_pred['svm_rbf_pred'] = np.array(svm_rbf.predict(Xvalid))
# all_pred['svm_sigmoid_pred'] = np.array(svm_sigmoid.predict(Xvalid))

# SVM
all_pred['svm_linear_pred'] = np.array(svm_linear.predict(Xtest))
# all_pred['svm_poly_pred'] = np.array(svm_poly.predict(Xtest))
# all_pred['svm_rbf_pred'] = np.array(svm_rbf.predict(Xtest))
# all_pred['svm_sigmoid_pred'] = np.array(svm_sigmoid.predict(Xtest))

## **Evaluate the models**
Evaluate the results with accuracy and the macro-averaged F1 score.
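The `evaluation` module is not shown here, so the following is only a sketch of what these two metrics compute, not its actual implementation. The function names are hypothetical. Macro averaging computes F1 separately for each class and then takes the unweighted mean, so every class counts equally regardless of its size.

```python
import numpy as np

def accuracy(prediction, true_class):
    """Fraction of samples whose predicted class equals the true class."""
    prediction, true_class = np.asarray(prediction), np.asarray(true_class)
    return float(np.mean(prediction == true_class))

def macro_f1(prediction, true_class):
    """Unweighted mean of the per-class F1 scores."""
    prediction, true_class = np.asarray(prediction), np.asarray(true_class)
    scores = []
    # Every class in the union appears at least once, so the denominator is never zero.
    for c in np.unique(np.concatenate([true_class, prediction])):
        tp = np.sum((prediction == c) & (true_class == c))
        fp = np.sum((prediction == c) & (true_class != c))
        fn = np.sum((prediction != c) & (true_class == c))
        scores.append(2 * tp / (2 * tp + fp + fn))  # F1 = 2TP / (2TP + FP + FN)
    return float(np.mean(scores))

# Toy labels, invented for the example.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_pred, y_true), macro_f1(y_pred, y_true))
```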

In [None]:
for key, pred in all_pred.items():
    print('Accuracy Score of', key)
    # print(eval.accuracy(prediction = pred, true_class = Yvalid))  # original version (validation set)
    print(eval.accuracy(prediction = pred, true_class = Ytest))
    print('F1 Score of', key)
    # print(eval.f1score(prediction = pred, true_class = Yvalid, average='macro'))  # original version (validation set)
    print(eval.f1score(prediction = pred, true_class = Ytest, average='macro'))
    print('='*50)