<a href="https://colab.research.google.com/github/qcore-info/advent-calendar-2019/blob/master/Qore%E3%82%B5%E3%83%B3%E3%83%97%E3%83%AB1_with_UCI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Installing the QoreSDK

1. Download the binary package from the [official Advent Calendar GitHub repository](https://github.com/qcore-info/advent-calendar-2019).  


2. Drag the binary into the Files menu in the sidebar.
 ![](https://drive.google.com/uc?export=view&id=1ycgCTyCnDd6Gl5JDkhX6hwcui3BzMcKD)  

3. Install qore_sdk with pip.
https://drive.google.com/file/d/1VbZ6nPqXLggZgAtck31qWyepXNt7VErr/view?usp=sharing

In [0]:
!pip install ./qore_sdk-0.1.0-cp36-cp36m-linux_x86_64.whl

Processing ./qore_sdk-0.1.0-cp36-cp36m-linux_x86_64.whl
Installing collected packages: qore-sdk
Successfully installed qore-sdk-0.1.0


# Load the required libraries

In [0]:
from qore_sdk.client import WebQoreClient
from sklearn import model_selection
from sklearn.metrics import accuracy_score, f1_score
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
import time
import numpy as np
import json
import os

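# Preparing the data

This example uses the Japanese Vowels dataset provided by UCI.  
The task is to identify the individual speaker from vowel utterances by nine Japanese speakers.<br>  
Here we use files in which the dataset has been converted to JSON format.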

*UCI Machine Learning Repository: Japanese Vowels Dataset. https://archive.ics.uci.edu/ml/datasets/Japanese+Vowels*

In [0]:
!mkdir data
%cd /content/data/
!curl gdrive.sh | bash -s  1YAtEGe-_xTMDhWeSBvXHLpQTKtzzRBKU
!curl gdrive.sh | bash -s  196nFe8vB-TFWjPg1NGX3ptAZfNALnVot
!curl gdrive.sh | bash -s  1g0UFllMm7m7DXVoyHOLIigwVFvzSIX2e
!curl gdrive.sh | bash -s  1EqyotynOrxEJwwCxO-sZ75etHZDzFyHO

/content/data
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2874  100  2874    0     0   1623      0  0:00:01  0:00:01 --:--:--  1623
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   388    0   388    0     0    441      0 --:--:-- --:--:-- --:--:--   441
100 1885k  100 1885k    0     0  1285k      0  0:00:01  0:00:01 --:--:-- 13.7M
curl: Saved to filename 'jpvow_train_x.json'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2874  100  2874    0     0  95800      0 --:--:-- --:--:-- --:--:-- 95800
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left 

In [0]:
%cd /content/data/
with open("jpvow_train_x.json", "r") as f:
    X_train = json.load(f)
with open("jpvow_train_y.json", "r") as f:
    y_train = json.load(f)
with open("jpvow_test_x.json", "r") as f:
    X_test = json.load(f)
with open("jpvow_test_y.json", "r") as f:
    y_test = json.load(f)

X_train = np.array(X_train)
y_train = np.array(y_train)
X_test = np.array(X_test)
y_test = np.array(y_test)


/content/data


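You could train on the data as it is, but it shows biases such as the labels being sorted in order,  
so we concatenate the whole dataset and shuffle it before splitting again.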

In [0]:
data = np.concatenate((X_train, X_test), axis=0)
target = np.concatenate((y_train, y_test), axis=0)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    data, target, test_size=0.2, random_state=1
)

print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(512, 29, 12)
(512, 1)
(128, 29, 12)
(128, 1)
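
To confirm that the shuffle removed the ordering bias, one could compare the label counts in the two splits. The sketch below runs on synthetic stand-in data with the same layout (640 samples, labels 1 to 9), not on the actual dataset. The `stratify` argument is an optional feature of `train_test_split` that the notebook itself does not use; it is shown here only as one way to keep class proportions equal across the splits.

```python
import numpy as np
from sklearn import model_selection

# Synthetic stand-in for the real data: 640 samples of shape (29, 12),
# with labels 1..9 sorted in order (the kind of bias described above).
rng = np.random.default_rng(0)
data = rng.normal(size=(640, 29, 12))
target = np.sort(rng.integers(1, 10, size=640)).reshape(-1, 1)

# Shuffle and split; stratify keeps per-class proportions similar in both splits.
X_tr, X_te, y_tr, y_te = model_selection.train_test_split(
    data, target, test_size=0.2, random_state=1, stratify=target.ravel()
)

# Count how many samples of each label ended up in each split.
labels_tr, counts_tr = np.unique(y_tr, return_counts=True)
labels_te, counts_te = np.unique(y_te, return_counts=True)
print(dict(zip(labels_tr.tolist(), counts_tr.tolist())))
print(dict(zip(labels_te.tolist(), counts_te.tolist())))
```

If every label appears in both dictionaries with roughly a 4:1 ratio, the split is no longer dominated by the original ordering.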


# Preparing the Qore client
You need a username, password, and endpoint that were issued to you in advance.  
See the [official Advent Calendar GitHub repository](https://github.com/qcore-info/advent-calendar-2019) for details.

In [0]:
client = WebQoreClient(username="", 
                       password="", 
                       endpoint="")
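
Rather than typing the credentials into a shared notebook, one option is to read them from environment variables. This is a minimal sketch: the helper `load_qore_credentials` and the variable names `QORE_USERNAME`, `QORE_PASSWORD`, and `QORE_ENDPOINT` are hypothetical and not part of qore_sdk. Only the three keyword arguments of `WebQoreClient` come from the cell above.

```python
import os


def load_qore_credentials(env=os.environ):
    """Collect the three WebQoreClient keyword arguments from environment variables.

    The variable names are hypothetical; use whatever names suit your setup.
    """
    names = {
        "username": "QORE_USERNAME",
        "password": "QORE_PASSWORD",
        "endpoint": "QORE_ENDPOINT",
    }
    # Fail early with a clear message if anything is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


# Usage in the notebook, after the variables have been set:
# client = WebQoreClient(**load_qore_credentials())
```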

Run the training.

In [0]:
start = time.time()
res = client.classifier_train(X=X_train, Y=y_train)
print(res)

{'res': 'ok', 'train_time': 1.2235112190246582}


`classifier_test` is a convenient way to get the accuracy with a single call.




In [0]:
res = client.classifier_test(X=X_test, Y=y_test)
print(res)

{'accuracy': 0.9921875, 'f1': 0.9922253787878788, 'res': 'ok'}


Finally, run inference as well.

In [0]:
res = client.classifier_predict(X=X_test)
print("acc=", accuracy_score(y_test.tolist(), res["Y"]))
print("f1=", f1_score(y_test.tolist(), res["Y"], average="weighted"))
# `start` was set before training, so this covers training, testing, and inference.
elapsed_time = time.time() - start
print("elapsed_time:{0}".format(elapsed_time) + "[sec]")
print(res['Y'])


acc= 0.9921875
f1= 0.9921496212121212
elapsed_time:3.022348165512085[sec]
[5, 9, 7, 1, 3, 3, 4, 9, 9, 3, 8, 2, 1, 6, 9, 7, 3, 4, 6, 9, 1, 4, 8, 1, 8, 3, 7, 7, 8, 4, 8, 4, 7, 2, 6, 7, 3, 9, 4, 2, 8, 3, 7, 6, 5, 4, 2, 1, 8, 7, 2, 7, 3, 6, 5, 2, 5, 7, 1, 4, 2, 4, 8, 2, 7, 1, 8, 9, 3, 7, 4, 6, 8, 8, 3, 7, 3, 1, 6, 2, 3, 8, 7, 9, 8, 3, 7, 2, 4, 5, 3, 2, 6, 3, 5, 8, 3, 8, 6, 9, 8, 3, 6, 1, 9, 2, 3, 7, 6, 3, 4, 9, 5, 8, 8, 3, 3, 3, 1, 8, 5, 3, 9, 4, 7, 4, 1, 8]
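
The F1 value returned by `classifier_test` (0.99222...) differs slightly from the weighted F1 computed locally (0.99215...), while the accuracy is identical. The notebook does not say how the server averages the per-class scores. The sketch below, on made-up labels unrelated to the dataset, only shows that the `average` option of scikit-learn's `f1_score` changes the result, which is one possible (unconfirmed) explanation for such a gap.

```python
from sklearn.metrics import f1_score

# Toy labels with unequal class sizes; not taken from the Japanese Vowels data.
y_true = [1, 1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 2, 2, 2, 3]

# "macro" averages per-class F1 equally; "weighted" weights each class by its support.
macro = f1_score(y_true, y_pred, average="macro")
weighted = f1_score(y_true, y_pred, average="weighted")
print("macro    =", macro)
print("weighted =", weighted)
```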


Incidentally, the server runs on an AWS Medium instance.  
Its memory size is 1 GB.

# Reference
For comparison, train a simple logistic regression and a simple deep-learning model (an MLP) on the same split.

In [0]:
X_train = X_train.reshape(len(X_train), -1).astype(np.float64)
X_test = X_test.reshape(len(X_test), -1).astype(np.float64)
y_train = np.ravel(y_train)
y_test = np.ravel(y_test)

print("===LogisticRegression(Using Sklearn)===")
start = time.time()
lr_cls = LogisticRegression(C=9.0)
lr_cls.fit(X_train, y_train)
elapsed_time = time.time() - start
print("elapsed_time:{0}".format(elapsed_time) + "[sec]")
res = lr_cls.predict(X=X_test)
print("acc=", accuracy_score(y_test.tolist(), res))
print("f1=", f1_score(y_test.tolist(), res, average="weighted"))

print("===MLP(Using Sklearn)===")
start = time.time()
mlp_cls = MLPClassifier(hidden_layer_sizes=(100, 100, 100, 10))
mlp_cls.fit(X_train, y_train)
elapsed_time = time.time() - start
print("elapsed_time:{0}".format(elapsed_time) + "[sec]")
res = mlp_cls.predict(X=X_test)
print("acc=", accuracy_score(y_test.tolist(), res))
print("f1=", f1_score(y_test.tolist(), res, average="weighted"))

===LogisticRegression(Using Sklearn)===
elapsed_time:0.21763110160827637[sec]
acc= 0.9765625
f1= 0.9761245153216563
===MLP(Using Sklearn)===
elapsed_time:1.273435354232788[sec]
acc= 0.9609375
f1= 0.9602474709896586
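
The reshape step in the cell above is needed because scikit-learn estimators expect a 2-D feature matrix and a 1-D label vector. Each 29×12 sample is flattened into a single row of 348 values, and `np.ravel` turns the `(n, 1)` label column into shape `(n,)`. A minimal sketch on zero-filled arrays with the same shapes as the training split:

```python
import numpy as np

# Placeholder arrays with the same shapes as X_train and y_train above.
X = np.zeros((512, 29, 12))
y = np.zeros((512, 1), dtype=int)

# Keep the sample axis and merge the remaining axes into one (29 * 12 = 348).
X_flat = X.reshape(len(X), -1).astype(np.float64)
# Collapse the label column into a 1-D vector.
y_flat = np.ravel(y)

print(X_flat.shape)
print(y_flat.shape)
```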
