<a href="https://colab.research.google.com/github/matteosoo/aimsfellows_DL/blob/master/project/sketcher_template/sketcher_traditional_chinese_char.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# sketcher_traditional_chinese_char
- Reference: https://github.com/zaidalyafeai/zaidalyafeai.github.io/tree/master/sketcher 
- Sketch recognition for handwritten Traditional Chinese characters

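## Mount the Google drive to colab
- Note
  - `!ls` lists the contents of the given folder.
  - `%cd` (change directory) switches the notebook's working directory to the quick_draw folder. `!cd` does not work for this: it runs in a temporary subshell, so the change is lost as soon as the command ends.
- p.s. Adjust the paths below to match where the files are stored in your own Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!ls '/content/drive/My Drive/Colab Notebooks/quick_draw/'

In [None]:
%cd /content/drive/My Drive/Colab Notebooks/quick_draw/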
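
Each `!` command runs in its own child shell, so a `cd` issued that way does not carry over to later cells. The sketch below reproduces the effect in plain Python; the temporary folder is only a stand-in for the quick_draw folder, and `os.chdir` plays the role of a directory change made by the notebook process itself.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()  # stand-in for the quick_draw folder (assumption)

# Like `!cd`: the directory changes inside a child shell only.
subprocess.run('cd "{}"'.format(target), shell=True, check=True)
after_subshell = os.getcwd()

# Like a change made by the notebook process: it persists.
os.chdir(target)
after_chdir = os.getcwd()

print(after_subshell == start)
print(os.path.samefile(after_chdir, target))

os.chdir(start)  # restore the original working directory
```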

## Import package

In [None]:
# 1.15.2 is the version the teaching assistants recommend after testing this project
!pip install tensorflow==1.15.2

In [None]:
%tensorflow_version 1.x
import os
import glob
import numpy as np
from tensorflow.python.keras import layers
from tensorflow import keras 
import tensorflow as tf
from tqdm import tqdm
print(tf.__version__)  # print the TensorFlow version currently in use

## Load the Data
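- Note
  - You have to create mini_classes.txt yourself; it lists the characters you want to train on, one per line.
  - Opening the file with utf8 encoding keeps the Chinese characters from turning into garbled text.
  - The file is read line by line, so make sure every entry ends with exactly one newline (Enter): no extra blank lines and none missing.
  - You need to work out yourself how to convert your data into the .npy files under the HandWritting_npy folder so that the load_data() function in the second cell can read them.
  - Note that image_size has been changed from the original project's 28 to 300, because a resolution that is too low may not capture the complexity of Traditional Chinese characters.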
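
As a starting point for the conversion, here is a minimal sketch of the file layout that load_data() appears to expect: one `.npy` file per class, named after the class, holding an array of shape (N, 90000), that is, N flattened 300x300 grayscale images. The helper name `save_class_npy`, the class name `demo_class`, and the random images are all hypothetical; reading your real image files into arrays is left out.

```python
import os
import tempfile
import numpy as np

IMAGE_SIZE = 300  # matches image_size used in this notebook

def save_class_npy(images, class_name, out_dir):
    """Flatten (N, 300, 300) grayscale images to (N, 90000) and save them as <class_name>.npy."""
    images = np.asarray(images, dtype=np.uint8)
    flat = images.reshape(images.shape[0], IMAGE_SIZE * IMAGE_SIZE)
    path = os.path.join(out_dir, class_name + '.npy')
    np.save(path, flat)
    return path

# Synthetic stand-in for three scanned samples of one character.
fake_images = np.random.randint(0, 256, size=(3, IMAGE_SIZE, IMAGE_SIZE))
out_dir = tempfile.mkdtemp()
path = save_class_npy(fake_images, 'demo_class', out_dir)

restored = np.load(path)
print(restored.shape)  # (3, 90000)
```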

In [None]:
classes = []
with open('mini_classes.txt', 'r', encoding='utf8') as f:
    lines = f.readlines()
    for line in lines:
        line = line.strip('\n')
        classes.append(line)

print(len(classes))
print(classes)
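
If the file might contain stray blank lines or Windows line endings, a slightly more forgiving reader can help. This is only a sketch; `read_classes` and the demo file name are hypothetical and not part of the original project.

```python
import os
import tempfile

def read_classes(path):
    """Return one class name per non-empty line, with surrounding whitespace removed."""
    with open(path, 'r', encoding='utf8') as f:
        return [line.strip() for line in f if line.strip()]

# Demo file with a stray blank line, a Windows line ending, and no final newline.
demo_path = os.path.join(tempfile.mkdtemp(), 'mini_classes_demo.txt')
with open(demo_path, 'w', encoding='utf8', newline='') as f:
    f.write('一\n\n二\r\n三')

demo_classes = read_classes(demo_path)
print(len(demo_classes))
print(demo_classes)
```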

In [None]:
def load_data(root, vfold_ratio=0.2, max_items_per_class=4000):
    all_files = glob.glob(os.path.join(root, '*.npy'))

    #initialize variables 
    x = np.empty([0, 90000])
    y = np.empty([0])
    class_names = []

    #load each data file 
    for idx, file in enumerate(tqdm(all_files)):
        data = np.load(file)
        data = data[0: max_items_per_class, :]
        labels = np.full(data.shape[0], idx)

        x = np.concatenate((x, data), axis=0)
        y = np.append(y, labels)

        class_name, ext = os.path.splitext(os.path.basename(file))
        class_names.append(class_name)

    data = None
    labels = None
    
    #randomize the dataset 
    permutation = np.random.permutation(y.shape[0])
    x = x[permutation, :]
    y = y[permutation]

    #separate into training and testing 
    vfold_size = int(x.shape[0]/100*(vfold_ratio*100))

    x_test = x[0:vfold_size, :]
    y_test = y[0:vfold_size]

    x_train = x[vfold_size:x.shape[0], :]
    y_train = y[vfold_size:y.shape[0]]
    return x_train, y_train, x_test, y_test, class_names

In [None]:
x_train, y_train, x_test, y_test, class_names = load_data('HandWritting_npy')
num_classes = len(class_names)
image_size = 300

In [None]:
print(len(x_train))

Show some random data

In [None]:
import matplotlib.pyplot as plt
from random import randint
%matplotlib inline  
idx = randint(0, len(x_train) - 1)  # randint includes both endpoints
plt.imshow(x_train[idx].reshape(300,300))  # this reshape is only for display and does not affect training
print(class_names[int(y_train[idx].item())])

## Preprocess the Data

In [None]:
# Reshape and normalize
x_train = x_train.reshape(x_train.shape[0], image_size, image_size, 1).astype('float32')
x_test = x_test.reshape(x_test.shape[0], image_size, image_size, 1).astype('float32')

x_train /= 255.0
x_test /= 255.0

# Convert class vectors to class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
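
To see what these steps do to the arrays, the same reshape, scaling, and one-hot encoding can be reproduced on toy data with NumPy alone. The tiny `image_size` of 4 and the `np.eye` indexing are stand-ins chosen for illustration; the indexing is assumed to match `to_categorical` only for integer labels in the range 0 to num_classes - 1.

```python
import numpy as np

image_size = 4   # tiny stand-in for the notebook's 300
num_classes = 3

x = np.array([[0] * 16, [255] * 16], dtype=np.float64)  # two flattened images
y = np.array([2.0, 0.0])  # float labels, as load_data() returns them

# Reshape to (samples, height, width, channels) and scale pixels to [0, 1].
x = x.reshape(x.shape[0], image_size, image_size, 1).astype('float32') / 255.0

# One row per sample, with a single 1 in the column of its class.
one_hot = np.eye(num_classes, dtype='float32')[y.astype(int)]

print(x.shape)
print(one_hot)
```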

## The Model
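- Note
  - The last layer of the network feeds the classification, so think about which value its parameter should be given.
  - If you want to improve the results, the design of the network architecture is often a key factor.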

In [None]:
# Define model
model = keras.Sequential()
model.add(layers.Convolution2D(16, (3, 3),
                        padding='same',
                        input_shape=x_train.shape[1:], activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Convolution2D(32, (3, 3), padding='same', activation= 'relu'))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
model.add(layers.Convolution2D(64, (3, 3), padding='same', activation= 'relu'))
model.add(layers.MaxPooling2D(pool_size =(2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(100, activation='softmax')) 
# Compile model
adam = tf.train.AdamOptimizer()
model.compile(loss='categorical_crossentropy',
              optimizer=adam,
              metrics=['top_k_categorical_accuracy'])
model.summary()
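
When experimenting with the architecture, it helps to know where the parameters go. The arithmetic below follows the layers defined above, assuming stride-1 convolutions with `padding='same'` and 2x2 max pooling with the default 'valid' padding; the helper names are hypothetical. It suggests that the first Dense layer holds most of the weights.

```python
def conv_same(size):
    # A stride-1 convolution with padding='same' keeps the spatial size.
    return size

def max_pool_2x2(size):
    # 2x2 max pooling with 'valid' padding halves the size, rounding down.
    return size // 2

size = 300  # image_size
for _ in range(3):  # three conv + pool stages
    size = max_pool_2x2(conv_same(size))

flat_features = size * size * 64          # 64 filters in the last conv layer
dense_params = flat_features * 128 + 128  # weights plus biases of Dense(128)

print(size, flat_features, dense_params)  # 37 87616 11214976
```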

## Training
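- Note
  - The lectures covered the concept of overfitting, so we know the model should not be trained for too long.
  - Conversely, how should we adjust the training so that the model reaches the best possible accuracy?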

In [None]:
model.fit(x = x_train, y = y_train, validation_split=0.1, batch_size = 256, verbose=1, epochs=5)
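
One common way to avoid training too long is early stopping: watch the validation loss and stop once it has not improved for a few epochs. Keras offers this as the `keras.callbacks.EarlyStopping` callback. The function below is not that implementation, only a plain-Python sketch of the patience rule, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop, or the last epoch if it never triggers."""
    best = float('inf')
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation losses: improving, then rising as the model overfits.
losses = [1.20, 0.90, 0.75, 0.80, 0.86, 0.95]
stop_at = early_stopping_epoch(losses, patience=2)
print(stop_at)  # 4
```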

## Testing

In [None]:
score = model.evaluate(x_test, y_test, verbose=0)
print('Test accuracy: {:0.2f}%'.format(score[1] * 100))
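
Because the model was compiled with `top_k_categorical_accuracy`, whose default k in Keras is 5, the reported score counts a prediction as correct when the true class is anywhere among the five most likely classes. The NumPy sketch below shows the idea on toy data; the function `top_k_accuracy` is written here for illustration and is not the Keras implementation.

```python
import numpy as np

def top_k_accuracy(y_true_one_hot, y_pred, k=5):
    """Fraction of samples whose true class is among the k highest-scoring predictions."""
    true_idx = np.argmax(y_true_one_hot, axis=1)
    top_k = np.argsort(-y_pred, axis=1)[:, :k]
    hits = (top_k == true_idx[:, None]).any(axis=1)
    return float(hits.mean())

y_true = np.eye(4)[[0, 1, 2]]
y_pred = np.array([
    [0.6, 0.2, 0.1, 0.1],  # true class 0 is ranked first
    [0.5, 0.3, 0.1, 0.1],  # true class 1 is ranked second
    [0.4, 0.3, 0.1, 0.2],  # true class 2 is ranked last
])

top1 = top_k_accuracy(y_true, y_pred, k=1)
top2 = top_k_accuracy(y_true, y_pred, k=2)
print(top1, top2)
```

A top-5 score can look much higher than plain accuracy, so keep the metric in mind when comparing models.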

## Inference

In [None]:
import matplotlib.pyplot as plt
from random import randint
%matplotlib inline  
idx = randint(0, len(x_test) - 1)  # randint includes both endpoints
img = x_test[idx]
plt.imshow(img.squeeze())
pred = model.predict(np.expand_dims(img, axis=0))[0]
ind = (-pred).argsort()[:5]
latex = [class_names[x] for x in ind]
print(latex)

## Store the classes

In [None]:
with open('class_names.txt', 'w', encoding='utf8') as file_handler:
    for item in class_names:
        file_handler.write("{}\n".format(item))

## Install TensorFlowJS

In [None]:
!pip install tensorflowjs

## Save model and Convert to tensorflowJS
- Note
  - `!mkdir` creates a new folder (named model3) under the current folder.
  - `!cp` copies a file to a destination file.
  - Finally, compress the whole model3 folder (zip) and download it to finish.

In [None]:
model.save('keras.h5')

In [None]:
!mkdir model3
!tensorflowjs_converter --input_format keras keras.h5 model3/

In [None]:
!cp class_names.txt model3/class_names.txt

In [None]:
!zip -r model3.zip model3
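
If the `zip` command is not available, the same packaging step can be done from Python with the standard library. This is a sketch under assumptions: it works in a temporary folder with a placeholder file instead of the real converted model, and it archives the model3 folder itself (not only its contents), as `zip -r model3.zip model3` does.

```python
import os
import shutil
import tempfile
import zipfile

work_dir = tempfile.mkdtemp()
model_dir = os.path.join(work_dir, 'model3')
os.makedirs(model_dir)

# Placeholder file standing in for the converted model and class list.
with open(os.path.join(model_dir, 'class_names.txt'), 'w', encoding='utf8') as f:
    f.write('demo\n')

# Create <work_dir>/model3.zip containing the model3 folder.
archive_path = shutil.make_archive(
    os.path.join(work_dir, 'model3'), 'zip', root_dir=work_dir, base_dir='model3')

with zipfile.ZipFile(archive_path) as zf:
    names = zf.namelist()
print(names)
```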

In [None]:
from google.colab import files
files.download('model3.zip')