## Assignment goal
* Compare the accuracy of SVM classifiers trained on color histogram features and on HOG features, on both the CIFAR-10 training and testing data

In [1]:
import os
from tensorflow import keras

import numpy as np
import cv2 # OpenCV, used for the histogram and gradient computations
import matplotlib.pyplot as plt
from liblinear import liblinearutil as svm

train, test = keras.datasets.cifar10.load_data()

In [2]:
x_train, y_train = train
x_test, y_test = test
y_train = y_train.astype(int).reshape(-1)
y_test = y_test.astype(int).reshape(-1)

### Build the histogram features

In [3]:
x_train_histogram = []
x_test_histogram = []

# For every training image
for i in range(len(x_train)):
    chans = cv2.split(x_train[i]) # split the image into its 3 channels
    # For every channel
    hist_feature = []
    for chan in chans:
        # Compute the histogram of this channel
        hist = cv2.calcHist([chan], [0], None, [16], [0, 256]) # 16 bins
        hist_feature.extend(hist.flatten())
    # Collect the 48-dimensional feature (3 channels x 16 bins)
    x_train_histogram.append(hist_feature)

# Apply the same processing to every test image
for i in range(len(x_test)):
    chans = cv2.split(x_test[i]) # split the image into its 3 channels
    # For every channel
    hist_feature = []
    for chan in chans:
        # Compute the histogram of this channel
        hist = cv2.calcHist([chan], [0], None, [16], [0, 256]) # 16 bins
        hist_feature.extend(hist.flatten())
    x_test_histogram.append(hist_feature)

x_train_histogram = np.array(x_train_histogram)
x_test_histogram = np.array(x_test_histogram)
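The per-channel histogram can also be sketched with NumPy alone, which makes the shape of the resulting feature easy to check. This is a minimal sketch, not the notebook's method: `color_histogram` is a hypothetical helper, and the image is random data standing in for a CIFAR-10 sample. For `uint8` pixels, `np.histogram` with 16 bins over `(0, 256)` should give the same counts as the `cv2.calcHist` call above.

```python
import numpy as np

def color_histogram(img, n_bins=16):
    """Concatenate per-channel histograms of an H x W x 3 uint8 image."""
    feats = []
    for c in range(img.shape[2]):
        # Count pixel values of channel c in n_bins equal-width bins over [0, 256)
        counts, _ = np.histogram(img[:, :, c], bins=n_bins, range=(0, 256))
        feats.append(counts)
    return np.concatenate(feats).astype(np.float32)

# Random stand-in for one 32x32 RGB image (assumption: not real CIFAR-10 data)
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

feature = color_histogram(fake_img)
print(feature.shape, feature.sum())
```

Each channel contributes 16 counts that sum to the number of pixels, so the feature has 48 entries in total.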

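### Build the HOG features
* HOG builds a feature by computing histograms of gradient orientations over local regions of the image. The details are outside the scope of this course; interested readers can refer to the [supplementary material](https://www.cnblogs.com/zyly/p/9651261.html).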

In [4]:
bin_n = 16 # Number of bins

def hog(img):
    img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)
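    bins = np.int32(bin_n * ang / (2 * np.pi))    # quantize angles in [0, 2*pi) into bin indices 0..15
    # Split into 4 cells at row/column 10; on 32x32 images the cells are unequal (10x10, 22x10, 10x22, 22x22)
    bin_cells = bins[:10, :10], bins[10:, :10], bins[:10, 10:], bins[10:, 10:]
    mag_cells = mag[:10, :10], mag[10:, :10], mag[:10, 10:], mag[10:, 10:]
    # Per cell: histogram of bin indices, weighted by gradient magnitude
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    hist = np.hstack(hists)     # hist is a 64-dimensional vector (4 cells x 16 bins)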
    return hist.astype(np.float32)

x_train_hog = np.array([hog(x) for x in x_train])
x_test_hog = np.array([hog(x) for x in x_test])
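To see what the angle quantization and the weighted `np.bincount` in `hog` do, here is a minimal sketch on three hand-made gradient vectors. The angles and magnitudes are illustrative assumptions chosen to fall inside bins 0, 4 and 8. The angle is recovered with `np.arctan2` instead of `cv2.cartToPolar` so the block runs without OpenCV.

```python
import numpy as np

bin_n = 16  # number of orientation bins, as in the cell above

# Toy gradients built from chosen angles (radians) and magnitudes
theta = np.array([0.2, 1.7, 3.3])
strength = np.array([1.0, 3.0, 2.0])
gx = strength * np.cos(theta)
gy = strength * np.sin(theta)

# Magnitude and angle in [0, 2*pi), similar in spirit to cv2.cartToPolar
mag = np.hypot(gx, gy)
ang = np.arctan2(gy, gx) % (2 * np.pi)

# Quantize each angle into one of bin_n orientation bins
bins = np.int32(bin_n * ang / (2 * np.pi))

# Each pixel votes for its bin with a weight equal to its gradient magnitude
hist = np.bincount(bins, weights=mag, minlength=bin_n)
print(bins, hist)
```

Strong edges therefore contribute more to their orientation bin than weak ones, which is the main difference from a plain count of orientations.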

### SVM model
* SVM is a classic classification algorithm in machine learning. For details, see [this explanation on Zhihu](https://www.zhihu.com/question/21094489).

In [5]:
help(svm.train)

Help on function train in module liblinear.liblinearutil:

train(arg1, arg2=None, arg3=None)
    train(y, x [, options]) -> model | ACC
    
    y: a list/tuple/ndarray of l true labels (type must be int/double).
    
    x: 1. a list/tuple of l training instances. Feature vector of
          each training instance is a list/tuple or dictionary.
    
       2. an l * n numpy ndarray or scipy spmatrix (n: number of features).
    
    train(prob [, options]) -> model | ACC
    train(prob, param) -> model | ACC
    
    Train a model from data (y, x) or a problem prob using
    'options' or a parameter param.
    
    If '-v' is specified in 'options' (i.e., cross validation)
    either accuracy (ACC) or mean-squared error (MSE) is returned.
    
    options:
        -s type : set type of solver (default 1)
          for multi-class classification
             0 -- L2-regularized logistic regression (primal)
             1 -- L2-regularized L2-loss support vector classification (dual)
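
Solver type 1 in the help text above is the L2-regularized L2-loss support vector classifier. For a binary problem with labels in {-1, +1}, its primal objective is commonly written as 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * w·x_i)^2. The following is a minimal sketch of that formula only, not of liblinear's solver: `l2_loss_svc_objective` is a hypothetical helper, the bias term is omitted, and the data are made up. For more than two classes, liblinear is understood to train such binary problems one-vs-rest.

```python
import numpy as np

def l2_loss_svc_objective(w, X, y, C=1.0):
    """0.5*||w||^2 + C * sum(max(0, 1 - y_i * w.x_i)^2) for labels y_i in {-1, +1}."""
    margins = 1.0 - y * (X @ w)        # positive when a sample violates the margin
    hinge = np.maximum(0.0, margins)   # hinge loss per sample
    return 0.5 * float(w @ w) + C * float(np.sum(hinge ** 2))

# Two toy samples with two features each (illustrative values)
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
w = np.array([0.5, 0.0])

obj = l2_loss_svc_objective(w, X, y, C=1.0)
print(obj)
```

The `-c` option in the training calls below plays the role of `C`: larger values penalize margin violations more heavily relative to the regularization term.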
  

### Train an SVM model on the histogram features

In [6]:
SVM_hist = svm.train(y_train, x_train_histogram, ["-s", 1, "-c", 1])

# prediction
print("=== Training Accuracy ===")
_ = svm.predict(y_train, x_train_histogram, SVM_hist)
print("=== Testing Accuracy ===")
_ = svm.predict(y_test, x_test_histogram, SVM_hist)

=== Training Accuracy ===
Accuracy = 25.424% (12712/50000) (classification)
=== Testing Accuracy ===
Accuracy = 25.25% (2525/10000) (classification)


### Train an SVM model on the HOG features

In [7]:
SVM_hog = svm.train(y_train, x_train_hog, ["-s", 1, "-c", 1])

# prediction
print("=== Training Accuracy ===")
_ = svm.predict(y_train, x_train_hog, SVM_hog)
print("=== Testing Accuracy ===")
_ = svm.predict(y_test, x_test_hog, SVM_hog)

=== Training Accuracy ===
Accuracy = 39.156% (19578/50000) (classification)
=== Testing Accuracy ===
Accuracy = 38.54% (3854/10000) (classification)
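
The accuracy lines above are printed by `svm.predict`. As a rough cross-check, the same quantity can be computed from predicted labels with NumPy and compared with the 10% chance level of a balanced 10-class problem such as CIFAR-10. This is a minimal sketch: `accuracy` is a hypothetical helper, and the label arrays are made-up toy values, not results from this notebook.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Toy labels for a 10-class problem (illustrative only)
y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y_pred = np.array([0, 1, 2, 3, 0, 0, 0, 0, 0, 0])

acc = accuracy(y_true, y_pred)
chance = 1.0 / 10  # expected accuracy of uniform random guessing over 10 balanced classes
print(f"accuracy = {acc:.1%}, chance = {chance:.1%}")
```

In the notebook, the first element of the tuple returned by `svm.predict` should hold the predicted labels, so keeping it instead of discarding it with `_` would allow this kind of check, or a per-class breakdown.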
