# Deep Learning HW1
### Image Classification with Multiple Models and Features
**Institute of Data Science, first-year master's student: 黃亮臻**

- Feature Extraction
    - Color histograms
    - HOG
    - BRIEF
    - ORB
      
- Models
    - KNN
    - RF
    - AdaBoost
    - SVM
      

- References
    - [OpenCV Python Tutorials](https://opencv-python-tutorials.readthedocs.io/zh/latest/)
    - [OpenCV-with-Python](https://github.com/chewbacca89/OpenCV-with-Python/blob/master/Lecture%205.5%20-%20SIFT%2C%20SURF%2C%20FAST%2C%20BRIEF%20%26%20ORB.ipynb)
    - [Bag-of-Words](https://medium.com/@derekliao_62575/nlp%E7%9A%84%E5%9F%BA%E6%9C%AC%E5%9F%B7%E8%A1%8C%E6%AD%A5%E9%A9%9F-ii-bag-of-words-%E8%A9%9E%E8%A2%8B%E8%AA%9E%E8%A8%80%E6%A8%A1%E5%9E%8B-3b670a0c7009)
    - [Udacity Computer Vision](https://medium.com/chiukevin0321/%E4%BA%BA%E8%87%89%E8%BE%A8%E8%AD%98-face-detection-face-recognition-7ba98aaf1a02)

In [2]:
# ! pip install opencv-contrib-python==3.4.11.45
# ! pip install numpy pandas matplotlib scikit-image scikit-learn

In [1]:
import time
import numpy as np
import cv2
import pickle

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, accuracy_score

from platform import python_version
from sklearn import __version__ as sklearn_version

from importlib import reload

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

print("【Python】", python_version())
print("【OpenCV】", cv2.__version__)
print("【NumPy】", np.__version__)
print("【Scikit-learn】", sklearn_version)


【Python】 3.11.5
【OpenCV】 4.9.0
【NumPy】 1.24.3
【Scikit-learn】 1.3.0


In [2]:
# Load the image-processing module: imgModule.py
import imgModule
reload(imgModule) 

<module 'imgModule' from '/Users/liang/Documents/NCKU_DS/DL/imgModule.py'>

In [4]:
# Unzip the archive (a context manager closes the file and avoids shadowing the built-in zip)
# import zipfile
# with zipfile.ZipFile('/home/liang/DL/HW1/TinyImageNet.zip') as zf:
#     zf.extractall(path='/home/liang/DL/HW1/TinyImageNet')

In [5]:
# Load the image files
image_file = imgModule.OpenImageFile(directory='TinyImageNet/TIN')
image_file.path_totxt()
x, y = image_file.load_img('train.txt')
tx, ty = image_file.load_img('test.txt')

In [6]:
print(len(x))
print(x.shape)
print(y.shape)
print(len(tx))
print(tx.shape)
print(ty.shape)

99600
(99600, 256, 256, 3)
(99600,)
200
(200, 256, 256, 3)
(200,)
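
The shapes above imply a large in-memory footprint. The arithmetic can be sketched as follows, assuming one byte per element (`uint8`, which is what `cv2.imread` returns for ordinary 8-bit images). The dtype actually returned by `load_img` is not shown here, and `array_nbytes` is a hypothetical helper used only for illustration.

```python
def array_nbytes(shape, bytes_per_element=1):
    """Bytes needed for a dense array of the given shape (uint8 assumed by default)."""
    n = bytes_per_element
    for dim in shape:
        n *= dim
    return n

train_bytes = array_nbytes((99600, 256, 256, 3))
test_bytes = array_nbytes((200, 256, 256, 3))

print(f"train: {train_bytes / 2**30:.2f} GiB")  # about 18.24 GiB
print(f"test:  {test_bytes / 2**20:.2f} MiB")   # 37.50 MiB
```

If the arrays were stored as `float32` instead, the figures would be four times larger (`bytes_per_element=4`).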


In [7]:
# Feature extraction methods and models
feature_methods = ["BRIEF", "ORB", "ColorHist", "HOG"]
ml_methods = {"SVM": SVC(),  # when using SVM, first reduce to 32 dimensions with PCA
              "KNN":KNeighborsClassifier(),
              "RF": RandomForestClassifier(),
              "AdaBoost":AdaBoostClassifier()}

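The internals of `imgModule.FeatureExtractor` are not shown in this notebook. As a rough idea of what a "ColorHist" feature could look like, here is a NumPy-only sketch. The function name `color_histogram` and the choice of 8 bins per channel are illustrative assumptions, not the module's actual implementation.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms of an H x W x 3 uint8 image.

    Hypothetical helper: the real FeatureExtractor may bin differently.
    """
    per_channel = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[:, :, c], bins=bins, range=(0, 256))
        per_channel.append(hist)
    vec = np.concatenate(per_channel).astype(np.float64)
    return vec / vec.sum()  # normalise so the image size does not matter

# Random stand-in for one 256 x 256 RGB image
rng = np.random.default_rng(0)
demo_image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
demo_feature = color_histogram(demo_image)
print(demo_feature.shape)
```

With 8 bins per channel the feature vector has 3 × 8 = 24 entries, which is small enough to feed to any of the four classifiers directly.
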
In [None]:
results = {}
best_acc = 0
best_model_details = {}

for feature in feature_methods:
    # Extract features
    f_start_time = time.time()
    newfeature = imgModule.FeatureExtractor(feature)
    X_train = newfeature.get_feature(x)
    X_test = newfeature.get_feature(tx)
    feature_dimension = X_train.shape[1]
    f_end_time = time.time()
    feature_time = f_end_time - f_start_time
    print(f'{feature} Done! spend {feature_time:.0f} s')

    for ml_name, ml_method in ml_methods.items():
        applypca = ml_name == "SVM"  # PCA is applied only for SVM

        # Train the model
        t_start_time = time.time()
        model = imgModule.ModelingEvaluate(ml_method, applypca)
        train_acc, train_f1 = model.modeling(X_train, y)
        t_end_time = time.time()
        train_time = t_end_time - t_start_time
        print(f'- {ml_name} Done! spend {train_time:.0f} s')

        # Evaluate the model
        test_acc, test_f1 = model.evaluate(X_test, ty)

        # Save the best-performing model
        if test_acc > best_acc:
            best_acc = test_acc
            model.save_model()
            print(f"The model is saved: {feature}+{ml_name}")
            
            # optional
            best_model_details = {'feature': feature, 'ml_name': ml_name}
            with open('best_model_details.pkl', 'wb') as file:
                pickle.dump(best_model_details, file)

        # Record the results
        results[(feature, ml_name)] = (feature_dimension, train_acc, train_f1, test_acc, test_f1, feature_time, train_time)
    print("==========================")


BRIEF Done! spend 560 s
Use PCA to reduce dimension...
- SVM Done! spend 8245 s
The model is saved: BRIEF+SVM
- KNN Done! spend 215 s
- RF Done! spend 186 s
- AdaBoost Done! spend 86 s
ORB Done! spend 613 s
Use PCA to reduce dimension...
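
The "Use PCA to reduce dimension..." message comes from `imgModule.ModelingEvaluate`, whose code is not shown here. The sketch below illustrates the general idea of reducing features to 32 dimensions before fitting an SVM. It uses a plain NumPy SVD rather than whatever the module uses, and `fit_pca` / `apply_pca` are hypothetical names. The point it illustrates is that the projection is learned on the training features only and then reused unchanged on the test features.

```python
import numpy as np

def fit_pca(X_train, n_components=32):
    """Learn the mean and top principal directions from training data only."""
    mean = X_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project data with a previously fitted mean and set of components."""
    return (X - mean) @ components.T

# Random stand-in features: 500 training rows, 20 test rows, 128 dimensions
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(500, 128))
X_test_demo = rng.normal(size=(20, 128))

mean, components = fit_pca(X_train_demo, n_components=32)
Z_train = apply_pca(X_train_demo, mean, components)
Z_test = apply_pca(X_test_demo, mean, components)
print(Z_train.shape, Z_test.shape)
```

Refitting the projection on the test set would give the two sets different coordinate systems, so the same `mean` and `components` are passed to both calls.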


In [None]:
# Save results to a pickle file
with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)

# with open('results.pkl', 'rb') as f:
#     results = pickle.load(f)

with open('best_model_details.pkl', 'rb') as f:
    details = pickle.load(f)
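
Once `results` is available, one way to inspect it is to rank each feature/model pair by the gap between training and test accuracy, and to convert test accuracy back into a count of correctly classified images. The sketch below is illustrative only: `sample_results` holds the four BRIEF rows from the recorded run in this notebook, and `N_TEST = 200` comes from the test-set shape printed earlier.

```python
# (feature, model) -> (dim, train_acc, train_f1, test_acc, test_f1, feature_s, train_s)
sample_results = {
    ("BRIEF", "SVM"): (32, 0.1363, 0.1262, 0.0200, 0.0101, 560, 8245),
    ("BRIEF", "KNN"): (32, 0.2382, 0.1972, 0.0050, 0.0050, 560, 215),
    ("BRIEF", "RF"): (32, 0.9989, 0.9989, 0.0050, 0.0050, 560, 186),
    ("BRIEF", "AdaBoost"): (32, 0.0296, 0.0184, 0.0050, 0.0004, 560, 86),
}
N_TEST = 200  # number of test images

rows = []
for (feature, model), (_, train_acc, _, test_acc, *_rest) in sample_results.items():
    gap = train_acc - test_acc               # large gap suggests overfitting
    n_correct = round(test_acc * N_TEST)     # accuracy expressed as an image count
    rows.append((feature, model, gap, n_correct))

rows.sort(key=lambda r: r[2], reverse=True)  # largest gap first
for feature, model, gap, n_correct in rows:
    print(f"{feature}+{model}: gap={gap:.4f}, correct={n_correct}/{N_TEST}")
```

On these rows the random forest has by far the largest gap (0.9989 on training versus 0.0050 on test), and a test accuracy of 0.0050 corresponds to a single correct image out of 200.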
    

In [15]:
for (feature_name, ml_name), (feature_dimension, train_acc, train_f1, test_acc, test_f1, feature_time, train_time) in results.items():
    print(f"** {feature_name} + {ml_name} ** \n"
          f"Feature Dimension: {feature_dimension},\n"
          f"Training Accuracy: {train_acc:.4f},\n"
          f"Training F1: {train_f1:.4f},\n"
          f"Testing Accuracy: {test_acc:.4f},\n"
          f"Testing F1: {test_f1:.4f},\n"
          f"Time for Feature Extraction: {feature_time:.0f} s,\n"
          f"Time for Training: {train_time:.0f} s, \n"
          "==================================")

** BRIEF + SVM ** 
Feature Dimension: 32,
Training Accuracy: 0.1363,
Training F1: 0.1262,
Testing Accuracy: 0.0200,
Testing F1: 0.0101,
Time for Feature Extraction: 560 s,
Time for Training: 8245 s, 
** BRIEF + KNN ** 
Feature Dimension: 32,
Training Accuracy: 0.2382,
Training F1: 0.1972,
Testing Accuracy: 0.0050,
Testing F1: 0.0050,
Time for Feature Extraction: 560 s,
Time for Training: 215 s, 
** BRIEF + RF ** 
Feature Dimension: 32,
Training Accuracy: 0.9989,
Training F1: 0.9989,
Testing Accuracy: 0.0050,
Testing F1: 0.0050,
Time for Feature Extraction: 560 s,
Time for Training: 186 s, 
** BRIEF + AdaBoost ** 
Feature Dimension: 32,
Training Accuracy: 0.0296,
Training F1: 0.0184,
Testing Accuracy: 0.0050,
Testing F1: 0.0004,
Time for Feature Extraction: 560 s,
Time for Training: 86 s, 
** ORB + SVM ** 
Feature Dimension: 32,
Training Accuracy: 0.1025,
Training F1: 0.0911,
Testing Accuracy: 0.0100,
Testing F1: 0.0039,
Time for Feature Extraction: 613 s,
Time for Training: 10520 s, 
*