<a href="https://colab.research.google.com/github/soline013/Machine_Learning-ML/blob/master/SSUML_Fashion_MNIST_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SSUML Fashion MNIST Project

## Introduction.

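This is a project assignment for SSUML, a machine learning/deep learning study group.

I drew on Fashion MNIST examples on the Kaggle website. Rather than relying on a single technique, I wanted to stack several kinds of layers, and Keras let me build this as a Sequential model.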

## Datasets.

- Fashion MNIST is a dataset of Zalando's article images.

- Training set: 60,000 examples.

- Test set: 10,000 examples.

- Label: 10 classes.

- Each example: a 28×28 grayscale image, 784 pixels in total.

- Pixel values: integers from 0 to 255 indicating lightness or darkness; higher numbers mean darker.

- Columns: 785. The first column holds the class label (the article of clothing); the remaining 784 columns hold the pixel values.
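The `Preprocess` step later turns each row's 784 pixel columns into a 28×28 image with a plain row-major `reshape`, which assumes the pixels are stored row by row. A minimal sketch of that mapping on a synthetic row rather than the real CSV (the helper `row_to_image` is hypothetical): flat pixel index `x` ends up at row `x // 28`, column `x % 28`.

```python
import numpy as np

IMG_ROWS, IMG_COLS = 28, 28

def row_to_image(row):
    """Split one CSV-style row (label, then 784 pixels) into (label, 28x28 array).

    Assumes the pixel columns are stored row by row (row-major order),
    the same assumption the notebook's reshape relies on.
    """
    row = np.asarray(row)
    label = int(row[0])
    pixels = row[1:]
    assert pixels.size == IMG_ROWS * IMG_COLS, "expected 784 pixel values"
    return label, pixels.reshape(IMG_ROWS, IMG_COLS)

# Synthetic row: label 9 followed by 0..783, so every pixel stores its own flat index
fake_row = np.concatenate([[9], np.arange(IMG_ROWS * IMG_COLS)])
label, image = row_to_image(fake_row)

# Flat index x lands at row i = x // 28, column j = x % 28
x = 31
i, j = divmod(x, IMG_COLS)
print(label, image.shape, (i, j), image[i, j])  # 9 (28, 28) (1, 3) 31
```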

## Load Packages.

In [None]:
# Requires TensorFlow 1.x
%tensorflow_version 1.x

TensorFlow 1.x selected.


In [None]:
# Mount Google Drive (the CSV files are read from it)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import numpy as np
import pandas as pd
# Use the public tf.keras API rather than the private tensorflow.python path
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout, MaxPooling2D

## Parameters.

In [None]:
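IMG_ROWS = 28      # image height in pixels
IMG_COLS = 28      # image width in pixels
LAB_CLASSES = 10   # number of clothing classes
VALID_SIZE = 0.25  # fraction of the training set held out for validation
BATCH_SIZE = 128   # mini-batch size for model.fit
EPOCH = 50         # number of training epochs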

## Read the data.

In [None]:
train_file = "/content/drive/My Drive/Colab Notebooks/2243_9243_bundle_archive/fashion-mnist_train.csv"
test_file  = "/content/drive/My Drive/Colab Notebooks/2243_9243_bundle_archive/fashion-mnist_test.csv"

train_data = pd.read_csv(train_file)
test_data = pd.read_csv(test_file)

## Data exploration.

In [None]:
print("Train Data:", train_data.shape)
print("Test Data:", test_data.shape)

Train Data: (60000, 785)
Test Data: (10000, 785)


## Train set images class distribution.

In [None]:
labels = {0 : "T-shirt/top", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat", 5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle boot"}

def Class_Distribution(data):
    # Number of examples per label
    labels_counts = data["label"].value_counts()

    # Total number of examples
    total_counts = len(data)

    for i in range(len(labels_counts)):
        label = labels[labels_counts.index[i]]
        count = labels_counts.values[i]
        percent = (count / total_counts) * 100
        print("{:15s}:  {} or {}%".format(label, count, percent))

Class_Distribution(train_data)

Ankle boot     :  6000 or 10.0%
Bag            :  6000 or 10.0%
Sneaker        :  6000 or 10.0%
Shirt          :  6000 or 10.0%
Sandal         :  6000 or 10.0%
Coat           :  6000 or 10.0%
Dress          :  6000 or 10.0%
Pullover       :  6000 or 10.0%
Trouser        :  6000 or 10.0%
T-shirt/top    :  6000 or 10.0%
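`Class_Distribution` prints classes in the order returned by `value_counts()`, which sorts by count; with perfectly balanced classes every count ties, so the order carries no meaning, which is why the train and test listings differ. A sketch of the same summary sorted by label id, using pandas' `value_counts(normalize=True)` for the fractions; `class_distribution_table` is a hypothetical name and the toy frame merely stands in for `train_data`.

```python
import pandas as pd

LABELS = {0: "T-shirt/top", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat",
          5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle boot"}

def class_distribution_table(data):
    """Count and percentage of each class, sorted by label id instead of by count."""
    counts = data["label"].value_counts().sort_index()
    # normalize=True returns fractions of the total instead of raw counts
    percents = data["label"].value_counts(normalize=True).sort_index() * 100
    return pd.DataFrame({
        "name": [LABELS[i] for i in counts.index],
        "count": counts.values,
        "percent": percents.values,
    }, index=counts.index)

# Toy frame (not the real CSV): three T-shirts and one trouser
toy = pd.DataFrame({"label": [0, 0, 0, 1]})
print(class_distribution_table(toy))
```

On the real data, `class_distribution_table(train_data)` would list all ten classes in label order.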


## Test set images class distribution.

In [None]:
Class_Distribution(test_data)

Sneaker        :  1000 or 10.0%
Shirt          :  1000 or 10.0%
Sandal         :  1000 or 10.0%
Coat           :  1000 or 10.0%
Dress          :  1000 or 10.0%
Pullover       :  1000 or 10.0%
Ankle boot     :  1000 or 10.0%
Trouser        :  1000 or 10.0%
Bag            :  1000 or 10.0%
T-shirt/top    :  1000 or 10.0%


## Preprocess the data.

In [None]:
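def Preprocess(data):
    images_num = data.shape[0]
    # Drop the first (label) column; the remaining 784 columns are pixel values
    x_array = data.values[:, 1:]
    # Reshape each flat row into a 28x28 image with a single grayscale channel
    x_outcome = x_array.reshape(images_num, IMG_ROWS, IMG_COLS, 1)
    # One-hot encode the labels into LAB_CLASSES columns
    y_outcome = keras.utils.to_categorical(data.label, LAB_CLASSES)
    return x_outcome, y_outcome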
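To make the shapes concrete, here is a NumPy-only sketch of what `Preprocess` produces; `preprocess_numpy` is a hypothetical name, and `np.eye(LAB_CLASSES)[labels]` is used as a stand-in for `keras.utils.to_categorical`, since row `k` of an identity matrix is the one-hot vector for class `k`. The toy frame mimics the CSV layout (a `label` column plus 784 pixel columns) with all-zero pixels.

```python
import numpy as np
import pandas as pd

IMG_ROWS, IMG_COLS, LAB_CLASSES = 28, 28, 10

def preprocess_numpy(data):
    """Return (N, 28, 28, 1) images and (N, 10) one-hot labels without Keras."""
    pixels = data.drop(columns=["label"]).to_numpy()
    x = pixels.reshape(len(data), IMG_ROWS, IMG_COLS, 1)
    # Indexing the identity matrix by label picks out one one-hot row per example
    y = np.eye(LAB_CLASSES, dtype="float32")[data["label"].to_numpy()]
    return x, y

# Toy frame shaped like the CSV: two examples with labels 3 (Dress) and 9 (Ankle boot)
toy = pd.DataFrame(np.zeros((2, 784), dtype=int),
                   columns=[f"pixel{k}" for k in range(1, 785)])
toy.insert(0, "label", [3, 9])

x, y = preprocess_numpy(toy)
print(x.shape, y.shape)  # (2, 28, 28, 1) (2, 10)
print(y[0])
```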

In [None]:
X, Y = Preprocess(train_data)
X_test, Y_test = Preprocess(test_data)

In [None]:
# Split the training data into training and validation sets (VALID_SIZE = 25% held out)
X_train, X_valid, Y_train, Y_valid = train_test_split(X, Y, test_size=VALID_SIZE)

print("X: Train Data:", X_train.shape)
print("X: Valid Data:", X_valid.shape)
print("X: Test Data:", X_test.shape)

print("Y: Train Data:", Y_train.shape)
print("Y: Valid Data:", Y_valid.shape)
print("Y: Test Data:", Y_test.shape)

X: Train Data: (45000, 28, 28, 1)
X: Valid Data: (15000, 28, 28, 1)
X: Test Data: (10000, 28, 28, 1)
Y: Train Data: (45000, 10)
Y: Valid Data: (15000, 10)
Y: Test Data: (10000, 10)
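The 45,000/15,000 shapes follow from `test_size=0.25`: for a float `test_size`, scikit-learn holds out `ceil(test_size * n)` rows and keeps the rest, so `ceil(0.25 * 60000) = 15000`. A minimal NumPy sketch of a shuffled split under that rule; `shuffle_split` is a hypothetical helper, not scikit-learn's implementation, and the placeholder arrays only match the row count of the training CSV.

```python
import math
import numpy as np

def shuffle_split(X, Y, test_size, seed=0):
    """Hypothetical stand-in for train_test_split(X, Y, test_size=...) with a float test_size."""
    n = len(X)
    # Held-out count rounds up, as scikit-learn does for a float test_size
    n_valid = math.ceil(test_size * n)
    # A fixed seed makes the split reproducible (train_test_split exposes this as random_state)
    order = np.random.default_rng(seed).permutation(n)
    valid_idx, train_idx = order[:n_valid], order[n_valid:]
    return X[train_idx], X[valid_idx], Y[train_idx], Y[valid_idx]

# Lightweight placeholders with the same number of rows as the training CSV
X_demo = np.arange(60000).reshape(60000, 1)
Y_demo = np.zeros((60000, 10))

X_tr, X_va, Y_tr, Y_va = shuffle_split(X_demo, Y_demo, test_size=0.25)
print(X_tr.shape, X_va.shape, Y_tr.shape, Y_va.shape)  # (45000, 1) (15000, 1) (45000, 10) (15000, 10)
```

Because the notebook calls `train_test_split` without `random_state`, each run draws a different validation set; passing a fixed `random_state` would make the split repeatable.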


## Model.

In [None]:
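model = Sequential()

# Block 1: 32 3x3 filters, then 2x2 max pooling and dropout
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', input_shape=(IMG_ROWS, IMG_COLS, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Dropout(0.3))

# Block 2: 64 3x3 filters, then 2x2 max pooling and dropout
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.3))

# Block 3: 128 3x3 filters and dropout (no pooling)
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Dropout(0.3))

# Classifier head: flatten, one hidden dense layer, softmax over the classes
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(LAB_CLASSES, activation='softmax'))

# Categorical cross-entropy matches the one-hot labels produced by Preprocess
model.compile(loss=keras.losses.categorical_crossentropy, optimizer='adam', metrics=['accuracy'])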

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 5, 5, 64)          0         
_________________________________________
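The `model.summary()` output above is cut off after `dropout_1`. Its Output Shape and Param # columns follow from standard formulas: a 3×3 convolution with Keras' default `'valid'` padding shrinks each side by 2, 2×2 max pooling floor-halves each side, a Conv2D layer has `(k*k*C_in + 1) * filters` parameters (weights plus one bias per filter), a Dense layer has `(n_in + 1) * n_out`, and pooling, dropout and flatten layers have none. A plain-Python sketch (helper names are hypothetical) that reproduces the visible rows (320 and 18,496 parameters) and applies the same arithmetic to the remaining layers; this is a hand calculation, not output from the notebook.

```python
def conv2d_valid(h, w, in_ch, filters, k=3):
    """Output shape and parameter count of a stride-1 Conv2D with 'valid' padding."""
    return (h - k + 1, w - k + 1, filters), (k * k * in_ch + 1) * filters

def maxpool2d(h, w, ch, pool=2):
    """MaxPooling2D with pool_size == strides: floor division per side, no parameters."""
    return (h // pool, w // pool, ch), 0

def dense(n_in, n_out):
    """Dense layer: weight matrix plus bias vector."""
    return (n_out,), (n_in + 1) * n_out

shape, p1 = conv2d_valid(28, 28, 1, 32)   # (26, 26, 32), 320
shape, _ = maxpool2d(*shape)              # (13, 13, 32)
shape, p2 = conv2d_valid(*shape, 64)      # (11, 11, 64), 18496
shape, _ = maxpool2d(*shape)              # (5, 5, 64)
shape, p3 = conv2d_valid(*shape, 128)     # (3, 3, 128), 73856
flat = shape[0] * shape[1] * shape[2]     # Flatten: 1152 features
_, p4 = dense(flat, 128)                  # 147584
_, p5 = dense(128, 10)                    # 1290

total = p1 + p2 + p3 + p4 + p5
print(total)  # 241546
```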

In [None]:
train_model = model.fit(X_train, Y_train, batch_size=BATCH_SIZE, epochs=EPOCH, validation_data=(X_valid, Y_valid)) 

Train on 45000 samples, validate on 15000 samples
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
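`model.fit` returns a Keras `History` object, here `train_model`, whose `.history` attribute is a dict mapping each metric name to a list with one value per epoch. Since the number of epochs is fixed at 50 in advance, a quick check is which epoch had the lowest validation loss. A sketch under these assumptions: the helper names are hypothetical, the accuracy key is typically `'acc'` in older Keras releases (such as the TensorFlow 1.x runtime selected above) and `'accuracy'` in newer ones, and the toy numbers are made up purely to show the dict's shape.

```python
def best_epoch(history, monitor="val_loss"):
    """Return (1-based epoch, value) with the lowest `monitor` value in a History.history dict."""
    values = history[monitor]
    idx = min(range(len(values)), key=values.__getitem__)
    return idx + 1, values[idx]

def accuracy_key(history, validation=True):
    """Find the accuracy entry, which is named 'acc' or 'accuracy' depending on the Keras version."""
    prefix = "val_" if validation else ""
    for name in ("accuracy", "acc"):
        if prefix + name in history:
            return prefix + name
    raise KeyError("no accuracy entry in history")

# Toy history with made-up numbers, only to illustrate the structure
toy_history = {"loss": [0.6, 0.4, 0.3, 0.25], "val_loss": [0.45, 0.35, 0.33, 0.36],
               "acc": [0.78, 0.85, 0.89, 0.91], "val_acc": [0.84, 0.87, 0.88, 0.87]}
epoch, val_loss = best_epoch(toy_history)
print(epoch, val_loss, accuracy_key(toy_history))  # 3 0.33 val_acc
```

With the real run, `best_epoch(train_model.history)` would report the epoch with the lowest validation loss.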


## Accuracy and Loss.

In [None]:
result = model.evaluate(X_test, Y_test)
print('Loss:', result[0])
print('Accuracy:', result[1])

Loss: 0.26332143685817716
Accuracy: 0.9069


In [None]:
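# Predicted class = index of the highest softmax probability for each test image
predicted_classes = np.argmax(model.predict(X_test), axis=1)
# True labels are the "label" column (first column) of the test CSV
Y_true = test_data["label"]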

In [None]:
target_names = ["Class {} ({}) :".format(i, labels[i]) for i in range(LAB_CLASSES)]
print(classification_report(Y_true, predicted_classes, target_names=target_names))

                         precision    recall  f1-score   support

Class 0 (T-shirt/top) :       0.82      0.89      0.85      1000
    Class 1 (Trouser) :       0.99      0.99      0.99      1000
   Class 2 (Pullover) :       0.85      0.87      0.86      1000
      Class 3 (Dress) :       0.89      0.95      0.92      1000
       Class 4 (Coat) :       0.86      0.85      0.86      1000
     Class 5 (Sandal) :       0.99      0.97      0.98      1000
      Class 6 (Shirt) :       0.77      0.65      0.70      1000
    Class 7 (Sneaker) :       0.92      0.98      0.95      1000
        Class 8 (Bag) :       0.99      0.98      0.98      1000
 Class 9 (Ankle boot) :       0.98      0.94      0.96      1000

               accuracy                           0.91     10000
              macro avg       0.91      0.91      0.91     10000
           weighted avg       0.91      0.91      0.91     10000
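Each row of the classification report is built from per-class counts of true positives (TP), false positives (FP) and false negatives (FN): precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean, 2·P·R / (P + R). `macro avg` is the unweighted mean over classes, while `weighted avg` weights each class by its support; because every Fashion MNIST test class has exactly 1,000 examples, those two rows coincide in the report above. A from-scratch NumPy sketch on toy labels (not the test set; `per_class_report` is a hypothetical helper, not scikit-learn's code) where the supports differ, so the two averages differ too:

```python
import numpy as np

def per_class_report(y_true, y_pred, n_classes):
    """(precision, recall, f1, support) for each class, computed from TP/FP/FN counts."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    rows = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((precision, recall, f1, int(np.sum(y_true == c))))
    return rows

# Toy labels with unequal class sizes
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
report = per_class_report(y_true, y_pred, 3)

f1s = np.array([r[2] for r in report])
support = np.array([r[3] for r in report])
macro_f1 = f1s.mean()                             # every class counts equally
weighted_f1 = np.average(f1s, weights=support)    # larger classes count more
print(round(macro_f1, 4), round(weighted_f1, 4))
```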

