[View in Colaboratory](https://colab.research.google.com/github/lewisliching/Fashion_MNIST_with_Keras/blob/master/Fashion_MNIST_with_Keras.ipynb)

##Fashion MNIST with Keras: getting started with Google Colab

- This notebook uses Fashion MNIST with Keras as an example of getting started with deep learning in Google Colab and Google Drive

- Each Fashion image is 28 x 28 = 784 pixels in total.
- Each pixel value is an integer between 0 and 255; higher values mean darker pixels.
- In the CSV dataset, each row consists of 785 columns: the first column is the label and the remaining 784 columns are the 784 pixels of one image.
- The labels in the first column are as follows:
  0.	T-shirt/top
  1.	Trouser
  2.	Pullover
  3.	Dress
  4.	Coat
  5.	Sandal
  6.	Shirt
  7.	Sneaker
  8.	Bag
  9.	Ankle boot

- 60000 images in fashion-mnist_train.csv
- 10000 images in fashion-mnist_test.csv
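
To make the row layout concrete, here is a minimal sketch that splits one CSV-style row into its label and a 28 x 28 image. The row is synthetic (random pixels, not real data), and `LABEL_NAMES` and `decode_row` are names introduced here for illustration only.

```python
import numpy as np

# Class names in label order, as listed above.
LABEL_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def decode_row(row):
    """Split a 785-value row into (label name, 28x28 pixel array)."""
    row = np.asarray(row)
    if row.shape != (785,):
        raise ValueError("expected 1 label + 784 pixels, got shape %s" % (row.shape,))
    label = int(row[0])               # column 0 is the label
    image = row[1:].reshape(28, 28)   # columns 1..784 are the pixels
    return LABEL_NAMES[label], image

# Synthetic row for illustration only: label 9 followed by random pixels.
rng = np.random.RandomState(0)
fake_row = np.concatenate([[9], rng.randint(0, 256, size=784)])

name, image = decode_row(fake_row)
print(name, image.shape)
```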


- Reference:

 - https://www.kaggle.com/zalando-research/fashionmnist/home
 - https://github.com/zalandoresearch/fashion-mnist

##Set up the working environment of Google Colab and Google Drive

- TensorFlow is assumed to be available already in Google Colab

- Don't forget to select Python 3 and GPU as the hardware accelerator in Google Colab

- Install Keras; this also updates its dependencies, such as NumPy

- Install PyDrive to access Google Drive and authorize Google Colab to access the files

In [0]:
!pip install keras
!pip install -U -q PyDrive


##Set up ngrok and run TensorBoard on Colab (optional)

**ngrok is a service that tunnels traffic from a public URL to TensorBoard running on localhost.**

![alt text](https://gitcdn.xyz/cdn/Tony607/blog_statics/d425c3fe4cf0d92067572e25ae6cc3198d51936b//images/ngrok/ngrok.jpg)

- Download and unzip ngrok to install it

- Use the "log" folder in Colab's current directory as the TensorBoard log directory

- Start TensorBoard on port 6006 reading from the "log" folder, then start ngrok on the same port

---
**Notes:**

- Before calling cnn_model.fit, set up the TensorBoard callback to write log data to the "log" folder

- After fitting, generate the public ngrok URL to access TensorBoard





In [0]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

LOG_DIR = './log'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(LOG_DIR)
)

get_ipython().system_raw('./ngrok http 6006 &')

##Connect Google Drive for CSV files

- Authenticate and create the PyDrive client. Colab prompts for a verification code that confirms it may access Google Drive: click the link, copy the code, and paste it into the input box

- Copy the folder ID from the URL bar while that folder is open in Google Drive

- Search the folder for the 2 CSV files and download each into Colab under its original filename
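
As a small illustration of the folder ID step, the sketch below pulls the ID out of a folder URL and builds the query string used to list the folder's files. It assumes the URL has the form `https://drive.google.com/drive/folders/<ID>`; `folder_id_from_url` and `children_query` are hypothetical helper names, not part of PyDrive.

```python
import re

def folder_id_from_url(url):
    """Return the ID that follows '/folders/' in a Drive folder URL."""
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", url)
    if match is None:
        raise ValueError("no folder ID found in %r" % url)
    return match.group(1)

def children_query(folder_id):
    """Build the query string for non-trashed files inside a folder."""
    return "'{}' in parents and trashed=false".format(folder_id)

# Example URL built around the folder ID used in this notebook.
url = "https://drive.google.com/drive/folders/1xt6pBbi0fLeFeUqc4OuwD0SciUad9jhf"
folder_id = folder_id_from_url(url)
print(folder_id)
print(children_query(folder_id))
```

The resulting string is the value passed under the `'q'` key to `drive.ListFile`.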

In [0]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# List the files in the Google Drive folder (the quoted ID is the folder ID)
file_list = drive.ListFile({'q': "'1xt6pBbi0fLeFeUqc4OuwD0SciUad9jhf' in parents and trashed=false"}).GetList()

# Download the two CSV files into the Colab working directory
for file1 in file_list:
    if file1['title'] == 'fashion-mnist_train.csv':
        train_downloaded = drive.CreateFile({'id': file1['id']})
        train_downloaded.GetContentFile(file1['title'])
        print('%s is loaded into Colab' % (file1['title']))

    if file1['title'] == 'fashion-mnist_test.csv':
        test_downloaded = drive.CreateFile({'id': file1['id']})
        test_downloaded.GetContentFile(file1['title'])
        print('%s is loaded into Colab' % (file1['title']))


##Import all relevant libraries & Packages

In [0]:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.optimizers import Adam
from keras.callbacks import TensorBoard

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

##Import CSV files into pandas DataFrames in Colab

- Use pandas read_csv to read each file as-is

- Print the first 5 rows of the DataFrame (df) as a check (the first-column labels should be 2 9 6 0 3)
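
Beyond eyeballing the first rows, a few assertions can catch a malformed file early. The sketch below runs such checks on a tiny synthetic CSV held in memory; the column names `label` and `pixel1` to `pixel784` are assumed here for illustration, and the pixel values are random.

```python
import io

import numpy as np
import pandas as pd

# Build a tiny synthetic CSV: 3 rows, 1 label column + 784 pixel columns.
columns = ["label"] + ["pixel%d" % i for i in range(1, 785)]
rng = np.random.RandomState(0)
rows = np.hstack([
    np.array([[2], [9], [6]]),             # labels
    rng.randint(0, 256, size=(3, 784)),    # pixels
])
csv_text = pd.DataFrame(rows, columns=columns).to_csv(index=False)

# Read it back with read_csv, as the notebook does for the real files.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df["label"].tolist())

# Sanity checks: column count, label range, pixel range.
pixels = df.drop(columns="label").values
assert df.shape[1] == 785
assert df["label"].between(0, 9).all()
assert pixels.min() >= 0 and pixels.max() <= 255
```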

In [0]:
train_df = pd.read_csv(r'fashion-mnist_train.csv')
test_df = pd.read_csv(r'fashion-mnist_test.csv')

# check df by printout 1st 5 rows
train_df.head()

## Load DataFrame (Pandas) to Array (Numpy) for further data analysis

- A Series object in pandas represents a one-dimensional labeled indexed array based on the NumPy ndarray. 

- NumPy provides high-performance N-dimensional arrays for selecting elements, logical operations, slicing, reshaping, combining (stacking), splitting, etc.

- A Series is better suited to aligning data and matching labels, while an N-dimensional array is more efficient for numerical manipulation. This is why we load the data from the DataFrame into arrays, and then:

  1. Split the data (X and Y for both train data and test data)
  2. Normalize by dividing by 255 (each pixel is an integer in the range 0 - 255)
  3. Print the dimensions of the arrays as a check
  4. Display one image from x_train as a check (any row index, e.g. 50000)
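
The difference between label alignment and positional arithmetic can be seen in a toy case. This is only an illustrative sketch with made-up values: adding two Series matches elements by index label, while adding their underlying arrays matches them by position.

```python
import numpy as np
import pandas as pd

# Two Series with the same labels in a different order.
s1 = pd.Series([1, 2], index=["a", "b"])
s2 = pd.Series([10, 20], index=["b", "a"])

# pandas aligns on the index labels before adding: a -> 1 + 20, b -> 2 + 10.
by_label = s1 + s2

# NumPy ignores labels and adds position by position: [1 + 10, 2 + 20].
by_position = s1.values + s2.values

print(by_label.to_dict())
print(by_position.tolist())
```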

In [0]:
train_data = np.array(train_df, dtype='float32')
test_data = np.array(test_df, dtype='float32')

# Features: all rows, columns 1 to the end, scaled from 0-255 to 0-1
x_train = train_data[:, 1:]/255
# Labels: all rows of column 0
y_train = train_data[:, 0]

x_test = test_data[:, 1:]/255
y_test = test_data[:, 0]

print('x_train:', x_train.shape)
print('y_train:', y_train.shape)
print(' x_test:', x_test.shape)
print(' y_test:', y_test.shape)

# take the row at index 50000 of x_train and reshape it into a 28x28 array
image = x_train[50000, :].reshape(28,28)
plt.imshow(image)
plt.show()

##Splitting train data into training data and validation data for model training

- We use train_test_split from the sklearn library to split the data
  - test_size = 0.2 holds out 20% of the train data as validation data
  - random_state = 123 (or any fixed number) makes the random selection reproducible, so every run gives the same split. For a different random split on every run, leave this value unset
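
To show what a fixed seed buys, here is a simplified stand-in for a seeded shuffle-and-split written with NumPy only. It is not sklearn's implementation, and `seeded_split` is a hypothetical name; the point is that the same seed reproduces the same split. With 60000 rows and test_size = 0.2, the split gives 48000 training rows and 12000 validation rows.

```python
import numpy as np

def seeded_split(x, y, test_size=0.2, seed=None):
    """Shuffle row indices, then hold out the last `test_size` fraction."""
    n = len(x)
    n_validate = int(round(n * test_size))
    order = np.random.RandomState(seed).permutation(n)
    train_idx, val_idx = order[:n - n_validate], order[n - n_validate:]
    return x[train_idx], x[val_idx], y[train_idx], y[val_idx]

# Toy data: 10 rows with 2 features each, labels 0..9.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

a = seeded_split(x, y, test_size=0.2, seed=123)
b = seeded_split(x, y, test_size=0.2, seed=123)

print(a[0].shape, a[1].shape)          # training and validation shapes
print(np.array_equal(a[1], b[1]))      # same seed, same validation rows
```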

In [0]:

x_training, x_validate, y_training, y_validate = train_test_split(
  x_train, y_train, test_size=0.2, random_state=123, 
)
print('x_training:{}'.format(x_training.shape))
print('x_validate:{}'.format(x_validate.shape))


##Create CNN model for deep learning

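- **Reshape dataset**
  - Reshape each 784-pixel row to 28 x 28 x 1 (rows, columns, channels) to fit the CNN input
  - Print out the shape of each dataset as a check
  - batch_size = 512 is also set here; it is the number of samples processed per gradient update during fitting
  
- **Define the CNN model**
 - One Conv2D layer (32 filters, 3 x 3 kernel, ReLU), 2 x 2 max pooling and 20% dropout, followed by Flatten, Dense(32, ReLU) and a 10-way softmax output, one unit per label
 
- **Compile the CNN model**
 - Loss is sparse_categorical_crossentropy (the labels are integers 0 - 9, not one-hot vectors), the optimizer is Adam with learning rate 0.001, and the metric is accuracy
 
- **Setting TensorBoard for log data analysis (optional)**
 - Set up the TensorBoard callback to write log data to the "log" folder
 - For details, please refer to the top of this project

- **Fit the CNN model**
 - Train for 10 epochs with batch_size = 512, validating on (x_validate, y_validate) after each epoch
 - Training takes much longer when TensorBoard log data is written
 - callbacks=[tensorboard] is needed only if you want TensorBoard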
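
The layer sizes can be checked by hand before building the model. The sketch below is plain arithmetic, not Keras code; it assumes the Keras defaults of no padding and stride 1 for Conv2D, and 48000 training rows from the 80/20 split. `conv_out` is a helper name introduced here.

```python
import math

def conv_out(size, kernel, stride=1):
    """Output size of a convolution with no padding ('valid')."""
    return (size - kernel) // stride + 1

# Conv2D(filters=32, kernel_size=3) on a 28 x 28 x 1 input.
side = conv_out(28, 3)                  # 28 -> 26
conv_params = (3 * 3 * 1 + 1) * 32      # kernel weights + one bias per filter

# MaxPooling2D(pool_size=2) halves each spatial side; Dropout adds no weights.
side = side // 2                        # 26 -> 13
flat = side * side * 32                 # values fed to the first Dense layer

dense1_params = flat * 32 + 32          # Dense(32): weights + biases
dense2_params = 32 * 10 + 10            # Dense(10): weights + biases
total_params = conv_params + dense1_params + dense2_params

# Gradient updates per epoch for 48000 training rows and batch_size=512.
steps_per_epoch = math.ceil(48000 / 512)

print(flat, total_params, steps_per_epoch)
```

The total can be compared against the parameter count that `cnn_model.summary()` reports once the model is defined.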

 

In [0]:
im_rows = 28
im_cols = 28
batch_size = 512
im_shape = (im_rows, im_cols, 1)

x_training = x_training.reshape(x_training.shape[0], *im_shape)
x_validate = x_validate.reshape(x_validate.shape[0], *im_shape)
x_test = x_test.reshape(x_test.shape[0], *im_shape)

print('x_training shape:{}'.format(x_training.shape))
print('x_validate shape:{}'.format(x_validate.shape))
print('x_test shape:{}'.format(x_test.shape))

In [0]:
cnn_model = Sequential([
    Conv2D(filters=32, kernel_size=3, activation='relu', input_shape=im_shape),
    MaxPooling2D(pool_size=2),
    Dropout(0.2),
    
    Flatten(),
    Dense(32, activation='relu'),
    Dense(10, activation='softmax')
    
])

In [0]:
cnn_model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=Adam(lr=0.001),
    metrics=['accuracy']
)

In [0]:
#optional for Tensorboard
tensorboard = TensorBoard(log_dir='./log', histogram_freq=1,
                         write_graph=True,
                         write_grads=True,
                         batch_size=batch_size,
                         write_images=True)

cnn_model.fit(
    x_training, y_training, batch_size=batch_size,
    epochs=10, verbose=1,
    validation_data=(x_validate, y_validate),
    callbacks=[tensorboard] #optional for Tensorboard
)

##Evaluation of the CNN model with test dataset
- Evaluate the CNN model on the test dataset.
- Print the loss and accuracy, and compare them with the fitting results above
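
For intuition about the two numbers being printed, here is a minimal NumPy sketch of how a sparse categorical cross-entropy loss and an accuracy can be computed from softmax outputs. The predictions are made up for illustration, and the sketch leaves out the numerical safeguards a real implementation would need (for example, a predicted probability of exactly 0).

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class)."""
    y_true = np.asarray(y_true, dtype=int)
    picked = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

def accuracy(y_true, probs):
    """Fraction of rows whose highest-probability class is the true one."""
    return float(np.mean(np.argmax(probs, axis=1) == np.asarray(y_true)))

# Toy predictions over 3 classes (made up for illustration).
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.4, 0.3],
])
y_true = [0, 1, 2]

print(round(sparse_categorical_crossentropy(y_true, probs), 4))
print(round(accuracy(y_true, probs), 4))
```

A confident wrong answer raises the loss sharply, while accuracy only counts whether the top class was right, so the two can move differently between training and test data.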

In [0]:
score = cnn_model.evaluate(x_test, y_test, verbose=0)

print('test loss: {:.4f}'.format(score[0]))
print('test accuracy: {:.4f}'.format(score[1]))


##Access Tensorboard webpage for log data analysis (Optional)
- The cell below prints the ngrok URL. Click it to open TensorBoard.
- For details, please refer to the top of this project.

In [0]:
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
