<a href="https://colab.research.google.com/github/xup5/Computational-Neuroscience-Class/blob/main/Convolutional%20Neural%20Network/cnn_fashion_tf.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Convolutional neural network (CNN) for the Fashion MNIST dataset from scratch**


by Xu Pan

To speed up training, change the Colab runtime type to GPU: "Runtime" --> "Change runtime type" --> under "Hardware accelerator", select "GPU".


We will implement a CNN similar to the one you have already built with Keras. This time we will use generic TensorFlow functions wherever possible instead of high-level Keras abstractions. Hopefully this will help you understand every detail of a deep neural network.



## **Common deep learning steps**

* Step 1: Generate training and test data (and preprocess)
* Step 2: Initialize the network parameters
* Step 3: Forward propagation
* Step 4: Compute the cost/loss
* Step 5: Backpropagate, or create an optimizer, to minimize the cost from Step 4
* Step 6: Evaluate the model with your test set
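
As a rough illustration of these six steps, here is a minimal NumPy sketch that trains a softmax regression on a toy two-class problem. The data, the learning rate, and all names (`make_blobs`, `softmax`, `Xtr`, `W`, and so on) are assumptions made for this example; they are not part of the notebook's CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(n):
    """Two Gaussian blobs with labels 0 and 1 (hypothetical toy data)."""
    X = np.vstack([rng.normal(-1.0, 0.5, size=(n, 2)),
                   rng.normal(+1.0, 0.5, size=(n, 2))])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return X, y

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Step 1: generate training and test data
Xtr, ytr = make_blobs(200)
Xte, yte = make_blobs(50)
Ytr = np.eye(2)[ytr]                       # one-hot targets

# Step 2: initialize the parameters
W = rng.normal(0, 0.1, size=(2, 2))
b = np.zeros(2)

lr = 0.5
losses = []
for step in range(100):
    # Step 3: forward propagation
    P = softmax(Xtr @ W + b)
    # Step 4: mean cross-entropy loss
    losses.append(-np.mean(np.sum(Ytr * np.log(P + 1e-12), axis=1)))
    # Step 5: backpropagation; the gradient w.r.t. the logits is (P - Y) / N
    dZ = (P - Ytr) / Xtr.shape[0]
    W -= lr * (Xtr.T @ dZ)
    b -= lr * dZ.sum(axis=0)

# Step 6: evaluate on the held-out test set
test_accuracy = np.mean(np.argmax(Xte @ W + b, axis=1) == yte)
print(f"final loss: {losses[-1]:.3f}, test accuracy: {test_accuracy:.3f}")
```

The CNN below follows the same outline, except that the forward pass is deeper and the gradients come from `tf.GradientTape` instead of a hand-derived formula.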

In [57]:
# imports
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
import os   # to save the checkpoint
from tqdm import tqdm

In [30]:
# load dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# check shape
print('Training data size: ' + str(x_train.shape))
print('Training labels size: ' + str(y_train.shape))

print('Test data size: ' + str(x_test.shape))
print('Test labels size: ' + str(y_test.shape))

Training data size: (60000, 28, 28)
Training labels size: (60000,)
Test data size: (10000, 28, 28)
Test labels size: (10000,)


In [31]:
# normalize input range
x_train, x_test = x_train / 255.0, x_test / 255.0

# Add a channels dimension
x_train = x_train[..., np.newaxis].astype("float")
x_test = x_test[..., np.newaxis].astype("float")

# Convert integer class labels to one-hot vectors
y_train_one_hot = np.zeros((y_train.size, y_train.max() + 1))
y_train_one_hot[np.arange(y_train.size), y_train] = 1
y_test_one_hot = np.zeros((y_test.size, y_test.max() + 1))
y_test_one_hot[np.arange(y_test.size), y_test] = 1
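
To see what the one-hot conversion does, here is a small sketch on a made-up label vector. The array `labels` is hypothetical, and the `np.eye` indexing is shown only as an equivalent idiom; it is not what the notebook uses.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])          # hypothetical integer class labels
num_classes = labels.max() + 1           # assumes the largest class id occurs in the data

# Same indexing trick as in the cell above: set one entry per row to 1
one_hot = np.zeros((labels.size, num_classes))
one_hot[np.arange(labels.size), labels] = 1

# Equivalent idiom: select rows of the identity matrix
one_hot_eye = np.eye(num_classes)[labels]

print(one_hot)
```

Taking `argmax` along axis 1 recovers the original integer labels, which is how predictions are compared with `y_test` later on.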

In [None]:
# The tf.data.Dataset class is useful in many cases, though we do not use it here.

# train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train_one_hot)).shuffle(60000).batch(32)
# test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test_one_hot)).batch(32)
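
For reference, the shuffle-and-batch behavior of the commented-out `tf.data` pipeline can be approximated in plain NumPy. The generator below is a minimal sketch; `iterate_minibatches` is a hypothetical helper, not a TensorFlow function, and the toy arrays stand in for `x_train` and `y_train_one_hot`.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield (x_batch, y_batch) pairs in a fresh random order (hypothetical helper)."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

# Toy data: each input row carries the same number as its label
x_demo = np.arange(10).reshape(10, 1)
y_demo = np.arange(10)

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(x_demo, y_demo, batch_size=4, rng=rng))
batch_sizes = [len(xb) for xb, _ in batches]
print(batch_sizes)  # [4, 4, 2]: the last batch holds the remainder
```

Calling the generator once per epoch gives a new ordering each time, similar in spirit to `shuffle(60000)`, whose buffer covers the whole training set.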

In [None]:
# keras.Model is a useful model interface. We can define our model computations inside
# the call function.

# class MyModel(Model):
#   def __init__(self):
#     super(MyModel, self).__init__()
#     self.conv1 = Conv2D(32, 3, activation='relu')
#     self.flatten = Flatten()
#     self.d1 = Dense(128, activation='relu')
#     self.d2 = Dense(10)

#   def call(self, x):
#     x = self.conv1(x)
#     x = self.flatten(x)
#     x = self.d1(x)
#     return self.d2(x)
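
What a convolutional layer such as `Conv2D` computes can be sketched in NumPy. The function below is a deliberately slow illustration that assumes stride 1, no padding, inputs laid out as [batch, height, width, channels], and filters laid out as [height, width, in_channels, out_channels]. The name `conv2d_valid` is hypothetical. Like most deep learning libraries, it computes a cross-correlation, meaning the kernel is not flipped.

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive stride-1 convolution without padding (hypothetical helper)."""
    b, h, wd, _ = x.shape
    kh, kw, _, cout = w.shape
    out = np.zeros((b, h - kh + 1, wd - kw + 1, cout))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw, :]          # b x kh x kw x cin
            # Sum over kernel height, kernel width, and input channels
            out[:, i, j, :] = np.tensordot(patch, w, axes=([1, 2, 3], [0, 1, 2]))
    return out

# Small check: a 2x2 kernel of ones sums each 2x2 window
x_small = np.arange(16, dtype=float).reshape(1, 4, 4, 1)
w_small = np.ones((2, 2, 1, 1))
out_small = conv2d_valid(x_small, w_small)

# Shapes matching the first layer of the model below
x_demo = np.random.normal(size=(2, 28, 28, 1))
w_demo = np.random.normal(0, 0.1, size=(3, 3, 1, 64))
out_demo = conv2d_valid(x_demo, w_demo)
print(out_demo.shape)  # (2, 26, 26, 64)
```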

In [92]:
class MyModel():
  def __init__(self):
    self.w1 = tf.Variable(np.random.normal(0, 0.1, size=(3,3,1,64))) # [filter_height, filter_width, in_channels, out_channels]
    self.w2 = tf.Variable(np.random.normal(0, 0.1, size=(3,3,64,32)))
    self.w3 = tf.Variable(np.random.normal(0, 0.1, size=(1152,128)))
    self.w4 = tf.Variable(np.random.normal(0, 0.1, size=(128,10)))
    self.b1 = tf.Variable(np.zeros(64))
    self.b2 = tf.Variable(np.zeros(32))

  def call(self, x):
    x = tf.nn.conv2d(input=x, filters=self.w1, strides=[1, 1, 1, 1], padding='VALID') # b x 26 x 26 x 64
    x = tf.nn.bias_add(x, self.b1)
    x = tf.nn.relu(x)
    x = tf.nn.max_pool(x, ksize=2, strides=2, padding="SAME") # b x 13 x 13 x 64

    x = tf.nn.conv2d(input=x, filters=self.w2, strides=[1, 1, 1, 1], padding='VALID') # b x 11 x 11 x 32
    x = tf.nn.bias_add(x, self.b2)
    x = tf.nn.relu(x)
    x = tf.nn.max_pool(x, ksize=2, strides=2, padding="SAME") # b x 6 x 6 x 32

    x = tf.reshape(x, [x.shape[0], 1, -1]) # b x 1 x 1152
    x = tf.linalg.matmul(x,self.w3) # b x 1 x 128
    x = tf.nn.relu(x)

    x = tf.linalg.matmul(x,self.w4) # b x 1 x 10
    x = tf.reshape(x, [x.shape[0], -1]) # b x 10
    # Return raw logits: tf.nn.softmax_cross_entropy_with_logits in the training
    # loop applies softmax itself, and argmax over logits picks the same class.
    return x
  
  def get_var(self):
    return [self.w1, self.w2, self.w3, self.w4, self.b1, self.b2]

# Create an instance of the model
model = MyModel()
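
The first dimension of `w3` (1152) follows from the spatial sizes after each convolution and pooling step. The sketch below traces that arithmetic using TensorFlow's documented output-size rules for `'VALID'` and `'SAME'` padding; `out_size` is a hypothetical helper written for this illustration.

```python
import math

def out_size(n, kernel, stride, padding):
    """Output length along one spatial axis (hypothetical helper)."""
    if padding == "VALID":
        return math.ceil((n - kernel + 1) / stride)
    if padding == "SAME":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding}")

# (name, kernel, stride, padding) for each spatial operation in MyModel.call
layers = [
    ("conv1", 3, 1, "VALID"),
    ("pool1", 2, 2, "SAME"),
    ("conv2", 3, 1, "VALID"),
    ("pool2", 2, 2, "SAME"),
]

size = 28            # Fashion MNIST images are 28 x 28
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name}: {size} x {size}")

flat = size * size * 32   # 32 output channels after conv2
print(f"flattened features: {flat}")
```

The trace gives 26, 13, 11, and 6, so the flattened vector has 6 x 6 x 32 = 1152 entries.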

In [None]:
max_epoch = 10
batch_size = 32
num_train, num_test = x_train.shape[0], x_test.shape[0]
max_step_train = num_train // batch_size
max_step_test = int(np.ceil(num_test / batch_size))  # include the final partial batch
lr = 0.0001
# optimizer = tf.keras.optimizers.Adam()
for epoch in range(max_epoch):
  # train
  for step in tqdm(range(max_step_train)):
    batch = slice(step*batch_size, (step+1)*batch_size)
    with tf.GradientTape() as tape:
      # The tape records the forward pass so that gradients can be computed afterwards.
      predictions = model.call(x_train[batch, ...])
      # loss holds one value per example; tape.gradient differentiates their sum.
      loss = tf.nn.softmax_cross_entropy_with_logits(labels=y_train_one_hot[batch, ...], logits=predictions)
    gradients = tape.gradient(loss, model.get_var())
    # optimizer.apply_gradients(zip(gradients, model.get_var()))
    # Plain SGD update: var <- var - lr * grad
    for var, grad in zip(model.get_var(), gradients):
      var.assign_sub(lr*grad)
  
  # test
  num_correct = 0
  for step in range(max_step_test):
    batch = slice(step*batch_size, (step+1)*batch_size)
    predictions = model.call(x_test[batch, ...])
    num_correct += np.sum(np.argmax(predictions,axis=1) == y_test[batch, ...])
  test_accuracy = num_correct/num_test
  print(f'test accuracy: {test_accuracy}')
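
`tf.nn.softmax_cross_entropy_with_logits` applies softmax internally, so it expects raw scores (logits) rather than probabilities. The NumPy sketch below, with made-up numbers, shows how passing already-normalized probabilities compresses the loss. `softmax` and `cross_entropy_from_logits` are hypothetical helpers written for this illustration, not TensorFlow functions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_from_logits(labels, logits):
    """Softmax followed by cross-entropy, computed stably via log-sum-exp."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(labels * log_probs).sum(axis=-1)

logits = np.array([[8.0, 0.0, 0.0]])   # a confident, correct prediction
labels = np.array([[1.0, 0.0, 0.0]])

loss_from_logits = cross_entropy_from_logits(labels, logits)
# Feeding probabilities where logits are expected applies softmax twice
loss_from_probs = cross_entropy_from_logits(labels, softmax(logits))
print(loss_from_logits, loss_from_probs)
```

With three classes, the doubly normalized loss cannot drop below about 0.55 even for a perfect prediction, because the inputs are confined to the range [0, 1]. This weakens the gradient signal that training relies on.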

## Questions

1.  Why do we need to randomly initialize weights?

2.  Print out the training accuracy and loss during training (hint: use a moving average).

3.  What can you do to get higher accuracy?


