<a href="https://colab.research.google.com/github/stephenbaek/dlcourse/blob/main/assignments/a2_mnist.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

# Assignment 2

In this assignment, you will implement a perceptron classifier for handwritten digits in TensorFlow by answering the following questions.

### Background

The [Modified NIST (MNIST) Database](http://yann.lecun.com/exdb/mnist/) is a database of 70,000 handwritten digits, which is a subset of a larger collection available from the [National Institute of Standards and Technology (NIST)](https://www.nist.gov/srd/nist-special-database-19). The MNIST dataset consists of 60,000 training images and 10,000 test images, each of which is size-normalized to 28-by-28 pixels (see https://en.wikipedia.org/wiki/MNIST_database for more).

TensorFlow includes an MNIST loader in the [`tf.keras.datasets`](https://www.tensorflow.org/api_docs/python/tf/keras/datasets) module, which handles downloading and parsing the MNIST images and labels:

In [None]:
import tensorflow as tf
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

print('train images shape:', train_images.shape)
print('train labels:', len(train_labels))
print(train_labels)
print('test images shape:', test_images.shape)
print('test labels:', len(test_labels))
print(test_labels)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])
plt.show()

### Question 1.

Use `tf.data.Dataset.from_tensor_slices()` to create `tf.data.Dataset` objects named `train_image_ds`, `train_label_ds`, `test_image_ds`, and `test_label_ds` from the MNIST dataset above. (Hint: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices)

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 2.

Combine `train_image_ds` and `train_label_ds` into `train_ds` using `tf.data.Dataset.zip()` (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#zip). Repeat the same for `test_image_ds` and `test_label_ds` to build `test_ds`.

In [None]:
# PROVIDE YOUR ANSWER HERE

### Checkpoint 1.
If your answers to the two questions above are correct, you should be able to run the following code to visualize the first 25 training images:

In [None]:
plt.figure(figsize=(10,10))
for i, (image, label) in enumerate(train_ds.take(25)):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image, cmap=plt.cm.binary)
    plt.xlabel(label.numpy())
plt.show()

### Question 3.

Like many other image datasets, MNIST stores pixel values as integers ranging from 0 to 255. Write a function named `normalize()` that rescales the pixel values to the range $[0, 1]$.

Hint:
- The `normalize` function must take two arguments, `image` and `label`, and return the pair `(image, label)`.
- Note that `image` is originally of data type `tf.uint8` (8-bit unsigned integers). It must be cast to `tf.float32` before it is normalized.
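The rescaling itself is a division by 255, the largest value an 8-bit unsigned integer can hold. The plain-Python sketch below applies it to a hypothetical 2-by-2 "image" (not MNIST data, and not the TensorFlow answer) to show that the results land in $[0, 1]$:

```python
# Hypothetical 2x2 "image" with 8-bit pixel values (0..255).
pixels = [[0, 64], [128, 255]]

# Divide every pixel by 255.0 so that all values fall in [0, 1].
normalized = [[p / 255.0 for p in row] for row in pixels]

# The darkest pixel maps to 0.0 and the brightest to 1.0.
print(min(min(r) for r in normalized), max(max(r) for r in normalized))  # → 0.0 1.0
```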

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 4.

The raw MNIST data comes with integer labels from 0 to 9. Implement a function named `one_hot()` that converts these integer labels into probability vectors using one-hot encoding.

Hint:
- The `one_hot` function must take two arguments, `image` and `label`, and return the pair `(image, label)`.
- You can either write your own code or use `tf.one_hot` (https://www.tensorflow.org/api_docs/python/tf/one_hot).
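One-hot encoding turns an integer class index into a vector that is 1 at that index and 0 everywhere else, i.e. a probability distribution that puts all of its mass on the true class. A minimal plain-Python sketch, with a hypothetical helper `one_hot_list()` (not part of TensorFlow):

```python
def one_hot_list(label, num_classes=10):
    """Return a list of length num_classes with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is out of range [0, {num_classes})")
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(one_hot_list(3))  # → [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

The vector sums to 1, which is why the assertions in Checkpoint 2 expect each batch of 32 one-hot labels to sum to 32.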

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 5.
Images in the MNIST dataset are size-normalized to rank-2 tensors of shape `(28, 28)`. Later on, we will need to "flatten" them into rank-1 tensors of length $784 = 28 \times 28$. Implement a function named `flatten()` that performs this operation.

Hint:
- The `flatten` function must take two arguments, `image` and `label`, and return the pair `(image, label)`.
- `tf.reshape` function is available in TensorFlow (https://www.tensorflow.org/api_docs/python/tf/reshape).
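Flattening keeps the pixels in row-major order: the 28 rows are concatenated one after another, so pixel $(r, c)$ lands at index $28r + c$ of the flattened vector, which is the element order `tf.reshape` preserves. The plain-Python sketch below checks this index mapping on a hypothetical image whose "pixels" are their own `(row, col)` coordinates:

```python
ROWS, COLS = 28, 28

# Hypothetical image whose "pixel values" record their own (row, col) position.
image = [[(r, c) for c in range(COLS)] for r in range(ROWS)]

# Row-major flattening: concatenate the rows one after another.
flat = [pixel for row in image for pixel in row]

print(len(flat))              # → 784
print(flat[1 * COLS + 5])     # pixel (r, c) sits at index r * COLS + c → (1, 5)
```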

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 6.

Apply the `normalize()`, `one_hot()`, and `flatten()` functions to `train_ds` using `tf.data.Dataset.map()` (https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map). Repeat the same for `test_ds`. Reuse the names `train_ds` and `test_ds` for the transformed datasets.

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 7.

Randomly shuffle the elements of `train_ds` and divide them into batches of size 32. Divide the elements of `test_ds` into batches of the same size. You don't need to shuffle `test_ds`. Hint: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#shuffle and https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch
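Batching groups consecutive elements into chunks of 32, and the last chunk is smaller whenever the dataset size is not a multiple of 32. The plain-Python sketch below uses a hypothetical helper `shuffle_and_batch()` (an in-memory stand-in, not the `tf.data` implementation) to work out the batch counts for the MNIST split sizes. Unlike this full shuffle, `tf.data.Dataset.shuffle` samples from a buffer of `buffer_size` elements, so it only shuffles uniformly when the buffer is at least as large as the dataset.

```python
import random

def shuffle_and_batch(items, batch_size, shuffle=True, seed=0):
    """Hypothetical helper: return a list of batches (lists).

    The final batch is smaller when len(items) is not a multiple of batch_size.
    """
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)  # full in-memory shuffle
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# MNIST split sizes: 60,000 training and 10,000 test examples, batch size 32.
train_batches = shuffle_and_batch(range(60_000), 32)
test_batches = shuffle_and_batch(range(10_000), 32, shuffle=False)

print(len(train_batches), len(train_batches[-1]))  # → 1875 32
print(len(test_batches), len(test_batches[-1]))    # → 313 16
```

`Dataset.batch` likewise keeps the final partial batch by default (`drop_remainder=False`), so the last batch of `test_ds` holds 16 images rather than 32.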

In [None]:
# PROVIDE YOUR ANSWER HERE

### Checkpoint 2.

If your answers to the questions above are correct, the following code should print a congratulations message.

In [None]:
for ds in [train_ds, test_ds]:
    for image_batch, label_batch in ds.shuffle(1024).take(10):   # testing only randomly selected batches to save time
        n = image_batch.shape[0]   # 32, except for the final partial batch of test_ds (10,000 is not a multiple of 32)
        assert n <= 32
        assert image_batch.shape[1:] == (784,)
        assert tf.reduce_max(image_batch) <= 1
        assert tf.reduce_min(image_batch) >= 0
        assert label_batch.shape == (n, 10)
        assert tf.reduce_sum(label_batch) == n
        assert tf.reduce_max(label_batch) == 1
        assert tf.reduce_min(label_batch) == 0
print("Congratulations! You passed the second checkpoint.")

### Question 8.

A perceptron model can be written as $f(X \mid W, b) = \sigma(XW + b)$, where $W$ and $b$ are model parameters and $\sigma$ is the sigmoid function. In our case, $X$ is a flattened image of dimension 784 (or a batch of such images, one per row). Given this, what must the rank and dimensions of the tensors $W$ and $b$ be? Write code to initialize `W` and `b` with the appropriate shapes.

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 9.
Now define a function named `perceptron()` that takes a tensor `X` of shape `(-1, 784)` (a batch of flattened images) as input and returns $\sigma(XW + b)$ as output. The output of `perceptron()` is interpreted as class probabilities: each entry is the predicted probability that the corresponding image in `X` belongs to the corresponding class. Hint: the sigmoid function $\sigma$ is available in TensorFlow as `tf.sigmoid()`.
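To make the shapes in $\sigma(XW + b)$ concrete without using the MNIST dimensions, the plain-Python sketch below evaluates a perceptron with hypothetical toy sizes: $N = 2$ inputs, $D = 4$ features, and $K = 3$ classes. `X` is $N \times D$, `W` is $D \times K$, `b` has length $K$ and is added to every row, and the result is $N \times K$. With all-zero parameters every output is $\sigma(0) = 0.5$, which also shows that sigmoid outputs lie in $(0, 1)$ individually but need not sum to 1 across classes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_forward(X, W, b):
    """Compute sigmoid(X @ W + b) with X: N x D, W: D x K, b: length K (nested lists)."""
    D, K = len(W), len(b)
    out = []
    for x_i in X:
        assert len(x_i) == D, "each input row must have D features"
        row = []
        for k in range(K):
            z = sum(x_i[d] * W[d][k] for d in range(D)) + b[k]  # b is broadcast over rows
            row.append(sigmoid(z))
        out.append(row)
    return out

# Hypothetical toy sizes: N=2 inputs, D=4 features, K=3 classes.
X = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5]]
W = [[0.0] * 3 for _ in range(4)]  # all-zero weights, shape 4 x 3
b = [0.0, 0.0, 0.0]
f = perceptron_forward(X, W, b)
print(f)  # → [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
```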

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 10.

Define a function `accuracy()` that takes a one-hot encoded ground truth `y` and a perceptron prediction `f` as inputs and returns the fraction of predictions that are correct. Because both `y` and `f` hold per-class probabilities, `tf.argmax()` along the class axis recovers the true and predicted labels. The two sets of labels can be compared with `tf.equal` or the `==` operator. Finally, you are expected to return the average correctness, for which `tf.reduce_mean()` is useful once the boolean comparison results are cast to `tf.float32`.
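The argmax, compare, and average recipe can be traced by hand on a few hypothetical rows. The plain-Python sketch below (helper names `argmax` and `accuracy_list` are illustrative, not TensorFlow functions) counts a row as correct when the largest predicted score sits at the same index as the 1 in the one-hot label:

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda k: values[k])

def accuracy_list(y, f):
    """Fraction of rows where argmax(y_i) equals argmax(f_i)."""
    correct = [argmax(y_i) == argmax(f_i) for y_i, f_i in zip(y, f)]
    return sum(correct) / len(correct)  # True counts as 1, False as 0

y = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]    # one-hot ground truth
f = [[0.1, 0.8, 0.3], [0.6, 0.2, 0.1],              # hypothetical predicted scores
     [0.7, 0.1, 0.4], [0.2, 0.9, 0.1]]
print(accuracy_list(y, f))  # → 0.75 (the third row is misclassified)
```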

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 11.
Define a function `cross_entropy()` that takes the same inputs as the `accuracy()` function above but returns the average cross entropy over the batch. Do not use TensorFlow's built-in cross-entropy functions; implement it from scratch.
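As a reference for the arithmetic, the plain-Python sketch below assumes the categorical form $H(y, f) = -\sum_k y_k \log f_k$, averaged over the rows of the batch. Because the perceptron uses a sigmoid rather than a softmax, a per-class binary cross entropy would be another defensible choice; this sketch does not settle which one the assignment intends. The hypothetical helper `cross_entropy_list()` clips predictions to $[\epsilon, 1]$ (the value $\epsilon = 10^{-7}$ is an assumption) so that $\log 0$ never occurs.

```python
import math

def cross_entropy_list(y, f, eps=1e-7):
    """Average categorical cross entropy -sum_k y_k * log(f_k) over the rows of y and f.

    Predictions are clipped to [eps, 1] (eps is an assumed constant) to avoid log(0).
    """
    total = 0.0
    for y_i, f_i in zip(y, f):
        total -= sum(y_ik * math.log(min(max(f_ik, eps), 1.0))
                     for y_ik, f_ik in zip(y_i, f_i))
    return total / len(y)

y = [[0, 1, 0], [1, 0, 0]]                 # one-hot labels
f = [[0.2, 0.7, 0.1], [0.5, 0.3, 0.2]]     # hypothetical predictions
# Only the true-class terms survive: -(log 0.7 + log 0.5) / 2
print(round(cross_entropy_list(y, f), 4))  # → 0.5249
```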

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 12.

Define a function named `gradient_descent()` that updates `W` and `b` by one gradient-descent step, using the gradients of the `cross_entropy()` loss above with respect to `W` and `b`.

Note:
- `gradient_descent()` must take three inputs: `X`, `y`, and `learning_rate`.
- `X` is a tensor containing a flattened image or a batch of flattened images.
- `y` is a tensor containing the one-hot encoded labels corresponding to `X`.
- `learning_rate` is a floating-point value.
- `gradient_descent()` should return two values, `loss` and `acc`: the cross entropy and the accuracy, respectively, computed on `X` and `y`.
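A single update step is $W \leftarrow W - \eta\,\partial L/\partial W$ and $b \leftarrow b - \eta\,\partial L/\partial b$, where $\eta$ is the learning rate. In TensorFlow the gradients would typically come from `tf.GradientTape` rather than being derived by hand. The plain-Python sketch below is a deliberately simpler stand-in, not the 10-class perceptron: a single sigmoid unit trained with binary cross entropy, for which $\partial L/\partial z = \sigma(z) - y$. The toy data and learning rate are hypothetical; the point is that one step on the loss's gradient lowers the loss.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(X, y, w, b):
    """Mean binary cross entropy of a single sigmoid unit over the batch."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        p = sigmoid(sum(wd * xd for wd, xd in zip(w, x_i)) + b)
        total -= y_i * math.log(p) + (1 - y_i) * math.log(1 - p)
    return total / len(X)

def gd_step(X, y, w, b, learning_rate):
    """One gradient-descent step; for sigmoid + BCE, dL/dz = sigmoid(z) - y."""
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x_i, y_i in zip(X, y):
        err = sigmoid(sum(wd * xd for wd, xd in zip(w, x_i)) + b) - y_i
        for d, xd in enumerate(x_i):
            grad_w[d] += err * xd / n   # average gradient over the batch
        grad_b += err / n
    new_w = [wd - learning_rate * gd for wd, gd in zip(w, grad_w)]
    new_b = b - learning_rate * grad_b
    return new_w, new_b

# Hypothetical toy batch: 3 examples with 2 features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1, 0, 1]
w, b = [0.0, 0.0], 0.0
before = bce_loss(X, y, w, b)        # equals log(2), since every prediction is 0.5
w, b = gd_step(X, y, w, b, learning_rate=0.5)
after = bce_loss(X, y, w, b)
print(before > after)  # → True
```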

In [None]:
# PROVIDE YOUR ANSWER HERE

### Question 13.

Implement a function `evaluate()` that takes a batched dataset `ds` and returns the average loss and the average accuracy of the perceptron across all of `ds`, as tensors (the training code below calls `.numpy()` on them).
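When batches have different sizes, as the final partial batch of `test_ds` does, averaging the per-batch means directly gives the small batch too much weight. The training loop in Checkpoint 3 avoids this by multiplying each batch's loss and accuracy by `X.shape[0]`. The plain-Python sketch below, with a hypothetical helper `weighted_average()` and made-up per-batch accuracies, shows the difference:

```python
def weighted_average(batch_values, batch_sizes):
    """Average per-batch means, weighting each by the number of examples in the batch."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

# Hypothetical per-batch accuracies: two full batches of 32 and a final partial batch of 16.
accs = [1.0, 0.75, 0.5]
sizes = [32, 32, 16]
print(weighted_average(accs, sizes))  # → 0.8  (64 correct out of 80 examples)
print(sum(accs) / len(accs))          # → 0.75 (unweighted mean over-weights the small batch)
```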

In [None]:
# PROVIDE YOUR ANSWER HERE

### Checkpoint 3 (Final)

If you implemented everything above successfully, you should be able to run the training code below.

In [None]:
!pip install -q tqdm
from tqdm import tqdm # Progress bar
import time

learning_rate = 1e-3
for epoch in range(10):
    cnt = 0
    total_acc = 0.0
    total_loss = 0.0
    for X, y in tqdm(train_ds, desc=f'Epoch {epoch:03d}'):
        loss, acc = gradient_descent(X, y, learning_rate)
        total_acc += acc*X.shape[0]
        total_loss += loss*X.shape[0]
        cnt += X.shape[0]
    print('Train loss:', total_loss.numpy() / cnt, '- acc:', total_acc.numpy() / cnt)
    test_loss, test_acc = evaluate(test_ds)
    print('Test loss:', test_loss.numpy(), '- acc:', test_acc.numpy())
    time.sleep(1) # sleep one second to flush out the progress bar


### Bonus Question

What accuracy did your perceptron achieve? How does it compare to random guessing (an expected accuracy of 1/10)? Implement a multilayer perceptron (MLP) to see whether it improves on the perceptron.

In [None]:
# PROVIDE YOUR ANSWER HERE