<a href="https://colab.research.google.com/github/hkbu-kennycheng/comp3065/blob/main/lab5_Tensorflow_(Keras)_for_deep_learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sequential model in Keras

In this lab, you will train an image classification model and a text classification model. Along the way, you will learn the `Sequential` API of `Keras`.

## What is Keras?

[Keras](https://keras.io/) is a Python library that provides an Application Programming Interface (API) for developing deep learning applications and algorithms.


### Installation
If you are running this notebook on your local machine, you will need to install TensorFlow using either `pip`, or `conda` for Anaconda and Miniconda. See the [TensorFlow installation guide](https://www.tensorflow.org/install) for details.

After installation, you should be able to `import` it in Python code. We usually `import tensorflow as tf` for brevity.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## What is a Sequential model? When to use a Sequential model?

A Sequential model in Keras is a plain stack of layers, where each layer has exactly one input tensor and one output tensor.


### Example

Here is an example: the first model that we are going to build in this lab session. It accepts a single input (a single-channel image) and produces a single output (one score per label).

```
 +---------+    +-------+    +-------+
 | Flatten | -> | Dense | -> | Dense |
 +---------+    +-------+    +-------+
   Layer 1       layer 2      layer 3 
```

You will see details of each layer later in this lab session.

In [None]:
# Define Sequential model with 3 layers
model = keras.Sequential(
  [
    layers.Flatten(input_shape=(28, 28), name="layer1"),
    layers.Dense(128, activation="relu", name="layer2"),
    layers.Dense(10, name="layer3")
  ]
)


In [None]:
# showing layers
model.layers

[<tensorflow.python.keras.layers.core.Flatten at 0x29031ce20>,
 <tensorflow.python.keras.layers.core.Dense at 0x2902c7fd0>,
 <tensorflow.python.keras.layers.core.Dense at 0x2902c7e50>]

Although the illustration above is drawn horizontally, a Sequential model in Keras behaves like a stack of layers and supports stack operations such as `pop` and `add`. Let's try them out.

In [None]:
# Remove last layer
model.pop()

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
layer1 (Flatten)             (None, 784)               0         
_________________________________________________________________
layer2 (Dense)               (None, 128)               100480    
Total params: 100,480
Trainable params: 100,480
Non-trainable params: 0
_________________________________________________________________


In [None]:
# Another syntax for adding layer to the model
model.add(layers.Dense(10, name="layer3"))

In [None]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
layer1 (Flatten)             (None, 784)               0         
_________________________________________________________________
layer2 (Dense)               (None, 128)               100480    
_________________________________________________________________
layer3 (Dense)               (None, 10)                1290      
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________


## When not to use Sequential model?

A Sequential model is not appropriate when:

- Your model has multiple inputs or multiple outputs
- Any of your layers has multiple inputs or multiple outputs
- You need to do layer sharing
- You want non-linear topology (e.g. a residual connection, a multi-branch model)

For the scenarios above, consider using the Keras Functional API to build a more complex model.
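As a rough illustration of the last case in the list, the sketch below uses the Functional API to add a residual connection, something a `Sequential` stack cannot express. The layer sizes and the names `hidden`, `branch`, `merged` and `functional_model` are assumptions chosen for illustration; this model is not used elsewhere in the lab.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative only: a small model with one residual (skip) connection.
inputs = keras.Input(shape=(784,), name="pixels")
hidden = layers.Dense(128, activation="relu")(inputs)
branch = layers.Dense(128, activation="relu")(hidden)
merged = layers.Add()([hidden, branch])   # two tensors flow into one layer
outputs = layers.Dense(10)(merged)

functional_model = keras.Model(inputs=inputs, outputs=outputs)
print(functional_model.output_shape)
```

The `Add` layer takes two inputs, which is exactly the situation the list above rules out for `Sequential`.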

# Task 1: Building a model for image classification


## Dataset: [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist)

![](https://url2img-web.herokuapp.com/aHR0cHM6Ly9naXRodWIuY29tL3phbGFuZG9yZXNlYXJjaC9mYXNoaW9uLW1uaXN0I2Zhc2hpb24tbW5pc3Q=?w=1280&h=800)

### Why Fashion MNIST?

The original [MNIST dataset](http://yann.lecun.com/exdb/mnist/) is a dataset of handwritten digits. Because of its simplicity, the machine learning community often uses MNIST to validate algorithms at the very beginning of development. However, it is too easy for today's algorithms and does not represent modern computer vision tasks.

`Fashion MNIST` serves the same purpose as the original MNIST dataset and can be used as a drop-in replacement.

### Import dataset from keras

`Fashion MNIST` is already included in `Keras` under the `tensorflow.keras.datasets` namespace.

In [None]:
fashion_mnist = tf.keras.datasets.fashion_mnist

# split training set and testing set
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


### Explore the data and preprocessing

Each item in the dataset is a 28x28 NumPy array of `uint8` values representing single-channel (grayscale) pixel intensities, ranging from 0 (black) to 255 (white).

In [None]:
print(train_images.shape) # 60000 images of 28x28 pixels
print(train_images[0]) # 2D array of pixel values for the first image

# from google.colab.patches import cv2_imshow
# cv2_imshow(train_images[0]) # showing actual image with imshow

(60000, 28, 28)
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   1   0   0  13  73   0
    0   1   4   0   0   0   0   1   1   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   3   0  36 136 127  62
   54   0   0   0   1   3   4   0   0   3]
 [  0   0   0   0   0   0   0   0   0   0   0   0   6   0 102 204 176 134
  144 123  23   0   0   0   0  12  10   0]
 [  0   0   0   0   0   0   0   0   0   0   0   0   0   0 155 236 207 178
  107 156 161 109  64  23  77 130  72  15]
 [  0   0   0   0   0   0   0   0   0   0   0   1   0  69 207 223 218 216
  216 163 127 121 122 146 141  88 172  66]
 [  0   0   0   0   0   0   0   0   0   1   1   

The first label value is `9`, meaning that the pixel values shown above depict an Ankle boot.

In [None]:
train_labels # The integer values represent the 10 labels listed below.

array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)

|value|name|
|-----|----|
|0|T-shirt/top|
|1|Trouser|
|2|Pullover|
|3|Dress|
|4|Coat|
|5|Sandal|
|6|Shirt|
|7|Sneaker|
|8|Bag|
|9|Ankle boot|

In [None]:
# Define a list of the corresponding class names to show a human-readable value.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
print(class_names[train_labels[0]]) # printing out the class name of first label

Ankle boot


Before we feed data to our neural network model, we need to scale the pixel values from the range `0 - 255 (integer)` to `0 - 1 (float)`. Neural networks generally train more reliably on small floating-point inputs.

In [None]:
train_images = train_images / 255.0
test_images = test_images / 255.0
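To see what the division does, here is a minimal NumPy sketch on a made-up 2x2 `uint8` array (the array `fake_image` is an assumption standing in for a real image). Dividing by `255.0` promotes the integers to floating point and maps the range 0-255 onto 0-1.

```python
import numpy as np

# A tiny stand-in for an image; real images are 28x28
fake_image = np.array([[0, 64], [128, 255]], dtype=np.uint8)
scaled = fake_image / 255.0

print(fake_image.dtype, "->", scaled.dtype)
print(scaled.min(), scaled.max())
```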

## Build the model

After data preparation, it's time to start building our model. In the introduction to the `Sequential` API above, we have already built the following model by stacking different layers.

```
 +---------+    +-------+    +-------+
 | Flatten | -> | Dense | -> | Dense |
 +---------+    +-------+    +-------+
   Layer 1       layer 2      layer 3 
```

To build the model, we define the layer stack using the `Sequential` API and then compile it.

In [None]:
model = keras.Sequential(
  [
    layers.Flatten(input_shape=(28, 28), name="layer1"),
    layers.Dense(128, activation="relu", name="layer2"),
    layers.Dense(10, name="layer3")
  ]
)

### Layers

- The `Flatten` layer reformats the input data from a 2D array `(28, 28)` to a 1D array `(784)`. It has no parameters to learn.

- A `Dense` layer is a fully connected neural network layer. The first `Dense` layer consists of 128 nodes (or neurons), which gives `784 pixels * 128 nodes + 128 biases = 100480 parameters` to learn. The second `Dense` layer has 10 nodes with `128 * 10 + 10 = 1290` parameters and returns a [logits](https://developers.google.com/machine-learning/glossary#logits) array of length 10. Logits are raw, unnormalized scores for the 10 classes, not probabilities; they are converted to probabilities with softmax later in this lab.

There are many other layer types, such as convolutional and recurrent layers. You can read more in the [layers API](https://keras.io/api/layers/) documentation.
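The parameter counts listed above can be reproduced with a small NumPy sketch of the same three layers. This is not how Keras implements them; the random weights and the helper names `flatten` and `dense` are assumptions used only to show the shapes and counts.

```python
import numpy as np

rng = np.random.default_rng(0)

def flatten(batch):
    # (batch, 28, 28) -> (batch, 784); nothing to learn here
    return batch.reshape(batch.shape[0], -1)

def dense(x, weights, biases, relu=False):
    # Fully connected layer: every input feeds every node
    out = x @ weights + biases
    return np.maximum(out, 0.0) if relu else out

# Randomly initialised parameters with the same shapes as layer2 and layer3
w2, b2 = rng.normal(size=(784, 128)), np.zeros(128)
w3, b3 = rng.normal(size=(128, 10)), np.zeros(10)

images = rng.random((4, 28, 28))  # 4 made-up images
logits = dense(dense(flatten(images), w2, b2, relu=True), w3, b3)

print(logits.shape)  # one row of 10 scores per image
print(w2.size + b2.size, w3.size + b3.size)
```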


In [None]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
layer1 (Flatten)             (None, 784)               0         
_________________________________________________________________
layer2 (Dense)               (None, 128)               100480    
_________________________________________________________________
layer3 (Dense)               (None, 10)                1290      
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________


### Compile the model and specify the optimizer, loss and metrics


- Loss function: This measures how far the model's predictions are from the true labels during training. You want to minimize this function to "steer" the model in the right direction.
  - The following example uses `SparseCategoricalCrossentropy`. It's a loss function for two or more label classes, where the labels are given as integers.
- Optimizer: This is how the model is updated based on the data it sees and its loss function.
  - [Adam](https://arxiv.org/abs/1412.6980) optimization is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments.
- Metrics: Used to monitor the training and testing steps.
  - The following example uses accuracy, the fraction of the images that are correctly classified.
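As a sketch of what the loss computes, the NumPy function below evaluates sparse categorical cross-entropy directly from logits for integer labels. The function name `sparse_cce_from_logits` and the sample values are assumptions for illustration; the Keras implementation differs in detail.

```python
import numpy as np

def sparse_cce_from_logits(logits, labels):
    # Numerically stable log-softmax: subtract the row maximum first
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])  # integer class ids, not one-hot vectors
print(sparse_cce_from_logits(logits, labels))
```

A lower value means the true class received a higher predicted probability.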


In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

## Train the model

### Monitoring progress with TensorBoard

In [None]:
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="logs")

%load_ext tensorboard
%tensorboard --logdir logs


### Feeding data to model

To feed data to the model, invoke `model.fit` with the training data, the labels and the number of epochs. `epochs` defines the number of times that the learning algorithm will work through the entire training dataset.
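When no `batch_size` is given, `model.fit` and `model.evaluate` use batches of 32 samples, so the progress bar counts batches rather than images. The short calculation below shows where those counts come from; the helper name `steps_per_epoch` is an assumption for illustration.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    # The last batch may be smaller, hence the ceiling
    return math.ceil(num_samples / batch_size)

print(steps_per_epoch(60000))  # training set
print(steps_per_epoch(10000))  # test set
```

These correspond to the `1875` and `313` step counts in this notebook's training and evaluation output.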



In [None]:
model.fit(train_images, train_labels, epochs=10, callbacks=[tensorboard_callback])

Epoch 1/10
  25/1875 [..............................] - ETA: 14s - loss: 1.5188 - accuracy: 0.4537

Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x2919b29a0>

### Evaluate accuracy

To evaluate accuracy, simply invoke `model.evaluate` with the testing data and labels. `verbose` controls how much progress output is shown: `0` is silent, `1` shows a progress bar, and `2` prints a single summary line with the number of batches, time used, loss and accuracy.


In [None]:
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

313/313 - 1s - loss: 0.3462 - accuracy: 0.8808


## Make prediction using the trained model

### Convert logits to probabilities

With the model trained, you can use it to make predictions about some images. The model outputs linear scores called logits. Attach a softmax layer to convert the logits to probabilities, which are easier to interpret.

In [None]:
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
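For intuition, the softmax function itself can be sketched in a few lines of NumPy. The helper name `softmax` and the sample logits are assumptions; in the notebook the `Softmax` layer does this work.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged
    exps = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
    return exps / exps.sum(axis=-1, keepdims=True)

sample_logits = np.array([2.0, 1.0, 0.1])
probs = softmax(sample_logits)
print(probs, probs.sum())
```

The outputs are all positive, sum to 1, and keep the same ordering as the logits.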


A prediction is an array of 10 numbers. They represent the model's "confidence" that the image corresponds to each of the 10 different articles of clothing. 

In [None]:
predictions = probability_model.predict(test_images)
predictions[0]

array([8.5523095e-07, 3.5839862e-10, 1.8969330e-10, 5.5467264e-10,
       6.0188734e-09, 2.2098024e-03, 6.8269486e-07, 6.3465713e-03,
       1.2584621e-06, 9.9144089e-01], dtype=float32)
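To turn such an array into a label, take the index of the largest value. The sketch below copies the probabilities printed above into a NumPy array and reuses the class names defined earlier in this lab; the variable names `first_prediction` and `predicted_index` are illustrative.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Values copied from the prediction output for the first test image
first_prediction = np.array(
    [8.5523095e-07, 3.5839862e-10, 1.8969330e-10, 5.5467264e-10,
     6.0188734e-09, 2.2098024e-03, 6.8269486e-07, 6.3465713e-03,
     1.2584621e-06, 9.9144089e-01], dtype=np.float32)

predicted_index = int(np.argmax(first_prediction))
print(predicted_index, class_names[predicted_index])
```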

# Task 2: Building a text classification model for sentiment analysis


## Dataset: IMDB Large Movie Review Dataset

![](https://url2img-web.herokuapp.com/aHR0cHM6Ly9haS5zdGFuZm9yZC5lZHUvfmFtYWFzL2RhdGEvc2VudGltZW50Lw==)

This dataset consists of 50,000 labeled movie reviews from [IMDB](https://www.imdb.com/), split into 25,000 reviews for training and 25,000 for testing. Each review is labeled by sentiment, positive or negative.

A copy of this dataset is also included in Keras and can be loaded by invoking `tf.keras.datasets.imdb.load_data()`. In that copy, the content of each review is already encoded as a list of word indexes.

### Loading the dataset using `tensorflow-datasets`


In [None]:
!pip install tensorflow-datasets


In [None]:
import tensorflow_datasets as tfds

train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews", 
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)
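The `split` argument slices the 25,000 training reviews by percentage. Assuming the slices are taken in order without overlap, the resulting sizes work out as follows (the variable names are illustrative):

```python
num_train_reviews = 25000

train_size = num_train_reviews * 60 // 100        # 'train[:60%]'
validation_size = num_train_reviews - train_size  # 'train[60%:]'

print(train_size, validation_size)
```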



## Explore the data

Let's take a look at the first 10 examples. Label `1` means positive, and `0` means negative.

In [None]:
print(next(iter(train_data.batch(10))))

(<tf.Tensor: shape=(10,), dtype=string, numpy=
array([b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.",
       b'I have been known to fall asleep during films, but this is usually due to a combination of things including, really tired, being warm and comfortable on the sette and having just eaten a lot. However on this occasion I fell



## Data preprocessing

Before training, the input text of a text classification model needs to go through three steps (standardize, tokenize, and vectorize) that convert sentences or words into vectors of numbers. Here is an example of the steps.

```
             +---------------------------------------------+
Raw data     | This was an<br />absolutely terrible movie. |
             +---------------------------------------------+
                                   |
               +------------------------------------------+
Standardized   |  This was an absolutely terrible movie.  |
               +------------------------------------------+
                  /     /    |       |          \       \
             +----+ +---+ +--+ +----------+ +--------+ +-----+
Tokens       |This| |was| |an| |absolutely| |terrible| |movie|
             +----+ +---+ +--+ +----------+ +--------+ +-----+
                  \   |     |       |            |     /
                   +----------------------------------+
vector             |       (7, 21, 1, 64, 89, 5)      |
                   +----------------------------------+
```


### Data cleaning and standardization

As you can see in the data above, some reviews contain `<br />` (the HTML line-break tag). We need to remove this kind of noise before feeding the text into our model. Here we define a `custom_standardization` function that accepts a string input, lowercases it, strips the `<br />` tags and punctuation, and returns the result as a tensor.

In [None]:
import re
import string

def custom_standardization(input_data):
  lowercase = tf.strings.lower(input_data)
  stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')
  return tf.strings.regex_replace(stripped_html,
                                  '[%s]' % re.escape(string.punctuation),
                                  '')

Let's try with the example data

In [None]:
custom_standardization('This was an<br />absolutely terrible movie.')

<tf.Tensor: shape=(), dtype=string, numpy=b'this was an absolutely terrible movie'>

### Vectorization

In [None]:
max_features = 10000
sequence_length = 250

vectorize_layer = layers.experimental.preprocessing.TextVectorization(
    standardize=custom_standardization,
    max_tokens=max_features,
    output_mode='int',
    output_sequence_length=sequence_length)

Call `adapt` so that `vectorize_layer` builds its vocabulary from the words in `train_data`.

In [None]:
text_data = train_data.map(lambda x, y: x)
vectorize_layer.adapt(text_data)
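Conceptually, `adapt` counts the words in the training text and assigns each of the most frequent ones an integer index; the layer then maps every review to a fixed-length list of those indexes. The pure-Python sketch below imitates that idea on two made-up sentences. The helper names `build_vocab` and `vectorize` are assumptions, and the real layer differs in details such as tie-breaking.

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    # Reserve index 0 for padding and index 1 for unknown words
    counts = Counter(word for text in texts for word in text.split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return {word: index for index, word in enumerate(most_common, start=2)}

def vectorize(text, vocab, sequence_length):
    ids = [vocab.get(word, 1) for word in text.split()]
    ids = ids[:sequence_length]                      # truncate long reviews
    return ids + [0] * (sequence_length - len(ids))  # pad short reviews

texts = ["this movie was great", "this movie was terrible terrible"]
vocab = build_vocab(texts, max_tokens=10)
print(vectorize("this movie was awful", vocab, sequence_length=6))
```

The word `awful` never appeared in the made-up training text, so it maps to the unknown index `1`, and the remaining positions are padded with `0`.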

## Build the model

Here is a simple model for this text classification problem.

```
+-----------------+    +-----------+    +----------------------+    +-------+
|TextVectorization| -> | Embedding | -> |GlobalAveragePooling1D| -> | Dense |
+-----------------+    +-----------+    +----------------------+    +-------+
```

In [None]:
embedding_dim = 16

model = tf.keras.Sequential([
  vectorize_layer,
  layers.Embedding(max_features + 1, embedding_dim),
  layers.GlobalAveragePooling1D(),
  layers.Dense(1)
])
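To make the shapes concrete, here is a NumPy sketch of what happens after vectorization: an embedding lookup, an average over the sequence axis, and a single-output dense layer. The random weights and the tiny batch are assumptions; only the sizes (`max_features + 1`, `embedding_dim`, `sequence_length`) come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
max_features, embedding_dim, sequence_length = 10000, 16, 250

embedding_table = rng.normal(size=(max_features + 1, embedding_dim))
dense_weights, dense_bias = rng.normal(size=(embedding_dim, 1)), np.zeros(1)

# Two made-up reviews, already vectorized to word indexes
token_ids = rng.integers(0, max_features, size=(2, sequence_length))

embedded = embedding_table[token_ids]         # (2, 250, 16): one vector per word
pooled = embedded.mean(axis=1)                # (2, 16): average over the words
logits = pooled @ dense_weights + dense_bias  # (2, 1): one score per review

print(embedded.shape, pooled.shape, logits.shape)
print(embedding_table.size + dense_weights.size + dense_bias.size)
```

Averaging over the sequence axis is what lets the model handle the whole review with a fixed-size vector, regardless of word order.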

## Define loss function and optimizer

In [None]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=tf.metrics.BinaryAccuracy(threshold=0.0))
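Because the final `Dense(1)` layer outputs a logit rather than a probability, the accuracy threshold is set on the logit scale: a logit of `0.0` corresponds to a probability of `0.5`. The NumPy sketch below illustrates this along with binary cross-entropy computed from logits; the function names and sample values are assumptions, not the Keras implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy_from_logits(logits, labels):
    # Stable form of -[y*log(p) + (1-y)*log(1-p)] with p = sigmoid(logits)
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

logits = np.array([2.0, -1.0, 0.3, -0.2])
labels = np.array([1.0, 0.0, 0.0, 1.0])

predicted = (logits > 0.0).astype(float)  # same as sigmoid(logits) > 0.5
print(sigmoid(0.0))
print(binary_crossentropy_from_logits(logits, labels))
print((predicted == labels).mean())
```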

## Train the model


In [None]:
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=validation_data.batch(512),
                    verbose=1)

Epoch 1/10



Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Evaluate the model

Again, we could calculate loss and accuracy of the model by passing in test data to `model.evaluate`.

In [None]:
loss, accuracy = model.evaluate(test_data.batch(512))

print("Loss: ", loss)
print("Accuracy: ", accuracy)

Loss:  0.6239250302314758
Accuracy:  0.7565600275993347
