<a href="https://colab.research.google.com/github/nyp-sit/mindef-ai/blob/main/day1-pm/video-anomaly-v2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://www.nyp.edu.sg/content/dam/nyp/logo.png" width="238" height="70"/>

Welcome to the lab! Before we get started, here are a few pointers on Jupyter notebooks.

1. The notebook is composed of cells; cells can contain code which you can run, or they can hold text and/or images which are there for you to read.

2. You can execute code cells by clicking the ```Run``` icon in the menu, or via the following keyboard shortcuts ```Shift-Enter``` (run and advance) or ```Ctrl-Enter``` (run and stay in the current cell).

3. To interrupt cell execution, click the ```Stop``` button on the toolbar or navigate to the ```Kernel``` menu and select ```Interrupt```.


# Video Anomaly Detection 
                                                             
<center>
    <img src="https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/resources/video-anomaly.png" height="400" width="400" style="vertical-align:middle;margin:10px 20px"/></center>
                

In this lab, we will build an anomaly detector that can help to detect unusual activities in video frames. We will make use of two sets of videos: one set of videos contains only normal pedestrian traffic, and another set of videos contains anomalous activities, such as someone riding a bicycle or a car moving through the scene. We will train an autoencoder model to learn what is normal pedestrian traffic and then use it to detect unusual activities.


## Import libraries

We begin by importing the libraries that we need: mainly TensorFlow (the framework we use to build the autoencoder neural network) and some utility libraries that help us draw and display the images.

In [None]:
import tensorflow as tf
import os
from utils import *
from IPython.display import display
from IPython.display import Image as ipyImage
from dataset_util import prepare_dataset


In [None]:
# Uncomment the line below to fix the libtiff bug on Windows
# !conda install libtiff=4.1.0=h885aae3_4 -c conda-forge -y

## Dataset 

**UCSD Anomaly Detection Dataset**

The UCSD Anomaly Detection Dataset is a set of video frames from a stationary camera overlooking pedestrian walkways. The crowd density in the walkways was variable, ranging from sparse to very crowded. In the normal setting, the video contains only pedestrians. Abnormal events include bikers, skaters, small carts, and people walking across a walkway or in the grass that surrounds it. The data was split into 2 subsets, each corresponding to a different scene. The video footage recorded from each scene was split into various clips of around 200 frames:

- *Peds1*: scenes of people walking towards and away from the camera, and some amount of perspective distortion. Contains 34 training video samples and 36 testing video samples.

- *Peds2*: scenes with pedestrian movement parallel to the camera plane. Contains 16 training video samples and 12 testing video samples.

In this lab, we will use the *Peds1* dataset. In the next lab, you will experiment with the *Peds2* dataset.

**Note:** The [original dataset](http://www.svcl.ucsd.edu/projects/anomaly/UCSD_Anomaly_Dataset.tar.gz) hosted by the University of California, San Diego (UCSD) contains some corrupted TIF image frames, which cause errors when loaded by the Python image library. The dataset you will be using is a version we have cleaned up to exclude those corrupted images. Keep this in mind if you intend to use the dataset downloaded directly from the UCSD website.

### Download the Dataset

Run the cells below to download the dataset. After the dataset is downloaded, it will be unzipped to the directory called video_dataset. You can see the `video_dataset` in the file browser in this Jupyter lab IDE. The `video_dataset` directory contains two sub-folders: *UCSDped1* and *UCSDped2*. 

In [None]:
base_dataset_dir = 'video_dataset'
datafile_url  = 'https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/datasets/UCSD_Anomaly_Dataset.v1p2.zip'
download_data(base_dataset_dir, datafile_url, extract=True, force=False)

We will use the UCSDped1 dataset here. The dataset is split into two subsets: one for training, and one for testing. 
The training data (34 video clips) are in the `Train` subfolder. Each clip is in its own subfolder (`Train001`, `Train002`, etc.), and each of these subfolders contains 200 image frames.

In the code below, we are just setting up all the filepaths to be used later.

In [None]:
# For now we use the UCSDped1 dataset.
dataset = 'UCSDped1'

# set up the relative paths
root_path = os.path.join(base_dataset_dir, dataset)
train_dir = os.path.join(root_path, 'Train')
test_dir = os.path.join(root_path, 'Test')

### Visualize the Train dataset

Our training set contains only video scenes that are 'normal'. Let's look at a few samples. 

You can change the variable `train_sample_folder` to another folder, e.g. Train010, Train020, etc. (the folders run from Train001 to Train034).

The variable `image_range = (1,9)` allows you to view images from 1 to 8 (the right-hand number in the bracket is excluded). Feel free to change the range to view other images.
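
The convention described above (a range of `(1, 9)` showing images 1 to 8) matches Python's built-in `range`. The sketch below assumes `display_images` treats `image_range` the same way; its implementation lives in `utils` and is not shown in this notebook, and `frames_in_range` is a hypothetical helper used only for illustration.

```python
# Hypothetical helper (not part of utils): lists the frame numbers that a
# (start, stop) tuple selects when the stop value is excluded.
def frames_in_range(image_range):
    start, stop = image_range
    return list(range(start, stop))  # stop itself is not included


print(frames_in_range((1, 9)))
```

To view, say, frames 50 to 57, you would set `image_range = (50, 58)` under the same assumption.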

In [None]:
# You can change the following train_sample_folder to another folder to view other clips
train_sample_folder = 'Train034' 
image_range = (1,9)  # this displays images 1 to 8
image_folder = os.path.join(train_dir, train_sample_folder)
display_images(image_folder, image_range=image_range, max_per_row=4)

#### Visualize as 'Video'

We will convert the image frames to a 'video' (actually an animated GIF) for easier viewing. The video consists of 200 frames. In the left navigation panel, you will see that a GIF file called `<train_sample_folder_name>.gif` has been created, e.g. `Train034.gif`.

In [None]:
gif_filename = train_sample_folder + '.gif' 
create_gif(image_folder, gif_filename, img_type='tif')

Now we will play the video.

In [None]:
with open(gif_filename,'rb') as file:
    display(ipyImage(file.read(), format='png'))

### Visualize the Test dataset

Let us visualize the video frames from the test dataset folder Test001 as an animated GIF. You should be able to see an anomalous event (e.g. someone riding a bicycle) in the animated GIF you create.

In [None]:
# set the test sample folder to Test001 and set image_folder accordingly
test_sample_folder = 'Test001' 
image_folder = os.path.join(test_dir, test_sample_folder) 

# use a separate file name so that the training GIF is not overwritten
gif_filename = test_sample_folder + '.gif'
create_gif(image_folder, gif_filename, img_type='tif')

with open(gif_filename,'rb') as file:
    display(ipyImage(file.read(), format='png'))

### Prepare Training and Validation Dataset

Now we create a TensorFlow dataset suitable for training the autoencoder network later. In preparing the dataset, we resize all the images to the same height `(IMG_HEIGHT)` and width `(IMG_WIDTH)`. Typically we set the height equal to the width (a square image) for training, even though the original image may not be square. A deep learning network works with either square or rectangular images.

We also split the data into a training set (80%) and a validation set (20%). We use the validation set to check whether we are overfitting the model to the training data.

In [None]:
IMG_HEIGHT=100
IMG_WIDTH=100
BATCH_SIZE=16

train_fileset = os.path.join(train_dir, '*/*.tif')

train_dataset, validation_dataset = prepare_dataset(train_fileset,
                                img_height=IMG_HEIGHT, 
                                img_width=IMG_WIDTH, 
                                batch_size=BATCH_SIZE,
                                shuffle=True,
                                split=True,
                                test_size=0.2)

In [None]:
# We have a total of 34 x 200 = 6800 images. 
# 80% allocated to train set = (0.8 * 6800)/16 = 340 batches
# 20% allocated to validation set = (0.2 * 6800)/16 = 85

print('Number of batches of train images = {}'.format(len(list(train_dataset))))
print('Number of batches of validation images = {}'.format(len(list(validation_dataset))))
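
The batch counts in the comments above can be checked with a few lines of arithmetic. This is a minimal sketch using the figures quoted in this lab (34 clips of 200 frames, an 80/20 split, batch size 16). It assumes that a final, partially filled batch is kept rather than dropped; if the cleaned dataset excludes some frames, the numbers printed by the cell above will differ slightly.

```python
import math

# Figures taken from the lab text.
num_clips = 34
frames_per_clip = 200
batch_size = 16
validation_fraction = 0.2

total_images = num_clips * frames_per_clip
num_validation = round(total_images * validation_fraction)
num_train = total_images - num_validation

# Assumption: a partially filled last batch still counts as a batch, hence ceil.
train_batches = math.ceil(num_train / batch_size)
validation_batches = math.ceil(num_validation / batch_size)

print(total_images, train_batches, validation_batches)  # 6800 340 85
```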

### Building the Autoencoder Model


![autoencoder](https://nyp-aicourse.s3-ap-southeast-1.amazonaws.com/resources/autoencoder.png)


We will first build the encoder network. In the lecture, we learnt that an autoencoder learns a latent representation of the data by having a bottleneck layer, so that it is forced to capture only the most important features that allow it to reconstruct the input. 
You can see that as we move deeper into the encoder, the number of neurons typically gets smaller and is smallest at the 'latent' layer.

In [None]:
# The encoder part of the autoencoder model
inputs = tf.keras.layers.Input(shape=(100,100,1))
x = tf.keras.layers.Conv2D(32, kernel_size=5, activation='relu')(inputs)
x = tf.keras.layers.MaxPool2D(pool_size=2)(x)
x = tf.keras.layers.Conv2D(32, kernel_size=5, activation='relu')(x)
x = tf.keras.layers.MaxPool2D(pool_size=2)(x)
x = tf.keras.layers.Flatten()(x)
encoded = tf.keras.layers.Dense(2000)(x)
encoder = tf.keras.Model(inputs=[inputs], outputs=[encoded])


Let's print out the model's summary so that we can see the number of outputs (think of these as the number of neurons) at each layer. As you will observe, our input layer has an input shape of 100x100x1 (note: the last number is the number of channels, and since we are dealing with grayscale images, there is only 1 channel). 

After the first convolutional + max-pooling block, the output shape becomes 48x48x32. The second convolutional + max-pooling block reduces this to 22x22x32 (15,488 values), and the latent layer reduces it further to 2000. This means the network is forced to capture the most important 'latent' information in the training data in a mere 2000 neurons. The information captured in this latent layer will then be used by the decoder to reconstruct the original image.
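
The encoder shapes quoted above follow from two simple rules. The sketch below assumes the Keras defaults used in the encoder cell (`Conv2D` with stride 1 and `'valid'` padding, `MaxPool2D` with stride equal to the pool size); the helper names `conv_valid` and `max_pool` are made up for illustration.

```python
def conv_valid(size, kernel):
    """Output size of a stride-1 convolution with 'valid' (no) padding."""
    return size - kernel + 1


def max_pool(size, pool):
    """Output size of non-overlapping max pooling (stride equal to pool size)."""
    return size // pool


size = 100                  # input height (and width)
size = conv_valid(size, 5)  # 96
size = max_pool(size, 2)    # 48
after_block1 = size
size = conv_valid(size, 5)  # 44
size = max_pool(size, 2)    # 22
after_block2 = size

# 32 feature maps of 22x22 are flattened before the Dense(2000) latent layer.
flattened = after_block2 * after_block2 * 32

print(after_block1, after_block2, flattened)  # 48 22 15488
```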

In [None]:
encoder.summary()
#tf.keras.utils.plot_model(encoder)

Now we will build the decoder part of the network.

In [None]:
# The decoder part of the autoencoder model

# shape must be a tuple, hence the trailing comma
decoder_inputs = tf.keras.layers.Input(shape=(2000,))
x = tf.keras.layers.Dense(22*22*32, activation='relu')(decoder_inputs)
x = tf.keras.layers.Reshape(target_shape=(22,22,32))(x)
x = tf.keras.layers.UpSampling2D(2, interpolation='nearest')(x)
x = tf.keras.layers.Conv2DTranspose(32, kernel_size=5, activation='relu')(x)
x = tf.keras.layers.UpSampling2D(2, interpolation='nearest')(x)
decoded = tf.keras.layers.Conv2DTranspose(1, kernel_size=5, activation='sigmoid')(x)
decoder = tf.keras.Model(inputs=[decoder_inputs], outputs=[decoded])

Similarly, we print out the summary of the decoder so that we can see the output shape of each decoder layer.
In contrast to the encoder, the number of neurons increases as we progress towards the output layer. Starting from 2000 units in the latent layer, we use upsampling and transposed convolution to increase the number of output neurons until we get back to the original size of the input image, i.e. 100x100x1.
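
The decoder's path back to 100x100 can be traced with the same kind of arithmetic. This sketch assumes the Keras defaults used in the decoder cell (`Conv2DTranspose` with stride 1 and `'valid'` padding); the helper names are made up for illustration.

```python
def upsample(size, factor):
    """Output size of UpSampling2D: each pixel is repeated `factor` times."""
    return size * factor


def conv_transpose_valid(size, kernel):
    """Output size of a stride-1 transposed convolution with 'valid' padding."""
    return size + kernel - 1


size = 22                             # after Dense + Reshape to 22x22x32
size = upsample(size, 2)              # 44
size = conv_transpose_valid(size, 5)  # 48
size = upsample(size, 2)              # 96
size = conv_transpose_valid(size, 5)  # 100
output_size = size

print(output_size)  # 100
```

Each decoder step mirrors an encoder step in reverse, which is why the output ends up the same size as the input.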

In [None]:
decoder.summary()
#tf.keras.utils.plot_model(decoder)

Now we stack the encoder and decoder to become the complete autoencoder network.

In [None]:
encoding = encoder(inputs)
decoding = decoder(encoding)
conv_ae = tf.keras.Model(inputs=[inputs], outputs=[decoding])

In [None]:
conv_ae.summary()

In the code below, we specify Mean Squared Error (MSE) as our loss function. The network computes the squared difference between each pixel of the original image and the reconstructed image, averages these values, and uses the resulting loss to adjust the weights so that the loss is minimised. The MSE is given by the equation below:

$$\frac{1}{m}\sum_{i=1}^m (\hat{y}_{i} - y_i)^2$$


where $\hat{y}$ is the predicted output and $y$ is the actual value. 
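
As a small worked example of the formula, the sketch below computes the MSE for four made-up pixel values. The function name and the numbers are illustrative only; during training, Keras computes this for us over whole batches of images.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


original = [0.0, 0.5, 1.0, 0.25]       # made-up pixel intensities
reconstructed = [0.1, 0.5, 0.8, 0.25]  # made-up reconstruction

# Squared errors are 0.01, 0, 0.04 and 0, so the mean is 0.05 / 4 = 0.0125
# (up to floating-point rounding).
mse = mean_squared_error(original, reconstructed)
print(round(mse, 4))
```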



In [None]:
conv_ae.compile(loss=tf.keras.losses.MeanSquaredError(), 
        # `lr` is a deprecated alias; `learning_rate` is the current argument name
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, decay=1e-4),
        metrics=['mae'])


Let the training begin! This might take a while, so *sit back, relax and wait!*

In [None]:
num_epochs = 30
 
run_logdir = get_run_logdir() # e.g., './my_logs/run_2019_06_07-15_15_22'
tensorboard_cb = tf.keras.callbacks.TensorBoard(run_logdir)
history = conv_ae.fit(train_dataset, 
                  validation_data=validation_dataset,
                  epochs=num_epochs, 
                  callbacks=[tensorboard_cb])


Let's plot the training and validation loss to see how our network progresses during training.

In [None]:
plot_training_loss(history.history)

Here we are just setting up the paths to a sample image from the train set and the test set respectively. The train sample shows a 'normal' scene, while the test sample shows an 'anomalous' scene.

In [None]:
sample_train_image = os.path.join(train_dir, 'Train001/001.tif')
sample_test_image = os.path.join(test_dir, 'Test024/140.tif')

Now we will take a 'normal' image from the train set, and see how well the autoencoder reconstructs it. We will plot the original image on the left and the reconstructed image on the right. 

Here you can see that the reconstruction (right) closely matches the original image (left).

In [None]:
show_reconstructions(conv_ae, sample_train_image)

Let's look at the 'abnormal' image from the test set where a cart can be seen driving through the walkway.  

Since the cart is something that the autoencoder has never seen before, it fails to reconstruct it properly.

In [None]:
show_reconstructions(conv_ae, sample_test_image)

Next, let us look at how the reconstruction loss varies from one video frame to the next. You should see the loss start to spike once an anomalous object such as the cart enters the scene and continue to rise until the object leaves the scene, at which point the loss drops suddenly.

### Prepare Testing Dataset

Now we will create a test dataset that we can use to test the trained auto-encoder. 

**Exercise**:

You can change the following `test_sample_folder` to the video folder you want to test. For now, let's just use Test014.

In [None]:
BATCH_SIZE=1

test_sample_folder = 'Test014'

test_fileset = os.path.join(test_dir, test_sample_folder, "*.tif")

test_dataset = prepare_dataset(test_fileset,
                                img_height=IMG_HEIGHT, 
                                img_width=IMG_WIDTH, 
                                batch_size=BATCH_SIZE,
                                shuffle=False)
print(len(list(test_dataset)))

#### Reconstruction loss over different video frames

The following code runs all the video frames from the test folder through the autoencoder, computes the reconstruction loss for each frame, and displays the loss frame by frame.
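
To make the idea concrete, here is a minimal sketch of per-frame reconstruction loss on toy data. It is not the implementation of `create_losses_animation` (which lives in `utils` and is not shown here); the function `frame_loss` and the pixel values are made up, and the frames are tiny flat lists rather than real images.

```python
def frame_loss(original, reconstructed):
    """MSE between two frames given as flat lists of pixel values."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


# Toy data: three 4-pixel "frames" and their hypothetical reconstructions.
frames = [
    [0.2, 0.2, 0.2, 0.2],
    [0.2, 0.9, 0.9, 0.2],  # an unfamiliar bright object appears
    [0.2, 0.2, 0.2, 0.2],
]
reconstructions = [
    [0.2, 0.2, 0.2, 0.2],
    [0.2, 0.3, 0.3, 0.2],  # the model fails to reproduce the object
    [0.2, 0.2, 0.2, 0.2],
]

losses = [frame_loss(f, r) for f, r in zip(frames, reconstructions)]

# Index of the frame with the highest loss, i.e. the most likely anomaly.
worst_frame = max(range(len(losses)), key=losses.__getitem__)
print(worst_frame)  # 1
```

Plotting `losses` against the frame index gives the kind of curve shown in the animation, with a peak wherever the reconstruction breaks down.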

In [None]:
create_losses_animation(conv_ae, test_dataset, "losses.gif")
with open('losses.gif','rb') as file:
    display(ipyImage(file.read(), format='png'))

#### Identification of anomalous object from the video frames

The function `identify_anomaly()` computes the per-pixel difference between each original frame and its reconstruction (for a total of 200 frames). It then aggregates the differences over patches of 4x4 pixels. If the difference for a patch is above a certain threshold, the function marks that patch in red to signify that an anomalous object has been detected within it.

The 200 frames will be displayed as animated gif to better visualize the changes over time.
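
The patch-based idea can be sketched in a few lines. This is not the actual `identify_anomaly()` implementation, which is not shown in this notebook: the function `anomalous_patches` is hypothetical, and summing absolute pixel differences within each patch is an assumption made for illustration.

```python
def anomalous_patches(original, reconstructed, patch=4, threshold=4.0):
    """Return the (row, col) top-left corners of patches whose summed
    absolute reconstruction error exceeds the threshold."""
    height, width = len(original), len(original[0])
    flagged = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            error = sum(
                abs(original[r][c] - reconstructed[r][c])
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            if error > threshold:
                flagged.append((top, left))
    return flagged


# Toy 8x8 frame: the bottom-right 4x4 patch holds an object that the
# (hypothetical) reconstruction misses completely.
original = [[0.0] * 8 for _ in range(8)]
reconstructed = [[0.0] * 8 for _ in range(8)]
for r in range(4, 8):
    for c in range(4, 8):
        original[r][c] = 1.0

print(anomalous_patches(original, reconstructed))  # [(4, 4)]
```

In this toy example the flagged patch has a summed error of 16, so raising the threshold above 16 would leave no patch flagged. This is the trade-off between sensitivity and false alarms that the threshold controls.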

**Exercise**

Try changing the threshold below to adjust how readily a patch is classified as anomalous: a lower threshold flags more patches, while a higher threshold flags fewer.

In [None]:
threshold = 4.0
identify_anomaly(conv_ae, test_dataset, "video.gif", threshold)
with open('video.gif','rb') as file:
    display(ipyImage(file.read(), format='png'))