<a href="https://colab.research.google.com/github/qidopox/Deep_Learning_for_image_processing_practice/blob/main/DL_practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Deep Learning for image processing practice

Convolutional neural networks (CNNs) are commonly used for image processing. Common applications include image classification, segmentation and denoising. In this practice, we will first train neural networks to classify handwritten digits (MNIST dataset). We will then train two further neural networks for two separate tasks: image segmentation and denoising.

## Exercise 1: Train a neural network for classifying handwritten numbers

In this exercise, we will use the MNIST database (Modified National Institute of Standards and Technology database), a collection of handwritten digits. For details of the dataset, please see https://paperswithcode.com/dataset/mnist

This exercise can be run on a CPU. As free Colab accounts have limited GPU runtime, I recommend doing this exercise on a CPU.

To change the runtime type, select *Runtime* --> *Change runtime type* from the menu bar at the top left.

We first import the relevant Python packages. We will use TensorFlow and Keras to build and train neural networks, the matplotlib package to plot figures, the datetime package to access the date and time (used here to name the log folders), and NumPy for array operations.

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt
import datetime
import numpy as np


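The MNIST dataset is included in *keras.datasets* and can therefore be loaded directly from Keras.

The images are stored as uint8 integers in the range 0 to 255. Neural network layers in TensorFlow work with float32 inputs, so we convert the images to float32 and divide by 255 so that the pixel values fall in the range [0, 1].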
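As a quick illustration of this conversion, the NumPy sketch below applies the same scaling to a few example pixel values. The values are made up, and NumPy stands in for the `tf.cast(...) / 255` step used in the cell below.

```python
import numpy as np

# Hypothetical uint8 pixel values, as stored in the MNIST images (0 = black, 255 = white).
pixels_uint8 = np.array([0, 51, 128, 255], dtype=np.uint8)

# Cast to float32 first, then scale so that every value lies in [0, 1].
pixels_float = pixels_uint8.astype(np.float32) / 255
print(pixels_float)
```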

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = tf.cast(x_train, tf.float32) / 255
x_test = tf.cast(x_test, tf.float32) / 255

Print the sizes of the training and test datasets. There are 60,000 examples in the training dataset and 10,000 in the test dataset, and each input image is 28 by 28 pixels.

In [None]:
print('training data input shape:',x_train.shape,'\n training data label shape',y_train.shape,'\n test data input shape:',x_test.shape,'\n test data label shape:',y_test.shape)

Plot an example from the training dataset and check that its corresponding label is correct.

In [None]:
i = 0
plt.imshow(x_train[i,:,:])
plt.show()
print('label of the figure is ', y_train[i])

Construct a fully connected neural network whose input has the same size as the images and whose output has 10 units, corresponding to the 10 different digits 0-9.

In [None]:
# Flatten each 28x28 image into a 784-element vector, then apply two dense layers.
inputs = keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
x = keras.layers.Flatten()(inputs)
x = keras.layers.Dense(64, activation=tf.nn.relu)(x)
# Softmax turns the 10 outputs into probabilities, one per digit.
outputs = keras.layers.Dense(10, activation=tf.nn.softmax)(x)
model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_classification_fully_connected')

We can print out a summary of the neural network.

In [None]:
model.summary()
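The parameter counts reported by `model.summary()` can be checked by hand: a Dense layer with n inputs and m units has n x m weights plus m biases, and Flatten has no parameters. A small sketch of this arithmetic for the network above (`dense_params` is a hypothetical helper, not a Keras function):

```python
def dense_params(n_inputs, n_units):
    # Each unit has one weight per input plus one bias.
    return n_inputs * n_units + n_units

flatten_size = 28 * 28                 # Flatten turns each 28x28 image into 784 values
hidden = dense_params(flatten_size, 64)
output = dense_params(64, 10)
total = hidden + output
print(hidden, output, total)           # 50240 650 50890
```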

We select "Adam" as the optimisation algorithm for training the neural network. The loss function is "SparseCategoricalCrossentropy", which compares the predicted class probabilities with integer labels (0-9). We also use TensorBoard to monitor the training process.

For details of "Adam", please see https://arxiv.org/abs/1412.6980

For details of "SparseCategoricalCrossentropy", please see https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
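To make the loss concrete, here is a small NumPy sketch (not the Keras implementation) of sparse categorical cross-entropy. The labels are integer class indices, and the loss for each sample is minus the log of the probability the network assigns to the true class. The logits and labels are made-up example values.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs):
    # labels are integer class indices, not one-hot vectors.
    # Select the predicted probability of the true class for each sample.
    p_true = probs[np.arange(len(labels)), labels]
    return -np.log(p_true)

# Hypothetical raw network outputs (logits) for 2 samples and 3 classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])
labels = np.array([0, 1])

probs = softmax(logits)                  # what a softmax output layer returns
loss = sparse_categorical_crossentropy(labels, probs)
print(loss, loss.mean())
```

This is also why the `from_logits` argument matters: with `from_logits=True` the loss applies the softmax itself and therefore expects raw logits, whereas a model whose last layer already uses softmax outputs probabilities and should be paired with `from_logits=False`.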

In [None]:
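model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              # The output layer already applies softmax, so the loss receives
              # probabilities rather than logits.
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])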
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

We pass x_train as the training inputs and y_train as the training labels, and train for 10 epochs. x_test and y_test are used as the validation inputs and labels.

By running this cell, you will start to train your neural network.

In [None]:
model.fit(
    x=x_train,y=y_train,
    epochs=10,
    validation_data=(x_test,y_test),
    callbacks=[tensorboard_callback],
)
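The progress bar printed by `fit()` counts steps (batches), not images. Because no `batch_size` is passed to `fit()` above, Keras uses its default batch size of 32, so the number of steps per epoch follows from simple arithmetic:

```python
import math

n_train = 60000     # x_train.shape[0]
batch_size = 32     # Keras default when batch_size is not passed to fit()
epochs = 10

steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch, steps_per_epoch * epochs)   # 1875 18750
```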

Open TensorBoard to monitor the network training.

In [None]:
%load_ext tensorboard
%tensorboard --logdir logs/fit

The trained model is saved in the folder "models" as an HDF5 (.h5) file.

In [None]:
import os
os.makedirs('./models', exist_ok=True)  # make sure the output folder exists
model.save('./models/'+model.name+'_.h5')

Load the trained model and use it to classify a handwritten digit from the test dataset.

In [None]:
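model = keras.models.load_model('./models/'+model.name+'_.h5', compile=False)
i = 1
# Add a batch dimension: the model expects inputs of shape (batch, 28, 28).
x_sample = np.asarray(x_test[i, :, :]).reshape([1, 28, 28])
y_pred = np.argmax(model.predict(x_sample))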

print('label:',y_test[i],'\n prediction:',y_pred)

## Exercise 2: Train a convolutional neural network on the MNIST dataset for classifying handwritten numbers

This exercise can be run on a CPU.

Let's now construct a CNN for the same task. Please compare the architectures and the performances of the two types of networks.

In [None]:
inputs = keras.Input(shape=(x_train.shape[1], x_train.shape[2],1))
x = keras.layers.Conv2D(4,(3, 3),activation=tf.nn.relu,
                  kernel_initializer="glorot_uniform",
                  padding="same",name="conv1",)(inputs)
x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name="pool1")(x)
x = keras.layers.Conv2D(8,(3, 3),activation=tf.nn.relu,
                  kernel_initializer="glorot_uniform",
                  padding="same",name="conv2",)(x)
x = keras.layers.MaxPooling2D((2, 2), strides=(2, 2), name="pool2")(x)
x = keras.layers.Flatten()(x)
x = keras.layers.Dense(16, activation=tf.nn.relu)(x)
outputs = keras.layers.Dense(10, activation=tf.nn.softmax)(x)
model_CNN = keras.Model(inputs=inputs, outputs=outputs,name='mnist_classification_CNN')


In [None]:
model_CNN.summary()
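To compare the two architectures, the shapes and parameter counts in `model_CNN.summary()` can be reproduced by hand. The sketch below assumes, as in the code above, 3x3 kernels with `padding="same"` (which preserves height and width) and 2x2 max pooling with stride 2 (which halves them); `conv_params` and `dense_params` are hypothetical helpers.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel_h x kernel_w x in_channels weight tensor per filter, plus one bias each.
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(n_inputs, n_units):
    # One weight per input per unit, plus one bias per unit.
    return n_inputs * n_units + n_units

h = w = 28
conv1 = conv_params(3, 3, 1, 4)    # 'same' padding: output stays 28x28x4
h, w = h // 2, w // 2              # pool1: 14x14x4
conv2 = conv_params(3, 3, 4, 8)    # output 14x14x8
h, w = h // 2, w // 2              # pool2: 7x7x8
flat = h * w * 8                   # Flatten: 392 values
dense1 = dense_params(flat, 16)
dense2 = dense_params(16, 10)
total_cnn = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, flat, dense1, dense2, total_cnn)  # 40 296 392 6288 170 6794
```

For comparison, the fully connected network of Exercise 1 has 784 x 64 + 64 + 64 x 10 + 10 = 50,890 parameters, roughly 7.5 times as many as this CNN.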

In [None]:
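model_CNN.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              # The output layer already applies softmax, so the loss receives
              # probabilities rather than logits.
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])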
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

In [None]:
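model_CNN.fit(
    # Add a channel axis: the CNN expects inputs of shape (batch, 28, 28, 1).
    x=x_train[..., tf.newaxis], y=y_train,
    epochs=10,
    validation_data=(x_test[..., tf.newaxis], y_test),
    callbacks=[tensorboard_callback],
)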

Please look at the loss plots (epoch_loss) in TensorBoard. What are the differences between the two plots, and why do you think they differ?
If you repeat the process and retrain the networks, do you obtain the same plots in TensorBoard?

## Exercise 3: Train a U-Net for image segmentation through transfer learning

In this exercise, we use the Oxford-IIIT Pet Dataset (Parkhi et al., 2012), which consists of images of 37 pet breeds. Please see details here: https://www.robots.ox.ac.uk/%7Evgg/data/pets/

This exercise was adapted from the TensorFlow tutorial https://www.tensorflow.org/tutorials/images/segmentation

You may wish to switch your runtime to GPU for this exercise.



In [None]:
import tensorflow_datasets as tfds

If you are working in the same runtime as Exercises 1 and 2, you do not need to re-import the following packages.

In [None]:
import tensorflow as tf
import tensorflow.keras as keras
import matplotlib.pyplot as plt
import datetime

Load the Oxford-IIIT Pet Dataset. It may take a while.

In [None]:
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)

Print the data info.

In [None]:
print(info)

Prepare the data for batch training. We define two functions and one class here:

1. Normalise the data so that the pixel values of the images fall in the range [0, 1]. The pixels in the segmentation masks are labelled 1, 2 or 3 (pet, pet border, background). It is easier to work in Python if the labels start from 0, so we subtract 1 from every mask pixel to obtain the labels {0, 1, 2}.

2. Load the images. The images in the dataset come in different sizes, so we resize all input images and masks to 128 by 128. The masks are resized with nearest-neighbour interpolation so that every pixel keeps a valid integer label.

3. Data augmentation. We apply simple data augmentation by randomly flipping and rotating the images and masks. The same random transformation must be applied to an image and its mask so that they stay aligned.

Finally, we define the training batches and test batches that are fed into the network during training.

* Note for those less familiar with Python: *def* defines a function, a named block of code that performs a task. *class* defines a new type of object that bundles data (attributes) together with the functions that act on it (methods). Here *Augment* is a class because it needs to store its random augmentation layers between calls.
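Why do the masks need nearest-neighbour resizing? The 1D NumPy sketch below is a simplified stand-in for `tf.image.resize` (the helpers and the mask row are hypothetical) that upsamples the same row of mask labels in two ways. Linear interpolation, as used by bilinear resizing, creates an intermediate value where a pet pixel (0) meets a background pixel (2); that value equals the border label 1, so the resized mask would contain a class that was not there. Nearest-neighbour resizing only copies existing labels.

```python
import numpy as np

def resize_nearest_1d(labels, new_len):
    # Map each output position back to a source index and copy that label.
    idx = (np.arange(new_len) * len(labels) / new_len).astype(int)
    return labels[idx]

def resize_linear_1d(labels, new_len):
    # Linearly interpolate between neighbouring values, as bilinear resizing does.
    old_x = np.linspace(0.0, 1.0, len(labels))
    new_x = np.linspace(0.0, 1.0, new_len)
    return np.interp(new_x, old_x, labels.astype(float))

raw_row = np.array([1, 1, 3, 3])   # hypothetical mask row: pet, pet, background, background
row = raw_row - 1                  # shift labels to {0, 1, 2}, as normalize() does

nearest = resize_nearest_1d(row, 7)
linear = resize_linear_1d(row, 7)
print(nearest)   # [0 0 0 0 2 2 2]
print(linear)    # the middle value is 1.0, a label absent from the original row
```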

In [None]:
def normalize(input_image, input_mask):
  input_image = tf.cast(input_image, tf.float32) / 255.0
  input_mask -= 1
  return input_image, input_mask

def load_image(datapoint):
  input_image = tf.image.resize(datapoint['image'], (128, 128))
  input_mask = tf.image.resize(
    datapoint['segmentation_mask'],
    (128, 128),
    method = tf.image.ResizeMethod.NEAREST_NEIGHBOR,
  )

  input_image, input_mask = normalize(input_image, input_mask)

  return input_image, input_mask

TRAIN_LENGTH = info.splits['train'].num_examples
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE

train_images = dataset['train'].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
test_images = dataset['test'].map(load_image, num_parallel_calls=tf.data.AUTOTUNE)

class Augment(tf.keras.layers.Layer):
  def __init__(self, seed=(42,15,28)):
    super().__init__()
    # Chain the augmentations so that all of them are applied in turn.
    # Inputs and labels use the same seeds, so they receive the same random changes.
    self.augment_inputs = tf.keras.Sequential([
        tf.keras.layers.RandomFlip(mode="horizontal", seed=seed[0]),
        tf.keras.layers.RandomFlip(mode="vertical", seed=seed[1]),
        tf.keras.layers.RandomRotation(factor=0.5, seed=seed[2]),
    ])
    self.augment_labels = tf.keras.Sequential([
        tf.keras.layers.RandomFlip(mode="horizontal", seed=seed[0]),
        tf.keras.layers.RandomFlip(mode="vertical", seed=seed[1]),
        # Nearest-neighbour interpolation keeps the mask labels as integers.
        tf.keras.layers.RandomRotation(factor=0.5, seed=seed[2], interpolation="nearest"),
    ])

  def call(self, inputs, labels):
    inputs = self.augment_inputs(inputs)
    labels = self.augment_labels(labels)
    return inputs, labels

train_batches = (
    train_images
    .cache()
    .shuffle(BUFFER_SIZE)
    .batch(BATCH_SIZE)
    .repeat()
    .map(Augment())
    .prefetch(buffer_size=tf.data.AUTOTUNE))

test_batches = test_images.batch(BATCH_SIZE)
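The key requirement of `Augment` is that an image and its mask receive exactly the same random transformation. The NumPy sketch below shows the idea with a single random generator whose decisions are drawn once and applied to both arrays. It is a simplification: `augment_pair` is a hypothetical helper, not part of Keras, and it rotates only by multiples of 90 degrees, whereas `RandomRotation(factor=0.5)` rotates by arbitrary angles.

```python
import numpy as np

def augment_pair(image, mask, rng):
    # Draw each random decision ONCE and apply it to both image and mask,
    # so the mask stays aligned with the image pixels.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]      # horizontal flip
    if rng.random() < 0.5:
        image, mask = image[::-1, :], mask[::-1, :]      # vertical flip
    k = rng.integers(0, 4)                               # rotate by 0, 90, 180 or 270 degrees
    return np.rot90(image, k, axes=(0, 1)), np.rot90(mask, k, axes=(0, 1))

rng = np.random.default_rng(42)
image = np.arange(16, dtype=np.float32).reshape(4, 4, 1)   # toy single-channel image
mask = (image[..., 0] > 7).astype(np.int32)                 # toy mask derived from the image

# Over many random draws, the augmented mask always matches the augmented image.
all_aligned = True
for _ in range(20):
    aug_image, aug_mask = augment_pair(image, mask, rng)
    expected = (aug_image[..., 0] > 7).astype(np.int32)
    all_aligned = all_aligned and np.array_equal(expected, aug_mask)
print(all_aligned)   # True
```

If the image and mask were each given their own independent random draws, the flips and rotations would differ and the mask would no longer describe the image.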

Now let's take a look at the training input images and their corresponding masks.

In [None]:
def display(display_list):
  plt.figure(figsize=(15, 15))

  title = ['Input Image', 'True Mask', 'Predicted Mask']

  for i in range(len(display_list)):
    plt.subplot(1, len(display_list), i+1)
    plt.title(title[i])
    plt.imshow(tf.keras.utils.array_to_img(display_list[i]))
    plt.axis('off')
  plt.show()

for images, masks in train_batches.take(5):
  sample_image, sample_mask = images[0], masks[0]
  display([sample_image, sample_mask])

Let's now construct a U-Net for the image segmentation task.

U-Net consists of an encoder (down-sampler) and a decoder (up-sampler). The encoder extracts features from the image at progressively lower resolutions, and the decoder reconstructs a full-resolution output (here, the segmentation mask) from these features. Skip connections pass the encoder features at each resolution directly to the decoder. For the details of the U-Net, please see https://arxiv.org/abs/1505.04597

As the training dataset is limited, we use transfer learning to improve performance. We take the encoder from the Keras MobileNetV2 model, pretrained on ImageNet, and only train the decoder, which starts untrained. For details of MobileNetV2, please see https://arxiv.org/abs/1801.04381

* A question for you to consider: in transfer learning, why do we use an existing, fixed, trained encoder (the first half of the U-Net) and only train the decoder (the second half of the U-Net)? Why don't we do it the other way round (i.e. train the encoder and fix the decoder)?

Build a non-trainable encoder from MobileNetV2.

In [None]:
base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3], include_top=False)

# Use the activations of these layers
layer_names = [
    'block_1_expand_relu',   # 64x64
    'block_3_expand_relu',   # 32x32
    'block_6_expand_relu',   # 16x16
    'block_13_expand_relu',  # 8x8
    'block_16_project',      # 4x4
]
base_model_outputs = [base_model.get_layer(name).output for name in layer_names]

# Create the feature extraction model
down_stack = tf.keras.Model(inputs=base_model.input, outputs=base_model_outputs)

down_stack.trainable = False

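Define a trainable decoder block. The decoder consists of 4 of these mini decoder blocks; each one upsamples its input by a factor of 2 and merges it with the matching skip connection from the encoder.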

In [None]:
def DecoderMiniBlock(prev_layer_input, skip_layer_input, n_filters=32):
    up = keras.layers.Conv2DTranspose(
                 n_filters,
                 (3,3),
                 strides=(2,2),
                 padding='same')(prev_layer_input)
    merge = keras.layers.concatenate([up, skip_layer_input], axis=3)
    conv = keras.layers.Conv2D(n_filters,
                 3,
                 activation='relu',
                 padding='same',
                 kernel_initializer='HeNormal')(merge)
    conv = keras.layers.Conv2D(n_filters,
                 3,
                 activation='relu',
                 padding='same',
                 kernel_initializer='HeNormal')(conv)
    return conv



Finally, let's define our U-Net by putting the non-trainable encoder and the trainable decoder together.

In [None]:
def unet_model(output_channels:int):
  inputs = tf.keras.layers.Input(shape=[128, 128, 3])

  # Downsampling through the model
  skips = down_stack(inputs)
  x = skips[-1]
  skips = reversed(skips[:-1])

  # Upsampling and establishing the skip connections
  for skip in skips:
    x = DecoderMiniBlock(x,skip)

  # This is the last layer of the model. Notice that filters=output_channels (3): one output channel per mask label.
  last = keras.layers.Conv2DTranspose(
      filters=output_channels, kernel_size=3, strides=2,
      padding='same')  #64x64 -> 128x128

  x = last(x)

  return tf.keras.Model(inputs=inputs, outputs=x,name='Oxford-IIIT_Pet_UNet')
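The decoder only works if each upsampled feature map has the same height and width as the skip connection it is concatenated with. The sketch below traces the spatial sizes through `unet_model` using the sizes listed in the comments next to `layer_names`. It assumes, as in the code above, that every `Conv2DTranspose` uses `strides=2` and `padding='same'`, which multiplies height and width by 2; `conv_transpose_same_size` is a hypothetical helper.

```python
def conv_transpose_same_size(size, stride):
    # With padding='same', Conv2DTranspose multiplies the spatial size by the stride.
    return size * stride

encoder_sizes = [64, 32, 16, 8, 4]            # from the comments on layer_names
x = encoder_sizes[-1]                         # bottleneck: 4x4
skips = list(reversed(encoder_sizes[:-1]))    # [8, 16, 32, 64]

trace = [x]
for skip in skips:
    x = conv_transpose_same_size(x, 2)
    # concatenate() along the channel axis needs matching height and width.
    assert x == skip, f"upsampled size {x} does not match skip size {skip}"
    trace.append(x)

x = conv_transpose_same_size(x, 2)            # final layer: 64x64 -> 128x128
trace.append(x)
print(trace)   # [4, 8, 16, 32, 64, 128]
```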

In [None]:
OUTPUT_CLASSES = 3

model = unet_model(output_channels=OUTPUT_CLASSES)
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

Please print out the summary of the model. Do you notice the non-trainable parameters of this U-Net model? Where do these non-trainable parameters come from? Did you expect this?

In [None]:
model.summary()

Alternatively, you can use *keras.utils.plot_model()* to view the structure of this relatively complex network; some people find the diagram easier to read.

In [None]:
keras.utils.plot_model(model, show_shapes=True)

Let's first examine the predictions of the UNet before training the model.

The output of the U-Net has a size of 128 by 128 by 3, i.e. one score for each of the 3 classes at every pixel. To display the mask, we apply *argmax* along the last (channel) axis, which returns, for each pixel, the index of the channel with the highest score (0, 1 or 2).

Before training, the decoder weights are random, so the predicted mask is essentially random.
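A NumPy sketch of what `create_mask` does, using random numbers in place of a real U-Net output (the batch of 2 images of 4 by 4 pixels is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.predict(): scores of shape (batch, height, width, classes).
pred = rng.normal(size=(2, 4, 4, 3))

mask = np.argmax(pred, axis=-1)     # (2, 4, 4): index of the highest-scoring class per pixel
mask = mask[..., np.newaxis]        # (2, 4, 4, 1): add a channel axis so it can be shown as an image
first_mask = mask[0]                # create_mask() returns one image of the batch, chosen by index
print(mask.shape, first_mask.shape)  # (2, 4, 4, 1) (4, 4, 1)
```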

In [None]:
def create_mask(pred_mask,index):
  pred_mask = tf.math.argmax(pred_mask, axis=-1)
  pred_mask = pred_mask[..., tf.newaxis]
  return pred_mask[index]

def show_predictions(dataset=None, num=1,index = 0):
  if dataset:
    for image, mask in dataset.take(num):
      pred_mask = model.predict(image)
      display([image[index], mask[index], create_mask(pred_mask,index)])
  else:
    for images, masks in train_batches.take(1):
      pred_mask = model.predict(images)
      display([images[index], masks[index], create_mask(pred_mask,index)])

show_predictions()

In [None]:
EPOCHS = 20
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples//BATCH_SIZE//VAL_SUBSPLITS

model_history = model.fit(train_batches, epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_batches,
                          callbacks=[tensorboard_callback],)
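Because `train_batches` uses `.repeat()`, the training dataset never ends on its own, so `steps_per_epoch` tells Keras how many batches make up one epoch; `validation_steps` likewise limits each validation pass to a fraction of the test set. The arithmetic below assumes the split sizes that `print(info)` reports for this dataset version (3,680 training and 3,669 test images); please check them against your own output.

```python
TRAIN_LENGTH = 3680   # assumed value of info.splits['train'].num_examples
TEST_LENGTH = 3669    # assumed value of info.splits['test'].num_examples
BATCH_SIZE = 64
VAL_SUBSPLITS = 5

STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
VALIDATION_STEPS = TEST_LENGTH // BATCH_SIZE // VAL_SUBSPLITS
print(STEPS_PER_EPOCH, VALIDATION_STEPS)   # 57 11
```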

Please run the `show_predictions()` cell above again to see what the predicted masks on the training dataset look like with the trained model.

Please run the following cell to see what the predicted masks on the test dataset look like with the trained model.

In [None]:
show_predictions(test_batches,1,5)

## Exercise 4: Train a neural network for image denoising


In this exercise we will train a self-supervised CNN called *noise2void*. Here it is trained on a single noisy image and outputs a denoised version of that image. For details of noise2void, please see https://arxiv.org/abs/1811.10980

We will first use the Stable Diffusion model from *keras_cv* to generate an image from a text prompt of your choice. For details of Stable Diffusion, please see https://github.com/CompVis/stable-diffusion?tab=readme-ov-file#stable-diffusion-v1

We will then add random noise to the image and use noise2void to remove the noise.
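How can a network learn to denoise without ever seeing a clean image? Noise2Void hides a few pixels from the network and trains it to predict their noisy values from the surrounding pixels. Because the noise is assumed to be independent from pixel to pixel, the surroundings carry information about the clean signal but not about the noise at the hidden pixel, so the best the network can do is predict the clean value. The NumPy sketch below shows only the masking step for a single 2D patch. `n2v_mask_patch` is a hypothetical helper, not the n2v library's code; it always substitutes a different neighbouring pixel, which may differ in detail from the `uniform_withCP` manipulator configured later.

```python
import numpy as np

def n2v_mask_patch(patch, perc_pix, radius, rng):
    """Hypothetical sketch of Noise2Void blind-spot masking for one 2D patch.

    Selects about perc_pix percent of the pixels and replaces each one with the
    value of a random pixel within `radius` of it. The training target at those
    positions is the ORIGINAL noisy value, and the loss is computed only there.
    """
    h, w = patch.shape
    n = max(1, int(round(h * w * perc_pix / 100.0)))
    ys = rng.integers(0, h, size=n)
    xs = rng.integers(0, w, size=n)
    masked = patch.copy()
    for y, x in zip(ys, xs):
        while True:
            dy, dx = rng.integers(-radius, radius + 1, size=2)
            ny = min(max(y + dy, 0), h - 1)   # stay inside the patch
            nx = min(max(x + dx, 0), w - 1)
            if (ny, nx) != (y, x):            # never copy the pixel onto itself
                break
        masked[y, x] = patch[ny, nx]
    return masked, ys, xs

rng = np.random.default_rng(0)
patch = rng.random((64, 64))                  # stand-in for a noisy 64x64 patch
masked, ys, xs = n2v_mask_patch(patch, perc_pix=0.198, radius=5, rng=rng)

selected = np.zeros(patch.shape, dtype=bool)
selected[ys, xs] = True
print(selected.sum(), "pixels masked")                        # about 0.198 % of 4096 pixels
print(np.array_equal(masked[~selected], patch[~selected]))    # True: other pixels untouched
```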

You may wish to switch your runtime to GPU for this exercise.

As the current noise2void release only supports Python 3.9 with TensorFlow 2.7 or 2.10, let's first install TensorFlow 2.10. Note that you may need to restart your runtime after this installation.

In [None]:
# Install python 3.9
!sudo apt-get update -y
!sudo apt-get install python3.9

In [None]:
!pip install tensorflow==2.10

StableDiffusion is available in *keras_cv*. Let's install *keras_cv*.

In [None]:
!pip install tensorflow keras_cv --upgrade --quiet

In [None]:
import time
import keras_cv
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt

Stable Diffusion is an open-source text-to-image generation model. Please type in the text describing the image you want to generate and run the cell. You can also adjust the size of your image by changing *img_width* and *img_height*. Generating an image may take a while.

In [None]:
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)

image = model.text_to_image("a high resolution image of an astronaut riding a horse", batch_size=1)
image = tf.cast(image, tf.float32) / 255
plt.imshow(image[0])
plt.axis("off")

Let's now add some noise to the image. You can adjust the noise level by changing the standard deviation (*stddev*) of the Gaussian noise.

In [None]:
noisy_image = image+tf.random.normal(shape=image.shape,mean=0.0,stddev=0.1,)
noisy_image = tf.clip_by_value(noisy_image, clip_value_min=0., clip_value_max=1.)
plt.imshow(noisy_image[0])
plt.axis("off")
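To get a feel for the `stddev` setting, the NumPy sketch below adds Gaussian noise of several standard deviations to a hypothetical uniform mid-grey image and clips the result to [0, 1], as the cell above does with `tf.random.normal` and `tf.clip_by_value`. It prints the standard deviation of the noise that remains after clipping; for large `stddev`, clipping removes part of the noise, so the measured value falls below the requested one.

```python
import numpy as np

rng = np.random.default_rng(0)
image = np.full((256, 256, 3), 0.5, dtype=np.float32)   # hypothetical mid-grey image in [0, 1]

measured = {}
for stddev in (0.05, 0.1, 0.2, 0.4):
    noisy = image + rng.normal(0.0, stddev, size=image.shape).astype(np.float32)
    noisy = np.clip(noisy, 0.0, 1.0)                     # same role as tf.clip_by_value
    measured[stddev] = float(np.std(noisy - image))
    print(f"requested stddev {stddev}: measured {measured[stddev]:.3f}")
```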

Let's now install the denoising package noise2void (*n2v*).

In [None]:
!pip install n2v


In [None]:
from n2v.models import N2VConfig, N2V
import numpy as np
from csbdeep.utils import plot_history
from n2v.utils.n2v_utils import manipulate_val_data
from n2v.internals.N2V_DataGenerator import N2V_DataGenerator
from matplotlib import pyplot as plt

To train a denoising model on a single noisy image, we cut the image into small patches, which together form the training dataset. Here we use a patch size of 64 by 64. The *N2V_DataGenerator* class generates the patches from the image for us.

In [None]:
datagen = N2V_DataGenerator()
noisy_image = np.expand_dims(np.asarray(noisy_image),axis=0)
patch_shape=(64,64)
patches = datagen.generate_patches_from_list(noisy_image, shape=patch_shape)
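The sketch below shows the basic idea of cutting one image into non-overlapping 64 by 64 patches with NumPy. It is a simplified illustration, not the `N2V_DataGenerator` implementation, which may also augment the patches (for example with rotations and flips) and therefore return more patches than this; `extract_patches` is a hypothetical helper, and the random array stands in for a 512 by 512 RGB image.

```python
import numpy as np

def extract_patches(image, patch_h, patch_w):
    # Tile an (H, W, C) image into non-overlapping patches; leftover border pixels are dropped.
    h, w, c = image.shape
    ny, nx = h // patch_h, w // patch_w
    cropped = image[:ny * patch_h, :nx * patch_w]
    # (ny, patch_h, nx, patch_w, c) -> (ny, nx, patch_h, patch_w, c) -> (ny * nx, patch_h, patch_w, c)
    patches = cropped.reshape(ny, patch_h, nx, patch_w, c).swapaxes(1, 2)
    return patches.reshape(ny * nx, patch_h, patch_w, c)

image = np.random.default_rng(0).random((512, 512, 3), dtype=np.float32)
patches = extract_patches(image, 64, 64)
print(patches.shape)   # (64, 64, 64, 3)
```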

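Let's split the patches into a training dataset and a validation dataset. Typically, a larger portion is used for training and a smaller portion for validation; here the first 501 patches are used for training and the rest for validation.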

In [None]:
X = patches[:501]
X_val = patches[501:]

This is to show you an example of a training patch and a validation patch.

In [None]:
plt.figure(figsize=(14,7))
plt.subplot(1,2,1)
plt.imshow(X[0,...])
plt.title('Training Patch');
plt.subplot(1,2,2)
plt.imshow(X_val[0,...])
plt.title('Validation Patch');

This is the configuration used for this particular training.

In [None]:
config = N2VConfig(X, unet_kern_size=3,
                   unet_n_first=64, unet_n_depth=3, train_steps_per_epoch=int(X.shape[0]/128), train_epochs=25, train_loss='mse',
                   batch_norm=True, train_batch_size=128, n2v_perc_pix=0.198, n2v_patch_shape=(64, 64),
                   n2v_manipulator='uniform_withCP', n2v_neighborhood_radius=5, single_net_per_channel=False)

# Let's look at the parameters stored in the config-object.
vars(config)

model_name = 'n2v_2D'
# the base directory in which our model will live
basedir = 'models'
# We are now creating our network model.
model = N2V(config, model_name, basedir=basedir)
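Two numbers in this configuration follow from simple arithmetic. The sketch assumes at least 501 patches were generated, so that `X` holds 501 of them, and assumes, as the n2v documentation describes, that `n2v_perc_pix` is a percentage of the pixels in each patch.

```python
n_train = 501              # X = patches[:501]
batch_size = 128           # train_batch_size
steps_per_epoch = int(n_train / batch_size)    # train_steps_per_epoch in the config
print(steps_per_epoch, steps_per_epoch * batch_size)   # 3 384

patch_pixels = 64 * 64     # n2v_patch_shape=(64, 64)
n2v_perc_pix = 0.198
masked_per_patch = patch_pixels * n2v_perc_pix / 100
print(round(masked_per_patch, 1))   # 8.1
```

So each epoch processes 3 batches of 128 patches, and only about 8 pixels per patch are manipulated and contribute to the loss in each step.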

Let's now train the model.

In [None]:
history = model.train(X, X_val)

Let's now use our trained N2V model to denoise our noisy image.

In [None]:
pred = model.predict(noisy_image[0,0], axes='YXC')
plt.figure(figsize=(30,30))

# We show the noisy input...
plt.subplot(1,2,1)
plt.imshow(noisy_image[0,0] )
plt.title('Input');

# and the result.
plt.subplot(1,2,2)
plt.imshow( pred )
plt.title('Prediction');

## Exercise 5: Train your own neural network for flower classification

In this exercise, you will design and train a network to classify flower images. The dataset, downloaded from TensorFlow's *example_images/flower_photos*, consists of more than 3,600 images of 5 classes of flowers.

By now you should have attended the afternoon lecture and be aware that there are many choices in how you structure, regularise and train your neural network, as well as various ways to augment your training dataset.

We will run a small competition within this class to see who can train the network with the highest validation accuracy. To enter the competition, please include the metric *tf.keras.metrics.SparseCategoricalAccuracy()* in *model.compile()*, as this metric (calculated on the validation dataset) will be used to evaluate your network's performance. Anyone who enters the competition may write their (preferred) name on the whiteboard. If you think your network performs better than anyone else's on the whiteboard (including your own previous entry), please shout out and a demonstrator will come around to verify.
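For reference, sparse categorical accuracy is the fraction of samples whose highest-scoring class equals the integer label. A minimal NumPy sketch of the idea (not the Keras implementation), with made-up predictions for 4 images and 5 flower classes:

```python
import numpy as np

def sparse_categorical_accuracy(labels, probs):
    # A prediction is correct when the highest-scoring class equals the integer label.
    return float(np.mean(np.argmax(probs, axis=-1) == labels))

# Hypothetical predicted probabilities (rows: images, columns: 5 flower classes).
probs = np.array([[0.7, 0.1, 0.1, 0.05, 0.05],
                  [0.2, 0.5, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.2, 0.5],
                  [0.3, 0.3, 0.2, 0.1, 0.1]])
labels = np.array([0, 1, 3, 2])

acc = sparse_categorical_accuracy(labels, probs)
print(acc)   # 0.5
```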

The winner of this exercise will be awarded a small gift.

Running this first cell generates your basic training and validation datasets. To enter the competition, please do NOT modify this cell, so that everyone starts with the same training/validation dataset split.

You may augment your training dataset. Please do NOT modify your validation dataset.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import PIL
import tensorflow as tf

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

import pathlib

dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos.tar', origin=dataset_url, extract=True)
data_dir = pathlib.Path(data_dir).with_suffix('')

batch_size = 32
img_height = 128
img_width = 128

train_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

val_ds = tf.keras.utils.image_dataset_from_directory(
  data_dir,
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

class_names = train_ds.class_names
print(class_names)

Visualising the dataset.

In [None]:
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
  for i in range(4):
    ax = plt.subplot(2, 2, i + 1)
    plt.imshow(images[i].numpy().astype("uint8"))
    plt.title(class_names[labels[i]])
    plt.axis("off")

All set! Now it is time to build and train your neural network for flower classification!