# Hands-on Session 5: Convolutional Neural Network



In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
import tensorflow as tf
#tf.logging.set_verbosity(tf.logging.ERROR)

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/')


### Fruits-360 Dataset

In [None]:
!git clone https://github.com/Horea94/Fruit-Images-Dataset.git

Let's start by preparing the data and sorting out the labels.

In [None]:
ls

In [None]:
import glob
fruitpath = glob.glob("/content/Fruit-Images-Dataset/Training/*/*.jpg")

In [None]:
print(fruitpath[0:5])
print("")
print(len(fruitpath))

In [None]:
import random
random.seed(42)
random.shuffle(fruitpath)
print(fruitpath[0:5])

In [None]:
import cv2
import os

print("GETTING DATA & LABELS...")
data = []
labels = []
for imgPath in fruitpath:
  image = cv2.imread(imgPath)
  image = cv2.resize(image, (32, 32)).flatten()  # why do we do this?
  data.append(image)
  labels.append(imgPath.split(os.path.sep)[-2])

print(len(data))
print(len(labels))

# scale the raw pixel intensities to the range [0, 1]
print("PREPARING...")
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)

print("DONE!")

In [None]:
print(np.unique(labels))
print(len(np.unique(labels)))

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

# partition the data into training and testing splits using 75% of
# the data for training and the remaining 25% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.25, random_state=42, stratify=labels)
print(trainX.shape)
print(testX.shape)

In [None]:

# convert the string labels (the fruit folder names) to one-hot-encoded vectors
# (for 2-class binary classification you should use Keras' to_categorical
# function on integer labels instead, as scikit-learn's LabelBinarizer will
# return a single 0/1 column rather than a one-hot vector)
lb = LabelBinarizer()
trainY = lb.fit_transform(trainY)
testY = lb.transform(testY)

# show one example
print(trainY[0])
print(testY[0])
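
The idea behind one-hot encoding can be sketched in plain NumPy. This is only an illustration on made-up labels (`labels_demo` and the other names are hypothetical), not a description of how `LabelBinarizer` works internally: each distinct label gets a column, and each sample has a 1 in the column of its class.

```python
import numpy as np

# hypothetical labels, standing in for the fruit folder names
labels_demo = np.array(["Banana", "Apple", "Cherry", "Apple"])

# sorted unique class names, plus the class index of every sample
classes_demo, indices = np.unique(labels_demo, return_inverse=True)

# pick one row of the identity matrix per sample
one_hot = np.eye(len(classes_demo), dtype=int)[indices]
print(classes_demo)
print(one_hot)

# argmax over each row recovers the original label
decoded = classes_demo[one_hot.argmax(axis=1)]
print(decoded)
```

The `argmax` step is the same trick used later in `classification_report(testY.argmax(axis=1), ...)` to turn one-hot rows back into class indices.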

Before we get to the CNN, let's build a simple MLP. An MLP expects each input to be a one-dimensional vector of values, so an image (2D) has to be "flattened" first. Flattening typically takes each row of the image and strings them together in order. For color images (3D), the values of all three channels are included as well. This is why we called `.flatten()` on each resized image when loading the data.

A 32x32 RGB image will result in 32 * 32 * 3 = 3,072 values after flattening.
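
A small sketch of what flattening does, using a tiny made-up array (`tiny` is hypothetical) in place of a real image. With NumPy's default ordering, the three channel values of a pixel stay next to each other in the flattened vector, and reshaping with the original shape restores the image.

```python
import numpy as np

# a tiny stand-in "image": 2 rows x 2 columns x 3 channels
tiny = np.arange(2 * 2 * 3).reshape(2, 2, 3)

flat = tiny.flatten()
print(flat.shape)   # one dimension, 2 * 2 * 3 values
print(flat[:3])     # the three channel values of the top-left pixel

# reshaping with the original shape recovers the image exactly
restored = flat.reshape(2, 2, 3)
print(np.array_equal(tiny, restored))

# the same arithmetic for a 32x32 colour image
full = np.zeros((32, 32, 3))
print(full.flatten().shape)
```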

In [None]:
from keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

# define the 3072-512-512-N architecture using Keras (N = number of classes)
model = Sequential()
model.add(Dense(512, input_shape=(3072,), activation="sigmoid"))
model.add(Dense(512, activation="sigmoid"))
model.add(Dense(len(lb.classes_), activation="softmax"))

In [None]:
model.summary()

In [None]:
# initialize parameters
initial_lr = 0.01
EPOCHS = 50
BATCH_SIZE = 64
opt = SGD(learning_rate=initial_lr)

model.compile(loss="categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

In [None]:
# train the neural network
H_mlp = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=EPOCHS, batch_size=BATCH_SIZE)

In [None]:
from sklearn.metrics import classification_report


In [None]:
# evaluate the network
print("EVALUATING NETWORK...")
predictions = model.predict(testX, batch_size=32)
print(classification_report(testY.argmax(axis=1),
                            predictions.argmax(axis=1),
                            target_names=lb.classes_))

In [None]:
# Write a function to plot both training loss and accuracy

def plotcurves(H_mlp, EPOCHS):
	N = np.arange(0, EPOCHS)
	plt.figure(figsize=(12,4))
	plt.subplot(121)
	plt.plot(N, H_mlp.history["loss"], label="train_loss")
	plt.plot(N, H_mlp.history["val_loss"], label="val_loss")
	plt.xlabel("Epoch #")
	plt.ylabel("Loss")
	plt.legend()

	plt.subplot(122)
	plt.plot(N, H_mlp.history["accuracy"], label="train_acc")
	plt.plot(N, H_mlp.history["val_accuracy"], label="val_acc")
	plt.xlabel("Epoch #")
	plt.ylabel("Accuracy")
	plt.legend()
	plt.tight_layout()

plotcurves(H_mlp, EPOCHS)

**Question:** Has the model converged? If it hasn't converged, what can be done to improve convergence?

## Building a CNN model

Now, let's get to the CNN. We use the Keras `Sequential` model to build the network structure incrementally, "adding" layers one at a time from the input upwards. The other option is the `Model` class, which we will encounter later.

Notice one big difference here compared to the MLP? We now feed the entire RGB image (3 dimensions: width, height and channels) into the CNN instead of a flattened 1-dimensional vector as in the MLP case. The structure of a CNN allows this, since the convolutions are performed on the image cube itself.
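
To get a feel for how the image cube shrinks on its way through the network, here is a rough shape calculation for the CNN in this section: two 3x3 convolutions (32 and 64 filters) followed by 2x2 max-pooling. It assumes the Keras defaults of no padding and stride 1 for the convolutions, and a pooling stride equal to the pool size. The helper names are made up for this sketch.

```python
def out_size(size, window, stride=1, padding=0):
    """Output length along one spatial axis of a convolution or pooling window."""
    return (size + 2 * padding - window) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

side = 32                            # input is 32 x 32 x 3
side = out_size(side, 3)             # after Conv2D(32, 3x3)
params_conv1 = conv_params(3, 3, 32)
side = out_size(side, 3)             # after Conv2D(64, 3x3)
params_conv2 = conv_params(3, 32, 64)
side = out_size(side, 2, stride=2)   # after MaxPooling2D(2x2)
flat_features = side * side * 64     # length of the vector after Flatten()

print(side, flat_features)
print(params_conv1, params_conv2)
```

You can compare these numbers against the output of `cnn_model.summary()`.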

*Note: Dropouts are removed for now, but you can put them back in later to try*

In [None]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D

# input image dimensions
w, h = 32, 32

cnn_model=Sequential()

# Feature Extraction layers: Conv layers
cnn_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(w,h,3)))
cnn_model.add(Conv2D(64, (3, 3), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2, 2)))

# Convert 2D features to 1D Vector for classification
cnn_model.add(Flatten())

# Classification layers: FC/Dense layers
cnn_model.add(Dense(64, activation='sigmoid'))
cnn_model.add(Dense(len(lb.classes_), activation='softmax'))

In [None]:
cnn_model.summary()

Because of the change in input shape, we need to reshape the train and test data into the expected form. The labels remain the same.

In [None]:
cnn_model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

trainXb = np.reshape(trainX, (trainX.shape[0], w, h, 3))
testXb = np.reshape(testX, (testX.shape[0], w, h, 3))

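# trainY and testY are already one-hot encoded (see the MLP section), so they
# are reused as they are; re-fitting a LabelBinarizer on them is unnecessary
# and would discard the fruit names stored in lb.classes_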

EPOCHS = 30
H_cnn = cnn_model.fit(trainXb, trainY, validation_data=(testXb, testY),
          batch_size=BATCH_SIZE,
          epochs=EPOCHS,
          verbose=1)

score = cnn_model.evaluate(testXb, testY, verbose=0)

print('Test loss:', score[0])
print('Test accuracy:', score[1])

In [None]:
import matplotlib.pyplot as plt
plotcurves(H_cnn, EPOCHS)

**Exercise:**
What do you think of the performance of the model? From the loss plot, do you think it has converged? If not, you can increase the number of epochs.

Try the following and see how these changes can affect the performance:

1) Change the **activation function** at the fully-connected (FC) layers from *sigmoid* to *ReLU*.

2) Change the **optimizer**; e.g. *adam*, *rmsprop*.

3) Change the **architecture** by adding more convolutional or fully-connected layers, change number of filters and filter size for each conv layer.

For something extra, let's show some of the images in our dataset.

In [None]:
from google.colab.patches import cv2_imshow

start = 2000
fruittray = []
for i in range(20):
  img = np.reshape(data[start+i]*255, (w, h, 3))
  fruittray.append(img)

ft = np.hstack(fruittray)
cv2_imshow(ft)


### Saving model and weights

Since training a CNN can take a long time, it is good to know how to save its essential parts. A reliable way to do this is to save the 'model' and the 'weights' in separate files. The model file contains the definition of the CNN structure and architecture, while the weights file contains only the corresponding weight values. To deploy the network again, you need both parts.


In [None]:
%whos

In [None]:
from keras.models import model_from_json

# serialize model to JSON
model_json = cnn_model.to_json()
with open("fruits-model.json", "w") as json_file:
    json_file.write(model_json)

# serialize weights to HDF5
cnn_model.save_weights("fruits-model.weights.h5")
print("Saved model to disk")

Use a Linux command (e.g. `!ls -lh`) to check their file sizes (modify accordingly).
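
The same check can also be done from Python. The sketch below writes a small throwaway file as a stand-in (the file name `demo.bin` and the helper `human_size` are made up for this example) and reports its size with `os.path.getsize`; point it at `fruits-model.json` and `fruits-model.weights.h5` instead to inspect the saved model.

```python
import os
import tempfile

def human_size(num_bytes):
    """Format a byte count with a unit suffix (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            return f"{size:.1f} {unit}"
        size /= 1024

with tempfile.TemporaryDirectory() as tmp:
    # stand-in file: 2048 zero bytes
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"\0" * 2048)
    demo_bytes = os.path.getsize(demo_path)
    print(os.path.basename(demo_path), human_size(demo_bytes))
```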

### Load models and weights

In [None]:
# read the model architecture from the JSON file and re-create the model
with open('fruits-model.json', 'r') as json_file:
    loaded_model_json = json_file.read()

loaded_model = model_from_json(loaded_model_json)
loaded_model.summary()

loaded_model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])



In [None]:
# load weights into new model
loaded_model.load_weights("fruits-model.weights.h5")
print("Loaded model from disk")

loaded_model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])



### Perform prediction with the loaded model

**Q**: Notice that we used images from the 'Training' folder for both training and validation! This is fine, because we can now use the 'Test' folder images purely for evaluating on unseen samples. So, complete the code below to evaluate the model on the 'Test' images and see how well it performs.

In [None]:
testpath = glob.glob("/content/Fruit-Images-Dataset/Test/*/*.jpg")

testdata = []
testlabels = []
for imgPath in testpath:
  image = cv2.imread(imgPath)
  image = cv2.resize(image, (32, 32))
  testdata.append(image)
  testlabels.append(imgPath.split(os.path.sep)[-2])

print(len(testdata))
print(len(testlabels))
print(len(np.unique(testlabels)))

# scale the pixel values to [0, 1] (as was done for the training data),
# then create one-hot encoding labels


# evaluate the model by performing prediction



print('Test loss:', testscore[0])
print('Test accuracy:', testscore[1])


Let's download our best model and weights from Colab for future use.

In [None]:
from google.colab import files
files.download( "fruits-model.weights.h5" )
files.download( "fruits-model.json" )

### Try it in Streamlit

Place the model and weights into your working directory and run the streamlit `cnn_demo.py`. <br>
Upload some images to try out your model prediction.

## Using Pre-trained models

### Dog and Cat Dataset

In [None]:
!wget -c https://s3.amazonaws.com/content.udacity-data.com/nd089/Cat_Dog_data.zip

In [None]:
!unzip -q Cat_Dog_data.zip

In [None]:
%cd /content/Cat_Dog_data

Let's get a random dog picture from outside the dataset.

In [None]:
# load the class labels from disk

rows = open('/content/gdrive/My Drive/DreamCatcher/PAI(Jan2025)/Day2/synset_words.txt').read().strip().split("\n")
classes = [r[r.find(" ") + 1:].split(",")[0] for r in rows]

# print out ImageNet classes
print(len(classes))
print(classes)

Let's use one of the most robust CNN models around: ResNet50 (a 50-layer deep residual network), which has already been pre-trained on ImageNet (millions of natural images) across 1,000 categories. It is unlikely that any of us would want to waste resources re-training a large network on massive amounts of images, so we can use these models "off-the-shelf".

In [None]:
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from keras.models import load_model

model = ResNet50(weights='imagenet', include_top=True)

img_path = '/content/gdrive/My Drive/DreamCatcher/PAI(Jan2025)/Day2/dog.jpg'

img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:\n', decode_predictions(preds, top=5)[0])

The results are pretty accurate. The dog is not only correctly classified as a 'dog', but since ImageNet contains fine-grained categories including different breeds of dogs, the model is able to tell that this is a 'Labrador retriever'.

Using ImageNet pre-trained models is a good starting point towards practical deep learning. If your category is among ImageNet's 1K categories, you can safely use a pre-trained model as it is. But if you want to adapt the model to suit your own categories, you can fine-tune it further.
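
Conceptually, a top-5 result is just the five highest-scoring entries among the model's output probabilities. Here is a minimal sketch of that ranking step on made-up numbers. It is not the Keras implementation of `decode_predictions`; the helper `top_k`, the class names and the probabilities are all hypothetical.

```python
import numpy as np

def top_k(probabilities, class_names, k=5):
    """Return the k (class name, probability) pairs with the highest probability."""
    order = np.argsort(probabilities)[::-1][:k]   # indices, highest score first
    return [(class_names[i], float(probabilities[i])) for i in order]

# hypothetical 4-class output instead of ImageNet's 1,000 classes
demo_classes = ["cat", "dog", "car", "tree"]
demo_probs = np.array([0.10, 0.70, 0.05, 0.15])

top2 = top_k(demo_probs, demo_classes, k=2)
print(top2)
```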

# Transfer Learning

### ImageDataGenerator

Keras has an `ImageDataGenerator` that can read images directly from a directory structure that mirrors the partitioned data sets. The data should be placed into 2 different folders named “train” and “test”. The train folder should contain ‘n’ folders, each containing the images of one class. For example, in the Dog vs Cats data set, the "train" folder should have 2 folders, namely “Dog” and “Cat”, containing the respective images. Technically, images in the test folder could sit in a single directory because they are only used for prediction (no class distinction is needed, unlike for training). However, we normally also need to set aside a validation set to evaluate our model, organized in the same way as the train set.

<img src="https://cdn-images-1.medium.com/max/800/1*HpvpA9pBJXKxaPCl5tKnLg.jpeg" width=400 />

The folder names for the classes are important: name them consistently (for both train and validation sets) with the respective label names, so that it will be easy for you later.
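
The sketch below builds such a layout in a temporary directory and derives a class-to-index mapping from the sub-folder names in sorted order, which is how `flow_from_directory` is understood to assign `class_indices`. The helper `class_indices_from` is made up for this illustration and does not call Keras at all.

```python
import os
import tempfile

def class_indices_from(directory):
    """Map each class sub-folder name to an index, in sorted order."""
    classes = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    return {name: idx for idx, name in enumerate(classes)}

with tempfile.TemporaryDirectory() as root:
    # train/ and test/ each contain one folder per class
    for split in ("train", "test"):
        for cls in ("dog", "cat"):
            os.makedirs(os.path.join(root, split, cls))
    train_map = class_indices_from(os.path.join(root, "train"))
    test_map = class_indices_from(os.path.join(root, "test"))
    print(train_map)
    print(test_map)
```

Because both splits use the same folder names, they produce the same mapping, which is why consistent naming matters.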

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# change this according to your system RAM or GPU VRAM
train_batchsize = 100
val_batchsize = 10

# Mobilenet_V2
train_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input
        #rescale=1./255
        )

train_generator = train_datagen.flow_from_directory(
    directory=r"train/",
    target_size=(224, 224),
    color_mode="rgb",
    batch_size=train_batchsize,
    class_mode="categorical",
    shuffle=True,
    seed=0
)

valid_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input
        #rescale=1./255
        )

valid_generator = valid_datagen.flow_from_directory(
    directory=r"test/",
    target_size=(224, 224),
    color_mode="rgb",
    batch_size=val_batchsize,
    class_mode="categorical",
    shuffle=False,
    seed=0
)

### Using base CNN models from Keras

Keras Applications contain a small selection of deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.

**Getting models with and without the 'top'**

Most of these models are a series of convolutional layers followed by one or a few dense (or fully connected) layers. So, they come in two flavours: with or without the 'top', that is, the classifier end of the network.

`include_top` lets you select if you want the final dense layers or not.

- Convolutional layers work as feature extractors. They identify a series of patterns in the image, and each layer can identify more elaborate patterns by seeing patterns of patterns.
- Dense layers are capable of interpreting the found patterns in order to classify: this image contains cats, dogs, cars, etc. They can be perceived as part of a classifier (like a MLP).

The weights in a Dense layer depend entirely on the input size (in this case, the input to the Dense layer is the flattened output of the last conv layer): there is one weight per input element for each unit. So the input must always have the same size, or else the layer cannot process it.

Hence, removing the final Dense layer (technically, it's the activation connecting to the output) allows you to define the input size. The network can then be **retrained with your own choice and design of Dense layers** based on the specific task you are working on.
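
A quick calculation shows why a Dense layer is tied to one input size. The feature-map sizes below are hypothetical and only serve as an illustration: the same 256-unit Dense layer needs a different number of weights for each of them, so weights trained for one size cannot be reused for the other.

```python
def dense_weights(input_size, units):
    """One weight per (input element, unit) pair, plus one bias per unit."""
    return input_size * units + units

# hypothetical flattened feature maps from two different image resolutions
small_features = 7 * 7 * 64      # 3136 values
large_features = 10 * 10 * 64    # 6400 values

weights_small = dense_weights(small_features, 256)
weights_large = dense_weights(large_features, 256)
print(weights_small, weights_large)
```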

### Fine-tuning a pre-trained model for your task

For the purpose of this training, we shall use a slim CNN architecture called MobileNetV2 instead of larger, deeper ones (VGG16, ResNet50). For the complete list of Keras pretrained models, see [here](https://keras.io/api/applications/).

In [None]:
# load base MobileNetV2 pre-trained model (from Keras)
base_model = MobileNetV2(include_top=False, weights='imagenet', pooling='avg', input_shape=(224, 224, 3))

In [None]:
base_model.summary()

#### Practical advice on fine-tuning

Fine-tuning is a way of adapting an existing pre-trained model to a different domain (that means, *different image data*, or *same type of data but different dataset*). Typically, the task is similar. If you are adapting an ImageNet-trained model (which was built for the purpose of image classification), chances are that you intend to use it for another image classification problem. Using it for a different kind of data such as text or speech would usually not work because the model may not be expecting the same shape or dimension of data.  

For example, if you are planning to fine-tune a GoogLeNet model (trained on ImageNet data) for flower classification, there should be no issue. However, if you are planning to fine-tune it for face identification, it would be a less suitable fit but still acceptable. If you want to fine-tune it for speech recognition, it would most likely be impossible, because the architecture is not meant for that task.

Practically, fine-tuning is a quick way of creating a new model for your new domain...
- if you do not have the necessary computational resources to train from scratch on your data.
- if the amount of data in your new domain may not be sufficient to perform a full training (from scratch). In other words, your new dataset is much smaller than ImageNet's millions.

In [None]:
from tensorflow.keras.layers import Lambda
from keras import models
from keras import layers

# Create the model
model = models.Sequential()
model.add(Lambda(lambda x: x, input_shape=(224, 224, 3)))

# Add the resnet/vgg/mobilenet convolutional base model
model.add(base_model)

# Add the fully-connected layers
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))

model.summary()

model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
from tensorflow.keras.layers import Lambda
from keras import models
from keras import layers

# Create the model
model = models.Sequential()
model.add(Lambda(lambda x: x, input_shape=(224, 224, 3)))
# Add the resnet/vgg/mobilenet convolutional base model
model.add(base_model)
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional layers
#for layer in base_model.layers[:-4]:
#    layer.trainable = False

# Check the trainable status of the individual layers
#for layer in base_model.layers:
#    print(layer, layer.trainable)

model.summary()

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

In [None]:
H = model.fit(train_generator,
        epochs=10,   # use a bigger number to train longer
        validation_data=valid_generator
)

model.save_weights('CNN_dog.weights.h5')  # always save your weights after training or during training

In [None]:
# Plot the accuracy and loss curves
acc = H.history['accuracy']
val_acc = H.history['val_accuracy']
loss = H.history['loss']
val_loss = H.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'b', label='Training acc')
plt.plot(epochs, val_acc, 'r', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'b', label='Training loss')
plt.plot(epochs, val_loss, 'r', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

In [None]:
# Create a generator for prediction
validation_dir = r"test/"
validation_generator = valid_datagen.flow_from_directory(
        validation_dir,
        target_size=(224, 224),
        batch_size=val_batchsize,
        class_mode='categorical',
        shuffle=False)

# Get the filenames from the generator
fnames = validation_generator.filenames

# Get the ground truth from generator
ground_truth = validation_generator.classes

# Get the label to class mapping from the generator
label2index = validation_generator.class_indices

# Getting the mapping from class index to class label
idx2label = dict((v,k) for k,v in label2index.items())

# Get the predictions from the model using the generator
predictions = model.predict(validation_generator, verbose=1)
predicted_classes = np.argmax(predictions,axis=1)

errors = np.where(predicted_classes != ground_truth)[0]
print("No of errors = {}/{}".format(len(errors),validation_generator.samples))


In [None]:
# Show the incorrectly predicted images (the first ten at most)
for i in range(min(10, len(errors))):
    pred_class = np.argmax(predictions[errors[i]])
    pred_label = idx2label[pred_class]

    title = 'Original label:{}, Prediction :{}, confidence : {:.3f}'.format(
        fnames[errors[i]].split('/')[0],
        pred_label,
        predictions[errors[i]][pred_class])

    original = image.load_img('{}/{}'.format(validation_dir,fnames[errors[i]]))
    plt.figure(figsize=[5,5])
    plt.axis('off')
    plt.title(title)
    plt.imshow(original)
    plt.show()

**Q**: What do you think of the performance of the model? What can be improved? What can be tested out?

Try the following:
* Use a different base network model (e.g. ResNet50, VGG16, Densenet) provided by Keras (https://keras.io/api/applications/)
* Modify the top (FC/Dense) layers
* Increase batch size
* Good to view the loss/accuracy plots to have a better idea if your network is underfitting or overfitting.
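* Control which layers are trained. In the model-building cell above the freezing loop is commented out, so all weights of the base model are updated during training. Try freezing the base model so that only the top layers are trained, then allow maybe the last few blocks to be trained as well.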

To freeze more layers for fine-tuning, you just need to alter the 'trainable' property. Leaving them turned on (True) indicates that they will be fine-tuned.

In [None]:
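# freeze every layer of the base model except its last four, so that only
# those four layers and the new top layers (randomly initialized) are trained;
# compile the model again afterwards so that the change takes effect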
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Check the trainable status of the individual layers
for layer in base_model.layers:
    print(layer, layer.trainable)


# Early Stopping and Model Checkpoint

Choosing the number of training epochs to use in a neural network is not an easy task. Too many epochs can lead to overfitting the training dataset, while too few may result in an underfit model. A way to solve this problem is the **Early stopping callback**, a method that allows you to specify an arbitrarily large number of training epochs and stop training once the model performance stops improving on a hold-out validation dataset.

The EarlyStopping callback will stop training once triggered, but the model at the end of training may not be the model with the best performance on the validation dataset. An additional callback that saves the best model observed during training for later use is required. This is the **ModelCheckpoint callback**, which we can use to save the best model observed during training, as defined by a chosen performance measure on the validation dataset.

https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping
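
To make the "patience" idea concrete, here is a simplified sketch of the stopping rule in plain Python. It is not the Keras source code, and the function name and loss values are made up: training stops once the validation loss has failed to improve for `patience` consecutive epochs, and the best epoch is remembered separately, which is the role ModelCheckpoint plays.

```python
def early_stopping_run(val_losses, patience=3):
    """Return (stopped_epoch, best_epoch) for a list of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# hypothetical validation losses, one value per epoch
demo_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
stopped_epoch, best_epoch = early_stopping_run(demo_losses, patience=3)
print(stopped_epoch, best_epoch)
```

In this made-up run the last value (0.50) is never reached, because training has already stopped: a small `patience` saves time but can end training before a late improvement.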

The following is sample code for adding the callbacks when training the model:

In [None]:
# This code is not standalone: it assumes that model, trainX, trainY, testX
# and testY already exist
from keras.models import load_model

# simple early stopping and model checkpointing
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
mc = tf.keras.callbacks.ModelCheckpoint('best_model.keras', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# set the callbacks when fitting the model
history = model.fit(trainX, trainY, validation_data=(testX, testY), epochs=1000, verbose=0, callbacks=[es, mc])

# load the saved model
saved_model = load_model('best_model.keras')

**Extra Exercise**: Add the Early Stopping and Model Checkpoint callbacks to the training of the *Cats and Dogs* classification model and observe how it influences the training.

## Other Datasets

### WM-811K Dataset

WM-811K is a semiconductor dataset of wafer maps. Wafer map patterns, together with data from CP yield, WAT (Wafer Acceptance Test) and particle inspection, can give engineers clues about process issues.

The big problem is how to classify the wafer map patterns into several groups without manual effort. Many papers have studied this problem, and here I will show the result of applying deep learning.

There are 811,457 images in the data but only 172,950 images have labels. There are a total of 9 types of problems. Among the labeled images, the 'none' pattern accounts for 85.2%.

In [None]:
!wget http://mirlab.org/dataSet/public/WM-811K.zip

### CUB-200 Dataset

Caltech-UCSD Birds 200 (CUB-200) is an image dataset with photos of 200 bird species (mostly North American). For detailed information about the dataset, please see the technical report linked below.

Number of categories: 200

Number of images: 6,033

Annotations: Bounding Box, Rough Segmentation, Attributes

In [None]:
!wget http://www.vision.caltech.edu/visipedia-data/CUB-200/images.tgz

### Food-101 Dataset

Food-101 is a challenging data set of 101 food categories, with 101,000 images. For each class, 250 manually reviewed test images are provided as well as 750 training images. On purpose, the training images were not cleaned, and thus still contain some amount of noise. This comes mostly in the form of intense colors and sometimes wrong labels. All images were rescaled to have a maximum side length of 512 pixels.

In [None]:
!wget http://data.vision.ee.ethz.ch/cvl/food-101.tar.gz