## **Getting Started With Computer Vision Using TensorFlow Keras**

Computer Vision attempts to perform the tasks that a human brain does with the aid of human eyes. Computer Vision is a branch of Deep Learning that deals with images and videos. Computer Vision tasks can be roughly classified into two categories:

1. Discriminative tasks
2. Generative tasks

To read more about it, please refer to [this](https://analyticsindiamag.com/computer-vision-using-tensorflow-keras/) article.

This session aims to build a strong foundation in Computer Vision by exploring image classification with Convolutional Neural Networks built in TensorFlow Keras. It gives equal weight to the code and to the key theory and math behind each operation. Let’s start our Computer Vision journey!

## **Implementation**

Import necessary packages, libraries and modules.


We discuss Image Classification using two examples:

1. Fashion MNIST dataset (`tf.keras.datasets`)
2. Beans dataset (`tensorflow_datasets`)


References:

https://www.tensorflow.org/tutorials/images/cnn

https://www.tensorflow.org/api_docs/python/tf



In [None]:
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn tensorflow tensorflow_datasets keras opencv-python pillow scikit-image --user -q --no-warn-script-location

import IPython
IPython.Application.instance().kernel.do_shutdown(True)


In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow import keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Load Fashion MNIST data from Keras Datasets

In [None]:
fashion_data = keras.datasets.fashion_mnist.load_data()
(x_train,y_train),(x_val,y_val)= fashion_data

Let’s have a look at the size of the train data.

In [None]:
x_train.shape, y_train.shape

There are 60,000 grayscale images in the train data, each of size 28×28. For each image, the corresponding label is available in `y_train`. According to the official dataset page, there are 10 classes, represented numerically from 0 to 9. The images are low-resolution pictures of fashion items such as shirts, coats, shoes, trousers, pullovers, and sandals.

Similarly, we can have a look at the size of the validation data.

In [None]:
x_val.shape ,y_val.shape

There are 10,000 validation images and corresponding labels. Let’s sample an image and visualize it.

In [None]:
plt.imshow(x_train[10])
plt.colorbar()
plt.show()

The pixel values range from 0 to 255. We scale them to the range 0 to 1 by dividing by 255.0.

In [None]:
x_train = x_train/255.0
x_val = x_val/255.0

We can visualize some 25 images and their corresponding class labels for a better understanding of the data.

In [None]:
plt.figure(figsize=(7,7))
for i in range(1,26):
  plt.subplot(5,5,i)
  plt.imshow(x_train[i])
  plt.title(y_train[i],color='r')
  plt.xticks([])
  plt.yticks([])
plt.tight_layout()
plt.show() 
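
The numeric titles in the grid are easier to read once they are mapped to names. The list below follows the label order documented for Fashion MNIST; the names `class_names` and `label_name` are illustrative and not part of the original notebook.

```python
# Fashion MNIST class names, indexed by their numeric label (0 to 9).
class_names = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(label):
    """Return the readable class name for a numeric label."""
    return class_names[int(label)]

print(label_name(9))
```

In the plotting loop, `plt.title(label_name(y_train[i]), color='r')` would then show the name instead of the number.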

We can model a convolutional neural network to develop an image classifier. However, a convolution layer expects each input image to be three-dimensional, with shape (height, width, channels). Since we have grayscale images, their shape is only (height, width). We should increase the number of dimensions from 2 to 3 by adding a channel axis at the end.

In [None]:
x_train = tf.expand_dims(x_train, axis=-1)
x_val = tf.expand_dims(x_val, axis=-1)
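
The effect of scaling and of adding the channel axis can be checked on a small array. This is a NumPy-only sketch on made-up data (`fake_images` is a hypothetical stand-in for `x_train`), and it uses `np.expand_dims`, the NumPy counterpart of `tf.expand_dims`.

```python
import numpy as np

# Hypothetical stand-in for a small batch of 28x28 grayscale images (uint8).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0.0, 1.0].
scaled = fake_images / 255.0

# Add a trailing channel axis: (4, 28, 28) -> (4, 28, 28, 1).
with_channel = np.expand_dims(scaled, axis=-1)

print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
print(with_channel.shape)
```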

#### Model Building and Training

Let us build a Convolutional neural network.

In [None]:
classifier = keras.models.Sequential([
    # convolution layer: 64 filters of size 3x3 over 28x28 single-channel images
    keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # flattening layer: unroll the feature maps into one vector per image
    keras.layers.Flatten(),
    # dense hidden layer
    keras.layers.Dense(128, activation='relu'),
    # dense output layer: one probability per class
    keras.layers.Dense(10, activation='softmax'),
])

In [None]:
x_train.shape

We have built our convolutional neural network for our Computer Vision task. Next, we define an optimizer, a loss function, and a metric to train and evaluate the model. We use the Adam optimizer (an SGD variant), the sparse categorical cross-entropy loss (suited to multi-class classification with integer labels), and the accuracy metric.

In [None]:
classifier.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])
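
To make the loss concrete, here is a NumPy sketch of what sparse categorical cross-entropy computes: the mean of the negative log of the probability assigned to the true class. The function name and the sample values are illustrative, and the sketch omits the numerical safeguards (such as probability clipping) that a library implementation would include.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean of -log(probability assigned to the true class)."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    # Pick, for every row, the probability at the index of its true label.
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(true_class_probs)))

# Two hypothetical samples with three classes each (rows sum to 1).
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8]]
labels = [0, 2]  # integer class indices, not one-hot vectors

loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```

The "sparse" in the name refers to the integer labels: `y_train` holds class indices such as 0 or 9 rather than one-hot vectors, which is why this loss fits the data as loaded.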

Perform training for 10 epochs.

In [None]:
history = classifier.fit(x_train,y_train,validation_data=(x_val,y_val),epochs=10)

#### Performance Analysis

Visualize losses and accuracies over epochs for both training and validation.

In [None]:
hist = pd.DataFrame(history.history)
epochs = np.arange(1,11)
plt.plot(epochs,hist['loss'], label='Train Loss')
plt.plot(epochs,hist['val_loss'], label='Val Loss')
plt.legend()
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.xticks(epochs)
plt.show()

The training loss keeps going down, while the validation loss rises. This divergence is a clear sign of overfitting.

In [None]:
epochs = np.arange(1,11)
plt.plot(epochs,hist['accuracy'], label='Train Accuracy')
plt.plot(epochs,hist['val_accuracy'], label='Val Accuracy')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.xticks(epochs)
plt.show()

The accuracy plot confirms the insight provided by the losses plot. Steps against overfitting must be taken, such as implementing dropout layers, employing kernel regularizers, reducing model complexity, increasing the amount of data by augmentation and implementing early stopping.
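
Of the remedies listed, early stopping is the simplest to illustrate without a framework. The sketch below shows only the stopping rule on made-up validation losses; the function name and the values are hypothetical. In practice, Keras provides the `keras.callbacks.EarlyStopping` callback (with `monitor` and `patience` arguments) for this purpose.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs. If that never happens,
    the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses that fall and then rise.
val_losses = [0.40, 0.33, 0.30, 0.31, 0.34, 0.38]
print(early_stopping_epoch(val_losses, patience=2))
```

With these values the best loss occurs at epoch 3, and training stops at epoch 5 after two epochs without improvement.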

### Load the Beans dataset from TensorFlow Datasets

We explore more options and methodologies in Computer Vision with a relatively complex dataset. The Beans dataset, which ships with TensorFlow Datasets, has images belonging to three classes:

1. Healthy bean leaves
2. Leaves with bean rust (unhealthy)
3. Leaves with angular leaf spot (unhealthy)

A major advantage of TensorFlow Datasets is that the data arrives as ready-to-use `tf.data.Dataset` objects, so it can be fed to a model with little extra preparation. Load the beans dataset and its metadata.

In [None]:
data, meta = tfds.load('beans',
                 as_supervised=True,
                 with_info=True,
                 )

In [None]:
train, val, test = data['train'], data['validation'], data['test']

In [None]:
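# Take the first (image, label) pair from the test split
ex = next(iter(test))
# Inspect the raw values of the top-left pixel (one value per colour channel)
ex[0].numpy()[0][0]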

The labels corresponding to three classes are provided as 0, 1 and 2. The corresponding readable label name can be extracted from the metadata.

In [None]:
label_extractor = meta.features['label'].int2str

Sample an image. Display its label and size, and visualize it.

In [None]:
for example,label in train.take(1):
  print(label.numpy())
  print(label_extractor(label))
  print(example.shape)
  plt.imshow(example)
  plt.colorbar()
  plt.show()

The images are of size 500 by 500 in three colour channels. The pixel values range from 0 to 255. Define a helper function to scale and resize the image to 160 by 160 (for memory efficiency). 

In [None]:
def normalize(img, label):
  img = tf.cast(img, tf.float32)
  img = img/255.0
  img = tf.image.resize(img,(160,160))
  return img,label

Scale pixel values and resize the images.

In [None]:
train = train.map(normalize)
val = val.map(normalize)
test = test.map(normalize)

In [None]:
plt.figure(figsize=(7,7))
i = 1
for example,label in train.skip(10).take(9):
  plt.subplot(3,3,i)
  plt.title(label_extractor(label),color='r')
  plt.imshow(example)
  plt.xticks([])
  plt.yticks([])
  i += 1
plt.tight_layout()
plt.show()

With some images, we humans can classify the leaves easily. Let’s check how well our model learns to do the same. Prepare the train, validation, and test data in batches, because `model.fit` does not batch a `tf.data.Dataset` on its own and expects it to yield batches. Shuffle the train images, leaving the validation and test images in their original order.

In [None]:
train_batch = train.shuffle(1000).batch(64)
val_batch = val.batch(64)
test_batch = test.batch(64)
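
As a rough illustration of what batching does to the dataset length, the sketch below lists the batch sizes produced when the last partial batch is kept, which is the default behaviour of `batch`. The helper name and the example count of 150 are made up for illustration.

```python
def batch_sizes(num_examples, batch_size):
    """Sizes of the batches produced when the last partial batch is kept."""
    full, remainder = divmod(num_examples, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])

sizes = batch_sizes(num_examples=150, batch_size=64)  # 150 is a made-up count
print(sizes, len(sizes))
```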

#### Modeling and Training

A convolutional neural network has two parts: a base with convolutional layers and their associated layers, and a head with Dense layers and their associated layers. Build a convolutional neural network base with three Conv2D layers and two MaxPooling2D layers in between. While a convolution layer extracts features from the input image or feature map, a max pooling layer keeps the strongest response in each window and discards the rest, which shrinks the feature map.

In [None]:
base = keras.models.Sequential([
    keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=[160, 160, 3]),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), strides=2, activation='relu', kernel_regularizer='l1_l2'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(128, (3, 3), strides=2, activation='relu', kernel_regularizer='l1_l2'),
])
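
The pooling step can be illustrated on a tiny feature map. This is a NumPy sketch of 2×2 max pooling with stride 2 on a single 2-D array with even side lengths; the function name and the values are hypothetical, and the Keras layer additionally handles batches and channels.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = feature_map.shape
    # Split the array into non-overlapping 2x2 blocks, then take each block's max.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 3]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each 2×2 block collapses to its largest value, so the 4×4 map becomes 2×2 and only the strongest response per window survives.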

Build a head with one Flatten layer, three Dense layers and one dropout layer. 

In [None]:
head = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(3, activation='softmax'),
])

Stack base and head to form the complete architecture. It should be noted that the base and head can be constructed in a single Sequential model in one go. 

In [None]:
model = keras.models.Sequential([base,head])

In [None]:
base.summary()

In [None]:
head.summary()

In [None]:
model.summary()
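
The parameter total reported by `model.summary()` can be cross-checked by hand. The sketch below assumes the Keras defaults of unpadded ('valid') convolution and pooling, and recomputes the feature-map sizes and parameter counts for the base and head defined above; the helper names are illustrative.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units

side = 160
total = 0

side = conv_out(side, 3)            # Conv2D(64), stride 1   -> 158
total += conv_params(3, 3, 64)
side = conv_out(side, 2, stride=2)  # MaxPooling2D((2, 2))   -> 79
side = conv_out(side, 3, stride=2)  # Conv2D(128), stride 2  -> 39
total += conv_params(3, 64, 128)
side = conv_out(side, 2, stride=2)  # MaxPooling2D((2, 2))   -> 19
side = conv_out(side, 3, stride=2)  # Conv2D(128), stride 2  -> 9
total += conv_params(3, 128, 128)

flat = side * side * 128            # Flatten -> 10368 values
total += dense_params(flat, 128) + dense_params(128, 64) + dense_params(64, 3)

print(side, flat, total)
```

The total comes to 1,558,915, and most of it sits in the first Dense layer, which connects 10,368 flattened values to 128 units.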

There are around 1.56 million parameters in our architecture. Let’s define our optimizer, loss function and metric to perform training and evaluation.

In [None]:
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

Train the model for 40 epochs.

In [None]:
history = model.fit(train_batch, validation_data=val_batch, epochs=40)

#### Performance Analysis

Analyze the training performance using the training history. 

In [None]:
hist = pd.DataFrame(history.history)

In [None]:
epochs = np.arange(6,41)
plt.plot(epochs,hist['loss'][5:], label='Train Loss')
plt.plot(epochs,hist['val_loss'][5:], label='Val Loss')
plt.legend()
plt.ylabel('Loss')
plt.xlabel('Epochs')
plt.xticks(np.arange(5,42,2))
plt.show()

In [None]:
epochs = np.arange(1,41)
plt.plot(epochs,hist['accuracy'], label='Train Accuracy')
plt.plot(epochs,hist['val_accuracy'], label='Val Accuracy')
plt.legend()
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.xticks(np.arange(1,42,2))
plt.show()

In [None]:
print(hist.columns)

The loss keeps falling and the accuracy keeps rising until the final epoch, which suggests that the model should be trained for more epochs until it converges. The curves are also not smooth, which suggests adding Batch Normalization layers to stabilize training.
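
For intuition about what Batch Normalization does during training, here is a NumPy sketch of its core step: each feature is normalized over the batch, then scaled and shifted. The function name, the sample activations, and the value of `eps` are assumptions for illustration; in a Keras model one would add `keras.layers.BatchNormalization()` layers, which also learn the scale and shift and track moving averages for inference.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature (column) over the batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Hypothetical activations: 4 samples, 2 features on very different scales.
activations = np.array([[1.0, 100.0],
                        [2.0, 200.0],
                        [3.0, 300.0],
                        [4.0, 400.0]])
normalized = batch_norm(activations)
print(normalized.mean(axis=0), normalized.std(axis=0))
```

After normalization both features have a mean of about 0 and a standard deviation of about 1, regardless of their original scale.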

Finally, we use our model to make predictions on the test data!

#### Prediction on Test Data

In [None]:
preds = model.predict(test_batch)


In [None]:
preds.shape

In [None]:
images,labels = next(iter(test_batch))

In [None]:
p = images[0]
print(p.shape)

Let’s visualize the predictions on the first nine test images.

In [None]:
plt.figure(figsize=(7,7))

# images and labels hold the first test batch loaded above;
# preds[:9] lines up with it because test_batch is not shuffled
for i in range(9):
  plt.subplot(3,3,i+1)
  pred = np.argmax(preds[i])
  plt.title(f'Actual: {label_extractor(labels[i])}' ,color='b',size=12)
  if pred==labels[i]:
    plt.xlabel(f'Predicted: {label_extractor(pred)}', color='b',size=12)
  else:
    plt.xlabel(f'Predicted: {label_extractor(pred)}', color='r',size=12)
  plt.imshow(images[i])
  plt.xticks([])
  plt.yticks([])

plt.tight_layout()
plt.show()

Actual labels are at the top of each image (in blue). Predicted labels are at the bottom of each image, in blue for correct predictions and red for incorrect ones.
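
A visual check covers only a handful of images. To put a number on the predictions, the predicted class for each image is the index of its largest softmax output, and accuracy is the fraction of matches with the true labels. The sketch below uses made-up arrays (`fake_preds` and `fake_labels` are hypothetical stand-ins for `preds` and the test labels); on the real model, `model.evaluate(test_batch)` reports the loss and accuracy directly.

```python
import numpy as np

# Hypothetical softmax outputs for 4 images and 3 classes.
fake_preds = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.5, 0.3],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
fake_labels = np.array([0, 1, 2, 1])  # hypothetical true labels

# Predicted class = index of the highest probability in each row.
predicted_classes = np.argmax(fake_preds, axis=1)
accuracy = float(np.mean(predicted_classes == fake_labels))
print(predicted_classes, accuracy)
```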

## **Related Articles**

> * [Getting Started with Computer Vision using Tensorflow Keras](https://analyticsindiamag.com/computer-vision-using-tensorflow-keras/)

> * [Feature Extraction of Images with Skimage](https://analyticsindiamag.com/image-feature-extraction-using-scikit-image-a-hands-on-guide/)

> * [Bitwise Operations On Images Using OpenCV](https://analyticsindiamag.com/how-to-implement-bitwise-operations-on-images-using-opencv/)

> * [Face Swapping with OpenCV](https://analyticsindiamag.com/a-fun-project-on-building-a-face-swapping-application-with-opencv/)

> * [Create Watermark Images with OpenCV](https://analyticsindiamag.com/how-to-create-a-watermark-on-images-using-opencv/)

> * [Convert Image to Cartoon](https://analyticsindiamag.com/converting-an-image-to-a-cartoon/)

