# Introduction

In this lesson you will discover the architectural innovations in the development and use of convolutional neural networks. After reading the next sections, you will know:

- The **ImageNet Large Scale Visual Recognition Challenge** (ILSVRC) and the deep learning-based innovations it has helped to bring about.

- The **key architectural milestones** in the development and use of convolutional neural networks over the last two decades (work in progress). This first lesson covers only LeNet-5 and AlexNet; other models will be presented later.

- How to work with HDFS files and large datasets using the Cat & Dogs competition.


# 1.0 ImageNet, ILSVRC, and Milestone Architectures

The rise in popularity and use of deep learning neural network techniques can be traced back to the innovations in applying convolutional neural networks to image classification tasks. Some of the most important innovations have sprung from submissions by academics and industry leaders to the [ImageNet Large Scale Visual Recognition Challenge](http://image-net.org/challenges/LSVRC/), or **ILSVRC**.

The ILSVRC is an annual computer vision competition developed upon a subset of a publicly available computer vision dataset called **ImageNet**. As such, the tasks and even the challenge itself are often referred to as the ImageNet Competition. In this section, you will discover the **ImageNet dataset**, the ILSVRC, and the key milestones in image classification that have resulted from the competitions. After reading this section, you will know:

- The ImageNet dataset is a huge collection of human-annotated photographs designed by academics for developing computer vision algorithms.
- The ImageNet Large Scale Visual Recognition Challenge, or ILSVRC, is an annual competition that uses subsets from the ImageNet dataset and is designed to foster the development and benchmarking of state-of-the-art algorithms.
- The ILSVRC tasks have led to milestone model architectures and techniques at the intersection of computer vision and deep learning.

## 1.1 ImageNet Dataset & ILSVRC

**ImageNet** is a large dataset of annotated photographs intended for computer vision research. The goal of developing the dataset was to provide a resource to promote improved research and development methods for computer vision.

Based on statistics about the dataset recorded on the [ImageNet homepage](http://www.image-net.org/index), there are a little more than **14 million images** in the dataset, a little more than **21 thousand groups** or classes (called synsets), and a little more than **1 million images that have bounding box annotations** (e.g., boxes around identified objects in the images).

- The photographs were annotated by humans using crowdsourcing platforms such as [Amazon’s Mechanical Turk](https://www.mturk.com/). 
- The project to develop and maintain the dataset was organized and executed by a collaboration between academics at [Princeton](https://www.cs.princeton.edu/~kaiyuy/), [Stanford](https://profiles.stanford.edu/fei-fei-li), and other American universities. 
- The project does not own the photographs that make up the dataset; instead, they are owned by the copyright holders. As such, the dataset is not distributed directly; URLs are provided to the images included in the dataset.


The **ImageNet Large Scale Visual Recognition Challenge**, or ILSVRC for short, is an annual computer vision competition whose tasks use subsets of the ImageNet dataset. The goal of the challenge was both to promote the development of better computer vision techniques and to benchmark the state of the art. The annual challenge focuses on multiple tasks: image classification, which involves assigning a class label to an image based on the main object in the photograph, and object detection, which involves localizing objects within the photograph.

The general challenge tasks for most years are as follows:

- **Image classification**: Predict the classes of objects present in an image.
- **Single-object localization**: Image classification + draw a bounding box around one example of each object present.
- **Object detection**: Image classification + draw a bounding box around each object present.

More recently, and given the great success in developing techniques for still photographs, the challenge tasks are changing to more complicated tasks such as labeling videos. 

> The datasets comprised approximately **1 million images** and **1,000 object classes**. 

The datasets used in challenge tasks are sometimes varied (depending on the task) and were released publicly to promote widespread participation from academia and industry. For each annual challenge, an annotated training dataset was released, along with an unannotated test dataset for which annotations had to be made and submitted to a server for evaluation. Typically, **the training dataset** was comprised of **1 million images**, with **50,000 for a validation** dataset and **150,000 for a test dataset**.

## 1.2 Deep Learning Milestones From ILSVRC

Researchers working on ILSVRC tasks have pushed back the frontier of computer vision research. The methods and the papers that describe them are milestones in computer vision, deep learning, and, more broadly, in artificial intelligence. The pace of improvement in the first five years of the ILSVRC was dramatic, perhaps even shocking to the field of computer vision. **Success has primarily been achieved by large (deep) convolutional neural networks (CNNs) on graphics processing unit (GPU) hardware**, which sparked an interest in deep learning that extended beyond the field out into the mainstream.

> State-of-the-art accuracy has improved significantly from ILSVRC2010 to ILSVRC2014, showcasing the massive progress that has been made in large-scale object recognition over the past five years.

There has been widespread participation in the ILSVRC over the years, with many significant developments and an enormous number of academic publications. Picking out milestones from so much work is a challenge in and of itself. Nevertheless, there are techniques, often named for their parent university, research group, or company, that stand out and have become staples in the intersecting fields of deep learning and computer vision. The papers that describe the methods have become required reading, and the techniques used by the models have become heuristics when using general techniques in practice.

> This section highlights some of these milestone techniques, the ILSVRC editions in which they were introduced, and the papers that describe them. The focus is on image classification tasks.

**AlexNet**

Alex Krizhevsky, et al. from the University of Toronto in their 2012 paper titled [ImageNet
Classification with Deep Convolutional Neural Networks](https://www.cs.toronto.edu/~kriz/imagenet_classification_with_deep_convolutional.pdf) developed a convolutional neural network
that achieved top results on the ILSVRC-2010 and ILSVRC-2012 image classification tasks.
These results sparked interest in deep learning in computer vision.

**ZFNet**

Matthew Zeiler and Rob Fergus proposed a variation of AlexNet generally referred to as ZFNet
in their 2013 paper titled [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901), a variation of which won the ILSVRC-2013 image classification task.

**Inception (GoogLeNet)**

Christian Szegedy, et al. from Google achieved top results for object detection with their
GoogLeNet model that made use of the inception module and architecture. This approach was
described in their 2014 paper titled [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842).

**VGG**

Karen Simonyan and Andrew Zisserman from the Oxford Visual Geometry Group (VGG)
achieved top results for image classification and localization with their VGG model. Their
approach is described in their 2015 paper titled [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556).

**ResNet**

Kaiming He, et al. from Microsoft Research achieved top results for object detection and object detection with localization tasks with their Residual Network or ResNet described in their 2015 paper titled [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).

# 2.0 How Milestone Model Architectural Innovations Work

Convolutional neural networks are comprised of two straightforward elements, namely **convolutional layers** and **pooling layers**. Although simple, there are near-infinite ways to arrange these layers for a given computer vision problem. Fortunately, **there are both common patterns** for
configuring these layers and architectural innovations that you can use in order to develop very deep convolutional neural networks. 

> Studying these architectural design decisions developed for state-of-the-art image classification tasks can provide both a rationale and intuition for how to
use these designs when designing your own deep convolutional neural network models. 

In this section, you will discover the key architectural milestones for the use of convolutional neural networks for challenging image classification problems. After completing this tutorial, you will
know:

- How to pattern the number of filters and filter sizes when implementing convolutional neural networks.

- How to arrange convolutional and pooling layers in a uniform pattern to develop well-performing models.

- How to use the inception module and residual module to develop much deeper convolutional networks.

## 2.1 Architectural Design for CNNs

The elements of a convolutional neural network, such as convolutional and pooling layers, are relatively straightforward to understand. The challenging part of using convolutional neural networks in practice is how to design model architectures that best use these simple elements. A helpful approach to learning how to design effective convolutional neural network architectures is to study successful applications. This is remarkably straightforward to do because of the intense study and application of CNNs from 2012 to 2016 for the ImageNet Large Scale Visual Recognition Challenge, or ILSVRC. This challenge resulted in both the rapid advancement in the state-of-the-art for challenging computer vision tasks and the development of general innovations in the architecture of convolutional neural network models.

We will begin with **LeNet-5**, which is often described as the first successful and important application of CNNs prior to the ILSVRC, and then look at four different winning architectural innovations for CNNs developed for the ILSVRC, namely **AlexNet**, **VGG**, **Inception**, and **ResNet**. By understanding these milestone models and their architectural innovations at a high level, you will both develop an appreciation for the use of these architectural elements in modern applications of CNNs in computer vision and be able to identify and choose architecture elements that may be useful in the design of your own models.

## 2.2 LeNet-5

Perhaps the first widely known and successful application of convolutional neural networks was **LeNet-5**, described by Yann LeCun, et al. in their 1998 paper titled [Gradient-Based Learning Applied to Document Recognition](https://ieeexplore.ieee.org/document/726791). The system was developed for use in a handwritten character recognition problem and demonstrated on the **MNIST standard dataset**, achieving approximately 99.2% classification accuracy (or a 0.8% error rate). The network was then described as the central technique in a broader system referred to as **Graph Transformer Networks**.

It is a long paper, and perhaps the best part to focus on is Section II. B., which describes the LeNet-5 architecture. In that section, the paper describes the **network as having seven layers** with input **grayscale images** having the shape **32 x 32**. Note that images in the **MNIST dataset** are 28 x 28; in the paper they are padded to 32 x 32 before being fed to the network.

> The model proposes a pattern of a convolutional layer followed by an average pooling layer, referred to as a **subsampling layer**. 

This pattern is repeated two and a half times before the output feature maps are flattened and fed to some fully connected layers for interpretation and a final prediction. A picture of the network architecture is provided in the paper and reproduced below.

<img width="800" src="https://drive.google.com/uc?export=view&id=1nqbLzHfqorX80I8upHMWINwPNfrmLW-V"/>

The pattern of blocks of convolutional layers and pooling layers (referred to as **subsampling**) grouped and repeated **remains a typical pattern in designing and using convolutional neural networks today, more than twenty years later**. Interestingly, the architecture uses a small number of filters with a modest size as the first hidden layer, specifically 6 filters, each with 5x5 pixels. After pooling, another convolutional layer has many more filters, again with the same size, precisely 16 filters with 5x5 pixels, again followed by pooling. In the repetition of these two blocks of convolution and pooling layers, the trend increases the number of filters.
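To make the shapes concrete, the feature-map sizes can be traced with the standard output-size formula for convolution and pooling, `(n + 2p - k) // s + 1`. The following is a minimal sketch, not code from the paper: the helper names `conv_out`, `pool_out`, and `lenet5_sizes` are hypothetical, and 2x2 subsampling with stride 2 is assumed throughout. It also shows that a 28x28 MNIST image with two pixels of zero padding in the first layer (what `padding='same'` gives for a 5x5 kernel in the implementation further below) follows the same trace as the 32x32 input described in the paper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial size after a pooling (subsampling) layer along one dimension."""
    return (size - window) // stride + 1


def lenet5_sizes(input_size, first_padding=0):
    """Trace the feature-map width through C1, S2, C3, S4, and C5."""
    sizes = {}
    sizes["C1"] = conv_out(input_size, kernel=5, padding=first_padding)
    sizes["S2"] = pool_out(sizes["C1"])
    sizes["C3"] = conv_out(sizes["S2"], kernel=5)
    sizes["S4"] = pool_out(sizes["C3"])
    sizes["C5"] = conv_out(sizes["S4"], kernel=5)
    return sizes


# 32x32 input as in the paper: 28 -> 14 -> 10 -> 5 -> 1
paper = lenet5_sizes(32)
# 28x28 MNIST input with 2 pixels of zero padding in C1 gives the same trace
padded_mnist = lenet5_sizes(28, first_padding=2)
print(paper)
print(padded_mnist)
```

The trace ends at 1x1 in C5, which is why that layer behaves like a fully connected layer and why the convolution-pooling pattern is described as being repeated "two and a half times".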

Compared to modern applications, the number of filters is also small, but **the trend of increasing the number of filters with the depth of the network also remains a common pattern in modern usage of the technique.** The flattening of the feature maps and interpretation and classification of the extracted features by fully connected layers also remains a common pattern today. 

> In modern terminology, the **final section of the architecture** is often referred to as the **classifier**, whereas the **convolutional and pooling layers** earlier in the model are referred to as the **feature extractor**.

We can summarize the key aspects of the architecture relevant in modern models as follows:

- Fixed-sized input images.
- Group convolutional and pooling layers into blocks.
- Repetition of convolutional-pooling blocks in the architecture.
- Increase in the number of filters with the depth of the network.
- Distinct feature extraction and classifier parts of the architecture.

### 2.2.1 Implementing the LeNet-5 model

In [None]:
%%capture
!pip install wandb

In [None]:
!wandb login

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D,AveragePooling2D,Flatten,Dense
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import numpy as np
import time
import datetime
import os
import pytz
import wandb
from wandb.keras import WandbCallback

In [None]:
# load the datasets
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
train_x = train_x / 255.0
test_x = test_x / 255.0

# train and test sets
train_x = tf.expand_dims(train_x, 3)
test_x = tf.expand_dims(test_x, 3)

# print shape
print("Train shape: {0:}".format(train_x.shape))
print("Test shape: {0:}".format(test_x.shape))

In [None]:
train_x[0].shape

<img width="800" src="https://drive.google.com/uc?export=view&id=1nqbLzHfqorX80I8upHMWINwPNfrmLW-V"/>

In [None]:
# Set an experiment name to group training and evaluation
experiment_name = wandb.util.generate_id()

# setup wandb
wandb.init(project="lesson06", 
           group=experiment_name,
           config={
               "filter_c1": 6,
               "filter_c1_size": (5,5),
               "filter_c3": 16,
               "filter_c3_size": (5,5),
               "layer_c5": 120,
               "layer_f6": 84,
               "loss": "sparse_categorical_crossentropy",
               "metric": "accuracy",
               "epoch": 6,
               "batch_size": 32,
               "optimizer": "adam"
           })
config = wandb.config

In [None]:
# 
# create LeNet-5 model
#
# it is composed of the 8 layers (5 layers considering FC as one layer) such as:
#      - 2 convolutional layers
#      - 2 subsampling (avg pooling) layers
#      - 1 flatten layer
#      - 2 fully connected layers
#      - 1 output layer with 10 outputs

lenet5 = Sequential()

lenet5.add(Conv2D(config.filter_c1, config.filter_c1_size, strides=1,  activation='tanh', input_shape=(28,28,1), padding='same')) #C1
lenet5.add(AveragePooling2D()) #S2
lenet5.add(Conv2D(config.filter_c3, config.filter_c3_size, strides=1, activation='tanh', padding='valid')) #C3
lenet5.add(AveragePooling2D()) #S4
lenet5.add(Flatten()) #Flatten
lenet5.add(Dense(config.layer_c5, activation='tanh')) #C5
lenet5.add(Dense(config.layer_f6, activation='tanh')) #F6
lenet5.add(Dense(10, activation='softmax')) #Output layer

In [None]:
lenet5.summary()
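As a sanity check on the summary, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes configured above (6 and 16 filters of 5x5, then dense layers of 120, 84, and 10 units) and full connectivity in every layer, which is how Keras builds `Conv2D`. The original paper connects C3 to only a subset of the S2 maps, so its count for that layer differs. The helper names `conv_params` and `dense_params` are hypothetical.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a fully connected Conv2D layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (inputs + 1) * units


flattened = 5 * 5 * 16  # S4 output: 5x5 spatial size, 16 feature maps

counts = {
    "C1": conv_params(5, 5, 1, 6),
    "C3": conv_params(5, 5, 6, 16),
    "C5": dense_params(flattened, 120),
    "F6": dense_params(120, 84),
    "output": dense_params(84, 10),
}
total = sum(counts.values())
print(counts, total)
```

The `AveragePooling2D` and `Flatten` layers contribute no parameters in Keras, so the total here (61,706) should match the trainable parameter count reported by `lenet5.summary()`. Most of the parameters sit in the first dense layer, not in the convolutional feature extractor.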

In [None]:
%%wandb

# configure the optimizer, loss, and metrics to monitor.
lenet5.compile(optimizer=config.optimizer,
               loss=config.loss, 
               metrics=[config.metric])

# training 
history = lenet5.fit(x=train_x,
                    y=train_y,
                    batch_size=config.batch_size,
                    epochs=config.epoch,
                    validation_data=(test_x,test_y),
                    callbacks=[WandbCallback()])

wandb.finish()

**Log Analysis**

Next, log an analysis run, using the same experiment name as the group parameter so that this run and the previous run are grouped together in W&B.

In [None]:
%%capture
# Install dependencies
!pip install scikit-plot -qqq

In [None]:
import numpy as np
from sklearn.metrics import f1_score
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_confusion_matrix, plot_roc, plot_precision_recall

wandb.init(project="lesson06", group=experiment_name)

# Class proportions
labels = [str(i) for i in sorted(set(test_y))]  # sorted so class names appear in a stable order
wandb.log({'Class Proportions': wandb.sklearn.plot_class_proportions(train_y,test_y,labels)}, commit=False) # Hold on, more incoming!

# Log F1 Score
test_y_pred = np.asarray(lenet5.predict(test_x))
test_y_pred_class = np.argmax(test_y_pred, axis=1)
f1 = f1_score(test_y, test_y_pred_class, average='micro')
wandb.log({"f1": f1}, commit=False)

# Log Confusion Matrix
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(test_y, test_y_pred_class, ax=ax)
wandb.log({"confusion_matrix": wandb.Image(fig)}, commit=False)

# Log ROC Curve
fig, ax = plt.subplots(figsize=(16, 12))
plot_roc(test_y, test_y_pred, ax=ax)
wandb.log({"plot_roc": wandb.Image(fig)}, commit=False)  # still buffering; committed by the final log call

# Precision vs Recall
fig, ax = plt.subplots(figsize=(16, 12))
plot_precision_recall(test_y, test_y_pred, ax=ax)
wandb.log({"plot_precision_recall": wandb.Image(fig)}, commit=False)

# Class Scores
class_score_data = []
for test, pred in zip(test_y, test_y_pred):
    class_score_data.append([test, pred])

wandb.log({"class_scores": wandb.Table(data=class_score_data,
                                           columns=["test", "pred"])}, commit=False)

# 
# Visualize Predictions
# 
# visualize 18 numbers
def show_image(train_image, label, index):
    plt.subplot(3, 6, index+1)
    plt.imshow(tf.squeeze(train_image), cmap=plt.cm.gray)
    plt.title(label)
    plt.grid(False)  # positional form; the `b` keyword was removed in newer Matplotlib releases

# predictions
predictions = lenet5.predict(test_x)
results = np.argmax(predictions, axis = 1)

# visualize the first 18 test results
plt.figure(figsize=(12, 8))
for index in range(18):
    label = results[index]
    image_pixels = test_x[index,:,:,:]
    show_image(image_pixels, label, index)
plt.tight_layout()

wandb.log({"Predictions": plt}, commit=True)

wandb.finish()

In [None]:
print("[INFO] evaluating network...")
predictions = lenet5.predict(test_x, batch_size=32)
print(classification_report(test_y,predictions.argmax(axis=1)))
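A note on the `average='micro'` setting used when logging the F1 score above: for single-label multiclass problems such as MNIST, micro-averaged precision, recall, and F1 all coincide with plain accuracy, so the logged `f1` value should match the accuracy row of the classification report. The pure-Python sketch below illustrates this on a made-up label list; `micro_f1` and `accuracy` are hypothetical helpers, not scikit-learn functions.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes first."""
    classes = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c and t != c:
                fp += 1
            elif p != c and t == c:
                fn += 1
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration only
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]
print(micro_f1(y_true, y_pred), accuracy(y_true, y_pred))
```

Every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class), which is why the pooled precision and recall are equal. If a per-class view is the goal, `average='macro'` is the setting that weights every class equally.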

### 2.2.2 Data Augmentation

According to [Goodfellow et al.](https://www.deeplearningbook.org/), regularization is

> “(...) any modification we make to a learning algorithm that is intended to reduce its generalization error, but not its training error”

In short, regularization seeks to reduce our testing error perhaps at the expense of increasing training error slightly.

We’ve already looked at different forms of regularization in the first part of this course; however, these were parameterized forms of regularization, requiring us to update our loss/update
function. In fact, there exist other types of regularization that either:

1. Modify the network architecture itself.
2. Augment the data passed into the network for training.

**Dropout** is a great example of modifying a network architecture to achieve greater generalizability. Here we insert a layer that randomly disconnects nodes from the previous layer to the next layer, thereby ensuring that no single node is responsible for learning how to represent a given class.
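The idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the Keras implementation: the function name `dropout` and the sample activations are made up, and the rescaling of surviving units by `1 / (1 - rate)` (often called inverted dropout) is assumed because it is the variant deep learning libraries commonly use.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    the survivors so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
layer_output = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]

train_pass = dropout(layer_output, rate=0.5, rng=rng)
test_pass = dropout(layer_output, rate=0.5, rng=rng, training=False)
print(train_pass)
print(test_pass)
```

During training a different random subset of units is silenced on every call; at test time the layer passes its input through unchanged.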

In this section we’ll be discussing another type of regularization called **data augmentation**. This method purposely perturbs training examples, changing their appearance slightly, before passing them into the network for training. The end result is that a network consistently sees “new” training data points generated from the original training data, partially alleviating the need for us to gather more training data (though in general, gathering more training data will rarely hurt your algorithm).

**Data augmentation** encompasses a wide range of techniques used to generate new training samples from the original ones by applying random jitters and perturbations such that the class labels are not changed.

> Our goal when applying **data augmentation** is to increase the generalizability of the model. 

Given that our network is constantly seeing new, slightly modified versions of the input data points, it’s able to learn more robust features. 

> At testing time, we do not apply data augmentation; we simply evaluate our trained network. In most cases, you’ll see an increase in testing accuracy, perhaps at the expense of a slight dip in training accuracy.

<center><img width="600" src="https://drive.google.com/uc?export=view&id=1PWNBYi_ziF8YnCCd25vnmsf9nxq-KsGH"></center><center><b>Left</b>: A sample of 250 data points that follow a normal distribution exactly. <b>Right</b>: Adding a small amount of random “jitter” to the distribution. This type of data augmentation can
increase the generalizability of our networks.</center>


Let’s consider the Figure above (**left**) of a normal distribution with zero mean and unit variance. Training a machine learning model on this data may result in us modeling the distribution exactly –
however, in real-world applications, data rarely follows such a neat distribution.

Instead, to increase the generalizability of our classifier, we may first randomly jitter points along the distribution by adding some values ε drawn from a random distribution (**right**). Our plot still follows an **approximately normal distribution**, but it’s not a perfect distribution as on the left. A model trained on this data is more likely to generalize to example data points not included in the training set.
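The jitter described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: the 250 points are random samples standing in for the figure's data, and the jitter magnitude `jitter_scale` and the uniform noise distribution are arbitrary choices, not values taken from the figure.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# 250 samples from a standard normal distribution (zero mean, unit variance)
points = [rng.gauss(0.0, 1.0) for _ in range(250)]

# Jitter: add a small random epsilon to every point
jitter_scale = 0.1  # assumed magnitude, chosen only for illustration
jittered = [x + rng.uniform(-jitter_scale, jitter_scale) for x in points]

# The summary statistics barely move, but no jittered point equals its original
print(statistics.mean(points), statistics.stdev(points))
print(statistics.mean(jittered), statistics.stdev(jittered))
```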

**In the context of computer vision, data augmentation lends itself naturally**. For example, we can obtain additional training data from the original images by applying simple geometric transforms such as random:

1. Translations
2. Rotations
3. Changes in scale
4. Shearing
5. Horizontal (and in some cases, vertical) flips

Applying a (small) amount of these transformations to an input image will change its appearance slightly, but it does not change the class label – thereby making data augmentation a very natural, easy method to apply to deep learning for computer vision tasks.
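Before turning to the Keras generator, here is a dependency-free sketch of two of these transforms applied to a tiny image stored as a list of rows. The function names `translate` and `horizontal_flip`, the toy image, and the `"toy"` label are all made up for illustration; the point is only that the label is carried along unchanged.

```python
def translate(image, dx, dy, fill=0):
    """Shift a 2D image (list of rows) right by dx and down by dy pixels,
    filling uncovered pixels with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def horizontal_flip(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


# A tiny 3x4 "image" invented for illustration
image = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 0, 0],
]
label = "toy"  # the label travels with the image unchanged

augmented = [(translate(image, 1, 1), label), (horizontal_flip(image), label)]
for img, lbl in augmented:
    print(lbl, img)
```

Whether a transform preserves the label depends on the data. Mirroring a handwritten digit can produce a shape that is no longer a valid instance of its class, which is presumably why the generator used below sets `horizontal_flip=False`.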

In [None]:
# load the datasets
(train_x, train_y), (test_x, test_y) = tf.keras.datasets.mnist.load_data()
train_x = train_x / 255.0
test_x = test_x / 255.0

# train and test sets
train_x = tf.expand_dims(train_x, 3)
test_x = tf.expand_dims(test_x, 3)

# print shape
print("Train shape: {0:}".format(train_x.shape))
print("Test shape: {0:}".format(test_x.shape))

In [None]:
# visualize 18 numbers
def show_image(train_image, label, index):
    plt.subplot(3, 6, index+1)
    plt.imshow(tf.squeeze(train_image), cmap=plt.cm.gray)
    plt.title(label)
    plt.grid(False)  # positional form; the `b` keyword was removed in newer Matplotlib releases

In [None]:
# visualize the first 18 numbers
plt.figure(figsize=(12, 8))
for index in range(18):
    label = train_y[index]
    image_pixels = train_x[index,:,:,:]
    show_image(image_pixels, label, index)
plt.tight_layout()

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# construct the image generator for data augmentation then
# initialize the total number of images generated thus far
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=False, fill_mode="nearest")
total = 0
image = train_x[10:11,:,:,:]

# construct the actual Python generator
print("[INFO] generating images...")
imageGen = aug.flow(image, batch_size=1)

# create a figure
plt.figure(figsize=(12, 8))

# loop over examples from our image data augmentation generator
for img in imageGen:

  show_image(img, train_y[10], total)

  # increment our counter
  total += 1

  # if we have reached 18 examples, break from the loop
  if total == 18:
    break

In [None]:
%%capture
!pip install wandb

In [None]:
!wandb login

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D,AveragePooling2D,Flatten,Dense
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
import numpy as np
import time
import datetime
import os
import pytz
import wandb
from wandb.keras import WandbCallback

In [None]:
# Set an experiment name to group training and evaluation
experiment_name = wandb.util.generate_id()

# setup wandb
wandb.init(project="lesson06", 
           group=experiment_name,
           config={
               "filter_c1": 6,
               "filter_c1_size": (5,5),
               "filter_c3": 16,
               "filter_c3_size": (5,5),
               "layer_c5": 120,
               "layer_f6": 84,
               "loss": "sparse_categorical_crossentropy",
               "metric": "accuracy",
               "epoch": 6,
               "batch_size": 32,
               "optimizer": "adam"
           })
config = wandb.config

In [None]:
# 
# create LeNet-5 model
#
# it is composed of the 8 layers such as:
#      - 2 convolutional layers
#      - 2 subsampling (avg pooling) layers
#      - 1 flatten layer
#      - 2 fully connected layers
#      - 1 output layer with 10 outputs

lenet5 = Sequential()

lenet5.add(Conv2D(config.filter_c1, config.filter_c1_size, strides=1,  activation='tanh', input_shape=(28,28,1), padding='same')) #C1
lenet5.add(AveragePooling2D()) #S2
lenet5.add(Conv2D(config.filter_c3, config.filter_c3_size, strides=1, activation='tanh', padding='valid')) #C3
lenet5.add(AveragePooling2D()) #S4
lenet5.add(Flatten()) #Flatten
lenet5.add(Dense(config.layer_c5, activation='tanh')) #C5
lenet5.add(Dense(config.layer_f6, activation='tanh')) #F6
lenet5.add(Dense(10, activation='softmax')) #Output layer

In [None]:
lenet5.summary()

In [None]:
%%wandb

# configure the optimizer, loss, and metrics to monitor.
lenet5.compile(optimizer=config.optimizer,
               loss=config.loss, 
               metrics=[config.metric])

# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=30, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=False, fill_mode="nearest")

# train on augmented batches; the validation data is left unaugmented
print("[INFO] training network...")
history = lenet5.fit(aug.flow(train_x, train_y, batch_size=config.batch_size),
                     validation_data=(test_x, test_y),
                     epochs=config.epoch,
                     callbacks=[WandbCallback()])

wandb.finish()

In [None]:
%%capture
# Install dependencies
!pip install scikit-plot -qqq

In [None]:
import numpy as np
from sklearn.metrics import f1_score
import matplotlib.pyplot as plt
from scikitplot.metrics import plot_confusion_matrix, plot_roc, plot_precision_recall

wandb.init(project="lesson06", group=experiment_name)

# Class proportions
labels = [str(i) for i in sorted(set(test_y))]  # sorted so class names appear in a stable order
wandb.log({'Class Proportions': wandb.sklearn.plot_class_proportions(train_y,test_y,labels)}, commit=False) # Hold on, more incoming!

# Log F1 Score
test_y_pred = np.asarray(lenet5.predict(test_x))
test_y_pred_class = np.argmax(test_y_pred, axis=1)
f1 = f1_score(test_y, test_y_pred_class, average='micro')
wandb.log({"f1": f1}, commit=False)

# Log Confusion Matrix
fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(test_y, test_y_pred_class, ax=ax)
wandb.log({"confusion_matrix": wandb.Image(fig)}, commit=False)

# Log ROC Curve
fig, ax = plt.subplots(figsize=(16, 12))
plot_roc(test_y, test_y_pred, ax=ax)
wandb.log({"plot_roc": wandb.Image(fig)}, commit=False)  # still buffering; committed by the final log call

# Precision vs Recall
fig, ax = plt.subplots(figsize=(16, 12))
plot_precision_recall(test_y, test_y_pred, ax=ax)
wandb.log({"plot_precision_recall": wandb.Image(fig)}, commit=False)

# Class Scores
class_score_data = []
for test, pred in zip(test_y, test_y_pred):
    class_score_data.append([test, pred])

wandb.log({"class_scores": wandb.Table(data=class_score_data,
                                           columns=["test", "pred"])}, commit=False)

# 
# Visualize Predictions
# 
# visualize 18 numbers
def show_image(train_image, label, index):
    plt.subplot(3, 6, index+1)
    plt.imshow(tf.squeeze(train_image), cmap=plt.cm.gray)
    plt.title(label)
    plt.grid(False)  # positional form; the `b` keyword was removed in newer Matplotlib releases

# predictions
predictions = lenet5.predict(test_x)
results = np.argmax(predictions, axis = 1)

# visualize the first 18 test results
plt.figure(figsize=(12, 8))
for index in range(18):
    label = results[index]
    image_pixels = test_x[index,:,:,:]
    show_image(image_pixels, label, index)
plt.tight_layout()

wandb.log({"Predictions": plt}, commit=True)

wandb.finish()

In [None]:
print("[INFO] evaluating network...")
predictions = lenet5.predict(test_x, batch_size=32)
print(classification_report(test_y,predictions.argmax(axis=1)))

### 2.2.3 Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

- **Batch normalization**. Apply batch normalization (BN) before and after the activation function and compare the final results.
- **Other activation functions**. Change the activation function to ReLU and compare the results.

If you explore any of these extensions, I’d love to know.

## 2.3 AlexNet

The work that perhaps could be credited with sparking renewed interest in neural networks and the beginning of the dominance of deep learning in many computer vision applications was the 2012 paper by Alex Krizhevsky et al. titled [ImageNet Classification with Deep Convolutional
Neural Networks](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf). The paper describes a model later referred to as **AlexNet** designed to address the ImageNet Large Scale Visual Recognition Challenge or ILSVRC-2010 competition for classifying photographs of objects into one of 1,000 different categories.

The ILSVRC was a competition designed to spur innovation in the field of computer vision. Before the development of AlexNet, the task was thought to be very difficult and far beyond the capability of the computer vision methods of the time.

> AlexNet successfully demonstrated the capability
of the convolutional neural network model in the domain and kindled a fire that resulted in many more improvements and innovations, many demonstrated on the same ILSVRC task in subsequent years. 

More broadly, **the paper showed that it is possible to develop deep and effective end-to-end models** for a challenging problem without using unsupervised pre-training techniques popular at the time.

Central to the design of AlexNet was a suite of methods that were either new or already successful but not yet widely adopted. They have since become standard practice when using CNNs for image classification.

> AlexNet used the rectified linear activation function, or ReLU, as the nonlinearity after each convolutional layer, instead of S-shaped functions such as the logistic or Tanh that were common up until that point. A softmax activation function was used in the output layer, now a staple for multiclass classification with neural networks.
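
As a minimal sketch of the two activation functions mentioned above, the snippet below implements ReLU and softmax with NumPy. The helper names `relu` and `softmax` and the input values are our own choices for illustration, not code from the paper.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negative values with zero
    return np.maximum(0.0, x)

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

# illustrative inputs only
activations = relu(np.array([-2.0, 0.5, 3.0]))
probs = softmax(np.array([2.0, 1.0, 0.1]))

print(activations)
print(probs, probs.sum())
```

Unlike the logistic or Tanh functions, ReLU does not saturate for positive inputs, while softmax turns the raw scores of the output layer into probabilities that sum to one.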

The average pooling used in LeNet-5 was replaced with max pooling. In this case, overlapping pooling (a 3x3 window moved with a stride of 2) was found to outperform the non-overlapping pooling that is commonly used today (i.e., the stride equals the size of the pooling window, e.g., 2 by 2 pixels).
The newly proposed dropout method was used between the fully connected layers of the classifier part of the model to address overfitting and improve generalization error. The architecture of AlexNet is deep and extends upon some of the patterns established
with LeNet-5. The image below, taken from the paper, summarizes the model architecture, in this case, split into two pipelines to train on the GPU hardware of the time.

<img width="800" src="https://drive.google.com/uc?export=view&id=111DrLxQn-ejJ2-7zNA4M8t78wmqcMQNe"/>


**It is similar to LeNet-5, only much larger and deeper**, and it was the first to stack convolutional layers directly on top of one another, instead of stacking a pooling layer on top of each convolutional layer. The table below presents this architecture.

<center><img width="600" src="https://drive.google.com/uc?export=view&id=193aOD83q_m_apxqjv1kFSjGRYFV7HfL2"></center><center>AlexNet Architecture.</center>


The model has **five convolutional layers** in the **feature extraction part** of the model and **three fully connected layers** in the **classifier part** of the model. Input images were fixed to the size **227x227 with three color channels** (the original paper states 224x224, which is widely regarded as a typo). In terms of the number of **filters** used in each convolutional layer, the pattern of increasing the number of filters with depth seen in LeNet was mainly adhered to; in this case, the counts are **96, 256, 384, 384, and 256**. Similarly, the **pattern of decreasing the size of the filter** (kernel) with depth was used, starting from the larger size of 11x11 and decreasing to 5x5, and then to 3x3 in the deeper layers. **The use of small filters such as 5x5 and 3x3 is now the norm**.
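
To make these dimensions concrete, here is a small sketch that traces the spatial size of the feature maps through the feature extraction part. It uses the standard output-size formula `(n + 2p - k) // s + 1`. The layer labels and the padding values (2 for the 5x5 convolution, 1 for the 3x3 convolutions, i.e., "same" padding at stride 1) are our assumptions and may not match the names used in the table.

```python
def out_size(n, kernel, stride=1, padding=0):
    # standard formula for the output width of a convolution or pooling layer
    return (n + 2 * padding - kernel) // stride + 1

# (label, kernel size, stride, padding) for each layer; labels are illustrative
layers = [
    ("conv1 11x11/4", 11, 4, 0),
    ("pool1 3x3/2", 3, 2, 0),
    ("conv2 5x5 same", 5, 1, 2),
    ("pool2 3x3/2", 3, 2, 0),
    ("conv3 3x3 same", 3, 1, 1),
    ("conv4 3x3 same", 3, 1, 1),
    ("conv5 3x3 same", 3, 1, 1),
    ("pool5 3x3/2", 3, 2, 0),
]

size = 227
trace = []
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    trace.append((name, size))
    print(f"{name:16s} -> {size}x{size}")

# with a 224x224 input the first convolution does not tile evenly
frac = (224 - 11) / 4 + 1
print(frac)
```

The trace ends at 6x6, so the flattened input to the first fully connected layer has 6 x 6 x 256 = 9216 values. The last line shows why 227 is the consistent input size: with 224 the first convolution would give a fractional output width.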

The pattern of a convolutional layer followed by a pooling layer was used at the start and end of the feature extraction part of the model. Interestingly, **a pattern of a convolutional layer followed immediately by a second convolutional layer** was used. **This pattern too has become a modern standard**.

To reduce overfitting, the authors used **two regularization techniques**. First, they applied dropout with a 50% dropout rate during training to the outputs of layers F9 and F10. Second, they performed **data augmentation** by randomly shifting the training images by various offsets, flipping them horizontally, and changing the lighting conditions.

**Data augmentation** artificially increases the size of the training set by generating many realistic variants of each training instance. **This reduces overfitting**, making this a regularization technique. The generated instances should be as realistic as possible: ideally, given an image from the augmented training set, a human should not be able to tell whether it was augmented or not. Simply adding white noise will not help; the modifications should be learnable (white noise is not).
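
The sketch below illustrates the idea with NumPy on a random stand-in image. The function name `augment`, the maximum shift of 4 pixels, and the brightness factor range are assumptions chosen for illustration. In particular, the brightness scaling is only a crude stand-in for the lighting changes used by the AlexNet authors.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, max_shift=4):
    # randomly flip the image left-right
    if rng.random() < 0.5:
        image = image[:, ::-1, :]

    # pad with edge pixels, then take a window at a random offset,
    # which shifts the content by up to max_shift pixels
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad, mode="edge")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    h, w = image.shape[:2]
    shifted = padded[dy:dy + h, dx:dx + w, :]

    # scale the brightness by a random factor (crude lighting change)
    factor = rng.uniform(0.8, 1.2)
    return np.clip(shifted * factor, 0, 255).astype(np.uint8)

# random stand-in for a real photograph
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = augment(image)
print(augmented.shape, augmented.dtype)
```

Each call produces a slightly different variant of the same image, so the network rarely sees exactly the same input twice.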

AlexNet also uses a competitive normalization step immediately after the ReLU step of layers C1 and C3, called **Local Response Normalization (LRN)**: the most strongly activated neurons inhibit other neurons located at the same position in neighboring feature maps (such competitive activation has been observed in biological neurons). This encourages different feature maps to specialize, pushing them apart and forcing them to explore a wider range of features, ultimately improving generalization.
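
As a rough sketch of this normalization, the function below applies the formula from the AlexNet paper, b_i = a_i / (k + alpha * sum_j a_j^2)^beta, where the sum runs over the n neighboring feature maps at the same spatial position. The default values k=2, n=5, alpha=1e-4, and beta=0.75 are the ones reported in the paper. The function name and the channels-last layout are our assumptions.

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # a has shape (height, width, channels); normalize across channels
    channels = a.shape[-1]
    squared = a ** 2
    out = np.empty_like(a, dtype=np.float64)
    for i in range(channels):
        # window of neighboring feature maps, clipped at the borders
        lo = max(0, i - n // 2)
        hi = min(channels - 1, i + n // 2)
        denom = (k + alpha * squared[..., lo:hi + 1].sum(axis=-1)) ** beta
        out[..., i] = a[..., i] / denom
    return out

# constant stand-in activations: 2x2 spatial positions, 8 feature maps
a = np.ones((2, 2, 8)) * 10.0
b = local_response_norm(a)
print(b[0, 0])
```

A strongly activated neuron increases the denominator of its neighbors, which is the inhibition effect described above. Feature maps at the border have fewer neighbors and are therefore damped slightly less.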



We can summarize the key aspects of the architecture relevant in modern models as follows:

- Use of the ReLU activation function after convolutional layers and softmax for the output layer.
- Use of Max Pooling instead of Average Pooling.
- Use of Dropout regularization between the fully connected layers.
- The pattern of a convolutional layer fed directly to another convolutional layer.
- Use of Data Augmentation.


> A variant of AlexNet called [ZF Net](https://arxiv.org/abs/1311.2901) was developed by Matthew Zeiler and Rob Fergus and won the 2013 ILSVRC challenge. It is essentially AlexNet with a few tweaked hyperparameters (number of feature maps, kernel size, stride, etc.).

### 2.3.1 Implementing the AlexNet model

For the sake of understanding, look at the table above and compare the hyperparameters and dimensions of each layer.

In [None]:
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2

# create a model
model = Sequential()

# 
# Block #1: first CONV => RELU => POOL layer set
#
model.add(Conv2D(96, (11, 11), strides=(4, 4),
                 input_shape=(227,227,3), padding="valid",
                 kernel_regularizer=l2(0.0002),activation='relu'))

# Batch Normalization did not exist in 2012; it is a modification of the original proposal
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(Dropout(0.25))

# 
# Block #2: second CONV => RELU => POOL layer set
#
model.add(Conv2D(256, (5, 5), padding="same",
                 kernel_regularizer=l2(0.0002),activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(Dropout(0.25))

# 
# Block #3: CONV => RELU => CONV => RELU => CONV => RELU
#
model.add(Conv2D(384, (3, 3), padding="same",
                 kernel_regularizer=l2(0.0002),activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(384, (3, 3), padding="same",
                 kernel_regularizer=l2(0.0002),activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(256, (3, 3), padding="same",
                 kernel_regularizer=l2(0.0002),activation='relu'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
model.add(Dropout(0.25))

# 
# Block #4: first set of FC => RELU layers
#
model.add(Flatten())
model.add(Dense(4096, kernel_regularizer=l2(0.0002),activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# 
# Block #5: second set of FC => RELU layers
#
model.add(Dense(4096, kernel_regularizer=l2(0.0002),activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5))

# 
# softmax classifier
#
model.add(Dense(1000, kernel_regularizer=l2(0.0002)))
model.add(Activation("softmax"))

In [None]:
# 62M parameters!!!
model.summary()
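
To see where the roughly 62M parameters come from, the sketch below counts the weights and biases of the convolutional and dense layers by hand. The helper names are ours. The count deliberately ignores the BatchNormalization layers, so `model.summary()` reports a slightly larger total.

```python
def conv_params(kernel, in_ch, out_ch):
    # one kernel x kernel x in_ch filter per output channel, plus one bias each
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    # full weight matrix plus one bias per output unit
    return in_units * out_units + out_units

conv_total = (conv_params(11, 3, 96) + conv_params(5, 96, 256)
              + conv_params(3, 256, 384) + conv_params(3, 384, 384)
              + conv_params(3, 384, 256))

# the last pooling layer outputs a 6x6x256 volume that is flattened
dense_total = (dense_params(6 * 6 * 256, 4096) + dense_params(4096, 4096)
               + dense_params(4096, 1000))

total = conv_total + dense_total
print(conv_total, dense_total, total)
```

Note that the three fully connected layers hold the vast majority of the parameters, while the five convolutional layers account for fewer than 4M.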

# 3 Working with HDF5 files and large datasets

So far in this course, we have only worked with datasets that can fit into the main memory of our machines (colab). For small datasets, this is a reasonable assumption – we load each individual image, preprocess it, and allow it to be fed through our network. However, for large scale deep
learning datasets (e.g., ImageNet), we need to create data generators that access only a portion of the dataset at a time (i.e., a mini-batch), then allow the batch to be passed through the network.

Luckily, Keras ships with methods that allow you to use the raw file paths on disk as inputs to a training process. You do not have to store the entire dataset in memory – simply supply the image paths to the Keras data generator and your images will be loaded in batches and fed through the
network. However, this method is terribly inefficient. Each and every image residing on your disk requires an I/O operation which introduces latency into your training pipeline. Training deep learning networks is already slow enough – we would do well to avoid the I/O bottleneck as much as possible.

**A more elegant solution would be to generate an HDF5 dataset for your raw images**. Not only is HDF5 capable of storing massive datasets, but it’s optimized for I/O operations, **especially for extracting batches (called “slices”)** from the file. As we’ll see throughout the remainder of this course, taking the extra step to pack the raw images residing on disk into an HDF5 file allows us to construct a deep learning framework that can be used to rapidly build datasets and train deep learning networks on top of them.

In the remainder of this section, we’ll demonstrate how to construct an HDF5 dataset for the [Kaggle Dogs vs. Cats competition](https://www.kaggle.com/c/dogs-vs-cats). Right after,  we’ll use this HDF5 dataset to train the seminal [AlexNet architecture](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf), eventually resulting in a top-25 position on the leaderboard in the respective competition.

## 3.1 What is HDF5?

**HDF5** is a binary data format created by the [HDF Group](https://www.hdfgroup.org/solutions/hdf5/) to store gigantic numerical datasets on disk (far too large to store in memory) while facilitating easy access and computation on the rows of the datasets.

> Data in HDF5 is stored hierarchically, similar to how a file system stores data. 

Data is first defined in groups, where a group is a container-like structure which can hold datasets and other
groups. Once a group has been defined, a dataset can be created within the group. A dataset can be
thought of as a multi-dimensional array (i.e., a NumPy array) of a homogeneous data type (integer,
float, unicode, etc.). An example of an HDF5 file containing a group with multiple datasets is
displayed in the figure below.

<center><img width="400" src="https://drive.google.com/uc?export=view&id=1-oiHlD5B97FTA3p9pnnm5PJhS4ywI_f_"></center><center>An example of an HDF5 file with three datasets. The first dataset contains the
label_names for CALTECH-101. We then have labels, which maps each image to its
corresponding class label. Finally, the features dataset contains the image quantifications extracted
by the CNN</center>

**HDF5 is written in C**; however, by using the [h5py module](https://www.h5py.org), we can gain access to
the underlying C API using the Python programming language. What makes **h5py** so awesome
is the **ease of interaction with data**. 

> We can store huge amounts of data in our HDF5 dataset and manipulate the data in a NumPy-like fashion. 

For example, we can use standard Python syntax to access and slice rows from multi-terabyte datasets stored on disk as if they were simple NumPy arrays loaded into memory. Thanks to specialized data structures, these slices and row accesses are lightning quick.
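
The short sketch below shows this NumPy-like access on a tiny file. It assumes h5py and NumPy are installed. The file name `example.hdf5`, the group name `demo`, and the dataset names are made up for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# write the example file into a temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")

# write: one group holding two datasets
with h5py.File(path, "w") as db:
    group = db.create_group("demo")
    group.create_dataset(
        "features", data=np.arange(40, dtype="float32").reshape(10, 4))
    group.create_dataset("labels", data=np.arange(10) % 2)

# read: slicing loads only the requested rows into memory
with h5py.File(path, "r") as db:
    features = db["demo/features"]
    shape = features.shape
    batch = features[2:5]
    labels = db["demo/labels"][2:5]

print(shape, batch.shape, labels)
```

The slice `features[2:5]` returns an ordinary NumPy array holding rows 2 to 4, while the rest of the dataset stays on disk.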

When using HDF5 with h5py, **you can think of your data as a gigantic NumPy array**
that is too large to fit into main memory but can still be accessed and manipulated just the same.
Perhaps best of all, **the HDF5 format is standardized**:

> Datasets stored in HDF5 format are inherently portable and can be accessed by other developers using different programming languages such as C, MATLAB, and Java.

We’ll be writing a custom Python class that allows us to efficiently accept input data and write it to an HDF5 dataset. 

## 3.2 Writing to an HDF5 dataset

Before we can even think about working with CNN architectures, we first need to develop a bit of infrastructure. In particular, we need to define a Python class named **HDF5DatasetWriter**, which, as the name suggests, is responsible for taking an input set of NumPy arrays (whether features, raw images, etc.) and writing them to HDF5 format.

In [None]:
# import the necessary packages
import h5py
import os

class HDF5DatasetWriter:
  def __init__(self, dims, outputPath, dataKey="images",bufSize=1000):
    """
    The constructor to HDF5DatasetWriter accepts four parameters, two of which are optional.
    
    Args:
    dims: controls the dimension or shape of the data we will be storing in the dataset.
    if we were storing the (flattened) raw pixel intensities of the 28x28 = 784 MNIST dataset, 
    then dims=(70000, 784).
    outputPath: path to where our output HDF5 file will be stored on disk.
    dataKey: the optional dataKey is the name of the dataset that will store
    the data our algorithm will learn from.
    bufSize: controls the size of our in-memory buffer, which we default to 1,000 feature
    vectors/images. Once we reach bufSize, we’ll flush the buffer to the HDF5 dataset.
    """

    # check to see if the output path exists, and if so, raise
    # an exception
    if os.path.exists(outputPath):
      raise ValueError("The supplied `outputPath` already "
        "exists and cannot be overwritten. Manually delete "
        "the file before continuing.", outputPath)

    # open the HDF5 database for writing and create two datasets:
    # one to store the images/features and another to store the
    # class labels
    self.db = h5py.File(outputPath, "w")
    # 
    # for resource limitations due to hard-disk space, a compression algorithm can be used, the price is the demand of computational power
    #
    self.data = self.db.create_dataset(dataKey, dims,dtype="float",compression='gzip')
    self.labels = self.db.create_dataset("labels", (dims[0],),dtype="int")

    # store the buffer size, then initialize the buffer itself
    # along with the index into the datasets
    self.bufSize = bufSize
    self.buffer = {"data": [], "labels": []}
    self.idx = 0

  def add(self, rows, labels):
    # add the rows and labels to the buffer
    self.buffer["data"].extend(rows)
    self.buffer["labels"].extend(labels)

    # check to see if the buffer needs to be flushed to disk
    if len(self.buffer["data"]) >= self.bufSize:
      self.flush()

  def flush(self):
    # write the buffers to disk then reset the buffer
    i = self.idx + len(self.buffer["data"])
    self.data[self.idx:i] = self.buffer["data"]
    self.labels[self.idx:i] = self.buffer["labels"]
    self.idx = i
    self.buffer = {"data": [], "labels": []}

  def storeClassLabels(self, classLabels):
    # create a dataset to store the actual class label names,
    # then store the class labels
    dt = h5py.special_dtype(vlen=str) # `vlen=unicode` for Py2.7
    labelSet = self.db.create_dataset("label_names",(len(classLabels),), dtype=dt)
    labelSet[:] = classLabels

  def close(self):
    # check to see if there are any other entries in the buffer
    # that need to be flushed to disk
    if len(self.buffer["data"]) > 0:
      self.flush()

    # close the dataset
    self.db.close()

As you can see, the **HDF5DatasetWriter** doesn’t have much to do with machine learning or deep learning at all – it’s simply a class used to help us store data in HDF5 format. As you continue your deep learning studies, you’ll notice that much of the initial labor when setting up a new problem is getting the data into a format you can work with. Once you have your data in a format that’s straightforward to manipulate, it becomes substantially easier to apply machine learning and deep learning techniques to your data.

## 3.3 Downloading Kaggle: Dogs vs. Cats

To download the Kaggle: Dogs vs. Cats dataset you’ll first need to create an account on kaggle.com. From there, head to the [Dogs vs. Cats homepage](https://www.kaggle.com/c/dogs-vs-cats/data).

In [None]:
# download cat & dogs dataset (catdogs.zip)
!gdown https://drive.google.com/uc?id=1RqVl-1AOFRYaktJuG3xxxmuur0ua8s-p

In [None]:
# train/dog.NNNN.jpg
# train/cat.NNNN.jpg
!unzip catdogs.zip

In [None]:
# we will be using the following data structure for this challenge
# 
# | ----- catdogs
# |       | ----- hdf5
# |       | ----- output
# |       | ----- train

!mkdir catdogs
!mkdir catdogs/hdf5
!mkdir catdogs/output
!mv train catdogs/

In [None]:
# 25k instances
!ls catdogs/train | wc -l

### 3.3.1 Building the Dataset

In [None]:
# define the paths to the images directory
IMAGES_PATH = "catdogs/train"

# since we do not have validation data or access to the testing
# labels we need to take a number of images from the training
# data and use them instead
NUM_CLASSES = 2
NUM_VAL_IMAGES = 1250 * NUM_CLASSES
NUM_TEST_IMAGES = 1250 * NUM_CLASSES

# define the path to the output training, validation, and testing
# HDF5 files
TRAIN_HDF5 = "catdogs/hdf5/train.hdf5"
VAL_HDF5 = "catdogs/hdf5/val.hdf5"
TEST_HDF5 = "catdogs/hdf5/test.hdf5"

# path to the output model file
MODEL_PATH = "catdogs/alexnet_dogs_vs_cats.model"

# define the path to the dataset mean
DATASET_MEAN = "catdogs/dogs_vs_cats_mean.json"

# define the path to the output directory used for storing plots,
# classification reports, etc.
OUTPUT_PATH = "catdogs/output"

> On Line 2 we define the path to the directory containing the dog and cat images – these are the images that we’ll be packing into an HDF5 dataset later in this section.

> Lines 7-9 define the total number of class labels (two: one for dog, another for cat) along with the number of validation and testing images (2,500 for each). 

> We can then specify the path to our output HDF5 files for the training, validation, and testing splits, respectively on Lines 13-15.

The second half of the configuration cell defines the path to the output serialized weights, the dataset mean, and a general “output” path to store plots, classification reports, logs, etc.

The DATASET_MEAN file will be used to store the average red, green, and blue pixel intensity values across the entire (training) dataset. When we train our network, we’ll subtract the mean RGB values from every pixel in the image (the same goes for testing and evaluation as well). This method, called **mean subtraction**, is a type of data normalization technique and is more often used than scaling pixel intensities to the range [0,1] as it’s shown to be more effective on large datasets and deeper neural networks.
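
A minimal NumPy sketch of mean subtraction is shown below, using a random stand-in for the training set. The array shapes and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in "training set": 8 random RGB images of 4x4 pixels
train = rng.integers(0, 256, size=(8, 4, 4, 3)).astype("float32")

# one mean per channel, computed over every pixel of every training image
channel_means = train.mean(axis=(0, 1, 2))

# subtract the training means from an image (or a whole batch of images)
normalized = train - channel_means

print(channel_means)
print(normalized.mean(axis=(0, 1, 2)))
```

After the subtraction each channel of the training data is centered around zero. The same three training means are reused for validation and test images, so no information from those splits leaks into the preprocessing.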

### 3.3.2 Aspect-aware preprocessor

In [None]:
# import the necessary packages
import imutils
import cv2

# useful class to help the resize of images
class AspectAwarePreprocessor:
	def __init__(self, width, height, inter=cv2.INTER_AREA):
		# store the target image width, height, and interpolation
		# method used when resizing
		self.width = width
		self.height = height
		self.inter = inter

	def preprocess(self, image):
		# grab the dimensions of the image and then initialize
		# the deltas to use when cropping
		(h, w) = image.shape[:2]
		dW = 0
		dH = 0

		# if the width is smaller than the height, then resize
		# along the width (i.e., the smaller dimension) and then
		# update the deltas to crop the height to the desired
		# dimension
		if w < h:
			image = imutils.resize(image, width=self.width,
				inter=self.inter)
			dH = int((image.shape[0] - self.height) / 2.0)

		# otherwise, the height is smaller than the width so
		# resize along the height and then update the deltas
		# crop along the width
		else:
			image = imutils.resize(image, height=self.height,
				inter=self.inter)
			dW = int((image.shape[1] - self.width) / 2.0)

		# now that our images have been resized, we need to
		# re-grab the width and height, followed by performing
		# the crop
		(h, w) = image.shape[:2]
		image = image[dH:h - dH, dW:w - dW]

		# finally, resize the image to the provided spatial
		# dimensions to ensure our output image is always a fixed
		# size
		return cv2.resize(image, (self.width, self.height),
			interpolation=self.inter)
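
To follow the arithmetic of the class above without loading any image, the sketch below reproduces only the size computations. It assumes that the aspect-preserving resize truncates the scaled dimension to an integer. The function name `aspect_aware_plan` and the 500x375 example size are ours.

```python
def aspect_aware_plan(w, h, target_w, target_h):
    # mirror the size arithmetic of the preprocessor without touching pixels
    dW = dH = 0
    if w < h:
        # resize along the width, keeping the aspect ratio
        r = target_w / float(w)
        new_w, new_h = target_w, int(h * r)
        dH = int((new_h - target_h) / 2.0)
    else:
        # resize along the height, keeping the aspect ratio
        r = target_h / float(h)
        new_w, new_h = int(w * r), target_h
        dW = int((new_w - target_w) / 2.0)

    # size after cropping dW pixels from left/right and dH from top/bottom
    cropped = (new_w - 2 * dW, new_h - 2 * dH)
    return (new_w, new_h), (dW, dH), cropped

# a landscape image of 500x375 pixels resized to 256x256
resized, deltas, cropped = aspect_aware_plan(500, 375, 256, 256)
print(resized, deltas, cropped)
```

For this example the crop leaves a 257x256 image, one pixel wider than the target because of integer rounding. This is why the class finishes with a final `cv2.resize` to the exact target size.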

### 3.3.3 Converting images into HDF5

In [None]:
#
# Convert jpg images into hdf5 (train, val, test)
#
# 25min to 35min

# import the necessary packages
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from imutils import paths
import numpy as np
import progressbar
import json
import cv2
import os

# grab the paths to the images
trainPaths = list(paths.list_images(IMAGES_PATH))
trainLabels = [p.split(os.path.sep)[-1].split(".")[0] for p in trainPaths]
le = LabelEncoder()
trainLabels = le.fit_transform(trainLabels)

# perform stratified sampling from the training set to build the
# testing split from the training data
split = train_test_split(trainPaths, trainLabels,
                         test_size=NUM_TEST_IMAGES, 
                         stratify=trainLabels,random_state=42)
(trainPaths, testPaths, trainLabels, testLabels) = split

# perform another stratified sampling, this time to build the
# validation data
split = train_test_split(trainPaths, trainLabels,
                         test_size=NUM_VAL_IMAGES, 
                         stratify=trainLabels,random_state=42)
(trainPaths, valPaths, trainLabels, valLabels) = split

# construct a list pairing the training, validation, and testing
# image paths along with their corresponding labels and output HDF5
# files
datasets = [
	("train", trainPaths, trainLabels, TRAIN_HDF5),
	("val", valPaths, valLabels, VAL_HDF5),
	("test", testPaths, testLabels, TEST_HDF5)]

# initialize the image pre-processor and the lists of RGB channel
# averages
aap = AspectAwarePreprocessor(256, 256)
(R, G, B) = ([], [], [])

# loop over the dataset tuples
# (the loop variable is named imagePaths so it does not shadow imutils.paths)
for (dType, imagePaths, labels, outputPath) in datasets:
	# create HDF5 writer
	print("[INFO] building {}...".format(outputPath))
	writer = HDF5DatasetWriter((len(imagePaths), 256, 256, 3), outputPath)

	# initialize the progress bar
	widgets = ["Building Dataset: ", progressbar.Percentage(), " ",progressbar.Bar(), " ", progressbar.ETA()]
	pbar = progressbar.ProgressBar(maxval=len(imagePaths),widgets=widgets).start()

	# loop over the image paths
	for (i, (path, label)) in enumerate(zip(imagePaths, labels)):
		# load the image and process it
		image = cv2.imread(path)
		image = aap.preprocess(image)

		# if we are building the training dataset, then compute the
		# mean of each channel in the image, then update the
		# respective lists
		if dType == "train":
			(b, g, r) = cv2.mean(image)[:3]
			R.append(r)
			G.append(g)
			B.append(b)

		# add the image and label # to the HDF5 dataset
		writer.add([image], [label])
		pbar.update(i)

	# close the HDF5 writer
	pbar.finish()
	writer.close()

# construct a dictionary of averages, then serialize the means to a
# JSON file
print("[INFO] serializing means...")
D = {"R": np.mean(R), "G": np.mean(G), "B": np.mean(B)}
with open(DATASET_MEAN, "w") as f:
	f.write(json.dumps(D))

In [None]:
# copy hdf5 files to your Google Drive
!cp -r catdogs/hdf5 /content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep\ Learning/Lessons/Lesson\ #06/catdogs
!cp -r catdogs/dogs_vs_cats_mean.json /content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep\ Learning/Lessons/Lesson\ #06/catdogs
!cp -r catdogs/output/ /content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep\ Learning/Lessons/Lesson\ #06/catdogs

<font color="red"> Only execute the cell below if you already have hdf5 files stored in your google drive </font>

In [None]:
# 
# only in case loading data from drive
#
!mkdir catdogs
!cp -r /content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep\ Learning/Lessons/Lesson\ #06/catdogs/hdf5 catdogs
!cp -r /content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep\ Learning/Lessons/Lesson\ #06/catdogs/output catdogs
!cp -r /content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep\ Learning/Lessons/Lesson\ #06/catdogs/dogs_vs_cats_mean.json catdogs

## 3.4 Competing in Kaggle: Dogs vs. Cats

### 3.4.1 Image Preprocessors

In [None]:
# import the necessary packages
from tensorflow.keras.preprocessing.image import img_to_array

class ImageToArrayPreprocessor:
	def __init__(self, dataFormat=None):
		# store the image data format
		self.dataFormat = dataFormat

	def preprocess(self, image):
		# apply the Keras utility function that correctly rearranges
		# the dimensions of the image
		return img_to_array(image, data_format=self.dataFormat)

### 3.4.2 Mean preprocessor

Let’s get started with the mean preprocessor. In Section 3.3.3 we converted an image
dataset to HDF5 format; part of this conversion involved computing the average Red, Green, and Blue pixel intensities across all images in the training dataset. Now that we have these averages, we are going to perform a pixel-wise subtraction of these values from our input images as a **form of data normalization**.

In [None]:
# import the necessary packages
import cv2

class MeanPreprocessor:
	def __init__(self, rMean, gMean, bMean):
		# store the Red, Green, and Blue channel averages across a
		# training set
		self.rMean = rMean
		self.gMean = gMean
		self.bMean = bMean

	def preprocess(self, image):
		# split the image into its respective Red, Green, and Blue
		# channels
		(B, G, R) = cv2.split(image.astype("float32"))

		# subtract the means for each channel
		R -= self.rMean
		G -= self.gMean
		B -= self.bMean

		# merge the channels back together and return the image
		# (keep in mind that OpenCV represents images in BGR order)
		return cv2.merge([B, G, R])

### 3.4.3 Patch preprocessing

The PatchPreprocessor is responsible for randomly sampling MxN regions of an image during the training process. We apply patch preprocessing when the spatial dimensions of our input images are larger than what the CNN expects – this is a common technique to help reduce overfitting, and is, therefore, **a form of regularization**. Instead of using the entire image during training, we instead crop a random portion of it and pass it to the network.

Recall that we constructed an HDF5 dataset of Kaggle Dogs vs. Cats images where each image is 256x256 pixels. However, the AlexNet architecture used in this lesson only accepts images of size 227x227 pixels. This is an excellent opportunity to perform data augmentation by randomly cropping a 227x227 region from the 256x256 image during training using PatchPreprocessor.

In [None]:
# import the necessary packages
from sklearn.feature_extraction.image import extract_patches_2d

class PatchPreprocessor:
	def __init__(self, width, height):
		# store the target width and height of the image
		self.width = width
		self.height = height

	def preprocess(self, image):
		# extract a random crop from the image with the target width
		# and height
		return extract_patches_2d(image, (self.height, self.width),
			max_patches=1)[0]
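
For intuition, a random crop can also be written in a few lines of plain NumPy. The sketch below is an illustrative alternative, not the implementation used by `extract_patches_2d`; the function name `random_crop` and the fixed seed are our choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_crop(image, width, height):
    # pick a random top-left corner such that the crop fits inside the image
    h, w = image.shape[:2]
    y = rng.integers(0, h - height + 1)
    x = rng.integers(0, w - width + 1)
    return image[y:y + height, x:x + width]

# blank stand-in for a 256x256 image from the HDF5 dataset
image = np.zeros((256, 256, 3), dtype="uint8")
patch = random_crop(image, 227, 227)

# number of distinct 227x227 crops available in a 256x256 image
positions = (256 - 227 + 1) ** 2
print(patch.shape, positions)
```

With 900 possible crop positions per image, the network sees a slightly different view of each training image in almost every epoch.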

### 3.4.4 Crop preprocessor

Next, we need to define a CropPreprocessor responsible for computing the 10-crops for oversampling. During the evaluation phase of our CNN, we’ll crop the four corners of the input image + the center region and then take their corresponding horizontal flips, for a total of ten samples per input image.

These ten samples will be passed through the CNN, and then the probabilities averaged.
Applying this over-sampling method tends to yield a 1-2 percent increase in classification accuracy (and in some cases, even higher).
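
The averaging step can be sketched as follows. The probabilities below are invented numbers standing in for the network output on the ten crops of a single image, and the class order (cat, dog) is an assumption.

```python
import numpy as np

# made-up stand-in for the model output on the ten crops of one image:
# each row is a softmax output over two classes (cat, dog)
crop_probs = np.array([
    [0.40, 0.60], [0.55, 0.45], [0.30, 0.70], [0.35, 0.65], [0.20, 0.80],
    [0.45, 0.55], [0.60, 0.40], [0.25, 0.75], [0.30, 0.70], [0.20, 0.80],
])

# average the probabilities over the ten crops, then pick the best class
mean_probs = crop_probs.mean(axis=0)
predicted = int(mean_probs.argmax())
print(mean_probs, predicted)
```

Two of the ten crops individually favor the first class, but the averaged prediction is more stable and selects the second class.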

In [None]:
# import the necessary packages
import numpy as np
import cv2

class CropPreprocessor:
	def __init__(self, width, height, horiz=True, inter=cv2.INTER_AREA):
		# store the target image width, height, whether or not
		# horizontal flips should be included, along with the
		# interpolation method used when resizing
		self.width = width
		self.height = height
		self.horiz = horiz
		self.inter = inter

	def preprocess(self, image):
		# initialize the list of crops
		crops = []

		# grab the width and height of the image then use these
		# dimensions to define the four corner crops of the image
		(h, w) = image.shape[:2]
		coords = [
			[0, 0, self.width, self.height],
			[w - self.width, 0, w, self.height],
			[w - self.width, h - self.height, w, h],
			[0, h - self.height, self.width, h]]

		# compute the center crop of the image as well
		dW = int(0.5 * (w - self.width))
		dH = int(0.5 * (h - self.height))
		coords.append([dW, dH, w - dW, h - dH])

		# loop over the coordinates, extract each of the crops,
		# and resize each of them to a fixed size
		for (startX, startY, endX, endY) in coords:
			crop = image[startY:endY, startX:endX]
			crop = cv2.resize(crop, (self.width, self.height),
				interpolation=self.inter)
			crops.append(crop)

		# check to see if the horizontal flips should be taken
		if self.horiz:
			# compute the horizontal mirror flips for each crop
			mirrors = [cv2.flip(c, 1) for c in crops]
			crops.extend(mirrors)

		# return the set of crops
		return np.array(crops)

### 3.4.5 HDF5 dataset generators

Before we can implement the AlexNet architecture and train it on the Kaggle Dogs vs. Cats dataset, we first need to define a class responsible for yielding batches of images and labels from our HDF5 dataset. Section 3.3.3 discussed how to convert a set of images residing on disk into an HDF5 dataset – but how do we get them back out again? The answer is to define an **HDF5DatasetGenerator** class.

Previously, all of our image datasets could be loaded into memory so we could rely on Keras generator utilities to yield our batches of images and corresponding labels. However, now that our datasets are too large to fit into memory, we need to handle implementing this generator ourselves.

In [None]:
# import the necessary packages
from tensorflow.keras.utils import to_categorical
import numpy as np
import h5py

class HDF5DatasetGenerator:
	def __init__(self, dbPath, batchSize, preprocessors=None, aug=None, binarize=True, classes=2):
		# store the batch size, preprocessors, and data augmentor,
		# whether or not the labels should be binarized, along with
		# the total number of classes
		self.batchSize = batchSize
		self.preprocessors = preprocessors
		self.aug = aug
		self.binarize = binarize
		self.classes = classes

		# open the HDF5 database for reading and determine the total
		# number of entries in the database
		self.db = h5py.File(dbPath, "r")
		self.numImages = self.db["labels"].shape[0]

	def generator(self, passes=np.inf):
		# initialize the epoch count
		epochs = 0

		# keep looping infinitely -- the model will stop once we have
		# reached the desired number of epochs
		while epochs < passes:
			# loop over the HDF5 dataset
			for i in np.arange(0, self.numImages, self.batchSize):
				# extract the images and labels from the HDF dataset
				images = self.db["images"][i: i + self.batchSize]
				labels = self.db["labels"][i: i + self.batchSize]

				# check to see if the labels should be binarized
				if self.binarize:
					labels = to_categorical(labels,
						self.classes)

				# check to see if our preprocessors are not None
				if self.preprocessors is not None:
					# initialize the list of processed images
					procImages = []

					# loop over the images
					for image in images:
						# loop over the preprocessors and apply each
						# to the image
						for p in self.preprocessors:
							image = p.preprocess(image)

						# update the list of processed images
						procImages.append(image)

					# update the images array to be the processed
					# images
					images = np.array(procImages)

				# if the data augmentor exists, apply it
				if self.aug is not None:
					(images, labels) = next(self.aug.flow(images,
						labels, batch_size=self.batchSize))

				# yield a tuple of images and labels
				yield (images, labels)

			# increment the total number of epochs
			epochs += 1

	def close(self):
		# close the database
		self.db.close()
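
The batching logic of the generator can be checked in isolation. The sketch below lists the index ranges that one pass over the dataset would read, assuming 20,000 training images (25,000 minus the validation and test splits) and a hypothetical batch size of 128.

```python
def batch_slices(num_images, batch_size):
    # same stepping as np.arange(0, numImages, batchSize) in the generator
    return [(i, min(i + batch_size, num_images))
            for i in range(0, num_images, batch_size)]

slices = batch_slices(num_images=20000, batch_size=128)
steps_per_epoch = len(slices)
last_batch = slices[-1][1] - slices[-1][0]
print(steps_per_epoch, last_batch)
```

Because 20,000 is not a multiple of 128, the last batch of each epoch is smaller than the others. This is worth keeping in mind when choosing the number of steps per epoch for training.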

### 3.4.6 Simple preprocessor

In [None]:
# import the necessary packages
import cv2

class SimplePreprocessor:
	def __init__(self, width, height, inter=cv2.INTER_AREA):
		# store the target image width, height, and interpolation
		# method used when resizing
		self.width = width
		self.height = height
		self.inter = inter

	def preprocess(self, image):
		# resize the image to a fixed size, ignoring the aspect
		# ratio
		return cv2.resize(image, (self.width, self.height),
			interpolation=self.inter)

### 3.4.7 Training monitor

In [None]:
# import the necessary packages
from tensorflow.keras.callbacks import BaseLogger
import matplotlib.pyplot as plt
import numpy as np
import json
import os

class TrainingMonitor(BaseLogger):
	def __init__(self, figPath, jsonPath=None, startAt=0):
		# store the output path for the figure, the path to the JSON
		# serialized file, and the starting epoch
		super(TrainingMonitor, self).__init__()
		self.figPath = figPath
		self.jsonPath = jsonPath
		self.startAt = startAt

	def on_train_begin(self, logs={}):
		# initialize the history dictionary
		self.H = {}

		# if the JSON history path exists, load the training history
		if self.jsonPath is not None:
			if os.path.exists(self.jsonPath):
				with open(self.jsonPath) as f:
					self.H = json.load(f)

				# check to see if a starting epoch was supplied
				if self.startAt > 0:
					# loop over the entries in the history log and
					# trim any entries that are past the starting
					# epoch
					for k in self.H.keys():
						self.H[k] = self.H[k][:self.startAt]

	def on_epoch_end(self, epoch, logs={}):
		# loop over the logs and update the loss, accuracy, etc.
		# for the entire training process
		for (k, v) in logs.items():
			l = self.H.get(k, [])
			l.append(float(v))
			self.H[k] = l

		# check to see if the training history should be serialized
		# to file
		if self.jsonPath is not None:
			with open(self.jsonPath, "w") as f:
				f.write(json.dumps(self.H))

		# ensure at least two epochs have passed before plotting
		# (epoch starts at zero)
		if len(self.H["loss"]) > 1:
			# plot the training loss and accuracy
			N = np.arange(0, len(self.H["loss"]))
			plt.style.use("ggplot")
			plt.figure()
			plt.plot(N, self.H["loss"], label="train_loss")
			plt.plot(N, self.H["val_loss"], label="val_loss")
			plt.plot(N, self.H["accuracy"], label="train_acc")
			plt.plot(N, self.H["val_accuracy"], label="val_acc")
			plt.title("Training Loss and Accuracy [Epoch {}]".format(
				len(self.H["loss"])))
			plt.xlabel("Epoch #")
			plt.ylabel("Loss/Accuracy")
			plt.legend()

			# save the figure
			plt.savefig(self.figPath)
			plt.close()
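The `startAt` trimming in `on_train_begin` can be illustrated on its own. This is a minimal sketch with a made-up history dictionary; the values and the `start_at` variable are assumptions for illustration, not results from the lesson.

```python
import json

# hypothetical history, serialized the way TrainingMonitor writes it
saved = json.dumps({
    "loss": [0.9, 0.7, 0.6, 0.5],
    "val_loss": [1.0, 0.8, 0.7, 0.9],
})

# reload it and keep only the entries recorded before `start_at`
start_at = 3
H = json.loads(saved)
for k in H.keys():
    H[k] = H[k][:start_at]

print(H)
```

Entries recorded at or after the restart epoch are dropped, so a resumed run does not plot stale values from the abandoned part of the previous run.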

### 3.4.8 Rank accuracy


In [None]:
# import the necessary packages
import numpy as np

def rank5_accuracy(preds, labels):
	# initialize the rank-1 and rank-5 accuracies
	rank1 = 0
	rank5 = 0

	# loop over the predictions and ground-truth labels
	for (p, gt) in zip(preds, labels):
		# sort the class indices by their predicted probability in
		# descending order so that the more confident guesses are
		# at the front of the list
		p = np.argsort(p)[::-1]

		# check if the ground-truth label is in the top-5
		# predictions
		if gt in p[:5]:
			rank5 += 1

		# check to see if the ground-truth is the #1 prediction
		if gt == p[0]:
			rank1 += 1

	# compute the final rank-1 and rank-5 accuracies
	rank1 /= float(len(preds))
	rank5 /= float(len(preds))

	# return a tuple of the rank-1 and rank-5 accuracies
	return (rank1, rank5)
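A small worked example may help make the two metrics concrete. The block repeats the function in condensed form so that it runs on its own; the probabilities and labels are invented for illustration.

```python
import numpy as np

def rank5_accuracy(preds, labels):
    # same logic as the function above, condensed
    rank1 = 0
    rank5 = 0
    for (p, gt) in zip(preds, labels):
        p = np.argsort(p)[::-1]
        if gt in p[:5]:
            rank5 += 1
        if gt == p[0]:
            rank1 += 1
    return (rank1 / float(len(preds)), rank5 / float(len(preds)))

# made-up probabilities for 3 samples over 6 classes
preds = np.array([
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # top guess is class 0
    [0.04, 0.12, 0.40, 0.30, 0.08, 0.06],  # top guess is class 2
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # class 5 is ranked last
])
labels = np.array([0, 3, 5])

(rank1, rank5) = rank5_accuracy(preds, labels)
print("rank-1: {:.2f}%, rank-5: {:.2f}%".format(rank1 * 100, rank5 * 100))
```

Only the first sample is correct at rank 1, while the second is also recovered within the top five, giving 33.33% and 66.67%. With just two classes, as in dogs vs cats, rank-5 is always 100%, which is why only rank-1 is reported later.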

### 3.4.9 SimpleDatasetLoader

In [None]:
# import the necessary packages
import numpy as np
import cv2
import os

# helper to load images
class SimpleDatasetLoader:
	def __init__(self, preprocessors=None):
		# store the image preprocessor
		self.preprocessors = preprocessors

		# if the preprocessors are None, initialize them as an
		# empty list
		if self.preprocessors is None:
			self.preprocessors = []

	def load(self, imagePaths, verbose=-1):
		# initialize the list of features and labels
		data = []
		labels = []

		# loop over the input images
		for (i, imagePath) in enumerate(imagePaths):
			# load the image and extract the class label assuming
			# that our path has the following format:
			# /path/to/dataset/{class}/{image}.jpg
			image = cv2.imread(imagePath)
			label = imagePath.split(os.path.sep)[-2]

			# check to see if our preprocessors are not None
			if self.preprocessors is not None:
				# loop over the preprocessors and apply each to
				# the image
				for p in self.preprocessors:
					image = p.preprocess(image)

			# treat our processed image as a "feature vector"
			# by updating the data list followed by the labels
			data.append(image)
			labels.append(label)

			# show an update every `verbose` images
			if verbose > 0 and i > 0 and (i + 1) % verbose == 0:
				print("[INFO] processed {}/{}".format(i + 1,
					len(imagePaths)))

		# return a tuple of the data and labels
		return (np.array(data), np.array(labels))
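The label extraction in `load` depends entirely on the directory layout. A minimal sketch with hypothetical paths (the directory and file names below are invented) shows what `split(os.path.sep)[-2]` returns:

```python
import os

# hypothetical paths following /path/to/dataset/{class}/{image}.jpg
imagePaths = [
    os.path.sep.join(["datasets", "animals", "cats", "cat_001.jpg"]),
    os.path.sep.join(["datasets", "animals", "dogs", "dog_042.jpg"]),
]

# the class label is the name of the directory holding the image
labels = [p.split(os.path.sep)[-2] for p in imagePaths]
print(labels)
```

If the images are not stored one directory per class, this indexing would pick up the wrong path component.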

## 3.5 Implementing AlexNet

In [None]:
# import the necessary packages
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
from tensorflow.keras.regularizers import l2
from tensorflow.keras import backend as K

class AlexNet:
	@staticmethod
	def build(width, height, depth, classes, reg=0.0002):
		# initialize the model along with the input shape to be
		# "channels last" and the channels dimension itself
		model = Sequential()
		inputShape = (height, width, depth)
		chanDim = -1

		# if we are using "channels first", update the input shape
		# and channels dimension
		if K.image_data_format() == "channels_first":
			inputShape = (depth, height, width)
			chanDim = 1

		# Block #1: first CONV => RELU => POOL layer set
		model.add(Conv2D(96, (11, 11), strides=(4, 4),
			input_shape=inputShape, padding="valid",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
		model.add(Dropout(0.25))

		# Block #2: second CONV => RELU => POOL layer set
		model.add(Conv2D(256, (5, 5), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
		model.add(Dropout(0.25))

		# Block #3: CONV => RELU => CONV => RELU => CONV => RELU
		model.add(Conv2D(384, (3, 3), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(384, (3, 3), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(Conv2D(256, (3, 3), padding="same",
			kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization(axis=chanDim))
		model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
		model.add(Dropout(0.25))

		# Block #4: first set of FC => RELU layers
		model.add(Flatten())
		model.add(Dense(4096, kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# Block #5: second set of FC => RELU layers
		model.add(Dense(4096, kernel_regularizer=l2(reg)))
		model.add(Activation("relu"))
		model.add(BatchNormalization())
		model.add(Dropout(0.5))

		# softmax classifier
		model.add(Dense(classes, kernel_regularizer=l2(reg)))
		model.add(Activation("softmax"))

		# return the constructed network architecture
		return model
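To see how the spatial dimensions shrink through the network, the arithmetic can be traced by hand. The sketch below assumes a 227x227 input (the size used when the model is built later) and the standard output-size formulas for "valid" and "same" padding; `conv_out` is a hypothetical helper, not a Keras function.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    # output size along one spatial dimension of a CONV or POOL layer
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

size = 227
size = conv_out(size, 11, stride=4)        # Block #1 CONV: 55
size = conv_out(size, 3, stride=2)         # Block #1 POOL: 27
size = conv_out(size, 5, padding="same")   # Block #2 CONV: 27
size = conv_out(size, 3, stride=2)         # Block #2 POOL: 13
size = conv_out(size, 3, padding="same")   # Block #3 CONVs: 13
size = conv_out(size, 3, stride=2)         # Block #3 POOL: 6

# the last CONV layer has 256 filters, so Flatten produces 6 * 6 * 256
flattened = size * size * 256
print(size, flattened)
```

Under these assumptions the first fully connected layer receives a 9216-dimensional vector.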

## 3.6 Training AlexNet on Kaggle: dogs vs cats

In [None]:
# define the path to the output training, validation, and testing
# HDF5 files
TRAIN_HDF5 = "catdogs/hdf5/train.hdf5"
VAL_HDF5 = "catdogs/hdf5/val.hdf5"
TEST_HDF5 = "catdogs/hdf5/test.hdf5"

# path to the output model file
MODEL_PATH = "catdogs/alexnet_dogs_vs_cats.model"

# define the path to the dataset mean
DATASET_MEAN = "catdogs/dogs_vs_cats_mean.json"

# define the path to the output directory used for storing plots,
# classification reports, etc.
OUTPUT_PATH = "/content/drive/MyDrive/Atividades/Ensino/Disciplinas/POS-GRADUAÇÃO/Deep Learning/Lessons/Lesson #06/catdogs/"

In [None]:
# set the matplotlib backend so figures can be saved in the background
import matplotlib
matplotlib.use("Agg")

# import the necessary packages
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint

import json
import os

# checkpoint configuration: set `resume` to True to continue training
# from a previously saved checkpoint
resume = True

# checkpoint files: keep only the checkpoint with the best validation
# accuracy
filepath = OUTPUT_PATH + "/epochs:{epoch:03d}-val_acc:{val_accuracy:.3f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor="val_accuracy", verbose=1,
                             save_best_only=True, mode="max")

# construct the training image generator for data augmentation
aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
                         width_shift_range=0.2,
                         height_shift_range=0.2,
                         shear_range=0.15,
                         horizontal_flip=True, fill_mode="nearest")

# load the RGB means for the training set
means = json.loads(open(DATASET_MEAN).read())

# initialize the image preprocessors
sp = SimplePreprocessor(227, 227)
pp = PatchPreprocessor(227, 227)
mp = MeanPreprocessor(means["R"], means["G"], means["B"])
iap = ImageToArrayPreprocessor()

# initialize the training and validation dataset generators
trainGen = HDF5DatasetGenerator(TRAIN_HDF5, 128, aug=aug,
                                preprocessors=[pp, mp, iap], classes=2)
valGen = HDF5DatasetGenerator(VAL_HDF5, 128,
                              preprocessors=[sp, mp, iap], classes=2)

# initialize the optimizer and compile the model
print("[INFO] compiling model...")
opt = Adam(learning_rate=1e-3)

model = AlexNet.build(width=227, height=227, depth=3, classes=2, reg=0.0002)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])

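# construct the set of callbacks: the training monitor plots the
# loss/accuracy curves and the checkpoint saves the best model
path = os.path.sep.join([OUTPUT_PATH, "{}.png".format(os.getpid())])
callbacks = [TrainingMonitor(path), checkpoint]

# Keras counts epochs from zero, so a fresh run starts at epoch 0
initial_epoch = 0

# load the previous weights and resume from the checkpointed epoch
if resume:
    model.load_weights(OUTPUT_PATH + "/epochs:045-val_acc:0.921.hdf5")
    initial_epoch = 45

# train the network; `epochs` is the index of the final epoch, not the
# number of additional epochs, so it must be greater than
# `initial_epoch` for any training to take place when resuming
history = model.fit(trainGen.generator(),
          steps_per_epoch=trainGen.numImages // 128,
          validation_data=valGen.generator(),
          validation_steps=valGen.numImages // 128,
          epochs=45,
          max_queue_size=10,
          callbacks=callbacks, verbose=1,
          initial_epoch=initial_epoch)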

# save the model to file
print("[INFO] serializing model...")
model.save(MODEL_PATH, overwrite=True)

# close the HDF5 datasets
trainGen.close()
valGen.close()
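Two details of the training cell are easy to check in isolation: how the checkpoint filename template expands, and how many epochs `fit` runs for a given `initial_epoch`. The sketch below mimics the template expansion with plain `str.format`; the metric value is made up and `epochs_to_run` is a hypothetical helper that encodes the assumption that `epochs` is the index of the final epoch.

```python
# the filename template passed to ModelCheckpoint in the cell above
template = "epochs:{epoch:03d}-val_acc:{val_accuracy:.3f}.hdf5"

# illustrative values for a checkpoint written after the 45th epoch
name = template.format(epoch=45, val_accuracy=0.9214)
print(name)

def epochs_to_run(initial_epoch, epochs):
    # hypothetical helper: `epochs` is the final epoch index, so
    # training covers the difference between the two values
    return max(0, epochs - initial_epoch)

print(epochs_to_run(0, 45))
print(epochs_to_run(45, 45))
```

The expanded name matches the weights file loaded when resuming. Under the stated assumption, resuming at epoch 45 with `epochs=45` leaves no epochs to train, so `epochs` would need to be raised to continue training.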

## 3.7 Evaluating AlexNet

In [None]:
# import the necessary packages
from tensorflow.keras.models import load_model
from sklearn.metrics import classification_report
import numpy as np
import progressbar
import json

# load the RGB means for the training set
means = json.loads(open(DATASET_MEAN).read())

# initialize the image preprocessors
sp = SimplePreprocessor(227, 227)
mp = MeanPreprocessor(means["R"], means["G"], means["B"])
cp = CropPreprocessor(227, 227)
iap = ImageToArrayPreprocessor()

# load the pretrained network
print("[INFO] loading model...")
model = load_model(MODEL_PATH)

# initialize the testing dataset generator, then make predictions on
# the testing data
print("[INFO] predicting on test data (no crops)...")
testGen = HDF5DatasetGenerator(TEST_HDF5, 64,
                               preprocessors=[sp, mp, iap], classes=2)
predictions = model.predict(testGen.generator(),
                            steps=testGen.numImages // 64, max_queue_size=10)

# compute the rank-1 accuracy (rank-5 is discarded because it is
# always 100% with only two classes)
(rank1, _) = rank5_accuracy(predictions, testGen.db["labels"])
print("[INFO] rank-1: {:.2f}%".format(rank1 * 100))
testGen.close()


# re-initialize the testing set generator, this time excluding the
# `SimplePreprocessor`
testGen = HDF5DatasetGenerator(TEST_HDF5, 64,
                               preprocessors=[mp], classes=2)
predictions = []

# initialize the progress bar
widgets = ["Evaluating: ", progressbar.Percentage(), " ",
           progressbar.Bar(), " ", progressbar.ETA()]
pbar = progressbar.ProgressBar(maxval=testGen.numImages // 64,
                               widgets=widgets).start()

# loop over a single pass of the test data
for (i, (images, labels)) in enumerate(testGen.generator(passes=1)):
	# loop over each of the individual images
	for image in images:
		# apply the crop preprocessor to the image to generate 10
		# separate crops, then convert them from images to arrays
		crops = cp.preprocess(image)
		crops = np.array([iap.preprocess(c) for c in crops],
			dtype="float32")

		# make predictions on the crops and then average them
		# together to obtain the final prediction
		pred = model.predict(crops)
		predictions.append(pred.mean(axis=0))

	# update the progress bar
	pbar.update(i)

# compute the rank-1 accuracy
pbar.finish()
print("[INFO] predicting on test data (with crops)...")
(rank1, _) = rank5_accuracy(predictions, testGen.db["labels"])
print("[INFO] rank-1: {:.2f}%".format(rank1 * 100))
testGen.close()
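The averaging step applied to the crops can be sketched without a trained model. The probabilities below are invented stand-ins for what `model.predict(crops)` might return for the 10 crops of one image; they are not results from this lesson.

```python
import numpy as np

# made-up softmax outputs for the 10 crops of a single image,
# shape (10, 2): one row per crop, one column per class
crop_preds = np.array([[0.6, 0.4]] * 7 + [[0.3, 0.7]] * 3)

# average over the crops to get a single prediction for the image
pred = crop_preds.mean(axis=0)
label = int(np.argmax(pred))
print(pred, label)
```

Here three crops disagree with the other seven, yet the averaged prediction still favors class 0, which illustrates how over-sampling can smooth out crops that happen to miss the discriminative part of the image.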