__Chapter 15 - Classifying Images with Deep Convolutional Neural Networks__

1. [Building blocks of convolutional neural networks](#Building-blocks-of-convolutional-neural-networks)
    1. [Understanding CNNs and learning feature hierarchies](#Understanding-CNNs-and-learning-feature-hierarchies)
    1. [Performing discrete convolutions in one dimension](#Performing-discrete-convolutions-in-one-dimension)


```python
# Standard library and settings
import os
import sys
import importlib
import itertools
import warnings; warnings.simplefilter('ignore')
dataPath = os.path.abspath(os.path.join('../../Data'))
modulePath = os.path.abspath(os.path.join('../../CustomModules'))
if modulePath not in sys.path:
    sys.path.append(modulePath)
from IPython.display import display, HTML; display(HTML("<style>.container { width:95% !important; }</style>"))


# Data extensions and settings
import numpy as np
np.set_printoptions(threshold = np.inf, suppress = True)
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.options.display.float_format = '{:,.6f}'.format


# Modeling extensions
import sklearn.base as base
import sklearn.cluster as cluster
import sklearn.datasets as datasets
import sklearn.decomposition as decomposition
import sklearn.discriminant_analysis as discriminant_analysis
import sklearn.ensemble as ensemble
import sklearn.feature_extraction as feature_extraction
import sklearn.feature_selection as feature_selection
import sklearn.linear_model as linear_model
import sklearn.metrics as metrics
import sklearn.model_selection as model_selection
import sklearn.neighbors as neighbors
import sklearn.pipeline as pipeline
import sklearn.preprocessing as preprocessing
import sklearn.svm as svm
import sklearn.tree as tree
import sklearn.utils as utils


# Visualization extensions and settings
import seaborn as sns
import matplotlib.pyplot as plt


# Custom extensions and settings
from quickplot import qp, qpUtil, qpStyle
from mlTools import powerGridSearch
sns.set(rc = qpStyle.rcGrey)


# Magic functions
%matplotlib inline
```


<a id = 'Building-blocks-of-convolutional-neural-networks'></a>

# Building blocks of convolutional neural networks

Convolutional neural networks (CNNs) were inspired by how the visual cortex of the human brain works when it recognizes objects. Because CNNs perform so well at image classification, they have attracted a great deal of attention, which has led to major improvements in machine learning and computer vision applications. Neural networks can automatically learn, from raw data, the features that are most useful for a particular task. For this reason, neural networks are often thought of as feature extraction engines: the early layers, immediately following the input layer, extract low-level features.

Multilayer neural networks, and in particular deep convolutional neural networks, construct a feature hierarchy by combining low-level features in a layer-wise fashion to form high-level features. In the context of images, low-level features such as edges and blobs are extracted in the early layers and then combined to form high-level features that take more familiar shapes, such as buildings, cars, or dogs. CNNs construct feature maps from input images, where each element in a feature map comes from a local patch of pixels in the input image.

This local patch of pixels is referred to as the local receptive field. A CNN's performance on image-related tasks is driven by two important concepts:

1. Sparse connectivity - A single element in the feature map is connected to only a small patch of pixels, rather than to the whole input image as it would be in a fully connected layer such as those of a multilayer perceptron.
2. Parameter sharing - The same weights are used for different patches of the input image.

Because of these two concepts, the number of weights in the network decreases drastically, and the algorithm's ability to capture salient features also improves. Intuitively, this makes sense: nearby pixels are usually more relevant to each other than pixels that are far apart.
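To make the reduction in weights concrete, here is a small sketch that counts parameters for a fully connected layer versus a convolutional layer. The helper names `dense_param_count` and `conv_param_count` are hypothetical, and the layer sizes (a single-channel 32x32 image mapped to one 32x32 output map, with a 3x3 kernel) are assumptions chosen only for illustration.

```python
def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer: one w_ij per input/output pair."""
    return n_inputs * n_outputs + n_outputs


def conv_param_count(kernel_h, kernel_w, in_channels=1, out_channels=1):
    """Weights plus biases of a convolutional layer.

    Each output channel owns one kernel_h x kernel_w x in_channels kernel that is
    reused (shared) at every position of the image, plus one bias.
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical example: a single-channel 32x32 image mapped to one 32x32 output map
n_pixels = 32 * 32
print(dense_param_count(n_pixels, n_pixels))  # 1049600
print(conv_param_count(3, 3))                 # 10
```

The fully connected layer needs over a million parameters, while the convolutional layer needs ten, because sparse connectivity limits each output to a 3x3 patch and parameter sharing reuses those nine weights at every position.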

CNNs are typically composed of several convolutional and subsampling (pooling) layers, followed by one or more fully connected layers at the end. The fully connected layers are effectively a multilayer perceptron, where every input unit $i$ is connected to every output unit $j$ with weight $w_{ij}$. Note that pooling/subsampling layers do not have any learnable parameters: there are no weights or bias units. Convolutional and fully connected layers, in contrast, both have weights and biases to be optimized.
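To illustrate why a pooling layer has nothing to learn, the sketch below implements one common variant, non-overlapping 2x2 max pooling, in NumPy. The function name `max_pool_2d` is hypothetical, and it assumes both dimensions of the feature map are divisible by the pool size. The output is a fixed function of the input (the maximum of each patch), so there are no weights or biases involved.

```python
import numpy as np


def max_pool_2d(feature_map, size=2):
    """Non-overlapping max pooling; assumes both dimensions are divisible by size."""
    h, w = feature_map.shape
    # Split the map into (h/size x w/size) blocks of shape (size x size)
    blocks = feature_map.reshape(h // size, size, w // size, size)
    # Keep only the largest value of each block: no learnable parameters
    return blocks.max(axis=(1, 3))


fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]])
print(max_pool_2d(fm))
# [[6 4]
#  [7 9]]
```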

<a id = 'Understanding-CNNs-and-learning-feature-hierarchies'></a>

## Understanding CNNs and learning feature hierarchies

Salient, or relevant, features are essential for high-performing machine learning algorithms. Traditional machine learning algorithms rely on features determined by a domain expert or by some computational feature extraction technique.

<a id = 'Performing-discrete-convolutions-in-one-dimension'></a>

## Performing discrete convolutions in one dimension

A discrete convolution (or simply a convolution) is a fundamental operation in CNNs. As a basic example, consider a discrete convolution between two one-dimensional vectors $\textbf{x}$ and $\textbf{w}$, denoted by $\textbf{y} = \textbf{x} * \textbf{w}$, where the vector $\textbf{x}$ is the input (also called the signal) and $\textbf{w}$ is referred to as the filter, or kernel. A discrete convolution is mathematically defined as follows:

$$
\textbf{y} = \textbf{x} * \textbf{w} \rightarrow \textbf{y}[i] = \sum_{k = -\infty}^{+\infty} \textbf{x}[i - k]\,\textbf{w}[k]
$$

The brackets [ ] denote the indexing of vector elements, and the index $i$ runs through each element of the output vector $\textbf{y}$. The summation over indices from $-\infty$ to $+\infty$ deserves some explanation, because machine learning applications generally deal with finite feature vectors. For example, if $\textbf{x}$ has 10 features with indices 0, 1, 2, ..., 9, then the indices $-\infty$ to $-1$ and $10$ to $+\infty$ are out of bounds for $\textbf{x}$. To compute the summation in the preceding formula, it is therefore assumed that $\textbf{x}$ and $\textbf{w}$ are filled with zeros outside their bounds. This results in an output vector $\textbf{y}$ that also has infinite size, with lots of zeros as well. Since this is not useful in practice, $\textbf{x}$ is padded with only a finite number of zeros. This process is called zero-padding, or simply padding. For example, padding the vector [3, 2, 1, 7, 1, 2, 5, 4] with $p = 2$ zeros on each side changes it to [0, 0, 3, 2, 1, 7, 1, 2, 5, 4, 0, 0].
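The two ideas in the paragraph above can be sketched in NumPy, assuming small example vectors (the `w` here is the filter used in the example later in this section). The hypothetical function `conv1d_definition` evaluates the defining sum directly, treating every out-of-bounds entry as zero, so only the finitely many non-zero outputs are computed. `np.pad` then shows finite zero-padding with $p = 2$.

```python
import numpy as np


def conv1d_definition(x, w):
    """y[i] = sum_k x[i - k] * w[k], with x and w treated as zero outside their bounds."""
    n, m = len(x), len(w)
    # Outside i = 0 .. n + m - 2 every term contains a zero, so y is zero there too
    y = np.zeros(n + m - 1)
    for i in range(n + m - 1):
        for k in range(m):            # w[k] = 0 for k outside 0 .. m - 1
            if 0 <= i - k < n:        # x[i - k] = 0 outside 0 .. n - 1
                y[i] += x[i - k] * w[k]
    return y


x = np.array([3, 2, 1, 7, 1, 2, 5, 4])
w = np.array([0.5, 0.75, 1, 0.25])

# Finite zero-padding: p zeros on each side of x, giving size n + 2p
p = 2
x_p = np.pad(x, pad_width=p, mode='constant', constant_values=0)
print(x_p)  # [0 0 3 2 1 7 1 2 5 4 0 0]

# The non-zero part of the "infinite" output matches NumPy's full convolution
print(np.allclose(conv1d_definition(x, w), np.convolve(x, w, mode='full')))  # True
```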

Let's assume the original input $\textbf{x}$ and filter $\textbf{w}$ have $n$ and $m$ elements, respectively, where $m \leq n$. The padded vector $\textbf{x}^p$ then has size $n + 2p$, and the practical formula for computing a discrete convolution becomes:

$$
\textbf{y} = \textbf{x} * \textbf{w} \rightarrow \textbf{y}[i] = \sum_{k = 0}^{m - 1} \textbf{x}^p[i + m - 1 - k]\,\textbf{w}[k]
$$

This solves the infinite-index issue. The second issue is that, in this summation, $\textbf{x}^p$ and $\textbf{w}$ are indexed in opposite directions: as $k$ increases, the index into $\textbf{w}$ moves forward while the index $i + m - 1 - k$ into $\textbf{x}^p$ moves backward. For this reason, after adding the padding, we flip one of the two vectors (either one works) so that both are traversed in the same direction, and then compute their dot product. If we flip the filter $\textbf{w}$ to get $\textbf{w}^r$, the dot product $\textbf{x}^p[i:i+m] \cdot \textbf{w}^r$ gives one element $\textbf{y}[i]$, where $\textbf{x}^p[i:i+m]$ is a patch of $\textbf{x}^p$ with size $m$.
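The pad, flip, and dot-product procedure can be sketched as a naive NumPy function. The name `conv1d` is hypothetical, and the sketch assumes the window moves one element at a time. As a sanity check it is compared against `np.convolve`: with no padding it should match the `'valid'` mode, and with $p = m - 1$ it should match the `'full'` mode.

```python
import numpy as np


def conv1d(x, w, p=0):
    """1D discrete convolution by zero-padding, flipping w, and sliding a window (step 1)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    m = len(w)
    x_p = np.pad(x, pad_width=p, mode='constant', constant_values=0)  # size n + 2p
    w_r = w[::-1]                                                     # flipped filter w^r
    n_out = len(x_p) - m + 1
    # y[i] = x^p[i:i+m] . w^r for every window position i
    return np.array([x_p[i:i + m] @ w_r for i in range(n_out)])


x = [3, 2, 1, 7, 1, 2, 5, 4]
w = [0.5, 0.75, 1, 0.25]
y = conv1d(x, w, p=0)
print(y)  # values: 7.0, 7.25, 9.0, 6.75, 8.0
print(np.allclose(y, np.convolve(x, w, mode='valid')))               # True
print(np.allclose(conv1d(x, w, p=3), np.convolve(x, w, mode='full')))  # True
```

Explicit Python loops keep the sliding window visible; a practical implementation would vectorize this or call a library routine.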

This step is repeated with the patch shifted along the input, like a window sliding across the input vector, to obtain all of the output elements. As a one-dimensional example, consider:

$$
\begin{aligned}
\textbf{x} &= (3, 2, 1, 7, 1, 2, 5, 4) \\
\textbf{w} &= \bigg(\frac{1}{2}, \frac{3}{4}, 1, \frac{1}{4}\bigg)
\end{aligned}
$$

If we flip $\textbf{w}$ we get:

$$
\textbf{w}^r = \bigg(\frac{1}{4}, 1, \frac{3}{4}, \frac{1}{2}\bigg)
$$

Since $\textbf{w}$ has four elements, $m = 4$, so the summation index $k$ takes the values 0, 1, 2, and 3.
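As a worked illustration, assuming no padding ($p = 0$) and a window that moves one element at a time, the first two output elements are the dot products of the first two patches of $\textbf{x}$ with $\textbf{w}^r$:

$$
\begin{aligned}
\textbf{y}[0] &= \textbf{x}[0:4] \cdot \textbf{w}^r = 3 \cdot \frac{1}{4} + 2 \cdot 1 + 1 \cdot \frac{3}{4} + 7 \cdot \frac{1}{2} = 7 \\
\textbf{y}[1] &= \textbf{x}[1:5] \cdot \textbf{w}^r = 2 \cdot \frac{1}{4} + 1 \cdot 1 + 7 \cdot \frac{3}{4} + 1 \cdot \frac{1}{2} = 7.25
\end{aligned}
$$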
