# Chapter 9 Convolutional Networks

* 손고리즘 / 손고리즘 ML: Part 3 - DML [1]
* 김무성

# Contents

* 9.1 The Convolution Operation
* 9.2 Motivation
* 9.3 Pooling
* 9.4 Convolution and Pooling as an Infinitely Strong Prior
* 9.5 Variants of the Basic Convolution Function
* 9.6 Structured Outputs
* 9.7 Convolutional Modules
* 9.8 Data Types
* 9.9 Efficient Convolution Algorithms
* 9.10 Random or Unsupervised Features
* 9.11 The Neuroscientific Basis for Convolutional Networks
* 9.12 Convolutional Networks and the History of Deep Learning

Convolutional networks (also known as convolutional neural networks or CNNs) are a specialized kind of neural network for processing data that has a known, grid-like topology.

The name “convolutional neural network” indicates that the network employs a mathematical operation called convolution. Convolution is a specialized kind of linear operation.

#### Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

# 9.1 The Convolution Operation

#### Convolution operator

* Suppose we are tracking the location of a spaceship with a laser sensor. Our laser sensor provides a single output x(t), the position of the spaceship at time t.
* Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship’s position, we would like to average together several measurements.
* Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent measurements.
* We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship:

<img src="figures/cap9.1.png" />

The convolution operation is typically denoted with an asterisk:

<img src="figures/cap9.2.png"  />

#### input & kernel & feature map

In convolutional network terminology, the first argument (in this example, the function x) to the convolution is often referred to as the input and the second argument (in this example, the function w) as the kernel. The output is sometimes referred to as the feature map.

#### discrete convolution

<img src="figures/cap9.3.png" />
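The spaceship example can be tried numerically. The following is a minimal sketch, assuming the discrete form s(t) = Σ_a x(a) w(t − a) with finite arrays that are zero outside their stored values; the readings in `x` and the weights in `w` are made-up numbers, and `conv1d` is a hypothetical helper name, not a library function.

```python
import numpy as np

def conv1d(x, w):
    """Full discrete convolution: s[t] = sum over a of x[a] * w[t - a]."""
    n, m = len(x), len(w)
    s = np.zeros(n + m - 1)
    for t in range(n + m - 1):
        for a in range(n):
            if 0 <= t - a < m:          # w is treated as zero outside its stored range
                s[t] += x[a] * w[t - a]
    return s

x = np.array([1.0, 2.0, 4.0, 3.0])      # hypothetical noisy position readings x(t)
w = np.array([0.5, 0.3, 0.2])           # weight by age: age 0 (newest) weighs most
s = conv1d(x, w)                        # smoothed estimate
```

Here `s[2]` combines the three most recent readings as `0.5 * x[2] + 0.3 * x[1] + 0.2 * x[0]`, and the whole result agrees with NumPy's built-in `np.convolve(x, w)`.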

#### multidimensional case

In machine learning applications, the input is usually a multidimensional array of data and the kernel is usually a multidimensional array of learnable parameters.

We will refer to these multidimensional arrays as tensors.

For example, if we use a two-dimensional image I as our input, we probably also want to use a two-dimensional kernel K:

<img src="figures/cap9.4.png" />

Note that convolution is commutative, meaning we can equivalently write:

<img src="figures/cap9.5.png" />

#### cross-correlation

While the commutative property is useful for writing proofs, it is not usually an important property of a neural network implementation. Instead, many neural network libraries implement a related function called the cross-correlation, which is the same as convolution but without flipping the kernel:

<img src="figures/cap9.6.png" />

* Many machine learning libraries implement cross-correlation but call it convolution. 
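The difference between the two operations is easy to see in a small NumPy sketch. This is an illustration under stated assumptions, not any library's implementation: outputs are restricted to the "valid" region where the kernel fits entirely inside the image (so indices are offset relative to the formulas above), and `cross_correlate2d` and `convolve2d` are hypothetical helper names.

```python
import numpy as np

def cross_correlate2d(I, K):
    """Valid cross-correlation: S[i, j] = sum over m, n of I[i + m, j + n] * K[m, n]."""
    kh, kw = K.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

def convolve2d(I, K):
    """Valid convolution: the same sliding sum, but with the kernel flipped on both axes."""
    return cross_correlate2d(I, K[::-1, ::-1])

I = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
K = np.array([[1.0, 2.0],
              [3.0, 4.0]])                     # deliberately asymmetric kernel

corr = cross_correlate2d(I, K)   # corr[0, 0] == 34.0
conv = convolve2d(I, K)          # conv[0, 0] == 16.0
```

With an asymmetric kernel the two results differ. With a kernel that is unchanged by flipping they coincide, and when the kernel is learned the distinction matters little in practice, because training can simply learn the flipped kernel.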

<img src="figures/cap9.7.png" width=600 />

Discrete convolution can be viewed as multiplication by a matrix.

Viewing convolution as matrix multiplication usually does not help to implement convolution operations, but it is useful for understanding and designing neural networks.
* Any neural network algorithm that works with matrix multiplication and does not depend on specific properties of the matrix structure should work with convolution, without requiring any further changes to the neural network.
* Typical convolutional neural networks do make use of further specializations in order to deal with large inputs efficiently, but these are not strictly necessary from a theoretical perspective.
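To make the matrix view concrete, here is a small sketch for the one-dimensional case. It assumes "full" convolution of a length-n input with a length-m kernel; `conv_matrix` is a hypothetical helper that builds the corresponding matrix, in which every column holds the same kernel shifted down by one row.

```python
import numpy as np

def conv_matrix(w, n):
    """Build T of shape (n + m - 1, n) such that T @ x equals np.convolve(x, w)."""
    m = len(w)
    T = np.zeros((n + m - 1, n))
    for col in range(n):
        T[col:col + m, col] = w      # same kernel in every column, shifted down
    return T

w = np.array([1.0, -2.0, 0.5])               # hypothetical kernel
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])      # hypothetical input
T = conv_matrix(w, len(x))

as_matmul = T @ x
as_conv = np.convolve(x, w)
```

Both results are identical. The matrix has 35 entries, but only 15 are nonzero and they take just 3 distinct values, which is the structure that specialized convolution routines exploit.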

# 9.2 Motivation

Convolution leverages three important ideas that can help improve a machine learning system:
* sparse interactions, 
* parameter sharing, and 
* equivariant representations. 
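Two of these ideas can be checked with a few lines of NumPy. The sketch below is an illustration with made-up numbers: it verifies that convolution is equivariant to translation (shifting the input and then convolving gives the same result as convolving and then shifting), and it compares parameter counts against a dense layer of the same input and output size. `shift_right` is a hypothetical helper.

```python
import numpy as np

def shift_right(v, k):
    """Shift a signal k steps to the right, padding with zeros on the left."""
    return np.concatenate([np.zeros(k), v])

x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0])   # hypothetical input signal
w = np.array([1.0, -1.0])                      # hypothetical difference kernel

# Equivariance: both orders of "shift" and "convolve" give the same output.
conv_then_shift = shift_right(np.convolve(x, w), 2)
shift_then_conv = np.convolve(shift_right(x, 2), w)
is_equivariant = np.allclose(conv_then_shift, shift_then_conv)

# Parameter sharing and sparse interactions: a dense layer mapping the same
# input to the same number of outputs needs one weight per (input, output) pair.
n_in, n_out = len(x), len(x) + len(w) - 1
dense_params = n_in * n_out   # 42
conv_params = len(w)          # 2
```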

<img src="figures/cap9.8.png" width=600 />

<img src="figures/cap9.9.png" width=600 />

<img src="figures/cap9.10.png" width=600 />

<img src="figures/cap9.11.png" width=600 />

<img src="figures/cap9.12.png" width=600 />

# 9.3 Pooling

A typical layer of a convolutional network consists of three stages (see Fig. 9.7).
* In the first stage, the layer performs several convolutions in parallel to produce a set of presynaptic activations.
* In the second stage, each presynaptic activation is run through a nonlinear activation function, such as the rectified linear activation function. This stage is sometimes called the detector stage.
* In the third stage, we use a pooling function to modify the output of the layer further.
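The three stages can be sketched for a one-dimensional input as follows. This is a minimal illustration, not a reference implementation: it assumes "valid" convolution, a ReLU detector, and max pooling with stride 1, and the function name `conv_layer`, the input, and the kernels are all made up for the example.

```python
import numpy as np

def conv_layer(x, kernels, pool_width):
    """Convolution stage -> detector stage (ReLU) -> max pooling stage, in 1-D."""
    # Stage 1: several convolutions in parallel, one row per kernel.
    pre = np.stack([np.convolve(x, k, mode="valid") for k in kernels])
    # Stage 2: detector stage, a rectified linear activation.
    detected = np.maximum(pre, 0.0)
    # Stage 3: max pooling over windows of pool_width positions (stride 1).
    n_out = detected.shape[1] - pool_width + 1
    pooled = np.stack(
        [detected[:, i:i + pool_width].max(axis=1) for i in range(n_out)],
        axis=1,
    )
    return pre, detected, pooled

x = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0])              # hypothetical input
kernels = [np.array([1.0, -1.0]), np.array([0.5, 0.5])]    # hypothetical kernels
pre, detected, pooled = conv_layer(x, kernels, pool_width=2)
```

With these numbers `pre` has shape `(2, 5)` and `pooled` has shape `(2, 4)`: one row per kernel, with each later stage operating on the output of the previous one.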

A pooling function 
* replaces the output of the net at a certain location with a summary statistic of the nearby outputs.
* For example, 
    - the max pooling operation 
        - reports the maximum output within a rectangular neighborhood. 
* Other popular pooling functions include 
    - the average of a rectangular neighborhood, 
    - the L2 norm of a rectangular neighborhood, or 
    - a weighted average based on the distance from the central pixel.
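The first three pooling functions in this list differ only in the summary statistic they compute. A minimal sketch, assuming 2x2 neighborhoods with stride 1 and no padding; `pool2d` is a hypothetical helper and the input values are made up.

```python
import numpy as np

def pool2d(x, size, reduce_fn):
    """Summarize every size x size neighborhood of x (stride 1, no padding)."""
    oh, ow = x.shape[0] - size + 1, x.shape[1] - size + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = reduce_fn(x[i:i + size, j:j + size])
    return out

x = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 1.0],
              [0.0, 0.0, 5.0]])     # hypothetical detector-stage outputs

max_pooled = pool2d(x, 2, np.max)                                    # max pooling
avg_pooled = pool2d(x, 2, np.mean)                                   # average pooling
l2_pooled = pool2d(x, 2, lambda patch: np.sqrt(np.sum(patch ** 2)))  # L2-norm pooling
```

For this input, `max_pooled` is `[[4, 4], [4, 5]]` and `avg_pooled` is `[[2.5, 1.75], [1.75, 2.5]]`.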

<img src="figures/cap9.13.png" width=600 />

<img src="figures/cap9.14.png" width=600 />

#### translation invariant

In all cases, pooling helps to make the representation become invariant to small translations of the input.

KEY IDEA: Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is.
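The effect can be observed on a toy example. The sketch below assumes max pooling of width 3 with stride 1 over a one-dimensional detector output; the values are made up, and `max_pool1d` is a hypothetical helper. The input is shifted one step to the right and the number of changed values is counted before and after pooling.

```python
import numpy as np

def max_pool1d(v, width):
    """Max over every window of `width` consecutive values (stride 1)."""
    return np.array([v[i:i + width].max() for i in range(len(v) - width + 1)])

detector = np.array([0.0, 0.1, 1.0, 0.2, 0.1, 0.0])   # hypothetical detector outputs
shifted = np.roll(detector, 1)                        # same pattern, one step to the right

pooled = max_pool1d(detector, 3)
pooled_shifted = max_pool1d(shifted, 3)

changed_inputs = int(np.sum(detector != shifted))         # 5 of 6 values change
changed_outputs = int(np.sum(pooled != pooled_shifted))   # 2 of 4 values change
```

Almost every detector value changes under the shift, but only half of the pooled values do, because each pooled unit only reports the largest value in its neighborhood and not its exact position.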

<img src="figures/cap9.15.png" width=600 />

#### infinitely strong prior

The use of pooling can be viewed as adding an infinitely strong prior that the function the layer learns must be invariant to small translations. When this assumption is correct, it can greatly improve the statistical efficiency of the network.

#### transformation invariant

Pooling over spatial regions produces invariance to translation, but if we pool over the outputs of separately parametrized convolutions, the features can learn which transformations to become invariant to (see Fig. 9.9).

<img src="figures/cap9.16.png" width=600 />
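As a rough illustration of pooling across features rather than across space, the sketch below takes the max over three detector units. The response values are invented for the example, under the assumption that each unit has learned to match the same pattern at a different rotation.

```python
import numpy as np

# Hypothetical responses of three separately parametrized detector units,
# each assumed to match the same pattern at a different rotation.
responses_upright = np.array([0.9, 0.1, 0.2])   # input shown upright: unit 0 fires
responses_rotated = np.array([0.1, 0.2, 0.8])   # input shown rotated: unit 2 fires

# Pool over the units (not over spatial positions).
pooled_upright = responses_upright.max()
pooled_rotated = responses_rotated.max()
```

The pooled unit responds strongly in both cases (0.9 and 0.8), even though a different detector unit is responsible each time, so the pooled output is approximately invariant to the rotation.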

#### pooling with downsampling

<img src="figures/cap9.17.png" width=600 />
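Downsampling can be sketched by pooling with a stride larger than one, so that fewer pooling units than detector units are needed. The pool width of 3 and stride of 2 below are assumptions chosen for illustration, and `max_pool1d` is a hypothetical helper.

```python
import numpy as np

def max_pool1d(v, width, stride):
    """Max over windows of `width` values, taking one window every `stride` steps."""
    starts = range(0, len(v) - width + 1, stride)
    return np.array([v[i:i + width].max() for i in starts])

detector = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.1, 0.3, 0.5])   # hypothetical values

dense = max_pool1d(detector, width=3, stride=1)         # 6 outputs
downsampled = max_pool1d(detector, width=3, stride=2)   # 3 outputs
```

With stride 2 the layer keeps every second pooled value, so the next layer has roughly half as many inputs to process.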

# 9.4 Convolution and Pooling as an Infinitely Strong Prior

* An infinitely strong prior places zero probability on some parameters and says that these parameter values are completely forbidden, regardless of how much support the data gives to those values.
* Of course, implementing a convolutional net as a fully connected net with an infinitely strong prior would be extremely computationally wasteful. But thinking of a convolutional net as a fully connected net with an infinitely strong prior can give us some insights into how convolutional nets work.
* One key insight is that convolution and pooling can cause underfitting.
    - If a task relies on preserving precise spatial information, then using pooling on all features can cause underfitting.
* Another key insight from this view is that we should only compare convolutional models to other convolutional models in benchmarks of statistical learning performance.

# 9.5 Variants of the Basic Convolution Function

<img src="figures/cap9.18.png" />

<img src="figures/cap9.19.png" />

<img src="figures/cap9.20.png" width=600 />

<img src="figures/cap9.21.png" width=600 />

<img src="figures/cap9.22.png" />

<img src="figures/cap9.23.png" />

<img src="figures/cap9.24.png" />

<img src="figures/cap9.25.png" />

<img src="figures/cap9.26.png" />

# 9.6 Structured Outputs

# 9.7 Convolutional Modules

# 9.8 Data Types

<img src="figures/cap9.27.png" width=600 />
<img src="figures/cap9.28.png" width=600 />

Table 9.1: Examples of different formats of data that can be used with convolutional networks

# 9.9 Efficient Convolution Algorithms

# 9.10 Random or Unsupervised Features

# 9.11 The Neuroscientific Basis for Convolutional Networks

<img src="figures/cap9.29.png" width=300 />

<img src="figures/cap9.30.png" width=600 />

<img src="figures/cap9.31.png" width=600 />

# 9.12 Convolutional Networks and the History of Deep Learning

<img src="figures/cap9.32.png" width=600 />

# References

* [1] Bengio's book - Chapter 9 Convolutional Networks - http://www.iro.umontreal.ca/~bengioy/dlbook/version-07-08-2015/convnets.html
* [2] Linear Systems and Convolution - http://www.slideshare.net/lineking/lecture4-26782530
* [3] Convolutional Neural Networks: architectures, convolution / pooling layers - http://vision.stanford.edu/teaching/cs231n/slides/lecture7.pdf