
# Deep Learning for Computer Vision
- Use convolutional neural networks (CNNs) for computer vision tasks
- The main building blocks are the Conv2D and MaxPooling2D layers

## 5.1 Warm-up: a CNN on MNIST

In [25]:
from keras.datasets import mnist
from keras.utils import to_categorical

(train_images,train_label),(test_images,test_label) = mnist.load_data()

- Preprocess the data
- <font color='red'>Note that a Conv2D layer expects each sample to be a 3D tensor (height, width, channels), so the full input has shape (samples, height, width, channels)</font>

In [26]:
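# Add a trailing channel axis and scale pixel values from [0, 255] to [0, 1]
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the integer labels (10 digit classes)
train_label = to_categorical(train_label)
test_label = to_categorical(test_label)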
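
To make the preprocessing concrete, here is a plain-Python sketch of what scaling and one-hot encoding produce. The helpers `to_one_hot` and `scale_pixels` are hypothetical names made up for this illustration; they are not how Keras implements `to_categorical`, only a small model of its result.

```python
def to_one_hot(labels, num_classes):
    """Return one row per label, with 1.0 at the label's index and 0.0 elsewhere."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def scale_pixels(pixels):
    """Map 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [p / 255 for p in pixels]


# Three example digit labels, encoded over the 10 MNIST classes
encoded = to_one_hot([3, 0, 9], num_classes=10)
print(encoded[0])

# Three example pixel intensities
scaled = scale_pixels([0, 51, 255])
print(scaled)
```

Each encoded row sums to 1, which is the target format that `categorical_crossentropy` expects.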

- Build the CNN

In [34]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Conv2D(32,(3,3), activation='relu', input_shape=(28,28,1)))
network.add(layers.MaxPool2D((2,2)))
network.add(layers.Conv2D(64, (3,3), activation='relu'))
network.add(layers.MaxPool2D((2,2)))
network.add(layers.Conv2D(64, (3,3), activation='relu'))
network.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
Total params: 55,744
Trainable params: 55,744
Non-trainable params: 0
_________________________________________________________________
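
The parameter counts in the summary can be checked by hand. Each filter has one weight per kernel cell per input channel, plus one bias. The helper `conv2d_params` below is a hypothetical name used only for this arithmetic sketch.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Trainable parameters of a 2D convolution layer with a bias per filter."""
    weights_per_filter = kernel_h * kernel_w * in_channels
    return (weights_per_filter + 1) * filters


# (kernel_h, kernel_w, in_channels, filters) for the three Conv2D layers above
conv_layers = [
    (3, 3, 1, 32),
    (3, 3, 32, 64),
    (3, 3, 64, 64),
]

counts = [conv2d_params(*spec) for spec in conv_layers]
total = sum(counts)
print(counts, total)
```

The pooling layers contribute no parameters, so the total matches the 55,744 reported by `network.summary()`.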


In [35]:
network.add(layers.Flatten())
network.add(layers.Dense(64, activation='relu'))
network.add(layers.Dense(10, activation='softmax'))
network.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 26, 26, 32)        320       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 3, 3, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 64)               
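
The summary output above is cut off, so the following is only arithmetic derived from the layer definitions, not a copy of the missing output. It sketches why Flatten yields 576 values and how many parameters the two Dense layers should have, assuming the usual one-weight-per-input-plus-bias layout. `dense_params` is a hypothetical helper name.

```python
def dense_params(inputs, units):
    """Trainable parameters of a fully connected layer: weights plus one bias per unit."""
    return (inputs + 1) * units


# Flatten turns the (3, 3, 64) feature map into a single vector
flat_size = 3 * 3 * 64

dense_64 = dense_params(flat_size, 64)  # Dense(64) on the flattened vector
dense_10 = dense_params(64, 10)         # Dense(10) softmax output layer
print(flat_size, dense_64, dense_10)
```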

- Compile the network

In [37]:
network.compile(
    optimizer='rmsprop',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
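
As a rough illustration of the loss selected above, categorical cross-entropy for one sample is the negative log of the probability assigned to the true class. The function below is a simplified sketch with a hypothetical clipping constant `eps`; it is not the Keras implementation.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    # Clip predictions away from 0 so that log() stays finite
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


target = [0.0, 1.0, 0.0]          # true class is index 1
confident = [0.05, 0.90, 0.05]    # most mass on the true class
unsure = [0.40, 0.30, 0.30]       # little mass on the true class

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

The confident, correct prediction gets the smaller loss, which is the behavior the optimizer exploits during training.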

- Fit!

In [38]:
history = network.fit(train_images,train_label,epochs=5,batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


- Evaluate

In [39]:
test_loss, test_acc = network.evaluate(test_images,test_label)
print('The acc of this model is: ', test_acc)

The acc of this model is:  0.9904999732971191


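### 5.1.2 The convolution operation
- The patterns a convolution learns are translation invariant: a pattern learned in one part of the image can be recognized anywhere else
- A convolutional neural network can learn spatial hierarchies of patterns
  - The first convolution layer learns small, local patterns
  - The next layer combines those small patterns into larger, more abstract ones
- The output of a convolution is called a **feature map**
  - The output feature map is a 3D tensor
  - Its depth axis no longer stands for RGB channels; its size is set by the number of filters
- As the summary shows, the first convolution layer outputs a tensor of shape (26, 26, 32)
  - The 32 here refers to 32 response maps, one for each of the 32 filters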
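
A minimal plain-Python sketch of one filter sliding over a 28x28 image may help here. It assumes no padding and a stride of 1, and computes a sliding-window sum of products (no kernel flip). All helper names (`correlate2d_valid`, `place_pattern`, `argmax2d`) are hypothetical and exist only for this illustration.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def place_pattern(size, pattern, top, left):
    """Return a size x size image of zeros with the pattern pasted at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for di, row in enumerate(pattern):
        for dj, value in enumerate(row):
            image[top + di][left + dj] = value
    return image


def argmax2d(feature_map):
    """Return the (row, col) position of the largest response."""
    positions = ((i, j) for i in range(len(feature_map))
                 for j in range(len(feature_map[0])))
    return max(positions, key=lambda ij: feature_map[ij[0]][ij[1]])


# A small cross-shaped pattern, used both as image content and as the filter
cross = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]

fm_a = correlate2d_valid(place_pattern(28, cross, 2, 3), cross)
fm_b = correlate2d_valid(place_pattern(28, cross, 10, 17), cross)

print(len(fm_a), len(fm_a[0]))         # spatial size of one response map
print(argmax2d(fm_a), argmax2d(fm_b))  # where the filter responds most strongly
```

The same filter gives the same peak response wherever the pattern sits, and the peak simply moves with the pattern. One such 26x26 response map per filter, stacked over 32 filters, gives the (26, 26, 32) feature map.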

#### § Key parameters of Conv2D
1. The kernel size, typically (3, 3) or (5, 5)
2. The depth of the output feature map, i.e. the number of filters

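#### Q: Why are the height and width of the output feature map usually smaller than the input's?
1. Border effects
  - How do you get an output with the same height and width as the input? Set padding="same"
  - The default is padding="valid", i.e. no padding
2. The convolution stride
  - The kernel usually slides one pixel at a time
  - The stride is the number of pixels the kernel moves at each step
  - Strides other than 1 are rarely used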
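
The effect of kernel size, padding and stride on one spatial dimension can be sketched with a small formula. The function name `conv_output_size` is hypothetical; the rules it encodes are the usual ones for "valid" and "same" padding, and the pooling layers are treated as 2x2 windows with a stride of 2, which is an assumption about the defaults used in the model above.

```python
import math


def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis for a convolution or pooling window."""
    if padding == "same":
        # Input is padded so that only the stride shrinks the output
        return math.ceil(size / stride)
    if padding == "valid":
        # No padding: the window must fit entirely inside the input
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


# Trace the spatial size through the network built in section 5.1
size = 28
trace = [size]
for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]:
    size = conv_output_size(size, kernel, stride)
    trace.append(size)
print(trace)

# With padding="same" and stride 1 the spatial size is preserved
same_size = conv_output_size(28, 3, padding="same")
print(same_size)
```

The trace reproduces the 26, 13, 11, 5, 3 sequence seen in the model summary.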
