# CNN

## Fancier optimization
- SGD: carries a significant risk of getting stuck in local minima
$$x_{t+1} = x_t - \alpha\nabla{f(x_t)}$$
```python
while True:
    dx = compute_gradient(x)
    x -= learning_rate * dx  # step against the gradient
```
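- SGD + Momentum: adds a velocity vector that accumulates a decaying sum of past gradients, so the update keeps moving for longer and can roll past local minima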
$$v_{t+1} = \rho v_t + \nabla f(x_t), \quad \rho = 0.9 \text{ or } 0.99$$
$$x_{t+1} = x_t - \alpha v_{t+1}$$
```python
vx = 0  # velocity starts at zero
while True:
    dx = compute_gradient(x)
    vx = rho * vx + dx
    x -= learning_rate * vx  # step against the accumulated velocity
```
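
As a runnable illustration of the two update rules above, here is a minimal sketch on the scalar function $f(x) = x^2$. The helper names `sgd` and `sgd_momentum`, the toy objective, and the hyperparameter values are assumptions chosen for the demo, not part of the original notes.

```python
def sgd(grad_fn, x, learning_rate=0.1, steps=500):
    """Plain gradient descent: step against the gradient."""
    for _ in range(steps):
        x -= learning_rate * grad_fn(x)
    return x


def sgd_momentum(grad_fn, x, learning_rate=0.1, rho=0.9, steps=500):
    """Gradient descent with a velocity that accumulates past gradients."""
    vx = 0.0  # velocity starts at zero
    for _ in range(steps):
        vx = rho * vx + grad_fn(x)
        x -= learning_rate * vx
    return x


def grad_f(x):
    """Gradient of the toy objective f(x) = x**2."""
    return 2.0 * x


x_sgd = sgd(grad_f, 5.0)
x_momentum = sgd_momentum(grad_f, 5.0)
print(x_sgd, x_momentum)  # both end up very close to the minimum at 0
```

On a convex bowl like this both variants converge; the velocity term only matters on surfaces with flat regions or shallow local minima, which this toy objective does not have.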
- Nesterov Momentum

$$v_{t+1} = \rho v_t - \alpha \nabla f(x_t)$$
$$x_{t+1} = x_t - \rho v_t + (1+\rho)v_{t+1}$$
```python
v = 0  # velocity starts at zero
while True:
    dx = compute_gradient(x)
    old_v = v
    v = rho * v - learning_rate * dx
    x += -rho * old_v + (1 + rho) * v
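
The update above looks different from the usual look-ahead form of Nesterov momentum, which evaluates the gradient at $x_t + \rho v_t$. A short derivation shows the two agree, assuming the $x_t$ in these notes stands for the shifted variable $\tilde{x}_t = x_t + \rho v_t$:

$$v_{t+1} = \rho v_t - \alpha \nabla f(x_t + \rho v_t), \quad x_{t+1} = x_t + v_{t+1}$$
$$\tilde{x}_{t+1} = x_{t+1} + \rho v_{t+1} = x_t + (1+\rho)v_{t+1} = \tilde{x}_t - \rho v_t + (1+\rho)v_{t+1}$$

Renaming $\tilde{x}$ to $x$ gives the rule in the notes, so the gradient can be computed at the current parameters rather than at a look-ahead point.
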
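- AdaGrad: accumulates the squared gradients and divides each step by the square root of that running sum. It suits convex problems, but the sum only grows, so the effective step size shrinks and learning can become slow.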

```python
grad_squared = 0
while True:
    dx = compute_gradient(x)
    grad_squared += dx * dx
    x -= learning_rate * dx / (np.sqrt(grad_squared) + 1e-7)
```
- RMSProp: improves on AdaGrad by replacing the running sum with an exponentially decaying average of squared gradients

```python
grad_squared = 0
while True:
    dx = compute_gradient(x)
    grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
    x -= learning_rate * dx / (np.sqrt(grad_squared) + 1e-7)
```
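
To make the difference between the two accumulators concrete, the sketch below feeds the same constant gradient to both rules and records the step size each one takes. The function name `effective_steps`, the constant-gradient input, and the hyperparameter values are assumptions for illustration only.

```python
import math


def effective_steps(gradients, learning_rate=0.01, decay_rate=None):
    """Return the per-iteration step size for AdaGrad or RMSProp.

    decay_rate=None accumulates a plain sum (AdaGrad); a float in (0, 1)
    uses an exponentially decaying average instead (RMSProp).
    """
    grad_squared = 0.0
    steps = []
    for dx in gradients:
        if decay_rate is None:
            grad_squared += dx * dx
        else:
            grad_squared = decay_rate * grad_squared + (1 - decay_rate) * dx * dx
        steps.append(learning_rate * dx / (math.sqrt(grad_squared) + 1e-7))
    return steps


constant_gradient = [1.0] * 100  # assumed toy input: the same gradient 100 times
adagrad_steps = effective_steps(constant_gradient)
rmsprop_steps = effective_steps(constant_gradient, decay_rate=0.9)
print(adagrad_steps[-1], rmsprop_steps[-1])
```

With a constant gradient, the AdaGrad step decays like $1/\sqrt{t}$ (about 0.001 after 100 iterations here), while the RMSProp step settles near the learning rate of 0.01.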
- Adam: RMSProp plus momentum, **the default first choice**

```python
first_moment = 0
second_moment = 0
for t in range(1, num_iterations + 1):  # start at 1: at t=0 the bias correction divides by zero
    dx = compute_gradient(x)
    # beta1=0.9, beta2=0.999, learning_rate=1e-3 or 5e-4
    first_moment = beta1 * first_moment + (1 - beta1) * dx
    second_moment = beta2 * second_moment + (1 - beta2) * dx * dx
    first_unbias = first_moment / (1 - beta1 ** t)
    second_unbias = second_moment / (1 - beta2 ** t)
    x -= learning_rate * first_unbias / (np.sqrt(second_unbias) + 1e-7)
```
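
A self-contained version of the Adam loop can be sketched on a scalar problem. The function name `adam_minimize`, the toy objective $f(x) = (x - 3)^2$, and the learning rate of 0.1 are assumptions picked so the demo converges quickly; the loop counter starts at 1 here so the bias-correction denominators are nonzero.

```python
import math


def adam_minimize(grad_fn, x, num_iterations, learning_rate=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-7):
    """Minimize a scalar function with Adam, given its gradient."""
    first_moment = 0.0
    second_moment = 0.0
    for t in range(1, num_iterations + 1):  # t starts at 1 for bias correction
        dx = grad_fn(x)
        first_moment = beta1 * first_moment + (1 - beta1) * dx
        second_moment = beta2 * second_moment + (1 - beta2) * dx * dx
        first_unbias = first_moment / (1 - beta1 ** t)
        second_unbias = second_moment / (1 - beta2 ** t)
        x -= learning_rate * first_unbias / (math.sqrt(second_unbias) + eps)
    return x


def grad_f(x):
    """Gradient of the toy objective f(x) = (x - 3)**2."""
    return 2.0 * (x - 3.0)


x_one_step = adam_minimize(grad_f, 5.0, num_iterations=1)
x_final = adam_minimize(grad_f, 5.0, num_iterations=2000)
print(x_one_step, x_final)
```

On the first iteration the bias-corrected ratio is approximately `sign(dx)`, so the parameter moves by about one `learning_rate` (from 5.0 to roughly 4.9 here) regardless of the gradient's scale.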
## Regularization
- L1, L2, L1 + L2
- dropout: widely used
- batch normalization
- data augmentation
- DropConnect
- stochastic depth: randomly drops some layers during training

*In practice, use BN first; if overfitting still appears, add dropout (commonly on FC layers).*
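
For reference, dropout on a fully connected layer's activations can be sketched as follows. This is the common "inverted dropout" variant, which is an assumption here since the notes do not say which form is meant; the function name `dropout_forward` and the `keep_prob` value are likewise illustrative.

```python
import numpy as np


def dropout_forward(x, keep_prob=0.5, train=True, rng=None):
    """Inverted dropout: zero random units and rescale the survivors in training."""
    if not train:
        return x  # test time: the layer is an identity
    rng = np.random.default_rng() if rng is None else rng
    # Keep each unit with probability keep_prob, then divide by keep_prob
    # so the expected activation matches test time.
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    return x * mask


activations = np.ones((4, 8))  # stand-in for the output of an FC layer
train_out = dropout_forward(activations, keep_prob=0.5, rng=np.random.default_rng(0))
test_out = dropout_forward(activations, train=False)
print(train_out)
```

Rescaling during training keeps the test-time forward pass unchanged, which is why this variant is usually preferred over scaling at test time.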

## Transfer Learning
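When the new dataset is similar to the original one, transfer learning can be done by retraining only the last layer (small training set) or the last several layers (larger training set).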
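
The small-dataset case can be sketched in plain NumPy: freeze the earlier layers and train only a new final classifier. Everything named here is hypothetical and for illustration: `W_frozen` stands in for pretrained weights (it is random in this demo), and the data, sizes, and learning rate are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pretrained" feature extractor: one frozen linear layer + ReLU.
W_frozen = rng.normal(size=(8, 16))


def extract_features(x):
    """Run the frozen layers; W_frozen is never updated."""
    return np.maximum(0.0, x @ W_frozen)


def train_head(x, y, num_classes, learning_rate=0.01, steps=200):
    """Fit a new softmax classifier on top of the frozen features."""
    feats = extract_features(x)
    rows = np.arange(len(y))
    W_head = np.zeros((feats.shape[1], num_classes))  # the replaced last layer
    losses = []
    for _ in range(steps):
        scores = feats @ W_head
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.log(probs[rows, y]).mean())
        dscores = probs.copy()
        dscores[rows, y] -= 1.0  # gradient of cross-entropy w.r.t. scores
        W_head -= learning_rate * (feats.T @ dscores) / len(y)
    return W_head, losses


# Small synthetic "new" dataset (assumed for the demo).
x_new = rng.normal(size=(100, 8))
y_new = (x_new[:, 0] > 0).astype(int)
W_head, losses = train_head(x_new, y_new, num_classes=2)
print(losses[0], losses[-1])
```

With a larger training set, the same idea extends to also updating the last few frozen layers, typically with a smaller learning rate than the new head.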

## CPU vs GPU

## Deep Learning Framework
- Caffe, Caffe2
- Torch, PyTorch
- TensorFlow
- MXNet


## 9 CNN Architectures
- Case Studies
    - AlexNet
    - VGG
    - GoogLeNet
    - ResNet
- Also
    - NiN
    - Wide ResNet
    - ResNeXt
    - Stochastic Depth
    - DenseNet
    - FractalNet
    - SqueezeNet

## 10 RNN


## 11 
dataset: COCO

- Semantic Segmentation
- Classification + Localization
- Object Detection
    - R-CNN
    - Fast R-CNN
- Instance Segmentation
    - Mask R-CNN

**HyperQuest**

Dimensionality Reduction
- PCA
- t-SNE

## 12 deep dream
- Activations
    - Nearest neighbors
    - Dimensionality reduction
    - maximal patches
    - occlusion
- Gradients
    - Saliency maps
    - class visualization
    - fooling images
    - feature inversion
- Fun
    - DeepDream
    - Style Transfer

## 13 Generative
- Supervised Learning
    - Classification
    - Regression
    - Object Detection
    - semantic segmentation
    - image captioning
- Unsupervised Learning
    - Clustering
    - dimensionality reduction
    - feature learning
    - density estimation

- Generative Models
    - PixelRNN and PixelCNN
    - Variational Autoencoders(VAE)
    - Generative Adversarial Networks(GANs)

pix2pix

## 14 reinforcement learning
- Policy gradients
- Q-learning

## 15 

- Algorithms for Efficient Inference
    - Pruning
    - Weight Sharing
    - Quantization
    - Low Rank Approximation
    - Binary/Ternary Net
    - Winograd Transformation
- Algorithms for Efficient Training
    - Parallelization (parameters and models)
    - Mixed Precision with FP16 and FP32
    - Model Distillation
    - DSD: Dense-Sparse-Dense Training
- Hardware for Efficient Inference
- Hardware for Efficient Training

## 16 adversarial