## More Layers: Residual Block
### "Skip connection/shortcut" (ResNet)

$a^{[l]}= g(z^{[l]}) $


$a^{[l+2]}= g(z^{[l+2]}) \;\to\; a^{[l+2]}=g(z^{[l+2]}+a^{[l]}) $


<div class="verticalhorizontal">
    <img src="images/5_1.png" width ="450" height="250" alt="centered image" />
</div>

**Motivation/Explanation**
\begin{align*}
a^{[l+2]} &= g(z^{[l+2]}+a^{[l]}) \\
&=g(W^{[l+2]}a^{[l+1]} +b^{[l+2]} + a^{[l]})
\end{align*}

For $W^{[l+2]}\approx 0$ and $b^{[l+2]}\approx 0$, $a^{[l+2]}\approx g(a^{[l]})$; if $g$ is ReLU, this equals $a^{[l]}$ because $a^{[l]} \ge 0$. The block can therefore easily fall back to the identity, which stabilizes the network: the output of each block keeps carrying the signal forward rather than shrinking to 0. As a result, adding more layers is unlikely to increase the error.

It is important to make sure that **dim** $a^{[l+2]} =$ **dim** $a^{[l]}$, so that the two terms can be added.
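
The identity behaviour can be checked numerically. The following is a minimal sketch in plain Python, assuming ReLU for $g$ and fully connected layers; the helper names (`relu`, `affine`, `residual_block`) and the example weights are illustrative and not part of the original notes.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def affine(W, a, b):
    """Compute W a + b for a matrix W (list of rows), vector a, and bias b."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]


def residual_block(a_l, W1, b1, W2, b2):
    """Two-layer residual block: a[l+2] = g(z[l+2] + a[l])."""
    a_l1 = relu(affine(W1, a_l, b1))   # a[l+1] = g(z[l+1])
    z_l2 = affine(W2, a_l1, b2)        # z[l+2] = W[l+2] a[l+1] + b[l+2]
    if len(z_l2) != len(a_l):
        raise ValueError("dim a[l+2] must equal dim a[l] for the skip connection")
    return relu([z + a for z, a in zip(z_l2, a_l)])  # add the shortcut, then g


# a[l] is itself the output of a ReLU, so it is non-negative (assumed values).
a_l = [0.5, 0.0, 2.0]
W1 = [[1.0, -1.0, 0.5], [0.2, 0.3, -0.7], [0.0, 1.0, 1.0]]
b1 = [0.1, 0.0, -0.2]
zero_W = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3

# With W[l+2] = 0 and b[l+2] = 0 the block passes a[l] through unchanged.
print(residual_block(a_l, W1, b1, zero_W, zero_b))  # [0.5, 0.0, 2.0]
```

The dimension check mirrors the requirement above: without matching dimensions the shortcut cannot be added.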


**Alternative Explanation**

\begin{align*}
z^{[l+2]}&= W^{[l+2]}a^{[l+1]}+b^{[l+2]}\\
&=W^{[l+2]}g(W^{[l+1]}a^{[l]}+b^{[l+1]})+b^{[l+2]}\\
&=F(a^{[l]})
\end{align*}

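Up to the outer activation $g$, $a^{[l+2]}=a^{[l]} +F(a^{[l]})$: each block adds an increment to its input, which has the form of one step of an iterative integration method (e.g. forward Euler, $x_{n+1}=x_n+h\,f(x_n)$).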

## More Layers: Bottleneck Layer 

Suppose a convolution maps 28 x 28 x 192 $\to$ 28 x 28 x 32 using 5 x 5 x 192 filters. This takes about 28 x 28 x 32 x 5 x 5 x 192 = 120M multiplications. The bottleneck layer introduces an intermediate 28 x 28 x 16 volume computed with 1 x 1 Conv filters. The first step then costs about 28 x 28 x 16 x 192 = 2.4M multiplications and the second 28 x 28 x 32 x 5 x 5 x 16 = 10M, about 12.4M in total, roughly a tenth of the direct cost.
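
The counts above can be reproduced with a few lines of Python. This is only a sketch: `conv_mults` is a hypothetical helper that counts one multiplication per filter weight per output value, and it assumes padding is chosen so the 28 x 28 spatial size is preserved.

```python
def conv_mults(out_h, out_w, out_c, k_h, k_w, in_c):
    """Multiplications in a conv layer: each output value is one
    k_h x k_w x in_c dot product."""
    return out_h * out_w * out_c * k_h * k_w * in_c


direct = conv_mults(28, 28, 32, 5, 5, 192)      # 5x5 conv applied directly
reduce_1x1 = conv_mults(28, 28, 16, 1, 1, 192)  # 1x1 conv down to 16 channels
expand_5x5 = conv_mults(28, 28, 32, 5, 5, 16)   # 5x5 conv on the reduced volume
bottleneck = reduce_1x1 + expand_5x5

print(direct)                         # 120422400
print(reduce_1x1, expand_5x5)         # 2408448 10035200
print(bottleneck)                     # 12443648
print(round(direct / bottleneck, 1))  # 9.7
```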

**Inception Network**

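Using the bottleneck method (1x1 Conv) to keep the cost down, an Inception module applies multiple operations at the same layer and stacks their outputs along the channel dimension.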

<div class="verticalhorizontal">
    <img src="images/5_2.png" width ="450" height="325" alt="centered image" />
</div>

## Computer Vision Problems and Networks Setup

* Object Classification 
* Object Localization
* Object Detection


### FCONV (convolutional sliding windows)
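Reference: OverFeat. Fully connected layers are rewritten as convolutions (FCONV), so the sliding-window classifier can be evaluated over a whole image in a single forward pass.

For example, 14 x 14 x 3 -CONV 5x5-> 10 x 10 x 16 -MPOOL 2x2-> 5 x 5 x 16 -FCONV 5x5-> 1 x 1 x 400 -FCONV 1x1-> 1 x 1 x 4. Here the 5 x 5 x 16 $\to$ 1 x 1 x 400 step is a fully connected layer written as 400 filters of size 5 x 5 x 16.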
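
The spatial sizes in this chain can be traced with a short sketch. It assumes no padding, stride 1 for the convolutions and stride 2 for the pooling; the function names and the 16 x 16 input are illustrative additions, not from the original notes.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of an unpadded convolution or pooling layer."""
    return (size - kernel) // stride + 1


def fconv_output_size(size):
    """Spatial size after the CONV -> MPOOL -> FCONV -> FCONV chain above."""
    size = conv_out(size, 5)            # CONV 5x5
    size = conv_out(size, 2, stride=2)  # MPOOL 2x2
    size = conv_out(size, 5)            # FCONV 5x5 (replaces the first FC layer)
    size = conv_out(size, 1)            # FCONV 1x1
    return size


print(fconv_output_size(14))  # 1
print(fconv_output_size(16))  # 2
```

A 14 x 14 input gives a single 1 x 1 x 4 prediction, while a 16 x 16 input gives a 2 x 2 grid of predictions, one per 14 x 14 window, computed in one pass with shared work.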

### YOLO (You Only Look Once)
Detection: Classification with Grids 

### Intersection Over Union (IoU)
$$\text{IoU} = \frac{|A_1 \cap A_2|}{|A_1 \cup A_2|}$$

A predicted box is typically counted as correct when $\text{IoU} \ge 0.5$.
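
For axis-aligned boxes the ratio is easy to compute directly. The sketch below assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates with `x1 < x2` and `y1 < y2`; this representation and the function name `iou` are choices made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap along each axis, clamped at 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x2 strip: IoU = 2 / (4 + 4 - 2) = 1/3.
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))
```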

### Non-max Suppression
* Discard all boxes with $p_c < p_\text{thresh}$ (e.g. 0.6)
* While boxes remain: pick the box with the largest $p_c$ as a prediction, then discard any remaining box whose IoU with it exceeds $I_\text{thresh}$
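
The two steps above can be sketched as follows for a single class. The box format `(p_c, (x1, y1, x2, y2))`, the function names, and the default `iou_thresh=0.5` are assumptions made for the example; only the 0.6 probability threshold comes from the notes.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, p_thresh=0.6, iou_thresh=0.5):
    """boxes: list of (p_c, (x1, y1, x2, y2)). Returns the kept boxes,
    highest p_c first."""
    # Step 1: drop low-confidence boxes and sort the rest by p_c.
    remaining = sorted((b for b in boxes if b[0] >= p_thresh),
                       key=lambda b: b[0], reverse=True)
    kept = []
    # Step 2: repeatedly keep the best box and discard its heavy overlaps.
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best[1], b[1]) <= iou_thresh]
    return kept


boxes = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # overlaps the first box with IoU = 81/119
    (0.7, (20, 20, 30, 30)),  # a separate object
    (0.3, (0, 0, 10, 10)),    # below p_thresh
]
print(non_max_suppression(boxes))
# [(0.9, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
```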


 


## Neural Style Transfer

<div class="verticalhorizontal">
    <img src="images/5_3.png" width ="400" height="300" alt="centered image" />
</div>

* Define a cost function with two components 

    $J(G)=\alpha Jc(C,G) +\beta Js(S,G)$, a weighted sum of the content and style losses
* Algorithm:
    1. Initialize G randomly, e.g. with shape (100, 100, 3)
    2. Use gradient descent (GD) to minimize J(G) with respect to the pixels of G
* Content Loss:
    - Use pretrained object classification/recognition Network (VGG)
    - How similar are the activations of C and G when they propagate through layer $l$?

    $Jc(C,G) =\frac{1}{2}||a^{[l](C)}-a^{[l](G)}||_2^2 $

* Style Loss:
    - "style": how consistent are activations across channels?
    - Define **pairwise correlation** between activations: how often texture components occur together
    
    $\displaystyle G^{[l](G)}_{kk'} = \sum_{i=1}^{n^{[l]}_H}\sum_{j=1}^{n^{[l]}_W}a^{[l](G)}_{ijk}a^{[l](G)}_{ijk'} $

    $\displaystyle G^{[l](S)}_{kk'} = \sum_{i=1}^{n^{[l]}_H}\sum_{j=1}^{n^{[l]}_W}a^{[l](S)}_{ijk}a^{[l](S)}_{ijk'} $

    $\displaystyle Js^{[l]}(S,G)=\frac{1}{(n^{[l]}_Hn^{[l]}_Wn^{[l]}_c)^2}||G^{[l](S)}-G^{[l](G)}||_F^2 $

    $$Js(S,G) =\sum_l \lambda^{[l]}Js^{[l]}(S,G) $$
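
The content and per-layer style costs can be sketched in plain Python on tiny activation volumes. This is an illustration under assumptions, not a reference implementation: activations are nested lists indexed `a[i][j][k]` (height, width, channel), the norms are taken squared as is usual for these losses, and the function names are invented for the example. A practical version would use an array library and automatic differentiation.

```python
def content_cost(a_C, a_G):
    """Jc = 1/2 * sum of squared differences between two activation volumes."""
    return 0.5 * sum((c - g) ** 2
                     for row_c, row_g in zip(a_C, a_G)
                     for px_c, px_g in zip(row_c, row_g)
                     for c, g in zip(px_c, px_g))


def gram_matrix(a):
    """G[k][k'] = sum over positions (i, j) of a[i][j][k] * a[i][j][k']."""
    n_c = len(a[0][0])
    return [[sum(px[k] * px[kp] for row in a for px in row)
             for kp in range(n_c)]
            for k in range(n_c)]


def layer_style_cost(a_S, a_G):
    """Squared Frobenius distance between Gram matrices, normalized by
    (n_H * n_W * n_c)^2 as in the formula above."""
    n_h, n_w, n_c = len(a_S), len(a_S[0]), len(a_S[0][0])
    G_S, G_G = gram_matrix(a_S), gram_matrix(a_G)
    sq_frobenius = sum((G_S[k][kp] - G_G[k][kp]) ** 2
                       for k in range(n_c) for kp in range(n_c))
    return sq_frobenius / (n_h * n_w * n_c) ** 2


# A 1 x 2 x 2 activation volume (n_H = 1, n_W = 2, n_c = 2) and an all-zero one.
a = [[[1.0, 2.0], [3.0, 4.0]]]
zeros = [[[0.0, 0.0], [0.0, 0.0]]]

print(gram_matrix(a))              # [[10.0, 14.0], [14.0, 20.0]]
print(content_cost(a, zeros))      # 15.0
print(layer_style_cost(a, zeros))  # 55.75
print(layer_style_cost(a, a))      # 0.0
```

The Gram matrix discards spatial positions and keeps only channel co-occurrence, which is why it captures texture rather than layout.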



## Summary 
* Conv, Pool: fundamental operations

* Extensions
   - Skip Connections, Residual
   - BConv, Inception
   - FConv
