# Convolutional Neural Network Applications

# Deep Learning Libraries

<center><img src="https://www.houseofbots.com/images/news/4015/cover.png" width=800>

<img src="https://images.ctfassets.net/sc14p6l3fnnh/3TxuyeJrWZcIShYarhqV3i/0e1fb927536970856fe3c8bb4c65ba7a/pytorch-1-.png">

<img src="https://github.com/jordanott/DeepLearning/blob/master/Figures/tf_vs_pytorch.png?raw=true">

In [21]:
import tensorflow as tf
import numpy as np
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

# TensorFlow (Static Computation Graph)

1. Build a computational graph describing the computation (including the paths needed for backprop)
2. Reuse the same graph on every iteration

In [22]:
# This cell uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# Create placeholders for data; these get filled when we execute the graph.
x = tf.placeholder(tf.float32, shape=(None, D_in))
y = tf.placeholder(tf.float32, shape=(None, D_out))

# Create Variables for the weights and initialize them with random data.
w1 = tf.Variable(tf.random_normal((D_in, H)))
w2 = tf.Variable(tf.random_normal((H, D_out)))

# Forward pass: these lines only add nodes to the graph; nothing is computed yet.
h = tf.matmul(x, w1)
h_relu = tf.maximum(h, tf.zeros(1))
y_pred = tf.matmul(h_relu, w2)

# Define the loss using operations on TensorFlow Tensors
loss = tf.reduce_sum((y - y_pred) ** 2.0)

# Compute gradient of the loss with respect to w1 and w2
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Define the weight updates as graph ops; they only run when evaluated in a session.
learning_rate = 1e-6
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)

# The computational graph is now built; enter a TensorFlow session to execute it.
with tf.Session() as sess:
    # Run the graph once to initialize the Variables w1 and w2.
    sess.run(tf.global_variables_initializer())

    # Create NumPy arrays holding the actual data for the inputs and targets
    x_value = np.random.randn(N, D_in)
    y_value = np.random.randn(N, D_out)
    for _ in range(500):
        # On every run, bind x_value to x and y_value to y through feed_dict
        loss_value, _, _ = sess.run([loss, new_w1, new_w2],
                                    feed_dict={x: x_value, y: y_value})

# TensorFlow Keras Wrapper?!?


In [3]:
import tensorflow.keras

* Keras already exists as a whole standalone library
* Why is it also included in TensorFlow?

* In summary **WTF**


# PyTorch (Dynamic Computation Graph)

* Tensor: Like a numpy array, but can run on GPU
* Autograd: Package for building computational graphs out of Tensors, and automatically computing gradients
* Module: A neural network layer; may store state or learnable weights
* I like it because the syntax is almost identical to numpy but with GPU support

In [19]:
# Create random Tensors to hold input and outputs.
x = torch.randn(N, D_in, dtype=torch.float)
y = torch.randn(N, D_out, dtype=torch.float)

# Create random Tensors for weights.
w1 = torch.randn(D_in, H, dtype=torch.float, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float, requires_grad=True)

# lr matches the TensorFlow example above; a much larger step tends to diverge on this unnormalized loss
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

for t in range(500):
    # Forward pass: compute predicted y using operations on Tensors.
    # These are the same operations as in the TensorFlow graph: matmul, ReLU, matmul.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the loss using operations on Tensors.
    loss = (y_pred - y).pow(2).sum()

    # Clear gradients from the previous iteration; backward() accumulates into .grad.
    optimizer.zero_grad()

    # Use autograd to compute the backward pass. This call will compute the gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call w1.grad and w2.grad will be Tensors holding the gradient of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update w1 and w2 using the gradients stored in .grad
    optimizer.step()

PyTorch autograd looks a lot like TensorFlow: in both frameworks we define a computational graph, and use automatic differentiation to compute gradients. The biggest difference between the two is that TensorFlow's computational graphs are static and PyTorch uses dynamic computational graphs.

In TensorFlow, we define the computational graph once and then execute the same graph over and over again, possibly feeding different input data to the graph. In PyTorch, each forward pass defines a new computational graph.

Static graphs are nice because you can optimize the graph up front; for example a framework might decide to fuse some graph operations for efficiency, or to come up with a strategy for distributing the graph across many GPUs or many machines. If you are reusing the same graph over and over, then this potentially costly up-front optimization can be amortized as the same graph is rerun over and over.

One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example a recurrent network might be unrolled for different numbers of time steps for each data point; this unrolling can be implemented as a loop. With a static graph the loop construct needs to be a part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build graphs on-the-fly for each example, we can use normal imperative flow control to perform computation that differs for each input.
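The control-flow point can be sketched in PyTorch with a toy recurrent cell. This is only an illustration under assumed names (`run_sequence`, `w_in`, `w_rec` and the small sizes are hypothetical, not from the slides): an ordinary Python `for` loop unrolls the cell for however many time steps each input has, and autograd records a different graph each time.

```python
import torch

torch.manual_seed(0)

D_in, H = 4, 3  # toy sizes, chosen only for this sketch
w_in = torch.randn(D_in, H, requires_grad=True)
w_rec = torch.randn(H, H, requires_grad=True)

def run_sequence(seq):
    """Unroll a toy recurrent cell for as many steps as `seq` has rows."""
    h = torch.zeros(H)
    for x_t in seq:  # plain Python loop; its length can differ per input
        h = torch.tanh(x_t @ w_in + h @ w_rec)
    return h.sum()

# Two inputs with different numbers of time steps
short_seq = torch.randn(2, D_in)
long_seq = torch.randn(7, D_in)

# Each call builds its own graph on the fly (2 steps vs. 7 steps)
loss = run_sequence(short_seq) + run_sequence(long_seq)
loss.backward()

print(w_in.grad.shape, w_rec.grad.shape)
```

No special loop operator is needed here: the graph for the 2-step input and the graph for the 7-step input are simply whatever the Python loop executed.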

# Summary

* If you want something high level that works fast: **Keras**
* Need to write custom layers and do fancy computations in your network: **PyTorch**


* If you hate yourself: **TensorFlow**


<img width=600 src="https://upload.wikimedia.org/wikipedia/en/thumb/f/f7/The_Aerospace_Corporation_logo.svg/1280px-The_Aerospace_Corporation_logo.svg.png">

<center><img src="https://cdn.drawception.com/images/panels/2014/11-18/eASthDRM7h-3.png">

* If a robot loses signal (i.e. no GPS)
* It should still be able to localize itself
* Using only images

[Video](https://www.youtube.com/watch?v=u0MVbL_RyPU)


# PoseNet


|  Input: RGB image  |  Output: Lat, lon coordinates  |
| ------- | -------- |
| <img src="http://www.velvetblues.com/wp-content/uploads/google-street-view.jpg" width=500> | <img src="https://amp.businessinsider.com/images/5c9542680cf9131e9a761712-750-571.jpg" width=500>  |

* ~3 meter accuracy in a large area of LA

# Popular Architectures

# AlexNet

<center><img src="https://miro.medium.com/max/3072/1*qyc21qM0oxWEuRaj-XJKcw.png">

# VGG

<center><img src="https://neurohive.io/wp-content/uploads/2018/11/vgg16-neural-network.jpg">

# Inception

<center><img src="https://devblogs.nvidia.com/wp-content/uploads/2015/08/image6.png">

# Inception Module

<center><img src="https://d2mxuefqeaa7sj.cloudfront.net/s_8C760A111A4204FB24FFC30E04E069BD755C4EEFD62ACBA4B54BBA2A78E13E8C_1490879611424_inception_module.png" width=800>

# ResNet

<center><img src="https://www.researchgate.net/profile/Seunghyoung_Ryu/publication/329954455/figure/fig1/AS:725290594623488@1549934161033/The-structure-of-ResNet-12.png">

# Loading Pretrained Models From Keras

In [None]:
from keras.applications.resnet50 import ResNet50

model = ResNet50(weights='imagenet')

<center><img src="https://github.com/jordanott/DeepLearning/blob/master/Figures/dune.png?raw=true">

# Computer Vision Problems

* Classification
* Localization
* Object Detection
* Semantic Segmentation
* Instance Segmentation

<center><img src="https://miro.medium.com/max/1400/1*onhKzFMWm8KcikvubonH0g.png">

<center><img width=800 src="https://miro.medium.com/max/1276/1*0EEeHWFg7kpFKzvUKw0WSw.jpeg">

# Object Detection

<center><img width=800 src="https://miro.medium.com/max/1400/1*9wzAGR0GDsI4cBUY8qHBVQ.jpeg">

# Semantic Segmentation

<center><img width=1200 src="https://miro.medium.com/max/1400/1*nXlx7s4wQhVgVId8qkkMMA.png">

<center><img width=1000 src="https://miro.medium.com/max/1276/1*ratkNlE3u5cT6AChInXmXQ.jpeg">

# How would you solve each of these problems?

# Applications

|  Self driving cars   |  Biomedical   |
|----|----|
| <center><img src="https://thumbs.gfycat.com/SociableAmazingApe-small.gif">   |  <img src="https://miro.medium.com/max/1284/1*8xxwKeAUQoFp28s_gw-67w.png">  |
|  Shopping  |  Satellite imaging |
|  <img src="https://eenews.cdnartwhere.eu/sites/default/files/styles/inner_article/public/sites/default/files/images/2018-07-20-s20_ai_autonomous_checkout_retail_cashierless_standard_cognition_amazon_go_.jpg?itok=dMQ-i57t"> | <center><img src="https://miro.medium.com/max/1838/1*eu72LO87fPueTbIwic4QAQ.png" width=400> |

# Input: Image Output: Image

<center><img width=1000 src="https://cdn-images-1.medium.com/max/1600/1*JMIdlC6SitUNT4pAPzolGQ.png">

# Review of CNN

<center><img width=1200 src="https://miro.medium.com/max/1838/1*uAeANQIOQPqWZnnuH-VEyw.jpeg">

* At each layer the input is *convolved* with a weight matrix
    * i.e. take the dot product of the weight matrix and the input patch as the filter slides across the space
* Weights are shared across space
    * the same filter (weights) is used at all locations on the input
* After all the convolutional layers
    * the output volume is flattened
    * and fed to fully connected layers for classification
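The sliding dot product can be sketched in a few lines of NumPy. This is a minimal single-channel version with no padding; `conv2d_naive` and the toy `image` and `kernel` are hypothetical names used only for illustration, and real frameworks are far more optimized.

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Valid (no padding) sliding dot product of a 2D image with a 2D kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out_h = (H - kH) // stride + 1
    out_w = (W - kW) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kH, j * stride:j * stride + kW]
            # Weight sharing: the same kernel is used at every location
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

feature_map = conv2d_naive(image, kernel)  # shape (3, 3)
flattened = feature_map.reshape(-1)        # what a fully connected layer would receive
print(feature_map.shape, flattened.shape)
```

As in most deep learning libraries, the kernel is not flipped here, so strictly speaking this computes a cross-correlation.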


# Review of Convolution


<center><img width=1000 src="https://miro.medium.com/max/1400/1*ciDgQEjViWLnCbmX-EeSrA.gif">

# Convolution as Matrix Multiplication


<img src="https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_folder_4/Convolution_With_Im2col.png">

* Instead of sliding the weight matrix to each position in the image with a `for` loop
* It's more efficient to perform a matrix multiplication
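A minimal sketch of the im2col idea, assuming a single-channel image, stride 1 and no padding (`im2col` and `conv2d_im2col` are illustrative helper names, not a library API). Every patch becomes a column, so the whole convolution collapses into one matrix multiplication. The patch extraction still loops here for clarity; production implementations vectorize that step too.

```python
import numpy as np

def im2col(image, kH, kW):
    """Stack every kH x kW patch of a 2D image as one column."""
    H, W = image.shape
    out_h, out_w = H - kH + 1, W - kW + 1
    cols = np.empty((kH * kW, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = image[i:i + kH, j:j + kW].reshape(-1)
    return cols, (out_h, out_w)

def conv2d_im2col(image, kernel):
    """Convolution as a single (1 x k) @ (k x num_patches) matrix product."""
    cols, out_shape = im2col(image, *kernel.shape)
    return (kernel.reshape(1, -1) @ cols).reshape(out_shape)

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

result = conv2d_im2col(image, kernel)
print(result)
```

With several filters, each flattened filter becomes one row of the left-hand matrix, and a single matrix product yields all output channels at once.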


# Transposed Convolution

* Transposed convolution
* Deconvolution
* Fractionally strided convolution

<center><img width=600 src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcS66UrIvW4fA4jglM3wLt_Oyy3k0CswWAabstS0olI5f9uWqLThXw">

Why it's called the [transposed convolution](https://medium.com/activating-robotic-minds/up-sampling-with-transposed-convolution-9ae4f2df52d0)

# Code a Deconvolution
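One possible sketch, assuming stride 1, no padding and a single channel (the helper names `conv_transpose2d` and `conv_matrix` are made up for this example). Each input value stamps a scaled copy of the kernel onto a larger output. The second helper builds the dense matrix `C` of the ordinary convolution, which lets us check that the same result comes from multiplying by `C.T`, hence the name.

```python
import numpy as np

def conv_transpose2d(y, kernel):
    """Stride-1 transposed convolution: each input value adds a scaled kernel to the output."""
    h, w = y.shape
    kH, kW = kernel.shape
    out = np.zeros((h + kH - 1, w + kW - 1))
    for i in range(h):
        for j in range(w):
            out[i:i + kH, j:j + kW] += y[i, j] * kernel
    return out

def conv_matrix(kernel, in_shape):
    """Dense matrix C such that C @ x.ravel() is the valid stride-1 convolution of x."""
    H, W = in_shape
    kH, kW = kernel.shape
    out_h, out_w = H - kH + 1, W - kW + 1
    C = np.zeros((out_h * out_w, H * W))
    for i in range(out_h):
        for j in range(out_w):
            for a in range(kH):
                for b in range(kW):
                    C[i * out_w + j, (i + a) * W + (j + b)] = kernel[a, b]
    return C

kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y = np.array([[1.0, 0.0],
              [0.0, 2.0]])                 # a small 2x2 feature map

upsampled = conv_transpose2d(y, kernel)    # 3x3 output
C = conv_matrix(kernel, upsampled.shape)   # ordinary convolution: 3x3 -> 2x2
via_transpose = (C.T @ y.ravel()).reshape(upsampled.shape)

print(upsampled)
print(np.allclose(upsampled, via_transpose))
```

The overlapping kernel stamps are also where uneven coverage of output pixels can come from once the stride is larger than 1.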


# Fully Convolutional Network (FCN)


<img src="https://miro.medium.com/max/1838/0*jIBAjzSynvcV-lvO.png">


Here the architecture consists of convolutional layers followed by transposed convolutional layers. A downside of this type of architecture is that it's overly simplistic, and transposed convolutions on their own can lead to checkerboard artifacts in the output.

# UNet


<center><img width=1100 src="https://i.stack.imgur.com/EtyQs.png">

The UNet was one of the first models developed to build on the FCN. It was originally made for biomedical imaging, in order to segment cell types. The UNet works by applying convolutions to downsample, then applying transposed convolutions to upsample, much like the FCN. However, here the encoder feature maps are copied over (grey arrows) and concatenated with the upsampled features. This helps to enhance the quality of the output.
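The copy-and-concatenate step can be sketched with a deliberately tiny PyTorch model. This is not the original UNet: `TinyUNet`, its channel counts and the use of padded convolutions (so no cropping is needed before concatenation) are assumptions made to keep the example short.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One downsampling step, one upsampling step, one skip connection."""

    def __init__(self, in_ch=1, base_ch=8, num_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, base_ch, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(
            nn.Conv2d(base_ch, base_ch * 2, 3, padding=1), nn.ReLU())
        # Transposed convolution doubles the spatial resolution
        self.up = nn.ConvTranspose2d(base_ch * 2, base_ch, kernel_size=2, stride=2)
        # The decoder sees upsampled features concatenated with encoder features
        self.dec = nn.Sequential(nn.Conv2d(base_ch * 2, base_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(base_ch, num_classes, kernel_size=1)

    def forward(self, x):
        skip = self.enc(x)                    # full-resolution features
        x = self.bottleneck(self.down(skip))  # half-resolution features
        x = self.up(x)                        # back to full resolution
        x = torch.cat([skip, x], dim=1)       # the copied features (grey arrows)
        return self.head(self.dec(x))         # per-pixel class scores

model = TinyUNet()
logits = model(torch.randn(1, 1, 32, 32))
print(logits.shape)
```

The output has one score per class for every input pixel, which is the shape a segmentation loss expects.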

# Pyramid Scene Parsing Network (PSPNet)


<img src="https://hszhao.github.io/projects/pspnet/figures/pspnet.png">

A more recent approach to image segmentation is the PSP Network. It starts the same way, with an initial stack of convolutional layers (CNN in the image). The paper's addition is to pool the resulting feature map at several resolutions via average pooling (red, orange, blue, green), then apply a 1x1 convolution to each pooled map to reduce dimensionality. These varying-resolution feature maps are upsampled back to the original feature map size via bilinear interpolation, concatenated with the original features, and passed through another convolution to produce the output. The varying-resolution feature maps help the PSP architecture incorporate global context, which is important for image segmentation.
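A rough PyTorch sketch of the pyramid pooling step, under stated assumptions: the class name `PyramidPooling`, the channel split and the toy input size are illustrative, and the bin sizes (1, 2, 3, 6) are taken from the PSPNet paper as I recall it. Each branch pools to a fixed grid, reduces channels with a 1x1 convolution, and is upsampled bilinearly before concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_ch = in_ch // len(bin_sizes)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),                     # pool to a b x b grid
                nn.Conv2d(in_ch, branch_ch, kernel_size=1),  # reduce dimensionality
                nn.ReLU(),
            )
            for b in bin_sizes
        ])
        self.out_ch = in_ch + branch_ch * len(bin_sizes)

    def forward(self, feats):
        h, w = feats.shape[2:]
        pooled = [
            # Upsample each pooled map back to the input feature map size
            F.interpolate(branch(feats), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        # Concatenate the original features with every pyramid level
        return torch.cat([feats] + pooled, dim=1)

ppm = PyramidPooling(in_ch=16)
out = ppm(torch.randn(1, 16, 12, 12))
print(out.shape)
```

The 1 x 1 bin is a global average over the whole feature map, which is how the coarsest level injects image-wide context into every pixel.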


# You Only Look Once ([YOLO](https://youtu.be/MPU2HistivI?t=4))

<img src="https://lilianweng.github.io/lil-log/assets/images/yolo-network-architecture.png">

# Object Detection Evaluation - Intersection over Union

<center><img width=1000 src="https://www.pyimagesearch.com/wp-content/uploads/2016/09/iou_examples.png">
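Intersection over Union for two axis-aligned boxes can be computed as below. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions for this sketch; detection libraries differ in their box conventions.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Width and height are clamped at zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(iou(predicted, ground_truth))  # intersection 1, union 7
```

A perfect prediction scores 1.0 and non-overlapping boxes score 0.0; a detection is typically counted as correct when its IoU with the ground truth exceeds a chosen threshold.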

In [2]:
from IPython.display import HTML

In [5]:
HTML('<iframe width=100% height="500" src="https://www.youtube.com/embed/MPU2HistivI" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>')

# References

If you really want to know the [math](https://arxiv.org/pdf/1603.07285.pdf)
