# Practical Deep Learning
## Getting your deep neural network into a car

<br>
Sander van Dijk @ Parkopedia

<br>
3 May 2018

# SDC @ Parkopedia:  Autonomous Valet Parking

<center>
    <img src="figs/avp.png" alt="Autonomous Valet Parking">
</center>

Parkopedia provides parking services for connected cars today; we aim to power autonomous parking in the future.

Most importantly, this means providing detailed maps. To validate their usefulness, we are building a full self-driving AVP demonstration.

# What's Deep Learning?

<center><img src="figs/nn_example-624x218.png"></center>

* *Neural network with more than 1 hidden layer*
* Decades old
* Now booming thanks to breakthroughs in:
    * Hardware (GPUs)
    * Large data sets
    * Some theoretical architecture/training advances


# Deep Learning Achievements: ImageNet

<center><img src="figs/imagenet-examples.png"></center>

# Deep Learning Achievements: ImageNet

<center><img src="figs/imagenet-scores.png"></center>

# Deep Learning Achievements: AlphaGo

<center><img src="figs/alphago.png"></center>

<small>Source: Mastering the game of Go with deep neural networks and tree search, Silver et al., Nature, 2016</small>

# Deep Learning Achievements: AlphaGo

<center><table style="border: 0px"><tr style="border: 0px">
    <td width="50%" style="border: 0px"><img src="figs/alphago-kejie.jpg"></td>
    <td style="border: 0px"><img src="figs/alphago-elo.png"></td>
</tr></table></center>

# Deep Learning Achievements: Translation

<center>
    <img src="figs/translation-viz.png" width="60%">

    <img src="figs/translation-example.png" width="60%">
</center>

<small>Source: <a href="https://research.googleblog.com/2016/09/a-neural-network-for-machine.html">Google research blog</a></small>

# Deep Learning Achievements: Translation

<center>
    <img src="figs/translation-perf.png">
</center>

# But At What Cost??

**Training: huge GPU (/TPU) cluster**

OK, we can live with that, as long as we can run it in our car. Can we...?

# But At What Cost??

* **SENet** (ImageNet 2017):

    > 209 ms on a server with 8 NVIDIA Titan X GPUs
* **AlphaGo**

    <img src="figs/alphago-cost.png" style="height: 160px">

* **GNMT**

    Trained on 12 machines, no info on runtime, but it's Google...

# That's a Lot of Computation..
<center><img src="figs/truck-computers.jpg"></center>

# That's a Lot of Computation..
<center><img src="figs/audi-trunk.jpg"></center>

# DL Tasks for Self Driving Cars

* **Object Detection** - Find bounding boxes of objects in an image (and what kind of objects they are)
* **Image Segmentation** - What kind of object does each pixel belong to?
* **Object Segmentation** - Which pixels belong to the same object (and what kind of object is that)?
* Reinforcement Learning - What actions should the car perform?
* Human Sensing - Is the driver paying attention?

Image classification, not so much

# DL Tasks for SDCs: Object Detection

<center><img src="figs/yolo9000.png"></center>

<small>Source: <a href="https://github.com/karolmajek/darknet">Karol Majek</a> - YOLO9000</small>

# DL Tasks for SDCs: Semantic Segmentation

<center><img src="figs/semantic-segmentation.png"></center>

<small>Source: http://abhijitkundu.info/projects/fso/</small>

# DL Tasks for SDCs: Object Segmentation

<center><img src="figs/mask-rcnn.png"></center>

<small>Source: <a href="https://github.com/karolmajek/darknet">Karol Majek</a> - Mask-RCNN</small>

<center><img src="figs/mit-architectures.png"></center>

<small>Source: <a href="https://selfdrivingcars.mit.edu/">MIT 6.S094: Deep Learning for Self-Driving Cars</a></small>

If the network learns its output from image data, the base of the network is generally very similar across tasks: a stack of encoding convolution layers.

# How to Choose a Network?

### *Fits your problem*
### But: do you need DL all the way?
### Computation vs Performance

# Computation vs Performance

<center><img src="figs/mobilenet_v1.png"></center>

# Computation vs Performance

<center><img src="figs/yolo-perf.png"></center>

* Watch out for the input resolutions being compared, and for authors who use flashy names.
* Testing is best, but you can get an objective intuition of the number of operations before building a network.
* Understanding each layer, and building and testing your own network, is always better.
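
To illustrate the resolution caveat: for a fully convolutional network, compute cost grows roughly in proportion to the number of input pixels, so the same architecture benchmarked at two resolutions is not a like-for-like comparison. The sketch below assumes that linear scaling. The function name and the two resolutions are hypothetical and chosen only for illustration; they are not taken from the chart.

```python
def relative_cost(width_a, height_a, width_b, height_b):
    """Ratio of pixel counts (b over a).

    Assumption: compute cost scales linearly with the number of input
    pixels, which is a rough proxy for fully convolutional networks.
    """
    return (width_b * height_b) / (width_a * height_a)


# Hypothetical comparison: the same network run at 416x416 and at 608x608.
ratio = relative_cost(416, 416, 608, 608)
print(f"608x608 costs roughly {ratio:.2f}x as much as 416x416")
```

A speed figure quoted at the lower resolution and an accuracy figure quoted at the higher one can therefore differ by more than a factor of two in actual compute.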

# Operation Costs: Convolution

<center><img src="figs/convolution.png" style="width: 70%;"></center>

<small>Source: <a href="http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/">MachineThink</a></small>

# Operation Costs: Convolution

<img src="figs/convolution.png" style="width: 30%; float: right;">


* A $W \times H$ input tensor (image)... 
* with $C$ features (channels)...
* is convolved with $F$ ...
* $N \times N$ kernels

Total number of operations (Multiply-Accumulate, MACs):

$W \times H \times C \times F \times N^2$
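
The formula above can be sketched as a small helper. This is a minimal sketch, assuming stride 1 and padding that keeps the output at $W \times H$; the function name and the example layer are hypothetical.

```python
def conv_macs(width, height, in_channels, num_filters, kernel_size):
    """Multiply-accumulates for one convolution layer: W * H * C * F * N^2.

    Assumption: stride 1 and 'same' padding, so each of the F filters is
    evaluated once at every one of the W x H input positions.
    """
    return width * height * in_channels * num_filters * kernel_size ** 2


# Hypothetical layer: 224x224 RGB input, 32 filters of size 3x3.
macs = conv_macs(224, 224, 3, 32, 3)
print(f"{macs:,} MACs")
```

For this example layer the count is 43,352,064 MACs, and doubling both $W$ and $H$ would multiply it by four.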