# Seminar 9. Intro to the Basics of Deep Learning
Mikhail Belyaev

Intro to DL basics:
 - Objective functions & Stochastic Gradient Descent;
 - Multilayer perceptron;
 - Activation functions (ReLU, Tanh);
 - Early Stopping;
 - Dropout;
 - Convolutional networks.
 
Overview of DL subfields 
 - image classification
 - image segmentation
 - networks with memory
 - reinforcement learning

## Objective functions

Training a supervised learning algorithm can be posed as an optimization problem:

$$ w_{opt} = \arg\min_w Q(w); \quad Q(w) = \sum_{i=1}^n L(y_i, f(x_i, w)) + R(w) $$

where 
 - $n$ is the sample size,
 - $w$ are the model parameters (e.g. the coefficients of a linear model),
 - $L$ is a loss function,
 - $f(x_i, w)$ is a decision function,
 - $R(w)$ is a regularization term.

For example,
 - Ridge regression:
     $Q(w) = \sum_{i=1}^n \left(y_i - (w^T x_i + w_0)\right)^2 + \lambda \sum_{k=1}^p w_k^2 $;
     * $f(x_i, w) = w^T x_i + w_0$
 - Logistic regression with an $L_1$ penalty for a binary classification problem:
     $Q(w) = -\sum_{i=1}^n \left(y_i \log(p(x_i)) + (1 - y_i) \log(1 - p(x_i)) \right) + \lambda \sum_{k=1}^p |w_k| $;
     * where $p(x_i)$ is the output probability of the classifier
     $$p(x_i) = \frac{\exp(f(x_i, w))}{\exp(f(x_i, w)) + 1} = \frac{\exp(w^T x_i + w_0)}{\exp(w^T x_i + w_0) + 1}$$
     * $f(x_i, w) = w^T x_i + w_0$
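
The two example objectives can be evaluated directly from their formulas. The sketch below is a minimal pure-Python illustration, not a reference implementation: the function names and the toy data are made up for this example, and the labels for the logistic case are assumed to be coded as 0/1.

```python
import math

def linear_decision(x, w, w0):
    """Decision function f(x, w) = w^T x + w0."""
    return sum(wk * xk for wk, xk in zip(w, x)) + w0

def ridge_objective(X, y, w, w0, lam):
    """Sum of squared errors plus lam * sum_k w_k^2 (w0 is not penalized)."""
    data_term = sum((yi - linear_decision(xi, w, w0)) ** 2 for xi, yi in zip(X, y))
    return data_term + lam * sum(wk ** 2 for wk in w)

def logistic_l1_objective(X, y, w, w0, lam):
    """Negative log-likelihood plus lam * sum_k |w_k| for labels y_i in {0, 1}."""
    data_term = 0.0
    for xi, yi in zip(X, y):
        # 1 / (1 + exp(-f)) is the same quantity as exp(f) / (exp(f) + 1)
        p = 1.0 / (1.0 + math.exp(-linear_decision(xi, w, w0)))
        data_term -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return data_term + lam * sum(abs(wk) for wk in w)

# Toy data (made up for illustration): three points with two features each.
X = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
y_reg = [5.0, 2.0, 2.0]   # regression targets
y_cls = [1, 0, 1]         # binary labels
w, w0 = [1.0, 2.0], 0.0

print(ridge_objective(X, y_reg, w, w0, lam=0.1))
print(logistic_l1_objective(X, y_cls, w, w0, lam=0.1))
```

With these weights the regression targets are fitted exactly, so the ridge objective reduces to the penalty term alone.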
     
     
For details on classification loss functions (e.g. multiclass log loss), see [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/), chapter 4.

### Loss functions
<div style="width:100%; text-align:center">
<img src=http://scikit-learn.org/stable/_images/sphx_glr_plot_sgd_loss_functions_0011.png width=500px>
</div>

### Penalty functions
<div style="width:100%; text-align:center">
<img src=http://scikit-learn.org/stable/_images/sphx_glr_plot_sgd_penalties_0011.png width=500px>
</div>

## Stochastic Gradient Descent (SGD)

$$Q(w) = \sum_{i=1}^n L(y_i, f(x_i, w)) + R(w)$$

To fit the classifier, we need the gradient of the objective function:
$$\nabla Q(w) = \sum_{i=1}^n \nabla L(y_i, f(x_i, w)) + \nabla R(w) $$

SGD is based on the following idea: we can use a small random subset of the points to (stochastically) estimate the gradient:

$$\nabla Q(w) \approx \sum_{i \in \{10, 20, 35\}} \nabla L(y_i, f(x_i, w)) + \nabla R(w) $$ 
 - the three points with indices 10, 20, 35 were taken as a random example;
 - the number of points used for the gradient estimate is called the *batch size*;
 - strictly, the batch sum should be multiplied by $n / 3$ (in general, $n$ divided by the batch size) to match the scale of the full sum; this constant factor can also be folded into the learning rate.
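
As a rough illustration of the idea, the sketch below runs SGD on a squared-error objective with an $L_2$ penalty. It is a minimal pure-Python example under several assumptions that are not part of the seminar: the function names, the synthetic data, the learning rate and the number of steps are all made up, there is no intercept, and the batch sum is rescaled by `n / batch_size` so that it estimates the full sum.

```python
import random

def grad_point(x, y, w):
    """Gradient of the squared loss (y - w^T x)^2 with respect to w."""
    residual = y - sum(wk * xk for wk, xk in zip(w, x))
    return [-2.0 * residual * xk for xk in x]

def minibatch_gradient(X, Y, w, lam, batch_size, rng):
    """Stochastic estimate of grad Q(w), Q = sum_i (y_i - w^T x_i)^2 + lam * sum_k w_k^2."""
    n = len(X)
    batch = rng.sample(range(n), batch_size)   # random indices, e.g. [10, 20, 35]
    scale = n / batch_size                     # rescale the batch sum to the full-sum scale
    grad = [2.0 * lam * wk for wk in w]        # gradient of the penalty R(w)
    for i in batch:
        for k, g in enumerate(grad_point(X[i], Y[i], w)):
            grad[k] += scale * g
    return grad

def sgd(X, Y, lam=0.0, batch_size=3, learning_rate=0.001, n_steps=2000, seed=0):
    """Plain SGD: repeatedly step against the minibatch gradient estimate."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(n_steps):
        grad = minibatch_gradient(X, Y, w, lam, batch_size, rng)
        w = [wk - learning_rate * gk for wk, gk in zip(w, grad)]
    return w

# Synthetic, noise-free data (made up): y = 2 * x1 - 1 * x2, 50 points.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
Y = [2.0 * x1 - 1.0 * x2 for x1, x2 in X]

w_hat = sgd(X, Y)
print(w_hat)  # should be close to [2, -1]
```

Each step touches only `batch_size` points instead of all `n`, which is the whole point of the method when `n` is large.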

## Optimization approaches

Plain gradient descent has several drawbacks (to discuss), so other methods have been developed:
- momentum
- Nesterov momentum
- Adagrad

See http://sebastianruder.com/optimizing-gradient-descent/index.html for details
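
To make the first item concrete, here is a minimal sketch comparing plain gradient descent with one common form of the momentum update, $v \leftarrow \gamma v + \eta \nabla Q(w)$, $w \leftarrow w - v$. The test function (an ill-conditioned quadratic), the step size and $\gamma = 0.9$ are assumptions chosen for illustration only.

```python
def gradient_descent(grad, w, learning_rate, n_steps):
    """Plain gradient descent: w <- w - lr * grad(w)."""
    for _ in range(n_steps):
        w = [wk - learning_rate * gk for wk, gk in zip(w, grad(w))]
    return w

def momentum_descent(grad, w, learning_rate, n_steps, gamma=0.9):
    """Momentum: v <- gamma * v + lr * grad(w); w <- w - v."""
    v = [0.0] * len(w)
    for _ in range(n_steps):
        v = [gamma * vk + learning_rate * gk for vk, gk in zip(v, grad(w))]
        w = [wk - vk for wk, vk in zip(w, v)]
    return w

def grad_q(w):
    """Gradient of the hypothetical objective Q(w) = 0.5 * (w1^2 + 100 * w2^2)."""
    return [w[0], 100.0 * w[1]]

start = [10.0, 1.0]
w_gd = gradient_descent(grad_q, start, learning_rate=0.01, n_steps=100)
w_mom = momentum_descent(grad_q, start, learning_rate=0.01, n_steps=100)
print("gradient descent:", w_gd)
print("momentum:        ", w_mom)
```

On this function the step size is limited by the steep direction, so plain gradient descent crawls along the flat one; the accumulated velocity lets momentum get much closer to the minimum at the origin in the same number of steps.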

<div style="width:100%; text-align:center">
<img src=http://sebastianruder.com/content/images/2016/09/contours_evaluation_optimizers.gif width=500px>
</div>

<div style="width:100%; text-align:center">
<img src=http://sebastianruder.com/content/images/2016/09/saddle_point_evaluation_optimizers.gif width=500px>
</div>

## Multilayer perceptron

Let's play with the [TensorFlow Playground](http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=&seed=0.13325&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&discretize_hide=false)!

TODO:
 - start with a simple logistic regression, try different datasets;
 - add new features;
 - play with the SGD parameters (learning rate, batch size);
 - remove all additional features, add new layers & build a multilayer fully connected network.
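
For reference, the forward pass of a multilayer fully connected network is just a chain of "linear map plus activation" layers. The sketch below is a hypothetical, hand-weighted example (the weights are set by hand, not learned) showing that one hidden ReLU layer is enough to represent XOR, which no single linear layer can do.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one row of W per output unit."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    """Apply the layers one after another."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x

# Hand-picked weights (an assumption for illustration):
# hidden units h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1); output = h1 - 2 * h2.
xor_net = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, -2.0]], [0.0], identity),
]

for point in [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]:
    print(point, mlp_forward(point, xor_net))  # outputs 0, 1, 1, 0
```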

## Activation functions (ReLU, Tanh)

In [10]:
# TODO: plot the tanh function vs ReLU (f(x) = max(0, x))
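
As a starting point for the exercise, the sketch below only tabulates the two functions on a few made-up sample points using the standard library; passing a denser grid of `xs` and the two value lists to a plotting library would give the plot itself.

```python
import math

def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)

xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
table = [(x, math.tanh(x), relu(x)) for x in xs]

for x, t, r in table:
    print(f"x={x:+.1f}  tanh={t:+.4f}  relu={r:.1f}")
```

The table already shows the main difference: tanh saturates towards -1 and 1 on both sides, while ReLU is zero for negative inputs and unbounded for positive ones.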

Let's play with the [TensorFlow Playground](http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=&seed=0.13325&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false&discretize_hide=false)!

TODO:
 - try to solve the spiral problem using tanh;
 - replace tanh with ReLU & try again.

## Early Stopping

<div style="width:100%; text-align:center">
<img src=https://www.researchgate.net/profile/Giuseppina_Gini/publication/4310358/figure/fig2/AS:279627110076459@1443679706731/Fig-5-The-early-stopping-criterion.png width=500px>
</div>
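
One common way to turn the criterion into code is a "patience" rule: keep the weights from the epoch with the lowest validation loss and stop once that loss has not improved for a fixed number of epochs. The sketch below assumes this rule; the function name, the patience value and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch whose weights would be kept."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs in a row
    return best_epoch

# Hypothetical validation losses: they fall, then rise as the model starts to overfit.
val_losses = [1.00, 0.80, 0.65, 0.60, 0.62, 0.66, 0.71, 0.75]
print(early_stopping_epoch(val_losses, patience=3))  # 3
```

In a real training loop the losses would be computed one epoch at a time and the model weights saved whenever a new best value is reached.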

## Dropout

<div style="width:100%; text-align:center">
<img src=http://engineering.flipboard.com/assets/convnets/dropout.png width=500px>
</div>

<div style="width:100%; text-align:center">
<img src=http://danielnouri.org/media/kfkd/lc2.png width=500px>
</div>
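
A minimal sketch of the mechanism, assuming the "inverted" variant of dropout: during training each unit is zeroed with probability `drop_prob` and the surviving activations are divided by the keep probability, so nothing needs to change at test time. (The original formulation rescales at test time instead.) The function name and the numbers are illustrative.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10000                      # made-up layer activations
dropped = dropout(hidden, drop_prob=0.5, rng=rng)

zero_fraction = sum(1 for a in dropped if a == 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(zero_fraction)     # close to 0.5
print(mean_activation)   # close to 1.0, the mean of the original activations
```

The rescaling keeps the expected activation unchanged, which is why the same network can be used without dropout at prediction time.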

## Convolutional networks

<div style="width:100%; text-align:center">
<img src=http://deeplearning.stanford.edu/wiki/images/6/6c/Convolution_schematic.gif width=500px>
</div>

[Stanford deep learning tutorial.](http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/)
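
The sliding-window operation can be written in a few lines. The sketch below is a plain-Python illustration with no padding and stride 1; the small binary image and the 3x3 filter are chosen for the example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

A 5x5 input and a 3x3 filter give a 3x3 feature map, and the same 9 weights are reused at every position.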

#### Max - Pooling 

<div style="width:100%; text-align:center">
<img src=https://qph.ec.quoracdn.net/main-qimg-8afedfb2f82f279781bfefa269bc6a90 width=500px>
</div>
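
Max-pooling replaces each window of a feature map with its largest value. A minimal sketch, assuming non-overlapping 2x2 windows (size 2, stride 2) and a made-up 4x4 feature map:

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum over each size x size window, moving by `stride`."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

feature_map = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

The output has half the height and width of the input, and small shifts of a feature inside a window do not change the result.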


#### Hierarchical feature extraction
Deeper layers -> more structured and complex patterns

<div style="width:100%; text-align:center">
<img src=http://engineering.flipboard.com/assets/convnets/yann_filters.png>
</div>

[Yann Lecun “ICML 2013 tutorial on Deep Learning”](http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf)

Modern conv nets are deep

<div style="width:100%; text-align:center">
<img src=https://writelatex.s3.amazonaws.com/jrvbwfrsywbm/att/f32a655e267fc12ac892f1bedda068b2.png>
</div>


<div style="width:100%; text-align:center">
<img src=https://image.slidesharecdn.com/dlcvd2l4imagenet-160802094728/95/deep-learning-for-computer-vision-imagenet-challenge-upc-2016-4-638.jpg>
</div>

Actually they are extremely deep!

<div style="width:100%; text-align:center">
<img src=https://image.slidesharecdn.com/dlcvd2l4imagenet-160802094728/95/deep-learning-for-computer-vision-imagenet-challenge-upc-2016-48-638.jpg>
</div>

[Deep Learning for Computer Vision: ImageNet Challenge](https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-imagenet-challenge-upc-2016)

3D image classification example
<div style="width:100%; text-align:center">
<img src=https://writelatex.s3.amazonaws.com/jrvbwfrsywbm/att/1e0e7734f4976722cc858543a3f4847d.png>
</div>
[Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification](https://arxiv.org/abs/1701.06643)

## Image segmentation 

Object Recognition / Detection
<div style="width:100%; text-align:center">
<img src=https://i.imgur.com/9Y14Jo1.jpg?1>
</div>

- [Image segmentation](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html)

Segmentation
<div style="width:100%; text-align:center">
<img src=https://i.imgur.com/BthG0K9.png?1>
</div>
- [Image segmentation](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html)

Semantic Segmentation
<div style="width:100%; text-align:center">
<img src=https://i.imgur.com/69SQFsT.png?1>
</div>
- [Image segmentation](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html)

Instance segmentation
<div style="width:100%; text-align:center">
<img src=https://i.stack.imgur.com/mPFUo.jpg>
</div>
- [Image segmentation](https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/image_segmentation.html)

3D Image Semantic Segmentation

<div style="width:100%; text-align:center">
<img src=https://sites.google.com/site/braintumorsegmentation/_/rsrc/1431350844030/home/brats_tasks.png>
</div>
[Brain tumor segmentation challenge](http://braintumorsegmentation.org)

## Dimensionality reduction with DL
### Autoencoders
<div style="width:100%; text-align:center">
<img src=http://ufldl.stanford.edu/tutorial/images/Autoencoder636.png>
</div>




## Recurrent Neural Networks

<div style="width:100%; text-align:center">
<img src=http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png>
</div>

[Understanding-LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
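
The unrolled picture corresponds to applying the same cell at every time step and passing the hidden state along. Below is a minimal sketch of a vanilla (non-LSTM) recurrent layer, $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$; the weights, sizes and input sequence are made up for illustration.

```python
import math

def rnn_step(x, h_prev, W_x, W_h, b):
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [math.tanh(sum(w * xi for w, xi in zip(wx_row, x))
                      + sum(w * hi for w, hi in zip(wh_row, h_prev))
                      + bk)
            for wx_row, wh_row, bk in zip(W_x, W_h, b)]

def rnn_forward(sequence, h0, W_x, W_h, b):
    """Run the same cell over the whole sequence, collecting the hidden states."""
    h, states = h0, []
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states

# Hypothetical parameters: 1 input feature, 2 hidden units.
W_x = [[0.5], [-0.5]]
W_h = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

sequence = [[1.0], [0.0], [-1.0]]
states = rnn_forward(sequence, [0.0, 0.0], W_x, W_h, b)
for t, h in enumerate(states):
    print(t, h)
```

The second input is zero, yet the second hidden state is not: it still carries information from the first step through `W_h`.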

### LSTM

<div style="width:100%; text-align:center">
<img src=http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png>
</div>
[Understanding-LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)

### An example of LSTM-generated text

(Translated from the Russian output; the incoherence is in the original.)

He felt unable to think about how he had not come to see him after the last subject. Boris went up to her. He saw himself as one of the court houses, and something was pleasant about this woman. She answered herself: he no longer came to this mood. Everywhere it was even more simple and senseless, and in society did not have a question about him, and perhaps, all the same, there cannot be a wedding of my superiority and its own cheerful feeling of agitation, of the event, as to events, as he intended to find from dinner. In all the police and military he did not agree from a favorable peace and to accept all his places. He only inclined his head questioningly and sat down on the sofa.

Bidirectional RNN

<div style="width:100%; text-align:center">
<img src=https://devblogs.nvidia.com/wp-content/uploads/2015/02/rnn_fig-624x548.png>
</div>


[Deep speech](https://arxiv.org/abs/1412.5567)

## Reinforcement Learning

In [2]:
from IPython.display import HTML

In [3]:
HTML('<iframe width="560" height="315" \
     src="https://www.youtube.com/embed/V1eYniJ0Rnk?rel=0&amp;controls=0&amp;showinfo=0" \
     frameborder="0" allowfullscreen></iframe>')

[Demystifying Deep Reinforcement Learning](https://www.nervanasys.com/demystifying-deep-reinforcement-learning/)

<div style="width:100%; text-align:center">
<img src=http://www.ultimaratioregum.co.uk/game/files/2016/03/alphagonature.jpg>
</div>
[Google DeepMind's AlphaGo: How it works](https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/)

<div style="width:100%; text-align:center">
<img src=https://www.tastehit.com/blog/content/images/2016/03/policy-value-networks.png>
</div>
[Google DeepMind's AlphaGo: How it works](https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/)

## Generative Adversarial Networks

<div style="width:100%; text-align:center">
<img src=https://cdn-images-1.medium.com/max/800/1*-gFsbymY9oJUQJ-A3GTfeg.png>
</div>
[Generative Adversarial Networks (GANs) in 50 lines of code (PyTorch)](https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f)

<div style="width:100%; text-align:center">
<img src=http://kvfrans.com/content/images/2016/06/gencnn-afe135ff8d2725325a22455a488562b0e1cb7ac6a3f60b3cecb373fd043eb202.svg>
</div>
[Generative Adversarial Networks Explained](http://kvfrans.com/generative-adversial-networks-explained/)

<div style="width:100%; text-align:center">
<img src=http://kvfrans.com/content/images/2016/06/cifar-early.png>
</div>
[Generative Adversarial Networks Explained](http://kvfrans.com/generative-adversial-networks-explained/)

## Software & hardware 

Libraries
 - Theano and Lasagne
 - TensorFlow
 - PyTorch

https://www.reddit.com/r/MachineLearning/comments/5w3q74/d_so_pytorch_vs_tensorflow_whats_the_verdict_on/

## Links

1. Python libraries
 - TensorFlow, https://www.tensorflow.org/
 - Keras, a higher-level library, http://keras.io
 - PyTorch, https://www.pytorch.org/

2. Basic materials:
 - A good introductory text for shallow networks: http://neuralnetworksanddeeplearning.com
 - A nice practical intro to several important aspects of DL: http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/

3. Some additional links
 - The winning solution of the plankton classification contest (more than 1000 participants; \$175k in prizes): http://benanne.github.io/2015/03/17/plankton.html
 - Converting words to vectors: http://sebastianruder.com/word-embeddings-1/
 - A pair of posts about AlphaGo:
   - is it really important?: https://www.quantamagazine.org/20160329-why-alphago-is-really-such-a-big-deal/
   - a technical review of the algorithm: http://deeplearningskysthelimit.blogspot.ru/2016/04/part-2-alphago-under-magnifying-glass.html