
# Chapter 5: Analysing celebrity faces
## Section 1: Explore possible solutions: downloading the data, visualising it, and trying our previous networks
Download the data here: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html


In [1]:
## Load the data

## Using the data loading pipeline of TensorFlow


## Section 2: Reducing our trainable parameters by adding pooling layers
As you can see in the output above, our network has a lot of parameters. What we also see is that our convolutional layers get activated, and that these activations tell a lot about what can be seen in the image. 
For classification tasks it's a good idea to reduce the number of parameters. 

To do this we can go over our feature map, take several neighbouring activations, and group these together. This is called a "pooling layer": you take a pool of activations and apply a simple function to it. Because the feature map gets smaller, the layers that follow need fewer parameters. 

https://en.wikipedia.org/wiki/Convolutional_neural_network#Pooling_layer

The best-known pooling layers are max pooling, which keeps the largest activation in each pool, and average pooling, which keeps the mean. 
`pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)`

![max pool](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e9/Max_pooling.png/314px-Max_pooling.png)
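To make the operation concrete, here is a minimal NumPy sketch of max pooling with a 2x2 window and stride 2 on a single feature map. It is not how TensorFlow implements the layer; the helper name `max_pool_2x2` and the example values are made up for illustration, and the sketch assumes the height and width are even.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max pooling with a 2x2 window and stride 2 on a 2-D feature map."""
    height, width = feature_map.shape
    # Assumption: height and width are both even, so the map splits cleanly.
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    # Keep only the largest activation inside every 2x2 block.
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
# [[6 8]
#  [3 4]]
```

The 4x4 map shrinks to 2x2, so any layer that follows sees four times fewer inputs.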

In [2]:
#  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
import tensorflow as tf
tf.layers?

## Section 3: Countering overfitting techniques: dropout and L2 regularization
With neural networks more data is almost always better. Large deep neural networks are almost always capable of memorising the input-output pairs they are trained on. This was demonstrated in a recent paper in which the authors assigned random labels to images: the neural network was still able to predict these nonsense labels 100% of the time on the training set. 

This "overfitting" is a big problem in neural networks. There are many ways to counter it; the two best-known are "L2 regularization" and "dropout". 

### Dropout 
In neural networks every neuron tends to search for one specific pattern. This is all fine, except that neurons tend to search for dataset-specific patterns. To counter this we can randomly set the activations of neurons to zero during training. To score well on the training set, multiple neurons then have to be able to detect the same patterns. This in turn helps your network generalise and makes it less likely to overfit. 

Units that are kept are scaled up by 1 / (1 - rate). This way the expected sum of the activations is unchanged. 

There is one parameter to dropout: what percentage of neurons to set to zero. A rate of 0.1 drops 10% of the input units. Hurray, another hyper-parameter you can control during learning!
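The following NumPy sketch shows the idea of dropout with rescaling. It is an illustration under assumptions, not TensorFlow's implementation: the function name `dropout`, its `training` flag, and the all-ones activations are hypothetical.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Zero a fraction `rate` of the units and rescale the units that are kept."""
    if not training or rate == 0.0:
        # At inference time nothing is dropped.
        return activations
    # True for units we keep, False for units we drop.
    keep_mask = rng.random(activations.shape) >= rate
    # Scale the kept units by 1 / (1 - rate) so the expected sum stays the same.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(seed=0)
activations = np.ones((1000, 100))  # hypothetical layer output

dropped = dropout(activations, rate=0.1, training=True, rng=rng)
print("fraction set to zero:", np.mean(dropped == 0.0))  # close to 0.1
print("mean activation:", dropped.mean())                # close to 1.0

unchanged = dropout(activations, rate=0.1, training=False, rng=rng)
```

With `training=False` the input passes through untouched, which is the behaviour you want when testing or deploying the network.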


### L2 regularization
When neural networks overfit, some neurons have large weights to other neurons: the network relies heavily on a few patterns from the training set. If such a pattern is not present in the test set your network will perform poorly. To counter this we can add an extra loss term: a penalty for having large weights! Although this is a great technique, it also requires a lot of manual tuning. If you set the penalty too low your network does not care about the regularization. If you set the penalty too high your network won't be able to learn anything. 
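As a small sketch of the extra loss term, the snippet below computes an L2 penalty in NumPy, using the same `sum(w ** 2) / 2` form that `tf.nn.l2_loss` uses. The helper name `l2_penalty`, the weight values, the `data_loss` value, and the candidate scales are all made up for illustration.

```python
import numpy as np

def l2_penalty(weights, scale):
    """L2 penalty: scale * sum(w ** 2) / 2, summed over all weight arrays."""
    return scale * sum(np.sum(w ** 2) / 2.0 for w in weights)

# Hypothetical weights of a tiny two-layer network.
weights = [np.array([[0.5, -1.0], [2.0, 0.0]]), np.array([3.0, -0.5])]
data_loss = 0.8  # hypothetical classification loss, for illustration only

# The unscaled penalty for these weights is 7.25.
for scale in (0.0001, 0.01, 1.0):
    total_loss = data_loss + l2_penalty(weights, scale)
    print("scale:", scale, "total loss:", total_loss)
```

With the smallest scale the penalty barely changes the total loss, and with the largest scale it dominates the loss, which is the tuning problem described above.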




In [6]:
tf.layers.dropout?

In [5]:
# https://www.tensorflow.org/api_docs/python/tf/nn/l2_loss

## Section 4: Tricks for faster training: batch normalisation
In pictures, humans are able to spot patterns that are difficult for neural networks to spot. An image can be grayscale, overexposed, very dark, or noisy: humans can still recognise what is in it. 

https://www.quora.com/Why-does-batch-normalization-help

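Batch normalisation helps the network cope with such differences in scale: it normalises the activations of a layer over each mini-batch to zero mean and unit variance, and then applies a learned scale and shift. It can help your network learn faster, and it often gives you a higher accuracy as well. 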

There is a bit of a difference in the "backward pass" through the neural network. This writeup explains it in a very clear way. http://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html

https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization

Example in TensorFlow: http://ruishu.io/2016/12/27/batchnorm/


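Remember when we applied dropout? We used a placeholder to set the percentage it had to drop. During inference (our testing and deployment) we don't want any dropout. Batch normalisation also behaves differently during inference: instead of the statistics of the current batch it uses the moving averages of the mean and variance that were collected during training. Both layers therefore need to know whether we are training or not. 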
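The difference between training and inference can be sketched in NumPy as follows. This is a simplified illustration, not the TensorFlow implementation: the function `batch_norm`, its arguments, the `momentum` value of 0.9, and the random input batch are all assumptions made for the example.

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var, training,
               momentum=0.9, epsilon=1e-5):
    """Normalise x of shape (batch, features) per feature.

    Returns the output and the (possibly updated) moving mean and variance.
    """
    if training:
        # Use the statistics of the current mini-batch...
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # ...and update the moving averages for later use at inference time.
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    else:
        # At inference time, rely on the statistics collected during training.
        mean, var = moving_mean, moving_var
    x_hat = (x - mean) / np.sqrt(var + epsilon)
    # gamma and beta are the learned scale and shift.
    return gamma * x_hat + beta, moving_mean, moving_var

rng = np.random.default_rng(seed=0)
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # hypothetical activations
gamma, beta = np.ones(4), np.zeros(4)
moving_mean, moving_var = np.zeros(4), np.ones(4)

train_out, moving_mean, moving_var = batch_norm(
    batch, gamma, beta, moving_mean, moving_var, training=True)
print("mean per feature after normalisation:", train_out.mean(axis=0))

# A single example at inference time: no batch statistics are needed.
test_out, _, _ = batch_norm(
    batch[:1], gamma, beta, moving_mean, moving_var, training=False)
```

Note that the inference call works on a batch of one example, which would be impossible if it had to compute a variance over the batch.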


## Section 5: Understand what your network learned: visualising layers and activations

A cool thing about neural networks is that you can see what a neuron reacts to. For neurons in fully connected layers this can be difficult to see, especially in deeper layers. For convolutional layers, however, it can be a vital step in inspecting what your network actually learned. 

### Why is this important
Although we humans are good at knowing exactly what to look for when classifying an image, a neural network has to learn this from scratch. A famous story tells of the US building a neural network to classify tanks. As training data for the American tanks they took some pictures of the tanks outside their own base. For the Russian tanks they used pictures taken with spy cameras. Although the network performed great on the training and test sets, it did not work at all in the field. It turned out that the neural network had learned the difference between a blurry image (taken with a spy camera) and a clear image (taken at the base itself). 

Another great example to consider is classifying if something is a wolf or a dog. Give it a try: 

![Search result dog](illustrations/dogsearch.png)
![Search result wolf](illustrations/wolfsearch.png)

To tell them apart you might use the fact that wolves have pointed, upright ears and thick fur. A neural network could go for totally different features: apparently dogs like to sit in green grass, and wolves are often found in the snow. If you take the classifier you trained on this data to the local zoo, it would likely not recognise the wolves if there is no snow in the zoo. 


We are going to take a look at the activations of our layers, and will see if we can detect some nice features. 
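To get a feeling for what an activation map tells you, here is a small NumPy sketch. It does not use our trained network: the filter is a hand-made vertical edge detector standing in for a learned filter, and the function name `conv_relu` and the toy image are assumptions made for the example.

```python
import numpy as np

def conv_relu(image, kernel):
    """Slide a 2-D kernel over a 2-D image (no padding, stride 1), then apply ReLU."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    activation = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + k_h, col:col + k_w]
            activation[row, col] = np.sum(patch * kernel)
    # ReLU: keep only the positive responses.
    return np.maximum(activation, 0.0)

# Toy image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-made filter that responds to dark-to-bright vertical edges.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

activation = conv_relu(image, kernel)
print(activation)
# [[0. 3. 3. 0.]
#  [0. 3. 3. 0.]
#  [0. 3. 3. 0.]
#  [0. 3. 3. 0.]]
```

The activation map is only non-zero around the edge in the middle of the image. Looking at the maps of learned filters in the same way shows which parts of an image a filter reacts to.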
