## CNNs and other cool neural network stuff
```During this exercise you will train, examine and visualize convolutional neural networks. Images are a special example of high-dimensional data. Even a low-resolution image has many dimensions: if you regard each pixel as a feature, a 32x32 image already has more than a thousand features (1,024 pixels, or 3,072 values for an RGB image). Such large vectors are generally hard to visualize, but with images it is much easier, since you can simply draw the image. We will use this nice property to explore our networks.```

~```Ittai Haran```

### Examining CNNs

In [1]:
import pandas as pd
import numpy as np
from keras.layers import Dense, Input, Conv2D, Flatten, MaxPool2D, Dropout, Softmax
from keras.models import Model

import matplotlib.pyplot as plt
%matplotlib inline

Using TensorFlow backend.


```As you recall, you previously used the MNIST dataset. Now we will use a much harder dataset, CIFAR10, which contains low-resolution images of 10 different classes of objects: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. Start by loading the dataset.
Notice: the pixel values are in the range [0,255]. You might want to rescale them to the range [0,1] for later use in your neural networks.```

In [0]:
from keras.datasets import cifar10
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
num_to_words = {0:'airplane',1:'automobile',2:'bird',3:'cat', 4:'deer', 5:'dog', 6:'frog', 7:'horse', 8:'ship', 9:'truck'}
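
The rescaling mentioned in the notice above can be sketched as follows. This is only an illustration: `scale_pixels` is a hypothetical helper name, and a small synthetic array stands in for the real `X_train` so the snippet runs without downloading the dataset.

```python
import numpy as np

def scale_pixels(images):
    """Convert uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return images.astype("float32") / 255.0

# Synthetic stand-in for X_train: 5 random 32x32 RGB images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)

scaled = scale_pixels(fake_images)
print(scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
```

The same call would apply to `X_train` and `X_test`; converting to float first avoids integer division surprises.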

```Draw a random picture using the plt.imshow function.```

```Create a fully connected neural network that predicts the right class. After compiling your neural network (model.compile) you can use model.summary() to print the architecture of your network. Use it, and make sure you have fewer than 2,000,000 parameters. Use any activations you like.
Notice: You will probably want to use the softmax activation on your last layer. Your network must receive flattened vectors (or receive the image tensors and flatten them on its own, using the Flatten layer). What shape should your input have?
Use the adam optimizer. What should be the loss in this classification problem?```
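
Before building the model, the parameter budget can be checked by hand. The sketch below counts weights and biases of a fully connected network; `dense_param_count` is a hypothetical helper, and the layer widths are one arbitrary choice, not a recommended architecture.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical architecture: flattened 32x32x3 input, two hidden layers, 10 classes.
sizes = [32 * 32 * 3, 512, 256, 10]
total = dense_param_count(sizes)
print(total, total < 2_000_000)
```

The number printed here should agree with what model.summary() reports for a network with the same layer widths.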

```Train your network for 40 epochs (or 10 minutes, whichever comes first) and reach a train accuracy of more than 0.6. What is the accuracy computed on the test set? Would you say the model is overfitting? Explain your answer.```

```We will now attack the same problem using a simple CNN. Build a CNN model as follows:```
- ```Input layer```
- ```Convolutional layer of 32 filters of size (3,3) with relu activation```
- ```Convolutional layer of 32 filters of size (3,3) with relu activation```
- ```Max pooling layer```
- ```Dropout layer of p=0.25```
- ```Convolutional layer of 64 filters of size (3,3) with relu activation```
- ```Convolutional layer of 64 filters of size (3,3) with relu activation```
- ```Max pooling layer```
- ```Dropout layer of p=0.25```
- ```Flattening layer (Flatten in keras)```
- ```Fully connected layer of 200 hidden units```
- ```Dropout layer of p=0.5```
- ```Output layer```

```Use the same loss and the same optimizer as with the fully connected model. How many weights are there in your network? Train it for 40 epochs. What train loss did you get? What test loss did you get? How do these results compare to the fully connected case?```
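
The layer list above leaves padding and pool size unspecified. Assuming the Keras defaults (padding='valid' for Conv2D, a 2x2 pool for MaxPool2D) and stride 1, the shapes and parameter count can be traced by hand as a sanity check against model.summary(). `conv_params` and `trace_cnn` are hypothetical helper names; with other padding or pooling choices the numbers will differ.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return (kernel * kernel * in_channels + 1) * filters

def trace_cnn(side=32, channels=3):
    """Walk through the layer list and return (flattened size, parameter count)."""
    total = 0
    for filters in (32, 64):
        for _ in range(2):                 # two 3x3 convolutions per stage
            total += conv_params(3, channels, filters)
            side -= 2                      # 'valid' padding trims one pixel per border
            channels = filters
        side //= 2                         # 2x2 max pooling halves each spatial side
    flat = side * side * channels          # Dropout and Flatten add no parameters
    total += flat * 200 + 200              # fully connected layer of 200 units
    total += 200 * 10 + 10                 # output layer over the 10 classes
    return flat, total

flat, total = trace_cnn()
print(flat, total)
```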

```We will now experiment with visualization and interpretation of the network. Given a single image, we would like to know which parts of the image contribute most to the network's prediction. To do so, we will black out one part of the image at a time, producing every possible image with one blacked-out part:```

![title](resources/image_for_notebook.png)

```In total, for a 32x32 image, you will have 1,024 blacked-out images. Write a function that, given an image (a 32x32x3 array) and a parameter``` $a$ ```, creates a new tensor (of shape 1024x32x32x3) so that tensor[num] is the original image with a blacked-out square of edge size``` $a$ ```, centered on the (num//32,num%32) pixel. We will call the output tensor the "variation tensor".```
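
One possible shape for such a function is sketched below. It is an illustration under stated assumptions, not the reference solution: the name `make_variation_tensor` is hypothetical, the square is clipped at the image border, and for even $a$ the square cannot be exactly centered, so it extends `a // 2` pixels up and left of the chosen pixel.

```python
import numpy as np

def make_variation_tensor(image, a):
    """Return an (H*W, H, W, C) tensor of copies of `image`, each with one black square.

    Copy number `num` has an a-by-a square of zeros centered (up to rounding)
    on pixel (num // W, num % W); squares are clipped at the image border.
    """
    height, width = image.shape[:2]
    tensor = np.repeat(image[np.newaxis], height * width, axis=0)
    for num in range(height * width):
        row, col = divmod(num, width)
        top, left = max(row - a // 2, 0), max(col - a // 2, 0)
        bottom = min(row - a // 2 + a, height)
        right = min(col - a // 2 + a, width)
        tensor[num, top:bottom, left:right, :] = 0
    return tensor

# Usage on a synthetic all-white image standing in for a CIFAR10 picture.
white = np.ones((32, 32, 3), dtype="float32")
variations = make_variation_tensor(white, a=4)
print(variations.shape)
```

Because `np.repeat` returns a new array, the original image is left unchanged.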

```Use``` $a=4$ ```and choose a random image from the test set that your model labeled correctly. Using the function you just wrote, create the variation tensor. Compute the model's predictions for every blacked-out image in the tensor (using model.predict(tensor)).
You will get a matrix of shape 1024x10. Take only the column that matches the image's label, so you get a 1024-dimensional vector. Reshape it into a 32x32 image. Now every pixel gives the probability that the model labels the image correctly when a blacked-out square is centered on that pixel. Show the original image and the image you got. Fix the color scale of the image you got to the range [0,1] using plt.imshow(..., vmin=0, vmax=1).
Take your time to try the procedure you just created on different images.```
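
The column selection and reshape described above can be sketched with a synthetic prediction matrix. Here `occlusion_heatmap` is a hypothetical helper, and random probabilities stand in for the output of model.predict on the variation tensor.

```python
import numpy as np

def occlusion_heatmap(predictions, label, side=32):
    """Reshape the column of the true label into a side-by-side probability map."""
    return predictions[:, label].reshape(side, side)

# Synthetic stand-in for model.predict(variation_tensor): 1024 rows of
# class probabilities that each sum to 1.
rng = np.random.default_rng(0)
fake_predictions = rng.dirichlet(np.ones(10), size=1024)

heatmap = occlusion_heatmap(fake_predictions, label=3)
print(heatmap.shape)
```

Since row `num` of the predictions corresponds to pixel (num//32, num%32), a plain row-major reshape puts each probability at the position of its blacked-out square.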

```You can get a sense of the "network focus" using the visualization you created.
Given the heatmap you created, find the 80 pixels with the smallest values. These are the pixels for which blacking out the square around them damages the network's prediction the most. Mark these pixels in the original image (actually, in a copy of the original image, since you don't want to destroy your data) by adding 0.5 to the red component of the RGB values.
Show the original image, the heatmap image and the marked image.```
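
A minimal sketch of the marking step follows. It assumes the image holds floats in [0,1]; the clipping to 1.0 is an addition of this sketch to keep the values displayable, and `mark_focus` is a hypothetical helper name.

```python
import numpy as np

def mark_focus(image, heatmap, n_pixels=80, boost=0.5):
    """Return a copy of `image` with the n_pixels lowest-heatmap pixels tinted red."""
    lowest = np.argsort(heatmap, axis=None)[:n_pixels]   # flat indices, smallest first
    rows, cols = np.unravel_index(lowest, heatmap.shape)
    marked = image.copy()                                 # leave the original untouched
    marked[rows, cols, 0] = np.clip(marked[rows, cols, 0] + boost, 0.0, 1.0)
    return marked

# Synthetic example: a dark gray image and a heatmap whose smallest
# values sit in the first rows.
gray = np.full((32, 32, 3), 0.25, dtype="float32")
heatmap = np.arange(32 * 32, dtype="float32").reshape(32, 32) / 1023.0

marked = mark_focus(gray, heatmap)
print(int((marked[..., 0] > 0.5).sum()))
```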

### Transfer learning

```We've talked in class about transfer learning. Here you will try it out on a simple task. We will take a VGGFace model, trained to classify faces among 2,622 identities, and use it to discriminate between the faces of two men: Gal and Philip. We will do it by cutting off the last layers and replacing them with layers of our own.```

In [0]:
from keras_vggface import VGGFace, utils # an extra directory supplied with the exercise
from sklearn.model_selection import train_test_split

# This next import will help you with augmentation: generating augmented photos from your originals.
# Read about this general technique, and also about ImageDataGenerator
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Flatten, Dense
from keras.models import Model

```Start by loading the data in images.pkl and in images_labels.pkl. Show an image of Gal and an image of Philip and make sure you can tell the difference ;)```

```In transfer learning we take a trained model, cut off its end and replace it with some layers of our own. Hence we will have to preprocess our data the same way it was preprocessed when training VGGFace. Use the following line to do so:```
```python
X_processed = utils.preprocess_input(X.copy().astype(float), version=1)
```
```Transform the values of Y into one-hot vectors, [1 0] or [0 1]. Split your data into 80% train data and 20% test data.```
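
The encoding and the split can be sketched with NumPy alone. The label values below are hypothetical, since the actual format inside images_labels.pkl is not shown here; `one_hot_two_classes` and `split_80_20` are illustrative helper names.

```python
import numpy as np

def one_hot_two_classes(labels):
    """Map two distinct label values to the rows [1, 0] and [0, 1]."""
    classes, index = np.unique(labels, return_inverse=True)
    assert len(classes) == 2, "expected exactly two distinct labels"
    return np.eye(2)[index], classes

def split_80_20(X, Y, seed=0):
    """Shuffle the samples and return X_train, X_test, Y_train, Y_test."""
    order = np.random.default_rng(seed).permutation(len(X))
    cut = int(0.8 * len(X))
    return X[order[:cut]], X[order[cut:]], Y[order[:cut]], Y[order[cut:]]

# Hypothetical labels and features; the real data may be shaped differently.
labels = np.array(["Gal", "Philip", "Gal", "Philip", "Philip"] * 2)
X = np.arange(len(labels) * 4, dtype=float).reshape(len(labels), 4)

Y, classes = one_hot_two_classes(labels)
X_train, X_test, Y_train, Y_test = split_80_20(X, Y)
print(classes, Y_train.shape, Y_test.shape)
```

The `train_test_split` function imported earlier does the same job as the hand-written split, for example with `train_test_split(X, Y, test_size=0.2)`.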

```Let's look at the VGGFace model we are about to use. Load it:```

In [0]:
vgg_model = VGGFace()

```The model is trained to predict the name of the celebrity in the pictures it gets. Read the example picture given to you in face_example.pkl. Show it. Transform it into a tensor of shape (1, 224, 224, 3) and use utils.preprocess_input as seen before to prepare it as input to the network. Use```
```python
utils.decode_predictions(vgg_model.predict(x))
```
```to get the 5 most probable classes. Who is the man in the picture?```

```Let's return to our dataset. As you may have noticed, it's a pretty small one. Hence we can try to make it bigger using augmentations of the data. We will do it here using a mechanism supplied by keras. Create an instance of keras.preprocessing.image.ImageDataGenerator, which defines how augmentations of each original image are created. Choose all the parameters on your own (consider, for example, rotation_range, zoom_range, width_shift_range, horizontal_flip and so on).```

```Follow these instructions:```
- ```Examine the architecture of the VGGFace model using .summary(). Understand what it means to replace the last 2 dense layers (including the final softmax layer). How many weights are there in the model?```
- ```Use .get_layer() to retrieve the last layer which you want to keep in your new network```
- ```Create 2 new Dense layers which continue the previous pretrained layers (the last layer should have a softmax activation; for the first one, try tanh)```
- ```Create a new model whose input is the input of the original model (vgg_model.input) and whose output is the new dense-softmax layer```
- ```Freeze all of the layers except the last 2, using .layers on the new model and .trainable = False. This prevents those layers from being trained```
- ```Compile the model with the sgd optimizer and with metrics=['accuracy'] (what does it do?)```

```Now you're ready to train the model:```
- ```Use .fit_generator() and not .fit(), since you'll be using the augmentor you created```
- ```Use .flow() on the instance of ImageDataGenerator as the first input```
- ```Choose a combination of batch_size (within .flow) and steps_per_epoch that yields the total number of images you want per epoch. You will have to use small batches (~20) to avoid a memory error```
- ```Use the test set to validate your results: you can compute your score on the validation data if you add validation_data=(X_test, Y_test) to your .fit_generator call```
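
The relationship between batch_size, steps_per_epoch and the number of images seen per epoch is simple arithmetic, sketched below. `steps_for` is a hypothetical helper, the target of 1,000 images is an arbitrary example, and the calculation assumes every batch is full.

```python
import math

def steps_for(images_per_epoch, batch_size):
    """Number of generator batches needed to yield at least images_per_epoch images."""
    return math.ceil(images_per_epoch / batch_size)

# Hypothetical target: about 1,000 augmented images per epoch in batches of 20.
steps = steps_for(images_per_epoch=1000, batch_size=20)
print(steps, steps * 20)
```

The resulting value would be passed as steps_per_epoch, with the same batch size given to .flow().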

```Did you manage to transfer the VGGFace model to your new task?```