# VGG16
This notebook will recreate the VGG16 model from [FastAI Lesson 1](http://course.fast.ai/lessons/lesson1.html) ([wiki](http://wiki.fast.ai/index.php/Lesson_1_Notes))

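VGG stands for Visual Geometry Group, the University of Oxford research group that developed the model.

The software stack, from highest level to lowest: VGG16 < Keras < TensorFlow < CUDA (cuDNN)

TensorFlow is a good choice for multi-GPU training.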

A NN doesn't look at a single image or at all images at once. Instead, it processes one batch (or mini-batch) of images at a time. A GPU computes in parallel, so each batch needs enough images to keep the GPU busy, but not so many that we run out of GPU memory.
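
As a rough illustration of batching, the sketch below splits a list of image file names into fixed-size mini-batches. The helper name `iter_batches`, the file names, and the batch size of 4 are assumptions made for this example; they are not part of the course code.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Ten hypothetical image file names, processed four at a time.
image_files = ["img_{}.jpg".format(i) for i in range(10)]
batches = list(iter_batches(image_files, batch_size=4))

for batch in batches:
    print(len(batch), batch)
```

The last batch is smaller than the others whenever the dataset size is not a multiple of the batch size.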

## Keras Directory Structure
Directory structure is important in Keras. Each of the 3 subsets of our data needs to be in its own folder:
* **Train** - data used to fit parameters for NN
* **Validate** - data held out from training, used to tune settings (hyperparameters) and to check for overfitting
* **Test** - used to test the final model to see how well the NN generalizes to new data

Within each folder, Keras expects each class to be in its own directory. That means each folder has one sub-directory per output class we are trying to map the input to (dogs + cats = 2). If there were a third animal, we'd have 3 sub-directories under each.

     └───data
         └───dogscats
             ├───train
             │   ├───cats
             │   └───dogs
             ├───valid
             │   ├───cats
             │   └───dogs
             ├───test
             │   ├───cats
             │   └───dogs
             └───sample
                 ├───train
                 │   ├───cats
                 │   └───dogs
                 └───valid
                     ├───cats
                     └───dogs
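
To make the folder-per-class idea concrete, here is a small sketch that derives class labels from sub-directory names. It only mirrors the idea; it is not Keras's implementation. The helper name `class_indices` is made up for this example, and the alphabetical ordering is an assumption modeled on how Keras's `flow_from_directory` is documented to order inferred classes.

```python
import os
import tempfile


def class_indices(split_dir):
    """Map each class sub-directory name to an integer index, in sorted order."""
    classes = sorted(
        name for name in os.listdir(split_dir)
        if os.path.isdir(os.path.join(split_dir, name))
    )
    return {name: index for index, name in enumerate(classes)}


# Build a throwaway train/ folder so the example runs anywhere.
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "dogscats", "train")
for class_name in ("dogs", "cats"):
    os.makedirs(os.path.join(train_dir, class_name))

indices = class_indices(train_dir)
print(indices)  # {'cats': 0, 'dogs': 1}
```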

**It is a good idea to have a sample/ directory with much smaller train and validate subdirectories in order to quickly test basic code functionality without running a huge batch**
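
One possible way to build such a sample set is to copy a few randomly chosen files per class. The sketch below uses only the standard library; the helper name `make_sample`, the count of 5 files per class, and the placeholder file names are assumptions for illustration. The demo runs on a temporary directory rather than the real dataset.

```python
import os
import random
import shutil
import tempfile


def make_sample(src_dir, dst_dir, n_per_class, seed=0):
    """Copy up to `n_per_class` randomly chosen files per class from src_dir to dst_dir."""
    rng = random.Random(seed)
    for class_name in sorted(os.listdir(src_dir)):
        class_src = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_src):
            continue
        class_dst = os.path.join(dst_dir, class_name)
        os.makedirs(class_dst, exist_ok=True)
        files = sorted(os.listdir(class_src))
        for file_name in rng.sample(files, min(n_per_class, len(files))):
            shutil.copy(os.path.join(class_src, file_name), class_dst)


# Demo on a temporary tree with empty placeholder files standing in for images.
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "train")
for class_name in ("cats", "dogs"):
    os.makedirs(os.path.join(train_dir, class_name))
    for i in range(20):
        path = os.path.join(train_dir, class_name, "{}.{}.jpg".format(class_name, i))
        open(path, "w").close()

sample_train = os.path.join(root, "sample", "train")
make_sample(train_dir, sample_train, n_per_class=5)
print({c: len(os.listdir(os.path.join(sample_train, c))) for c in ("cats", "dogs")})
```

Copying (rather than moving) keeps the full training set intact, at the cost of a little extra disk space.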

## Repurposing the Pretrained VGG16 Model

The pretrained VGG16 model was trained using [ImageNet data](http://image-net.org/explore). The subset used for training is made up of 1,000 categories of "things", each of which has many framed, well-lit, centered, and focused photos. Knowing these characteristics of the training images will help us understand how this model can and can't work for our dogs vs. cats task.

The ImageNet labels are more specific than just dogs and cats: the model has been trained on specific breeds of each. One-hot encoding is used to label images. This is where the label is a vector of 0's with a length equal to the number of categories, with a single 1 at the position of the true category. So for [dogs, cats] a label of [0, 1] would mean it is a cat.
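
A minimal sketch of one-hot encoding in plain Python, using the same [dogs, cats] ordering as the example above. The helper name `one_hot` is made up for this illustration.

```python
def one_hot(label, classes):
    """Return a list of 0s with a single 1 at the position of `label` in `classes`."""
    if label not in classes:
        raise ValueError("unknown class: {!r}".format(label))
    return [1 if name == label else 0 for name in classes]


classes = ["dogs", "cats"]  # same ordering as the example in the text
print(one_hot("cats", classes))  # [0, 1]
print(one_hot("dogs", classes))  # [1, 0]
```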

By repurposing the ImageNet VGG16 to look for just cats and dogs, we are **Finetuning** the model. This is where we start with a model that has already solved a similar problem. Many of the parameters should be the same, so we only select a subset of them to re-train. Finetune will replace the 1,000 ImageNet categories with the 2 it found in our directory structure (dogs and cats). It does this by removing the existing output layer (Keras `.pop` method) and then adding a new output layer with size 2. Now we are using the pretrained VGG16 model specifically for categorizing just cats and dogs.
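
The pop-and-replace step might look roughly like the sketch below. It assumes the Keras that ships with TensorFlow (`from tensorflow import keras`) and a `Sequential` model. To keep it quick to run, a tiny stand-in network with 1,000 outputs is used in place of the real pretrained VGG16. The function name `finetune`, the Adam optimizer, and the choice to freeze every remaining layer are assumptions for illustration, not necessarily what the course code does.

```python
from tensorflow import keras


def finetune(model, num_classes):
    """Swap the final layer of a Sequential model for a new softmax layer."""
    model.pop()                     # drop the old output layer
    for layer in model.layers:      # freeze every remaining layer
        layer.trainable = False
    # New output layer: one unit per class, trainable by default.
    model.add(keras.layers.Dense(num_classes, activation="softmax"))
    # Optimizer choice is an assumption; the loss matches one-hot labels.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Tiny stand-in for the pretrained network: 1000 outputs, like ImageNet.
# The real VGG16 weights are NOT loaded here.
model = keras.Sequential([
    keras.Input(shape=(32,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1000, activation="softmax"),
])

model = finetune(model, num_classes=2)
print(model.output_shape)
```

Because the earlier layers are frozen, training after this step only updates the weights of the new 2-unit output layer.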

Why do Finetuning instead of training our own network?  
A NN trained on ImageNet has already learned a lot about what the world looks like. The first layer of a NN looks for basic shapes, patterns, or gradients, and its filters resemble what are known as **Gabor filters**. The images below come from the paper [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901.pdf):

<img src="images/Layer1.png" alt="Drawing" style="width: 600px;"/>

The second layer combines layer 1 filters to create newer, more complex filters. So it turns multiple line filters into corner filters, and combines curved edges into circle filters, for example.  

<img src="images/Layer2.png" alt="Drawing" style="width: 600px;"/>

Further into the hidden layers of a NN, filters start to find more complex shapes, repeating geometric patterns, faces, etc.

<img src="images/Layer3.png" alt="Drawing" style="width: 600px;"/>

<img src="images/Layer4-5.png" alt="Drawing" style="width: 600px;"/>

VGG16 has 16 weight layers (13 convolutional and 3 fully connected), so there are tons of filters created across the network. Finetuning keeps the lower-level filters that have already been learned and then combines them in a different way to address a different task (i.e. cats and dogs instead of 1,000 categories). Neural networks pretrained on HUGE datasets have already found all of these lower-level filters, so we don't need to spend weeks doing that part ourselves. Finetuning usually works best on the second-to-last layer, but it's also a good idea to try it at every layer.

Additional information on fine-tuning (a form of transfer learning) can be found on Stanford's CS231n website [here](http://cs231n.github.io/transfer-learning/).


## Custom Written VGG16 Model

Write and explain how to do an entire VGG16 model here...