# Table of contents
1. [Transfer Learning with TensorFlow](#Transfer Learning with TensorFlow)
2. [ImageNet Dataset Inference](#ImageNet Inference)
3. [AlexNet: Traffic Sign Inference](#Traffic Sign Inference)
    1. [Feature Extraction](#Feature Extraction)
    2. [Training the Feature Extractor](#Training the Feature Extractor)
4. [Transfer Learning with VGG, Inception (GoogLeNet), and ResNet](#Transfer Learning with)

# 1. Transfer Learning with TensorFlow <a name='Transfer Learning with TensorFlow'></a>

__Transfer learning__ is the practice of starting with a network that has already been trained, and then applying that network to your own problem.

Because neural networks can often take days or even weeks to train, transfer learning (i.e. starting with a network that somebody else has already trained) can greatly shorten training time.

How do we apply transfer learning? Two popular methods are __feature extraction__ and __finetuning__.

1. __Feature extraction__. Take a pretrained neural network and replace the final (classification) layer with a new classification layer, or perhaps even a small feedforward network that ends with a new classification layer. During training the weights in all the pre-trained layers are frozen, so only the weights for the new layer(s) are trained. In other words, the gradient doesn't flow backwards past the first new layer.
2. __Finetuning__. This is similar to feature extraction except the pre-trained weights aren't frozen. The network is trained end-to-end.
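The difference between the two methods can be sketched with a toy model. This is a minimal illustration, not the lab's code: the model is `pred = head * (base * x)`, where the hypothetical scalar `base` stands in for the pretrained layers and `head` for the new classification layer. With `freeze_base=True` (feature extraction) only `head` is updated; with `freeze_base=False` (finetuning) both are.

```python
def train(base, head, data, lr, freeze_base, epochs=200):
    """Fit pred = head * (base * x) with per-example gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            feature = base * x            # output of the "pretrained" part
            err = head * feature - y      # derivative of 0.5 * err**2 w.r.t. the prediction
            grad_head = err * feature
            grad_base = err * head * x
            head -= lr * grad_head        # the new layer is always trained
            if not freeze_base:
                base -= lr * grad_base    # only finetuning touches the base
    return base, head


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # toy target: y = 3x

fe_base, fe_head = train(2.0, 0.1, data, lr=0.01, freeze_base=True)
ft_base, ft_head = train(2.0, 0.1, data, lr=0.01, freeze_base=False)

print("feature extraction:", fe_base, fe_head)
print("finetuning:        ", ft_base, ft_head)
```

In both runs the product `base * head` ends up close to 3, but only the finetuning run moves `base` away from its starting value of 2.0.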

The labs in this lesson will focus on feature extraction since it's less computationally intensive.

# 2. ImageNet Dataset Inference <a name='ImageNet Inference'></a>

<img src='Images/ImageNet Inference.png' width=200>
$$ \text{top: Poodle, bottom: Weasel} $$

To start, run __imagenet_inference.py__, and verify that the network classifies the images correctly.

```Python
python imagenet_inference.py
```

The output should look similar to this:
```Python
Image 0
miniature poodle: 0.389
toy poodle: 0.223
Bedlington terrier: 0.173
standard poodle: 0.150
komondor: 0.026

Image 1
weasel: 0.331
polecat, fitch, foulmart, foumart, Mustela putorius: 0.280
black-footed ferret, ferret, Mustela nigripes: 0.210
mink: 0.081
Arctic fox, white fox, Alopex lagopus: 0.027

Time: 5.587 seconds
```

# 3.  AlexNet: Traffic Sign Inference <a name='Traffic Sign Inference'></a>

<img src='Images/Traffic Sign Inference.png' width=200>
$$ \text{top: construction sign, bottom: stop sign} $$

Next, run __traffic_sign_inference.py__ and see how well the classifier performs on the example construction and stop signs.

OH NO!

AlexNet expects a 227x227x3 pixel image, whereas the traffic sign images are 32x32x3 pixels.

In order to feed the traffic sign images into AlexNet, you'll need to resize the images to the dimensions that AlexNet expects.

You could resize the images outside of this program, but that approach doesn't scale well. Instead, use the [tf.image.resize_images](https://www.tensorflow.org/api_guides/python/image#Resizing) method to resize the images as they are fed into the model.
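To get a feel for what resizing does to the image dimensions, here is a minimal pure-Python sketch. It uses nearest-neighbour sampling on nested lists, which is an assumption made to keep the example short; `tf.image.resize_images` interpolates bilinearly by default and operates on tensors. The helper name `resize_nearest` is hypothetical.

```python
def resize_nearest(image, new_h, new_w):
    """Resize a 2D grid (list of rows) by picking the nearest source pixel."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# A 32x32 "image" whose pixels record their own coordinates.
tiny = [[(r, c) for c in range(32)] for r in range(32)]
big = resize_nearest(tiny, 227, 227)

print(len(big), len(big[0]))  # height and width after resizing
```

Each output pixel maps back to one source pixel, so the 32x32 grid is stretched to fill the 227x227 grid that AlexNet expects.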

Open up __traffic_sign_inference.py__ and complete the __TODO(s)__.

The output should look similar to this:
```Python
Image 0
screen, CRT screen: 0.051
digital clock: 0.041
laptop, laptop computer: 0.030
balance beam, beam: 0.027
parallel bars, bars: 0.023

Image 1
digital watch: 0.395
digital clock: 0.275
bottlecap: 0.115
stopwatch, stop watch: 0.104
combination lock: 0.086

Time: 0.592 seconds
```

__Quiz:__

```Python
"""
The traffic signs are 32x32 so you
have to resize them to be 227x227 before
passing them to AlexNet.
"""
import time
import tensorflow as tf
import numpy as np
from scipy.misc import imread
from caffe_classes import class_names
from alexnet import AlexNet

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
# TODO: Resize the images so they can be fed into AlexNet.
# HINT: Use `tf.image.resize_images` to resize the images
resized = tf.image.resize_images(x, (227, 227))

assert resized is not Ellipsis, "resized needs to modify the placeholder image size to (227,227)"
probs = AlexNet(resized)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Read Images
im1 = imread("construction.jpg").astype(np.float32)
im1 = im1 - np.mean(im1)

im2 = imread("stop.jpg").astype(np.float32)
im2 = im2 - np.mean(im2)

# Run Inference
t = time.time()
output = sess.run(probs, feed_dict={x: [im1, im2]})

# Print Output
for input_im_ind in range(output.shape[0]):
    inds = np.argsort(output)[input_im_ind, :]
    print("Image", input_im_ind)
    for i in range(5):
        print("%s: %.3f" % (class_names[inds[-1 - i]], output[input_im_ind, inds[-1 - i]]))
    print()

print("Time: %.3f seconds" % (time.time() - t))
```

__Answer:__
```Python
"""
The traffic signs are 32x32 so you
have to resize them to be 227x227 before
passing them to AlexNet.
"""
import time
import tensorflow as tf
import numpy as np
from scipy.misc import imread
from caffe_classes import class_names
from alexnet import AlexNet


# placeholders
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
resized = tf.image.resize_images(x, (227, 227))

probs = AlexNet(resized)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Read Images
im1 = imread("construction.jpg").astype(np.float32)
im1 = im1 - np.mean(im1)

im2 = imread("stop.jpg").astype(np.float32)
im2 = im2 - np.mean(im2)

# Run Inference
t = time.time()
output = sess.run(probs, feed_dict={x: [im1, im2]})

# Print Output
for input_im_ind in range(output.shape[0]):
    inds = np.argsort(output)[input_im_ind, :]
    print("Image", input_im_ind)
    for i in range(5):
        print("%s: %.3f" % (class_names[inds[-1 - i]], output[input_im_ind, inds[-1 - i]]))
    print()

print("Time: %.3f seconds" % (time.time() - t))
```

The notable part is:
```Python
x = tf.placeholder(tf.float32, (None, 32, 32, 3))
resized = tf.image.resize_images(x, (227, 227))
```

### 3.1. Feature Extraction <a name='Feature Extraction'></a>

The problem is that AlexNet was trained on the [ImageNet](http://www.image-net.org/) database, which has 1000 classes of images. We can see the classes in the __caffe_classes.py__ file. None of those classes involves traffic signs.

In order to successfully classify our traffic sign images, we need to remove the final, 1000-neuron classification layer and replace it with a new, 43-neuron classification layer.

This is called _feature extraction_, because we're basically extracting the image features inferred by the penultimate layer, and passing these features to a new classification layer.

Open __feature_extraction.py__ and complete the __TODO(s)__.

The output will probably not precisely match the sample output below, since the output will depend on the (probably random) initialization of weights in the network. That being said, the output classes you see should be present in __signnames.csv__.

```Python
Image 0
Double curve: 0.059
Ahead only: 0.048
Road work: 0.047
Dangerous curve to the right: 0.047
Road narrows on the right: 0.039

Image 1
General caution: 0.079
No entry: 0.067
Dangerous curve to the right: 0.054
Speed limit (50km/h): 0.053
Ahead only: 0.048

Time: 0.500 seconds
```

__Quiz:__

```Python
import time
import tensorflow as tf
import numpy as np
import pandas as pd
from scipy.misc import imread
from alexnet import AlexNet

sign_names = pd.read_csv('signnames.csv')
nb_classes = 43

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
resized = tf.image.resize_images(x, (227, 227))

# NOTE: By setting `feature_extract` to `True` we return
# the second to last layer.
fc7 = AlexNet(resized, feature_extract=True)
# TODO: Define a new fully connected layer followed by a softmax activation to classify
# the traffic signs. Assign the result of the softmax activation to `probs` below.
# HINT: Look at the final layer definition in alexnet.py to get an idea of what this
# should look like.
shape = (fc7.get_shape().as_list()[-1], nb_classes)  # use this shape for the weight matrix

fc8W = tf.Variable(tf.truncated_normal(shape, stddev=1e-2))
fc8b = tf.Variable(tf.zeros(nb_classes))
logits = tf.nn.xw_plus_b(fc7, fc8W, fc8b)
probs = tf.nn.softmax(logits)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Read Images
im1 = imread("construction.jpg").astype(np.float32)
im1 = im1 - np.mean(im1)

im2 = imread("stop.jpg").astype(np.float32)
im2 = im2 - np.mean(im2)

# Run Inference
t = time.time()
output = sess.run(probs, feed_dict={x: [im1, im2]})

# Print Output
for input_im_ind in range(output.shape[0]):
    inds = np.argsort(output)[input_im_ind, :]
    print("Image", input_im_ind)
    for i in range(5):
        # Column 1 of signnames.csv holds the sign name.
        print("%s: %.3f" % (sign_names.iloc[inds[-1 - i], 1], output[input_im_ind, inds[-1 - i]]))
    print()

print("Time: %.3f seconds" % (time.time() - t))
```

__Answer:__
```Python
import time
import tensorflow as tf
import numpy as np
import pandas as pd
from scipy.misc import imread
from alexnet import AlexNet

sign_names = pd.read_csv('signnames.csv')
nb_classes = 43

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
resized = tf.image.resize_images(x, (227, 227))

# Returns the second-to-last layer of the AlexNet model.
# This allows us to redefine the last layer specifically for
# the traffic sign model.
fc7 = AlexNet(resized, feature_extract=True)
shape = (fc7.get_shape().as_list()[-1], nb_classes)
fc8W = tf.Variable(tf.truncated_normal(shape, stddev=1e-2))
fc8b = tf.Variable(tf.zeros(nb_classes))
logits = tf.nn.xw_plus_b(fc7, fc8W, fc8b)
probs = tf.nn.softmax(logits)

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

# Read Images
im1 = imread("construction.jpg").astype(np.float32)
im1 = im1 - np.mean(im1)

im2 = imread("stop.jpg").astype(np.float32)
im2 = im2 - np.mean(im2)

# Run Inference
t = time.time()
output = sess.run(probs, feed_dict={x: [im1, im2]})

# Print Output
for input_im_ind in range(output.shape[0]):
    inds = np.argsort(output)[input_im_ind, :]
    print("Image", input_im_ind)
    for i in range(5):
        print("%s: %.3f" % (sign_names.iloc[inds[-1 - i], 1], output[input_im_ind, inds[-1 - i]]))
    print()

print("Time: %.3f seconds" % (time.time() - t))
```

The notable part is:

```Python
# Returns the second-to-last layer of the AlexNet model.
# This allows us to redefine the last layer specifically for
# the traffic sign model.
fc7 = AlexNet(resized, feature_extract=True)
shape = (fc7.get_shape().as_list()[-1], nb_classes)
fc8W = tf.Variable(tf.truncated_normal(shape, stddev=1e-2))
fc8b = tf.Variable(tf.zeros(nb_classes))
logits = tf.nn.xw_plus_b(fc7, fc8W, fc8b)
probs = tf.nn.softmax(logits)
```

First, I figure out the shape of the final fully connected layer, which in my opinion is the trickiest part. To do that, I need the size of the output from __fc7__. Since it's a fully connected layer, I know its output is 2D, so the second (last) element of the shape is the output size; __fc7.get_shape().as_list()[-1]__ does the trick. I then combine this with the number of classes in the Traffic Sign dataset to get the shape of the final layer's weight matrix, __shape = (fc7.get_shape().as_list()[-1], nb_classes)__. The rest of the code is the standard way to define a fully connected layer in TensorFlow. Finally, I calculate the probabilities via softmax, __probs = tf.nn.softmax(logits)__.
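The shape bookkeeping described above can be checked without TensorFlow. The sketch below is an illustration under stated assumptions: plain Python lists stand in for tensors, the sizes 8 and 3 stand in for 4096 and 43, and `random.gauss` replaces `tf.truncated_normal` (it does not truncate). The helper names `xw_plus_b` and `softmax` are written here to mirror what `tf.nn.xw_plus_b` and `tf.nn.softmax` compute.

```python
import math
import random


def xw_plus_b(x, w, b):
    """Compute x @ w + b for a batch of row vectors, using plain lists."""
    return [
        [sum(x_i * w[i][j] for i, x_i in enumerate(row)) + b[j] for j in range(len(b))]
        for row in x
    ]


def softmax(logits):
    """Turn one row of logits into probabilities that sum to 1."""
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
batch, n_features, nb_classes = 2, 8, 3  # small stand-ins for (2, 4096, 43)

fc7 = [[random.random() for _ in range(n_features)] for _ in range(batch)]
shape = (len(fc7[0]), nb_classes)  # same idea as fc7.get_shape().as_list()[-1]
fc8W = [[random.gauss(0.0, 1e-2) for _ in range(shape[1])] for _ in range(shape[0])]
fc8b = [0.0] * nb_classes

logits = xw_plus_b(fc7, fc8W, fc8b)
probs = [softmax(row) for row in logits]
print(len(probs), len(probs[0]))  # one row per image, one column per class
```

The weight matrix has one row per input feature and one column per class, so a batch of shape `(batch, n_features)` becomes `(batch, nb_classes)`.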

### 3.2. Training the Feature Extractor <a name='Training the Feature Extractor'></a>

The feature extractor we just created works, in the sense that data will flow through the network and result in predictions.

But the predictions aren't accurate, because we haven't yet trained the new classification layer.

In order to do that, we'll need to read in the training dataset and train the network.

Training AlexNet (even just the final layer!) can take a little while, so if we don't have a GPU, running on a subset of the data is a good alternative. As a point of reference, one epoch over the training set takes roughly 53-55 seconds with a GTX 970.

Open up __train_feature_extraction.py__ and complete the __TODO(s)__.

```Python
import pickle
import time
import tensorflow as tf
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from alexnet import AlexNet

nb_classes = 43
epochs = 10
batch_size = 128

# TODO: Load traffic signs data.
file_name = 'Data/train.p'
with open(file_name, 'rb') as file:
    data = pickle.load(file)

# TODO: Split data into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(data['features'], data['labels'], test_size=0.33, random_state=42)

# TODO: Define placeholders and resize operation.
features = tf.placeholder(tf.float32, (None, 32, 32, 3))
labels = tf.placeholder(tf.int64, None)
resized = tf.image.resize_images(features, (227, 227))

# TODO: pass placeholder as first argument to `AlexNet`.
fc7 = AlexNet(resized, feature_extract=True)
# NOTE: `tf.stop_gradient` prevents the gradient from flowing backwards
# past this point, keeping the weights before and up to `fc7` frozen.
# This also makes training faster, less work to do!
fc7 = tf.stop_gradient(fc7)

# TODO: Add the final layer for traffic sign classification.
shape = (fc7.get_shape().as_list()[-1], nb_classes)
print(shape)
fc8W = tf.Variable(tf.truncated_normal(shape, stddev=1e-2))
fc8b = tf.Variable(tf.zeros(nb_classes))
logits = tf.nn.xw_plus_b(fc7, fc8W, fc8b)
probs = tf.nn.softmax(logits)

# TODO: Define loss, training, accuracy operations.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
loss_op = tf.reduce_mean(cross_entropy)
opt = tf.train.AdamOptimizer()
train_op = opt.minimize(loss_op, var_list=[fc8W, fc8b])
init_op = tf.global_variables_initializer()

# HINT: Look back at your traffic signs project solution, you may
# be able to reuse some of the code.
preds = tf.argmax(logits, 1)
accuracy_op = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))

# TODO: Train and evaluate the feature extraction model.
def eval_on_data(X, y, sess):
    total_acc = 0
    total_loss = 0
    for offset in range(0, X.shape[0], batch_size):
        end = offset + batch_size
        X_batch = X[offset:end]
        y_batch = y[offset:end]

        loss, acc = sess.run([loss_op, accuracy_op], feed_dict={features: X_batch, labels: y_batch})
        total_loss += (loss * X_batch.shape[0])
        total_acc += (acc * X_batch.shape[0])

    return total_loss/X.shape[0], total_acc/X.shape[0]

with tf.Session() as sess:
    sess.run(init_op)

    for i in tqdm(range(epochs)):
        # training
        X_train, y_train = shuffle(X_train, y_train)
        t0 = time.time()
        for offset in range(0, X_train.shape[0], batch_size):
            end = offset + batch_size
            sess.run(train_op, feed_dict={features: X_train[offset:end], labels: y_train[offset:end]})

        val_loss, val_acc = eval_on_data(X_val, y_val, sess)
        print("Epoch", i+1)
        print("Time: %.3f seconds" % (time.time() - t0))
        print("Validation Loss =", val_loss)
        print("Validation Accuracy =", val_acc)
        print("")
```

Most of the code should look familiar.
```Python
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
loss_op = tf.reduce_mean(cross_entropy)
opt = tf.train.AdamOptimizer()
train_op = opt.minimize(loss_op, var_list=[fc8W, fc8b])
init_op = tf.global_variables_initializer()

preds = tf.argmax(logits, 1)
accuracy_op = tf.reduce_mean(tf.cast(tf.equal(preds, labels), tf.float32))
```

These lines define all the operations (loss, training, accuracy, etc.). __eval_on_data__ is a utility function that computes the loss and accuracy over an entire dataset, one batch at a time.
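One detail of `eval_on_data` is worth spelling out: each batch's loss and accuracy is multiplied by the batch size before being summed, because the last batch is usually smaller than the rest. A minimal sketch with made-up per-batch accuracies (the numbers and the helper name `weighted_mean` are illustrative assumptions) shows how a plain average of batch metrics would be skewed:

```python
def weighted_mean(batch_values, batch_sizes):
    """Average per-batch metrics, weighting each batch by its number of examples."""
    total = sum(value * size for value, size in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)


batch_sizes = [128, 128, 44]     # 300 examples with batch_size = 128
batch_accs = [0.50, 0.50, 1.00]  # hypothetical accuracy of each batch

naive = sum(batch_accs) / len(batch_accs)       # treats the small batch as a full one
exact = weighted_mean(batch_accs, batch_sizes)  # (64 + 64 + 44) / 300

print("naive: %.3f, weighted: %.3f" % (naive, exact))
```

The weighted value equals the accuracy computed over all 300 examples at once, which is what the function is meant to report.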
```Python
with tf.Session() as sess:
    sess.run(init_op)

    for i in range(epochs):
        # training
        X_train, y_train = shuffle(X_train, y_train)
        t0 = time.time()
        for offset in range(0, X_train.shape[0], batch_size):
            end = offset + batch_size
            sess.run(train_op, feed_dict={features: X_train[offset:end], labels: y_train[offset:end]})

        val_loss, val_acc = eval_on_data(X_val, y_val, sess)
        print("Epoch", i+1)
        print("Time: %.3f seconds" % (time.time() - t0))
        print("Validation Loss =", val_loss)
        print("Validation Accuracy =", val_acc)
        print("")
```

This is the main training procedure. As we can see we run __train_op__ on each batch. Additionally, before each epoch the training set is shuffled using __shuffle__. At the end of each epoch the validation loss and accuracy are recorded and printed out.
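The two mechanics of this loop, shuffling features and labels together and slicing them into batches, can be sketched in plain Python. This is a stand-in for illustration only: the helper names are hypothetical, and `shuffle_together` mimics what `sklearn.utils.shuffle` does for two arrays of equal length.

```python
import random


def shuffle_together(features, labels, seed=None):
    """Apply the same random permutation to features and labels."""
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    return [features[i] for i in order], [labels[i] for i in order]


def iter_batches(features, labels, batch_size):
    """Yield consecutive (features, labels) slices; the last one may be shorter."""
    for offset in range(0, len(features), batch_size):
        end = offset + batch_size
        yield features[offset:end], labels[offset:end]


features = ["img%d" % i for i in range(10)]
labels = list(range(10))

features, labels = shuffle_together(features, labels, seed=42)
sizes = [len(x_batch) for x_batch, _ in iter_batches(features, labels, batch_size=4)]
print(sizes)
```

Slicing past the end of a list (or NumPy array) simply returns the remaining items, which is why the loop needs no special case for the final batch.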

Running the above code produces the following output after 10 epochs:
```Python
Epoch 10
Time: 53.402 seconds
Validation Loss = 0.126141663276
Validation Accuracy = 0.966069240196
```

# 4. Transfer Learning with VGG, Inception (GoogLeNet) and ResNet <a name='Transfer Learning with'></a>

In this lab, we will continue exploring transfer learning. We've already explored feature extraction with AlexNet and TensorFlow. Next, we will use Keras to explore feature extraction with the VGG, Inception and ResNet architectures. The models we will use were trained for days or weeks on the [ImageNet dataset](http://www.image-net.org/). Thus, the weights encapsulate higher-level features learned from training on 1000 classes of images.

There are some notable differences from the AlexNet lab.
1. We're using two datasets. First, the German Traffic Sign dataset, and second, the [Cifar10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html).
2. Bottleneck Features. Unless you have a very powerful GPU, running feature extraction on these models will take a significant amount of time, as you might have observed in the AlexNet lab. To make things easier we've precomputed bottleneck features for each (network, dataset) pair. This will allow us to experiment with feature extraction even on a modest CPU. We can think of bottleneck features as feature extraction but with caching. Because the base network weights are frozen during feature extraction, the output for an image will always be the same. Thus, once the image has already been passed through the network, we can cache and reuse the output.
3. Furthermore, we've limited each class in both training datasets to 100 examples. The idea here is to push feature extraction a bit further. It also greatly reduces the download size and speeds up training. The validation files remain the same.
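The caching idea behind bottleneck features can be sketched as follows. This is a toy illustration under assumptions: `frozen_base` is a hypothetical stand-in for a frozen pretrained network, and the data is pickled in memory using the same `{'features': ..., 'labels': ...}` layout that the lab's `.p` files use.

```python
import pickle

base_calls = 0


def frozen_base(image):
    """Hypothetical stand-in for a frozen pretrained network."""
    global base_calls
    base_calls += 1
    return [image * 0.5, image + 1.0]


images = [0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 1]

# One expensive pass over the dataset, then cache the outputs.
cached = pickle.dumps({'features': [frozen_base(im) for im in images], 'labels': labels})

# Training reads the cached features as often as it likes.
data = pickle.loads(cached)
for epoch in range(50):
    for features, label in zip(data['features'], data['labels']):
        pass  # train the new classification layer on `features` here

print(base_calls)  # 4: one call per image, regardless of the number of epochs
```

Because the base never changes, 50 epochs cost 4 base-network calls instead of 200.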

The files are named as follows:
- {network}_{dataset}_100_bottleneck_features_train.p
- {network}_{dataset}_bottleneck_features_validation.p

"network", in the above filenames, can be one of 'vgg', 'inception', or 'resnet'.

"dataset" can be either 'cifar10' or 'traffic'.

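As a small convenience, the naming scheme above can be wrapped in a helper. The function name `bottleneck_files` and the validation of its arguments are assumptions for illustration; only the filename patterns and the allowed values come from this section.

```python
NETWORKS = ('vgg', 'inception', 'resnet')
DATASETS = ('cifar10', 'traffic')


def bottleneck_files(network, dataset):
    """Return (training_file, validation_file) for a (network, dataset) pair."""
    if network not in NETWORKS:
        raise ValueError("unknown network: %r" % (network,))
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %r" % (dataset,))
    training_file = "%s_%s_100_bottleneck_features_train.p" % (network, dataset)
    validation_file = "%s_%s_bottleneck_features_validation.p" % (network, dataset)
    return training_file, validation_file


train_file, val_file = bottleneck_files('vgg', 'cifar10')
print(train_file)
print(val_file)
```
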
### 4.1. Cifar10 Aside

Cifar10 images are also (32, 32, 3), so the main thing we'll need to change is __the number of classes from 43 to 10__. Cifar10 also doesn't come with a validation set, so we can randomly split the training data into training and validation sets.

We can easily download and load the Cifar10 dataset like this:
```Python
from keras.datasets import cifar10
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
# y_train.shape is 2d, (50000, 1). While Keras is smart enough to handle this
# it's a good idea to flatten the array.
y_train = y_train.reshape(-1)
y_test = y_test.reshape(-1)
```

We can then use sklearn to split off part of the data into a validation set:

```Python
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, test_size=0.3, random_state=42, stratify=y_train)
```

The Cifar10 dataset contains 10 classes:
<img src='Images/Cifar10 Aside.png' width=600>
$$ \text{Overview of the Cifar10 dataset. Source: Alex Krizhevsky.} $$

While the German Traffic Sign dataset has more classes, the Cifar10 dataset is harder to classify due to the complexity of the classes. A ship is drastically different from a frog, and a frog is nothing like a deer, etc. These are the kind of datasets where the advantage of using a pre-trained model will become much more apparent.

Train the model on the Cifar10 dataset and record your results; keep them in mind when we train from the bottleneck features. Don't be discouraged if the results are significantly worse than on the Traffic Sign dataset.

__The code explanation:__

```Python
import pickle
import tensorflow as tf
# TODO: import Keras layers you need here
```

Nothing fancy here, just the imports we need to run the code. __pickle__ is used to load the bottleneck features.

```Python
flags = tf.app.flags
FLAGS = flags.FLAGS

# command line flags
flags.DEFINE_string('training_file', '', "Bottleneck features training file (.p)")
flags.DEFINE_string('validation_file', '', "Bottleneck features validation file (.p)")
```

Here we define some command line flags. This avoids having to manually open and edit the file whenever we want to change the files we train and validate our model with.

Here's how we would run the file from the command line:
```Python
python feature_extraction.py --training_file vgg_cifar10_100_bottleneck_features_train.p --validation_file vgg_cifar10_bottleneck_features_validation.p
```

Running this program will train feature extraction with the VGG network/Cifar10 dataset bottleneck features. The 100 in __vgg_cifar10_100__ indicates this file has 100 examples per class.

You could define additional flags if you wish. Possible candidates could be the batch size or the number of epochs.
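For readers who want to try the same idea without TensorFlow, here is a sketch using the standard library's `argparse` instead of `tf.app.flags`. This is a swapped-in technique, not what the lab file uses; the `--epochs` and `--batch_size` options and their defaults of 50 and 256 are example choices.

```python
import argparse


def parse_args(argv):
    """Parse the same options the lab script takes, plus two extra ones."""
    parser = argparse.ArgumentParser(description="Train on bottleneck features.")
    parser.add_argument('--training_file', default='', help="Bottleneck features training file (.p)")
    parser.add_argument('--validation_file', default='', help="Bottleneck features validation file (.p)")
    parser.add_argument('--epochs', type=int, default=50, help="The number of epochs.")
    parser.add_argument('--batch_size', type=int, default=256, help="The batch size.")
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to test;
# a script would call parse_args(sys.argv[1:]) instead.
args = parse_args(['--training_file', 'vgg_cifar10_100_bottleneck_features_train.p', '--epochs', '10'])
print(args.training_file, args.epochs, args.batch_size)
```

Options that are not given on the command line fall back to their defaults, so `args.batch_size` is 256 here.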

```Python
def load_bottleneck_data(training_file, validation_file):
    """
    Utility function to load bottleneck features.

    Arguments:
        training_file - String
        validation_file - String
    """
    print("Training file", training_file)
    print("Validation file", validation_file)

    with open(training_file, 'rb') as f:
        train_data = pickle.load(f)
    with open(validation_file, 'rb') as f:
        validation_data = pickle.load(f)

    X_train = train_data['features']
    y_train = train_data['labels']
    X_val = validation_data['features']
    y_val = validation_data['labels']

    return X_train, y_train, X_val, y_val
```

A utility function that loads the bottleneck features from the pickled training and validation files.

```Python
def main(_):
    # load bottleneck data
    X_train, y_train, X_val, y_val = load_bottleneck_data(FLAGS.training_file, FLAGS.validation_file)

    print(X_train.shape, y_train.shape)
    print(X_val.shape, y_val.shape)

    # TODO: define your model and hyperparams here
    # make sure to adjust the number of classes based on
    # the dataset
    # 10 for cifar10
    # 43 for traffic

    # TODO: train your model here


# parses flags and calls the `main` function above
if __name__ == '__main__':
    tf.app.run()
```

This is where we'll define and train the model. Notice __FLAGS.training_file__ and __FLAGS.validation_file__ are passed into load_bottleneck_data. These refer to the command line flags defined earlier.

Once we've trained the model, record the results. How do they compare to the results from the previous exercise?

```Python
import pickle
import tensorflow as tf
# TODO: import Keras layers you need here
import numpy as np
from keras.layers import Input, Dense, Flatten
from keras.models import Model

#flags = tf.app.flags
#FLAGS = flags.FLAGS

# command line flags
#flags.DEFINE_string('training_file', '', "Bottleneck features training file (.p)")
#flags.DEFINE_string('validation_file', '', "Bottleneck features validation file (.p)")
#flags.DEFINE_integer('epochs', 50, "The number of epochs.")
#flags.DEFINE_integer('batch_size', 256, "The batch size.")

training_file = 'Cifar10 Aside/vgg-100/vgg_cifar10_100_bottleneck_features_train.p'
validation_file = 'Cifar10 Aside/vgg-100/vgg_cifar10_bottleneck_features_validation.p'
epochs = 50
batch_size = 256

def load_bottleneck_data(training_file, validation_file):
    """
    Utility function to load bottleneck features.

    Arguments:
        training_file - String
        validation_file - String
    """
    print("Training file", training_file)
    print("Validation file", validation_file)

    with open(training_file, 'rb') as f:
        train_data = pickle.load(f)
    with open(validation_file, 'rb') as f:
        validation_data = pickle.load(f)

    X_train = train_data['features']
    y_train = train_data['labels']
    X_val = validation_data['features']
    y_val = validation_data['labels']

    return X_train, y_train, X_val, y_val


def main(_):
    # load bottleneck data
    #X_train, y_train, X_val, y_val = load_bottleneck_data(FLAGS.training_file, FLAGS.validation_file)
    X_train, y_train, X_val, y_val = load_bottleneck_data(training_file, validation_file)

    print(X_train.shape, y_train.shape)
    print(X_val.shape, y_val.shape)

    nb_classes = len(np.unique(y_train))
    # TODO: define your model and hyperparams here
    # make sure to adjust the number of classes based on
    # the dataset
    # 10 for cifar10
    # 43 for traffic
    input_shape = X_train.shape[1:]
    inp = Input(shape=input_shape)
    x = Flatten()(inp)
    x = Dense(nb_classes, activation='softmax')(x)
    model = Model(inp, x)
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

    # TODO: train your model here
    model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_val, y_val), shuffle=True)


# parses flags and calls the `main` function above
if __name__ == '__main__':
    tf.app.run()
```
The output (per-epoch lines abbreviated):
```Python
Training file Cifar10 Aside/vgg-100/vgg_cifar10_100_bottleneck_features_train.p
Validation file Cifar10 Aside/vgg-100/vgg_cifar10_bottleneck_features_validation.p
(1000, 1, 1, 512) (1000, 1)
(10000, 1, 1, 512) (10000, 1)
Train on 1000 samples, validate on 10000 samples
Epoch 1/50
...
Epoch 50/50
```

__Answer:__

Import the additional libraries required.
```Python
import numpy as np
from keras.layers import Input, Flatten, Dense
from keras.models import Model
```

Add a couple of command-line flags to set the number of epochs and the batch size. This is more for convenience than anything else.
```Python
flags.DEFINE_integer('epochs', 50, "The number of epochs.")
flags.DEFINE_integer('batch_size', 256, "The batch size.")
```

Here we find the number of classes in the dataset. __np.unique__ returns the unique elements of a NumPy array. The elements of __y\_train__ are integers, __0-9__ for Cifar10 and __0\-42__ for Traffic Signs, so combining it with __len__ gives the number of classes.
```Python
nb_classes = len(np.unique(y_train))
```
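The same counting idea can be sketched in plain Python, which also makes one caveat visible: the count only equals the true number of classes if every class appears at least once in the labels. The helper `count_classes` is a hypothetical stand-in for `len(np.unique(...))` that accepts labels stored as `(n, 1)`-style rows, the layout shown in the training output.

```python
def count_classes(labels):
    """Count distinct labels; accepts flat labels or (n, 1)-style nested rows."""
    flat = [row[0] if isinstance(row, (list, tuple)) else row for row in labels]
    return len(set(flat))


y_train = [[0], [2], [1], [2], [0]]    # three classes, all present
nb_classes = count_classes(y_train)

y_partial = [[0], [2]]                 # class 1 never appears
nb_partial = count_classes(y_partial)  # undercounts: labels still go up to 2

print(nb_classes, nb_partial)
```

With 100 examples per class in the training files this is not a problem here, but it is worth checking on smaller subsets.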

A very simple model is defined: the input is flattened and passed to a linear layer (Dense in Keras terms) followed by a softmax activation. The Adam optimizer is used.
```Python
# define model
input_shape = X_train.shape[1:]
inp = Input(shape=input_shape)
x = Flatten()(inp)
x = Dense(nb_classes, activation='softmax')(x)
model = Model(inp, x)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
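To see how small this model is, the shapes and parameter count can be worked out by hand. The sketch below does the arithmetic in plain Python for the VGG/Cifar10 features, whose shape `(1000, 1, 1, 512)` appears in the training output; the helper names are hypothetical.

```python
import math


def flatten_shape(shape):
    """Shape after Flatten: keep the batch axis, merge all the others."""
    return (shape[0], math.prod(shape[1:]))


def dense_param_count(n_inputs, n_units):
    """A Dense layer has one weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units


x_shape = (1000, 1, 1, 512)  # VGG bottleneck features for 1000 Cifar10 images
flat = flatten_shape(x_shape)
params = dense_param_count(flat[1], 10)

print(flat)    # (1000, 512)
print(params)  # 512 * 10 + 10 = 5130
```

Only these 5130 parameters are trained, which is why this step runs comfortably on a CPU.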
