# Exercise 4

Work on this before the next lecture on 26 April. We will talk about questions, comments, and solutions during the exercise after the third lecture.

Please do form study groups! When you do, make sure you can explain everything in your own words; do not simply copy&paste from others.

The solutions to a lot of these problems can probably be found with Google. Please don't. You will not learn a lot by copy&pasting from the internet.

If you want credit/examination for this course, please upload your work to your GitHub repository for this course before the next lecture starts and post a link to your repository in [this thread](https://github.com/wildtreetech/advanced-computing-2018/issues/8). If you worked on things together with others, please add their names to the notebook so we can see who formed groups.

The overall idea of this exercise is to get you using and building convolutional neural networks.

## Question 1

In the last exercise you built a neural network that can classify fashion items using only densely connected layers.

Build on this by using convolutions, pooling, dropout, batch norm, etc. in your neural network. Can you outperform your densely connected network?
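To get a feeling for how convolutions and pooling change the shape of the data, and how many weights a network has, the arithmetic can be done by hand before training anything. The sketch below is only an illustration: the helper names and the small architecture (two 3x3 convolution blocks with 32 and 64 filters, each followed by 2x2 max pooling, then a 10-unit dense layer) are assumptions made for this example, not a recommended solution. It assumes unpadded ("valid") convolutions and ignores the extra parameters that batch norm would add.

```python
def conv_out(size, kernel, stride=1):
    """Output length along one axis for an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1


def conv2d_params(kernel, in_channels, filters):
    """Weights of a 2D convolution: kernel*kernel*in_channels per filter, plus one bias each."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(n_inputs, units):
    """Weights of a dense layer: one per input per unit, plus one bias per unit."""
    return (n_inputs + 1) * units


# Hypothetical small network for 28x28 greyscale images:
# Conv 3x3 (32) -> MaxPool 2x2 -> Conv 3x3 (64) -> MaxPool 2x2 -> Flatten -> Dense(10)
size, channels, total = 28, 1, 0
for filters in (32, 64):
    total += conv2d_params(3, channels, filters)
    size = conv_out(size, kernel=3)            # a 3x3 convolution shrinks each side by 2
    size = conv_out(size, kernel=2, stride=2)  # 2x2 pooling halves each side, rounding down
    channels = filters

flat = size * size * channels                  # number of values after flattening
total += dense_params(flat, 10)
print(size, flat, total)
```

Even this small network has tens of thousands of weights, most of them in the second convolution and the final dense layer, which is worth keeping in mind when deciding how wide to make the layers.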

Start with a small network and a fraction of the data to check if you hooked everything up correctly. Don't go overboard with the size of the network either as even small networks take quite a while to train.
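One way to train on a fraction of the data is to draw a random subset of the training set before calling `fit`. The sketch below uses only NumPy; the helper `subsample`, its `fraction` and `seed` parameters, and the zero-filled demo arrays are assumptions made for this illustration, standing in for the real `X_train` and `y_train`.

```python
import numpy as np


def subsample(X, y, fraction, seed=0):
    """Return a random subset of (X, y) containing `fraction` of the samples."""
    rng = np.random.default_rng(seed)
    n_keep = int(len(X) * fraction)
    # sample indices without replacement so that no image appears twice
    idx = rng.choice(len(X), size=n_keep, replace=False)
    return X[idx], y[idx]


# Placeholder arrays with the same layout as the Fashion-MNIST training data
X_demo = np.zeros((1000, 28, 28), dtype="float32")
y_demo = np.arange(1000) % 10

X_small, y_small = subsample(X_demo, y_demo, fraction=0.1)
print(X_small.shape, y_small.shape)
```

Using the same indices for images and labels keeps them aligned, and fixing the seed makes the quick check repeatable.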

(If you want to experiment with a free GPU, check out https://kaggle.com/kernels.)

In [67]:
# your code here
import numpy as np
import keras
from keras.datasets import fashion_mnist
from keras.models import Model
from keras.layers import Input, Dense, Activation, Flatten, Conv2D, Conv1D, MaxPooling1D, BatchNormalization, Dropout
from keras import utils
from sklearn.model_selection import train_test_split

# Load the data; a validation set is split off from the training data below
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=10000,
                                                  random_state=42)

# one-hot encode the labels
num_classes = 10
y_train = utils.to_categorical(y_train, num_classes)
y_val = utils.to_categorical(y_val, num_classes)
y_test = utils.to_categorical(y_test, num_classes)

# we define the input shape (i.e., how many input features) **without** the batch size
x = Input(shape=(28, 28))

# Flattening the 28x28 matrix into a 784-d vector straight away would remove all
# information about the spatial relation between pixels. Convolutions let us
# take advantage of that information, so we flatten only before the dense layers.
# Note: Conv1D treats each image as a sequence of 28 rows with 28 features each.

#
# your network architecture here
#

# we want to predict one of ten classes

h = Conv1D(64, 3, activation='relu')(x)
h = BatchNormalization()(h)
h = MaxPooling1D(2, strides=2)(h)
h = Dropout(0.3)(h)

# feed the second block from h, not x, so that the first block is actually used
h = Conv1D(128, 3, activation='relu')(h)
h = BatchNormalization()(h)
h = MaxPooling1D(2, strides=2)(h)
h = Dropout(0.5)(h)

# further blocks tried while experimenting (currently disabled);
# each one takes h, the output of the previous block, as its input
#h = Conv1D(256, 3, activation='relu')(h)
#h = BatchNormalization()(h)
#h = MaxPooling1D(2, strides=2)(h)
#h = Dropout(0.7)(h)

#h = Conv1D(512, 3, activation='relu')(h)
#h = BatchNormalization()(h)
#h = MaxPooling1D(2, strides=2)(h)
#h = Dropout(0.9)(h)

#h = Conv1D(64, 3, activation='relu')(h)
#h = Conv1D(64, 3, activation='relu')(h)
#h = MaxPooling1D(2, strides=2)(h)

#h = Conv1D(128, 3, activation='relu')(h)
#h = Conv1D(128, 3, activation='relu')(h)
#h = MaxPooling1D(2, strides=2)(h)




h = Flatten()(h)

# use a non-linearity between the two dense layers; without it they
# collapse into a single linear map
h = Dense(1000, activation='relu')(h)
h = Dense(10)(h)

y = Activation('softmax')(h)

# Judging by val_loss/val_acc, the best approach so far was to increase both the
# number of layers and their width; however, the performance gains clearly saturate.


# Package it all up in a Model
net = Model(x, y)

net.compile(loss='categorical_crossentropy',
            optimizer='sgd',
            metrics=['accuracy'])


batch_size = 128
history = net.fit(X_train, y_train,
                  batch_size=batch_size,
                  epochs=5,
                  verbose=1,
                  validation_data=(X_val, y_val))

Train on 50000 samples, validate on 10000 samples
Epoch 1/5
Epoch 2/5
 2176/50000 [>.............................] - ETA: 2:43 - loss: 0.7851 - acc: 0.7583

KeyboardInterrupt: 

## Question 2

For most real world applications we do not have enough labelled images to train a large neural network from scratch. Instead we can use a pre-trained network as a feature transformer and train a smaller model (or even just a logistic regression) on the output of the pre-trained network.

There are several pretrained networks available as part of Keras: https://keras.io/applications/. The documentation usually gives some information or links about each network.

The documentation also contains snippets on how to use a pre-trained network as a feature transformer ("Extract features with VGG16"). You should be able to generalise from that VGG16 example to nearly any of the networks available there.

One important thing not to forget is that you need to preprocess your images before feeding them into a pretrained network. Keras provides functions to do that as well; use them :) You might also need to resize your images first.
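To see why the network-specific preprocessing matters, it helps to look at what it does to the pixel values. To our understanding, `preprocess_input` for VGG16 converts images from RGB to BGR and subtracts a fixed per-channel ImageNet mean, without rescaling to the 0-1 range. The NumPy sketch below re-implements that idea purely as an illustration: the function name `vgg16_style_preprocess` is made up for this example, and in your own work you should call the Keras function rather than this one.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed to match Keras' VGG16 preprocessing)
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")


def vgg16_style_preprocess(batch_rgb):
    """Illustrative only: expects RGB values in 0..255 with shape (n, height, width, 3)."""
    batch = np.asarray(batch_rgb, dtype="float32")
    batch_bgr = batch[..., ::-1]          # reverse the channel axis: RGB -> BGR
    return batch_bgr - IMAGENET_MEAN_BGR  # zero-centre each channel, no rescaling


# A tiny batch containing one pure-red 2x2 image
red = np.zeros((1, 2, 2, 3), dtype="float32")
red[..., 0] = 255.0
out = vgg16_style_preprocess(red)
print(out[0, 0, 0])
```

Note that this differs from the division by 255 used for the fashion images in Question 1, so the same scaling cannot simply be reused for a pretrained network.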

The task for this question is to build a classifier that can tell road bikes from mountain bikes. Start by using a pre-trained network as a feature transformer and logistic regression as the classifier on the output of the pre-trained network. Once this works you can experiment further: extract features from earlier layers of the pre-trained network, compare your performance to a small network trained from scratch, try to beat your neural net by extracting features by hand and feeding them to a random forest, increase your dataset size by [augmenting the data](https://keras.io/preprocessing/image/), etc.

The dataset, containing about 100 labelled images each of road bikes and mountain bikes, is here: https://github.com/wildtreetech/advanced-computing-2018/blob/master/data/road-and-mountain-bikes.zip

In [None]:
from keras import applications
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np


# for example load the VGG16 network
model = applications.VGG16(include_top=False,
                           weights='imagenet')

img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

features = model.predict(x)
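Once features have been extracted for every image, they can be flattened and handed to a scikit-learn classifier. The sketch below shows only the mechanics under stated assumptions: random numbers stand in for the real network output, the labels are made up, and the array names are chosen for this example. With 224x224 inputs and `include_top=False`, VGG16 yields a 7x7x512 block per image, which is the shape used for the placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder for the stacked output of model.predict(...) over all images
n_images = 40
features = rng.normal(size=(n_images, 7, 7, 512)).astype("float32")
labels = np.array([0, 1] * (n_images // 2))  # hypothetical: 0 = road, 1 = mountain

# Logistic regression expects one row of numbers per image
X_flat = features.reshape(n_images, -1)

X_tr, X_te, y_tr, y_te = train_test_split(X_flat, labels, test_size=0.25,
                                          stratify=labels, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
print(X_flat.shape, accuracy)
```

Because the placeholder features are random, the accuracy here should hover around chance; with real features from the bike images it becomes a meaningful baseline to compare other approaches against.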

## Question 3

Think about what project you want to do. What makes a good project? It should use some of what you learnt in this class, there should be labelled data available already, and it should be something you are interested in.

You will have to write a short report on what you did. To write an interesting report you need to tell a story, not just "first I did A, then I did B, then I did C and finally D".

It also has to go a bit beyond simply training a classifier or regression model.

An example based on the bike images from the previous question:

A local bike shop wants to keep an eye on sales of bikes on eBay. They specialise in road bikes so they want to be able to filter out all adverts for mountain bikes. They have found that people writing eBay adverts are not very good at correctly labelling their adverts. Can they use machine learning to help classify adverts?

We investigate labelling adverts based on the image in the advert and study different trade-offs in misclassifying bikes. The network was trained on 100 images from a catalog that show bikes on a white background. We compare the performance of the network on the training data and on a small set of hand-labelled images of bikes in the wild.