# The Fashion MNIST problem with Keras
In this notebook you will solve a binary variant of the Fashion MNIST problem. The original Fashion MNIST dataset contains images for 10 different classes of clothing. We have selected two classes from this dataset for you to work on.

You will preprocess the data to get it into the right shape for Keras, and see how overfitting and underfitting show up in the loss and accuracy curves during training.

Let's load Keras and some helper functions first:

In [None]:
library(keras)
source("02-helpers.R")

Let's load the data set:

In [None]:
data <- dataset_fashion_mnist_binary()
data_train <- data$train
data_test <- data$test

## Exercise 1
Inspect the dimensions of the training and test data. What do the different dimensions represent for the `x` and `y` variables inside each set?

**Hint**: first inspect the contents of `data_train` and `data_test` using the [`names`](https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/names) function. This function will show you the names of the variables inside an R object.
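As a rough illustration of the inspection functions mentioned above, the sketch below applies them to a made-up list. `toy_set` and its contents are hypothetical stand-ins, not the real data, so the numbers you see for `data_train` and `data_test` will differ.

```r
# Hypothetical stand-in for one of the data sets: a list holding an
# image array and a label vector (values and sizes are made up).
toy_set <- list(
    x = array(0, dim = c(5, 28, 28)),  # a 3-dimensional array of zeros
    y = c(0, 1, 1, 0, 1)               # a plain vector of labels
)

names(toy_set)     # names of the variables stored in the list
dim(toy_set$x)     # size of each dimension of the array
length(toy_set$y)  # number of elements in the vector
```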

In [None]:
<FILL IN>

Let's plot one of the samples in the training set:

In [None]:
library(ggplot2)
library(reshape2)

index <- 6000
options(repr.plot.width = 3, repr.plot.height = 3)
ggplot(melt(t(apply(data_train$x[index,,], 2, rev)), varnames=c('x', 'y')), aes(x=x, y=y, fill=value)) +
    geom_raster() +
    scale_x_continuous(expand = c(0, 0)) +
    scale_y_continuous(expand = c(0, 0)) +
    scale_fill_gradient(low="#000000", high="#FFFFFF") +
    theme_void() +
    theme(legend.position = "none") +
    ggtitle(paste('Label:', data_train$y[index]))
options(repr.plot.width = 6, repr.plot.height = 5)

## Exercise 2
The input images have pixel values that are not in the range 0 to 1. Rescale them such that all values are in this range.

**Hint**: use R's `min` and `max` functions to find out the minimum and maximum.
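One common way to map values onto the range 0 to 1 is min-max rescaling. The sketch below shows the idea on a tiny made-up matrix; `toy_pixels`, `lo`, and `hi` are illustrative names, and the values are not taken from the data set.

```r
# Made-up pixel values, purely for illustration.
toy_pixels <- matrix(c(0, 51, 102, 255), nrow = 2)

lo <- min(toy_pixels)  # smallest value in the matrix
hi <- max(toy_pixels)  # largest value in the matrix

# Shift so the minimum becomes 0, then scale so the maximum becomes 1.
rescaled <- (toy_pixels - lo) / (hi - lo)

range(rescaled)  # minimum and maximum after rescaling
```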

In [None]:
<FILL IN>

## Exercise 3
Divide the original training data 80%-20% between the new training and validation set. Make sure that you slice both the images and the labels, and verify the dimensions of the two sets using `dim` and `length`.

**Hints**:
1. You will need four variables for storing the images and labels of your two sets.
1. Think carefully about the dimensions of the data, and how it affects the slicing.
1. **If you are unable to solve this exercise, please continue with the next one, where we have done it for you.**
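To illustrate the second hint, here is a small sketch of slicing a three-dimensional array along its first dimension while keeping the labels aligned. All names (`toy_x`, `toy_y`, `first_idx`, and so on) and sizes are made up for the example.

```r
# Ten tiny 2 x 2 "images" and one label per image (made-up data).
toy_x <- array(seq_len(10 * 2 * 2), dim = c(10, 2, 2))
toy_y <- rep(c(0, 1), times = 5)

# Indices of the first 80% of the samples.
n_first <- round(0.8 * length(toy_y))
first_idx <- seq_len(n_first)

# Index only the first dimension; the empty positions keep all rows and columns.
x_first <- toy_x[first_idx, , , drop = FALSE]
y_first <- toy_y[first_idx]

# Negative indices select everything except the first 80%.
x_rest <- toy_x[-first_idx, , , drop = FALSE]
y_rest <- toy_y[-first_idx]

dim(x_first)
length(y_first)
dim(x_rest)
length(y_rest)
```

The `drop = FALSE` argument is a defensive choice: it keeps the result three-dimensional even if a slice happens to contain a single image.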

In [None]:
<FILL IN>

## Run the cell below only if you did not complete exercise 2 or 3
If you could not rescale the images or slice the training and validation sets yourself, run the following code to make sure you have the preprocessed data ready. Otherwise, skip this cell and continue with Exercise 4.

In [None]:
data <- dataset_fashion_mnist_binary()
data_train <- data$train
data_test <- data$test

# Exercise 3: split the training data 80%-20% into a training and a validation set
index <- 0.8 * length(data_train$y)
dim(data_train$y)
x_val <- data_train$x[-(1:index),,]
y_val <- data_train$y[-(1:index)]

dim(x_val)
dim(y_val)

x_train <- data_train$x[1:index,,]
y_train <- data_train$y[1:index]

dim(x_train)
dim(y_train)

# Exercise 2: rescale the pixel values to the range 0 to 1
min(x_train)
max(x_train)
x_train <- x_train / 255  # these are images in the range 0 to 255, so we can just divide by 255
x_val <- x_val / 255 # same for the validation set

## Exercise 4
Create an initial model by filling out the skeleton below. We have provided you with the first layer: a **flatten** layer, which converts the two-dimensional 28 x 28 image into a one-dimensional vector of length 784.

Print a summary of the model and verify the output shapes of each layer.
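To see where the length 784 comes from, the base R sketch below unrolls a made-up 28 x 28 matrix into a vector. It only mimics the idea of flattening. The row-by-row ordering is an illustrative choice here, and `toy_image` and `flat` are hypothetical names.

```r
# A made-up 28 x 28 "image" whose entries are numbered row by row.
toy_image <- matrix(seq_len(28 * 28), nrow = 28, ncol = 28, byrow = TRUE)

# R stores matrices column by column, so transpose first to read row by row.
flat <- as.vector(t(toy_image))

length(flat)  # 28 * 28 elements in a single dimension
```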

In [None]:
model <- keras_model_sequential() %>%
    layer_flatten(input_shape = c(28, 28)) %>%
    <FILL IN>

model

## Exercise 5
Compile the model by filling out the skeleton below according to the following requirements:
1. The optimizer should be `optimizer_adam` with learning rate 0.001
1. The loss should be `binary_crossentropy`
1. The metric should be accuracy (warning: Keras expects a character vector)

**Hint**: it may help to keep the previous notebook open for reference.
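For intuition about what the loss and metric measure, here is a minimal base R sketch of binary cross-entropy and accuracy on made-up predictions. The function name `binary_crossentropy_sketch`, the clipping constant `eps`, and the 0.5 decision threshold are assumptions for illustration; this is not how Keras implements them internally.

```r
# Binary cross-entropy: the mean negative log-likelihood of the true labels.
# Predictions are clipped away from 0 and 1 to avoid log(0) (assumed eps).
binary_crossentropy_sketch <- function(y_true, y_pred, eps = 1e-7) {
    y_pred <- pmin(pmax(y_pred, eps), 1 - eps)
    -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
}

y_true <- c(1, 0, 1, 0)               # made-up labels
confident <- c(0.9, 0.1, 0.8, 0.2)    # predictions close to the labels
unsure <- c(0.5, 0.5, 0.5, 0.5)       # predictions that carry no information

loss_confident <- binary_crossentropy_sketch(y_true, confident)
loss_unsure <- binary_crossentropy_sketch(y_true, unsure)

# Accuracy: the fraction of predictions on the correct side of 0.5.
accuracy_confident <- mean((confident > 0.5) == y_true)
```

Predictions of exactly 0.5 give a loss of log(2), roughly 0.69, which is a useful reference point when reading loss curves for a two-class problem.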

In [None]:
model %>% compile(
    optimizer = <FILL IN>,
    loss = <FILL IN>,
    metrics = <FILL IN>
)

## Exercise 6
Fit the model and plot the training history by filling out the skeleton below. Use 250 epochs.

Do you think the model has sufficient capacity to model the problem? Why or why not? Motivate your answer in a few sentences in the second cell.

In [None]:
history <- model %>% fit(
    x = <FILL IN>,
    y = <FILL IN>,
    epochs = <FILL IN>,
    batch_size = 1024,
    callbacks=list(Progress$new())
)
plot(history)

## Exercise 7
Add the validation set to your training process and increase the model's capacity by adding more neurons and/or layers. Make sure that the model has enough capacity to solve the problem. We have provided you with a skeleton to fill out below. You can modify the learning rate, number of epochs and the network architecture yourself.

Where does the model start overfitting? Motivate your answer in the second cell below as a comment.
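One heuristic for locating the onset of overfitting is to look for the epoch where the validation loss is lowest while the training loss keeps falling. The sketch below applies this to made-up loss values. `toy_loss` and `toy_val_loss` are hypothetical. With a real run, the per-epoch values are assumed to be available as `history$metrics$loss` and `history$metrics$val_loss`.

```r
# Made-up per-epoch losses: training loss keeps falling,
# validation loss falls and then rises again.
toy_loss     <- c(0.70, 0.55, 0.45, 0.38, 0.33, 0.29, 0.26, 0.24)
toy_val_loss <- c(0.71, 0.58, 0.50, 0.46, 0.45, 0.47, 0.50, 0.54)

# Epoch with the lowest validation loss.
best_epoch <- which.min(toy_val_loss)

# Gap between validation and training loss; a steadily growing gap
# after best_epoch is a typical sign of overfitting.
gap <- toy_val_loss - toy_loss

best_epoch
round(gap, 2)
```

Real curves are noisier than this toy example, so it is worth checking the plot as well rather than relying on a single minimum.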

In [None]:
model <- keras_model_sequential() %>%
    layer_flatten(input_shape = c(28, 28)) %>%
    <FILL IN>

cat(summary(model))

model %>% compile(
    optimizer = optimizer_adam(lr = 0.001),
    loss = "binary_crossentropy",
    metrics = c("accuracy")
)

history <- model %>% fit(
    x = x_train,
    y = y_train,
    validation_data = list(<FILL IN THE VALIDATION IMAGES, FILL IN THE VALIDATION LABELS>),
    epochs = 500,
    batch_size = 1024,
    callbacks=list(Progress$new())
)
plot(history)

Let's **evaluate** our model on the test set to obtain a final estimate of our model's performance. We need to rescale the test set's images based on the range from our training images. It was 0 to 255, as you may remember.

After rescaling, we evaluate using the [`evaluate`](https://tensorflow.rstudio.com/keras/reference/evaluate.html) function:

In [None]:
x_test <- data_test$x / 255
y_test <- data_test$y
model %>% evaluate(x_test, y_test)

As you can see, the test set accuracy is comparable to the validation set accuracy. Although we could use the validation accuracy as a measure of how well we do on unseen data, we still optimise our model on the validation set by adapting the hyperparameters ourselves. In this sense, the validation set is not truly unseen: we need to rely on the test set to give us a final verdict on the model's performance.

## Exercise 8: early stopping
Retrain the model from exercise 7. This time, set the number of epochs to the number of epochs at which the model started overfitting. Evaluate your model again.

What happens to the accuracy and loss compared to the model you trained before? Why? Motivate your answer in a few sentences in the second cell below.

In [None]:
model <- keras_model_sequential() %>%
    layer_flatten(input_shape = c(28, 28)) %>%
    <FILL IN>

cat(summary(model))

model %>% compile(
    optimizer = optimizer_adam(lr = 0.001),
    loss = "binary_crossentropy",
    metrics = c("accuracy")
)

history <- model %>% fit(
    x = x_train,
    y = y_train,
    validation_data = list(x_val, y_val),
    epochs = <FILL IN>,
    batch_size = 1024,
    callbacks=list(Progress$new())
)
plot(history)

## Exercise 9: bonus
Reduce the capacity of the model such that you do not overfit. How many layers and neurons do you have?

In [None]:
<FILL IN>

## Exercise 10: bonus
Decrease the batch size by factors of four, and inspect the resulting loss curves. What happens when you approach a batch size of 1? How do you explain this behaviour?
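As a starting point, the sketch below lists the batch sizes obtained by repeatedly dividing 1024 by four, together with the number of weight updates each one implies per epoch. The value of `n_train` is a hypothetical training set size; replace it with the first dimension of your own training images.

```r
# Hypothetical number of training images (assumption; check your own data).
n_train <- 9600

# Start at 1024 and divide by four until the batch size reaches 1.
batch_sizes <- 1024 / 4^(0:5)

# Each epoch performs one weight update per batch, including a final partial batch.
updates_per_epoch <- ceiling(n_train / batch_sizes)

data.frame(batch_size = batch_sizes, updates_per_epoch = updates_per_epoch)
```

Smaller batches mean many more updates per epoch, each computed from fewer images, which is worth keeping in mind when comparing the loss curves.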

In [None]:
<FILL IN>