##### Copyright 2018 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [None]:
#@title MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

# Lab 02: Classifying Images of Clothing

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/rses-dl-course/rses-dl-course.github.io/blob/master/notebooks/R/R02_classifying_images_of_clothing.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/rses-dl-course/rses-dl-course.github.io/blob/master/notebooks/R/R02_classifying_images_of_clothing.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

In this tutorial, we'll build and train a neural network to classify images of clothing, like sneakers and shirts.

It's okay if you don't understand everything. This is a fast-paced overview of a complete TensorFlow program, with explanations along the way. The goal is to get a general sense of a TensorFlow project, not to catch every detail.

This guide uses the [Keras](https://www.tensorflow.org/guide/keras) R library, a high-level API to build and train models in TensorFlow [through R](https://keras.rstudio.com/). 

## Install and import dependencies

We'll need to install the `keras` R package, which also includes the example datasets we'll be working with.

Alternatively, you can install the [`tfdatasets`](https://tensorflow.rstudio.com/guide/tfdatasets/introduction/) package. This package is an R interface to TensorFlow datasets and provides access to the Dataset API. The TensorFlow Dataset API provides various facilities for **creating scalable input pipelines for TensorFlow models**, including high-level convenience functions for easy integration with Keras and access to TensorFlow example datasets.

We'll also install a couple more packages that we will use for plotting.

In [None]:
install.packages(c("keras", "tidyr", "ggplot2", "cowplot"))

Once the packages are installed, load the `keras` library:

In [None]:
library(keras)

## Import the Fashion MNIST dataset

This tutorial uses the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 $\times$ 28 pixels), as seen here:

<table>
  <tr><td>
    <img src="https://tensorflow.org/images/fashion-mnist-sprite.png"
         alt="Fashion MNIST sprite" width="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>&nbsp;
  </td></tr>
</table>

Fashion MNIST is intended as a drop-in replacement for the classic [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, which is often used as the "Hello, World" of machine learning programs for computer vision. The MNIST dataset contains images of handwritten digits (0, 1, 2, etc.) in an identical format to the articles of clothing we'll use here.

This guide uses Fashion MNIST for variety, and because it's a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected. They're good starting points to test and debug code.

We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can load Fashion MNIST directly from the [Keras R package](https://keras.rstudio.com/reference/dataset_fashion_mnist.html) using:

In [None]:
fashion_mnist <- dataset_fashion_mnist()

In [None]:
str(fashion_mnist)

Loading the dataset returns a list with two elements: **`train`**, which contains the *training dataset*, and **`test`**, which contains the *test dataset*.

* The model is trained using `train`.
* The model is tested against `test`.


Each image in `x` is a 28 $\times$ 28 array with pixel values in the range `[0, 255]`. The *labels* (`y`) are integers in the range `[0, 9]`, which correspond to the *class* of clothing the image represents:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th>
  </tr>
  <tr>
    <td>0</td>
    <td>T-shirt/top</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Trouser</td>
  </tr>
    <tr>
    <td>2</td>
    <td>Pullover</td>
  </tr>
    <tr>
    <td>3</td>
    <td>Dress</td>
  </tr>
    <tr>
    <td>4</td>
    <td>Coat</td>
  </tr>
    <tr>
    <td>5</td>
    <td>Sandal</td>
  </tr>
    <tr>
    <td>6</td>
    <td>Shirt</td>
  </tr>
    <tr>
    <td>7</td>
    <td>Sneaker</td>
  </tr>
    <tr>
    <td>8</td>
    <td>Bag</td>
  </tr>
    <tr>
    <td>9</td>
    <td>Ankle boot</td>
  </tr>
</table>

**Each image is mapped to a single label.**

Let's split this list into four separate arrays. The `train_images` and `train_labels` arrays are the **training set**, the data the model uses to learn. The model is tested against the **test set**: the `test_images` and `test_labels` arrays.

In [None]:
c(train_images, train_labels) %<-% fashion_mnist$train
c(test_images, test_labels) %<-% fashion_mnist$test
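The `%<-%` operator used above unpacks a list into several variables in one step (it comes from the zeallot package and is made available when keras is loaded). The sketch below shows the equivalent base-R assignments on a small, hypothetical stand-in list, `toy_dataset`, that mimics the layout of `fashion_mnist$train`. The toy names are chosen so they do not overwrite the real arrays.

```r
# Hypothetical stand-in with the same layout as fashion_mnist$train:
# x is an (images x rows x columns) array, y is a vector of integer labels
toy_dataset <- list(
  train = list(x = array(0L, dim = c(3, 28, 28)), y = c(9L, 0L, 3L))
)

# Base-R equivalent of: c(toy_images, toy_labels) %<-% toy_dataset$train
toy_images <- toy_dataset$train$x
toy_labels <- toy_dataset$train$y

dim(toy_images)          # 3 28 28: the first index selects an image
dim(toy_images[1, , ])   # 28 28: a single image is a matrix
```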


Since the *class names* are not included with the dataset, we can store them in a character vector to use later when plotting the images:

In [None]:
class_names <- c('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal',      'Shirt',   'Sneaker', 'Bag',   'Ankle boot')
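Because the labels start at 0 while R vectors are indexed from 1, looking up a class name means adding 1 to the label (you will see `+ 1` in several cells below). A minimal sketch of that lookup as a helper; `label_to_name()` is a hypothetical name used only for illustration, and `class_names` is repeated so the snippet runs on its own.

```r
# Same vector as above, repeated so this snippet is self-contained
class_names <- c('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                 'Sandal',      'Shirt',   'Sneaker', 'Bag',   'Ankle boot')

# Hypothetical helper: shift 0-indexed labels by one before indexing
label_to_name <- function(labels, class_vec) {
  stopifnot(all(labels >= 0), all(labels < length(class_vec)))
  class_vec[labels + 1]
}

label_to_name(c(0, 9, 3), class_names)
# returns c("T-shirt/top", "Ankle boot", "Dress")
```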

### Explore the data

Let's explore the format of the dataset before training the model. The following shows there are 60,000 images in the training set, with each image represented as 28 x 28 pixels.

In [None]:
dim(train_images)

The pixel values across all images range between 0 and 255:

In [None]:
range(train_images)

Likewise, there are 60,000 labels in the training set:

In [None]:
dim(train_labels)

Each label is an integer between 0 and 9:

In [None]:
train_labels %>% unique() %>% sort()

There are 10,000 images in the test set. Again, each image is represented as 28 x 28 pixels:

In [None]:
dim(test_images)

And the test set contains 10,000 image labels:

In [None]:
dim(test_labels)

## Preprocess the data

The data must be preprocessed before training the network. Each pixel value in the image data is an integer in the range `[0, 255]`, as you can see by plotting the first image in the training set:

In [None]:
library(tidyr)
library(ggplot2)

plot_fashionmnist_image <- function(image){
  image <- as.data.frame(image)
  colnames(image) <- seq_len(ncol(image))
  image$y <- seq_len(nrow(image))
  image <- gather(image, "x", "value", -y)
  image$x <- as.integer(image$x)

  ggplot(image, aes(x = x, y = y, fill = value)) +
    geom_tile() +
    scale_fill_gradient(low = "white", high = "black", na.value = NA) +
    scale_y_reverse() +
    theme_minimal() +
    theme(panel.grid = element_blank(),
          axis.text = element_blank(),
          axis.title = element_blank(),
          aspect.ratio = 1)
}

plot_fashionmnist_image(train_images[1,,])

Before training, these values need to be normalized to the range `[0, 1]`, since inputs on a small, consistent scale make training more stable. To perform the conversion, we divide each array by 255.

In [None]:
train_images <- train_images/255
test_images <- test_images/255
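To see what this division does, here is a sketch on a small, hypothetical integer matrix (`toy_pixels`) standing in for an image: the values are rescaled elementwise into `[0, 1]`, and the array's dimensions are unchanged.

```r
# A hypothetical 2 x 3 "image" with integer pixel values in [0, 255]
toy_pixels <- matrix(c(0L, 64L, 128L, 191L, 255L, 30L), nrow = 2)

# Elementwise division rescales every value into [0, 1] (and makes it double)
toy_scaled <- toy_pixels / 255

range(toy_scaled)   # 0 1
dim(toy_scaled)     # 2 3, same shape as before
```

Note that re-running the normalization cell divides by 255 a second time and shrinks the values further, so it is worth checking `range()` after any re-run.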

### Explore the processed data

We can confirm that the pixel values have been rescaled by checking the range of values in each array:

In [None]:
range(train_images)
range(test_images)

To verify that the data appears correctly labelled, let's also display the first 25 images from the training set, with the class name above each image.

In [None]:
par(mfcol=c(5,5))
par(mar=c(0, 0, 1.5, 0), xaxs='i', yaxs='i')
for (i in 1:25) { 
  img <- train_images[i, , ]
  img <- t(apply(img, 2, rev)) 
  image(1:28, 1:28, img, col = gray((0:255)/255), xaxt = 'n', yaxt = 'n',
        main = paste(class_names[train_labels[i] + 1]))
}

We’re ready to build and train the network!

## Build the model

Building the neural network requires configuring the layers of the model, then compiling the model.

### Set up the layers

The basic building block of a neural network is the *layer*. A layer extracts a representation from the data fed into it. Hopefully, a series of connected layers results in a representation that is meaningful for the problem at hand.

Much of deep learning consists of chaining together simple layers. Most layers, like Keras `layer_dense`, have internal parameters which are adjusted ("learned") during training.

The network we want to build has three layers:

* **input** `layer_flatten()`: This layer transforms the images from a 2D array of 28 $\times$ 28 pixels to a 1D array of 784 pixels (28\*28). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn, as it only reformats the data.

* **"hidden"** `layer_dense()`: A densely connected layer of 128 neurons with ReLU activation. Each neuron (or node) takes input from all 784 nodes in the previous layer, weights that input according to parameters that are learned during training, and outputs a single value to the next layer.

* **output** `layer_dense()`: A 10-node ***softmax*** layer, with each node representing a class of clothing. As in the previous layer, each node takes input from the 128 nodes in the layer before it. Each node weights the input according to learned parameters and then outputs a value in the range `[0, 1]`, representing the probability that the image belongs to that class. The sum of all 10 node values is 1.
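The arithmetic behind these three layers can be sketched in base R, without keras. This is only an illustration under stated assumptions: the weights are random rather than learned, `relu()` and `softmax()` are written by hand, and the input is random noise standing in for a normalized 28 $\times$ 28 image. It shows the shape at each step and where the parameter counts come from.

```r
set.seed(42)

relu <- function(z) pmax(z, 0)
softmax <- function(z) {
  e <- exp(z - max(z))   # subtract the max for numerical stability
  e / sum(e)
}

# Random stand-in for one normalized 28 x 28 image
toy_img <- matrix(runif(28 * 28), nrow = 28)

# "Flatten": lay the rows end to end; t() is needed because
# as.vector() reads an R matrix column by column
flat <- as.vector(t(toy_img))                          # length 784

# Hidden dense layer: 128 units, random weights W1 (784 x 128), zero biases b1
W1 <- matrix(rnorm(784 * 128, sd = 0.05), nrow = 784)
b1 <- rep(0, 128)
hidden <- relu(as.vector(flat %*% W1) + b1)            # length 128

# Output dense layer: 10 units followed by softmax
W2 <- matrix(rnorm(128 * 10, sd = 0.05), nrow = 128)
b2 <- rep(0, 10)
probs_out <- softmax(as.vector(hidden %*% W2) + b2)    # 10 probabilities

# Trainable parameters per dense layer = weights + biases
param_counts <- c(hidden = length(W1) + length(b1),
                  output = length(W2) + length(b2))
param_counts   # hidden 100480, output 1290
```

These counts (100,480 and 1,290) are what you should expect `summary(model)` to report for the two dense layers once the model below is complete, since each dense layer has one weight per input/unit pair plus one bias per unit.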

**The code below already contains the `layer_flatten()` input layer. Add the other two `layer_dense()` layers to complete the model.**
* The hidden layer uses an activation function, so you'll need to add the extra argument `activation = "relu"`.
* The output layer also uses an activation function, so you'll need to add the extra argument `activation = "softmax"`.

For more details on implementing these layers, have a look at the [R keras core layer documentation](https://keras.rstudio.com/reference/index.html#section-core-layers).

In [None]:
model <- keras_model_sequential()
model %>%
  # Input layer
  layer_flatten(input_shape = c(28, 28)) %>%
  # TODO: Add dense hidden layer
    
  # TODO: Add dense output layer

#### Exercise 2.1 Solution (5 mins)

The solution for the exercise can be found [here](https://colab.research.google.com/github/rses-dl-course/rses-dl-course.github.io/blob/master/notebooks/R/solutions/E.2.1.ipynb)



## Compile the model

Before the model is ready for training, it needs a few more settings. These are added during the model’s compile step:

* Optimizer: This is **how the model is updated** based on the data it sees and its loss function. _Note that here we specify the optimizer using a character string, so it is used with its default settings. To change any of the defaults (e.g. the learning rate), use the [function form of the optimizer](https://keras.rstudio.com/reference/optimizer_adam.html)_.
* Loss function: This **measures how far off the model's predictions are** during training. We want to minimize this function to "steer" the model in the right direction. Because our targets are categories stored as a one-dimensional vector of integers (rather than one-hot vectors), we use [`sparse_categorical_crossentropy`](https://keras.rstudio.com/reference/loss_mean_squared_error.html#section-categorical-crossentropy) as our loss function.
* Metrics: Used to **monitor the training and testing steps**. The following example uses accuracy, the fraction of the images that are correctly classified.
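To make the loss and the metric concrete, here is a small sketch with made-up softmax outputs for 3 images over 4 classes (the numbers are hypothetical, not model output). Sparse categorical cross-entropy takes the negative log of the probability assigned to the true class and averages it over the examples; it equals ordinary categorical cross-entropy computed on one-hot targets, which is why integer labels can be used directly. Accuracy is the fraction of rows whose most probable class matches the label.

```r
# Hypothetical softmax outputs: 3 images (rows) x 4 classes (columns)
example_probs <- matrix(c(0.70, 0.10, 0.10, 0.10,
                          0.20, 0.50, 0.20, 0.10,
                          0.25, 0.25, 0.40, 0.10),
                        nrow = 3, byrow = TRUE)
example_labels <- c(0, 1, 3)   # integer targets, 0-indexed like train_labels

# Sparse categorical cross-entropy: mean of -log(probability of the true class)
true_class_probs <- example_probs[cbind(seq_along(example_labels), example_labels + 1)]
sparse_cce <- -mean(log(true_class_probs))

# Categorical cross-entropy on one-hot targets gives the same value
one_hot <- diag(ncol(example_probs))[example_labels + 1, ]
cce <- -mean(rowSums(one_hot * log(example_probs)))

# Accuracy: fraction of rows whose highest-probability class equals the label
predicted <- apply(example_probs, 1, which.max) - 1   # 0 1 2
toy_accuracy <- mean(predicted == example_labels)     # 2 of 3 correct
```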

In [None]:
model %>% compile(
  optimizer = 'adam', 
  loss = 'sparse_categorical_crossentropy',
  metrics = c('accuracy')
)

## Train the model

Training the neural network model requires the following steps:

* Feed the training data to the model — in this example, the `train_images` and `train_labels` arrays.
* The model learns to associate images and labels.
* We ask the model to make predictions about a test set — in this example, the `test_images` array. We verify that the predictions match the labels from the `test_labels` array.

To start training, call the [`fit`](https://keras.rstudio.com/reference/fit.keras.engine.training.Model.html) method, which "fits" the model to the training data.

Some `fit` settings you should be aware of:
- `shuffle = TRUE` (the default) randomizes the order in which the data is fed to the model each epoch, so the model cannot learn anything from the order of the examples.
- Training proceeds in batches of data. If unspecified, `batch_size` defaults to 32.
- `epochs = 5`, which we set explicitly below, limits training to 5 full passes over the training dataset, so a total of 5 * 60000 = 300000 examples.
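A quick sketch of the arithmetic behind these settings, plus an illustration of what shuffling amounts to. The shuffling part uses a hypothetical 10-example dataset and a batch size of 4; it illustrates the idea and is not how keras implements shuffling internally.

```r
n_train    <- 60000
batch_size <- 32
epochs     <- 5

steps_per_epoch <- ceiling(n_train / batch_size)   # 1875 weight updates per epoch
total_examples  <- epochs * n_train                # 300000 examples seen in total

# Shuffling, sketched on 10 hypothetical examples with batches of 4:
# draw a fresh random order for the epoch, then cut it into consecutive batches
set.seed(1)
order_epoch <- sample(10)
batches <- split(order_epoch, ceiling(seq_along(order_epoch) / 4))
lengths(batches)   # 4 4 2: the last batch is smaller
```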

In [None]:
history <- model %>% fit(train_images, train_labels, epochs = 5)

As the model trains, the loss and accuracy metrics are displayed (this live display works when you run the notebook locally, but not with R in Google Colab notebooks). This model reaches an accuracy of about 0.91 (or 91%) on the training data.

In [None]:
history

## Evaluate Accuracy

Next, compare how the model performs on the test dataset:

In [None]:
score <- model %>% evaluate(test_images, test_labels)
cat('Test loss:', score["loss"], "\n")
cat('Test accuracy:', score["accuracy"], "\n")

It turns out that the accuracy on the test dataset is a little lower than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of (very minor) overfitting. Overfitting is when a machine learning model performs worse on new data than on its training data.

## Make predictions and explore

With the model trained, we can use it to make predictions about some images. Let's subset the first 32 images and labels from the test dataset.

In [None]:
pred_images <- test_images[1:32, , , drop = FALSE]
pred_labels <- test_labels[1:32]

Next, we can use our model and the keras `predict` function to generate predictions for our image subset:

In [None]:
preds_probs <- model %>%
  predict(x = pred_images)

head(preds_probs)

Here, because we used softmax activation on our output layer, the model returns the probability associated with each label for each image in our subset, as a matrix with 10 columns (one per class) and 32 rows (one per image in our subset). Note that `head()` only prints the first 6 rows.

To get a class prediction, we select the class (column) with the highest probability by applying the `which.max` function to each row of our prediction matrix. We also need to subtract 1 from each returned column index: R column indices start at 1, whereas our class labels start at 0.

In [None]:
apply(preds_probs, 1, which.max) - 1
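The same row-wise lookup can be wrapped in a small helper and checked on a toy matrix without a trained model. `probs_to_classes()` is a hypothetical name; it uses base R's `max.col()`, which finds the column of the maximum in each row in one vectorized call. With `ties.method = "first"` it picks the first maximum, matching `which.max()`.

```r
# Hypothetical helper: probability matrix (rows = images, columns = classes)
# to 0-indexed class predictions
probs_to_classes <- function(probs) {
  max.col(probs, ties.method = "first") - 1
}

toy_probs <- rbind(c(0.05, 0.05, 0.90),
                   c(0.60, 0.30, 0.10))

probs_to_classes(toy_probs)              # 2 0
apply(toy_probs, 1, which.max) - 1       # same result
```

Applied to the real predictions, `probs_to_classes(preds_probs)` should give the same vector as the cell above.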

This approach is useful if you want the probability distribution across all classes for each prediction. If you are only interested in the predicted class, keras provides a shorthand function, `predict_classes()`, that returns predicted classes directly. Note that `predict_classes()` has been deprecated and was removed in TensorFlow 2.6, so with newer versions you will need the `which.max` approach above instead.

In [None]:
preds_classes <- model %>%
  predict_classes(x = pred_images)

preds_classes

Let's take a look at the first prediction:

In [None]:
preds_classes[1]

Let's see what class that is:

In [None]:
class_names[preds_classes[1] + 1]

Our model predicts that this image is an **Ankle boot**, or `class_names[10]`.

We can graph this to look at the full set of 10 class predictions. First let's write some additional plotting functions.

In [None]:
library(cowplot)

# Function to plot bar plot of probabilities across each class
plot_preds_bar <- function(probs, label){

  plot_data <- data.frame(class = as.factor(0:9), probability = probs, pred_label = FALSE, label = FALSE)
  plot_data$pred_label[which.max(probs)] <- TRUE
  plot_data$label[label + 1] <- TRUE
  plot_data$legend <- "not predicted - correct"
  plot_data$legend[plot_data$pred_label & plot_data$label] <- "predicted - correct"
  plot_data$legend[plot_data$pred_label & !plot_data$label] <- "predicted - incorrect"
  plot_data$legend[plot_data$label & !plot_data$pred_label] <- "actual - incorrect"

  ggplot(plot_data, aes(x = class, y = probability, fill = legend)) + 
    geom_bar(stat = "identity") +
    scale_fill_manual(values = c("predicted - correct" = "blue",
                                 "predicted - incorrect" = "red",
                                 "actual - incorrect" = "blue",
                                 "not predicted - correct" = "grey"),
                      guide = "none") +
    theme_classic()
}
# Function to plot raw image and bar plot of probabilities across each class
plot_preds <- function(image, probs, label, class_names){

  # Predicted class (0-indexed) and whether it matches the true label
  pred_label <- which.max(probs) - 1
  correct_label <- label == pred_label
  title_colour <- if (correct_label) "blue" else "red"

  # Title: predicted class, its probability, and the true class in brackets
  title <- ggdraw() +
    draw_label(
      paste0(class_names[pred_label + 1], " ",
             format(max(probs) * 100, digits = 3),
             "% (", class_names[label + 1], ")"),
      fontface = 'bold',
      x = 0,
      hjust = 0,
      color = title_colour) +
    theme(plot.margin = margin(0, 0, 0, 7))

  # Generate the two plots
  p1 <- plot_fashionmnist_image(image)
  p2 <- plot_preds_bar(probs, label)

  # Create row of plots
  plot_row <- plot_grid(p1, p2)

  # Bring together title and row of images
  plot_grid(title, plot_row, ncol = 1, rel_heights = c(0.1, 1))
}


In [None]:
options(repr.plot.width = 10, repr.plot.height = 5)
i <- 1
plot_preds(pred_images[i,,], preds_probs[i,], pred_labels[i], class_names)

In [None]:
i <- 5
plot_preds(pred_images[i,,], preds_probs[i,], pred_labels[i], class_names)

In [None]:
options(repr.plot.width = 14, repr.plot.height = 8)
j <- 16
plot_list <- vector("list", j)
for (i in 1:j) {
  plot_list[[i]] <- plot_preds(pred_images[i,,], preds_probs[i,],
                               pred_labels[i], class_names)
}
plot_grid(plotlist = plot_list, ncol = 4)

Finally, use the trained model to make a prediction about a single image.

In [None]:
image <- test_images[1,,,drop = FALSE]

In [None]:
preds_class_single <- model %>% predict_classes(image)

preds_class_single

# Exercise 2.2

Experiment with different models and see how the accuracy results differ. In particular, try changing the following:
*   Set the number of training epochs to 1.
*   Change the number of neurons in the dense layer that follows the flatten layer. For example, try values from very low (e.g. 10) up to 512 and see how accuracy changes.
*   Add additional dense layers between the flatten layer and the final `layer_dense(units = 10)`, and experiment with different numbers of units in these layers.
*   Don't normalize the pixel values, and see what effect that has.


# Exercise 2.3 - CIFAR-10 Dataset (15 mins)

Let's apply what we've learned to another dataset. The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 60,000 32 $\times$ 32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

As our input is a colour image, we now have 3 values per pixel. When flattened, our input array is 3,072 elements long ($32\times32\times3$).
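The larger input also makes the hidden layer much bigger. A quick worked calculation, assuming the same 128-unit hidden layer as in the Fashion MNIST model:

```r
# Flattened input length for a 32 x 32 RGB image
input_len <- 32 * 32 * 3                   # 3072

# Weights + biases in a 128-unit dense layer fed by that input
hidden_params <- input_len * 128 + 128     # 393344

# The Fashion MNIST hidden layer, for comparison (784 inputs)
fashion_hidden_params <- 784 * 128 + 128   # 100480

hidden_params / fashion_hidden_params      # about 3.9 times as many parameters
```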

* What happens when you use the same network as above?
* What is the best accuracy that you can achieve?

The dataset is also available through the Keras R package. We can load it using [**`dataset_cifar10()`**](https://keras.rstudio.com/reference/dataset_cifar10.html).

In [None]:
cifar10 <- dataset_cifar10()


Let's inspect the data. It has a similar structure to the Fashion MNIST dataset, but the images (`x`) are now 4-dimensional arrays, with an extra dimension for the colour channels (RGB). The size of the images is **32 $\times$ 32**.

In [None]:
str(cifar10)

The number of labels is again 10 and each label (0-9) maps on to one of the following classes.

In [None]:
cifar_labels <- c('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog',
 'horse', 'ship', 'truck')

Let's create a plotting function that can accommodate the RGB nature of the images, and explore the first 21 images in the training dataset.

In [None]:
plot_raster_image <- function(image_array, label, max_value = 1, max_dim = 3){
  image_array %>%
  array_reshape(dim = c(dim(.)[1:max_dim])) %>%
  as.raster(max = max_value) %>%
  plot()
  title(main = label)
}

In [None]:
# set plotting options
options(repr.plot.width = 16, repr.plot.height = 8)

# set number of images to plot
n <- 21

# Loop plotting over n images
layout(matrix(1:n, ncol = 7), respect = FALSE)
for(i in 1:n){
  plot_raster_image(cifar10$train$x[i,,,],
                    cifar_labels[cifar10$train$y[i] + 1],
                    max_value = 255)
}


**Now that we've got a dataset, use what you've learned in this lab to build a model for classifying these images.**
* Don't forget to preprocess your data.
* Stick to the same model structure as before.
* Remember to **check your input shape, as it's different from the Fashion MNIST dataset**.

In [None]:
# TO DO - Build a model to classify CIFAR-10 images


## E2.3 Solution

The solution for the exercise can be found [here](https://colab.research.google.com/github/rses-dl-course/rses-dl-course.github.io/blob/master/notebooks/R/solutions/E.2.3.ipynb)

# Notice
Remember to enable the GPU to make everything run faster (Runtime -> Change runtime type -> Hardware accelerator -> GPU).
Also, if you run into trouble, simply reset the entire environment and start from the beginning:
*   Edit -> Clear all outputs
*   Runtime -> Reset all runtimes