# IMDB sentiment analysis with tf.keras, tf.data and TensorFlow Hub

In [None]:
import sys
import os
import tempfile
import shutil
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

import numpy as np
%matplotlib inline
import random
import matplotlib.pyplot as plt

In [None]:
# Python version 3.5 or 3.6
assert sys.version_info >= (3, 5)
assert sys.version_info < (3, 7)
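# TensorFlow 2.0 or later (compared numerically: as plain strings, "10.0" would sort before "2.0")
assert tuple(int(part) for part in tf.__version__.split(".")[:2]) >= (2, 0)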

The goal of this notebook is sentiment analysis on the IMDB movie reviews dataset: classifying each review as positive or negative.

# Input Data Management

## Download the dataset

We will use [TensorFlow Datasets](https://www.tensorflow.org/datasets), an API that gives access to a large collection of ready-to-use datasets, exposed as `tf.data.Dataset` objects for performant input pipelines.

We will load the `imdb_reviews` dataset.

Documentation : https://www.tensorflow.org/datasets/datasets#imdb_reviews

In [None]:
train_data, test_data = tfds.load(name="imdb_reviews", split=["train", "test"], 
                                  batch_size=-1, as_supervised=True)

Convert to NumPy arrays for further manipulation.

In [None]:
train_examples, train_labels = tfds.as_numpy(train_data)
x_test, y_test = tfds.as_numpy(test_data)

Split the training data into training and validation sets: the first 10000 examples are held out for validation.

In [None]:
x_train = train_examples[10000:]
x_val = train_examples[:10000]

y_train = train_labels[10000:]
y_val = train_labels[:10000]

## Visualize the data

In [None]:
print("Training entries: {}, test entries: {}".format(len(train_examples), len(x_test)))

In [None]:
train_examples[:2]

In [None]:
train_labels[:10]

## Create a tf.data Dataset

In [None]:
NUM_EPOCHS = 15
BATCH_SIZE = 100

We will now go back to a `tf.data.Dataset` format.

For the train set, the following steps are performed :
- Convert to dataset with the `from_tensor_slices` function
- Shuffle with buffer size 10000
- Batch
- Repeat

> <div class="mark">Create the input pipeline for the train set</div><i class="fa fa-lightbulb-o "></i>

Documentation : 
- https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#from_tensor_slices
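
As an illustration of the four steps listed above, here is a minimal sketch on a toy dataset. The arrays `toy_texts` and `toy_labels`, as well as the small buffer and batch sizes, are assumptions chosen for readability; they stand in for `x_train`, `y_train`, the buffer size of 10000 and `BATCH_SIZE`.

```python
import numpy as np
import tensorflow as tf

# Toy stand-ins (assumptions) for x_train / y_train.
toy_texts = np.array([b"good", b"bad", b"great", b"awful", b"fine", b"poor"])
toy_labels = np.array([1, 0, 1, 0, 1, 0])

# 1. One dataset element per (text, label) pair.
toy_ds = tf.data.Dataset.from_tensor_slices((toy_texts, toy_labels))
# 2. Shuffle; a buffer at least as large as the dataset gives a full shuffle.
toy_ds = toy_ds.shuffle(buffer_size=6)
# 3. Group consecutive elements into batches of 2.
toy_ds = toy_ds.batch(2)
# 4. Repeat indefinitely, so the iterator never runs out of data.
toy_ds = toy_ds.repeat()

# Six examples give three batches per pass; the fourth batch comes from a second pass.
batch_shapes = [texts.numpy().shape for texts, labels in toy_ds.take(4)]
print(batch_shapes)
```

Because of the final `repeat`, the dataset is infinite, so the number of batches that make up one epoch has to be given explicitly at training time. That is the role of `steps_per_epoch` later in the notebook.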

In [None]:
# Create Dataset
ds_train = tf.data.Dataset. # TODO
# Shuffle
ds_train = # TODO
# Batch
ds_train = # TODO
# Repeat
ds_train = # TODO

The test set only requires the following steps :
- Convert to dataset with the `from_tensor_slices` function
- Batch

> <div class="mark">Create the input pipeline for the test set</div><i class="fa fa-lightbulb-o "></i>

Documentation : 
- https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#from_tensor_slices

In [None]:
# Create Dataset
ds_test = tf.data.Dataset. # TODO
# Batch
ds_test = # TODO

# Model Management

## Build the model

In [None]:
module_url = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"

Our Neural Network will now be composed of the following layers : 
- `KerasLayer` : pass the `module_url`, `output_shape=[OUTPUT_SHAPE]`, `input_shape=[]`, `dtype=tf.string` and set `trainable` to False
- `Dense` Layer : 16 neurons, relu activation
- `Dense` Layer : 1 neuron, sigmoid activation

> <div class="mark">Create the model</div><i class="fa fa-lightbulb-o "></i>

Documentation : 
- https://www.tensorflow.org/hub/api_docs/python/hub/KerasLayer?hl=en
- https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense
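
To get a feel for the two `Dense` layers described above, the sketch below builds only the classification head and feeds it random 20-dimensional vectors. This is an assumption made so that the example runs without downloading the hub module: in the notebook, these vectors come from the `KerasLayer`. The names `head` and `fake_embeddings` are hypothetical.

```python
import numpy as np
import tensorflow as tf

EMBEDDING_DIM = 20  # same value as OUTPUT_SHAPE in the notebook

# Classification head only; the hub embedding layer is deliberately left out.
head = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(EMBEDDING_DIM,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Random vectors standing in (assumption) for the embeddings of 4 reviews.
fake_embeddings = np.random.rand(4, EMBEDDING_DIM).astype("float32")
probabilities = head.predict(fake_embeddings)

print(probabilities.shape)   # one probability per review
print(head.count_params())   # (20 * 16 + 16) + (16 * 1 + 1) trainable weights
```

The sigmoid output is a single value between 0 and 1 per review, which is what the `binary_crossentropy` loss used at compilation expects.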

In [None]:
OUTPUT_SHAPE = 20

model = tf.keras.models.Sequential()
# TODO


To make our network ready for training, we need to pick three more things as part of the compilation step:

* A loss function: this is how the network measures how well it is doing on its training data, and thus how it is able to steer itself in the right direction.
* An optimizer: this is the mechanism through which the network updates itself based on the data it sees and its loss function.
* Metrics to monitor during training and testing. Here we only care about accuracy (the fraction of the reviews that were correctly classified).

You will implement the following compilation step for your Neural Network : 
- "adam" optimizer
- "binary_crossentropy" loss
- metric : "accuracy"

Documentation : https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#compile
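
To make the loss and the metric concrete, here is a simplified sketch that computes binary cross-entropy and accuracy by hand. The labels and probabilities are made-up values, the 0.5 decision threshold is an assumption, and the helper functions are hypothetical: Keras applies additional numerical safeguards that are omitted here.

```python
import math

# Toy labels and predicted probabilities (assumed values, not model output).
y_true = [1, 0, 1, 0]
y_pred = [0.9, 0.2, 0.6, 0.4]

def binary_crossentropy(labels, probs):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(labels, probs))
    return hits / len(labels)

loss = binary_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(round(loss, 4), acc)
```

All four toy predictions fall on the correct side of the threshold, so the accuracy is perfect while the loss is still positive: the loss also penalises predictions that are correct but not confident.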

### Compile the model

In [None]:
optimizer = tf.optimizers.Adam()

model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

Summarize the model

In [None]:
model.summary()

## Train the model

We are now ready to train our network, which in Keras is done via a call to the `fit` method of the network: we "fit" the model to its training data.

You will fit the network with the following configurations :
- `x`: ds_train
- `epochs` : 15 (passes on the whole dataset)
- `steps_per_epoch`: 150 steps
- `validation_data`: ds_test
- `validation_steps`: 10
- `callbacks`: tensorboard
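
The step counts above can be checked with a little arithmetic. The sketch assumes the standard IMDB training split of 25000 reviews, of which 10000 were held out for validation earlier in the notebook; `TRAIN_EXAMPLES` and the other derived names are introduced here for illustration only.

```python
TRAIN_EXAMPLES = 25000 - 10000  # assumed IMDB train split minus the 10000 held-out examples
BATCH_SIZE = 100
NUM_EPOCHS = 15
VALIDATION_STEPS = 10

# One epoch covers the whole training set when steps * batch size == number of examples.
steps_per_epoch = TRAIN_EXAMPLES // BATCH_SIZE
# Total number of weight updates over the whole training run.
total_steps = NUM_EPOCHS * steps_per_epoch
# Number of examples seen in each validation pass.
validation_examples = VALIDATION_STEPS * BATCH_SIZE

print(steps_per_epoch, total_steps, validation_examples)
```

Under these assumptions, 150 steps of 100 reviews correspond to exactly one pass over the training set, and each validation pass only looks at the first 1000 reviews of `ds_test`.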

You will also add a TensorBoard callback, which writes training logs that TensorBoard can display, to observe how the training is progressing.

In [None]:
LOG_DIR = './tensorboard/tf_keras_data_hub'

tensorboard = tf.keras.callbacks.TensorBoard(log_dir=LOG_DIR, histogram_freq=1, update_freq="batch")

> <div class="mark">Fit the model with the above information.</div><i class="fa fa-lightbulb-o "></i>

In [None]:
shutil.rmtree(LOG_DIR, ignore_errors=True)

model. # TODO

During training, Keras displays the loss and the accuracy of the network over the training data. Because validation data is provided, the same two quantities are also reported for the validation data at the end of each epoch.

## Performance visualisation with TensorBoard

After executing the following command, click on the suggested link (virtualenv users) or open a new tab and go to 127.0.0.1:6006 (Docker users) to open TensorBoard.

Once finished, click on the STOP button.

In [None]:
!tensorboard --logdir "tensorboard/" --port 6006

# Model Performance Evaluation

Now let's check that our model performs well on the test set too.

You can do this by calling the `evaluate` method of your network on the test set (use 300 for the `steps` argument).

Documentation : https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential#evaluate
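
A quick sanity check on the `steps` argument may be useful. Since `ds_test` is batched but not repeated, it yields a finite number of batches. The sketch below assumes the standard IMDB test split of 25000 reviews; `TEST_EXAMPLES`, `available_batches` and `requested_steps` are names introduced here for illustration.

```python
import math

TEST_EXAMPLES = 25000  # assumed size of the IMDB test split
BATCH_SIZE = 100

# A non-repeated dataset yields ceil(N / batch size) batches, the last one possibly smaller.
available_batches = math.ceil(TEST_EXAMPLES / BATCH_SIZE)
requested_steps = 300

print(available_batches, requested_steps <= available_batches)
```

If the requested number of steps exceeds the number of available batches, Keras may stop early and emit a warning about running out of data. The exact behaviour depends on the TensorFlow version, so it is worth comparing both numbers before calling `evaluate`.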

> <div class="mark">Evaluate the model performance on test set</div><i class="fa fa-lightbulb-o "></i>

In [None]:
model. # TODO