Tips:

* Enable a GPU in Colab before running this notebook. *Edit -> Notebook settings -> Hardware accelerator -> GPU.*

* Should you need to reset your Colab environment to a clean state, use *Runtime -> Factory reset runtime*. 

# International Women's Day 2020: Training Neural Networks

Welcome! This notebook contains tutorials and short exercises. 

IWD workshops are of different lengths around the world (unless yours is a full day, you are not expected to complete this entire notebook). Your instructor will guide you through the sections we'll explore today. Our goals are for you to dive in, have fun, and to gain experience training neural networks. 

Note: If you're new to Deep Learning, this is a *lot* of material to see all at once. Of course, no one is expected to learn it all in one day. There are educational resources at the end for you to continue learning, and you can complete the sections we don't get to today at home. 

Here's an outline of what we'll cover.

**Part 1: Training neural networks**

* You'll train a neural network to classify handwritten digits. This is the "hello world" of computer vision, and a great place to begin if you're new to the subject.

* You'll train a convolutional neural network to classify images of cats and dogs, using a real-world dataset.

* Next, you'll use techniques like data augmentation and Dropout to reduce overfitting.

**Part 2: Interpreting neural networks**

* (Advanced) In this section, you will see a minimal version of [DeepDream](https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html), an experiment that asks a neural network to show us some of the complex features it has learned to detect. Deep Learning is a type of representation learning (a CNN automatically learns a feature hierarchy from pixels -> lines -> shapes -> textures). Each layer learns increasingly complex features, with later layers eventually responding to complex objects (like eyes). You'll be able to see some of those after running this code.

**Instructions**:

* Each section contains a tutorial, then a short exercise. Your instructor will walk you through the tutorial, then you will work on the exercise. You can find solutions for each exercise in a comment.

Let's get started!

In [0]:
%tensorflow_version 2.x
import tensorflow as tf
print("You are using TensorFlow version", tf.__version__)
if len(tf.config.list_physical_devices('GPU')) > 0:
  print("You have a GPU enabled.")
else:
  print("Enable a GPU before running this notebook.")

Note: Colab has a variety of GPU types available (each new instance is assigned one randomly, depending on availability). To see which type of GPU you have, you can run ```!nvidia-smi``` in a code cell. Some are quite fast!

# Tutorial 1: MNIST

Training an image classifier on the MNIST dataset of handwritten digits is considered the "hello world" of computer vision. In this tutorial, you will download the dataset, then train a linear model, a neural network, and a deep neural network to classify it. 

The code below is based on a longer tutorial on [tensorflow.org](https://www.tensorflow.org/tutorials/keras/classification), which you can read later for more background.

In [0]:
from tensorflow import keras
import matplotlib.pyplot as plt

## Download the MNIST dataset
MNIST contains 70,000 grayscale images in 10 categories. The images are low resolution (28 by 28 pixels).

In [0]:
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

There are 60,000 images in the training set:

In [0]:
print(train_images.shape)

And 10,000 in the testing set:

In [0]:
print(test_images.shape)

Each label is an integer from 0 to 9:

In [0]:
print(train_labels)

## Preprocess the data
The pixel values in the images range between 0 and 255. Let's convert them to between 0 and 1 before feeding them to the network. To do so, divide the values by 255. It's important that the training set and the testing set are preprocessed in the same way:

In [0]:
train_images = train_images / 255.0
test_images = test_images / 255.0
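To see what this division does, here is a small NumPy sketch on a made-up array of pixel values (the array `toy_pixels` is illustrative, not taken from MNIST). Dividing an integer array by a float gives a floating point array with values between 0 and 1.

```python
import numpy as np

# A hypothetical 2x3 "image" with 8-bit pixel values, like those in MNIST.
toy_pixels = np.array([[0, 64, 128],
                       [192, 255, 32]], dtype=np.uint8)

# Dividing by a float converts the result to floating point.
toy_scaled = toy_pixels / 255.0

print(toy_scaled.dtype)
print(toy_scaled.min(), toy_scaled.max())  # 0.0 1.0
```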

Let's display the first 25 images from the training set, and display the label below each image.


In [0]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])
plt.show()

## Build the model
Building the neural network requires configuring the layers of the model, then compiling the model. We will start with a single Dense layer. 

### Set up the layers

The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. For example, the first layer in a network might learn to detect edges (combinations of pixels), and the next layer may learn to detect lines (combinations of edges). Most of deep learning consists of chaining together simple layers. Most layers, such as [tf.keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense), have parameters that are learned during training.

### Create a multiclass logistic regression model

This model contains a single Dense layer. It is not a neural network yet; it is equivalent to multiclass logistic regression. If you add another layer, it will become a neural network. You will do that in an exercise below.

In [0]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(10, activation='softmax')
])

The first layer in this network, [tf.keras.layers.Flatten](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten), transforms the format of the images from a two-dimensional array (of 28 by 28 pixels) to a one-dimensional array (of 28 * 28 = 784 pixels). Think of this layer as unstacking rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data. This is necessary because a Dense layer expects each example as a one-dimensional array.
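The unstacking described above can be sketched with a NumPy reshape. This is only an illustration of the idea on a made-up batch (`toy_batch` is a hypothetical name), not the Keras implementation.

```python
import numpy as np

# A hypothetical batch of 2 images, each 28 by 28 pixels.
toy_batch = np.arange(2 * 28 * 28, dtype=np.float32).reshape(2, 28, 28)

# Keep the batch dimension and line up each image's rows end to end.
toy_flat = toy_batch.reshape(len(toy_batch), -1)

print(toy_flat.shape)  # (2, 784)

# The second row of the first image starts right after its first 28 pixels.
print(toy_flat[0, 28] == toy_batch[0, 1, 0])  # True
```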

After the pixels are flattened, this model consists of a single Dense layer. This is a densely connected, or fully connected, neural layer. The Dense layer has 10 neurons with softmax activation. This returns an array of 10 probability scores that sum to 1. 

After classifying an image, each neuron contains a score that indicates the probability that the current image belongs to the corresponding class (one of the 10 digits).
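If you are curious what a Dense layer with softmax activation computes, here is a minimal NumPy sketch. The weights and the input are random, made-up values (a trained layer would have learned its weights), and the helper `softmax` is written here for illustration.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

rng = np.random.default_rng(0)

x = rng.random(784)                         # one flattened image (made-up pixel values)
W = rng.normal(scale=0.01, size=(784, 10))  # one weight per pixel per class
b = np.zeros(10)                            # one bias per class

logits = x @ W + b       # the weighted sums computed by a 10-unit Dense layer
probs = softmax(logits)  # the softmax activation

print(probs.shape)       # (10,)
print(probs.sum())       # very close to 1
print(np.argmax(probs))  # index of the most likely class
```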

## Compile the model

Before the model is ready for training, it needs a few more settings. These are added during the model's compile step:

*Loss function*: This measures how far the model's predictions are from the correct labels during training. You want to minimize this function to "steer" the model in the right direction.

*Optimizer*: This is how the model is updated based on the data it sees and its loss function.

*Metrics*: Used to monitor the training and testing steps. The following example uses accuracy, the fraction of the images that are correctly classified.

In [0]:
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
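To make the role of the optimizer more concrete, here is a sketch of plain gradient descent on a one-parameter toy loss. Adam is more elaborate than this, and the loss function, starting point, and learning rate below are all made up for illustration; the sketch only shows the basic idea of repeatedly stepping "downhill".

```python
# Minimize the toy loss (w - 3)**2 with plain gradient descent.
def toy_loss(w):
    return (w - 3.0) ** 2

def toy_gradient(w):
    return 2.0 * (w - 3.0)  # derivative of the toy loss with respect to w

w = 0.0              # initial guess
learning_rate = 0.1  # step size (an arbitrary value for this example)

for step in range(50):
    w -= learning_rate * toy_gradient(w)  # take a small step downhill

print(w, toy_loss(w))  # w ends up very close to 3, where the loss is smallest
```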

## Train the model
Training the neural network model requires the following steps:

1. Feed the training data to the model. In this example, the training data is in the ```train_images``` and ```train_labels``` arrays.

1. The model learns to associate images and labels.

1. You ask the model to make predictions about a test set—in this example, the ```test_images``` array.

1. Verify that the predictions match the labels from the ```test_labels``` array.

To begin training, call the ```model.fit``` method — so called because it "fits" the model to the training data:

In [0]:
model.fit(train_images, train_labels, epochs=10)

As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.90 (or 90%) on the training data.

## Evaluate accuracy
Next, compare how the model performs on the test dataset:

In [0]:
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

It turns out that the accuracy on the test dataset is a little less than the accuracy on the training dataset. This gap between training accuracy and test accuracy is a sign of overfitting. Overfitting is when a machine learning model performs worse on new, previously unseen inputs than on the training data. An overfitted model has "memorized" details of the training data that do not carry over to the testing data.

You will learn more about how to mitigate overfitting in the next tutorial.

## Make predictions
With the model trained, you can use it to make predictions about some images.

In [0]:
predictions = model.predict(test_images)

Here, the model has predicted the label for each image in the testing set. Let's take a look at the first prediction:

In [0]:
print(predictions[0])

A prediction is an array of 10 numbers. They represent the model's "confidence" that the image corresponds to each of the 10 digits. You can see which label has the highest confidence value:

In [0]:
print(tf.argmax(predictions[0]))
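The accuracy metric can be computed by hand in the same way: take the most confident class for each image and compare it with the label. Here is a sketch on made-up values (`toy_predictions` and `toy_labels` are hypothetical, not outputs of your model).

```python
import numpy as np

# Hypothetical softmax outputs for 4 images and 3 classes.
toy_predictions = np.array([[0.1, 0.8, 0.1],
                            [0.7, 0.2, 0.1],
                            [0.2, 0.3, 0.5],
                            [0.6, 0.3, 0.1]])
toy_labels = np.array([1, 0, 2, 1])

predicted_labels = np.argmax(toy_predictions, axis=1)  # most confident class per image
accuracy = np.mean(predicted_labels == toy_labels)     # fraction classified correctly

print(predicted_labels)  # [1 0 2 0]
print(accuracy)          # 0.75
```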

# Exercise 1: MNIST

In the above tutorial, you used a single Dense layer to reach about 90% accuracy. Now, let's transform your model into a neural network to reach about 95% accuracy. To do so, add a second Dense layer, then compile and train the model.

In [0]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    # TODO: your code here
    # Add a Dense layer with 128 neurons and relu activation.
    keras.layers.Dense(10, activation='softmax')
])

In [0]:
# Now, compile your new model by uncommenting this code 
# model.compile(optimizer='adam',
#              loss='sparse_categorical_crossentropy',
#              metrics=['accuracy'])

In [0]:
# Finally, train your model by uncommenting this code
# model.fit(train_images, train_labels, epochs=10)

**Questions to think about:**
- How does the accuracy of your neural network compare with your first model? 
- What is the effect of epochs on accuracy? On validation accuracy?
- What is the effect of the number of neurons per layer?

After completing this exercise, you will have trained a neural network. Deep learning is "code-light, and concept-heavy". That is to say, while there are only a few lines you need to write to train a DNN, you will spend a good deal of time studying all the different concepts involved, like gradient descent.

Now, let's work with a real-world dataset of thousands of images of cats and dogs.

## Solution

This is a neural network:

```
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
```

This is a deep neural network:

```
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
```

This is a deeper neural network =D

```
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
```

# Game break 1: Teachable Machine
If you'd like to, now would be a great time to try the Teachable Machine.

https://teachablemachine.withgoogle.com/

# Tutorial 2: Cats and Dogs
In this tutorial, you will train a convolutional neural network to classify images of cats and dogs, using a real-world dataset that you will download and then read from disk.

## Download and explore the dataset

One nice thing about running this notebook in Colab: although you are downloading large files, the download happens on Google Cloud Platform rather than over your local WiFi connection. This means downloads will usually be fast.

In [0]:
import os

In [0]:
origin = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=origin, extract=True)
path_to_folder = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

The unzipped dataset has the following directory structure:

<pre>
<b>cats_and_dogs_filtered</b>
|__ <b>train</b>
    |______ <b>cats</b>: [cat.0.jpg, cat.1.jpg, cat.2.jpg ....]
    |______ <b>dogs</b>: [dog.0.jpg, dog.1.jpg, dog.2.jpg ...]
|__ <b>validation</b>
    |______ <b>cats</b>: [cat.2000.jpg, cat.2001.jpg, cat.2002.jpg ....]
    |______ <b>dogs</b>: [dog.2000.jpg, dog.2001.jpg, dog.2002.jpg ...]
</pre>

Create variables that point to each of these directories.

In [0]:
train_dir = os.path.join(path_to_folder, 'train')
validation_dir = os.path.join(path_to_folder, 'validation')

In [0]:
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

Count the number of images in each directory.

In [0]:
num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))

num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))

total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val

print('Total training cat images:', num_cats_tr)
print('Total training dog images:', num_dogs_tr)
print('Total validation cat images:', num_cats_val)
print('Total validation dog images:', num_dogs_val)
print('---')
print("Total training images:", total_train)
print("Total validation images:", total_val)

Tip: in addition to Python, you can run shell commands in Colab (for example, ```!ls $train_cats_dir```).

In [0]:
!ls $train_cats_dir 

Let's take a look at a couple of images.

In [0]:
_ = plt.imshow(plt.imread(os.path.join(train_cats_dir, "cat.0.jpg")))

In [0]:
_ = plt.imshow(plt.imread(os.path.join(train_cats_dir, "cat.1.jpg")))

## Data preprocessing

We will now need a way to read images from disk, and to format them into appropriately pre-processed floating point arrays before using them to train our network. Specifically, we will need to:

- Read the images off disk.
- Decode the contents of these images and convert them into RGB arrays.
- Convert the pixel values from integers to floating point numbers.
- Rescale the pixel values from between 0 and 255 to between 0 and 1 (neural networks tend to train better with small input values: under the hood, each input is multiplied by a weight, and large inputs can make training unstable).

All these tasks can be done with the `ImageDataGenerator` class provided by `tf.keras`. It can read images from disk and preprocess them into proper arrays.

In [0]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

In [0]:
# We will resize all images to the same size when they are read off disk.
IMG_HEIGHT = 150
IMG_WIDTH = 150

In [0]:
# Rescale the pixel values from 0-255 to 0-1
train_image_generator = ImageDataGenerator(rescale=1./255)
validation_image_generator = ImageDataGenerator(rescale=1./255)

After defining the generators for training and validation images, the `flow_from_directory` method loads images from the disk, applies rescaling, and resizes the images to the required dimensions.

In [0]:
batch_size = 64 # Read a batch of 64 images at each step

In [0]:
train_data_gen = train_image_generator.flow_from_directory(batch_size=batch_size,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='binary')

In [0]:
val_data_gen = validation_image_generator.flow_from_directory(batch_size=batch_size,
                                                              directory=validation_dir,
                                                              target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                              class_mode='binary')

## Display a few images and their labels

Next, we will extract a batch of images from the training generator, then plot several of them with `matplotlib`. The `next` function returns a batch from the dataset in the form `(x_train, y_train)`, where `x_train` holds the training features (the images) and `y_train` holds their labels.

In [0]:
image_batch, labels_batch = next(train_data_gen)

In [0]:
# (64, 150, 150, 3) means a list of 64 images, each of which is 150x150x3.
# 3 refers to the R,G,B color channels.
print(image_batch.shape)

In [0]:
# (64,) means a list of 64 numbers
# each of these will either be 0, or 1
print(labels_batch.shape)

In [0]:
# This function will plot images
# in the form of a grid with 1 row and 5 columns
def plot_images(images):
  fig, axes = plt.subplots(1, 5, figsize=(10,10))
  axes = axes.flatten()
  for img, ax in zip(images, axes):
      ax.imshow(img)
      ax.axis('off')
  plt.tight_layout()
  plt.show() 

In [0]:
plot_images(image_batch[:5])

Let's see how we can retrieve the labels, and which class they correspond to.

In [0]:
print(labels_batch[:5])

In [0]:
print(train_data_gen.class_indices)
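Where do these indices come from? According to the Keras documentation, `flow_from_directory` infers the classes from the subdirectory names and numbers them in alphanumeric order. A plain Python sketch of that rule (the list of names below is typed in by hand for illustration):

```python
# Hypothetical subdirectory names, as found under the training directory.
subdirectories = ['dogs', 'cats']

# Sort the names, then number them starting from zero.
toy_class_indices = {name: index for index, name in enumerate(sorted(subdirectories))}

print(toy_class_indices)  # {'cats': 0, 'dogs': 1}
```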

## Create the model
Your model will consist of three convolutional blocks, each followed by a max pooling layer. There's a fully connected layer with 256 units on top. This model will output a class probability (between 0 and 1) based on the `sigmoid` activation function. If the output is closer to 1, the image will be classified as a dog, otherwise as a cat.
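Here is a small NumPy sketch of how a sigmoid output turns into a cat or dog decision. The raw values are made up, and the 0.5 threshold is the usual convention for "closer to 1" rather than something the model itself enforces.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw outputs of the final layer for three images.
raw_outputs = np.array([-2.0, 0.1, 3.0])
probabilities = sigmoid(raw_outputs)

# Outputs above 0.5 are closer to 1 (dog); the rest are closer to 0 (cat).
decisions = np.where(probabilities > 0.5, 'dog', 'cat')

print(probabilities)
print(decisions)  # ['cat' 'dog' 'dog']
```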

In [0]:
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

In [0]:
model = Sequential([
    Conv2D(32, 3, padding='same', activation='relu', 
           input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid')
])

Compile the model, and select the adam optimizer to be used for gradient descent, and binary cross entropy for our loss function (roughly, cross entropy is a way to measure the distance between the prediction we wanted the network to make, and the prediction it made).

In [0]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

## Model summary

View all the layers of the network using the model's `summary` method:

In [0]:
model.summary()

This model has about 5M parameters to learn. Notice that nearly all of them are in the first Dense layer, near the bottom of the summary!
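You can check the parameter count by hand. The sketch below redoes the arithmetic in plain Python for the model defined above, assuming 150 by 150 RGB inputs, 3x3 convolutions with `padding='same'`, and the default 2x2 max pooling.

```python
# Shapes and parameter counts for the tutorial's CNN, computed by hand.
height, width, channels = 150, 150, 3
total = 0

for filters in (32, 32, 64):
    # A 3x3 convolution has 3*3*in_channels weights per filter, plus one bias.
    conv_params = (3 * 3 * channels + 1) * filters
    total += conv_params
    channels = filters
    # padding='same' keeps height and width; 2x2 max pooling halves them (rounding down).
    height, width = height // 2, width // 2
    print(f"conv: {conv_params:>9,} params, output after pooling: {height}x{width}x{channels}")

flattened = height * width * channels
dense_params = (flattened + 1) * 256  # weights plus biases of Dense(256)
output_params = (256 + 1) * 1         # weights plus bias of Dense(1)
total += dense_params + output_params

print(f"dense: {dense_params:,} params, output: {output_params} params")
print(f"total: {total:,}")  # total: 5,337,569
print(f"share in Dense(256): {dense_params / total:.1%}")
```

The flattened feature map has 18 * 18 * 64 = 20,736 values, and each one connects to all 256 units, which is why that single layer dominates.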

## Train the model

Use the `fit` method to train the network. You will train the model for 15 epochs (an epoch is one "sweep" over the training set, in which each image is used exactly once to perform a round of gradient descent and update the model's parameters).
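Because images are read in batches, one epoch consists of several gradient descent steps. As a rough sketch, assuming 2,000 training images (check the value of `total_train` printed earlier) and the batch size of 64 used above:

```python
import math

# Assumed sizes: 2,000 training images, read 64 at a time.
num_train_images = 2000
images_per_batch = 64

# The last batch is smaller, so round up.
steps_per_epoch = math.ceil(num_train_images / images_per_batch)
num_epochs = 15

print(steps_per_epoch)               # 32
print(steps_per_epoch * num_epochs)  # 480
```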

The longer you train the model, the more accurate it will become on the training set (but the more likely it is to overfit, or memorize the training images rather than learn patterns that enable it to perform well on the unseen data in the validation set). Overfitting is a challenge you will address in the next tutorial. 

In [0]:
epochs = 15

In [0]:
history = model.fit(
    train_data_gen,
    epochs=epochs,
    validation_data=val_data_gen,
)

After training, your model is likely 90% accurate on the training set, but probably only about 70% accurate on the validation set.

## Evaluate your model
Let's create plots to help us see the difference. Accuracy on the validation data is important: it helps you estimate how well your model is likely to work on new, unseen data in the future.

We will create two plots, one for accuracy, and another for loss. Roughly, loss (or error) moves in the opposite direction to accuracy (lower is better). Unlike accuracy, loss takes the confidence of a prediction into account (a confidently wrong prediction has a higher loss than one that is only slightly wrong).

In [0]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
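To see why loss and accuracy can tell different stories, here is a NumPy sketch of binary cross entropy for a single image. The helper function and the two predicted probabilities are made up for illustration.

```python
import numpy as np

def toy_binary_crossentropy(label, prob):
    """Loss for one example, where prob is the predicted probability of class 1."""
    return -(label * np.log(prob) + (1 - label) * np.log(1 - prob))

true_label = 1  # suppose the image is a dog

slightly_wrong = toy_binary_crossentropy(true_label, 0.45)
confidently_wrong = toy_binary_crossentropy(true_label, 0.01)

# Both predictions count as one mistake for accuracy (each is below 0.5),
# but the loss penalizes the confident mistake far more.
print(slightly_wrong)
print(confidently_wrong)
```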

When there are a small number of training examples, the model sometimes learns from noise or unwanted details in the training examples, to an extent that it negatively impacts the performance of the model on new examples. This phenomenon is known as overfitting. It means that the model will have a difficult time generalizing on a new dataset, or making accurate predictions on images of cats and dogs that weren't included in the training set. In the next tutorial, you'll use two techniques to mitigate overfitting: data augmentation and dropout. First, let's practice training a CNN on a new dataset.

# Exercise 2: Flowers

In this exercise, you will write a CNN and use it to classify five different types of flowers (sunflowers, tulips, etc.). The dataset contains 1000 images in the training set, and 500 in the validation set.

You will download the dataset, read and preprocess the images using ImageDataGenerator, then create, train and evaluate a model. 

A code outline is written for you, and there are several TODOs for you to complete, using the same pattern as the tutorial above.

### Download the dataset

In [0]:
origin = 'https://storage.googleapis.com/tensorflow-blog/datasets/mini_flowers.zip'
path_to_zip = tf.keras.utils.get_file('mini_flowers.zip', origin=origin, extract=True)
path_to_folder = os.path.dirname(path_to_zip)

train_dir = os.path.join(path_to_folder, "train/")
val_dir = os.path.join(path_to_folder, "val/")

### Read the images off disk

In [0]:
train_image_generator = ImageDataGenerator(rescale=1./255)

In [0]:
train_data_gen = train_image_generator.flow_from_directory(batch_size=32,
                                                           directory=train_dir,
                                                           shuffle=True,
                                                           target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                           class_mode='categorical')

### Plot images and their labels

In [0]:
image_batch, labels_batch = next(train_data_gen)

In [0]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(image_batch[i])
    plt.xlabel(str(labels_batch[i]))
plt.show()

## Understanding one-hot labels

Notice the labels are in one-hot format. Let's add some code to display the class names.

In [0]:
print(train_data_gen.class_indices)

In [0]:
class_names = {v:k for k,v in train_data_gen.class_indices.items()}

In [0]:
plt.figure(figsize=(10,10))
for i in range(25):
  plt.subplot(5,5,i+1)
  plt.xticks([])
  plt.yticks([])
  plt.grid(False)
  plt.imshow(image_batch[i])
  plt.xlabel(class_names[tf.argmax(labels_batch[i]).numpy()])
plt.show()
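If one-hot labels are new to you, this NumPy sketch shows the format on made-up labels: each label becomes a vector with a single 1 at the position of its class, and `argmax` converts it back to an integer.

```python
import numpy as np

num_classes = 5
integer_labels = np.array([0, 3, 4])  # hypothetical class indices

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes)[integer_labels]
print(one_hot)

# argmax recovers the integer label from each one-hot row.
recovered = np.argmax(one_hot, axis=1)
print(recovered)  # [0 3 4]
```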

## Read the validation images

In [0]:
# Above, you created an ImageDataGenerator for the training set
# Next, create one to read the validation images
# For example:
# validation_image_generator = ImageDataGenerator ...
# val_data_gen = validation_image_generator.flow_from_directory ...

## Create a CNN

Now, it's time to define your model. You can create a similar model to the CNN used in the tutorial above.

The only difference is that the final Dense layer of your model (which classifies the data based on the features provided by the convolutional base) must use softmax activation and have five output classes:

```model.add(Dense(5, activation='softmax'))```

This is because we now have five different types of flowers, instead of just cats and dogs.

In [0]:
# TODO: your code here
# Define a CNN using code similar to the above 
# For example
# model = Sequential()
# model.add ...
# ...
# The last line of your model should be:
# model.add(Dense(5, activation='softmax'))

After you have defined your model, compile it by uncommenting and running this code. Important: notice that the loss has changed to ```categorical_crossentropy```. This is necessary because the labels are in one-hot format. Finally, although these loss functions sound complicated, there are only a handful for you to learn.


In [0]:
#model.compile(optimizer='adam',
#              loss='categorical_crossentropy',
#              metrics=['accuracy'])
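The two cross entropy losses compute the same quantity; they differ only in the label format they expect. A NumPy sketch for a single image, with a made-up softmax output:

```python
import numpy as np

# Hypothetical softmax output for one image, over five classes.
toy_probs = np.array([0.10, 0.05, 0.60, 0.20, 0.05])

integer_label = 2
one_hot_label = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# Categorical cross entropy: sum over classes of -label * log(probability).
categorical = -np.sum(one_hot_label * np.log(toy_probs))

# Sparse categorical cross entropy: pick out the true class directly.
sparse = -np.log(toy_probs[integer_label])

print(categorical, sparse)  # the two values agree
```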

Now train your model for 10 epochs using ```model.fit```. If you like, you can try to create plots of the training and validation accuracy and loss.

In [0]:
# TODO: your code here
# For example
# model.fit ...

If all has gone well, your model should be about 90% accurate on the training data.

## Solution

``` 
# Read the validation images
validation_image_generator = ImageDataGenerator(rescale=1./255)
val_data_gen = validation_image_generator.flow_from_directory(batch_size=32,
                                                              directory=val_dir,
                                                              shuffle=True,
                                                              target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                              class_mode='categorical')
```

```
# Define a model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', 
                        input_shape=(IMG_HEIGHT, IMG_WIDTH, 3)))
model.add(MaxPooling2D())
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D())
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D())

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(5, activation='softmax'))
```

```
# Train the model
history = model.fit(
    train_data_gen,
    epochs=10,
    validation_data=val_data_gen,
)
```

# Game break 2: Quick, Draw!
If you'd like, now would be a good time to take a break from coding for a few minutes and try Quick, Draw!

https://quickdraw.withgoogle.com/

# Tutorial 3: How to reduce overfitting


## Data augmentation
Overfitting often occurs when there are a "small" number of training examples. One way to fix this problem is to augment the dataset so that it has a larger number of training examples. Data augmentation generates more training data from existing training samples by applying random transformations (for example, rotation) that yield believable-looking images. With data augmentation, the model will never see the exact same picture twice during training. This helps expose the model to more aspects of the data, and can lead to better generalization.

You can implement this using the ImageDataGenerator. Specify the transformations you want, and it will take care of applying them during the training process.
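To get a feel for what a transformation does to the pixels, here is a NumPy sketch on a tiny made-up image. It is not how ImageDataGenerator is implemented: for brevity, the shift below wraps pixels around the edge with `np.roll`, whereas ImageDataGenerator fills the gap according to its `fill_mode` setting.

```python
import numpy as np

# A hypothetical 3x4 single-channel "image".
toy_image = np.arange(12).reshape(3, 4)

flipped = np.fliplr(toy_image)                 # horizontal flip: reverse each row
shifted = np.roll(toy_image, shift=1, axis=1)  # shift one pixel right (wrapping around)

print(toy_image)
print(flipped)
print(shifted)
```

The label stays the same in each case: a flipped or shifted cat is still a cat.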

In [0]:
# Paths to the cats and dogs dataset from tutorial 2
path_to_folder = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')
train_dir = os.path.join(path_to_folder, 'train')
validation_dir = os.path.join(path_to_folder, 'validation')

In [0]:
image_gen_train = ImageDataGenerator(
                    rescale=1./255,
                    rotation_range=45,
                    width_shift_range=.15,
                    height_shift_range=.15,
                    horizontal_flip=True,
                    zoom_range=0.5
                    )

In [0]:
train_data_gen = image_gen_train.flow_from_directory(batch_size=32,
                                                     directory=train_dir,
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     class_mode='binary')

In [0]:
# Show how the same image appears with different data augmentation
augmented_images = [train_data_gen[0][0][0] for i in range(5)]
plot_images(augmented_images)

We only apply data augmentation to the training examples.

In [0]:
image_gen_val = ImageDataGenerator(rescale=1./255)

In [0]:
val_data_gen = image_gen_val.flow_from_directory(batch_size=32,
                                                 directory=validation_dir,
                                                 target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                 class_mode='binary')

## Dropout

Another technique to reduce overfitting is to introduce dropout to the network. Dropout is a form of regularization that makes it more difficult for the network to memorize rare details (instead, it is forced to learn more general patterns).

When you apply dropout to a layer, it randomly drops out (sets to zero) a number of that layer's output units during training. Dropout takes a fractional number as its input value, such as 0.1, 0.2, or 0.4. This means dropping out 10%, 20% or 40% of the output units randomly from the applied layer.

When applying 0.1 dropout to a layer, it randomly deactivates 10% of the output units at each training step (a different random subset for each batch).
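Here is a minimal NumPy sketch of the idea, written for illustration rather than taken from Keras. Like the Keras Dropout layer during training, it rescales the units that are kept by 1 / (1 - rate), so that the expected sum of the activations stays the same.

```python
import numpy as np

def toy_dropout(activations, rate, rng):
    """Zero out a random fraction `rate` of the activations and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    # Scaling by 1 / (1 - rate) keeps the expected sum of activations unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
toy_activations = np.ones(10000)  # made-up layer outputs

dropped = toy_dropout(toy_activations, rate=0.2, rng=rng)

print(np.mean(dropped == 0))  # close to 0.2
print(dropped.mean())         # close to 1.0
```

Calling `toy_dropout` again draws a new random mask, which mirrors how a different subset of units is dropped at each training step.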

Create a new network architecture using Dropout.

In [0]:
from tensorflow.keras.layers import Dropout

In [0]:
model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', 
           input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(),
    Dropout(0.2),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.2),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid')
])

After introducing dropout to the network, compile the model and view the layers summary.

In [0]:
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

## Train your new model
This model will need to be trained for longer (more epochs) to achieve high accuracy. As this will take some time, here we'll train for only 15 epochs so we can move on to part two of this notebook. If you like, you can continue training at home. See if your new model can achieve higher validation accuracy than your original one.

In [0]:
history = model.fit(
    train_data_gen,
    epochs=epochs,
    validation_data=val_data_gen,
)

## Evaluate your new model
Create plots of accuracy and loss. If you compare them with your previous plots, the new model should show less overfitting than the original one.

In [0]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

You can now train your new model for longer (by increasing the number of epochs) and reach higher validation accuracy. As this will take some time to train, we will leave this as an exercise for you to explore at home, and continue on to part two.

# Game break 3: Sketch-RNN
If you'd like, now would be a good time to take a break from coding for a few minutes and try Sketch-RNN, an experiment where a computer attempts to "auto-complete" your drawings, based on the Quick, Draw! dataset.

https://magenta.tensorflow.org/sketch-rnn-demo

# Tutorial 4: Deep Dream
In this tutorial, you will see how to implement a minimal version of DeepDream, an experiment to visualize some of the features a convolutional neural network has learned to detect.

In [0]:
import numpy as np
from IPython.display import clear_output

## Download and display an image

In [0]:
url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg'

In [0]:
def download(url, target_size=None):
  name = url.split('/')[-1]
  image_path = tf.keras.utils.get_file(name, origin=url)
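  # Pass target_size by keyword: the second positional argument of load_img is not the size.
  return tf.keras.preprocessing.image.load_img(image_path, target_size=target_size)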

def show(img):
  plt.figure(figsize=(8,8))
  plt.grid(False)
  plt.axis('off')
  plt.imshow(img)
  plt.show()

original_img = download(url, target_size=[225, 375])
original_img = np.array(original_img)
show(original_img)

## Code to scale the pixel values

In [0]:
def preprocess(img):
  """ Convert RGB values from [0, 255] to [-1, 1] """
  img = tf.cast(img, tf.float32)
  img /= 127.5
  img -= 1.
  return img

def unprocess(img):
  """ Undo the preprocessing above """
  img = 255 * (img + 1.0) / 2.0
  return tf.cast(img, tf.uint8)
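To make the scaling arithmetic concrete, here is a plain-Python sketch of the mapping described in the docstrings above, applied to single numbers rather than tensors. The helper names `to_unit_range` and `from_unit_range` are hypothetical. The sketch assumes a divisor of 127.5 (half of 255), so that 0 maps to -1 and 255 maps exactly to 1.

```python
def to_unit_range(value):
    """Map a pixel value in [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0

def from_unit_range(value):
    """Map a value in [-1, 1] back to [0, 255]."""
    return 255 * (value + 1.0) / 2.0

for pixel in (0, 64, 255):
    scaled = to_unit_range(pixel)
    # round() guards against tiny floating-point errors in the round trip.
    print(pixel, "->", scaled, "->", round(from_unit_range(scaled)))
```

The two functions are inverses of each other, which is what lets us display the image again after modifying it in the scaled range.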

## Import a large, pretrained CNN
This model was trained on ImageNet, a dataset of roughly 1.2 million training images across 1,000 classes.

In [0]:
conv_base = tf.keras.applications.InceptionV3(weights='imagenet', 
                                              include_top=False)

## Choose layers to excite
Normally, when you train a neural network, you use gradient descent to adjust the weights to minimize loss, in order to accurately classify images. In DeepDream, the trick is to keep the weights fixed and use gradient ascent to adjust the **image** instead, so that it increasingly activates certain layers of the network. You can explore different layers and see how this affects the results. You can find all the layer names using ```conv_base.summary()```.

In [0]:
names = ['mixed2', 'mixed3', 'mixed4', 'mixed5']
layers = [conv_base.get_layer(name).output for name in names]
model = tf.keras.Model(inputs=conv_base.input, outputs=layers)

## Implement a custom loss function
Normally, we would use cross-entropy loss (for classification), or mean squared error (for regression). Here, we'll write a loss function that describes how strongly the image activated our chosen layers. Unlike a usual loss, this is a value we want to increase.

In [0]:
def calc_loss(img):
  img_batch = tf.expand_dims(img, axis=0)
  layer_activations = model(img_batch)
  losses = [tf.math.reduce_mean(act) for act in layer_activations]
  return tf.reduce_sum(losses)
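The loss above takes the mean of each layer's activations before summing them. A plain-Python sketch with made-up numbers shows one effect of that choice: a layer with many units does not outweigh a smaller one simply because of its size. The function names and values below are hypothetical, and flat lists stand in for the real activation tensors.

```python
def mean_then_sum(layer_activations):
    """Average within each layer, then add the per-layer averages."""
    return sum(sum(acts) / len(acts) for acts in layer_activations)

def raw_sum(layer_activations):
    """Add up every activation, ignoring layer size (shown for comparison)."""
    return sum(sum(acts) for acts in layer_activations)

small_layer = [1.0, 1.0]     # 2 units, strongly activated
large_layer = [0.5] * 100    # 100 units, weakly activated

toy_loss = mean_then_sum([small_layer, large_layer])
toy_raw = raw_sum([small_layer, large_layer])
print(toy_loss, toy_raw)
```

With the per-layer mean, the strongly activated small layer contributes more than the weakly activated large one. With a raw sum, the large layer would dominate.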

## Use the gradients to progressively modify the image to increasingly activate our layers

In [0]:
@tf.function
def step(img, lr=0.001):
  with tf.GradientTape() as tape:
    loss = calc_loss(img)

  gradients = tape.gradient(loss, img)
  gradients /= tf.math.reduce_std(gradients) + 1e-8 

  # Because the gradients are in the same shape 
  # as the image, we can directly add them to it!
  img.assign_add(gradients * lr)
  img.assign(tf.clip_by_value(img, -1, 1))
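If the update rule feels abstract, the following toy version in plain Python may help. It is only a sketch under simplifying assumptions: the "image" is a list of four numbers, and the "network" is a hypothetical linear layer with frozen weights whose gradient we can write by hand. It follows the same three moves as `step`: normalize the gradient by its standard deviation, add it to the image, and clip to [-1, 1].

```python
import statistics

def activation(image, weights):
    """Toy 'layer': the mean of the weighted pixels."""
    return sum(w * p for w, p in zip(weights, image)) / len(image)

def ascent_step(image, weights, lr=0.1):
    """One gradient-ascent step on the image; the weights are never changed."""
    n = len(image)
    # For this linear toy layer, d(activation)/d(pixel_i) = weights[i] / n.
    grads = [w / n for w in weights]
    # Normalize by the (population) standard deviation of the gradient.
    scale = statistics.pstdev(grads) + 1e-8
    grads = [g / scale for g in grads]
    # Move each pixel uphill, then clip to the valid [-1, 1] range.
    return [max(-1.0, min(1.0, p + lr * g)) for p, g in zip(image, grads)]

weights = [0.5, -1.0, 2.0, 0.0]   # hypothetical frozen weights
image = [0.0, 0.0, 0.0, 0.0]      # start from a uniform mid-gray image
activations = [activation(image, weights)]
for _ in range(20):
    image = ascent_step(image, weights)
    activations.append(activation(image, weights))

print(activations[0], activations[-1])
print(image)
```

Each pixel drifts in the direction that raises the activation until it reaches the edge of the valid range, while a pixel with zero weight stays where it started. The real `step` does the same thing, except that TensorFlow computes the gradient through InceptionV3 for us.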

In [0]:
img = tf.Variable(preprocess(original_img))

steps = 1000
for i in range(steps):
  step(img)
  if i % 200 == 0:
    clear_output(wait=True)
    print("Step {}".format(i))
    show(unprocess(img.numpy()))

clear_output(wait=True)
show(unprocess(img.numpy()))

Just for fun, you can try running DeepDream on your own images, or explore different combinations of layers. DeepDream is an advanced tutorial, and our goal here is just to show you some of the fascinating (and unexpected) things you can explore with Deep Learning.

# Learning more

To learn more about TensorFlow and Machine Learning in general, we recommend the book "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" ([GitHub](https://github.com/ageron/handson-ml2), [Website](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)).

You can also stay in touch with the latest TensorFlow developments by:
- Reading our blog: http://blog.tensorflow.org/
- Watching our YouTube channel: https://www.youtube.com/tensorflow
- Following our Twitter account: https://twitter.com/tensorflow

Thank you!