<a href="https://colab.research.google.com/github/rojiark/DeepLearningWorkshop/blob/master/DeepLearningWorkshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Jupyter Notebook <a name="1"></a>

The document you are reading is a [Jupyter notebook](https://jupyter.org/), hosted in Colaboratory. It is not a static page, but an interactive environment that lets you write and execute code in Python and other languages. There are two main cell types in a notebook:


*   A **code cell** contains code to be executed in the kernel and displays its output below.
*   A **markdown cell** contains text formatted using Markdown and displays its output in-place when it is run.





## Code cells
For example, here is a **code cell** with a short Python script that computes a value, stores it in a variable, and prints the result:

In [0]:
workshop_duration = 2 * 60 * 60
print ("Workshop duration: ", workshop_duration, "seconds")

To execute the code in the cell above, select it and run it in one of the following ways:

* Click the **Play icon** in the left gutter of the cell.
* Press **Cmd/Ctrl+Enter** to run the cell in place.
* Press **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists).
* Press **Alt+Enter** to run the cell and insert a new code cell immediately below it.

There are additional options for running some or all cells in the **Runtime** menu.

All cells modify the same **global** state, so variables that you define by executing a cell can be used in other cells:

In [0]:
workshop_duration = workshop_duration + (15 * 60)
print ("Workshop duration: ", workshop_duration, "seconds")

You can also execute **bash commands** by adding an **\!** sign at the start of the line:

In [0]:
!echo Hello World

**Magics** are shorthand annotations that change how a cell's text is executed. To learn more, see [Jupyter's magics page](http://nbviewer.jupyter.org/github/ipython/ipython/blob/1.x/examples/notebooks/Cell%20Magics.ipynb). 

In [0]:
%%html
<marquee style='width: 30%; color: blue;'><b>Goooooooo Cobras!</b></marquee>

## Markdown cells

This is a **markdown cell**. You can **double-click** to edit this cell. Text cells
use markdown syntax. To learn more, see the [markdown
guide](/notebooks/markdown_guide.ipynb).

You can also add math to text cells using [LaTeX](http://www.latex-project.org/)
to be rendered by [MathJax](https://www.mathjax.org). Just place the statement
within a pair of **\$** signs.

For example `$\sqrt{3x-1}+(1+x)^2$` becomes
$\sqrt{3x-1}+(1+x)^2.$




# Google Colaboratory <a name="2"></a>

Google Colab is an interactive document that lets you write, run, and share Python code in Google Drive. You can think of Colab as a **Jupyter Notebook** stored in the **cloud**. Colab connects the notebook to a cloud-based runtime, so you can execute Python code without any setup and without affecting your own machine.

Colab offers code snippets for common Python tasks in the sidebar. For example, the following cell draws an interactive scatter plot with Altair (drag on the chart to select points):

In [0]:
# load an example dataset
from vega_datasets import data
cars = data.cars()

import altair as alt

interval = alt.selection_interval()

alt.Chart(cars).mark_point().encode(
  x='Horsepower',
  y='Miles_per_Gallon',
  color=alt.condition(interval, 'Origin', alt.value('lightgray'))
).add_selection(
  interval
)

You can share notebooks through Google Drive sharing or by saving a copy to GitHub. Notebooks are saved in the standard Jupyter notebook format (`.ipynb`), so they can be opened with any compatible tool.

You can also upload and open files using the sidebar menu, or directly with a code cell:


In [0]:
# Clone the entire repo over HTTPS (GitHub no longer serves the git:// protocol).
!git clone https://github.com/jakevdp/PythonDataScienceHandbook.git cloned-repo
%cd cloned-repo
!ls

In [0]:
# Fetch a single <1MB file through the GitHub contents API, requesting the raw content.
!curl --remote-name \
     -H 'Accept: application/vnd.github.v3.raw' \
     --location https://api.github.com/repos/jakevdp/PythonDataScienceHandbook/contents/notebooks/data/california_cities.csv

In [0]:
from google.colab import files
# Opens a file picker; returns a dict mapping each uploaded file name to its contents (bytes)
uploaded = files.upload()

## Runtime

Colab can connect to different runtimes, with different locations and resources:


*   Google Cloud runtime
*   Local runtime
*   Hosted runtime
*   CPU / GPU / TPU runtime
*   Python2 / Python3 runtime

Let's find out the available resources in our current runtime:

In [0]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

In [0]:
!cat /proc/cpuinfo
!cat /proc/meminfo

To change the runtime type follow these steps:



1.   Navigate to Runtime → Change runtime type
2.   Select Hardware accelerator → GPU → SAVE
3.   Navigate to Runtime → Reset all runtimes ... → YES
4.   Now your runtime should have the GPU enabled


In [0]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Now repeat the same steps for TPU:

1.   Navigate to Runtime → Change runtime type
2.   Select Hardware accelerator → TPU → SAVE
3.   Navigate to Runtime → Reset all runtimes ... → YES
4.   Now your runtime should have the TPU enabled?


In [0]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

**What is wrong?**

The TPUs provided by Colab are not attached to the same virtual machine as your runtime; they are provided as a separate service that your runtime reaches over the network.

The environment variable `COLAB_TPU_ADDR` holds the address of the TPU assigned to your runtime:



In [0]:
import os

try:
  print(os.environ['COLAB_TPU_ADDR'])
except KeyError:
  # The variable is only set on TPU runtimes
  print('COLAB_TPU_ADDR is not available in the runtime')
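The same check can be wrapped in a small helper that builds the `grpc://` address that the TPU cells later in this notebook pass to `TPUClusterResolver`, and returns `None` on runtimes without a TPU. This is a sketch: the helper name `tpu_grpc_address` is hypothetical; only the `COLAB_TPU_ADDR` variable comes from Colab.

```python
import os


def tpu_grpc_address(env=os.environ):
    """Return 'grpc://<host:port>' for the Colab TPU, or None if there is none.

    Hypothetical helper: Colab itself only provides the COLAB_TPU_ADDR variable.
    """
    addr = env.get('COLAB_TPU_ADDR')
    if not addr:
        return None
    return 'grpc://' + addr


print(tpu_grpc_address() or 'No TPU attached to this runtime')
```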

## Usage considerations

*   Google Colab is free to use.
*   At the time of writing, there is no way to pay for more resources or computation time.
*   The runtime resets after 10-15 minutes idle or 12 hours of continuous computation.
*   When the runtime resets, you lose all uploaded or saved files and the state of your environment.
*   Resources are assigned dynamically according to the type of work you are doing.
*   The best available hardware is prioritized for users who use Colab interactively rather than for long-running computations. For example, if your computation takes too long, the NVIDIA T4 GPU (\$5000) can be replaced with an NVIDIA Tesla K80 (\$900).
*   You can use applications such as ngrok to build a network tunnel and access your virtual machine from outside.
*   Manage your active sessions with Runtime → Manage sessions.
*   To avoid losing all your data when the runtime resets, mount your Google Drive in Colab and save checkpoints to it.


In [0]:
# Run this code to mount your google drive in your Colab virtual machine
from google.colab import drive
drive.mount('/content/drive')

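*   Stack Overflow =) When a cell raises an error, Colab shows a button that searches Stack Overflow for it. Try it with the next cell, which uses an undefined variable:

In [0]:
# `var` is never defined, so this cell raises a NameError
print(var)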

# Tensorflow

TensorFlow is an open-source machine learning library for research and production. TensorFlow offers APIs for beginners and experts to develop for desktop, mobile, web, and cloud. See the sections below to get started. Since TensorFlow 2.0, its default high-level API is Keras, a powerful API that started as a separate project.



## Example: MNIST digits dataset classification

<img src="https://raw.githubusercontent.com/datapythonista/mnist/master/img/samples.png" alt="Drawing" style="width: 200px;"/>

The MNIST dataset contains 70,000 grayscale 28×28 images of handwritten digits (60,000 for training and 10,000 for testing). Members of the AI/ML/data science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." In this example, we will use a simple neural network to classify the dataset.

Note that Google Colab comes with TensorFlow preinstalled (version 1.14.0 at the time of writing):

In [0]:
# To determine which version you're using:
!pip show tensorflow

In [0]:
import tensorflow as tf
print (tf.__version__)

First, load and process the **dataset**:

In [0]:
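(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Add a channel dimension and scale pixel values from [0, 255] to [0, 1]
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1)).astype('float32') / 255.0
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1)).astype('float32') / 255.0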

Create your **model** using the Keras API, which TensorFlow includes as `tf.keras`. Sequential models are the bread and butter of Keras: a linear stack of layers with a single input, a single output, and one data flow. This is a classic fully connected neural network with one hidden layer of 512 neurons and 20% dropout.


*   **Flatten**: Flattens each multidimensional input sample into a 1D vector (here 28×28×1 → 784). We only specify `input_shape` because it is the first layer of our model; subsequent layers infer their input shape from the output of the previous layer.
*   **Dense**: Fully connected layer. The first argument is the number of neurons (units); `activation` sets their activation function.
*   **Dropout**: A technique to reduce overfitting. During training, it randomly sets a fraction of the layer's inputs (here 20%) to zero at each step; it has no effect at inference time.
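To make the three layer types concrete, here is a NumPy sketch of the computation the model performs on a batch of images. It is an illustration, not Keras' implementation: the weights are random placeholders with the same shapes as the Keras layers, and the dropout shown is the "inverted" form used by Keras, which scales the kept activations by 1 / (1 - rate) during training so that nothing has to change at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights with the same shapes as the Keras model's Dense layers.
W1, b1 = rng.normal(0, 0.05, (784, 512)), np.zeros(512)
W2, b2 = rng.normal(0, 0.05, (512, 10)), np.zeros(10)


def forward(images, training=False, rate=0.2):
    x = images.reshape(len(images), -1)           # Flatten: (N, 28, 28, 1) -> (N, 784)
    h = np.maximum(0.0, x @ W1 + b1)              # Dense(512) + ReLU
    if training:                                  # Dropout is only active during training
        keep = rng.random(h.shape) >= rate        # drop each unit with probability `rate`
        h = h * keep / (1.0 - rate)               # rescale so the expected activation is unchanged
    logits = h @ W2 + b2                          # Dense(10)
    logits -= logits.max(axis=1, keepdims=True)   # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)   # softmax: one probability per digit


batch = rng.random((4, 28, 28, 1))
probs = forward(batch, training=True)
print(probs.shape, probs.sum(axis=1))
```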





In [0]:
def create_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=x_train.shape[1:]))
  model.add(tf.keras.layers.Dense(512, activation=tf.nn.relu))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
  return model

**Train** the model on the **CPU**. The **compile** function configures the model for training: we must define the optimizer and the loss function, and can optionally add optimizer parameters and metrics to report. The **fit** function performs the training.
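The `sparse_categorical_crossentropy` loss passed to `compile` expects integer labels (0 to 9) rather than one-hot vectors, and averages the negative log of the probability the model assigns to the correct class. A minimal NumPy sketch of that formula (the clipping constant `eps` is an assumption added here to avoid `log(0)`):

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability of the correct class.

    y_true: integer labels, shape (N,); y_pred: softmax outputs, shape (N, C).
    """
    y_pred = np.clip(y_pred, eps, 1.0)                 # avoid log(0)
    correct = y_pred[np.arange(len(y_true)), y_true]   # probability given to each true label
    return float(-np.mean(np.log(correct)))


y_true = np.array([2, 0])
y_pred = np.array([[0.1, 0.2, 0.7],
                   [0.5, 0.25, 0.25]])
print(sparse_categorical_crossentropy(y_true, y_pred))  # -(ln 0.7 + ln 0.5) / 2, about 0.525
```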

In [0]:
import time

with tf.device('/cpu:0'):
  model = create_model()
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  t = time.time()
  model.fit(x_train, y_train, epochs=5)
  cpu_time = time.time() - t
  model.evaluate(x_test, y_test)

print ("CPU time: ", cpu_time, "s")

Now let's train the same model on a **Tensor Processing Unit** (TPU). A Cloud TPU is harder to use than a single GPU because it is made of several cores (8 on the Colab TPU), and the work must be distributed across them. Fortunately, TensorFlow has an API to distribute work to TPUs and other clusters.
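The basic idea behind distributing training across TPU cores is data parallelism: each global batch is split into equal shards, one per core, every core computes gradients on its shard, and the results are combined. The sketch below only shows the splitting step, assuming the batch size passed to `fit` is the global batch and that the Colab TPU has 8 cores; `split_global_batch` is a hypothetical helper, not part of TensorFlow.

```python
import numpy as np


def split_global_batch(batch, num_replicas=8):
    """Split one global batch into equal per-replica shards (one per TPU core)."""
    if len(batch) % num_replicas:
        raise ValueError('the global batch size must be divisible by num_replicas')
    return np.split(batch, num_replicas)


global_batch = np.arange(32)  # 32 is the default batch size of Keras' fit
shards = split_global_batch(global_batch)
print([len(shard) for shard in shards])  # [4, 4, 4, 4, 4, 4, 4, 4]
```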

**tf.distribute.cluster_resolver**

This library contains all implementations of ClusterResolvers. ClusterResolvers are a way of specifying cluster information for distributed execution. Built on top of the existing ClusterSpec framework, ClusterResolvers are a way for TensorFlow to communicate with various cluster management systems (e.g. GCE, AWS).

Classes:


*   ClusterResolver: Abstract
*   GCEClusterResolver: Google Compute Engine
*   KubernetesClusterResolver: Kubernetes
*   SimpleClusterResolver: ClusterSpec
*   SlurmClusterResolver: Slurm Workload Manager
*   TFConfigClusterResolver: TF_CONFIG EnvVar
*   **TPUClusterResolver**: TPU
*   UnionResolver: Union of multiple resolvers

TPUs support the following data types:

*   tf.float32
*   tf.complex64
*   tf.int64
*   tf.bool
*   tf.bfloat16
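`bfloat16` keeps float32's 8 exponent bits but only 7 of its 23 mantissa bits, so it covers the same range as float32 with much less precision. The NumPy sketch below emulates the conversion by zeroing the low 16 bits of each float32. Note that this truncates, whereas real conversions normally round to nearest, so treat `to_bfloat16` (a hypothetical helper) as an approximation.

```python
import numpy as np


def to_bfloat16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of each float32.

    Simplification: this truncates toward zero; real conversions usually round to nearest.
    """
    bits = np.array(x, dtype=np.float32, ndmin=1).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)


print(to_bfloat16(3.14159))   # [3.140625]: only 2-3 significant digits survive
print(to_bfloat16(1e38))      # still finite: same exponent range as float32
print(np.float16(1e38))       # inf: float16 overflows
```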

Let's use this API to initialize the TPU and create a distribution strategy to train our model. **Change your runtime type to TPU, reset runtimes, and re-run the previous steps**.

In [0]:
import os

resolver = tf.contrib.cluster_resolver.TPUClusterResolver('grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

Now use the **strategy** scope to train your model

In [0]:
import time
import numpy as np

with strategy.scope():
  model = create_model()
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  t = time.time()
  model.fit(x_train.astype(np.float32), y_train.astype(np.float32), epochs=5) #TPUs don't support uint8
  tpu_time = time.time() - t
  model.evaluate(x_test.astype(np.float32), y_test.astype(np.float32))

print ("TPU time: ", tpu_time, "s")

**Why is TPU slower?**

Currently, each call to a TPU function copies the weights to the TPU before it can start running, and this overhead matters most for small operations. Small models like this one can therefore perform worse on a TPU, which is optimized for large operations.



Now let's train on the **GPU**. From this step on, we will use TensorFlow 2.0 Release Candidate 0. In the previous steps we used TF 1.14.0 because the cluster resolver API changed in TF 2.0 and was not stable at the time of writing, which made TPU training impossible.

Change the runtime to GPU and install **TF 2.0 rc0 (GPU build)**:

In [0]:
!pip install tensorflow-gpu==2.0.0-rc0

# To determine which version you're using:
!pip show tensorflow-gpu

Check that tf is seeing the GPU device and that the version imported is the right one

In [0]:
import tensorflow as tf
print (tf.test.gpu_device_name())
print (tf.__version__)

Check the devices

In [0]:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()

Now repeat the training with **GPU**

In [0]:
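(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Add a channel dimension and scale pixel values from [0, 255] to [0, 1]
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1)).astype('float32') / 255.0
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1)).astype('float32') / 255.0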

def create_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=x_train.shape[1:]))
  model.add(tf.keras.layers.Dense(512, activation=tf.nn.relu))
  model.add(tf.keras.layers.Dropout(0.2))
  model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
  return model

In [0]:
import time

with tf.device('/gpu:0'):
  model = create_model()
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  t = time.time()
  model.fit(x_train, y_train, epochs=5)
  gpu_time = time.time() - t
  model.evaluate(x_test, y_test)

print ("GPU time: ", gpu_time, "s")

Now save the trained model's weights:

In [0]:
model.save_weights('./mnist.h5', overwrite=True)

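You can load the saved weights from the HDF5 file into a new model with the same architecture:

In [0]:
loaded_model = create_model()
loaded_model.load_weights('./mnist.h5')
# A freshly created model must be compiled before it can be evaluated
loaded_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
loaded_model.evaluate(x_test, y_test)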

## Callbacks

Callbacks are an important part of the Keras API. They are objects whose methods Keras calls at specific points during training, such as the start or end of each epoch or batch.

*   **BaseLogger**: Accumulates the average of each metric over an epoch
*   **EarlyStopping**: Stops training when a monitored metric has stopped improving
*   **LearningRateScheduler**: Changes the learning rate of the optimizer on the fly, following a schedule
*   **ModelCheckpoint**: Saves the model (or only its weights) periodically, by default after every epoch
*   **ReduceLROnPlateau**: Reduces the learning rate when a monitored metric has stopped improving
*   **TensorBoard**: Writes logs for TensorBoard visualizations

Early stopping and checkpoints are the most important callbacks to add to your training.



**Early stopping** helps prevent overfitting and overtraining. You choose a metric to monitor, the minimum change that counts as an improvement (`min_delta`), and how many epochs without improvement to tolerate (`patience`); when the metric stops improving for that many epochs, training stops automatically. It is also useful if you don't know how many epochs your training will need.

Here is an example of early stopping with the same model we've been using:

In [0]:
model = create_model()

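# Early stop callback: stop when validation accuracy improves by less than 0.01 for 2 epochs in a row
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0.01, patience=2, mode='max', verbose=1)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Hold out 10% of the training data so the callback can monitor accuracy on unseen samples
model.fit(x_train, y_train, epochs=1000, validation_split=0.1, callbacks=[early_stop])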
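The decision `EarlyStopping` makes after each epoch can be sketched in plain Python. This is a simplified version for a metric to maximize (`mode='max'`), written to illustrate `min_delta` and `patience`; it is not the Keras source, and the metric values are hypothetical.

```python
def epochs_run(history, min_delta=0.01, patience=2):
    """Return how many epochs run before early stopping, for a metric to maximize."""
    best = float('-inf')
    wait = 0
    for epoch, value in enumerate(history, start=1):
        if value - min_delta > best:   # improved by more than min_delta
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:       # `patience` epochs in a row without improvement
                return epoch
    return len(history)


print(epochs_run([0.90, 0.95, 0.955, 0.958, 0.99]))  # 4
```

The hypothetical 0.99 in epoch 5 is never reached: with a small `patience`, training can stop just before a late improvement.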


The **model checkpoint** callback creates a model checkpoint after each epoch or after a certain number of samples. It can also be configured to store only the best model.

In [0]:
model = create_model()

# Model checkpoint callback
checkpoint = tf.keras.callbacks.ModelCheckpoint("mnist_epoch_{epoch:02d}.h5", monitor='accuracy', verbose=1, save_best_only=False, mode='max', save_freq='epoch')

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3, callbacks = [checkpoint])
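To see which files the checkpoint callback writes, here is a plain-Python sketch of the per-epoch behaviour with and without `save_best_only`, for a metric to maximize. Keras fills `{epoch}` with 1-based epoch numbers; the helper `checkpoint_files` and the accuracy values are hypothetical.

```python
def checkpoint_files(history, template='mnist_epoch_{epoch:02d}.h5', save_best_only=False):
    """List the checkpoint files written per epoch for a metric to maximize."""
    best = float('-inf')
    written = []
    for epoch, value in enumerate(history, start=1):  # file names use 1-based epochs
        if save_best_only and value <= best:
            continue  # no improvement: keep the previous best file
        best = max(best, value)
        written.append(template.format(epoch=epoch))
    return written


accuracies = [0.91, 0.89, 0.95]  # hypothetical per-epoch accuracies
print(checkpoint_files(accuracies))
# ['mnist_epoch_01.h5', 'mnist_epoch_02.h5', 'mnist_epoch_03.h5']
print(checkpoint_files(accuracies, save_best_only=True))
# ['mnist_epoch_01.h5', 'mnist_epoch_03.h5']
```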

## Data augmentation
Data augmentation is one of the most useful tricks when working with CNNs. Deep neural networks have a lot of parameters, so they need a lot of data to avoid overfitting. Sometimes the available dataset is not big enough for the model we are trying to train; in these cases, data augmentation can help. It consists of generating new training samples by randomly scaling, rotating, and cropping existing ones, and sometimes adding noise. A rule of thumb is that **the number of parameters and the number of training samples should be around the same order of magnitude**.




In [0]:
model = create_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
print (x_train.shape)
print (y_train.shape)
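The parameter count that `model.summary()` reports can be checked by hand: a Dense layer has one weight per input per unit plus one bias per unit, while Flatten and Dropout have no parameters. The arithmetic below (the helper `dense_params` is only for illustration) gives about 407,000 parameters for 60,000 training images, roughly 7 parameters per image, which is the kind of gap the rule of thumb above suggests closing with more (or augmented) data.

```python
def dense_params(inputs, units):
    """Parameters of a Dense layer: one weight per (input, unit) pair plus one bias per unit."""
    return inputs * units + units


flat = 28 * 28 * 1                  # Flatten output size (Flatten itself has no parameters)
hidden = dense_params(flat, 512)    # 401,920
output = dense_params(512, 10)      # 5,130 (Dropout has no parameters)
total = hidden + output
print(total, 'parameters,', round(total / 60000, 1), 'per training image')
# 407050 parameters, 6.8 per training image
```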

**ImageDataGenerator** is a keras tool for data augmentation. This includes capabilities such as:

*   Sample-wise standardization
*   Feature-wise standardization
*   ZCA whitening
*   Random rotation, shifts, shear and flips
*   Dimension reordering
*   Save augmented images to disk

Rather than performing the operations on the entire image dataset in memory, the API generates data during the training process. This reduces memory overhead, but adds computational cost during model training.
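Generating augmented data during training can be sketched as a Python generator that transforms each batch only when it is requested, so the augmented copies never have to be stored. This sketch handles only random shifts (similar in spirit to `width_shift_range`/`height_shift_range`) on 2D images, so a `(N, 28, 28, 1)` array would need its channel axis squeezed first; `random_shift` and `augmented_batches` are hypothetical names, not part of `ImageDataGenerator`.

```python
import numpy as np


def random_shift(img, max_shift, rng):
    """Shift a 2D image by up to max_shift pixels per axis, filling the gap with zeros."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = img.shape
    out = np.zeros_like(img)
    # Part of the image that stays inside the frame after the shift.
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out


def augmented_batches(x, y, batch_size, max_shift=3, seed=0):
    """Yield (images, labels) batches forever, augmenting each batch when it is drawn."""
    rng = np.random.default_rng(seed)
    while True:
        idx = rng.choice(len(x), size=batch_size, replace=False)
        images = np.stack([random_shift(x[i], max_shift, rng) for i in idx])
        yield images, y[idx]


x_demo = np.random.default_rng(1).random((100, 28, 28))  # hypothetical images
y_demo = np.arange(100)
images, labels = next(augmented_batches(x_demo, y_demo, batch_size=9))
print(images.shape, labels.shape)  # (9, 28, 28) (9,)
```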

First plot the original data:

In [0]:
from matplotlib import pyplot
for i in range(0, 9):
	pyplot.subplot(330 + 1 + i)
	pyplot.imshow(x_train[i].reshape(28,28), cmap=pyplot.get_cmap('gray'))
pyplot.show()

Now add feature-wise standardization, random rotations, and random shifts:

In [0]:
datagen = tf.keras.preprocessing.image.ImageDataGenerator(featurewise_center=True,
                 featurewise_std_normalization=True,
                 rotation_range=60.,
                 width_shift_range=0.1,
                 height_shift_range=0.1)

# You can add a seed for reproducibility
seed = 1
datagen.fit(x_train, augment=True, seed=seed)

# Plot augmented data
for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=9):
	# create a grid of 3x3 images
	for i in range(0, 9):
		pyplot.subplot(330 + 1 + i)
		pyplot.imshow(x_batch[i].reshape(28, 28), cmap=pyplot.get_cmap('gray'))
	# show the plot
	pyplot.show()
	break
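`featurewise_center` and `featurewise_std_normalization` use statistics that `datagen.fit` computes over the whole training set. For single-channel images, that amounts roughly to the NumPy sketch below: one mean and one standard deviation for all pixels, with a small constant added to the divisor. The helper name and the `eps` value are assumptions for illustration.

```python
import numpy as np


def featurewise_standardize(x, eps=1e-6):
    """Standardize images with one mean/std per channel, computed over the whole set."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)   # over samples, rows and columns
    std = x.std(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / (std + eps), mean, std


rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(100, 28, 28, 1)).astype(np.float32)  # hypothetical images
x_std, mean, std = featurewise_standardize(x)
print(float(x_std.mean()), float(x_std.std()))   # close to 0 and 1
```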

To train on the augmented data, pass the generator to **fit_generator** instead of **fit**. Alternatively, you can generate an augmented dataset yourself and pass it to **fit**:



In [0]:
model = create_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(datagen.flow(x_train, y_train, batch_size=10),
                    steps_per_epoch=len(x_train) // 10, epochs=1)  # one pass over the data per epoch

# Tensorboard

TensorBoard is an excellent tool for visualizing models and training. Keras provides a **TensorBoard callback** that writes the logs TensorBoard displays. We are going to train a Keras model on Colab and visualize it with TensorBoard while it trains.





There is one thing we need to address before we start using TensorBoard. Your Google Colab virtual machine runs on a network inside a Google data center, while your local machine could be anywhere else in the world. How can you access the TensorBoard page from your local machine? We are going to use a free service named **ngrok** to tunnel the connection to your local machine.

Here is a graph to show how it works:

<img src="https://gitcdn.xyz/cdn/Tony607/blog_statics/d425c3fe4cf0d92067572e25ae6cc3198d51936b//images/ngrok/ngrok.jpg" alt="Drawing" style="width: 200px;"/>

Download and install ngrok

In [0]:
!wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

Run TensorBoard. Here TensorBoard listens on 127.0.0.1 (localhost); that is enough because ngrok runs on the same virtual machine and forwards outside traffic to it. We use `get_ipython().system_raw` instead of the `!` sign so that the server runs as a background process (`&`) without blocking the notebook.

In [0]:
LOG_DIR = './log'
get_ipython().system_raw(
    'tensorboard --logdir {} --host 127.0.0.1 --port 6006 &'
    .format(LOG_DIR)
)

Run ngrok, also as a background process. The port must match the one passed to the TensorBoard server.

In [0]:
get_ipython().system_raw('./ngrok http 6006 &')

Run the next command to obtain your public tunnel URL. We will open this URL in a browser after starting the training.

In [0]:
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

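The same lookup can be done in Python instead of piping `curl` into an inline script. The sketch assumes, as the cell above does, that ngrok's local API at `http://localhost:4040/api/tunnels` returns JSON with a `tunnels` list whose entries have a `public_url`; preferring the `https://` URL is a choice made here, and the function names are hypothetical.

```python
import json
import urllib.request


def public_url(tunnels_json):
    """Return a public URL from ngrok's /api/tunnels JSON, preferring https."""
    urls = [tunnel['public_url'] for tunnel in json.loads(tunnels_json)['tunnels']]
    https_urls = [url for url in urls if url.startswith('https://')]
    return (https_urls or urls)[0]  # raises IndexError if no tunnel is open yet


def fetch_public_url(api='http://localhost:4040/api/tunnels'):
    """Query the local ngrok API (only works while ngrok is running)."""
    with urllib.request.urlopen(api) as response:
        return public_url(response.read().decode('utf-8'))


# print(fetch_public_url())  # uncomment once ngrok is running
```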
Train a model with the TensorBoard callback:

In [0]:
#TensorBoard callback
tensorboard = tf.keras.callbacks.TensorBoard(
                          log_dir='./log',
                          histogram_freq=10,
                          write_graph=True,
                          write_images=True,
                          update_freq=600000) #samples

model = create_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1000, verbose=1, validation_data=(x_test, y_test), callbacks=[tensorboard])

Clean up the runtime: find the TensorBoard and ngrok processes, stop them, and delete the logs:

In [0]:
!ps aux | grep tensorboard
!ps aux | grep ngrok

In [0]:
# Stop both background processes by name instead of hard-coding a PID
!pkill -f tensorboard
!pkill -f ngrok



In [0]:
!rm -rf log


*   ngrok has a limitation of **20 updates per minute**
*   sparse_categorical_crossentropy metric is not working on training in TF 2.0



# Deep Learning Basics


## Machine learning problem types

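*   **Supervised learning**: learn a mapping from inputs to outputs using labeled examples.
*   **Unsupervised learning**: find structure (such as groups) in unlabeled data.
*   **Reinforcement learning**: learn which actions to take by interacting with an environment and receiving rewards.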

**Classification Tasks**: Image level classification, Region level classification (Detection, Localization), Pixel level classification (Segmentation)

![Classification](https://developer.nvidia.com/sites/default/files/akamai/embedded/images/images/deep-vision-primitives.png)

**Regression**: from a group of points, get the function that best describes the data

![regression](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Linear_regression.svg/1200px-Linear_regression.svg.png)
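For a straight line, "the function that best describes the data" is usually found by least squares: choose the slope `a` and intercept `b` that minimize the sum of squared errors. A minimal NumPy sketch on hypothetical points that lie exactly on y = 2x + 1:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])  # hypothetical points on y = 2x + 1

# Least squares: find [a, b] minimizing sum((a*x + b - y)**2).
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)  # approximately 2.0 and 1.0
```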

**Clustering**: from a group of points, get groups with similar characteristics

![clustering](https://media.geeksforgeeks.org/wp-content/uploads/k-means-copy.jpg)
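One of the most common clustering algorithms is k-means, which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. A plain NumPy sketch with hypothetical 2D points forming two obvious groups:

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate assigning points and moving centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it lost all of them).
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return labels, centroids


points = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                   [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids = kmeans(points, k=2)
print(labels)
```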

**Anomaly detection**: find the points or events that deviate strongly from the expected pattern of the data

![anomaly](https://numenta.com/wp-content/uploads/2019/03/anomaly-detection-image.png)
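One of the simplest anomaly detectors flags values that lie far from the mean, measured in standard deviations (z-scores). The threshold of 3 and the sensor readings below are hypothetical; real detectors are usually more robust than this sketch, since a large outlier also inflates the mean and standard deviation it is compared against.

```python
import numpy as np


def zscore_anomalies(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations away from the mean."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold


readings = np.array([10.0] * 20 + [10.5, 9.5, 40.0])  # hypothetical sensor readings
print(np.flatnonzero(zscore_anomalies(readings)))     # [22]: the 40.0 reading
```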






## Convolutional Neural Networks (CNN)

**Image Analysis**

In classical pattern recognition, researchers analysed the dataset manually to determine the most important features, then designed feature extractors for them.

![alt text](https://neuralocean.de/wp-content/uploads/2018/03/patterrecognition-pipeline.png)

Let us assume that we want to create a neural network model that is capable of recognizing swans in images. The swan has certain characteristics that can be used to help determine whether a swan is present or not, such as its long neck, its white color, etc.

![alt text](https://miro.medium.com/max/2824/1*tfw7GTZKq96uW2oDKy0PFw.png)

For some images, it may be more difficult to determine whether a swan is present. Consider the following image:

![alt text](https://miro.medium.com/max/2984/1*F4Hi-OtXMVs_y5-MxS97vA.png)

Can it get any worse? It definitely can.

![alt text](https://miro.medium.com/max/2020/1*HCbbEQv9x5kIQc39SrWvAQ.png)
![alt text](https://i.ibb.co/qsMXg3K/Image-from-i-OS.jpg)


Humans were designing these feature detectors, and that made them either too simple or hard to generalize.

*   What if we learned the features to detect?
*   We need a system that can do Representation Learning (or Feature Learning).


**Traditional neural networks (Multilayer perceptron)**

![alt text](https://miro.medium.com/max/500/1*BQ0SxdqC9Pl_3ZQtd3e45A.png)



*   Activation
*   Error
*   Back propagation
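The three ideas listed above can be seen in a few lines of NumPy. This sketch trains a single sigmoid layer (one neuron) rather than a full multilayer perceptron, on a hypothetical toy dataset: the activation is a weighted sum passed through a sigmoid, the error is the mean squared difference from the targets, and back propagation applies the chain rule to get the gradients used to update the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                            # 8 samples, 3 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy binary targets
W = rng.normal(scale=0.1, size=(3, 1))
b = np.zeros((1, 1))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_step(W, b, lr=0.5):
    """One gradient-descent step; returns updated parameters and the loss before the update."""
    out = sigmoid(X @ W + b)                # activation: weighted sum + non-linearity
    loss = np.mean((out - y) ** 2)          # error: mean squared difference
    d_out = 2.0 * (out - y) / len(X)        # back propagation (chain rule) ...
    d_z = d_out * out * (1.0 - out)         # ... through the sigmoid's derivative
    W = W - lr * X.T @ d_z                  # gradient with respect to the weights
    b = b - lr * d_z.sum(axis=0, keepdims=True)
    return W, b, loss


losses = []
for _ in range(50):
    W, b, loss = train_step(W, b)
    losses.append(loss)
print(losses[0], losses[-1])  # the error shrinks as training proceeds
```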


**How can we use classic NN with images?**

**What problems do you see with classic NN?**




**CNN**

CNNs use convolutions to solve the problems of classic NNs:


![alt text](https://miro.medium.com/max/1950/1*p-_47puSuVNmRJnOXYPQCg.png)


Classical NN:


*   Do not scale well for images
*   Ignore spatial information
*   Cannot handle translations


CNNs exploit two properties of images:

*   Pixel position has semantic meaning (neighboring pixels are related)
*   Elements of interest can appear anywhere in the image
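These two properties are what a convolution exploits: the same small kernel (here only 2 weights, whatever the image size) is applied at every position, so a pattern produces the same response wherever it appears, just shifted. A plain NumPy sketch of a "valid" 2D convolution (implemented, as in most deep learning libraries, as cross-correlation) on a hypothetical image with one bright pixel:

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' 2D convolution, implemented as cross-correlation like in most DL libraries."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are reused at every position (weight sharing).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


kernel = np.array([[1.0, -1.0]])        # responds to a change in intensity along a row
image = np.zeros((5, 6))
image[2, 1] = 1.0                       # a single bright pixel
shifted = np.roll(image, 2, axis=1)     # the same pixel, two columns to the right

a = conv2d(image, kernel)
b = conv2d(shifted, kernel)
# The response moves with the input (np.roll wraps around, harmless here since the borders are zero).
print(np.array_equal(np.roll(a, 2, axis=1), b))  # True
```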

**Feature learning**

![alt text](https://miro.medium.com/max/1910/1*fLGuAUT5imTIGAeA4zzaWA.png)

**Standard CNN**
![alt text](https://miro.medium.com/max/2640/0*jXPgL1T2Gu5L4WSp.png)

**Example:**

In this example, you use tf.keras and Cloud TPUs to train a model on the Fashion MNIST dataset. **This example uses a TPU runtime.**


### Download data

Begin by downloading the fashion MNIST dataset using `tf.keras.datasets`, as shown below.

In [0]:
import tensorflow as tf
import numpy as np

from distutils.version import LooseVersion
if LooseVersion(tf.__version__) < '1.14':
    raise Exception('This notebook is compatible with TensorFlow 1.14 or higher, for TensorFlow 1.13 or lower please use the previous version at https://github.com/tensorflow/tpu/blob/r1.13/tools/colab/fashion_mnist.ipynb')

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# add empty color dimension
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

### Build the model
The following model has some errors. Find them and add the layers needed to improve the inference results:

Hints:


*   Can the model learn complex features with only one convolutional layer?
*   How to prevent overfitting?
*   Are the activation layers ok?
*   `relu` removes negative values. What happens if we use `relu` on the output layer?






In [0]:
def create_model():
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Conv2D(64, (5, 5), padding='same', activation='relu'))
  model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=(2,2)))

  model.add(tf.keras.layers.Flatten())
  model.add(tf.keras.layers.Dense(10))
  model.add(tf.keras.layers.Activation('relu'))
  return model

### Train the model

In [0]:
import os

resolver = tf.contrib.cluster_resolver.TPUClusterResolver('grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():
  model = create_model()
  model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3, ),
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])

model.fit(
    x_train.astype(np.float32), y_train.astype(np.float32),
    epochs=17,
    steps_per_epoch=60,
    validation_data=(x_test.astype(np.float32), y_test.astype(np.float32)),
    validation_freq=17
)

model.save_weights('./fashion_mnist.h5', overwrite=True)

Evaluate the model

In [0]:
LABEL_NAMES = ['t_shirt', 'trouser', 'pullover', 'dress', 'coat', 'sandal', 'shirt', 'sneaker', 'bag', 'ankle_boots']


cpu_model = create_model()
cpu_model.load_weights('./fashion_mnist.h5')

from matplotlib import pyplot
%matplotlib inline

def plot_predictions(images, predictions):
  n = images.shape[0]
  nc = int(np.ceil(n / 4))
  # squeeze=False keeps `axes` 2D (rows x columns) even when there is a single row
  f, axes = pyplot.subplots(nc, 4, squeeze=False)
  for i in range(nc * 4):
    row = i // 4
    col = i % 4
    axes[row, col].axis('off')
    if i >= n:
      continue  # empty cell in the last row
    label = LABEL_NAMES[np.argmax(predictions[i])]
    confidence = np.max(predictions[i])
    axes[row, col].imshow(images[i])
    axes[row, col].text(0.5, 0.5, label + '\n%.3f' % confidence, fontsize=14)

  pyplot.gcf().set_size_inches(8, 8)

plot_predictions(np.squeeze(x_test[:16]), 
                 cpu_model.predict(x_test[:16]))

## Long Short-term Memory (LSTM)

![alt text](https://upload.wikimedia.org/wikipedia/commons/3/3b/The_LSTM_cell.png)

In this example, you train the model on the combined works of William Shakespeare, then use the model to compose a play in the style of *The Great Bard*:

<blockquote>
Loves that led me no dumbs lack her Berjoy's face with her to-day.  
The spirits roar'd; which shames which within his powers  
	Which tied up remedies lending with occasion,  
A loud and Lancaster, stabb'd in me  
	Upon my sword for ever: 'Agripo'er, his days let me free.  
	Stop it of that word, be so: at Lear,  
	When I did profess the hour-stranger for my life,  
	When I did sink to be cried how for aught;  
	Some beds which seeks chaste senses prove burning;  
But he perforces seen in her eyes so fast;  
And _  
</blockquote>

**This example uses a TPU runtime**



### Download data

Download *The Complete Works of William Shakespeare* as a single text file from [Project Gutenberg](https://www.gutenberg.org/). You use snippets from this file as the *training data* for the model. The *target* snippet is offset by one character.

In [0]:
!wget --show-progress --continue -O /content/shakespeare.txt http://www.gutenberg.org/files/100/100-0.txt

In [0]:
import numpy as np
import tensorflow as tf
import os

from distutils.version import LooseVersion
if LooseVersion(tf.__version__) < '1.14':
    raise Exception('This notebook is compatible with TensorFlow 1.14 or higher, for TensorFlow 1.13 or lower please use the previous version at https://github.com/tensorflow/tpu/blob/r1.13/tools/colab/shakespeare_with_tpu_and_keras.ipynb')

# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

SHAKESPEARE_TXT = '/content/shakespeare.txt'

def transform(txt):
  return np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)

def input_fn(seq_len=100, batch_size=1024):
  """Return a dataset of source and target sequences for training."""
  with tf.io.gfile.GFile(SHAKESPEARE_TXT, 'r') as f:
    txt = f.read()

  source = tf.constant(transform(txt), dtype=tf.int32)

  ds = tf.data.Dataset.from_tensor_slices(source).batch(seq_len+1, drop_remainder=True)

  def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

  BUFFER_SIZE = 10000
  ds = ds.map(split_input_target).shuffle(BUFFER_SIZE).batch(batch_size, drop_remainder=True)

  return ds.repeat()
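To see what `input_fn` produces, here is a NumPy sketch of the same windowing on a short string: the text is cut into non-overlapping chunks of `seq_len + 1` characters (the remainder is dropped, as with `drop_remainder=True`), and each chunk is split into an input and a target shifted by one character. Shuffling, batching, and the filtering done by `transform` are left out; `make_examples` is a hypothetical helper.

```python
import numpy as np


def make_examples(text, seq_len):
    """Cut text into chunks of seq_len + 1 characters and split each into (input, target)."""
    codes = np.array([ord(c) for c in text], dtype=np.int32)
    n_chunks = len(codes) // (seq_len + 1)                  # drop the incomplete last chunk
    chunks = codes[:n_chunks * (seq_len + 1)].reshape(n_chunks, seq_len + 1)
    return chunks[:, :-1], chunks[:, 1:]                    # target = input shifted by one


inputs, targets = make_examples('hello world', seq_len=4)
print([''.join(map(chr, row)) for row in inputs])   # ['hell', ' wor']
print([''.join(map(chr, row)) for row in targets])  # ['ello', 'worl']
```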

### Build the model

The model is a two-layer, forward (unidirectional) LSTM. The same model definition works on both CPU and TPU.

Because our vocabulary size is 256, the input dimension to the Embedding layer is 256.

When specifying the arguments to the LSTM, note how the `stateful` argument is used. When training, we set `stateful=False` because we want to reset the state of the model between batches. When sampling (computing predictions) from a trained model, we want `stateful=True` so that the model keeps its state from one batch to the next and can generate more interesting text.

In [0]:
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
  """Language model: predict the next character given the previous ones."""
  source = tf.keras.Input(
      name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

  embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
  lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
  lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
  predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)
  return tf.keras.Model(inputs=[source], outputs=[predicted_char])

### Train the model


In [0]:
tf.keras.backend.clear_session()

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():
  training_model = lstm_model(seq_len=100, stateful=False)
  training_model.compile(
      optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])

training_model.fit(
    input_fn(),
    steps_per_epoch=100,
    epochs=10
)
training_model.save_weights('/tmp/bard.h5', overwrite=True)

### Evaluate the model

In [0]:
BATCH_SIZE = 5
PREDICT_LEN = 250

# Keras requires the batch size be specified ahead of time for stateful models.
# We use a sequence length of 1, as we will be feeding in one character at a 
# time and predicting the next character.
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
prediction_model.load_weights('/tmp/bard.h5')

# We seed the model with our initial string, copied BATCH_SIZE times

seed_txt = 'Looks it not like the king?  Verily, we must go! '
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
  prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
  last_word = predictions[-1]
  next_probits = prediction_model.predict(last_word)[:, 0, :]
  
  # sample from our output distribution
  next_idx = [
      np.random.choice(256, p=next_probits[i])
      for i in range(BATCH_SIZE)
  ]
  predictions.append(np.asarray(next_idx, dtype=np.int32))
  

for i in range(BATCH_SIZE):
  print('PREDICTION %d\n\n' % i)
  p = [predictions[j][i] for j in range(PREDICT_LEN)]
  generated = ''.join([chr(c) for c in p])  # Convert back to text
  print(generated)
  print()
  assert len(generated) == PREDICT_LEN, 'Generated text too short'

# More example notebooks

- [Shakespeare in 5 minutes with Cloud TPUs and Keras](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb)
- [Shakespeare in 5 minutes with Cloud TPUs via TPUEstimator](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpuestimator.ipynb)
- [Fashion MNIST with Keras and TPUs](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb)
- [Neural Style Transfer](https://research.google.com/seedbank/seed/neural_style_transfer_with_tfkeras): Use deep learning to transfer style between images.
- [EZ NSynth](https://research.google.com/seedbank/seed/ez_nsynth): Synthesize audio with WaveNet auto-encoders.
- [Fashion MNIST with Keras and TPUs](https://research.google.com/seedbank/seed/fashion_mnist_with_keras_and_tpus): Classify fashion-related images with deep learning.
- [DeepDream](https://research.google.com/seedbank/seed/deepdream): Produce DeepDream images from your own photos.
- [Convolutional VAE](https://research.google.com/seedbank/seed/convolutional_vae): Create a generative model of handwritten digits.