<a href="https://colab.research.google.com/github/hrishikeshmane/MNIST_DRAW/blob/training-on-tpu-colab/Training/notebooks/MNIST_on_TPU.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

We use GPUs to train our DNN and CNN models faster, but TPUs can be considerably faster than GPUs for training many models.

However, training on a TPU is a bit different from training on a CPU or GPU, so here is a small demo that trains a CNN on the MNIST dataset using a TPU.

### Install TensorFlow 1.12.0 and make sure you ***restart*** the runtime before executing further cells

### Also change the runtime type to **TPU**
      Runtime -> Change runtime type -> select TPU under Hardware accelerator

In [22]:
!pip install tensorflow==1.12.0

Collecting tensorflow==1.12.0
  Using cached https://files.pythonhosted.org/packages/22/cc/ca70b78087015d21c5f3f93694107f34ebccb3be9624385a911d4b52ecef/tensorflow-1.12.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting tensorboard<1.13.0,>=1.12.0
  Using cached https://files.pythonhosted.org/packages/07/53/8d32ce9471c18f8d99028b7cef2e5b39ea8765bd7ef250ca05b490880971/tensorboard-1.12.2-py3-none-any.whl
Installing collected packages: tensorboard, tensorflow
  Found existing installation: tensorboard 1.11.0
    Uninstalling tensorboard-1.11.0:
      Successfully uninstalled tensorboard-1.11.0
  Found existing installation: tensorflow 1.12.0rc1
    Uninstalling tensorflow-1.12.0rc1:
      Successfully uninstalled tensorflow-1.12.0rc1
Successfully installed tensorboard-1.12.2 tensorflow-1.12.0


In [0]:
import numpy as np
import tensorflow as tf
import os

# only the pieces actually used below are imported
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten

In [2]:
print(tf.__version__)

1.12.0


In [3]:
# check for TPU
try:
  device_name = os.environ['COLAB_TPU_ADDR']
  TPU_ADDRESS = 'grpc://' + device_name
  print('Found TPU at: {}'.format(TPU_ADDRESS))

except KeyError:
  print('TPU not found; change the runtime type to TPU before running the cells below')

Found TPU at: grpc://10.13.189.242:8470


Note that the batch size here is significantly larger than what we would typically use on a CPU or GPU. Being able to use large batches, which keep all of the TPU's cores busy, is one of the perks of training on a TPU.
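With `batch_size = 1280` and 60,000 training images, the `steps_per_epoch = 60000//batch_size` used in the training cell works out as sketched below. This is plain-Python arithmetic only, and `epoch_coverage` is a hypothetical helper name. Because the input pipeline repeats indefinitely, a Keras "epoch" here is just a fixed number of steps on a continuous stream, so the leftover images are not discarded; they are consumed at the start of the next epoch's steps.

```python
def epoch_coverage(num_samples, batch_size):
    """Return (steps_per_epoch, samples_per_epoch, leftover) for integer-division steps."""
    steps = num_samples // batch_size      # same as 60000 // batch_size in the fit cell
    samples = steps * batch_size           # images actually fed per Keras "epoch"
    leftover = num_samples - samples       # images that roll over into the next epoch
    return steps, samples, leftover

print(epoch_coverage(60000, 1280))  # (46, 58880, 1120)
print(epoch_coverage(60000, 128))   # (468, 59904, 96): a typical GPU-sized batch
```

The comparison shows the practical effect of the large batch: roughly ten times fewer (but larger) steps per pass over the data.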

In [0]:
batch_size = 1280
num_classes = 10
epochs = 10
learning_rate = 0.001

img_rows, img_cols = 28, 28

# download and load the data 
(x_train, y_train),(x_test, y_test) = mnist.load_data()

In [0]:
# add a trailing channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

In [21]:
# scale pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [0]:
# convert integer class labels to one-hot (binary class) matrices
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
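The two preprocessing steps above are simple element-wise operations. The pure-Python sketch below mirrors what the NumPy division and `tf.keras.utils.to_categorical` do on a handful of values; `scale_pixels` and `one_hot` are illustrative names, not library functions.

```python
def scale_pixels(pixels, max_value=255.0):
    """Map raw 8-bit pixel intensities in [0, 255] to floats in [0, 1]."""
    return [p / max_value for p in pixels]

def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows, like to_categorical does."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside [0, {num_classes})")
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows

print(scale_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot([3], 10))            # [[0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
```

The one-hot encoding is needed because the model is compiled with `categorical_crossentropy`; `sparse_categorical_crossentropy` would accept the integer labels directly.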

In [0]:
def train_input_fn(batch_size=batch_size):
  # Build a tf.data pipeline from the in-memory training arrays.
  dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
  # drop_remainder=True gives every batch the same static shape,
  # which the TPU requires.
  dataset = dataset.shuffle(1000).repeat().batch(batch_size, drop_remainder=True)
  return dataset


def test_input_fn(batch_size=batch_size):
  # Same pipeline, built from the test arrays.
  dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test))
  dataset = dataset.shuffle(1000).repeat().batch(batch_size, drop_remainder=True)
  return dataset
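To make the pipeline's behaviour concrete, here is a pure-Python approximation of `shuffle(buffer_size).repeat().batch(n, drop_remainder=True)`. It is a sketch under two assumptions: that tf.data's `shuffle` acts as a fixed-size buffer from which a random element is emitted and then refilled, and that, with `shuffle` placed before `repeat`, each pass is reshuffled (the default `reshuffle_each_iteration=True`). The function names are hypothetical.

```python
import random
from itertools import islice

def buffered_shuffle(items, buffer_size, rng):
    """Approximate Dataset.shuffle: emit random picks from a fixed-size buffer."""
    buffer = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # swap a random element to the end and emit it, freeing one slot
            i = rng.randrange(buffer_size)
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    rng.shuffle(buffer)  # input exhausted: drain the rest in random order
    yield from buffer

def repeat_shuffled(items, buffer_size, rng):
    """shuffle(...).repeat(): an endless stream in which every pass is reshuffled."""
    items = list(items)
    if not items:
        return  # avoid looping forever on empty input
    while True:
        yield from buffered_shuffle(items, buffer_size, rng)

def batch(stream, batch_size, drop_remainder):
    """Group a stream into lists of batch_size; optionally drop a short final batch."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk or (drop_remainder and len(chunk) < batch_size):
            return
        yield chunk

rng = random.Random(0)
data = list(range(10))
first_batches = list(islice(batch(repeat_shuffled(data, 4, rng), 4, True), 5))
print([len(b) for b in first_batches])                  # [4, 4, 4, 4, 4]
print(list(batch(range(10), 4, drop_remainder=True)))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

One consequence: because `repeat()` comes before `batch()`, the stream is infinite and `drop_remainder=True` never actually drops anything in this notebook; its job is to make the batch dimension a static 1280 rather than `None`. Also note that a 1,000-element buffer over 60,000 images gives only a local shuffle, since each output is drawn from a window of upcoming elements.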

In [0]:
# The batch size is fixed in the input layer because the TPU compiles the model for static shapes.
Inp = tf.keras.Input(
    name='input', shape=input_shape, batch_size=batch_size, dtype=tf.float32)

x = Conv2D(32, kernel_size=(3, 3), activation='relu', name='Conv_01')(Inp)
x = MaxPooling2D(pool_size=(2, 2), name='MaxPool_01')(x)
x = Conv2D(64, (3, 3), activation='relu', name='Conv_02')(x)
x = MaxPooling2D(pool_size=(2, 2), name='MaxPool_02')(x)
x = Conv2D(64, (3, 3), activation='relu', name='Conv_03')(x)
x = Flatten(name='Flatten_01')(x)
x = Dense(64, activation='relu', name='Dense_01')(x)
x = Dropout(0.5, name='Dropout_02')(x)

# softmax output over the 10 digit classes
output = Dense(num_classes, activation='softmax', name='Dense_02')(x)

In [0]:
model = tf.keras.Model(inputs=[Inp], outputs=[output])

In [0]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(lr=learning_rate),  # use the learning_rate defined above
    loss='categorical_crossentropy',
    metrics=['acc'])
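The model is trained with Adam, and the training log further down shows the hyperparameters in use (`lr` 0.001, `beta_1` 0.9, `beta_2` 0.999, `epsilon` 1e-07). For intuition about what a single update does, here is a scalar sketch of the bias-corrected Adam step in the form Keras's Adam used at the time: the step size is rescaled by sqrt(1 - beta_2^t) / (1 - beta_1^t) and epsilon is added to sqrt(v). `adam_step` is a hypothetical helper, not library code; the real optimizer applies the same update element-wise to every weight tensor.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One scalar Adam update (t is the 1-based step count). Returns (param, m, v).

    Assumption: mirrors the formula of the Keras (TF 1.x) Adam optimizer.
    """
    m = beta_1 * m + (1.0 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1.0 - beta_2) * grad * grad   # running mean of squared gradients
    # bias correction folded into the step size
    lr_t = lr * math.sqrt(1.0 - beta_2 ** t) / (1.0 - beta_1 ** t)
    param = param - lr_t * m / (math.sqrt(v) + epsilon)
    return param, m, v

# minimise f(x) = x**2 (gradient 2x), starting from x = 1.0
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2.0 * x, m, v, t, lr=0.01)
print(x)  # close to 0
```

On the very first step, bias correction makes the update roughly `lr` in size whatever the gradient's scale (from x = 1.0 with the default `lr` of 0.001, one step lands at about 0.999), which is one reason Adam's learning rate is often read as an approximate per-step distance.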

In [27]:
# convert the Keras model into a TPU model replicated across the TPU cores
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))

INFO:tensorflow:Querying Tensorflow master (b'grpc://10.13.189.242:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 13512207836376184998)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 12748413851137428604)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 5999868185035103339)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 10259209721630297050)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 1292729078200471710)
INFO:tensorflow:*** Available Device: _DeviceAttrib

In [28]:
tpu_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (1280, 28, 28, 1)         0         
_________________________________________________________________
Conv_01 (Conv2D)             (1280, 26, 26, 32)        320       
_________________________________________________________________
MaxPool_01 (MaxPooling2D)    (1280, 13, 13, 32)        0         
_________________________________________________________________
Conv_02 (Conv2D)             (1280, 11, 11, 64)        18496     
_________________________________________________________________
MaxPool_02 (MaxPooling2D)    (1280, 5, 5, 64)          0         
_________________________________________________________________
Conv_03 (Conv2D)             (1280, 3, 3, 64)          36928     
_________________________________________________________________
Flatten_01 (Flatten)         (1280, 576)               0         
__________
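The shapes and parameter counts in the summary can be checked by hand: a 3×3 convolution with Keras's default `'valid'` padding shrinks each spatial side by 2, a 2×2 max-pool halves it (rounding down), and a layer's parameter count is its weights plus one bias per output unit. The plain-Python sketch below, with hypothetical helper names, reproduces the rows shown above and extends the count to the two Dense layers that the truncated summary cuts off.

```python
def conv_out(size, k=3):
    return size - k + 1                   # 'valid' padding, stride 1

def pool_out(size, p=2):
    return size // p                      # p x p pool, stride p, 'valid' padding

def conv_params(k, c_in, filters):
    return (k * k * c_in + 1) * filters   # kernel weights + one bias per filter

def dense_params(n_in, units):
    return (n_in + 1) * units             # weight matrix + one bias per unit

s1 = conv_out(28)     # Conv_01
s2 = pool_out(s1)     # MaxPool_01
s3 = conv_out(s2)     # Conv_02
s4 = pool_out(s3)     # MaxPool_02
s5 = conv_out(s4)     # Conv_03
flat = s5 * s5 * 64   # Flatten_01

params = {
    "Conv_01": conv_params(3, 1, 32),
    "Conv_02": conv_params(3, 32, 64),
    "Conv_03": conv_params(3, 64, 64),
    "Dense_01": dense_params(flat, 64),
    "Dense_02": dense_params(64, 10),
}
print([s1, s2, s3, s4, s5, flat])  # [26, 13, 11, 5, 3, 576]
print(params)
print(sum(params.values()))        # 93322
```

The pooling and dropout layers contribute no parameters, so by this arithmetic the whole model has 93,322 trainable parameters, most of them in `Conv_03` and `Dense_01`.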

In [29]:
# train
tpu_model.fit(
  train_input_fn,
  steps_per_epoch = 60000//batch_size,
  epochs=10,
)

Epoch 1/10
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(1280,), dtype=tf.int32, name=None), TensorSpec(shape=(1280, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(1280, 10), dtype=tf.float32, name=None)]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f4d7e4b6978> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 3.4039101600646973 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
INFO:tensorflow:CPU -> TPU 

<tensorflow.python.keras.callbacks.History at 0x7f4d7eba5da0>

In [30]:
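# evaluate
# 100 steps x 1280 = 128,000 samples: because test_input_fn repeats, the 10,000
# test images are cycled about 12.8 times. steps=x_test.shape[0] // batch_size (7)
# would score each test image at most once.
tpu_model.evaluate(test_input_fn, steps=100)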

INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(1280,), dtype=tf.int32, name=None), TensorSpec(shape=(1280, 28, 28, 1), dtype=tf.float32, name=None), TensorSpec(shape=(1280, 10), dtype=tf.float32, name=None)]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f4d7ca91d68> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 2.658039093017578 secs


[0.02362885305657983, 0.9914785271883011]
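`evaluate` returns the loss followed by the compiled metrics, so the output above reads as a test loss of about 0.024 and an accuracy of about 99.15%. For intuition about those two numbers, here is a plain-Python sketch of categorical cross-entropy and categorical accuracy on a tiny, made-up batch. It follows the recipe Keras's backend used for probability outputs (renormalise each row, clip to [epsilon, 1 - epsilon], then average -sum(y_true * log(y_pred)) over the batch); the helper names are hypothetical.

```python
import math

def categorical_crossentropy(y_true, y_pred, epsilon=1e-7):
    """Mean over the batch of -sum(t * log(p)), after renormalising and clipping p."""
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        s = sum(p_row)
        p_row = [min(max(p / s, epsilon), 1.0 - epsilon) for p in p_row]
        total += -sum(t * math.log(p) for t, p in zip(t_row, p_row))
    return total / len(y_true)

def categorical_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted argmax equals the one-hot label's argmax."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [[0, 1, 0], [1, 0, 0]]
y_pred = [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]]
print(categorical_crossentropy(y_true, y_pred))  # ≈ 0.7136
print(categorical_accuracy(y_true, y_pred))      # 0.5
```

Only the probability assigned to the true class enters the loss, so a loss near 0.024 means the model typically gives the correct digit a probability very close to 1.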

In [14]:
tpu_model.save_weights('MNIST_TPU.h5', overwrite=True)

INFO:tensorflow:Copying TPU weights to the CPU
INFO:tensorflow:TPU -> CPU lr: 0.0010000000474974513
INFO:tensorflow:TPU -> CPU beta_1: 0.8999999761581421
INFO:tensorflow:TPU -> CPU beta_2: 0.9990000128746033
INFO:tensorflow:TPU -> CPU decay: 0.0
INFO:tensorflow:TPU -> CPU epsilon: 1e-07
INFO:tensorflow:TPU -> CPU amsgrad: False
