[View in Colaboratory](https://colab.research.google.com/github/natemurthy/misc/blob/master/colab.research.google.com/TensorFlow_with_GPU.ipynb)

# Testing out the TPU connection

First, you'll need to enable TPUs for the notebook.

Navigate to Edit→Notebook Settings, and select TPU from the Hardware Accelerator drop-down (you can also access Notebook Settings via the command palette: cmd/ctrl-shift-P).

Next, we'll check that we can connect to the TPU.

In [1]:
import os
import tensorflow as tf

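# COLAB_TPU_ADDR holds the host:port of the TPU worker once a TPU runtime is enabled.
tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is', tpu_address)

# Connect to the TPU worker and list the devices it exposes.
with tf.Session(tpu_address) as session:
  devices = session.list_devices()

print('TPU devices:')
devices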


TPU address is grpc://10.22.86.138:8470
TPU devices:


[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 5367837999772260330),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 10100309180633096616),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 16957089878897873948),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 15576818800597952368),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 6563325799648670620),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 280552207182685731),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 11447240344446465732),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 13831751451533610930),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 12942592464113598861),
 _DeviceAttributes(/job:tpu_worker

If the cell above reports an error, make sure that you have enabled TPU support in the notebook settings. (Edit menu → Notebook settings)
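One common cause of an error in the connection cell is that `COLAB_TPU_ADDR` is not set, which raises a `KeyError`. The following is a minimal sketch of a friendlier check, using only the standard library. The helper name `get_tpu_address` is hypothetical and not part of Colab or TensorFlow.

```python
import os

def get_tpu_address(environ=os.environ):
    """Return the gRPC address of the Colab TPU worker, or None if it is not set.

    Hypothetical helper: it only inspects the COLAB_TPU_ADDR variable that the
    connection cell reads, and does not contact the TPU.
    """
    addr = environ.get('COLAB_TPU_ADDR')
    if not addr:
        return None
    return 'grpc://' + addr

address = get_tpu_address()
if address is None:
    print('No TPU address found. Enable TPU under Edit -> Notebook settings.')
else:
    print('TPU address is', address)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try without a TPU runtime.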

Now, let's try a simple computation.

In [2]:
import numpy as np

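def add_op(x, y):
  return x + y

# Placeholders for two length-10 float vectors, fed at run time.
x = tf.placeholder(tf.float32, [10,])
y = tf.placeholder(tf.float32, [10,])
# Rewrite add_op so that it is compiled for and executed on the TPU.
tpu_ops = tf.contrib.tpu.rewrite(add_op, [x, y])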
  
session = tf.Session(tpu_address)
try:
  print('Initializing...')
  session.run(tf.contrib.tpu.initialize_system())
  print('Running ops')
  print(session.run(tpu_ops, {x: np.arange(10), y: np.arange(10)}))
finally:
  # For now, the TPU system must be shut down separately from
  # closing the session.
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()

Initializing...
Running ops
[array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.], dtype=float32)]


# TPU FLOPs

Finally, we'll run a small test of floating-point throughput. The result is reported in FLOPS (floating point operations per second).

In [10]:
import time

N = 4096     # Each matrix is N x N.
COUNT = 100  # Number of chained matrix multiplications per shard.

def flops():
  x = tf.random_uniform([N, N])
  y = tf.random_uniform([N, N])
  def _matmul(x, y):
    return tf.tensordot(x, y, axes=[[1], [0]]), y

  return tf.reduce_sum(
    tf.contrib.tpu.repeat(COUNT, _matmul, [x, y])
  )
  
tpu_ops = tf.contrib.tpu.batch_parallel(flops, [], num_shards=8)
  
session = tf.Session(tpu_address)
try:
  print('Warming up...')
  session.run(tf.contrib.tpu.initialize_system())
  session.run(tpu_ops)
  print('Profiling')
  start = time.time()
  session.run(tpu_ops)
  end = time.time()
  elapsed = end - start
  # Each N x N matmul costs about 2*N^3 operations; 8 shards each run COUNT of them.
  print(elapsed, 'TFlops: {:.2f}'.format(1e-12 * 8 * COUNT * 2*N*N*N / elapsed))
finally:
  session.run(tf.contrib.tpu.shutdown_system())
  session.close()

Warming up...
Profiling
0.665576696395874 TFlops: 165.20
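The TFlops figure follows from the formula in the cell: each multiplication of two N x N matrices is counted as 2·N³ floating point operations, repeated `COUNT` times on each of 8 shards. The arithmetic can be sketched in plain Python as follows. The function name `matmul_tflops` is hypothetical, and the elapsed time is the value printed above.

```python
def matmul_tflops(n, count, num_shards, elapsed_s):
    """Estimate TFLOPS for `count` chained n x n matmuls on each of `num_shards` shards."""
    flop_per_matmul = 2 * n ** 3  # n^3 multiplies plus roughly n^3 adds
    total_flop = num_shards * count * flop_per_matmul
    return 1e-12 * total_flop / elapsed_s

# Values taken from the cell and its printed output above.
tflops = matmul_tflops(n=4096, count=100, num_shards=8, elapsed_s=0.665576696395874)
print('TFlops: {:.2f}'.format(tflops))
```

With these inputs the total is about 1.1e14 operations, which reproduces the reported 165.20 when divided by the elapsed time.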


# Next steps

A more involved example is [Shakespeare in 5 minutes with Cloud TPUs and Keras](https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb).

We'll be sharing more examples of TPU use in Colab over time, so be sure to check back for additional example links, or [follow us on Twitter @GoogleColab](https://twitter.com/googlecolab).

Meanwhile, you can check out the [TPUEstimator documentation on TensorFlow.org](https://www.tensorflow.org/api_docs/python/tf/contrib/tpu/TPUEstimator). TPUEstimator is an easy way to update models to take advantage of TPU acceleration.

In [2]:
import os
import tensorflow as tf
import timeit

B = 100
H = 100
W = 100
C = 3

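def cpu():
  # Graph construction and session setup happen inside the timed function,
  # so they are included in the measurement.
  with tf.device('/cpu:0'):
    with tf.Session() as sess:
      random_image_cpu = tf.random_normal((B, H, W, C))
      net_cpu = tf.layers.conv2d(random_image_cpu, 32, 7)
      net_cpu = tf.reduce_sum(net_cpu)
      sess.run(tf.global_variables_initializer())
      sess.run(net_cpu)
      # No explicit sess.close(): the `with` block closes the session.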
  
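def tpu():
  # Caveat: these ops are not wrapped in tf.contrib.tpu.rewrite, so they may be
  # placed on the TPU worker's host CPU rather than on the TPU cores.
  with tf.Session('grpc://' + os.environ['COLAB_TPU_ADDR']) as sess:
    random_image_tpu = tf.random_normal((B, H, W, C))
    net_tpu = tf.layers.conv2d(random_image_tpu, 32, 7)
    net_tpu = tf.reduce_sum(net_tpu)
    sess.run(tf.global_variables_initializer())
    sess.run(net_tpu)
    # No explicit sess.close(): the `with` block closes the session.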

  
# Runs the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random {}x{}x{}x{} images '
      '(batch x height x width x channel). Sum of ten runs.'.format(B,H,W,C))
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('TPU (s):')
tpu_time = timeit.timeit('tpu()', number=10, setup="from __main__ import tpu")
print(tpu_time)
print('TPU speedup over CPU: {}x'.format(int(cpu_time/tpu_time)))

Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images (batch x height x width x channel). Sum of ten runs.
CPU (s):
10.55410370300001
TPU (s):
1.433097946000089
TPU speedup over CPU: 7x
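Because `cpu()` and `tpu()` build their ops and open a session inside the timed function, the totals above include setup cost as well as the convolution itself. The following is a minimal sketch of timing only the repeated call, using the standard library and a stand-in workload. `make_data` and `workload` are hypothetical placeholders for graph setup and `sess.run(...)`, not part of the notebook.

```python
import timeit

def make_data(n):
    # One-off setup, analogous to building the graph and opening the session.
    return list(range(n))

def workload(data):
    # Stand-in for sess.run(...): the only part we want to measure.
    return sum(v * v for v in data)

data = make_data(10000)  # Setup happens once, outside the timed region.

# Repeat the ten-run measurement three times and keep the fastest,
# which is less sensitive to noise from other processes.
runs = timeit.repeat(lambda: workload(data), number=10, repeat=3)
best = min(runs)
print('Best sum of ten runs (s):', best)
print('Per-run average (s):', best / 10)
```

Taking the minimum over several repeats is the approach suggested in the `timeit` documentation. Whether setup cost should be excluded depends on what the comparison is meant to show.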


In [3]:
def bench_tpu():
  X = tf.random_normal([2000000, 10000], mean=-1, stddev=4)
  Y = tf.random_normal([10000, 1], mean=-1, stddev=4)

  with tf.Session('grpc://' + os.environ['COLAB_TPU_ADDR']) as sess:
    # The `with` block closes the session on exit, so no explicit close is needed.
    sess.run(tf.matmul(X,Y))
    
print('TPU (s):')
tpu_time = timeit.timeit('bench_tpu()', number=1, setup="from __main__ import bench_tpu")
print(tpu_time)


TPU (s):
24.28661741699989
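To put this timing in context, here is a rough back-of-the-envelope sketch of the size of `X` and the work done by the matmul. It assumes `X` is fully materialized as float32 (the default dtype of `tf.random_normal`) and counts 2 operations per multiply-add. The timing also covers random number generation and session setup, so the throughput figure is only a lower bound for the matmul itself.

```python
ROWS, INNER, COLS = 2000000, 10000, 1  # X is ROWS x INNER, Y is INNER x COLS
BYTES_PER_FLOAT32 = 4
ELAPSED_S = 24.28661741699989  # Timing printed above.

x_bytes = ROWS * INNER * BYTES_PER_FLOAT32  # Memory needed to hold X
flop = 2 * ROWS * INNER * COLS              # One multiply and one add per element of X

print('X size: {:.1f} GB'.format(x_bytes / 1e9))
print('Work: {:.1f} GFLOP'.format(flop / 1e9))
print('Throughput: {:.2f} GFLOPS'.format(flop / 1e9 / ELAPSED_S))
```

A matrix-vector product like this one reads every element of `X` exactly once, so it tends to be limited by memory bandwidth and random number generation rather than by arithmetic.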
