In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

<!-- requirement: small_data/cal_house.json.gz -->
# Optimization with the Computation Graph

TensorFlow is designed to accelerate the mathematical operations common in machine learning. Let's test that claim by benchmarking the two linear regression classes from the [TF_Intro_TensorFlow](TF_Intro_TensorFlow.ipynb) notebook, one based on NumPy and one based on TensorFlow.

In [None]:
import tensorflow as tf
import numpy as np

In [None]:
class LinearRegressionNP():
    def __init__(self, eta=.1):
        self.W = 0
        self.b = 0
        self.eta = eta
    
    def loss(self, X, y):
        return ((X * self.W + self.b - y) ** 2).mean()
    
    def _gradients(self, X, y):
        return {'W': (2 * X * (X * self.W + self.b - y)).mean(),
                'b': (2 * (X * self.W + self.b - y)).mean()}
        
    def fit(self, X, y, steps=10):
        for _ in range(steps):
            gradients = self._gradients(X, y)
            self.W = self.W - self.eta * gradients['W']
            self.b = self.b - self.eta * gradients['b']
            
        return self

In [None]:
class LinearRegressionTF():
    def __init__(self, eta=.1):
        self.W = tf.Variable(0.)
        self.b = tf.Variable(0.)
        self.opt = tf.keras.optimizers.SGD(learning_rate=eta)
    
    def loss(self, X, y, return_func=False):
        def loss_():
            return tf.reduce_mean(tf.square(X * self.W + self.b - y))
        
        if not return_func:
            return loss_()
        
        return loss_
    
    def fit(self, X, y, steps=10):
        for _ in range(steps):
            self.opt.minimize(self.loss(X, y, return_func=True), [self.W, self.b])
            
        return self

In [None]:
np_model = LinearRegressionNP()
tf_model = LinearRegressionTF()

In [None]:
import gzip
import json
from sklearn.model_selection import ShuffleSplit

with gzip.open("small_data/cal_house.json.gz", "r") as fin:
    housing = json.load(fin)
    
# Recent scikit-learn releases require test_size to be passed by keyword.
for train, test in ShuffleSplit(n_splits=1, test_size=0.2, random_state=42).split(housing['data']):
    X_train = np.array(housing['data'])[train].astype(np.float32)
    y_train = np.array(housing['target'])[train].astype(np.float32)
    X_test = np.array(housing['data'])[test].astype(np.float32)
    y_test = np.array(housing['target'])[test].astype(np.float32)

In [None]:
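%%timeit
# Use a 1-D feature column so it matches y_train's shape (n,);
# an (n, 1) column would broadcast against y_train to an (n, n) array.
np_model.fit(X_train[:, 0], y_train)

In [None]:
%%timeit
# Same 1-D feature column as the NumPy benchmark, for a like-for-like comparison.
tf_model.fit(X_train[:, 0], y_train)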

The NumPy model is slightly faster than the TensorFlow model! What's going on here?

TensorFlow's computational advantage comes in two forms:

1. It can record computations in a computation graph, which is compiled for fast execution.
2. It can place computations on hardware accelerators (GPUs, TPUs).

## Computation Graph

TensorFlow can record tensors and operations on them in a data structure called a computation graph. This graph can be compiled before runtime for faster execution (e.g. operations can be parallelized based on analysis of the graph during compilation). The graph also enables TensorFlow to compute gradients more directly since the results of operations don't have to be accumulated as they are executed. 
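To make the idea of a recorded graph more concrete, the sketch below traces a small function and lists the operations stored in its graph. The function `affine` and the input specifications are illustrative assumptions, not part of the original notebook; the exact operation names may vary between TensorFlow versions.

```python
import tensorflow as tf

@tf.function
def affine(x, w, b):
    # A tiny computation: one multiplication and one addition.
    return x * w + b

# Build the graph for a specific input signature without running it on data.
concrete = affine.get_concrete_function(
    tf.TensorSpec(shape=(None, 1), dtype=tf.float32),  # x: a column of any length
    tf.TensorSpec(shape=(), dtype=tf.float32),         # w: scalar
    tf.TensorSpec(shape=(), dtype=tf.float32),         # b: scalar
)

# Each node in the graph is an operation; list their types in graph order.
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)
```

The list should contain placeholders for the inputs along with the multiplication and addition nodes, which is the structure TensorFlow can analyze before executing anything.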

In TensorFlow 2.0 we mark which computations should be entered into the graph using the `tf.function` decorator. In TensorFlow 1.x, by contrast, all computation went through an explicitly constructed computation graph. The machinery that `tf.function` uses to convert Python code into graph operations is called [AutoGraph](https://www.tensorflow.org/beta/guide/autograph).
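As a rough illustration of marking a computation for the graph, the sketch below wraps a plain Python function with `tf.function` and prints the graph-compatible source that `tf.autograph.to_code` generates for its `if` statement. The function `positive_sum` is a hypothetical example introduced here, not something from the original notebook.

```python
import tensorflow as tf

def positive_sum(x):
    # Ordinary Python control flow that depends on a tensor value.
    total = tf.reduce_sum(x)
    if total > 0:
        result = total
    else:
        result = tf.zeros_like(total)
    return result

# Wrapping the function marks it for graph execution.
graph_positive_sum = tf.function(positive_sum)

# Inspect the source code that the conversion step produces.
generated = tf.autograph.to_code(positive_sum)
print(generated)

positive = float(graph_positive_sum(tf.constant([1.0, 2.0])))
negative = float(graph_positive_sum(tf.constant([1.0, -3.0])))
print(positive, negative)
```

The generated code is verbose, but it shows that the Python `if` has been rewritten into a form that can be represented as a graph operation.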

In [None]:
def mse(W, b, X, y):
    return tf.reduce_mean(tf.square(X * W + b - y))
    
def gradients(W, b, X, y):
    with tf.GradientTape() as tape:
        loss = mse(W, b, X, y)
        
    return tape.gradient(loss, [W, b])

@tf.function
def gradients_fn(W, b, X, y):
    with tf.GradientTape() as tape:
        loss = mse(W, b, X, y)
        
    return tape.gradient(loss, [W, b])

X = tf.random.uniform((10000, 1))
y = tf.random.uniform((10000, 1))
W = tf.Variable(0.)
b = tf.Variable(0.)

assert tf.math.reduce_all(tf.equal(gradients(W, b, X, y), gradients_fn(W, b, X, y)))

In [None]:
%%timeit
gradients(W, b, X, y)

In [None]:
%%timeit
gradients_fn(W, b, X, y)

Note that the graph is built the first time the function is called, a process TensorFlow calls tracing (a form of just-in-time, or JIT, compilation). We only realize faster computation on subsequent function calls.
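One way to observe this without a timer is to rely on the fact that Python side effects inside a `tf.function` run only while the graph is being built. The sketch below uses a hypothetical function `square` and a hypothetical list `trace_log` to count how often that happens.

```python
import tensorflow as tf

trace_log = []

@tf.function
def square(x):
    # This append is plain Python, so it runs only while the graph is built,
    # not on calls that reuse the existing graph.
    trace_log.append(x.shape)
    return x * x

square(tf.constant([1.0, 2.0]))  # first call: builds the graph
square(tf.constant([3.0, 4.0]))  # same shape and dtype: reuses the graph

print("Number of times the graph was built:", len(trace_log))
```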

In [None]:
import time

@tf.function
def gradients_fn(W, b, X, y):
    with tf.GradientTape() as tape:
        loss = mse(W, b, X, y)
        
    return tape.gradient(loss, [W, b])

start = time.time()
gradients_fn(W, b, X, y)
print("Time elapsed on first call: {}".format(time.time() - start))

start = time.time()
gradients_fn(W, b, X, y)
print("Time elapsed on second call: {}".format(time.time() - start))

Furthermore, compilation takes place again any time the arguments differ in `shape` or `dtype` from those of previous calls.

In [None]:
i = 100

start = time.time()
gradients_fn(W, b, X[:i], y[:i])
print("Time elapsed on first call: {}".format(time.time() - start))

start = time.time()
gradients_fn(W, b, X[:i], y[:i])
print("Time elapsed on second call: {}".format(time.time() - start))
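If batches of varying length are expected, one possible way to avoid repeated compilation is to declare an `input_signature` with an unspecified leading dimension. The sketch below compares this with the default behavior; the functions `mean_default` and `mean_flexible` and the counter `trace_count` are hypothetical names used only for illustration.

```python
import tensorflow as tf

trace_count = {"default": 0, "flexible": 0}

@tf.function
def mean_default(x):
    trace_count["default"] += 1  # runs only while a graph is being built
    return tf.reduce_mean(x)

# shape=(None, 1) accepts a column of any length with a single graph.
@tf.function(input_signature=[tf.TensorSpec(shape=(None, 1), dtype=tf.float32)])
def mean_flexible(x):
    trace_count["flexible"] += 1
    return tf.reduce_mean(x)

data = tf.random.uniform((10000, 1))
for n in (100, 1000, 10000):
    mean_default(data[:n])
    mean_flexible(data[:n])

print(trace_count)
```

The trade-off is that a graph built for an unknown dimension cannot be specialized to one particular shape, so this is a judgment call rather than a universal improvement.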

This means that we typically want to supply a `tf.function` with tensors (or NumPy arrays) as arguments. Python native types do not have `shape` or `dtype` information, and therefore the graph will be recompiled any time the values change.

In [None]:
@tf.function
def gradients_fn(W, b, X, y):
    with tf.GradientTape() as tape:
        loss = mse(W, b, X, y)
        
    return tape.gradient(loss, [W, b])

X_np = np.random.random((10000, 1)).astype(np.float32).tolist()
y_np = np.random.random((10000, 1)).astype(np.float32).tolist()

start = time.time()
gradients_fn(W, b, X_np, y_np)
print("Time elapsed on first call: {}".format(time.time() - start))

# UNCOMMENT TO CONTRAST
# X_np = np.random.random((10000, 1)).astype(np.float32).tolist()
# y_np = np.random.random((10000, 1)).astype(np.float32).tolist()

start = time.time()
gradients_fn(W, b, X_np, y_np)
print("Time elapsed on second call: {}".format(time.time() - start))
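The same effect can be seen on a smaller scale with a single Python scalar argument. In the sketch below, the hypothetical function `scale` is called with Python floats and then with scalar tensors, and the hypothetical list `trace_log` records each time a graph is built.

```python
import tensorflow as tf

trace_log = []

@tf.function
def scale(x, factor):
    # Runs only while a graph is being built; records the argument's Python type.
    trace_log.append(type(factor).__name__)
    return x * factor

x = tf.constant([1.0, 2.0, 3.0])

# Python floats: each distinct value is baked into its own graph.
scale(x, 2.0)
scale(x, 3.0)
python_traces = len(trace_log)

# Scalar tensors: only shape and dtype matter, so one graph serves both values.
scale(x, tf.constant(2.0))
scale(x, tf.constant(3.0))
tensor_traces = len(trace_log) - python_traces

print("Graphs built for Python floats:", python_traces)
print("Graphs built for tensors:", tensor_traces)
```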

### Exercise: Optimize `LinearRegressionTF`

Use `tf.function` to optimize `LinearRegressionTF`. Demonstrate that `LinearRegressionTF.fit` runs faster than `LinearRegressionNP.fit`.

## Using accelerators (GPUs/TPUs)

Making use of GPUs requires additional drivers and libraries that are not packaged with TensorFlow. With TensorFlow 2.0, you would also install the `tensorflow-gpu` package instead of `tensorflow` (which is installed in this environment); from TensorFlow 2.1 onward, the standard `tensorflow` package includes GPU support. Furthermore, your machine must have compatible hardware, which you can check from within TensorFlow.

In [None]:
print('TensorFlow built with GPU support: {}'.format(tf.test.is_built_with_cuda()))
print('Compatible GPU installed: {}'.format(tf.test.is_gpu_available()))
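As an alternative sketch, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available outside the `experimental` namespace), the visible devices can be listed directly rather than queried with a boolean check.

```python
import tensorflow as tf

# Each call returns a list of PhysicalDevice objects (possibly empty for GPUs).
gpus = tf.config.list_physical_devices('GPU')
cpus = tf.config.list_physical_devices('CPU')

print('Number of GPUs visible to TensorFlow: {}'.format(len(gpus)))
print('Number of CPUs visible to TensorFlow: {}'.format(len(cpus)))
```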

Each tensor and operation has a device attribute that describes what hardware is responsible for its evaluation.

In [None]:
X.device

By default, TensorFlow places an operation on a GPU when one is available and the operation has a GPU implementation. If you need more manual control, TensorFlow provides a context manager for specifying the device on which a tensor or operation resides.

In [None]:
with tf.device('/CPU:0'):
    on_cpu = tf.constant([[1, 2], [3, 4]])
    
on_cpu.device
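The same context manager can be combined with a device check so that code falls back to the CPU when no GPU is present. This is a minimal sketch assuming TensorFlow 2.1 or later; the variable names and matrix sizes are arbitrary choices for illustration.

```python
import tensorflow as tf

# Choose the first GPU if TensorFlow can see one, otherwise the CPU.
device_name = '/GPU:0' if tf.config.list_physical_devices('GPU') else '/CPU:0'

with tf.device(device_name):
    a = tf.random.uniform((256, 256))
    b = tf.random.uniform((256, 256))
    product = tf.matmul(a, b)

# The device attribute reports where the result actually resides.
print(product.device)
```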

Unfortunately, due to limitations of the environment, we have only one device available for demonstrating this facet of TensorFlow, but you can [read more about this capability in the TensorFlow documentation](https://www.tensorflow.org/beta/guide/using_gpu).

*Copyright &copy; 2019 Pragmatic Institute. This content is licensed solely for personal use. Redistribution or publication of this material is strictly prohibited.*