# 12. Custom Models and Training with TensorFlow

After introducing Keras high-level API, which will be good for most of our everyday use cases. In this chapter we will dive deeper in the lower level API.

This Chapter uses TF2. 

### Quick tour of TensorFlow

Here is a summary of what TensorFlow offers:

* Core similar to NumPy but with GPU support
* Distributed computing support
* Just-in-time (JIT) compiler that allows it to optimize computations for speed and memory usage. It works by: 
    1. Extracting **computation graph** from Py function
    2. Optimizing it (e.g., by pruning unused nodes)
    3. Running it efficiently (e.g., by automatically running independent operations in parallel)
* Exportable computation graph, potentially allowing to train in an env and run in another
* Implements autodiff and provides optimizers

### Using TensorFlow like NumPy

TensorFlow’s API revolves around **tensors**, which **flow** from operation to operation—hence the name _TensorFlow_. A tensor is usually a multidimensional array (exactly like a NumPy `ndarray`), but it can also hold a scalar. 

#### Tensors and Operations

We can create a tensor with `tf.constant()`:

In [1]:
import tensorflow as tf

tf.enable_eager_execution()

# floats matrix 2x3 
tf.constant([[1., 2., 3.], [4., 5., 6.]])

<tf.Tensor: id=0, shape=(2, 3), dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

In [2]:
tf.constant(42)

<tf.Tensor: id=1, shape=(), dtype=int32, numpy=42>

In [3]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

In [4]:
t.shape

TensorShape([Dimension(2), Dimension(3)])

In [5]:
t.dtype

tf.float32

Indexing similar to NumPy:

In [6]:
t[:, 1:]

<tf.Tensor: id=6, shape=(2, 2), dtype=float32, numpy=
array([[2., 3.],
       [5., 6.]], dtype=float32)>

In [7]:
t[..., 1, tf.newaxis]

<tf.Tensor: id=10, shape=(2, 1), dtype=float32, numpy=
array([[2.],
       [5.]], dtype=float32)>

Tensor operations as we would expect them:  

In [8]:
t + 10

<tf.Tensor: id=12, shape=(2, 3), dtype=float32, numpy=
array([[11., 12., 13.],
       [14., 15., 16.]], dtype=float32)>

In [9]:
tf.square(t)

<tf.Tensor: id=13, shape=(2, 3), dtype=float32, numpy=
array([[ 1.,  4.,  9.],
       [16., 25., 36.]], dtype=float32)>

In [10]:
# matrix multiplication
t @ tf.transpose(t)

<tf.Tensor: id=16, shape=(2, 2), dtype=float32, numpy=
array([[14., 32.],
       [32., 77.]], dtype=float32)>

Generally, NumPy and TensorFlow are compatible in terms of operations.

**Note**: NumPy uses 64-bit precision by default, so don't forget to set it to `dtype=tf.float32` (more than enough for NNs).

#### Type Conversions

Tf doesn't allow operations between different types, or even different bit precisions. 

#### Variables

For things that need to change (e.g. weights) we would need to use `tf.Variable`:

In [11]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

In [12]:
v

<tf.Variable 'Variable:0' shape=(2, 3) dtype=float32, numpy=
array([[1., 2., 3.],
       [4., 5., 6.]], dtype=float32)>

A `tf.Variable` acts much like a `tf.Tensor` but it can also be modified using the `assign()` method.

In [13]:
v.assign(2 * v)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2.,  4.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [14]:
v[0, 1].assign(42)

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  6.],
       [ 8., 10., 12.]], dtype=float32)>

In [15]:
v[:, 2].assign([0., 1.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[ 2., 42.,  0.],
       [ 8., 10.,  1.]], dtype=float32)>

In [16]:
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

<tf.Variable 'UnreadVariable' shape=(2, 3) dtype=float32, numpy=
array([[100.,  42.,   0.],
       [  8.,  10., 200.]], dtype=float32)>

### Customizing Models and Training Algorithms

#### Custom Loss Functions

Let's assume we are working with a very noisy dataset. Let's also pretend that we want to implement Huber loss and it's not include in Keras (it actually is):

In [17]:
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

Now we can use this loss when you compile the Keras model, then train your model:

In [18]:
model.compile(loss=huber_fn, optimizer="nadam")
model.fit(X_train, y_train, [...])

NameError: name 'model' is not defined

#### Saving and Loading Models That Contain Custom Components

In [19]:
model = keras.models.load_model("my_model_with_a_custom_loss.h5",
                                custom_objects={"huber_fn": huber_fn})

NameError: name 'keras' is not defined

What if we want to change the loss function threshold for _small errors_ (1 in the example above)? Then we will have to create a function that creates a configured loss function:

In [20]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer="nadam")

NameError: name 'model' is not defined

Unfortunately this threshold will not be saved, we have to specify it again: 

In [21]:
# using name of function, not name of function creating the function

model = keras.models.load_model("my_model_with_a_custom_loss_threshold_2.h5",
                                custom_objects={"huber_fn":
create_huber(2.0)})

NameError: name 'keras' is not defined

#### Custom Activation Functions, Initializers, Regularizers, and Constraints

Most Keras functionalities can be customized in pretty much the same way:

In [22]:
def my_softplus(z): # = tf.nn.softplus(z)
    return tf.math.log(tf.exp(z) + 1.0)

In [23]:
def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

In [24]:
def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

#### Custom Metrics

In most cases, designing a custom metric function is very similar to creating a custom loss function:

In [25]:
# using huber loss as metric
model.compile(loss="mse", optimizer="nadam", metrics=
[create_huber(2.0)])

NameError: name 'model' is not defined

#### Custom Layer

In [26]:
# layer without weights
exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

NameError: name 'keras' is not defined

To build a custom stateful layer (i.e., a layer with weights), you need to create a subclass of the `keras.layers.Layer` class. 

In [27]:
# dense layer
class MyDense(keras.layers.Layer):
    # hyperparams units and activation
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)
    # create layer variables by calling add_weight() for each weight
    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape) # must be at the end
    # perform desidered operations
    def call(self, X):
        # matrix multiply (input X and kernel) + bias 
        # then apply activation function
        return self.activation(X @ self.kernel + self.bias)
    
    # returns shape of layer’s outputs
    def compute_output_shape(self, batch_input_shape):
        # same as inputs but last dimension = n of neurons in layer
        return tf.TensorShape(batch_input_shape.as_list()[:-1] +
                                [self.units])
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation":
keras.activations.serialize(self.activation)}

NameError: name 'keras' is not defined

#### Custom Models

Briefly: subclass the `keras.Model` class, create layers and variables in the constructor, and implement the `call()` method to do whatever you want the model to do.

#### Losses and Metrics Based on Model Internals

There will be times when we will want to define losses based on other parts of the model, such as the weights or activations of its hidden layers.

In [28]:
class ReconstructingRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden = [keras.layers.Dense(30, activation="selu",

# 5 dense hidden layers + dense output layer
kernel_initializer="lecun_normal")
                    for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)
# extra dense layer used to reconstruct the inputs of the model
    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        super().build(batch_input_shape)
        
# inputs passed to reconstruction layer
    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        # computes reconstruction loss
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        # adds only 0.05 of recon loss (to avoid dominating main loss)
        self.add_loss(0.05 * recon_loss)
        return self.out(Z)

NameError: name 'keras' is not defined

Similarly we can create `keras.metrics.Mean` objects to customize metrics. 

#### Computing Gradients Using Autodiff

For NNs, computing partial derivatives analytically by hand would be nothing short of a nightmare (for sane people, at least). One solution could be to compute an approximation of each partial derivative by measuring how much the function’s output changes when you tweak the corresponding parameter. 

Let's check this with an example function `f`:

In [29]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

In [30]:
w1, w2 = 5, 3

In [31]:
eps = 1e-6

In [32]:
(f(w1 + eps, w2) - f(w1, w2)) / eps

36.000003007075065

In [33]:
(f(w1, w2 + eps) - f(w1, w2)) / eps

10.000000003174137

The partial derivatives should be 36 and 10. Very close as an approximation, but still quite inconvenient to use due to having to call `f()` at least once per param.

Here comes autodiff:

In [34]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

In [35]:
gradients = tape.gradient(z, [w1, w2])

In [36]:
gradients

[<tf.Tensor: id=85, shape=(), dtype=float32, numpy=36.0>,
 <tf.Tensor: id=77, shape=(), dtype=float32, numpy=10.0>]

Perfect! 36 and 10. And super fast! 

`tape` only track operations involving variables, otherwise it returns `None`. However, we can force the tape to watch tensors, record every operation that involves them and compute gradients with regard to these tensors, as if they were variables.

#### Custom Training Loops

In some cases (e.g. two different optimizers) we may need to write our own custom loop. 

First, let’s build a simple model:

In [40]:
from tensorflow import keras

l2_reg = keras.regularizers.l2(0.05)
model = keras.models.Sequential([
    keras.layers.Dense(30, activation="elu",
kernel_initializer="he_normal",
                    kernel_regularizer=l2_reg),
    keras.layers.Dense(1, kernel_regularizer=l2_reg)
])

Next, let’s create a tiny function that will randomly sample a batch of instances from the training set:

In [41]:
def random_batch(X, y, batch_size=32):
    idx = np.random.randint(len(X), size=batch_size)
    return X[idx], y[idx]

Let’s also define a function that will display the training status, including the number of steps, the total number of steps, the mean loss since the start of the epoch and other metrics:

In [42]:
def print_status_bar(iteration, total, loss, metrics=None):
    metrics = " - ".join(["{}: {:.4f}".format(m.name, m.result())
                        for m in [loss] + (metrics or [])])
    end = "" if iteration < total else "\n"
    print("\r{}/{} - ".format(iteration, total) + metrics,
        end=end)

Now let's define some hyperparameters and choose the optimizer, the loss function, and the metrics:

In [43]:
n_epochs = 5
batch_size = 32
n_steps = len(X_train) // batch_size
optimizer = keras.optimizers.Nadam(lr=0.01)
loss_fn = keras.losses.mean_squared_error
mean_loss = keras.metrics.Mean()
metrics = [keras.metrics.MeanAbsoluteError()]

NameError: name 'X_train' is not defined

And finally here is our **custom loop**: 

In [44]:
# loop 1: epochs
for epoch in range(1, n_epochs + 1):
    print("Epoch {}/{}".format(epoch, n_epochs))
    # loop 2: batches in epoch
    for step in range(1, n_steps + 1):
        # sample random batch from training set
        X_batch, y_batch = random_batch(X_train_scaled, y_train)
        with tf.GradientTape() as tape:
            # make prediction for one batch
            y_pred = model(X_batch, training=True)
            # compute main loss
            main_loss = tf.reduce_mean(loss_fn(y_batch, y_pred))
            # compute total loss
            loss = tf.add_n([main_loss] + model.losses)
        # gradient of the loss with regard to each trainable variable
        gradients = tape.gradient(loss, model.trainable_variables)
        # apply them to the optimizer to perform a Gradient Descent step
        optimizer.apply_gradients(zip(gradients,
model.trainable_variables))
        # update mean loss
        mean_loss(loss)
        for metric in metrics:
            # update metrics
            metric(y_batch, y_pred)
        # display status bar 
        print_status_bar(step * batch_size, len(y_train), mean_loss,
metrics)
    # print final status bar (complete)
    print_status_bar(len(y_train), len(y_train), mean_loss, metrics)
    # restart mean loss and metrics
    for metric in [mean_loss] + metrics:
        metric.reset_states()

Epoch 1/5


NameError: name 'n_steps' is not defined

### TensorFlow Functions and Graphs

In TensorFlow 2, graphs are a lot simpler to use. Let's start with a function:

In [45]:
def cube(x):
    return x ** 3

In [46]:
# tensor
cube(tf.constant(2.0))

<tf.Tensor: id=88, shape=(), dtype=float32, numpy=8.0>

Now, let’s use `tf.function()` to convert this Python function to a **TensorFlow Function**:

In [47]:
tf_cube = tf.function(cube) # or use @tf.function as decorator

In [48]:
tf_cube

<tensorflow.python.eager.def_function.Function at 0x1da5ddbd5c8>

This will work like a Python function, but returning a tensor:

In [49]:
tf_cube(2)

<tf.Tensor: id=94, shape=(), dtype=int32, numpy=8>

In [50]:
tf_cube(tf.constant(2.0))

<tf.Tensor: id=102, shape=(), dtype=float32, numpy=8.0>

If we **really** still need the Python function, it is still accessible under `python_function`:

In [52]:
tf_cube.python_function(2)

8