# 12. Custom Models and Training with TensorFlow

The Keras high-level API introduced so far is sufficient for most everyday use cases. In this chapter we dive deeper into TensorFlow's lower-level API.

This chapter uses TensorFlow 2 (TF2).

### Quick tour of TensorFlow

Here is a summary of what TensorFlow offers:

* Core similar to NumPy but with GPU support
* Distributed computing support
* Just-in-time (JIT) compiler that optimizes computations for speed and memory usage. It works by:
    1. Extracting the **computation graph** from a Python function
    2. Optimizing it (e.g., by pruning unused nodes)
    3. Running it efficiently (e.g., by automatically running independent operations in parallel)
* Exportable computation graphs, so a model can be trained in one environment and run in another
* Implements autodiff and provides optimizers
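
The graph-extraction step listed above can be sketched with `tf.function`, which traces a Python function into a computation graph. This is only a minimal illustration: the function `cube` is a hypothetical example, not something from the chapter.

```python
import tensorflow as tf

# Hypothetical example function; tf.function traces it into a graph
@tf.function
def cube(x):
    return x ** 3

# Calling the decorated function runs the traced graph
result = cube(tf.constant(2.0))

# The traced graph can be inspected through a concrete function
graph = cube.get_concrete_function(tf.constant(2.0)).graph
op_names = [op.name for op in graph.get_operations()]
```

Listing `op_names` shows the operations TensorFlow recorded while tracing the Python function.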

### Using TensorFlow like NumPy

TensorFlow’s API revolves around **tensors**, which **flow** from operation to operation—hence the name _TensorFlow_. A tensor is usually a multidimensional array (exactly like a NumPy `ndarray`), but it can also hold a scalar. 

#### Tensors and Operations

We can create a tensor with `tf.constant()`:

In [None]:
import tensorflow as tf
from tensorflow import keras

# TF2 executes eagerly by default, so eager execution need not be enabled

# 2x3 matrix of floats
tf.constant([[1., 2., 3.], [4., 5., 6.]])

In [None]:
tf.constant(42)  # scalar

In [None]:
t = tf.constant([[1., 2., 3.], [4., 5., 6.]])

In [None]:
t.shape

In [None]:
t.dtype

Indexing works much like in NumPy:

In [None]:
t[:, 1:]

In [None]:
t[..., 1, tf.newaxis]

Tensor operations work as we would expect:

In [None]:
t + 10

In [None]:
tf.square(t)

In [None]:
# matrix multiplication
t @ tf.transpose(t)

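NumPy and TensorFlow interoperate well: we can create a tensor from a NumPy array and apply TensorFlow operations to NumPy arrays, and vice versa.

**Note**: NumPy uses 64-bit precision by default, while TensorFlow uses 32-bit. When creating a tensor from a NumPy array, remember to set `dtype=tf.float32` (32-bit precision is generally more than enough for NNs).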
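
As a small sketch of the note above, the following converts a NumPy array to a tensor with and without an explicit `dtype`. The array values are arbitrary.

```python
import numpy as np
import tensorflow as tf

a = np.array([2., 4., 5.])               # NumPy defaults to float64

t64 = tf.constant(a)                     # the tensor inherits float64
t32 = tf.constant(a, dtype=tf.float32)   # explicit 32-bit precision

back = t32.numpy()                       # convert the tensor back to a NumPy array
```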

#### Type Conversions

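TensorFlow does not perform type conversions automatically: operations between tensors of different types, or even different bit precisions (e.g., 32-bit and 64-bit floats), raise an exception. Use `tf.cast()` to convert explicitly.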
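
The behavior can be checked with a short sketch like the one below. The variable names are illustrative, and the exact exception type may depend on the TensorFlow version, so two candidates are caught.

```python
import tensorflow as tf

t_float = tf.constant(2.0)   # float32
t_int = tf.constant(40)      # int32

# Mixing types is rejected rather than silently converted
try:
    t_float + t_int
    mixed_ok = True
except (tf.errors.InvalidArgumentError, TypeError):
    mixed_ok = False

# An explicit cast makes the operation valid
total = t_float + tf.cast(t_int, tf.float32)
```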

#### Variables

Tensors created with `tf.constant()` are immutable. For values that need to change (e.g., weights), we need to use `tf.Variable`:

In [None]:
v = tf.Variable([[1., 2., 3.], [4., 5., 6.]])

In [None]:
v

A `tf.Variable` acts much like a `tf.Tensor` but it can also be modified using the `assign()` method.

In [None]:
v.assign(2 * v)

In [None]:
v[0, 1].assign(42)

In [None]:
v[:, 2].assign([0., 1.])

In [None]:
v.scatter_nd_update(indices=[[0, 0], [1, 2]], updates=[100., 200.])

### Customizing Models and Training Algorithms

#### Custom Loss Functions

Let's assume we are working with a very noisy dataset. Let's also pretend that we want to implement the Huber loss and that it is not included in Keras (it actually is):

In [None]:
def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)
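
To sanity-check the function above, here is a small sketch that evaluates it on one small and one large error. The input values are arbitrary, and the function is repeated so the example is self-contained.

```python
import tensorflow as tf

def huber_fn(y_true, y_pred):
    error = y_true - y_pred
    is_small_error = tf.abs(error) < 1
    squared_loss = tf.square(error) / 2
    linear_loss = tf.abs(error) - 0.5
    return tf.where(is_small_error, squared_loss, linear_loss)

y_true = tf.constant([0.0, 0.0])
y_pred = tf.constant([0.5, 3.0])   # errors of magnitude 0.5 (small) and 3.0 (large)

# Small error: 0.5**2 / 2 = 0.125; large error: 3.0 - 0.5 = 2.5
losses = huber_fn(y_true, y_pred).numpy()
```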

Now we can use this loss when compiling the Keras model, and then train the model:

In [None]:
model.compile(loss=huber_fn, optimizer="nadam")
model.fit(X_train, y_train, [...])

#### Saving and Loading Models That Contain Custom Components

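When loading a model that contains custom objects, we must map each name to the corresponding object via `custom_objects`:

In [None]: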
model = keras.models.load_model("my_model_with_a_custom_loss.h5",
                                custom_objects={"huber_fn": huber_fn})

What if we want to change the loss function threshold for _small errors_ (1 in the example above)? Then we will have to create a function that creates a configured loss function:

In [None]:
def create_huber(threshold=1.0):
    def huber_fn(y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < threshold
        squared_loss = tf.square(error) / 2
        linear_loss = threshold * tf.abs(error) - threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)
    return huber_fn

model.compile(loss=create_huber(2.0), optimizer="nadam")

Unfortunately, the threshold is not saved with the model, so we have to specify it again when loading:

In [None]:
# map the name of the inner function (huber_fn), not the factory (create_huber)
model = keras.models.load_model("my_model_with_a_custom_loss_threshold_2.h5",
                                custom_objects={"huber_fn": create_huber(2.0)})
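
One possible way to avoid respecifying the threshold is to subclass `keras.losses.Loss` and return the threshold from `get_config()`, so that it becomes part of the loss configuration. The class name `HuberLoss` below is an assumption made for illustration, and this is a sketch rather than a definitive implementation.

```python
import tensorflow as tf
from tensorflow import keras

class HuberLoss(keras.losses.Loss):  # illustrative name
    def __init__(self, threshold=1.0, **kwargs):
        self.threshold = threshold
        super().__init__(**kwargs)

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) < self.threshold
        squared_loss = tf.square(error) / 2
        linear_loss = self.threshold * tf.abs(error) - self.threshold**2 / 2
        return tf.where(is_small_error, squared_loss, linear_loss)

    def get_config(self):
        # include the threshold alongside the base configuration
        base_config = super().get_config()
        return {**base_config, "threshold": self.threshold}

loss = HuberLoss(2.0)
config = loss.get_config()
restored = HuberLoss.from_config(config)   # threshold is recovered from the config

# error magnitude 3 > threshold 2, so the linear branch applies: 2*3 - 2**2/2 = 4
value = loss(tf.constant([[0.0]]), tf.constant([[3.0]]))
```

With this approach the class itself would still need to be passed through `custom_objects` when loading, but the threshold value would come from the saved configuration.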

#### Custom Activation Functions, Initializers, Regularizers, and Constraints

Most Keras functionality can be customized in much the same way:

In [None]:
def my_softplus(z): # = tf.nn.softplus(z)
    return tf.math.log(tf.exp(z) + 1.0)

In [None]:
def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

In [None]:
def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))
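
These custom functions can then be passed to a layer like any built-in option. The sketch below repeats the three definitions so that it is self-contained. The layer size and input shape are arbitrary choices for illustration.

```python
import tensorflow as tf
from tensorflow import keras

def my_softplus(z):
    return tf.math.log(tf.exp(z) + 1.0)

def my_glorot_initializer(shape, dtype=tf.float32):
    stddev = tf.sqrt(2. / (shape[0] + shape[1]))
    return tf.random.normal(shape, stddev=stddev, dtype=dtype)

def my_l1_regularizer(weights):
    return tf.reduce_sum(tf.abs(0.01 * weights))

# Pass the custom functions wherever Keras accepts a name or a callable
layer = keras.layers.Dense(30, activation=my_softplus,
                           kernel_initializer=my_glorot_initializer,
                           kernel_regularizer=my_l1_regularizer)

out = layer(tf.ones((2, 4)))   # builds the layer and runs it on a dummy batch
reg_losses = layer.losses      # regularization losses collected by the layer
```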

#### Custom Metrics

In most cases, designing a custom metric function is very similar to creating a custom loss function:

In [None]:
# using huber loss as metric
model.compile(loss="mse", optimizer="nadam", metrics=[create_huber(2.0)])

#### Custom Layer

In [None]:
# layer without weights
exponential_layer = keras.layers.Lambda(lambda x: tf.exp(x))

To build a custom stateful layer (i.e., a layer with weights), you need to create a subclass of the `keras.layers.Layer` class. 

In [None]:
# dense layer
class MyDense(keras.layers.Layer):
    # hyperparams units and activation
    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = keras.activations.get(activation)
    # create layer variables by calling add_weight() for each weight
    def build(self, batch_input_shape):
        self.kernel = self.add_weight(
            name="kernel", shape=[batch_input_shape[-1], self.units],
            initializer="glorot_normal")
        self.bias = self.add_weight(
            name="bias", shape=[self.units], initializer="zeros")
        super().build(batch_input_shape) # must be at the end
    # perform the desired operations
    def call(self, X):
        # matrix multiply (input X and kernel) + bias 
        # then apply activation function
        return self.activation(X @ self.kernel + self.bias)
    
    # returns shape of layer’s outputs
    def compute_output_shape(self, batch_input_shape):
        # same as inputs but last dimension = n of neurons in layer
        return tf.TensorShape(batch_input_shape.as_list()[:-1] +
                                [self.units])
    def get_config(self):
        base_config = super().get_config()
        return {**base_config, "units": self.units,
                "activation": keras.activations.serialize(self.activation)}

#### Custom Models

Briefly: subclass the `keras.Model` class, create layers and variables in the constructor, and implement the `call()` method to do whatever you want the model to do.
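
A minimal sketch of that recipe follows. The class name `TwoLayerRegressor` and its layer sizes are hypothetical, chosen only to keep the example short.

```python
import tensorflow as tf
from tensorflow import keras

class TwoLayerRegressor(keras.Model):  # hypothetical example model
    def __init__(self, units=8, **kwargs):
        super().__init__(**kwargs)
        # create the layers in the constructor
        self.hidden = keras.layers.Dense(units, activation="relu")
        self.out = keras.layers.Dense(1)

    def call(self, inputs):
        # define the forward pass
        return self.out(self.hidden(inputs))

model = TwoLayerRegressor()
y = model(tf.zeros((3, 5)))   # batch of 3 instances with 5 features each
```

The resulting model can be compiled and trained like any other Keras model.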

#### Losses and Metrics Based on Model Internals

There will be times when we will want to define losses based on other parts of the model, such as the weights or activations of its hidden layers.

In [None]:
class ReconstructingRegressor(keras.Model):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)
        # 5 dense hidden layers + dense output layer
        self.hidden = [keras.layers.Dense(30, activation="selu",
                                          kernel_initializer="lecun_normal")
                       for _ in range(5)]
        self.out = keras.layers.Dense(output_dim)

    # extra dense layer used to reconstruct the inputs of the model
    def build(self, batch_input_shape):
        n_inputs = batch_input_shape[-1]
        self.reconstruct = keras.layers.Dense(n_inputs)
        super().build(batch_input_shape)
        
    # pass the inputs through the hidden layers, then reconstruct them
    def call(self, inputs):
        Z = inputs
        for layer in self.hidden:
            Z = layer(Z)
        # computes reconstruction loss
        reconstruction = self.reconstruct(Z)
        recon_loss = tf.reduce_mean(tf.square(reconstruction - inputs))
        # adds only 0.05 of recon loss (to avoid dominating main loss)
        self.add_loss(0.05 * recon_loss)
        return self.out(Z)

Similarly, we can track a custom metric based on model internals by creating a metric object such as `keras.metrics.Mean` in the constructor and updating it in `call()`.
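
As a rough sketch of how such a metric object accumulates values across batches, the following feeds two made-up reconstruction errors to a `keras.metrics.Mean`. The metric name and the values are assumptions for illustration.

```python
from tensorflow import keras

# Streaming mean: keeps a running total and count across updates
recon_metric = keras.metrics.Mean(name="reconstruction_error")

recon_metric.update_state(0.2)   # hypothetical error for batch 1
recon_metric.update_state(0.4)   # hypothetical error for batch 2

mean_error = float(recon_metric.result())   # average over both updates
```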

#### Computing Gradients Using Autodiff

For NNs, computing partial derivatives analytically by hand would be nothing short of a nightmare (for sane people, at least). One solution could be to compute an approximation of each partial derivative by measuring how much the function’s output changes when you tweak the corresponding parameter. 

Let's check this with an example function `f`:

In [None]:
def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

In [None]:
w1, w2 = 5, 3

In [None]:
eps = 1e-6

In [None]:
(f(w1 + eps, w2) - f(w1, w2)) / eps

In [None]:
(f(w1, w2 + eps) - f(w1, w2)) / eps

Analytically, the partial derivatives at (w1, w2) = (5, 3) are 6·w1 + 2·w2 = 36 and 2·w1 = 10, so the approximation is very close. This approach is still inconvenient, though, because it requires calling `f()` at least once per parameter.

Here comes autodiff:

In [None]:
w1, w2 = tf.Variable(5.), tf.Variable(3.)
with tf.GradientTape() as tape:
    z = f(w1, w2)

In [None]:
gradients = tape.gradient(z, [w1, w2])

In [None]:
gradients
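
The result should match the analytical values of 36 and 10. Note that a tape is erased as soon as `gradient()` is called once. The sketch below assumes we want to call `gradient()` several times, which requires a persistent tape. It repeats `f` so that it is self-contained.

```python
import tensorflow as tf

def f(w1, w2):
    return 3 * w1 ** 2 + 2 * w1 * w2

w1, w2 = tf.Variable(5.), tf.Variable(3.)

# persistent=True keeps the tape alive across several gradient() calls
with tf.GradientTape(persistent=True) as tape:
    z = f(w1, w2)

dz_dw1 = tape.gradient(z, w1)   # 6*w1 + 2*w2
dz_dw2 = tape.gradient(z, w2)   # 2*w1
del tape                        # release the tape's resources when done
```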