---

# 7 Working with Keras: A deep dive

In [1]:
import os
import time
import pathlib

import numpy as np

from matplotlib import pyplot as plt

IMAGE_DIR = pathlib.Path("images")
IMAGE_DIR.mkdir(exist_ok=True)

os.environ["KERAS_BACKEND"] = "jax"  # must be set *before* keras is imported

import keras

from IPython.core.magic import register_cell_magic

@register_cell_magic
def backend(line, cell):
    current, required = os.environ.get("KERAS_BACKEND", ""), line.split()[-1]
    if current == required:
        get_ipython().run_cell(cell)
    else:
        print(
            f"This cell requires the {required} backend. To run it, change KERAS_BACKEND to "
            f"\"{required}\" at the top of the notebook, restart the runtime, and rerun the notebook."
        )

## A spectrum of workflows

The principle of *progressive disclosure of complexity*:

<!-- ![Chollet spectrum](images/chollet/figure7.1.png) -->
![Chollet spectrum](https://raw.githubusercontent.com/jchwenger/AI/main/lectures/04/images/chollet/figure7.1.png)

[DLWP](https://deeplearningwithpython.io/chapters/chapter07_deep-dive-keras/#different-ways-to-build-keras-models), Figure 7.1

---

## Different ways to build Keras models

### The Sequential model

We started this module with the `Sequential` class:

In [2]:
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

We also saw that a `Sequential` model can be built incrementally with `add()`:

In [3]:
model = keras.Sequential()
model.add(keras.layers.Dense(64, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

#### Calling a model for the first time to build it

The weights of the model are not created yet! In the run below, accessing them returns an empty list (the `try`/`except` guards against versions that raise a `ValueError` instead).

In [4]:
try:
    print(model.weights)
except ValueError as e:
    print(e)

[]


The building happens either when the model is first called on data, or when `build()` is called *with the input shape specified.*

In [5]:
try:
    model.build()         # also an error
except ValueError as e:
    print(e)

In [6]:
model.build(input_shape=(None, 3)) # you must specify the input shape!
model.weights                      # (this also happens automatically if you run data through the model!)


[<Variable path=sequential_1/dense_2/kernel, shape=(3, 64), dtype=float32, value=[[-0.15215589  0.26188546 -0.19847265  0.2639258   0.18800727 -0.2963591
   -0.02342585 -0.2094822   0.02145737  0.24172497  0.12560809 -0.04921538
   -0.16620733 -0.22215898 -0.14349402  0.19340783  0.10025513 -0.08030254
    0.01428804 -0.20661667  0.29778862 -0.21202365  0.08048853 -0.06162983
   -0.09381296  0.25942641  0.01158056 -0.04888177  0.01377121  0.06205985
   -0.14670208  0.24797624  0.27000105 -0.0590806   0.27573746 -0.17742842
   -0.15788108  0.209126   -0.0806684   0.02271116 -0.2363201  -0.05207242
    0.1683422   0.11306635 -0.21241006 -0.29192978  0.03717208 -0.2397191
   -0.12021935 -0.07145695 -0.18378127  0.16068089 -0.01027673  0.0926367
   -0.08086468  0.07404858 -0.10073985  0.27857524 -0.01817483 -0.04772779
    0.11797905  0.16013938  0.01259902  0.14422283]
  [-0.00341412 -0.19324987 -0.08149539 -0.00670403 -0.04800996  0.08111954
    0.2472099   0.28467184 -0.02422068 -0.2924
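
Why the input shape is needed: the kernel of a `Dense` layer has shape `(input_dim, units)`, as the `(3, 64)` kernel above shows, so it cannot be created before the size of the input is known. The toy class below (`LazyDense`, a hypothetical stand-in written in plain Python, not Keras code) mimics this deferred creation by tracking shapes only.

```python
class LazyDense:
    """Toy stand-in for a Dense layer: tracks weight shapes only (hypothetical)."""

    def __init__(self, units):
        self.units = units
        self.kernel_shape = None  # unknown until build() sees the input shape
        self.bias_shape = None

    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.kernel_shape = (input_dim, self.units)
        self.bias_shape = (self.units,)
        # the output keeps the leading dimensions and replaces the last one
        return tuple(input_shape[:-1]) + (self.units,)


layers = [LazyDense(64), LazyDense(10)]
print([layer.kernel_shape for layer in layers])  # [None, None]

shape = (None, 3)  # same input shape as in the cell above
for layer in layers:
    shape = layer.build(shape)

kernel_shapes = [layer.kernel_shape for layer in layers]
print(kernel_shapes)  # [(3, 64), (64, 10)]
print(shape)          # (None, 10)
```

Each layer's output shape becomes the next layer's input shape, which is why specifying the shape of the first input is enough to build the whole stack.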

#### Naming models and layers with the `name` argument

The default behaviour will give automatic names to layers.

In [7]:
model.summary() # note the default names under Layer

However, we can name our layers ourselves.

In [8]:
model = keras.Sequential(name="my_example_model")
model.add(keras.layers.Dense(64, activation="relu", name="my_first_layer"))
model.add(keras.layers.Dense(10, activation="softmax", name="my_last_layer"))
model.build((None, 3))
model.summary() # now summary reflects our name choices

#### Specifying the input shape of your model in advance

Declaring a `keras.Input` as the first element allows us to call `build()` without arguments.

In [9]:
model = keras.Sequential()
model.add(keras.Input(shape=(3,)))                   # our input shape is specified (not including batch size)
model.add(keras.layers.Dense(64, activation="relu"))
model.build()                                           # no error, the input shape was defined

#### Summary can be called while you are building your model

That allows you to track your progress.

In [10]:
model.summary()

In [11]:
model.add(keras.layers.Dense(10, activation="softmax"))
model.summary()
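
The parameter counts reported by `summary()` can be checked by hand: a `Dense` layer holds a kernel of `input_dim * units` weights plus `units` biases. A small sketch in plain Python (the helper `dense_params` is hypothetical), applied to the layer sizes of the model above:

```python
def dense_params(input_dim, units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    return input_dim * units + (units if use_bias else 0)


first = dense_params(3, 64)    # Input(3)  -> Dense(64)
second = dense_params(64, 10)  # Dense(64) -> Dense(10)
total = first + second
print(first, second, total)    # 256 650 906
```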

---

### The Functional API

This is the more involved, and more flexible, way to define models in `Keras`. See the [official tutorial](https://www.tensorflow.org/guide/keras/functional).

#### A simple Functional model with two `Dense` layers

In [12]:
inputs = keras.Input(shape=(3,), name="my_input")
features = keras.layers.Dense(64, activation="relu")(inputs)
outputs = keras.layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs=inputs, outputs=outputs)

Building the model step-by-step:

In [13]:
inputs = keras.Input(shape=(3,), name="my_input") # this is a *symbolic tensor*, without data in it yet

In [14]:
inputs.shape # will include the batch size as 'None'

(None, 3)

In [15]:
inputs.dtype # the default dtype

'float32'

In [16]:
features = keras.layers.Dense(64, activation="relu")(inputs)

In [17]:
features.shape # we can check the shape dynamically

(None, 64)

In [18]:
outputs = keras.layers.Dense(10, activation="softmax")(features)
model = keras.Model(inputs=inputs, outputs=outputs) # ← specify inputs & outputs

Result:

In [19]:
model.summary()

#### An example of a multi-input, multi-output model

The example involves ranking tickets by priority.

Inputs:
- The **title** of the ticket (text input)
- The **text body** of the ticket (text input)
- Any **tags** added by the user (categorical input, assumed here to be one-hot)

Outputs:
- The **priority** score of the ticket, a scalar between 0 and 1 (sigmoid output);
- The **department** that should handle the ticket (a softmax over the set of departments)


Note the use of the [`Concatenate`](https://www.tensorflow.org/api_docs/python/keras/layers/Concatenate) layer!

In [20]:
vocabulary_size = 10000
num_tags = 100
num_departments = 4
                                               # ↓ NAMES (useful later)
title = keras.Input(shape=(vocabulary_size,), name="title")          # 1. THREE INPUTS
text_body = keras.Input(shape=(vocabulary_size,), name="text_body")
tags = keras.Input(shape=(num_tags,), name="tags")

features = keras.layers.Concatenate()([title, text_body, tags])      # 2. GRAPH: - concatenate
features = keras.layers.Dense(64, activation="relu")(features)       #           - dense

priority = keras.layers.Dense(                                       #           - OUTPUT 1: processing `features`
    1, activation="sigmoid", name="priority"
)(features)

department = keras.layers.Dense(                                     #           - OUTPUT 2: also processing `features`
    num_departments, activation="softmax", name="department"
)(features)

model = keras.Model(                                                 # 3. MODEL DEFINITION
    inputs=[title, text_body, tags],                                 #           - inputs
    outputs=[priority, department]                                   #           - outputs
)

##### Training a multi-input, multi-output model

We provide lists:
 - of losses;
 - of metrics;
 - of inputs & targets.

##### All in the same order/format as in the model definition

In [21]:
num_samples = 1280

title_data = np.random.randint(0, 2, size=(num_samples, vocabulary_size))      # INPUTS
text_body_data = np.random.randint(0, 2, size=(num_samples, vocabulary_size))
tags_data = np.random.randint(0, 2, size=(num_samples, num_tags))

priority_data = np.random.random(size=(num_samples, 1))                        # OUTPUTS
department_data = np.random.randint(0, 2, size=(num_samples, num_departments))

model.compile(
    optimizer="rmsprop",
    loss=[
        "mean_squared_error",       # loss for output 1
        "categorical_crossentropy"  # loss for output 2
    ],
    metrics=[
        ["mean_absolute_error"],    # metrics for output 1
        ["accuracy"]                # metrics for output 2
    ],
)
model.fit(
    [title_data, text_body_data, tags_data], # input data and target data
    [priority_data, department_data],        # as specified in model definition
    epochs=1
)

40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - department_accuracy: 0.2070 - department_loss: 37.9191 - loss: 38.2257 - priority_loss: 0.3066 - priority_mean_absolute_error: 0.4695


<keras.src.callbacks.history.History at 0x147b44a70>

In [22]:
print("evaluating:")
model.evaluate(                                    # EVALUATE: the *same* as training, but without changing the net
    [title_data, text_body_data, tags_data],       #           used to test the network e.g. on the test set
    [priority_data, department_data]
)
print()
print("predicting:")
priority_preds, department_preds = model.predict(  # PREDICT: just use the model as is (given some inputs, what
    [title_data, text_body_data, tags_data]        #          are the model's predictions?)
)

evaluating:
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - department_accuracy: 0.1398 - department_loss: 12.8313 - loss: 13.1709 - priority_loss: 0.3396 - priority_mean_absolute_error: 0.5042

predicting:
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step


#### Training a model by providing dicts of input & target arrays

The same as above, with dictionaries keyed by name instead of lists. This is only possible because we **named** our inputs and output layers!

```python
                                               # ↓ naming the layer
title = keras.Input(shape=(vocabulary_size,), name="title")
```

In [23]:
model.compile(
    optimizer="rmsprop",
    loss={
        "priority": "mean_squared_error",         # loss for output 1
        "department": "categorical_crossentropy"  # loss for output 2
    },
    metrics={
        "priority": ["mean_absolute_error"],      # metrics for output 1
        "department": ["accuracy"]                # metrics for output 2
    },
)

model.fit(
    {                                             # input data
        "title": title_data,
        "text_body": text_body_data,
        "tags": tags_data
    },
    {                                             # target data
        "priority": priority_data,
        "department": department_data
    },
    epochs=1,
)

40/40 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - department_accuracy: 0.2367 - department_loss: 49.9224 - loss: 50.2626 - priority_loss: 0.3403 - priority_mean_absolute_error: 0.5049


<keras.src.callbacks.history.History at 0x107c91be0>

In [24]:
print("evaluating:")
model.evaluate(
    {                                             # input data
        "title": title_data,
        "text_body": text_body_data,
        "tags": tags_data
    },
    {                                             # target data
        "priority": priority_data,
        "department": department_data
    },
)

print()
print("predicting:")
priority_preds, department_preds = model.predict(
    {
        "title": title_data,
        "text_body": text_body_data,
        "tags": tags_data
    }
)

evaluating:
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - department_accuracy: 0.1266 - department_loss: 30.9959 - loss: 31.3355 - priority_loss: 0.3396 - priority_mean_absolute_error: 0.5042

predicting:
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step


#### Ignore one or more model output(s) during training

Use `None` in the loss like so:

```python
model.compile(
    optimizer="rmsprop",
    loss=["categorical_crossentropy", None], # second output without loss
    metrics=["accuracy"]
)
```

This is useful when an output exists only so that you can inspect the inside of your model. Another option: train your model, then use the functional syntax to build a second model that outputs the inner layers you are interested in.
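
The option of building a second model that exposes an inner layer can be sketched as follows. This is a minimal illustration under assumptions, not the notebook's own code: the tiny model, the layer names `hidden` and `probs`, and the random input are all hypothetical, and the training step is left out.

```python
import numpy as np
import keras

# A small Functional model (hypothetical sizes and layer names)
inputs = keras.Input(shape=(3,), name="my_input")
hidden = keras.layers.Dense(64, activation="relu", name="hidden")(inputs)
outputs = keras.layers.Dense(10, activation="softmax", name="probs")(hidden)
model = keras.Model(inputs=inputs, outputs=outputs)
# ... model.compile(...) and model.fit(...) would happen here ...

# Second model: same input, but its output is the inner layer's activation.
# It reuses the layers (and therefore the weights) of `model`.
inspector = keras.Model(inputs=inputs, outputs=model.get_layer("hidden").output)

x = np.random.random((5, 3)).astype("float32")
hidden_activations = inspector.predict(x, verbose=0)
print(hidden_activations.shape)  # (5, 64)
```

Because both models share the same layer objects, no weights are copied: `inspector` always reflects the current state of `model`.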

#### The power of the Functional API: Access to layer connectivity

Thanks to it, we could create a more complex **graph** than the `Sequential` model.

Using [`keras.utils.plot_model`](https://www.tensorflow.org/api_docs/python/keras/utils/plot_model) we can plot the model as a graph.

Default setup:

In [25]:
keras.utils.plot_model(model, IMAGE_DIR / "ticket_classifier.png")

You must install pydot (`pip install pydot`) for `plot_model` to work.


Displaying the shapes:

In [26]:
keras.utils.plot_model(model, IMAGE_DIR / "ticket_classifier_with_shape_info.png", show_shapes=True)


#### Retrieving the inputs or outputs of a layer in a Functional model

In [27]:
model.layers # concatenate is the 4th layer, last common dense layer is 5th

[<InputLayer name=title, built=True>,
 <InputLayer name=text_body, built=True>,
 <InputLayer name=tags, built=True>,
 <Concatenate name=concatenate, built=True>,
 <Dense name=dense_10, built=True>,
 <Dense name=priority, built=True>,
 <Dense name=department, built=True>]

In [28]:
msg = "Our concatenation layer"
print(f"{msg}\n{'='*len(msg)}")
print()
msg = "inputs:"
print(f"{msg}\n{'-'*len(msg)}")
for i in model.layers[3].input:   # DISSECTING LAYER 4: inputs
    print(i)
    print()
msg = "outputs:"
print(f"{msg}\n{'-'*len(msg)}")
print(model.layers[3].output)     # DISSECTING LAYER 4: only one output

Our concatenation layer
=======================

inputs:
-------
<KerasTensor shape=(None, 10000), dtype=float32, sparse=False, ragged=False, name=title>

<KerasTensor shape=(None, 10000), dtype=float32, sparse=False, ragged=False, name=text_body>

<KerasTensor shape=(None, 100), dtype=float32, sparse=False, ragged=False, name=tags>

outputs:
--------
<KerasTensor shape=(None, 20100), dtype=float32, sparse=False, ragged=False, name=keras_tensor_14>
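
The output width of `(None, 20100)` is simply the sum of the input widths, since `Concatenate` joins its inputs along the last axis by default. A quick check with NumPy arrays of the same widths (the batch size of 2 is an arbitrary choice for illustration):

```python
import numpy as np

batch = 2  # small hypothetical batch, just to have concrete arrays
title = np.zeros((batch, 10000))
text_body = np.zeros((batch, 10000))
tags = np.zeros((batch, 100))

# join along the last axis, leaving the batch axis untouched
features = np.concatenate([title, text_body, tags], axis=-1)
print(features.shape)  # (2, 20100)
```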


#### Creating a new model by reusing intermediate layer outputs

**Workflow**

1. Extract intermediate features;
2. Add a new layer / create a new output;
3. Define a new model specifying that new output.

In [29]:
                                               # 1. Extract & reuse the features of our intermediate Dense
features = model.layers[4].output              #    5th layer (after concatenate)

                                               # 2. Create a new output
difficulty = keras.layers.Dense(3, activation="softmax", name="difficulty")(features)

new_model = keras.Model(
    inputs=[title, text_body, tags],
    outputs=[priority, department, difficulty] # same as before, with an additional output!
)

Displaying the new topology:

In [30]:
keras.utils.plot_model(new_model, IMAGE_DIR / "updated_ticket_classifier.png", show_shapes=True)


---

### Subclassing the Model class

#### Rewriting our previous example as a subclassed model

In [31]:
class CustomerTicketModel(keras.Model):                               # ← define a SUBCLASSED MODEL

    def __init__(self, num_departments):
        super().__init__()
        self.concat_layer = keras.layers.Concatenate()                # the layers are now linked to
        self.mixing_layer = keras.layers.Dense(64, activation="relu") # this object
        self.priority_scorer = keras.layers.Dense(1, activation="sigmoid")
        self.department_classifier = keras.layers.Dense(
            num_departments, activation="softmax"
        )

    def call(self, inputs):                                              # CALL: what all Keras models must implement!
        title = inputs["title"]                                          # (note: we assume 'inputs' is a dictionary)
        text_body = inputs["text_body"]
        tags = inputs["tags"]

        features = self.concat_layer([title, text_body, tags])           # where the computation actually happens
        features = self.mixing_layer(features)
        priority = self.priority_scorer(features)
        department = self.department_classifier(features)
        return priority, department

In [32]:
model = CustomerTicketModel(num_departments=4)                           # ← instantiate the model object

priority, department = model(
    {"title": title_data, "text_body": text_body_data, "tags": tags_data}
)

The rest of the code is the same as before:

In [33]:
model.compile(
    optimizer="rmsprop",
    loss=["mean_squared_error", "categorical_crossentropy"],
    metrics=[["mean_absolute_error"], ["accuracy"]],
)
model.fit(
    {"title": title_data, "text_body": text_body_data, "tags": tags_data},
    [priority_data, department_data],
    epochs=1,
)
model.evaluate(
    {"title": title_data, "text_body": text_body_data, "tags": tags_data},
    [priority_data, department_data],
)
priority_preds, department_preds = model.predict(
    {"title": title_data, "text_body": text_body_data, "tags": tags_data}
)

40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.2219 - categorical_crossentropy_loss: 43.4015 - loss: 43.7274 - mean_absolute_error: 0.4904 - mean_squared_error_loss: 0.3259
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.0562 - categorical_crossentropy_loss: 40.9741 - loss: 41.3053 - mean_absolute_error: 0.4958 - mean_squared_error_loss: 0.3312
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step


#### Beware: What subclassed models don't support

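For a **subclassed model**, `summary()` cannot display layer connectivity and `plot_model()` cannot draw the topology, because the way the layers are wired together is hidden inside `call()`. Both work fully with the **functional** syntax!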

#### Multiple losses

Multiple losses are *summed* to get one final number.

You can modulate the importance of each element using `loss_weights`.

```python
model.compile(
    optimizer="rmsprop",
    loss=["mean_squared_error", "categorical_crossentropy"],
    # ↓ the final loss will be:
    #   2 * `mean_squared_error` + 1 * `categorical_crossentropy`
    loss_weights=[2, 1],
    metrics=[["mean_absolute_error"], ["accuracy"]],
)
```
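
As a worked check, the training log of the subclassed model above reports `mean_squared_error_loss: 0.3259`, `categorical_crossentropy_loss: 43.4015` and `loss: 43.7274`, which is indeed the plain sum. The sketch below reproduces that arithmetic in plain Python and shows what weights of `[2, 1]` would give for the same per-output values (the helper `total_loss` is hypothetical, not a Keras function):

```python
def total_loss(losses, loss_weights=None):
    """Weighted sum of per-output losses (all weights default to 1)."""
    if loss_weights is None:
        loss_weights = [1.0] * len(losses)
    return sum(w * l for w, l in zip(loss_weights, losses))


mse_loss = 0.3259            # `mean_squared_error_loss` from the training log above
crossentropy_loss = 43.4015  # `categorical_crossentropy_loss` from the same log

unweighted = total_loss([mse_loss, crossentropy_loss])
weighted = total_loss([mse_loss, crossentropy_loss], loss_weights=[2, 1])
print(round(unweighted, 4))  # 43.7274
print(round(weighted, 4))    # 44.0533
```

Here the cross-entropy term dominates the total, which is the typical situation where `loss_weights` helps to rebalance the outputs.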

[Keras Model documentation](https://www.tensorflow.org/api_docs/python/keras/Model)

---

### Mixing and matching different components

#### Creating a Functional model that includes a subclassed model

In [34]:
class Classifier(keras.Model):                               # ← define a SUBCLASSED MODEL
    """A basic subclassed model"""
    def __init__(self, num_classes=2):
        super().__init__()
        if num_classes == 2:
            num_units = 1
            activation = "sigmoid"
        else:
            num_units = num_classes
            activation = "softmax"
        self.dense = keras.layers.Dense(num_units, activation=activation)
    def call(self, inputs):                                     # CALL: what all Keras models must implement!
        return self.dense(inputs)

inputs = keras.Input(shape=(3,))                             # ← FUNCTIONAL SYNTAX
features = keras.layers.Dense(64, activation="relu")(inputs) # ← ..., then
outputs = Classifier(num_classes=10)(features)                  # ← our SUBCLASSED MODEL in the middle
model = keras.Model(inputs=inputs, outputs=outputs)          # ← back to functional syntax


#### Creating a subclassed model that includes a Functional model

In [35]:
inputs = keras.Input(shape=(64,))
outputs = keras.layers.Dense(1, activation="sigmoid")(inputs)
binary_classifier = keras.Model(inputs=inputs, outputs=outputs) # ← FUNCTIONAL MODEL definition

class MyModel(keras.Model):                                     # ← SUBCLASSED MODEL
    """Another basic subclassed model"""
    def __init__(self, num_classes=2):
        super().__init__()
        self.dense = keras.layers.Dense(64, activation="relu")
        self.classifier = binary_classifier                        # ← included INTO OUR SUBCLASSED ONE

    def call(self, inputs):                                        # CALL: what all Keras models must implement!
        features = self.dense(inputs)
        return self.classifier(features)                           # ← used here

model = MyModel()

---

## Using built-in training and evaluation loops

### Using callbacks

You pass one or more callbacks, as a list, through the `callbacks` argument of `fit()`, `evaluate()` or `predict()`:

```python
model.(fit|evaluate|predict)(
    ...
    callbacks=[MyCustomCallback(), MyOtherCustomCallback()]
)
```

See the [documentation](https://www.tensorflow.org/api_docs/python/keras/callbacks/Callback), the [list of built-in callbacks](https://www.tensorflow.org/api_docs/python/keras/callbacks) and a [tutorial](https://www.tensorflow.org/guide/keras/custom_callback).

### Early Stopping

```python
keras.callbacks.EarlyStopping
```

Stops training as soon as the tracked metric stops improving. ([docs](https://www.tensorflow.org/api_docs/python/keras/callbacks/EarlyStopping))
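
How long the callback waits is governed by its `patience` argument: the number of consecutive epochs without improvement that are tolerated. The function below is a simplified re-implementation of that idea in plain Python, for a metric that should decrease. It is a sketch only (the name `early_stopping_epoch` is hypothetical, and options such as `min_delta` or `baseline` are ignored), not the Keras source.

```python
def early_stopping_epoch(val_losses, patience=0):
    """Return the 0-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # the metric kept improving often enough


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.6]
stop_at = early_stopping_epoch(losses, patience=2)
print(stop_at)  # 4
```

In this made-up sequence, training stops at epoch 4 and never reaches the better value `0.6` at epoch 6, which illustrates the trade-off behind the choice of `patience`.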

### Checkpoints

```python
keras.callbacks.ModelCheckpoint
```

Saves your network automatically. ([docs](https://www.tensorflow.org/api_docs/python/keras/callbacks/ModelCheckpoint))
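
A minimal sketch of how `EarlyStopping` and `ModelCheckpoint` are typically wired into `fit()`. The tiny model, the random data, the file name `best_model.keras` and the hyperparameter values are assumptions chosen for illustration only.

```python
import os
import tempfile

import numpy as np
import keras

# Random placeholder data (binary classification on 3 features)
x = np.random.random((256, 3)).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.keras")
callbacks = [
    # stop once val_loss has not improved for 2 consecutive epochs
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
    # overwrite the file only when val_loss reaches a new best value
    keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path, monitor="val_loss", save_best_only=True
    ),
]

history = model.fit(
    x, y, epochs=10, validation_split=0.2, callbacks=callbacks, verbose=0
)
epochs_run = len(history.history["val_loss"])
print(epochs_run, os.path.exists(checkpoint_path))
```

The saved file can later be reloaded with `keras.models.load_model(checkpoint_path)`.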

### Learning rate schedules

While it is recommended to start with `RMSprop` or `Adam` (they adapt the learning rate for you), the state of the art is usually achieved with tuned variants of `SGD` with momentum (where you need to search for the right hyperparameters).

In this context, you usually want your learning rate to go down as you get nearer to your optimisation goal.


```python
keras.callbacks.LearningRateScheduler
keras.callbacks.ReduceLROnPlateau
```

`LearningRateScheduler`: changes your learning rate during training, following a schedule you define. ([Scheduler docs](https://www.tensorflow.org/api_docs/python/keras/callbacks/LearningRateScheduler))  
`ReduceLROnPlateau`: reduces the learning rate once the tracked metric has stopped improving. ([Plateau docs](https://www.tensorflow.org/api_docs/python/keras/callbacks/ReduceLROnPlateau))
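
`LearningRateScheduler` expects a function that receives the epoch index and the current learning rate, and returns the new learning rate. The sketch below defines one such function and simulates it over 30 epochs in plain Python. The step-decay policy (halving every 10 epochs) and the starting value of `0.1` are arbitrary assumptions for illustration.

```python
def step_decay(epoch, lr):
    """Halve the learning rate every 10 epochs (hypothetical policy)."""
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5
    return lr


# Simulate what the callback would do at the start of each epoch
lr = 0.1
lr_history = []
for epoch in range(30):
    lr = step_decay(epoch, lr)
    lr_history.append(lr)

print(lr_history[0], lr_history[10], lr_history[20])  # 0.1 0.05 0.025

# With Keras, the same function would be passed as:
#   keras.callbacks.LearningRateScheduler(step_decay)
```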

#### Note on Adaptive Methods

`RMSprop`, `Adam` and other adaptive methods need this less, as the size of their updates is already adjusted dynamically during training.

These learning rate techniques are usually deployed to make SGD outperform the adaptive methods! In that case, you would search for optimal performance with SGD + Momentum + Schedule + Nesterov!

### Monitoring and visualization with TensorBoard

```python
keras.callbacks.TensorBoard
```

Check out [`tensorboard_in_notebooks.ipynb`](https://github.com/jchwenger/AI/blob/main/lectures/04/tensorboard_in_notebooks.ipynb) (Topic 4: Fundamentals), also on the VLE. ([docs](https://www.tensorflow.org/api_docs/python/keras/callbacks/TensorBoard))

More references and examples in the notebook [`chapter07.callbacks.learning-rate-schedules.ipynb`](https://github.com/jchwenger/AI/blob/main/lectures/04.more/chapter07.callbacks.learning-rate-schedules.ipynb).

---

## Writing your own training and evaluation loops

And also: **Writing your own metrics**

Please refer to the notebook [`chapter07.custom-classes.subclassing.ipynb`](https://github.com/jchwenger/AI/blob/main/lectures/04.more/chapter07.custom-classes.subclassing.ipynb).

# Summary

### Spectrum of Workflows

- the `Sequential` syntax;
- the `Functional` syntax;
- `Model` or `Layer` subclassing;
- Losses must match outputs!
- Multiple losses are **summed**; use `loss_weights` for a weighted sum;
- Retrieve inner layers from models → build new ones;
- Mix & match: functional & subclassing can be combined;
- With subclassing, `summary()` cannot show layer connectivity and `plot_model()` cannot draw the topology!