# Modern TensorFlow 2.x Fundamentals

A comprehensive guide to TensorFlow 2.x -- from tensors and automatic differentiation to building, training, and deploying deep learning models.

**This notebook replaces the original TF1.x sample** and covers the modern, Pythonic approach to deep learning with TensorFlow.

---

### Table of Contents

1. [Welcome to Modern TensorFlow](#1)
2. [Setup & Verification](#2)
3. [Tensors -- The Foundation](#3)
4. [Eager Execution & GradientTape](#4)
5. [Building Models -- Three Ways](#5)
6. [Training Loop Deep Dive](#6)
7. [Callbacks System](#7)
8. [tf.data Pipeline](#8)
9. [Practical Example: MNIST Classifier](#9)
10. [Practical Example: Linear Regression (Updated from TF1)](#10)
11. [Saving & Loading Models](#11)
12. [Performance Tips](#12)
13. [Exercises](#13)

<a name="1"></a>
## 1. Welcome to Modern TensorFlow

### What is TensorFlow 2.x?

TensorFlow 2.x is Google's open-source platform for machine learning. It provides a comprehensive ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications.

### Key Differences from TF1.x

| Feature | TF 1.x | TF 2.x |
|---------|--------|--------|
| **Execution** | Graph-based (lazy) -- required `tf.Session()` | **Eager by default** -- runs immediately like NumPy |
| **API** | Low-level, verbose, `tf.placeholder`, `tf.Variable` init | **Keras integrated** as the high-level API |
| **Sessions** | `tf.Session().run()` to evaluate anything | **No sessions needed** -- just call functions |
| **Variables** | `tf.global_variables_initializer()` required | Variables initialize on creation |
| **Graphs** | Built explicitly, then executed | Built implicitly via `@tf.function` when needed |
| **Debugging** | Painful -- `print` ran once at graph construction, not at execution, so intermediate values were hard to inspect | **Standard Python debugging** works out of the box |
| **API Cleanup** | Duplicated, inconsistent APIs (`tf.layers` vs `tf.keras.layers`) | **Single unified API** under `tf.keras` |

### The TensorFlow Ecosystem

- **TensorFlow Core** -- The main library for building and training models
- **Keras** (`tf.keras`) -- High-level API for building neural networks (integrated into TF2)
- **TensorFlow Lite** -- Deploy models on mobile and edge devices
- **TensorFlow Serving** -- Production model serving with gRPC/REST APIs
- **TensorFlow.js** -- Run models in the browser or Node.js
- **TFX (TensorFlow Extended)** -- End-to-end ML pipelines for production
- **TensorFlow Hub** -- Repository of pre-trained model components
- **TensorBoard** -- Visualization toolkit for training metrics, graphs, and more
- **TensorFlow Datasets** -- Collection of ready-to-use datasets

<a name="2"></a>
## 2. Setup & Verification

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {tf.keras.__version__}")
print(f"Eager execution enabled: {tf.executing_eagerly()}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")

if tf.config.list_physical_devices('GPU'):
    print(f"GPU device(s): {tf.config.list_physical_devices('GPU')}")
else:
    print("Running on CPU -- GPU not detected.")

print(f"NumPy version: {np.__version__}")

> **Note:** In TF 1.x you had to call `tf.enable_eager_execution()` explicitly or work inside `tf.Session()`. In TF 2.x, eager execution is enabled by default -- code runs line by line, just like regular Python.

<a name="3"></a>
## 3. Tensors -- The Foundation

Tensors are the central data structure in TensorFlow. They are multi-dimensional arrays with a uniform data type (`dtype`). Unlike NumPy arrays, tensors are immutable -- every operation produces a new tensor rather than modifying its input.
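
To make the immutability concrete, here is a minimal sketch (the variable names are illustrative and not used elsewhere in this notebook). It assumes eager execution, where item assignment on a tensor raises `TypeError`, and uses `tf.tensor_scatter_nd_update` as one way to obtain a modified copy.

```python
import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])

# Item assignment is not supported on a tf.Tensor.
try:
    t[0] = 10.0
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(f"Item assignment failed: {assignment_failed}")

# tf.tensor_scatter_nd_update returns a NEW tensor; `t` itself is unchanged.
updated = tf.tensor_scatter_nd_update(t, indices=[[0]], updates=[10.0])
print(f"original: {t.numpy()}, updated: {updated.numpy()}")
```

When a value really has to change in place (model weights, counters), `tf.Variable` is the intended tool.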

### 3.1 Creating Tensors

In [None]:
# ---- Constants (immutable) ----
scalar = tf.constant(42)
vector = tf.constant([1.0, 2.0, 3.0])
matrix = tf.constant([[1, 2], [3, 4]])
tensor_3d = tf.constant([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

print("=== Tensor Basics ===")
print(f"Scalar:    shape={scalar.shape}, dtype={scalar.dtype}, ndim={scalar.ndim}, value={scalar.numpy()}")
print(f"Vector:    shape={vector.shape}, dtype={vector.dtype}, ndim={vector.ndim}")
print(f"Matrix:    shape={matrix.shape}, dtype={matrix.dtype}, ndim={matrix.ndim}")
print(f"3D Tensor: shape={tensor_3d.shape}, dtype={tensor_3d.dtype}, ndim={tensor_3d.ndim}")

In [None]:
# ---- Variables (mutable -- used for model parameters) ----
var = tf.Variable([[1.0, 2.0], [3.0, 4.0]], name="my_variable")
print(f"Variable: {var}")
print(f"Name: {var.name}, Shape: {var.shape}, Dtype: {var.dtype}")

# Variables can be updated in-place
var.assign([[10.0, 20.0], [30.0, 40.0]])
print(f"After assign: {var.numpy()}")

var[0, 1].assign(99.0)
print(f"After element assign: {var.numpy()}")

In [None]:
# ---- Random Tensors ----
normal = tf.random.normal([3, 3], mean=0.0, stddev=1.0)
uniform = tf.random.uniform([3, 3], minval=0, maxval=10)
truncated = tf.random.truncated_normal([3, 3], mean=0.0, stddev=1.0)

print("Normal:\n", normal.numpy())
print("\nUniform:\n", uniform.numpy())

# Reproducibility with seeds
tf.random.set_seed(42)
a = tf.random.normal([2, 2])
tf.random.set_seed(42)
b = tf.random.normal([2, 2])
print(f"\nSame seed produces same result: {tf.reduce_all(a == b).numpy()}")

In [None]:
# ---- Special Tensors ----
zeros = tf.zeros([3, 4])
ones = tf.ones([2, 3])
eye = tf.eye(4)  # Identity matrix
filled = tf.fill([2, 3], 7.0)
range_t = tf.range(0, 10, 2)
linspace = tf.linspace(0.0, 1.0, 5)

print(f"Zeros: shape={zeros.shape}")
print(f"Identity matrix:\n{eye.numpy()}")
print(f"Range: {range_t.numpy()}")
print(f"Linspace: {linspace.numpy()}")

### 3.2 Tensor Operations

In [None]:
# ---- Arithmetic Operations ----
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

print("=== Arithmetic ===")
print(f"Addition (a + b):\n{(a + b).numpy()}")
print(f"\nSubtraction (a - b):\n{(a - b).numpy()}")
print(f"\nElement-wise multiply (a * b):\n{(a * b).numpy()}")
print(f"\nMatrix multiply (a @ b):\n{(a @ b).numpy()}")
print(f"\nElement-wise power (a ** 2):\n{(a ** 2).numpy()}")

In [None]:
# ---- Reduction Operations ----
m = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print("=== Reductions ===")
print(f"Matrix:\n{m.numpy()}")
print(f"\nReduce sum (all): {tf.reduce_sum(m).numpy()}")
print(f"Reduce sum (axis=0, columns): {tf.reduce_sum(m, axis=0).numpy()}")
print(f"Reduce sum (axis=1, rows):    {tf.reduce_sum(m, axis=1).numpy()}")
print(f"Reduce mean: {tf.reduce_mean(m).numpy()}")
print(f"Reduce max:  {tf.reduce_max(m).numpy()}")
print(f"Reduce min:  {tf.reduce_min(m).numpy()}")
print(f"Argmax (axis=1): {tf.argmax(m, axis=1).numpy()}")

In [None]:
# ---- Reshaping and Manipulation ----
t = tf.range(12)
print(f"Original: shape={t.shape}, values={t.numpy()}")

reshaped = tf.reshape(t, [3, 4])
print(f"\nReshaped to [3, 4]:\n{reshaped.numpy()}")

transposed = tf.transpose(reshaped)
print(f"\nTransposed to {transposed.shape}:\n{transposed.numpy()}")

# Expand and squeeze dimensions
expanded = tf.expand_dims(t, axis=0)  # Add batch dimension
print(f"\nExpanded dims: {t.shape} -> {expanded.shape}")

squeezed = tf.squeeze(expanded)
print(f"Squeezed dims: {expanded.shape} -> {squeezed.shape}")

# Concatenation and stacking
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
print(f"\nConcat (axis=0):\n{tf.concat([a, b], axis=0).numpy()}")
print(f"\nConcat (axis=1):\n{tf.concat([a, b], axis=1).numpy()}")
print(f"\nStack (new axis=0):\n{tf.stack([a, b], axis=0).numpy()}")

In [None]:
# ---- Indexing and Slicing ----
t = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(f"Tensor:\n{t.numpy()}")

print(f"\nt[0]:       {t[0].numpy()}")         # First row
print(f"t[:, 0]:    {t[:, 0].numpy()}")         # First column
print(f"t[1, 2]:    {t[1, 2].numpy()}")         # Element at row 1, col 2
print(f"t[0:2, 1:3]:\n{t[0:2, 1:3].numpy()}")  # Submatrix
print(f"t[-1]:      {t[-1].numpy()}")           # Last row

In [None]:
# ---- Broadcasting ----
# TensorFlow follows the same broadcasting rules as NumPy
a = tf.constant([[1], [2], [3]])   # Shape: (3, 1)
b = tf.constant([10, 20, 30])      # Shape: (3,)

print(f"a shape: {a.shape}")
print(f"b shape: {b.shape}")
print(f"\na + b (broadcasted):\n{(a + b).numpy()}")
# a is broadcast along axis=1, b along axis=0
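
The shape arithmetic behind the broadcast above can be checked without creating any data. The sketch below uses `tf.broadcast_static_shape`. The incompatible `(3, 2)` / `(3,)` pair is an illustrative example chosen here, not something from the cell above.

```python
import tensorflow as tf

shape_a = tf.TensorShape([3, 1])
shape_b = tf.TensorShape([3])

# Shapes are aligned from the right: (3, 1) vs (3,) is treated as (3, 1) vs (1, 3),
# and every size-1 dimension is stretched to match the other operand.
result_shape = tf.broadcast_static_shape(shape_a, shape_b)
print(f"Broadcast result shape: {result_shape}")

# Trailing dimensions 2 and 3 do not match and neither is 1, so this fails.
try:
    tf.broadcast_static_shape(tf.TensorShape([3, 2]), tf.TensorShape([3]))
    compatible = True
except ValueError:
    compatible = False
print(f"(3, 2) and (3,) compatible: {compatible}")
```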

### 3.3 NumPy Interop & GPU Placement

In [None]:
# ---- NumPy <-> TensorFlow conversion ----
np_array = np.array([[1.0, 2.0], [3.0, 4.0]])

# NumPy to TensorFlow
tf_tensor = tf.constant(np_array)
tf_tensor_convert = tf.convert_to_tensor(np_array)
print(f"NumPy -> TF: {tf_tensor.dtype}, {tf_tensor.shape}")

# TensorFlow to NumPy
back_to_np = tf_tensor.numpy()
print(f"TF -> NumPy: {type(back_to_np)}, {back_to_np.dtype}")

# TensorFlow ops accept NumPy arrays directly
result = tf.multiply(np_array, 2)
print(f"\nTF op on NumPy array: {result.numpy()}")

print(f"\n--- Important: TF tensors and NumPy arrays can share memory ---")
print(f"TF tensor device: {tf_tensor.device}")

In [None]:
# ---- Device Placement ----
# TF automatically places operations on GPU if available
print("Available devices:")
for device in tf.config.list_physical_devices():
    print(f"  {device}")

# Explicit placement
with tf.device('/CPU:0'):
    cpu_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(f"\nCPU tensor device: {cpu_tensor.device}")

# If GPU is available, you can place tensors on it
if tf.config.list_physical_devices('GPU'):
    with tf.device('/GPU:0'):
        gpu_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    print(f"GPU tensor device: {gpu_tensor.device}")
else:
    print("No GPU available -- skipping GPU placement example.")

<a name="4"></a>
## 4. Eager Execution & GradientTape

### Why Eager Execution Matters

In TF 1.x, you first built a computation graph and then ran it inside a `tf.Session()`. Debugging was painful because you could not inspect intermediate values easily.

In TF 2.x, **eager execution** means operations execute immediately and return concrete values. This makes TensorFlow feel like NumPy and allows standard Python debugging tools (print, pdb, etc.).

### Automatic Differentiation with `tf.GradientTape`

`tf.GradientTape` records operations for automatic differentiation. The tape automatically "watches" trainable `tf.Variable` objects. For anything else, such as a `tf.constant`, you must call `tape.watch()` explicitly.
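
A quick way to see what the tape does and does not watch is to ask for gradients with respect to several kinds of inputs at once. The sketch below is illustrative (the names `trainable_var`, `frozen_var`, and `const` are made up here). Unwatched sources come back as `None` rather than as zeros.

```python
import tensorflow as tf

trainable_var = tf.Variable(2.0)                # watched automatically
frozen_var = tf.Variable(2.0, trainable=False)  # NOT watched automatically
const = tf.constant(2.0)                        # NOT watched automatically

with tf.GradientTape() as tape:
    y = trainable_var ** 2 + frozen_var ** 2 + const ** 2

grads = tape.gradient(y, [trainable_var, frozen_var, const])
for name, g in zip(["trainable_var", "frozen_var", "const"], grads):
    print(f"d y / d {name}: {None if g is None else g.numpy()}")
```

A `None` gradient in a training loop is therefore often a sign that a tensor was never watched, or that the computation left the tape's scope.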

In [None]:
# ---- Simple Gradient Computation ----
# f(x) = x^2 + 2x + 1
# f'(x) = 2x + 2

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2 + 2*x + 1  # y = x^2 + 2x + 1

grad = tape.gradient(y, x)  # dy/dx = 2x + 2 = 2(3) + 2 = 8
print(f"x = {x.numpy()}, y = {y.numpy()}, dy/dx = {grad.numpy()}")  # Should be 8.0

In [None]:
# ---- Gradients with respect to multiple variables ----
w = tf.Variable(2.0)
b = tf.Variable(1.0)
x_val = tf.constant(3.0)

with tf.GradientTape() as tape:
    y = w * x_val + b  # y = wx + b

# Compute gradients w.r.t. both w and b simultaneously
dw, db = tape.gradient(y, [w, b])
print(f"y = w*x + b = {y.numpy()}")
print(f"dy/dw = x = {dw.numpy()}")  # dy/dw = x = 3.0
print(f"dy/db = 1 = {db.numpy()}")  # dy/db = 1.0

In [None]:
# ---- Higher-Order Gradients ----
# f(x) = x^3
# f'(x) = 3x^2
# f''(x) = 6x

x = tf.Variable(2.0)
with tf.GradientTape() as t2:
    with tf.GradientTape() as t1:
        y = x ** 3                  # y = x^3
    dy_dx = t1.gradient(y, x)       # f'(x) = 3x^2
d2y_dx2 = t2.gradient(dy_dx, x)    # f''(x) = 6x

print(f"f(x) = x^3 at x = 2")
print(f"  f(2)   = {y.numpy()}")         # 8
print(f"  f'(2)  = {dy_dx.numpy()}")     # 12
print(f"  f''(2) = {d2y_dx2.numpy()}")   # 12

In [None]:
# ---- Persistent Tape (for multiple gradient calls) ----
x = tf.Variable(3.0)
with tf.GradientTape(persistent=True) as tape:
    y = x ** 2
    z = x ** 3

dy_dx = tape.gradient(y, x)  # 2x = 6
dz_dx = tape.gradient(z, x)  # 3x^2 = 27
print(f"dy/dx = {dy_dx.numpy()}, dz/dx = {dz_dx.numpy()}")

# Important: delete persistent tapes to free resources
del tape

In [None]:
# ---- Watching Constants ----
x = tf.constant(3.0)  # Constants are NOT watched by default
with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2

grad = tape.gradient(y, x)
print(f"Gradient of x^2 at x=3 (constant): {grad.numpy()}")

In [None]:
# ---- Visualizing a Gradient: Tangent Line to a Curve ----
x_range = np.linspace(-3, 3, 100)

# Compute f(x) = x^2 and its gradient at a specific point
x_point = tf.Variable(1.5)
with tf.GradientTape() as tape:
    y_point = x_point ** 2
slope = tape.gradient(y_point, x_point)

# Tangent line: y = f(a) + f'(a)(x - a)
tangent = y_point.numpy() + slope.numpy() * (x_range - x_point.numpy())

plt.figure(figsize=(8, 5))
plt.plot(x_range, x_range**2, 'b-', linewidth=2, label='$f(x) = x^2$')
plt.plot(x_range, tangent, 'r--', linewidth=2, label=f'Tangent at x={x_point.numpy()}')
plt.plot(x_point.numpy(), y_point.numpy(), 'ro', markersize=10)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title(f'GradientTape: Tangent to $x^2$ at x={x_point.numpy()}, slope={slope.numpy()}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.ylim(-2, 9)
plt.tight_layout()
plt.show()

<a name="5"></a>
## 5. Building Models -- Three Ways

TensorFlow/Keras provides three progressively more flexible APIs for building models:

| API | Best For | Flexibility | Ease |
|-----|----------|------------|------|
| **Sequential** | Simple linear stacks of layers | Low | High |
| **Functional** | Multi-input/output, shared layers, residual connections | Medium | Medium |
| **Subclassing** | Full control, dynamic architectures, research | High | Low |

### 5a. Sequential API

In [None]:
# The Sequential API: a simple linear stack of layers
sequential_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

sequential_model.summary()

In [None]:
# You can also build it incrementally
seq_model_v2 = tf.keras.Sequential(name='incremental_model')
seq_model_v2.add(tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)))
seq_model_v2.add(tf.keras.layers.Dropout(0.2))
seq_model_v2.add(tf.keras.layers.Dense(64, activation='relu'))
seq_model_v2.add(tf.keras.layers.Dropout(0.2))
seq_model_v2.add(tf.keras.layers.Dense(10, activation='softmax'))

print(f"Model: {seq_model_v2.name}, Layers: {len(seq_model_v2.layers)}")

### 5b. Functional API

The Functional API allows you to build models with non-linear topology, shared layers, and multiple inputs/outputs.

In [None]:
# Functional API: same architecture as above but using function-call syntax
inputs = tf.keras.Input(shape=(784,), name='input_features')
x = tf.keras.layers.Dense(128, activation='relu', name='dense_1')(inputs)
x = tf.keras.layers.Dropout(0.2, name='dropout_1')(x)
x = tf.keras.layers.Dense(64, activation='relu', name='dense_2')(x)
x = tf.keras.layers.Dropout(0.2, name='dropout_2')(x)
outputs = tf.keras.layers.Dense(10, activation='softmax', name='predictions')(x)

functional_model = tf.keras.Model(inputs=inputs, outputs=outputs, name='functional_model')
functional_model.summary()

In [None]:
# Functional API excels at multi-input models
# Example: a model that takes an image and metadata as separate inputs
image_input = tf.keras.Input(shape=(784,), name='image')
metadata_input = tf.keras.Input(shape=(10,), name='metadata')

# Image branch
x1 = tf.keras.layers.Dense(64, activation='relu')(image_input)
x1 = tf.keras.layers.Dense(32, activation='relu')(x1)

# Metadata branch
x2 = tf.keras.layers.Dense(16, activation='relu')(metadata_input)

# Merge branches
merged = tf.keras.layers.concatenate([x1, x2])
output = tf.keras.layers.Dense(10, activation='softmax')(merged)

multi_input_model = tf.keras.Model(
    inputs=[image_input, metadata_input],
    outputs=output,
    name='multi_input_model'
)
multi_input_model.summary()
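
Residual (skip) connections are another non-linear topology the Functional API handles naturally. The following is a minimal sketch with arbitrary layer sizes and names chosen for illustration. The only real constraint is that both tensors passed to `Add` have the same shape.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(64,), name="features")

# Two Dense layers form the "residual branch"
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(64)(x)

# Skip connection: add the block's input back onto its output
x = tf.keras.layers.Add()([inputs, x])
x = tf.keras.layers.Activation("relu")(x)

outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
residual_model = tf.keras.Model(inputs, outputs, name="residual_model")

out = residual_model(tf.random.normal([4, 64]))
print(f"Output shape: {out.shape}")
```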

### 5c. Model Subclassing API

For maximum flexibility, subclass `tf.keras.Model`. This is similar to PyTorch's approach and is common in research.

In [None]:
class CustomModel(tf.keras.Model):
    """A custom model built by subclassing tf.keras.Model."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dropout1 = tf.keras.layers.Dropout(0.2)
        self.dense2 = tf.keras.layers.Dense(64, activation='relu')
        self.dropout2 = tf.keras.layers.Dropout(0.2)
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')

    def call(self, inputs, training=False):
        x = self.dense1(inputs)
        x = self.dropout1(x, training=training)
        x = self.dense2(x)
        x = self.dropout2(x, training=training)
        return self.classifier(x)


subclass_model = CustomModel(num_classes=10)

# Subclassed models create weights lazily: call build() or run a forward pass before summary()
subclass_model.build(input_shape=(None, 784))
subclass_model.summary()

# Verify it works with a dummy input
dummy_input = tf.random.normal([2, 784])
output = subclass_model(dummy_input, training=False)
print(f"\nOutput shape: {output.shape}")
print(f"Output sums to 1 (softmax): {tf.reduce_sum(output, axis=1).numpy()}")
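
The `training` argument passed through `call()` matters because some layers behave differently at training and inference time. A small sketch with a standalone `Dropout` layer (rate 0.5, chosen for illustration) shows the difference:

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(0.5)
x = tf.ones([1, 8])

# Inference mode: dropout is the identity function.
inference_out = dropout(x, training=False)

# Training mode: each unit is zeroed with probability 0.5 and the
# survivors are scaled by 1 / (1 - 0.5) = 2 to keep the expected sum unchanged.
training_out = dropout(x, training=True)

print(f"training=False: {inference_out.numpy()}")
print(f"training=True:  {training_out.numpy()}")
```

Forgetting to forward `training` in a subclassed model is a common reason for dropout staying inactive during training or active during evaluation.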

<a name="6"></a>
## 6. Training Loop Deep Dive

### 6.1 Training with `model.fit()` (High-Level)

`model.fit()` is the simplest way to train a model. It handles batching, metrics, callbacks, and more.

In [None]:
# Generate synthetic data for demonstration
np.random.seed(42)
X_synthetic = np.random.randn(1000, 784).astype(np.float32)
y_synthetic = np.random.randint(0, 10, size=(1000,))

# Build and compile model
fit_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

fit_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Train
history = fit_model.fit(
    X_synthetic, y_synthetic,
    epochs=5,
    batch_size=32,
    validation_split=0.2,
    verbose=1
)

print(f"\nFinal train accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"Final val accuracy:   {history.history['val_accuracy'][-1]:.4f}")

### 6.2 Custom Training Loop with GradientTape

Write a custom loop when you need full control: custom losses, gradient manipulation, multi-model training (GANs), etc.

In [None]:
# Build model
custom_loop_model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Setup
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
train_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
val_acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()

# Create datasets
X_train_ct = X_synthetic[:800]
y_train_ct = y_synthetic[:800]
X_val_ct = X_synthetic[800:]
y_val_ct = y_synthetic[800:]

train_dataset = tf.data.Dataset.from_tensor_slices((X_train_ct, y_train_ct)).shuffle(800).batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices((X_val_ct, y_val_ct)).batch(32)

# ---- The @tf.function decorator compiles the function into a TF graph for speed ----
@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = custom_loop_model(x_batch, training=True)
        loss = loss_fn(y_batch, predictions)
    gradients = tape.gradient(loss, custom_loop_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, custom_loop_model.trainable_variables))
    train_acc_metric.update_state(y_batch, predictions)
    return loss

@tf.function
def val_step(x_batch, y_batch):
    predictions = custom_loop_model(x_batch, training=False)
    val_acc_metric.update_state(y_batch, predictions)

# Training loop
epochs = 5
for epoch in range(epochs):
    # Training
    for step, (x_batch, y_batch) in enumerate(train_dataset):
        loss = train_step(x_batch, y_batch)

    train_acc = train_acc_metric.result()

    # Validation
    for x_batch, y_batch in val_dataset:
        val_step(x_batch, y_batch)

    val_acc = val_acc_metric.result()

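    # Note: `loss` is the loss of the LAST batch in the epoch, not an epoch average
    print(f"Epoch {epoch + 1}/{epochs} -- "
          f"last-batch loss: {float(loss):.4f}, "
          f"train_acc: {float(train_acc):.4f}, "
          f"val_acc: {float(val_acc):.4f}")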

    # Reset metrics
    train_acc_metric.reset_state()
    val_acc_metric.reset_state()
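
As one example of the "gradient manipulation" a custom loop makes possible, the sketch below clips gradients by their global norm before applying them. The tiny model, the random data, and the `clip_norm` value are assumptions made for illustration. The relevant calls are `tf.clip_by_global_norm` and `tf.linalg.global_norm`.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal([16, 3])
y = 100.0 * tf.random.normal([16, 1])  # large targets tend to give large gradients

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# Rescale all gradients together so their combined L2 norm is at most clip_norm.
clip_norm = 1.0
clipped, norm_before = tf.clip_by_global_norm(grads, clip_norm)
optimizer.apply_gradients(zip(clipped, model.trainable_variables))

norm_after = tf.linalg.global_norm(clipped)
print(f"global norm before: {float(norm_before):.4f}, after: {float(norm_after):.4f}")
```

The same three steps (compute, transform, apply) would slot into `train_step` above between `tape.gradient` and `optimizer.apply_gradients`.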

### 6.3 Understanding `@tf.function` and Tracing

`@tf.function` converts a Python function into a TensorFlow graph. This can dramatically speed up execution because:

1. The graph is optimized (constant folding, operator fusion)
2. Operations run in C++ without Python overhead
3. Graphs can be exported for serving

**Tracing**: The first time a `@tf.function`-decorated function is called, TensorFlow "traces" it -- running the Python code once to record the operations. Subsequent calls with the same input signature reuse the compiled graph.

In [None]:
# Demonstrating @tf.function tracing
@tf.function
def my_function(x):
    print("Tracing!")  # Only printed during tracing, NOT on every call
    return x * x + 2 * x + 1

# First call -- traces the function
print("Call 1:")
result1 = my_function(tf.constant(3.0))
print(f"  Result: {result1.numpy()}")

# Second call with same dtype/shape -- reuses the traced graph
print("\nCall 2 (same signature -- no retracing):")
result2 = my_function(tf.constant(4.0))
print(f"  Result: {result2.numpy()}")

# Call with different dtype -- triggers retracing
print("\nCall 3 (different dtype -- retraces):")
result3 = my_function(tf.constant(5))
print(f"  Result: {result3.numpy()}")
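
A related pitfall is passing plain Python scalars to a `@tf.function`: each distinct Python value is treated as a new signature and triggers a new trace, whereas tensors of the same dtype and shape share one. The sketch below counts traces with a Python list that is appended to only while tracing. The function and variable names are illustrative.

```python
import tensorflow as tf

trace_log = []  # Python side effect: runs only during tracing

@tf.function
def square(x):
    trace_log.append(type(x).__name__)
    return x * x

# Python ints: one trace per distinct value
for value in (1, 2, 3):
    square(value)
traces_python = len(trace_log)

# Tensors with identical dtype and shape: traced once, then reused
for value in (1, 2, 3):
    square(tf.constant(value))
traces_tensor = len(trace_log) - traces_python

print(f"Traces from Python ints: {traces_python}")
print(f"Traces from tensors:     {traces_tensor}")
```

Converting loop counters or hyperparameters to tensors before the call is usually enough to avoid this kind of retracing.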

<a name="7"></a>
## 7. Callbacks System

Callbacks allow you to hook into the training process at various points. TF/Keras provides many built-in callbacks, and you can create custom ones.

### 7.1 Built-in Callbacks

In [None]:
import os
import tempfile

# Create a temporary directory for checkpoints and logs
tmpdir = tempfile.mkdtemp()

callbacks = [
    # Stop training when validation loss stops improving
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=3,
        restore_best_weights=True,
        verbose=1
    ),

    # Save the best model
    tf.keras.callbacks.ModelCheckpoint(
        filepath=os.path.join(tmpdir, 'best_model.keras'),
        monitor='val_loss',
        save_best_only=True,
        verbose=1
    ),

    # Reduce learning rate when a metric has stopped improving
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.5,
        patience=2,
        min_lr=1e-6,
        verbose=1
    ),

    # TensorBoard logging
    tf.keras.callbacks.TensorBoard(
        log_dir=os.path.join(tmpdir, 'logs'),
        histogram_freq=1
    )
]

print("Callbacks configured:")
for cb in callbacks:
    print(f"  - {cb.__class__.__name__}")
print(f"\nCheckpoint dir: {tmpdir}")

### 7.2 Custom Callback

In [None]:
class DetailedProgressCallback(tf.keras.callbacks.Callback):
    """Custom callback that tracks and displays detailed training progress."""

    def __init__(self):
        super().__init__()
        self.history = {'loss': [], 'val_loss': [], 'lr': []}

    def on_train_begin(self, logs=None):
        print("=" * 60)
        print("Training started")
        print(f"Optimizer: {self.model.optimizer.__class__.__name__}")
        print(f"Trainable parameters: {self.model.count_params():,}")
        print("=" * 60)

    def on_epoch_end(self, epoch, logs=None):
        lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
        self.history['loss'].append(logs['loss'])
        self.history['val_loss'].append(logs.get('val_loss', 0))
        self.history['lr'].append(lr)
        if epoch % 5 == 0:
            print(f"  Epoch {epoch}: loss={logs['loss']:.4f}, "
                  f"val_loss={logs.get('val_loss', 0):.4f}, lr={lr:.6f}")

    def on_train_end(self, logs=None):
        print("=" * 60)
        print(f"Training complete after {len(self.history['loss'])} epochs")
        print(f"Best loss: {min(self.history['loss']):.4f}")
        if self.history['val_loss'] and any(v > 0 for v in self.history['val_loss']):
            print(f"Best val_loss: {min(v for v in self.history['val_loss'] if v > 0):.4f}")
        print("=" * 60)


# Quick demo of the custom callback
cb_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax')
])
cb_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

progress_cb = DetailedProgressCallback()
cb_model.fit(
    X_synthetic, y_synthetic,
    epochs=10,
    batch_size=32,
    validation_split=0.2,
    callbacks=[progress_cb],
    verbose=0  # Suppress default output; our callback handles it
)

<a name="8"></a>
## 8. tf.data Pipeline

The `tf.data` API makes it easy to build efficient, scalable input pipelines. It handles batching, shuffling, prefetching, and parallel data loading.

### 8.1 Creating Datasets

In [None]:
# ---- From NumPy arrays ----
X_data = np.random.randn(100, 10).astype(np.float32)
y_data = np.random.randint(0, 2, 100)

dataset = tf.data.Dataset.from_tensor_slices((X_data, y_data))
print(f"Dataset element spec: {dataset.element_spec}")
print(f"Dataset cardinality: {dataset.cardinality().numpy()}")

# Inspect first element
for x, y in dataset.take(1):
    print(f"\nFirst element -- x shape: {x.shape}, y: {y.numpy()}")

In [None]:
# ---- From a generator (useful for large datasets that do not fit in memory) ----
def data_generator():
    for i in range(50):
        yield np.random.randn(10).astype(np.float32), np.int32(i % 5)

gen_dataset = tf.data.Dataset.from_generator(
    data_generator,
    output_signature=(
        tf.TensorSpec(shape=(10,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32)
    )
)

for x, y in gen_dataset.take(2):
    print(f"Generator sample -- x shape: {x.shape}, label: {y.numpy()}")

### 8.2 Transformations: map, batch, shuffle, prefetch, cache

In [None]:
# ---- Complete pipeline example ----
def preprocess(x, y):
    """Example preprocessing: normalize features."""
    x = (x - tf.reduce_mean(x)) / (tf.math.reduce_std(x) + 1e-7)
    return x, y

def create_dataset(images, labels, batch_size=32, is_training=True):
    """Create an optimized tf.data pipeline."""
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))

    # Cache the dataset in memory after the first epoch
    dataset = dataset.cache()

    if is_training:
        dataset = dataset.shuffle(buffer_size=10000)

    # Apply preprocessing in parallel
    dataset = dataset.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)

    # Batch the data
    dataset = dataset.batch(batch_size)

    # Prefetch the next batch while the current one is being processed
    dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset


# Create training and validation datasets
train_ds = create_dataset(X_data[:80], y_data[:80], batch_size=16, is_training=True)
val_ds = create_dataset(X_data[80:], y_data[80:], batch_size=16, is_training=False)

print("Pipeline element spec:")
print(f"  {train_ds.element_spec}")

for x_batch, y_batch in train_ds.take(1):
    print(f"\nBatch shapes -- x: {x_batch.shape}, y: {y_batch.shape}")
    print(f"Mean after normalization: {tf.reduce_mean(x_batch).numpy():.4f} (should be near 0)")

In [None]:
# ---- Performance Tips for tf.data ----
print("tf.data Performance Best Practices:")
print("="* 50)
print("1. .cache()       -- Cache data in memory after first read")
print("2. .shuffle(N)    -- Shuffle with buffer size N (use large N for good randomness)")
print("3. .map(fn, num_parallel_calls=AUTOTUNE)")
print("                  -- Parallelize map operations")
print("4. .batch(N)      -- Batch after shuffle and map")
print("5. .prefetch(AUTOTUNE)")
print("                  -- Overlap data loading with training")
print()
print("Recommended order: cache -> shuffle -> map -> batch -> prefetch")
print(f"\nAUTOTUNE value: {tf.data.AUTOTUNE}")
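
Why shuffle before batching? The order changes what gets shuffled. In this small sketch on a six-element toy dataset (not the notebook's data), shuffling first mixes individual elements, while batching first only reorders whole batches whose contents stay fixed.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(6)

# shuffle -> batch: individual elements are mixed across batches
shuffle_then_batch = [b.numpy().tolist() for b in ds.shuffle(6, seed=0).batch(2)]

# batch -> shuffle: the pairs [0, 1], [2, 3], [4, 5] stay together
batch_then_shuffle = [b.numpy().tolist() for b in ds.batch(2).shuffle(3, seed=0)]

print(f"shuffle -> batch: {shuffle_then_batch}")
print(f"batch -> shuffle: {batch_then_shuffle}")
```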

<a name="9"></a>
## 9. Practical Example: MNIST Classifier

A complete end-to-end example: load real data, build a model, train, evaluate, and visualize results.

In [None]:
# ---- Load and preprocess MNIST ----
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

print(f"Training set:   {x_train.shape}, labels: {y_train.shape}")
print(f"Test set:       {x_test.shape}, labels: {y_test.shape}")
print(f"Pixel range:    [{x_train.min()}, {x_train.max()}]")
print(f"Label range:    [{y_train.min()}, {y_train.max()}]")
print(f"Label distribution: {np.bincount(y_train)}")

In [None]:
# ---- Visualize some samples ----
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='gray')
    ax.set_title(f"Label: {y_train[i]}", fontsize=12)
    ax.axis('off')
plt.suptitle('MNIST Sample Images', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# ---- Build model ----
mnist_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

mnist_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

mnist_model.summary()

In [None]:
# ---- Train ----
mnist_history = mnist_model.fit(
    x_train, y_train,
    epochs=10,
    batch_size=128,
    validation_split=0.2,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True, verbose=1)
    ],
    verbose=1
)

In [None]:
# ---- Plot Training History ----
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Loss
ax1.plot(mnist_history.history['loss'], label='Train Loss', linewidth=2)
ax1.plot(mnist_history.history['val_loss'], label='Val Loss', linewidth=2)
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training & Validation Loss')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Accuracy
ax2.plot(mnist_history.history['accuracy'], label='Train Accuracy', linewidth=2)
ax2.plot(mnist_history.history['val_accuracy'], label='Val Accuracy', linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Training & Validation Accuracy')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.suptitle('MNIST Training History', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# ---- Evaluate on test set ----
test_loss, test_acc = mnist_model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss:     {test_loss:.4f}")
print(f"Test accuracy: {test_acc:.4f}")

In [None]:
# ---- Confusion Matrix ----
y_pred_probs = mnist_model.predict(x_test, verbose=0)
y_pred = np.argmax(y_pred_probs, axis=1)

# Compute confusion matrix
confusion_mtx = tf.math.confusion_matrix(y_test, y_pred, num_classes=10).numpy()

plt.figure(figsize=(10, 8))
plt.imshow(confusion_mtx, interpolation='nearest', cmap='Blues')
plt.title('Confusion Matrix', fontsize=14, fontweight='bold')
plt.colorbar()
tick_marks = np.arange(10)
plt.xticks(tick_marks, tick_marks)
plt.yticks(tick_marks, tick_marks)

# Add text annotations
thresh = confusion_mtx.max() / 2.0
for i in range(10):
    for j in range(10):
        plt.text(j, i, format(confusion_mtx[i, j], 'd'),
                 ha='center', va='center',
                 color='white' if confusion_mtx[i, j] > thresh else 'black')

plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()

# Per-class accuracy
print("\nPer-class accuracy:")
for cls in range(10):
    cls_mask = y_test == cls
    cls_acc = np.mean(y_pred[cls_mask] == y_test[cls_mask])
    print(f"  Digit {cls}: {cls_acc:.4f} ({np.sum(cls_mask)} samples)")
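
The per-class figures printed above are, strictly speaking, per-class recall: correct predictions for a digit divided by the number of test samples of that digit. The same numbers can be read off the confusion matrix as the diagonal divided by the row sums. The sketch below checks this on a small made-up 3-class example (the arrays `y_true_demo` and `y_pred_demo` are hypothetical, not MNIST data):

```python
import numpy as np

# Hypothetical labels for 8 samples and 3 classes (illustration only).
y_true_demo = np.array([0, 0, 1, 1, 1, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 0, 2, 2, 1])
num_classes = 3

# Build the confusion matrix: rows are true labels, columns are predictions.
cm_demo = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(cm_demo, (y_true_demo, y_pred_demo), 1)

# Diagonal = correct predictions per class, row sum = samples per class.
per_class_from_cm = cm_demo.diagonal() / cm_demo.sum(axis=1)

# Same quantity computed with the boolean-mask approach used in the cell above.
per_class_from_mask = np.array([
    np.mean(y_pred_demo[y_true_demo == c] == c) for c in range(num_classes)
])

print(per_class_from_cm)
print(np.allclose(per_class_from_cm, per_class_from_mask))
```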

In [None]:
# ---- Visualize Predictions ----
fig, axes = plt.subplots(2, 5, figsize=(14, 6))

# Pick some random test samples
np.random.seed(42)
indices = np.random.choice(len(x_test), 10, replace=False)

for i, (ax, idx) in enumerate(zip(axes.flat, indices)):
    ax.imshow(x_test[idx], cmap='gray')
    pred_label = y_pred[idx]
    true_label = y_test[idx]
    confidence = y_pred_probs[idx][pred_label] * 100

    color = 'green' if pred_label == true_label else 'red'
    ax.set_title(f"Pred: {pred_label} ({confidence:.1f}%)\nTrue: {true_label}",
                 fontsize=10, color=color)
    ax.axis('off')

plt.suptitle('Sample Predictions (green=correct, red=incorrect)',
             fontsize=13, fontweight='bold')
plt.tight_layout()
plt.show()

<a name="10"></a>
## 10. Practical Example: Linear Regression (Updated from TF1)

The original TF1.x notebook used `tf.placeholder`, `tf.Session`, and manual optimizer calls. Here is the modern TF2 equivalent.

### Original TF1.x Code (for reference)
```python
# TF 1.x -- DO NOT RUN
W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
linear_model = W * x + b
y = tf.placeholder(tf.float32)
loss = tf.reduce_sum(tf.square(linear_model - y))
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(1000):
    sess.run(train, {x: [1,2,3,4], y: [0,-1,-2,-3]})
```

### Modern TF2 Version

In [None]:
# ---- Modern Linear Regression with Keras ----
X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([0, -1, -2, -3], dtype=np.float32)

# Keras layers expect inputs of shape (num_samples, num_features),
# so pass the inputs as a column with one feature per sample.
X_col = X.reshape(-1, 1)

# A single Dense layer with 1 unit IS a linear regression: y = Wx + b
lr_model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(1,))
])

# 'mse' averages over the 4 samples, whereas the TF1.x code summed the squared
# errors. A learning rate of 0.04 here therefore produces the same update size
# as 0.01 did with the summed loss.
lr_model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.04),
    loss='mse'
)

lr_history = lr_model.fit(X_col, Y, epochs=1000, verbose=0)

W, b = lr_model.layers[0].get_weights()
print("Learned parameters:")
print(f"  W: {W.flatten()[0]:.4f}  (expected: -1.0)")
print(f"  b: {b[0]:.4f}  (expected: 1.0)")
print(f"  Final loss: {lr_history.history['loss'][-1]:.6f}")

# Predictions
predictions = lr_model.predict(X_col, verbose=0)
print("\nPredictions vs Ground Truth:")
for xi, yi, pi in zip(X, Y, predictions.flatten()):
    print(f"  x={xi:.0f}: predicted={pi:.4f}, actual={yi:.0f}")

In [None]:
# ---- Visualize the fit ----
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot the regression line
x_plot = np.linspace(0, 5, 100)
y_plot = W.flatten()[0] * x_plot + b[0]

ax1.scatter(X, Y, c='red', s=100, zorder=5, label='Data points')
ax1.plot(x_plot, y_plot, 'b-', linewidth=2, label=f'Fit: y = {W.flatten()[0]:.3f}x + {b[0]:.3f}')
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax1.set_title('Linear Regression Fit')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot the loss curve
ax2.plot(lr_history.history['loss'], linewidth=2)
ax2.set_xlabel('Epoch')
ax2.set_ylabel('MSE Loss')
ax2.set_title('Training Loss Over Epochs')
ax2.set_yscale('log')
ax2.grid(True, alpha=0.3)

plt.suptitle('Linear Regression (Modern TF2)', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

In [None]:
# ---- Linear Regression with GradientTape (manual approach) ----
# This shows the low-level approach equivalent to the TF1.x version

W_manual = tf.Variable(0.3, dtype=tf.float32, name='W')
b_manual = tf.Variable(-0.3, dtype=tf.float32, name='b')

X_t = tf.constant([1.0, 2.0, 3.0, 4.0])
Y_t = tf.constant([0.0, -1.0, -2.0, -3.0])

optimizer_manual = tf.keras.optimizers.SGD(learning_rate=0.01)

losses_manual = []
for step in range(1000):
    with tf.GradientTape() as tape:
        predictions = W_manual * X_t + b_manual
        loss = tf.reduce_sum(tf.square(predictions - Y_t))

    gradients = tape.gradient(loss, [W_manual, b_manual])
    optimizer_manual.apply_gradients(zip(gradients, [W_manual, b_manual]))
    losses_manual.append(loss.numpy())

print("Manual GradientTape Linear Regression:")
print(f"  W: {W_manual.numpy():.4f}  (expected: -1.0)")
print(f"  b: {b_manual.numpy():.4f}  (expected: 1.0)")
print(f"  Final loss: {losses_manual[-1]:.6f}")
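
The expected values quoted in the printouts (W = -1, b = 1) can be checked without any training, because the four data points lie exactly on the line y = -x + 1. A minimal sketch using NumPy's ordinary least-squares solver (the names `x_ref`, `y_ref`, `w_exact`, and `b_exact` are introduced here for illustration):

```python
import numpy as np

x_ref = np.array([1.0, 2.0, 3.0, 4.0])
y_ref = np.array([0.0, -1.0, -2.0, -3.0])

# Design matrix with a column of ones so the intercept is fitted as well.
design = np.stack([x_ref, np.ones_like(x_ref)], axis=1)

# Solve min ||design @ [w, b] - y_ref||^2 in closed form.
(w_exact, b_exact), *_ = np.linalg.lstsq(design, y_ref, rcond=None)

print(f"Least-squares solution: W = {w_exact:.4f}, b = {b_exact:.4f}")
```

Gradient descent should approach these values; how close it gets after 1000 steps depends on the learning rate and on whether the loss is summed or averaged.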

<a name="11"></a>
## 11. Saving & Loading Models

TensorFlow provides multiple ways to save models depending on your use case.

### 11.1 SavedModel Format (Recommended for Production)

In [None]:
import tempfile

save_dir = tempfile.mkdtemp()

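# ---- SavedModel format (TF's native format) ----
# Includes: architecture, weights, optimizer state, computation graph
# Note: saving to a path without a file extension writes a SavedModel under
# Keras 2 (TF <= 2.15). Keras 3 (the default from TF 2.16) rejects such paths
# in model.save(); there, use mnist_model.export(savedmodel_path) instead and
# reload for inference with tf.saved_model.load(savedmodel_path).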
savedmodel_path = os.path.join(save_dir, 'mnist_savedmodel')
mnist_model.save(savedmodel_path)
print(f"SavedModel saved to: {savedmodel_path}")

# Load it back
loaded_savedmodel = tf.keras.models.load_model(savedmodel_path)
loaded_loss, loaded_acc = loaded_savedmodel.evaluate(x_test, y_test, verbose=0)
print(f"Loaded SavedModel -- Test accuracy: {loaded_acc:.4f}")

### 11.2 Keras Format (.keras)

In [None]:
# ---- Keras native format (.keras) ----
keras_path = os.path.join(save_dir, 'mnist_model.keras')
mnist_model.save(keras_path)
print(f"Keras model saved to: {keras_path}")

loaded_keras = tf.keras.models.load_model(keras_path)
keras_loss, keras_acc = loaded_keras.evaluate(x_test, y_test, verbose=0)
print(f"Loaded Keras model -- Test accuracy: {keras_acc:.4f}")

### 11.3 Saving Weights Only

In [None]:
# ---- Weights only ----
# Useful when you want to save just the learned parameters
weights_path = os.path.join(save_dir, 'mnist_weights.weights.h5')
mnist_model.save_weights(weights_path)
print(f"Weights saved to: {weights_path}")

# To load weights, you need to recreate the model architecture first
new_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
new_model.load_weights(weights_path)
new_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

wt_loss, wt_acc = new_model.evaluate(x_test, y_test, verbose=0)
print(f"Weights-only reload -- Test accuracy: {wt_acc:.4f}")

### 11.4 TF Lite Conversion Basics

In [None]:
# ---- TF Lite conversion ----
# Convert a saved model to TF Lite for mobile/edge deployment
converter = tf.lite.TFLiteConverter.from_saved_model(savedmodel_path)
tflite_model = converter.convert()

tflite_path = os.path.join(save_dir, 'mnist_model.tflite')
with open(tflite_path, 'wb') as f:
    f.write(tflite_model)

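# Note: the .keras file also stores the model config and, for a compiled
# model, the optimizer state, so part of the size difference comes from
# dropping those rather than from compressing the weights.
original_size = os.path.getsize(keras_path)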
tflite_size = os.path.getsize(tflite_path)

print(f"TF Lite model saved to: {tflite_path}")
print(f"Keras model size:  {original_size / 1024:.1f} KB")
print(f"TF Lite model size: {tflite_size / 1024:.1f} KB")
print(f"Size reduction: {(1 - tflite_size / original_size) * 100:.1f}%")

In [None]:
# ---- Summary of save formats ----
print("Model Saving Formats Summary")
print("=" * 65)
print(f"{'Format':<20} {'Includes':<30} {'Use Case'}")
print("-" * 65)
print(f"{'SavedModel':<20} {'Arch + Weights + Graph':<30} {'Production serving'}")
print(f"{'.keras':<20} {'Arch + Weights + Optimizer':<30} {'General purpose'}")
print(f"{'.weights.h5':<20} {'Weights only':<30} {'Transfer learning'}")
print(f"{'TF Lite (.tflite)':<20} {'Optimized weights + graph':<30} {'Mobile / Edge'}")

<a name="12"></a>
## 12. Performance Tips

### 12.1 `@tf.function` Best Practices

In [None]:
import time

# ---- Comparing eager vs @tf.function performance ----
def eager_computation(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.random.normal([100, 100])))
    return x

@tf.function
def graph_computation(x):
    for _ in range(100):
        x = tf.nn.relu(tf.matmul(x, tf.random.normal([100, 100])))
    return x

x_perf = tf.random.normal([10, 100])

# Warm up the graph function (first call traces)
_ = graph_computation(x_perf)

# Time eager execution (perf_counter is a monotonic, high-resolution timer)
start = time.perf_counter()
for _ in range(10):
    _ = eager_computation(x_perf)
eager_time = time.perf_counter() - start

# Time graph execution
start = time.perf_counter()
for _ in range(10):
    _ = graph_computation(x_perf)
graph_time = time.perf_counter() - start

print(f"Eager execution:  {eager_time:.4f}s")
print(f"@tf.function:     {graph_time:.4f}s")
print(f"Speedup:          {eager_time / graph_time:.2f}x")

In [None]:
# ---- @tf.function Best Practices ----
print("@tf.function Best Practices")
print("=" * 55)
print()
print("DO:")
print("  - Use for training steps, inference, and any")
print("    computation-heavy function")
print("  - Use tf.TensorSpec or input_signature to control")
print("    retracing")
print("  - Use tf.print() instead of print() inside")
print("    @tf.function")
print("  - Write tensor-dependent if/for/while as usual;")
print("    AutoGraph converts them to tf.cond/tf.while_loop")
print()
print("DO NOT:")
print("  - Rely on Python side effects (list.append, dict")
print("    update); they run only while tracing")
print("  - Create a new tf.Variable on every call; create")
print("    variables once, outside the function")
print("  - Pass Python objects that change between calls")
print("  - Use Python print() for runtime debugging (only")
print("    runs during tracing)")
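
The points about retracing and Python side effects can be seen in a small experiment. This is a minimal sketch, assuming a function named `scaled_sum` and a list `trace_log` that are introduced here purely for illustration. The `input_signature` fixes the dtype and rank while leaving the length open, so calls with different lengths reuse one traced graph, and the `list.append` side effect runs only while tracing:

```python
import tensorflow as tf

trace_log = []  # filled by a Python side effect, so it grows only during tracing

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.float32)])
def scaled_sum(x):
    trace_log.append("traced")   # Python code: executed at trace time only
    return 2.0 * tf.reduce_sum(x)

out_short = scaled_sum(tf.constant([1.0, 2.0]))
out_long = scaled_sum(tf.constant([1.0, 2.0, 3.0]))  # different length, same signature

print(f"Number of traces: {len(trace_log)}")
print(f"Results: {out_short.numpy()}, {out_long.numpy()}")
```

Without the `input_signature`, each new input shape would typically trigger another trace, and `trace_log` would grow accordingly.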

### 12.2 Mixed Precision Training

In [None]:
# ---- Mixed Precision Training ----
# Uses float16 for most computation while keeping variables in float32.
# Can give a substantial speedup (often quoted as up to ~3x) on GPUs with
# Tensor Cores (e.g., V100, A100, T4); actual gains depend on the model.

from tensorflow.keras import mixed_precision

# Check current policy
print(f"Current dtype policy: {mixed_precision.global_policy()}")

# To enable mixed precision (uncomment on GPU):
# mixed_precision.set_global_policy('mixed_float16')
# print(f"New dtype policy: {mixed_precision.global_policy()}")

# When building a model under the mixed_float16 policy:
# - the compute dtype is float16 but the variable dtype remains float32
# - the model output MUST be float32 (or cast to it) so that the loss
#   is computed with enough numerical range

print("\nMixed Precision Notes:")
print("  - Speeds up training on GPUs with Tensor Cores (V100, A100, T4)")
print("  - Forward/backward pass uses float16 for speed")
print("  - Weight updates use float32 for numerical stability")
print("  - The final Dense layer should output float32")
print("  - Use tf.keras.mixed_precision.set_global_policy('mixed_float16')")
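
One reason float32 is kept for the sensitive parts is float16's narrow range: very small gradient values round to zero. The usual remedy is loss scaling, which Keras applies automatically in `model.fit()` under the `mixed_float16` policy. The NumPy sketch below only illustrates the idea; the gradient value `tiny_grad` and the factor `loss_scale` are hypothetical numbers chosen for the demonstration:

```python
import numpy as np

tiny_grad = np.float32(1e-8)         # hypothetical small gradient value
loss_scale = np.float32(2.0 ** 15)   # hypothetical scale factor

# Cast directly to float16: the value is below the smallest float16
# subnormal (about 6e-8), so it underflows to zero.
underflowed = np.float16(tiny_grad)

# Scale first, cast to float16, then undo the scaling in float32.
scaled_fp16 = np.float16(tiny_grad * loss_scale)
recovered = np.float32(scaled_fp16) / loss_scale

print(f"Direct float16 cast: {underflowed}")
print(f"Scaled, cast, unscaled: {recovered:.3e}")
```

In a custom training loop the scaling has to be handled explicitly, for example by wrapping the optimizer in `tf.keras.mixed_precision.LossScaleOptimizer`.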

### 12.3 XLA Compilation

In [None]:
# ---- XLA (Accelerated Linear Algebra) ----
# XLA compiles TF operations into optimized machine code.
# It can fuse operations, eliminate dead code, and optimize memory.

# Method 1: jit_compile on tf.function
@tf.function(jit_compile=True)
def xla_computation(x):
    return tf.nn.relu(tf.matmul(x, tf.ones([100, 100])) + 1.0)

result = xla_computation(tf.random.normal([5, 100]))
print(f"XLA computation result shape: {result.shape}")

# Method 2: Set jit_compile in model.compile
# model.compile(optimizer='adam', loss='mse', jit_compile=True)

print("\nXLA Compilation Notes:")
print("  - Fuses multiple operations into single kernels")
print("  - Reduces memory overhead")
print("  - Works best with fixed-shape tensors")
print("  - Can cause slowdowns if shapes change frequently")
print("  - Enable with @tf.function(jit_compile=True) or")
print("    model.compile(..., jit_compile=True)")

### 12.4 Memory Management
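
Memory growth has to be configured before TensorFlow initializes the GPU, so in a long-running notebook the call can fail. A minimal sketch of a guarded version follows; the list `growth_enabled` is a name introduced here for illustration. On a machine without a GPU the loop simply does nothing:

```python
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
growth_enabled = []

for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
        growth_enabled.append(gpu.name)
    except RuntimeError as err:
        # Raised when the GPU has already been initialized in this process.
        print(f"Could not set memory growth for {gpu.name}: {err}")

print(f"GPUs found: {len(gpus)}; memory growth enabled on: {growth_enabled}")
```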

In [None]:
# ---- GPU Memory Management ----
print("GPU Memory Management Strategies")
print("=" * 50)

print("""
1. Memory Growth (recommended):
   Allow GPU memory to grow as needed instead of
   pre-allocating all memory.

   gpus = tf.config.list_physical_devices('GPU')
   for gpu in gpus:
       tf.config.experimental.set_memory_growth(gpu, True)

2. Memory Limit:
   Restrict TensorFlow to a fixed amount of GPU memory.

   tf.config.set_logical_device_configuration(
       gpus[0],
       [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]  # 4 GB
   )

3. Gradient Checkpointing:
   Trade compute for memory by recomputing activations
   during the backward pass instead of storing them.

   # Wrap a function so its activations are recomputed
   # during backprop instead of being stored:
   my_function = tf.recompute_grad(my_function)

4. Reduce Batch Size:
   The simplest way to reduce memory usage.

5. Use tf.data Pipelines:
   Load data in batches instead of all at once.

IMPORTANT: Memory growth must be set BEFORE any GPU
operations are performed (typically at the top of
your script).
""")

<a name="13"></a>
## 13. Exercises

Practice what you have learned by completing the following exercises.

### Exercise 1: Implement a Custom Layer

Create a custom Dense layer from scratch by subclassing `tf.keras.layers.Layer`.

In [None]:
class MyDenseLayer(tf.keras.layers.Layer):
    """A custom dense (fully connected) layer built from scratch.
    
    Exercise: Complete the implementation.
    The layer should:
    1. Create weight matrix W and bias vector b in build()
    2. Compute output = activation(input @ W + b) in call()
    """

    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # TODO: Create self.w with shape (input_shape[-1], self.units)
        #       using self.add_weight(). Use 'glorot_uniform' initializer.
        # TODO: Create self.b with shape (self.units,)
        #       using self.add_weight(). Use 'zeros' initializer.
        pass

    def call(self, inputs):
        # TODO: Compute z = inputs @ self.w + self.b
        # TODO: Apply self.activation (it is the identity when activation=None)
        # TODO: Return the result
        pass

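    def get_config(self):
        config = super().get_config()
        # Serialize the activation so the config contains no function objects.
        config.update({
            'units': self.units,
            'activation': tf.keras.activations.serialize(self.activation),
        })
        return config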


# Test your implementation:
# layer = MyDenseLayer(64, activation='relu')
# output = layer(tf.random.normal([2, 128]))
# print(f"Output shape: {output.shape}")  # Should be (2, 64)

### Exercise 2: Build a Multi-Input Model

Build a model using the Functional API that takes two inputs:
- Numerical features: shape (10,)
- Categorical embedding: shape (5,)

The model should merge both branches and output a single value (regression).

In [None]:
# TODO: Build a multi-input model
#
# numerical_input = tf.keras.Input(shape=(10,), name='numerical')
# categorical_input = tf.keras.Input(shape=(5,), name='categorical')
#
# Numerical branch: Dense(32, relu) -> Dense(16, relu)
# Categorical branch: Dense(16, relu)
# Merge: concatenate both branches
# Output head: Dense(16, relu) -> Dense(1)  (no activation for regression)
#
# multi_model = tf.keras.Model(
#     inputs=[numerical_input, categorical_input],
#     outputs=output
# )
# multi_model.summary()

print("Exercise: Implement the multi-input model described above.")

### Exercise 3: Learning Rate Finder

Implement a learning rate finder that trains for one epoch, gradually increasing the learning rate, and plots loss vs. learning rate to find the optimal range.

In [None]:
class LearningRateFinder(tf.keras.callbacks.Callback):
    """Finds the optimal learning rate range.
    
    Exercise: Complete the implementation.
    
    Strategy:
    1. Start with a very small learning rate (e.g., 1e-7)
    2. Increase it exponentially after each batch
    3. Record the loss at each step
    4. Stop when the loss diverges (e.g., > 4x the minimum loss)
    5. Plot loss vs. learning rate
    6. A good LR is typically somewhat below the LR with the minimum loss,
       in the region where the loss is still decreasing steeply
    """

    def __init__(self, min_lr=1e-7, max_lr=1.0):
        super().__init__()
        self.min_lr = min_lr
        self.max_lr = max_lr
        self.lrs = []
        self.losses = []

    def on_train_begin(self, logs=None):
        # TODO: Calculate the multiplication factor for the LR schedule
        # TODO: Set the initial learning rate
        pass

    def on_train_batch_end(self, batch, logs=None):
        # TODO: Record the current LR and loss
        # TODO: Increase the LR exponentially
        # TODO: Stop training if loss diverges
        pass

    def plot(self):
        # TODO: Plot loss vs learning rate (use log scale for LR)
        pass


# Usage:
# lr_finder = LearningRateFinder(min_lr=1e-7, max_lr=1.0)
# model.fit(x_train, y_train, epochs=1, callbacks=[lr_finder])
# lr_finder.plot()

print("Exercise: Implement the LearningRateFinder callback.")
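
As a hint for step 2 of the strategy, an exponential schedule multiplies the learning rate by a constant factor after every batch. The sketch below computes that factor in plain Python, assuming a hypothetical `num_batches` of 100 steps in the epoch; wiring it into the callback is left as part of the exercise:

```python
min_lr, max_lr = 1e-7, 1.0
num_batches = 100  # hypothetical number of batches in the single epoch

# Constant multiplier chosen so that min_lr * factor**(num_batches - 1) == max_lr.
factor = (max_lr / min_lr) ** (1.0 / (num_batches - 1))

# Learning rate used at each batch index.
lr_schedule = [min_lr * factor ** step for step in range(num_batches)]

print(f"Multiplier per batch: {factor:.4f}")
print(f"First LR: {lr_schedule[0]:.1e}, last LR: {lr_schedule[-1]:.1e}")
```

Because the rates are evenly spaced on a log scale, plotting the loss against a logarithmic LR axis gives each order of magnitude equal width.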

### Exercise 4: Custom Training Loop with Gradient Accumulation

Implement a training loop that accumulates gradients over multiple mini-batches before applying an update. This is useful when you want a large effective batch size but do not have enough GPU memory.

In [None]:
def train_with_gradient_accumulation(
    model, dataset, optimizer, loss_fn,
    accumulation_steps=4, epochs=5
):
    """Train a model with gradient accumulation.
    
    Exercise: Complete the implementation.
    
    The effective batch size = mini_batch_size * accumulation_steps.
    
    Steps:
    1. Initialize gradient accumulators (zeros_like for each trainable var)
    2. For each mini-batch:
       a. Compute gradients
       b. Add them to the accumulators
    3. Every `accumulation_steps` batches:
       a. Divide accumulated gradients by accumulation_steps
       b. Apply gradients to the model
       c. Reset accumulators to zero
    """
    # TODO: Implement gradient accumulation training loop
    pass


# Usage:
# train_with_gradient_accumulation(
#     model=my_model,
#     dataset=train_dataset,
#     optimizer=tf.keras.optimizers.Adam(1e-3),
#     loss_fn=tf.keras.losses.SparseCategoricalCrossentropy(),
#     accumulation_steps=4,
#     epochs=5
# )

print("Exercise: Implement the gradient accumulation training loop.")
print("\nHint: The key insight is that gradients are additive.")
print("For a mean-reduced loss, averaging the gradients from 4 batches of")
print("size 32 gives the same gradient as a single batch of size 128")
print("(assuming no batch-dependent layers such as BatchNormalization).")
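
The equivalence stated in the hint can be verified numerically outside TensorFlow. The sketch below uses a hypothetical linear model with a mean-squared-error loss and random data (all names are introduced here for illustration). It compares the gradient on one batch of 128 samples with the average of the gradients on four batches of 32:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(128, 3))
targets = rng.normal(size=128)
weights = rng.normal(size=3)  # hypothetical linear model: pred = features @ weights

def mse_grad(xb, yb, w):
    """Gradient of mean((xb @ w - yb)**2) with respect to w."""
    residual = xb @ w - yb
    return 2.0 * xb.T @ residual / len(yb)

# Gradient computed on the full batch of 128 samples.
full_grad = mse_grad(features, targets, weights)

# Accumulate gradients over 4 mini-batches of 32, then average.
accumulated = np.zeros_like(weights)
for start in range(0, 128, 32):
    stop = start + 32
    accumulated += mse_grad(features[start:stop], targets[start:stop], weights)
averaged_grad = accumulated / 4

print(np.allclose(full_grad, averaged_grad))
```

The match relies on equally sized mini-batches and a loss that averages over samples; with a summed loss, the accumulated gradients would be added without dividing.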

---

## Summary

This notebook covered the fundamentals of modern TensorFlow 2.x:

| Section | Key Takeaway |
|---------|-------------|
| **Tensors** | Multi-dimensional arrays -- the fundamental data structure |
| **Eager Execution** | Operations run immediately; no sessions needed |
| **GradientTape** | Automatic differentiation for computing gradients |
| **Model Building** | Sequential (simple), Functional (flexible), Subclassing (full control) |
| **Training** | `model.fit()` for simplicity, custom loops for control |
| **Callbacks** | Hook into training: EarlyStopping, Checkpoints, custom callbacks |
| **tf.data** | Efficient input pipelines with shuffle, batch, prefetch |
| **Saving** | SavedModel (production), .keras (general), weights-only (transfer) |
| **Performance** | `@tf.function`, mixed precision, XLA, memory management |

### Where to Go Next

- **CNNs**: `tf.keras.layers.Conv2D` for image tasks
- **RNNs/Transformers**: `tf.keras.layers.LSTM`, `tf.keras.layers.MultiHeadAttention`
- **Transfer Learning**: Load pre-trained models from TF Hub or `tf.keras.applications`
- **Distributed Training**: `tf.distribute.MirroredStrategy` for multi-GPU
- **TF Extended (TFX)**: End-to-end production ML pipelines
- **TensorFlow Probability**: Probabilistic reasoning and statistical analysis

### Resources

- [TensorFlow Official Tutorials](https://www.tensorflow.org/tutorials)
- [Keras Documentation](https://keras.io)
- [TF Guide: Effective TF2](https://www.tensorflow.org/guide/effective_tf2)
- [TF Guide: tf.function](https://www.tensorflow.org/guide/function)
- [TF Guide: tf.data](https://www.tensorflow.org/guide/data)