# <font color="#418FDE" size="6.5" uppercase>**Autograd with TF**</font>

>Last update: 20260120.
    
By the end of this lecture, you will be able to:
- Use tf.GradientTape to compute gradients of scalar losses with respect to TensorFlow variables. 
- Interpret gradient values to understand how parameter changes affect a loss function. 
- Implement a simple manual training step using gradients and an optimizer. 


## **1. GradientTape essentials**

### **1.1. Recording With GradientTape**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_01_01.jpg?v=1768966866" width="250">



>* GradientTape records tensor operations during forward pass
>* Then replays history backward to compute gradients automatically

>* Tape records each step from parameter to error
>* Then computes gradients automatically using recorded history

>* Tape records only active, differentiable operations
>* Keep training math inside tape, extras outside



In [None]:
#@title Python Code - Recording With GradientTape

import tensorflow as tf

# This example shows basic GradientTape recording with simple scalar tensors.
# We compute a scalar loss from a variable and constant tensor.
# Then we ask GradientTape to compute the gradient of loss.

# !pip install tensorflow==2.20.0

# Set a deterministic random seed for reproducible variable initialization.
tf.random.set_seed(42)

# Print TensorFlow version information for environment transparency.
print("TensorFlow version:", tf.__version__)

# Create a trainable variable representing a simple model parameter.
weight_variable = tf.Variable(initial_value=2.0, dtype=tf.float32)

# Create a constant tensor representing study hours for our toy example.
study_hours_tensor = tf.constant(3.0, dtype=tf.float32)

# Create a constant tensor representing the target exam score value.
true_score_tensor = tf.constant(80.0, dtype=tf.float32)

# Use GradientTape context to record operations involving the variable.
with tf.GradientTape() as gradient_tape:

    # Compute a predicted score using a simple linear relationship.
    predicted_score_tensor = weight_variable * study_hours_tensor

    # Compute squared error loss between prediction and true score.
    loss_tensor = tf.square(predicted_score_tensor - true_score_tensor)

# Request gradient of loss with respect to the weight variable.
gradient_value_tensor = gradient_tape.gradient(loss_tensor, weight_variable)

# Validate that the gradient tensor has the expected scalar shape.
assert gradient_value_tensor.shape == (), "Gradient shape must be scalar zero dimensional."

# Print the recorded variable value, loss value, and gradient value.
print("Weight value:", float(weight_variable.numpy()))
print("Predicted score:", float(predicted_score_tensor.numpy()))
print("Loss value:", float(loss_tensor.numpy()))
print("Gradient value:", float(gradient_value_tensor.numpy()))

# Show how a small manual update step would change the weight variable.
learning_rate_value = 0.001

# Compute a simple gradient descent update for the weight variable.
new_weight_value = weight_variable.numpy() - learning_rate_value * gradient_value_tensor.numpy()

# Print the updated weight value to illustrate gradient direction effect.
print("Updated weight:", float(new_weight_value))
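
As a sanity check on the value the tape reports, the same derivative can be worked out by hand and confirmed numerically. The sketch below is framework-free plain Python and reuses the toy values from the cell above (weight 2.0, study hours 3.0, target score 80.0). The helper names `loss`, `analytic_gradient`, and `central_difference` are illustrative and are not TensorFlow APIs.

```python
# Framework-free check of the gradient for loss = (w * hours - target) ** 2.
# Helper names here are illustrative; they are not TensorFlow APIs.

def loss(weight, hours=3.0, target=80.0):
    """Squared error of a one-parameter linear prediction."""
    return (weight * hours - target) ** 2


def analytic_gradient(weight, hours=3.0, target=80.0):
    """Chain rule: dL/dw = 2 * hours * (weight * hours - target)."""
    return 2.0 * hours * (weight * hours - target)


def central_difference(func, point, step=1e-4):
    """Numerical derivative estimate using a symmetric finite difference."""
    return (func(point + step) - func(point - step)) / (2.0 * step)


weight = 2.0
exact = analytic_gradient(weight)
approx = central_difference(loss, weight)
print("Analytic gradient:", exact)
print("Finite-difference gradient:", round(approx, 6))
```

The analytic value is 2 * 3 * (6 - 80) = -444, which is what the tape should report up to float32 rounding. The negative sign says that increasing the weight lowers the loss, so a gradient descent step moves the weight upward.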




### **1.2. Persistent and nonpersistent tapes**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_01_02.jpg?v=1768966896" width="250">



>* A nonpersistent tape releases its records after one gradient() call
>* Great for single-pass gradients in training steps

>* Persistent tapes let you reuse one forward pass
>* Useful for multiple gradient queries on variable groups

>* Persistent tapes use more memory; release them
>* Prefer nonpersistent tapes; reserve persistent for debugging



In [None]:
#@title Python Code - Persistent and nonpersistent tapes

# This script compares persistent and nonpersistent GradientTape behaviors clearly.
# It shows multiple gradient calls and memory cleanup differences simply.
# It uses a tiny example with one variable and scalar loss function.

# !pip install tensorflow==2.20.0

# Import the required TensorFlow library.
import tensorflow as tf

# Set deterministic random seed for reproducible TensorFlow behavior.
tf.random.set_seed(123)

# Print TensorFlow version information for environment confirmation.
print("TensorFlow version:", tf.__version__)

# Create a simple trainable variable representing a model parameter.
w = tf.Variable(2.0, dtype=tf.float32)

# Define a simple scalar loss function depending on variable w.
def loss_function(parameter):
    return (parameter - 3.0) ** 2

# Demonstrate nonpersistent tape which allows only one gradient call.
with tf.GradientTape() as tape_nonpersistent:
    loss_value_nonpersistent = loss_function(w)

# Compute gradient once using nonpersistent tape successfully.
grad_once_nonpersistent = tape_nonpersistent.gradient(loss_value_nonpersistent, w)

# Print gradient from nonpersistent tape first successful call.
print("Nonpersistent gradient first call:", float(grad_once_nonpersistent))

# Attempt a second gradient call; a nonpersistent tape raises RuntimeError here.
try:
    tape_nonpersistent.gradient(loss_value_nonpersistent, w)
except RuntimeError as error:
    # Print the error type to show the tape's records were already released.
    print("Nonpersistent gradient second call failed:", type(error).__name__)

# Demonstrate persistent tape which allows multiple gradient calls safely.
with tf.GradientTape(persistent=True) as tape_persistent:
    loss_value_persistent = loss_function(w)

# Compute first gradient using persistent tape successfully.
grad_first_persistent = tape_persistent.gradient(loss_value_persistent, w)

# Compute second gradient using same persistent tape successfully.
grad_second_persistent = tape_persistent.gradient(loss_value_persistent, w)

# Print both gradients from persistent tape showing identical values.
print("Persistent gradient first call:", float(grad_first_persistent))
print("Persistent gradient second call:", float(grad_second_persistent))

# Delete persistent tape reference to release recorded computation history.
del tape_persistent



### **1.3. Watching non variable tensors**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_01_03.jpg?v=1768966944" width="250">



>* GradientTape automatically watches trainable variables
>* Call tape.watch() to track constants or other tensors

>* Watch inputs to measure loss sensitivity to features
>* Also watch constants to study small perturbation effects

>* Explore which tensors should become trainable variables
>* Use gradients on watched tensors for flexible analysis



In [None]:
#@title Python Code - Watching non variable tensors

# This script shows watching non variable tensors with tf.GradientTape clearly.
# It compares gradients for a variable and a watched constant input tensor.
# It helps beginners understand manual watching of non variable tensors.

# !pip install tensorflow==2.20.0

# Import the required TensorFlow module.
import tensorflow as tf

# Print TensorFlow version information for reproducibility and clarity.
print("TensorFlow version:", tf.__version__)

# Set deterministic random seed for reproducible tensor values and gradients.
tf.random.set_seed(7)

# Create a trainable variable representing a simple model weight parameter.
weight_variable = tf.Variable(initial_value=2.0, dtype=tf.float32)

# Create a non variable tensor representing an input feature value.
input_tensor = tf.constant(3.0, dtype=tf.float32)

# Define a simple function computing squared error style scalar loss value.
def compute_loss(weight_parameter, input_feature):
    prediction_value = weight_parameter * input_feature
    target_value = tf.constant(10.0, dtype=tf.float32)
    loss_value = tf.square(prediction_value - target_value)
    return loss_value

# Use GradientTape to compute gradient only with respect to variable weight.
with tf.GradientTape() as tape_only_variable:
    loss_only_variable = compute_loss(weight_variable, input_tensor)

# Compute gradient of loss with respect to weight variable only.
grad_weight_only = tape_only_variable.gradient(loss_only_variable, weight_variable)

# Use GradientTape and explicitly watch the non variable input tensor.
with tf.GradientTape() as tape_with_watching:
    tape_with_watching.watch(input_tensor)
    loss_with_watching = compute_loss(weight_variable, input_tensor)

# Compute gradients with respect to both weight and watched input tensor.
grad_weight_watched, grad_input_watched = tape_with_watching.gradient(
    loss_with_watching, [weight_variable, input_tensor]
)

# Print loss value and gradient when only variable weight is tracked.
print("Loss with variable only:", float(loss_only_variable))
print("Gradient with respect to weight only:", float(grad_weight_only))

# Print loss value and gradients when input tensor is explicitly watched.
print("Loss with watched input:", float(loss_with_watching))
print("Gradient with respect to weight:", float(grad_weight_watched))
print("Gradient with respect to input:", float(grad_input_watched))

# Show that input tensor remains non trainable despite being watched here.
print("Is input a Variable object:", isinstance(input_tensor, tf.Variable))
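
The two gradients printed by the cell above can be verified by hand. Writing the loss as $L(w, x) = (wx - t)^2$ with $w = 2$, $x = 3$, and target $t = 10$ (the symbols $L$ and $t$ are shorthand introduced here, not names from the code), the chain rule gives:

$$
\frac{\partial L}{\partial w} = 2x\,(wx - t) = 2 \cdot 3 \cdot (6 - 10) = -24,
\qquad
\frac{\partial L}{\partial x} = 2w\,(wx - t) = 2 \cdot 2 \cdot (6 - 10) = -16.
$$

The derivative with respect to $x$ exists mathematically whether or not the tape watches `input_tensor`. Watching only controls whether TensorFlow records what it needs to compute that derivative.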



## **2. Interpreting TensorFlow Gradients**

### **2.1. GradientTape usage patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_02_01.jpg?v=1768966979" width="250">



>* GradientTape records operations during a forward pass
>* Later you read gradients to see parameter effects

>* Trainable variables are watched automatically; use watch() for other tensors
>* Gradient size shows local sensitivity, not overall parameter importance

>* Use one tape per training decision cycle
>* This keeps gradients tied to one clear snapshot



In [None]:
#@title Python Code - GradientTape usage patterns

# This script demonstrates basic TensorFlow GradientTape usage patterns clearly.
# It shows recording computations and then reading gradients from the tape.
# It also shows watching variables and aligning tape with one update.

# !pip install tensorflow==2.20.0

# Import the required TensorFlow module.
import tensorflow as tf

# Set deterministic random seed for reproducible gradient values.
tf.random.set_seed(42)

# Print TensorFlow version information for environment clarity.
print("TensorFlow version:", tf.__version__)

# Create a simple scalar variable representing a model weight parameter.
w_variable = tf.Variable(2.0, dtype=tf.float32)

# Create a constant input value representing fixed context data.
input_value = tf.constant(3.0, dtype=tf.float32)

# Define a simple loss function using weight and input values.
def simple_loss_function(weight_parameter, input_tensor):
    prediction_tensor = weight_parameter * input_tensor
    target_tensor = tf.constant(10.0, dtype=tf.float32)
    squared_error = tf.square(prediction_tensor - target_tensor)
    return squared_error

# Use GradientTape to record operations for one coherent decision step.
with tf.GradientTape() as gradient_tape:
    loss_value = simple_loss_function(w_variable, input_value)

# Ask the tape for gradient of loss with respect to weight variable.
gradient_value = gradient_tape.gradient(loss_value, w_variable)

# Print current weight, loss, and gradient values for interpretation.
print("Current weight value:", float(w_variable.numpy()))
print("Current loss value:", float(loss_value.numpy()))
print("Gradient with respect to weight:", float(gradient_value.numpy()))

# Show how one manual update step uses this single tape recording.
learning_rate_value = 0.1
new_weight_value = w_variable - learning_rate_value * gradient_value

# Assign updated weight value back into the variable parameter.
w_variable.assign(new_weight_value)

# Recompute loss outside previous tape to show new performance.
new_loss_value = simple_loss_function(w_variable, input_value)

# Print updated weight and loss to connect gradients with changes.
print("Updated weight value:", float(w_variable.numpy()))
print("New loss value after update:", float(new_loss_value.numpy()))




### **2.2. Multiple Variable Gradients**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_02_02.jpg?v=1768967011" width="250">



>* Each variable gets its own loss gradient
>* Together, gradients form one multi-dimensional update direction

>* Gradient sign shows which direction changes the loss
>* Gradient size shows how strongly each parameter matters

>* Look for gradient patterns across layers or groups
>* Use patterns to spot learning issues and adjustments



In [None]:
#@title Python Code - Multiple Variable Gradients

# This script shows gradients for multiple TensorFlow variables together.
# It uses a tiny linear model with weight and bias parameters.
# It prints gradients to interpret how each parameter affects loss.

# !pip install tensorflow==2.20.0

# Import required TensorFlow module and numpy helper library.
import tensorflow as tf
import numpy as np

# Print TensorFlow version information for reproducibility reference.
print("TensorFlow version:", tf.__version__)

# Set deterministic random seeds for TensorFlow and numpy reproducibility.
tf.random.set_seed(7)
np.random.seed(7)

# Create simple synthetic miles to dollars data for linear regression.
true_weight = 3.0
true_bias = 5.0

# Generate small input distances in miles as float32 tensor values.
x_miles = tf.constant([[1.0], [2.0], [3.0], [4.0]], dtype=tf.float32)

# Generate target costs using true parameters plus standard normal noise (stddev 1.0).
y_cost = true_weight * x_miles + true_bias + tf.random.normal(shape=(4, 1))

# Define trainable weight variable initialized away from true value.
weight = tf.Variable(initial_value=1.0, dtype=tf.float32, name="weight")

# Define trainable bias variable initialized away from true value.
bias = tf.Variable(initial_value=0.0, dtype=tf.float32, name="bias")

# Define simple linear model function using weight and bias variables.
def linear_model(inputs_tensor, weight_variable, bias_variable):
    return weight_variable * inputs_tensor + bias_variable

# Define mean squared error loss between predictions and targets.
def mse_loss(predictions_tensor, targets_tensor):
    return tf.reduce_mean(tf.square(predictions_tensor - targets_tensor))

# Use GradientTape to compute gradients with respect to both variables.
with tf.GradientTape() as tape:
    predictions = linear_model(x_miles, weight, bias)
    loss_value = mse_loss(predictions, y_cost)

# Compute gradients list for weight and bias variables together.
gradients = tape.gradient(loss_value, [weight, bias])

# Unpack gradients into separate tensors for clarity and interpretation.
weight_grad, bias_grad = gradients

# Print current parameter values and corresponding gradient values.
print("Current weight value:", float(weight.numpy()))
print("Current bias value:", float(bias.numpy()))
print("Loss value for batch:", float(loss_value.numpy()))
print("Gradient with respect to weight:", float(weight_grad.numpy()))
print("Gradient with respect to bias:", float(bias_grad.numpy()))

# Show one manual gradient descent step updating both parameters together.
learning_rate = 0.1
weight.assign_sub(learning_rate * weight_grad)
bias.assign_sub(learning_rate * bias_grad)

# Recompute loss after manual update to show coordinated effect.
new_predictions = linear_model(x_miles, weight, bias)
new_loss_value = mse_loss(new_predictions, y_cost)

# Print updated parameter values and new loss for comparison.
print("Updated weight value:", float(weight.numpy()))
print("Updated bias value:", float(bias.numpy()))
print("New loss after update:", float(new_loss_value.numpy()))
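
For a linear model with mean squared error, both gradients have a simple closed form, which helps when interpreting the numbers the tape returns. The sketch below is plain Python and assumes noise-free targets (y = 3x + 5) so the result is exactly reproducible. The cell above adds random noise, so its printed gradients will differ somewhat. The function name `mse_gradients` is illustrative.

```python
# Plain-Python gradients of mean squared error for y_hat = w * x + b.
# Assumption: noise-free targets y = 3x + 5, unlike the noisy cell above.

def mse_gradients(inputs, targets, weight, bias):
    """Return (dL/dw, dL/db) for L = mean((w * x + b - y) ** 2)."""
    count = len(inputs)
    residuals = [weight * x + bias - y for x, y in zip(inputs, targets)]
    grad_weight = 2.0 * sum(r * x for r, x in zip(residuals, inputs)) / count
    grad_bias = 2.0 * sum(residuals) / count
    return grad_weight, grad_bias


x_miles = [1.0, 2.0, 3.0, 4.0]
y_cost = [3.0 * x + 5.0 for x in x_miles]

grad_weight, grad_bias = mse_gradients(x_miles, y_cost, weight=1.0, bias=0.0)
print("dL/dw:", grad_weight)  # -55.0
print("dL/db:", grad_bias)    # -20.0
```

Both values are negative, so increasing either parameter lowers the loss. The weight gradient is larger in magnitude because each residual is multiplied by its input before averaging.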




### **2.3. Handling None Gradients**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_02_03.jpg?v=1768967047" width="250">



>* None gradients mean no tracked dependence on variable
>* They differ from zero; relationship is untracked

>* None gradients mean parameters never affected loss
>* Treat None gradients as diagnostic, not optimization values

>* Investigate why parameters lost gradient connections
>* Use None gradients as debugging and design signals



In [None]:
#@title Python Code - Handling None Gradients

# This script shows TensorFlow None gradients with simple scalar variables.
# It compares used and unused variables inside GradientTape context.
# It prints gradients and explains why some gradients become None.

# !pip install tensorflow==2.20.0

# Import the required TensorFlow library.
import tensorflow as tf

# Set deterministic random seed for reproducible TensorFlow behavior.
tf.random.set_seed(123)

# Print TensorFlow version information for environment transparency.
print("TensorFlow version:", tf.__version__)

# Create a trainable scalar variable that participates in the loss computation.
w_used = tf.Variable(2.0, trainable=True, name="w_used")

# Create another variable that will not affect the loss computation.
w_unused = tf.Variable(3.0, trainable=True, name="w_unused")

# Define a simple scalar input tensor used in the forward computation.
x_input = tf.constant(5.0, dtype=tf.float32)

# Use GradientTape to record operations that involve the used variable.
with tf.GradientTape() as tape:

    # Compute prediction using only w_used and ignore w_unused completely.
    y_pred = w_used * x_input

    # Define a simple squared error loss depending only on y_pred value.
    loss = (y_pred - 10.0) ** 2

# Request gradients of loss with respect to both variables together.
grads = tape.gradient(loss, [w_used, w_unused])

# Unpack the gradient list into separate gradient variables for clarity.
grad_used, grad_unused = grads

# Print gradient for w_used; it is 0.0 here because the prediction equals the target.
print("Gradient for w_used:", grad_used.numpy())

# Print gradient for w_unused which never influenced the loss value.
print("Gradient for w_unused:", grad_unused)

# Show a short explanation line highlighting why second gradient is None.
print("w_unused gradient is None because loss never used that variable.")
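
In the cell above, `w_used` gives a prediction of exactly 10.0, so its gradient is 0.0, while `w_unused` yields `None`. The two cases call for different handling: a zero gradient can be applied as usual, whereas a `None` entry has to be skipped or investigated. The sketch below shows one way to do that in plain Python. Floats stand in for tensors and a dict stands in for the variables, and the function name `apply_sgd_step` is hypothetical.

```python
# Plain-Python stand-in for skipping None gradients before an update.
# Floats stand in for tensors and a dict stands in for variables (assumption).

def apply_sgd_step(parameters, gradients, learning_rate):
    """Update parameters in place, skipping entries whose gradient is None."""
    skipped = []
    for name, gradient in gradients.items():
        if gradient is None:
            # No tracked path from this parameter to the loss; report it.
            skipped.append(name)
            continue
        parameters[name] -= learning_rate * gradient
    return skipped


parameters = {"w_used": 2.0, "w_unused": 3.0}
gradients = {"w_used": 0.0, "w_unused": None}

skipped_names = apply_sgd_step(parameters, gradients, learning_rate=0.1)
print("Updated parameters:", parameters)
print("Skipped (None gradient):", skipped_names)
```

Returning the skipped names keeps the diagnostic visible instead of silently hiding a disconnected parameter.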




## **3. Manual Gradient Training**

### **3.1. Using tf keras optimizers**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_03_01.jpg?v=1768967094" width="250">



>* Keras optimizers turn gradients into parameter updates
>* Create optimizer, then apply gradients to variables

>* Optimizers offer varied training behaviors without rewrites
>* Training loop stays record, loss, gradients, optimizer update

>* Swap optimizers and hyperparameters without changing training
>* Simple loop lets optimizers refine parameters over time



In [None]:
#@title Python Code - Using tf keras optimizers

# This script shows using TensorFlow Keras optimizers manually.
# It uses GradientTape to compute gradients for a simple model.
# It then applies gradients using an optimizer training step.

# !pip install tensorflow==2.20.0

# Import required standard libraries and TensorFlow framework.
import random
import numpy as np
import tensorflow as tf

# Print TensorFlow version information for reproducibility reference.
print("TensorFlow version:", tf.__version__)

# Set deterministic random seeds for reproducible training behavior.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Select computation device preferring GPU when it is available.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    device_name = "/GPU:0"
else:
    device_name = "/CPU:0"
print("Using device:", device_name)

# Create simple synthetic Fahrenheit to Celsius training data.
fahrenheit_values = np.array([32.0, 50.0, 68.0, 86.0, 104.0], dtype=np.float32)
celsius_values = (fahrenheit_values - 32.0) * (5.0 / 9.0)

# Convert numpy arrays into TensorFlow constant tensors.
x_train = tf.constant(fahrenheit_values.reshape(-1, 1))
y_train = tf.constant(celsius_values.reshape(-1, 1))

# Define a simple linear model using TensorFlow variables.
W = tf.Variable(tf.random.normal(shape=(1, 1), stddev=0.1))
b = tf.Variable(tf.zeros(shape=(1,)))

# Define a prediction function using current model parameters.
def model_predict(x_inputs):
    return tf.matmul(x_inputs, W) + b

# Define a mean squared error loss function for training.
def loss_function(y_true, y_pred):
    squared_errors = tf.square(y_true - y_pred)
    return tf.reduce_mean(squared_errors)

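# Create a Keras optimizer instance with a small learning rate.
# The inputs are unscaled Fahrenheit values (up to 104), so a rate of 0.01
# makes plain SGD diverge to NaN here; 1e-4 keeps the updates stable.
# Without feature scaling the bias still converges slowly in 20 steps.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-4)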

# Define one manual training step using GradientTape and optimizer.
@tf.function
def train_step(x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model_predict(x_batch)
        loss_value = loss_function(y_batch, predictions)
    gradients = tape.gradient(loss_value, [W, b])
    optimizer.apply_gradients(zip(gradients, [W, b]))
    return loss_value

# Run several training epochs to update model parameters.
with tf.device(device_name):
    for epoch in range(20):
        loss_value = train_step(x_train, y_train)

# Print learned parameters and example prediction after training.
print("Learned weight W:", W.numpy().flatten()[0])
print("Learned bias b:", b.numpy().flatten()[0])
print("True conversion slope:", 5.0 / 9.0)
print("True conversion intercept:", -32.0 * (5.0 / 9.0))

# Evaluate model prediction for a sample Fahrenheit temperature.
sample_fahrenheit = tf.constant([[77.0]], dtype=tf.float32)
predicted_celsius = model_predict(sample_fahrenheit)
print("Predicted Celsius for 77F:", float(predicted_celsius.numpy().flatten()[0]))
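
Plain SGD is sensitive to input scale. The raw Fahrenheit values reach 104, so the gradient for `W` grows with their squares and only very small learning rates remain stable. One common remedy is to standardize the feature before training and map the parameters back afterwards. The sketch below does this in plain Python. The standardization step and all variable names are assumptions added for illustration and are not part of the cell above.

```python
# Plain-Python gradient descent on standardized Fahrenheit inputs.
# Standardizing the feature is an added assumption; the cell above does not do it.

fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]
celsius = [(f - 32.0) * 5.0 / 9.0 for f in fahrenheit]

# Standardize inputs to zero mean and unit variance.
count = len(fahrenheit)
mean_f = sum(fahrenheit) / count
std_f = (sum((f - mean_f) ** 2 for f in fahrenheit) / count) ** 0.5
scaled = [(f - mean_f) / std_f for f in fahrenheit]

# Fit celsius ~= weight * scaled + bias with full-batch gradient descent.
weight, bias, learning_rate = 0.0, 0.0, 0.1
for _ in range(300):
    residuals = [weight * z + bias - c for z, c in zip(scaled, celsius)]
    grad_weight = 2.0 * sum(r * z for r, z in zip(residuals, scaled)) / count
    grad_bias = 2.0 * sum(residuals) / count
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

# Map the parameters back to the original Fahrenheit units.
slope = weight / std_f
intercept = bias - weight * mean_f / std_f
print("Recovered slope:", round(slope, 4))
print("Recovered intercept:", round(intercept, 4))
```

After standardization both parameters see curvature of the same size, so a single moderate learning rate works for weight and bias alike, and the recovered values approach 5/9 and -160/9.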




### **3.2. Gradient Variable Updates**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_03_02.jpg?v=1768967132" width="250">



>* Use gradients to carefully adjust model variables
>* Repeat small updates to gradually reduce loss

>* Learning rate scales gradients to set update size
>* Repeated small updates gradually move model toward optimum

>* Very large gradients destabilize learning; tiny ones stall it
>* Monitor updates to tune learning rate and strategy



In [None]:
#@title Python Code - Gradient Variable Updates

# This script shows manual gradient based variable updates clearly.
# It uses TensorFlow GradientTape for computing simple scalar gradients.
# It prints variable updates using a tiny linear regression example.

# !pip install tensorflow==2.20.0

# Import required TensorFlow library and NumPy.
import tensorflow as tf
import numpy as np

# Print TensorFlow version for environment clarity.
print("TensorFlow version:", tf.__version__)

# Set deterministic random seeds for reproducible behavior.
np.random.seed(42)
tf.random.set_seed(42)

# Create a simple synthetic linear dataset (y = 2x + 5).
true_weight = 2.0
true_bias = 5.0
x_data = np.linspace(-5.0, 5.0, 21).astype("float32")

# Generate corresponding target values using linear relationship.
y_data = true_weight * x_data + true_bias

# Define trainable TensorFlow variables for weight and bias.
w = tf.Variable(0.0, dtype=tf.float32)
b = tf.Variable(0.0, dtype=tf.float32)

# Define simple mean squared error loss function.
def compute_loss(predictions, targets):
    return tf.reduce_mean(tf.square(predictions - targets))

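# Choose the learning rate. For this data 0.1 is stable but close to the limit
# (about 0.109), so w overshoots the target and oscillates while converging.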
learning_rate = 0.1

# Perform a few manual gradient update steps.
for step in range(5):

    # Record operations for automatic differentiation.
    with tf.GradientTape() as tape:
        predictions = w * x_data + b
        loss_value = compute_loss(predictions, y_data)

    # Compute gradients of loss with respect to variables.
    gradients = tape.gradient(loss_value, [w, b])

    # Manually update variables using gradient descent rule.
    w.assign_sub(learning_rate * gradients[0])
    b.assign_sub(learning_rate * gradients[1])

    # Print current step, loss, and variable values.
    print("Step", step, "Loss", float(loss_value), "w", float(w), "b", float(b))
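
For this quadratic loss, each update multiplies a parameter's distance from its optimum by a fixed factor, which explains the pattern in the printed steps. The sketch below replays the five updates in plain Python and compares them with that closed form. It assumes the same data as the cell above (21 points in [-5, 5], y = 2x + 5). The names `factor_w` and `factor_b` are introduced here for illustration.

```python
# Plain-Python replay of the five update steps, compared with a closed form.
# Assumes the same data as the cell above: x in [-5, 5] with 21 points, y = 2x + 5.

x_data = [-5.0 + 0.5 * k for k in range(21)]
y_data = [2.0 * x + 5.0 for x in x_data]
count = len(x_data)
learning_rate = 0.1

w, b = 0.0, 0.0
for _ in range(5):
    residuals = [w * x + b - y for x, y in zip(x_data, y_data)]
    w -= learning_rate * 2.0 * sum(r * x for r, x in zip(residuals, x_data)) / count
    b -= learning_rate * 2.0 * sum(residuals) / count

# Because mean(x) is zero, each parameter's error shrinks by a fixed factor per step.
mean_x_squared = sum(x * x for x in x_data) / count
factor_w = 1.0 - 2.0 * learning_rate * mean_x_squared
factor_b = 1.0 - 2.0 * learning_rate
w_closed_form = 2.0 + (0.0 - 2.0) * factor_w ** 5
b_closed_form = 5.0 + (0.0 - 5.0) * factor_b ** 5

print("factor_w:", round(factor_w, 4), "factor_b:", round(factor_b, 4))
print("w after 5 steps:", round(w, 4), "closed form:", round(w_closed_form, 4))
print("b after 5 steps:", round(b, 4), "closed form:", round(b_closed_form, 4))
```

A negative `factor_w` means the weight jumps past its optimum on every step and approaches it from alternating sides, while the positive `factor_b` gives a smooth approach for the bias. The TensorFlow cell works in float32, so its printed values can differ from these in the last digits.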




### **3.3. Stabilizing Gradient Behavior**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master TensorFlow 2.20.0/Module_02/Lecture_B/image_03_03.jpg?v=1768967163" width="250">



>* Gradients can be too small or huge
>* We balance update size to keep training stable

>* Learning rate controls how strongly parameters update
>* Tune or adapt learning rate to stabilize loss

>* Use clipping and regularization to control gradients
>* Actively manage gradients for stable, reliable training



In [None]:
#@title Python Code - Stabilizing Gradient Behavior

# This script compares an overly large learning rate with a smaller, clipped one.
# It uses TensorFlow GradientTape with different learning rates and gradient clipping.
# It prints losses so the run that fails to settle can be compared with the stabilized run.

# !pip install tensorflow==2.20.0

# Import required modules including TensorFlow and NumPy libraries.
import random
import numpy as np
import tensorflow as tf

# Set deterministic seeds for reproducible gradient behavior demonstration.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version information for environment transparency and reproducibility.
print("TensorFlow version:", tf.__version__)

# Create simple synthetic data for a noisy linear regression style problem.
true_w = 3.0
true_b = 2.0
x_data = np.linspace(-1.0, 1.0, 50).astype(np.float32)
noise = np.random.normal(loc=0.0, scale=0.1, size=x_data.shape).astype(np.float32)

# Compute target values using true parameters plus Gaussian noise.
y_data = true_w * x_data + true_b + noise

# Define a simple linear model using TensorFlow variables for parameters.
w_unstable = tf.Variable(0.0, dtype=tf.float32)
b_unstable = tf.Variable(0.0, dtype=tf.float32)
w_stable = tf.Variable(0.0, dtype=tf.float32)
b_stable = tf.Variable(0.0, dtype=tf.float32)

# Define a prediction function that computes linear outputs from inputs and parameters.
def model_prediction(x_values, weight_variable, bias_variable):
    return weight_variable * x_values + bias_variable

# Define a mean squared error loss function for scalar loss computation.
def mse_loss(predictions_tensor, targets_tensor):
    return tf.reduce_mean(tf.square(predictions_tensor - targets_tensor))

# Create two optimizers with very different learning rates for comparison.
optimizer_unstable = tf.keras.optimizers.SGD(learning_rate=1.0)
optimizer_stable = tf.keras.optimizers.SGD(learning_rate=0.1)

# Define one manual training step without gradient clipping for unstable behavior.
@tf.function
def train_step_unstable(x_batch_tensor, y_batch_tensor):
    with tf.GradientTape() as tape:
        predictions_tensor = model_prediction(x_batch_tensor, w_unstable, b_unstable)
        loss_value = mse_loss(predictions_tensor, y_batch_tensor)
    gradients_list = tape.gradient(loss_value, [w_unstable, b_unstable])
    optimizer_unstable.apply_gradients(zip(gradients_list, [w_unstable, b_unstable]))
    return loss_value, gradients_list

# Define one manual training step with gradient clipping for stable behavior.
@tf.function
def train_step_stable(x_batch_tensor, y_batch_tensor):
    with tf.GradientTape() as tape:
        predictions_tensor = model_prediction(x_batch_tensor, w_stable, b_stable)
        loss_value = mse_loss(predictions_tensor, y_batch_tensor)
    gradients_list = tape.gradient(loss_value, [w_stable, b_stable])
    clipped_gradients = [tf.clip_by_value(g, -1.0, 1.0) for g in gradients_list]
    optimizer_stable.apply_gradients(zip(clipped_gradients, [w_stable, b_stable]))
    return loss_value, gradients_list

# Convert NumPy arrays into TensorFlow tensors for training steps.
x_tensor = tf.convert_to_tensor(x_data)
y_tensor = tf.convert_to_tensor(y_data)

# Run several training steps and print selected information for both strategies.
num_steps = 8
for step_index in range(num_steps):
    loss_unstable, grads_unstable = train_step_unstable(x_tensor, y_tensor)
    loss_stable, grads_stable = train_step_stable(x_tensor, y_tensor)
    if step_index in [0, 1, 2, 4, 7]:
        print("Step", step_index, "unstable loss", float(loss_unstable), "stable loss", float(loss_stable))

# Print final learned parameters to compare unstable and stable training results.
print("Unstable parameters:", float(w_unstable.numpy()), float(b_unstable.numpy()))
print("Stable parameters:", float(w_stable.numpy()), float(b_stable.numpy()))
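
The cell above clips each gradient component independently, which can change the direction of the update. An alternative is to rescale the whole gradient list so that its overall norm stays below a threshold, which preserves direction. TensorFlow provides `tf.clip_by_global_norm` for this second strategy. The sketch below is a plain-Python illustration of the idea, not the library's implementation, and the toy gradient values are made up.

```python
import math

# Plain-Python comparison of two clipping strategies on a toy gradient list.
# The functions are illustrative stand-ins, not TensorFlow calls.

def clip_by_value(gradients, low, high):
    """Clamp each component independently; this can change the direction."""
    return [min(max(g, low), high) for g in gradients]


def clip_by_global_norm(gradients, clip_norm):
    """Rescale all components together so the overall norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for g in gradients))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in gradients], global_norm


raw_gradients = [3.0, 4.0]
by_value = clip_by_value(raw_gradients, -1.0, 1.0)
by_norm, norm_before = clip_by_global_norm(raw_gradients, clip_norm=1.0)

print("Clipped by value:", by_value)
print("Clipped by global norm:", [round(g, 4) for g in by_norm])
print("Norm before clipping:", norm_before)
```

Value clipping turns the 3:4 ratio between components into 1:1, while norm clipping keeps the ratio and only shortens the step.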




# <font color="#418FDE" size="6.5" uppercase>**Autograd with TF**</font>


In this lecture, you learned to:
- Use tf.GradientTape to compute gradients of scalar losses with respect to TensorFlow variables. 
- Interpret gradient values to understand how parameter changes affect a loss function. 
- Implement a simple manual training step using gradients and an optimizer. 

In the next module (Module 3), we will cover 'Keras Model Building'.