# <font color="#418FDE" size="6.5" uppercase>**Autograd Mechanics**</font>

>Last update: 20260128.
    
By the end of this lecture, you will be able to:
- Describe how PyTorch builds dynamic computation graphs and uses them to compute gradients.
- Use `requires_grad`, `backward`, and the `grad` attribute to compute and inspect gradients for simple tensor operations.
- Control gradient tracking with `torch.no_grad` and `detach` to improve performance and avoid unintended graph creation.


## **1. Dynamic Computation Graphs**

### **1.1. Dynamic vs Static Graphs**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_01_01.jpg?v=1769655894" width="250">



>* Static graphs are fixed blueprints defined beforehand
>* Dynamic graphs are built and adapted during runtime

>* Static graphs need predefined, restricted model structures
>* Dynamic graphs follow real code flow, easing experimentation

>* Dynamic graph records operations and tensor dependencies
>* Backprop walks this graph, ensuring accurate gradients



In [None]:
#@title Python Code - Dynamic vs Static Graphs

# This script compares dynamic and static style graphs.
# It uses PyTorch style ideas with simple tensors.
# Focus on autograd behavior not heavy training.
# !pip install torch torchvision torchaudio

# Import required standard libraries.
import random

# Try importing torch safely.
try:
    import torch
except ImportError:
    torch = None

# Set a deterministic random seed.
random.seed(0)

# Define a helper to check torch availability.
def check_torch_available():
    if torch is None:
        raise ImportError("PyTorch is required for this example.")

# Show basic environment and torch version.
def show_versions():
    # Local import keeps this helper self-contained.
    import platform
    check_torch_available()
    print("Python version:", platform.python_version())
    print("Torch version:", torch.__version__)

# Build a simple dynamic graph using a loop.
def dynamic_graph_example(x_value: float):
    check_torch_available()
    x = torch.tensor(x_value, requires_grad=True)
    y = x
    for step in range(3):
        if step % 2 == 0:
            y = y * 2.0
        else:
            y = y + 3.0
    # Return the leaf tensor too so its gradient can be read later.
    return x, y

# Show gradients for the dynamic graph example.
def run_dynamic_demo():
    check_torch_available()
    x, y = dynamic_graph_example(1.0)
    y.backward()
    print("Dynamic graph output y:", float(y))
    # Read the gradient autograd computed instead of a hardcoded value.
    print("Dynamic graph dy/dx:", float(x.grad))

# Emulate a static style computation using a fixed function.
def static_style_example(x_value: float):
    check_torch_available()
    x = torch.tensor(x_value, requires_grad=True)
    y1 = x * 2.0
    y2 = y1 + 3.0
    y3 = y2 * 2.0
    # Return the leaf tensor too so its gradient can be read later.
    return x, y3

# Show that static style uses same operations every call.
def run_static_demo():
    check_torch_available()
    x, y = static_style_example(1.0)
    y.backward()
    print("Static style output y:", float(y))
    print("Static style dy/dx:", float(x.grad))

# Compare behavior when control flow changes dynamically.
def dynamic_branching_example(x_value: float):
    check_torch_available()
    x = torch.tensor(x_value, requires_grad=True)
    if x_value > 0.5:
        y = x * 3.0
    else:
        y = x - 2.0
    # Return the leaf tensor too so its gradient can be read later.
    return x, y

# Run branching example twice to show graph changes.
def run_branching_demo():
    check_torch_available()
    x_pos, y_pos = dynamic_branching_example(1.0)
    y_pos.backward()
    x_neg, y_neg = dynamic_branching_example(0.0)
    y_neg.backward()
    # Each branch records a different operation, so dy/dx differs.
    print("Branching y(1.0):", float(y_pos), "grad:", float(x_pos.grad))
    print("Branching y(0.0):", float(y_neg), "grad:", float(x_neg.grad))

# Main entry point to run all small demos.
def main():
    check_torch_available()
    show_versions()
    run_dynamic_demo()
    run_static_demo()
    run_branching_demo()


main()
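
The idea that a dynamic graph follows real code flow can be pushed one step further with a loop whose length depends on the data. The sketch below is only an illustration: the helper name `doubling_depth` and the `limit` parameter are made up for this example. It doubles a value until it passes a threshold, so the number of recorded multiply nodes, and therefore the gradient, differs per input.

```python
import torch


def doubling_depth(x_value: float, limit: float = 100.0):
    """Double y until it reaches `limit`; the loop length depends on the data."""
    x = torch.tensor(x_value, requires_grad=True)
    y = x
    steps = 0
    # The number of recorded multiply nodes is decided at runtime.
    while y.item() < limit:
        y = y * 2.0
        steps += 1
    y.backward()
    return steps, x.grad.item()


steps_a, grad_a = doubling_depth(1.0)
steps_b, grad_b = doubling_depth(3.0)
print("x=1.0 -> steps:", steps_a, "dy/dx:", grad_a)
print("x=3.0 -> steps:", steps_b, "dy/dx:", grad_b)
```

Because each doubling contributes a factor of 2 to the chain rule, the gradient equals 2 raised to the number of steps taken for that particular input.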



### **1.2. Autograd Function Graph**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_01_02.jpg?v=1769656019" width="250">



>* Each tensor operation becomes a node in the graph
>* Graph stores operations and rules to compute gradients

>* Forward pass is like a factory line transforming tensors
>* Backward pass walks the stations, chaining local sensitivities

>* Autograd builds new graphs dynamically each run
>* Backward traverses the graph, then frees it unless retain_graph=True



In [None]:
#@title Python Code - Autograd Function Graph

# This script explains PyTorch autograd graphs.
# It focuses on dynamic computation graphs.
# Run cells to see gradients and graph links.

# Install PyTorch if not already available.
# !pip install torch torchvision torchaudio

# Import torch for tensor and autograd operations.
import torch

# Set a deterministic seed for reproducibility.
torch.manual_seed(0)

# Print PyTorch version in one concise line.
print("PyTorch version:", torch.__version__)

# Create a tensor that will be a graph leaf.
x = torch.tensor(2.0, requires_grad=True)

# Confirm x is a leaf and tracks gradients.
print("x:", x.item(), "is_leaf:", x.is_leaf, "requires_grad:", x.requires_grad)

# Build a small computation graph using x.
y = x * 3 + 1

# Show the result and its grad function creator.
print("y:", y.item(), "created_by:", type(y.grad_fn).__name__)

# Extend the graph with another nonlinear operation.
z = y ** 2

# Show z value and its grad function type.
print("z:", z.item(), "created_by:", type(z.grad_fn).__name__)

# Inspect how z connects back to previous node.
print("z.grad_fn.next_functions:", z.grad_fn.next_functions)

# Trigger backward pass from scalar output z.
z.backward()

# Show gradient of z with respect to x.
print("dz/dx stored in x.grad:", x.grad.item())

# Manually compute gradient to verify autograd.
manual_grad = 2 * (x.item() * 3 + 1) * 3

# Compare manual gradient with autograd result.
print("Manual gradient:", manual_grad)

# Demonstrate dynamic graph by using a branch.
if x.item() > 1.5:
    w = x * 5
else:
    w = x * 2

# Show which branch was taken and value.
print("w value after branch:", w.item())

# Clear previous gradients before new backward.
x.grad.zero_()

# Build new graph from w to a scalar output.
q = torch.sin(w)

# Backpropagate through the new dynamic graph.
q.backward()

# Show new gradient from q with respect to x.
print("dq/dx stored in x.grad:", x.grad.item())
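
The point that backward discards the graph after traversing it can be observed directly. The following is a minimal sketch, assuming an operation such as `x ** 2` that saves its input for the backward rule: a second `backward()` on the same graph is expected to raise a `RuntimeError`, while `retain_graph=True` keeps the graph alive so a second call accumulates into `x.grad`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# x ** 2 saves x for its backward rule, so the graph holds a buffer.
y = x ** 2
y.backward()
first_grad = x.grad.item()

# A second backward on the same graph should fail because the
# saved buffers were released after the first call.
second_call_failed = False
try:
    y.backward()
except RuntimeError:
    second_call_failed = True

# With retain_graph=True the graph survives and gradients accumulate.
x.grad = None
z = x ** 2
z.backward(retain_graph=True)
z.backward()
retained_grad = x.grad.item()

print("first backward grad:", first_grad)
print("second backward failed:", second_call_failed)
print("grad after two retained backward calls:", retained_grad)
```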




### **1.3. Gradient Accumulation Essentials**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_01_03.jpg?v=1769656058" width="250">



>* Backprop stores gradients in each leaf tensor's grad field
>* Gradients accumulate like a running total, requiring resets

>* Gradients from each microbatch add up before updating
>* This simulates larger batches, like staged factory work

>* Manage accumulated gradients to avoid mixing updates
>* Plan clears and steps to control efficiency



In [None]:
#@title Python Code - Gradient Accumulation Essentials

# This script explains gradient accumulation basics.
# It uses tiny tensors and clear print statements.
# Run cells in order to follow the story.

# import torch for tensor and autograd operations.
import torch

# set a deterministic seed for reproducible values.
torch.manual_seed(0)

# print the torch version for reference and debugging.
print("PyTorch version:", torch.__version__)

# create a simple weight tensor with gradient tracking.
w = torch.tensor(2.0, requires_grad=True)

# show initial weight value and its gradient placeholder.
print("Initial w:", float(w), "grad:", w.grad)

# define a helper function to compute a tiny loss.
def compute_loss(weight, x_value):
    # compute a simple squared error style expression.
    return (weight * x_value - 1.0) ** 2

# choose two small inputs to simulate microbatches.
inputs = [torch.tensor(1.0), torch.tensor(3.0)]

# loop over inputs and accumulate gradients without resetting.
for i, x in enumerate(inputs, start=1):
    # compute loss for current microbatch using compute_loss.
    loss = compute_loss(w, x)

    # run backward to accumulate gradient into w.grad.
    loss.backward()

    # print loss and current accumulated gradient value.
    print(f"Step {i} loss:", float(loss), "grad:", float(w.grad))

# show that gradients have been added not overwritten.
print("Accumulated gradient after two steps:", float(w.grad))

# now clear gradients manually before a fresh accumulation.
w.grad.zero_()

# confirm that gradient storage has been reset to zero.
print("Gradient after manual zeroing:", float(w.grad))

# accumulate again over both inputs after clearing gradients.
for i, x in enumerate(inputs, start=1):
    # recompute loss for each microbatch using same function.
    loss = compute_loss(w, x)

    # accumulate gradients again after zeroing previously.
    loss.backward()

    # print new accumulated gradient for comparison.
    print(f"Reaccumulate step {i} grad:", float(w.grad))

# final print summarizes how accumulation behaved overall.
print("Final accumulated gradient:", float(w.grad))
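
To see why accumulating over microbatches simulates a larger batch, it helps to compare the accumulated gradient with a single full-batch gradient. The sketch below uses made-up toy data and a hypothetical `mse` helper. It assumes a mean-reduced loss and equally sized microbatches, in which case dividing each microbatch loss by the number of microbatches should reproduce the full-batch gradient.

```python
import torch

# Hypothetical toy data: four inputs and four targets.
xs = torch.tensor([1.0, 2.0, 3.0, 4.0])
ts = torch.tensor([2.0, 4.0, 6.0, 8.0])


def mse(weight, x, t):
    """Mean squared error of a one-parameter linear model."""
    return ((weight * x - t) ** 2).mean()


# Reference: one backward pass over the full batch.
w_full = torch.tensor(0.5, requires_grad=True)
mse(w_full, xs, ts).backward()

# Accumulation: two microbatches, each loss scaled by 1 / num_micro.
w_acc = torch.tensor(0.5, requires_grad=True)
num_micro = 2
for x_mb, t_mb in zip(xs.chunk(num_micro), ts.chunk(num_micro)):
    (mse(w_acc, x_mb, t_mb) / num_micro).backward()

print("Full-batch grad:", w_full.grad.item())
print("Accumulated grad:", w_acc.grad.item())
```

Without the division by `num_micro`, the accumulated value would be the sum of the microbatch gradients, which is larger than the full-batch mean gradient by that factor.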



## **2. Working With Gradients**

### **2.1. Controlling requires_grad**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_02_01.jpg?v=1769656097" width="250">



>* Only some tensors need gradient tracking enabled
>* Mark learnable parameters, skip data to save work

>* Mark model parameters to require gradient updates
>* Keep input data constant to avoid wasted computation

>* Freeze some layers by disabling gradient tracking
>* Save memory, speed training, prevent unwanted gradient flow



In [None]:
#@title Python Code - Controlling requires_grad

# This script explains controlling requires_grad clearly.
# We compare learnable parameters and fixed data tensors.
# Run cells to see gradients and flags behavior.

# Install PyTorch if not already available in environment.
# !pip install torch torchvision torchaudio --quiet

# Import torch for tensor and autograd operations.
import torch

# Print PyTorch version for reproducibility and context.
print("PyTorch version:", torch.__version__)

# Create a tensor representing model weights with gradients.
weights = torch.tensor([2.0, -1.0], requires_grad=True)

# Create an input tensor representing fixed data values.
inputs = torch.tensor([1.5, 3.0], requires_grad=False)

# Show which tensors are tracked for gradients currently.
print("weights.requires_grad =", weights.requires_grad)
print("inputs.requires_grad =", inputs.requires_grad)

# Compute a simple prediction using weights and inputs.
prediction = (weights * inputs).sum()

# Run backward to compute gradients for tracked tensors.
prediction.backward()

# Inspect gradients stored on the weights tensor.
print("Gradient for weights:", weights.grad)

# Show that inputs received no gradient (every tensor has a grad attribute, here it stays None).
print("inputs.grad is None?", inputs.grad is None)

# Reset gradients on weights to avoid accumulation.
weights.grad.zero_()

# Turn off gradient tracking for weights temporarily.
weights.requires_grad_(False)

# Confirm that weights are now treated as fixed constants.
print("weights.requires_grad after change =", weights.requires_grad)

# Recompute prediction with gradients disabled for weights.
new_prediction = (weights * inputs).sum()

# Check that new_prediction does not require gradients now.
print("new_prediction.requires_grad =", new_prediction.requires_grad)

# Demonstrate creating a new tensor with requires_grad explicitly.
new_weights = torch.tensor([0.5, 0.5], requires_grad=True)

# Validate shapes before combining tensors safely.
assert new_weights.shape == inputs.shape

# Compute prediction using new learnable weights and fixed inputs.
final_prediction = (new_weights * inputs).sum()

# Backpropagate to compute gradients for new_weights only.
final_prediction.backward()

# Print gradients to show tracking is active again.
print("Gradient for new_weights:", new_weights.grad)
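
Freezing layers works the same way at the module level. The sketch below is a hedged illustration, not a recipe from the lecture: the two-layer `torch.nn.Sequential` model and its sizes are invented for the example. It turns off `requires_grad` for the first layer's parameters and checks that only the second layer receives gradients.

```python
import torch

torch.manual_seed(0)

# Hypothetical two-layer model; the first layer plays the frozen part.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 3),
    torch.nn.ReLU(),
    torch.nn.Linear(3, 1),
)

# Freeze every parameter of the first layer in place.
for param in model[0].parameters():
    param.requires_grad_(False)

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

frozen_has_grad = any(p.grad is not None for p in model[0].parameters())
head_has_grad = all(p.grad is not None for p in model[2].parameters())
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

print("frozen layer received gradients:", frozen_has_grad)
print("last layer received gradients:", head_has_grad)
print("trainable parameters:", trainable)
```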



### **2.2. Backward And Grad Fields**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_02_02.jpg?v=1769656137" width="250">



>* Backward walks the graph, applying chain rule
>* Leaf tensors end with updated, inspectable gradients

>* Backward results go into each leaf tensor’s grad field
>* Optimizers use these stored gradients to update parameters

>* Gradients accumulate across backward calls rather than being overwritten
>* Clear grads to avoid mixing unrelated computations



In [None]:
#@title Python Code - Backward And Grad Fields

# This script demonstrates PyTorch gradient fields.
# It focuses on backward and grad attributes.
# Run cells to observe gradients accumulating.

# Install PyTorch if not already available.
# !pip install torch torchvision torchaudio

# Import torch and check version.
import torch

# Print a short line with the PyTorch version.
print("PyTorch version:", torch.__version__)

# Create a tensor representing a model parameter.
w = torch.tensor(2.0, requires_grad=True)

# Verify that requires_grad is correctly enabled.
print("w requires_grad:", w.requires_grad)

# Define a simple scalar function of w.
y = (w ** 2) + (3 * w)

# Confirm that y is a scalar suitable for backward.
print("y shape:", tuple(y.shape))

# Run backward to compute dy/dw and fill w.grad.
y.backward()

# Inspect the gradient stored in the grad field.
print("First backward, w.grad:", w.grad.item())

# Gradients accumulate across backward calls, so zero
# the gradient field before another backward.
w.grad.zero_()

# Build a new scalar function using the same parameter.
z = (4 * w) - (w ** 3)

# Run backward again to compute dz/dw at current w.
z.backward()

# Inspect the new gradient value in w.grad.
print("Second backward, w.grad:", w.grad.item())

# Demonstrate gradient accumulation with two backward calls.
w.grad.zero_()

# Compute first part of a composite loss.
loss_part1 = w ** 2

# Backward on the first part to get partial gradient.
# Each part builds its own graph, so retain_graph is not needed.
loss_part1.backward()

# Compute second part of the composite loss.
loss_part2 = 3 * w

# Backward on the second part to accumulate gradient.
loss_part2.backward()

# Show accumulated gradient equals derivative of full loss.
print("Accumulated w.grad after two parts:", w.grad.item())
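
The statement that optimizers use the stored gradients can be checked with a single plain SGD step. This is a minimal sketch, assuming `torch.optim.SGD` without momentum or weight decay, where the update is expected to be `w <- w - lr * w.grad`.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Same function as in the cell above: dy/dw = 2w + 3.
y = w ** 2 + 3 * w
y.backward()
grad_before_step = w.grad.item()

# The optimizer reads w.grad and updates w in place.
optimizer.step()
w_after_step = w.item()

print("w.grad used by the optimizer:", grad_before_step)
print("w after one SGD step:", w_after_step)
```

With `w = 2.0` the gradient is 7, so one step with learning rate 0.1 moves `w` to about 1.3.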




### **2.3. Resetting And Reusing Gradients**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_02_03.jpg?v=1769656183" width="250">



>* Gradients persist and accumulate across backward passes
>* Reset gradients each batch unless intentionally accumulating

>* Clear gradients so batches don’t mix
>* Reset before each backward pass to avoid bugs

>* Inspect, log, or copy gradients after backward
>* Then reset gradients to keep training clean



In [None]:
#@title Python Code - Resetting And Reusing Gradients

# This script demonstrates resetting and reusing gradients.
# It uses PyTorch autograd with simple tensor operations.
# Focus on requires_grad backward and grad attributes.

# !pip install torch torchvision torchaudio

# Import required standard libraries.
import random

# Import torch for tensor and autograd.
import torch

# Set deterministic random seeds.
random.seed(0)
torch.manual_seed(0)

# Select device based on availability.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print torch version and device used.
print("Torch version:", torch.__version__, "Device:", device)

# Create a simple parameter tensor.
weight = torch.tensor(2.0, requires_grad=True, device=device)

# Define a helper to run one forward backward step.
def run_step(x_value, label):
    # Ensure input is a scalar tensor.
    x_tensor = torch.tensor(x_value, device=device)
    # Compute simple prediction.
    prediction = weight * x_tensor
    # Compute squared error loss.
    loss = (prediction - label) ** 2
    # Run backward to compute gradient.
    loss.backward()
    # Return loss item and current gradient.
    return loss.item(), weight.grad.item()

# First mini batch with single example.
loss1, grad1 = run_step(1.0, torch.tensor(3.0, device=device))

# Print loss and gradient after first backward.
print("After first backward loss1=", round(loss1, 3))
print("Accumulated grad after first=", round(grad1, 3))

# Second mini batch without resetting gradients.
loss2, grad2 = run_step(2.0, torch.tensor(1.0, device=device))

# Print loss and gradient after second backward.
print("After second backward loss2=", round(loss2, 3))
print("Accumulated grad after second=", round(grad2, 3))

# Show that gradient is sum of both contributions.
print("Gradient is accumulated not replaced.")

# Copy gradient for later analysis reuse.
saved_grad = weight.grad.detach().clone()

# Reset gradient to start fresh.
weight.grad.zero_()

# Confirm gradient is now zero.
print("Gradient after reset=", weight.grad.item())

# Run another step with clean gradient.
loss3, grad3 = run_step(1.5, torch.tensor(0.5, device=device))

# Print new loss and gradient after reset.
print("After reset backward loss3=", round(loss3, 3))
print("New grad without accumulation=", round(grad3, 3))

# Reuse saved gradient for custom inspection.
print("Saved previous accumulated grad=", round(saved_grad.item(), 3))
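
In a training loop the reset is usually done through the optimizer rather than by calling `zero_()` on each tensor. The sketch below is illustrative only: the `(input, label)` pairs are made up. It resets at the start of every iteration, and it passes `set_to_none=True` explicitly so the result does not depend on the default of the installed PyTorch version. Note that this leaves `w.grad` as `None` rather than a zero tensor.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Hypothetical (input, label) pairs standing in for mini batches.
batches = [(1.0, 3.0), (2.0, 1.0), (1.5, 0.5)]

grads_seen = []
for x_value, label in batches:
    # Reset first so each batch starts from a clean gradient.
    optimizer.zero_grad(set_to_none=True)
    loss = (w * x_value - label) ** 2
    loss.backward()
    # Inspect or log the gradient before the update consumes it.
    grads_seen.append(w.grad.item())
    optimizer.step()

optimizer.zero_grad(set_to_none=True)
grad_after_reset = w.grad

print("per-batch gradients:", grads_seen)
print("w.grad after final reset:", grad_after_reset)
```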




## **3. Controlling Autograd Tracking**

### **3.1. No Grad Context**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_03_01.jpg?v=1769656289" width="250">



>* No grad context temporarily disables gradient tracking
>* Reduces overhead and prevents unintended computation graphs

>* Use no_grad during inference with trained models
>* Avoid gradient overhead, making serving faster and more predictable

>* Use no_grad for metrics and validation runs
>* Prevents useless graphs, saving memory and debugging time



In [None]:
#@title Python Code - No Grad Context

# This script demonstrates torch no grad contexts.
# We compare gradients with and without no grad contexts.
# Focus on inference efficiency and avoiding unwanted graphs.

# Install PyTorch if not already available in the environment.
# !pip install torch torchvision torchaudio --quiet

# Import torch for tensor and autograd operations.
import torch

# Set a deterministic seed for reproducible tensor values.
torch.manual_seed(0)

# Create a tensor with gradient tracking enabled.
weights = torch.randn(3, requires_grad=True)

# Create an input tensor that does not require gradients.
inputs = torch.tensor([1.0, 2.0, 3.0])

# Compute a simple prediction with gradient tracking on.
pred_train = (weights * inputs).sum()

# Run backward to compute gradients for training scenario.
pred_train.backward()

# Print gradients after normal training style computation.
print("Gradients after training step:", weights.grad)

# Store a copy of gradients for later comparison.
train_grad_copy = weights.grad.clone()

# Reset gradients to zero before next demonstration.
weights.grad.zero_()

# Use no grad context for inference style forward pass.
with torch.no_grad():
    pred_eval = (weights * inputs).sum()

# Check that no grad context did not change gradients.
print("Gradients after no_grad inference:", weights.grad)

# Verify that prediction values still make numerical sense.
print("Training prediction value:", float(pred_train))

# Show that inference prediction matches expected computation.
print("Inference prediction value:", float(pred_eval))

# Reuse the no_grad result outside the context; it carries no graph.
extra_loss = (pred_eval * 2.0) + 1.0

# Show that extra_loss does not require gradients here.
print("extra_loss requires_grad flag:", extra_loss.requires_grad)

# Create a new tensor that tracks gradients for comparison.
new_weights = torch.randn(3, requires_grad=True)

# Compute prediction without no grad to show graph creation.
new_pred = (new_weights * inputs).sum()

# Confirm that new_pred participates in autograd graph.
print("new_pred requires_grad flag:", new_pred.requires_grad)

# Final print summarizing why no grad is helpful.
print("no_grad keeps inference cheap and avoids unwanted graphs.")
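
For validation and metric code, `torch.no_grad` can also be applied as a decorator so that a whole helper runs untracked. The sketch below assumes a stand-in `torch.nn.Linear` model and random data. The helper name `validation_loss` is hypothetical.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a trained model and a validation batch.
model = torch.nn.Linear(4, 1)
features = torch.randn(16, 4)
targets = torch.randn(16, 1)


@torch.no_grad()
def validation_loss(net, inputs, labels):
    """Mean squared error computed without recording a graph."""
    return (net(inputs) - labels).pow(2).mean()


val_loss = validation_loss(model, features, targets)
train_loss = (model(features) - targets).pow(2).mean()

print("validation loss requires_grad:", val_loss.requires_grad)
print("validation loss grad_fn:", val_loss.grad_fn)
print("training loss requires_grad:", train_loss.requires_grad)
```

Both losses have the same value. Only the one computed outside the decorated helper carries a graph.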



### **3.2. Detach and Inplace Operations**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_03_02.jpg?v=1769656339" width="250">



>* Detach returns a tensor that autograd treats as a constant
>* Reused outputs stay constant and stop gradient flow

>* In-place ops can break autograd by overwriting values needed for backward
>* A detached tensor shares storage, so in-place edits also change the original

>* Detach plus inplace makes constants for gradients
>* Used for stop-gradient, but misuse blocks learning
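
The stop-gradient use of `detach` mentioned above can be illustrated with a product in which one factor is detached. This is a minimal sketch: the detached factor is treated as a constant, so the gradient of `x * x.detach()` should equal the value of `x` rather than `2x`.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# Fully tracked: d(x * x)/dx = 2x.
(x * x).backward()
full_grad = x.grad.item()

# Stop-gradient: the detached factor acts as a constant c,
# so d(x * c)/dx = c with c equal to the current value of x.
x.grad = None
(x * x.detach()).backward()
stopped_grad = x.grad.item()

print("gradient with full tracking:", full_grad)
print("gradient with one factor detached:", stopped_grad)
```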



In [None]:
#@title Python Code - Detach and Inplace Operations

# This script demonstrates detach and inplace operations.
# It focuses on PyTorch autograd gradient tracking control.
# Run cells sequentially to follow the explanations.

# !pip install torch torchvision torchaudio

# Import torch for tensor and autograd operations.
import torch

# Set a deterministic seed for reproducible values.
torch.manual_seed(0)

# Print PyTorch version in a single concise line.
print("PyTorch version:", torch.__version__)

# Create a tensor with gradient tracking enabled.
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Confirm the tensor shape is as expected.
assert x.shape == (2,), "Unexpected shape for x tensor."

# Build a simple computation that depends on x.
y = (x * 2.0).sum()

# Run backward to compute gradients for x.
y.backward()

# Print original tensor and its computed gradients.
print("x:", x)
print("x.grad after first backward:", x.grad)

# Detach x to create a non tracked view tensor.
x_detached = x.detach()

# Verify detached tensor shares the same underlying storage.
print("x_detached shares data with x:", x_detached.data_ptr() == x.data_ptr())

# Show that detached tensor does not require gradients.
print("x_detached requires_grad:", x_detached.requires_grad)

# Perform an inplace operation only on detached tensor.
x_detached.add_(1.0)

# Show both tensors now see updated underlying values.
print("x after inplace on detached:", x)
print("x_detached after inplace:", x_detached)

# Clear previous gradients before new backward pass.
x.grad.zero_()

# Build new computation using updated x values.
z = (x * 3.0).sum()

# Run backward again to compute fresh gradients.
z.backward()

# Print gradients to show they ignore detach history.
print("x.grad after second backward:", x.grad)

# Final print summarizing detach and inplace interaction.
print("Detach stops gradients but shares inplace updated data.")
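
The cell above shows an in-place edit that autograd tolerates. The opposite case, where an in-place operation overwrites a value that backward still needs, is sketched below. It assumes `exp`, which saves its output for the backward pass: modifying that output in place is expected to raise a `RuntimeError` during `backward()`, while the out-of-place version works.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# exp saves its output to compute the backward pass.
y = x.exp()

# Overwriting that output in place invalidates the saved value.
y.add_(1.0)

inplace_error = None
try:
    y.sum().backward()
except RuntimeError as err:
    inplace_error = err

# The out-of-place version leaves the saved output intact.
x2 = torch.tensor([1.0, 2.0], requires_grad=True)
y2 = x2.exp() + 1.0
y2.sum().backward()

print("in-place version raised:", type(inplace_error).__name__)
print("out-of-place gradient:", x2.grad)
```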



### **3.3. Memory and Speed**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_02/Lecture_A/image_03_03.jpg?v=1769656393" width="250">



>* Tracking every operation consumes memory and time
>* Unneeded tracking can bloat graphs and cause OOM

>* Disable gradients to shrink graphs and memory
>* Use for evaluation, inference, and feature extraction

>* Deep, long-lived graphs slow forward and backward
>* Cutting off tracking earlier saves memory, improves scalability



In [None]:
#@title Python Code - Memory and Speed

# This script compares autograd memory and speed.
# We simulate tracking and no tracking scenarios.
# Focus on torch.no_grad and detach usage.

# Uncomment the next line if torch is not installed.
# !pip install torch torchvision torchaudio

# Import required standard libraries.
import time
import random

# Import torch for tensor and autograd features.
import torch

# Set deterministic random seeds for reproducibility.
random.seed(0)
torch.manual_seed(0)

# Select device based on availability for fairness.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Print framework version and selected device.
print("PyTorch version:", torch.__version__, "Device:", device)

# Define a helper to create a random input batch.
def make_input(batch_size, features):
    # Create tensor with gradients enabled for training.
    x = torch.randn(batch_size, features, device=device, requires_grad=True)
    return x

# Define a simple linear model wrapping torch.nn.Linear.
class TinyLinear(torch.nn.Module):
    # Initialize with small input and output sizes.
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)

    # Forward pass applies linear transformation.
    def forward(self, x):
        return self.linear(x)


# Create a tiny model instance and move to device.
model = TinyLinear(in_features=64, out_features=64).to(device)

# Ensure model is in evaluation mode for fair comparison.
model.eval()

# Define a function that runs several forward passes.
def run_forwards(track_gradients, use_detach):
    # Choose small batch and steps to stay lightweight.
    batch_size, steps = 256, 40
    # Create base input tensor with gradients enabled.
    base_x = make_input(batch_size, 64)

    # Optionally detach base input from computation graph.
    if use_detach:
        base_x = base_x.detach()

    # Prepare list to hold summary scalars only.
    outputs = []

    # Optionally disable gradient tracking context.
    if track_gradients:
        context = torch.enable_grad()
    else:
        context = torch.no_grad()

    # Use context manager to control autograd tracking.
    with context:
        # Record start time for performance measurement.
        start = time.perf_counter()

        # Run several forward passes to build computation.
        for _ in range(steps):
            # Add tiny noise to avoid exact reuse.
            noise = torch.randn_like(base_x) * 0.001
            x = base_x + noise

            # Compute model output and simple scalar loss.
            y = model(x)
            loss = y.pow(2).mean()

            # Store detached scalar value only.
            outputs.append(loss.detach().item())

        # Measure elapsed time for all forward passes.
        elapsed = time.perf_counter() - start

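    # Report whether the last loss was attached to a graph.
    # This is a tracking flag, not a memory measurement. The model
    # parameters still require gradients, so detaching only the
    # input does not stop graph creation.
    requires_grad_any = loss.requires_grad

    # Return elapsed time, tracking flag, and output count.
    return elapsed, requires_grad_any, len(outputs)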

# Run scenario with full gradient tracking enabled.
tracked_time, tracked_grad, tracked_len = run_forwards(True, False)

# Run scenario with torch.no_grad disabling tracking.
no_grad_time, no_grad_grad, no_grad_len = run_forwards(False, False)

# Run scenario with detached input and tracking enabled.
detach_time, detach_grad, detach_len = run_forwards(True, True)

# Print concise comparison of the three scenarios.
print("Tracked: time=", round(tracked_time, 4), "s, grads=", tracked_grad)
print("no_grad: time=", round(no_grad_time, 4), "s, grads=", no_grad_grad)
print("detach : time=", round(detach_time, 4), "s, grads=", detach_grad)

# Show that we only kept small scalar outputs list.
print("Outputs stored per run:", tracked_len, no_grad_len, detach_len)
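
A common source of long-lived graphs is a running total that adds loss tensors instead of plain numbers. The sketch below is a hedged illustration with an invented `torch.nn.Linear` model and random data. It contrasts the two patterns: the tensor total keeps every step's graph reachable, while the float total lets each graph be freed once the step is done.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 1)

total_tracked = 0.0
total_plain = 0.0
for _ in range(5):
    loss = model(torch.randn(4, 8)).pow(2).mean()
    # Adding the tensor keeps every step's graph reachable from the total.
    total_tracked = total_tracked + loss
    # Adding a Python float lets each step's graph be released.
    total_plain += loss.item()

print("tracked total has grad_fn:", total_tracked.grad_fn is not None)
print("plain total type:", type(total_plain).__name__)
```

The two totals hold the same number. Only the tensor version grows a graph with each iteration, which is the kind of bloat the bullets above warn about.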




# <font color="#418FDE" size="6.5" uppercase>**Autograd Mechanics**</font>


In this lecture, you learned to:
- Describe how PyTorch builds dynamic computation graphs and uses them to compute gradients.
- Use `requires_grad`, `backward`, and the `grad` attribute to compute and inspect gradients for simple tensor operations.
- Control gradient tracking with `torch.no_grad` and `detach` to improve performance and avoid unintended graph creation.

In the next lecture (Lecture B), we will go over 'Modules and Layers'.