# Module 2 - Exercise 1: Autograd Exploration

## Learning Objectives
- Master PyTorch's automatic differentiation system
- Understand computational graphs and gradient flow
- Practice with multivariable gradients and the chain rule
- Explore gradient context management
- Implement higher-order derivatives

## Prerequisites
- Completion of Module 1 exercises
- Understanding of derivatives from calculus
- Familiarity with the chain rule

## Setup and Test Repository

First, let's clone the test repository and set up our environment for step-by-step validation.

In [None]:
# Clone the test repository
!git clone https://github.com/racousin/data_science_practice.git /tmp/tests 2>/dev/null || true
!cd /tmp/tests && pwd && ls -la tests/python_deep_learning/module2/

# Import the test module
import sys
sys.path.append('/tmp/tests')
print("Test repository setup complete!")

## Environment Setup

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

# Print PyTorch version
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Import test functions
from tests.python_deep_learning.module2.test_exercise1 import *

## Section 1: Basic Autograd Operations

Learn the fundamentals of automatic differentiation with simple scalar functions.
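
As a warm-up before the TODO cells, here is a minimal sketch of the same workflow on a different function, f(t) = t^3 at t = 1.5. The name `t` and the function are illustrative choices, not part of the exercise or its tests.

```python
import torch

# Illustrative input; `t` is a made-up name that avoids clashing with the exercise's `x`
t = torch.tensor(1.5, requires_grad=True)

# Forward pass: autograd records the operations applied to t
f = t**3

# Backward pass: fills t.grad with df/dt evaluated at t = 1.5
f.backward()

print(f"f(t)  = {f.item():.4f}")       # 1.5^3 = 3.375
print(f"df/dt = {t.grad.item():.4f}")  # 3 * 1.5^2 = 6.75
```

The result `f` has `requires_grad=True` because it was computed from a tensor that tracks gradients.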

In [None]:
# TODO: Create a tensor that requires gradients and compute a simple function
# Create x = 2.0 with requires_grad=True
x = None

# TODO: Compute y = x^2 + 3*x + 1
y = None

print(f"x = {x}")
print(f"y = {y}")
print(f"x.requires_grad: {x.requires_grad if x is not None else 'None'}")
print(f"y.requires_grad: {y.requires_grad if y is not None else 'None'}")

In [None]:
# TODO: Compute gradients using backward()
# Call y.backward() to compute gradients

print(f"dy/dx = {x.grad}")
print(f"Expected: dy/dx = 2x + 3 = 2*2 + 3 = 7")

In [None]:
# Test your basic autograd implementation
try:
    test_basic_autograd_functions(locals())
    print("✅ Section 1: Basic Autograd - All tests passed!")
except Exception as e:
    print(f"❌ Section 1: Basic Autograd - Tests failed: {e}")
    print("Please complete the basic autograd tasks above before proceeding.")

## Section 2: Multivariable Gradients

Explore gradients with functions of multiple variables.
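
A single `backward()` call fills in the partial derivative for every leaf tensor that took part in the computation. The sketch below shows this on an illustrative function g(a, b) = a * b^2, which is not the function used in the exercise; the names `a`, `b`, and `g` are assumptions made for this example.

```python
import torch

a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)

# g depends on both a and b
g = a * b**2

# One backward pass computes both partial derivatives
g.backward()

print(f"dg/da = {a.grad.item()}")  # b^2 = 4
print(f"dg/db = {b.grad.item()}")  # 2*a*b = 12
```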

In [None]:
# TODO: Create two variables with gradients enabled
x1 = None  # Create tensor with value 1.0, requires_grad=True
x2 = None  # Create tensor with value 2.0, requires_grad=True

# TODO: Compute z = x1^2 + x2^3 + x1*x2
z = None

print(f"x1 = {x1}, x2 = {x2}")
print(f"z = {z}")

In [None]:
# TODO: Compute gradients for multivariable function
# Call z.backward() to compute partial derivatives

print(f"∂z/∂x1 = {x1.grad}")
print(f"∂z/∂x2 = {x2.grad}")
print(f"Expected: ∂z/∂x1 = 2*x1 + x2 = 2*1 + 2 = 4")
print(f"Expected: ∂z/∂x2 = 3*x2^2 + x1 = 3*4 + 1 = 13")

In [None]:
# Test multivariable gradients
try:
    test_multivariable_gradients_functions(locals())
    print("✅ Section 2: Multivariable Gradients - All tests passed!")
except Exception as e:
    print(f"❌ Section 2: Multivariable Gradients - Tests failed: {e}")
    print("Please complete the multivariable gradient tasks above before proceeding.")

## Section 3: Vector and Matrix Gradients

Work with gradients of vector and matrix operations.
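
`backward()` without arguments only works on a scalar output. For a non-scalar output, an upstream gradient of the same shape has to be passed in. The following sketch illustrates both cases on made-up tensors `v` and `w` (not the exercise's `vec_x`).

```python
import torch

v = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)
w = torch.tensor([0.5, 0.5, 2.0])  # constant weights, no gradient tracking

# Scalar output: backward() needs no arguments
loss = torch.dot(w, v)
loss.backward()
print(v.grad)  # the gradient of w·v with respect to v is w

# Non-scalar output: backward() needs an explicit upstream gradient
v.grad = None                                # discard the accumulated gradient
out = v**2                                   # shape (3,), not a scalar
out.backward(gradient=torch.ones_like(out))  # same result as out.sum().backward()
print(v.grad)  # 2*v
```

Passing a vector of ones weights every output element equally, which is why it matches summing the output first.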

In [None]:
# TODO: Create a vector with gradients and compute a scalar loss
vec_x = None  # Create [1.0, 2.0, 3.0] with requires_grad=True

# TODO: Compute vec_loss = sum of squares
vec_loss = None  # torch.sum(vec_x**2)

print(f"vec_x = {vec_x}")
print(f"vec_loss = {vec_loss}")

In [None]:
# TODO: Compute gradients for vector function
# Call vec_loss.backward()

print(f"∇vec_loss = {vec_x.grad}")
print(f"Expected: gradient should be 2*vec_x = [2, 4, 6]")

In [None]:
# Test vector gradients
try:
    test_vector_gradients_functions(locals())
    print("✅ Section 3a: Vector Gradients - All tests passed!")
except Exception as e:
    print(f"❌ Section 3a: Vector Gradients - Tests failed: {e}")

In [None]:
# TODO: Create a matrix and compute gradients
mat_A = None  # Create 2x2 matrix [[1, 2], [3, 4]] with requires_grad=True

# TODO: Compute mat_loss = sum of squares of all elements
mat_loss = None

print(f"mat_A = \n{mat_A}")
print(f"mat_loss = {mat_loss}")

In [None]:
# TODO: Compute matrix gradients
# Call mat_loss.backward()

print(f"∇mat_A = \n{mat_A.grad}")
print(f"Expected: gradient should be 2*mat_A = [[2, 4], [6, 8]]")

In [None]:
# Test matrix gradients
try:
    test_matrix_gradients_functions(locals())
    print("✅ Section 3b: Matrix Gradients - All tests passed!")
except Exception as e:
    print(f"❌ Section 3b: Matrix Gradients - Tests failed: {e}")

## Section 4: Computational Graph and Chain Rule

Understand how PyTorch builds and traverses computational graphs.
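
The graph can be inspected directly: leaf tensors have no `grad_fn`, while every intermediate result stores the operation that produced it. By default only leaves keep their `.grad`; `retain_grad()` asks autograd to keep it for an intermediate tensor as well. A small sketch, using illustrative names `p`, `q`, `r` that do not appear in the exercise:

```python
import torch

p = torch.tensor(1.0, requires_grad=True)  # leaf tensor created by the user
q = torch.exp(p)                           # intermediate node
r = q * q                                  # output node

print(p.is_leaf, p.grad_fn)  # leaf: grad_fn is None
print(q.is_leaf, q.grad_fn)  # non-leaf: grad_fn records the exp operation

q.retain_grad()  # keep the gradient of this intermediate after backward()
r.backward()

print(f"dr/dq = {q.grad.item():.4f}")  # 2*q
print(f"dr/dp = {p.grad.item():.4f}")  # dr/dq * dq/dp = 2*q * q
```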

In [None]:
# TODO: Build a computational graph step by step
graph_x = None  # Create tensor 2.0 with requires_grad=True

# TODO: Build computation step by step
graph_y = None  # graph_x**2
graph_z = None  # 3*graph_y + 1
graph_w = None  # graph_z**2

print(f"x = {graph_x}")
print(f"y = x^2 = {graph_y}")
print(f"z = 3y + 1 = {graph_z}")
print(f"w = z^2 = {graph_w}")

In [None]:
# TODO: Compute gradients through the computational graph
# Call graph_w.backward()

print(f"dw/dx = {graph_x.grad}")
print(f"Chain rule: dw/dx = dw/dz * dz/dy * dy/dx")
print(f"dw/dz = 2*z = 2*13 = 26")
print(f"dz/dy = 3")
print(f"dy/dx = 2*x = 2*2 = 4")
print(f"Therefore: dw/dx = 26 * 3 * 4 = 312")

In [None]:
# Test computational graph understanding
try:
    test_computational_graph_functions(locals())
    print("✅ Section 4: Computational Graph - All tests passed!")
except Exception as e:
    print(f"❌ Section 4: Computational Graph - Tests failed: {e}")

## Section 5: Gradient Context Management

Learn to control when gradients are computed and stored.
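
The two tools act at different points. `torch.no_grad()` stops graph construction for everything computed inside the block, whereas `.detach()` cuts one existing tensor out of the graph so it is treated as a constant. The sketch below contrasts them; the names `s`, `tracked`, `frozen`, and `mixed` are illustrative.

```python
import torch

s = torch.tensor(2.0, requires_grad=True)

tracked = s * 3
with torch.no_grad():
    untracked = s * 3  # same value, but no graph is recorded

print(tracked.requires_grad, tracked.grad_fn is not None)  # True True
print(untracked.requires_grad, untracked.grad_fn is None)  # False True

# detach() gives a tensor with the same value that autograd treats as a constant
frozen = tracked.detach()
mixed = tracked * frozen  # the gradient flows only through `tracked`
mixed.backward()

print(s.grad)  # frozen * d(tracked)/ds = 6 * 3 = 18
```

Without the `detach()`, `mixed` would equal 9 * s^2 and the gradient would be 18 * s = 36 instead.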

In [None]:
# TODO: Use torch.no_grad() context
x = torch.tensor(3.0, requires_grad=True)

# TODO: Compute operation within no_grad context
with torch.no_grad():
    no_grad_result = None  # x**2 + 2*x

print(f"no_grad_result = {no_grad_result}")
print(f"requires_grad: {no_grad_result.requires_grad}")

In [None]:
# TODO: Use detach() to remove tensor from computational graph
y = x**3 + x
detached_result = None  # y.detach()

print(f"Original y requires_grad: {y.requires_grad}")
print(f"Detached result requires_grad: {detached_result.requires_grad}")
print(f"Values are equal: {torch.equal(y, detached_result)}")

In [None]:
# Test gradient context management
try:
    test_grad_context_functions(locals())
    print("✅ Section 5: Gradient Context - All tests passed!")
except Exception as e:
    print(f"❌ Section 5: Gradient Context - Tests failed: {e}")

## Section 6: Higher-Order Derivatives

Compute second derivatives and higher-order gradients.
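
An alternative to calling `backward()` twice is `torch.autograd.grad`, which returns the gradients instead of accumulating them into `.grad`, so nothing needs to be zeroed in between. The sketch below applies it to an illustrative function h(u) = u^3 at u = 3, chosen to differ from the exercise's function.

```python
import torch

u = torch.tensor(3.0, requires_grad=True)
h = u**3

# create_graph=True records the graph of the derivative so it can be differentiated again
(dh_du,) = torch.autograd.grad(h, u, create_graph=True)
(d2h_du2,) = torch.autograd.grad(dh_du, u)

print(f"h'(u)  = {dh_du.item():.1f}")    # 3*u^2 = 27
print(f"h''(u) = {d2h_du2.item():.1f}")  # 6*u = 18
print(u.grad)  # None: autograd.grad does not write to .grad
```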

In [None]:
# TODO: Compute the first and second derivatives of f(x) = x^4
x = torch.tensor(2.0, requires_grad=True)

# TODO: Define function f(x) = x^4
y = None  # x**4

# TODO: Compute first derivative
# y.backward(create_graph=True)  # create_graph=True allows computing gradients of gradients
first_derivative = x.grad.clone()

print(f"f(x) = x^4, x = {x}")
print(f"f'(x) = 4x^3 = {first_derivative}")

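# TODO: Compute second derivative by backpropagating through the first derivative
x.grad.zero_()  # Clear the first derivative; otherwise the second derivative is added on top of it
# first_derivative.backward()
second_derivative = None  # x.grad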

print(f"f''(x) = 12x^2 = {second_derivative}")
print(f"Expected: f''(2) = 12*4 = 48")

In [None]:
# Test higher-order gradients
try:
    test_higher_order_gradients_functions(locals())
    print("✅ Section 6: Higher-Order Derivatives - All tests passed!")
except Exception as e:
    print(f"❌ Section 6: Higher-Order Derivatives - Tests failed: {e}")

## Section 7: Gradient Flow Visualization

Visualize how gradients flow through computational graphs.
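
The function plotted in this section is f(x) = log(exp(sin(x²)) + 1). Applying the chain rule by hand gives f'(x) = 2x · cos(x²) · sigmoid(sin(x²)), because the derivative of log(e^b + 1) with respect to b is sigmoid(b). The sketch below checks autograd against that formula; the helper names `f` and `f_prime_analytic` are made up for this example. It evaluates all points in one vectorized pass rather than a Python loop.

```python
import torch

def f(x):
    # Same composition as in this section: log(exp(sin(x^2)) + 1)
    return torch.log(torch.exp(torch.sin(x**2)) + 1)

def f_prime_analytic(x):
    # Chain rule by hand: 2x * cos(x^2) * sigmoid(sin(x^2))
    return 2 * x * torch.cos(x**2) * torch.sigmoid(torch.sin(x**2))

xs = torch.linspace(-2, 2, 9, requires_grad=True)

# Each output depends only on its own input, so the gradient of the sum
# holds the per-element derivatives
f(xs).sum().backward()

max_err = (xs.grad - f_prime_analytic(xs.detach())).abs().max().item()
print(f"max |autograd - analytic| = {max_err:.2e}")
```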

In [None]:
# Create a more complex computational graph for visualization
def create_complex_function(x):
    """Create a complex function for gradient flow analysis"""
    a = x**2
    b = torch.sin(a)
    c = torch.exp(b)
    d = torch.log(c + 1)
    return d

# Test with different input values
x_values = torch.linspace(-2, 2, 100)
gradients = []

for x_val in x_values:
    x = torch.tensor(x_val.item(), requires_grad=True)
    y = create_complex_function(x)
    y.backward()
    gradients.append(x.grad.item())

# Plot function and its gradient
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
with torch.no_grad():
    y_values = [create_complex_function(x).item() for x in x_values]
plt.plot(x_values.numpy(), y_values)
plt.title('Function: log(exp(sin(x²)) + 1)')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(x_values.numpy(), gradients)
plt.title("Function's Gradient")
plt.xlabel('x')
plt.ylabel("f'(x)")
plt.grid(True)

plt.tight_layout()
plt.show()

print("The gradient plot shows how the derivative changes across the input domain.")
print("Notice how the gradient reflects the slope of the original function.")

## Section 8: Practical Applications

Apply autograd to a practical task: minimizing a function with gradient descent, the same update rule that underlies neural network training.
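
In practice, the manual update step is usually delegated to an optimizer. The sketch below uses `torch.optim.SGD` on an illustrative objective (θ + 1)², which is not the function minimized in this section. The parameter name `theta`, the learning rate, and the step count are assumptions chosen for the example.

```python
import torch

theta = torch.tensor(5.0, requires_grad=True)  # illustrative starting point
optimizer = torch.optim.SGD([theta], lr=0.1)

for _ in range(100):
    optimizer.zero_grad()    # clear the gradient left over from the previous step
    loss = (theta + 1)**2    # minimum at theta = -1
    loss.backward()          # compute d(loss)/d(theta)
    optimizer.step()         # theta <- theta - lr * theta.grad

print(f"theta after 100 steps: {theta.item():.4f}")
```

`optimizer.zero_grad()` and `optimizer.step()` take the place of zeroing `.grad` by hand and updating the parameter inside `torch.no_grad()`.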

In [None]:
# Simple optimization using gradients
def quadratic_function(x):
    """A simple quadratic function to minimize: f(x) = (x-3)^2 + 1"""
    return (x - 3)**2 + 1

# Initialize parameter
x = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1
num_steps = 50

# Track optimization progress
x_history = []
loss_history = []

for step in range(num_steps):
    # Forward pass
    loss = quadratic_function(x)
    
    # Record history
    x_history.append(x.item())
    loss_history.append(loss.item())
    
    # Backward pass
    if x.grad is not None:
        x.grad.zero_()
    loss.backward()
    
    # Update parameter
    with torch.no_grad():
        x -= learning_rate * x.grad

print(f"Initial x: {x_history[0]:.4f}")
print(f"Final x: {x.item():.4f}")  # x after the last update (x_history[-1] is one step behind)
print(f"Target x: 3.0")
print(f"Initial loss: {loss_history[0]:.4f}")
print(f"Final loss: {quadratic_function(x).item():.4f}")

# Plot optimization progress
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(x_history)
plt.axhline(y=3, color='r', linestyle='--', label='Target (x=3)')
plt.title('Parameter Convergence')
plt.xlabel('Step')
plt.ylabel('x value')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(loss_history)
plt.title('Loss Decrease')
plt.xlabel('Step')
plt.ylabel('Loss')
plt.grid(True)

plt.tight_layout()
plt.show()

## Final Validation

Run the complete test suite to validate all your solutions.

In [None]:
# Run complete validation
print("Running complete test suite...\n")

all_tests_passed = True
test_sections = [
    ("Basic Autograd", test_basic_autograd_functions),
    ("Multivariable Gradients", test_multivariable_gradients_functions), 
    ("Vector Gradients", test_vector_gradients_functions),
    ("Matrix Gradients", test_matrix_gradients_functions),
    ("Computational Graph", test_computational_graph_functions),
    ("Gradient Context", test_grad_context_functions),
    ("Higher-Order Derivatives", test_higher_order_gradients_functions)
]

for section_name, test_func in test_sections:
    try:
        test_func(locals())
        print(f"✅ {section_name} - PASSED")
    except Exception as e:
        print(f"❌ {section_name} - FAILED: {e}")
        all_tests_passed = False

print("\n" + "="*50)
if all_tests_passed:
    print("🎉 ALL TESTS PASSED! You have successfully mastered PyTorch autograd!")
    print("You are now ready to proceed to Exercise 2: Gradient Analysis.")
else:
    print("❌ Some tests failed. Please review the failed sections and complete the missing implementations.")
print("="*50)

## Summary

In this exercise, you have learned:

1. **Basic Autograd**: How PyTorch automatically computes gradients for scalar functions
2. **Multivariable Functions**: Computing partial derivatives for functions of multiple variables
3. **Vector & Matrix Operations**: Gradients for vector and matrix computations
4. **Computational Graphs**: Understanding how PyTorch builds and traverses computation graphs
5. **Context Management**: Controlling gradient computation with `torch.no_grad()` and `.detach()`
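6. **Higher-Order Derivatives**: Computing second derivatives and beyond
7. **Gradient Flow Visualization**: Plotting a composite function alongside its autograd-computed derivative
8. **Practical Applications**: Using autograd for optimization problems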

### Key Concepts Mastered:

- **Gradient Computation**: Using `.backward()` to compute gradients automatically
- **Chain Rule**: How PyTorch applies the chain rule through computational graphs
- **Memory Management**: When to use `torch.no_grad()` and `.detach()` to skip graph construction and save memory
- **Graph Construction**: Understanding when and how computational graphs are built
- **Gradient Accumulation**: How gradients accumulate and when to zero them

These fundamentals are essential for understanding how neural networks learn through backpropagation and how optimization algorithms use gradients to update parameters.
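
Gradient accumulation, listed among the key concepts above, is easy to observe directly. In this minimal sketch (the tensor name `k` is illustrative), each `backward()` call adds to the stored gradient until it is zeroed.

```python
import torch

k = torch.tensor(1.0, requires_grad=True)

(2 * k).backward()
print(k.grad)  # tensor(2.)

(2 * k).backward()
print(k.grad)  # tensor(4.): the second backward pass was added to the first

k.grad.zero_()  # reset before the next backward pass
(2 * k).backward()
print(k.grad)  # tensor(2.)
```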