# Discussion Week 6

In this discussion we review derivatives and integrals, focusing on their computational aspects.

You can use the Shared Computing Cluster (SCC) or Google Colab to run this notebook.

The general instructions for running on the SCC are available under General Resources on [Piazza](https://piazza.com/bu/fall2025/ds722/resources).

## Problem 1: Automatic Differentiation with PyTorch

In this exercise, you'll explore how PyTorch builds computational graphs and computes gradients automatically using `autograd`.

### Objectives

- Define scalar and vector functions using PyTorch tensors.
- Use `.backward()` to compute gradients.
- Visualize the computational graph and understand gradient propagation.

### Step 1: Import PyTorch

In [None]:
import torch

### Step 2: Scalar Function

Below we illustrate how to use PyTorch autograd to compute $f(2)$ and $f^{\prime}(2)$ for $f(x)=x^{2} + 3x + 2$.

In [None]:
# Create a tensor with requires_grad=True to track computation
x = torch.tensor(2.0, requires_grad=True)

# Define the function
f = x**2 + 3*x + 2

# Backpropagate through the graph; this stores df/dx in x.grad
f.backward()

# Print x, f(x), and the gradient df/dx
print("x =", x.item())
print("f(x) =", f.item())
print("df/dx =", x.grad.item())
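
A quick sanity check on the result above: the sketch below compares the autograd value with the analytic derivative $f^{\prime}(x)=2x+3$ and with a central finite difference. The names `poly` and `x_check` and the step size `h` are illustrative choices, not part of the notebook.

```python
import torch

def poly(t):
    # Same polynomial as in Step 2: f(t) = t^2 + 3t + 2
    return t**2 + 3*t + 2

# float64 keeps the finite-difference rounding error small
x_check = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
poly(x_check).backward()
autograd_value = x_check.grad.item()

# Analytic derivative f'(t) = 2t + 3 evaluated at t = 2
analytic_value = 2 * 2.0 + 3

# Central difference (f(t + h) - f(t - h)) / (2h); h is an illustrative choice
h = 1e-5
with torch.no_grad():
    fd_value = ((poly(x_check + h) - poly(x_check - h)) / (2 * h)).item()

print("autograd:", autograd_value)
print("analytic:", analytic_value)
print("finite difference:", fd_value)
```

The three numbers should agree up to floating-point rounding in the finite-difference estimate.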

### Step 3: Computational Graph

Each operation in $f(x)$ on the tensor $x=2$ creates a node in the computational graph:

- $c = 3\cdot x$ (Multiplication)
- $b = x^{2}$ (Power)
- $a = b + c$ (Add)
- $f = a + 2$ (Add)

Using the `grad_fn` and `next_functions` attributes, we can inspect the PyTorch data structures that store and compute the partial derivatives used to form the gradient.

The graph gives an efficient way to compute $\nabla f(x)$ via the chain rule, by first computing

1. $df/da$
1. $da/db$ and $da/dc$
1. $db/dx$ and $dc/dx$

and then combining them: $\nabla f(x) = (df/da)(da/db)(db/dx) + (df/da)(da/dc)(dc/dx)$.
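
The decomposition above can be reproduced numerically. The sketch below is illustrative only: it rebuilds the intermediate tensors under the hypothetical names `x_in` and `f_out` (so it does not overwrite `x` and `f` from Step 2) and uses `torch.autograd.grad` to obtain each local derivative before multiplying them together.

```python
import torch

x_in = torch.tensor(2.0, requires_grad=True)

# Intermediate nodes, named as in the list above
c = 3 * x_in
b = x_in**2
a = b + c
f_out = a + 2

# Local derivatives; retain_graph=True keeps the graph for the later calls
(df_da,) = torch.autograd.grad(f_out, a, retain_graph=True)
da_db, da_dc = torch.autograd.grad(a, (b, c), retain_graph=True)
(db_dx,) = torch.autograd.grad(b, x_in, retain_graph=True)
(dc_dx,) = torch.autograd.grad(c, x_in, retain_graph=True)

# Chain rule: sum over the two paths from f to x
chain = df_da * da_db * db_dx + df_da * da_dc * dc_dx

# Gradient computed by autograd in a single pass, for comparison
(direct,) = torch.autograd.grad(f_out, x_in)

print("chain rule:", chain.item())
print("autograd:  ", direct.item())
```

Both values should match $f^{\prime}(2) = 2\cdot 2 + 3$.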


Execute the following cell to observe this behavior.

In [None]:
# AddBackward node for the final addition f = a + 2
print("f.grad_fn:", f.grad_fn)
# Inputs of that node: the AddBackward node for a = b + c, and None for the constant 2
print("f.grad_fn.next_functions:", f.grad_fn.next_functions)
# Inputs of a = b + c: PowBackward for b = x**2 and MulBackward for c = 3*x
print("a node next_functions:", f.grad_fn.next_functions[0][0].next_functions)
# Input of b = x**2: AccumulateGrad, a special node that stores the computed gradient in x.grad when backward is called
# It marks the leaf node corresponding to the tensor x
print("b node next_functions:", f.grad_fn.next_functions[0][0].next_functions[0][0].next_functions)
# Inputs of c = 3*x: the AccumulateGrad node for x, and None for the constant 3
print("c node next_functions:", f.grad_fn.next_functions[0][0].next_functions[1][0].next_functions)
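
Chaining `next_functions` by hand gets unwieldy for larger graphs. One possible alternative, sketched below, is a small recursive helper that walks the graph from any `grad_fn`. The helper `graph_lines` and the tensors `t` and `z` are hypothetical names introduced here, not PyTorch APIs.

```python
import torch

def graph_lines(node, depth=0):
    """Return indented node names for the backward graph rooted at `node`."""
    if node is None:
        # None marks an input that needs no gradient, e.g. a Python constant
        return ["  " * depth + "None"]
    lines = ["  " * depth + type(node).__name__]
    # Each entry of next_functions is a (node, input_index) pair
    for child, _ in node.next_functions:
        lines.extend(graph_lines(child, depth + 1))
    return lines

t = torch.tensor(2.0, requires_grad=True)
z = t**2 + 3*t + 2
graph = graph_lines(z.grad_fn)
print("\n".join(graph))
```

The indentation reflects how far each node is from the output, which is the order in which gradients are propagated during `.backward()`.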

### Step 4: Vector Function

Write code similar to Step 2 for the function $f(x,y)=x^{2}y + \sin{y}$ of the vector input $(x,y)$.
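
As a pattern to adapt, here is a minimal sketch for a different function, $g(u,v)=uv+e^{v}$, chosen only for illustration; the names `u`, `v`, and `g` are not from the notebook. Each input gets its own tensor with `requires_grad=True`, and a single call to `.backward()` fills in both partial derivatives.

```python
import torch

# One leaf tensor per input variable
u = torch.tensor(1.0, requires_grad=True)
v = torch.tensor(0.0, requires_grad=True)

# Illustrative function g(u, v) = u*v + exp(v)
g = u * v + torch.exp(v)
g.backward()

print("g(u, v) =", g.item())
print("dg/du =", u.grad.item())  # analytically: v
print("dg/dv =", v.grad.item())  # analytically: u + exp(v)
```

The gradient $\nabla g$ is the vector of the two printed partial derivatives.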

In [None]:
#TODO

### Step 5: Computational Graph

Write code similar to Step 3 for the function $f(x,y)=x^{2}y + \sin{y}$.

Is it becoming clearer how PyTorch computes the gradients of functions and how autograd works?

In [None]:
#TODO

## Problem 2: Differentiating Tensor Expressions with PyTorch

Recall from class that we computed the derivative $\frac{dK}{dR}$ of $K=R^{T}R$, where $R\in\mathbb{R}^{m\times n}$.

The derivative $\frac{dK}{dR}$ is a fourth-order tensor of size $n\times n\times m\times n$, given by the formula

$$
\frac{\partial K_{pq}}{\partial R_{ij}} =
\begin{cases}
R_{iq} & \text{if}~j=p,~p\neq q \\
R_{ip} & \text{if}~j=q,~p\neq q \\
2R_{ip} & \text{if}~j=p=q \\
0 & \text{otherwise}
\end{cases},
$$

where $p,q,j=1,\ldots,n$ and $i=1,\ldots,m$.
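
For reference, the formula follows from writing the product entrywise and differentiating term by term, using the Kronecker delta $\delta_{ab}$ (equal to 1 when $a=b$ and 0 otherwise):

$$
K_{pq} = \sum_{k=1}^{m} R_{kp}R_{kq}
\quad\Longrightarrow\quad
\frac{\partial K_{pq}}{\partial R_{ij}}
= \sum_{k=1}^{m}\left(\delta_{ki}\delta_{pj}R_{kq} + R_{kp}\delta_{ki}\delta_{qj}\right)
= \delta_{jp}R_{iq} + \delta_{jq}R_{ip}.
$$

When $p\neq q$, at most one of the two terms is nonzero. When $j=p=q$, both terms contribute and the entry is $2R_{ip}$, as expected from differentiating $R_{ip}^{2}$.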

### Step 1

Complete the code cell below, which takes as input a random PyTorch tensor $R\in\mathbb{R}^{4\times 3}$, so that it uses the above formula to build the 4-D tensor $\frac{\partial K_{pq}}{\partial R_{ij}}$.

In [None]:
import torch

# Dimensions
m, n = 4, 3
R = torch.randn(m, n, requires_grad=True)
#TODO

### Step 2

Define a function `compute_K(R)` that computes the matrix product $R^{T}R$. Then use `torch.autograd.functional.jacobian(compute_K, R)` to obtain the tensor $\frac{\partial K_{pq}}{\partial R_{ij}}$, and compare it with what you computed manually.
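
To see the shape convention of `torch.autograd.functional.jacobian` before applying it to $R^{T}R$, here is a small sketch on an unrelated function; `column_sums` and `M` are illustrative names. The returned tensor has the output shape followed by the input shape, so for $K=R^{T}R$ one should expect size $n\times n\times m\times n$.

```python
import torch
from torch.autograd.functional import jacobian

def column_sums(M):
    # Maps a 2x3 matrix to the length-3 vector of its column sums
    return M.sum(dim=0)

M = torch.randn(2, 3)
J = jacobian(column_sums, M)

# Output shape (3,) followed by input shape (2, 3)
print(J.shape)
# J[p, i, j] = d(column_sums[p]) / d(M[i, j]), which is 1 if j == p and 0 otherwise
print(J[0])
```

Indexing the result as `J[p, i, j]` mirrors the index order $p,q,i,j$ used for $\frac{\partial K_{pq}}{\partial R_{ij}}$.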

In [None]:
#TODO

### Step 3

Let $f(x) = xx^{T}$ for $x\in\mathbb{R}^{n}$. Compute the derivative $\frac{df}{dx}$ by hand, then complete the code cell below to evaluate it for a random 1-D torch tensor with $n=5$.

In [None]:
# TODO

### Step 4

Define a function `compute_f(x)` that computes the outer product $xx^{T}$. Then use `torch.autograd.functional.jacobian(compute_f, x)` to obtain the tensor $\frac{\partial f_{ij}}{\partial x_{k}}$, and compare it with what you computed manually.

In [None]:
# TODO