# 📘 Step 1: Differentiation in PyTorch

In this module, we’ll learn how **PyTorch performs differentiation** —  
a core concept used to train neural networks by adjusting their parameters automatically.  

We’ll start with **simple derivatives**, and later extend to **partial derivatives** for multivariable functions.  

---

## 🔹 1️⃣ What is a Derivative?

A **derivative** measures how quickly a function's output changes as its input changes.

Consider a quadratic function:

$$
y = x^2
$$

Evaluating this function at $x = 2$ gives:

$$
y = 2^2 = 4
$$

---

## 🔹 2️⃣ Finding the Derivative — Step by Step

According to the rules of calculus:

$$
\frac{dy}{dx} = 2x
$$

Here’s how:
- Bring the **exponent** of $x$ (which is 2) down in front as a coefficient  
- Reduce the exponent by 1, leaving $x^{2-1} = x^1 = x$

Thus:

$$
\color{#00b300}{\frac{dy}{dx} = 2x}
$$

Evaluating at $x = 2$:

$$
\color{#1f77b4}{\frac{dy}{dx}\Big|_{x=2}} = 2 \times 2 = 4
$$

📘 **Interpretation:**  
At $x = 2$, the function is increasing at a rate of **4 units per unit change in $x$**.
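
As a quick sanity check of that value, the slope can also be approximated numerically with a central finite difference. This is a minimal sketch in plain Python; the helper name `central_difference` and the step size `h` are illustrative choices, not part of the original material.

```python
def f(x):
    """The function from the text: y = x^2."""
    return x ** 2


def central_difference(func, x, h=1e-5):
    """Approximate func'(x) with a symmetric difference quotient (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)


slope = central_difference(f, 2.0)
print(f"numerical slope at x=2: {slope:.6f}")   # should be very close to the analytical value 2x = 4
```

Shrinking `h` too far eventually lets floating-point rounding dominate the result, which is one reason automatic differentiation is usually preferred over finite differences.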

---

## 🔹 3️⃣ Computing the Derivative in PyTorch (Concept + Intuition)

PyTorch’s **Autograd** engine performs **automatic differentiation** — it builds a **computational graph** that records every operation on tensors that have `requires_grad=True`.

This allows PyTorch to compute derivatives automatically using the **chain rule**, so you never have to derive them by hand.

---

### ⚙️ Step 1 — Create a Tensor with Gradient Tracking

When you create a tensor:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
```

PyTorch starts **tracking all operations** performed on `x`.  
This means that when you later compute a scalar function of `x`, PyTorch can automatically differentiate it.

The flag `requires_grad=True` tells PyTorch:

> “Keep track of all operations on this tensor because I will need its derivatives later.”

---

### 🧩 Internal View of a Tracked Tensor

| Attribute | Meaning |
|------------|----------|
| `requires_grad=True` | Enables gradient tracking |
| `.grad_fn` | Refers to the **function** that created this tensor (e.g., `PowBackward0`); `None` for leaf tensors |
| `.is_leaf` | `True` if the tensor was created by the user rather than produced by an operation |
| `.grad` | Stores the computed gradient after `.backward()`; `None` until then |
| `.data` | The raw tensor value (without gradient info); `.detach()` is the recommended way to get it today |
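
The attributes in the table can be inspected directly. The snippet below is a small sketch that assumes a scalar input `x` and the squared output `y` used throughout this module; it only reads attributes and does not run a backward pass.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)   # leaf tensor created by the user
y = x ** 2                                  # produced by an operation, so not a leaf

print("x.requires_grad:", x.requires_grad)  # gradient tracking is on for x
print("x.is_leaf:", x.is_leaf, "| x.grad_fn:", x.grad_fn)   # leaves have no grad_fn
print("y.is_leaf:", y.is_leaf, "| y.grad_fn:", y.grad_fn)   # y remembers the op that made it
print("x.grad:", x.grad)                    # None until backward() has been called
print("x.detach():", x.detach())            # same value, without gradient history
```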

---

### ⚙️ Step 2 — Define a Function $y = f(x)$

Suppose we define:

```python
y = x ** 2
```

This operation builds a **computation graph** internally:

$$
\color{#1f77b4}{x}
\;\xrightarrow{\text{square}}\;
\color{#ff7f0e}{y = x^2}
$$

Each node in this graph knows:
- what operation produced it, and  
- how to compute its derivative for the backward pass.

PyTorch stores this information — but it hasn’t calculated any gradient yet.
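
To see that the graph exists before any gradient does, the recorded backward node can be examined right after the forward pass. This is a hedged sketch: the printed class names (such as `PowBackward0` or `AccumulateGrad`) are implementation details and may differ between PyTorch versions.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2                      # forward pass only; no derivative has been computed yet

node = y.grad_fn                # backward node recorded for the power operation
print("output node:", type(node).__name__)

# next_functions links a backward node to the nodes it passes gradients to
for parent, _ in node.next_functions:
    if parent is not None:
        print("passes gradient to:", type(parent).__name__)

print("x.grad before backward():", x.grad)   # None, because backward() has not been called
```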

---

### ⚙️ Step 3 — Compute the Derivative with `.backward()`

When you call:

```python
y.backward()
```

PyTorch initiates the **backward pass** through the computation graph.

It applies the **chain rule** from calculus to compute the derivative:

$$
\color{#00b300}{\frac{dy}{dx}} = \frac{d(x^2)}{dx} = 2x
$$

Internally, this happens in **reverse order** of operations — hence the name **reverse-mode autodiff** (used in backpropagation).
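
The chain rule becomes visible once the function has more than one step. The sketch below uses $y = (3x + 1)^2$, an illustrative function that does not appear in the original text, and compares autograd's result with the hand-computed product $\frac{dy}{du} \cdot \frac{du}{dx}$.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
u = 3 * x + 1        # inner function u(x) = 3x + 1, so du/dx = 3
y = u ** 2           # outer function y(u) = u^2, so dy/du = 2u

y.backward()         # reverse pass: dy/dx = dy/du * du/dx

manual = 2 * (3 * 2.0 + 1) * 3   # 2u * 3 evaluated at x = 2, i.e. 2 * 7 * 3
print("autograd:", x.grad.item(), "| by hand:", manual)
```

The backward pass visits the square first and the linear step second, which is the reverse of the order in which they were computed.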

---

### ⚙️ Step 4 — Access the Computed Gradient

Once `.backward()` finishes, the gradient is stored in the tensor’s `.grad` attribute:

```python
print(x.grad)
```

At the current value $x = 2$:

$$
\color{#00b300}{\frac{dy}{dx}\Big|_{x=2}} = 2 \times 2 = 4
$$

✅ So, PyTorch outputs:

```
tensor(4.)
```

which perfectly matches the analytical result.

---

### 🧠 Summary Table

| Concept | PyTorch Command | Mathematical Meaning | Example |
|----------|-----------------|----------------------|----------|
| Enable gradient tracking | `requires_grad=True` | Track variable $x$ | $x = 2$ |
| Define differentiable function | `y = x ** 2` | $y = x^2$ | Square operation |
| Compute derivative | `y.backward()` | $\frac{dy}{dx} = 2x$ | Chain rule |
| Access gradient | `x.grad` | Gradient at current $x$ | $4.0$ |

---

### 🧬 How Autograd Works Internally

| Stage | Graph Representation | Description |
|--------|----------------------|--------------|
| ① Tensor creation | $\color{#1f77b4}{x}$ | Leaf node, gradient tracking enabled |
| ② Operation | $\color{#1f77b4}{x} \to \text{square} \to \color{#ff7f0e}{y}$ | PyTorch builds computational graph |
| ③ Backward pass | $\color{#ff7f0e}{y} \Rightarrow \text{backprop} \Rightarrow \color{#00b300}{x.grad}$ | Reverse-mode autodiff |
| ④ Result | $\color{#00b300}{x.grad = 4.0}$ | Derivative stored in `.grad` |

---

### 🔍 Behind the Scenes Summary

- **`requires_grad=True`** → Enables gradient tracking  
- **Computation graph** → Connects operations and tensors  
- **`.backward()`** → Performs differentiation using chain rule  
- **`.grad`** → Stores computed derivative for each **leaf tensor**

---

### ✅ Final Intuition Recap

| Step | Code | Math | Meaning |
|------|------|------|----------|
| 1️⃣ | `x = torch.tensor(2., requires_grad=True)` | $x = 2$ | Define differentiable variable |
| 2️⃣ | `y = x ** 2` | $y = x^2$ | Define function |
| 3️⃣ | `y.backward()` | $\frac{dy}{dx} = 2x$ | Compute derivative |
| 4️⃣ | `x.grad` | $4.0$ | Retrieve derivative value |

---

📘 **Key Takeaway:**  
PyTorch’s **Autograd** system is a dynamic, real-time differentiation engine  
that automatically builds and traverses computational graphs,  
applying the **chain rule** to compute derivatives efficiently —  
the exact mechanism that powers **deep learning training**. ⚡


---

## 🔸 Explanation of Each Step

| Step | What you conceptually do | Why it matters |
|------|---------------------------|----------------|
| 1️⃣ Enable grad tracking | Mark $x$ with `requires_grad=True` | Tells PyTorch to **track operations** on $x$ for differentiation. |
| 2️⃣ Build the function | Compute $y = x^2$ using tensor ops | PyTorch creates a **computational graph** linking $y$ back to $x$. |
| 3️⃣ Backward pass | Call `.backward()` on the scalar output $y$ | Runs **reverse-mode autodiff**, applying the **chain rule** through the graph. |
| 4️⃣ Read the gradient | Inspect `x.grad` | Holds $\left.\frac{dy}{dx}\right\vert_{x=2} = \color{#1f77b4}{4}$. |

---

## 🧠 What’s Happening Behind the Scenes

PyTorch creates a **computational graph** dynamically as you perform tensor operations.

Each node in this graph represents:
- A **tensor** (like `x` or `y`)  
- A **function** (power, addition, multiplication, etc.) that produced that tensor

When you call `.backward()` on the scalar output:
1. PyTorch starts from the **output** node (here, $y$)  
2. Traverses the graph **backward**  
3. Applies the **chain rule of calculus** at each operation  
4. Accumulates gradients into the `.grad` attribute of **leaf tensors** (like $x$)

This is **reverse-mode automatic differentiation** — the engine behind **backpropagation**.
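
The four steps above can be mimicked with a toy scalar autodiff class in plain Python. This is only an illustrative sketch: the `Var` class and its methods are hypothetical names invented here, and PyTorch's real engine is far more general.

```python
class Var:
    """Toy scalar node for reverse-mode autodiff (illustrative, not PyTorch's implementation)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents      # tuples of (parent_node, local_derivative)

    def __mul__(self, other):
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __add__(self, other):
        # d(a+b)/da = 1 and d(a+b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def backward(self):
        # Collect nodes so that every node appears after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0                            # d(output)/d(output) = 1
        for node in reversed(order):               # walk from the output back to the leaves
            for parent, local in node.parents:
                parent.grad += node.grad * local   # chain rule, accumulated into the parent


x = Var(2.0)
y = x * x            # y = x^2 written as a product
y.backward()
print(x.grad)        # 2x evaluated at x = 2
```

Because `x` feeds the product twice, its gradient receives two contributions of 2, which is the same accumulation behaviour that leaf tensors show in PyTorch.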

---

## 📌 Key Formula Recap

The derivative of $y = x^2$ with respect to $x$:

$$
\frac{dy}{dx} = 2x
$$

Evaluating at $x = 2$:

$$
\frac{dy}{dx}\Big|_{x=2} = 2 \times 2 = 4
$$


| **Function** | **Derivative** | **At x = 2** | **Result** |
|:-------------:|:--------------:|:-------------:|:-----------:|
| $y = x^2$ | $\color{#00b300}{\frac{dy}{dx} = 2x}$ | $2 \times 2$ | **4** |


---

✅ **Summary**
- `requires_grad=True` → enables gradient tracking  
- Backward step → computes derivative automatically  
- `x.grad` → stores the gradient value  
- PyTorch internally builds and traverses a **backward graph** 🧩  
- This mechanism is the **core of training neural networks** 🤖💡

✨ **In short:**  
PyTorch performs **automatic differentiation** — you define the function, and it does the calculus for you ⚡


# ⚡ The Power Rule in Differentiation

The example we computed earlier for  

$$
y = x^2
$$

is a direct application of the **Power Rule** — one of the most fundamental derivative rules in calculus.

---

## 🔹 The Power Rule — General Formula

If you have a power function of the form:

$$
\color{#00b300}{y = x^n}
$$

then its derivative with respect to $x$ is:

$$
\color{#1f77b4}{\frac{dy}{dx} = n \, x^{n - 1}}
$$

---

## 🔸 Applying the Power Rule to $y = x^2$

Here, $n = 2$:

$$
\frac{dy}{dx} = \color{#00b300}{2} \, x^{\,\color{#d62728}{2 - 1}} = \color{#00b300}{2x}
$$

Evaluating at $x = 2$:

$$
\frac{dy}{dx}\Big|_{x=2} = \color{#00b300}{2} \times \color{#1f77b4}{2} = \color{#d62728}{4}
$$

✅ **Result:**  
The slope of the curve $y = x^2$ at $x = 2$ is **4**.

---

## 🧠 Intuition Behind the Power Rule

The Power Rule tells us **how the slope behaves** depending on the exponent $n$:

| Exponent $n$ | Example function | Derivative | Behavior (for $x > 0$) |
|:---------------:|:----------|:------------|:-----------|
| $n > 1$ | $x^2, x^3$ | $2x, 3x^2$ | Slope grows as $x$ increases |
| $n = 1$ | $x$ | $1$ | Constant slope |
| $0 < n < 1$ | $x^{1/2}, x^{1/3}$ | $\frac{1}{2}x^{-1/2}, \frac{1}{3}x^{-2/3}$ | Positive slope that shrinks as $x$ increases |
| $n < 0$ | $x^{-1}, x^{-2}$ | $-x^{-2}, -2x^{-3}$ | Negative slope; the function decreases as $x$ increases |

---

## 🔹 More Examples

| Function | Derivative (Using Power Rule) | Simplified Form |
|-----------|-------------------------------|----------------|
| $y = x^3$ | $\frac{dy}{dx} = 3x^2$ | — |
| $y = x^5$ | $\frac{dy}{dx} = 5x^4$ | — |
| $y = x^{-1}$ | $\frac{dy}{dx} = -x^{-2}$ | $-\frac{1}{x^2}$ |
| $y = x^{1/2}$ | $\frac{dy}{dx} = \frac{1}{2}x^{-1/2}$ | $\frac{1}{2\sqrt{x}}$ |


---

## 🔹 The Power Rule in PyTorch 🧮

PyTorch automatically applies the **Power Rule** through its autograd system:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2          # y = x²
y.backward()        # Compute dy/dx
print(x.grad)       # ➜ tensor(4.)
```
--- 

## ⚙️ Behind the Scenes: How PyTorch Applies the Power Rule

When you execute the above code, PyTorch internally performs the differentiation using the **Power Rule**:

$$
\color{#1f77b4}{\frac{dy}{dx}} = \color{#00b300}{2x}
$$

Substituting $x = 2$:

$$
\color{#1f77b4}{\frac{dy}{dx}\Big|_{x=2}} = \color{#00b300}{2 \times 2 = 4}
$$

✅ Thus, PyTorch computes the same derivative as calculus — automatically via **autograd**.

---

## ✨ Summary

| Concept | Formula | Meaning |
|----------|----------|----------|
| **Power Rule** | $ \color{#1f77b4}{\frac{dy}{dx} = n x^{n-1}} $ | Multiply by the exponent and reduce the power by 1 |
| **At $n=2$** | $ \frac{dy}{dx} = 2x $ | Linear slope increasing with $x$ |
| **At $x=2$** | $ 2 \times 2 = 4 $ | Instantaneous rate of change = 4 |


---

📘 **Key Takeaway:**  
The **Power Rule** is one of the most common differentiation tools —  
and PyTorch’s **autograd** applies it automatically whenever you perform power operations on tensors ⚡🤖


In [1]:
# PyTorch Autograd — Complete Differentiation Demo (with detailed comments)
# -----------------------------------------------------------------------------
# This notebook cell demonstrates:
#  A) Single-variable derivative: y = x^2 at x=2
#  B) Another example: choose z so that z(2)=9 and z'(2)=6
#  C) Autograd graph introspection: .data, .grad, .grad_fn, .is_leaf, .requires_grad
#  D) Partial derivatives: f(u,v) = u*v + u^2   (df/du = v+2u, df/dv = u)
#  E) Gradient accumulation & zeroing grads
#  F) Detach vs no_grad (turning off gradient tracking)
#  G) Non-scalar outputs: passing a gradient vector to backward() (vector-Jacobian product)
#  H) Jacobian & Hessian via autograd.functional
#  I) Bonus: retain_graph example (multiple backward passes through the same graph)
# -----------------------------------------------------------------------------

import torch                                  # import PyTorch main package
from torch.autograd import functional as A     # functional API for Jacobian/Hessian

torch.manual_seed(0)                           # set RNG seed for reproducibility (if randomness used later)

# ---------------------------
# Helper pretty-printers
# ---------------------------
def p_scalar(name, t):
    """
    Print a scalar tensor's value and key autograd flags.
    - name: label to print
    - t: a 0-dim (scalar) tensor
    """
    # f-string formats a line: name | value | dtype | requires_grad flag
    print(f"{name:20s} | value={t.item():>8.4f} | dtype={t.dtype} | req_grad={t.requires_grad}")

def p_info(name, t):
    """
    Print general tensor info including autograd attributes.
    Useful to see .is_leaf, .grad_fn, and current .grad.
    """
    print(f"{name:20s} | shape={tuple(t.shape)} | dtype={t.dtype} | req_grad={t.requires_grad}")
    grad = t.grad if t.is_leaf else None   # reading .grad on a non-leaf tensor raises a UserWarning
    print(f"  .is_leaf={t.is_leaf} | .grad_fn={t.grad_fn} | .grad={grad}")
    print()

def sep(title):
    """Pretty separator line for sections."""
    print("\n" + "-"*10 + f" {title} " + "-"*10)

# =============================================================================
# A) Single-variable derivative: y = x^2  at x=2  → dy/dx = 2x, so 4 at x=2
# =============================================================================
sep("A) Single-variable derivative: y = x^2")

x = torch.tensor(2.0, requires_grad=True)   # create a leaf tensor with value 2.0 and enable gradient tracking
y = x**2                                    # build computation graph node y = x^2 (records pow op with grad_fn)

p_scalar("x (input)", x)                    # print scalar info for x
p_scalar("y = x**2", y)                     # print scalar info for y

y.backward()                                # perform backward pass from scalar y; computes dy/dx at current x
print("dy/dx at x=2 ->", x.grad.item(), "(expected 4.0)")  # read gradient accumulated into x.grad

# Inspect core autograd fields
p_info("x (after backward)", x)             # x is a leaf: .grad_fn=None, .is_leaf=True, .grad holds dy/dx
p_info("y (result node)", y)                # y is non-leaf: has a grad_fn describing the operation that produced it

# =============================================================================
# B) Another example: choose z s.t. z(2)=9 and z'(2)=6
#    One choice: z(x) = 1.5*x^2 + 3  -> z(2)=1.5*4+3=9 ; z'(x) = 3x -> z'(2)=6
# =============================================================================
sep("B) Another single-variable example")

x2 = torch.tensor(2.0, requires_grad=True)  # new independent leaf tensor at 2.0
z  = 1.5 * x2**2 + 3.0                      # define z(x) with chosen coefficients to match target value/derivative

p_scalar("x2 (input)", x2)                  # print scalar info for x2
p_scalar("z = 1.5*x2**2 + 3", z)            # print scalar info for z

z.backward()                                # compute dz/dx2 at x2=2
print("z(2)      ->", z.item(),   "(expected 9.0)")                # verify function value
print("dz/dx|2   ->", x2.grad.item(), "(expected 6.0)")            # verify derivative value

# =============================================================================
# C) Autograd graph introspection
# =============================================================================
sep("C) Autograd graph introspection")

print("x.is_leaf:", x.is_leaf, " | x.grad_fn:", x.grad_fn)          # True; None for leaf tensors (created by user)
print("y.is_leaf:", y.is_leaf, " | y.grad_fn:", y.grad_fn)          # False; has a PowBackward0 grad_fn
print("x.requires_grad:", x.requires_grad)                          # True; signals autograd tracking
print("x.data:", x.data)                                            # underlying raw tensor (no tracking)
print("x.grad:", x.grad)                                            # accumulated gradient from previous backward
print()                                                             # blank line for readability

# =============================================================================
# D) Partial derivatives: f(u,v) = u*v + u^2
#     df/du = v + 2u, df/dv = u
#     Evaluate at (u,v)=(1,3) -> f=4, df/du=5, df/dv=1
# =============================================================================
sep("D) Partial derivatives f(u,v) = u*v + u**2")

u = torch.tensor(1.0, requires_grad=True)   # leaf tensor u with gradient tracking
v = torch.tensor(3.0, requires_grad=True)   # leaf tensor v with gradient tracking
f = u*v + u**2                              # build graph for f(u,v) = u*v + u^2

p_scalar("u", u)                            # show u
p_scalar("v", v)                            # show v
p_scalar("f = u*v + u**2", f)               # show f

f.backward()                                # compute gradients ∂f/∂u into u.grad and ∂f/∂v into v.grad
print("f(u,v) =", f.item(), " (expected 4)")                   # verify function value at (1,3)
print("df/du  =", u.grad.item(), " (expected 5 = v + 2u)")     # check u.grad == v + 2u = 3 + 2*1 = 5
print("df/dv  =", v.grad.item(), " (expected 1 = u)")          # check v.grad == u = 1

# =============================================================================
# E) Gradient accumulation & zeroing grads
#    By default, calling backward() accumulates into .grad. Zero them if reusing leaves.
# =============================================================================
sep("E) Gradient accumulation & zeroing")

x3 = torch.tensor(2.0, requires_grad=True)  # new leaf
g1 = x3**2                                  # g1(x) = x^2 ; derivative is 2x
g1.backward()                               # compute dg1/dx at x=2 -> 4, store in x3.grad
print("After first backward: x3.grad =", x3.grad.item(), "(2*x @ x=2 -> 4)")

x3.grad.zero_()                             # IMPORTANT: clear accumulated grads before another backward
g2 = 3*x3**2                                # g2(x) = 3x^2 ; derivative is 6x
g2.backward()                               # compute dg2/dx at x=2 -> 12
print("After zero_ and second backward: x3.grad =", x3.grad.item(), "(6*x @ x=2 -> 12)")

# =============================================================================
# F) Detach vs torch.no_grad (stop tracking computations)
# =============================================================================
sep("F) Detach vs no_grad")

a = torch.tensor(2.0, requires_grad=True)   # leaf requiring grad
b = a**3                                    # tracked op: b depends on a (grad flows a <- b)

with torch.no_grad():                       # temporarily disable autograd tracking inside this block
    c = b * 10                              # c does NOT require grad; ops inside no_grad are excluded from graph

d = c + a                                   # d depends on a (requires_grad=True) + c (no grad); result still requires grad
print("a.requires_grad:", a.requires_grad)  # True (leaf)
print("b.requires_grad:", b.requires_grad)  # True (tracked)
print("c.requires_grad:", c.requires_grad, " (no-grad)")  # False (created under no_grad)
print("d.requires_grad:", d.requires_grad, " (depends on a)")  # True (because of a)

b_detached = b.detach()                     # create a tensor with same data as b but NO grad history (cut graph)
e = b_detached * 5 + a                      # e depends only on a for gradients; path through b is cut
# Zero a.grad first to avoid accumulation from earlier demos
if a.grad is not None:                      # guard in case this is the first run
    a.grad.zero_()                          # clear any previous gradient on 'a'
e.backward()                                # compute de/da; derivative wrt a is 1 from "+ a" term
print("a.grad from e.backward():", a.grad)  # should be tensor(1.) after zeroing then backward

# =============================================================================
# G) Non-scalar outputs → pass a `gradient` vector to backward() (vector-Jacobian product)
#    Example: F(x) = [x^2, 3x], evaluated at x=2
#    For vector outputs, backward() needs a `gradient` tensor of the same shape as the output
#    (the weights w in w^T F); torch.autograd.grad calls the same argument `grad_outputs`.
# =============================================================================
sep("G) Non-scalar outputs & grad_outputs (VJP)")

x4 = torch.tensor(2.0, requires_grad=True)  # new leaf
F = torch.stack([x4**2, 3*x4])              # construct a vector output tensor of shape (2,)
w = torch.tensor([1.0, 1.0])                # choose weights for linear combination w^T F (same shape as F)
F.backward(gradient=w)                      # computes d/dx4 (w1*x^2 + w2*3x) at x=2; here: 2*x + 3 = 7
print("F(x) =", F.detach().tolist(), "at x=2")                   # show the vector F(2)
print("∂(w^T F)/∂x at x=2 with w=[1,1]:", x4.grad.item(), "(expected 7)")  # print resulting gradient

# =============================================================================
# H) Full Jacobian and Hessian with autograd.functional
#    - jacobian(func, inputs) returns partial derivatives of each output wrt inputs
#    - hessian(func, inputs) returns second derivatives (can be expensive)
# =============================================================================
sep("H) Jacobian & Hessian via autograd.functional")

def F_uv(u, v):
    """
    Vector-valued function F(u, v) returning:
      F1 = u*v + u^2
      F2 = u - v
    This version explicitly takes two arguments (u, v),
    because autograd.functional.jacobian expands tuple inputs.
    """
    return torch.stack([
        u * v + u**2,   # F1(u,v)
        u - v           # F2(u,v)
    ])

# Define the two scalar inputs (u, v)
u = torch.tensor(1.0, requires_grad=True)
v = torch.tensor(3.0, requires_grad=True)

# Compute the full Jacobian matrix of F wrt (u, v)
J = A.jacobian(F_uv, (u, v))  # autograd will call F_uv(u, v)

# J is a tuple with one tensor per input: (dF/du, dF/dv).
# Each entry has shape (2,): one value per output component (F1, F2).
# Stacking along dim=1 places dF/du and dF/dv in columns, so row i is [dFi/du, dFi/dv].
J_mat = torch.stack(J, dim=1)
print("Jacobian of F=[u*v+u^2, u-v] at (u=1,v=3):")
print(J_mat)  # Expected [[v+2u, u], [1, -1]] -> [[5,1],[1,-1]]

# Hessian example (second derivative) for scalar function h(x) = x^4
def h_scalar(x_):
    """Scalar function h(x) = x^4, second derivative = 12x^2"""
    return x_**4

xh = torch.tensor(2.0, requires_grad=True)
H = A.hessian(h_scalar, xh)
print("Hessian (second derivative) of x^4 at x=2 ->", H.item(), "(expected 48)")
# =============================================================================
# I) Bonus: retain_graph example
#    If you need to call backward twice on the SAME graph, pass retain_graph=True on the first call.
#    Note: Gradients accumulate into .grad, so zero them between passes if needed.
# =============================================================================
sep("I) retain_graph example")

x5 = torch.tensor(2.0, requires_grad=True)    # new leaf
y5 = (x5**3) + (x5**2)                        # define y5(x) = x^3 + x^2
# First backward pass: retain_graph=True keeps the graph so we can call backward again
y5.backward(retain_graph=True)                # computes dy5/dx5 and keeps graph alive
print("First pass dy/dx (accumulated):", x5.grad.item(), "(= 3*x^2 + 2*x at x=2 -> 12 + 4 = 16)")

x5.grad.zero_()                               # clear accumulated gradient before second backward
y5.backward()                                 # second backward; graph is still valid because we retained it earlier
print("Second pass dy/dx (after zero_):", x5.grad.item(), "(same expected 16)")

print("\n✅ Completed: single & partial derivatives, accumulation, detach/no_grad, VJP, Jacobian, Hessian, retain_graph.")



---------- A) Single-variable derivative: y = x^2 ----------
x (input)            | value=  2.0000 | dtype=torch.float32 | req_grad=True
y = x**2             | value=  4.0000 | dtype=torch.float32 | req_grad=True
dy/dx at x=2 -> 4.0 (expected 4.0)
x (after backward)   | shape=() | dtype=torch.float32 | req_grad=True
  .is_leaf=True | .grad_fn=None | .grad=4.0

y (result node)      | shape=() | dtype=torch.float32 | req_grad=True
  .is_leaf=False | .grad_fn=<PowBackward0 object at 0x125c1ee60> | .grad=None


---------- B) Another single-variable example ----------
x2 (input)           | value=  2.0000 | dtype=torch.float32 | req_grad=True
z = 1.5*x2**2 + 3    | value=  9.0000 | dtype=torch.float32 | req_grad=True
z(2)      -> 9.0 (expected 9.0)
dz/dx|2   -> 6.0 (expected 6.0)

---------- C) Autograd graph introspection ----------
x.is_leaf: True  | x.grad_fn: None
y.is_leaf: False  | y.grad_fn: <PowBackward0 object at 0x1277b85b0>
x.requires_grad: True
x.data: tensor(2.)
x.grad: tensor(4.)


F(x) = [4.0, 6.0] at x=2
∂(w^T F)/∂x at x=2 with w=[1,1]: 7.0 (expected 7)

---------- H) Jacobian & Hessian via autograd.functional ----------
Jacobian of F=[u*v+u^2, u-v] at (u=1,v=3):
tensor([[ 5.,  1.],
        [ 1., -1.]])
Hessian (second derivative) of x^4 at x=2 -> 48.0 (expected 48)

---------- I) retain_graph example ----------
First pass dy/dx (accumulated): 16.0 (= 3*x^2 + 2*x at x=2 -> 12 + 4 = 16)
Second pass dy/dx (after zero_): 16.0 (same expected 16)

✅ Completed: single & partial derivatives, accumulation, detach/no_grad, VJP, Jacobian, Hessian, retain_graph.
