# 🎨 Gallery of Gradients: Tangent's Readable Code

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pedronahum/tangent/blob/master/examples/Gallery_of_Gradients.ipynb)

---

## The Killer Feature: Readable Gradient Code

Unlike autodiff libraries that build or trace computation graphs at runtime, **Tangent transforms your Python source and generates pure, readable Python code** for gradients. This means:

- ✅ **You can read it** - See exactly how gradients are computed
- ✅ **You can debug it** - Step through with a debugger
- ✅ **You can learn from it** - Understand automatic differentiation
- ✅ **You can optimize it** - Apply standard Python optimization techniques

This notebook is a **curated gallery** of gradient code examples, showing how Tangent transforms various Python patterns into their gradient counterparts.

---

## Setup

First, let's install and import Tangent:

In [None]:
# Install Tangent (uncomment for Colab)
# !pip install git+https://github.com/pedronahum/tangent.git numpy matplotlib

import tangent
import numpy as np

print(f"✓ Tangent version: {getattr(tangent, '__version__', 'dev')}")
print(f"✓ NumPy version: {np.__version__}")

---

## Example 1: Simple Polynomial 🔢

Let's start with the simplest possible example: a polynomial function.

### Original Function

In [None]:
def polynomial(x):
    """Compute f(x) = x^3 + 2x^2 + 3x + 4"""
    return x**3 + 2*x**2 + 3*x + 4

### Generated Gradient Code

In [None]:
# Generate gradient function and print the code
dpolynomial = tangent.grad(polynomial, verbose=1)

### 💡 What's Happening?

**Forward Pass (top section):**
- Tangent stores intermediate results (`x_to_the_3`, `x_to_the_2`, etc.)
- These are needed for computing gradients later

**Backward Pass (bottom section):**
- Gradients flow **backward** through operations
- Each operation's gradient is computed using the **chain rule**
- Notice the power rule applied to the $x^3$ term: its local derivative is $3x^2$

**The Math:**
$$\frac{d}{dx}(x^3 + 2x^2 + 3x + 4) = 3x^2 + 4x + 3$$
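
To make the two passes concrete, here is a hand-written reverse-mode gradient for `polynomial` laid out in the same forward/backward shape. It is a sketch to read alongside the printed output, not a copy of it: the function name `dpolynomial_by_hand`, the `bresult` seed argument (the gradient of the output, 1.0 by default), and the intermediate names are illustrative assumptions.

```python
def dpolynomial_by_hand(x, bresult=1.0):
    """Hand-written reverse-mode gradient of x**3 + 2*x**2 + 3*x + 4.

    Illustrative only: names and layout are assumptions, not Tangent's output.
    """
    # Forward pass: compute and keep the intermediates
    x_to_the_3 = x ** 3
    x_to_the_2 = x ** 2
    t1 = 2 * x_to_the_2
    t2 = 3 * x
    result = x_to_the_3 + t1 + t2 + 4  # primal value (not needed for the gradient here)

    # Backward pass: visit the operations in reverse order
    # result = x_to_the_3 + t1 + t2 + 4: addition passes bresult to every term
    bx_to_the_3 = bresult
    bt1 = bresult
    bt2 = bresult
    bx = 0.0
    bx += 3 * bt2                   # t2 = 3 * x
    bx_to_the_2 = 2 * bt1           # t1 = 2 * x_to_the_2
    bx += 2 * x * bx_to_the_2       # x_to_the_2 = x ** 2 (power rule)
    bx += 3 * x ** 2 * bx_to_the_3  # x_to_the_3 = x ** 3 (power rule)
    return bx


print(dpolynomial_by_hand(2.0))  # 23.0
```

Each backward line accumulates into `bx`, which is how a variable used in several places (here `x`, used three times) collects its total gradient.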

Let's verify:

In [None]:
x = 2.0
gradient = dpolynomial(x)
expected = 3*x**2 + 4*x + 3  # 3*4 + 4*2 + 3 = 23

print(f"x = {x}")
print(f"Computed gradient: {gradient}")
print(f"Expected gradient: {expected}")
print(f"Match: {np.isclose(gradient, expected)} ✓")

---

## Example 2: For Loop 🔄

This is where things get interesting! Let's see how Tangent handles a for loop.

### Original Function

In [None]:
def sum_of_powers(x, n=5):
    """Compute sum: x^1 + x^2 + x^3 + ... + x^n"""
    result = 0.0
    for i in range(1, n+1):
        result += x ** i
    return result

### Generated Gradient Code

In [None]:
dsum_of_powers = tangent.grad(sum_of_powers, verbose=1)

### 💡 What's Happening?

**The Loop Runs Twice!**
1. **Forward Pass** (top): Loop runs forward, storing intermediate values
2. **Backward Pass** (bottom): Loop runs **in reverse**, accumulating gradients!

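**Why Reverse?**
- Reverse-mode AD propagates gradients from the output back to the inputs
- The last iteration is closest to the output, so its adjoint is computed first
- Walking the iterations in reverse pairs each adjoint update with the forward values it depends on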

**Key Insight:**
```python
# Forward: result += x ** i
# Backward: bx += i * x ** (i-1) * bresult
```

This is the **chain rule** in action!
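
The same idea written out by hand: a minimal sketch, assuming a `bresult` seed of 1.0 and the hypothetical name `dsum_of_powers_by_hand`, in which the backward loop walks `range(1, n + 1)` in reverse and applies the Key Insight update once per iteration. Tangent's printed code will differ in naming and in how it saves loop state.

```python
def dsum_of_powers_by_hand(x, n=5, bresult=1.0):
    """Hand-written reverse-mode gradient of sum(x**i for i in 1..n). Illustrative only."""
    # Forward pass: identical to the original loop
    result = 0.0
    for i in range(1, n + 1):
        result += x ** i

    # Backward pass: the same iterations, visited last-to-first
    bx = 0.0
    for i in reversed(range(1, n + 1)):
        # Adjoint of `result += x ** i`: bresult flows through to the earlier
        # value of result unchanged, and i * x**(i-1) * bresult flows to x
        bx += i * x ** (i - 1) * bresult
    return bx


print(dsum_of_powers_by_hand(2.0))  # 129.0
```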

Let's verify:

In [None]:
x = 2.0
gradient = dsum_of_powers(x, n=5)

# d/dx(x + x^2 + x^3 + x^4 + x^5) = 1 + 2x + 3x^2 + 4x^3 + 5x^4
expected = sum(i * x**(i-1) for i in range(1, 6))

print(f"x = {x}")
print(f"Computed gradient: {gradient}")
print(f"Expected gradient: {expected}")
print(f"Match: {np.isclose(gradient, expected)} ✓")

---

## Example 3: While Loop 🌀

While loops are even more interesting because the iteration count depends on runtime values.

### Original Function

In [None]:
def geometric_series(x, threshold=0.001):
    """Compute x + x^2 + x^3 + ... while each new term exceeds threshold (terminates only for x < 1)."""
    result = 0.0
    term = x
    power = 1
    
    while term > threshold:
        result += term
        power += 1
        term = x ** power
    
    return result

### Generated Gradient Code

In [None]:
dgeometric_series = tangent.grad(geometric_series, verbose=1)

### 💡 What's Happening?

**The Challenge:**
- While loops don't have a fixed iteration count
- Number of iterations depends on input value

**Tangent's Solution:**
1. **Stack-based tape recording** during forward pass
2. Values are pushed onto a stack each iteration
3. During backward pass, values are popped in reverse order
4. This ensures correct gradient computation regardless of iteration count!

**Notice:**
- `_stack` variables for recording loop history
- `push` operations in forward pass
- `pop` operations in backward pass
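
Here is the stack mechanism in miniature. This is a sketch with a plain Python list standing in for Tangent's stack (the list, the `bresult` argument, and the name `dgeometric_series_by_hand` are assumptions): each forward iteration pushes the power it used, and the backward loop pops until the stack is empty, so it runs exactly as many times as the forward loop did.

```python
def dgeometric_series_by_hand(x, threshold=0.001, bresult=1.0):
    """Hand-written reverse-mode gradient of geometric_series. Illustrative only."""
    stack = []  # plain list used as a stack (stand-in for Tangent's tape)

    # Forward pass: record the power of the term added in each iteration
    result = 0.0
    term = x
    power = 1
    while term > threshold:
        stack.append(power)
        result += term
        power += 1
        term = x ** power

    # Backward pass: one pop per forward iteration, newest first
    bx = 0.0
    while stack:
        p = stack.pop()
        # The term added in that iteration was x ** p
        bx += p * x ** (p - 1) * bresult
    return bx


print(dgeometric_series_by_hand(0.5))  # 3.95703125 (nine iterations)
```

For `x = 0.5` the forward loop adds $x^1$ through $x^9$, so the exact gradient of that truncated sum is slightly below the limiting value $1/(1-x)^2 = 4$.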

Let's test it:

In [None]:
x = 0.5
gradient = dgeometric_series(x, threshold=0.001)

# For small threshold, this approaches d/dx(x/(1-x)) = 1/(1-x)^2
expected_approx = 1.0 / (1.0 - x)**2

print(f"x = {x}")
print(f"Computed gradient: {gradient}")
print(f"Expected (approx): {expected_approx}")
print(f"Close enough: {abs(gradient - expected_approx) < 0.1} ✓")

---

## Example 4: Conditional Logic 🔀

How does Tangent handle if/else branches?

### Original Function

In [None]:
def relu_like(x):
    """A ReLU-like activation: f(x) = x^2 if x > 0 else -x"""
    if x > 0:
        result = x ** 2
    else:
        result = -x
    return result

### Generated Gradient Code

In [None]:
drelu_like = tangent.grad(relu_like, verbose=1)

### 💡 What's Happening?

**Forward Pass:**
- The condition `x > 0` is evaluated
- Only ONE branch executes
- The outcome of the condition is saved for the backward pass

**Backward Pass:**
- **The same branch** that executed forward must execute backward
- Tangent uses the saved condition result
- Gradient computed only for the executed path

**The Math:**
$$\frac{df}{dx} = \begin{cases} 2x & \text{if } x > 0 \\ -1 & \text{otherwise} \end{cases}$$
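
A hand-written version shows why the condition is saved rather than recomputed: the backward pass must take the branch the forward pass took, even if the variables in the test have changed in between. This is a sketch; `cond`, `bresult`, and `drelu_like_by_hand` are illustrative names, not Tangent's.

```python
def drelu_like_by_hand(x, bresult=1.0):
    """Hand-written reverse-mode gradient of relu_like. Illustrative only."""
    # Forward pass: evaluate the test once and keep its outcome
    cond = x > 0
    if cond:
        result = x ** 2
    else:
        result = -x
    # (result is unused by the gradient; it is kept to mirror the original)

    # Backward pass: branch on the saved outcome
    if cond:
        bx = 2 * x * bresult  # adjoint of result = x ** 2
    else:
        bx = -bresult         # adjoint of result = -x
    return bx


print(drelu_like_by_hand(3.0), drelu_like_by_hand(-2.0))  # 6.0 -1.0
```

At `x = 0` the test is false, so the sketch (like the formula above) returns $-1$.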

Let's verify both branches:

In [None]:
# Positive case
x_pos = 3.0
grad_pos = drelu_like(x_pos)
expected_pos = 2 * x_pos  # d/dx(x^2) = 2x

print("Positive branch (x > 0):")
print(f"  x = {x_pos}")
print(f"  Gradient: {grad_pos}")
print(f"  Expected: {expected_pos}")
print(f"  Match: {np.isclose(grad_pos, expected_pos)} ✓")

# Negative case
x_neg = -2.0
grad_neg = drelu_like(x_neg)
expected_neg = -1.0  # d/dx(-x) = -1

print("\nNegative branch (x ≤ 0):")
print(f"  x = {x_neg}")
print(f"  Gradient: {grad_neg}")
print(f"  Expected: {expected_neg}")
print(f"  Match: {np.isclose(grad_neg, expected_neg)} ✓")

---

## Example 5: NumPy Array Operations 📊

Let's see how Tangent handles arrays and reductions.

### Original Function

In [None]:
def weighted_sum(x, weights):
    """Compute weighted sum: sum(x * weights)"""
    return np.sum(x * weights)

### Generated Gradient Code

In [None]:
dweighted_sum = tangent.grad(weighted_sum, wrt=(0,), verbose=1)  # Gradient w.r.t. x

### 💡 What's Happening?

**Key Operations:**

1. **Element-wise multiply** `x * weights`:
   - The gradient for each operand is the incoming gradient times the other operand
   - $\frac{\partial}{\partial x}(x \cdot w) = w$

2. **Sum reduction** `np.sum(...)`:
   - Forward: Many values → One value (reduction)
   - Backward: One gradient → Many gradients (broadcast)
   - Tangent uses `unreduce` to reverse the reduction

**Broadcasting Magic:**
- `unreduce` spreads the scalar gradient back to match the input shape
- This handles arrays of any shape automatically!
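
The two steps can be written by hand with NumPy. In this sketch `np.broadcast_to` plays the role the text assigns to `unreduce` (spreading the scalar gradient back over the summed array); the function name `dweighted_sum_by_hand` and the `bresult` seed are assumptions, and Tangent's generated code will look different.

```python
import numpy as np


def dweighted_sum_by_hand(x, weights, bresult=1.0):
    """Hand-written reverse-mode gradient of np.sum(x * weights) w.r.t. x. Illustrative only."""
    # Forward pass
    prod = x * weights
    result = np.sum(prod)  # primal value, unused by the gradient below

    # Backward through np.sum: copy the scalar gradient to every element
    bprod = np.broadcast_to(bresult, prod.shape)
    # Backward through x * weights: the gradient for x is scaled by the other operand
    bx = bprod * weights
    return bx


print(dweighted_sum_by_hand(np.array([1.0, 2.0, 3.0]), np.array([0.5, 0.3, 0.2])))
```

Because the broadcast uses `prod.shape`, the same code works unchanged for 2-D or higher-dimensional inputs.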

Let's test:

In [None]:
x = np.array([1.0, 2.0, 3.0])
weights = np.array([0.5, 0.3, 0.2])

gradient = dweighted_sum(x, weights)

print(f"x = {x}")
print(f"weights = {weights}")
print(f"Gradient w.r.t. x: {gradient}")
print(f"Expected: weights = {weights}")
print(f"Match: {np.allclose(gradient, weights)} ✓")

---

## Example 6: Nested Function Calls 📦

Let's see how Tangent handles function composition.

### Original Functions

In [None]:
def sigmoid(x):
    """Sigmoid activation: σ(x) = 1 / (1 + exp(-x))"""
    return 1.0 / (1.0 + np.exp(-x))

def neural_layer(x, w, b):
    """Simple neural layer: σ(w*x + b)"""
    return sigmoid(w * x + b)

### Generated Gradient Code

In [None]:
dneural_layer = tangent.grad(neural_layer, wrt=(1,), verbose=1)  # Gradient w.r.t. w

### 💡 What's Happening?

**Function Inlining:**
- Tangent **inlines** the `sigmoid` function call
- The gradient code contains the full computation
- No separate gradient function for `sigmoid` needed!

**Chain Rule:**
- Outer function: `sigmoid(z)` where `z = w*x + b`
- Inner function: `w*x + b`
- Gradient: $\frac{df}{dw} = \frac{df}{dz} \cdot \frac{dz}{dw} = \sigma'(z) \cdot x$

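**Sigmoid Gradient:**
- $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$
- The generated code does not apply this identity directly: it differentiates the division and `np.exp` step by step, reusing values saved in the forward pass, and the result agrees with the closed form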
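
To see how the pieces combine, here is the gradient with respect to `w` written out one primitive operation at a time. It is a sketch: the name `dneural_layer_by_hand` and the `bresult` seed are assumptions, and it uses `math.exp` on scalars instead of `np.exp`. The backward step for `exp` reuses the forward value `e`, and the final value equals $\sigma(z)(1-\sigma(z))\,x$.

```python
import math


def dneural_layer_by_hand(x, w, b, bresult=1.0):
    """Hand-written reverse-mode gradient of sigmoid(w * x + b) w.r.t. w. Illustrative only."""
    # Forward pass, one primitive per line; e and denom are kept for the backward pass
    z = w * x + b
    e = math.exp(-z)
    denom = 1.0 + e
    result = 1.0 / denom  # primal value, sigma(z)

    # Backward pass, in reverse order
    bdenom = -bresult / (denom * denom)  # result = 1 / denom  ->  d/d(denom) = -1 / denom**2
    be = bdenom                          # denom = 1 + e
    bz = -e * be                         # e = exp(-z)  ->  d/dz = -exp(-z), reusing e
    bw = bz * x                          # z = w * x + b  ->  d/dw = x
    return bw


x, w, b = 2.0, 0.5, 0.1
sig = 1.0 / (1.0 + math.exp(-(w * x + b)))
print(math.isclose(dneural_layer_by_hand(x, w, b), sig * (1 - sig) * x))  # True
```

Multiplying out the backward lines gives $e/(1+e)^2 \cdot x$, which is exactly $\sigma(z)(1-\sigma(z))\,x$.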

Let's test:

In [None]:
x, w, b = 2.0, 0.5, 0.1
gradient = dneural_layer(x, w, b)

# Manual computation
z = w * x + b
sig = sigmoid(z)
expected = sig * (1 - sig) * x  # Chain rule: σ'(z) * dz/dw

print(f"x={x}, w={w}, b={b}")
print(f"Gradient: {gradient}")
print(f"Expected: {expected}")
print(f"Match: {np.isclose(gradient, expected)} ✓")

---

## Example 7: Matrix Operations 🔢

Let's see Tangent handle colon slicing (`A[0, :]`), a newly supported feature!

### Original Function

In [None]:
def matrix_row_norm(A):
    """Compute the squared L2 norm of the first row: ||A[0, :]||^2"""
    row = A[0, :]  # Extract first row using colon slice
    return np.sum(row ** 2)

### Generated Gradient Code

In [None]:
dmatrix_row_norm = tangent.grad(matrix_row_norm, verbose=1)

### 💡 What's Happening?

**Colon Slice Magic:**
- `A[0, :]` is converted to `A[0, slice(None, None, None)]`
- Tangent creates a slice object for the `:` notation
- Gradients flow back only to the selected row!

**Gradient Routing:**
- Forward: Select specific elements (row 0)
- Backward: Gradient goes only to those elements
- Other elements get zero gradient

**The Math:**
- For $f(A) = \sum_i A_{0,i}^2$
- $\frac{\partial f}{\partial A_{0,i}} = 2 A_{0,i}$
- $\frac{\partial f}{\partial A_{j,i}} = 0$ for $j \neq 0$
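
The routing can be written by hand: the backward pass starts from an array of zeros shaped like `A` and writes the row's gradient back through the same slice. In this sketch (the name `dmatrix_row_norm_by_hand` and the `bresult` seed are assumptions) the slice is spelled out as `slice(None, None, None)` to show that it is the same index as `:`.

```python
import numpy as np


def dmatrix_row_norm_by_hand(A, bresult=1.0):
    """Hand-written reverse-mode gradient of np.sum(A[0, :] ** 2). Illustrative only."""
    index = (0, slice(None, None, None))  # exactly what A[0, :] means

    # Forward pass
    row = A[index]
    result = np.sum(row ** 2)  # primal value, unused by the gradient below

    # Backward through np.sum and ** 2
    brow = 2 * row * bresult
    # Backward through the slice: unselected entries keep a zero gradient
    bA = np.zeros_like(A)
    bA[index] = brow
    return bA


A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
print(dmatrix_row_norm_by_hand(A))
```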

Let's test:

In [None]:
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

gradient = dmatrix_row_norm(A)

# Expected: 2*A[0,:] in first row, zeros elsewhere
expected = np.zeros_like(A)
expected[0, :] = 2 * A[0, :]

print(f"A =\n{A}")
print(f"\nGradient =\n{gradient}")
print(f"\nExpected =\n{expected}")
print(f"\nMatch: {np.allclose(gradient, expected)} ✓")

---

## Example 8: Optimization Comparison ⚡

Tangent can optimize the generated gradient code. Let's see the difference!

### Original Function

In [None]:
def complex_function(x):
    """A function with repeated subexpressions"""
    a = x * x
    b = a + x
    c = a * 2  # Reuses 'a'
    return b + c

### Unoptimized Gradient

In [None]:
print("=" * 70)
print("UNOPTIMIZED GRADIENT CODE")
print("=" * 70)
dcomplex_unopt = tangent.grad(complex_function, optimized=False, verbose=1)

### Optimized Gradient

In [None]:
print("\n" + "=" * 70)
print("OPTIMIZED GRADIENT CODE")
print("=" * 70)
dcomplex_opt = tangent.grad(complex_function, optimized=True, verbose=1)

### 💡 What's Happening?

**Optimizations Applied:**

1. **Dead Code Elimination (DCE)**
   - Removes unused intermediate variables
   - Eliminates calculations that don't affect the output

2. **Common Subexpression Elimination (CSE)**
   - Identifies repeated calculations
   - Computes them once, reuses result

3. **Algebraic Simplification**
   - Simplifies mathematical expressions
   - E.g., `x * 1` → `x`, `x + 0` → `x`

**Performance Impact:**
- Fewer operations usually means faster execution
- Fewer stored intermediates can reduce memory usage
- **Same mathematical result!**
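
A hand-written illustration of what these passes can remove, using `complex_function` (which simplifies to $3x^2 + x$, so its gradient is $6x + 1$). Both functions below are sketches with hypothetical names, not Tangent's actual unoptimized or optimized output: the first keeps every forward value and every adjoint accumulation, the second is what remains once the unused forward values are dropped and the arithmetic is simplified.

```python
def dcomplex_verbose(x, bresult=1.0):
    """Unoptimized style: every intermediate and adjoint is materialized. Illustrative only."""
    # Forward pass: none of a, b, c is read by the backward pass below,
    # so dead-code elimination could drop all three
    a = x * x
    b = a + x
    c = a * 2
    # Backward pass for `return b + c`
    bb = bresult
    bc = bresult
    ba = 0.0
    bx = 0.0
    ba = ba + bc * 2           # c = a * 2
    ba = ba + bb               # b = a + x
    bx = bx + bb
    bx = bx + ba * x + x * ba  # a = x * x: each factor gets the other factor
    return bx


def dcomplex_simplified(x, bresult=1.0):
    """The same gradient after removing dead code and simplifying. Illustrative only."""
    # ba reduces to 3 * bresult, so bx = bresult + 2 * x * (3 * bresult)
    return bresult * (6 * x + 1)


print(dcomplex_verbose(3.0), dcomplex_simplified(3.0))  # 19.0 19.0
```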

Let's verify they're equivalent:

In [None]:
x = 3.0
grad_unopt = dcomplex_unopt(x)
grad_opt = dcomplex_opt(x)

print(f"x = {x}")
print(f"Unoptimized gradient: {grad_unopt}")
print(f"Optimized gradient:   {grad_opt}")
print(f"Same result: {np.isclose(grad_unopt, grad_opt)} ✓")
print("\nOptimizations don't change correctness, only performance!")

---

## Summary: Why Readable Gradients Matter 🎯

### 1. **Educational Value** 🎓
- See exactly how automatic differentiation works
- Understand the chain rule in practice
- Learn how loops and conditionals are differentiated

### 2. **Debugging Power** 🐛
- Step through gradient code with a debugger
- Set breakpoints in gradient computation
- Print intermediate gradient values

### 3. **Performance Optimization** ⚡
- Profile gradient code like any Python function
- Apply standard optimization techniques
- Understand computational bottlenecks

### 4. **Trust and Verification** ✅
- Verify gradient correctness by inspection
- No "black box" computation graphs
- See exactly what Tangent generates

### 5. **Customization** 🔧
- Modify generated code if needed
- Add custom logic to gradients
- Integrate with existing code seamlessly

---

## What's Next? 🚀

Now that you've seen the gallery, try these:

1. **Experiment with your own functions**
   - Write a function
   - Call `tangent.grad(your_function, verbose=1)`
   - Study the generated code!

2. **Check out other examples**
   - [Building Energy Optimization](Building_Energy_Optimization_with_Tangent.ipynb)
   - [Tangent Tutorial](../notebooks/tangent_tutorial.ipynb)

3. **Read the documentation**
   - [Python Feature Support](../docs/features/PYTHON_FEATURE_SUPPORT.md)
   - [Error Messages Guide](../docs/features/ERROR_MESSAGES.md)

4. **Contribute!**
   - Found a bug? [Report it](https://github.com/pedronahum/tangent/issues)
   - Have an idea? [Discuss it](https://github.com/pedronahum/tangent/discussions)
   - Want to help? [Pull requests welcome](https://github.com/pedronahum/tangent/pulls)!

---

## Try It Yourself! 💻

Use the cell below to experiment with your own functions:

In [None]:
# Write your own function here!
def my_function(x):
    # TODO: Add your code here
    return x ** 2

# Generate and inspect the gradient
dmy_function = tangent.grad(my_function, verbose=1)

# Test it
x_test = 5.0
print(f"\nGradient at x={x_test}: {dmy_function(x_test)}")

---

**Made with ❤️ by the Tangent community**

**License**: Apache 2.0

**Citation:**
```bibtex
@misc{tangent_gallery,
  title={Gallery of Gradients: Readable Gradient Code with Tangent},
  author={Tangent Contributors},
  year={2025},
  url={https://github.com/pedronahum/tangent/tree/master/examples}
}
```