# Broadcasting in PyTorch / NumPy

Broadcasting is the mechanism that allows tensors of different shapes to be combined in element-wise operations. Understanding it is **essential** for writing efficient tensor code without explicit loops.

In [1]:
import torch
import numpy as np

def arange(i: int) -> torch.Tensor:
    """Helper to create the integer range tensor [0, 1, ..., i-1]."""
    return torch.arange(i)

## The Broadcasting Rule

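When operating on two tensors, PyTorch/NumPy compares their shapes **dimension by dimension, from right to left**. If one shape has fewer dimensions, it is treated as if it were padded on the left with 1s. Two dimensions are compatible when:

1. They are **equal**, or
2. One of them is **1**

If neither condition holds for some dimension → **Error** (`RuntimeError` in PyTorch, `ValueError` in NumPy)

Along each dimension, the resulting shape takes the size that is not 1 (the **maximum** of the two sizes for non-empty tensors).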
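To make the rule concrete, here is a minimal pure-Python sketch of the shape computation. The function name `broadcast_shape` is hypothetical (it is not a library API), and the sketch assumes that a missing leading dimension counts as size 1.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper for illustration; not part of PyTorch or NumPy.
    """
    result = []
    # Walk both shapes from the last dimension; a missing dimension counts as 1.
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_a == 1 or dim_b == 1:
            # Keep the size that is not 1 (or the shared size when both match).
            result.append(dim_b if dim_a == 1 else dim_a)
        else:
            raise ValueError(f"incompatible sizes {dim_a} and {dim_b}")
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (4,)))  # (3, 4)
print(broadcast_shape((5,), ()))      # (5,)
```

Calling `broadcast_shape((3,), (4,))` raises `ValueError`, which mirrors the error case of the rule.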

## Example 1: Scalar + Vector

A scalar (shape `()`) broadcasts to any shape.

In [2]:
a = arange(5)  # shape (5,)
b = 10         # scalar, shape ()

print(f"a = {a}")
print(f"a + 10 = {a + b}")
print(f"Result shape: {(a + b).shape}")

a = tensor([0, 1, 2, 3, 4])
a + 10 = tensor([10, 11, 12, 13, 14])
Result shape: torch.Size([5])


## Example 2: Vector + Vector (same shape)

Element-wise addition when shapes match exactly.

In [3]:
a = arange(4)  # [0, 1, 2, 3]
b = arange(4) * 2  # [0, 2, 4, 6]

print(f"a = {a}")
print(f"b = {b}")
print(f"a + b = {a + b}")

a = tensor([0, 1, 2, 3])
b = tensor([0, 2, 4, 6])
a + b = tensor([0, 3, 6, 9])


## Example 3: Outer Sum with `[:, None]`

This is one of the most useful broadcasting patterns: indexing with `None` inserts a size-1 dimension, which turns a vector into a column.

- `a` has shape `(4,)` → padded to `(1, 4)` during broadcasting
- `b[:, None]` has shape `(3, 1)`

Broadcasting creates a `(3, 4)` result containing every pairwise sum.

In [4]:
a = arange(4)         # shape (4,)
b = arange(3)[:, None]  # shape (3, 1)

print(f"a shape: {a.shape}")
print(f"a = {a}")
print()
print(f"b shape: {b.shape}")
print(f"b = \n{b}")
print()
print(f"a + b shape: {(a + b).shape}")
print(f"a + b = \n{a + b}")

a shape: torch.Size([4])
a = tensor([0, 1, 2, 3])

b shape: torch.Size([3, 1])
b = 
tensor([[0],
        [1],
        [2]])

a + b shape: torch.Size([3, 4])
a + b = 
tensor([[0, 1, 2, 3],
        [1, 2, 3, 4],
        [2, 3, 4, 5]])


### Visualizing the broadcast:

```
        a = [0, 1, 2, 3]     shape (4,) → (1, 4)
            
    b = [[0],               shape (3, 1)
         [1],
         [2]]

Broadcast a across rows, b across columns:

    a + b = [[0+0, 0+1, 0+2, 0+3],     [[0, 1, 2, 3],
             [1+0, 1+1, 1+2, 1+3],  =   [1, 2, 3, 4],
             [2+0, 2+1, 2+2, 2+3]]      [2, 3, 4, 5]]
```
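The same picture can be reproduced by hand with `Tensor.expand`, which returns a view of the target shape without copying data. The sketch below assumes the inputs from Example 3; the names `a_full` and `b_full` are illustrative.

```python
import torch

a = torch.arange(4)            # shape (4,)
b = torch.arange(3)[:, None]   # shape (3, 1)

# expand() produces (3, 4) views; no element is physically duplicated.
a_full = a.expand(3, 4)
b_full = b.expand(3, 4)

# Adding the expanded views matches the implicit broadcast.
print(torch.equal(a_full + b_full, a + b))  # True

# A stride of 0 along a dimension means the same memory is reused there.
print(a_full.stride(), b_full.stride())
```

The zero strides suggest why broadcasting is cheap: the repeated rows and columns exist only as index arithmetic, not as extra memory.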

## Example 4: Outer Product via Broadcasting

Multiply a column vector by a row vector to get all pairwise products.

In [5]:
a = arange(3)[:, None]  # column: shape (3, 1)
b = arange(4)           # row: shape (4,)

outer = a * b
print(f"a (column) = \n{a}")
print(f"b (row) = {b}")
print(f"\nouter product (a * b) = \n{outer}")

a (column) = 
tensor([[0],
        [1],
        [2]])
b (row) = tensor([0, 1, 2, 3])

outer product (a * b) = 
tensor([[0, 0, 0, 0],
        [0, 1, 2, 3],
        [0, 2, 4, 6]])
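As a sanity check, the broadcast version can be compared against `torch.outer`, which computes the outer product of two 1D tensors directly. The variable names below are illustrative.

```python
import torch

col = torch.arange(3)  # shape (3,)
row = torch.arange(4)  # shape (4,)

via_broadcast = col[:, None] * row   # (3, 1) * (4,) -> (3, 4)
via_outer = torch.outer(col, row)    # built-in outer product

print(torch.equal(via_broadcast, via_outer))  # True
```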


## Example 5: Creating an Identity Matrix

Compare row indices with column indices using broadcasting.

In [8]:
n = 5
rows = arange(n)[:, None]  # shape (5, 1)
cols = arange(n)           # shape (5,)

# Comparison broadcasts to (5, 5)
eye = (rows == cols).int()
print(f"Identity matrix:\n{eye}")

Identity matrix:
tensor([[1, 0, 0, 0, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1]], dtype=torch.int32)


## Example 6: Upper Triangular Matrix

Create a mask where row ≤ column.

In [11]:
n = 5
rows = arange(n)[:, None]  # shape (5, 1)
cols = arange(n)           # shape (5,)

triu = (rows <= cols).int()
print(f"Upper triangular:\n{triu}")

Upper triangular:
tensor([[1, 1, 1, 1, 1],
        [0, 1, 1, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 0, 1]], dtype=torch.int32)
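In practice PyTorch already ships `torch.eye` and `torch.triu` for these two matrices. The sketch below simply checks that the broadcast comparisons from Examples 5 and 6 agree with them, assuming `int32` outputs as in the cells above.

```python
import torch

n = 5
rows = torch.arange(n)[:, None]  # shape (5, 1)
cols = torch.arange(n)           # shape (5,)

eye_bc = (rows == cols).int()    # identity via broadcasting
triu_bc = (rows <= cols).int()   # upper triangle via broadcasting

# Built-in equivalents for comparison.
eye_ref = torch.eye(n, dtype=torch.int32)
triu_ref = torch.triu(torch.ones(n, n, dtype=torch.int32))

print(torch.equal(eye_bc, eye_ref), torch.equal(triu_bc, triu_ref))  # True True
```

The broadcast form is still worth knowing because it generalizes to masks the built-ins do not cover, such as `rows - cols < k` for a band.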


## Example 7: Sequence Masking (Batch Processing)

A common NLP pattern: mask positions beyond each sequence's length.

In [12]:
batch_size, max_len = 3, 6
lengths = torch.tensor([3, 5, 2])  # each sequence's actual length

# Create position indices
positions = arange(max_len)  # shape (6,)

# Compare: positions < lengths (need to align shapes)
# lengths[:, None] has shape (3, 1)
# positions has shape (6,)
# Result broadcasts to (3, 6)

mask = positions < lengths[:, None]
print(f"Lengths: {lengths}")
print(f"Positions: {positions}")
print(f"\nMask (True = valid position):\n{mask.int()}")

Lengths: tensor([3, 5, 2])
Positions: tensor([0, 1, 2, 3, 4, 5])

Mask (True = valid position):
tensor([[1, 1, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 0],
        [1, 1, 0, 0, 0, 0]], dtype=torch.int32)


In [17]:
print(mask)
print(lengths[:,None])

tensor([[ True,  True,  True, False, False, False],
        [ True,  True,  True,  True,  True, False],
        [ True,  True, False, False, False, False]])
tensor([[3],
        [5],
        [2]])
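One typical use of such a mask is averaging over only the valid positions of a padded batch. The following is a minimal sketch; the `values` tensor is made-up data chosen so the expected means are easy to verify by hand.

```python
import torch

max_len = 6
lengths = torch.tensor([3, 5, 2])
mask = torch.arange(max_len) < lengths[:, None]  # shape (3, 6), dtype bool

# Hypothetical padded batch: rows are [0..5], [6..11], [12..17].
values = torch.arange(18.0).reshape(3, max_len)

# Zero out padded positions, then divide each row sum by its true length.
masked_sum = (values * mask).sum(dim=1)
masked_mean = masked_sum / lengths

print(masked_mean)  # row means: 1.0, 8.0, 12.5
```

Dividing by `lengths` rather than `max_len` is the important detail: a plain `values.mean(dim=1)` would let the padding distort the result.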


## Example 8: Broadcasting Error

Shapes `(3,)` and `(4,)` are **not compatible**: the sizes differ (3 vs. 4) and neither is 1!

In [None]:
a = arange(3)  # shape (3,)
b = arange(4)  # shape (4,)

try:
    result = a + b
except RuntimeError as e:
    print(f"Error: {e}")
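If the intent was to combine every element of `a` with every element of `b`, one possible fix is to insert a size-1 dimension so the shapes become `(3, 1)` and `(4,)`. This is a sketch of that one interpretation; if an element-wise sum was intended instead, the inputs themselves need to have matching lengths.

```python
import torch

a = torch.arange(3)  # shape (3,)
b = torch.arange(4)  # shape (4,)

# Turn `a` into a column: (3, 1) and (4,) are compatible and broadcast to (3, 4).
grid = a[:, None] + b

print(grid.shape)  # torch.Size([3, 4])
```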

## Shape Alignment Cheat Sheet

| Shape A | Shape B | Result | Works? |
|---------|---------|--------|--------|
| `(5,)` | `()` | `(5,)` | ✅ |
| `(5,)` | `(5,)` | `(5,)` | ✅ |
| `(5,)` | `(1,)` | `(5,)` | ✅ |
| `(5,)` | `(4,)` | — | ❌ |
| `(3, 4)` | `(4,)` | `(3, 4)` | ✅ |
| `(3, 4)` | `(3, 1)` | `(3, 4)` | ✅ |
| `(3, 1)` | `(1, 4)` | `(3, 4)` | ✅ |
| `(3, 4)` | `(2, 4)` | — | ❌ |
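The table can be checked programmatically with `torch.broadcast_shapes`, which computes the result shape without allocating any tensors. This sketch assumes a PyTorch release recent enough to include that function; the `pairs` list simply restates the rows of the table.

```python
import torch

# (shape_a, shape_b) pairs taken from the cheat sheet.
pairs = [
    ((5,), ()), ((5,), (5,)), ((5,), (1,)), ((5,), (4,)),
    ((3, 4), (4,)), ((3, 4), (3, 1)), ((3, 1), (1, 4)), ((3, 4), (2, 4)),
]

results = {}
for shape_a, shape_b in pairs:
    try:
        results[(shape_a, shape_b)] = tuple(torch.broadcast_shapes(shape_a, shape_b))
    except RuntimeError:
        results[(shape_a, shape_b)] = None  # shapes are incompatible

for (shape_a, shape_b), out in results.items():
    print(shape_a, shape_b, "->", out)
```

Rows marked ❌ in the table should come out as `None` here.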

## Key Takeaways

1. **Align from the right** — shapes are compared starting from the last dimension
2. **Size 1 is magic** — a dimension of size 1 can broadcast to any size
3. **`[:, None]` is your friend** — use it to add a size-1 dimension for broadcasting
4. **Think in grids** — broadcasting lets you create 2D results from 1D inputs
5. **No loops needed** — broadcasting replaces explicit Python loops with fast C/CUDA ops
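To illustrate the last takeaway, the sketch below builds the same `(3, 4)` grid of pairwise sums twice: once with explicit Python loops and once with a single broadcast expression. The variable names are illustrative.

```python
import torch

a = torch.arange(3)
b = torch.arange(4)

# Explicit loops: one Python-level assignment per output element.
looped = torch.zeros(3, 4, dtype=a.dtype)
for i in range(3):
    for j in range(4):
        looped[i, j] = a[i] + b[j]

# Broadcasting: one vectorized operation over the whole grid.
vectorized = a[:, None] + b

print(torch.equal(looped, vectorized))  # True
```

Both produce identical results; the broadcast version avoids the per-element Python overhead, which tends to matter more as the tensors grow.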