# Matrix Math and Linear Transformations — Try it in PyTorch

This is an **optional** hands-on companion to [Chapter 5](https://learnai.robennals.org/05-matrix-math). You'll build matrices, watch them transform shapes, and see that a neural network layer is really just a matrix operation.

In [None]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import numpy as np

## Scalars, Vectors, and Matrices

These are just names for tensors with different numbers of dimensions:
- A **scalar** is a single number (0 dimensions)
- A **vector** is a list of numbers (1 dimension)
- A **matrix** is a grid of numbers (2 dimensions — rows and columns)

You've been using all of these already! Now we'll see what happens when you *multiply* matrices together.

In [None]:
# Scalar — a single number
scalar = torch.tensor(3.14)
print(f"Scalar: {scalar}  (shape: {scalar.shape})")

# Vector — a list of numbers
vector = torch.tensor([1.0, 2.0, 3.0])
print(f"Vector: {vector}  (shape: {vector.shape})")

# Matrix — a grid of numbers (rows × columns)
matrix = torch.tensor([[1.0, 2.0],
                        [3.0, 4.0],
                        [5.0, 6.0]])
print(f"Matrix:\n{matrix}  (shape: {matrix.shape} = 3 rows × 2 columns)")
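
In PyTorch, `@` is the matrix-multiplication operator. To make it less mysterious, here is a small sketch that multiplies the 3×2 matrix above by a 2-element vector in two ways: as row-by-row dot products, and with `@`. The vector `v` and the names `by_hand`, `with_at`, and `other` are made up for illustration. The sketch also shows the shape rule for multiplying two matrices: the inner sizes must match.

```python
import torch

# Same 3×2 matrix as in the cell above
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0],
                       [5.0, 6.0]])
v = torch.tensor([10.0, 1.0])  # illustrative 2-element vector

# By hand: each output entry is the dot product of one row with v
by_hand = torch.stack([torch.dot(row, v) for row in matrix])

# With @: PyTorch does all the row-by-row dot products at once
with_at = matrix @ v

print(by_hand.tolist())  # [12.0, 34.0, 56.0]
print(with_at.tolist())  # [12.0, 34.0, 56.0]

# Matrix @ matrix: the inner sizes must match, so (3×2) @ (2×4) gives (3×4)
other = torch.ones(2, 4)
print((matrix @ other).shape)  # torch.Size([3, 4])
```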

## 2D Transformations

Multiplying a 2D point by a 2×2 matrix moves that point to a new location. Apply the same matrix to every point of a shape and you get a transformed shape. Different matrices produce different transformations: rotation, stretching, shearing, and more.

The `@` symbol in PyTorch means **matrix multiplication**.
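
A common mix-up is `@` versus `*`. The sketch below uses two made-up 2×2 matrices to show that `@` and `torch.matmul` compute the same matrix product, while `*` just multiplies matching entries one by one.

```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])
B = torch.tensor([[0.0, 1.0],
                  [1.0, 0.0]])

product = A @ B                     # matrix multiplication
same_product = torch.matmul(A, B)   # function form of @
elementwise = A * B                 # NOT matrix multiplication: multiplies matching entries

print(product.tolist())      # [[2.0, 1.0], [4.0, 3.0]]
print(elementwise.tolist())  # [[0.0, 2.0], [3.0, 0.0]]
print(torch.equal(product, same_product))  # True
```

Multiplying by this particular `B` on the right swaps the two columns of `A`, a tiny example of a matrix acting as a transformation.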

In [None]:
# Define a triangle as 3 points (each point is a 2D vector)
triangle = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])

# Different 2×2 matrices = different transformations
angle = torch.tensor(torch.pi / 4)  # 45 degrees
transforms = {
    "Original": torch.eye(2),
    "Rotate 45°": torch.tensor([[torch.cos(angle), -torch.sin(angle)],
                                  [torch.sin(angle),  torch.cos(angle)]]),
    "Stretch X": torch.tensor([[2.0, 0.0],
                                [0.0, 1.0]]),
    "Shear": torch.tensor([[1.0, 0.5],
                            [0.0, 1.0]]),
}

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
for ax, (name, M) in zip(axes, transforms.items()):
    # Apply the transformation. Points are stored as rows, so multiplying by
    # M.T applies M to every point at once (the same as M @ p for each point p)
    transformed = triangle @ M.T
    
    # Close the triangle for plotting
    pts = torch.cat([transformed, transformed[:1]])
    ax.fill(pts[:, 0].numpy(), pts[:, 1].numpy(), alpha=0.3)
    ax.plot(pts[:, 0].numpy(), pts[:, 1].numpy(), 'o-', markersize=6)
    ax.set_title(name)
    ax.set_xlim(-1.5, 2.5)
    ax.set_ylim(-1, 2)
    ax.set_aspect('equal')
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
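
One useful way to read any 2×2 matrix: its first column is where the point (1, 0) lands, and its second column is where (0, 1) lands. This sketch checks that for the shear matrix above, and confirms that `triangle @ M.T` gives the same points as applying `M @ p` to each point `p` separately. The names `e1`, `e2`, `batched`, and `one_by_one` are illustrative.

```python
import torch

# The shear matrix from the cell above
M = torch.tensor([[1.0, 0.5],
                  [0.0, 1.0]])

# Unit vectors along x and y
e1 = torch.tensor([1.0, 0.0])
e2 = torch.tensor([0.0, 1.0])

# Each column of M is where one unit vector lands
print(torch.equal(M @ e1, M[:, 0]))  # True
print(torch.equal(M @ e2, M[:, 1]))  # True

# Points stored as rows: one multiplication by M.T transforms them all...
triangle = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
batched = triangle @ M.T
# ...which matches applying M to each point separately
one_by_one = torch.stack([M @ p for p in triangle])
print(torch.allclose(batched, one_by_one))  # True
```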

## Changing Dimensions

Matrices don't have to be square. A matrix with a different number of rows than columns changes the number of dimensions in your data:
- A 1×2 matrix turns a 2D point into a single number (**projection**, which loses information)
- A 3×2 matrix turns a 2D point into a 3D point (**embedding**, which gives the data more coordinates to work with, though the matrix by itself can't add new information)

In [None]:
point_2d = torch.tensor([3.0, 4.0])

# Project from 2D down to 1D
project = torch.tensor([[0.6, 0.8]])  # 1×2 matrix
result_1d = project @ point_2d
print(f"2D point: {point_2d.tolist()}")
print(f"After 1×2 projection: {result_1d.tolist()}  (2D → 1D)")

# Embed from 2D up to 3D
embed = torch.tensor([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.5, 0.5]])  # 3×2 matrix
result_3d = embed @ point_2d
print(f"After 3×2 embedding: {result_3d.tolist()}  (2D → 3D)")
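
To see why projection loses information while embedding doesn't, here's a sketch reusing the two matrices above. The second point `q` and the 2×3 `unembed` matrix are illustrative choices, not part of the chapter.

```python
import torch

project = torch.tensor([[0.6, 0.8]])   # the 1×2 projection from above
embed = torch.tensor([[1.0, 0.0],
                      [0.0, 1.0],
                      [0.5, 0.5]])     # the 3×2 embedding from above

# Two different 2D points (q is an illustrative choice)...
p = torch.tensor([3.0, 4.0])
q = torch.tensor([7.0, 1.0])
# ...land on (up to float rounding) the same number, 5.0, so the 1D result
# alone can't tell you which point you started from
print((project @ p).item(), (project @ q).item())

# The embedding can be undone: this hypothetical 2×3 matrix keeps only the
# first two coordinates, which recovers the original point exactly
unembed = torch.tensor([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
print((unembed @ (embed @ p)).tolist())  # [3.0, 4.0]
```

The third coordinate of an embedded point is just the average of the first two, so every embedded point lies on a flat plane inside 3D space.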

## Neural Network Layer = Matrix Operation

Here's the key insight: `nn.Linear` (which we used in Chapters 3 and 4) is *just* a matrix multiplication plus a bias. Let's verify this by doing the math by hand and comparing.

In [None]:
# Create a layer with 3 inputs and 2 outputs
torch.manual_seed(42)
layer = nn.Linear(3, 2)

# Peek inside: it's just a weight matrix and a bias vector
W = layer.weight.detach()  # 2×3 matrix (out_features × in_features)
b = layer.bias.detach()    # 2-element vector
print(f"Weight matrix (2×3):\n{W}")
print(f"Bias vector: {b}")

# Pass an input through the layer
x = torch.tensor([1.0, 2.0, 3.0])
layer_output = layer(x)

# Do the same thing by hand: matrix multiply + bias
manual_output = W @ x + b

print(f"\nInput: {x.tolist()}")
print(f"nn.Linear output:  {layer_output.tolist()}")
print(f"Manual W@x + b:    {manual_output.tolist()}")
print(f"\nSame result? {torch.allclose(layer_output, manual_output, atol=1e-6)}")
print("nn.Linear is just matrix multiplication + bias.")
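
In practice `nn.Linear` usually receives a batch of inputs, one per row. PyTorch's documentation describes the layer as computing `x @ W.T + b`, which is the same row-per-point convention as `triangle @ M.T` in the transformations cell. A minimal sketch with a random batch of 4 inputs (`X`, `batch_out`, and `manual` are illustrative names):

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
layer = nn.Linear(3, 2)
W = layer.weight.detach()  # shape (2, 3)
b = layer.bias.detach()    # shape (2,)

# A batch of 4 random inputs, one per row
X = torch.randn(4, 3)

batch_out = layer(X)   # shape (4, 2): one output row per input row
manual = X @ W.T + b   # b is broadcast (added) to every row

print(batch_out.shape)                                 # torch.Size([4, 2])
print(torch.allclose(batch_out, manual, atol=1e-6))    # True
```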

## ReLU vs Sigmoid

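An **activation function** is applied to the output of a layer's matrix multiplication. You've already seen sigmoid, which squashes any input into the range 0 to 1. **ReLU** is even simpler: it replaces negative numbers with zero and leaves positive numbers unchanged. Despite being so simple, ReLU works well in practice and usually trains faster than sigmoid, largely because its slope is exactly 1 for every positive input, while sigmoid's slope is never more than 0.25 and shrinks toward zero for large positive or negative inputs.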

In [None]:
x = torch.linspace(-4, 4, 200)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 3))

ax1.plot(x.numpy(), torch.sigmoid(x).numpy(), linewidth=2)
ax1.set_title("Sigmoid: squashes to (0, 1)")
ax1.grid(True, alpha=0.3)
ax1.set_xlabel("Input")
ax1.set_ylabel("Output")

ax2.plot(x.numpy(), torch.relu(x).numpy(), linewidth=2, color='orange')
ax2.set_title("ReLU: zero if negative, unchanged if positive")
ax2.grid(True, alpha=0.3)
ax2.set_xlabel("Input")
ax2.set_ylabel("Output")

plt.tight_layout()
plt.show()
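
The training-speed difference comes down to slopes (gradients). This sketch uses autograd to measure the slope of each function at a few arbitrarily chosen sample points (skipping 0, where ReLU has a kink), and also checks that ReLU is the same as `torch.clamp(x, min=0)`.

```python
import torch

# A few sample inputs (0 is skipped because ReLU has a kink there)
x = torch.tensor([-3.0, -1.0, 0.5, 1.0, 3.0], requires_grad=True)

# Slope of sigmoid at each point: backward() on the sum gives d sigmoid(x_i)/d x_i
torch.sigmoid(x).sum().backward()
sigmoid_slope = x.grad.clone()

# Reset the stored gradient, then do the same for ReLU
x.grad = None
torch.relu(x).sum().backward()
relu_slope = x.grad.clone()

print("sigmoid slopes:", sigmoid_slope.tolist())  # all at most 0.25
print("ReLU slopes:   ", relu_slope.tolist())     # [0.0, 0.0, 1.0, 1.0, 1.0]

# ReLU is exactly "clamp negatives to zero"
print(torch.equal(torch.relu(x), torch.clamp(x, min=0)))  # True
```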

## Why Activation Functions Matter

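Without an activation function between layers, stacking two matrix multiplications is the same as a single matrix multiplication: applying `A` and then `B` gives exactly the same result as applying the one combined matrix `BA`, so the extra depth is an illusion. Adding a nonlinear activation (like ReLU) between layers breaks this shortcut, which lets the stacked layers compute functions that no single matrix can.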

In [None]:
# Two matrices without activation = one matrix
A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[0.5, -1.0], [1.5, 0.5]])

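# Pick x so that A @ x has a negative entry (here A @ x = [-1, 1]). If every
# entry were positive, ReLU would pass it through unchanged and the results would match.
x = torch.tensor([3.0, -2.0])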

# Apply A then B (no activation)
without_activation = B @ (A @ x)

# This is the same as applying the combined matrix BA
combined = B @ A
combined_result = combined @ x

print("Without activation function:")
print(f"  B × (A × x) = {without_activation.tolist()}")
print(f"  (BA) × x    = {combined_result.tolist()}")
print(f"  Same result? {torch.allclose(without_activation, combined_result)}")
print("  Two layers collapsed into one.")

# Now with ReLU between layers
with_activation = B @ torch.relu(A @ x)

print("\nWith ReLU activation between layers:")
print(f"  B × relu(A × x) = {with_activation.tolist()}")
print(f"  Same as (BA) × x? {torch.allclose(with_activation, combined_result)}")
print("  Wherever ReLU zeroes a negative entry, the two layers stop acting like one matrix.")
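
Real layers also add a bias, and the collapse still happens: `W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)`, which is just another weight matrix and bias. Here is a minimal sketch that builds one `nn.Linear` out of two stacked ones; `layer1`, `layer2`, `combined`, and the layer sizes are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(2, 3)   # illustrative sizes: 2 -> 3 -> 2
layer2 = nn.Linear(3, 2)

# W2 (W1 x + b1) + b2  =  (W2 W1) x + (W2 b1 + b2)
combined = nn.Linear(2, 2)
with torch.no_grad():
    combined.weight.copy_(layer2.weight @ layer1.weight)            # (2×3) @ (3×2) -> 2×2
    combined.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)  # 2-element vector

    x = torch.randn(5, 2)           # a batch of 5 random inputs
    two_layers = layer2(layer1(x))  # stacked, no activation
    one_layer = combined(x)         # single layer

print(torch.allclose(two_layers, one_layer, atol=1e-5))  # True
```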

---

*This notebook accompanies [Chapter 5: Matrix Math and Linear Transformations](https://learnai.robennals.org/05-matrix-math). The interactive widgets in the web version let you explore these concepts visually.*

*New to PyTorch? See the [PyTorch from Scratch](https://learnai.robennals.org/appendix-pytorch) appendix for a beginner-friendly introduction.*