# PyTorch from Scratch — Try it in PyTorch

This notebook contains all the code from the [PyTorch from Scratch appendix](https://robennals.github.io/ai-explained/appendix-pytorch). Run each cell to follow along.

## Tensors

A **tensor** is PyTorch's word for an "array of numbers" with any number of dimensions. Think of it as a NumPy array that can also run on GPUs and track gradients.

In [None]:
import torch

# From a Python list
x = torch.tensor([1.0, 2.0, 3.0])
print(f"Vector: {x}")

# A 2x3 matrix
m = torch.tensor([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
print(f"Matrix:\n{m}")

# Zeros, ones, random
z = torch.zeros(3, 4)
o = torch.ones(2, 2)
r = torch.randn(5, 3)  # random numbers from a normal distribution

print(f"\nRandom 5x3 tensor:\n{r}")
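
The "can also run on GPUs" part can be sketched as follows. This is a minimal example that assumes nothing about your hardware: it falls back to the CPU when PyTorch cannot see a CUDA device. The names `device`, `x_dev`, and `doubled` are illustrative only.

```python
import torch

# Pick a GPU when one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.tensor([1.0, 2.0, 3.0])
x_dev = x.to(device)   # a tensor on the chosen device
doubled = x_dev * 2    # the math runs on whichever device holds the tensor

print(f"Device: {doubled.device}")
print(f"Result: {doubled.cpu().tolist()}")
```

Tensors that take part in the same operation generally need to live on the same device, so it is common to move both the model and its inputs with `.to(device)`.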

### Shape and Dtype

Every tensor has a **shape** (its dimensions) and a **dtype** (the type of number it stores). When something goes wrong, `print(x.shape)` is usually your first debugging move.

In [None]:
x = torch.randn(3, 4)
print(f"Shape: {x.shape}")
print(f"Dtype: {x.dtype}")
print(f"Tensor:\n{x}")
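
As a small illustration of why dtype matters, the sketch below shows how PyTorch infers the dtype from the Python values you pass in, and how to convert explicitly. It assumes PyTorch's default settings (the default float dtype has not been changed).

```python
import torch

ints = torch.tensor([1, 2, 3])          # Python ints   -> torch.int64
floats = torch.tensor([1.0, 2.0, 3.0])  # Python floats -> torch.float32

print(f"ints:   {ints.dtype}")
print(f"floats: {floats.dtype}")

# Convert explicitly when a layer or loss function expects floats
as_float = ints.float()
print(f"ints.float(): {as_float.dtype}")

# reshape changes the shape but keeps the same 6 elements
grid = torch.arange(6).reshape(2, 3)
print(f"Reshaped: {grid.shape}")
```

Writing `1.0` instead of `1` in the earlier cells is what makes those tensors floats.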

### Basic Math

Arithmetic operators such as `+`, `*`, and `**` work element-wise on tensors. For matrix multiplication (the workhorse of neural networks), use `@`.

In [None]:
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(f"a + b = {a + b}")
print(f"a * b = {a * b}")
print(f"a ** 2 = {a ** 2}")

# Matrix multiplication: (2x3) @ (3x4) -> (2x4)
A = torch.randn(2, 3)
B = torch.randn(3, 4)
C = A @ B
print(f"\nMatrix multiply: {A.shape} @ {B.shape} -> {C.shape}")
print(f"Result:\n{C}")
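
The difference between `*` and `@` is easiest to see on a small square matrix, where both are valid but give different answers. A minimal sketch:

```python
import torch

A = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0]])

elementwise = A * A   # multiplies matching entries
matmul = A @ A        # rows of the left matrix dotted with columns of the right

print(f"A * A =\n{elementwise}")   # [[1, 4], [9, 16]]
print(f"A @ A =\n{matmul}")        # [[7, 10], [15, 22]]
```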

## Automatic Differentiation

Automatic differentiation is PyTorch's killer feature. Set `requires_grad=True` on a tensor, do math with it, then call `.backward()` on the result: PyTorch computes the gradient automatically and stores it in the tensor's `.grad` attribute.

In [None]:
x = torch.tensor(3.0, requires_grad=True)

# y = x^2 + 2x + 1
y = x ** 2 + 2 * x + 1

# Compute gradients
y.backward()

# dy/dx = 2x + 2 = 2(3) + 2 = 8
print(f"x = {x.item()}")
print(f"y = x^2 + 2x + 1 = {y.item()}")
print(f"dy/dx = 2x + 2 = {x.grad.item()}")
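
One detail worth knowing before writing a training loop: `.backward()` adds to `.grad` rather than overwriting it. The sketch below calls it twice on the same tensor to show the accumulation, then resets the gradient. The variable names are illustrative.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

(x ** 2).backward()
first = x.grad.item()        # d(x^2)/dx at x = 3 is 6

(x ** 2).backward()          # a second call adds another 6 to x.grad
second = x.grad.item()

x.grad.zero_()               # reset in place before the next backward pass
after_reset = x.grad.item()

print(f"After first backward:  {first}")        # 6.0
print(f"After second backward: {second}")       # 12.0
print(f"After zero_():         {after_reset}")  # 0.0
```

This is why training loops clear the gradients on every step, typically with `optimizer.zero_grad()`.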

## Building a Neural Network

PyTorch provides `torch.nn` — a library of common neural network building blocks.

In [None]:
import torch.nn as nn

# nn.Linear: the most fundamental layer
# It computes y = x @ W.T + b
layer = nn.Linear(in_features=3, out_features=2)
x = torch.randn(1, 3)      # one input with 3 features
y = layer(x)                # output has 2 features
print(f"Input shape:  {x.shape}")
print(f"Output shape: {y.shape}")
print(f"Weight shape: {layer.weight.shape}")
print(f"Bias shape:   {layer.bias.shape}")
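
To check the formula in the comment above, the following sketch computes `x @ W.T + b` by hand from the layer's own `weight` and `bias` and compares it with the layer's output. It uses a batch of 4 inputs, an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=3, out_features=2)
x = torch.randn(4, 3)   # a batch of 4 inputs, 3 features each

manual = x @ layer.weight.T + layer.bias   # (4x3) @ (3x2) + (2,) -> (4x2)
builtin = layer(x)

matches = torch.allclose(manual, builtin, atol=1e-6)
print(f"Output shape: {builtin.shape}")
print(f"Manual computation matches layer: {matches}")
```

The weight is stored with shape `(out_features, in_features)`, which is why the formula needs the transpose.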

In [None]:
# nn.Sequential: stack layers into a network
model = nn.Sequential(
    nn.Linear(2, 16),   # 2 inputs -> 16 hidden neurons
    nn.ReLU(),           # activation function
    nn.Linear(16, 16),  # 16 -> 16
    nn.ReLU(),
    nn.Linear(16, 1),   # 16 -> 1 output
)

print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params}")
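
The parameter count can be worked out by hand: each `nn.Linear` holds `in_features * out_features` weights plus `out_features` biases, and `nn.ReLU` has no parameters. The sketch below checks that arithmetic against PyTorch's own count. `linear_param_count` is a hypothetical helper written for this example, not part of PyTorch.

```python
import torch.nn as nn

def linear_param_count(n_in, n_out):
    """Hypothetical helper: weights plus biases for one Linear layer."""
    return n_in * n_out + n_out

# (2*16 + 16) + (16*16 + 16) + (16*1 + 1) = 48 + 272 + 17 = 337
expected = (linear_param_count(2, 16)
            + linear_param_count(16, 16)
            + linear_param_count(16, 1))

model = nn.Sequential(
    nn.Linear(2, 16),
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 1),
)
actual = sum(p.numel() for p in model.parameters())

print(f"By hand: {expected}, from PyTorch: {actual}")
```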

## Putting It Together: Learning XOR

XOR returns 1 when its inputs differ, and 0 when they're the same. A single neuron can't solve it, because no single straight line separates the 1s from the 0s, but a two-layer network can.

| Input A | Input B | Output |
|---------|---------|--------|
| 0       | 0       | 0      |
| 0       | 1       | 1      |
| 1       | 0       | 1      |
| 1       | 1       | 0      |

In [None]:
import matplotlib.pyplot as plt

# Training data
inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
targets = torch.tensor([[0.], [1.], [1.], [0.]])

# Model: 2 inputs -> 8 hidden neurons -> 1 output
model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),       # squash output to 0-1
)

loss_fn = nn.BCELoss()  # binary cross-entropy for 0/1 classification
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Train and track loss
losses = []
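for epoch in range(2000):
    predictions = model(inputs)           # forward pass
    loss = loss_fn(predictions, targets)  # measure the error
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss.backward()                       # compute new gradients
    optimizer.step()                      # update the weights
    losses.append(loss.item())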

    if epoch % 500 == 0:
        print(f"Epoch {epoch}: loss = {loss.item():.4f}")

# Plot loss over time
plt.figure(figsize=(8, 3))
plt.plot(losses)
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Training loss — the network learns XOR")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Test
with torch.no_grad():
    results = model(inputs)
    print("\nResults:")
    for inp, target, out in zip(inputs, targets, results):
        print(f"  {inp.tolist()} -> {out.item():.3f}  (target: {target.item():.0f})")
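
To see why one hidden layer is enough, here is a hand-built two-layer network that computes XOR exactly, with no training. This is an illustrative construction, not the weights the trained model above will find: the hidden units compute `relu(a + b)` and `relu(a + b - 1)`, and the output is the first minus twice the second. Unlike the trained model, it has no sigmoid on the output.

```python
import torch
import torch.nn as nn

inputs = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

hidden = nn.Linear(2, 2)
output = nn.Linear(2, 1)

with torch.no_grad():
    # Hidden unit 1: a + b        Hidden unit 2: a + b - 1
    hidden.weight.copy_(torch.tensor([[1., 1.], [1., 1.]]))
    hidden.bias.copy_(torch.tensor([0., -1.]))
    # Output: h1 - 2 * h2
    output.weight.copy_(torch.tensor([[1., -2.]]))
    output.bias.zero_()

    hand_built = output(torch.relu(hidden(inputs)))

for inp, out in zip(inputs, hand_built):
    print(f"  {inp.tolist()} -> {out.item():.0f}")
```

For input `[1, 1]` the hidden units give 2 and 1, so the output is `2 - 2 * 1 = 0`; for the mixed inputs they give 1 and 0, so the output is 1.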

---

*This notebook accompanies [Appendix: PyTorch from Scratch](https://robennals.github.io/ai-explained/appendix-pytorch). Now that you know the basics, try the "Try it in PyTorch" notebooks in each chapter!*