# PyTorch Tensor Operations — Daily Practice Notebook
*Generated on 2025-08-19*

This notebook is designed for short, daily reps to build fluency with **PyTorch tensor operations**.
Mastering these will make writing your own `nn.Module` feel natural.

**How to use**  
- Work top-to-bottom. Each mini-exercise has tests that will tell you if you got it right.  
- Only edit where you see **`# TODO`**.  
- Rerun cells after edits until tests pass.  
- Aim for 10–20 minutes per day.

> Tip: If you’re new to PyTorch, keep the [tensor semantics](https://pytorch.org/docs/stable/tensors.html) docs open.

## 1) Setup

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

print("PyTorch version:", torch.__version__)
torch.manual_seed(42)
torch.set_printoptions(precision=3, sci_mode=False)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

PyTorch version: 2.6.0+cu124
Using device: cpu


## 2) Warm‑up — creation, dtype, device, reshape/view

In [2]:
# Create a 1-D tensor with values 0..11 and reshape to (3,4)
t1 = torch.arange(12, device=device, dtype=torch.int64)
t1 = t1.reshape(3,4)

# Already on `device` via arange above, so this .to(device) is a no-op (kept for practice)
t1 = t1.to(device)

# Tests (do not modify)
assert t1.shape == (3,4)
assert t1.device == device
assert t1.dtype == torch.int64
print("✅ Warm-up passed.")

✅ Warm-up passed.


## 3) Core creation ops — zeros/ones/full/eye/rand/randn

In [5]:
# Create the following tensors (on current device):
# a) a (2,3) tensor of ones (float32) -> t_ones
# b) a (3,3) identity matrix (float32) -> t_eye
# c) a (2,2) tensor filled with 7 (float32, as the test below expects) -> t_full
# d) a (4,5) standard normal -> t_randn
# TODO

t_ones = torch.ones(2, 3, device=device, dtype=torch.float32)
t_eye = torch.eye(3, device=device, dtype=torch.float32)
t_full = torch.full((2, 2), 7, device=device, dtype=torch.float32)
t_randn = torch.randn(4, 5, device=device, dtype=torch.float32)

# Tests
assert t_ones.shape == (2,3) and t_ones.dtype == torch.float32
assert t_eye.shape == (3,3) and t_eye.trace() == 3
assert t_full.dtype == torch.float32 and int(t_full.mean().cpu()) == 7
assert t_randn.shape == (4,5)
print("✅ Creation ops passed.")

✅ Creation ops passed.


## 4) Indexing & slicing — set border to 1, interior to 0

In [6]:
# Create a (5,5) tensor of zeros, set the border to 1, interior stays 0.
# Save as t_border.
# TODO
t_border = torch.zeros(5, 5, device=device, dtype=torch.float32)
t_border[: ,0] = 1
t_border[-1, :] = 1
t_border[0, :] = 1
t_border[:, -1] = 1


# Tests
expected = torch.tensor([[1,1,1,1,1],
                         [1,0,0,0,1],
                         [1,0,0,0,1],
                         [1,0,0,0,1],
                         [1,1,1,1,1]], device=device, dtype=t_border.dtype)
assert torch.equal(t_border, expected)
print("✅ Indexing & slicing passed.")

✅ Indexing & slicing passed.
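
The four edge assignments above can also be written the other way round: start from ones and clear the interior with a single slice. This is only a sketch of that alternative; the name `t_border_alt` is introduced here for illustration and is not part of the exercise.

```python
import torch

# Start from all ones, then zero everything except the outermost rows/columns.
t_border_alt = torch.ones(5, 5)
t_border_alt[1:-1, 1:-1] = 0  # 1:-1 selects the interior 3x3 block

print(t_border_alt)
```

Negative indices keep the slice independent of the tensor size, so the same two lines should work for any square or rectangular shape of at least 3 per side.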


## 5) Boolean masks — count values above mean

In [8]:
# Given a random (6,4) tensor, compute how many values are strictly above its mean.
# Save boolean mask to mask, count to count_above.
# TODO
x = torch.randn(6, 4, device=device, dtype=torch.float32)
mask = x > x.mean()
count_above = int(mask.sum())


# Tests
assert mask.dtype == torch.bool and mask.shape == x.shape
assert isinstance(count_above, int)
print("✅ Boolean mask passed. Count:", count_above)

✅ Boolean mask passed. Count: 14
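
A boolean mask is useful for more than counting. The sketch below shows two common follow-ups, selecting the masked values and replacing them with `torch.where`. The `_demo` names and the seed are assumptions made for this example only.

```python
import torch

torch.manual_seed(0)
x_demo = torch.randn(6, 4)
mean_demo = x_demo.mean()
mask_demo = x_demo > mean_demo

# Indexing with a mask returns a 1-D tensor of the selected elements.
above = x_demo[mask_demo]

# torch.where picks from the second argument where the mask is True,
# and from the third argument elsewhere; here values above the mean are capped.
capped = torch.where(mask_demo, mean_demo, x_demo)

print(above.shape, capped.max() <= mean_demo)
```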


## 6) Reshape/view/permute — make contiguous before view

In [19]:
# Start with shape (2,3,4), then permute to (3,2,4).
# Try to view it as (3,8) correctly (may require .contiguous()).
# Save the final tensor as y_view.
# TODO
x = torch.arange(24, device=device, dtype=torch.float32)
x = x.reshape(2, 3, 4)
x = x.permute(1, 0, 2)
y_view = x.contiguous().view(3,8)

# Tests
assert y_view.shape == (3,8)
assert torch.equal(y_view[0], torch.tensor([0,1,2,3,12,13,14,15], device=device, dtype=y_view.dtype))
print("✅ View/permute passed.")

tensor([[[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.]],

        [[12., 13., 14., 15.],
         [16., 17., 18., 19.],
         [20., 21., 22., 23.]]])
tensor([[[ 0.,  1.,  2.,  3.],
         [12., 13., 14., 15.]],

        [[ 4.,  5.,  6.,  7.],
         [16., 17., 18., 19.]],

        [[ 8.,  9., 10., 11.],
         [20., 21., 22., 23.]]])
tensor([[ 0.,  1.,  2.,  3., 12., 13., 14., 15.],
        [ 4.,  5.,  6.,  7., 16., 17., 18., 19.],
        [ 8.,  9., 10., 11., 20., 21., 22., 23.]])
✅ View/permute passed.
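
To see why `.contiguous()` matters here, the following sketch calls `.view` directly on the permuted tensor and catches the resulting error, then compares against `.reshape`, which copies when a view is impossible. Variable names ending in `_demo` are illustrative.

```python
import torch

x_demo = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4).permute(1, 0, 2)

# permute only rewrites strides; the underlying memory order is unchanged.
print(x_demo.is_contiguous(), x_demo.stride())

try:
    x_demo.view(3, 8)
    view_failed = False
except RuntimeError:
    # The last two dims cannot be merged without moving data.
    view_failed = True

# reshape falls back to a copy when no valid view exists.
y_reshape = x_demo.reshape(3, 8)
```

In practice, `reshape` is the more forgiving choice, while `view` is handy when you want a guarantee that no copy happens.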


## 7) Broadcasting — pairwise L2 distance (no loops)

In [20]:
# Given A in R^{N x D} and B in R^{M x D}, compute pairwise L2 distances (N x M) using broadcasting.
# Use N=4, M=3, D=5 with a fixed seed for reproducibility.
# Save result as dist_nm.
# Hints: (A[:,None,:] - B[None,:,:])**2 then reduce over D.
# TODO
torch.manual_seed(0)  # fixed seed, as the prompt asks
N, M, D = 4, 3, 5
A = torch.randn(N, D, device=device, dtype=torch.float32)
B = torch.randn(M, D, device=device, dtype=torch.float32)

dist_nm = torch.sqrt(((A[:,None,:]-B[None,:,:])**2).sum(-1))

# Tests
assert dist_nm.shape == (N, M)
# sanity check: the distance from each row of A to itself (the diagonal) is zero
dist_aa = torch.sqrt(((A[:,None,:]-A[None,:,:])**2).sum(-1))
assert torch.allclose(torch.diag(dist_aa), torch.zeros(N, device=device), atol=1e-6)
print("✅ Broadcasting distance passed.")

✅ Broadcasting distance passed.
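
As an extra check, the broadcasting result can be compared with PyTorch's built-in `torch.cdist`. This is a sketch using fresh tensors (`A_demo`, `B_demo` are names assumed for the example), not part of the exercise tests.

```python
import torch

torch.manual_seed(0)
A_demo = torch.randn(4, 5)
B_demo = torch.randn(3, 5)

# (4,1,5) - (1,3,5) broadcasts to (4,3,5); summing over the last dim gives (4,3).
manual = torch.sqrt(((A_demo[:, None, :] - B_demo[None, :, :]) ** 2).sum(-1))

# Built-in pairwise p-norm distance for comparison.
builtin = torch.cdist(A_demo, B_demo, p=2)

print((manual - builtin).abs().max())
```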


## 8) Reductions — implement stable logsumexp along dim=-1

In [None]:
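# Implement a numerically stable logsumexp for a 2D tensor along the last dim.
# Define it as my_logsumexp(x) and keep the reduced dim (output shape (N, 1)),
# since the test below compares against torch.logsumexp(..., keepdim=True).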
# TODO
def my_logsumexp(x: torch.Tensor) -> torch.Tensor:
    pass

# Quick test vs PyTorch
z = torch.randn(7, 11, device=device) * 5
ok = torch.allclose(my_logsumexp(z), torch.logsumexp(z, dim=-1, keepdim=True), atol=1e-6)
assert ok
print("✅ LogSumExp passed.")
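
If you get stuck, one possible approach is the usual max-subtraction trick, sketched below under the name `my_logsumexp_sketch` (a hypothetical name chosen so it does not overwrite your own attempt). It does not handle rows whose maximum is infinite.

```python
import torch

def my_logsumexp_sketch(x: torch.Tensor) -> torch.Tensor:
    # Subtract the row-wise max so the largest exponent becomes exp(0) = 1.
    m = x.max(dim=-1, keepdim=True).values
    return m + torch.log(torch.exp(x - m).sum(dim=-1, keepdim=True))

# Extreme inputs show why the naive formula is not enough.
z_demo = torch.tensor([[1000.0, 1000.0], [-1000.0, -1000.0]])
naive = torch.log(torch.exp(z_demo).sum(dim=-1, keepdim=True))  # exp overflows or underflows
stable = my_logsumexp_sketch(z_demo)

print(naive.flatten(), stable.flatten())
```

The naive version returns `inf` and `-inf` for these rows, while the shifted version stays finite.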

## 9) Autograd — verify grad of sum(x**2) is 2x

In [None]:
# Create x with requires_grad=True, compute y = sum(x**2), call y.backward(),
# then verify x.grad ≈ 2x.
# TODO

assert torch.allclose(x.grad, 2*x, atol=1e-6)
print("✅ Autograd gradient check passed.")
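
One way the check could look is sketched here with separate `_demo` names so it does not collide with the exercise variables. The shape `(5,)` is an arbitrary choice.

```python
import torch

x_demo = torch.randn(5, requires_grad=True)  # leaf tensor that tracks gradients
y_demo = (x_demo ** 2).sum()                 # scalar, so backward() needs no argument
y_demo.backward()

# d/dx sum(x^2) = 2x; detach() keeps the comparison itself out of the graph.
grad_ok = torch.allclose(x_demo.grad, 2 * x_demo.detach())
print(grad_ok)
```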

## 10) Linear layer by hand — y = x @ W.T + b

In [None]:
# Implement a manual linear layer and check against nn.Linear.
# Use batch=4, in_features=6, out_features=3.
# TODO
torch.manual_seed(123)
B, Din, Dout = 4, 6, 3
x = torch.randn(B, Din, device=device)

W = torch.randn(Dout, Din, device=device, requires_grad=True)
b = torch.randn(Dout, device=device, requires_grad=True)

y_manual = None  # TODO: replace None with x @ W.T + b

layer = nn.Linear(Din, Dout, bias=True).to(device)
with torch.no_grad():
    layer.weight.copy_(W)
    layer.bias.copy_(b)

y_ref = layer(x)
assert torch.allclose(y_manual, y_ref, atol=1e-6)
print("✅ Manual linear layer passed.")
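
For reference, the formula in the heading can be sketched as follows and compared against `F.linear`, the functional form that `nn.Linear` calls internally. The `_demo` tensors are stand-ins with the same shapes as the exercise.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)
x_demo = torch.randn(4, 6)   # (batch, in_features)
W_demo = torch.randn(3, 6)   # (out_features, in_features), the layout nn.Linear uses
b_demo = torch.randn(3)      # (out_features,)

# (4,6) @ (6,3) -> (4,3); the bias broadcasts across the batch dimension.
y_matmul = x_demo @ W_demo.T + b_demo
y_functional = F.linear(x_demo, W_demo, b_demo)

print((y_matmul - y_functional).abs().max())
```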

## 11) `no_grad`, `detach`, and `requires_grad_`

In [None]:
# Demonstrate turning grad off for evaluation and re-enabling later.
# TODO
p = torch.randn(5, device=device, requires_grad=True)
with torch.no_grad():
    q = p * 3 + 1    # q has no grad history
r = (p.detach() * 2).requires_grad_()  # r is a leaf again with grad
loss = (r**2).sum()
loss.backward()
assert p.grad is None  # never used in backward
assert r.grad is not None
print("✅ no_grad/detach demo passed.")
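
To make the differences concrete, this sketch prints the autograd flags of tensors produced in each way. The `_demo` names are illustrative only.

```python
import torch

p_demo = torch.randn(5, requires_grad=True)

with torch.no_grad():
    q_demo = p_demo * 3 + 1   # computed without recording a graph
d_demo = p_demo.detach()      # shares storage with p_demo, cut off from the graph
s_demo = p_demo * 2           # ordinary op, so the graph is recorded

for name, tensor in [("p", p_demo), ("q", q_demo), ("d", d_demo), ("s", s_demo)]:
    print(f"{name}: requires_grad={tensor.requires_grad}, "
          f"is_leaf={tensor.is_leaf}, grad_fn={tensor.grad_fn}")

# detach() does not copy: both tensors point at the same memory.
shares_memory = d_demo.data_ptr() == p_demo.data_ptr()
```

Because `detach()` shares memory, an in-place change to `d_demo` would also change `p_demo`; use `.detach().clone()` when you need an independent copy.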

## 12) DataLoader — batching tensors

In [None]:
from torch.utils.data import TensorDataset, DataLoader

# Build a simple dataset and iterate with DataLoader
# TODO: create X (100, 8) and y (100,) and a loader with batch_size=16, shuffle=True
X = torch.randn(100, 8)
y = torch.randint(0, 2, (100,))
ds = TensorDataset(X, y)
loader = DataLoader(ds, batch_size=16, shuffle=True)

batches = 0
for xb, yb in loader:
    assert xb.shape == (xb.size(0), 8)
    assert yb.dim() == 1
    batches += 1

assert batches == (100 + 16 - 1)//16
print("✅ DataLoader basics passed.")
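
The batch count above uses ceiling division because the final batch is smaller: 100 = 6 × 16 + 4. The sketch below lists the batch sizes with and without `drop_last`; the dataset here is a throwaway built only for this example.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

ds_demo = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))

# Default: the last, partial batch is kept.
sizes_keep = [xb.size(0) for xb, _ in DataLoader(ds_demo, batch_size=16)]

# drop_last=True discards the partial batch, so every batch has the same size.
sizes_drop = [xb.size(0) for xb, _ in DataLoader(ds_demo, batch_size=16, drop_last=True)]

print(sizes_keep)
print(sizes_drop)
```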

## 13) Optional — GPU round‑trip

In [None]:
if torch.cuda.is_available():
    t = torch.randn(2,3, device="cuda")
    cpu_t = t.cpu()
    assert cpu_t.device.type == "cpu"
    print("✅ GPU round-trip passed.")
else:
    print("ℹ️ CUDA not available on this machine — skip.")

## 14) Daily Drills (repeatable)
Each day, do 2–3 of the prompts below from memory. Check the PyTorch docs if you get stuck.

1. Build a 3D tensor of shape (2, 3, 4) with integers 0..23. Swap the first two dims and flatten the last two.
2. Using boolean masks, set all values in a tensor greater than its (row‑wise) mean to that row’s mean.
3. Compute cosine similarities between two matrices `A (N×D)` and `B (M×D)` using only broadcasting and reductions.
4. Implement a stable softmax using your `my_logsumexp` (no `F.softmax`).
5. Write a function `one_hot(ids, num_classes)` that returns a one‑hot matrix (no loops).
6. Vectorize: given `x (N,)`, compute a rolling window mean of width 3 without loops.
7. Using `unfold`, implement a 1D valid convolution (stride 1) with a given kernel (no `conv1d`).
8. Show three different ways to clone a tensor such that modifying the clone doesn’t change the original.
9. Given logits, compute cross‑entropy loss *manually* (no `F.cross_entropy`).
10. Do a tiny gradient descent step on a linear model: update `W, b` with `lr=1e-2` using autograd grads.
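
As a hint for drills 6 and 7, `Tensor.unfold(dimension, size, step)` returns sliding windows as an extra trailing dimension. The sketch below applies it to the rolling mean from drill 6; the input values are an arbitrary choice for illustration.

```python
import torch

x_demo = torch.arange(6, dtype=torch.float32)  # tensor([0., 1., 2., 3., 4., 5.])

# Windows of width 3 with stride 1 along dim 0: shape (N - 2, 3) = (4, 3).
windows = x_demo.unfold(0, 3, 1)

# Averaging each window gives the "valid" rolling mean.
rolling_mean = windows.mean(dim=-1)

print(windows)
print(rolling_mean)
```

Replacing the mean with a weighted sum over the window dimension is one route to the valid convolution in drill 7.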

## 15) Bonus — timing: vectorization vs loops

In [None]:
import time

# Compare a loop-based pairwise L2 to the vectorized version above.
torch.manual_seed(0)
N, M, D = 200, 200, 64
A = torch.randn(N, D)
B = torch.randn(M, D)

# Loop version (slow)
t_start = time.perf_counter()  # perf_counter is a monotonic, high-resolution timer
D_loop = torch.empty(N, M)
for i in range(N):
    for j in range(M):
        D_loop[i, j] = torch.norm(A[i] - B[j])
t_loop = time.perf_counter() - t_start

# Vectorized
t_start = time.perf_counter()
diff = A[:, None, :] - B[None, :, :]
D_vec = torch.sqrt((diff**2).sum(-1))
t_vec = time.perf_counter() - t_start

print(f"Loop: {t_loop:.3f}s   Vectorized: {t_vec:.3f}s   Speedup: {t_loop/(t_vec+1e-9):.1f}x")
assert torch.allclose(D_loop, D_vec, atol=1e-6)
print("✅ Timing demo passed.")
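
The broadcasting version materializes an `(N, M, D)` intermediate. A common alternative, sketched here as an assumption-laden variant rather than a replacement, expands the squared distance as ‖a‖² + ‖b‖² − 2a·b so the heavy lifting is a single matrix multiply. It can lose some accuracy when two points are nearly identical.

```python
import torch

torch.manual_seed(0)
A_demo = torch.randn(200, 64)
B_demo = torch.randn(200, 64)

a_sq = (A_demo ** 2).sum(-1, keepdim=True)   # (N, 1)
b_sq = (B_demo ** 2).sum(-1)[None, :]        # (1, M)
sq_dist = a_sq + b_sq - 2.0 * (A_demo @ B_demo.T)

# Rounding can push tiny distances slightly below zero, so clamp before sqrt.
D_mm = torch.sqrt(sq_dist.clamp_min(0.0))

# Reference: the broadcasting formulation.
D_ref = torch.sqrt(((A_demo[:, None, :] - B_demo[None, :, :]) ** 2).sum(-1))
print((D_mm - D_ref).abs().max())
```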

## 16) From tensors to Modules — tiny MLP forward pass

In [None]:
# Build a minimal MLP using tensor shapes you understand.
# 8 -> 16 -> 4 with ReLU, then logits. Confirm output shape.
mlp = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 4),
)
x = torch.randn(32, 8)
out = mlp(x)
assert out.shape == (32, 4)
print("✅ Tiny MLP forward passed.")
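
To connect this back to the manual linear layer, the sketch below lists the parameter shapes of the same architecture and reproduces the forward pass with plain tensor ops. `mlp_demo` and `x_demo` are fresh objects created for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
mlp_demo = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x_demo = torch.randn(32, 8)

# Weights are stored as (out_features, in_features); index 1 (ReLU) has no parameters.
shapes = {name: tuple(p.shape) for name, p in mlp_demo.named_parameters()}
print(shapes)

# Same computation by hand: linear -> ReLU -> linear.
hidden = torch.relu(x_demo @ mlp_demo[0].weight.T + mlp_demo[0].bias)  # (32, 16)
out_manual = hidden @ mlp_demo[2].weight.T + mlp_demo[2].bias          # (32, 4)
```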

## 17) Practice checklist
- [ ] I can reshape, permute, and `view` safely (with `.contiguous()` when needed).
- [ ] I can write pairwise distance and cosine similarity with *broadcasting*.
- [ ] I can implement stable `logsumexp` and softmax.
- [ ] I can do simple autograd checks.
- [ ] I understand `no_grad`, `detach`, and `requires_grad_`.
- [ ] I can use `TensorDataset` and `DataLoader`.
- [ ] I can wire up a tiny `nn.Sequential` MLP and reason about shapes.