# Linear Algebra for Neural Networks

This notebook contains PyTorch examples demonstrating linear algebra concepts essential for understanding neural networks.

## Table of Contents
1. [Matrix Multiplication](#matrix-multiplication)
2. [Matrix Transpose](#matrix-transpose)
3. [Matrix Inverse](#matrix-inverse)
4. [Eigenvalues & Eigenvectors](#eigenvalues--eigenvectors)
5. [Singular Value Decomposition (SVD)](#singular-value-decomposition-svd)

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

## Matrix Multiplication

**Formula:** $\mathbf{C} = \mathbf{A}\mathbf{B}$ where $C_{ij} = \sum_k A_{ik}B_{kj}$

Matrix multiplication is the core operation in neural networks: a fully connected layer applies its weight matrix to a whole batch of inputs in a single product.

In [None]:
# Simple neural network layer: y = Wx + b, batched as y = x @ W.T + b
batch_size, input_dim, output_dim = 32, 784, 128
x = torch.randn(batch_size, input_dim)  # Input batch
W = torch.randn(output_dim, input_dim)  # Weight matrix
b = torch.randn(output_dim)             # Bias vector

# Forward pass - matrix multiplication
y = x @ W.T + b  # Shape: (32, 128)
print(f"Input shape: {x.shape}")
print(f"Weight shape: {W.shape}")  
print(f"Output shape: {y.shape}")

# Each row of y contains weighted sums for all neurons for one sample
print(f"Output for first sample: {y[0][:5]}...")  # First 5 neurons
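
To connect the formula to the `@` operator, the following minimal sketch computes each entry $C_{ij} = \sum_k A_{ik}B_{kj}$ with explicit loops and compares the result against PyTorch's built-in product. The small matrices `A` and `B` and the helper `matmul_by_definition` are illustrative names and are not used elsewhere in this notebook.

```python
import torch

def matmul_by_definition(A, B):
    """Compute C = AB entry by entry (hypothetical helper for illustration)."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = torch.zeros(n, m, dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            # C_ij = sum_k A_ik * B_kj
            C[i, j] = (A[i, :] * B[:, j]).sum()
    return C

torch.manual_seed(0)
A = torch.randn(3, 4, dtype=torch.float64)
B = torch.randn(4, 2, dtype=torch.float64)

C_loop = matmul_by_definition(A, B)
C_builtin = A @ B
print(torch.allclose(C_loop, C_builtin))
```

The loop version is only meant to make the index pattern visible; the built-in product is what you would use in practice.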

## Matrix Transpose

**Formula:** $(\mathbf{A}^T)_{ij} = A_{ji}$

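Essential for backpropagation: if the forward pass multiplies by $\mathbf{W}^T$, the gradient with respect to the input is obtained by multiplying the upstream gradient by $\mathbf{W}$, so the transpose carries information in the reverse direction.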

In [None]:
# Forward pass: y = x @ W.T
# Backward pass: dx = dy @ W (using transpose automatically)
x = torch.randn(32, 784, requires_grad=True)
W = torch.randn(128, 784, requires_grad=True)
y = x @ W.T

# Create dummy loss and backpropagate
loss = y.sum()
loss.backward()

print(f"Forward: x {x.shape} @ W.T {W.T.shape} = y {y.shape}")
print("Gradient flows back through transpose automatically")
print(f"x.grad shape: {x.grad.shape}")  # Same as x.shape
print(f"W.grad shape: {W.grad.shape}")  # Same as W.shape
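
To see where the transpose enters, the gradients can also be written out by hand and compared with autograd. This is a small sketch under the assumption that the loss is `y.sum()`, so the upstream gradient is a matrix of ones; the names `dx_manual` and `dW_manual` are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 6, dtype=torch.float64, requires_grad=True)
W = torch.randn(3, 6, dtype=torch.float64, requires_grad=True)

y = x @ W.T
loss = y.sum()
loss.backward()

# For loss = y.sum(), the upstream gradient dL/dy is all ones
dy = torch.ones_like(y)

# Hand-derived gradients for y = x @ W.T
dx_manual = dy @ W.detach()      # (4, 3) @ (3, 6) -> (4, 6)
dW_manual = dy.T @ x.detach()    # (3, 4) @ (4, 6) -> (3, 6)

print(torch.allclose(x.grad, dx_manual))
print(torch.allclose(W.grad, dW_manual))
```

The forward pass uses `W.T`, while the gradient with respect to `x` uses `W` itself, which is the "reversal" described above.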

## Matrix Inverse

**Formula:** $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$

Used in closed-form (analytical) solutions such as the normal equations, and for understanding when a linear transformation can be undone.

In [None]:
# Normal equations for linear regression: θ = (X.T @ X)^(-1) @ X.T @ y
n_samples, n_features = 100, 10
X = torch.randn(n_samples, n_features)
true_theta = torch.randn(n_features)
y = X @ true_theta + 0.1 * torch.randn(n_samples)

# Analytical solution using matrix inverse
XtX = X.T @ X
XtX_inv = torch.inverse(XtX)
theta_analytical = XtX_inv @ X.T @ y

print(f"True theta: {true_theta[:3]}")
print(f"Estimated theta: {theta_analytical[:3]}")
print(f"Error: {torch.norm(true_theta - theta_analytical):.6f}")

# Note: In practice, use torch.linalg.lstsq for numerical stability
theta_stable = torch.linalg.lstsq(X, y).solution
print(f"Stable solution: {theta_stable[:3]}")
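
A middle ground between the explicit inverse and `lstsq` is to solve the normal equations as a linear system with `torch.linalg.solve`, which avoids forming $(\mathbf{X}^T\mathbf{X})^{-1}$. The sketch below regenerates its own small regression problem in `float64`; the variable names `theta_solve` and `theta_lstsq` are illustrative.

```python
import torch

torch.manual_seed(0)
n_samples, n_features = 100, 10
X = torch.randn(n_samples, n_features, dtype=torch.float64)
true_theta = torch.randn(n_features, dtype=torch.float64)
y = X @ true_theta + 0.1 * torch.randn(n_samples, dtype=torch.float64)

# Solve (X.T @ X) theta = X.T @ y directly instead of inverting X.T @ X
theta_solve = torch.linalg.solve(X.T @ X, X.T @ y)

# Reference: least-squares solver, with an explicit 2-D right-hand side
theta_lstsq = torch.linalg.lstsq(X, y.unsqueeze(1)).solution.squeeze(1)

print(torch.allclose(theta_solve, theta_lstsq, atol=1e-8))
```

For a well-conditioned `X` like this one, both routes agree closely; the difference matters more when `X.T @ X` is close to singular.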

## Eigenvalues & Eigenvectors

**Formula:** $\mathbf{A}\mathbf{v} = \lambda\mathbf{v}$

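Eigenvectors of a covariance matrix are the principal directions of data variation, and eigenvalue spectra help analyze how signals and gradients are scaled as they pass through a layer.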

In [None]:
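# Analyzing conditioning via the Gram matrix W @ W.T
W = torch.randn(100, 100)
# W @ W.T is symmetric positive semi-definite, so eigvalsh applies:
# it returns real eigenvalues in ascending order.
eigenvals = torch.linalg.eigvalsh(W @ W.T)

# Note: this is the condition number of W @ W.T, i.e. the square of cond(W)
condition_number = eigenvals.max() / eigenvals.min()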
print(f"Condition number: {condition_number:.2f}")
print(f"Max eigenvalue: {eigenvals.max():.2f}")
print(f"Min eigenvalue: {eigenvals.min():.2f}")

# PCA example - find principal components
data = torch.randn(1000, 50)  # 1000 samples, 50 features
centered_data = data - data.mean(dim=0)
cov_matrix = (centered_data.T @ centered_data) / (len(data) - 1)

eigenvals, eigenvecs = torch.linalg.eigh(cov_matrix)  # For symmetric matrices
# eigh returns eigenvalues in ascending order; sort descending so the
# directions with the largest variance come first
sorted_indices = torch.argsort(eigenvals, descending=True)
principal_components = eigenvecs[:, sorted_indices]

print(f"Explained variance ratios: {eigenvals[sorted_indices][:5] / eigenvals.sum()}")
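
Once the principal components are sorted, reducing dimensionality amounts to projecting the centered data onto the first few of them. The sketch below is a self-contained illustration (the names `components`, `projected`, and the choice `k = 5` are assumptions for the example); it also checks that the variance along each principal axis matches the corresponding eigenvalue.

```python
import torch

torch.manual_seed(0)
data = torch.randn(1000, 50, dtype=torch.float64)
centered = data - data.mean(dim=0)
cov = (centered.T @ centered) / (len(data) - 1)

# eigh returns eigenvalues in ascending order; flip to descending
eigenvals, eigenvecs = torch.linalg.eigh(cov)
eigenvals = eigenvals.flip(0)
components = eigenvecs.flip(1)

k = 5
projected = centered @ components[:, :k]   # shape (1000, 5)

# Variance along each principal axis should equal the matching eigenvalue
projected_var = projected.var(dim=0)       # unbiased by default, matching the (n - 1) above
print(torch.allclose(projected_var, eigenvals[:k]))
```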

## Singular Value Decomposition (SVD)

**Formula:** $\mathbf{A} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T$

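Decomposes any matrix into a rotation or reflection ($\mathbf{V}^T$), an axis-aligned scaling ($\mathbf{\Sigma}$), and a second rotation or reflection ($\mathbf{U}$).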

In [None]:
# SVD for dimensionality reduction and analysis
data = torch.randn(1000, 100)  # High-dimensional data

# Perform SVD
U, S, Vt = torch.linalg.svd(data, full_matrices=False)

# Analyze the singular values
print(f"Data shape: {data.shape}")
print(f"U shape: {U.shape}")    # Left singular vectors
print(f"S shape: {S.shape}")    # Singular values
print(f"Vt shape: {Vt.shape}")  # Right singular vectors

# Rank-k approximation: keep only the top k singular triplets
k = 20  # Keep top 20 components
data_reduced = U[:, :k] @ torch.diag(S[:k]) @ Vt[:k, :]  # Same shape as data, rank k

reconstruction_error = torch.norm(data - data_reduced)
# Storage for U[:, :k], S[:k], Vt[:k, :] relative to the full matrix
compression_ratio = (k * (U.shape[0] + Vt.shape[1] + 1)) / (data.shape[0] * data.shape[1])

print(f"Reconstruction error: {reconstruction_error:.2f}")
print(f"Compression ratio: {compression_ratio:.2%}")
print(f"Variance explained by top {k} components: {(S[:k]**2).sum() / (S**2).sum():.2%}")
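
The reconstruction error of a truncated SVD does not have to be measured directly: in the Frobenius norm it equals the square root of the sum of the squared singular values that were discarded. The following sketch checks this on a smaller random matrix; `A`, `A_k`, and the sizes are illustrative choices.

```python
import torch

torch.manual_seed(0)
A = torch.randn(200, 50, dtype=torch.float64)
U, S, Vt = torch.linalg.svd(A, full_matrices=False)

k = 10
A_k = U[:, :k] @ torch.diag(S[:k]) @ Vt[:k, :]   # rank-k truncation

error_direct = torch.norm(A - A_k)               # Frobenius norm of the residual
error_from_s = torch.sqrt((S[k:] ** 2).sum())    # predicted from discarded singular values

print(torch.allclose(error_direct, error_from_s))
```

This is why the singular value spectrum alone is enough to choose `k` for a target error.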

# SVD for weight initialization (orthogonal initialization)
def svd_init(tensor):
    """Return a (semi-)orthogonal matrix with the same shape as `tensor`, built from its SVD."""
    if tensor.dim() >= 2:
        U, _, Vt = torch.linalg.svd(tensor, full_matrices=False)
        return U if U.shape == tensor.shape else Vt
    return tensor

# torch.empty returns uninitialized memory, so start from random values instead
weight = torch.randn(128, 64)
orthogonal_weight = svd_init(weight)
# A 128x64 matrix can only have orthonormal columns, so check W.T @ W against I_64
print(f"Orthogonality check: {torch.norm(orthogonal_weight.T @ orthogonal_weight - torch.eye(64)):.6f}")
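
PyTorch also provides a built-in initializer, `torch.nn.init.orthogonal_`, which fills a tensor in place with a (semi-)orthogonal matrix. As a rough sketch for comparison with the SVD-based helper above, the example below applies it to a 128×64 tensor and measures how far $\mathbf{W}^T\mathbf{W}$ is from the identity; the names `gram` and `deviation` are illustrative.

```python
import torch

torch.manual_seed(0)
weight = torch.empty(128, 64)
torch.nn.init.orthogonal_(weight)   # fills `weight` in place

# With more rows than columns, the columns are orthonormal: W.T @ W ≈ I_64
gram = weight.T @ weight
deviation = torch.norm(gram - torch.eye(64))
print(f"Deviation from identity: {deviation:.6f}")
```

Here `torch.empty` is safe because the initializer overwrites every entry before the tensor is read.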