# 🧮 Week 4, Day 1: Linear Algebra for AI

**🎯 Goal:** Master the math that powers neural networks, transformers, and all modern AI

**⏱️ Time:** 60-90 minutes

**🌟 Why This Matters for AI:**
- **Neural networks ARE linear algebra** - Every layer is built around a matrix multiplication
- **Transformers** (GPT, BERT, Claude) = Attention matrices
- **Word embeddings** = Vectors in high-dimensional space
- **Image processing** = Matrix operations on pixel arrays
- **Recommendation systems** = Vector similarity

---

## 🔥 2024-2025 AI Trend Alert!

**Large Language Models** are PURE linear algebra:
- GPT-4: reportedly ~1.8 trillion parameters (an unofficial estimate; OpenAI has not published it) = GIANT matrices
- **Every token prediction = thousands of matrix multiplications!**
- Training = optimizing billion-dimensional spaces

**Transformer Architecture** revolutionized AI:
- Self-attention = Query, Key, Value MATRICES
- **Understanding Q·Kᵀ = Understanding ChatGPT!**

**RAG & Vector Databases**:
- Text → Vectors (embeddings)
- Search = Cosine similarity (dot product!)
- **Linear algebra makes RAG possible!**

**You'll learn the core math inside ChatGPT, Claude, Gemini!** 🚀

---

## 📊 Why Linear Algebra?

**Linear algebra** = Math of vectors and matrices

Think of it as:
- Regular algebra: Single numbers (1 + 2 = 3) 🔢
- Linear algebra: Arrays of numbers ([1,2,3] + [4,5,6]) 📊

**Real AI example:**
```python
# One linear layer in GPT-3 (illustrative; hidden size 12,288. GPT-4's sizes are not public)
input_vector = [0.5, 0.3, 0.8, ...]  # 12,288 numbers!
weight_matrix = [[w₁₁, w₁₂, ...], ...]  # 12,288 × 12,288
output = weight_matrix @ input_vector  # One layer's matrix multiplication!
```
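To get a feel for the size of that weight matrix, the arithmetic below counts its entries and estimates its memory footprint. It assumes 2 bytes per weight (16-bit floats), a common choice for inference but an assumption here; the 12,288 figure is GPT-3's hidden size, used only as an example.

```python
d_model = 12_288                 # GPT-3 (175B) hidden size, used as an example

n_weights = d_model * d_model    # entries in one 12,288 × 12,288 matrix
bytes_fp16 = n_weights * 2       # assumption: 2 bytes per weight (float16)

print(f"Weights in one matrix: {n_weights:,}")       # 150,994,944
print(f"Memory at fp16: {bytes_fp16 / 1e6:.0f} MB")  # 302 MB
```

A single such matrix already holds about 151 million numbers, and a model stacks many of them.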

Let's start with the basics! 👇

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

print("NumPy version:", np.__version__)
print("✅ Ready to learn the math inside AI!")

## 📐 Vectors - The Building Blocks

### What is a Vector?

A **vector** = an ordered list of numbers; geometrically, an arrow with a direction and a magnitude

**In AI:**
- Word embeddings: "cat" = [0.2, -0.5, 0.8, ...] (300+ dimensions!)
- Image pixels: [255, 128, 64, ...] (millions of values)
- User preferences: [likes_action, likes_comedy, ...]
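The two geometric properties mentioned above can be computed directly. A small sketch, using `np.linalg.norm` for the length and `np.arctan2` for the angle measured from the positive x-axis (the vector `[3, 2]` is just an example):

```python
import numpy as np

v = np.array([3, 2])

magnitude = np.linalg.norm(v)                   # sqrt(3² + 2²) = sqrt(13)
angle_deg = np.degrees(np.arctan2(v[1], v[0]))  # direction, measured from the +x axis
unit_v = v / magnitude                          # same direction, length 1

print(f"Magnitude: {magnitude:.3f}")    # 3.606
print(f"Direction: {angle_deg:.1f}°")   # 33.7°
print("Unit vector:", unit_v, "has length", np.linalg.norm(unit_v))
```

Dividing a vector by its magnitude (normalizing) keeps only its direction, which is exactly what cosine similarity compares.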

Let's visualize!

In [None]:
# Create vectors
v1 = np.array([3, 2])  # 2D vector
v2 = np.array([1, 4])

print("Vector v1:", v1)
print("Vector v2:", v2)
print("\nShape:", v1.shape)  # (2,) = 2 elements

# Visualize vectors
plt.figure(figsize=(8, 6))
plt.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.006, label='v1 [3,2]')
plt.quiver(0, 0, v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.006, label='v2 [1,4]')

plt.xlim(-1, 5)
plt.ylim(-1, 5)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linewidth=0.5)
plt.axvline(x=0, color='k', linewidth=0.5)
plt.xlabel('x')
plt.ylabel('y')
plt.title('2D Vectors Visualization')
plt.legend()
plt.show()

print("\n📊 Vectors have:")
print("  - Direction (angle)")
print("  - Magnitude (length)")

## ➕ Vector Operations

### 1️⃣ Vector Addition
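The word-analogy remark in this section relies on nothing more than vector addition and subtraction. Here is a toy sketch with hand-made 2D vectors, not real embeddings: the two dimensions are assumed to mean "royalty" and "gender" and were chosen so the arithmetic lands exactly on "queen". Learned embeddings have hundreds of dimensions, and there the result is only approximately close.

```python
import numpy as np

# Hypothetical, hand-crafted vectors: [royalty, gender] (NOT learned embeddings)
toy = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}

result = toy["king"] - toy["man"] + toy["woman"]  # element-wise subtract and add
print("king - man + woman =", result)             # [ 1. -1.]

# Closest remaining word (inputs excluded) by Euclidean distance
candidates = {w: v for w, v in toy.items() if w not in ("king", "man", "woman")}
closest = min(candidates, key=lambda w: np.linalg.norm(candidates[w] - result))
print("Closest word:", closest)                   # queen
```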

In [None]:
# Vector addition
v1 = np.array([3, 2])
v2 = np.array([1, 4])
v_sum = v1 + v2

print("v1 + v2 =", v_sum)

# Visualize
plt.figure(figsize=(8, 6))
plt.quiver(0, 0, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.006, label='v1')
plt.quiver(v1[0], v1[1], v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.006, label='v2')
plt.quiver(0, 0, v_sum[0], v_sum[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.008, label='v1+v2')

plt.xlim(-1, 6)
plt.ylim(-1, 7)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linewidth=0.5)
plt.axvline(x=0, color='k', linewidth=0.5)
plt.title('Vector Addition (Head-to-Tail Rule)')
plt.legend()
plt.show()

print("\n🧠 AI Example: Word embeddings")
print("   'king' - 'man' + 'woman' ≈ 'queen'")
print("   This approximately works with trained embeddings (e.g. word2vec)!")

### 2️⃣ Scalar Multiplication
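The gradient-descent update mentioned in this section is scalar multiplication followed by vector subtraction. A minimal sketch, assuming a toy loss L(w) = ‖w‖² whose gradient is 2w (chosen only because its gradient is easy to write by hand):

```python
import numpy as np

w = np.array([2.0, 1.0])   # current weights
learning_rate = 0.1        # the scalar

for step in range(3):
    gradient = 2 * w                   # gradient of the toy loss ||w||^2
    w = w - learning_rate * gradient   # scale the gradient vector, then subtract
    print(f"step {step + 1}: w = {w}, loss = {np.sum(w ** 2):.4f}")
```

Each step here multiplies `w` by 0.8, so the loss shrinks steadily; a larger learning rate scales the step up, a smaller one scales it down.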

In [None]:
# Scalar multiplication (scale a vector)
v = np.array([2, 1])
v_scaled = 2 * v

print("v =", v)
print("2 * v =", v_scaled)

# Visualize
plt.figure(figsize=(8, 6))
plt.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.006, label='v')
plt.quiver(0, 0, v_scaled[0], v_scaled[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.008, label='2*v')

plt.xlim(-1, 5)
plt.ylim(-1, 3)
plt.grid(True, alpha=0.3)
plt.axhline(y=0, color='k', linewidth=0.5)
plt.axvline(x=0, color='k', linewidth=0.5)
plt.title('Scalar Multiplication (Scaling)')
plt.legend()
plt.show()

print("\n🧠 AI Example: Learning rate in gradient descent")
print("   new_weights = old_weights - learning_rate * gradient")
print("   Scaling the gradient vector!")

### 3️⃣ Dot Product - THE MOST IMPORTANT!
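Written out, the dot product multiplies matching entries and sums them: a · b = Σᵢ aᵢbᵢ. The sketch below computes it by hand and checks it against `np.dot` and the `@` operator, then shows that perpendicular vectors give 0.

```python
import numpy as np

a = np.array([2, 3])
b = np.array([1, 4])

# Dot product by definition: multiply matching entries, then add them up
manual = sum(x * y for x, y in zip(a, b))  # (2×1) + (3×4)
print(manual, np.dot(a, b), a @ b)         # 14 14 14

# Perpendicular vectors have a dot product of 0
print(np.dot([1, 0], [0, 1]))              # 0
```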

In [None]:
# Dot product (scalar product)
v1 = np.array([2, 3])
v2 = np.array([1, 4])

dot_product = np.dot(v1, v2)
# Or: v1 @ v2  (Python 3.5+)

print("v1 =", v1)
print("v2 =", v2)
print("\nDot product = v1 · v2 =", dot_product)
print("\nCalculation: (2×1) + (3×4) = 2 + 12 = 14")

# Geometric interpretation
magnitude_v1 = np.linalg.norm(v1)
magnitude_v2 = np.linalg.norm(v2)
cos_angle = dot_product / (magnitude_v1 * magnitude_v2)
angle_rad = np.arccos(cos_angle)
angle_deg = np.degrees(angle_rad)

print(f"\n📐 Geometric interpretation:")
print(f"   ||v1|| = {magnitude_v1:.2f}")
print(f"   ||v2|| = {magnitude_v2:.2f}")
print(f"   Angle between vectors = {angle_deg:.1f}°")
print(f"   v1 · v2 = ||v1|| × ||v2|| × cos(θ)")

print("\n🧠 AI Applications:")
print("   ✅ Neural network forward pass")
print("   ✅ Transformer attention mechanism")
print("   ✅ Cosine similarity (text search)")
print("   ✅ Recommendation systems")

### 🎯 Real AI Example: Text Similarity with Dot Product
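Vector search usually avoids looping over stored vectors one by one. A common approach, sketched here, is to normalize every stored vector to unit length once, so a single matrix-vector product yields all cosine similarities at once; `np.argsort` then picks the top matches. The 4-dimensional vectors are made-up toy values, the same ones used for "cat", "dog", "car", and "kitten" in this section.

```python
import numpy as np

words = ["dog", "car", "kitten"]
docs = np.array([[0.7, 0.4, 0.2, 0.8],       # dog    (toy values)
                 [0.1, 0.9, 0.8, 0.2],       # car
                 [0.85, 0.25, 0.05, 0.95]])  # kitten
query = np.array([0.8, 0.3, 0.1, 0.9])       # cat

# Normalize rows to unit length so that a dot product equals cosine similarity
docs_unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

scores = docs_unit @ query_unit          # one matrix-vector product -> 3 scores
top_k = np.argsort(scores)[::-1][:2]     # indices of the 2 highest scores

for i in top_k:
    print(f"{words[i]}: {scores[i]:.3f}")
```

Normalizing once at indexing time means each query costs a single matrix-vector product, which is why many vector stores keep embeddings pre-normalized.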

In [None]:
# Simulate word embeddings (simplified)
# In reality: 300-1536 dimensions!
embeddings = {
    'cat': np.array([0.8, 0.3, 0.1, 0.9]),
    'dog': np.array([0.7, 0.4, 0.2, 0.8]),
    'car': np.array([0.1, 0.9, 0.8, 0.2]),
    'kitten': np.array([0.85, 0.25, 0.05, 0.95])
}

def cosine_similarity(v1, v2):
    """Calculate cosine similarity (normalized dot product)"""
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

# Compare 'cat' with all words
query_word = 'cat'
query_vec = embeddings[query_word]

print(f"🔍 Finding words similar to '{query_word}':\n")
similarities = {}
for word, vec in embeddings.items():
    if word != query_word:
        sim = cosine_similarity(query_vec, vec)
        similarities[word] = sim
        print(f"   {word}: {sim:.3f}")

print(f"\n🎯 Most similar: {max(similarities, key=similarities.get)}")
print("\n✨ The same idea (embeddings + cosine similarity) powers:")
print("   - Semantic search over documents")
print("   - Context retrieval in RAG systems for LLMs like ChatGPT")
print("   - Similarity-based recommendations (e.g. similar songs)")

## 🔲 Matrices - Multiple Vectors Together

A **matrix** = 2D array of numbers (rectangular grid)

**In AI:**
- Neural network weights
- Batch of data (rows = samples, columns = features)
- Images (rows × columns = pixels)
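Because a matrix is vectors stacked together, NumPy indexing pulls rows (one sample in a batch) or columns (one feature across all samples) back out. A small sketch with made-up numbers:

```python
import numpy as np

sample_1 = np.array([1, 2, 3])
sample_2 = np.array([4, 5, 6])

batch = np.stack([sample_1, sample_2])  # shape (2, 3): rows = samples
print(batch.shape)   # (2, 3)
print(batch[0])      # first sample (a row):      [1 2 3]
print(batch[:, 1])   # second feature (a column): [2 5]
```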

In [None]:
# Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[2, 0],
              [1, 3],
              [4, 1]])

print("Matrix A:")
print(A)
print(f"Shape: {A.shape} (2 rows, 3 columns)")

print("\nMatrix B:")
print(B)
print(f"Shape: {B.shape} (3 rows, 2 columns)")

print("\n🧠 AI Example:")
print("   A = Batch of 2 samples with 3 features each")
print("   B = Weight matrix connecting 3 inputs to 2 outputs")

## 🔄 Matrix Multiplication - THE CORE OF NEURAL NETWORKS!
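Each entry C[i, j] is the dot product of row i of A with column j of B. The loop version below is a direct (and slow) transcription of that definition, written only to make the rule concrete; `@` computes the same result far faster. The helper name `matmul_loops` is made up for this sketch.

```python
import numpy as np

def matmul_loops(A, B):
    """Textbook matrix multiplication: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    m, n = A.shape
    n2, p = B.shape
    if n != n2:
        raise ValueError(f"Inner dimensions must match: {n} != {n2}")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):
        for j in range(p):
            C[i, j] = np.dot(A[i, :], B[:, j])  # row i of A · column j of B
    return C

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[2, 0],
              [1, 3],
              [4, 1]])

print(matmul_loops(A, B))
print(np.array_equal(matmul_loops(A, B), A @ B))  # True
```

The `ValueError` branch is the "inner dimensions must match" rule in code form: there is no k to sum over if the lengths differ.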

In [None]:
# Matrix multiplication
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[2, 0],
              [1, 3],
              [4, 1]])

C = A @ B  # or np.matmul(A, B) or np.dot(A, B)

print("A @ B = C\n")
print(f"({A.shape[0]}×{A.shape[1]}) @ ({B.shape[0]}×{B.shape[1]}) = ({C.shape[0]}×{C.shape[1]})")
print("\nResult C:")
print(C)

print("\n📐 How it works:")
print(f"   C[0,0] = (1×2) + (2×1) + (3×4) = {C[0,0]}")
print(f"   C[0,1] = (1×0) + (2×3) + (3×1) = {C[0,1]}")
print(f"   C[1,0] = (4×2) + (5×1) + (6×4) = {C[1,0]}")
print(f"   C[1,1] = (4×0) + (5×3) + (6×1) = {C[1,1]}")

print("\n🔥 KEY RULE: (m×n) @ (n×p) = (m×p)")
print("   Inner dimensions MUST match!")

## 🧠 Neural Network Layer = Matrix Multiplication!
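One way to read `X @ W` in a dense layer: output neuron j is the dot product of the input with column j of `W`, plus its bias. The sketch checks this with the same numbers as the layer cell in this section, and shows that a batch of inputs works unchanged because NumPy broadcasts the bias vector across rows.

```python
import numpy as np

X = np.array([[0.5, 0.8, 0.2]])        # 1 sample, 3 features
W = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 1.0, 1.1, 1.2]])   # 3 inputs -> 4 neurons
b = np.full(4, 0.1)

Y = X @ W + b

# Neuron by neuron: dot product of the input row with one column of W, plus bias
Y_manual = np.array([np.dot(X[0], W[:, j]) + b[j] for j in range(W.shape[1])])
print(np.allclose(Y[0], Y_manual))     # True

# A batch of 2 samples: b (shape (4,)) is broadcast to every row
X_batch = np.vstack([X, [[1.0, 0.0, 0.0]]])
print((X_batch @ W + b).shape)         # (2, 4)
```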

In [None]:
# Simulate a simple neural network layer
print("🤖 SIMPLE NEURAL NETWORK LAYER\n")
print("=" * 50)

# Input: 1 sample with 3 features
X = np.array([[0.5, 0.8, 0.2]])  # Shape: (1, 3)
print("Input (1 sample, 3 features):")
print(X)

# Weights: 3 inputs → 4 neurons
W = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8],
              [0.9, 1.0, 1.1, 1.2]])  # Shape: (3, 4)
print("\nWeights (3→4 neurons):")
print(W)

# Bias
b = np.array([0.1, 0.1, 0.1, 0.1])  # Shape: (4,)
print("\nBias:")
print(b)

# Forward pass: Y = X @ W + b
Y = X @ W + b
print("\nOutput (1 sample, 4 neurons):")
print(Y)
print(f"Shape: {Y.shape}")

print("\n" + "=" * 50)
print("✨ This is LITERALLY how neural network layers work!")
print("   Every dense layer: output = input @ weights + bias (then an activation)")
print("   LLMs repeat this across dozens of layers for every generated token!")

## 🎯 Transformer Attention Mechanism (Simplified)
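For reference, the standard formulation (scaled dot-product attention, as defined in the original Transformer paper) divides the scores by √d_k, where d_k is the key dimension, so that large dot products do not push the softmax toward a near one-hot output. In matrix form, with the softmax applied to each row and X holding one token embedding per row:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad Q = X W_Q,\quad K = X W_K,\quad V = X W_V
$$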

In [None]:
# Simplified self-attention (what makes GPT/BERT work!)
print("🔥 TRANSFORMER SELF-ATTENTION (Simplified)\n")
print("=" * 60)

# Input: 3 tokens (words), each with 4-dim embedding
X = np.array([[1.0, 0.5, 0.2, 0.8],  # Token 1
              [0.3, 0.9, 0.1, 0.6],  # Token 2
              [0.7, 0.2, 0.9, 0.4]]) # Token 3
print(f"Input embeddings (3 tokens × 4 dims):\n{X}\n")

# Weight matrices (normally learned during training)
W_Q = np.random.randn(4, 4) * 0.1  # Query
W_K = np.random.randn(4, 4) * 0.1  # Key
W_V = np.random.randn(4, 4) * 0.1  # Value

# Compute Q, K, V
Q = X @ W_Q  # (3, 4) @ (4, 4) = (3, 4)
K = X @ W_K
V = X @ W_V

print(f"Q (Queries): {Q.shape}")
print(f"K (Keys): {K.shape}")
print(f"V (Values): {V.shape}\n")

# Attention scores: Q @ K^T, scaled by sqrt(d_k) as in the standard Transformer
d_k = K.shape[-1]
attention_scores = Q @ K.T / np.sqrt(d_k)  # (3, 4) @ (4, 3) = (3, 3)
print(f"Attention scores (Q @ K^T / sqrt(d_k)):\n{attention_scores}\n")

# Softmax to get attention weights (sum to 1)
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

attention_weights = softmax(attention_scores)
print(f"Attention weights (after softmax):\n{attention_weights}\n")
print(f"Each row sums to: {attention_weights.sum(axis=1)}\n")

# Apply attention to values
output = attention_weights @ V  # (3, 3) @ (3, 4) = (3, 4)
print(f"Output (attention applied):\n{output}\n")

print("=" * 60)
print("🎯 This is the CORE of GPT, BERT, Claude, Gemini!")
print("   - Q @ K^T = 'How much should tokens attend to each other?'")
print("   - Softmax = Convert scores to probabilities")
print("   - attention @ V = Weighted combination of values")
print("\n✨ You just understood transformers!")

## 🔄 Matrix Transpose & Other Operations
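Two transpose facts come up constantly when checking shapes in attention code: transposing twice returns the original matrix, and the transpose of a product reverses the order, (AB)ᵀ = BᵀAᵀ. A quick numerical check with small example matrices, plus the shape of a Q @ Kᵀ score matrix for random 3-token inputs:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # (2, 3)
B = np.array([[2, 0],
              [1, 3],
              [4, 1]])       # (3, 2)

print(np.array_equal(A.T.T, A))              # True
print(np.array_equal((A @ B).T, B.T @ A.T))  # True: the order reverses

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))   # 3 tokens × 4 dims
K = rng.standard_normal((3, 4))
print((Q @ K.T).shape)            # (3, 3): one score per (query, key) token pair
```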

In [None]:
# Matrix transpose (flip rows and columns)
A = np.array([[1, 2, 3],
              [4, 5, 6]])

A_T = A.T  # or np.transpose(A)

print("Original A (2×3):")
print(A)
print("\nTransposed A^T (3×2):")
print(A_T)

print("\n🧠 AI Usage: Attention mechanism needs K^T")
print("   Q @ K^T creates attention scores!")

In [None]:
# Other important operations
A = np.array([[1, 2],
              [3, 4]])

print("Matrix A:")
print(A)

# Determinant
det_A = np.linalg.det(A)
print(f"\nDeterminant: {det_A}")

# Inverse (A × A^(-1) = Identity)
A_inv = np.linalg.inv(A)
print(f"\nInverse A^(-1):")
print(A_inv)

# Verify: A @ A^(-1) ≈ I
identity = A @ A_inv
print(f"\nA @ A^(-1) (should be identity):")
print(identity)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

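print("\n🧠 AI Applications:")
print("   - Inverse: Solving systems of equations (np.linalg.solve is preferred in practice)")
print("   - Eigenvalues of the covariance matrix: variance captured per component (PCA)")
print("   - Eigenvectors of the covariance matrix: principal components (PCA)")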

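Two notes on the applications listed above, as a hedged sketch. For solving Ax = b, `np.linalg.solve` is generally preferred over forming the inverse explicitly, since it is faster and numerically more stable. And basic PCA takes the eigenvectors of the data's covariance matrix; `np.linalg.eigh` fits because a covariance matrix is symmetric. The 2D dataset below is random toy data generated only for illustration.

```python
import numpy as np

# Solving A x = b without computing A^(-1)
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])
x = np.linalg.solve(A, b)
print("x =", x, "| check A @ x =", A @ x)

# Basic PCA via eigendecomposition of the covariance matrix
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0],
                                                 [1.0, 0.5]])  # correlated toy data
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)             # (2, 2), symmetric

eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigenvalues)[::-1]            # re-sort in descending order
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()
print("Variance explained per component:", explained.round(3))

projected = centered @ eigenvectors[:, :1]       # keep only the first principal component
print("Reduced shape:", projected.shape)         # (200, 1)
```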
## 🎯 MINI CHALLENGE: Build a 2-Layer Neural Network!

In [None]:
# TODO steps below: try writing each one yourself first, then compare with the solution line

print("🧠 2-LAYER NEURAL NETWORK FROM SCRATCH\n")
print("=" * 50)

# Input: 2 samples, 3 features each
X = np.array([[0.5, 0.8, 0.2],
              [0.3, 0.6, 0.9]])
print("Input X (2 samples × 3 features):")
print(X)

# Layer 1: 3 → 4 neurons
W1 = np.random.randn(3, 4) * 0.5
b1 = np.zeros((1, 4))

# TODO: Compute Layer 1 output
Z1 = X @ W1 + b1
A1 = np.maximum(0, Z1)  # ReLU activation

print(f"\nLayer 1 output (2 × 4):")
print(A1)

# Layer 2: 4 → 2 neurons (binary classification)
W2 = np.random.randn(4, 2) * 0.5
b2 = np.zeros((1, 2))

# TODO: Compute Layer 2 output
Z2 = A1 @ W2 + b2

# Softmax activation (convert to probabilities)
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

predictions = softmax(Z2)

print(f"\nFinal predictions (2 samples × 2 classes):")
print(predictions)
print(f"\nProbabilities sum to: {predictions.sum(axis=1)}")

print("\n" + "=" * 50)
print("✨ You just built a neural network with linear algebra!")
print("\nArchitecture:")
print("   Input (3) → Layer1 (4) → Layer2 (2) → Softmax")
print("\nAll operations:")
print("   - Matrix multiplication (X @ W)")
print("   - Vector addition (+ b)")
print("   - Element-wise operations (ReLU, softmax)")

## 🎯 Real Example: Image as Matrix

In [None]:
# Create a simple "image" (grayscale)
image = np.array([[0, 0, 0, 0, 0],
                 [0, 1, 1, 1, 0],
                 [0, 1, 0, 1, 0],
                 [0, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0]])

print("Image as matrix (5×5):")
print(image)

# Visualize
plt.figure(figsize=(6, 6))
plt.imshow(image, cmap='gray')
plt.title('Image as Matrix (0=black, 1=white)')
plt.colorbar()
plt.show()

# Simple filter (edge detection); defined here for illustration, not applied in this cell
filter_edge = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])

print("\n🧠 In CNNs:")
print("   - Images = matrices")
print("   - Filters = small matrices")
print("   - Convolution = sliding dot products (often computed as one big matrix multiplication)")
print("\nReal images:")
print("   - Grayscale: 28×28 (MNIST)")
print("   - Color: 224×224×3 (a common ImageNet model input size)")
print("   - HD: 1920×1080×3")

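The `filter_edge` kernel in the image cell is defined but never applied. A minimal sketch of what applying it means: slide the 3×3 kernel over the image and take a dot product (element-wise multiply, then sum) at each position. Like CNN layers in common frameworks, this computes cross-correlation without flipping the kernel (this kernel is symmetric, so a flip would not change anything), and it keeps only "valid" positions, so a 5×5 image gives a 3×3 output. The helper name `conv2d_valid` is made up, and the loops are for clarity, not speed.

```python
import numpy as np

image = np.array([[0, 0, 0, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 0]])

filter_edge = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])

def conv2d_valid(img, kernel):
    """Slide kernel over img; each output value = sum of element-wise products."""
    kh, kw = kernel.shape
    out_h, out_w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = img[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # a dot product of the flattened arrays
    return out

print(conv2d_valid(image, filter_edge))

# Same result as ONE matrix multiplication ("im2col"): each row is one flattened patch
patches = np.array([image[i:i + 3, j:j + 3].ravel() for i in range(3) for j in range(3)])
as_matmul = (patches @ filter_edge.ravel()).reshape(3, 3)
print(np.array_equal(as_matmul, conv2d_valid(image, filter_edge)))  # True
```

The strongly negative center value marks the hole in the middle of the square, where the pixel differs most from its neighbors.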
## 🎉 Congratulations!

**You just learned:**
- ✅ Vectors (the building blocks of AI)
- ✅ Vector operations (addition, scaling, dot product)
- ✅ Cosine similarity = normalized dot product (text search!)
- ✅ Matrices (collections of vectors)
- ✅ Matrix multiplication (THE core of neural networks)
- ✅ Neural network layers = matrix operations
- ✅ Transformer attention = softmax(Q @ K^T / √d_k) @ V
- ✅ Built a 2-layer neural network from scratch!

**🎯 Linear Algebra Cheat Sheet:**
```python
# Vectors
v = np.array([1, 2, 3])
v1 + v2              # Addition
2 * v                # Scaling
np.dot(v1, v2)       # Dot product
np.linalg.norm(v)    # Magnitude

# Matrices
A = np.array([[1,2],[3,4]])
A @ B                # Matrix multiplication
A.T                  # Transpose
np.linalg.inv(A)     # Inverse
np.linalg.det(A)     # Determinant

# Neural networks
output = input @ weights + bias  # Forward pass
```

**🧠 Key Insights:**
- Every neural network layer = matrix multiplication (plus bias and activation)
- Transformer attention = softmax(Q @ K^T / √d_k) @ V
- Text similarity = cosine similarity (normalized dot product)
- Images = matrices of pixels

**🎯 Practice Exercise:**

Build a 3-layer neural network:
1. Input: 5 features
2. Hidden layer 1: 8 neurons (ReLU)
3. Hidden layer 2: 4 neurons (ReLU)
4. Output: 3 classes (Softmax)
5. Test with random input data
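One possible solution sketch (not the only one): the same pattern as the 2-layer network, with one extra hidden layer. The batch size of 4, the seed, and the 0.5 weight scale are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    exp_z = np.exp(z - np.max(z, axis=-1, keepdims=True))  # subtract max for stability
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

# Layer sizes: 5 inputs -> 8 -> 4 -> 3 classes
W1, b1 = rng.standard_normal((5, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.standard_normal((8, 4)) * 0.5, np.zeros(4)
W3, b3 = rng.standard_normal((4, 3)) * 0.5, np.zeros(3)

X = rng.standard_normal((4, 5))   # 4 random samples, 5 features each

A1 = relu(X @ W1 + b1)            # (4, 8)
A2 = relu(A1 @ W2 + b2)           # (4, 4)
probs = softmax(A2 @ W3 + b3)     # (4, 3)

print(probs.round(3))
print("Row sums:", probs.sum(axis=1))
print("Predicted classes:", probs.argmax(axis=1))
```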

---

**📚 Next Lesson:** Day 2 - Calculus & Optimization (How AI Learns!)

**💡 Fun Fact:** 
- GPT-4 reportedly has ~1.8 trillion parameters (an unofficial estimate; OpenAI has not published it)
- Every forward pass = thousands of matrix multiplications
- Training a model that size = optimizing matrices holding TRILLIONS of numbers!

---

*You now understand the core math inside ChatGPT, Claude, and Gemini!* 🚀