# Linear Algebra Fundamentals for Data Science

Linear algebra is the mathematical foundation of machine learning, data science, and NLP. This notebook covers essential linear algebra concepts with practical Python implementations.

## Why Linear Algebra Matters:
- **Machine Learning**: Neural networks, PCA, regression, SVD
- **NLP**: Word embeddings, transformers, document similarity
- **Computer Vision**: Image processing, convolutions, transformations
- **Data Science**: Dimensionality reduction, clustering, optimization
- **Deep Learning**: Most of the computation is matrix multiplication, with elementwise nonlinearities in between

## Topics Covered:
- Vectors and vector operations
- Matrices and matrix operations
- Systems of linear equations
- Eigenvalues and eigenvectors
- Matrix decomposition (SVD, PCA)
- Applications in ML/NLP

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
import warnings

# Set random seed for reproducibility
np.random.seed(42)
warnings.filterwarnings('ignore')

# Configure plotting
plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully!")
print(f"NumPy version: {np.__version__}")

## Vectors and Vector Operations

In [None]:
# Vector basics
print("📐 Vector Operations")
print("=" * 30)

# Create vectors
v1 = np.array([3, 4])
v2 = np.array([1, 2])
v3 = np.array([2, -1, 3])  # 3D vector

print(f"Vector v1: {v1}")
print(f"Vector v2: {v2}")
print(f"Vector v3: {v3}")
print()

# Vector operations
print("Basic Operations:")
print(f"v1 + v2 = {v1 + v2}")
print(f"v1 - v2 = {v1 - v2}")
print(f"3 * v1 = {3 * v1}")
print()

# Vector norms (magnitude)
print("Vector Norms:")
l2_norm_v1 = np.linalg.norm(v1)  # L2 norm (Euclidean)
l1_norm_v1 = np.linalg.norm(v1, ord=1)  # L1 norm (Manhattan)
linf_norm_v1 = np.linalg.norm(v1, ord=np.inf)  # L∞ norm (Maximum)

print(f"||v1||₂ (L2 norm): {l2_norm_v1:.3f}")
print(f"||v1||₁ (L1 norm): {l1_norm_v1:.3f}")
print(f"||v1||∞ (L∞ norm): {linf_norm_v1:.3f}")
print()

# Dot product
dot_product = np.dot(v1, v2)
print(f"Dot product v1 · v2 = {dot_product}")

# Alternative ways to compute dot product
print(f"Using @ operator: v1 @ v2 = {v1 @ v2}")
print(f"Manual calculation: {v1[0]*v2[0] + v1[1]*v2[1]}")
print()

# Angle between vectors (clip guards against rounding pushing cos slightly outside [-1, 1])
cos_angle = dot_product / (np.linalg.norm(v1) * np.linalg.norm(v2))
angle_rad = np.arccos(np.clip(cos_angle, -1.0, 1.0))
angle_deg = np.degrees(angle_rad)

print(f"Angle between v1 and v2: {angle_deg:.1f}° ({angle_rad:.3f} radians)")
print(f"Cosine of angle: {cos_angle:.3f}")

# Unit vectors (normalization)
v1_unit = v1 / np.linalg.norm(v1)
print(f"\nUnit vector of v1: {v1_unit}")
print(f"Norm of unit vector: {np.linalg.norm(v1_unit):.3f}")
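The three norms printed above are special cases of the general Lp norm, ‖v‖ₚ = (Σ|vᵢ|ᵖ)^(1/p), with L∞ as the limiting case max|vᵢ|. A minimal sketch that checks the formula against `np.linalg.norm`; `p_norm` is a hypothetical helper written for illustration, not a NumPy function.

```python
import numpy as np

def p_norm(v, p):
    """Lp norm (sum |v_i|^p)^(1/p); p = np.inf gives max |v_i|."""
    v = np.asarray(v, dtype=float)
    if np.isinf(p):
        return np.max(np.abs(v))
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

v = np.array([3, 4])
for p in (1, 2, np.inf):
    # Hand-rolled formula next to NumPy's built-in norm
    print(p, p_norm(v, p), np.linalg.norm(v, ord=p))
```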

In [None]:
# Visualize vectors
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Plot 1: Basic vectors
origin = [0, 0]
axes[0].quiver(*origin, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.005, label='v1')
axes[0].quiver(*origin, v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.005, label='v2')
axes[0].quiver(*origin, (v1+v2)[0], (v1+v2)[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.005, label='v1+v2')

# Add vector labels
axes[0].text(v1[0]+0.1, v1[1]+0.1, 'v1', fontsize=12, color='red')
axes[0].text(v2[0]+0.1, v2[1]+0.1, 'v2', fontsize=12, color='blue')
axes[0].text((v1+v2)[0]+0.1, (v1+v2)[1]+0.1, 'v1+v2', fontsize=12, color='green')

axes[0].set_xlim(-1, 6)
axes[0].set_ylim(-1, 7)
axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)
axes[0].set_title('Vector Addition')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')

# Plot 2: Dot product geometric interpretation
# Project v2 onto v1
v1_unit = v1 / np.linalg.norm(v1)
projection_length = np.dot(v2, v1_unit)
projection = projection_length * v1_unit

axes[1].quiver(*origin, v1[0], v1[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.005)
axes[1].quiver(*origin, v2[0], v2[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.005)
axes[1].quiver(*origin, projection[0], projection[1], angles='xy', scale_units='xy', scale=1, color='purple', width=0.005)

# Draw projection line
axes[1].plot([v2[0], projection[0]], [v2[1], projection[1]], 'k--', alpha=0.5)

axes[1].text(v1[0]+0.1, v1[1]+0.1, 'v1', fontsize=12, color='red')
axes[1].text(v2[0]+0.1, v2[1]+0.1, 'v2', fontsize=12, color='blue')
axes[1].text(projection[0]+0.1, projection[1]-0.3, 'proj', fontsize=12, color='purple')

axes[1].set_xlim(-1, 4)
axes[1].set_ylim(-1, 5)
axes[1].set_aspect('equal')
axes[1].grid(True, alpha=0.3)
axes[1].set_title(f'v1 · v2 = {dot_product} = ‖v1‖ × ‖proj‖\n(Projection of v2 onto v1)')
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')

# Plot 3: Different norms visualization
circle_angles = np.linspace(0, 2*np.pi, 100)

# L2 norm (unit circle)
l2_circle_x = np.cos(circle_angles)
l2_circle_y = np.sin(circle_angles)
axes[2].plot(l2_circle_x, l2_circle_y, 'b-', label='L2 norm (||v||₂ = 1)', linewidth=2)

# L1 norm (diamond)
l1_diamond_x = [1, 0, -1, 0, 1]
l1_diamond_y = [0, 1, 0, -1, 0]
axes[2].plot(l1_diamond_x, l1_diamond_y, 'r-', label='L1 norm (||v||₁ = 1)', linewidth=2)

# L∞ norm (square)
linf_square_x = [1, 1, -1, -1, 1]
linf_square_y = [1, -1, -1, 1, 1]
axes[2].plot(linf_square_x, linf_square_y, 'g-', label='L∞ norm (||v||∞ = 1)', linewidth=2)

axes[2].set_xlim(-1.5, 1.5)
axes[2].set_ylim(-1.5, 1.5)
axes[2].set_aspect('equal')
axes[2].grid(True, alpha=0.3)
axes[2].set_title('Unit Balls for Different Norms')
axes[2].set_xlabel('x')
axes[2].set_ylabel('y')
axes[2].legend()

plt.tight_layout()
plt.show()

print("\n🎯 Key Vector Concepts:")
vector_concepts = [
    "Vectors represent both magnitude and direction",
    "Dot product measures similarity/projection",
    "Unit vectors have magnitude 1 (normalized)",
    "Different norms capture different notions of 'size'",
    "Orthogonal vectors have dot product = 0"
]

for concept in vector_concepts:
    print(f"• {concept}")
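Two of the concepts above, "dot product measures projection" and "orthogonal vectors have dot product 0", fit together: the projection of v onto u is (v·u / u·u)·u, and what is left over is orthogonal to u. A minimal sketch using the same v1 and v2; `project` is a hypothetical helper name.

```python
import numpy as np

def project(v, u):
    """Orthogonal projection of v onto the line spanned by u: (v·u / u·u) u."""
    return (np.dot(v, u) / np.dot(u, u)) * u

v1 = np.array([3.0, 4.0])
v2 = np.array([1.0, 2.0])

proj = project(v2, v1)      # component of v2 along v1
residual = v2 - proj        # component of v2 perpendicular to v1

print("projection:", proj)
print("residual:", residual)
print("residual · v1 =", np.dot(residual, v1))   # ~0: residual is orthogonal to v1
print("|proj| =", np.linalg.norm(proj), "= v1·v2 / |v1|")
```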

## Matrices and Matrix Operations

In [None]:
# Matrix basics
print("🔲 Matrix Operations")
print("=" * 30)

# Create matrices
A = np.array([[1, 2, 3],
              [4, 5, 6]])

B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

C = np.array([[1, 2],
              [3, 4]])

print("Matrix A (2×3):")
print(A)
print(f"Shape: {A.shape}")
print()

print("Matrix B (3×2):")
print(B)
print(f"Shape: {B.shape}")
print()

print("Matrix C (2×2):")
print(C)
print(f"Shape: {C.shape}")
print()

# Matrix multiplication
print("Matrix Multiplication:")
AB = A @ B  # or np.dot(A, B)
print(f"A @ B (2×3) @ (3×2) = (2×2):")
print(AB)
print()

# Element-wise operations
print("Element-wise Operations:")
print(f"C + C:")
print(C + C)
print(f"C * C (element-wise):")
print(C * C)
print(f"C @ C (matrix multiplication):")
print(C @ C)
print()

# Transpose
print("Matrix Transpose:")
print(f"A:")
print(A)
print(f"A.T (transpose):")
print(A.T)
print()
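The cell above multiplies A (2×3) by B (3×2). Two properties are worth checking numerically: matrix multiplication is not commutative (B @ A is not even the same shape as A @ B), and the transpose of a product reverses the factors, (AB)ᵀ = BᵀAᵀ. The matrix `E` below is an illustrative permutation matrix introduced only for this check.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # 2×3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])       # 3×2

# Inner dimensions must match: (2×3) @ (3×2) -> 2×2, (3×2) @ (2×3) -> 3×3
print((A @ B).shape, (B @ A).shape)

# Even for square matrices, order generally matters
C = np.array([[1, 2], [3, 4]])
E = np.array([[0, 1], [1, 0]])   # swaps rows (left) or columns (right)
print("C @ E == E @ C?", np.array_equal(C @ E, E @ C))

# Transpose of a product reverses the order of the factors
print("(AB)^T == B^T A^T?", np.array_equal((A @ B).T, B.T @ A.T))
```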

In [None]:
# Special matrices
print("🔷 Special Matrices")
print("=" * 30)

# Identity matrix
I = np.eye(3)
print("Identity Matrix (3×3):")
print(I)
print()

# Zero matrix
Z = np.zeros((2, 3))
print("Zero Matrix (2×3):")
print(Z)
print()

# Ones matrix
O = np.ones((3, 2))
print("Ones Matrix (3×2):")
print(O)
print()

# Random matrix
R = np.random.randn(3, 3)
print("Random Matrix (3×3):")
print(R.round(3))
print()

# Diagonal matrix
D = np.diag([1, 2, 3, 4])
print("Diagonal Matrix:")
print(D)
print()

# Matrix properties
test_matrix = np.array([[2, 1], [1, 2]])
print(f"Test Matrix:")
print(test_matrix)
print()

# Determinant
det = np.linalg.det(test_matrix)
print(f"Determinant: {det:.3f}")

# Trace (sum of diagonal elements)
trace = np.trace(test_matrix)
print(f"Trace: {trace}")

# Rank
rank = np.linalg.matrix_rank(test_matrix)
print(f"Rank: {rank}")
print()

# Matrix inverse (if it exists); det is a float, so compare it to 0 with a tolerance
if not np.isclose(det, 0.0):
    inv_matrix = np.linalg.inv(test_matrix)
    print("Matrix Inverse:")
    print(inv_matrix.round(3))
    
    # Verify: A * A^(-1) = I
    verification = test_matrix @ inv_matrix
    print("Verification (A @ A^(-1)):")
    print(verification.round(10))  # Round to avoid floating point errors
else:
    print("Matrix is singular (not invertible)")
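The topic list includes systems of linear equations, and `test_matrix` above is the coefficient matrix of a small one. A minimal sketch, with the right-hand side `b` chosen for illustration: `np.linalg.solve` factorizes the matrix (LU via LAPACK) instead of forming an explicit inverse, which is generally both faster and more accurate than `inv(A) @ b`.

```python
import numpy as np

# System:  2x +  y = 5
#           x + 2y = 4
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
b = np.array([5.0, 4.0])

# Preferred: solve the system directly, no explicit inverse
x = np.linalg.solve(A, b)
print("x =", x)

# Same answer via the inverse (avoid for large or ill-conditioned systems)
x_inv = np.linalg.inv(A) @ b
print("same answer:", np.allclose(x, x_inv))

# Residual A x - b should be ~0
print("residual:", A @ x - b)
```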

In [None]:
# Visualize matrix operations
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Sample matrices for visualization
M1 = np.array([[3, 1], [1, 2]])
M2 = np.array([[2, 0], [1, 1]])

# Plot matrices as heatmaps
im1 = axes[0, 0].imshow(M1, cmap='viridis', interpolation='nearest')
axes[0, 0].set_title('Matrix M1')
for i in range(M1.shape[0]):
    for j in range(M1.shape[1]):
        axes[0, 0].text(j, i, f'{M1[i, j]}', ha='center', va='center', color='white', fontsize=14)

im2 = axes[0, 1].imshow(M2, cmap='viridis', interpolation='nearest')
axes[0, 1].set_title('Matrix M2')
for i in range(M2.shape[0]):
    for j in range(M2.shape[1]):
        axes[0, 1].text(j, i, f'{M2[i, j]}', ha='center', va='center', color='white', fontsize=14)

# Matrix multiplication result
M_product = M1 @ M2
im3 = axes[1, 0].imshow(M_product, cmap='viridis', interpolation='nearest')
axes[1, 0].set_title('M1 @ M2 (Matrix Product)')
for i in range(M_product.shape[0]):
    for j in range(M_product.shape[1]):
        axes[1, 0].text(j, i, f'{M_product[i, j]}', ha='center', va='center', color='white', fontsize=14)

# Element-wise multiplication
M_elementwise = M1 * M2
im4 = axes[1, 1].imshow(M_elementwise, cmap='viridis', interpolation='nearest')
axes[1, 1].set_title('M1 * M2 (Element-wise)')
for i in range(M_elementwise.shape[0]):
    for j in range(M_elementwise.shape[1]):
        axes[1, 1].text(j, i, f'{M_elementwise[i, j]}', ha='center', va='center', color='white', fontsize=14)

# Remove ticks for cleaner look
for ax in axes.flat:
    ax.set_xticks([])
    ax.set_yticks([])

plt.tight_layout()
plt.show()

# Show the calculation step by step
print("🔍 Matrix Multiplication Step by Step:")
print("=" * 45)
print(f"M1 @ M2 calculation:")
print(f"M1 = {M1.tolist()}")
print(f"M2 = {M2.tolist()}")
print()

for i in range(M1.shape[0]):
    for j in range(M2.shape[1]):
        row = M1[i, :]
        col = M2[:, j]
        result = np.dot(row, col)
        print(f"Position ({i},{j}): {row} · {col} = {result}")

print(f"\nResult: {M_product.tolist()}")
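The step-by-step printout computes each entry as a row-by-column dot product. Written out as the textbook triple loop, C[i, j] = Σₖ X[i, k]·Y[k, j]; `matmul_loops` is a hypothetical helper for illustration only (use `@` in practice).

```python
import numpy as np

def matmul_loops(X, Y):
    """Naive matrix multiplication: C[i, j] = sum_k X[i, k] * Y[k, j]."""
    n, m = X.shape
    m2, p = Y.shape
    if m != m2:
        raise ValueError(f"inner dimensions differ: {m} vs {m2}")
    C = np.zeros((n, p), dtype=np.result_type(X, Y))
    for i in range(n):            # each row of X
        for j in range(p):        # each column of Y
            for k in range(m):    # accumulate the dot product
                C[i, j] += X[i, k] * Y[k, j]
    return C

M1 = np.array([[3, 1], [1, 2]])
M2 = np.array([[2, 0], [1, 1]])
print(matmul_loops(M1, M2))
print("matches @:", np.array_equal(matmul_loops(M1, M2), M1 @ M2))
```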

## Eigenvalues and Eigenvectors

In [None]:
# Eigenvalues and eigenvectors
print("🔮 Eigenvalues and Eigenvectors")
print("=" * 40)

# Create a symmetric matrix (real eigenvalues)
A = np.array([[4, 2], [2, 3]])

print("Matrix A:")
print(A)
print()

# Compute eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:")
for i, val in enumerate(eigenvalues):
    print(f"λ{i+1} = {val:.3f}")
print()

print("Eigenvectors:")
for i, vec in enumerate(eigenvectors.T):
    print(f"v{i+1} = {vec}")
print()

# Verify the eigenvalue equation: Av = λv
print("Verification (Av = λv):")
for i in range(len(eigenvalues)):
    λ = eigenvalues[i]
    v = eigenvectors[:, i]
    
    Av = A @ v
    λv = λ * v
    
    print(f"\nEigenvalue {i+1}: λ = {λ:.3f}")
    print(f"Av = {Av.round(3)}")
    print(f"λv = {λv.round(3)}")
    print(f"Equal? {np.allclose(Av, λv)}")

print("\n" + "="*40)

# Eigendecomposition
# A = P D P^(-1), where the columns of P are the eigenvectors and D is the diagonal matrix of eigenvalues
P = eigenvectors
D = np.diag(eigenvalues)
P_inv = np.linalg.inv(P)

A_reconstructed = P @ D @ P_inv

print("Eigendecomposition: A = PDP^(-1)")
print(f"P (eigenvectors):\n{P.round(3)}")
print(f"\nD (eigenvalues):\n{D.round(3)}")
print(f"\nP^(-1):\n{P_inv.round(3)}")
print(f"\nReconstructed A:\n{A_reconstructed.round(3)}")
print(f"\nOriginal A:\n{A}")
print(f"Reconstruction accurate? {np.allclose(A, A_reconstructed)}")
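Because A is symmetric, its eigenvectors can be chosen orthonormal, so P⁻¹ = Pᵀ and A = PDPᵀ with no inverse needed. NumPy's `np.linalg.eigh` is specialized for symmetric (Hermitian) matrices: it returns real eigenvalues in ascending order, which may differ from the order `eig` returns. The sketch uses new names (`evals_sym`, `Q`) so it does not overwrite the variables used for plotting.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# eigh: real eigenvalues (ascending) and orthonormal eigenvectors as columns
evals_sym, Q = np.linalg.eigh(A)
print("eigenvalues:", evals_sym.round(3))

# Q is orthogonal, so its inverse is its transpose
print("Q^T Q = I:", np.allclose(Q.T @ Q, np.eye(2)))

# Spectral decomposition A = Q Λ Q^T
A_rebuilt = Q @ np.diag(evals_sym) @ Q.T
print("A == Q Λ Q^T:", np.allclose(A, A_rebuilt))
```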

In [None]:
# Visualize eigenvectors
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Plot 1: Eigenvectors of the matrix
origin = [0, 0]

# Plot eigenvectors
for i, (val, vec) in enumerate(zip(eigenvalues, eigenvectors.T)):
    color = ['red', 'blue'][i]
    # Scale eigenvectors by eigenvalue for visualization
    scaled_vec = vec * val * 0.5
    axes[0].quiver(*origin, scaled_vec[0], scaled_vec[1], 
                   angles='xy', scale_units='xy', scale=1, 
                   color=color, width=0.008, 
                   label=f'v{i+1} (λ={val:.2f})')
    
    # Add vector labels
    axes[0].text(scaled_vec[0]*1.2, scaled_vec[1]*1.2, f'v{i+1}', 
                 fontsize=12, color=color)

# Add unit circle for reference
theta = np.linspace(0, 2*np.pi, 100)
axes[0].plot(np.cos(theta), np.sin(theta), 'k--', alpha=0.3, label='Unit circle')

axes[0].set_xlim(-3, 3)
axes[0].set_ylim(-3, 3)
axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)
axes[0].set_title('Eigenvectors of Matrix A')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')
axes[0].legend()

# Plot 2: Transformation visualization
# Create a set of points
n_points = 20
theta_points = np.linspace(0, 2*np.pi, n_points)
unit_circle_points = np.array([np.cos(theta_points), np.sin(theta_points)])

# Transform points by matrix A
transformed_points = A @ unit_circle_points

# Plot original and transformed points
axes[1].plot(unit_circle_points[0], unit_circle_points[1], 'bo-', 
             alpha=0.6, label='Original (unit circle)', markersize=4)
axes[1].plot(transformed_points[0], transformed_points[1], 'ro-', 
             alpha=0.6, label='Transformed by A', markersize=4)

# Plot eigenvectors on transformed space
for i, (val, vec) in enumerate(zip(eigenvalues, eigenvectors.T)):
    color = ['darkred', 'darkblue'][i]
    scaled_vec = vec * val
    axes[1].quiver(*origin, scaled_vec[0], scaled_vec[1], 
                   angles='xy', scale_units='xy', scale=1, 
                   color=color, width=0.008)

axes[1].set_xlim(-6, 6)
axes[1].set_ylim(-6, 6)
axes[1].set_aspect('equal')
axes[1].grid(True, alpha=0.3)
axes[1].set_title('Linear Transformation by Matrix A')
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')
axes[1].legend()

plt.tight_layout()
plt.show()

print("\n🎯 Eigenvalue/Eigenvector Interpretation:")
eigen_concepts = [
    "Eigenvectors: directions the transformation only scales (they stay on their own line)",
    "Eigenvalues: how much the eigenvectors are scaled",
    "λ > 1: eigenvector is stretched",
    "0 < λ < 1: eigenvector is compressed",
    "λ < 0: eigenvector is flipped and scaled",
    "Used in PCA, PageRank, stability analysis, quantum mechanics"
]

for concept in eigen_concepts:
    print(f"• {concept}")
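The last bullet mentions PageRank. In its standard formulation, PageRank scores are the eigenvector with eigenvalue 1 of a column-stochastic "Google matrix", usually found by power iteration (repeatedly multiplying a vector by the matrix). A minimal sketch on a hypothetical three-page link graph; the damping factor 0.85 is a commonly used value, taken here as an assumption.

```python
import numpy as np

# Hypothetical web: page 0 -> pages 1, 2; page 1 -> page 2; page 2 -> page 0.
# Column j holds the probabilities of following a link out of page j.
L = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
d = 0.85                                    # damping factor (assumed)
n = L.shape[0]
G = d * L + (1 - d) / n * np.ones((n, n))   # still column-stochastic

# Power iteration: converges to the dominant eigenvector (eigenvalue 1)
r = np.ones(n) / n
for _ in range(1000):
    r_next = G @ r
    converged = np.abs(r_next - r).sum() < 1e-12
    r = r_next
    if converged:
        break

print("PageRank scores:", r.round(3))
print("G r == r:", np.allclose(G @ r, r))
```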

## Matrix Decomposition: SVD and PCA

In [None]:
# Singular Value Decomposition (SVD)
print("🔀 Singular Value Decomposition (SVD)")
print("=" * 45)

# Create a sample matrix
np.random.seed(42)
A = np.random.randn(4, 3)

print(f"Original matrix A ({A.shape[0]}×{A.shape[1]}):")
print(A.round(3))
print()

# Perform SVD: A = UΣV^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(f"U matrix ({U.shape[0]}×{U.shape[1]}):")
print(U.round(3))
print(f"\nSingular values s ({len(s)} values):")
print(s.round(3))
print(f"\nV^T matrix ({Vt.shape[0]}×{Vt.shape[1]}):")
print(Vt.round(3))
print()

# Reconstruct the matrix
S = np.diag(s)
A_reconstructed = U @ S @ Vt

print("Reconstructed A:")
print(A_reconstructed.round(3))
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed):.2e}")
print()

# Low-rank approximation
print("Low-rank approximations:")
for k in range(1, min(A.shape) + 1):
    # Use only first k components
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    error = np.linalg.norm(A - A_k)
    # Share of ||A||_F^2 captured; A is not centered, so this is "energy", not variance
    energy = np.sum(s[:k]**2) / np.sum(s**2) * 100
    print(f"  Rank-{k}: Error = {error:.3f}, Energy captured = {energy:.1f}%")

print("\n" + "="*45)

# Principal Component Analysis (PCA)
print("📊 Principal Component Analysis (PCA)")
print("=" * 45)

# Generate sample 2D data
np.random.seed(42)
n_samples = 100

# Create correlated data
X = np.random.randn(n_samples, 2)
X[:, 1] = X[:, 0] + 0.5 * np.random.randn(n_samples)  # Add correlation

# Add some rotation
angle = np.pi / 6  # 30 degrees
rotation_matrix = np.array([[np.cos(angle), -np.sin(angle)],
                           [np.sin(angle), np.cos(angle)]])
X = X @ rotation_matrix.T

# Center the data
X_centered = X - np.mean(X, axis=0)

# Perform PCA manually using SVD
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
principal_components = Vt.T  # Each column is a principal component

# Explained variance
explained_variance = s**2 / (n_samples - 1)
explained_variance_ratio = explained_variance / np.sum(explained_variance)

print(f"Data shape: {X.shape}")
print(f"Data mean: {np.mean(X, axis=0).round(3)}")
print(f"Data covariance matrix:")
print(np.cov(X.T).round(3))
print()

print("Principal Components:")
for i in range(len(explained_variance)):
    print(f"PC{i+1}: {principal_components[:, i].round(3)} (explains {explained_variance_ratio[i]*100:.1f}% variance)")
print()

# Transform data to PC space
X_pca = X_centered @ principal_components

print(f"Transformed data shape: {X_pca.shape}")
print(f"PC1 variance (ddof=1): {np.var(X_pca[:, 0], ddof=1):.3f}")  # matches explained_variance[0]
print(f"PC2 variance (ddof=1): {np.var(X_pca[:, 1], ddof=1):.3f}")  # matches explained_variance[1]
print(f"Correlation between PCs: {np.corrcoef(X_pca.T)[0, 1]:.3f}")
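The rank-k errors printed earlier follow a closed form: the Frobenius error of the truncated SVD equals √(σ²ₖ₊₁ + … + σ²ᵣ), its spectral-norm error equals σₖ₊₁, and by the Eckart–Young theorem no rank-k matrix does better. A check on a separate random matrix (seeded with `default_rng` for illustration; names are suffixed `_demo` to avoid clobbering the cell's variables):

```python
import numpy as np

rng = np.random.default_rng(0)
A_demo = rng.standard_normal((6, 4))
U_d, s_d, Vt_d = np.linalg.svd(A_demo, full_matrices=False)

for k in range(1, len(s_d) + 1):
    A_k = U_d[:, :k] @ np.diag(s_d[:k]) @ Vt_d[:k, :]   # rank-k truncation
    err = np.linalg.norm(A_demo - A_k)                  # Frobenius norm
    predicted = np.sqrt(np.sum(s_d[k:] ** 2))           # sqrt of discarded sigma^2
    print(f"k={k}: error={err:.4f}, predicted={predicted:.4f}")
```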

In [None]:
# Visualize PCA
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Plot 1: Original data with principal components
axes[0].scatter(X[:, 0], X[:, 1], alpha=0.6, c='blue')

# Plot principal components
center = np.mean(X, axis=0)
for i, (pc, var_ratio) in enumerate(zip(principal_components.T, explained_variance_ratio)):
    color = ['red', 'green'][i]
    # Scale by sqrt of eigenvalue for visualization
    pc_scaled = pc * np.sqrt(explained_variance[i]) * 2
    axes[0].arrow(center[0], center[1], pc_scaled[0], pc_scaled[1],
                  head_width=0.1, head_length=0.1, fc=color, ec=color, linewidth=2)
    axes[0].text(center[0] + pc_scaled[0]*1.2, center[1] + pc_scaled[1]*1.2, 
                 f'PC{i+1}\n({var_ratio*100:.1f}%)', fontsize=10, color=color, ha='center')

axes[0].set_aspect('equal')
axes[0].grid(True, alpha=0.3)
axes[0].set_title('Original Data with Principal Components')
axes[0].set_xlabel('X1')
axes[0].set_ylabel('X2')

# Plot 2: Data in PC space
axes[1].scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.6, c='purple')
axes[1].axhline(y=0, color='k', linestyle='-', alpha=0.3)
axes[1].axvline(x=0, color='k', linestyle='-', alpha=0.3)
axes[1].set_aspect('equal')
axes[1].grid(True, alpha=0.3)
axes[1].set_title('Data in Principal Component Space')
axes[1].set_xlabel('PC1')
axes[1].set_ylabel('PC2')

# Plot 3: Explained variance
pcs = [f'PC{i+1}' for i in range(len(explained_variance_ratio))]
axes[2].bar(pcs, explained_variance_ratio * 100, color=['red', 'green'])
axes[2].set_title('Explained Variance by Component')
axes[2].set_ylabel('Explained Variance (%)')
axes[2].grid(True, alpha=0.3)

# Add cumulative variance line
cumulative_var = np.cumsum(explained_variance_ratio) * 100
ax2_twin = axes[2].twinx()
ax2_twin.plot(pcs, cumulative_var, 'o-', color='orange', linewidth=2, markersize=8)
ax2_twin.set_ylabel('Cumulative Explained Variance (%)', color='orange')
ax2_twin.tick_params(axis='y', labelcolor='orange')

plt.tight_layout()
plt.show()

print("\n🎯 PCA Key Insights:")
pca_insights = [
    "PCA finds directions of maximum variance in data",
    "Principal components are orthogonal directions; the projected data is uncorrelated",
    "First PC captures most variance, second PC captures second most, etc.",
    "Used for dimensionality reduction and data visualization",
    "Shows which combinations of features carry most of the variance",
    "Foundation for many ML algorithms and data compression"
]

for insight in pca_insights:
    print(f"• {insight}")
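"Directions of maximum variance" can be computed two equivalent ways: as eigenvectors of the sample covariance matrix, or as right singular vectors of the centered data (the route the PCA cell takes). A minimal sketch on hypothetical correlated data, checking that both give the same variances, the same directions up to sign, and uncorrelated projected scores:

```python
import numpy as np

rng = np.random.default_rng(42)
X_demo = rng.standard_normal((100, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
Xc = X_demo - X_demo.mean(axis=0)          # PCA requires centered data

# Route 1: eigendecomposition of the sample covariance (n - 1 denominator)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]           # sort by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: SVD of the centered data matrix
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print("variances (eig):", eigvals.round(4))
print("variances (svd):", (s**2 / (len(Xc) - 1)).round(4))
# Directions agree up to sign, so compare absolute dot products
print("|cos| between PCs:", np.abs(np.sum(eigvecs * Vt.T, axis=0)).round(6))

# Scores on the PCs are uncorrelated: off-diagonal covariance ~0
Z = Xc @ eigvecs
print(np.cov(Z, rowvar=False).round(6))
```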

## Applications in Machine Learning and NLP

In [None]:
# Application 1: Document similarity using cosine similarity
print("📚 Application 1: Document Similarity (NLP)")
print("=" * 50)

# Simulate document-term matrix (documents as rows, terms as columns)
documents = [
    "machine learning algorithms",
    "deep learning neural networks", 
    "natural language processing",
    "computer vision image recognition",
    "data science analytics"
]

# Simple bag-of-words representation (in practice, use TF-IDF)
vocabulary = ['machine', 'learning', 'algorithms', 'deep', 'neural', 'networks', 
              'natural', 'language', 'processing', 'computer', 'vision', 'image', 
              'recognition', 'data', 'science', 'analytics']

# Create document-term matrix
doc_term_matrix = np.zeros((len(documents), len(vocabulary)))

for i, doc in enumerate(documents):
    words = doc.split()
    for word in words:
        if word in vocabulary:
            j = vocabulary.index(word)
            doc_term_matrix[i, j] = 1  # Binary occurrence

print("Document-Term Matrix:")
print(f"Shape: {doc_term_matrix.shape}")
print(doc_term_matrix.astype(int))
print()

# Calculate cosine similarity between documents
def cosine_similarity(v1, v2):
    """Calculate cosine similarity between two vectors."""
    dot_product = np.dot(v1, v2)
    norm_product = np.linalg.norm(v1) * np.linalg.norm(v2)
    if norm_product == 0:
        return 0
    return dot_product / norm_product

# Create similarity matrix
n_docs = len(documents)
similarity_matrix = np.zeros((n_docs, n_docs))

for i in range(n_docs):
    for j in range(n_docs):
        similarity_matrix[i, j] = cosine_similarity(doc_term_matrix[i], doc_term_matrix[j])

print("Document Similarity Matrix (Cosine Similarity):")
print(similarity_matrix.round(3))
print()

# Find most similar documents
print("Document pairs and their similarities:")
for i in range(n_docs):
    for j in range(i+1, n_docs):
        sim = similarity_matrix[i, j]
        print(f"Doc {i+1} & Doc {j+1}: {sim:.3f}")
        print(f"  '{documents[i]}' vs '{documents[j]}'")

print("\n" + "="*50)

# Application 2: Linear regression using linear algebra
print("📈 Application 2: Linear Regression")
print("=" * 50)

# Generate synthetic data
np.random.seed(42)
n_samples = 100
X = np.random.randn(n_samples, 2)  # 2 features
true_coefficients = np.array([3, -2])
true_intercept = 1
noise = np.random.randn(n_samples) * 0.5

# y = X @ coefficients + intercept + noise
y = X @ true_coefficients + true_intercept + noise

print(f"Data shape: X = {X.shape}, y = {y.shape}")
print(f"True coefficients: {true_coefficients}")
print(f"True intercept: {true_intercept}")
print()

# Add intercept term to X (bias column)
X_with_intercept = np.column_stack([np.ones(n_samples), X])

# Solve using normal equation: θ = (X^T X)^(-1) X^T y
XtX = X_with_intercept.T @ X_with_intercept
Xty = X_with_intercept.T @ y
theta = np.linalg.solve(XtX, Xty)  # More stable than using inverse

estimated_intercept = theta[0]
estimated_coefficients = theta[1:]

print("Linear Regression Results:")
print(f"Estimated intercept: {estimated_intercept:.3f} (true: {true_intercept})")
print(f"Estimated coefficients: {estimated_coefficients.round(3)} (true: {true_coefficients})")
print()

# Calculate predictions and error
y_pred = X_with_intercept @ theta
mse = np.mean((y - y_pred)**2)
r_squared = 1 - np.sum((y - y_pred)**2) / np.sum((y - np.mean(y))**2)

print(f"Mean Squared Error: {mse:.4f}")
print(f"R-squared: {r_squared:.4f}")

print("\n" + "="*50)

# Application 3: Dimensionality reduction with PCA
print("🎯 Application 3: Dimensionality Reduction")
print("=" * 50)

# Generate high-dimensional data with intrinsic lower dimensionality
np.random.seed(42)
n_samples = 200
n_features = 10

# Create data that lies mostly in a 2D subspace
latent_dim = 2
latent_data = np.random.randn(n_samples, latent_dim)

# Random projection to higher dimensional space
projection_matrix = np.random.randn(latent_dim, n_features)
X_high_dim = latent_data @ projection_matrix

# Add some noise
X_high_dim += np.random.randn(n_samples, n_features) * 0.1

print(f"High-dimensional data shape: {X_high_dim.shape}")

# Apply PCA
from sklearn.decomposition import PCA

pca = PCA()
X_pca = pca.fit_transform(X_high_dim)

# Analyze explained variance
explained_var_ratio = pca.explained_variance_ratio_
cumulative_var = np.cumsum(explained_var_ratio)

print("\nExplained variance by component:")
for i, var_ratio in enumerate(explained_var_ratio[:5]):
    print(f"PC{i+1}: {var_ratio:.4f} ({var_ratio*100:.2f}%)")

print(f"\nCumulative variance explained by first 2 components: {cumulative_var[1]:.4f} ({cumulative_var[1]*100:.2f}%)")
print(f"Cumulative variance explained by first 5 components: {cumulative_var[4]:.4f} ({cumulative_var[4]*100:.2f}%)")

# Determine number of components for 95% variance
n_components_95 = np.argmax(cumulative_var >= 0.95) + 1
print(f"\nComponents needed for 95% variance: {n_components_95}")
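The double loop that fills `similarity_matrix` calls the helper once per pair. Normalizing every row to unit length first turns the whole similarity matrix into one product, M̂M̂ᵀ. A minimal sketch on a hypothetical tiny matrix; `cosine_similarity_matrix` is an illustrative name (scikit-learn provides `sklearn.metrics.pairwise.cosine_similarity` for the same job). Applied to `doc_term_matrix`, it should reproduce `similarity_matrix`, including 0 for all-zero rows.

```python
import numpy as np

def cosine_similarity_matrix(M):
    """Pairwise cosine similarity between the rows of M via one matrix product."""
    M = np.asarray(M, dtype=float)
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0       # all-zero rows stay zero instead of dividing by 0
    M_unit = M / norms            # every non-zero row now has unit length
    return M_unit @ M_unit.T

# Rows = documents, columns = terms (binary occurrence)
docs = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 1, 0, 1, 1, 1],
                 [0, 0, 0, 0, 0, 0]])
S = cosine_similarity_matrix(docs)
print(S.round(3))
```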

In [None]:
# Visualize applications
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Plot 1: Document similarity heatmap
im1 = axes[0, 0].imshow(similarity_matrix, cmap='viridis', interpolation='nearest')
axes[0, 0].set_title('Document Similarity Matrix')
axes[0, 0].set_xlabel('Document')
axes[0, 0].set_ylabel('Document')
axes[0, 0].set_xticks(range(n_docs))
axes[0, 0].set_yticks(range(n_docs))
axes[0, 0].set_xticklabels([f'Doc{i+1}' for i in range(n_docs)])
axes[0, 0].set_yticklabels([f'Doc{i+1}' for i in range(n_docs)])

# Add similarity values to heatmap
for i in range(n_docs):
    for j in range(n_docs):
        axes[0, 0].text(j, i, f'{similarity_matrix[i, j]:.2f}', 
                        ha='center', va='center', color='white')

plt.colorbar(im1, ax=axes[0, 0])

# Plot 2: Linear regression results
axes[0, 1].scatter(y, y_pred, alpha=0.6)
min_val, max_val = min(y.min(), y_pred.min()), max(y.max(), y_pred.max())
axes[0, 1].plot([min_val, max_val], [min_val, max_val], 'r--', linewidth=2)
axes[0, 1].set_xlabel('True y')
axes[0, 1].set_ylabel('Predicted y')
axes[0, 1].set_title(f'Linear Regression Results\nR² = {r_squared:.3f}')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: PCA explained variance
n_components_show = 8
axes[1, 0].bar(range(1, n_components_show + 1), 
               explained_var_ratio[:n_components_show] * 100, 
               alpha=0.7, color='skyblue')
axes[1, 0].plot(range(1, n_components_show + 1), 
                cumulative_var[:n_components_show] * 100, 
                'ro-', linewidth=2, markersize=6)
axes[1, 0].set_xlabel('Principal Component')
axes[1, 0].set_ylabel('Explained Variance (%)')
axes[1, 0].set_title('PCA Explained Variance')
axes[1, 0].grid(True, alpha=0.3)
axes[1, 0].axhline(y=95, color='red', linestyle='--', alpha=0.7, label='95% threshold')
axes[1, 0].legend()

# Plot 4: First two principal components
axes[1, 1].scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.6, c=latent_data[:, 0], cmap='viridis')
axes[1, 1].set_xlabel('First Principal Component')
axes[1, 1].set_ylabel('Second Principal Component')
axes[1, 1].set_title('Data in PC Space (colored by latent feature)')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n🚀 Linear Algebra in ML/NLP Summary:")
applications = [
    "Document similarity: Cosine similarity using dot products",
    "Linear regression: Matrix operations for parameter estimation",
    "PCA: Eigendecomposition for dimensionality reduction",
    "Neural networks: Matrix multiplications in forward/backward pass",
    "Word embeddings: Vector representations of words",
    "SVD: Matrix factorization for recommendation systems",
    "Optimization: Gradient descent uses vector operations",
    "Computer vision: Convolutions as matrix operations"
]

for app in applications:
    print(f"• {app}")
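The regression example solves the normal equations with `np.linalg.solve`, which avoids an explicit inverse but still forms XᵀX, squaring the condition number of X. `np.linalg.lstsq` works on X directly with an SVD-based LAPACK routine. A minimal sketch on data generated the same way (with `default_rng` instead of the global seed, so the numbers differ from the cell above):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
X = rng.standard_normal((n, 2))
y = X @ np.array([3.0, -2.0]) + 1.0 + 0.5 * rng.standard_normal(n)

X1 = np.column_stack([np.ones(n), X])       # prepend a bias column

# Normal equations: (X^T X) theta = X^T y
theta_normal = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Least squares on X1 directly, without forming X^T X
theta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X1, y, rcond=None)

print("normal equations:", theta_normal.round(3))
print("lstsq:           ", theta_lstsq.round(3))
```

On well-conditioned data like this both routes agree; the difference matters when features are nearly collinear.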

## Key Takeaways

### Essential Linear Algebra Concepts:

1. **Vectors**:
   - Represent data points, features, parameters
   - Dot product measures similarity/projection
   - Norms measure magnitude/distance
   - Underlie most ML algorithms

2. **Matrices**:
   - Represent datasets, transformations, systems
   - Matrix multiplication combines transformations
   - Inverse undoes a transformation; determinant measures how it scales area or volume
   - Used in data representation and model parameters

3. **Eigendecomposition**:
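   - Finds the directions a square matrix only scales (eigenvectors) and the scale factors (eigenvalues)
   - Used in PCA, stability analysis, PageRank
   - For a covariance matrix, each eigenvalue is the variance along its eigenvector
   - For a covariance matrix, the eigenvectors are the principal directions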

4. **SVD (Singular Value Decomposition)**:
   - Exists for every matrix, square or rectangular
   - Unlike eigendecomposition, requires no diagonalizability assumption
   - Foundation of PCA, recommender systems
   - Enables low-rank approximations

5. **PCA (Principal Component Analysis)**:
   - Finds directions of maximum variance
   - Dimensionality reduction technique
   - Data compression and visualization
   - Turns correlated features into uncorrelated components
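One concrete geometric meaning of the determinant: a 2×2 matrix M maps the unit square to a parallelogram whose area is |det M|. A minimal sketch that checks this with the shoelace area formula; `polygon_area` is a hypothetical helper and M reuses the notebook's `test_matrix` values.

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as an (n, 2) array of vertices in order."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)  # unit square

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
image = square @ M.T            # apply M to every vertex (vertices stored as rows)

print("area after transform:", polygon_area(image))
print("|det(M)|:            ", abs(np.linalg.det(M)))
```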

### Applications in Data Science:

**Machine Learning:**
- Linear/logistic regression: Matrix operations
- Neural networks: Matrix multiplications
- SVM: Inner products in feature space
- Clustering: Distance calculations

**NLP:**
- Word embeddings: Vector representations
- Document similarity: Cosine similarity
- Topic modeling: Matrix factorization
- Transformers: Attention via matrix operations

**Computer Vision:**
- Images as matrices of pixel values
- Convolutions: Matrix operations
- PCA for face recognition
- Image transformations

### Essential NumPy Operations:

```python
# Vector operations
np.dot(v1, v2)          # Dot product
np.linalg.norm(v)       # Vector norm
v / np.linalg.norm(v)   # Unit vector

# Matrix operations
A @ B                   # Matrix multiplication
A.T                     # Transpose
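np.linalg.inv(A)        # Inverse
np.linalg.solve(A, b)   # Solve Ax = b (preferred over inv(A) @ b)
np.linalg.det(A)        # Determinant

# Decompositions
eigenvals, eigenvecs = np.linalg.eig(A)    # Eigendecomposition (general square A)
eigenvals, eigenvecs = np.linalg.eigh(A)   # Symmetric A: real, ascending eigenvalues
U, s, Vt = np.linalg.svd(A)               # SVD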
```

### Why Linear Algebra Matters:

1. **Efficiency**: Vectorized NumPy operations run in compiled code and are typically much faster than Python loops
2. **Scalability**: Handle large datasets efficiently
3. **Generality**: Same operations work in any dimension
4. **Optimization**: Gradient-based methods use linear algebra
5. **Understanding**: Provides geometric intuition for algorithms
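The efficiency point can be seen directly by computing the same dot product with a Python loop and with `@`. A minimal sketch; the timings it prints depend on the machine and are only indicative, but the two results should agree up to floating-point rounding.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(200_000)
b = rng.standard_normal(200_000)

def dot_loop(x, y):
    """Dot product with an explicit Python loop (one interpreter step per element)."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

t0 = time.perf_counter()
slow = dot_loop(a, b)
t1 = time.perf_counter()
fast = a @ b                    # single call into compiled code
t2 = time.perf_counter()

print(f"loop:       {t1 - t0:.4f} s")
print(f"vectorized: {t2 - t1:.6f} s")
print("same result:", np.isclose(slow, fast))
```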

## Practice Exercises

1. **Implement linear regression** from scratch using only NumPy
2. **Build a simple neural network** with matrix operations
3. **Create a document recommender** using cosine similarity
4. **Perform image compression** using SVD
5. **Implement PCA** for data visualization
6. **Solve a system of equations** using different methods
7. **Implement PageRank** by finding the dominant eigenvector of a link matrix
8. **Build a collaborative filtering system** with matrix factorization

## Next Steps

Master linear algebra to:
- **Understand ML algorithms** at a deeper level
- **Debug and optimize** model performance
- **Implement custom algorithms** efficiently
- **Work with high-dimensional data** confidently
- **Contribute to research** in ML/AI

Linear algebra is the language of machine learning – master it to unlock the full potential of data science!