# 📐 Math for Data Science: Essential Foundations

<img src='https://miro.medium.com/max/1400/1*L76A5gL6176UbMgn7q4Ybg.jpeg' width='600' alt='Math for DS'>

## 🎯 Why Math Matters in Data Science

**Don't worry!** You don't need to be a math genius. We'll learn just enough math to understand:
- How machine learning algorithms work
- Why certain techniques are used
- How to optimize models
- What's happening "under the hood"

### 📚 What We'll Master Today:
1. **Linear Algebra Basics** - Vectors, matrices, operations
2. **Essential Calculus** - Derivatives for optimization
3. **Key Concepts** - Distance metrics, dimensionality
4. **Practical Applications** - See math in action
5. **ML Connections** - Where each concept is used

### 🎓 Learning Approach:
- **Visual First** - See it, then understand it
- **Code Everything** - NumPy makes math tangible
- **Real Examples** - Not abstract theory
- **Build Intuition** - Understanding > memorization

---

## 🚀 Let's Make Math Fun and Practical!

In [None]:
# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
import warnings
warnings.filterwarnings('ignore')

# Set style for beautiful plots
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print("📐 Math for Data Science - Ready to Learn!")
print("\n💡 Remember: We're learning practical math, not theory!")

---

## 📌 Section 1: Vectors - The Building Blocks

### 🎯 What is a Vector?

Think of a vector as:
- **In ML**: A list of features for one data point
- **In Math**: An arrow with direction and magnitude
- **In Code**: A 1D NumPy array

<img src='https://miro.medium.com/max/1400/1*AdhTIAfILudMOc0kfHHLkw.png' width='400' alt='Vector'>
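
To make the "arrow with direction and magnitude" view concrete, here is a minimal sketch. The vector `[3, 4]` is a hypothetical value chosen for illustration; the calls are standard NumPy.

```python
import numpy as np

# Hypothetical 2D vector chosen for illustration
v = np.array([3.0, 4.0])

magnitude = np.linalg.norm(v)                       # length of the arrow
direction_deg = np.degrees(np.arctan2(v[1], v[0]))  # angle measured from the x-axis
unit = v / magnitude                                # same direction, length 1

print(f"magnitude: {magnitude:.1f}")
print(f"direction: {direction_deg:.2f} degrees")
print(f"unit vector: {unit}")
```

The unit vector keeps only the direction, which is why normalization shows up whenever the angle matters more than the length.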

In [None]:
# 1.1 Creating and Understanding Vectors
print("🎯 VECTORS IN DATA SCIENCE\n" + "="*40)

# Example: House features as a vector
# [bedrooms, bathrooms, size_sqft, age_years, price_1000s]
house_vector = np.array([3, 2, 1500, 10, 250])

print("House Features Vector:")
print(f"Vector: {house_vector}")
print(f"Dimensions: {house_vector.shape[0]}")
print(f"Type: {type(house_vector)}")

# Visualize 2D vector
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# 2D Vector visualization
vector_2d = np.array([3, 4])
ax1.quiver(0, 0, vector_2d[0], vector_2d[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.01)
ax1.set_xlim(-1, 6)
ax1.set_ylim(-1, 6)
ax1.grid(True)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_title(f'2D Vector: {vector_2d}')
ax1.axhline(y=0, color='k', linewidth=0.5)
ax1.axvline(x=0, color='k', linewidth=0.5)

# Multiple vectors (dataset)
np.random.seed(42)
dataset = np.random.randn(50, 2) * 2 + 3
ax2.scatter(dataset[:, 0], dataset[:, 1], alpha=0.6)
ax2.set_xlabel('Feature 1')
ax2.set_ylabel('Feature 2')
ax2.set_title('Dataset: Each Point is a Vector')
ax2.grid(True)

plt.tight_layout()
plt.show()

In [None]:
# 1.2 Vector Operations
print("➕ VECTOR OPERATIONS\n" + "="*40)

# Define two vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])

print(f"Vector 1: {v1}")
print(f"Vector 2: {v2}")
print()

# Basic operations
print("📊 Basic Operations:")
print(f"Addition: v1 + v2 = {v1 + v2}")
print(f"Subtraction: v1 - v2 = {v1 - v2}")
print(f"Scalar multiplication: 3 * v1 = {3 * v1}")
print(f"Element-wise multiplication: v1 * v2 = {v1 * v2}")
print()

# Important vector operations
print("🎯 Key Operations for ML:")
print(f"Dot product: v1 · v2 = {np.dot(v1, v2)}")
print(f"Magnitude (norm) of v1: ||v1|| = {np.linalg.norm(v1):.3f}")
print(f"Unit vector of v1: {v1 / np.linalg.norm(v1)}")
print()

# Angle between vectors
cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
angle_rad = np.arccos(np.clip(cos_angle, -1.0, 1.0))  # clip guards against rounding just outside [-1, 1]
angle_deg = np.degrees(angle_rad)
print(f"Angle between v1 and v2: {angle_deg:.2f}°")

# Practical example: Cosine similarity (used in recommendation systems)
print("\n📈 Practical Application: Cosine Similarity")
user1_ratings = np.array([5, 3, 0, 4, 4])  # Movie ratings (0 = not rated, treated here as a zero score)
user2_ratings = np.array([4, 0, 0, 5, 4])

cos_sim = np.dot(user1_ratings, user2_ratings) / (np.linalg.norm(user1_ratings) * np.linalg.norm(user2_ratings))
print(f"User 1 ratings: {user1_ratings}")
print(f"User 2 ratings: {user2_ratings}")
print(f"Cosine similarity: {cos_sim:.3f}")
print(f"Interpretation: {cos_sim:.2f} on a 0-1 scale for non-negative ratings (1 = identical rating pattern)")

### 🏋️ Exercise 1: Vector Practice

Calculate:
1. The distance between two points
2. The dot product interpretation
3. Normalize a vector

In [None]:
# Your solution here:

# Solution:
print("📝 VECTOR EXERCISES\n" + "="*40)

# 1. Euclidean distance
point1 = np.array([1, 2, 3])
point2 = np.array([4, 6, 8])
distance = np.linalg.norm(point2 - point1)
print(f"1. Distance between {point1} and {point2}: {distance:.3f}")

# 2. Dot product as projection
v_a = np.array([3, 4])
v_b = np.array([1, 0])  # Unit vector along x-axis
projection = np.dot(v_a, v_b)
print(f"\n2. Projection of {v_a} onto x-axis: {projection}")
print(f"   (This is the x-component of the vector!)")

# 3. Normalize vector
vector = np.array([3, 4, 0])
normalized = vector / np.linalg.norm(vector)
print(f"\n3. Original vector: {vector}")
print(f"   Normalized: {normalized}")
print(f"   New magnitude: {np.linalg.norm(normalized):.0f}")

---

## 📌 Section 2: Matrices - The Data Tables

### 🎯 What is a Matrix?

- **In ML**: Your entire dataset (rows = samples, columns = features)
- **In Math**: A 2D array of numbers
- **In Code**: A 2D NumPy array or DataFrame
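
The "rows = samples, columns = features" convention can be sketched with a tiny array. `X_demo` and its values are hypothetical, picked only so the results are easy to check by eye.

```python
import numpy as np

# Hypothetical toy dataset: 3 samples (rows) x 2 features (columns)
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

first_sample = X_demo[0]             # one row = one data point (a vector)
second_feature = X_demo[:, 1]        # one column = one feature across all samples
feature_means = X_demo.mean(axis=0)  # axis=0 collapses the rows: one mean per feature

print(f"shape: {X_demo.shape} (samples x features)")
print(f"first sample: {first_sample}")
print(f"second feature: {second_feature}")
print(f"feature means: {feature_means}")
```

Most NumPy reductions take an `axis` argument; `axis=0` summarizes each feature and `axis=1` summarizes each sample.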

In [None]:
# 2.1 Creating and Understanding Matrices
print("📊 MATRICES IN DATA SCIENCE\n" + "="*40)

# Dataset as a matrix
# Each row = one house, Each column = one feature
houses_matrix = np.array([
    [3, 2, 1500, 10, 250],  # House 1
    [4, 3, 2000, 5, 350],   # House 2
    [2, 1, 900, 20, 150],   # House 3
    [5, 4, 3000, 2, 500]    # House 4
])

features = ['Bedrooms', 'Bathrooms', 'SqFt', 'Age', 'Price($1000)']

print("Housing Dataset Matrix:")
print(pd.DataFrame(houses_matrix, columns=features))
print(f"\nShape: {houses_matrix.shape} (samples × features)")

# Visualize matrix
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Heatmap of matrix
sns.heatmap(houses_matrix, annot=True, fmt='.0f', cmap='YlOrRd', 
            xticklabels=features, yticklabels=[f'House {i+1}' for i in range(4)],
            ax=ax1)
ax1.set_title('Dataset as Matrix Heatmap')

# Correlation matrix
corr_matrix = np.corrcoef(houses_matrix.T)
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
            xticklabels=features, yticklabels=features,
            center=0, ax=ax2)
ax2.set_title('Feature Correlation Matrix')

plt.tight_layout()
plt.show()

In [None]:
# 2.2 Matrix Operations
print("✖️ MATRIX OPERATIONS\n" + "="*40)

# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
v = np.array([1, 2])

print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)
print(f"\nVector v: {v}")
print()

# Basic operations
print("📊 Basic Matrix Operations:")
print("A + B:")
print(A + B)
print("\n2 * A:")
print(2 * A)
print("\nElement-wise multiplication (A * B):")
print(A * B)

# Matrix multiplication
print("\n🎯 Matrix Multiplication (A @ B):")
print(A @ B)
print("\n📝 How it works:")
print(f"Result[0,0] = (1×5 + 2×7) = {1*5 + 2*7}")
print(f"Result[0,1] = (1×6 + 2×8) = {1*6 + 2*8}")

# Matrix-vector multiplication
print("\n🔄 Matrix-Vector Multiplication (A @ v):")
result = A @ v
print(f"Result: {result}")
print("This transforms the vector v using matrix A!")

# Transpose
print("\n🔄 Transpose (A.T):")
print("Original A:")
print(A)
print("Transposed A:")
print(A.T)

In [None]:
# 2.3 Special Matrices
print("🌟 SPECIAL MATRICES\n" + "="*40)

# Identity matrix
I = np.eye(3)
print("Identity Matrix (like the number 1 for matrices):")
print(I)
print("Property: A @ I = A")

# Diagonal matrix
D = np.diag([2, 3, 4])
print("\nDiagonal Matrix:")
print(D)

# Symmetric matrix
S = np.array([[1, 2, 3],
              [2, 4, 5],
              [3, 5, 6]])
print("\nSymmetric Matrix (S = S.T):")
print(S)
print(f"Is symmetric? {np.allclose(S, S.T)}")

# Practical application: Covariance matrix is always symmetric
data = np.random.randn(100, 3)
cov_matrix = np.cov(data.T)
print("\n📊 Covariance Matrix (always symmetric):")
print(cov_matrix.round(3))

---

## 📌 Section 3: Eigenvalues & Eigenvectors - The Hidden Structure

### 🎯 Why They Matter:

- **PCA** uses them for dimensionality reduction
- **PageRank** uses them to rank web pages
- **Recommendation systems** use them for matrix factorization
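
As a rough illustration of the PageRank connection, the sketch below ranks a hypothetical 3-page web by repeatedly multiplying a score vector by a link matrix (power iteration). It is a simplification: the matrix `M` is made up, and the damping factor used by the real algorithm is left out.

```python
import numpy as np

# Hypothetical 3-page web: column j holds the probabilities of following
# a link from page j to each page (every column sums to 1).
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])

rank = np.full(3, 1 / 3)      # start from a uniform guess
for _ in range(100):          # power iteration: repeatedly apply M
    rank = M @ rank
    rank = rank / rank.sum()  # keep the scores summing to 1

# Cross-check against the eigenvector whose eigenvalue is closest to 1
eigvals, eigvecs = np.linalg.eig(M)
dominant = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
dominant = dominant / dominant.sum()

print("Power iteration:", rank.round(3))
print("np.linalg.eig:  ", dominant.round(3))
```

Both approaches should agree: the ranking is the eigenvector of `M` for eigenvalue 1, and power iteration is one simple way to reach it without a full decomposition.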

In [None]:
# 3.1 Understanding Eigenvalues & Eigenvectors
print("🔮 EIGENVALUES & EIGENVECTORS\n" + "="*40)

# Simple example
A = np.array([[3, 1],
              [1, 3]])

# Calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Matrix A:")
print(A)
print(f"\nEigenvalues: {eigenvalues}")
print(f"\nEigenvectors:")
print(eigenvectors)

# Verify the eigenvalue equation: A @ v = λ * v
for i in range(len(eigenvalues)):
    eigval = eigenvalues[i]
    eigvec = eigenvectors[:, i]
    
    # Check if A @ v = λ * v (λ is a scalar, so this is ordinary scaling)
    left = A @ eigvec
    right = eigval * eigvec
    
    print(f"\n✅ Verification for eigenvalue {eigval:.2f}:")
    print(f"A @ v = {left}")
    print(f"λ * v = {right}")
    print(f"Equal? {np.allclose(left, right)}")

# Visualize eigenvectors
fig, ax = plt.subplots(figsize=(8, 8))

# Plot eigenvectors
for i in range(len(eigenvalues)):
    eigvec = eigenvectors[:, i]
    eigval = eigenvalues[i]
    
    # Original eigenvector
    ax.quiver(0, 0, eigvec[0], eigvec[1], angles='xy', scale_units='xy', 
              scale=1, color=f'C{i}', width=0.01, label=f'Eigenvector {i+1}')
    
    # Transformed by matrix (scaled by eigenvalue)
    transformed = A @ eigvec
    ax.quiver(0, 0, transformed[0], transformed[1], angles='xy', scale_units='xy',
              scale=1, color=f'C{i}', width=0.01, alpha=0.5, linestyle='--',
              label=f'A @ eigenvector {i+1}')

ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)
ax.grid(True)
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_title('Eigenvectors: Special Directions That Only Scale')
ax.legend()
ax.set_aspect('equal')

plt.show()

In [None]:
# 3.2 PCA Application
print("🎯 PCA USING EIGENVALUES\n" + "="*40)

# Create correlated 2D data
np.random.seed(42)
mean = [5, 5]
cov = [[2, 1.2],    # must be positive semi-definite: 2*1 - 1.2**2 > 0
       [1.2, 1]]
data = np.random.multivariate_normal(mean, cov, 200)

# Center the data
data_centered = data - np.mean(data, axis=0)

# Calculate covariance matrix
cov_matrix = np.cov(data_centered.T)

# Get eigenvalues and eigenvectors
# (eigh is meant for symmetric matrices such as a covariance matrix and returns real values)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Sort by eigenvalue (largest first)
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print(f"Eigenvalues (variance explained): {eigenvalues}")
print(f"\nVariance explained:")
total_var = np.sum(eigenvalues)
for i, eigval in enumerate(eigenvalues):
    print(f"PC{i+1}: {eigval/total_var*100:.1f}%")

# Visualize PCA
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Original data with principal components
ax1.scatter(data[:, 0], data[:, 1], alpha=0.5)
mean = np.mean(data, axis=0)

# Draw principal components
for i in range(2):
    eigvec = eigenvectors[:, i]
    eigval = eigenvalues[i]
    
    # Scale eigenvector by sqrt of eigenvalue
    pc = eigvec * np.sqrt(eigval) * 2
    
    ax1.arrow(mean[0], mean[1], pc[0], pc[1], 
              head_width=0.2, head_length=0.1, 
              fc=f'C{i}', ec=f'C{i}', linewidth=2,
              label=f'PC{i+1}')

ax1.set_xlabel('Feature 1')
ax1.set_ylabel('Feature 2')
ax1.set_title('Original Data with Principal Components')
ax1.legend()
ax1.grid(True)

# Transform data to PC space
data_pca = data_centered @ eigenvectors
ax2.scatter(data_pca[:, 0], data_pca[:, 1], alpha=0.5)
ax2.set_xlabel('First Principal Component')
ax2.set_ylabel('Second Principal Component')
ax2.set_title('Data in PCA Space')
ax2.grid(True)

plt.tight_layout()
plt.show()

---

## 📌 Section 4: Calculus Basics - Understanding Change

### 🎯 Why Calculus in ML?

- **Gradient Descent** - How models learn
- **Backpropagation** - How neural networks train
- **Optimization** - Finding the best parameters
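
All three ideas rest on knowing the slope of a function. As a small sketch, a derivative can be estimated numerically with a central difference and compared with the analytic answer. The helper names `numerical_derivative` and `square` are hypothetical, introduced only for this example.

```python
import numpy as np

def numerical_derivative(func, x, h=1e-5):
    """Central-difference estimate of the slope of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def square(x):
    return x ** 2

for x0 in [-2.0, 0.0, 2.0]:
    approx = numerical_derivative(square, x0)
    exact = 2 * x0  # analytic derivative of x^2
    print(f"x = {x0:+.1f}: numerical = {approx:+.5f}, analytic = {exact:+.5f}")
```

This kind of check is a common way to sanity-test hand-written gradients before trusting them in an optimizer.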

In [None]:
# 4.1 Derivatives - Rate of Change
print("📈 DERIVATIVES INTUITION\n" + "="*40)

# Function: f(x) = x^2
def f(x):
    return x**2

# Derivative: f'(x) = 2x
def f_derivative(x):
    return 2*x

# Visualize
x = np.linspace(-3, 3, 100)
y = f(x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Plot function
ax1.plot(x, y, 'b-', linewidth=2, label='f(x) = x²')

# Show tangent lines at different points
points = [-2, 0, 2]
colors = ['red', 'green', 'orange']

for point, color in zip(points, colors):
    # Calculate tangent line
    slope = f_derivative(point)
    y_point = f(point)
    
    # Tangent line equation: y - y0 = m(x - x0)
    x_tangent = np.linspace(point-1, point+1, 50)
    y_tangent = slope * (x_tangent - point) + y_point
    
    ax1.plot(x_tangent, y_tangent, color=color, linestyle='--', alpha=0.7,
            label=f'Tangent at x={point}, slope={slope}')
    ax1.plot(point, y_point, 'o', color=color, markersize=8)

ax1.set_xlabel('x')
ax1.set_ylabel('f(x)')
ax1.set_title('Function and Tangent Lines')
ax1.legend()
ax1.grid(True)

# Plot derivative
y_derivative = f_derivative(x)
ax2.plot(x, y_derivative, 'r-', linewidth=2, label="f'(x) = 2x")
ax2.axhline(y=0, color='black', linewidth=0.5)
ax2.axvline(x=0, color='black', linewidth=0.5)
ax2.set_xlabel('x')
ax2.set_ylabel("f'(x)")
ax2.set_title('Derivative (Slope at Each Point)')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()

print("\n💡 Key Insights:")
print("• Derivative tells us the slope at any point")
print("• Positive derivative = function increasing")
print("• Negative derivative = function decreasing")
print("• Zero derivative = candidate minimum or maximum (or a flat inflection point)")

In [None]:
# 4.2 Gradient Descent Visualization
print("🎯 GRADIENT DESCENT IN ACTION\n" + "="*40)

# Cost function: J(θ) = (θ - 3)^2
def cost_function(theta):
    return (theta - 3)**2

def gradient(theta):
    return 2 * (theta - 3)

# Gradient descent
learning_rate = 0.1
theta = 0  # Starting point
history = [theta]
costs = [cost_function(theta)]

# Run gradient descent
for i in range(20):
    grad = gradient(theta)
    theta = theta - learning_rate * grad
    history.append(theta)
    costs.append(cost_function(theta))

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Cost function and path
theta_range = np.linspace(-1, 6, 100)
cost_range = cost_function(theta_range)

ax1.plot(theta_range, cost_range, 'b-', linewidth=2, label='Cost Function')
ax1.plot(history, [cost_function(h) for h in history], 'ro-', 
         markersize=5, linewidth=1, alpha=0.6, label='Gradient Descent Path')
ax1.plot(3, 0, 'g*', markersize=15, label='Minimum (θ=3)')
ax1.set_xlabel('θ (Parameter)')
ax1.set_ylabel('Cost J(θ)')
ax1.set_title('Gradient Descent Finding Minimum')
ax1.legend()
ax1.grid(True)

# Convergence
ax2.plot(costs, 'b-', linewidth=2)
ax2.set_xlabel('Iteration')
ax2.set_ylabel('Cost')
ax2.set_title('Cost Decreasing Over Iterations')
ax2.grid(True)

plt.tight_layout()
plt.show()

print(f"\n📊 Results:")
print(f"Starting θ: {history[0]:.3f}")
print(f"Final θ: {history[-1]:.3f}")
print(f"True minimum: θ = 3")
print(f"Final cost: {costs[-1]:.6f}")

In [None]:
# 4.3 Partial Derivatives - Multiple Variables
print("🔄 PARTIAL DERIVATIVES\n" + "="*40)

# Function: f(x,y) = x^2 + y^2
def f_2d(x, y):
    return x**2 + y**2

# Partial derivatives
def df_dx(x, y):
    return 2*x

def df_dy(x, y):
    return 2*y

# Create mesh
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = f_2d(X, Y)

# Visualize
fig = plt.figure(figsize=(15, 5))

# 3D surface
ax1 = fig.add_subplot(131, projection='3d')
surf = ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8)
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax1.set_zlabel('f(x,y)')
ax1.set_title('f(x,y) = x² + y²')

# Contour plot
ax2 = fig.add_subplot(132)
contour = ax2.contour(X, Y, Z, levels=20)
ax2.clabel(contour, inline=True, fontsize=8)
ax2.set_xlabel('X')
ax2.set_ylabel('Y')
ax2.set_title('Contour Plot')

# Gradient field
ax3 = fig.add_subplot(133)
x_sparse = np.linspace(-3, 3, 10)
y_sparse = np.linspace(-3, 3, 10)
X_sparse, Y_sparse = np.meshgrid(x_sparse, y_sparse)
U = -df_dx(X_sparse, Y_sparse)  # Negative for descent
V = -df_dy(X_sparse, Y_sparse)

ax3.quiver(X_sparse, Y_sparse, U, V, alpha=0.6)
ax3.contour(X, Y, Z, levels=20, alpha=0.3)
ax3.set_xlabel('X')
ax3.set_ylabel('Y')
ax3.set_title('Negative Gradient Field (Arrows Point Downhill)')

plt.tight_layout()
plt.show()

print("\n💡 Key Insight:")
print("The gradient points in the direction of steepest increase.")
print("For gradient descent, we move in the opposite direction!")

---

## 📌 Section 5: Distance Metrics - Measuring Similarity

### 🎯 Essential for:
- **KNN** - Finding nearest neighbors
- **Clustering** - Grouping similar points
- **Anomaly Detection** - Finding outliers
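
Several common metrics are special cases of one formula, the Minkowski distance. The sketch below assumes a hypothetical helper `minkowski_distance` built on `np.linalg.norm`, whose `ord` argument selects the norm.

```python
import numpy as np

def minkowski_distance(p, q, order):
    """Minkowski distance: order=1 is Manhattan, 2 is Euclidean, np.inf is Chebyshev."""
    return np.linalg.norm(np.asarray(p) - np.asarray(q), ord=order)

a = np.array([0, 0])
b = np.array([3, 4])

print("Manhattan:", minkowski_distance(a, b, 1))       # 3 + 4
print("Euclidean:", minkowski_distance(a, b, 2))       # sqrt(9 + 16)
print("Chebyshev:", minkowski_distance(a, b, np.inf))  # max(3, 4)
```

Which metric works best depends on the data; treating the order as a parameter makes it easy to compare them in KNN or clustering experiments.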

In [None]:
# 5.1 Different Distance Metrics
print("📏 DISTANCE METRICS\n" + "="*40)

# Two points
p1 = np.array([1, 2, 3])
p2 = np.array([4, 6, 8])

# Euclidean distance (L2 norm)
euclidean = np.linalg.norm(p2 - p1)
print(f"Points: p1 = {p1}, p2 = {p2}")
print(f"\n1. Euclidean Distance: {euclidean:.3f}")
print(f"   Formula: √[(x₂-x₁)² + (y₂-y₁)² + (z₂-z₁)²]")

# Manhattan distance (L1 norm)
manhattan = np.sum(np.abs(p2 - p1))
print(f"\n2. Manhattan Distance: {manhattan:.3f}")
print(f"   Formula: |x₂-x₁| + |y₂-y₁| + |z₂-z₁|")

# Chebyshev distance (L∞ norm)
chebyshev = np.max(np.abs(p2 - p1))
print(f"\n3. Chebyshev Distance: {chebyshev:.3f}")
print(f"   Formula: max(|x₂-x₁|, |y₂-y₁|, |z₂-z₁|)")

# Cosine distance
cosine_sim = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
cosine_dist = 1 - cosine_sim
print(f"\n4. Cosine Distance: {cosine_dist:.3f}")
print(f"   Measures angle between vectors, not magnitude")

# Visualize in 2D
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Points for visualization
origin = np.array([0, 0])
point = np.array([3, 4])

for ax, title, dist_type in zip(axes, 
                                ['Euclidean', 'Manhattan', 'Chebyshev'],
                                ['euclidean', 'manhattan', 'chebyshev']):
    
    # Draw grid
    ax.grid(True, alpha=0.3)
    ax.set_xlim(-1, 5)
    ax.set_ylim(-1, 5)
    ax.set_aspect('equal')
    
    # Plot points
    ax.plot(0, 0, 'ro', markersize=10, label='Origin')
    ax.plot(3, 4, 'bo', markersize=10, label='Point')
    
    if dist_type == 'euclidean':
        # Straight line
        ax.plot([0, 3], [0, 4], 'g-', linewidth=2, label=f'Distance: {5:.1f}')
        
    elif dist_type == 'manhattan':
        # L-shaped path
        ax.plot([0, 3, 3], [0, 0, 4], 'g-', linewidth=2, label=f'Distance: {7:.1f}')
        
    elif dist_type == 'chebyshev':
        # Show max dimension
        ax.plot([0, 3], [0, 0], 'r--', alpha=0.5)
        ax.plot([3, 3], [0, 4], 'g-', linewidth=3, label=f'Distance: {4:.1f}')
    
    ax.set_title(f'{title} Distance')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# 5.2 KNN Example with Distance Metrics
print("🎯 K-NEAREST NEIGHBORS EXAMPLE\n" + "="*40)

# Generate sample data
np.random.seed(42)
n_samples = 30

# Class 0: bottom-left
class0 = np.random.randn(n_samples//2, 2) + [2, 2]
# Class 1: top-right
class1 = np.random.randn(n_samples//2, 2) + [6, 6]

X = np.vstack([class0, class1])
y = np.array([0] * (n_samples//2) + [1] * (n_samples//2))

# New point to classify
new_point = np.array([4, 4])

# Calculate distances
distances = np.linalg.norm(X - new_point, axis=1)

# Find k nearest neighbors
k = 5
nearest_indices = np.argsort(distances)[:k]
nearest_classes = y[nearest_indices]

# Predict class (majority vote)
prediction = int(np.mean(nearest_classes) > 0.5)

# Visualize
plt.figure(figsize=(10, 8))

# Plot training data
colors = ['blue', 'red']
for i in range(2):
    mask = y == i
    plt.scatter(X[mask, 0], X[mask, 1], c=colors[i], 
               label=f'Class {i}', alpha=0.6, s=50)

# Plot new point
plt.scatter(new_point[0], new_point[1], c='green', 
           marker='*', s=500, label='New Point', edgecolor='black', linewidth=2)

# Highlight nearest neighbors
plt.scatter(X[nearest_indices, 0], X[nearest_indices, 1], 
           facecolors='none', edgecolors='green', s=200, linewidth=2,
           label=f'K={k} Nearest')

# Draw connections to nearest neighbors
for idx in nearest_indices:
    plt.plot([new_point[0], X[idx, 0]], [new_point[1], X[idx, 1]], 
            'g--', alpha=0.3)

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title(f'KNN Classification (K={k})\nPrediction: Class {prediction}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

print(f"\n📊 Results:")
print(f"Nearest {k} neighbors belong to classes: {nearest_classes}")
print(f"Prediction for new point: Class {prediction}")

---

## 📌 Section 6: Dimensionality - The Curse and The Blessing

### 🎯 Understanding High-Dimensional Spaces
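
One way to get a feel for high-dimensional space is a quick simulation: draw random points in a cube and count how many land inside the ball that just fits within it. This is a Monte Carlo sketch; the function name `fraction_inside_ball`, the seed, and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def fraction_inside_ball(dim, n_points=100_000):
    """Fraction of uniform points in the cube [-1, 1]^dim within distance 1 of the center."""
    points = rng.uniform(-1.0, 1.0, size=(n_points, dim))
    return np.mean(np.linalg.norm(points, axis=1) <= 1.0)

for dim in [2, 3, 5, 10]:
    print(f"d = {dim:2d}: {fraction_inside_ball(dim):.4f} of the cube lies inside the ball")
```

The fraction shrinks quickly as the dimension grows, which suggests that in high dimensions nearly all of the cube's volume sits in its corners.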

In [None]:
# 6.1 The Curse of Dimensionality
print("📊 CURSE OF DIMENSIONALITY\n" + "="*40)

# Volume of a unit-radius hypersphere vs. the hypercube [-1, 1]^d that encloses it
from math import gamma  # np.math is not available in recent NumPy releases

dimensions = range(1, 21)
volume_ratios = []

for d in dimensions:
    # Ball volume: π^(d/2) / Γ(d/2 + 1); cube volume: 2^d
    # d=1 gives 1 (segment/segment), d=2 gives π/4, d=3 gives π/6
    ratio = (np.pi ** (d/2)) / (2**d * gamma(d/2 + 1))
    
    volume_ratios.append(ratio)

# Visualize
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Volume ratio
ax1.plot(dimensions, volume_ratios, 'b-', linewidth=2)
ax1.set_xlabel('Dimensions')
ax1.set_ylabel('Volume Ratio (Sphere/Cube)')
ax1.set_title('Sphere Volume Vanishes in High Dimensions')
ax1.grid(True)
ax1.set_yscale('log')

# Distance concentration
np.random.seed(42)
dims = [2, 5, 10, 20, 50, 100]
distance_stds = []

for d in dims:
    # Generate random points
    points = np.random.randn(100, d)
    
    # Calculate all pairwise distances
    distances = []
    for i in range(len(points)):
        for j in range(i+1, len(points)):
            dist = np.linalg.norm(points[i] - points[j])
            distances.append(dist)
    
    # Calculate coefficient of variation
    cv = np.std(distances) / np.mean(distances)
    distance_stds.append(cv)

ax2.plot(dims, distance_stds, 'r-', linewidth=2, marker='o')
ax2.set_xlabel('Dimensions')
ax2.set_ylabel('Distance Variation (CV)')
ax2.set_title('Distances Become Similar in High Dimensions')
ax2.grid(True)

plt.tight_layout()
plt.show()

print("\n💡 Key Insights:")
print("1. In high dimensions, almost all of the cube's volume lies in its corners, outside the inscribed sphere")
print("2. Pairwise distances concentrate, so points become nearly equidistant")
print("3. This makes distance-based algorithms challenging")
print("4. Solution: Dimensionality reduction (PCA, t-SNE, etc.)")

---

## 🎯 Section 7: Practical ML Applications

### Connecting Math to Machine Learning
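
As a small bridge between the two halves of this notebook, the sketch below fits a line with gradient descent: matrix products compute the predictions, and the derivative of the mean squared error gives the update direction. The data, the variable names (`X_design`, `theta_gd`), and the hyperparameters are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free data from y = 2x + 1, so the answer is known
x = rng.uniform(-1, 1, size=50)
y_true = 2 * x + 1

X_design = np.column_stack([np.ones_like(x), x])  # columns: intercept, slope
theta_gd = np.zeros(2)
learning_rate = 0.1

for _ in range(2000):
    residual = X_design @ theta_gd - y_true    # linear algebra: predictions minus targets
    grad = 2 * X_design.T @ residual / len(x)  # calculus: gradient of the mean squared error
    theta_gd -= learning_rate * grad           # step against the gradient

print(f"intercept ≈ {theta_gd[0]:.3f}, slope ≈ {theta_gd[1]:.3f}")
```

With noise-free data the estimates should settle very close to the true intercept 1 and slope 2; real data would only get near its underlying values.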

In [None]:
# 7.1 Linear Regression from Scratch
print("📈 LINEAR REGRESSION USING LINEAR ALGEBRA\n" + "="*40)

# Generate sample data
np.random.seed(42)
n_samples = 100
X = np.random.randn(n_samples, 1) * 2 + 5
true_slope = 3
true_intercept = 10
y = true_slope * X + true_intercept + np.random.randn(n_samples, 1) * 2

# Add intercept term to X
X_with_intercept = np.hstack([np.ones((n_samples, 1)), X])

# Normal equation: θ = (X'X)^(-1)X'y
# Solve the linear system instead of forming the inverse explicitly (more stable)
theta = np.linalg.solve(X_with_intercept.T @ X_with_intercept, X_with_intercept.T @ y)

print("Using Normal Equation:")
print(f"Estimated intercept: {theta[0, 0]:.3f} (True: {true_intercept})")
print(f"Estimated slope: {theta[1, 0]:.3f} (True: {true_slope})")

# Predictions
y_pred = X_with_intercept @ theta

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X, y, alpha=0.6, label='Data')
plt.plot(X, y_pred, 'r-', linewidth=2, label='Fitted Line')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Linear Regression using Linear Algebra')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Calculate R-squared
ss_res = np.sum((y - y_pred)**2)
ss_tot = np.sum((y - np.mean(y))**2)
r_squared = 1 - (ss_res / ss_tot)

print(f"\nR² Score: {r_squared:.3f}")

In [None]:
# 7.2 Logistic Regression Sigmoid
print("📊 LOGISTIC REGRESSION MATHEMATICS\n" + "="*40)

# Sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Derivative of sigmoid
def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)

# Visualize
z = np.linspace(-10, 10, 100)
y_sigmoid = sigmoid(z)
y_derivative = sigmoid_derivative(z)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

# Sigmoid function
ax1.plot(z, y_sigmoid, 'b-', linewidth=2)
ax1.axhline(y=0.5, color='red', linestyle='--', alpha=0.5)
ax1.axvline(x=0, color='black', linewidth=0.5)
ax1.set_xlabel('z')
ax1.set_ylabel('σ(z)')
ax1.set_title('Sigmoid Function')
ax1.grid(True, alpha=0.3)
ax1.set_ylim(-0.1, 1.1)

# Sigmoid derivative
ax2.plot(z, y_derivative, 'r-', linewidth=2)
ax2.axvline(x=0, color='black', linewidth=0.5)
ax2.set_xlabel('z')
ax2.set_ylabel("σ'(z)")
ax2.set_title('Sigmoid Derivative')
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n💡 Key Properties:")
print(f"• σ(0) = {sigmoid(0):.3f}")
print(f"• σ(∞) → 1")
print(f"• σ(-∞) → 0")
print(f"• Maximum derivative at z=0: {sigmoid_derivative(0):.3f}")

---

## 🏆 Final Project: Neural Network Math

### Build a Simple Neural Network from Scratch
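
Backpropagation is the chain rule applied layer by layer. Before looking at a full network, here is a sketch for a single sigmoid neuron with one weight, checking the chain-rule gradient against a finite-difference estimate. The functions `loss` and `loss_gradient` and all numeric values are hypothetical, chosen only for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, target):
    """Squared error of a single sigmoid neuron with one weight and no bias."""
    return (sigmoid(w * x) - target) ** 2

def loss_gradient(w, x, target):
    """Chain rule: dL/dw = dL/da * da/dz * dz/dw."""
    a = sigmoid(w * x)
    dL_da = 2 * (a - target)
    da_dz = a * (1 - a)  # sigmoid derivative written in terms of its output
    dz_dw = x
    return dL_da * da_dz * dz_dw

# Hypothetical values for illustration
w, x, target, h = 0.5, 2.0, 1.0, 1e-6
numeric = (loss(w + h, x, target) - loss(w - h, x, target)) / (2 * h)
analytic = loss_gradient(w, x, target)
print(f"analytic = {analytic:.6f}, numeric = {numeric:.6f}")
```

If the two numbers disagree noticeably, the hand-derived gradient is usually the one at fault, which makes this a handy debugging step.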

In [None]:
# Final Project: Neural Network Mathematics
print("🧠 NEURAL NETWORK FROM SCRATCH\n" + "="*50)

class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize weights randomly
        self.W1 = np.random.randn(input_size, hidden_size) * 0.5
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.5
        self.b2 = np.zeros((1, output_size))
        
    def sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
    
    def sigmoid_derivative(self, a):
        # Expects the activation a = sigmoid(z), not z itself: σ'(z) = a * (1 - a)
        return a * (1 - a)
    
    def forward(self, X):
        # Forward propagation
        self.z1 = X @ self.W1 + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = self.a1 @ self.W2 + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2
    
    def backward(self, X, y, output, learning_rate=0.1):
        # Backward propagation
        m = X.shape[0]
        
        # Output layer gradients
        # (output - y) is the gradient w.r.t. z2 for a sigmoid output with binary cross-entropy loss
        self.dz2 = output - y
        self.dW2 = (self.a1.T @ self.dz2) / m
        self.db2 = np.sum(self.dz2, axis=0, keepdims=True) / m
        
        # Hidden layer gradients
        self.da1 = self.dz2 @ self.W2.T
        self.dz1 = self.da1 * self.sigmoid_derivative(self.a1)
        self.dW1 = (X.T @ self.dz1) / m
        self.db1 = np.sum(self.dz1, axis=0, keepdims=True) / m
        
        # Update weights
        self.W2 -= learning_rate * self.dW2
        self.b2 -= learning_rate * self.db2
        self.W1 -= learning_rate * self.dW1
        self.b1 -= learning_rate * self.db1

# XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Create and train network (seeded so the run is reproducible)
np.random.seed(42)
nn = SimpleNeuralNetwork(2, 4, 1)
losses = []

print("Training XOR Network...")
for epoch in range(5000):
    # Forward pass
    output = nn.forward(X)
    
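    # Track mean squared error for monitoring
    # (the update in backward() follows the cross-entropy gradient, output - y)
    loss = np.mean((output - y) ** 2)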
    losses.append(loss)
    
    # Backward pass
    nn.backward(X, y, output)
    
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Final predictions
final_output = nn.forward(X)

print("\n📊 Results:")
print("Input\t\tTarget\tPrediction")
for i in range(len(X)):
    print(f"{X[i]}\t{y[i, 0]}\t{final_output[i, 0]:.3f}")

# Visualize training
plt.figure(figsize=(12, 5))

# Loss curve
plt.subplot(1, 2, 1)
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.grid(True)

# Decision boundary
plt.subplot(1, 2, 2)
x_min, x_max = -0.5, 1.5
y_min, y_max = -0.5, 1.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))
Z = nn.forward(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, levels=[0, 0.5, 1], colors=['blue', 'red'], alpha=0.3)
# 'bwr' maps 0 to blue and 1 to red, matching the region colors above
plt.scatter(X[:, 0], X[:, 1], c=y.ravel(), cmap='bwr', s=100, edgecolor='black')
plt.xlabel('Input 1')
plt.ylabel('Input 2')
plt.title('XOR Decision Boundary')
plt.grid(True)

plt.tight_layout()
plt.show()

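# Only claim success if thresholded predictions match the targets
learned = np.array_equal((final_output > 0.5).astype(int), y)
print("\n✅ Network successfully learned XOR!" if learned
      else "\n⚠️ Network did not fully learn XOR; try more epochs or a different seed.")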

---

## 🎯 Summary & Next Steps

### 🏆 What You've Learned:

✅ **Linear Algebra**
- Vectors and operations
- Matrices and transformations
- Eigenvalues/eigenvectors
- Applications in ML

✅ **Calculus**
- Derivatives and optimization
- Gradient descent
- Partial derivatives
- Backpropagation intuition

✅ **Key Concepts**
- Distance metrics
- Dimensionality
- Mathematical foundations of ML

### 🚀 Next Steps:

1. **Practice with Code** - Implement algorithms from scratch
2. **Learn Statistics** - Next notebook!
3. **Apply to ML** - Use math to understand algorithms
4. **Deep Learning Math** - More complex architectures

### 💡 Key Takeaways:

- **Math is a tool**, not a barrier
- **Visualize everything** to build intuition
- **Code makes math concrete**
- **You don't need to be a mathematician** to do data science

### 📚 Resources:

- 3Blue1Brown YouTube Channel
- Khan Academy Linear Algebra
- Fast.ai Computational Linear Algebra
- Andrew Ng's ML Course

---

## 🎉 Congratulations!

You now understand the essential math for data science!

Remember: **Every formula has a purpose, every concept has an application.**

**Keep learning, keep coding, and keep building!** 🚀

In [None]:
# 🎊 Chapter Complete!
print("🎊" * 20)
print("\n    🏆 MATH FOR DATA SCIENCE COMPLETE! 🏆")
print("\n    You've conquered:")
print("    ✅ Vectors & Matrices")
print("    ✅ Linear Algebra")
print("    ✅ Calculus Basics")
print("    ✅ ML Mathematics")
print("\n    Ready for: Statistics for Data Science!")
print("\n" + "🎊" * 20)