# 🏔️ Multivariable Gradients: Navigating Optimization Landscapes

> *"In the mountains of machine learning, gradients are your compass, pointing toward the summit of optimal solutions."*

Welcome to the world of **multivariable calculus**, where we explore functions of many variables and their optimization landscapes! This is where AI algorithms learn to navigate complex parameter spaces.

## 🎯 What You'll Master

- **Partial derivatives**: Understanding how functions change in each direction
- **Gradient vectors**: The multivariable generalization of derivatives
- **3D surface visualization**: Seeing optimization landscapes in action
- **Hessian matrices**: Second-order information for advanced optimization

---

In [None]:
# Essential imports for multivariable calculus visualization
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sympy as sp
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed
from matplotlib.animation import FuncAnimation
from IPython.display import HTML, display, Latex
from scipy.optimize import minimize
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# Set up beautiful plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

# Configure SymPy for pretty printing
sp.init_printing(use_latex=True)

print("🏔️ Multivariable calculus laboratory initialized!")
print("Ready to explore optimization landscapes and gradient flows...")

---

# 📐 Chapter 1: Partial Derivatives and Gradients

## From Single to Multiple Variables

For a function $f(x, y)$, **partial derivatives** measure how the function changes when we vary one variable while holding others constant:

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}$$

The **gradient** combines all partial derivatives into a vector:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$

**Key Properties**:
- Points in the direction of steepest increase
- Its magnitude is the rate of increase in that direction
- Is perpendicular to the level curve through each point
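
These properties can be checked numerically. The sketch below is only an illustration: the test function `f` and the helper `numerical_gradient` are assumptions introduced here, not part of the notebook's later cells. It approximates each partial derivative with a central difference, then compares how much `f` changes for a small step along the gradient and for the same step along the level curve.

```python
import numpy as np

def f(x, y):
    # Illustrative test function (an assumption for this sketch)
    return x**2 + 3 * y**2

def numerical_gradient(func, x, y, h=1e-5):
    """Approximate [∂f/∂x, ∂f/∂y] with central differences."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return np.array([df_dx, df_dy])

point = np.array([1.0, 2.0])
grad = numerical_gradient(f, *point)
analytic = np.array([2 * point[0], 6 * point[1]])  # hand-derived: [2x, 6y]

# Unit vectors along the gradient and along the level curve (gradient rotated 90°)
unit_grad = analytic / np.linalg.norm(analytic)
unit_tangent = np.array([-unit_grad[1], unit_grad[0]])

# Take the same small step in both directions and compare the change in f
eps = 1e-3
change_along_grad = f(*(point + eps * unit_grad)) - f(*point)
change_along_tangent = f(*(point + eps * unit_tangent)) - f(*point)

print("numerical gradient:", grad)
print("analytic gradient: ", analytic)
print("change along gradient:   ", change_along_grad)
print("change along level curve:", change_along_tangent)
```

The step along the gradient changes `f` by roughly `eps` times the gradient magnitude, while the step along the level curve changes it only at second order in `eps`.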

Let's explore this with interactive examples!

In [None]:
def explore_partial_derivatives_with_sympy():
    """
    Explore partial derivatives using symbolic computation
    """
    # Define symbolic variables
    x, y = sp.symbols('x y')
    
    print("📐 Partial Derivatives with SymPy")
    print("=" * 40)
    
    # Example functions commonly seen in machine learning
    functions = {
        'Quadratic Bowl': x**2 + y**2,
        'Saddle Point': x**2 - y**2,
        'Rosenbrock Function': (1 - x)**2 + 100*(y - x**2)**2,
        'Neural Network Loss': (x**2 + y**2)/2 + sp.log(1 + sp.exp(-x*y)),
        'Himmelblau Function': (x**2 + y - 11)**2 + (x + y**2 - 7)**2
    }
    
    results = {}
    
    for name, func in functions.items():
        print(f"\n🔍 {name}:")
        print(f"f(x,y) = {func}")
        
        # Compute partial derivatives
        df_dx = sp.diff(func, x)
        df_dy = sp.diff(func, y)
        
        print(f"∂f/∂x = {df_dx}")
        print(f"∂f/∂y = {df_dy}")
        
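        # Find critical points (where gradient = 0). SymPy may not be able to
        # solve transcendental systems in closed form, so fail gracefully.
        try:
            critical_points = sp.solve([df_dx, df_dy], [x, y])
        except NotImplementedError:
            critical_points = None  # no closed-form solution available
        print(f"Critical points: {critical_points if critical_points is not None else 'no closed-form solution found'}")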
        
        # Compute Hessian matrix (second derivatives)
        d2f_dx2 = sp.diff(func, x, 2)
        d2f_dy2 = sp.diff(func, y, 2)
        d2f_dxdy = sp.diff(func, x, y)
        
        hessian = sp.Matrix([[d2f_dx2, d2f_dxdy], [d2f_dxdy, d2f_dy2]])
        print(f"Hessian matrix:\n{hessian}")
        
        # Evaluate at a test point
        test_point = {x: 1, y: 1}
        try:
            grad_at_point = [float(df_dx.subs(test_point)), float(df_dy.subs(test_point))]
            print(f"Gradient at (1,1): [{grad_at_point[0]:.3f}, {grad_at_point[1]:.3f}]")
        except (TypeError, ValueError):
            # float() fails when the substituted expression is not a real number
            print("Gradient at (1,1): Cannot evaluate (complex expression)")
        
        results[name] = {
            'function': func,
            'gradient': [df_dx, df_dy],
            'hessian': hessian,
            'critical_points': critical_points
        }
    
    return results

# Explore partial derivatives
symbolic_results = explore_partial_derivatives_with_sympy()

## 🎮 Interactive 3D Surface Explorer

Let's create an interactive visualization to explore different function surfaces and their gradients:

In [None]:
def interactive_3d_surface_explorer(function_type='quadratic', resolution=50, show_gradients=True):
    """
    Interactive 3D surface visualization with gradients
    """
    # Create meshgrid
    x = np.linspace(-3, 3, resolution)
    y = np.linspace(-3, 3, resolution)
    X, Y = np.meshgrid(x, y)
    
    # Define different functions
    if function_type == 'quadratic':
        Z = X**2 + Y**2
        dZ_dx = 2*X
        dZ_dy = 2*Y
        title = "f(x,y) = x² + y² (Convex Bowl)"
        
    elif function_type == 'saddle':
        Z = X**2 - Y**2
        dZ_dx = 2*X
        dZ_dy = -2*Y
        title = "f(x,y) = x² - y² (Saddle Point)"
        
    elif function_type == 'rosenbrock':
        Z = (1 - X)**2 + 100*(Y - X**2)**2
        dZ_dx = -2*(1 - X) - 400*X*(Y - X**2)
        dZ_dy = 200*(Y - X**2)
        title = "f(x,y) = (1-x)² + 100(y-x²)² (Rosenbrock)"
        
    elif function_type == 'peaks':
        Z = 3*(1-X)**2 * np.exp(-(X**2) - (Y+1)**2) - 10*(X/5 - X**3 - Y**5) * np.exp(-X**2-Y**2) - 1/3*np.exp(-(X+1)**2 - Y**2)
        # Numerical gradients for complex function
        dZ_dx = np.gradient(Z, axis=1) / (x[1] - x[0])
        dZ_dy = np.gradient(Z, axis=0) / (y[1] - y[0])
        title = "f(x,y) = Peaks Function (Complex Landscape)"
        
    elif function_type == 'himmelblau':
        Z = (X**2 + Y - 11)**2 + (X + Y**2 - 7)**2
        dZ_dx = 2*(X**2 + Y - 11)*2*X + 2*(X + Y**2 - 7)
        dZ_dy = 2*(X**2 + Y - 11) + 2*(X + Y**2 - 7)*2*Y
        title = "f(x,y) = (x²+y-11)² + (x+y²-7)² (Himmelblau)"
    
    # Create subplots
    fig = plt.figure(figsize=(20, 15))
    
    # 3D surface plot
    ax1 = fig.add_subplot(2, 3, 1, projection='3d')
    surf = ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8, 
                           linewidth=0, antialiased=True)
    ax1.set_title(title, fontsize=12, weight='bold')
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.set_zlabel('f(x,y)')
    
    # Contour plot with gradient vectors
    ax2 = fig.add_subplot(2, 3, 2)
    contour = ax2.contour(X, Y, Z, levels=20, colors='black', alpha=0.4, linewidths=0.5)
    ax2.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)
    
    if show_gradients:
        # Sample gradient vectors (reduce density for clarity)
        step = max(1, resolution // 15)
        ax2.quiver(X[::step, ::step], Y[::step, ::step], 
                  dZ_dx[::step, ::step], dZ_dy[::step, ::step], 
                  color='red', alpha=0.8, scale=50, width=0.003)
    
    ax2.set_title('Contour + Gradient Vectors', fontsize=12, weight='bold')
    ax2.set_xlabel('x')
    ax2.set_ylabel('y')
    ax2.set_aspect('equal')
    
    # Gradient magnitude
    ax3 = fig.add_subplot(2, 3, 3)
    grad_magnitude = np.sqrt(dZ_dx**2 + dZ_dy**2)
    im = ax3.imshow(grad_magnitude, extent=[-3, 3, -3, 3], origin='lower', cmap='plasma')
    ax3.set_title('Gradient Magnitude ||∇f||', fontsize=12, weight='bold')
    ax3.set_xlabel('x')
    ax3.set_ylabel('y')
    plt.colorbar(im, ax=ax3)
    
    # Partial derivative ∂f/∂x
    ax4 = fig.add_subplot(2, 3, 4)
    im1 = ax4.imshow(dZ_dx, extent=[-3, 3, -3, 3], origin='lower', cmap='RdBu')
    ax4.set_title('Partial Derivative ∂f/∂x', fontsize=12, weight='bold')
    ax4.set_xlabel('x')
    ax4.set_ylabel('y')
    plt.colorbar(im1, ax=ax4)
    
    # Partial derivative ∂f/∂y
    ax5 = fig.add_subplot(2, 3, 5)
    im2 = ax5.imshow(dZ_dy, extent=[-3, 3, -3, 3], origin='lower', cmap='RdBu')
    ax5.set_title('Partial Derivative ∂f/∂y', fontsize=12, weight='bold')
    ax5.set_xlabel('x')
    ax5.set_ylabel('y')
    plt.colorbar(im2, ax=ax5)
    
    # Cross-section analysis
    ax6 = fig.add_subplot(2, 3, 6)
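    center_idx = resolution // 2
    # With an even resolution the grid has no sample exactly at 0, so label the actual slice
    ax6.plot(x, Z[center_idx, :], 'b-', linewidth=2, label=f'f(x, y={y[center_idx]:.2f})')
    ax6.plot(y, Z[:, center_idx], 'r-', linewidth=2, label=f'f(x={x[center_idx]:.2f}, y)')
    ax6.set_title('Cross-sections near Origin', fontsize=12, weight='bold')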
    ax6.set_xlabel('Variable Value')
    ax6.set_ylabel('Function Value')
    ax6.legend()
    ax6.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Analysis
    min_idx = np.unravel_index(np.argmin(Z), Z.shape)
    max_idx = np.unravel_index(np.argmax(Z), Z.shape)
    
    print(f"\n📊 Function Analysis:")
    print(f"Function: {title}")
    print(f"Minimum value: {Z[min_idx]:.3f} at ({X[min_idx]:.2f}, {Y[min_idx]:.2f})")
    print(f"Maximum value: {Z[max_idx]:.3f} at ({X[max_idx]:.2f}, {Y[max_idx]:.2f})")
    print(f"Average gradient magnitude: {np.mean(grad_magnitude):.3f}")
    print(f"Max gradient magnitude: {np.max(grad_magnitude):.3f}")
    
    # Find approximate critical points
    grad_threshold = np.percentile(grad_magnitude, 5)  # Bottom 5% of gradient magnitudes
    critical_candidates = grad_magnitude < grad_threshold
    if np.any(critical_candidates):
        crit_indices = np.where(critical_candidates)
        print(f"\n🎯 Approximate critical points (low gradient regions):")
        for i in range(min(5, len(crit_indices[0]))):
            cx, cy = X[crit_indices[0][i], crit_indices[1][i]], Y[crit_indices[0][i], crit_indices[1][i]]
            cz = Z[crit_indices[0][i], crit_indices[1][i]]
            print(f"  ({cx:.2f}, {cy:.2f}) → f = {cz:.3f}")

# Create interactive widget
print("🎮 Interactive 3D Surface Explorer")
print("Explore different optimization landscapes:")

interact(interactive_3d_surface_explorer,
         function_type=widgets.Dropdown(
             options=['quadratic', 'saddle', 'rosenbrock', 'peaks', 'himmelblau'],
             value='quadratic',
             description='Function:'
         ),
         resolution=widgets.IntSlider(
             value=50,
             min=20,
             max=100,
             step=10,
             description='Resolution:'
         ),
         show_gradients=widgets.Checkbox(
             value=True,
             description='Show Gradients'
         ));

---

# ⛰️ Chapter 2: Gradient Descent on Complex Landscapes

## Visualizing Optimization Algorithms

Let's see how different optimization algorithms navigate complex multivariable landscapes:
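
For reference, the three optimizers compared in this chapter follow the update rules below. The symbols are chosen for this summary: $\theta_k$ is the current point, $g_k = \nabla f(\theta_k)$, $\eta$ is the learning rate, $\mu$ the momentum coefficient, $u_k$ the velocity, and $\beta_1, \beta_2, \epsilon$ are Adam's hyperparameters, with $k$ counted from 0.

**Gradient descent**:

$$\theta_{k+1} = \theta_k - \eta\, g_k$$

**Momentum**:

$$u_{k+1} = \mu\, u_k - \eta\, g_k, \qquad \theta_{k+1} = \theta_k + u_{k+1}$$

**Adam** (squares, square roots, and division are elementwise):

$$m_{k+1} = \beta_1 m_k + (1 - \beta_1)\, g_k, \qquad v_{k+1} = \beta_2 v_k + (1 - \beta_2)\, g_k^2$$

$$\hat{m} = \frac{m_{k+1}}{1 - \beta_1^{\,k+1}}, \qquad \hat{v} = \frac{v_{k+1}}{1 - \beta_2^{\,k+1}}, \qquad \theta_{k+1} = \theta_k - \eta\, \frac{\hat{m}}{\sqrt{\hat{v}} + \epsilon}$$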

In [None]:
def simulate_optimization_algorithms():
    """
    Simulate different optimization algorithms on various landscapes
    """
    print("⛰️ Optimization Algorithm Comparison")
    print("=" * 40)
    
    # Define test functions
    def rosenbrock(x, y):
        return (1 - x)**2 + 100*(y - x**2)**2
    
    def rosenbrock_grad(x, y):
        dx = -2*(1 - x) - 400*x*(y - x**2)
        dy = 200*(y - x**2)
        return np.array([dx, dy])
    
    def himmelblau(x, y):
        return (x**2 + y - 11)**2 + (x + y**2 - 7)**2
    
    def himmelblau_grad(x, y):
        dx = 2*(x**2 + y - 11)*2*x + 2*(x + y**2 - 7)
        dy = 2*(x**2 + y - 11) + 2*(x + y**2 - 7)*2*y
        return np.array([dx, dy])
    
    # Optimization algorithms
    def gradient_descent(func, grad_func, start, lr=0.01, max_iter=1000):
        path = [start.copy()]
        current = start.copy()
        
        for i in range(max_iter):
            gradient = grad_func(current[0], current[1])
            current = current - lr * gradient
            path.append(current.copy())
            
            if np.linalg.norm(gradient) < 1e-6:
                break
        
        return np.array(path)
    
    def momentum_gd(func, grad_func, start, lr=0.01, momentum=0.9, max_iter=1000):
        path = [start.copy()]
        current = start.copy()
        velocity = np.zeros_like(start)
        
        for i in range(max_iter):
            gradient = grad_func(current[0], current[1])
            velocity = momentum * velocity - lr * gradient
            current = current + velocity
            path.append(current.copy())
            
            if np.linalg.norm(gradient) < 1e-6:
                break
        
        return np.array(path)
    
    def adam_optimizer(func, grad_func, start, lr=0.01, beta1=0.9, beta2=0.999, 
                      epsilon=1e-8, max_iter=1000):
        path = [start.copy()]
        current = start.copy()
        m = np.zeros_like(start)  # First moment
        v = np.zeros_like(start)  # Second moment
        
        for i in range(max_iter):
            gradient = grad_func(current[0], current[1])
            
            # Update biased first moment estimate
            m = beta1 * m + (1 - beta1) * gradient
            
            # Update biased second raw moment estimate
            v = beta2 * v + (1 - beta2) * (gradient**2)
            
            # Compute bias-corrected first moment estimate
            m_hat = m / (1 - beta1**(i + 1))
            
            # Compute bias-corrected second raw moment estimate
            v_hat = v / (1 - beta2**(i + 1))
            
            # Update parameters
            current = current - lr * m_hat / (np.sqrt(v_hat) + epsilon)
            path.append(current.copy())
            
            if np.linalg.norm(gradient) < 1e-6:
                break
        
        return np.array(path)
    
    # Test configurations
    test_functions = {
        'Rosenbrock': (rosenbrock, rosenbrock_grad),
        'Himmelblau': (himmelblau, himmelblau_grad)
    }
    
    optimizers = {
        'Gradient Descent': gradient_descent,
        'Momentum': momentum_gd,
        'Adam': adam_optimizer
    }
    
    # Run experiments
    results = {}
    
    for func_name, (func, grad_func) in test_functions.items():
        print(f"\n🎯 Testing on {func_name} Function:")
        results[func_name] = {}
        
        # Starting point
        if func_name == 'Rosenbrock':
            start = np.array([-1.5, 2.0])
        else:  # Himmelblau
            start = np.array([0.0, 0.0])
        
        for opt_name, optimizer in optimizers.items():
            print(f"  Running {opt_name}...")
            
            if opt_name == 'Gradient Descent':
                path = optimizer(func, grad_func, start, lr=0.001)
            elif opt_name == 'Momentum':
                path = optimizer(func, grad_func, start, lr=0.001, momentum=0.9)
            else:  # Adam
                path = optimizer(func, grad_func, start, lr=0.01)
            
            final_point = path[-1]
            final_value = func(final_point[0], final_point[1])
            iterations = len(path) - 1  # path also stores the starting point
            
            results[func_name][opt_name] = {
                'path': path,
                'final_point': final_point,
                'final_value': final_value,
                'iterations': iterations
            }
            
            print(f"    Final point: ({final_point[0]:.4f}, {final_point[1]:.4f})")
            print(f"    Final value: {final_value:.6f}")
            print(f"    Iterations: {iterations}")
    
    return results, test_functions

# Run optimization comparison
optimization_results, test_funcs = simulate_optimization_algorithms()

## 📊 Visualizing Optimization Paths

Let's create comprehensive visualizations of how different optimizers navigate the landscape:

In [None]:
def visualize_optimization_paths(results, test_functions):
    """
    Visualize optimization paths for different algorithms
    """
    colors = {'Gradient Descent': 'red', 'Momentum': 'blue', 'Adam': 'green'}
    
    for func_name, (func, grad_func) in test_functions.items():
        fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
        
        # Create contour plot
        x = np.linspace(-2, 2, 100)
        y = np.linspace(-1, 3, 100)
        if func_name == 'Himmelblau':
            x = np.linspace(-5, 5, 100)
            y = np.linspace(-5, 5, 100)
        
        X, Y = np.meshgrid(x, y)
        Z = func(X, Y)
        
        # Main contour plot with all paths
        contour = ax1.contour(X, Y, Z, levels=30, colors='gray', alpha=0.5, linewidths=0.5)
        ax1.contourf(X, Y, Z, levels=30, cmap='viridis', alpha=0.6)
        
        # Plot optimization paths
        for opt_name, data in results[func_name].items():
            path = data['path']
            ax1.plot(path[:, 0], path[:, 1], color=colors[opt_name], 
                    linewidth=2, alpha=0.8, label=opt_name)
            ax1.plot(path[0, 0], path[0, 1], 'ko', markersize=8, label='Start' if opt_name == 'Gradient Descent' else '')
            ax1.plot(path[-1, 0], path[-1, 1], 'k*', markersize=12, 
                    label='End' if opt_name == 'Gradient Descent' else '')
        
        ax1.set_title(f'{func_name} Function - Optimization Paths', fontsize=14, weight='bold')
        ax1.set_xlabel('x')
        ax1.set_ylabel('y')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Convergence curves
        for opt_name, data in results[func_name].items():
            path = data['path']
            function_values = [func(point[0], point[1]) for point in path]
            ax2.plot(function_values, color=colors[opt_name], 
                    linewidth=2, label=opt_name)
        
        ax2.set_title('Convergence Curves', fontsize=14, weight='bold')
        ax2.set_xlabel('Iteration')
        ax2.set_ylabel('Function Value')
        ax2.set_yscale('log')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # Distance from minimum over time
        if func_name == 'Rosenbrock':
            true_minimum = np.array([1.0, 1.0])
        else:  # Himmelblau has multiple minima, use one of them
            true_minimum = np.array([3.0, 2.0])
        
        for opt_name, data in results[func_name].items():
            path = data['path']
            distances = [np.linalg.norm(point - true_minimum) for point in path]
            ax3.plot(distances, color=colors[opt_name], 
                    linewidth=2, label=opt_name)
        
        ax3.set_title('Distance from Minimum', fontsize=14, weight='bold')
        ax3.set_xlabel('Iteration')
        ax3.set_ylabel('Distance to Minimum')
        ax3.set_yscale('log')
        ax3.legend()
        ax3.grid(True, alpha=0.3)
        
        # Gradient magnitude over time
        for opt_name, data in results[func_name].items():
            path = data['path']
            grad_magnitudes = [np.linalg.norm(grad_func(point[0], point[1])) for point in path]
            ax4.plot(grad_magnitudes, color=colors[opt_name], 
                    linewidth=2, label=opt_name)
        
        ax4.set_title('Gradient Magnitude', fontsize=14, weight='bold')
        ax4.set_xlabel('Iteration')
        ax4.set_ylabel('||∇f||')
        ax4.set_yscale('log')
        ax4.legend()
        ax4.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Print summary statistics
        print(f"\n📊 {func_name} Function Results:")
        print(f"{'Algorithm':<18} {'Iterations':<12} {'Final Value':<15} {'Distance to Min':<15}")
        print("-" * 70)
        
        for opt_name, data in results[func_name].items():
            final_distance = np.linalg.norm(data['final_point'] - true_minimum)
            print(f"{opt_name:<18} {data['iterations']:<12} {data['final_value']:<15.6f} {final_distance:<15.6f}")

# Visualize optimization paths
visualize_optimization_paths(optimization_results, test_funcs)

---

# 🧮 Chapter 3: Hessian Matrices and Second-Order Methods

## Understanding Curvature Information

The **Hessian matrix** contains second-order partial derivatives:

$$H = \begin{bmatrix}
\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\
\frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2}
\end{bmatrix}$$

**Key Properties**:
- Describes local curvature of the function
- The signs of its eigenvalues classify a critical point: all positive gives a local minimum, all negative a local maximum, mixed signs a saddle point
- Used in Newton's method for faster convergence
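
The eigenvalue test can be sketched in a few lines. The helper name `classify_critical_point`, its tolerance, and the example Hessians (derived by hand at the origin) are assumptions made for this illustration.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-12):
    """Second-derivative test using the eigenvalues of a symmetric Hessian."""
    # eigvalsh assumes a symmetric matrix and returns real eigenvalues in ascending order
    eigenvalues = np.linalg.eigvalsh(hessian)
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    if eigenvalues[0] < -tol and eigenvalues[-1] > tol:
        return "saddle point"
    return "inconclusive"  # at least one eigenvalue is (numerically) zero

# Hessians at the origin, derived by hand for this sketch
examples = {
    "x^2 + y^2": np.array([[2.0, 0.0], [0.0, 2.0]]),
    "x^2 - y^2": np.array([[2.0, 0.0], [0.0, -2.0]]),
    "-(x^2 + y^2)": np.array([[-2.0, 0.0], [0.0, -2.0]]),
    "x^4 + y^4": np.array([[0.0, 0.0], [0.0, 0.0]]),
}

for name, H in examples.items():
    print(f"{name:>14}: {classify_critical_point(H)}")
```

The last case shows the limit of the test: when the Hessian has a zero eigenvalue, second-order information alone cannot decide the type of the critical point.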

Let's explore Hessian matrices and their applications:

In [None]:
def analyze_hessian_matrices():
    """
    Analyze Hessian matrices for different functions
    """
    print("🧮 Hessian Matrix Analysis")
    print("=" * 30)
    
    # Define test functions with analytical Hessians
    def quadratic_bowl(x, y):
        return x**2 + y**2
    
    def quadratic_bowl_hessian(x, y):
        return np.array([[2, 0], [0, 2]])
    
    def saddle_point(x, y):
        return x**2 - y**2
    
    def saddle_point_hessian(x, y):
        return np.array([[2, 0], [0, -2]])
    
    def elongated_bowl(x, y):
        return 5*x**2 + y**2
    
    def elongated_bowl_hessian(x, y):
        return np.array([[10, 0], [0, 2]])
    
    def rotated_ellipse(x, y):
        return (x + y)**2 + 2*(x - y)**2
    
    def rotated_ellipse_hessian(x, y):
        return np.array([[6, -2], [-2, 6]])
    
    test_cases = {
        'Quadratic Bowl': (quadratic_bowl, quadratic_bowl_hessian),
        'Saddle Point': (saddle_point, saddle_point_hessian),
        'Elongated Bowl': (elongated_bowl, elongated_bowl_hessian),
        'Rotated Ellipse': (rotated_ellipse, rotated_ellipse_hessian)
    }
    
    fig, axes = plt.subplots(2, 4, figsize=(20, 10))
    
    for i, (name, (func, hess_func)) in enumerate(test_cases.items()):
        # Create grid for visualization
        x = np.linspace(-3, 3, 100)
        y = np.linspace(-3, 3, 100)
        X, Y = np.meshgrid(x, y)
        Z = func(X, Y)
        
        # Plot function surface
        ax1 = axes[0, i]
        contour = ax1.contour(X, Y, Z, levels=20, colors='black', alpha=0.4, linewidths=0.5)
        ax1.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.7)
        ax1.set_title(name, fontsize=12, weight='bold')
        ax1.set_xlabel('x')
        ax1.set_ylabel('y')
        ax1.set_aspect('equal')
        
        # Analyze Hessian at origin
        H = hess_func(0, 0)
        eigenvalues, eigenvectors = np.linalg.eig(H)
        determinant = np.linalg.det(H)
        trace = np.trace(H)
        
        # Plot eigenvalue visualization
        ax2 = axes[1, i]
        
        # Draw eigenvectors
        scale = 0.5
        for j, (eigenval, eigenvec) in enumerate(zip(eigenvalues, eigenvectors.T)):
            color = 'red' if eigenval > 0 else 'blue'
            ax2.arrow(0, 0, scale * eigenvec[0], scale * eigenvec[1], 
                     head_width=0.1, head_length=0.1, fc=color, ec=color,
                     linewidth=2, label=f'λ_{j+1}={eigenval:.1f}')
        
        ax2.set_xlim(-1, 1)
        ax2.set_ylim(-1, 1)
        ax2.set_aspect('equal')
        ax2.grid(True, alpha=0.3)
        ax2.legend()
        
        # Classify critical point
        if determinant > 0 and trace > 0:
            point_type = "Local Minimum"
        elif determinant > 0 and trace < 0:
            point_type = "Local Maximum"
        elif determinant < 0:
            point_type = "Saddle Point"
        else:
            point_type = "Degenerate"
        
        ax2.set_title(f'Hessian Analysis\n{point_type}', fontsize=12, weight='bold')
        
        # Print analysis
        print(f"\n🔍 {name}:")
        print(f"Hessian matrix:\n{H}")
        print(f"Eigenvalues: {eigenvalues}")
        print(f"Determinant: {determinant:.3f}")
        print(f"Trace: {trace:.3f}")
        print(f"Classification: {point_type}")
        
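        # The condition number (largest / smallest curvature) is only reported
        # when the Hessian is positive definite
        if min(eigenvalues) > 0:
            condition_number = max(eigenvalues) / min(eigenvalues)
            print(f"Condition number: {condition_number:.3f}")
        else:
            print("Condition number: not reported (Hessian is not positive definite)")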
    
    plt.tight_layout()
    plt.show()
    
    return test_cases

# Analyze Hessian matrices
hessian_cases = analyze_hessian_matrices()

## 🚀 Newton's Method vs Gradient Descent

Let's compare Newton's method (which uses Hessian information) with standard gradient descent:
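
Before the full comparison, here is a minimal sketch of a single step of each method on a small quadratic. The matrix `A`, vector `b`, starting point, and learning rate are illustrative assumptions chosen for this example.

```python
import numpy as np

# Illustrative quadratic f(p) = 0.5 * p^T A p - b^T p (A and b are assumptions)
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # constant Hessian, positive definite
b = np.array([1.0, 1.0])

def grad(p):
    # Gradient of the quadratic above
    return A @ p - b

p0 = np.array([4.0, -3.0])

# Newton step: solve H * step = grad instead of forming the inverse explicitly
newton_point = p0 - np.linalg.solve(A, grad(p0))

# One gradient-descent step with a fixed learning rate (hypothetical value)
gd_point = p0 - 0.1 * grad(p0)

exact_minimum = np.linalg.solve(A, b)

print("exact minimum:      ", exact_minimum)
print("after 1 Newton step:", newton_point)
print("after 1 GD step:    ", gd_point)
```

Because the Hessian of a quadratic is constant, the Newton step cancels the gradient exactly, while the gradient-descent step only moves part of the way.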

In [None]:
def compare_newton_vs_gradient_descent():
    """
    Compare Newton's method with gradient descent
    """
    print("🚀 Newton's Method vs Gradient Descent")
    print("=" * 40)
    
    # Define a quadratic function for fair comparison
    def quadratic_func(x, y):
        return 2*x**2 + 3*y**2 + x*y - 4*x - 6*y + 10
    
    def quadratic_grad(x, y):
        dx = 4*x + y - 4
        dy = 6*y + x - 6
        return np.array([dx, dy])
    
    def quadratic_hessian(x, y):
        return np.array([[4, 1], [1, 6]])
    
    # Newton's method
    def newton_method(func, grad_func, hess_func, start, max_iter=20):
        path = [start.copy()]
        current = start.copy()
        
        for i in range(max_iter):
            gradient = grad_func(current[0], current[1])
            
            # Stop before stepping once the gradient has vanished,
            # so the path records only real updates
            if np.linalg.norm(gradient) < 1e-10:
                break
            
            hessian = hess_func(current[0], current[1])
            
            # Newton's update: x_{k+1} = x_k - H^{-1} * ∇f
            # (solve H * step = ∇f rather than forming the inverse)
            try:
                step = np.linalg.solve(hessian, gradient)
            except np.linalg.LinAlgError:
                print("Hessian is singular, stopping Newton's method")
                break
            
            current = current - step
            path.append(current.copy())
        
        return np.array(path)
    
    def gradient_descent_method(func, grad_func, start, lr=0.1, max_iter=100):
        path = [start.copy()]
        current = start.copy()
        
        for i in range(max_iter):
            gradient = grad_func(current[0], current[1])
            current = current - lr * gradient
            path.append(current.copy())
            
            if np.linalg.norm(gradient) < 1e-10:
                break
        
        return np.array(path)
    
    # Starting points
    start_points = [np.array([3.0, 3.0]), np.array([-2.0, 4.0]), np.array([5.0, -1.0])]
    
    # True minimum (solve ∇f = 0)
    A = np.array([[4, 1], [1, 6]])
    b = np.array([4, 6])
    true_minimum = np.linalg.solve(A, b)
    true_min_value = quadratic_func(true_minimum[0], true_minimum[1])
    
    print(f"True minimum: ({true_minimum[0]:.4f}, {true_minimum[1]:.4f})")
    print(f"True minimum value: {true_min_value:.6f}")
    
    # Run comparisons
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    
    # Create contour plot
    x = np.linspace(-3, 6, 100)
    y = np.linspace(-2, 5, 100)
    X, Y = np.meshgrid(x, y)
    Z = quadratic_func(X, Y)
    
    for i, start in enumerate(start_points):
        # Run optimizations
        newton_path = newton_method(quadratic_func, quadratic_grad, quadratic_hessian, start)
        gd_path = gradient_descent_method(quadratic_func, quadratic_grad, start, lr=0.1)
        
        # Plot paths
        ax = axes[0, i]
        contour = ax.contour(X, Y, Z, levels=20, colors='gray', alpha=0.5, linewidths=0.5)
        ax.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.6)
        
        # Plot optimization paths
        # Each path also stores the starting point, so steps = len(path) - 1
        ax.plot(newton_path[:, 0], newton_path[:, 1], 'ro-', linewidth=2, 
               markersize=6, label=f'Newton ({len(newton_path) - 1} steps)')
        ax.plot(gd_path[:, 0], gd_path[:, 1], 'bo-', linewidth=2, 
               markersize=4, alpha=0.7, label=f'Gradient Descent ({len(gd_path) - 1} steps)')
        
        # Mark start and end points
        ax.plot(start[0], start[1], 'ks', markersize=10, label='Start')
        ax.plot(true_minimum[0], true_minimum[1], 'k*', markersize=15, label='True Minimum')
        
        ax.set_title(f'Start: ({start[0]:.1f}, {start[1]:.1f})', fontsize=12, weight='bold')
        ax.set_xlabel('x')
        ax.set_ylabel('y')
        ax.legend()
        ax.grid(True, alpha=0.3)
        
        # Convergence analysis
        ax2 = axes[1, i]
        
        # Function values over iterations
        newton_values = [quadratic_func(point[0], point[1]) for point in newton_path]
        gd_values = [quadratic_func(point[0], point[1]) for point in gd_path]
        
        # Show only the first few gradient-descent iterations; x and y must have equal length
        n_show = min(len(gd_values), len(newton_values) * 3)
        ax2.plot(range(len(newton_values)), newton_values, 'ro-', linewidth=2, 
                markersize=6, label='Newton')
        ax2.plot(range(n_show), gd_values[:n_show], 'bo-', linewidth=2, 
                markersize=4, alpha=0.7, label='Gradient Descent')
        ax2.axhline(y=true_min_value, color='black', linestyle='--', 
                   alpha=0.7, label='True Minimum')
        
        ax2.set_title('Convergence Comparison', fontsize=12, weight='bold')
        ax2.set_xlabel('Iteration')
        ax2.set_ylabel('Function Value')
        ax2.set_yscale('log')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # Print results
        print(f"\nStarting point {i+1}: ({start[0]:.1f}, {start[1]:.1f})")
        print("Newton's method:")
        print(f"  Final point: ({newton_path[-1, 0]:.6f}, {newton_path[-1, 1]:.6f})")
        print(f"  Final value: {newton_values[-1]:.10f}")
        print(f"  Iterations: {len(newton_path) - 1}")  # path also stores the starting point
        print("Gradient descent:")
        print(f"  Final point: ({gd_path[-1, 0]:.6f}, {gd_path[-1, 1]:.6f})")
        print(f"  Final value: {gd_values[-1]:.10f}")
        print(f"  Iterations: {len(gd_path) - 1}")
    
    plt.tight_layout()
    plt.show()
    
    print("\n🎯 Key Insights:")
    print("• Newton's method converges quadratically near a minimum with a non-singular Hessian")
    print("• Gradient descent converges linearly on well-conditioned convex problems (slower)")
    print("• Newton's method must form and solve with the Hessian (expensive in high dimensions)")
    print("• On a quadratic function, a single Newton step lands on the exact minimizer")

# Compare Newton's method with gradient descent
compare_newton_vs_gradient_descent()

---

# 🎯 Key Takeaways

## 📐 Multivariable Calculus Fundamentals
- **Partial derivatives**: Measure change in one direction while holding others constant
- **Gradient vectors**: Combine all partial derivatives, point toward steepest ascent
- **Level curves**: Contours where function value is constant, perpendicular to gradients
- **Directional derivatives**: Rate of change in any specified direction
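
As a short worked example of the last bullet (the function and directions are chosen here for illustration), the directional derivative along a unit vector $\mathbf{u}$ is

$$D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u} = \|\nabla f\| \cos\theta$$

where $\theta$ is the angle between $\mathbf{u}$ and the gradient. For $f(x, y) = x^2 + y^2$ at the point $(1, 1)$, the gradient is $\nabla f = (2, 2)$, so:

$$\mathbf{u} = (1, 0): \quad D_{\mathbf{u}} f = 2$$

$$\mathbf{u} = \tfrac{1}{\sqrt{2}}(1, 1): \quad D_{\mathbf{u}} f = 2\sqrt{2} \approx 2.83$$

$$\mathbf{u} = \tfrac{1}{\sqrt{2}}(1, -1): \quad D_{\mathbf{u}} f = 0$$

The largest value occurs along the gradient itself, and the zero value occurs along the tangent to the level curve.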

## 🏔️ Optimization Landscapes
- **Convex functions**: Every local minimum is a global minimum, which makes them comparatively easy to optimize
- **Non-convex functions**: Multiple local minima, challenging landscapes
- **Saddle points**: Neither minima nor maxima, common in high dimensions
- **Algorithm behavior**: Different optimizers navigate landscapes differently

## 🧮 Second-Order Information
- **Hessian matrices**: Capture curvature information
- **Eigenvalue analysis**: Determines critical point classification
- **Condition numbers**: Measure optimization difficulty
- **Newton's method**: Uses curvature for faster convergence
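
To make the condition-number point concrete, here is a minimal sketch that counts gradient-descent steps on a diagonal quadratic. The helper `gd_iterations`, the starting point, the tolerance, and the step-size rule are assumptions chosen for this illustration.

```python
import numpy as np

def gd_iterations(curvatures, tol=1e-8, max_iter=100_000):
    """Count gradient-descent steps on f(p) = 0.5 * sum(c_i * p_i^2)."""
    c = np.asarray(curvatures, dtype=float)
    lr = 1.0 / c.max()        # step size set to 1 / largest curvature
    p = np.ones_like(c)       # hypothetical starting point
    for k in range(max_iter):
        g = c * p             # gradient of the diagonal quadratic
        if np.linalg.norm(g) < tol:
            return k
        p = p - lr * g
    return max_iter

# The Hessian is diag(kappa, 1), so its condition number is kappa
for kappa in (1, 10, 100):
    steps = gd_iterations([kappa, 1.0])
    print(f"condition number {kappa:>3}: {steps} steps")
```

With this step size, the direction of largest curvature is solved immediately, but the flat direction shrinks by a factor of only $1 - 1/\kappa$ per step, so the step count grows roughly in proportion to the condition number.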

## 🚀 Practical Applications in AI
- **Neural network training**: Gradient-based optimization in high dimensions
- **Loss landscapes**: Understanding training dynamics
- **Adaptive methods**: Adam and RMSprop scale each parameter's step using running averages of squared gradients, a rough stand-in for curvature
- **Convergence analysis**: Why some problems are harder than others

---

# 🔮 Coming Next: Probability & Statistics

Having mastered calculus, we now turn to the mathematics of uncertainty:

- Random variables and probability distributions
- Bayes' theorem and probabilistic inference
- Markov chains and stochastic processes
- Applications to machine learning and AI

**Ready to embrace uncertainty and learn the language of probabilistic AI? Let's explore the world of randomness! 🎲**