# 📈 Derivatives and Gradients: The Language of Change in AI

> *"Calculus is the mathematics of change, and in AI, everything is about learning from change."*

Welcome to the world of **calculus** - the mathematical foundation that enables AI systems to learn and optimize! Derivatives tell us how functions change, and gradients point us toward optimal solutions.

## 🎯 What You'll Master

- **Derivatives**: Understanding rates of change geometrically and algebraically
- **Symbolic computation**: Using SymPy to compute exact derivatives and check differentiation rules
- **Gradients**: Vector fields and optimization directions
- **AI connections**: How derivatives power machine learning algorithms

---

In [None]:
# Essential imports for calculus visualization
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sympy as sp
from mpl_toolkits.mplot3d import Axes3D
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed
from matplotlib.animation import FuncAnimation
from IPython.display import HTML, display, Latex
from scipy.optimize import minimize

# Set up beautiful plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 11

# Configure SymPy for pretty printing
sp.init_printing(use_latex=True)

print("📈 Calculus laboratory initialized!")
print("Ready to explore derivatives, gradients, and optimization...")

---

# 🔍 Chapter 1: Understanding Derivatives

## What is a Derivative?

A **derivative** measures how a function changes as its input changes. Mathematically:

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

**Geometric Interpretation**: The derivative is the slope of the tangent line to the function at a given point.

**AI Connection**: Derivatives tell us how to adjust parameters to minimize loss functions!
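
As a quick worked check of the limit definition, take $f(x) = x^2$ (chosen here only as an illustration) and simplify the difference quotient by hand:

$$\frac{f(x+h) - f(x)}{h} = \frac{(x+h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h \;\longrightarrow\; 2x \quad \text{as } h \to 0$$

For this particular function the secant slope at $x = 2$ is exactly $4 + h$, so the gap between the secant slope and the true derivative equals the step size $h$ itself.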

Let's visualize this concept:

In [None]:
def visualize_derivative_concept(func, func_derivative, x_point=2, h_values=None):
    """
    Visualize the derivative as the limit of secant line slopes
    """
    if h_values is None:
        h_values = [1.0, 0.5, 0.1, 0.01]
    
    x = np.linspace(-1, 5, 1000)
    y = func(x)
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
    axes = [ax1, ax2, ax3, ax4]
    
    colors = ['red', 'blue', 'green', 'purple']
    
    for i, (h, ax, color) in enumerate(zip(h_values, axes, colors)):
        # Plot the function
        ax.plot(x, y, 'k-', linewidth=2, label='f(x)')
        
        # Calculate secant line
        x1, x2 = x_point, x_point + h
        y1, y2 = func(x1), func(x2)
        
        # Secant line slope
        slope = (y2 - y1) / (x2 - x1)
        
        # Plot points
        ax.plot([x1, x2], [y1, y2], 'o', color=color, markersize=8)
        
        # Plot secant line
        x_line = np.linspace(x1 - 0.5, x2 + 0.5, 100)
        y_line = y1 + slope * (x_line - x1)
        ax.plot(x_line, y_line, '--', color=color, linewidth=2, 
               label=f'Secant (h={h}): slope={slope:.3f}')
        
        # True derivative at the point
        true_slope = func_derivative(x_point)
        y_tangent = func(x_point) + true_slope * (x_line - x_point)
        ax.plot(x_line, y_tangent, '-', color='black', linewidth=1, alpha=0.7,
               label=f'Tangent: slope={true_slope:.3f}')
        
        ax.set_xlim(0, 4)
        ax.set_ylim(0, 16)
        ax.set_title(f'h = {h}', fontsize=14, weight='bold')
        ax.legend()
        ax.grid(True, alpha=0.3)
        
        # Highlight the difference
        error = abs(slope - true_slope)
        ax.text(0.5, 14, f'Error: {error:.4f}', fontsize=12, 
               bbox=dict(boxstyle="round", facecolor=color, alpha=0.3))
    
    plt.suptitle(f'Derivative as Limit of Secant Lines at x = {x_point}', 
                fontsize=16, weight='bold')
    plt.tight_layout()
    plt.show()
    
    # Show the limiting process
    print(f"🔍 Derivative Approximation at x = {x_point}:")
    print(f"True derivative f'({x_point}) = {func_derivative(x_point):.6f}")
    print(f"\nSecant line approximations:")
    for h in h_values:
        slope = (func(x_point + h) - func(x_point)) / h
        error = abs(slope - func_derivative(x_point))
        print(f"h = {h:6.2f}: slope = {slope:8.6f}, error = {error:.6f}")

# Example: f(x) = x²
def quadratic(x):
    return x**2

def quadratic_derivative(x):
    return 2*x

print("📐 Visualizing Derivative Concept with f(x) = x²")
visualize_derivative_concept(quadratic, quadratic_derivative, x_point=2)

## 🧮 Symbolic Derivatives with SymPy

Let's use SymPy to compute first and second derivatives symbolically, then confirm the product and chain rules against SymPy's own results:

In [None]:
def explore_derivatives_with_sympy():
    """
    Explore various functions and their derivatives using SymPy
    """
    # Define symbolic variable
    x = sp.Symbol('x')
    
    # Define various functions
    functions = {
        'Polynomial': x**3 + 2*x**2 - 5*x + 1,
        'Exponential': sp.exp(x),
        'Logarithmic': sp.log(x),
        'Trigonometric': sp.sin(x),
        'Composite': sp.sin(x**2),
        'Rational': 1/x,
        'Neural Activation': 1/(1 + sp.exp(-x)),  # Sigmoid
        'ML Loss Function': x**2 + sp.log(1 + sp.exp(-x))  # Modified logistic loss
    }
    
    print("🧮 Symbolic Derivative Calculations:")
    print("=" * 80)
    
    for name, func in functions.items():
        derivative = sp.diff(func, x)
        second_derivative = sp.diff(derivative, x)
        
        print(f"\n📊 {name}:")
        print(f"   f(x)  = {func}")
        print(f"   f'(x) = {derivative}")
        print(f"   f''(x) = {second_derivative}")
        
        # Evaluate at specific point
        x_val = 1
        try:
            f_val = float(func.subs(x, x_val))
            fp_val = float(derivative.subs(x, x_val))
            fpp_val = float(second_derivative.subs(x, x_val))
            
            print(f"   At x = {x_val}: f = {f_val:.4f}, f' = {fp_val:.4f}, f'' = {fpp_val:.4f}")
        except (TypeError, ValueError):
            print(f"   At x = {x_val}: Cannot evaluate (domain issues)")
    
    # Demonstrate derivative rules
    print(f"\n\n🔧 Derivative Rules Demonstration:")
    print("=" * 50)
    
    # Power rule
    n = sp.Symbol('n')
    power_func = x**n
    power_deriv = sp.diff(power_func, x)
    print(f"\n📐 Power Rule:")
    print(f"   d/dx[x^n] = {power_deriv}")
    
    # Product rule
    f = x**2
    g = sp.sin(x)
    product = f * g
    product_deriv = sp.diff(product, x)
    manual_product = sp.diff(f, x) * g + f * sp.diff(g, x)
    print(f"\n🔄 Product Rule:")
    print(f"   f(x) = {f}, g(x) = {g}")
    print(f"   d/dx[f·g] = {product_deriv}")
    print(f"   f'·g + f·g' = {manual_product}")
    print(f"   Equal? {sp.simplify(product_deriv - manual_product) == 0}")
    
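    # Chain rule
    outer = sp.sin(x)
    inner = x**2
    composite = outer.subs(x, inner)  # sin(x²)
    chain_deriv = sp.diff(composite, x)
    # f'(g(x))·g'(x): differentiate the outer function, evaluate it at g(x), multiply by g'(x)
    manual_chain = sp.diff(outer, x).subs(x, inner) * sp.diff(inner, x)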
    print(f"\n⛓️  Chain Rule:")
    print(f"   f(g(x)) = sin(x²)")
    print(f"   d/dx[f(g(x))] = {chain_deriv}")
    print(f"   f'(g(x))·g'(x) = {manual_chain}")
    print(f"   Equal? {sp.simplify(chain_deriv - manual_chain) == 0}")
    
    return functions

# Explore derivatives
function_dict = explore_derivatives_with_sympy()

## 🎮 Interactive Derivative Explorer

Let's create an interactive tool to visualize functions and their derivatives:

In [None]:
def interactive_derivative_explorer(function_type='polynomial', param_a=1, param_b=1, param_c=0):
    """
    Interactive exploration of functions and their derivatives
    """
    x = np.linspace(-5, 5, 1000)
    
    # Define function based on type and parameters
    if function_type == 'polynomial':
        y = param_a * x**2 + param_b * x + param_c
        dy_dx = 2 * param_a * x + param_b
        title = f'f(x) = {param_a}x² + {param_b}x + {param_c}'
        deriv_title = f"f'(x) = {2*param_a}x + {param_b}"
        
    elif function_type == 'trigonometric':
        y = param_a * np.sin(param_b * x + param_c)
        dy_dx = param_a * param_b * np.cos(param_b * x + param_c)
        title = f'f(x) = {param_a}sin({param_b}x + {param_c})'
        deriv_title = f"f'(x) = {param_a*param_b}cos({param_b}x + {param_c})"
        
    elif function_type == 'exponential':
        y = param_a * np.exp(param_b * x) + param_c
        dy_dx = param_a * param_b * np.exp(param_b * x)
        title = f'f(x) = {param_a}e^({param_b}x) + {param_c}'
        deriv_title = f"f'(x) = {param_a*param_b}e^({param_b}x)"
        
    elif function_type == 'sigmoid':
        y = param_a / (1 + np.exp(-param_b * (x - param_c)))
        sigmoid_term = np.exp(-param_b * (x - param_c))
        dy_dx = param_a * param_b * sigmoid_term / (1 + sigmoid_term)**2
        title = f'f(x) = {param_a}/(1 + e^(-{param_b}(x-{param_c})))'
        deriv_title = "f'(x) = sigmoid derivative"

    else:
        raise ValueError(f"Unknown function_type: {function_type!r}")
    
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 6))
    
    # Plot original function
    ax1.plot(x, y, 'b-', linewidth=2, label='f(x)')
    ax1.set_title(title, fontsize=14, weight='bold')
    ax1.set_xlabel('x')
    ax1.set_ylabel('f(x)')
    ax1.grid(True, alpha=0.3)
    ax1.legend()
    
    # Plot derivative
    ax2.plot(x, dy_dx, 'r-', linewidth=2, label="f'(x)")
    ax2.axhline(y=0, color='k', linestyle='--', alpha=0.5)
    ax2.set_title(deriv_title, fontsize=14, weight='bold')
    ax2.set_xlabel('x')
    ax2.set_ylabel("f'(x)")
    ax2.grid(True, alpha=0.3)
    ax2.legend()
    
    # Plot both together
    ax3.plot(x, y, 'b-', linewidth=2, label='f(x)')
    ax3.plot(x, dy_dx, 'r-', linewidth=2, label="f'(x)")
    ax3.axhline(y=0, color='k', linestyle='--', alpha=0.5)
    ax3.set_title('Function and Derivative', fontsize=14, weight='bold')
    ax3.set_xlabel('x')
    ax3.set_ylabel('y')
    ax3.grid(True, alpha=0.3)
    ax3.legend()
    
    plt.tight_layout()
    plt.show()
    
    # Analysis: a critical point lies wherever the derivative changes sign.
    # Linear interpolation between the two samples refines its location.
    critical_points = []
    for i in range(len(dy_dx) - 1):
        if dy_dx[i] * dy_dx[i+1] < 0:
            t = dy_dx[i] / (dy_dx[i] - dy_dx[i+1])
            critical_points.append(x[i] + t * (x[i+1] - x[i]))
    
    print(f"📊 Function Analysis:")
    print(f"Function type: {function_type}")
    print(f"Parameters: a={param_a}, b={param_b}, c={param_c}")
    
    if critical_points:
        print(f"Critical points (f'(x) ≈ 0): {[f'{cp:.2f}' for cp in critical_points[:3]]}")
    else:
        print(f"No critical points found in range")
    
    # Behavioral analysis
    if function_type == 'polynomial':
        if param_a > 0:
            print(f"🔄 Parabola opens upward (minimum exists)")
        elif param_a < 0:
            print(f"🔄 Parabola opens downward (maximum exists)")
        else:
            print(f"➡️ Linear function (constant derivative)")
    
    elif function_type == 'sigmoid':
        print(f"📈 Sigmoid function: S-shaped curve commonly used in neural networks")
        print(f"💡 Derivative is bell-shaped, maximum at inflection point")

# Create interactive widget
print("🎮 Interactive Derivative Explorer")
print("Explore different functions and see how parameters affect their derivatives:")

interact(interactive_derivative_explorer,
         function_type=widgets.Dropdown(
             options=['polynomial', 'trigonometric', 'exponential', 'sigmoid'],
             value='polynomial',
             description='Function type:'
         ),
         param_a=widgets.FloatSlider(value=1, min=-3, max=3, step=0.1, description='Parameter a:'),
         param_b=widgets.FloatSlider(value=1, min=-3, max=3, step=0.1, description='Parameter b:'),
         param_c=widgets.FloatSlider(value=0, min=-3, max=3, step=0.1, description='Parameter c:'));

---

# 🧭 Chapter 2: Gradients - Vectors of Change

## From Single-Variable to Multi-Variable

When we have functions of multiple variables $f(x, y)$, the **gradient** generalizes the concept of derivative:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}$$

**Key Properties**:
- Points in direction of steepest increase
- Magnitude indicates rate of change
- Perpendicular to level curves

**AI Connection**: Gradients tell us how to update neural network weights!
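
The first and third properties can be checked numerically. The sketch below is only an illustration: the function `f`, the evaluation point, and the step size `h` are hypothetical choices, not taken from the notebook. It estimates the directional derivative along many unit directions with central differences and compares the steepest one with the normalized gradient.

```python
import numpy as np

# Hypothetical example function (an assumption for this sketch): f(x, y) = x^2 + 3y^2
def f(x, y):
    return x**2 + 3 * y**2

def grad_f(x, y):
    # Partial derivatives worked out by hand for the function above
    return np.array([2 * x, 6 * y])

point = np.array([1.0, 0.5])
g = grad_f(*point)

# Unit directions sampled evenly around the circle
angles = np.linspace(0, 2 * np.pi, 3600, endpoint=False)
directions = np.column_stack([np.cos(angles), np.sin(angles)])

# Central-difference estimate of the directional derivative along each direction
h = 1e-5
plus = point + h * directions
minus = point - h * directions
directional = (f(plus[:, 0], plus[:, 1]) - f(minus[:, 0], minus[:, 1])) / (2 * h)

best = directions[np.argmax(directional)]
unit_grad = g / np.linalg.norm(g)

# A tangent to the level curve is the gradient rotated by 90 degrees
tangent = np.array([-g[1], g[0]])

print("steepest sampled direction:", best)
print("normalized gradient:       ", unit_grad)
print("largest rate:", directional.max(), " |grad f|:", np.linalg.norm(g))
print("grad . tangent:", g @ tangent)
```

The steepest sampled direction should agree with the normalized gradient up to the angular resolution of the sampling, and the largest rate should be close to the gradient's magnitude.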

In [None]:
def visualize_gradients_2d():
    """
    Visualize gradients for 2D functions
    """
    # Define a 2D function: f(x,y) = x² + y² + 0.5xy
    def f(x, y):
        return x**2 + y**2 + 0.5*x*y
    
    # Analytical gradient
    def gradient(x, y):
        df_dx = 2*x + 0.5*y
        df_dy = 2*y + 0.5*x
        return df_dx, df_dy
    
    # Create grid
    x = np.linspace(-3, 3, 20)
    y = np.linspace(-3, 3, 20)
    X, Y = np.meshgrid(x, y)
    Z = f(X, Y)
    
    # Compute gradients
    DX, DY = gradient(X, Y)
    
    fig = plt.figure(figsize=(20, 15))
    
    # 1. 3D surface plot
    ax1 = fig.add_subplot(2, 3, 1, projection='3d')
    surf = ax1.plot_surface(X, Y, Z, cmap='viridis', alpha=0.7)
    ax1.set_title('3D Surface: f(x,y) = x² + y² + 0.5xy', fontsize=12, weight='bold')
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.set_zlabel('f(x,y)')
    
    # 2. Contour plot with gradient vectors
    ax2 = fig.add_subplot(2, 3, 2)
    contour = ax2.contour(X, Y, Z, levels=15, colors='black', alpha=0.5)
    ax2.clabel(contour, inline=True, fontsize=8)
    
    # Sample fewer points for gradient arrows
    step = 3
    ax2.quiver(X[::step, ::step], Y[::step, ::step], 
              DX[::step, ::step], DY[::step, ::step], 
              color='red', alpha=0.7, scale=50)
    ax2.set_title('Contours + Gradient Vectors', fontsize=12, weight='bold')
    ax2.set_xlabel('x')
    ax2.set_ylabel('y')
    ax2.set_aspect('equal')
    
    # 3. Gradient magnitude
    ax3 = fig.add_subplot(2, 3, 3)
    magnitude = np.sqrt(DX**2 + DY**2)
    im = ax3.imshow(magnitude, extent=[-3, 3, -3, 3], origin='lower', cmap='plasma')
    ax3.set_title('Gradient Magnitude', fontsize=12, weight='bold')
    ax3.set_xlabel('x')
    ax3.set_ylabel('y')
    plt.colorbar(im, ax=ax3)
    
    # 4. Level curves (detailed)
    ax4 = fig.add_subplot(2, 3, 4)
    contourf = ax4.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.8)
    contour_lines = ax4.contour(X, Y, Z, levels=20, colors='white', linewidths=0.5)
    ax4.set_title('Level Curves (Contour Plot)', fontsize=12, weight='bold')
    ax4.set_xlabel('x')
    ax4.set_ylabel('y')
    plt.colorbar(contourf, ax=ax4)
    
    # 5. Gradient field colored by magnitude
    ax5 = fig.add_subplot(2, 3, 5)
    ax5.quiver(X, Y, DX, DY, magnitude, cmap='coolwarm', scale=30)
    ax5.set_title('Gradient Field (Color = Magnitude)', fontsize=12, weight='bold')
    ax5.set_xlabel('x')
    ax5.set_ylabel('y')
    ax5.set_aspect('equal')
    
    # 6. Gradient descent path
    ax6 = fig.add_subplot(2, 3, 6)
    ax6.contour(X, Y, Z, levels=15, colors='gray', alpha=0.5)
    
    # Simulate gradient descent
    start_point = [2.5, 2.0]
    learning_rate = 0.1
    path = [start_point]
    
    for _ in range(20):
        current = path[-1]
        grad_x, grad_y = gradient(current[0], current[1])
        next_point = [current[0] - learning_rate * grad_x, 
                     current[1] - learning_rate * grad_y]
        path.append(next_point)
    
    path = np.array(path)
    ax6.plot(path[:, 0], path[:, 1], 'ro-', linewidth=2, markersize=4, 
            label='Gradient Descent Path')
    ax6.plot(path[0, 0], path[0, 1], 'go', markersize=10, label='Start')
    ax6.plot(path[-1, 0], path[-1, 1], 'bs', markersize=10, label='End')
    ax6.set_title('Gradient Descent Optimization', fontsize=12, weight='bold')
    ax6.set_xlabel('x')
    ax6.set_ylabel('y')
    ax6.legend()
    ax6.set_aspect('equal')
    
    plt.tight_layout()
    plt.show()
    
    # Analyze the function
    print(f"🧭 Gradient Analysis for f(x,y) = x² + y² + 0.5xy:")
    print(f"\n∇f = [∂f/∂x, ∂f/∂y] = [2x + 0.5y, 2y + 0.5x]")
    
    # Critical point (where gradient = 0)
    print(f"\n🎯 Critical Point Analysis:")
    print(f"Setting ∇f = 0:")
    print(f"2x + 0.5y = 0")
    print(f"2y + 0.5x = 0")
    print(f"Solution: x = 0, y = 0 (minimum point)")
    
    # Verify
    critical_x, critical_y = 0, 0
    grad_at_critical = gradient(critical_x, critical_y)
    value_at_critical = f(critical_x, critical_y)
    
    print(f"\n✅ Verification:")
    print(f"∇f(0,0) = {grad_at_critical}")
    print(f"f(0,0) = {value_at_critical}")
    
    print(f"\n🚀 Gradient Descent Path:")
    print(f"Started at: ({path[0, 0]:.2f}, {path[0, 1]:.2f})")
    print(f"Ended at: ({path[-1, 0]:.2f}, {path[-1, 1]:.2f})")
    print(f"Function value decreased from {f(path[0, 0], path[0, 1]):.3f} to {f(path[-1, 0], path[-1, 1]):.3f}")

# Visualize gradients
visualize_gradients_2d()

## 🎯 Gradients in Machine Learning

Let's see how gradients are used in a simple machine learning context - linear regression!
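
For reference, here is the calculus behind the update step. Writing the model as $\hat{y}_i = w x_i + b$ over $n$ samples (the symbol $\eta$ for the learning rate is a notational assumption introduced here), the mean squared error and its partial derivatives are:

$$L(w, b) = \frac{1}{n}\sum_{i=1}^{n} (w x_i + b - y_i)^2$$

$$\frac{\partial L}{\partial w} = \frac{2}{n}\sum_{i=1}^{n} (w x_i + b - y_i)\,x_i, \qquad \frac{\partial L}{\partial b} = \frac{2}{n}\sum_{i=1}^{n} (w x_i + b - y_i)$$

Each gradient descent step then moves against the gradient:

$$w \leftarrow w - \eta\,\frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}$$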

In [None]:
def demonstrate_gradients_in_ml():
    """
    Demonstrate how gradients are used in machine learning optimization
    """
    # Generate synthetic data for linear regression
    np.random.seed(42)
    n_samples = 50
    X = np.random.randn(n_samples, 1) * 2
    true_slope = 1.5
    true_intercept = 0.5
    noise = np.random.randn(n_samples, 1) * 0.3
    y = true_slope * X + true_intercept + noise
    
    # Define loss function (Mean Squared Error)
    def mse_loss(w, b, X, y):
        predictions = w * X + b
        error = predictions - y
        return np.mean(error**2)
    
    # Analytical gradients
    def compute_gradients(w, b, X, y):
        n = len(X)
        predictions = w * X + b
        error = predictions - y
        
        dw = (2/n) * np.sum(error * X)
        db = (2/n) * np.sum(error)
        
        return dw, db
    
    # Gradient descent optimization
    def gradient_descent(X, y, learning_rate=0.01, epochs=100):
        # Initialize parameters
        w = np.random.randn()
        b = np.random.randn()
        
        history = {'w': [w], 'b': [b], 'loss': [mse_loss(w, b, X, y)]}
        
        for epoch in range(epochs):
            # Compute gradients
            dw, db = compute_gradients(w, b, X, y)
            
            # Update parameters
            w = w - learning_rate * dw
            b = b - learning_rate * db
            
            # Record history
            history['w'].append(w)
            history['b'].append(b)
            history['loss'].append(mse_loss(w, b, X, y))
        
        return w, b, history
    
    # Run optimization
    final_w, final_b, history = gradient_descent(X, y, learning_rate=0.1, epochs=50)
    
    # Visualize results
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
    
    # 1. Data and final fit
    ax1.scatter(X, y, alpha=0.6, label='Training Data')
    x_line = np.linspace(X.min(), X.max(), 100)
    y_true = true_slope * x_line + true_intercept
    y_pred = final_w * x_line + final_b
    
    ax1.plot(x_line, y_true, 'g--', linewidth=2, label=f'True: y = {true_slope}x + {true_intercept}')
    ax1.plot(x_line, y_pred, 'r-', linewidth=2, 
            label=f'Learned: y = {final_w:.2f}x + {final_b:.2f}')
    ax1.set_title('Linear Regression Result', fontsize=14, weight='bold')
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # 2. Loss surface and optimization path
    w_range = np.linspace(0, 3, 50)
    b_range = np.linspace(-1, 2, 50)
    W, B = np.meshgrid(w_range, b_range)
    
    # Compute loss surface
    Loss = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Loss[i, j] = mse_loss(W[i, j], B[i, j], X, y)
    
    contour = ax2.contour(W, B, Loss, levels=20, colors='gray', alpha=0.5)
    ax2.contourf(W, B, Loss, levels=20, cmap='viridis', alpha=0.7)
    
    # Plot optimization path
    path_w = np.array(history['w'])
    path_b = np.array(history['b'])
    ax2.plot(path_w, path_b, 'ro-', linewidth=2, markersize=3, 
            label='Gradient Descent Path')
    ax2.plot(path_w[0], path_b[0], 'go', markersize=10, label='Start')
    ax2.plot(path_w[-1], path_b[-1], 'bs', markersize=10, label='End')
    ax2.plot(true_slope, true_intercept, 'r*', markersize=15, label='True Parameters')
    
    ax2.set_title('Loss Surface & Optimization Path', fontsize=14, weight='bold')
    ax2.set_xlabel('Weight (w)')
    ax2.set_ylabel('Bias (b)')
    ax2.legend()
    
    # 3. Loss curve
    ax3.plot(history['loss'], 'b-', linewidth=2)
    ax3.set_title('Training Loss Over Time', fontsize=14, weight='bold')
    ax3.set_xlabel('Epoch')
    ax3.set_ylabel('Mean Squared Error')
    ax3.grid(True, alpha=0.3)
    ax3.set_yscale('log')
    
    # 4. Parameter evolution
    epochs = range(len(history['w']))
    ax4.plot(epochs, history['w'], 'r-', linewidth=2, label='Weight (w)')
    ax4.plot(epochs, history['b'], 'b-', linewidth=2, label='Bias (b)')
    ax4.axhline(y=true_slope, color='r', linestyle='--', alpha=0.7, label=f'True w = {true_slope}')
    ax4.axhline(y=true_intercept, color='b', linestyle='--', alpha=0.7, label=f'True b = {true_intercept}')
    ax4.set_title('Parameter Evolution', fontsize=14, weight='bold')
    ax4.set_xlabel('Epoch')
    ax4.set_ylabel('Parameter Value')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Analysis
    final_loss = history['loss'][-1]
    initial_loss = history['loss'][0]
    improvement = ((initial_loss - final_loss) / initial_loss) * 100
    
    print(f"🤖 Machine Learning Gradient Descent Results:")
    print(f"\n📊 Final Parameters:")
    print(f"Weight (w): {final_w:.4f} (true: {true_slope})")
    print(f"Bias (b): {final_b:.4f} (true: {true_intercept})")
    
    print(f"\n📈 Training Progress:")
    print(f"Initial loss: {initial_loss:.6f}")
    print(f"Final loss: {final_loss:.6f}")
    print(f"Improvement: {improvement:.2f}%")
    
    print(f"\n🧮 Gradient Information:")
    final_dw, final_db = compute_gradients(final_w, final_b, X, y)
    print(f"Final gradients: dw = {final_dw:.6f}, db = {final_db:.6f}")
    print(f"Gradient magnitude: {np.sqrt(final_dw**2 + final_db**2):.6f}")
    
    if abs(final_dw) < 0.01 and abs(final_db) < 0.01:
        print(f"✅ Converged! Gradients are close to zero.")
    else:
        print(f"⚠️  Not fully converged. Consider more epochs or different learning rate.")
    
    return history

# Demonstrate gradients in ML
ml_history = demonstrate_gradients_in_ml()

## 🔬 Numerical vs Analytical Gradients

In practice, we often need to verify our gradient calculations. Let's compare numerical and analytical gradients:
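
A short Taylor-series argument shows why the step size matters for a central difference. Assuming $f$ is smooth near $x$:

$$\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6} f'''(x) + O(h^4)$$

The truncation error shrinks like $h^2$, while floating-point round-off in the numerator grows roughly like $\varepsilon / h$, where $\varepsilon \approx 2.2 \times 10^{-16}$ in double precision. Balancing the two suggests a step on the order of $\varepsilon^{1/3} \approx 6 \times 10^{-6}$ when $f$ and its derivatives are of order one. This is a rule of thumb under that assumption, not a universal optimum.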

In [None]:
def compare_gradient_methods():
    """
    Compare numerical and analytical gradient computation methods
    """
    # Define a test function: f(x,y) = x³ + y² + 2xy + sin(x)
    def test_function(x, y):
        return x**3 + y**2 + 2*x*y + np.sin(x)
    
    # Analytical gradients
    def analytical_gradient(x, y):
        df_dx = 3*x**2 + 2*y + np.cos(x)
        df_dy = 2*y + 2*x
        return df_dx, df_dy
    
    # Numerical gradients (central finite differences)
    def numerical_gradient(func, x, y, h=1e-8):
        df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
        df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
        return df_dx, df_dy
    
    # Test points
    test_points = [
        (0, 0),
        (1, 1),
        (-1, 2),
        (2, -1),
        (0.5, 1.5)
    ]
    
    results = []
    
    print(f"🔬 Numerical vs Analytical Gradient Comparison")
    print(f"Function: f(x,y) = x³ + y² + 2xy + sin(x)")
    print(f"Analytical: ∇f = [3x² + 2y + cos(x), 2y + 2x]")
    print(f"\n{'Point':<12} {'Analytical':<25} {'Numerical':<25} {'Error':<15}")
    print(f"="*80)
    
    for x, y in test_points:
        # Compute gradients
        anal_dx, anal_dy = analytical_gradient(x, y)
        num_dx, num_dy = numerical_gradient(test_function, x, y)
        
        # Compute errors
        error_dx = abs(anal_dx - num_dx)
        error_dy = abs(anal_dy - num_dy)
        total_error = np.sqrt(error_dx**2 + error_dy**2)
        
        results.append({
            'point': (x, y),
            'analytical': (anal_dx, anal_dy),
            'numerical': (num_dx, num_dy),
            'error': total_error
        })
        
        print(f"({x:4.1f},{y:4.1f})   [{anal_dx:8.6f},{anal_dy:8.6f}]   [{num_dx:8.6f},{num_dy:8.6f}]   {total_error:10.2e}")
    
    # Visualize the comparison
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(16, 12))
    
    # Create grid for visualization
    x = np.linspace(-2, 2, 20)
    y = np.linspace(-2, 2, 20)
    X, Y = np.meshgrid(x, y)
    Z = test_function(X, Y)
    
    # Analytical gradients
    DX_anal, DY_anal = analytical_gradient(X, Y)
    
    # 1. Function surface
    contour1 = ax1.contourf(X, Y, Z, levels=20, cmap='viridis', alpha=0.8)
    ax1.contour(X, Y, Z, levels=20, colors='white', linewidths=0.5)
    ax1.set_title('Function f(x,y)', fontsize=14, weight='bold')
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    plt.colorbar(contour1, ax=ax1)
    
    # 2. Analytical gradient field
    step = 2
    ax2.contour(X, Y, Z, levels=15, colors='gray', alpha=0.5)
    ax2.quiver(X[::step, ::step], Y[::step, ::step], 
              DX_anal[::step, ::step], DY_anal[::step, ::step], 
              color='red', alpha=0.7, scale=50)
    ax2.set_title('Analytical Gradient Field', fontsize=14, weight='bold')
    ax2.set_xlabel('x')
    ax2.set_ylabel('y')
    ax2.set_aspect('equal')
    
    # 3. Numerical vs analytical comparison at test points
    points = np.array([r['point'] for r in results])
    errors = [r['error'] for r in results]
    
    scatter = ax3.scatter(points[:, 0], points[:, 1], c=errors, 
                         s=100, cmap='Reds', edgecolors='black')
    ax3.contour(X, Y, Z, levels=10, colors='gray', alpha=0.3)
    
    for i, ((x, y), error) in enumerate(zip(points, errors)):
        ax3.annotate(f'{error:.2e}', (x, y), xytext=(5, 5), 
                    textcoords='offset points', fontsize=8)
    
    ax3.set_title('Gradient Error at Test Points', fontsize=14, weight='bold')
    ax3.set_xlabel('x')
    ax3.set_ylabel('y')
    plt.colorbar(scatter, ax=ax3, label='Error Magnitude')
    
    # 4. Error analysis
    h_values = np.logspace(-12, -1, 50)
    errors_vs_h = []
    
    test_x, test_y = 1.0, 1.0
    anal_dx, anal_dy = analytical_gradient(test_x, test_y)
    
    for h in h_values:
        num_dx, num_dy = numerical_gradient(test_function, test_x, test_y, h)
        error = np.sqrt((anal_dx - num_dx)**2 + (anal_dy - num_dy)**2)
        errors_vs_h.append(error)
    
    ax4.loglog(h_values, errors_vs_h, 'b-', linewidth=2)
    ax4.axvline(x=1e-8, color='r', linestyle='--', alpha=0.7, label='Default h = 1e-8')
    ax4.set_title('Numerical Error vs Step Size h', fontsize=14, weight='bold')
    ax4.set_xlabel('Step Size (h)')
    ax4.set_ylabel('Gradient Error')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Summary statistics
    all_errors = [r['error'] for r in results]
    print(f"\n📈 Error Analysis Summary:")
    print(f"Average error: {np.mean(all_errors):.2e}")
    print(f"Maximum error: {np.max(all_errors):.2e}")
    print(f"Minimum error: {np.min(all_errors):.2e}")
    
    if np.max(all_errors) < 1e-6:
        print(f"✅ Excellent agreement! Numerical gradients are very accurate.")
    elif np.max(all_errors) < 1e-3:
        print(f"✅ Good agreement. Numerical gradients are reasonably accurate.")
    else:
        print(f"⚠️  Some discrepancy detected. Check implementation or try a different step size h.")
    
    print(f"\n💡 Key Insights:")
    print(f"• Numerical gradients approximate analytical gradients very well")
    print(f"• Optimal step size h balances truncation vs round-off error")
    print(f"• Analytical gradients are exact and computationally efficient")
    print(f"• Numerical gradients useful for debugging and verification")
    
    return results

# Compare gradient methods
gradient_comparison = compare_gradient_methods()

---

# 🎯 Key Takeaways

## 📈 Derivatives - The Foundation
- **Rate of change**: How functions respond to input variations
- **Geometric meaning**: Slope of tangent line at any point
- **Limit definition**: Precise mathematical foundation
- **Symbolic computation**: SymPy enables exact calculations
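
To make the last point concrete, here is a minimal sketch that differentiates $\sin(x^2)$ symbolically and cross-checks the result against a central-difference estimate. The sample grid `xs` and step `h` are arbitrary choices made for illustration.

```python
import numpy as np
import sympy as sp

x = sp.Symbol('x')
expr = sp.sin(x**2)
d_expr = sp.diff(expr, x)      # exact symbolic derivative

# Convert the symbolic expressions into NumPy-callable functions
f_num = sp.lambdify(x, expr, 'numpy')
df_num = sp.lambdify(x, d_expr, 'numpy')

# Compare the exact derivative with a central-difference estimate
xs = np.linspace(-2, 2, 9)
h = 1e-6
central = (f_num(xs + h) - f_num(xs - h)) / (2 * h)
max_gap = np.max(np.abs(central - df_num(xs)))

print("symbolic derivative:", d_expr)
print(f"largest gap vs finite differences: {max_gap:.2e}")
```

The symbolic result is exact, so any remaining gap comes from the finite-difference approximation.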

## 🧭 Gradients - Multivariable Extension
- **Vector of partial derivatives**: Direction and magnitude of steepest ascent
- **Optimization compass**: Points in the direction of steepest local increase; the negative gradient is the descent direction
- **Level curve relationship**: Perpendicular to contours wherever the gradient is nonzero
- **Dimensionality**: Generalizes to any number of variables

## 🤖 Machine Learning Applications
- **Parameter optimization**: Gradients guide weight updates
- **Loss minimization**: Following negative gradient reduces error
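- **Convergence**: Zero gradient marks a stationary point (minimum, maximum, or saddle); for convex losses such as MSE in linear regression it is the global minimum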
- **Learning rate**: Controls step size in parameter space
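
The effect of the learning rate shows up even on the simplest loss. The sketch below runs gradient descent on $f(x) = x^2$; the helper name `descend`, the starting point, and the step count are hypothetical choices. Because each update multiplies $x$ by $(1 - 2 \cdot \text{lr})$, small rates converge slowly, moderate rates converge quickly, and rates above 1 diverge.

```python
def descend(learning_rate, x0=1.0, steps=25):
    """Run gradient descent on f(x) = x**2, whose derivative is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # step against the gradient
    return x

for lr in (0.01, 0.1, 0.5, 0.9, 1.1):
    print(f"lr = {lr:4.2f} -> x after 25 steps = {descend(lr): .3e}")
```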

## 🔬 Computational Considerations
- **Analytical vs numerical**: Exact vs approximate gradient computation
- **Verification**: Numerical gradients validate analytical derivations
- **Efficiency**: Analytical gradients are typically faster and more accurate than finite differences
- **Automatic differentiation**: Modern frameworks compute gradients automatically
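
To give a feel for what automatic differentiation does, here is a toy forward-mode sketch built on dual numbers. It is an illustration only: the `Dual` class and the `derivative` helper are hypothetical names, and real frameworks use far more general machinery (typically reverse mode for neural networks).

```python
import math

class Dual:
    """A number value + deriv * eps, where eps**2 = 0."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants (derivative zero)
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def sin(d):
    # Chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

def derivative(func, x):
    """Evaluate func at x with a unit derivative seed and read off df/dx."""
    return func(Dual(x, 1.0)).deriv

def f(x):
    return x * x * x + sin(x)   # x^3 + sin(x)

print("autodiff:  ", derivative(f, 1.0))
print("analytical:", 3 * 1.0**2 + math.cos(1.0))
```

Each arithmetic operation carries its derivative along with its value, so the result is exact up to floating-point rounding rather than an approximation that depends on a step size.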

---

# 🚀 Coming Next: Chain Rule & Backpropagation

Now that we understand individual derivatives and gradients, it's time to explore the **chain rule** - the mathematical foundation of neural network training:

- Chain rule for composite functions
- Backpropagation algorithm derivation
- Neural network gradient computation
- Computational graphs and automatic differentiation

**Ready to chain your way to deep learning mastery? Let's dive into backpropagation! ⛓️**