In [None]:
import marimo as mo

# Week 2: Zero-Order Optimization Techniques

**IME775: Data Driven Modeling and Optimization**

📖 **Reference**: Watt, Borhani, & Katsaggelos (2020). *Machine Learning Refined* (2nd ed.), **Chapter 2**

---

## Learning Objectives

- Understand the zero-order optimality condition
- Apply global optimization methods
- Apply local optimization methods
- Implement random search and coordinate descent

In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Introduction (Section 2.1)

**Zero-order methods** optimize functions using only function evaluations; no derivatives are required.

### When to Use Zero-Order Methods

- Derivative is unavailable or expensive to compute
- Function is non-smooth or discontinuous
- Black-box optimization
- Hyperparameter tuning

## The Zero-Order Optimality Condition (Section 2.2)

For an unconstrained minimization problem:

$$\min_{w} g(w)$$

A point $w^*$ is a **global minimum** if:

$$g(w^*) \leq g(w) \quad \forall w$$

A point $w^*$ is a **local minimum** if:

$$g(w^*) \leq g(w) \quad \forall w \text{ in some neighborhood of } w^*$$
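
Both definitions can be checked numerically on a finite set of sample points, which is all a zero-order method ever sees. The sketch below is illustrative only: the helper names, the test function $g(w) = w^4/4 - w^3/3 - w^2$ (local minimum at $w=-1$, global minimum at $w=2$), the neighborhood radius, and the tolerance are assumptions chosen for the example, not taken from the text.

```python
import numpy as np

def is_global_min_on_samples(g, w_star, samples, tol=1e-12):
    """True if g(w_star) <= g(w) for every sampled w (up to tol)."""
    return bool(np.all(g(w_star) <= g(samples) + tol))

def is_local_min_on_samples(g, w_star, samples, radius, tol=1e-12):
    """Same check, restricted to samples within `radius` of w_star."""
    nearby = samples[np.abs(samples - w_star) <= radius]
    return bool(np.all(g(w_star) <= g(nearby) + tol))

# Hypothetical test function: g'(w) = (w + 1) * w * (w - 2),
# so the stationary points are w = -1, 0, 2.
g = lambda w: w**4 / 4 - w**3 / 3 - w**2

samples = np.linspace(-3, 4, 701)

print("w=2  global:", is_global_min_on_samples(g, 2.0, samples))
print("w=-1 global:", is_global_min_on_samples(g, -1.0, samples))
print("w=-1 local: ", is_local_min_on_samples(g, -1.0, samples, radius=0.5))
print("w=0  local: ", is_local_min_on_samples(g, 0.0, samples, radius=0.5))
```

A sample-based check can only refute optimality with certainty; passing it means the condition holds on the points tested, not everywhere.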

In [None]:
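# Visualize global vs local minima
x = np.linspace(-2, 4, 500)
# The -0.5*x tilt makes the two minima differ in value; without it,
# g(0) = g(2) = 2 and both points would be global minima.
g = lambda x: x**4 - 4*x**3 + 4*x**2 - 0.5*x + 2

# Locate each minimum on the grid, one on either side of the bump near x = 1
left, right = x[x < 1], x[x >= 1]
x_local = left[np.argmin(g(left))]
x_global = right[np.argmin(g(right))]

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, g(x), 'b-', linewidth=2)

# Mark the local and global minima
ax.plot(x_local, g(x_local), 'go', markersize=12, label=f'Local minimum at x={x_local:.2f}')
ax.plot(x_global, g(x_global), 'r*', markersize=15, label=f'Global minimum at x={x_global:.2f}')
ax.set_xlabel('w', fontsize=12)
ax.set_ylabel('g(w)', fontsize=12)
ax.set_title('Global vs Local Minima (ML Refined, Section 2.2)', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)
fig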

## Global Optimization Methods (Section 2.3)

### Exhaustive Grid Search

Evaluate the function at every point on a grid:

```
for each w in grid:
    evaluate g(w)
return w with minimum g(w)
```

**Pros**: Guaranteed to find the global minimum (on the grid)  
**Cons**: Exponentially expensive in dimension

### Complexity

For $N$ grid points per dimension and $d$ dimensions:

- Evaluations needed: $N^d$
- Curse of dimensionality!
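
The $N^d$ count can be made concrete with a small grid search over a $d$-dimensional box. This is a minimal sketch under assumptions not in the text: the function name `grid_search`, the bounds $[-3, 3]$, the choice $N = 11$, and the sphere function $g(w) = \sum_i w_i^2$ as the test objective.

```python
import itertools
import numpy as np

def grid_search(g, lower, upper, n_per_dim, dim):
    """Evaluate g at every point of a uniform grid and keep the best one."""
    axis = np.linspace(lower, upper, n_per_dim)
    best_w, best_cost, n_evals = None, np.inf, 0
    # itertools.product enumerates all n_per_dim**dim grid points
    for point in itertools.product(axis, repeat=dim):
        w = np.array(point)
        cost = g(w)
        n_evals += 1
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost, n_evals

# Hypothetical test objective with its minimum at the origin
sphere = lambda w: float(np.sum(w**2))

for d in (1, 2, 3):
    w_best, cost, n_evals = grid_search(sphere, -3.0, 3.0, n_per_dim=11, dim=d)
    print(f"d={d}: {n_evals} evaluations, best cost {cost:.3g}")
```

Each added dimension multiplies the number of evaluations by $N$, which is why the approach is only practical for very small $d$.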

In [None]:
# Grid search visualization
def rastrigin(x, y):
    """2-D Rastrigin function: many local minima, global minimum at (0, 0)."""
    return 20 + x**2 + y**2 - 10*(np.cos(2*np.pi*x) + np.cos(2*np.pi*y))

x_grid = np.linspace(-3, 3, 100)
y_grid = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x_grid, y_grid)
Z = rastrigin(X, Y)

fig2, axes = plt.subplots(1, 2, figsize=(14, 5))

# Contour plot
ax1 = axes[0]
contour = ax1.contourf(X, Y, Z, levels=30, cmap='viridis')
plt.colorbar(contour, ax=ax1)
ax1.set_xlabel('$w_1$')
ax1.set_ylabel('$w_2$')
ax1.set_title('Rastrigin Function (Many Local Minima)')

# Grid search points
ax2 = axes[1]
ax2.contourf(X, Y, Z, levels=30, cmap='viridis', alpha=0.5)
grid_x = np.linspace(-3, 3, 10)
grid_y = np.linspace(-3, 3, 10)
GX, GY = np.meshgrid(grid_x, grid_y)
ax2.scatter(GX, GY, c='red', s=20, label='Grid points')
ax2.plot(0, 0, 'g*', markersize=20, label='Global minimum')
ax2.set_xlabel('$w_1$')
ax2.set_ylabel('$w_2$')
ax2.set_title('Grid Search (10×10 = 100 evaluations)')
ax2.legend()
plt.tight_layout()
fig2

## Local Optimization Methods (Section 2.4)

### The Descent Framework

```
1. Initialize: w = w_0
2. Repeat:
   a. Choose descent direction d
   b. Choose step size α
   c. Update: w = w + α·d
3. Until: convergence
```

### Challenges

- May converge to a local minimum, not the global one
- Choice of direction and step size is crucial
- Initialization affects the final result
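
One way to instantiate this framework using only function evaluations is sketched below. The specific choices are assumptions made for illustration, not prescribed by the text: candidate directions are random unit vectors, the step size `alpha` is fixed, a step is taken only if it lowers the cost, and the loop stops after `max_iters` iterations rather than testing convergence. The name `local_descent` and the sphere test function are likewise hypothetical.

```python
import numpy as np

def local_descent(g, w0, alpha=0.1, n_directions=20, max_iters=200, seed=0):
    """Zero-order descent: try random unit directions, keep the best if it improves."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    history = [g(w)]
    for _ in range(max_iters):
        # a. Candidate descent directions: random unit vectors
        D = rng.normal(size=(n_directions, w.size))
        D /= np.linalg.norm(D, axis=1, keepdims=True)
        # b. Fixed step size alpha; evaluate g at each candidate point
        candidates = w + alpha * D
        costs = np.array([g(c) for c in candidates])
        best = np.argmin(costs)
        # c. Update only if the best candidate actually lowers the cost
        if costs[best] < history[-1]:
            w = candidates[best]
            history.append(costs[best])
    return w, history

# Hypothetical test objective with its minimum at the origin
sphere = lambda w: float(np.sum(w**2))

w_final, history = local_descent(sphere, w0=[2.0, -1.5])
print("start cost:", history[0])
print("final cost:", history[-1])
print("accepted steps:", len(history) - 1)
```

With a fixed step size the iterate can only get within roughly `alpha` of the minimizer, which illustrates why the step-size choice matters.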

## Random Search (Section 2.5)

### Algorithm

Randomly sample points and keep the best:

```python
def random_search(g, sample_random_point, K):
    best_w = None
    best_cost = float("inf")
    for _ in range(K):
        w = sample_random_point()
        cost = g(w)
        if cost < best_cost:
            best_w = w
            best_cost = cost
    return best_w
```

### Properties

- Simple to implement
- Embarrassingly parallel
- Does not get trapped in local minima, since each sample is drawn independently
- Slow convergence in high dimensions
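
The slowdown in high dimensions can be observed directly by fixing the sample budget and increasing the dimension. The sketch below assumes uniform sampling on the box $[-5, 5]^d$, a budget of 1000 samples, and the sphere function as the objective; the name `uniform_random_search` is hypothetical.

```python
import numpy as np

def uniform_random_search(g, dim, n_samples, low=-5.0, high=5.0, seed=0):
    """Draw n_samples uniform points in [low, high]^dim and return the best."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(low, high, size=(n_samples, dim))
    costs = np.array([g(w) for w in W])
    best = np.argmin(costs)
    return W[best], costs[best]

# Hypothetical test objective with minimum value 0 at the origin
sphere = lambda w: float(np.sum(w**2))

results = {}
for dim in (1, 2, 5, 10):
    w_best, cost = uniform_random_search(sphere, dim, n_samples=1000)
    results[dim] = cost
    print(f"dim={dim:2d}  best cost after 1000 samples: {cost:.4f}")
```

The same budget that nearly finds the minimum in one dimension leaves the best sample far from it in ten, because the volume to cover grows exponentially with the dimension.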

In [None]:
# Random search demo
np.random.seed(42)

def himmelblau(x, y):
    """Himmelblau's function."""
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2

# Random search
n_samples = 100
samples_x = np.random.uniform(-5, 5, n_samples)
samples_y = np.random.uniform(-5, 5, n_samples)
costs = himmelblau(samples_x, samples_y)
best_idx = np.argmin(costs)

# Visualization
x_range = np.linspace(-5, 5, 100)
y_range = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x_range, y_range)
Z = himmelblau(X, Y)

fig3, ax3 = plt.subplots(figsize=(10, 8))
contour = ax3.contourf(X, Y, Z, levels=30, cmap='viridis', alpha=0.7)
plt.colorbar(contour, ax=ax3)

# Plot samples colored by cost
scatter = ax3.scatter(samples_x, samples_y, c=costs, cmap='coolwarm',
                      s=50, edgecolors='white', linewidths=0.5)
ax3.plot(samples_x[best_idx], samples_y[best_idx], 'g*',
         markersize=20, label=f'Best found: ({samples_x[best_idx]:.2f}, {samples_y[best_idx]:.2f})')
ax3.set_xlabel('$w_1$')
ax3.set_ylabel('$w_2$')
ax3.set_title(f'Random Search: {n_samples} Samples (ML Refined, Section 2.5)')
ax3.legend()
fig3

## Coordinate Search and Descent (Section 2.6)

### Coordinate Search

Optimize one variable at a time while holding the others fixed:

```python
while not converged:
    for j = 1 to n:
        w_j = argmin_{w_j} g(w_1, ..., w_j, ..., w_n)
```

### Coordinate Descent

Move in coordinate directions with line search:

```python
while not converged:
    for j = 1 to n:
        direction = e_j  # j-th unit vector
        α = line_search(w, direction)
        w = w + α * direction
```

### Advantages

- No gradient needed
- Simple to implement
- Works well when dimensions are separable
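
The coordinate descent loop above can be made runnable with a crude zero-order line search. In the sketch below the line search is an assumption: it tries a small, fixed set of signed step sizes along $e_j$ and keeps the best one, or stays put if none lowers the cost. The function name `coordinate_descent`, the step-size set, the fixed number of sweeps used in place of a convergence test, and the quadratic test function are all illustrative choices.

```python
import numpy as np

def coordinate_descent(g, w0, step_sizes=(1.0, 0.5, 0.1, 0.01), n_sweeps=50):
    """Cycle through coordinates, moving along each e_j by the best trial step."""
    w = np.asarray(w0, dtype=float).copy()
    history = [g(w)]
    for _ in range(n_sweeps):
        for j in range(w.size):
            e_j = np.zeros(w.size)
            e_j[j] = 1.0  # j-th unit vector
            # Crude line search: alpha = 0 (no move) unless a trial step improves
            best_alpha, best_cost = 0.0, g(w)
            for s in step_sizes:
                for alpha in (s, -s):
                    cost = g(w + alpha * e_j)
                    if cost < best_cost:
                        best_alpha, best_cost = alpha, cost
            w = w + best_alpha * e_j
        history.append(g(w))
    return w, history

# Hypothetical test objective: a coupled quadratic with its minimum at the origin
quad = lambda w: 0.5 * w[0]**2 + 2 * w[1]**2 + 0.5 * w[0] * w[1]

w_final, history = coordinate_descent(quad, w0=[4.0, 3.0])
print("start cost:", history[0])
print("final cost:", history[-1])
print("final w:", w_final)
```

Because every move is axis-aligned, the iterates trace a staircase path; the cross term $0.5\,w_1 w_2$ is what prevents a single sweep from solving the problem exactly.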

In [None]:
# Coordinate descent visualization
def quadratic(w):
    return 0.5*w[0]**2 + 2*w[1]**2 + 0.5*w[0]*w[1]

# Coordinate descent path
path = [np.array([4.0, 3.0])]
w = path[0].copy()
for _ in range(10):
    # Optimize w1: dg/dw1 = w1 + 0.5*w2 = 0  =>  w1 = -0.5*w2
    w[0] = -0.5 * w[1]
    path.append(w.copy())
    # Optimize w2: dg/dw2 = 4*w2 + 0.5*w1 = 0  =>  w2 = -0.125*w1
    w[1] = -0.125 * w[0]
    path.append(w.copy())
path = np.array(path)

# Contour plot
x_range = np.linspace(-5, 5, 100)
y_range = np.linspace(-4, 4, 100)
X, Y = np.meshgrid(x_range, y_range)
Z = quadratic((X, Y))

fig4, ax4 = plt.subplots(figsize=(10, 8))
ax4.contour(X, Y, Z, levels=20, cmap='viridis')
ax4.plot(path[:, 0], path[:, 1], 'ro-', markersize=8, linewidth=2, label='Coordinate Descent Path')
ax4.plot(path[0, 0], path[0, 1], 'g*', markersize=15, label='Start')
ax4.plot(0, 0, 'b*', markersize=15, label='Optimum')
ax4.set_xlabel('$w_1$')
ax4.set_ylabel('$w_2$')
ax4.set_title('Coordinate Descent (ML Refined, Section 2.6)')
ax4.legend()
ax4.grid(True, alpha=0.3)
ax4.set_aspect('equal')
fig4

## Summary

| Method | Type | Pros | Cons |
|--------|------|------|------|
| **Grid Search** | Global | Guaranteed optimal (on grid) | Exponential cost |
| **Random Search** | Global | Simple, parallel | Slow convergence |
| **Coordinate Search** | Local | No gradients needed | Can be slow |
| **Coordinate Descent** | Local | Simple, works well for separable problems | Zigzag path |

---

## References

- **Primary**: Watt, J., Borhani, R., & Katsaggelos, A. K. (2020). *Machine Learning Refined* (2nd ed.), Chapter 2.
- **Supplementary**: Nocedal, J. & Wright, S. (2006). *Numerical Optimization*, Chapter 9.

## Next Week

**First-Order Optimization: Gradient Descent** (Chapter 3): Using derivatives for faster optimization.