<a href="https://colab.research.google.com/github/harunpirim/IME775/blob/main/week-02/notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---

# Week 2: Zero-Order Optimization Techniques
**IME775: Data Driven Modeling and Optimization**
📖 **Reference**: Watt, Borhani, & Katsaggelos (2020). *Machine Learning Refined* (2nd ed.), **Chapter 2**
---
## Learning Objectives
- Understand the zero-order optimality condition
- Apply global optimization methods
- Apply local optimization methods
- Implement random search and coordinate descent


In [None]:
import numpy as np
import matplotlib.pyplot as plt

## Introduction (Section 2.1)
**Zero-order methods** optimize functions using only function evaluations; no derivatives are required.
### When to Use Zero-Order Methods
- Derivative is unavailable or expensive to compute
- Function is non-smooth or discontinuous
- Black-box optimization
- Hyperparameter tuning


## The Zero-Order Optimality Condition (Section 2.2)
For an unconstrained minimization problem:
$$\min_{w} g(w)$$
A point $w^*$ is a **global minimum** if:
$$g(w^*) \leq g(w) \quad \forall w$$
A point $w^*$ is a **local minimum** if:
$$g(w^*) \leq g(w) \quad \forall w \text{ in some neighborhood of } w^*$$
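
The two definitions above can be checked numerically on a finite sample of points. The sketch below is only illustrative: the quartic `g_example`, the helper `is_minimum_on_sample`, the sampling interval, and the neighborhood radius of 0.25 are all assumptions made for this example, and passing the check on a sample is not a proof of optimality.

```python
import numpy as np

def g_example(w):
    # Illustrative quartic with g'(w) = 12 w (w - 0.5)(w - 2):
    # stationary points at w = 0, 0.5, and 2.
    return 3 * w**4 - 10 * w**3 + 6 * w**2 + 2

def is_minimum_on_sample(g, w_star, samples, tol=1e-9):
    # Zero-order condition restricted to a finite sample:
    # g(w_star) <= g(w) for every sampled w (up to a small tolerance).
    return bool(np.all(g(w_star) <= g(samples) + tol))

W = np.linspace(-0.75, 2.75, 2001)      # sample of the whole interval
near_zero = W[np.abs(W) < 0.25]         # sample of a neighborhood of w = 0

global_at_2 = is_minimum_on_sample(g_example, 2.0, W)
global_at_0 = is_minimum_on_sample(g_example, 0.0, W)
local_at_0 = is_minimum_on_sample(g_example, 0.0, near_zero)
print(global_at_2, global_at_0, local_at_0)
```

On this sample, $w = 2$ satisfies the global condition, while $w = 0$ satisfies the condition only when the comparison is restricted to its neighborhood.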


In [None]:
# Visualize global vs local minima
# Quartic with g'(w) = 12 w (w - 0.5)(w - 2): g(0) = 2 (local), g(2) = -6 (global)
x = np.linspace(-0.75, 2.75, 500)
g = lambda x: 3*x**4 - 10*x**3 + 6*x**2 + 2
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, g(x), 'b-', linewidth=2)
# Mark the local and global minima
ax.plot(0, g(0), 'go', markersize=12, label='Local minimum at w=0')
ax.plot(2, g(2), 'r*', markersize=15, label='Global minimum at w=2')
ax.set_xlabel('w', fontsize=12)
ax.set_ylabel('g(w)', fontsize=12)
ax.set_title('Global vs Local Minima (ML Refined, Section 2.2)', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)
fig

## Global Optimization Methods (Section 2.3)
### Exhaustive Grid Search
Evaluate the function at every point on a grid and keep the best one:
```
for each w in grid:
    evaluate g(w)
return the w with the smallest g(w)
```


In [None]:
# Grid search visualization
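def rastrigin(x, y):
    # Standard 2-D Rastrigin test function (A = 10); global minimum 0 at (0, 0)
    return 20 + x**2 - 10*np.cos(2*np.pi*x) + y**2 - 10*np.cos(2*np.pi*y)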
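
A minimal sketch of exhaustive grid search, written to run on its own. The helper name `grid_search`, the vector-form `rastrigin_vec`, the bounds of ±5.12, and the choice of 65 points per axis are assumptions made for this illustration. The evaluation counter makes the exponential cost visible: with `n` points per axis in `d` dimensions, the loop runs `n**d` times.

```python
import itertools
import numpy as np

def rastrigin_vec(w):
    # Standard Rastrigin function for a vector w (A = 10).
    w = np.asarray(w, dtype=float)
    return 10 * w.size + np.sum(w**2 - 10 * np.cos(2 * np.pi * w))

def grid_search(g, lower, upper, points_per_axis):
    # Build one evenly spaced axis per dimension, then visit every grid point.
    axes = [np.linspace(lo, hi, points_per_axis) for lo, hi in zip(lower, upper)]
    best_w, best_cost, n_evals = None, np.inf, 0
    for point in itertools.product(*axes):
        w = np.array(point)
        cost = g(w)
        n_evals += 1
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost, n_evals

best_w, best_cost, n_evals = grid_search(
    rastrigin_vec, lower=[-5.12, -5.12], upper=[5.12, 5.12], points_per_axis=65
)
print(best_w, best_cost, n_evals)
```

An odd number of points per axis is used here so that the grid contains the origin, where the Rastrigin function attains its global minimum. The result is only optimal among the grid points, so a coarser or shifted grid could return a point near a different local minimum.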

## Local Optimization Methods (Section 2.4)
### The Descent Framework
```
1. Initialize: w = w_0
2. Repeat:
   a. Choose descent direction d
   b. Choose step size α
   c. Update: w = w + α·d
3. Until: convergence
```
### Challenges
- May converge to a local minimum rather than the global one
- Choice of direction and step size is crucial
- Initialization affects final result
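
One possible instance of the descent framework above is sketched here, using only function evaluations. The direction is chosen as the best of several random unit vectors and the step size is fixed. The function name `random_local_search`, the default values, and the `bowl` objective are assumptions made for this illustration; other direction and step-size rules fit the same loop.

```python
import numpy as np

def random_local_search(g, w0, alpha=0.2, num_directions=20, max_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)           # 1. initialize
    cost_history = [g(w)]
    for _ in range(max_iter):                 # 2. repeat
        # a. candidate descent directions: random unit vectors
        directions = rng.normal(size=(num_directions, w.size))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        # b. fixed step size alpha; evaluate every candidate update
        candidates = w + alpha * directions
        costs = np.array([g(c) for c in candidates])
        best = int(np.argmin(costs))
        # 3. stop when no sampled direction lowers the cost
        if costs[best] >= cost_history[-1]:
            break
        # c. update with the best candidate
        w = candidates[best]
        cost_history.append(float(costs[best]))
    return w, cost_history

# Hypothetical smooth objective with its minimum at (1, -2).
bowl = lambda w: (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2
w_final, cost_history = random_local_search(bowl, w0=[4.0, 4.0])
print(w_final, cost_history[-1])
```

Because the step size is fixed, the iterate can only get within roughly `alpha` of the minimum; a diminishing step size would be one way to refine the result.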


## Random Search (Section 2.5)
### Algorithm
Randomly sample points and keep the best:
```python
best_w = None
best_cost = float("inf")
for k in range(K):               # K = number of samples
    w = sample_random_point()    # draw a candidate point
    cost = g(w)
    if cost < best_cost:         # keep the best point seen so far
        best_w = w
        best_cost = cost
```


In [None]:
# Random search demo
np.random.seed(42)
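def himmelblau(x, y):
    # Standard Himmelblau test function (four global minima, each with value 0)
    return (x**2 + y - 11)**2 + (x + y**2 - 7)**2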
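
A self-contained sketch of the sampling loop described in this section, using uniform sampling inside a box. The helper name `random_search`, the vector-form `himmelblau_vec`, the box $[-5, 5]^2$, the sample count, and the seed are assumptions made for this illustration.

```python
import numpy as np

def himmelblau_vec(w):
    # Standard Himmelblau function for a 2-D point w = (x, y).
    x, y = w
    return (x**2 + y - 11) ** 2 + (x + y**2 - 7) ** 2

def random_search(g, lower, upper, num_samples, rng):
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best_w, best_cost = None, np.inf
    history = []                          # best cost after each sample
    for _ in range(num_samples):
        w = rng.uniform(lower, upper)     # one uniform draw inside the box
        cost = g(w)
        if cost < best_cost:
            best_w, best_cost = w, cost
        history.append(best_cost)
    return best_w, best_cost, history

rng = np.random.default_rng(0)
best_w, best_cost, history = random_search(
    himmelblau_vec, lower=[-5.0, -5.0], upper=[5.0, 5.0], num_samples=5000, rng=rng
)
print(best_w, best_cost)
```

The recorded `history` never increases, but it also flattens quickly: later samples rarely beat the incumbent, which is the slow convergence this method is known for.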

## Coordinate Search and Descent (Section 2.6)
### Coordinate Search
Optimize one variable at a time while holding others fixed:
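```python
while not converged:
    for j in range(n):
        # w[j] = argmin over w_j of g(w_1, ..., w_j, ..., w_n), other coordinates fixed
        w[j] = minimize_over_coordinate(g, w, j)  # placeholder for any 1-D minimizer
```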
### Coordinate Descent
Move in coordinate directions with line search:
```python
while not converged:
    for j in range(n):
        direction = np.eye(n)[j]  # j-th unit vector e_j
        alpha = line_search(w, direction)  # step size along e_j
        w = w + alpha * direction
```
### Advantages
- No gradient needed
- Simple to implement
- Works well when the objective is separable across coordinates
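
A runnable sketch of coordinate descent under simplifying assumptions. In place of a full line search, it tries a step of `+alpha` or `-alpha` along each axis, accepts the first one that lowers the cost, and halves `alpha` whenever a full sweep makes no progress. The names `coordinate_descent` and `coupled_quadratic`, the starting point, and the tolerances are all hypothetical choices for this example.

```python
import numpy as np

def coordinate_descent(g, w0, alpha=1.0, tol=1e-6, max_sweeps=10_000):
    w = np.asarray(w0, dtype=float).copy()
    cost = g(w)
    path = [w.copy()]                      # accepted iterates, for plotting
    for _ in range(max_sweeps):
        if alpha <= tol:
            break                          # step size is small enough: stop
        improved = False
        for j in range(w.size):            # sweep over coordinate directions e_j
            for sign in (1.0, -1.0):
                trial = w.copy()
                trial[j] += sign * alpha   # move along +e_j or -e_j only
                trial_cost = g(trial)
                if trial_cost < cost:
                    w, cost, improved = trial, trial_cost, True
                    path.append(w.copy())
                    break
        if not improved:
            alpha *= 0.5                   # no axis step helped: shrink the step
    return w, cost, path

def coupled_quadratic(w):
    # Hypothetical non-separable quadratic with its minimum at (1, -2).
    a, b = w[0] - 1.0, w[1] + 2.0
    return a**2 + a * b + 2.0 * b**2

w_cd, cost_cd, path_cd = coordinate_descent(coupled_quadratic, [4.0, 4.0])
print(w_cd, cost_cd, len(path_cd))
```

Every accepted step changes exactly one coordinate, so plotting `path_cd` on a contour plot shows the axis-aligned zigzag that coupled terms such as `a * b` produce.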


In [None]:
# Coordinate descent visualization
def quadratic(w):

## Summary
| Method | Type | Pros | Cons |
|--------|------|------|------|
| **Grid Search** | Global | Guaranteed optimal (on grid) | Exponential cost |
| **Random Search** | Global | Simple, easy to parallelize | Slow convergence |
| **Coordinate Search** | Local | No gradients needed | Can be slow |
| **Coordinate Descent** | Local | Simple; works well for separable objectives | Zigzag path |
---
## References
- **Primary**: Watt, J., Borhani, R., & Katsaggelos, A. K. (2020). *Machine Learning Refined* (2nd ed.), Chapter 2.
- **Supplementary**: Nocedal, J. & Wright, S. (2006). *Numerical Optimization*, Chapter 9.
## Next Week
**First-Order Optimization: Gradient Descent** (Chapter 3): Using derivatives for faster optimization.
