# Optimization

What is optimization? It is finding the `min` or `max` of some `objective function` subject to a set of constraints.
For example, a constrained minimization problem can be written as:

$$\operatorname*{argmin}_x f_0(x) \quad \text{s.t.} \quad f_i(x) \leq 0, \quad i = 1, \dots, k$$
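
As a concrete one-dimensional instance (an illustrative example chosen here, with $f_0(x) = (x - 2)^2$ and a single constraint $f_1(x) = x - 1$):

$$\min_x \; (x - 2)^2 \quad \text{s.t.} \quad x - 1 \leq 0$$

The unconstrained minimizer $x = 2$ violates the constraint, and $(x - 2)^2$ is decreasing for $x \lt 2$, so the constrained minimizer is $x^* = 1$ with objective value $1$.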

Most analytics and ML models can be formulated as optimization problems.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.style.use('fivethirtyeight')

## Continuous
Continuous optimization includes both `convex` and `non-convex` problems.

#### Convex Set
A set $C$ is convex if it contains the line segment between any two of its points:

$x_1, x_2 \in C, \; 0 \leq \theta \leq 1 \Rightarrow \theta x_1 + (1 - \theta)x_2 \in C$

#### Convex Function 
`f` is convex if $\operatorname{dom} f$ is a convex set and
$f(\theta x + (1 - \theta) y ) \leq \theta f(x) + (1 - \theta)f(y)$ for all $x, y \in \operatorname{dom} f$, $0 \leq \theta \leq 1$
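
The inequality above can be spot-checked numerically. The following is a minimal sketch for one-dimensional functions; the helper names `violates_convexity` and `looks_convex`, the tolerance, and the sampling interval are assumptions made for illustration. A single violation disproves convexity, while passing every trial is only evidence, not a proof.

```python
import random


def violates_convexity(f, x, y, theta, tol=1e-9):
    """Return True if f breaks the convexity inequality at (x, y, theta)."""
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    # tol absorbs floating-point rounding error
    return lhs > rhs + tol


def looks_convex(f, low, high, trials=10_000, seed=0):
    """Randomly test the convexity inequality on the interval [low, high]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(low, high)
        y = rng.uniform(low, high)
        theta = rng.random()
        if violates_convexity(f, x, y, theta):
            return False  # found a counterexample: f is not convex
    return True  # no counterexample found in the sampled points


print(looks_convex(lambda x: x ** 2, -5, 5))     # x^2 is convex
print(looks_convex(lambda x: -(x ** 2), -5, 5))  # -x^2 is concave, not convex
```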

#### Strictly Convex 
`f` is strictly convex if $\operatorname{dom} f$ is a convex set and
$f(\theta x + (1 - \theta) y ) \lt \theta f(x) + (1 - \theta)f(y)$ for all $x, y \in \operatorname{dom} f$ with $x \neq y$, $0 \lt \theta \lt 1$


How do we verify whether a function is convex? If the function is differentiable, we can compare it with its first-order approximation from `Taylor's theorem`. `f` is convex iff, for all feasible `x` and `y`, the function lies on or above that approximation:

$f(y) \geq f(x) + \nabla f(x)^T(y-x)$
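
As a worked check of the first-order condition $f(y) \geq f(x) + \nabla f(x)^T(y-x)$, take the illustrative function $f(x) = x^2$ (chosen here as an example), whose derivative is $2x$:

$$\begin{aligned}
f(y) - f(x) - \nabla f(x)^T (y - x) &= y^2 - x^2 - 2x(y - x) \\
&= y^2 - 2xy + x^2 \\
&= (y - x)^2 \geq 0
\end{aligned}$$

The gap is non-negative for every pair $x, y$, so $f(x) = x^2$ satisfies the condition and is convex.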

#### Properties of convex functions
* Any locally optimal point is also globally optimal.
* For differentiable convex `f`, a feasible `x` is optimal iff $\nabla f(x)^T(y - x) \geq 0$ for all feasible `y`. No feasible direction reduces the value of the function, so `x` is optimal.


## Set Properties

### Closed Set
* A set is `closed` if it includes its `boundary points`.
* Formally, a set $X$ is closed if the limit of every convergent sequence in $X$ also belongs to $X$, i.e. if $\{x^i\} \subset X$ and $\lim_{i\rightarrow \infty}x^i = x^0$, then $x^0 \in X$.
 * $X = \mathbb{R}$ is `closed`
 * $X = \{x: 0 \lt x \le 1\}$ is `not closed`
* If a set is defined by continuous functions and none of the inequalities are strict, then the set is closed.

### Bounded Set
* A set is bounded if it can be enclosed in a large enough box.
* Formally, the set $X$ is bounded if there exists $M\ge0$ such that $\|x\| \le M$ for all $x \in X$
* A set that is both bounded and closed is called compact. 


#### Examples
* $X = \mathbb{R}^2$ is `closed` but `not bounded`
* $X = \{(x, y): x^2+y^2 \lt 1\}$ is `bounded` but `not closed`
* $X = \{(x, y): x^2+y^2 \ge 1\}$ is `closed` but `not bounded`
* $X = \{(x, y): x^2+y^2 \le 1\}$ is `closed` and `bounded` (compact)
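
The sequence definition of closedness can be illustrated with the two unit disks from the examples. In this small sketch (the function names and the particular sequence are assumptions made for illustration), every point $(1 - 1/i, 0)$ lies in the open disk, but the limit $(1, 0)$ does not, which is why the open disk is not closed.

```python
def in_open_disk(point):
    """Membership test for {(x, y): x^2 + y^2 < 1}."""
    x, y = point
    return x ** 2 + y ** 2 < 1


def in_closed_disk(point):
    """Membership test for {(x, y): x^2 + y^2 <= 1}."""
    x, y = point
    return x ** 2 + y ** 2 <= 1


# The sequence (1 - 1/i, 0) for i = 1, 2, ... converges to (1, 0).
sequence = [(1 - 1 / i, 0.0) for i in range(1, 1001)]
limit = (1.0, 0.0)

print(all(in_open_disk(p) for p in sequence))  # True: every term is in the open disk
print(in_open_disk(limit))                     # False: the limit escapes the open disk
print(in_closed_disk(limit))                   # True: the closed disk keeps its limit
```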

## First Order Methods
First-order methods use the first derivative (gradient) to find an optimal solution. Sometimes the problem has a `closed form solution`, as in `linear regression`, where we can take the first derivative, set it to 0, and solve for the point that minimizes the function. Below are some options for when we don't have a closed form solution.
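
As a sketch of the closed-form case for linear regression, assume a design matrix $X$, a target vector $y$, and a weight vector $w$ (this notation is introduced here for illustration and is separate from the scalar $x$ used elsewhere in these notes). Setting the gradient of the least-squares objective to zero gives:

$$\begin{aligned}
f(w) &= \|Xw - y\|^2 \\
\nabla f(w) &= 2X^T(Xw - y) = 0 \\
X^T X w &= X^T y \\
w^* &= (X^T X)^{-1} X^T y
\end{aligned}$$

The last step assumes $X^T X$ is invertible.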

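For convex functions with a Lipschitz-continuous gradient and a suitable fixed step size, `GD` converges in objective value at a rate of $O(\frac{1}{k})$ after $k$ iterations, which is sublinear. `Accelerated Gradient Descent` improves this to $O(\frac{1}{k^2})$.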
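
For reference, a minimal sketch of plain gradient descent in one dimension follows. The function name `gradient_descent`, the test function $f(x) = (x - 3)^2$, and the step size are assumptions chosen for illustration. This particular quadratic is strongly convex, so it converges much faster than the worst-case sublinear rate.

```python
def gradient_descent(grad, x0, step_size, num_iters):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad(x)
    return x


# Illustrative objective: f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0, step_size=0.1, num_iters=100)
print(x_min)  # close to the true minimizer x = 3
```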

### Accelerated Algorithms

### Stochastic Gradient Descent 
Consider $F(x) = \sum_{i=1}^n f_i(x)$ with convex $f_i(x)$. For example, to estimate the mean of a population from observations $y_1, \dots, y_n$, a natural loss function to minimize is $F(x) = \sum_{i=1}^n (y_i - x)^2$

Using `gradient descent` could be very expensive, since every iteration has to compute the gradient of all $n$ functions $f_i$. Stochastic gradient descent is cheaper per iteration because each update uses the gradient of a single, randomly chosen $f_i$.

1: initialize $x$<br>
2: for k = 1 to $K$ do<br>
&nbsp;&nbsp;&nbsp;&nbsp;3: for j = 1 to $n$ do<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4: sample an index $i$ uniformly at random from $\{1, \dots, n\}$<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5: update $x \leftarrow x - \alpha \nabla f_i(x)$<br>
&nbsp;&nbsp;&nbsp;&nbsp;6: end for<br>
7: end for<br>
8: return $x$<br>
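
The pseudocode above can be sketched in Python for the mean-estimation loss $f_i(x) = (y_i - x)^2$. The function name `sgd_mean`, the synthetic data, and the default step size and epoch count are assumptions made for illustration. With a constant step size the iterate does not settle exactly on the minimizer; it fluctuates in a small neighborhood around it.

```python
import random


def sgd_mean(ys, alpha=0.001, num_epochs=50, x_init=0.0, seed=0):
    """Estimate the minimizer of sum_i (y_i - x)^2 with constant-step SGD."""
    rng = random.Random(seed)
    n = len(ys)
    x = x_init
    for _ in range(num_epochs):          # outer loop: K passes
        for _ in range(n):               # inner loop: n updates per pass
            i = rng.randrange(n)         # sample an index uniformly at random
            grad_i = -2.0 * (ys[i] - x)  # gradient of f_i(x) = (y_i - x)^2
            x = x - alpha * grad_i       # SGD update
    return x


# Synthetic observations (illustrative only).
data_rng = random.Random(42)
ys = [data_rng.gauss(5.0, 2.0) for _ in range(200)]

estimate = sgd_mean(ys)
sample_mean = sum(ys) / len(ys)
print(estimate, sample_mean)  # the two values should be close
```

The exact minimizer of this loss is the sample mean, so comparing `estimate` with `sample_mean` gives a quick sanity check.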

----

## Second Order Methods

## Newton's Method

## Quasi-Newton Method

## Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm

---