# Convex Optimization, Chapter 1: Introduction
## Topics to cover
### What is mathematical optimization
### Least-squares and linear programming
### Convex optimization
### How to read the book

## What is a mathematical optimization problem?
* It can be written in the standard (canonical) form
  \begin{align}
    \text{minimize} \quad & f_0(x) \\
    \text{such that} \quad & f_i(x) \leq b_i, i=1,\dots,m
  \end{align}
* Vocabulary
  * $x = (x_0, \dots, x_{n-1})$ is the optimization variable of the problem
  * $f_0 : \mathbb{R}^n \rightarrow \mathbb{R}$ is the objective function
  * $f_i : \mathbb{R}^n \rightarrow \mathbb{R}$, $i = 1, \dots, m$, are the (inequality) constraint functions
  * The constants $b_1, \dots, b_{m}$ are the limits, or bounds, for the constraints
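As an illustration, the standard form can be mirrored directly in code by storing each constraint as a pair $(f_i, b_i)$. The sketch below uses a hypothetical toy instance (the names `f0`, `constraints` and `is_feasible` are illustrative, not from the book): minimize $x_0^2 + x_1^2$ subject to $-x_0 - x_1 \leq -1$ and $x_0 \leq 2$, whose optimal point is $x^* = (0.5, 0.5)$.

```python
# Hypothetical toy instance of the standard form:
#   minimize   f0(x) = x_0^2 + x_1^2
#   subject to f1(x) = -x_0 - x_1 <= b1 = -1   (i.e. x_0 + x_1 >= 1)
#              f2(x) =  x_0        <= b2 =  2

def f0(x):
    """Objective function."""
    return x[0] ** 2 + x[1] ** 2

# Each constraint is a pair (f_i, b_i) meaning f_i(x) <= b_i
constraints = [
    (lambda x: -x[0] - x[1], -1.0),
    (lambda x: x[0], 2.0),
]

def is_feasible(x, tol=1e-12):
    """True when every inequality f_i(x) <= b_i holds (up to tol)."""
    return all(fi(x) <= bi + tol for fi, bi in constraints)

x = (0.5, 0.5)
print(f0(x), is_feasible(x))       # 0.5 True
print(is_feasible((0.0, 0.0)))     # False: violates x_0 + x_1 >= 1
```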

## Classes of optimization problems
* The type of optimization problem depends on the nature of the functions $f_0, f_1, \dots, f_m$
  * The problem is a linear program if all the $f_i$, $i = 0, \dots, m$, are linear, i.e. $f_i(\alpha x + \beta y)=\alpha f_i(x)+\beta f_i(y)$ for all $x, y \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$
  * A problem that is not linear is called a nonlinear program; examples include quadratic, geometric, CP and second-order cone (SOCP) programs
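* The book focuses on convex optimization problems, which include linear programs as a special case
  * The objective and constraint functions $f_0, \dots, f_m$ must be convex, i.e.: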
    \begin{align}
      & f_i(\alpha x + \beta y) \leq \alpha f_i(x)+\beta f_i(y)\\
      & \forall x,y \in \mathbb{R}^n \text{ and } \forall \alpha, \beta \in \mathbb{R}^+ \text{ s.t } \alpha + \beta=1 
    \end{align}
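The convexity inequality can be probed numerically. The sketch below (a hypothetical helper, `satisfies_convexity`) samples random $x$, $y$ and $\alpha \in [0, 1)$ with $\beta = 1 - \alpha$; such a sampled test can only refute convexity, never prove it. It also shows that a linear function satisfies the inequality with equality, which is why linear programs are convex problems.

```python
import numpy as np

rng = np.random.default_rng(0)

def satisfies_convexity(f, n, trials=1000, tol=1e-9):
    """Sampled check of f(alpha x + beta y) <= alpha f(x) + beta f(y), beta = 1 - alpha.

    Returning True only means no counterexample was found among the samples.
    """
    for _ in range(trials):
        x, y = rng.normal(size=n), rng.normal(size=n)
        alpha = rng.uniform()          # alpha in [0, 1)
        beta = 1.0 - alpha
        if f(alpha * x + beta * y) > alpha * f(x) + beta * f(y) + tol:
            return False               # counterexample found: f is not convex
    return True

c = np.array([1.0, -2.0, 0.5])

def linear(x):       # linear: the inequality holds with equality
    return c @ x

def sq_norm(x):      # ||x||_2^2, convex
    return x @ x

def neg_sq_norm(x):  # -||x||_2^2, concave and not convex
    return -(x @ x)

print(satisfies_convexity(linear, 3))       # True
print(satisfies_convexity(sq_norm, 3))      # True
print(satisfies_convexity(neg_sq_norm, 3))  # False
```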
    

## Solving optimization problems
* Many algorithms exist, each suited to particular classes of problems
  * Some problems can be solved with several methods
  * The choice depends on the number of variables and the number of constraints
  * It also depends on specific problem structure, e.g. sparsity: each constraint involves only a few variables
  * Smooth objective and constraint functions do not guarantee an easy problem



## Least-squares and linear programming
* Least squares is ubiquitous in science and engineering. It is an unconstrained problem:
  \begin{align}
    \text{minimize } f_0(x) = \|Ax-b\|_2^2 = \sum_{i=0}^{k-1} \left(a_i^t x - b_i\right)^2
  \end{align}
  where $A \in \mathbb{R}^{k \times n}$ and $a_i^t$ denotes the $i$-th row of $A$
* Least squares owes part of its success to its implicit Bayesian (statistical) interpretation
* It is the maximum-likelihood estimator when the measurement noise is homoscedastic and normally distributed
* It is smooth (infinitely differentiable) and convex; it is strongly convex when $A$ has full column rank
  * Its gradient reads (the factor $\frac{1}{2}$ is introduced for convenience and does not change the minimizer)
    \begin{align}
      f_0(x) &= \frac{1}{2} \|Ax-b\|_2^2\\
      &=  \frac{1}{2} x^t A^t A x - x^t A^t b + \frac{1}{2} b^t b \\
      f_0'(x) &= A^t A x - A^t b
    \end{align}
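  * For a differentiable convex function, the gradient vanishes exactly at the minimizers: $f_0'(x) = 0 \Leftrightarrow A^t A x = A^t b$ (the normal equations)
* It has a closed-form solution $x^* = A^+ b$, where $A^+$ is the Moore-Penrose pseudo-inverse; when $A$ has full column rank, $A^+ = \left(A^t A \right)^{-1} A^t$
* For a well-conditioned matrix $A \in \mathbb{R}^{k \times n}$, the solution can be computed in about $n^2 k$ operations
* Solving the normal equations can be seen as a single step of Newton's method (a second-order method): since $f_0$ is quadratic, one Newton step from any starting point lands exactly on the minimizer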
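A minimal NumPy sketch, assuming a random (hence, with high probability, well-conditioned) $A \in \mathbb{R}^{k \times n}$ with $k > n$: it solves the normal equations directly, compares the result with `numpy.linalg.lstsq` (an SVD-based solver), and checks that a single Newton step from an arbitrary point already reaches the minimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 50, 5                       # k observations, n unknowns (A is k x n)
A = rng.normal(size=(k, n))
b = rng.normal(size=k)

# 1) Normal equations A^t A x = A^t b (fine when A is well conditioned)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# 2) Library least-squares solver (SVD based, more robust numerically)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# 3) One Newton step from an arbitrary starting point x0:
#    gradient g = A^t A x0 - A^t b, Hessian H = A^t A
x0 = rng.normal(size=n)
g = A.T @ A @ x0 - A.T @ b
x_newton = x0 - np.linalg.solve(A.T @ A, g)

print(np.allclose(x_normal, x_lstsq), np.allclose(x_newton, x_lstsq))  # True True
```

In practice `lstsq` (or a QR factorization) is preferred over forming $A^t A$ explicitly, since forming it squares the condition number.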

## Least squares as a statistical estimator
* Let us make this interpretation explicit
  * $b = Ax^* + \epsilon$, where $\epsilon$ is the outcome of a random process following a homoscedastic multivariate normal law: $\epsilon \sim \mathcal{N}(0,\sigma^2 I)$
  * Bayes' theorem: $p(x|b) = \frac{p(x) \times p(b|x)}{p(b)}$
  * $b$ is the observation vector; without prior information, $p(b)$ is treated as a constant (it does not depend on $x$): $p(b)=\alpha_0$
  * $x$ is the candidate solution; without prior information, it is also supposed equiprobable over $\mathbb{R}^{n}$ (a flat prior): $p(x)=\alpha_1$
  * Each component of the conditional probability reads $p(b_i|x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{((Ax)_i-b_i)^2}{2\sigma^2}\right)$
  * Given $x$, the $b_i$ are independent with the same noise distribution, so $p(b|x) = \prod_{i=0}^{k-1} p(b_i|x)$
  * In vector form: $p(b|x) = \frac{1}{(2\pi\sigma^2)^{k/2}} \exp\left(-\frac{\|Ax-b\|_2^2}{2\sigma^2}\right)$
  * The posterior probability of $x$ then reads:
  \begin{equation}
    p(x|b) = \frac{\alpha_1}{\alpha_0 (2\pi\sigma^2)^{k/2}} \exp\left(-\frac{\|Ax-b\|_2^2}{2\sigma^2}\right)
  \end{equation}
  * Taking the logarithm, dropping the constant terms and multiplying by $\sigma^2 > 0$, maximizing this posterior amounts to:
  \begin{align}
    &\underset{x \in \mathbb{R}^n}{\max} \quad -\frac{1}{2}\|Ax - b\|_{2}^{2} \\
    \Leftrightarrow \quad &\underset{x \in \mathbb{R}^n}{\min} \quad \frac{1}{2}\|Ax - b\|_{2}^{2}
  \end{align}
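The equivalence can be checked numerically. This sketch assumes synthetic data generated as $b = Ax^* + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ (the values of `sigma` and `x_true` are arbitrary choices). It verifies that the sum of the $k$ marginal log-densities equals the log of the vector-form density, and that the least-squares solution has a higher log-likelihood (hence, under the flat prior, a higher posterior) than nearby perturbed points.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n, sigma = 200, 3, 0.1
A = rng.normal(size=(k, n))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true + sigma * rng.normal(size=k)   # b = A x* + eps, eps ~ N(0, sigma^2 I)

def log_likelihood(x):
    """log p(b|x) as the sum of the k marginal log-densities log p(b_i|x)."""
    r = A @ x - b
    marginals = -0.5 * np.log(2 * np.pi * sigma**2) - r**2 / (2 * sigma**2)
    return marginals.sum()

def vector_form(x):
    """log of (2 pi sigma^2)^(-k/2) exp(-||Ax - b||^2 / (2 sigma^2))."""
    r = A @ x - b
    return -0.5 * k * np.log(2 * np.pi * sigma**2) - r @ r / (2 * sigma**2)

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# The least-squares solution maximizes the log-likelihood: perturbations lower it
perturbed = [x_ls + 1e-3 * rng.normal(size=n) for _ in range(10)]
print(np.isclose(log_likelihood(x_ls), vector_form(x_ls)))               # True
print(all(log_likelihood(p) < log_likelihood(x_ls) for p in perturbed))  # True
```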

## Least squares generalization 1
* Weighted least squares (heteroscedastic, possibly correlated noise $\epsilon \sim \mathcal{N}(0, \Sigma)$)
  * The conditional probability of the observations now reads
    \begin{align}
      p(b|x) &= \frac{1}{\sqrt{(2\pi)^k \det(\Sigma)}} e^{-\frac{1}{2} (Ax-b)^{\intercal}\Sigma^{-1}(Ax-b)}
    \end{align}
  * $\Sigma$ is a covariance matrix: symmetric and, assuming it is invertible, positive definite. It can be diagonalized in an orthonormal basis, $\Sigma = Q^t D Q$ with $Q$ orthogonal and $D$ diagonal with positive entries, so that $\Sigma^{-1} = Q^t D^{-1} Q$:
    \begin{align}
      &\underset{x \in \mathbb{R}^n}{\max} \quad -\frac{1}{2} (Ax-b)^t\Sigma^{-1}(Ax-b) \\
      \Leftrightarrow \quad &\underset{x \in \mathbb{R}^n}{\min} \quad \frac{1}{2} (Ax-b)^t Q^t D^{-1} Q (Ax-b) \\
      \Leftrightarrow \quad &\underset{x \in \mathbb{R}^n}{\min} \quad \frac{1}{2} (Ax-b)^t Q^t D^{-\frac{1}{2}}D^{-\frac{1}{2}} Q (Ax-b) \\
      \Leftrightarrow \quad &\underset{x \in \mathbb{R}^n}{\min} \quad \frac{1}{2} \|D^{-\frac{1}{2}} Q Ax - D^{-\frac{1}{2}} Qb\|_2^2 \\
      \Leftrightarrow \quad &\underset{x \in \mathbb{R}^n}{\min} \quad \frac{1}{2} \|A'x - b'\|_2^2
    \end{align}
  * with $A' = D^{-\frac{1}{2}} Q A$ and $b' = D^{-\frac{1}{2}} Q b$: this is an ordinary least-squares problem on "whitened" data
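A sketch of the whitening step with NumPy, assuming a randomly generated positive definite $\Sigma$. `numpy.linalg.eigh` returns $\Sigma = V \operatorname{diag}(w) V^t$, so $Q = V^t$ and $D = \operatorname{diag}(w)$ in the notation above. Ordinary least squares on $(A', b')$ is compared with the direct solution of the weighted normal equations $A^t \Sigma^{-1} A x = A^t \Sigma^{-1} b$.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 40, 4
A = rng.normal(size=(k, n))
b = rng.normal(size=k)

# A random symmetric positive definite covariance matrix Sigma
M = rng.normal(size=(k, k))
Sigma = M @ M.T + k * np.eye(k)

# eigh returns Sigma = V diag(w) V^t, hence Q = V^t and D = diag(w)
w, V = np.linalg.eigh(Sigma)
Q = V.T
W = np.diag(w ** -0.5) @ Q          # D^{-1/2} Q, the whitening transform

A_prime, b_prime = W @ A, W @ b
x_whitened, *_ = np.linalg.lstsq(A_prime, b_prime, rcond=None)

# Direct solution of the weighted normal equations
Sigma_inv = np.linalg.inv(Sigma)
x_direct = np.linalg.solve(A.T @ Sigma_inv @ A, A.T @ Sigma_inv @ b)

print(np.allclose(x_whitened, x_direct))  # True
```

Any factor $W$ with $W^t W = \Sigma^{-1}$ works (a Cholesky factor is a common, cheaper choice); the eigendecomposition is used here only to match the derivation.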

## Least squares generalization 2
* Tikhonov regularization favors minimum-energy (small-norm) solutions; it also has a statistical interpretation, as the maximum a posteriori estimate under a zero-mean Gaussian prior on $x$
  \begin{align}
     \underset{x \in \mathbb{R}^n}{\min} \quad \frac{1}{2} \|Ax-b\|_2^2 + \alpha \|x\|_2^2, \quad \alpha > 0 \text{ fixed}
  \end{align}
* The gradient of the objective now reads
    \begin{align}
      f_0(x) &= \frac{1}{2} \|Ax-b\|_2^2 + \alpha \|x\|_2^2\\
      &=  \frac{1}{2} x^t A^t A x + \alpha x^t x - x^t A^t b + \frac{1}{2} b^t b \\
      &=  \frac{1}{2} x^t (A^t A + 2 \alpha Id) x - x^t A^t b + \frac{1}{2} b^t b \\
      f_0'(x) &= (A^t A + 2 \alpha Id) x - A^t b
    \end{align}
  * The gradient vanishes at the minimum: $f_0'(x) = 0 \Leftrightarrow (A^t A + 2 \alpha Id) x = A^t b$
  * For $\alpha > 0$, $A^t A + 2 \alpha Id$ is symmetric positive definite, hence invertible, even when $A$ does not have full column rank; there is always a closed-form solution $x^* = \left(A^t A + 2 \alpha Id \right)^{-1} A^t b$
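A minimal sketch of the closed-form Tikhonov solution, assuming more unknowns than observations ($n > k$), so that $A^t A$ alone is singular and the plain normal equations cannot be solved. The helper name `tikhonov` is illustrative. The sketch checks that the gradient vanishes and that a larger $\alpha$ yields a smaller-norm solution.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n = 20, 30                      # more unknowns than observations: A^t A is singular
A = rng.normal(size=(k, n))
b = rng.normal(size=k)

def tikhonov(A, b, alpha):
    """Minimizer of 1/2 ||Ax - b||^2 + alpha ||x||^2, via (A^t A + 2 alpha I) x = A^t b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + 2 * alpha * np.eye(n), A.T @ b)

x_small = tikhonov(A, b, alpha=0.01)
x_large = tikhonov(A, b, alpha=10.0)

# The gradient (A^t A + 2 alpha I) x - A^t b vanishes at the solution
grad = (A.T @ A + 2 * 0.01 * np.eye(n)) @ x_small - A.T @ b
print(np.allclose(grad, 0, atol=1e-8))                     # True
# Stronger regularization gives a smaller-norm ("minimum energy") solution
print(np.linalg.norm(x_large) < np.linalg.norm(x_small))   # True
```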

## Linear programming
* A linear program reads
  \begin{align}
    \text{minimize} \quad & c^t x\\
    \text{s.t.} \quad & a_i^t x \leq b_i, \quad i=0,1,\dots, m-1
  \end{align}
* There is no simple analytical solution, but reliable and efficient algorithms are implemented in solvers: the simplex method and interior-point methods
* Interior-point methods solve it in roughly $n^2 m$ operations in practice (assuming $m \geq n$)
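As an illustration, here is a tiny LP solved with SciPy's `scipy.optimize.linprog`, assuming SciPy is available; `method="highs"` selects the HiGHS solvers (simplex and interior-point implementations). Note that `linprog` imposes $x \geq 0$ by default, so `bounds=(None, None)` is passed to keep every constraint in the explicit form $a_i^t x \leq b_i$. The optimum of this toy problem is the vertex $x = (1, 3)$ with $c^t x = -7$.

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP in the form  minimize c^t x  s.t.  a_i^t x <= b_i
c = np.array([-1.0, -2.0])
A_ub = np.array([
    [1.0, 1.0],    # x0 + x1 <= 4
    [1.0, 0.0],    # x0      <= 3
    [0.0, 1.0],    # x1      <= 3
    [-1.0, 0.0],   # -x0     <= 0
    [0.0, -1.0],   # -x1     <= 0
])
b_ub = np.array([4.0, 3.0, 3.0, 0.0, 0.0])

# bounds=(None, None) makes x free: all constraints come from the rows of A_ub
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None), method="highs")
print(res.status, res.x, res.fun)  # status 0 means an optimum was found
```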

## How to read the book

> *Our main goal is to help the reader develop a working knowledge of
convex optimization, i.e., to develop the skills and background needed
to recognize, formulate, and solve convex optimization problems.*

* The book is divided into three main parts, titled Theory, Applications, and Algorithms

## Random examples
