# Optimization

## 1.1 - Quadratic Minimization

- A general quadratic function of $n$ variables can be written $$f(x) = \sum_{i = 1}^n c_i x_i + \frac{1}{2} \sum_{i = 1}^n \sum_{j = 1}^n \gamma_{ij} x_i x_j.$$
- Let $\mathbf{c} = (c_1, \dots, c_n)^{\mathsf T}$, $\mathbf{x} = (x_1, \dots, x_n)^{\mathsf{T}}$, and $\mathbf{C} = [\gamma_{ij}]$, so that we obtain the more compact form $$f(\mathbf{x}) = \mathbf{c}^{\mathsf T} \mathbf{x} + \frac{1}{2} \mathbf{x}^{\mathsf{T}} \mathbf{C} \mathbf{x}. \tag{1.1}$$

Here, we see that $f: \mathbb{R}^n \to \mathbb{R}$ is a real-valued function.

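- **We assume $\mathbf{C}$ is symmetric**; otherwise, it may be replaced by $\frac{1}{2}[\mathbf{C} + \mathbf{C}^{\mathsf{T}}]$, which is symmetric and gives the same value of $\mathbf{x}^{\mathsf{T}} \mathbf{C} \mathbf{x}$ for every $\mathbf{x}$.
- **We assume $\mathbf{C}$ is positive semidefinite**, i.e., $\mathbf{s}^{\mathsf{T}} \mathbf{C} \mathbf{s} \geq 0$ for all $\mathbf{s} \in \mathbb{R}^n$. (Note that "SPD" usually denotes symmetric positive *definite*, the stronger condition $\mathbf{s}^{\mathsf{T}} \mathbf{C} \mathbf{s} > 0$ for all $\mathbf{s} \neq 0$.)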
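Both assumptions can be checked numerically. The following is a minimal sketch assuming NumPy. `symmetrize` and `is_positive_semidefinite` are hypothetical helper names, and the tolerance is an arbitrary choice meant to absorb rounding error. Because $\mathbf{s}^{\mathsf{T}} \mathbf{C} \mathbf{s}$ depends only on the symmetric part of $\mathbf{C}$, positive semidefiniteness is tested on the eigenvalues of that part.

```python
import numpy as np

def symmetrize(C):
    """Return the symmetric part (C + C^T) / 2, which defines the same quadratic form."""
    C = np.asarray(C, dtype=float)
    return 0.5 * (C + C.T)

def is_positive_semidefinite(C, tol=1e-12):
    """Check s^T C s >= 0 for all s via the eigenvalues of the symmetric part of C."""
    eigenvalues = np.linalg.eigvalsh(symmetrize(C))
    return bool(np.all(eigenvalues >= -tol))

# Hypothetical non-symmetric matrix; its symmetric part is [[2, 1], [1, 2]]
C = np.array([[2.0, 3.0], [-1.0, 2.0]])
C_sym = symmetrize(C)
s = np.array([0.3, -1.7])

# The two quadratic forms agree (up to rounding)
print(s @ C @ s, s @ C_sym @ s)
print(is_positive_semidefinite(C))
```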

- The gradient of (1.1) with respect to $\mathbf{x}$ is $$\begin{align*} \nabla f(\mathbf{x}) &= \nabla (\mathbf{c}^{\mathsf{T}}\mathbf{x}) + \frac{1}{2} \nabla (\mathbf{x}^{\mathsf{T}} \mathbf{C} \mathbf{x}) \\ &= \mathbf{c} + \frac{1}{2} (\mathbf{C} + \mathbf{C}^{\mathsf{T}}) \mathbf{x} \\ &= \mathbf{c} + \mathbf{C}\mathbf{x}, \tag{1.2}\end{align*}$$ where the last step uses the symmetry of $\mathbf{C}$.
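Formula (1.2) can be sanity-checked against central finite differences. The following is a minimal sketch assuming NumPy. It uses the data of the example below ($\mathbf{c} = (-4, -4)^{\mathsf{T}}$, $\mathbf{C} = 2\mathbf{I}$) at an arbitrary test point. The helper names and the step size `h` are hypothetical choices.

```python
import numpy as np

def quadratic(c, C, x):
    """f(x) = c^T x + 1/2 x^T C x, as in (1.1)."""
    return c @ x + 0.5 * x @ C @ x

def quadratic_gradient(c, C, x):
    """Gradient (1.2): c + C x, valid when C is symmetric."""
    return c + C @ x

def central_difference_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

c = np.array([-4.0, -4.0])
C = np.array([[2.0, 0.0], [0.0, 2.0]])
x = np.array([0.5, 1.5])  # arbitrary test point

print(quadratic_gradient(c, C, x))
print(central_difference_gradient(lambda z: quadratic(c, C, z), x))
```

For a quadratic, the central difference is exact apart from rounding, so the two printed vectors should agree to many digits.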

> **Example:** Solve $$\min \{-4x_1 - 4x_2 + x_1^2 + x_2^2 \mid x_1 + x_2 = 2\}.$$

In the form of (1.1), we have $$f(\mathbf{x}) = \begin{bmatrix} -4 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$ The Lagrangian is therefore $$\mathcal{L}(\mathbf{x}, \lambda) = \begin{bmatrix} -4 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 2\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \lambda\left(2 - \begin{bmatrix} 1 & 1\end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)$$

Therefore, the gradient of $\mathcal{L}$ consists of $$\nabla_{\mathbf{x}} \mathcal{L} = \begin{bmatrix} -4 - \lambda \\ -4 -\lambda\end{bmatrix} + \begin{bmatrix} 2 & 0 \\ 0 & 2\end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad \text{and} \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = 2 - \begin{bmatrix} 1 & 1\end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

Setting $\nabla_{\mathbf{x}, \lambda} \mathcal{L} = 0$ gives the linear system $$\begin{bmatrix} 1 & 1 & 0 \\ 2 & 0 & -1 \\ 0 & 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 4\end{bmatrix} \implies \begin{bmatrix} x_1 \\ x_2 \\ \lambda \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -2\end{bmatrix}.$$ So the minimum is at $(x_1, x_2) = (1, 1)$, with objective value $f(1, 1) = -6$.

We can also use `scipy` in Python to solve this problem.

```python
from scipy.optimize import minimize, LinearConstraint
import numpy as np
# To use scipy optimize, follow these steps:
# 1. Define your objective function in terms of the variables x[0], x[1], ..., x[n - 1]
# 2. Define an initial guess x0 = [x0, x1, ..., xn-1]
# 3. Define your linear constraint using LinearConstraint(b, lb, ub);
#    for an equality constraint b^T x = d, set lb = ub = d
# 4. Call minimize(objective_function, x0, constraints = [linear_constraint])

def objective_function(x):
    return x[0]**2 + x[1]**2 - 4*x[0] - 4*x[1]

x0 = np.array([0.5, 0.5])
b = np.array([1, 1])
lb = 2
ub = 2

linear_constraint = LinearConstraint(b, lb, ub)

result = minimize(objective_function, x0, constraints = [linear_constraint])
# Optimal solution
print(result.x)

# Minimum objective value
print(result.fun)
```

```
[1. 1.]
-6.0
```


In general, when we consider the **model problem** $$\min \{\mathbf{c}^{\mathsf{T}} \mathbf{x} + \frac{1}{2} \mathbf{x}^{\mathsf{T}} \mathbf{C} \mathbf{x} \mid \mathbf{A}\mathbf{x} = \mathbf{b}\} \tag{1.4}$$ for $\mathbf{c}, \mathbf{x} \in \mathbb{R}^n$, $\mathbf{C} \in \mathbb{R}^{n \times n}$, $\mathbf{A} \in \mathbb{R}^{m \times n}$, $\mathbf{b} \in \mathbb{R}^m$, the **optimality conditions** for (1.4) are 
1. $\mathbf{A} \mathbf{x}_0 = \mathbf{b}$, and
2. there exists a vector $\mathbf{u}$ with $-\nabla f(\mathbf{x}_0) = \mathbf{A}^{\mathsf{T}} \mathbf{u}$, where $\nabla f(\mathbf{x}_0) = \mathbf{c} + \mathbf{C}\mathbf{x}_0$ by (1.2). The vector $\mathbf{u}$ is called the **multiplier vector** for the problem; it has one component for each row of $\mathbf{A}$.
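Both conditions can be checked at a candidate point. Condition 2 holds exactly when $-\nabla f(\mathbf{x}_0)$ lies in the column space of $\mathbf{A}^{\mathsf{T}}$. A least-squares solve detects this case, because the residual is then zero. The following is a minimal sketch assuming NumPy. `satisfies_optimality_conditions` and its tolerance are hypothetical choices, and the data are those of the example above.

```python
import numpy as np

def satisfies_optimality_conditions(c, C, A, b, x0, tol=1e-8):
    """Check conditions 1 and 2 for the model problem (1.4) at x0.

    Returns (ok, u), where u is a least-squares solution of A^T u = -grad f(x0).
    """
    grad = c + C @ x0                                   # gradient (1.2)
    feasible = np.allclose(A @ x0, b, atol=tol)         # condition 1
    u, *_ = np.linalg.lstsq(A.T, -grad, rcond=None)
    in_row_space = np.allclose(A.T @ u, -grad, atol=tol)  # condition 2
    return feasible and in_row_space, u

c = np.array([-4.0, -4.0])
C = np.array([[2.0, 0.0], [0.0, 2.0]])
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

print(satisfies_optimality_conditions(c, C, A, b, np.array([1.0, 1.0])))  # optimal point
print(satisfies_optimality_conditions(c, C, A, b, np.array([2.0, 0.0])))  # feasible, not optimal
```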

**Theorem.** 
1. $\mathbf{x}_0$ is optimal for (1.4) if and only if $\mathbf{x}_0$ satisfies the optimality conditions for (1.4).
2. $\mathbf{x}_0$ is optimal for (1.4) if and only if there exists an $m$-vector $\mathbf{u}$ such that $(\mathbf{x}_0, \mathbf{u})$ satisfies the linear equations $$\begin{bmatrix} \mathbf{C} & \mathbf{A}^{\mathsf{T}} \\ \mathbf{A} & 0\end{bmatrix} \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{u}\end{bmatrix} = \begin{bmatrix} - \mathbf{c} \\ \mathbf{b} \end{bmatrix}.$$  
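The block system in part 2 of the theorem can be solved directly with `numpy.linalg.solve`. The following is a minimal sketch, and `solve_equality_qp` is a hypothetical helper. The sketch assumes that the block matrix is nonsingular. This holds, for example, when $\mathbf{A}$ has full row rank and $\mathbf{C}$ is positive definite on the null space of $\mathbf{A}$. The sketch is not claimed to be the contents of `quadratic_minimization_script.py`, whose source is not shown.

```python
import numpy as np

def solve_equality_qp(c, C, A, b):
    """Solve min c^T x + 1/2 x^T C x s.t. Ax = b via the block linear system.

    Assumes the block matrix [[C, A^T], [A, 0]] is nonsingular.
    """
    n, m = C.shape[0], A.shape[0]
    K = np.block([[C, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([-c, b])
    solution = np.linalg.solve(K, rhs)
    return solution[:n], solution[n:]  # (x0, u)

# Data of the earlier example: c = (-4, -4), C = 2I, A = [1 1], b = 2
c = np.array([-4.0, -4.0])
C = np.array([[2.0, 0.0], [0.0, 2.0]])
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

x0, u = solve_equality_qp(c, C, A, b)
print("Optimal solution x =", x0)
print("Multipliers for Ax = b:", u)
```

For this example, the solve should return $\mathbf{x}_0 = (1, 1)^{\mathsf{T}}$ and $\mathbf{u} = 2$, since $2 x_1 + u = 4$ and $x_1 + x_2 = 2$.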

## 1.2 - Nonlinear Optimization

**Theorem.** Suppose:
1. $\mathbf{A}\mathbf{x}_0 = \mathbf{b}$,
2. there exists an $m$-vector $\mathbf{u}$ with $-\nabla f(\mathbf{x}_0) = \mathbf{A}^{\mathsf{T}} \mathbf{u}$, and
3. the **Hessian** $$H(\mathbf{x}) = \left[\frac{\partial^2 f(\mathbf{x})}{\partial x_i \partial x_j}\right]$$ evaluated at $\mathbf{x} = \mathbf{x}_0$ is positive definite.
Then, $\mathbf{x}_0$ is a local minimizer for $$\min \{f(\mathbf{x}) \mid \mathbf{A}\mathbf{x} = \mathbf{b}\}.$$ If, in addition, $H(\mathbf{x})$ is positive definite for all $\mathbf{x}$ such that $\mathbf{Ax} = \mathbf{b}$, then $\mathbf{x}_0$ is a global minimizer.

Note that for the quadratic function (1.1), $H(\mathbf{x}) = \mathbf{C}$ for every $\mathbf{x}$.
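Condition 3 can be checked numerically when no closed-form Hessian is at hand. The following is a minimal sketch assuming NumPy. It approximates $H(\mathbf{x}_0)$ with central differences, where `numerical_hessian`, `is_positive_definite`, and the step `h` are hypothetical choices, and then tests positive definiteness through the eigenvalues. For the quadratic example of Section 1.1, the result should be close to $\mathbf{C} = 2\mathbf{I}$.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-3):
    """Approximate H(x) = [d^2 f / dx_i dx_j] with central differences."""
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            ei, ej = h * I[i], h * I[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry

def is_positive_definite(H):
    """True if all eigenvalues of the symmetric matrix H are strictly positive."""
    return bool(np.all(np.linalg.eigvalsh(H) > 0))

def f(x):
    return x[0]**2 + x[1]**2 - 4*x[0] - 4*x[1]

H = numerical_hessian(f, np.array([1.0, 1.0]))
print(H)                        # approximately C = 2I
print(is_positive_definite(H))
```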

## 1.3 - Extreme Points

A **linear programming** (LP) problem is the problem of minimizing a linear function subject to linear inequality/equality constraints. 

Consider the **feasible region** $R$ defined by $$R = \{\mathbf{x} \mid \mathbf{a}_i^{\mathsf{T}} \mathbf{x} \leq b_i, \qquad i = 1, 2, \dots, m\},$$ where each $\mathbf{a}_i$ is a given $n$-vector and each $b_i$ is a given scalar. Written more compactly, with $\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_m]^{\mathsf{T}}$ and $\mathbf{b} = (b_1, \dots, b_m)^{\mathsf{T}}$, $$R = \{\mathbf{x} \mid \mathbf{Ax} \leq \mathbf{b}\}.$$

- A point $\mathbf{x}_0$ is **feasible** if $\mathbf{x}_0 \in R$ and **infeasible** otherwise.
- For any $i$ with $1 \leq i \leq m$, constraint $i$ is **active** at $\mathbf{x}_0$ if $\mathbf{a}_i^{\mathsf{T}} \mathbf{x}_0 = b_i$ and **inactive** at $\mathbf{x}_0$ if $\mathbf{a}_i^{\mathsf{T}}\mathbf{x}_0 < b_i$.
- A point $\mathbf{x}_0$ is an **extreme point** of $R$ if $\mathbf{x}_0 \in R$ and there are $n$ constraints active at $\mathbf{x}_0$ whose gradients $\mathbf{a}_i$ are linearly independent.
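These definitions translate directly into a test. A point is an extreme point when it is feasible and the rows $\mathbf{a}_i$ of its active constraints have rank $n$. The following is a minimal sketch assuming NumPy. `active_constraints` and `is_extreme_point` are hypothetical helpers. The tolerance and the triangle used as the example region are illustrative choices.

```python
import numpy as np

def active_constraints(A, b, x, tol=1e-9):
    """Indices i for which constraint i is active, i.e. a_i^T x = b_i within tol."""
    return np.flatnonzero(np.abs(A @ x - b) <= tol)

def is_extreme_point(A, b, x, tol=1e-9):
    """True if x is in R = {x | Ax <= b} and the gradients a_i of the active
    constraints include n linearly independent vectors (rank n)."""
    if np.any(A @ x > b + tol):
        return False  # x is infeasible
    n = A.shape[1]
    active = active_constraints(A, b, x, tol)
    if active.size < n:
        return False  # fewer than n active constraints
    return bool(np.linalg.matrix_rank(A[active]) == n)

# Hypothetical triangle: -x1 <= 0, -x2 <= 0, x1 + x2 <= 2
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([0.0, 0.0, 2.0])

for point in ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [1.0, 1.0], [3.0, 0.0]):
    x = np.array(point)
    print(point, active_constraints(A, b, x), is_extreme_point(A, b, x))
```

Here, $(0, 0)$ and $(2, 0)$ are extreme points. The points $(1, 0)$ and $(1, 1)$ are feasible but have only one active constraint, and $(3, 0)$ is infeasible.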

**Theorem.** Let $\mathbf{A} \in \mathbb{R}^{m \times n}$, and suppose the feasible region $R = \{\mathbf{x} \mid \mathbf{Ax} \leq \mathbf{b}\}$ is nonempty. Then $R$ has an extreme point if and only if $\mathrm{rank}(\mathbf{A}) = n$.

For the LP problem $$\min \{\mathbf{c}^{\mathsf{T}}\mathbf{x} \mid \mathbf{A} \mathbf{x} \leq \mathbf{b}\},$$ we have the following theorem.

**Theorem.** For the LP problem above, assume that $\mathrm{rank}(\mathbf{A}) = n$ and that the problem is feasible and bounded from below. Then at least one extreme point of $R$ is an optimal solution.

So when $\mathbf{A}$ has full column rank and the problem is bounded from below, we only need to examine the extreme points to find an optimal solution!
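For a small LP, the theorem suggests a brute-force method. Each choice of $n$ constraints with linearly independent gradients gives a candidate point. The method keeps the feasible candidates, which are exactly the extreme points, and returns the one with the smallest objective value. The following is a minimal sketch assuming NumPy. It is exponential in $m$ and meant only as an illustration; practical LP solvers such as the simplex method do not enumerate vertices this way. `extreme_points`, `solve_lp_by_extreme_points`, and the example LP are hypothetical.

```python
import itertools
import numpy as np

def extreme_points(A, b, tol=1e-9):
    """Enumerate extreme points of R = {x | Ax <= b} by brute force."""
    m, n = A.shape
    points = []
    for rows in itertools.combinations(range(m), n):
        idx = list(rows)
        A_sub = A[idx]
        if np.linalg.matrix_rank(A_sub) < n:
            continue  # these n gradients are linearly dependent
        x = np.linalg.solve(A_sub, b[idx])  # make the n chosen constraints active
        if np.all(A @ x <= b + tol):        # keep only feasible points
            points.append(x)
    return points

def solve_lp_by_extreme_points(c, A, b):
    """Return the extreme point with the smallest objective c^T x.

    Assumes rank(A) = n and that the LP is feasible and bounded from below,
    as in the theorem; otherwise the result is meaningless (or min() fails).
    """
    points = extreme_points(A, b)
    best = min(points, key=lambda x: c @ x)
    return best, c @ best

# Hypothetical LP: min -x1 - 2 x2  s.t.  x1 >= 0, x2 >= 0, x1 + x2 <= 4, x2 <= 3
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([0.0, 0.0, 4.0, 3.0])
c = np.array([-1.0, -2.0])

x_opt, value = solve_lp_by_extreme_points(c, A, b)
print(x_opt, value)  # expected: x approximately (1, 3), value approximately -7
```

The extreme points of this region are $(0, 0)$, $(0, 3)$, $(4, 0)$, and $(1, 3)$, with objective values $0$, $-6$, $-4$, and $-7$.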

> **Example:** Solve $$\min \{-\boldsymbol{\mu}^{\mathsf{T}}\mathbf{x} \mid \mathbf{x} \geq 0, \mathbf{l}^{\mathsf{T}}\mathbf{x} = 1\},$$ where $\mathbf{l} = (1, 1, \dots, 1)^{\mathsf{T}}$, $\mu_i$ is the expected return on asset $i$, and $x_i$ is the proportion of wealth invested in asset $i$, for $i = 1, 2, \dots, n$.

Minimizing $-\boldsymbol{\mu}^{\mathsf{T}}\mathbf{x}$ is equivalent to maximizing $\boldsymbol{\mu}^{\mathsf{T}}\mathbf{x}$. The given problem is therefore equivalent to maximizing the total expected return $\mu_1 x_1 + \cdots + \mu_n x_n$ subject to a **no short sales constraint** $(\mathbf{x} \geq 0)$ and the budget constraint, which requires the proportions to sum to 1.

Notice that:
- the budget constraint is *always active*;
- an extreme point needs $n$ active constraints with linearly independent gradients, so $n - 1$ of the non-negativity constraints must be active at any extreme point;
- the extreme points are therefore those at which 100% of wealth is invested in a single asset.

By the theorem above, one of the extreme points is an optimal solution. The objective function values at these points are just $-\mu_1, \dots, -\mu_n$, and the smallest of these occurs at the index $k$ for which $\mu_k = \max\{\mu_1, \dots, \mu_n\}$. The optimal holdings are $\mathbf{x}^\ast$, where $x^\ast_i = 0$ for $i = 1, \dots, n$, $i \neq k$, and $x_k^\ast = 1$.
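The extreme-point argument reduces the problem to an `argmax` over $\boldsymbol{\mu}$. The following is a minimal sketch assuming NumPy. `max_return_portfolio` is a hypothetical helper, and the expected returns are made-up numbers for illustration. When several assets share the largest $\mu_k$, any of them is optimal, and `np.argmax` simply picks the first.

```python
import numpy as np

def max_return_portfolio(mu):
    """Optimal holdings for min{-mu^T x | x >= 0, sum(x) = 1}.

    The extreme points are the unit vectors e_k, where the objective is -mu[k],
    so the optimum puts all wealth in an asset with the largest mu[k].
    """
    mu = np.asarray(mu, dtype=float)
    k = int(np.argmax(mu))  # ties: picks the first maximizing index
    x_star = np.zeros_like(mu)
    x_star[k] = 1.0
    return x_star, -mu[k]

# Hypothetical expected returns for three assets
mu = np.array([0.05, 0.12, 0.08])
x_star, value = max_return_portfolio(mu)
print(x_star, value)  # all wealth in the second asset (index 1); objective -0.12
```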

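$\star$ If one is trying to maximize expected return subject to a no short sales constraint, the optimal solution is to invest all wealth in the asset with the highest expected return.

Running the script `quadratic_minimization_script.py` (source not shown) gives output matching the quadratic example of Section 1.1. It reports the optimal solution and the multiplier vector $\mathbf{u}$ for $\mathbf{Ax} = \mathbf{b}$ in the convention of (1.4):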

```python
%run quadratic_minimization_script.py
```

```
Optimal solution x = [1. 1.]
Multipliers for Ax = b: [2.]
```
