---
title: 12.1 Applications
subject:  Optimization
subtitle: 
short_title: 12.1 Applications
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: 
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/10_Ch_11_PCA_Apps/121-Apps.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 21 - An introduction to unconstrained optimization, gradient descent, and Newton’s method.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be

## Learning Objectives

By the end of this page, you should know:
- 

## A Brief Introduction to Optimization

Several times this semester, we tried to find the
"best" vector (or matrix) among a collection of many such vectors or matrices.
For example, in least squares, we looked for the vector $x$ that minimizes
the expression $\|Ax - b\|^2$. In low-rank approximation, we looked for the
matrix $\hat{M}$ of rank $k$ (or less) that minimizes the expression $\|\hat{M} - M\|_F^2$.

These were both specific instances of what is called a *mathematical optimization
problem*. We will focus on *unconstrained problems*. You will learn a lot more
about optimization problems in ESE 3210; today is only meant to give you
a small taste and to show how essential linear algebra is in finding solutions.

Optimization is the process of finding one (or more) vectors $x \in \mathbb{R}^n$ that minimize
a function $f: \mathbb{R}^n \to \mathbb{R}$. This is written as the optimization problem

\begin{equation*}
\text{minimize } f(x) \qquad \text{(P)}
\end{equation*}

In (P), the variable $x \in \mathbb{R}^n$ is called the *decision variable*,
and the function $f: \mathbb{R}^n \to \mathbb{R}$ is called the *cost function* or *objective function*.
Optimization problem (P) is called *unconstrained* because we are free to pick any
$x \in \mathbb{R}^n$ we like to minimize $f(x)$. A *constrained* optimization problem has the additional
requirement that $x$ must satisfy some added conditions, e.g., lie in the solution set
of $Ax = b$. We will not consider such problems today, but you'll see many in
ESE 3210.

The goal of optimization is to find a special decision variable $x^*$ for which
the cost function $f$ is as small as possible, i.e., such that

\begin{equation}
f(x^*) \leq f(x) \quad \text{for all } x \in \mathbb{R}^n \quad (*)
\end{equation}

Such an $x^*$ is called an *optimal solution* to problem (P), and is defined as
the arg min of $f$:

\begin{equation}
x^* = \arg\min_{x \in \mathbb{R}^n} f(x). \quad (AM)
\end{equation}

Equation (AM) simply says in math that $x^*$ satisfies the definition (*) of
an optimal point. Strictly speaking, if there are multiple optimal points, we instead write

\begin{equation}
x^* \in \arg\min_{x \in \mathbb{R}^n} f(x)
\end{equation}

to indicate that $x^*$ belongs to the set of optimal points.
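
To make the arg min notation concrete, here is a minimal sketch that approximates $x^*$ for a scalar function by brute-force search over a grid. The function `f1`, the search interval $[-1, 5]$, and the grid spacing are illustrative assumptions (chosen so that the minimizer is $x^* = 2$); they are not defined anywhere in these notes.

```python
import numpy as np

def f1(x):
    # Hypothetical bowl-shaped cost function with minimizer x* = 2
    return (x - 2.0) ** 2 + 1.0

# Candidate decision variables: a fine grid on an assumed search interval
xs = np.linspace(-1.0, 5.0, 6001)

# np.argmin returns the index of the smallest cost, so xs[idx] approximates arg min f1
idx = np.argmin(f1(xs))
x_star = xs[idx]
f_star = f1(x_star)

print(f"x* ~ {x_star:.3f}, f(x*) ~ {f_star:.3f}")
```

Grid search like this is only practical for very low-dimensional $x$, since the number of grid points grows exponentially with $n$.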

**Example:** The least-squares problem

\begin{equation*}
\text{minimize } \|Ax - b\|^2
\end{equation*}

over $x \in \mathbb{R}^n$ is an unconstrained optimization problem. The objective function is
$f(x) = \|Ax - b\|^2$, and the optimal solution is

\begin{equation}
x^* = (A^TA)^{-1}A^Tb = A^{\dagger}b
\end{equation}

when $A^TA$ is invertible, i.e., when $A$ has linearly independent columns. Otherwise, the optimal solution is not unique, and
$x^* \in \arg\min_x \|Ax - b\|^2$ if and only if $x^*$ satisfies the normal equations $A^TAx^* = A^Tb$.
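
As a quick numerical check of this example, the sketch below solves the normal equations for randomly generated data and compares the result with NumPy's built-in least-squares solver and with the pseudoinverse. The matrix `A`, the vector `b`, their sizes, and the random perturbation test are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a tall random matrix A (linearly independent columns
# with probability 1) and a random right-hand side b
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)

# Optimal solution from the normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# The same solution from NumPy's least-squares solver and from the pseudoinverse
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
x_pinv = np.linalg.pinv(A) @ b

def f(x):
    # Least-squares objective ||Ax - b||^2
    return np.linalg.norm(A @ x - b) ** 2

# Spot check of optimality: randomly perturbed points should never have lower cost
perturbed_costs = [f(x_normal + 0.1 * rng.standard_normal(3)) for _ in range(1000)]

print(np.allclose(x_normal, x_lstsq), np.allclose(x_normal, x_pinv))
print(min(perturbed_costs) >= f(x_normal))
```

In practice, `np.linalg.lstsq` is usually preferred over forming $A^TA$ explicitly, since it also handles the case where $A^TA$ is not invertible.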

Despite how simple and innocuous problem (P) looks, it can be used to encode
very rich and very challenging problems. Even for $x \in \mathbb{R}$, we can get ourselves
into trouble. Consider the following two functions that we wish to minimize:

[Two graphs are drawn here, labeled $f_1(x)$ and $f_2(x)$]

Which do you think is easier to optimize? The left figure, with function $f_1(x)$, is
"bowl-shaped" and is smallest at $x^* = 2$. What's nice about $f_1$ is that there's also an obvious
algorithm for finding $x^* = 2$: If you imagine yourself as an ant standing on the function
$f_1(x)$, all you need to do is "walk downhill" until you eventually find the bottom of the bowl.

In contrast, the function $f_2(x)$ in the right figure has many hills and valleys. The
optimal point $x^* = 3$ is the one for which $f_2(x)$ is smallest. But now if we again imagine
ourselves as an ant standing on the function $f_2(x)$, our strategy of walking downhill will not
always work! For example, if we were to start at $x = 1.5$, then walking downhill would bring
us to the bottom of the first valley at $x = 1$. Now $x = 1$ is not an optimal point,
since $f_2(3) < f_2(1)$, but from our ant's perspective, $f_2(1)$ appears to be the
smallest value. A point $\tilde{x}$ that satisfies $f(\tilde{x}) \leq f(x)$ for all $x$ near
$\tilde{x}$, say for all $x$ with $|x - \tilde{x}| \leq \varepsilon$ for some $\varepsilon > 0$, is called a *local minimum*. When we wish to
emphasize that a point $x^*$ satisfying (*) is indeed the best possible point, we call it
a *global minimum*.
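
The ant's "walk downhill" strategy can be sketched in a few lines of code. The function `f2` below is a hypothetical stand-in with valleys near $x = 1$ and $x = 3$; it is not the function plotted in the figure, and the step size and starting points are likewise assumptions. The ant repeatedly moves to whichever neighboring point is lower and stops when neither neighbor improves.

```python
def f2(x):
    # Hypothetical stand-in for f_2: a shallow valley near x = 1, a deeper one near x = 3
    return (x - 1) ** 2 * (x - 3) ** 2 - 0.3 * x

def walk_downhill(f, x, step=1e-3, max_steps=100_000):
    # Move to the lower of the two neighboring points until neither is lower
    for _ in range(max_steps):
        best = min((x - step, x + step), key=f)
        if f(best) >= f(x):
            return x  # no downhill neighbor: we are (approximately) at a local minimum
        x = best
    return x

x_left = walk_downhill(f2, 1.5)   # starts on the slope of the first valley
x_right = walk_downhill(f2, 2.5)  # starts on the slope of the second valley

print(f"from 1.5: x = {x_left:.3f}, f2 = {f2(x_left):.3f}")
print(f"from 2.5: x = {x_right:.3f}, f2 = {f2(x_right):.3f}")
```

Both runs stop at a local minimum, but only the run started at $x = 2.5$ ends in the deeper valley; where we start determines which valley we find.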

As you may have guessed, we really like "bowl-shaped" functions for which our
walking downhill strategy finds a global minimum. Such functions satisfy a geometric
property called *convexity*. A convex function $f: \mathbb{R}^n \to \mathbb{R}$ is one which satisfies
the following property:

\begin{equation}
f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y) \quad \text{for all } x, y \in \mathbb{R}^n \text{ and all } \theta \in [0,1]. \quad (CVX)
\end{equation}

To understand what (CVX) is saying, it is best to draw what it means for a scalar
function $f: \mathbb{R} \to \mathbb{R}$.

[A graph is drawn here illustrating the convexity property]

(CVX) says that if I pick any two points $(x, f(x))$ and $(y, f(y))$ on the graph, and
draw a line segment between these two points, then this line segment lies on or above the graph. It
turns out that this is exactly the right way to mathematically characterize "bowl-shaped"
functions, even when $x \in \mathbb{R}^n$. The important feature of convex functions is that "walking
downhill" will always bring us to a global minimum, provided one exists. We will not say much more about
convex functions here, but you'll see them again in ESE 3210, and there is a graduate
level course, ESE 6050, which focuses entirely on convex optimization problems.
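
One way to build intuition for (CVX) is to test the inequality numerically at randomly sampled points. The sketch below is a spot check only: the helper names `cvx_gap` and `violates_cvx`, the sampling range $[-5, 5]$, and the two test functions are assumptions for illustration. A random search can reveal that a function is not convex by finding a violation, but it can never prove that a function is convex.

```python
import numpy as np

def cvx_gap(f, x, y, theta):
    # Right side minus left side of (CVX); nonnegative whenever the inequality holds
    return theta * f(x) + (1 - theta) * f(y) - f(theta * x + (1 - theta) * y)

def violates_cvx(f, n_trials=10_000, seed=0, tol=1e-12):
    # Sample x, y in [-5, 5] and theta in [0, 1]; report whether any sample violates (CVX)
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, n_trials)
    y = rng.uniform(-5, 5, n_trials)
    theta = rng.uniform(0, 1, n_trials)
    return bool(np.any(cvx_gap(f, x, y, theta) < -tol))

print(violates_cvx(lambda x: x ** 2))  # a convex function
print(violates_cvx(np.sin))            # a function with hills and valleys
```

The small tolerance `tol` guards against floating-point rounding being mistaken for a genuine violation.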

**Example:** The affine function $f(x) = a^Tx - b$, with $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$, is convex. To see this, we check that

\begin{equation*}
f(\theta x + (1-\theta)y) \leq \theta f(x) + (1-\theta)f(y) \text{ for all } x,y \in \mathbb{R}^n \text{ and } \theta \in [0,1].
\end{equation*}

Using $b = \theta b + (1-\theta)b$, we have

\begin{equation*}
f(\theta x + (1-\theta)y) = a^T(\theta x + (1-\theta)y) - b = \theta (a^Tx - b) + (1-\theta)(a^Ty - b) = \theta f(x) + (1-\theta)f(y),
\end{equation*}

so (CVX) holds with equality. In this sense, affine functions are on the "boundary" of being convex.

**Example:** The least-squares objective $f(x) = \|Ax - b\|^2$ is convex. One can check
this directly from the definition (CVX), but doing so is very tedious.

To gain some intuition, let's first consider the case of a scalar variable $x \in \mathbb{R}$, with $f(x) = \|ax - b\|^2$ for vectors $a, b \in \mathbb{R}^m$. Expanding
out $f(x)$ we see
\begin{equation*}
f(x) = x^2 \|a\|^2 - 2a^Tbx + \|b\|^2,
\end{equation*}

which is an upward pointing quadratic since $\|a\|^2 > 0$ for any $a \neq 0$. The same
intuition extends to the setting $x \in \mathbb{R}^n$. Expanding out $f(x) = \|Ax - b\|^2$, we get

\begin{equation*}
f(x) = x^T A^TAx - 2x^TA^Tb + \|b\|^2.
\end{equation*}

This is a quadratic function, with quadratic term given by the quadratic form
$x^T A^TAx$, defined by the positive semidefinite matrix $A^TA$. This means that $f(x)$ is an
upward pointing bowl (with ellipsoidal level sets when $A^TA$ is positive definite; recall from Lecture 16), and hence is convex.
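
Both claims in this example, the expansion of $\|Ax - b\|^2$ and the positive semidefiniteness of $A^TA$, are easy to sanity-check numerically. The sketch below does so for one randomly generated instance; the data `A`, `b`, `x` and their sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem data and test point
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = rng.standard_normal(3)

# Left side: the least-squares objective ||Ax - b||^2
lhs = np.linalg.norm(A @ x - b) ** 2

# Right side: the expanded quadratic x^T A^T A x - 2 x^T A^T b + ||b||^2
rhs = x @ (A.T @ A) @ x - 2 * x @ (A.T @ b) + b @ b

# A^T A is symmetric, so eigvalsh applies; positive semidefinite means
# every eigenvalue is nonnegative (up to rounding)
eigenvalues = np.linalg.eigvalsh(A.T @ A)

print(np.isclose(lhs, rhs))
print(eigenvalues)
```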

