# Lecture 1 Section 1: *Terminology and Existence/Uniqueness for Univariate Optimization*

# Part I: Preliminaries and Definitions
We let $\mathbb{R}$ denote the set of real numbers. Suppose $f:\mathbb{R}\rightarrow\mathbb{R}$ is a real-valued function. An **optimization program** involves a search for the largest (or, smallest) value $f$ attains, as well as any possible points $x$ for which $f(x)$ is exactly this largest (or, smallest) value. Such an $f$ is often called an **objective function** or just an **objective**.

To begin with, a **minimization program** over $\mathbb{R}$ is written
$$
(P_\min):\:\:\min_{x\in\mathbb{R}}\: f(x)
$$
where $f:\mathbb{R}\rightarrow\mathbb{R}$ is a real-valued function. A point $x^\ast\in\mathbb{R}$ is said to be a **solution** to (or, **minimizer**/**minimum** of) $(P_\min)$ if $f(x^\ast)\leq f(x)$ for all $x\in\mathbb{R}$. 

A **maximization program** over $\mathbb{R}$ is written
$$
(P_\max):\:\:\max_{x\in\mathbb{R}}\: f(x).
$$
A point $x^\ast\in\mathbb{R}$ is said to be a **solution** to (or, **maximizer**/**maximum** of) $(P_\max)$ if $f(x)\leq f(x^\ast)$ for all $x\in\mathbb{R}$.

Two optimization programs are said to be **equivalent** if a solution for one is always a solution for the other as well.

## The Reflection Principle

If $x^\ast$ is a solution to $(P_\max)$, then $x^\ast$ is a solution to 
$$
\min_{x\in\mathbb{R}}\: -f(x).
$$
Similarly, if $z_\ast$ is a solution to $(P_\min)$, then $z_\ast$ is a solution to
$$
\max_{x\in\mathbb{R}}\: -f(x).
$$

In [None]:
'''
Example 01: The reflection principle
'''

# Import numerical python and pyplot
import numpy as np # Namespace is np
import matplotlib.pyplot as plt # Namespace is plt

def f(x):
    '''
    A simple quadratic polynomial
    :param x: a numerical value or numpy array
    :return: 1-x^2
    '''
    return 1 - x**2

x = np.linspace(-2, 2, 100) # a uniform partition of [-2, 2] consisting of 100 points
g = lambda z: -f(z) # anonymous function composing f with negation

plt.figure('The reflection principle')

plt.subplot(1, 2, 1)
plt.plot([-2, 2], [0, 0], 'k--') # The dashed line is y=0
plt.plot(x, f(x)) # f(x) works because of "vectorization"
plt.axis([-2, 2, -4, 4])
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Original function')

plt.subplot(1, 2, 2)
plt.plot([-2, 2], [0, 0], 'k--') # The dashed line is y=0
plt.plot(x, g(x))
plt.axis([-2, 2, -4, 4])
plt.xlabel('x')
plt.ylabel('-f(x)')
plt.title('Reflected function')

plt.show()

It follows from the reflection principle that any maximization program is equivalent to the minimization program involving the negated function. Thus, we need only consider minimization programs when we talk about optimization.
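
As a quick numerical sanity check of the reflection principle, the sketch below compares the grid maximizer of $f(x)=1-x^2$ with the grid minimizer of $-f$. The grid and the names `i_max`, `i_min`, `x_max`, `x_min` are illustrative choices, and a grid search only approximates the true solution.

```python
import numpy as np

def f(x):
    # Same quadratic as in Example 01: f(x) = 1 - x^2
    return 1 - x**2

# Illustrative grid on [-2, 2]; an odd number of points places a node at x = 0
x = np.linspace(-2, 2, 101)

i_max = np.argmax(f(x))    # index of the largest sampled value of f
i_min = np.argmin(-f(x))   # index of the smallest sampled value of -f

x_max, x_min = x[i_max], x[i_min]
print(x_max == x_min)              # both programs share the same grid solution
print(f(x_max), -np.min(-f(x)))    # max of f equals -(min of -f)
```

The maximum value of $f$ is recovered from the reflected program by flipping the sign of its minimum value.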

A **minimum value** of $f:\mathbb{R}\rightarrow\mathbb{R}$ is a value $p\in\mathbb{R}$ such that $p\leq f(x)$ for all $x\in\mathbb{R}$, and if $y\leq f(x)$ for all $x\in\mathbb{R}$, then $y\leq p$. In other words, $p$ is the greatest lower bound of the values of $f$. Similarly, a **maximum value** of $f$ is a value $q\in\mathbb{R}$ satisfying $f(x)\leq q$ for all $x\in\mathbb{R}$, and if $f(x)\leq y$ for all $x\in\mathbb{R}$, then $q\leq y$.

## Constrained Optimization
The above optimization programs over all of $\mathbb{R}$ are often called **unconstrained** optimization programs. If instead we are only interested in optimizing over a subset $X\subset\mathbb{R}$, we write the **constrained** optimization program as
$$
(P):\:\:\min f(x)\text{ subject to }x\in X
$$
In this case, a **minimum value** is any $p$ satisfying $p\leq f(x)$ for all $x\in X$, and if $r\leq f(x)$ for all $x\in X$, then $r\leq p$. A **solution**/**minimizer**/**minimum** is any $x^{(0)}\in X$ such that $f(x^{(0)})\leq f(x)$ for all $x\in X$.

Any point $x\in X$ is called a **feasible point** or just **feasible**, and the set $X$ is often called the **feasible region**, **region of optimization**, or **set of feasible points**. If $x\not\in X$, then $x$ is **not feasible**. If $X$ contains no points (i.e. $X$ is the empty set, or $X=\emptyset$), then the program $(P)$ is called **infeasible**.
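
The terminology above can be made concrete with a brute-force sketch. The helper `grid_minimize`, the grid, and the two feasible sets are hypothetical choices made for illustration. The search only inspects grid points, so it approximates a solution rather than certifying one.

```python
import numpy as np

def grid_minimize(f, is_feasible, grid):
    """Brute-force sketch: minimize f over the grid points that satisfy is_feasible.

    Returns (x_best, f_best), or None when no grid point is feasible.
    """
    feasible_points = grid[is_feasible(grid)]   # keep only the points in X
    if feasible_points.size == 0:
        return None                             # no feasible point on this grid
    values = f(feasible_points)
    i = np.argmin(values)
    return feasible_points[i], values[i]

grid = np.linspace(-3, 3, 601)
f = lambda x: (x - 2)**2

# X = [-1, 1]: the unconstrained minimizer x = 2 is not feasible
result = grid_minimize(f, lambda x: np.abs(x) <= 1, grid)
print(result)

# X = {x : x^2 < 0} is empty, so this program is infeasible
empty = grid_minimize(f, lambda x: x**2 < 0, grid)
print(empty)
```

In the first program the constraint is active: the best feasible point sits at the boundary of $X$ closest to the unconstrained minimizer.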

## Team Problems
1. Write down a function which does not have a minimum value.
2. Write down a function which has a minimum value, but which does not have a minimizer.
3. Can a function ever have a minimizer, but fail to have a minimum value?
4. Suppose $x^\ast$ solves $\min_{x\in\mathbb{R}} -f(x)$. What is the maximum value of $f$ over $\mathbb{R}$?
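5. Explain why the minimum value of $f$ is unique if it exists; that is, if $p$ and $q$ are both minimum values of $f$, show that $p=q$. Answer: Since $q$ is a minimum value, $q\leq f(x)$ for all $x$, and since $p$ is a minimum value, every such lower bound is at most $p$; hence $q\leq p$. Swapping the roles of $p$ and $q$ gives $p\leq q$. By antisymmetry of $\leq$, $p=q$.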
6. Explain why $\min x^2$ subject to $x^2+2x+2\leq 0$ is an infeasible program.
7. Suppose $f, g:\mathbb{R}\rightarrow\mathbb{R}$ and $p>-\infty$ is the minimum value of $g$. Is $p$ the minimum value of $h=g\circ f$ (i.e. $h(x)=g(f(x))$)?
8. A function $f:\mathbb{R}\rightarrow\mathbb{R}$ is **monotone decreasing** if $f(x)>f(y)$ for all $x<y$. If $f$ is monotone decreasing, show that it cannot have a minimizer. Answer: Suppose $x^\ast$ is a minimizer. Then $f(x^\ast+1)<f(x^\ast)$ since $x^\ast < x^\ast + 1$, so we have contradicted the fact that $x^\ast$ is a solution.

# Part II: Important considerations
When approaching any optimization program, there are some basic questions that need to be answered.

1. Does $(P_\min)$ have a minimum value?
2. Does $(P_\min)$ have a solution?
3. Does $(P_\min)$ have a **unique** solution?
4. When $(P_\min)$ has a minimum value, how can we find an $\widetilde{x}$ such that $f(\widetilde{x})$ is close to the minimum value?
5. When $(P_\min)$ has a solution $x^\ast$, how can we find an $\widetilde{x}$ which is close to $x^\ast$?

Without imposing additional structure on $f$, the answers to the first three questions are generally "no," and the last two have no general-purpose answer. In each of the following sections, we explore conditions on $f$ under which these questions can be answered affirmatively.

# Part III: Existence of minimum values and minimizers

If there is an $L\in\mathbb{R}$ such that $L\leq f(x)$ for all $x\in\mathbb{R}$, then $f$ is said to be **bounded below**.

### Theorem: If $f$ is bounded below, then $(P_\min)$ has a minimum value.
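
A short proof sketch, assuming the completeness (greatest lower bound) property of $\mathbb{R}$, which these notes do not state explicitly. The symbol $S$ is introduced here only for the argument. Let
$$
S=\{f(x): x\in\mathbb{R}\}.
$$
The set $S$ is nonempty and $L$ is a lower bound for it, so by completeness $S$ has a greatest lower bound
$$
p=\inf S.
$$
Since $p$ is a lower bound of $S$, we have $p\leq f(x)$ for all $x\in\mathbb{R}$. If $y\leq f(x)$ for all $x\in\mathbb{R}$, then $y$ is also a lower bound of $S$, so $y\leq p$ because $p$ is the greatest one. These are exactly the two conditions defining a minimum value.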

A function $f:\mathbb{R}\rightarrow\mathbb{R}$ is **continuous** if $f(x)=\lim_{n\rightarrow\infty} f(x_n)$ whenever $\{x_n\}_{n=1}^\infty$ is a real sequence with $x=\lim_{n\rightarrow\infty} x_n$.

Given a set $X\subset\mathbb{R}$, a function $f:X\rightarrow\mathbb{R}$ is **continuous on $X$** if $f(x)=\lim_{n\rightarrow\infty} f(x_n)$ whenever $\lim_{n\rightarrow\infty} x_n=x$, where $\{x_n\}_{n=1}^\infty\subset X$ and $x\in X$. That is, $\{x_n\}_{n=1}^\infty$ is a sequence in $X$ that converges to a point in $X$.

A set $X\subset\mathbb{R}$ is said to be **closed** if whenever $\{x_n\}_{n=1}^\infty\subset X$ (that is, $\{x_n\}_{n=1}^\infty$ is a sequence in $X$), and $x=\lim_{n\rightarrow\infty}x_n$, then $x\in X$. That is, $X$ is closed with respect to convergent limits.

A set $X\subset\mathbb{R}$ is said to be **bounded** if there is an $R>0$ with $X\subset (-R, R)$. $X$ is said to be **compact** if $X$ is closed and bounded.

In 1D, the simplest class of examples of compact sets are intervals of the form $[a, b]$, where $a,b\in \mathbb{R}$ and $a\leq b$.


### Theorem (Extreme Value Theorem): If $X$ is nonempty and compact, and $f:X\rightarrow\mathbb{R}$ is continuous on $X$, then $f$ has a minimizer $x^\ast\in X$.
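
The theorem guarantees that a minimizer exists, but not where it lies. The following sketch uses a hypothetical cubic on the compact interval $[-3, 3]$ and a uniform grid (both illustrative choices) to show that the minimizer may sit on the boundary rather than at an interior critical point.

```python
import numpy as np

def f(x):
    # Continuous cubic; its only interior local minimum is at x = 1 with f(1) = -2
    return x**3 - 3*x

# Compact feasible set X = [-3, 3], sampled on an illustrative uniform grid
x = np.linspace(-3, 3, 601)
values = f(x)

i = np.argmin(values)
x_star, p = x[i], values[i]
print(x_star, p)   # the grid minimizer is the left endpoint, not the interior critical point
```

Here $f(-3)=-18$ is smaller than the interior local minimum value $f(1)=-2$, so checking only points where $f^\prime$ vanishes would miss the solution.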


In [None]:
'''
Convexity Examples
'''

t = np.linspace(0, 1, 100)
x = np.linspace(-10, 10, 1000)
a = 3
b = 8
l = (1-t)*a + t*b

plt.plot(x, np.abs(x))
plt.plot(l, (1-t)*np.abs(a) + t*np.abs(b), 'k--')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Convex, but not strictly convex')
plt.show()

plt.plot(x, np.exp(x/10))
plt.plot(l, (1-t)*np.exp(a/10) + t*np.exp(b/10), 'k--')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Strictly convex, but not strongly convex')
plt.show()

plt.plot(x, x**4)
plt.plot(l, (1-t)*a**4 + t*b**4, 'k--')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title(r'Strictly convex, but $f^{\prime\prime}(0)=0$') # raw string keeps the LaTeX backslashes intact
plt.show()

plt.plot(x, x**2)
plt.plot(l, (1-t)*a**2 + t*b**2, 'k--')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Strongly convex')
plt.show()

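The plots above illustrate three progressively stronger notions of convexity. A set $X\subset\mathbb{R}$ is **convex** if $(1-t)x+ty\in X$ for all $x,y\in X$ and all $t\in[0,1]$. A function $f:X\rightarrow\mathbb{R}$ on a convex set $X$ is **convex** if $f((1-t)x+ty)\leq(1-t)f(x)+tf(y)$ for all $x,y\in X$ and all $t\in[0,1]$, and **strictly convex** if the inequality is strict whenever $x\neq y$ and $t\in(0,1)$. In one standard formulation, $f$ is **strongly convex** if there is a $c>0$ such that $x\mapsto f(x)-\frac{c}{2}x^2$ is convex on $X$. In each plot, the dashed line is the chord $(1-t)f(a)+tf(b)$ between $a=3$ and $b=8$.

It is not too difficult to establish that strong convexity implies strict convexity, which in turn implies convexity. With additional work, convexity can be shown to imply continuity (except possibly at endpoints).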
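
A minimal numerical sketch, assuming the usual chord inequality $f((1-t)x+ty)\leq(1-t)f(x)+tf(y)$ as the definition of convexity. The helper `max_chord_violation` and its sampling parameters are hypothetical. Random sampling can expose a violation, but it can never prove convexity.

```python
import numpy as np

def max_chord_violation(f, a, b, n_pairs=2000, seed=0):
    """Largest sampled value of f((1-t)x + t y) - [(1-t) f(x) + t f(y)] on [a, b].

    A value <= 0 (up to rounding) is consistent with convexity on [a, b];
    a clearly positive value certifies that f is NOT convex there.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, n_pairs)
    y = rng.uniform(a, b, n_pairs)
    t = rng.uniform(0, 1, n_pairs)
    gap = f((1 - t)*x + t*y) - ((1 - t)*f(x) + t*f(y))
    return gap.max()

print(max_chord_violation(np.abs, -10, 10))          # convex: no positive gap expected
print(max_chord_violation(lambda x: x**2, -10, 10))  # strongly convex: no positive gap expected
print(max_chord_violation(np.sin, 0, np.pi))         # concave on [0, pi]: a positive gap appears
```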

#### Theorem (Convex Functions are Continuous): If $X$ is convex and open, and $f:X\rightarrow\mathbb{R}$ is convex on $X$, then $f$ is continuous on $X$.

If $f$ is convex, then $(P_\min)$ is called a **convex program**. 

Now, a **strict minimizer**/**unique minimizer** of $f$ on $X$ is a point $x^\ast\in X$ such that $f(x^\ast)<f(x)$ for all $x\in X\setminus\{x^\ast\}$. We say that $f\in C^2(X)$ if $f\in C^1(X)$ and $f^\prime\in C^1(X)$, where $C^1(X)$ is defined below.

#### Theorem (Fundamental Theorem of Convex Programming): If $X\subset\mathbb{R}$ is convex and compact, and $f:X\rightarrow\mathbb{R}$ is convex on $X$, then the set of minimizers of $f$ on $X$ forms a convex set. Moreover, if $X$ is nonempty and $f$ is continuous and strictly convex on $X$, then $f$ has a unique minimizer on $X$.
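
Both conclusions can be seen on a grid. The sketch below uses two hypothetical objectives on $X=[-3,3]$; the helper `grid_minimizers` and its tolerance are illustrative assumptions, not part of the theorem.

```python
import numpy as np

x = np.linspace(-3, 3, 601)   # illustrative grid on the compact convex set X = [-3, 3]

def grid_minimizers(values, tol=1e-12):
    # Indices of grid points whose value is within tol of the smallest sampled value
    return np.flatnonzero(values <= values.min() + tol)

# Convex but not strictly convex: flat on [-1, 1], so every point there is a minimizer
flat = np.maximum(np.abs(x) - 1, 0)
idx_flat = grid_minimizers(flat)
print(x[idx_flat[0]], x[idx_flat[-1]])   # approximate endpoints of the minimizer set
print(np.all(np.diff(idx_flat) == 1))    # consecutive indices: the set has no gaps

# Strictly convex: a single grid minimizer
idx_strict = grid_minimizers((x - 0.5)**2)
print(x[idx_strict])
```

The minimizers of the first objective fill out an interval, which is a convex set, while the strictly convex objective has exactly one.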

This is a very strong theoretical guarantee, but we need some tools for verifying convexity. Let $C^1(X)$ denote the set of all functions $f:X\rightarrow\mathbb{R}$ such that $f^\prime(x)$ exists for all $x\in X$, and $f^\prime(x)$ is continuous on $X$. 

#### Theorem (First Order Conditions for Convexity): If $X\subset\mathbb{R}$ is convex, then $f\in C^1(X)$ is convex if and only if $f(x)\geq f(y) + f^\prime(y)(x-y)$ for all $x,y\in X$. If $f(x) > f(y) + f^\prime(y)(x-y)$ for all $x,y\in X$ with $x\neq y$, then $f$ is strictly convex.
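
The first order condition says that the graph of $f$ lies on or above each of its tangent lines. A hedged numerical sketch: the helper `min_tangent_gap` is hypothetical, and it samples the gap $f(x)-[f(y)+f^\prime(y)(x-y)]$ over a grid of pairs, so a negative result disproves convexity while a nonnegative one is only suggestive.

```python
import numpy as np

def min_tangent_gap(f, fprime, a, b, n=201):
    """Smallest sampled value of f(x) - [f(y) + f'(y)(x - y)] over grid pairs in [a, b]."""
    pts = np.linspace(a, b, n)
    x, y = np.meshgrid(pts, pts)               # all pairs (x, y) of grid points
    gap = f(x) - (f(y) + fprime(y)*(x - y))    # height of the graph above the tangent line at y
    return gap.min()

# exp is convex: its graph never dips below any tangent line
print(min_tangent_gap(np.exp, np.exp, -2, 2))

# x^3 is not convex on [-2, 2]: some tangent line lies above the graph
print(min_tangent_gap(lambda x: x**3, lambda x: 3*x**2, -2, 2))
```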

The first order conditions are not always easy to verify, so we would also like second order conditions to make our lives easier. 

#### Theorem (Second Order Conditions for Convexity):  If $X\subset\mathbb{R}$ is convex, then $f\in C^2(X)$ is convex if and only if $f^{\prime\prime}(x)\geq 0$ for all $x\in X$. If $f^{\prime\prime}(x)>0$ for all $x\in X$, then $f$ is strictly convex.
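
When $f^{\prime\prime}$ is awkward to compute by hand, a central finite difference gives a rough numerical indication of its sign. This is a sketch under assumptions: the step size `h`, the grid, and the two test functions are illustrative, and a finite-difference check is evidence rather than proof.

```python
import numpy as np

def second_difference(f, x, h=1e-3):
    # Central finite-difference approximation of f''(x); h is an illustrative step size
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

x = np.linspace(-5, 5, 1001)

softplus = lambda z: np.log1p(np.exp(z))       # f''(z) = e^z / (1 + e^z)^2 > 0 everywhere
print(second_difference(softplus, x).min())    # positive, consistent with strict convexity

cubic = lambda z: z**3                         # f''(z) = 6z changes sign at 0
print(second_difference(cubic, x).min())       # negative, so the cubic is not convex on [-5, 5]
```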

These conditions are quite useful in practice, but there are also several operations that preserve convexity.

#### Theorem (Positive Weighted Sum of Convex is Convex): If $X\subset\mathbb{R}$ is convex, $f,g:X\rightarrow\mathbb{R}$ are convex on $X$, and $a, b\geq 0$, then $h:X\rightarrow\mathbb{R}$ defined by $h(x) = af(x) + bg(x)$ for all $x\in X$ is convex on $X$.

#### Theorem (Pointwise Maximum of Convex is Convex): If $X\subset\mathbb{R}$ is convex and $f, g:X\rightarrow\mathbb{R}$ are convex on $X$, then $h:X\rightarrow\mathbb{R}$ defined by $h(x) = \max(f(x), g(x))$ for all $x\in X$ is also convex on $X$.


In [None]:
# Example: pointwise maximum of two functions

t = np.linspace(-1, 1, 100)
f = lambda x: (x-1)**2
g = lambda x: np.exp(x)
h = lambda x: np.maximum(f(x), g(x)) # The pointwise (elementwise) maximum of f and g

h_vals = h(t) # np.maximum is vectorized, so no explicit loop is needed
    
plt.figure('Pointwise Max Example')
plt.subplot(1, 3, 1)
plt.plot(t, f(t))
plt.subplot(1, 3, 2)
plt.plot(t, g(t))
plt.subplot(1, 3, 3)
plt.plot(t, h_vals)
plt.show()





A function $f:\mathbb{R}\rightarrow\mathbb{R}$ is called **affine** if there are $a, b\in\mathbb{R}$ such that $f(x)=ax+b$ for all $x\in\mathbb{R}$.

#### Theorem (Convexity Preservation under Affine Precomposition): Suppose $X\subset\mathbb{R}$ is convex, $f:X\rightarrow\mathbb{R}$ is convex on $X$, $a,b\in\mathbb{R}$, and set $Y = \{y\in\mathbb{R}: ay+b\in X\}$. Then $Y$ is convex and $g:Y\rightarrow\mathbb{R}$ defined by $g(y) = f(ay+b)$ for all $y\in Y$ is convex on $Y$.
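
A short derivation, assuming the chord inequality $f((1-t)u+tv)\leq(1-t)f(u)+tf(v)$ as the definition of convexity. Fix $y_1,y_2\in Y$ and $t\in[0,1]$. Because the map $y\mapsto ay+b$ is affine,
$$
a\big((1-t)y_1+ty_2\big)+b=(1-t)(ay_1+b)+t(ay_2+b).
$$
The right-hand side lies in $X$ since $ay_1+b, ay_2+b\in X$ and $X$ is convex, so $(1-t)y_1+ty_2\in Y$ and $Y$ is convex. Applying the convexity of $f$ to the same identity gives
$$
g\big((1-t)y_1+ty_2\big)=f\big((1-t)(ay_1+b)+t(ay_2+b)\big)\leq(1-t)f(ay_1+b)+tf(ay_2+b)=(1-t)g(y_1)+tg(y_2).
$$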

For this next theorem, we define the **image** of a function $f:X\rightarrow\mathbb{R}$ as the set 
$$
f(X)=\{y\in\mathbb{R}: f(x)=y\text{ for some }x\in X\}.
$$
Note that if $X$ is convex and $f$ is convex and continuous on $X$ (continuity is automatic when $X$ is open), then $f(X)$ is also convex: if $p, q\in f(X)$, there are $v, w\in X$ with $f(v)=p$ and $f(w)=q$. Let $t\in[0,1]$ and set $r=(1-t)p + tq$. Since $r$ is between $p$ and $q$, the Intermediate Value Theorem gives a $z$ between $v$ and $w$ such that $f(z)=r$, and $z\in X$ because $X$ is convex. Hence $(1-t)p+tq\in f(X)$ for all $t\in[0,1]$.

We also say that a function $f:X\rightarrow\mathbb{R}$ is **non-decreasing** or **order preserving** on $X$ if $x\leq y$ implies $f(x)\leq f(y)$ for all $x, y\in X$.

#### Theorem (Convexity Preservation under Convex Monotone Transformation): Suppose $X\subset\mathbb{R}$ is convex, $f:X\rightarrow\mathbb{R}$ is convex on $X$, and $g:f(X)\rightarrow \mathbb{R}$ is convex and non-decreasing. Then $g\circ f: X\rightarrow\mathbb{R}$ is convex on $X$.
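
The monotonicity hypothesis on $g$ cannot be dropped. The sketch below compares one composition that satisfies the hypotheses with one that does not. The helper `is_midpoint_convex_on_grid` is a hypothetical discrete test (nonnegative second differences on a uniform grid), so passing it is only consistent with convexity, while failing it rules convexity out.

```python
import numpy as np

def is_midpoint_convex_on_grid(values, tol=1e-12):
    # Discrete convexity test on a uniform grid: v[i-1] - 2 v[i] + v[i+1] >= 0 at interior points
    return bool(np.all(np.diff(values, 2) >= -tol))

x = np.linspace(-2, 2, 401)

f = lambda z: z**2             # convex, with image f(X) = [0, 4]
g = np.exp                     # convex and non-decreasing
print(is_midpoint_convex_on_grid(g(f(x))))           # True: the hypotheses of the theorem hold

f_shift = lambda z: z**2 - 1   # convex, with image [-1, 3]
g_sq = lambda u: u**2          # convex, but decreasing on [-1, 0)
print(is_midpoint_convex_on_grid(g_sq(f_shift(x))))  # False: (x^2 - 1)^2 bends downward near x = 0
```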

## Team Questions

1. Show that $f(x)=x\arctan(x) - \frac{1}{2}\log(1+x^2)$ is strictly convex on $\mathbb{R}$. 
2. Show that $f(x)=-\log(-x)$ is strictly convex and order preserving on $(-\infty,0)$. 
3. If $X$ is convex and $f_1, f_2, \ldots, f_k:X\rightarrow\mathbb{R}$ are all convex, explain why $g:X\rightarrow\mathbb{R}$ defined by $g(x)=\sum_{i=1}^k f_i(x)$ is also convex.  
4. If $X$ is convex and $f_1, f_2, \ldots, f_k:X\rightarrow\mathbb{R}$ are all convex, explain why $g:X\rightarrow\mathbb{R}$ defined by $g(x)=\max_{i=1,\ldots, k} f_i(x)$ is also convex. 
5. For $x_1,x_2,\ldots,x_N\in\mathbb{R}$, show that
$$
\max_{x\in\mathbb{R}}\prod_{i=1}^N e^{-\frac{1}{2}(x-x_i)^2}
$$
is equivalent to a convex optimization program.
6. A function is **concave** on $\mathbb{R}$ if $f((1-t)x+ty)\geq (1-t)f(x)+tf(y)$ for all $x,y\in\mathbb{R}$ and all $t\in[0,1]$. Explain why $f$ is concave if and only if $-f$ is convex. 
7. What is the geometric interpretation of the first order condition $f(x)\geq f(y)+f^\prime(y)(x-y)$? 
8. The **epigraph** of a function $f:\mathbb{R}\rightarrow\mathbb{R}$ is the set defined by $\text{epi}(f) = \{(x,y)\in\mathbb{R}\times\mathbb{R}: f(x)\leq y\}$. Show that $f$ is convex if and only if $((1-\alpha)x_0+\alpha x_1, (1-\alpha)y_0+\alpha y_1)\in\text{epi}(f)$ for all $(x_0, y_0), (x_1, y_1)\in\text{epi}(f)$ and all $\alpha\in[0,1]$ (in other words, $\text{epi}(f)$ is a convex 2D set: it contains every line segment connecting two of its points). 