Our main problem is
\begin{align*}
&\min_{x \in \Omega} f(x) \quad f : \mathbb R^n \supseteq \Omega \to \mathbb R,
\end{align*}
where $\Omega$ is one of the following three types:
 - $\Omega = \mathbb R^n$.
 -  $\Omega$ open.
 - $\Omega$ the closure of an open set.
 
We can restrict attention to minimization problems without loss of generality, since any maximization problem can be converted to a minimization problem by negating the objective function: the maximizers of $f$ are exactly the minimizers of $-f$, and
$$
\max_{x \in \Omega} f(x) = -\min_{x \in \Omega} \bigl(-f(x)\bigr).
$$

### _Defn_. Feasible Direction
Given $\Omega \subseteq \mathbb R^n$ and a point $x_0 \in \Omega$, we say that the vector $v \in \mathbb R^n$ is a __feasible direction__ at $x_0$ if there is an $\overline{s} > 0$ such that $x_0 + sv \in \Omega$ for all $s \in [0, \overline{s}]$.

## _Thrm_. First order necessary condition for a local minimum (FONC)
__Claim__. If $f : \mathbb R^n \supseteq \Omega \to \mathbb R$ is $C^1$ and $x_0$ is a local minimizer of $f$ in the interior of $\Omega$, then $\nabla f(x_0) = 0$.

_proof_. If $x_0$ is an interior point of $\Omega$, then every direction $v \in \mathbb R^n$ is feasible at $x_0$. By the claim below, both $\nabla f(x_0) \cdot v \geq 0$ and $\nabla f(x_0) \cdot (-v) \geq 0$ hold for every $v$, so $\nabla f(x_0) \cdot v = 0$ for all $v$, which forces $\nabla f(x_0) = 0$.

__Claim__. Let $f : \mathbb R^n \supseteq \Omega \to \mathbb R$ be $C^1$. If $x_0 \in \Omega$ is a local minimizer of $f$, then $\nabla f(x_0) \cdot v \geq 0$ for all feasible directions $v$ at $x_0$.

_proof_. Reduce to a single-variable problem by defining $g(s) = f(x_0 + sv)$ for $s \in [0, \overline{s}]$, where $\overline{s} > 0$ is as in the definition of a feasible direction. Then $0$ is a local minimizer of $g$. Taylor's theorem gives us
\[
g(s) - g(0) = s g'(0) + o(s) = s \nabla f(x_0) \cdot v + o(s).
\]
If $\nabla f(x_0) \cdot v < 0$, then for sufficiently small $s > 0$ the right side is negative. This implies that $g(s) < g(0)$ for those $s$, a contradiction. Therefore $\nabla f(x_0) \cdot v \geq 0$.

### Example
_Consider the optimization problem_
\begin{align*}
\min_{(x,y) \in \Omega} f(x,y) = x^2 - xy + y^2 - 3y \qquad \text{over } \Omega = \mathbb R^2.
\end{align*}

By the corollary to the FONC, we want to find the points $(x_0, y_0)$ where $\nabla f(x_0, y_0) = 0$. We have
\begin{align*}
\nabla f(x,y) = (2x-y, -x+2y-3),
\end{align*}
so we want to solve 
\begin{align*}
2x - y &= 0 \\
-x + 2y &= 3,
\end{align*}
which has solution $(x_0, y_0) = (1,2)$. Therefore $(1,2)$ is the only _candidate_ for a local minimizer. That is, if the function $f$ has a local minimizer in $\mathbb R^2$, then it must be $(1,2)$.

It turns out that $(1,2)$ is a global minimizer for $f$ on $\Omega = \mathbb R^2$. By some work, we have
\[
f(x,y) = \left(x - \frac{y}{2}\right)^2 + \frac{3}{4}(y-2)^2 - 3.
\]
In this form, it is obvious that a global minimizer occurs at the point where the squared terms are zero, if such a point exists. That point is $(1,2)$.
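
As a sanity check on the example above, here is a minimal Python sketch (not part of the original notes). It evaluates the gradient at $(1,2)$ and compares $f$ with its completed-square form at random sample points. The function names `f`, `grad_f`, `f_completed`, and `forms_agree` are illustrative choices, and the sampling window $[-10,10]^2$ is arbitrary.

```python
import random

def f(x, y):
    """Objective from the example: f(x, y) = x^2 - xy + y^2 - 3y."""
    return x**2 - x*y + y**2 - 3*y

def grad_f(x, y):
    """Gradient of f, as computed in the text."""
    return (2*x - y, -x + 2*y - 3)

def f_completed(x, y):
    """Completed-square form of f from the text."""
    return (x - y/2)**2 + 0.75*(y - 2)**2 - 3

def forms_agree(samples=1000, seed=0, tol=1e-9):
    """Check the identity and the bound f >= f(1, 2) at random points."""
    rng = random.Random(seed)
    for _ in range(samples):
        x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
        if abs(f(x, y) - f_completed(x, y)) > tol:
            return False
        if f(x, y) < f(1, 2) - tol:
            return False
    return True

print(grad_f(1, 2))   # gradient at the candidate point
print(f(1, 2))        # value of f at the candidate point
print(forms_agree())  # True if no sampled point violates the identity or the bound
```

A random check like this cannot replace the algebraic argument; it only helps catch slips when completing the square by hand.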

### Example
_Consider the problem_
\begin{align*}
\min_{(x,y) \in \Omega} f(x,y) = x^2 - x + y + xy \qquad \text{over } \Omega = \{(x,y) \in \mathbb R^2 : x,y \geq 0\}.
\end{align*}

We have
$$\nabla f(x,y) = (2x + y - 1, x + 1).$$
To apply the FONC, we'll divide the feasible set $\Omega$ into four different regions.   
Suppose that $(x_0, y_0)$ is a local minimizer of $f$ on $\Omega$.
 - $(x_0, y_0)$ is an interior point:  
By the corollary to the FONC, we must have $\nabla f(x_0, y_0) = 0$. Then $x_0 = -1$, which is not in the interior of $\Omega$. This case fails.


 - $(x_0, y_0)$ on the positive x-axis:  
Then we are considering $(x_0, 0)$. The feasible directions at $(x_0, 0)$ are those vectors $v \in \mathbb R^2$ with $v_2 \geq 0$. The FONC tells us that $\nabla f(x_0,0) \cdot v \geq 0$ for all feasible directions $v$. We then have
$$(2x_0 - 1)v_1 + (x_0 + 1)v_2 \geq 0$$
for all $v_1$ and all $v_2 \geq 0$. In particular, this holds for $v_2 = 0$, so $(2x_0 - 1)v_1 \geq 0$ for all $v_1$ of either sign, implying $x_0 = 1/2$. Therefore $(1/2, 0)$ is a candidate for a local minimizer of $f$ on $\Omega$, and it is the only candidate on the positive $x$-axis.

 - $(x_0, y_0)$ on the positive y-axis:  
 Then we are considering $(0, y_0)$. The feasible directions here are $v \in \mathbb R^2$ with $v_1 \geq 0$. Then we have
$$(y_0 - 1)v_1 + v_2 \geq 0$$
for any $v_2$ and $v_1 \geq 0$. This fails for $v = (0, -1)$, where the left side equals $-1$, so $f$ has no local minimizers along the positive $y$-axis.


 - $(x_0, y_0)$ is the origin:  
 Then we are considering $(0,0)$. The feasible directions here are $v \in \mathbb R^2$ with $v_1, v_2 \geq 0$. Then we have $$-v_1 + v_2 \geq 0$$
for all $v_1, v_2 \geq 0$, which fails for $v = (1, 0)$. Therefore the origin is not a local minimizer of $f$.
We conclude that the only candidate for a local minimizer of $f$ is $(1/2, 0)$. It turns out that this is actually a global minimizer of $f$ on $\Omega$. (This is to be seen.)
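
The case analysis above can be cross-checked numerically. The following is a minimal Python sketch (not from the original notes): it evaluates the directional derivative $\nabla f \cdot v$ at a few boundary points and runs a coarse grid search over the window $[0,3]^2$. The helper names and the grid window are assumptions chosen for illustration only.

```python
def f(x, y):
    """Objective from the example: f(x, y) = x^2 - x + y + xy."""
    return x**2 - x + y + x*y

def grad_f(x, y):
    """Gradient of f, as computed in the text."""
    return (2*x + y - 1, x + 1)

def directional_derivative(point, v):
    """Dot product of the gradient at `point` with the direction v."""
    gx, gy = grad_f(*point)
    return gx*v[0] + gy*v[1]

def grid_minimizer(step_count=60, upper=3.0):
    """Grid point in [0, upper]^2 with the smallest value of f."""
    pts = [(upper*i/step_count, upper*j/step_count)
           for i in range(step_count + 1)
           for j in range(step_count + 1)]
    return min(pts, key=lambda p: f(*p))

# At the origin, v = (1, 0) is feasible and is a descent direction.
print(directional_derivative((0, 0), (1, 0)))
# On the positive y-axis, v = (0, -1) is feasible and is a descent direction.
print(directional_derivative((0, 2), (0, -1)))
# At the candidate (1/2, 0), feasible directions have v_2 >= 0.
print(directional_derivative((0.5, 0), (0, 1)))
# Coarse grid search over the illustrative window.
print(grid_minimizer())
```

A grid search is only suggestive. One way to confirm global minimality is to note that $y + xy = y(1+x) \geq 0$ on $\Omega$, so $f(x,y) \geq x^2 - x \geq -\frac14 = f(1/2, 0)$.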


## _Thrm_. Second order necessary condition for a local minimum (SONC)
__Claim__. Let $f : \mathbb R^n \supseteq \Omega \to \mathbb R$ be $C^2$. If $x_0 \in \Omega$ is a local minimizer of $f$, then for any feasible direction $v$ at $x_0$ the following conditions hold:
- $\nabla f(x_0) \cdot v \geq 0$.
- If $\nabla f(x_0) \cdot v = 0$, then $v^T \nabla^2 f(x_0) v \geq 0$.

_proof_. Fix a feasible direction $v$ at $x_0$. The first condition is the FONC. For the second, suppose $\nabla f(x_0) \cdot v = 0$. Since $x_0$ is a local minimizer, $f(x_0) \leq f(x_0 + sv)$ for all sufficiently small $s > 0$. By Taylor's theorem,
$$f(x_0 + sv) = f(x_0) + s \nabla f(x_0) \cdot v + \frac{1}{2} s^2 v^T \nabla^2 f(x_0) v + o(s^2),$$
and since the first-order term vanishes,
$$f(x_0 + sv) - f(x_0) = \frac{1}{2} s^2 v^T \nabla^2 f(x_0) v + o(s^2).$$
If $v^T \nabla^2 f(x_0) v < 0$, then for sufficiently small $s > 0$ the right side is negative, so $f(x_0 + sv) < f(x_0)$ for such $s$, contradicting the local minimality of $x_0$. Therefore $v^T \nabla^2 f(x_0) v \geq 0$.

__Corollary__. If $f : \mathbb R^n \supseteq \Omega \to \mathbb R$ is $C^2$ and $x_0$ is a local minimizer of $f$ in the interior of $\Omega$, then $\nabla f(x_0) = 0$ and $\nabla^2 f(x_0)$ is positive semidefinite.

### _Defn_. Principal Minor
A __principal minor__ of a square matrix $A$ is the determinant of a submatrix of $A$ obtained by removing any $k$ rows and the corresponding $k$ columns, $k \geq 0$. A leading principal minor of $A$ is the determinant of a submatrix obtained by removing the last $k$ rows and $k$ columns of $A$, $k \geq 0$.

## _Thrm_. Sylvester's criterion 
 - __For positive definite self-adjoint matrices__ If $A$ is a self-adjoint matrix, then $A \succ 0$ if and only if all of the leading principal minors of $A$ are positive.
 - __For positive semidefinite self-adjoint matrices__
If $A$ is a self-adjoint matrix, then $A \succeq 0$ if and only if all of the principal minors of $A$ are non-negative.
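
Both criteria can be tried out on small matrices. The following is a minimal pure-Python sketch (not from the original notes) that computes determinants by cofactor expansion. The function names are illustrative, the approach is only sensible for small matrices, and with floating-point entries the exact comparisons against zero would need a tolerance.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def leading_principal_minors(a):
    """Determinants of the top-left k x k blocks, k = 1, ..., n."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]

def principal_minors(a):
    """Determinants of all submatrices keeping the same rows and columns."""
    n = len(a)
    minors = []
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            minors.append(det([[a[i][j] for j in idx] for i in idx]))
    return minors

def is_positive_definite(a):
    """Sylvester's criterion for a symmetric matrix a."""
    return all(m > 0 for m in leading_principal_minors(a))

def is_positive_semidefinite(a):
    """Semidefinite version: every principal minor must be non-negative."""
    return all(m >= 0 for m in principal_minors(a))

print(leading_principal_minors([[2, -1], [-1, 2]]))
print(is_positive_definite([[2, -1], [-1, 2]]))
# Leading minors alone do not detect the failure of semidefiniteness here:
print(leading_principal_minors([[0, 0], [0, -1]]))
print(is_positive_semidefinite([[0, 0], [0, -1]]))
```

The last matrix illustrates why the semidefinite criterion refers to all principal minors: its leading principal minors are all zero, yet the matrix has a negative diagonal entry.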

### Example 
Consider the problem
\begin{align*}
\min_{(x,y) \in \Omega} f(x,y) = x^2 - xy + y^2 - 3y \qquad \text{over } \Omega = \mathbb R^2.
\end{align*}
Recall that $(1,2)$ was the only candidate for a local minimizer of $f$ on $\Omega$. We now check that the SONC holds. Since $(1,2)$ is an interior point of $\Omega$, we must have $\nabla^2 f(1,2) \succeq 0$. We have
$$\nabla^2 f(1,2) = \begin{pmatrix}
2 & -1 \\ -1 & 2
\end{pmatrix}$$
The leading principal minors of $\nabla^2 f(1,2)$ are $2$ and $2 \cdot 2 - (-1)(-1) = 3$, both positive, so $\nabla^2 f(1,2) \succ 0$ by Sylvester's criterion. In particular, $(1,2)$ satisfies the SONC.

### Example

Consider the problem
\begin{align*}
\min_{(x,y) \in \Omega} f(x,y) = x^2 - x + y + xy \qquad \text{over } \Omega = \{(x,y) \in \mathbb R^2 : x,y \geq 0\}.
\end{align*}
Recall that $(1/2, 0)$ was the only candidate for a local minimizer of $f$. We have
$$\nabla^2 f(1/2, 0) = \begin{pmatrix}
2 & 1 \\
1 & 0
\end{pmatrix}$$
To satisfy the SONC, we must have $v^T \nabla^2 f(1/2, 0) v \geq 0$
for all feasible directions $v$ at $(1/2, 0)$ such that $\nabla f(1/2, 0) \cdot v = 0$. We have $\nabla f(1/2, 0) = (0, 3/2)$,
so a feasible direction $v$ (one with $v_2 \geq 0$) satisfies $\nabla f(1/2, 0) \cdot v = \frac{3}{2} v_2 = 0$ exactly when $v = (v_1, 0)$. For such $v$,
$$v^T \nabla^2 f(1/2, 0) v = \begin{pmatrix}
v_1 & 0
\end{pmatrix}\begin{pmatrix}
2 & 1 \\
1 & 0
\end{pmatrix}\begin{pmatrix}
v_1 \\ 0
\end{pmatrix} = \begin{pmatrix}
v_1 & 0
\end{pmatrix} \begin{pmatrix}
2v_1 \\ v_1
\end{pmatrix} = 2v_1^2 \geq 0$$
So the SONC is satisfied.

## Completing the Square

Let $A$ be a symmetric positive definite $n \times n$ matrix. Our problem is 
\begin{align*}
\min_{x \in \Omega} f(x) = \frac{1}{2} x^T Ax - b \cdot x \qquad \text{over } \Omega = \mathbb R^n.
\end{align*} 
The FONC tells us that if $x_0$ is a local minimizer of $f$, then since $x_0$ is an interior point, $\nabla f(x_0) = 0$. We thus have $Ax_0 = b$, so since $A$ is invertible (positive eigenvalues), $x_0 = A^{-1}b$. Therefore $x_0 = A^{-1}b$ is the unique candidate for a local minimizer of $f$ on $\Omega$.


Moreover, $\nabla^2 f(x_0) = A \succ 0$, so $x_0 = A^{-1}b$ also satisfies the SONC and remains a candidate for a local minimizer of $f$ on $\Omega$.

In fact, the candidate $x_0$ is a global minimizer. Why? We will "complete the square". We can write
$$f(x) = \frac{1}{2} x^T Ax - b \cdot x = \frac{1}{2}(x - x_0)^T A(x-x_0) - \frac{1}{2} x_0^T A x_0,$$
where the rearrangement uses the symmetry of $A$ and the relation $Ax_0 = b$. Since $A \succ 0$, the first term is non-negative and vanishes only at $x = x_0$, so $x_0$ is the unique global minimizer of $f$ over $\Omega$.
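
For completeness, the rearrangement can be spelled out. This is a sketch of one way to organize the algebra, starting from the completed-square form and using only $A^T = A$ and $Ax_0 = b$:
\begin{align*}
\frac{1}{2}(x - x_0)^T A (x - x_0) - \frac{1}{2} x_0^T A x_0
&= \frac{1}{2} x^T A x - \frac{1}{2} x^T A x_0 - \frac{1}{2} x_0^T A x + \frac{1}{2} x_0^T A x_0 - \frac{1}{2} x_0^T A x_0 \\
&= \frac{1}{2} x^T A x - x^T A x_0 && \text{since } x_0^T A x = x^T A^T x_0 = x^T A x_0 \\
&= \frac{1}{2} x^T A x - b \cdot x && \text{since } A x_0 = b.
\end{align*}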

## _Thrm_. Second order sufficient conditions for interior local minimizers

__Lemma__ If $A$ is symmetric and positive-definite, then  there is an $a > 0$ such that $v^T A v \geq a \|v\|^2$ for all $v$.

_proof_. There is an orthogonal matrix $Q$ with $Q^T A Q = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. If $v = Qw$,
\begin{align*}
v^T A v &= (Qw)^T A Qw \\
&= w^T (Q^T A Q) w \\
&= \lambda_1 w_1^2 + \cdots + \lambda_n w_n^2 \\
&\geq \min\{\lambda_1, \dots, \lambda_n\} \|w\|^2 \\
&= \min\{\lambda_1, \dots, \lambda_n\} \|v\|^2 \qquad \text{since $Q$ is orthogonal}
\end{align*}
Since $A$ is positive-definite, every eigenvalue is positive and we are done.
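
The lemma can be illustrated numerically in the $2 \times 2$ case, where the eigenvalues of a symmetric matrix have a closed form. The sketch below is not from the original notes; the function names, the sampling window, and the tolerance are assumptions made for illustration, and the constant $a$ is taken to be the smallest eigenvalue, as in the proof.

```python
import math
import random

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return mean - radius, mean + radius

def quad_form(a, b, d, v):
    """v^T A v for A = [[a, b], [b, d]]."""
    return a*v[0]**2 + 2*b*v[0]*v[1] + d*v[1]**2

def lemma_holds(a, b, d, trials=1000, seed=0, tol=1e-9):
    """Check v^T A v >= lambda_min * |v|^2 at random vectors v."""
    rng = random.Random(seed)
    lam_min, _ = eigenvalues_sym2(a, b, d)
    for _ in range(trials):
        v = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        norm_sq = v[0]**2 + v[1]**2
        if quad_form(a, b, d, v) < lam_min * norm_sq - tol:
            return False
    return True

# Hessian from the earlier example: [[2, -1], [-1, 2]].
print(eigenvalues_sym2(2, -1, 2))
print(lemma_holds(2, -1, 2))
```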

__Claim__. Let $f$ be $C^2$ on $\Omega \subseteq \mathbb R^n$, and let $x_0$ be an interior point of $\Omega$ such that $\nabla f(x_0) = 0$ and $\nabla^2 f(x_0) \succ 0$. Then $x_0$ is a strict local minimizer of $f$.

_proof_. The condition $\nabla^2 f(x_0) \succ 0$ implies there is an $a > 0$ such that $v^T \nabla^2 f(x_0) v \geq a \cdot \|v\|^2$ for all $v$. By Taylor's theorem we have
$$f(x_0 + v) - f(x_0) = \frac{1}{2} v^T \nabla^2 f(x_0) v + o(\|v\|^2) \geq \frac{1}{2} a\|v\|^2 + o(\|v\|^2) = \|v\|^2 \left( \frac{a}{2} + \frac{o(\|v\|^2)}{\|v\|^2} \right)$$
For sufficiently small $v$ the right hand side is positive, so $f(x_0 + v) > f(x_0)$ for all such $v$. Therefore $x_0$ is a strict local minimizer of $f$ on $\Omega$.


### Example
Consider $f(x,y) = xy$. The gradient is $\nabla f(x,y) = (y,x)$ and the Hessian is 
$$\nabla^2 f(x,y) = \begin{pmatrix}
0 & 1 \\ 1 & 0
\end{pmatrix}$$
Suppose we want to minimize $f$ on all of $\Omega = \mathbb R^2$. By the FONC, the only candidate for a local minimizer is $(0,0)$. The Hessian's eigenvalues are $\pm 1$, so it is not positive semidefinite. We conclude by the SONC that the origin is not a local minimizer of $f$.


### Example
Consider the same function $f(x,y) = xy$ on $\Omega = \{(x,y) \in \mathbb R^2 : x, y \geq 0\}$. We claim that every point of the boundary of $\Omega$ is a local minimizer of $f$.

Consider $(x,0)$ with $x > 0$. The feasible directions here are $v$ with $v_2 \geq 0$. The FONC requires $\nabla f(x,0) \cdot v\geq 0$ for all such $v$. This dot product is $xv_2 \geq 0$, so $(x,0)$ satisfies the FONC. Therefore every point on the positive $x$-axis is a candidate for a local minimizer. As for the SONC, $\nabla f(x,0) \cdot v = xv_2 = 0$ if and only if $v_2 = 0$, and then $v^T \nabla^2 f(x,0) v = 2v_1v_2 = 0$. The SONC holds, but of course this tells us nothing more; we need a sufficient condition that works for boundary points. That's for next lecture.

Or, you could just say that $f = 0$ on the boundary of $\Omega$ and is positive on the interior, so every point of the boundary of $\Omega$ is a local minimizer (not strict) of $f$.

# Additional Examples

## Example 1

_Let $a\in\mathbb R$ and $f_a:\mathbb R^2\rightarrow \mathbb R, f_a(x, y)= x^2 + 2y^2 + axy - y$._

### Part (a) 
_Find the points that satisfy the FOC_

The partial derivatives are
$$\frac{\partial f_a}{\partial x} = 2x+ay, \qquad \frac{\partial f_a}{\partial y} = 4y+ax - 1.$$
Set both derivatives to $0$ to meet the FOC.  
If $a = 0$, then $x = 0, y = 1/4$ satisfies the FOC.  
If $a\neq 0$, the FOC is the linear system $\begin{bmatrix}a&4\\2&a\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}$. Replacing the second row by $\frac a2$ times itself minus the first row gives $\begin{bmatrix}a&4\\0&\frac{a^2}2- 4\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\-1\end{bmatrix}$. If $a = \pm 2\sqrt{2}$, the second equation reads $0 = -1$, so the system has no solution; otherwise, it has the unique solution $y = \frac{2}{8-a^2}, x = \frac1a\left(1-\frac{8}{8-a^2}\right) = \frac{-a}{8-a^2}$.  
To summarize:

 - If $a = 0$, then $x=0, y=1/4$ satisfies the FOC.
 - If $a = \pm 2\sqrt 2$, then no point satisfies the FOC.
 - If $a\neq 0$ and $a\neq \pm2\sqrt2$, then $x = \frac{-a}{8-a^2}, y = \frac{2}{8-a^2}$ satisfies the FOC. (Setting $a = 0$ in this formula also recovers the first case.)
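
The closed-form critical point can be checked with exact rational arithmetic. This is a minimal sketch, not part of the original solution; the function names `critical_point` and `grad` are illustrative. Since $\pm 2\sqrt2$ is irrational, the singular case never arises for the rational test values used here.

```python
from fractions import Fraction

def critical_point(a):
    """Closed-form FOC solution (x, y) for f_a, valid when a^2 != 8."""
    a = Fraction(a)
    denom = 8 - a * a  # nonzero for every rational a
    return (-a / denom, Fraction(2) / denom)

def grad(a, x, y):
    """Partial derivatives of f_a(x, y) = x^2 + 2y^2 + axy - y."""
    return (2 * x + a * y, 4 * y + a * x - 1)

for a in [0, 1, 3, Fraction(1, 2), -2]:
    x, y = critical_point(a)
    # Both components of the gradient should vanish at the critical point.
    print(a, (x, y), grad(Fraction(a), x, y))
```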

### Part (b)
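_Find the points that satisfy the SOC_ 

The Hessian matrix is 
$$F_a = \begin{bmatrix}2&a\\a&4\end{bmatrix}$$
Since the first leading principal minor $2$ is positive, $F_a$ is positive definite iff $\det(F_a) = 2\times 4-a^2 > 0$ (Sylvester's criterion). Hence for any $a\in(-2\sqrt 2, 2\sqrt 2)$ the critical point from Part (a) satisfies the SOC. For $|a| > 2\sqrt 2$ we have $\det(F_a) < 0$, so $F_a$ is indefinite and the critical point fails the SOC.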

### Part (c) 
_Prove that the local minimum is actually a global minimum_ 

_proof 1. Prove by completing the square_  
Note that $f_a(x, y) = \frac12\begin{bmatrix}x\\y\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} - \begin{bmatrix}0\\1\end{bmatrix}\cdot\begin{bmatrix}x\\y\end{bmatrix}$  
Therefore, completing the square with $\begin{bmatrix}x^*\\y^*\end{bmatrix} = \frac{1}{8-a^2}\begin{bmatrix}4&-a\\-a&2\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}\frac{-a}{8-a^2}\\\frac{2}{8-a^2}\end{bmatrix}$ gives 
$$f_a(x, y) = \frac12\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} - \frac12\begin{bmatrix}x^*\\y^*\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x^*\\y^*\end{bmatrix}$$
Note that when $a\in (-2\sqrt2, 2\sqrt 2)$, the matrix $\begin{bmatrix}2&a\\a&4\end{bmatrix}$ is positive definite, so for any $\begin{bmatrix}x\\y\end{bmatrix}$ we have $\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}\geq 0$, with equality only when $\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} = 0$. Therefore, the global minimum is attained exactly at $\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x^*\\y^*\end{bmatrix} = \begin{bmatrix}\frac{-a}{8-a^2}\\\frac{2}{8-a^2}\end{bmatrix}$.  
_proof 2. Prove by convexity_  
Let $\vec x_1 = (x_1, y_1)\in \mathbb R^2, \vec x_2 = (x_2, y_2) \in \mathbb R^2$, let $c\in [0, 1]$.  
For $c = 1$, obviously $f_a(1x_1 +0x_2) = 1f_a(x_1) + 0 f_a(x_2)$, similarly for $c=0$.  
For $c\in(0,1)$, denote $Q = \begin{bmatrix}2&a\\a&4\end{bmatrix}$
\begin{align*}
f_a(c\vec x_1 + (1-c)\vec x_2) &= \frac 12(c\vec x_1 + (1-c)\vec x_2)^T Q (c\vec x_1 + (1-c)\vec x_2) - \begin{bmatrix}0\\1\end{bmatrix}\cdot(c\vec x_1 + (1-c)\vec x_2)\\
&= \frac12\big[
c^2\vec x_1^TQ\vec x_1 + c(1-c)\vec x_1^TQ\vec x_2+c(1-c)\vec x_2^TQ\vec x_1 + (1-c)^2\vec x_2^TQ\vec x_2
\big] \\
&\quad- \begin{bmatrix}0\\1\end{bmatrix}\cdot(c\vec x_1 + (1-c)\vec x_2)
\end{align*}
The linear terms cancel in the following difference, so 
\begin{align*}
&\quad cf_a(\vec x_1) + (1-c)f_a(\vec x_2) - f_a(c\vec x_1+(1-c)\vec x_2)\\
&=\frac{1}{2}\big[(c-c^2)\vec x_1^TQ\vec x_1 - c(1-c)\vec x_1^TQ\vec x_2-c(1-c)\vec x_2^TQ\vec x_1 +((1-c)-(1-c)^2)\vec x_2^TQ\vec x_2\big]\\
&= \frac12c(1-c)\big[\vec x_1^TQ\vec x_1-\vec x_1^TQ\vec x_2-\vec x_2^TQ\vec x_1+\vec x_2^TQ\vec x_2\big]\\
&= \frac12c(1-c)(\vec x_1-\vec x_2)^TQ(\vec x_1-\vec x_2)
\end{align*}
Note that when $a\in(-2\sqrt 2, 2\sqrt 2)$, $Q$ is positive definite, hence $\frac12c(1-c)(\vec x_1-\vec x_2)^TQ(\vec x_1-\vec x_2) \geq 0$. Therefore, $cf_a(\vec x_1) + (1-c)f_a(\vec x_2) \geq f_a(c\vec x_1+(1-c)\vec x_2)$. By the definition of a convex function, $f_a$ is convex, and for a convex function every local minimum is a global minimum.
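
The key identity in this proof, namely that the convexity gap equals $\frac12 c(1-c)(\vec x_1-\vec x_2)^TQ(\vec x_1-\vec x_2)$, can be verified with exact rational arithmetic. The following is a minimal sketch with illustrative function names (not from the original notes); the sample points are arbitrary.

```python
from fractions import Fraction as Fr

def f_a(a, p):
    """f_a(x, y) = x^2 + 2y^2 + axy - y."""
    x, y = p
    return x * x + 2 * y * y + a * x * y - y

def quad_Q(a, d):
    """d^T Q d for Q = [[2, a], [a, 4]]."""
    return 2 * d[0] * d[0] + 2 * a * d[0] * d[1] + 4 * d[1] * d[1]

def convexity_gap(a, p1, p2, c):
    """c f_a(p1) + (1 - c) f_a(p2) - f_a(c p1 + (1 - c) p2)."""
    mix = (c * p1[0] + (1 - c) * p2[0], c * p1[1] + (1 - c) * p2[1])
    return c * f_a(a, p1) + (1 - c) * f_a(a, p2) - f_a(a, mix)

def gap_formula(a, p1, p2, c):
    """Closed form of the gap derived in the text."""
    d = (p1[0] - p2[0], p1[1] - p2[1])
    return Fr(1, 2) * c * (1 - c) * quad_Q(a, d)

a, c = Fr(1), Fr(1, 3)
p1, p2 = (Fr(1), Fr(2)), (Fr(-3), Fr(1, 2))
print(convexity_gap(a, p1, p2, c) == gap_formula(a, p1, p2, c))
print(convexity_gap(a, p1, p2, c) >= 0)
```

The identity itself holds for every $a$; the sign of the gap is what depends on $Q$ being positive semidefinite.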

## Example 2
_Find the local minimum point(s) for $f(x, y,z ) = (x-\frac y2)^2 + \frac34(y-2)^2 + z^2 -3$ on $S = \{(x,y,z)\in\mathbb R^3 \mid x\leq 0, y \geq 0\}$, and prove the local minimum is actually a global minimum._


First of all, let $g(x, y) = f(x, y, 0) = (x-\frac y2)^2 + \frac34(y-2)^2-3$. Observe that $z^2 \geq 0$ for all $z\in\mathbb R$, with equality only at $z = 0$, and that $z$ is unconstrained in $S$. Hence minimizing $f$ on $S$ is equivalent to minimizing $g$ on $\{(x,y) : x \leq 0, y \geq 0\}$ and taking $z=0$. 

First, the gradient of $g$ is
$$\nabla g = (2x-y, 2y-x-3)$$

1. $x < 0, y > 0$   
Solving the system of equations $\nabla g = \vec0$ (with $z = 0$) gives
$$x=1, y=2, z =0,$$
which does not lie in $S$, so there is no local min in the interior of $S$.

2. $x = 0, y > 0$  
Let the feasible direction be $v = (v_1, v_2)$ with $v_1 \leq 0$ and $v_2$ arbitrary; we want 
$$\nabla g \cdot v = (2\cdot 0-y)v_1 + (2y-0-3)v_2 =-yv_1 + (2y-3)v_2\geq 0$$
Since $y > 0$, we have $-yv_1 \geq 0$ for any $v_1 \leq 0$. Since $v_2$ can take either sign, the condition holds for all feasible $v$ iff $2y-3 = 0$, i.e. $y = \frac32$. The candidate is $(x, y) = (0, \frac32)$, that is, $(0, \frac32, 0) \in S$.

3. $x < 0, y = 0$  
Let the feasible direction be $v = (v_1, v_2)$ with $v_2 \geq 0$ and $v_1$ arbitrary; we want 
$$\nabla g \cdot v = (2x-0)v_1 + (2\cdot 0-x-3)v_2 =2xv_1 - (x+3)v_2\geq 0$$
For the feasible direction $v = (1/2, 1)$ we get $\nabla g\cdot v = 2x\cdot\frac12 - (x + 3) = -3 < 0$ for any $x$, hence there is no local min in this case. 

4. $x = 0, y = 0$  
Let the feasible direction be $v = (v_1, v_2)$ with $v_1 \leq 0, v_2 \geq 0$; we want 
$$\nabla g \cdot v = 0v_1 + (-3)v_2 = -3v_2\geq 0$$
This fails for every feasible direction with $v_2 > 0$, hence there is no local minimum at the origin.
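
Collecting the four cases, the only candidate is $(x, y, z) = (0, \frac32, 0)$. The problem also asks for a proof that this point is a global minimizer. One possible argument, sketched here as a supplement to the case analysis, bounds $f$ from below on $S$ directly. For $(x,y,z) \in S$ we have $x \leq 0 \leq y$, so $\frac y2 - x \geq \frac y2 \geq 0$ and hence $(x - \frac y2)^2 \geq \frac{y^2}{4}$. Therefore
\begin{align*}
f(x,y,z) &\geq \frac{y^2}{4} + \frac34 (y-2)^2 + z^2 - 3 \\
&\geq \frac{y^2}{4} + \frac34 (y-2)^2 - 3 \\
&= y^2 - 3y \\
&= \left(y - \frac32\right)^2 - \frac94 \\
&\geq -\frac94 = f\left(0, \frac32, 0\right),
\end{align*}
so $(0, \frac32, 0)$ is a global minimizer of $f$ on $S$.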

## Example 3
_Show that $xx^T$, where $x\in\mathbb R^n$, is positive semidefinite_

Let $x\in\mathbb R^n, a\in\mathbb R^n$.  
\begin{align*}
a^T(xx^T)a &= (a^Tx)(x^Ta)\\
&= (x^Ta)^T(x^Ta) &\text{take transpose}\\
&= \|x^Ta\|^2\\
&\geq 0
\end{align*}
Therefore, by the definition of positive semidefiniteness, $xx^T$ is positive semidefinite.

## Example 4

### Part (a)
_Let $f(x) = b\cdot Ax$, $A$ is $n\times m$ matrix, $x\in\mathbb R^m, b\in\mathbb R^n$, show that $\nabla f(x)=A^Tb$._ 

First, note that
\begin{align*}
b\cdot Ax &= 
\begin{bmatrix}b_1\\\vdots\\b_n
\end{bmatrix}\cdot 
\begin{bmatrix}
A_{11}&\cdots&A_{1m}\\
\vdots&\ddots&\vdots\\
A_{n1}&\cdots&A_{nm}
\end{bmatrix}
\begin{bmatrix}x_1\\\vdots\\x_m
\end{bmatrix}\\
&= \begin{bmatrix}b_1\\\vdots\\b_n
\end{bmatrix}
\cdot 
\begin{bmatrix}\sum_{i=1}^mA_{1i}x_i\\\vdots\\\sum_{i=1}^mA_{ni}x_i
\end{bmatrix}\\
&= \sum_{j=1}^n b_j\sum_{i=1}^m A_{ji}{x_i}
\end{align*}
Therefore, for each component $x_i$, we can easily derive the partial derivative as 
$$
\frac{\partial f}{\partial x_i} =\sum_{j=1}^n b_j A_{ji}
$$
so that $\nabla f = \begin{bmatrix}\sum_{j=1}^n b_j A_{j1}\\\vdots\\\sum_{j=1}^n b_j A_{jm}\end{bmatrix} = A^Tb$.

### Part (b)  
_Let $f(x) = x\cdot Ax$, show that $\nabla f(x) = (A+A^T)x$_

\begin{align*}
\nabla (x^TAx) &= \nabla(\sum_{i=1}^n \sum_{j=1}^n A_{ij}x_ix_j)\\
&= \begin{bmatrix}\sum_{i=1}^n A_{i1}x_i + \sum_{j=1}^n A_{1j}x_j\\\vdots\\\sum_{i=1}^n A_{in}x_i + \sum_{j=1}^n A_{nj}x_j\end{bmatrix}\\
&= Ax + A^Tx\\
&= (A+A^T)x
\end{align*}
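
The formula $\nabla(x^TAx) = (A + A^T)x$ can be compared against a central finite-difference approximation. The sketch below is not from the original notes; it uses plain Python lists, and the function names, the step size `h`, and the test matrix are illustrative assumptions. The matrix is deliberately non-symmetric, so that $(A + A^T)x$ differs from $2Ax$.

```python
def quad(A, x):
    """x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def grad_formula(A, x):
    """(A + A^T) x, the gradient derived in the text."""
    n = len(x)
    return [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]

def grad_numeric(A, x, h=1e-6):
    """Central finite-difference approximation of the gradient of x^T A x."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((quad(A, xp) - quad(A, xm)) / (2 * h))
    return g

A = [[1, 2], [3, 4]]   # non-symmetric on purpose
x = [1.0, -2.0]
print(grad_formula(A, x))
print(grad_numeric(A, x))
```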

## Example 5

_Let $f:\mathbb R^{2n} \rightarrow \mathbb R, f(x, y) = \frac12|Ax-By|^2$, where $A,B$ are $m\times n$ matrices, $x, y\in\mathbb R^n$_

### Part (a) 
_Find $\nabla f, \nabla^2f$_ 

Note that $f(x, y) = \frac12 (Ax-By)^T(Ax-By)$, let $g(x, y) = Ax-By, h(a) = a^Ta$ so that $f(x, y) = \frac12h(g(x, y))$. Therefore, using chain rule, 
$$D f(x,y) = \frac12D (h\circ g)(x,y) = \frac12Dh(g(x,y)) \cdot Dg(x,y)$$
Note that $h(a) = a^T a = a^TI a$ where $I$ is the identity matrix, so that by Example 4 Part (b), $\nabla h(a) = (I+I^T)a = 2a$.  
Then, note that $\frac{\partial g}{\partial x} = A, \frac{\partial g}{\partial y} = -B$, hence $Dg(x, y) = \begin{bmatrix}[A]&[-B]\end{bmatrix}$, i.e. the matrices $A$ and $-B$ placed side by side.  
Therefore, 
$$\nabla f(x, y) = \frac12 \cdot Dg(x,y)^T \cdot 2(Ax-By) = \begin{bmatrix}[A]& [-B]\end{bmatrix}^T(Ax-By) = \begin{bmatrix}A^T(Ax-By)\\ -B^T(Ax-By)\end{bmatrix}$$

Then, note that $\begin{bmatrix}[A]& [-B]\end{bmatrix}^T(Ax-By) = \begin{bmatrix}[A]& [-B]\end{bmatrix}^T Ax - \begin{bmatrix}[A]& [-B]\end{bmatrix}^TBy$. Therefore, 
$$\frac{\partial}{\partial x} \left(\begin{bmatrix}[A]& [-B]\end{bmatrix}^T Ax - \begin{bmatrix}[A]& [-B]\end{bmatrix}^TBy\right) = \begin{bmatrix}[A]& [-B]\end{bmatrix}^T A$$
$$\frac{\partial}{\partial y} \left(\begin{bmatrix}[A]& [-B]\end{bmatrix}^T Ax - \begin{bmatrix}[A]& [-B]\end{bmatrix}^TBy\right) = -\begin{bmatrix}[A]& [-B]\end{bmatrix}^T B = \begin{bmatrix}[-A]& [B]\end{bmatrix}^T B$$
Rewrite into matrix form, 
$$\nabla^2 f = \begin{bmatrix}[A^TA]& [-A^TB]\\ [-B^TA] & [B^TB]\end{bmatrix}$$

### Part (b)
_Show that if $(x_0, y_0)$ satisfies $Ax_0 = By_0$, then $(x_0, y_0)$ is a local minimum_

Note that $\nabla f(x_0, y_0) = \begin{bmatrix}[A]& [-B]\end{bmatrix}^T(Ax_0-By_0) = \begin{bmatrix}[A]& [-B]\end{bmatrix}^T0 = 0$, which satisfies the FOC.  

Also, note that the Hessian matrix can be rewritten as 
$$\nabla^2 f = \begin{bmatrix}[A^TA]& [-A^TB]\\ [-B^TA] & [B^TB]\end{bmatrix} = \begin{bmatrix}[A]& [-B]\end{bmatrix}^T\begin{bmatrix}[A]& [-B]\end{bmatrix}$$ 
so it is a positive semidefinite matrix, which satisfies the SOC. These are only necessary conditions. To conclude, observe that $f \geq 0$ everywhere and $f(x_0, y_0) = \frac12|Ax_0 - By_0|^2 = 0$, so $(x_0, y_0)$ is a global, hence local, minimizer.

## Example 6

_Let $g$ be a convex function on $\mathbb R^n$, and let $f$ be a linear, nondecreasing function of a single variable_

### Part (a) 
_Prove $F:=f\circ g$ is convex_

Let $x, y \in \mathbb R^n$ and $\theta \in [0, 1]$. Then
\begin{align*}
F(\theta x + (1-\theta) y) = f(g(\theta x + (1- \theta) y))
\end{align*}
By convexity of $g$, 
$$g(\theta x + (1- \theta) y) \leq \theta g(x) + (1-\theta)g(y)$$ 
Since $f$ is non-decreasing,
$$f(g(\theta x + (1- \theta) y)) \leq f(\theta g(x) + (1-\theta)g(y))$$
By linearity of $f$,
$$f(\theta g(x) + (1-\theta)g(y)) = \theta f(g(x)) + (1-\theta) f(g(y)) = \theta F(x) + (1-\theta) F(y)$$
Chaining these gives $F(\theta x + (1-\theta) y) \leq \theta F(x) + (1-\theta) F(y)$, so $F$ is convex by definition. 

### Part (b)
\begin{align*}
\nabla^2F(x) &= \frac{\partial}{\partial x}(\frac{\partial f}{\partial g}\cdot \frac{\partial g}{\partial x}) &\text{chain rule}\\
&= (\frac{\partial}{\partial x}\frac{\partial f}{\partial g})\cdot \frac{\partial g}{\partial x} + \frac{\partial f}{\partial g}\cdot(\frac{\partial }{\partial x}\frac{\partial g}{\partial x})&\text{product rule}\\
&= (\frac{\partial^2 f}{\partial g^2}\cdot  \frac{\partial g}{\partial x})\cdot \frac{\partial g}{\partial x} + \frac{\partial f}{\partial g}\cdot \frac{\partial^2 g}{\partial x^2} &\text{chain rule}
\end{align*}
Rewrite the derivatives with the matrix multiplication notation
\begin{align*}
\nabla^2 F(x) &= [d^2 f(g(x)) \nabla g(x)]\nabla g(x)^T + d f(g(x))\nabla^2 g(x)\\
&= d^2 f(g(x))\nabla g(x)\nabla g(x)^T +  df(g(x))\nabla^2 g(x)
\end{align*}

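Because $f$ is linear, $d^2f(g(x)) = 0$, so the first term vanishes and $\nabla^2 F(x) = df(g(x))\nabla^2 g(x)$.  
Because $f$ is non-decreasing, $df(g(x)) \geq 0$.  
Because $g$ is convex, $\nabla^2 g(x)$ is positive semidefinite.  
(The matrix $\nabla g(x) \nabla g(x)^T$ is also positive semidefinite by Example 3, but this is not needed here since its coefficient is zero.)  
A positive semidefinite matrix scaled by a non-negative number is still positive semidefinite, so $\nabla^2 F(x) \succeq 0$ for all $x$, and therefore $F$ is convex. 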

## Example 7
_Let $f:\mathbb R^2\rightarrow \mathbb R$ be a non-negative convex function, and let $F:\mathcal A\rightarrow \mathbb R, F(\mu) = \int_0^1f(\mu(x), \mu'(x))dx$ where $\mathcal A = \{\mu : [0,1]\rightarrow\mathbb R \mid \mu \in C^1\}$. Prove $F$ is convex on $\mathcal A$_


Let $a\in (0, 1), u_1, u_2\in \mathcal A$. 
$$
F(au_1 + (1-a)u_2)
= \int_0^1f\big(au_1(x) + (1-a)u_2(x), (au_1 + (1-a)u_2)'(x)\big)dx$$
By linearity of differentiation,
$$= \int_0^1f(au_1(x) + (1-a)u_2(x), au_1'(x) + (1-a)u_2'(x))dx$$
Note that for any $x\in [0, 1]$, by convexity of $f$
$$f(au_1(x) + (1-a)u_2(x), au_1'(x) + (1-a)u_2'(x))
\leq af(u_1(x), u_1'(x)) + (1-a) f(u_2(x), u_2'(x))$$
Integrating this pointwise inequality over $[0,1]$ (the integral is monotone) gives

\begin{align*}
&\quad\int_0^1 f(au_1(x) + (1-a)u_2(x), au_1'(x) + (1-a)u_2'(x))dx\\
&\leq \int_0^1 af(u_1(x), u_1'(x)) + (1-a) f(u_2(x), u_2'(x))dx\\
&= a\int_0^1 f(u_1(x), u_1'(x))dx + (1-a)\int_0^1 f(u_2(x), u_2'(x))dx\\
&= aF(u_1) + (1-a)F(u_2)
\end{align*}

## Example 8

_If $f:\Omega\rightarrow \mathbb R$ is convex on $\Omega=(a,b)$, then $f$ is also continuous_. 

__lemma__ If $f:\Omega\rightarrow\mathbb R$ is convex, then for all $x_1, x, x_2 \in \Omega$ with $x_1 < x < x_2$, we have $\frac{f(x) - f(x_1)}{x-x_1} \leq \frac{f(x_2) - f(x_1)}{x_2-x_1} \leq \frac{f(x_2) - f(x)}{x_2-x}$.  

_proof_. Let $x_1, x, x_2\in \Omega. x_1 < x < x_2$, note that $\frac{x_2 - x}{x_2 - x_1} \in [0, 1]$
Since $f$ is convex, 
$$f(x) = f(\frac{x_2 - x}{x_2 - x_1}x_1 + \frac{x-x_1}{x_2 - x_1} x_2) \leq \frac{x_2 - x}{x_2 - x_1}f(x_1) + \frac{x- x_1}{x_2 - x_1}f(x_2)$$
Then, the inequalities can be easily derived as 
\begin{align*}
\frac{f(x) - f(x_1)}{x-x_1}
&\leq \frac{1}{x-x_1}\big[\frac{x_2 - x}{x_2 - x_1}f(x_1) + \frac{x- x_1}{x_2 - x_1}f(x_2) - f(x_1)\big]\\
&= \frac1{x-x_1}\frac{x-x_1}{x_2-x_1}(f(x_2)-f(x_1))\\
&= \frac{f(x_2) - f(x_1)}{x_2-x_1}
\end{align*}
Similar derivation holds for 
$$\frac{f(x_2) - f(x)}{x_2 - x} \geq \frac{f(x_2) - f(x_1)}{x_2-x_1}$$

__Claim__ If $f$ is convex, then $\forall x_0\in (a, b), \lim_{x\rightarrow x_0} f(x) = f(x_0)$ ($f$ is continuous using the limit definition). 

_proof_. Let $c, d \in (a, b)$ with $a<c<x_0 < d<b$.  
Take functions $l_1(x) = \frac{f(x_0) - f(c)}{x_0 - c}(x-x_0) + f(x_0), l_2(x) = \frac{f(d) - f(x_0)}{d - x_0}(x-x_0) + f(x_0)$, where $l_1$ is the line through $(c, f(c))$ and $(x_0, f(x_0))$, and $l_2$ is the line through $(x_0, f(x_0))$ and $(d, f(d))$.  
Then, for any $x\in (x_0, d)$, applying the lemma to the triples $x_0 < x < d$ and $c < x_0 < x$ gives 
$$\frac{f(x) - f(x_0)}{x-x_0} \leq \frac{f(d) - f(x_0)}{d - x_0}$$
$$\frac{f(x) - f(x_0)}{x-x_0} \geq \frac{f(x_0) - f(c)}{x_0 - c}$$
so that 
$$f(x) = \frac{f(x) - f(x_0)}{x-x_0}(x-x_0) + f(x_0) \leq \frac{f(d) - f(x_0)}{d-x_0}(x-x_0) + f(x_0) = l_2(x)$$
$$f(x) = \frac{f(x) - f(x_0)}{x-x_0}(x-x_0) + f(x_0) \geq \frac{f(x_0)-f(c)}{x_0-c}(x-x_0) + f(x_0) = l_1(x)$$
Since $\forall x\in (x_0, d), l_1(x)\leq f(x) \leq l_2(x)$ and $\lim_{x\rightarrow x_0+}l_1(x) = l_1(x_0) = f(x_0) = l_2(x_0) = \lim_{x\rightarrow x_0+}l_2(x)$, by squeeze theorem
$$\lim_{x\rightarrow x_0+}f(x) = f(x_0)$$

With the similar arguments, we can show that $\forall x\in (c, x_0), l_2(x) \leq f(x) \leq l_1(x)$, and use squeeze theorem, 
$$\lim_{x\rightarrow x_0-}f(x) = f(x_0)$$
Finally, the two limits from both sides conclude that 
$$\lim_{x\rightarrow x_0}f(x) = f(x_0)$$
Therefore, we have shown that $\forall x_0\in (a, b), \lim_{x\rightarrow x_0}f(x) = f(x_0)$, which means $f$ is continuous on $(a,b)$.

## Example 9
_If $f:\Omega\rightarrow \mathbb R$ is continuous and convex and attains its maximum at an interior point of $\Omega$, then $f$ is a constant function_

_proof._ Let $x_0 \in \Omega_{int}$ where $f(x_0)$ is the maximum.  
Assume $f$ is not constant. Take $x_1 \in \Omega$ s.t. $f(x_1) < f(x_0)$.   
Since $x_0$ is an interior point, there is an $\epsilon > 0$ with $B(x_0, \epsilon)\subset \Omega$; take $t\in(0, 1)$ small enough that $x_2 = x_0 - t(x_1 - x_0) \in B(x_0, \epsilon)$.  
Then, $x_2, x_0, x_1$ lie on a line and $x_0 = \frac{t}{1+t}x_1 + \frac{1}{1+t}x_2$, a convex combination with weights $\frac{t}{1+t}, \frac{1}{1+t} \in (0, 1)$.   
By our assumption, $f(x_1) < f(x_0)$ and $f(x_2) \leq f(x_0)$, hence
$$\frac{t}{1+t}f(x_1) + \frac{1}{1+t}f(x_2) < \frac{t}{1+t}f(x_0) + \frac{1}{1+t}f(x_0) = f(x_0)$$
This contradicts the convexity of $f$, which requires $f(x_0) \leq \frac{t}{1+t}f(x_1) + \frac{1}{1+t}f(x_2)$. Hence $f$ must be a constant function. 

## Example 10

_Let $f: \Omega\rightarrow \mathbb R, f(x) = a\cdot x + b$, where $\Omega$ is a compact and convex subset of $\mathbb R^n$._  

### Part (a)
_If $a\neq 0$, then any minimizer of $f$ must be on $\partial \Omega$_  

_proof_. Suppose there exists a minimizer $x_0 \in \Omega_{int}$. Then there is an $\epsilon > 0$ with $B(x_0, \epsilon) \subset \Omega$; take $t > 0$ small enough that $x = x_0 - ta \in B(x_0, \epsilon)$. Then 
$$f(x) = a\cdot (x_0 - ta) + b = a\cdot x_0 - t\|a\|^2 + b = f(x_0) - t\|a\|^2$$
Since $a\neq 0$, we have $t\|a\|^2 > 0$, so $f(x) < f(x_0)$ and $x_0$ is not a minimizer, a contradiction. Hence any minimizer must be on $\partial \Omega$. 

### Part (b)
_Suppose $g(x) = \|x\|^2 + f(x)$, under what condition of $a$ can you guarantee that the minimizers do not occur in the interior of the set $\Omega$?_

Note that $\nabla g(x) = 2x + a$. At an interior point $x_*$ all directions are feasible, so by the FONC an interior minimizer must satisfy $\nabla g(x_*) = 2x_* + a = 0$. Equivalently, if $\nabla g(x_*) \neq 0$, then the feasible direction $d = -\nabla g(x_*)$ satisfies $\nabla g(x_*) \cdot d < 0$ and $x_*$ is not a minimizer.  
Therefore, to guarantee that no interior point is a minimizer, it suffices that $2x + a \neq 0$ for all $x\in\Omega_{int}$, that is, $-\frac a2 \notin \Omega_{int}$.  
This condition is also necessary: completing the square gives $g(x) = \|x + \frac a2\|^2 - \frac{\|a\|^2}{4} + b$, so if $-\frac a2 \in \Omega_{int}$ then $-\frac a2$ is an interior global minimizer of $g$.

## Example 11
_If $f:\mathbb R^n\rightarrow\mathbb R$ is convex, then $g : \mathbb R^n \times \mathbb R^n \rightarrow \mathbb R, g(x, z) := f(x) + \|x+z\|^2$ is also convex._ 

_proof_. 
Let $x_1, z_1, x_2, z_2\in \mathbb R^n, c\in [0, 1]$. The function $u \mapsto \|u\|^2$ is convex, since for any $u, w \in \mathbb R^n$
$$c\|u\|^2 + (1-c)\|w\|^2 - \|cu + (1-c)w\|^2 = c(1-c)\|u - w\|^2 \geq 0.$$
Applying this with $u = x_1 + z_1$ and $w = x_2 + z_2$, 
\begin{align*}
g(c(x_1, z_1) + (1-c)(x_2, z_2)) &= f(cx_1 + (1-c)x_2) + \|c(x_1 + z_1) + (1-c)(x_2 + z_2)\|^2\\
&\leq cf(x_1) + (1-c)f(x_2) &f\text{ is convex}\\
&\quad + c\|x_1 + z_1\|^2 + (1-c)\|x_2 + z_2\|^2 &\|\cdot\|^2\text{ is convex}\\
&= cg(x_1, z_1) + (1-c)g(x_2, z_2)
\end{align*}
By the definition of convexity, $g$ is convex.

## Example 12
_Let $f: \mathbb R^n \rightarrow \mathbb R$ be a $C^1$ function and define $M = \{(x, f(x))\in \mathbb R^{n+1}\}$. Given $p = (x_0, f(x_0)) \in M$, find the tangent space $T_pM$_. 

Define $g(x, z) = f(x) - z$, note that $\nabla g(x, z) = [\nabla f(x), -1]\in\mathbb R^{n}\times \mathbb R$.  
Then, note that $g(p) = 0$, and the tangent plane to $M$ at $p$ is given by 
\begin{align*}
\nabla g(x_0, f(x_0))\cdot ((x, z) - (x_0, f(x_0))) &= 0\\
\nabla f(x_0)\cdot(x-x_0) + (-1)(z-f(x_0))&= 0\\
\nabla f(x_0) \cdot(x-x_0) + f(x_0)&=z
\end{align*}
Therefore, the tangent space is given as 
$$T_pM = \{(x, \nabla f(x_0)\cdot(x-x_0) + f(x_0)): x\in\mathbb R^n\}$$