## <span style="color: forestgreen;">Optimization Algorithms - Lecture 1</span>

<span style="color: forestgreen;">Subhashis Mohanty (subhashis at alguile dot com), April 2021</span>

This notebook is ancillary to <a href="https://www.youtube.com/playlist?list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9">Optimization Algorithms Lecture 1 (youtube playlist)</a>, and here is the associated <a href="https://github.com/striverconniver/OptimizationAlgorithms.git">Github repository</a>.

I suggest that you watch the <a href="https://www.youtube.com/watch?v=1w3fx40yWas&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=1">introductory video</a> that explains the _raison d'etre_ for this lecture series.


<span style="color: brown;">Note: This notebook has been tested with Python 3.8, numpy 1.19.3, and matplotlib 3.3.4. It assumes that pip is installed. It is best to run this notebook in a virtual environment with Python 3.8 and pip.</span>

### <span style="color: forestgreen;">Lecture Contents</span>

1. <span style="color: forestgreen;">The Optimization Problem</span>
  
 * Example
 * Definition

    Covered in youtube <a href="https://www.youtube.com/watch?v=um1Rl3Nef7k&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=2">Lecture 1, Part 2</a>
 
<p style="margin-bottom: 15px;"></p>
 
2. <span style="color: forestgreen;">Definitions from Metric Topology, Analysis, and Linear Algebra</span>

  * Metric Topology
    1. Metric spaces
    2. Neighborhood of a point in a metric space
    3. Limit points
    4. Open sets
    5. Closed sets
    6. Bounded sets

 * Real Analysis
    1. Continuity of a function
    2. Derivative of a function
    
 * Minimizers
    1. Global
    2. Local   
    
    Covered in youtube <a href="https://www.youtube.com/watch?v=6ivPAD-EiKU&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=3">Lecture 1, Part 3</a>&nbsp;&nbsp;&nbsp;<a href="https://www.youtube.com/watch?v=6mgKJv_C2rM&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=4">Lecture 1, Part 4</a>&nbsp;&nbsp;&nbsp;<a href="https://www.youtube.com/watch?v=zvktWWtnS2A&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=5">Lecture 1, Part 5</a>&nbsp;&nbsp;&nbsp;<a href="https://www.youtube.com/watch?v=aTz6AENt8OE&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=6">Lecture 1, Part 6</a>
    
<p style="margin-bottom: 15px;"></p>    
      
 * Analysis and Linear Algebra
    1. Positive Semi-Definite and Positive Definite matrices
    2. Gradient of a function
    3. Jacobian of a function
    4. Hessian of a function
    
    Covered in youtube <a href="https://www.youtube.com/watch?v=Nz0frz07bzc&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=9">Lecture 1, Part 9</a>
<p style="margin-bottom: 15px;"></p>
 
3. <span style="color: forestgreen;">Recognizing Local Minimizers</span>
 
 * First Order Necessary Conditions (FONC)
 
    Covered in youtube <a href="https://www.youtube.com/watch?v=dfB6NryaO48&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=7">Lecture 1, Part 7</a>&nbsp;&nbsp;&nbsp;<a href="https://www.youtube.com/watch?v=tSIGhbZqs_4&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=8">Lecture 1, Part 8</a>
<p style="margin-bottom: 15px;"></p>

 * Second Order Necessary Conditions (SONC)
 
    Covered in youtube <a href="https://www.youtube.com/watch?v=Nz0frz07bzc&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=9">Lecture 1, Part 9</a>&nbsp;&nbsp;&nbsp;<a href="https://www.youtube.com/watch?v=UDRDZ5nSbLw&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=10">Lecture 1, Part 10</a>&nbsp;&nbsp;&nbsp;<a href="https://www.youtube.com/watch?v=apgrQgC1-r4&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=11">Lecture 1, Part 11</a>
<p style="margin-bottom: 15px;"></p>
 
 * Second Order Sufficient Conditions (SOSC)
 
    Covered in youtube <a href="https://www.youtube.com/watch?v=PbnSRLNrd34&list=PLbZhQRxUimh57G5JtnDA6p9-MYmbHMeP9&index=12">Lecture 1, Part 12</a>


### <span style="color: forestgreen;">The Optimization Problem</span><a class="anchor" id="theoptprob"></a>

Examples before definitions and abstractions is a good maxim to follow. Examples seem to open up our neural pathways and make them more receptive to arid and disembodied definitions and abstractions, while also lending them body and texture. With that in mind, I will begin with an example, replete with Python code and plots, before writing down the definition of the Optimization Problem.

#### <span style="color: forestgreen;">Example of an Optimization Problem</span>  

The example I have chosen is the minimization of a version of the normal distribution. If you are not familiar with the normal distribution, don't sweat it. It isn't my objective at this point to introduce and explain the normal distribution, but rather to use it to ground the meaning of the Optimization Problem. There will be opportunity yet to explore the details and nuance of this single most important distribution in all of Statistics in a later lecture.

The specific version of the normal distribution we shall use as our example is the _bivariate, uncorrelated, translated jointly normal distribution_; here is the probability density function (PDF) <span style="color: black;">$f: \mathbb{R}^2 \rightarrow \mathbb{R}$</span>:

<p></p>

<center><span style="font-weight: lighter;">$\large f(x_1, x_2) = \frac{1}{2\pi} \; e^{\frac{-\;[(x_1\, -\, 1)^2\, +\, (x_2\, -\, 2)^2]}{2}}\tag{1}$</span></center>

<span>The general bivariate jointly normal PDF is:</span>

<center><span>$\large f_{G}(x_1, x_2) = \frac{1}{2\pi\sigma_{X_1}\sigma_{X_2}\sqrt{1\, -\, \rho_{X_1X_2}^2}}e^{-\; \frac{1}{2(1\, -\, \rho_{X_1X_2}^2)}\big[\big(\frac{x_1\, -\, \mu_{X_1}}{\sigma_{X_1}}\big)^2\, -\, \frac{2\rho_{X_1X_2}(x_1\, -\, \mu_{X_1})(x_2\, - \, \mu_{X_2})}{\sigma_{X_1}\sigma_{X_2}}\, +\, \big(\frac{x_2\, -\, \mu_{X_2}}{\sigma_{X_2}}\big)^2\big]},$</span></center>
<p style="margin-top: 30px;"></p>
<span>from which we can deduce that </span>

* <span>the mean of the distribution in (1), $\mu = [\mu_{X_1}\;\; \mu_{X_2}]^T = [1\;\; 2]^T$;</span>
* <span>$\sigma_{X_1}^2 = 1 = \sigma_{X_2}^2$, i.e. the variances of the random variables $X_1$ and $X_2$ are both one (and thus so are their standard deviations);</span>
* <span>the correlation coefficient between $X_1$ and $X_2$, $\rho_{X_1X_2} = 0$.</span>
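
As a quick sanity check of the three deductions above, here is a minimal sketch that evaluates both (1) and the general PDF with $\mu = [1\;\; 2]^T$, $\sigma_{X_1} = \sigma_{X_2} = 1$ and $\rho_{X_1X_2} = 0$, and compares them at a few arbitrary points. The function names `f` and `f_general` and the sample points are my own illustrative choices, not part of the lecture code.

```python
import math


def f(x1, x2):
    """PDF (1): mean (1, 2), unit variances, zero correlation."""
    return math.exp(-((x1 - 1) ** 2 + (x2 - 2) ** 2) / 2) / (2 * math.pi)


def f_general(x1, x2, mu1, mu2, sigma1, sigma2, rho):
    """General bivariate jointly normal PDF (illustrative helper)."""
    z1 = (x1 - mu1) / sigma1          # standardized x1
    z2 = (x2 - mu2) / sigma2          # standardized x2
    one_minus_rho2 = 1 - rho ** 2
    norm = 2 * math.pi * sigma1 * sigma2 * math.sqrt(one_minus_rho2)
    quad = z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2
    return math.exp(-quad / (2 * one_minus_rho2)) / norm


# Arbitrary sample points; any others would do equally well.
sample_points = [(1.0, 2.0), (0.0, 0.0), (2.5, -1.0), (-3.0, 4.0)]
max_gap = max(
    abs(f(a, b) - f_general(a, b, 1, 2, 1, 1, 0)) for a, b in sample_points
)
print(max_gap)  # expected to be zero up to floating-point rounding
```
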

The only PDF facts that are relevant for our purposes are:

* $f$ is a function of two variables $x_1$ and $x_2$;
* it has a peak at $(x_1, x_2) = (1, 2)$, the mean of the PDF;
* its value drops rapidly, radially away from the mean of the distribution.
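
These facts are easy to check numerically. The sketch below (the helper `f_at`, the radii and the angles are illustrative assumptions of mine) evaluates $f$ at the mean, at increasing distances from the mean, and at several points that share the same distance from the mean.

```python
import math


def f(x1, x2):
    """PDF (1): mean (1, 2), unit variances, zero correlation."""
    return math.exp(-((x1 - 1) ** 2 + (x2 - 2) ** 2) / 2) / (2 * math.pi)


def f_at(r, theta):
    """Value of f at distance r from the mean, in direction theta."""
    return f(1 + r * math.cos(theta), 2 + r * math.sin(theta))


# The peak value at the mean is 1 / (2 * pi).
peak = f(1, 2)

# Moving away from the mean along one direction, f keeps decreasing.
radii = [0.0, 0.5, 1.0, 2.0, 3.0]
values = [f_at(r, 0.0) for r in radii]

# At a fixed distance, the direction should not matter.
angles = [k * math.pi / 6 for k in range(12)]
ring = [f_at(1.5, t) for t in angles]
spread = max(ring) - min(ring)

print(peak, values, spread)
```

The spread across directions should vanish up to rounding, which is another way of saying that $f$ depends only on the distance from the mean.
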

Given the extraordinary symmetry in the particular PDF $f$ we have chosen, its _level set curves_,

<p></p>

<center><span style="font-weight: lighter;">$\large L_c = \{(x_1, x_2)\; |\; f(x_1, x_2) = c\}, \tag{2}$</span></center>

i.e. its _contour lines_ (lines with equal values of $f$) are _circles centered at_ $(1, 2)$. See Figure 1, immediately below the Python code that plots our PDF.

In [None]:
# Install numpy and matplotlib pip packages in the current Jupyter kernel;
# we know these specific versions, numpy 1.19.3 and matplotlib 3.3.4, work.
import sys
!{sys.executable} -m pip install numpy==1.19.3
!{sys.executable} -m pip install matplotlib==3.3.4

In [None]:
###############################################################################
import numpy as np
from matplotlib import pyplot as plt
###############################################################################
import SMPlotUtilities as smplt
###############################################################################

In [None]:
from IPython.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")

In [None]:
###############################################################################
plt.rcParams['figure.figsize'] = [smplt.mpSize["w"], smplt.mpSize["h"]]
###############################################################################
fig, ax = plt.subplots()
###############################################################################
smplt.EnvelopeRoundedCompact(ax, smplt.mpRndBnds)
###############################################################################
x1, x2, f, levellist = smplt.BivariateNormalCompact(smplt.mpPts)
###############################################################################                     
# Setup the PDF as a contour plot
###############################################################################
cp = ax.contour(x1, x2, f, levels=levellist, linewidths=2)
###############################################################################
# Set a colorbar legend to appear with the plot
###############################################################################
fig.colorbar(cp)
###############################################################################
# Title, axis labels, extents, grid color, background and line style
###############################################################################
smplt.SetupPlotCompact(ax, plt, "$x_1$", "$x_2$", smplt.mpBnds, "",
                "Figure 1: Bivariate, uncorrelated, translated normal\
 distribution - Level set curves", smplt.mpCapLoc)
###############################################################################
plt.show()
###############################################################################

Our goal is to find those values of $x = (x_1, x_2)$ for which $f$ has the lowest<sup style="color: brown;">&dagger;</sup> value; but $f$ doesn't have any lowest value in $\mathbb{R}^2$, though it does have a lowest value in ${\overline{\mathbb{R}}}^2$, the
extended plane<sup style="color: brown;">&ddagger;</sup>, i.e. $\forall x_1, x_2 \in \mathbb{R}$,

<p></p>

<center><span style="font-weight: lighter;">$\{x\, |\, \text{lowest value of}\, f(x)\} = \{(x_1, \pm\infty), (\pm\infty,\, x_2), (\pm\infty,\, \pm\infty)\}.\tag{3}$</span></center>

<p></p>
<br><br>
<p></p>


<span style="font-size: 85%;"><sup style="color: brown;">&dagger;</sup>&nbsp; I am deliberately avoiding using the word _minimum_ since _minimum_ and <a href="#optdef">_minimizers_</a> will be defined shortly.</span>

<span style="font-size: 85%;"><sup style="color: brown;">&ddagger;</sup>&nbsp; See <a href="#references">[1] Rudin</a> Definition 1.23 for a definition of the extended real number system; informally, the extended real number system adds the two symbols $\pm \infty$ to the reals and specifies which operations involving them are legal.</span>

<p></p>

But let's make the problem more real &mdash; closer to what one might expect in a practical situation. Let us impose some constraints on the problem of finding values of $x = (x_1, x_2)$ for which $f$ has the lowest value &mdash; let us add two constraints. I am using <span style="color: orange;">orange color for $c_1$</span> and <span style="color: red;">red color for $c_2$</span> as cues since those are their colors in the plots below. 

##### <span style="color: orange;"><u>Constraint $c_1:$</u></span>

<p></p>

<center><span style="color: orange;">$x_2 \ge 2x_1 - 2\tag{4}$</span></center>

##### <span style="color: red;"><u>Constraint $c_2:$</u></span>

<p></p>

<center><span style="color: red;">$x_2 \le -2(x_1 - 2)^2 + 2\tag{5}$</span></center>

<p></p>
<br>
<p></p>

Let us examine constraints $c_1$ and $c_2$ before plotting them. The equation of a straight line is

<p></p>

<center><span style="color: orange;">$x_{2} = mx_{1} + b\tag{6}$</span></center>

The slope of the line in (6) is $m$ and the $x_2$-intercept, the intersection of the $x_2$ axis with the line, is $b$. Briefly changing the inequality <span style="color: orange;">$c_1$</span> to an equality in (4),


<p></p>

<center><span style="color: orange;">$c_1^{\text{equality}}: x_{2} = 2x_{1} + (-2),\tag{7}$</span></center>

what we have is also an equation of a line with a slope $m = 2$, and an $x_2$-intercept $b = -2$. Figure 2 below adds <span style="color: orange;">$c_1^{\text{equality}}$</span> to Figure 1.

In [None]:
###############################################################################
plt.rcParams['figure.figsize'] = [smplt.mpSize["w"], smplt.mpSize["h"]]
###############################################################################
fig, ax = plt.subplots()
###############################################################################
smplt.EnvelopeRoundedCompact(ax, smplt.mpRndBnds)
###############################################################################
x1, x2, f, levellist = smplt.BivariateNormalCompact(smplt.mpPts)
###############################################################################                     
# Setup the PDF as a contour plot
###############################################################################
cp = ax.contour(x1, x2, f, levels=levellist, linewidths=2)
###############################################################################
# Set a colorbar legend to appear with the plot
###############################################################################
fig.colorbar(cp)
###############################################################################
# Constraint c1
###############################################################################
x1c = smplt.mpC1["rng"]
c1 = 2*x1c - 2
ax.plot(x1c, c1, smplt.mpC1["attr"]["color"], smplt.mpC1["attr"]["lw"],
        smplt.mpC1["attr"]["zorder"])
plt.text(smplt.mpC1["txtx"], smplt.mpC1["txty"], smplt.mpC1["ineq"],
         smplt.mpC1["tattr"])
###############################################################################
# Title, axis labels, extents, grid color, background and line style
###############################################################################
smplt.SetupPlotCompact(ax, plt, "$x_1$", "$x_2$", smplt.mpBnds, "",
                       smplt.mpC1["cap"], smplt.mpCapLoc) 
###############################################################################
plt.show()

Let us now incorporate <span style="color: orange;">$c_1$</span> instead of <span style="color: orange;">$c_1^{\text{equality}}$</span> into our analysis: what the inequality is telling us is that the only permissible values of $(x_1, x_2)$ over which we are to seek the lowest value of $f$ are those that lie on or above the <span style="color: orange;">orange line</span>.
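
In code, "on or above the line" is a one-line predicate. The sketch below is only an illustration; the name `satisfies_c1` and the three test points are my own.

```python
def satisfies_c1(x1, x2):
    """True when (x1, x2) lies on or above the line x2 = 2*x1 - 2."""
    return x2 >= 2 * x1 - 2


above = satisfies_c1(0, 0)   # 0 >= -2, so the point is above the line
on = satisfies_c1(1, 0)      # 0 >= 0, so the point is on the line
below = satisfies_c1(2, 0)   # 0 >= 2 fails, so the point is below the line
print(above, on, below)      # prints: True True False
```
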

<p>
<br>
</p>

In connection with constraint <span style="color: red;">$c_2$</span>, let us first consider the square function:

<p></p>

<center><span style="color: red;">$x_{2} = x_1^2,\tag{8}$</span></center>

<p></p>

whose graph is plotted in Figure 3.

In [None]:
###############################################################################
plt.rcParams['figure.figsize'] = [smplt.mpSize["w"], smplt.mpSize["h"]]
###############################################################################
fig, ax = plt.subplots()
###############################################################################
smplt.EnvelopeRoundedCompact(ax, smplt.sfRndBnds)
###############################################################################
x1 = smplt.sf["rngb"]
x2 = x1**2
ax.plot(x1, x2, color="Chartreuse", lw=2)
plt.text(smplt.sf["ebx"], smplt.sf["eby"], smplt.sf["eqb"],
         smplt.sf["tattrC"])
plt.text(smplt.sf["tbx"], smplt.sf["tby"], smplt.sf["tb"],
         smplt.sf["tattrCs"])
###############################################################################
# Title, axis labels, extents, grid color, background and line style
###############################################################################
smplt.SetupPlotCompact(ax, plt, "$x_1$", "$x_2$", smplt.sfBnds, "",
                smplt.sf["cap"], smplt.sfCapLoc)
###############################################################################
plt.show()

Briefly changing the inequality <span style="color: red;">$c_2$</span> to an equality in (5),

<p></p>

<center><span style="color: red;">$c_2^{\text{equality}}: x_{2} = -2(x_{1} - 2)^2 + 2,\tag{9}$</span></center>

what we have is again the equation of a square function, a transformed version of (8), which we can get to from (8) in four easy steps.

<span style="color: red;">$(8) \rightarrow (9)$</span>

<p></p>

Step 1 (translated):
<br><br>
<center><span style="color: yellow; background: black;  border-radius: 10px; padding: 10px;">$x_{2} = (x_1 - 2)^2,$</span></center>

<p></p>

Step 2 (translated, squared, then scaled):
<br><br>
<center><span style="color: DodgerBlue; background: black;  border-radius: 10px; padding: 10px;">$x_{2} = 2(x_1 - 2)^2,$</span></center>

<p></p>

Step 3 (translated, squared, scaled and inverted):
<br><br>
<center><span style="color: HotPink; background: black;  border-radius: 10px; padding: 10px;">$x_{2} = -2(x_1 - 2)^2,$</span></center>

<p></p>

Step 4 (translated, squared, scaled, inverted, and then translated again):
<br><br>
<center><span style="color: red; background: black;  border-radius: 10px; padding: 10px;">$x_{2} = -2(x_1 - 2)^2 + 2,$</span></center>

<p><br></p>

plotted in Figure 4.
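
The four steps above can also be read as function composition, each step wrapping the previous one. Here is a minimal sketch of that reading; the function names (`square`, `step1` through `step4`, `c2_equality`) are illustrative and do not appear in the plotting code.

```python
def square(x1):
    """Baseline (8): x2 = x1^2."""
    return x1 ** 2


def step1(x1):
    """Translate right by 2: x2 = (x1 - 2)^2."""
    return square(x1 - 2)


def step2(x1):
    """Scale by 2: x2 = 2 (x1 - 2)^2."""
    return 2 * step1(x1)


def step3(x1):
    """Invert: x2 = -2 (x1 - 2)^2."""
    return -step2(x1)


def step4(x1):
    """Translate up by 2: x2 = -2 (x1 - 2)^2 + 2, i.e. (9)."""
    return step3(x1) + 2


def c2_equality(x1):
    """Equation (9) written out directly."""
    return -2 * (x1 - 2) ** 2 + 2


# Compare the composed steps with (9) on a grid of x1 values in [-1, 5].
xs = [i / 10 for i in range(-10, 51)]
max_gap = max(abs(step4(x) - c2_equality(x)) for x in xs)

# The inverted parabola peaks at its vertex, x1 = 2, where x2 = 2.
vertex_x = max(xs, key=step4)
print(max_gap, vertex_x, step4(vertex_x))
```
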

In [None]:
###############################################################################
plt.rcParams['figure.figsize'] = [smplt.mpSize["w"], smplt.mpSize["h"]]
###############################################################################
fig, ax = plt.subplots()
###############################################################################
smplt.EnvelopeRoundedCompact(ax, smplt.sfRndBnds)
###############################################################################
# Baseline
###############################################################################
x1 = smplt.sf["rngb"]
x2 = x1**2
ax.plot(x1, x2, color="Chartreuse", lw=2, zorder=1)
plt.text(smplt.sf["ebx"], smplt.sf["eby"], smplt.sf["eqb"], smplt.sf["tattrC"])
###############################################################################
# Step 1
###############################################################################
x1 = smplt.sf["rng1"]
x2 = (x1 - 2)**2
ax.plot(x1, x2, color="yellow", lw=2, zorder=2)
plt.text(smplt.sf["t1tx"], smplt.sf["t1ty"], smplt.sf["t1t"],
         smplt.sf["tattrYs"])
plt.text(smplt.sf["e1x"], smplt.sf["e1y"], smplt.sf["eq1"], smplt.sf["tattrY"])
plt.text(smplt.sf["t1x"], smplt.sf["t1y"], smplt.sf["t1"], smplt.sf["Y"])
###############################################################################
# Step 2
###############################################################################
x1 = smplt.sf["rng1"]
x2 = 2*(x1 - 2)**2
ax.plot(x1, x2, color="DodgerBlue", lw=2, zorder=3)
plt.text(smplt.sf["t2tx"], smplt.sf["t2ty"], smplt.sf["t2t"],
        smplt.sf["tattrDBs"])
plt.text(smplt.sf["e2x"], smplt.sf["e2y"], smplt.sf["eq2"], smplt.sf["tattrDB"])
plt.text(smplt.sf["t2x"], smplt.sf["t2y"], smplt.sf["t2"], smplt.sf["DB"])
###############################################################################
# Step 3
###############################################################################
x1 = smplt.sf["rng1"]
x2 = -2*(x1 - 2)**2
ax.plot(x1, x2, color="HotPink", lw=2, zorder=4)
plt.text(smplt.sf["t3tx"], smplt.sf["t3ty"], smplt.sf["t3t"],
        smplt.sf["tattrHPs"])
plt.text(smplt.sf["e3x"], smplt.sf["e3y"], smplt.sf["eq3"], smplt.sf["tattrHP"])
plt.text(smplt.sf["t3x"], smplt.sf["t3y"], smplt.sf["t3"], smplt.sf["HP"])
###############################################################################
# Step 4
###############################################################################
x1 = smplt.sf["rng1"]
x2 = -2*(x1 - 2)**2 + 2
ax.plot(x1, x2, color="red", lw=2, zorder=5)
plt.text(smplt.sf["t4tx"], smplt.sf["t4ty"], smplt.sf["t4t"],
        smplt.sf["tattrRs"])
plt.text(smplt.sf["e4x"], smplt.sf["e4y"], smplt.sf["eq4"], smplt.sf["tattrR"])
plt.text(smplt.sf["t4x"], smplt.sf["t4y"], smplt.sf["t4"], smplt.sf["R"])
plt.text(smplt.sf["t4bx"], smplt.sf["t4by"], smplt.sf["tb"], smplt.sf["R"])
plt.text(smplt.sf["e4sx"], smplt.sf["e4sy"], smplt.sf["eqs4"],
         smplt.sf["tattrR"])
plt.text(smplt.sf["e4px"], smplt.sf["e4py"], smplt.sf["eqp4"],
         smplt.sf["tattrR"])
plt.scatter([smplt.sf["e4tx"]], [smplt.sf["e4ty"]], s=60, color='r', zorder=50)
###############################################################################
# Title, axis labels, extents, grid color, background and line style
###############################################################################
smplt.SetupPlotCompact(ax, plt, "$x_1$", "$x_2$", smplt.sfBnds, "",
                smplt.sf["cap2"], smplt.sfCapLoc)
###############################################################################
plt.show()

Figure 5 below adds <span style="color: red;">$c_2^{\text{equality}}$</span> to Figure 2.

In [None]:
###############################################################################
plt.rcParams['figure.figsize'] = [smplt.mpSize["w"], smplt.mpSize["h"]]
###############################################################################
fig, ax = plt.subplots()
###############################################################################
smplt.EnvelopeRoundedCompact(ax, smplt.mpRndBnds)
###############################################################################
#x1_points = np.arange(-0.25, 3.25, 0.01)
#x2_points = np.arange(-0.25, 3.25, 0.01)
###############################################################################
x1, x2, f, levellist = smplt.BivariateNormalCompact(smplt.mpPts)
###############################################################################                     
# Setup the PDF as a contour plot
###############################################################################
cp = ax.contour(x1, x2, f, levels=levellist, linewidths=2)
###############################################################################
# Set a colorbar legend to appear with the plot
###############################################################################
fig.colorbar(cp)
###############################################################################
# Constraint c1
###############################################################################
x1c = smplt.mpC1["rng"]
c1 = 2*x1c - 2
ax.plot(x1c, c1, smplt.mpC1["attr"]["color"], smplt.mpC1["attr"]["lw"],
        smplt.mpC1["attr"]["zorder"])
plt.text(smplt.mpC1["txtx"], smplt.mpC1["txty"], smplt.mpC1["ineq"],
         smplt.mpC1["tattr"])
###############################################################################
# Constraint c2
###############################################################################
c2 = -2*(x1c - 2)**2 + 2
ax.plot(x1c, c2, smplt.mpC2["attr"]["color"], smplt.mpC2["attr"]["lw"],
        smplt.mpC2["attr"]["zorder"])
plt.text(smplt.mpC2["txtx"], smplt.mpC2["txty"], smplt.mpC2["ineq"],
         smplt.mpC2["tattr"])
###############################################################################
# Fill the "feasible" region with color DodgerBlue                                                       
###############################################################################                    
ax.fill_between(x1c, c1, c2, where=(c1 <= c2), facecolor='DodgerBlue', zorder=2)
###############################################################################
# Indicate the feasible region
###############################################################################
smplt.DrawArrow(plt, smplt.mpC2["arrowsx"], smplt.mpC2["arrowsy"],
                smplt.mpC2["arrowdx"], smplt.mpC2["arrowdy"],
                smplt.pCol["DB"], 2, 0.1, True, 0.1)
plt.text(smplt.mpC2["fsx"], smplt.mpC2["fsy"], 'feasible region',
         color=smplt.pCol["DB"], fontsize="large", zorder=20)
###############################################################################
# Mark the lowest value of f in the feasible region
###############################################################################
plt.scatter([smplt.mpC2["spx"]], [smplt.mpC2["spy"]], s=80, zorder=50,
            color=smplt.pCol["DB"])
plt.text(smplt.mpC2["sptx"], smplt.mpC2["spty"], smplt.mpC2["sptxt"],
         color=smplt.pCol["DB"], fontsize="large", zorder=20)
###############################################################################
# Title, axis labels, extents, grid color, background and line style
###############################################################################
smplt.SetupPlotCompact(ax, plt, "$x_1$", "$x_2$", smplt.mpBnds, "",
                       smplt.mpC2["cap"], smplt.mpCapLoc) 
###############################################################################
plt.show()

Let us now incorporate <span style="color: red;">$c_2$</span> instead of <span style="color: red;">$c_2^{\text{equality}}$</span> into our analysis: what the inequality is telling us is that the only permissible values of $(x_1, x_2)$ over which we are to seek the lowest value of $f$ are those that lie on or below the <span style="color: red;">red curve</span> in Figure 5 (while simultaneously being constrained to those $(x_1, x_2)$ that are on or above the <span style="color: orange;">orange line</span> from our analysis above). The <span style="color: DodgerBlue;">blue</span> region in Figure 5, the region circumscribed by the constraints <span style="color: orange;">$c_1$</span> and <span style="color: red;">$c_2$</span>, is known as the <span style="color: DodgerBlue;">feasible region</span>, and it is there that the lowest value of $f$ is to be sought.

<p></p>

Clearly <span style="color: DodgerBlue;">$(x_1, x_2) = (1, 0)$</span> yields the lowest value of $f$ in the <span style="color: DodgerBlue;">feasible region</span> since, of all the feasible points, it is the farthest from the mean $(x_1, x_2) = (1, 2)$, and thus is the _minimizer_ (see next section for definition of _minimizer_) of $f$.
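
If "clearly" feels too quick, a brute-force check is easy to write. The sketch below is not how one would solve such a problem in practice; it simply evaluates $f$ at every feasible point of a grid with spacing $0.01$ and picks the lowest value. The box $[0, 3] \times [-1, 3]$, the grid spacing, and the name `is_feasible` are my own assumptions for illustration.

```python
import math


def f(x1, x2):
    """PDF (1): mean (1, 2), unit variances, zero correlation."""
    return math.exp(-((x1 - 1) ** 2 + (x2 - 2) ** 2) / 2) / (2 * math.pi)


def is_feasible(x1, x2):
    """True when (x1, x2) satisfies both c1 (4) and c2 (5)."""
    on_or_above_line = x2 >= 2 * x1 - 2
    on_or_below_parabola = x2 <= -2 * (x1 - 2) ** 2 + 2
    return on_or_above_line and on_or_below_parabola


# Grid points i/100, j/100 over a box that contains the feasible region.
feasible = [
    (i / 100, j / 100)
    for i in range(0, 301)
    for j in range(-100, 301)
    if is_feasible(i / 100, j / 100)
]

# The feasible grid point with the lowest value of f.
best = min(feasible, key=lambda p: f(*p))
print(best, f(*best))  # best is (1.0, 0.0); f there is about 0.0215
```

The point found is one of the two places where the line and the parabola meet; the other, $(2, 2)$, is only a distance of one from the mean.
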

#### <span style="color: forestgreen;">Optimization Problem &mdash; Formal Specification</span><a class="anchor" id="optdef"></a> 

Now that we have seen a concrete example replete with code, plots, and analysis, the definition of the Optimization Problem should be a cinch. In an Optimization Problem, there is an

* Objective or Cost or Loss Function

<p></p>

<center><span style="font-weight: lighter;">$\large f: \mathbb{R}^n \rightarrow \mathbb{R}, \quad n \in \mathbb{N}^{+}$&nbsp;&nbsp;&nbsp;&nbsp;(see <a href="#glossary">Glossary</a> for definition of $\large\mathbb{N}^{+}$)</span></center> 

<span style="margin-left: 30px;">which is to be maximized or minimized subject to</span>

* Constraints $c_i$ on the domain $\mathbb{R}^n$, with $i \in I$, some index set.

<p style="margin-left: 30px;">[Some authors maintain a distinction between constraints that express inequalities vs. equalities, and we may when convenient do the same.]</p>

The _maximizer_ and _minimizer_ of $f$ are defined respectively as

<p></p>

<center><span style="font-weight: lighter;">$\large x^{\star} \equiv \underset{x\, \in\, \mathbb{R}^n}{\operatorname{argmax}} f(x)\tag{10}$</span></center>

<p>and</p>

<center><span style="font-weight: lighter;">$\large x_{\star} \equiv \underset{x\, \in\, \mathbb{R}^n}{\operatorname{argmin}} f(x),\tag{11}$</span></center>

subject to $c_i$. The _maximum_ and _minimum_ values of $f$ are defined respectively as
    
<p></p>

<center><span style="font-weight: lighter;">$\large \underset{x\, \in\, \mathbb{R}^n}{\operatorname{\max}} f(x) \equiv f(x^\star)\tag{12}$</span></center>

<p>and</p>

<center><span style="font-weight: lighter;">$\large \underset{x\, \in\, \mathbb{R}^n}{\operatorname{\min}} f(x) \equiv f(x_\star).\tag{13}$</span></center>

<p></p>

Note that every problem of _maximization_ can be reduced to a problem of
_minimization_ via

<p></p>

<center><span style="font-weight: lighter;">$\large x^{\star} \equiv \underset{x\, \in\, \mathbb{R}^n}{\operatorname{argmin}} -f(x).\tag{14}$</span></center>


Thus our discussion of optimization will exclusively focus on _minimization_ hereafter. 
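
To see the reduction at work, here is a small sketch using the PDF from (1) without any constraints: the grid point that maximizes $f$ is the same one that minimizes $-f$. The grid and the variable names are illustrative assumptions.

```python
import math


def f(x1, x2):
    """PDF (1): mean (1, 2), unit variances, zero correlation."""
    return math.exp(-((x1 - 1) ** 2 + (x2 - 2) ** 2) / 2) / (2 * math.pi)


# A coarse grid with spacing 0.1 that contains the mean (1, 2).
grid = [(i / 10, j / 10) for i in range(-20, 41) for j in range(-20, 51)]

# Maximizing f directly ...
argmax_f = max(grid, key=lambda p: f(*p))

# ... and minimizing -f pick out the same point.
argmin_neg_f = min(grid, key=lambda p: -f(*p))

print(argmax_f, argmin_neg_f)  # both are (1.0, 2.0)
```
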

Note that there can be many, one, or no _minimizer_ of $f$. See Figure 6 below which shows three different functions $f: \mathbb{R} \rightarrow \mathbb{R}$.

<p style="color: forestgreen;">Zero <i>minimizers</i>:</p>

<center><span style="font-weight: lighter; color: forestgreen;">$x_2 = x_1.$</span></center>

<p style="color: red;">One <i>minimizer</i> at $x_1 = 0$:</p>

<center><span style="font-weight: lighter; color: red;">$x_2 = x_1^2.$</span></center>

<p style="color: DodgerBlue;">Countably infinite <i>minimizers</i> at $x_1 = \frac{(2n + 1)\pi}{2}, n \in \mathbb{Z}$:</p>

<center><span style="font-weight: lighter; color: DodgerBlue;">$x_2 = -\sin^2(x_1).$</span></center>
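
A quick numerical check of the third case: at every odd multiple of $\frac{\pi}{2}$ the function $-\sin^2(x_1)$ takes the value $-1$, and it never goes below $-1$ anywhere else. The name `g`, the range of $n$, and the sampling interval are my own choices for this sketch.

```python
import math


def g(x1):
    """The third example function: x2 = -sin^2(x1)."""
    return -math.sin(x1) ** 2


# A handful of the countably many minimizers, n = -3, ..., 3.
minimizers = [(2 * n + 1) * math.pi / 2 for n in range(-3, 4)]
values = [g(x) for x in minimizers]

# Sample g on [-10, 10) to confirm it never dips below -1.
lowest_sampled = min(g(i / 100) for i in range(-1000, 1000))

print(values)
print(lowest_sampled)
```
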

In [None]:
###############################################################################
plt.rcParams['figure.figsize'] = [28/2.54, 22.4/2.54]
###############################################################################
fig, ax = plt.subplots()
###############################################################################
smplt.EnvelopeRounded(ax, -1.9, 4.9, -1.9, 3.9)
###############################################################################
x1 = np.arange(-2, 5, 0.01)
x2 = -(np.sin(x1))**2
ax.plot(x1, x2, color="DodgerBlue", lw=2)
# Raw strings (r'...') keep the LaTeX backslashes from being read as escapes.
plt.text(2.1, -1.4, r'$\mathbf{x_2 = -\sin^2(x_1)}$', color="DodgerBlue",
         fontweight='bold',fontsize='large')
plt.text(2.1, -1.8, 'Countably infinite minimizers', color="DodgerBlue",
         fontsize='large')
plt.scatter([-np.pi/2, np.pi/2, 3*np.pi/2], [-1.0, -1.0, -1.0], s=80,
            color='DodgerBlue', zorder=50)
plt.text(1.15, -0.7, r'$\mathbf{(\pi/2, -1)}$', color="DodgerBlue",
         fontsize='large')
plt.text(4.05, -1.3, r'$\mathbf{(3\pi/2, -1)}$', color="DodgerBlue",
         fontsize='large')
plt.text(-1.95, -0.7, r'$\mathbf{(-\pi/2, -1)}$', color="DodgerBlue",
         fontsize='large')
###############################################################################
x1 = np.arange(-1.95, 3.97, 0.01)
x2 = x1
ax.plot(x1, x2, color="forestgreen", lw=2)
plt.text(3.5, 2.6, r'$\mathbf{x_2 = x_1}$', color="forestgreen",
         fontweight='bold',fontsize='large')
plt.text(3.5, 2.2, 'Zero minimizers', color="forestgreen", fontsize='large')
plt.text(0.05, -0.3, r'$\mathbf{(0, 0)}$', color="red", fontsize='large',
         zorder=50)
###############################################################################
x1 = np.arange(-1.97, 2, 0.01)
x2 = x1**2
ax.plot(x1, x2, color="red", lw=2)
plt.text(0.25, 3.6, r'$\mathbf{x_2 = x_1^2}$', color="red",
         fontweight='bold',fontsize='large')
plt.text(0.25, 3.2, 'One minimizer', color="red", fontsize='large')
plt.scatter([0.0], [0.0], s=80, color='red', zorder=50)
###############################################################################
# Title, axis labels, extents, grid color, background and line style
###############################################################################
smplt.SetupPlot(ax, plt, "$x_1$", "$x_2$", -2, 5, -2, 4, "",
                "Figure 6: Zero, one, and countably infinite minimizers",
                1.33, -2.75)
###############################################################################
plt.show()

### <span style="color: forestgreen;">References</span><a class="anchor" id="references"></a>

1. Rudin, Principles of Mathematical Analysis, 3rd Edition, 1976, McGraw Hill.
2. Nocedal and Wright, Numerical Optimization, 2nd Edition, 2006, Springer.
3. Kochenderfer and Wheeler, Algorithms for Optimization, 2019, The MIT Press.

### <span style="color: forestgreen;">Glossary</span><a class="anchor" id="glossary"></a>

* $\mathbb{N} \equiv \{0, 1, 2, 3, \ldots\}$
* $\mathbb{N}^{+}\equiv \mathbb{N} \setminus \{0\}$
* $\mathbb{Z} \equiv \{0, \pm 1, \pm 2, \pm 3, \ldots\}$