# Visualization of Optimization Algorithms

## CS519: Scientific Visualization - Fall 2020

### netid: <mark>mmorais2</mark>

In [1]:
from IPython.display import display, HTML

### Table of Contents
1. Abstract
1. Introduction
1. Test Functions
    1. Rosenbrock Function
    2. Goldstein-Price Function
    3. Bartels-Conn Function
    4. Egg Crate Function
1. Optimization Algorithms
    1. Gradient Descent
        1. Gradient Descent: Rosenbrock
        1. Gradient Descent: Goldstein-Price
    2. BFGS
        1. BFGS: Rosenbrock
        1. BFGS: Goldstein-Price
    3. Simulated Annealing
        1. Simulated Annealing: Rosenbrock
        1. Simulated Annealing: Goldstein-Price
        1. Simulated Annealing: Bartels-Conn
        1. Simulated Annealing: Egg Crate
    4. Particle Swarm
        1. Particle Swarm: Rosenbrock
        1. Particle Swarm: Goldstein-Price
        1. Particle Swarm: Bartels-Conn
        1. Particle Swarm: Egg Crate
1. Results
1. Discussion
1. References
1. Appendix: Links to Animations
1. Appendix: Source Code
    1. Library Imports
    1. Library Versions
    1. Surface Generation
    1. Rosenbrock Function
    1. Goldstein-Price Function
    1. Bartels-Conn Function
    1. Egg Crate Function
    1. Gradient Descent
    1. BFGS
    1. Simulated Annealing
    1. Particle Swarm
    1. Static 2d simulation result plot
    1. Static 3d simulation result plot
    1. Animated 2d simulation result plot

# Abstract

Optimization algorithms seek to find the best solution $x^{*}$ from a set S such that $f(x^{*}) \leq f(x)$ for all $x$ in S. For this project we describe and implement a handful of optimization algorithms, evaluate their performance on some well known test functions, and create visualizations to build some intuition and verify their function.  Finally, we perform a comparative analysis between algorithms using repeated trials on these test functions in order to draw broader conclusions. 

# Introduction

Given a fuction $f: \mathbb{R}^n \rightarrow \mathbb{R}$ and the set $S \in \mathbb{R}^n$ the solution to an optimization problem seeks to find $x^{*} \in S$ such that $f(x^{*}) \leq f(x)$ for all $x$ in S.  The function $f$ is referred to as the objective function and the set $S$ is referred to the as the constraints.

In this project we will examine 4 different algorithms for solving optimization problems.  Each algorithm represents a more general class of problem-solving approaches to optimzation.

In order to understand how each algorithm functions as well as appreciate the strengths and weakness, we choose a set of test functions of varying difficulty.  These functions are described in the next section. 

# Test Functions

The table below summarizes the test functions used in the project.  Each test function is classified according to criteria given for optimization test functions in Jamil and Yang (2013) [[4](#References)].

* Continous
    * A continuous function has a value $f(x)$ for every $x$ in the domain $S$.
* Differentiable
    * Gradient-based optimization methods require the objective function to be differentiable within the search domain.  Differentiability criteria are similarly restricted.  
    * Examples of functions that are not differentiable are absolute value functions or functions with asymptote(s).
* Unimodal vs. Multimodal
    * A unimodal function is either monotonically increasing or decreasing.  As a result, the function has no local minima or maxima.
    * Some of the optimization algorithms we investigate use only gradient information to decide on a search direction and are guaranteed to find the global minimum of a unimodal function.  Although gradient-based methods can be used on multimodal functions, there is no guarantee that they will find the global minimum.
* Separability
    * Degree of independence of parameters in the search space.
    * Certain kinds of optimization algorithms can exploit separability during their search, but this criteria is unimportant for the algorithms used in this project.

| Test Function Name | Criteria |
|-|-|
| Rosenbrock | Continuous, Differentiable, Non-Separable, Unimodal |
| Goldstein-Price | Continuous, Differentiable, Non-Separable, Multimodal |
| Bartels-Conn | Continuous, Non-Differentiable, Non-Separable, Multimodal |
| Egg Crate | Continuous, Differentiable, Separable, Multimodal |

## Rosenbrock Function

The Rosenbrock function [[3](#References)] is characterized by a parabolic shaped flat valley bounded by steep canyon walls. 

$$
f(x_1, \cdots, x_D) = \sum_{i}^{D-1} [ 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2]
$$

Global minimum is located at $x^* = f(1, \cdots, 1)$ and $f(x^*) = 0$.

### Classification
* Continuous
* Differentiable
* Non-Separable
* Unimodel

### Implementation Details
The domain of the Rosenbrock function used in this project is restricted to (-2,2) along both $x_1$ and $x_2$ dimensions.

source code: [Rosenbrock-Function:-Source-Code](#Rosenbrock-Function:-Source-Code)

### Rosenbrock: Visualization
A 2d filled contour plot of the Rosenbrock function is shown below.
* The global minimum is labeled with a diamond.
* There are 12 regularly sampled points superimposed on the plot and labeled as $x_0$ that are used to initialize separate trials of each optimization algorithm.

<div align="middle">
    <img src="./sims/rosenbrock-plot2d.png">
</div>

## Goldstein-Price Function

The Goldstein-Price [[4](#References)] function has a single global minimum, but many local minima and several orders of magnitude difference in range.

$$
f(x_1, x_2) = [1 + (x_1 + x_2 + 1)^2 (19 - 14 x_1 + 3 x_1^2 - 14 x_2 + 6 x_1 x_2 + 3 x_2^2)] \\ \times [30 + (2 x_1 - 3 x_2)^2 (18 - 32 x_1 + 12 x_1^2 + 48 x_2 - 36 x_1 x_2 + 27 x_2^2)]
$$

Global minimum is located at $x^* = f(0, -1)$ and $f(x^*) = 3$.

### Classification
* Continuous
* Differentiable
* Non-Separable
* Multimodal

### Implementation Details
The domain of the Goldstein-Price function used in this project is restricted to (-2,2) along both $x_1$ and $x_2$ dimensions.

source code: [Goldstein-Price-Function:-Source-Code](#Goldstein-Price-Function:-Source-Code)

### Goldstein-Price: Visualization
A 2d filled contour plot of the Goldstein-Price function is shown below.
* The global minimum is labeled with a diamond.
* There are 12 regularly sampled points superimposed on the plot and labeled as $x_0$ that are used to initialize separate trials of each optimization algorithm.

<div align="middle">
    <img src="./sims/goldstein_price-plot2d.png">
</div>

## Bartels-Conn Function

The Bartels-Conn [[2](#References)] function is characterized by a central elliptical valley with discontinuities appearing at all points along the descent direction. 

$$
f(x_1, x_2) =  |x_1^2 + x_2^2 + x_1 x_2| + |\sin(x_1)| + |\cos(x_2)|
$$

Global minimum is located at $x^* = f(0, 0)$ and $f(x^*) = 1$.

### Classification
* Continuous
* Non-Differentiable
* Non-Separable
* Multimodal

### Implementation Details
The domain of the Bartels-Conn function used in this project is restricted to (-5,5) along both $x_1$ and $x_2$ dimensions.

source code: [Bartels-Conn-Function:-Source-Code](#Bartels-Conn-Function:-Source-Code)

### Bartels-Conn: Visualization
A 2d filled contour plot of the Bartels-Conn function is shown below.
* The global minimum is labeled with a diamond.
* There are 12 regularly sampled points superimposed on the plot and labeled as $x_0$ that are used to initialize separate trials of each optimization algorithm.

<div align="middle">
    <img src="./sims/bartels_conn-plot2d.png">
</div>

## Egg Crate Function

The Egg Crate function [[2](#References)] is characterized by many local minima arranged in a two-dimensional grid.  The minimum value of each local minima becomes progressively lower as you move towards the origin.

$$
f(x_1, x_2) = x_1^2 + x_2^2 + 25 (\sin^2(x_1) + \sin^2(x_2))
$$

Global minimum is located at $x^* = f(0, 0)$ and $f(x^*) = 0$.

### Classification
* Continuous
* Differentiable
* Separable
* Multimodal

### Implementation Details
The domain of the Egg Crate function used in this project is restricted to (-5,5) along both $x_1$ and $x_2$ dimensions.

source code: [Egg-Crate-Function:-Source-Code](#Egg-Crate-Function:-Source-Code)

### Egg Crate: Visualization
A 2d filled contour plot of the Egg Crate function is shown below.
* The global minimum is labeled with a diamond.
* There are 12 regularly sampled points superimposed on the plot and labeled as $x_0$ that are used to initialize separate trials of each optimization algorithm.

<div align="middle">
    <img src="./sims/egg_crate-plot2d.png">
</div>

# Optimization Algorithms

The table below summarizes the algorithms used in the project. Each algorithm is classified according to criteria that are based on the de- scription of algorithms for optimzation provided in Kochenderfer and Wheeler (2019) [4].

* Gradient-based
    * Gradient-based methods use the derivative of the function to decide on a search direction.  These methods are further subcategorized into whether they use only first-order or second-order information.
    * Gradient-based methods are restricted to unimodal or nearly unimodal test functions such as Rosenbrock and Goldstein-Price.
* Stochastic
    * Stochastic optimization methods incorporate randomness when deciding on a search direction or distance.  Unlike gradient-based methods, these methods are not deterministic and evaluating their behavior is more challenging.
    * Although simulated annealing and particle swarm use drastically different search methods (each is described in more detail later), the contrast to make here is between a stochastic method that explores from a single point versus a stochastic method that explores from multiple points in parallel. For these reasons we refer to simulated annealing as a stochastic serial method and particle swarm as a stochastic parallel (or population) method.
    * Stochastic methods are not restricted to any test functions.

| Method | Approach |
|-|-|
| Gradient Descent | Gradient-Based, First-Order |
| BFGS | Gradient-Based, Second-Order |
| Simulated Annealing | Stochastic, Serial |
| Particle Swarm | Stochastic, Parallel/Population |

## Gradient Descent

The Gradient Descent method [[1](#References)] is a first-order iterative optimization algorithm for finding the local minimum of a differentiable function.

### Gradient Descent Algorithm
1. Start with some initial guess $x_0$ and learning rate $\alpha$.
2. Update $x_k$ in the direction of negative gradient $x_k = x_{k-1} - \alpha \nabla f(x_{k-1})$.
3. Evaluate the gradient at the new minimum $\nabla f(x_k)$
4. Repeat from step 2 until $\nabla f(x_k) \approx 0$

### Implementation Details
Gradient descent is implemented using the [autograd](https://autograd.readthedocs.io/en/latest/index.html) automatic differentiation library to compute the gradient of each test function.  A constant learning rate $\alpha$ is used for all iterations of the algorithm.  The algorithm terminates when the norm of the gradient is less than _tol_.

source code: [Gradient-Descent:-Source-Code](#Gradient-Descent:-Source-Code)

### Gradient Descent: Rosenbrock
The result of a single trial of gradient descent on the Rosenbrock test function is shown below.  Observe that search direction is perpendicular to the isovalue given by the contour.  Although gradient descent quickly finds the narrow valley containing the global minimum, most of the iterations of the algorithm (nit=8288) are spent slowly traveling along this relatively flat surface.

<div align="middle">
    <img src="./sims/gradient_descent-rosenbrock-plot2d-10.png">
</div>

### Gradient Descent: Goldstein Price
The result of a single trial of gradient descent on the Goldstein-Price test function is shown below.  Goldstein-Price is multimodal and as a result, gradient descent is not guaranteed to find the global minimum which is what happens on this trial.  Instead of finding the global minimum, the search ends when finding a flat surface inside of a local minimum ($\min(f)=30$ instead of global $\min(f)=3$).

<div align="middle">
<img src="./sims/gradient_descent-goldstein_price-plot2d-05.png">
</div>

## BFGS

The Broyden-Fletcher-Goldfarb-Shanno aka BFGS method is a second-order iterative optimization method.

### Quasi-Newton Methods
The BFGS method [[5](#References)] is referred to as Quasi-Newton in reference to the fact that unlike Newton's method which uses an explicit Hessian matrix, these methods approximate the Hessian. 

$$
x_{k+1} = x_k - \alpha_k B_k^{-1} \nabla f(x_k)
$$

where
* $B_k$ is approxmation to Hessian
* $\alpha_k$ is obtained from line search

Secant updating methods have superlinear convergence ($1 < r < 2$).
* Slower to converge than Newton's method, but cost-per-iteration is less.

### BFGS Algorithm
1. Start with some initial guess $x_0$ and approximate Hessian $B_0 = I$.
2. Solve $B_k s_k = -\nabla f(x_k)$ for $s_k$ or use a line search (described below) to find $s_k$.
3. Compute $x_{k+1} = x_k + s_k$.
4. Compute the difference in gradients $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
5. Update approximate Hessian.
$$
B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}
$$
6. Repeat from step 2 until some stopping criteria is reached.

### Line Search
A line search is used to find the distance along the descent direction of the next step of the optimization.  The scalar multiple $\alpha$ along the descent direction $d$ is found by minimizing the function below.

$$
\underset{\alpha}{\text{minimize}} f(x_k + \alpha d)
$$
where
* $f(...)$ is the function to minimize
* $x_k$ is the current solution
* $\alpha$ is a scalar 
* $d$ is a vector that describes the descent direction of the function

For first-order optimization problems the descent direction is given by the negative gradient $-\nabla f(x_k)$.

For second-order optimization problems the descent direction is given by by the product of the negative gradient and Hessian $-\nabla f(x_k) H_k$.

### Implementation Details
BFGS requires the gradient of the function to minimize and similar to gradient descent the [autograd](https://autograd.readthedocs.io/en/latest/index.html) automatic differentiation library is used to compute the gradient of each test function.  We observed that the performance of the line search has a big impact on the quality of the results obtained and as a result, we replaced a simple version of BFGS written in numpy with a version from scipy having a far more sophisticated line search.  The algorithm terminates when the norm of the gradient is less than _tol_.

source code: [BFGS:-Source-Code](#BFGS:-Source-Code)

### BFGS: Rosenbrock
The result of a single trial of BFGS on the Rosenbrock test function is shown below. Observe that similar to gradient descent the search direction is perpendicular to the isovalue given by the contour.  However unlike gradient descent, the line search in the BFGS algorithm is able to exploit second-order information to take larger steps between iterations.  As a result, the algorithm converges very quickly (nit=30) in comparison to all the other methods tested.

<div align="middle">
    <img src="./sims/bfgs-rosenbrock-plot2d-11.png">
</div>

animation: [youtube](https://youtu.be/_HfCZAnnIgI)

### BFGS: Goldstein Price
The result of a single trial of BFGS on the Goldstein-Price test function is shown below.  Compare this result to the result initialized at the same initial position $x_0$ with gradient descent [Gradient-Descent:-Goldstein-Price](#Gradient-Descent:-Goldstein-Price).  The solution trajectory used by BFGS avoids passing through the local minimum in which gradient descent was caught.  Despite the improvement, BFGS fails to find the global minimum on 6 of the 12 trials of the Goldstein-Price test function.

<div align="middle">
<img src="./sims/bfgs-goldstein_price-plot2d-05.png">
</div>

animation: [youtube](https://youtu.be/fIyGMrPIsGk)

## Simulated Annealing

Simulated annealing [[1](#References)] is a stochastic optimization method based on the natural physical optimization process that occurs when a material is heated to a relatively high temperature and allowed to cool.  At high temperature the atoms in the material more readily break apart and redistribute allowing the material to become more easily deformed and disordered.  As the material cools, the amount of free energy needed for such motion decreases and the material hardens into an ordered crystal structure.

In the context of optimization, this process suggests two mechanisms:
* A means by which the search continues in the direction of the local minimum or restarts in a new position that might be initially worse than the current local minimum. 
* A slow decrease in the probability that the algorithm restarts the search in some other position.

### Transition Distribution
The mean and covariance of the transition distribution is used to select a new position.  The new position is described in terms of an offset from the current position according to a multivariate normal distribution.

### Annealing Schedule
The annealing schedule describes the probability $p(z)$ that the algorithm restarts the search in some other position.  The initial value and rate of decay are parameters of the algorithm.

### Simulated Annealing Algorithm
1. Start with some initial guess $x_0$ and set this as the global minimum $f(x_{min}) = f(x_0)$.
2. Generate a new position $x_k$ by adding to $x_{k-1}$ an offset randomly chosen from the transition distribution.
3. Evaluate the function at the new position and compute the change in the objective function $\Delta f(x_k) = f(x_k) - f(x_{k-1})$.
4. If the objective function is improved $\Delta f(x_k) < 0$, then move to the new position, else use the annealing schedule to compute the probability that despite the lack of improvement a change in position is still made.
5. If the function evaluated at this position is less than the global minimum, then update the global minimum $f(x_{min}) = f(x_k)$.
6. Repeat from step 2 until the number of iterations are reached.

### Implementation Details
Simulated annealing is implemented in numpy with a multivariate normal initialized from a list of means (set to 1. for all trials) and shared covariance matrix (set to the identity matrix for all trials).  Points sampled from this multivariate normal that fall outside of the domain are clipped to the boundary.  More effort could be spent to tune the distribution to each test function, but that wasn't done.  The initial temperature used by the annealing schedule is set to $T_0=1.0$ for all simulations.

source code: [Simulated-Annealing:-Source-Code](#Simulated-Annealing:-Source-Code)

### Simulated Annealing: Rosenbrock
The result of a single trial of simulated annealing on the Rosenbrock test function is shown below.  The algorithm maintains a history of the absolute minimum that it has found and the location where the algorithm is searching does not necessarily corresond to the algorithm minimum.  To help visualize the algorithm progress, the plots and animation show the evolution of the minimum rather than current location.

<div align="middle">
    <img src="./sims/simulated_annealing-rosenbrock-plot2d-10.png">
</div>

animation: [youtube](https://youtu.be/dcKCRAYu-Oo)

### Simulated Annealing: Goldstein-Price
The result of a single trial of simulated annealing on the Goldstein-Price test function is shown below.  Similar to the previous example, the algorithm does not make rapid progress towards the minimum until about 1000 iterations have passed.

<div align="middle">
    <img src="./sims/simulated_annealing-goldstein_price-plot3d-06.png">
</div>

animation: [youtube](https://youtu.be/AgMDXNWJH24)

### Simulated Annealing: Bartels-Conn
The result of a single trial of simulated annealing on the Bartels-Conn test function is shown below.  Since the Bartels-Conn function is not differentiable, we cannot use a gradient-based solver.  However, we can compare the progress of the simulated annealing algorithm along the downhill direction of the test surface.  In contrast to what we would see from gradient descent or BFGS, the progress downhill zig-zags randomly since there is no gradient information to direct progress.  Nevertheless the simulated annealing algorithm finds the global minimum after 200 iterations.

<div align="middle">
    <img src="./sims/simulated_annealing-bartels_conn-plot2d-12.png">
</div>

animation: [youtube](https://youtu.be/KKK3SiV80Ls)

### Simulated Annealing: Egg Crate
The result of a single trial of simulated annealing on the Egg Crate test function is shown below.  This test function has 8 local minima surrounding a global minimum.  The simulated annealing function spends about 1000 iterations exploring 2 of the surrounding local minima before finding the central global minimum.  After so many iterations, the annealing process will discourage restarts and the algorithm spends another 500 iterations to get within 2 decimal places of accuracy ($\min(f)=2.06e-02$) of the global minimum.

<div align="middle">
    <img src="./sims/simulated_annealing-egg_crate-plot3d-01.png">
</div>

animation: [youtube](https://youtu.be/bfBrm2unoOg)

## Particle Swarm

Particle swarm [[6](#References)] is a stochastic optimization method based on particles at different positions that simultaneously explore the optimization function and influence each other's search.  

Each particle in the swarm is characterized by the following properties:
* current position
* current velocity
* position of the minimum found by this particle

The rule that each particle uses to update its' next position is based on the following:
* current velocity of the particle
* velocity in the direction of the minimum found by this particle so far
* velocity in the direction of the global minimum found by all particles so far

### Particle Swarm Algorithm
1. Initialize a list of $n \times p$-dimensional particles with a random initial position $(x_{1,0}, \cdots, x_{n,0})$ and random velocity $(v_{1,0}, \cdots, v_{n,0})$.
2. For each particle save the position and minimum function value found by that particle $(x_{1,\min}, \cdots, x_{n,\min})$ and $(f(x_{1,\min}), \cdots, f(x_{n,\min}))$.
3. Save the global position and minimum function value $x_\min$ and $f(x_\min)$.
4. Initialize a counter $k=1$ used to track the iteration number.
5. Update the velocity of each particle for the current iteration $v_{i,k} = \omega v_{i,k-1} + p_1 r_{i,1} (x_{i,\min} - x_{i,k-1}) + p_2 r_{i,2} (x_\min - x_{i,k-1})$ where $\omega$ is a inertia constant, $p_1, p_2$ are momentum constants, and $r_{i,1}, r_{i,2}$ are per-particle random numbers in the range $[0,1]$.
6. Update the position of each particle for the current iteration $x_{i,k} = x_{i,k-1} + v_{i,k}$.
7. Evaluate the objective function at each new position and update the per-particle and global minimum function values and position.
8. Repeat from step 4 until the number of iterations are reached.

### Implementation Details
Particle swarm is implemented in numpy. There are a few algorithm hyperparameters which are set to default values for all trials:
* $\omega$ inertia coefficient (default: 1.0)
* $p_1$ momentum coefficient towards min position of current particle (default: 1.0)
* $p_2$ momentum coefficient towards min position among all particles (default: 1.0)

An additional hyperparamter controls the number of particles $n$ used in the swarm.  Each particle will participate in the search during an iteration.  As a result, the effective number of iterations of the algorithm is the number of particles multiplied by the number of iterations.  The effective number of iterations (nit) is shown in the plots (the animations have a per-particle iteration counter that ends at nit/n).

Based on experiments adjusting the value of $n$ such that the effective number of iterations is held constant, we observed that the algorithm will always find the global minimum when $n \geq 5$.  A value of $n=3$ was chosen so that the algorithm doesn't always find the global minimum. 

source code: [Particle-Swarm:-Source-Code](#Particle-Swarm:-Source-Code)

### Particle Swarm: Rosenbrock
The result of a single trial of particle swarm on the Rosenbrock test function is shown below.  To help visualize the algorithm progress, the plots and animation show the trajectory of each particle rather than the value of the minimum position among all the particles (note the difference in convention compared to simulated annealing).

For this trial $n=3$ particles start from different initial positions that are well dispersed across the test function.  One of the consistent patterns that emerges from the animations is the tendency for particles to converge.  In this trial, all particles converge at the entrance of the narrow valley leading to the global minimum.  Since the particles have only 50 iterations each, there isn't enough time to get closer.

<div align="middle">
    <img src="./sims/particle_swarm-rosenbrock-plot3d-05.png">
</div>

animation: [youtube](https://youtu.be/SzwsbCBg-tk)

### Particle Swarm: Goldstein-Price
The result of a single trial of particle swarm on the Goldstein-Price test function is shown below. Compare the result to the result with BFGS [BFGS:-Goldstein-Price](#BFGS:-Goldstein-Price).  Although none of the particles start as close to the global minimum, the collection of particles are able to influence each other such that they all move in the same direction down the slope. 

<div align="middle">
    <img src="./sims/particle_swarm-goldstein_price-plot2d-08.png">
</div>

animation: [youtube](https://youtu.be/FIHklbjZrQ0)

### Particle Swarm: Bartels-Conn
The result of a single trial of particle swarm on the Bartels-Conn test function is shown below.

<div align="middle">
    <img src="./sims/particle_swarm-bartels_conn-plot3d-02.png">
</div>

animation: [youtube](https://youtu.be/crtcMyoKOzQ)

### Particle Swarm: Egg Crate
The result of a single trial of particle swarm on the Egg Crate test function is shown below.  This animation is an excellent demonstration of particles starting from opposite corners of the domain converging and moving together to reach the global minimum.

<div align="middle">
    <img src="./sims/particle_swarm-egg_crate-plot2d-11.png">
</div>

animation: [youtube](https://youtu.be/Z0m8CiTAb3M)

# Results

The table below summarizes the results obtained from running 12 trials of each combination of algorithm and test function.  Each column is defined as follows:
* alg
    * Name of algorithm.
* func
    * Name of test function.
* ntrials
    * Number of trials
* nmin
    * Number of trials reaching minimum (or near-minimum) threshold.
    * For the Rosenbrock, Bartels-Conn, and Egg Crate functions, any result with an absolute error less than 1.0 from the test surface minimum is considered to have reached the minimum.
    * For the Goldstein-Price function the same rule applies, but the threshold is 10.0 due to the magnitude of the range across the test function.
* mae
    * Mean average absolute error of all trials.
    * The absolute error is computed by taking the magnitude of the difference between the minimum reported by the algorithm and the known global minimum value of the test function.
* mnit
    * Mean number of iterations to reach a solution.
    * Gradient-based algorithms will terminate at convergence, but the other algoritms are run for a fixed number of iterations.
    * Particle swarm uses multiple particles running in parallel. The number of iterations reported for this algorithm in the table reflects the number of iterations of the algorithm multiplied by number of particles.  This makes the number of iterations of this algorithm comparable to a serial algorithm such as simulated annealing.
* msec
    * Mean elapsed runtime (in seconds) across all trials.
    * Running time values are reported using a laptop from early 2015 with 2.9 GHz Dual-Core Intel i5 processor and 8GB of memory. The operating system used is ubuntu 18.04.
    * Versions of software used appears in [Library-Versions](#Library-Versions).
* algparm
    * Algorithm hyperparameters and settings used for all trials.
    * Gradient Descent: $\alpha$ is learning rate and _tol_ is convergence threshold
    * BFGS: _tol_ is convergence threshold
    * Simulated Annealing: $T_0$ is the initial temperature used in the [Annealing-Schedule](#Annealing-Schedule)
    * Particle Swarm: $n$ is the number of particles

In [2]:
display(HTML('./sims/results-sum.html'))

alg,func,ntrials,nmin,mae,mnit,msec,algparm
Gradient Descent,Rosenbrock,12,12,0.000123,8281,11.58,$\alpha$=0.001 tol=0.01
Gradient Descent,Goldstein Price,12,3,204.0,2113,25.45,$\alpha$=1e-05 tol=0.01
BFGS,Rosenbrock,12,12,2.51e-06,28,0.38,tol=0.01
BFGS,Goldstein Price,12,6,149.0,16,0.58,tol=0.01
Simulated Annealing,Rosenbrock,12,12,0.221,200,0.3,$T_0$=1.0
Simulated Annealing,Goldstein Price,12,7,36.5,1500,1.23,$T_0$=1.0
Simulated Annealing,Bartels Conn,12,11,1.8,200,0.19,$T_0$=1.0
Simulated Annealing,Egg Crate,12,6,6.4,1500,1.43,$T_0$=1.0
Particle Swarm,Rosenbrock,12,10,1.23,150,0.07,n=3
Particle Swarm,Goldstein Price,12,9,22.4,750,0.38,n=3


# Discussion

## Conclusions
The following conclusions can be drawn from the comparative results presented.

* When using gradient-based solvers, first-order methods such as gradient descent require more iterations to find the global minimum than second-order methods such as BFGS. Use of a dyanamic learning rate with gradient descent would reduce the number of iterations required, but would not change the fundamental conclusion.
* Gradient-based solvers achieve higher levels of accuracy than stochastic solvers. In our tests gradient based solvers achieve about 6 decimal digits of precision on the Rosenbrock function, whereas stochastic solvers achieve at most 1 decimal digit of precision on the same test function despite using more iterations.
* Stochastic solvers can sometimes find the global minimum despite getting caught in a local minimum. In contrast, a gradient-based solver cannot escape a local minimum. In our tests simulated annealing and particle swarm are able to find the global minimum of the Goldstein-Price function more frequently than BFGS and gradient descent. Increasing the number of iterations given to BFGS and gradient descent would not change this outcome.
* Stochastic solvers require more iterations than gradient-based solvers, but less computational effort per iteration. The mean runtime per iteration (msec/mnit) of simulated annealing and particle swarm on the Rosenbrock and Goldstein-Price function is less than BFGS and gradient descent.
* Particle swarm is more computationally efficient than simulated annealing. Both particle swarm and simulated annealing reached the benchmark on 36 of 48 trials, but particle swarm required 1800 mean number of iterations (mnit) whereas simulated annealing required 3400 mnit. 

## Future Work
The following should be considered for future work.

* All of the test functions are from $f: \mathbb{R}^2$. Repeating these experiments on higher dimensional test functions could produce different results. Interesting properties to study would be the growth in number of iterations required by the stochastic solvers versus the growth in computational cost of the gradient-based methods on higher dimensional surfaces.
* One alternative that wasn't explored by this study is to use repeated trials of stochastic solvers initialized at different initial points rather than increasing the number of iterations from the same initial point.

# References

[1]
> Mykel J. Kochenderfer and Tim A. Wheeler. 2019. Algorithms for Optimization. The MIT Press.

[2]
> Momin Jamil and Xin-She Yang, A literature survey of benchmark functions for global optimization problems, Int. Journal of Mathematical Modelling and Numerical Optimisation, Vol. 4, No. 2, pp. 150–194 (2013). DOI: 10.1504/IJMMNO.2013.055204

[3]
> H. H. Rosenbrock, “An Automatic Method for Finding the Greatest or least Value of a Function,” Computer Journal, vol. 3, no. 3, pp. 175-184, 1960. [Available Online]: http://comjnl.oxfordjournals.org/content/3/3/175.full.pdf

[4]
> A. A. Goldstein, J. F. Price, “On Descent from Local Minima,” Mathematics and Comptutaion, vol. 25, no. 115, pp. 569-574, 1971.

[5]
> Michael T. Heath. 2018. Scientific Computing: An Introductory Survey, Revised Second Edition. SIAM-Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.

[6]
>  J. Kennedy, R. C. Eberhart, and Y. Shi, Swarm Intelligence. Morgan Kaufmann, 2001.

# Appendix: Links to Animations

The table belows lists links to selected 2d animations referred to in the project and uploaded to youtube.

| Algorithm | Surface | Trial | Youtube Link |
|-|-|-|-|
| BFGS | Rosenbrock | 1 | https://youtu.be/PDk9d_65sHs |
| BFGS | Rosenbrock | 11 | https://youtu.be/_HfCZAnnIgI |
| BFGS | Goldstein-Price | 4 | https://youtu.be/Y01H7iUr6js |
| BFGS | Goldstein-Price | 5 | https://youtu.be/fIyGMrPIsGk |
| Simulated Annealing | Rosenbrock | 10 | https://youtu.be/dcKCRAYu-Oo |
| Simulated Annealing | Goldstein-Price | 6 | https://youtu.be/AgMDXNWJH24 | |
| Simulated Annealing | Bartels-Conn | 1 | https://youtu.be/wp2_u-zHY7c |
| Simulated Annealing | Bartels-Conn | 12 | https://youtu.be/KKK3SiV80Ls |
| Simulated Annealing | Egg Crate | 1 | https://youtu.be/bfBrm2unoOg |
| Particle Swarm | Rosenbrock | 5 | https://youtu.be/SzwsbCBg-tk |
| Particle Swarm | Goldstein-Price | 4 | https://youtu.be/cyyI9hAzjqg |
| Particle Swarm | Goldstein-Price | 7 | https://youtu.be/sQjNwbgXpvc |
| Particle Swarm | Goldstein-Price | 8 | https://youtu.be/FIHklbjZrQ0 |
| Particle Swarm | Bartels-Conn | 2 | https://youtu.be/crtcMyoKOzQ |
| Particle Swarm | Egg Crate | 10 | https://youtu.be/s9MEM_ML3kg |
| Particle Swarm | Egg Crate | 11 | https://youtu.be/Z0m8CiTAb3M |

# Appendix: Source Code

Source code listings for the following components of the project:
* Test functions
* Optimization algorithms
* Static 2d and 3d plots
* Animated 2d plots

## Library Imports

Library imports for optimization algorithms.

```python
from autograd import grad
import numpy as np
import scipy.optimize as opt
```

Note that gradient-based algorithms such as Gradient Descent and BFGS use [autograd](https://autograd.readthedocs.io/en/latest/index.html) to wrap numpy operations with equivalent functions that support automatic differentiation (AD).  Replacing the standard numpy import with the line below is sufficient to be able to differentiate functions which are composed of numpy operations.

```python
import autograd.numpy as np
```

Library imports for plotting functions.

```python
import json
import os

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib.animation import FuncAnimation, FFMpegWriter
```

## Library Versions
Python version `3.6.9` and the packages listed in the `requirements.txt` below were used to produce the results reported in this project. 

```
nbconvert==5.6.1
jupyter~=1.0
matplotlib~=3.3
numpy~=1.19
scipy~=1.5
scikit-image~=0.17
scikit-learn~=0.23
pandas~=1.0
pyvista~=0.25
itkwidgets~=0.32
autograd~=1.3
```

## Surface Generation: Source Code

In [3]:
def surface(fx, start=-30, stop=30, num=60):
    """
    surface evaluates fx at regularly spaced grid of points

    Parameters
    ----------
    fx : func
        fx is a vector valued function that returns a scalar result
    start : float
        lower bound of the coordinate grid
    stop : float
        upper bound of the coordinate grid
    num : int
        number of points along one dimension of the grid

    Returns
    -------
    array
        2D array formed by evaluating fx at each grid point
    """
    x = np.linspace(start=start, stop=stop, num=num)
    x1, x2 = np.meshgrid(x, x, indexing='ij')
    X = np.vstack((x1.ravel(), x2.ravel()))
    z = np.apply_along_axis(fx, 0, X).reshape(num,num)
    return x1, x2, z

## Rosenbrock Function: Source Code

In [4]:
def rosenbrock(x):
    """
    rosenbrock evaluates Rosenbrock function at vector x

    Parameters
    ----------
    x : array
        x is a D-dimensional vector, [x1, x2, ..., xD]

    Returns
    -------
    float
        scalar result
    """
    D = len(x)
    i, iplus1 = np.arange(0,D-1), np.arange(1,D)
    return np.sum(100*(x[iplus1] - x[i]**2)**2 + (1-x[i])**2)

## Goldstein-Price Function: Source Code

In [5]:
def goldstein_price(x):
    """
    goldstein_price evaluates Goldstein-Price function at vector x

    Parameters
    ----------
    x : array
        x is a 2-dimensional vector, [x1, x2]

    Returns
    -------
    float
        scalar result
    """
    a = (x[0] + x[1] + 1)**2
    b = 19 - 14*x[0] + 3*x[0]**2 - 14*x[1] + 6*x[0]*x[1] + 3*x[1]**2
    c = (2*x[0] - 3*x[1])**2
    d = 18 - 32*x[0] + 12*x[0]**2 + 48*x[1] - 36*x[0]*x[1] + 27*x[1]**2
    return (1. + a*b) * (30. + c*d)

## Bartels-Conn Function: Source Code

In [6]:
def bartels_conn(x):
    """
    bartels_conn evaluates Bartels-Conn function at vector x

    Parameters
    ----------
    x : array
        x is a 2-dimensional vector, [x1, x2]

    Returns
    -------
    float
        scalar result
    """
    a = np.abs(x[0]**2 + x[1]**2 + x[0]*x[1])
    b = np.abs(np.sin(x[0]))
    c = np.abs(np.cos(x[1]))
    return a + b +c

## Egg Crate Function: Source Code

In [7]:
def egg_crate(x):
    """
    egg_crate evaluates Egg Crate function at vector x

    Parameters
    ----------
    x : array
        x is a 2-dimensional vector, [x1, x2]

    Returns
    -------
    float
        scalar result
    """
    return x[0]**2 + x[1]**2 + 25.*(np.sin(x[0])**2 + np.sin(x[1])**2)

## Gradient Descent: Source Code

In [8]:
def gradient_descent(fx, gradfx, x0, alpha, tol, maxiter):
    """
    gradient_descent returns the point xk where fx is minimum

    Parameters
    ----------
    fx : function
        function to minimize
    gradfx : function
        gradient of function to minimize
    x0 : numpy.ndarray
        initial guess for xk
    alpha : float
        learning rate
    tol : float
        convergence threshold
    maxiter : int
        maximum number of iterations

    Returns
    -------
    numpy.ndarray
        point xk where fx is minimum
    numpy.ndarray
        position and value history
        [[x0, fx(x0), gradfx(x0)],
         [x1, fx(x1), gradfx(x1)],...]
    """

    xk, fxk, gradfxk = x0, fx(x0), gradfx(x0)

    # Save current and minimum position and value to history.
    steps = np.zeros((maxiter, (x0.size*2)+1))
    steps[0,:] = np.hstack((x0, fxk, gradfxk))

    # Repeat up to maximum number of iterations.
    for k in range(1,maxiter):

        # Stop iteration when gradient is near zero.
        if np.linalg.norm(gradfxk) < tol:
            steps = steps[:-(maxiter-k),:]
            break

        # Update xk based on product of learning rate and gradient.
        xk = xk - alpha * gradfxk

        # Evaluate gradient at new value of xk.
        gradfxk = gradfx(xk)

        # Evaluate the function at new value of xk.
        fxk = fx(xk)

        # Save iteration history.
        steps[k,:] = np.hstack((xk, fxk, gradfxk))

    return xk, steps

## BFGS: Source Code

In [9]:
# NOTE(mmorais): bfgs is failing on line search, use scipy instead.
def scipy_bfgs(fx, gradfx, x0, tol, maxiter):
    """
    scipy_bfgs wraps scipy implementation of bfgs
    """
    # Save current and minimum position and value to history.
    steps = np.zeros((maxiter+1, (x0.size*2)+1))
    steps[0,:] = np.hstack((x0, fx(x0), gradfx(x0)))
    def make_callback():
        k = 1
        def callback(xk):
            nonlocal k
            # Save iteration history.
            steps[k,:] = np.hstack((xk, fx(xk), gradfx(xk)))
            k = k + 1
        return callback

    # Invoke scipy minimize with BFGS option.
    res = opt.minimize(fx, x0, method='BFGS', jac=gradfx, tol=tol,
                       options={'maxiter': maxiter},
                       callback=make_callback())

    # Copy OptimizeResult to equivalent returned from bfgs.
    xk = res.x
    if res.nit < maxiter:
        steps = steps[:-(maxiter-res.nit),:]
    return xk, steps

## Simulated Annealing: Source Code

In [10]:
def simulated_annealing(fx, x0, mean, cov, tk, bounds, niter):
    """
    simulated_annealing returns the point xk where fx is minimum

    Parameters
    ----------
    fx : function
        function to minimize
    x0 : numpy.ndarray
        initial guess for xk
    mean : numpy.ndarray
        means of multivariate normal transition distribution
    cov : numpy.ndarray
        covariance of multivariate normal transition distribution
    tk : function
        annealing schedule as a function of iteration number
    bounds : numpy.ndarray
        domain boundaries [x1_min, x1_max, ..., xn_min, xn_max]
    niter : int
        number of iterations

    Returns
    -------
    numpy.ndarray
        point xk where fx is minimum
    numpy.ndarray
        current and minimum position and value history
        [[x0, fx(x0), xk_min, fx(xk_min)],
         [x1, fx(x1), xk_min, fx(xk_min)],...]
    """

    # Initialize solution at x0.
    xk, fxk = x0, fx(x0)
    xk_min, fxk_min = xk, fxk

    # Setup random transition distribution.
    mvnorm = partial(np.random.multivariate_normal, mean, cov)

    # Save current and minimum position and value to history.
    steps = np.zeros((niter, (x0.size+1)*2))
    steps[0,:] = np.hstack((x0, fxk, xk_min, fxk_min))

    # Perform fixed number of iterations.
    for k in range(1,niter):

        # Generate a new random point.
        xk1 = xk + mvnorm()
        xk1 = np.clip(xk1, a_min=bounds[::2], a_max=bounds[1::2])

        # Evaluate the function at the new point.
        fxk1 = fx(xk1)

        # Compute the change in the objective function.
        delta_fxk = fxk1 - fxk

        # If objective function is improved or escape current position,
        # then update xk, fxk with the new position.
        if delta_fxk < 0. or np.random.random() < np.exp(-fxk1/tk(k)):
            xk, fxk = xk1, fxk1
            if fxk1 < fxk_min:
                xk_min, fxk_min = xk1, fxk1

        # Save iteration history.
        steps[k,:] = np.hstack((xk1, fxk1, xk_min, fxk_min))

    return xk_min, steps

## Particle Swarm: Source Code

In [11]:
def particle_swarm(fx, x0s, omega, p1, p2, bounds, niter):
    """
    particle_swarm returns the point xk where fx is minimum

    Parameters
    ----------
    fx : function
        function to minimize
    x0s : numpy.ndarray
        initial positions of particles in swarm
    omega : float
        inertia coefficient
    p1 : float
        momentum coefficient towards min position of current particle
    p2 : float
        momentum coefficient towards min position among all particles
    bounds : numpy.ndarray
        domain boundaries [x1_min, x1_max, ..., xn_min, xn_max]
    niter : int
        number of iterations

    Returns
    -------
    numpy.ndarray
        point xk where fx is minimum
    numpy.ndarray
        current and minimum position and value history for each particle
        [[x_1,0, fx(x_1,0), xk_1,min, fx(xk_1,min),...,
            x_n,0, fx(x_n,0), xk_n,min, fx(xk_n,min)],
         [x_1,1, fx(x_1,1), xk_1,min, fx(xk_1,min),...,
            x_n,1, fx(x_n,1), xk_n,min, fx(xk_n,min)],
    """

    # Initialize swarm with position, velocity, and min position.
    pos = np.copy(x0s)
    x0sdelta = np.max(x0s, axis=0) - np.min(x0s, axis=0)
    vel = (np.random.random(x0s.shape)-0.5)*x0sdelta
    posmin, fxmin = np.copy(x0s), np.apply_along_axis(fx, 1, x0s)

    # Global minimum position.
    xk_min, fxk_min = posmin[np.argmin(fxmin),:], np.min(fxmin)

    # Save position, velocity, and min position by particle to history.
    # Also save global min position and value with each particle history.
    npart, ndim, nvecs = x0s.shape[0], x0s.shape[1], 4
    steps = np.zeros((npart*niter, ndim*nvecs+2))
    steps[:npart,:] = np.hstack((pos, vel, posmin, fxmin[:,np.newaxis],
                                 np.broadcast_to(xk_min,(npart,ndim)),
                                 np.broadcast_to(fxk_min,(npart,1))))

    # Perform fixed number of iterations.
    for k in range(1,niter):

        # Compute new velocity of each particle.
        rs = np.random.random((npart,2))
        vel = omega*vel + p1*rs[0]*(posmin-pos) + p2*rs[1]*(xk_min-pos)

        # Update the position of each particle based on velocity.
        pos = pos + vel
        pos = np.clip(pos, a_min=bounds[::2], a_max=bounds[1::2])

        # Evaluate the objective function at each new position.
        fxpart = np.apply_along_axis(fx, 1, pos)

        # If objective function is improved,
        # then replace particle minimum position and value.
        inds = fxpart < fxmin
        posmin[inds,:], fxmin[inds] = pos[inds,:], fxpart[inds]

        # If global objective function is improved,
        # then replace global minimum position and value.
        ind = np.argmin(fxmin)
        if fxmin[ind] < fxk_min:
            xk_min, fxk_min = posmin[ind,:], fxmin[ind]

        # Save particle history.
        ind0 = k*npart
        steps[ind0:ind0+npart,:] = (
            np.hstack((pos, vel, posmin, fxmin[:,np.newaxis],
                       np.broadcast_to(xk_min,(npart,ndim)),
                       np.broadcast_to(fxk_min,(npart,1)))))

    return xk_min, steps

## Static 2d Simulation Result Plot

In [12]:
def load_steps(**params):
    """Return solution steps based on simulation properties."""
    savefn = os.path.join(params['base_dirn'],
                          params['savefn_fmt'].format(**params))
    return np.load(savefn)


def load_meta(**params):
    """Return metafile based on simulation properties."""
    metafn = os.path.join(params['base_dirn'],
                          params['metafn_fmt'].format(**params))
    return json.load(open(metafn, 'r'))


def plot2d_solutions(**params):
    """
    plot2d_solutions creates 2d solution plot from simulation results
    """
    algstr = params['alg'].replace('_',' ').title()
    funcstr = params['func'].replace('_',' ').title()
    ngridpts = params.get('ngridpts', 500)
    bounds = params['bounds']
    trial = params['trial']  # Single trial only.
    xkmind = params.get('xkmind', slice(2))
    color = params.get('color', 'darkorange')
    ticker_locator = params.get('ticker_locator', 'LinearLocator')
    colorbar_label = params.get('colorbar_label', 'z')
    show_legend = params.get('show_legend', True)

    # Imbue title with simulation meta information.
    meta = load_meta(**params)
    expmin, expxkmin = meta['exp_fxkmin'], meta['exp_xkmin']
    expminstr = 'abs $\\min(f)$={0:.0f}'.format(expmin)
    algstr = algstr if len(algstr) > 4 else algstr.upper()
    algmeta = [('nx0','n'),('T0','$T_0$'),
               ('alpha','$\\alpha$'),('tol','tol')]
    algmetastr = ' '.join(['{0}={1}'.format(n2, meta[n1])
                           for n1, n2 in algmeta if n1 in meta])
    nitstr = 'nit={0:d}'.format(meta['nsteps'][trial-1])
    minfx = meta['f(xk)'][trial-1]
    minfmt = '.2e' if minfx < 1e-1 else '.1f'
    minstr = '$\\min(f)$={0:{1}}'.format(minfx, minfmt)
    metastrs = [algstr, minstr, nitstr, algmetastr]
    titlestr = ' '.join([s for s in metastrs if len(s) > 0])
    suptitlestr = 'Solution Trajectories: {0} Function'.format(funcstr)

    # Generate surface for filled contour plot.
    fx = globals()[params['func']]
    start, stop = np.min(bounds[::2]), np.max(bounds[1::2])
    x1, x2, z = surface(fx, start, stop, ngridpts)

    fig = plt.figure(figsize=(8,6))

    # Plot 2d filled contour.
    locator = getattr(ticker, ticker_locator)
    cs = plt.contourf(x1, x2, z, locator=locator(), cmap='viridis_r',
                      alpha=0.7)

    # Plot expected minimum.
    plt.scatter(expxkmin[0], expxkmin[1], marker='D', c='red', s=30,
                label=expminstr)

    # Plot initial point.
    x0 = np.array(meta['x0'][trial-1]).reshape(-1,2)
    plt.scatter(x0[:,0], x0[:,1], marker='X', c='dodgerblue', s=30,
                label='$x_0$')

    # Plot solution trajectory.
    steps = load_steps(**params)
    nx0 = meta.get('nx0', 1)  # Multiple particles?
    xks = steps[:,xkmind]
    xks = np.clip(xks, a_min=bounds[::2], a_max=bounds[1::2])
    nxks = 0 if np.isnan(xks).any() else len(xks)//nx0
    for p in range(nx0):
        p0, pN, pstep = p, nxks, nx0
        plt.plot(xks[p0:pN:pstep,0], xks[p0:pN:pstep,1],
                 marker='.', ms=5, markevery=0.25,
                 ls='-', lw=1, c=color,
                 label='$x_k$, trial={:d}'.format(trial))

    plt.suptitle(suptitlestr)
    plt.title(titlestr)
    plt.xlabel('x1')
    plt.xlim(bounds[:2])
    plt.ylabel('x2')
    plt.ylim(bounds[2:])
    plt.colorbar(cs, label=colorbar_label)
    if show_legend:
        plt.legend()
    if params.get('plot2dfn_fmt') is not None:
        imgn = params['plot2dfn_fmt'].format(**params)
        plotfn = os.path.join(params['base_dirn'], imgn)
        plt.savefig(plotfn)
    else:
        plt.show()
    plt.close(fig)

## Static 3d Simulation Result Plot

In [13]:
def load_steps(**params):
    """Return solution steps based on simulation properties."""
    savefn = os.path.join(params['base_dirn'],
                          params['savefn_fmt'].format(**params))
    return np.load(savefn)


def load_meta(**params):
    """Return metafile based on simulation properties."""
    metafn = os.path.join(params['base_dirn'],
                          params['metafn_fmt'].format(**params))
    return json.load(open(metafn, 'r'))


def plot3d_solutions(**params):
    """
    plot3d_solutions creates 3d solution plot from simulation results
    """
    algstr = params['alg'].replace('_',' ').title()
    funcstr = params['func'].replace('_',' ').title()
    ngridpts = params.get('ngridpts', 500)
    bounds = params['bounds']
    elev = params['elev']
    azim = params['azim']
    trial = params['trial']  # Single trial only.
    xkmind = params.get('xkmind', slice(2))
    color = params.get('color', 'crimson')
    show_legend = params.get('show_legend', True)

    # Imbue title with simulation meta information.
    meta = load_meta(**params)
    expmin, expxkmin = meta['exp_fxkmin'], meta['exp_xkmin']
    expminstr = 'abs $\\min(f)$={0:.0f}'.format(expmin)
    algstr = algstr if len(algstr) > 4 else algstr.upper()
    algmeta = [('nx0','n'),('T0','$T_0$'),
               ('alpha','$\\alpha$'),('tol','tol')]
    algmetastr = ' '.join(['{0}={1}'.format(n2, meta[n1])
                           for n1, n2 in algmeta if n1 in meta])
    nitstr = 'nit={0:d}'.format(meta['nsteps'][trial-1])
    minfx = meta['f(xk)'][trial-1]
    minfmt = '.2e' if minfx < 1e-1 else '.1f'
    minstr = '$\\min(f)$={0:{1}}'.format(minfx, minfmt)
    metastrs = [algstr, minstr, nitstr, algmetastr]
    titlestr = ' '.join([s for s in metastrs if len(s) > 0])
    suptitlestr = 'Solution Trajectories: {0} Function'.format(funcstr)

    # Generate surface for filled contour plot.
    fx = globals()[params['func']]
    start, stop = np.min(bounds[::2]), np.max(bounds[1::2])
    x1, x2, z = surface(fx, start, stop, ngridpts)

    fig = plt.figure(figsize=(10,8))
    ax = fig.gca(projection='3d')
    ax.view_init(elev=elev, azim=azim)
    surf = ax.plot_surface(x1, x2, z, cmap='viridis_r', alpha=0.7)
    fig.colorbar(surf, shrink=0.5, aspect=5)

    # Plot expected minimum.
    ax.scatter3D([expxkmin[0]],[expxkmin[1]], [expmin],
                 marker='D', c='black', s=30,
                 label=expminstr)

    # Plot initial point.
    x0 = np.array(meta['x0'][trial-1]).reshape(-1,2)
    ax.scatter3D(x0[:,0], x0[:,1], [fx(xk) for xk in x0],
                 marker='X', c='dodgerblue', s=30,
                 label='$x_0$')

    # Plot solution trajectory.
    steps = load_steps(**params)
    nx0 = meta.get('nx0', 1)  # Multiple particles?
    xks = steps[:,xkmind]
    xks = np.clip(xks, a_min=bounds[::2], a_max=bounds[1::2])
    nxks = 0 if np.isnan(xks).any() else len(xks)//nx0
    for p in range(nx0):
        p0, pN, pstep = p, nxks, nx0
        ax.plot3D(xks[p0:pN:pstep,0],
                  xks[p0:pN:pstep,1],
                  [fx(xk) for xk in xks[p0:pN:pstep,:]],
                  ls='-', lw=1, c=color,
                  label='$x_k$, trial={:d}'.format(trial))

    plt.suptitle(suptitlestr)
    plt.title(titlestr)
    plt.xlabel('x1')
    plt.xlim(bounds[:2])
    plt.ylabel('x2')
    plt.ylim(bounds[2:])
    if show_legend:
        ax.legend()
    if params.get('plot3dfn_fmt') is not None:
        imgn = params['plot3dfn_fmt'].format(**params)
        plotfn = os.path.join(params['base_dirn'], imgn)
        plt.savefig(plotfn)
    else:
        plt.show()
    plt.close(fig)

## Animated 2d Simulation Result Plot

In [14]:
def load_steps(**params):
    """Return solution steps based on simulation properties."""
    savefn = os.path.join(params['base_dirn'],
                          params['savefn_fmt'].format(**params))
    return np.load(savefn)


def load_meta(**params):
    """Return metafile based on simulation properties."""
    metafn = os.path.join(params['base_dirn'],
                          params['metafn_fmt'].format(**params))
    return json.load(open(metafn, 'r'))


def anim2d_solutions(**params):
    """
    anim2d_solutions creates 2d animation from simulation results
    """
    algstr = params['alg'].replace('_',' ').title()
    funcstr = params['func'].replace('_',' ').title()
    ngridpts = params.get('ngridpts', 500)
    bounds = params['bounds']
    trial = params['trial']  # Single trial only.
    xkmind = params.get('xkmind', slice(2))
    fxkmind = params.get('fxkmind', 2)
    color = params.get('color', 'darkorange')
    ticker_locator = params.get('ticker_locator', 'LinearLocator')
    colorbar_label = params.get('colorbar_label', 'z')
    fps = params.get('fps', 30)
    bitrate = params.get('bitrate', 1000)
    show_legend = params.get('show_legend', True)

    # Imbue title with simulation meta information.
    meta = load_meta(**params)
    expmin, expxkmin = meta['exp_fxkmin'], meta['exp_xkmin']
    expminstr = 'abs $\\min(f)$={0:.0f}'.format(expmin)
    algstr = algstr if len(algstr) > 4 else algstr.upper()
    algmeta = [('nx0','n'),('T0','$T_0$'),
               ('alpha','$\\alpha$'),('tol','tol')]
    algmetastr = ' '.join(['{0}={1}'.format(n2, meta[n1])
                           for n1, n2 in algmeta if n1 in meta])
    nitstr = 'nit={0:d}'.format(meta['nsteps'][trial-1])
    minfx = meta['f(xk)'][trial-1]
    minfmt = '.2e' if minfx < 1e-1 else '.1f'
    minstr = '$\\min(f)$={0:{1}}'.format(minfx, minfmt)
    metastrs = [algstr, minstr, nitstr, algmetastr]
    titlestr = ' '.join([s for s in metastrs if len(s) > 0])
    suptitlestr = 'Solution Trajectories: {0} Function'.format(funcstr)

    # Generate surface for filled contour plot.
    fx = globals()[params['func']]
    start, stop = np.min(bounds[::2]), np.max(bounds[1::2])
    x1, x2, z = surface(fx, start, stop, ngridpts)

    fig = plt.figure(figsize=(8,6))

    # Plot 2d filled contour.
    locator = getattr(ticker, ticker_locator)
    cs = plt.contourf(x1, x2, z, locator=locator(), cmap='viridis_r',
                      alpha=0.7)

    # Plot expected minimum.
    plt.scatter(expxkmin[0], expxkmin[1], marker='D', c='red', s=30,
                label=expminstr)

    # Plot initial point.
    x0 = np.array(meta['x0'][trial-1]).reshape(-1,2)
    plt.scatter(x0[:,0], x0[:,1], marker='X', c='dodgerblue', s=30,
                label='$x_0$')

    # Plot bounds.
    plt.suptitle(suptitlestr)
    plt.title(titlestr)
    plt.xlabel('x1')
    plt.xlim(bounds[:2])
    plt.ylabel('x2')
    plt.ylim(bounds[2:])
    plt.colorbar(cs, label=colorbar_label)

    # Load solution trajectory.
    steps = load_steps(**params)
    nx0 = meta.get('nx0', 1)  # Multiple particles?
    xks, fxks = steps[:,xkmind], steps[:,fxkmind]
    xks = np.clip(xks, a_min=bounds[::2], a_max=bounds[1::2])
    nxks = 0 if np.isnan(xks).any() else len(xks)//nx0

    txtx = bounds[0] + (bounds[1]-bounds[0])*0.5
    txty = bounds[3] - (bounds[3]-bounds[2])*0.025
    txt = plt.text(txtx, txty, '', ha='center', va='top')
    lns = []
    for _ in range(nx0):
        ln, = plt.plot([], [],
                       ls=(0, (1,1)), lw=2, c=color,
                       label='$x_k$, trial={:d}'.format(trial))
        lns.append(ln)

    def update(ind):
        # All of the particles store the same min.
        txt.set_text('k={0:d} $f(x_k)$={1:{2}}'.format(
                     ind+1, fxks[ind*nx0],
                     '.2e' if fxks[ind*nx0] < 1e-1 else '.1f'))
        for p in range(nx0):
            # Starting from first position up to + incl current position.
            p0, pN, pstep = p, (ind*nx0)+p+nx0, nx0
            lns[p].set_data(xks[p0:pN:pstep,0], xks[p0:pN:pstep,1])
        return lns

    if show_legend:
        plt.legend()

    anim = FuncAnimation(fig, update, frames=nxks, blit=True)
    if params.get('anim2dfn_fmt') is not None:
        imgn = params['anim2dfn_fmt'].format(**params)
        animfn = os.path.join(params['base_dirn'], imgn)
        writer = FFMpegWriter(fps=fps, bitrate=bitrate,
                              extra_args=['-vcodec', 'libx264'])
        anim.save(animfn, writer=writer)
    plt.close(fig)