# TIES483 Assignment, Mikael Myyrä

***

## Problem 1

A window is being built and the bottom is a rectangle and the top is a semicircle. If there is 12 m of framing materials what must the dimensions of the window be to make the window area as big as possible?

Model the decision problem as an optimization problem and solve it with a method of your choosing. **Analyse the result!**

***

The relevant measurements (circumference and area) of the window can be expressed as functions of width $w$ and height $h$. The circumference is $w + 2h$ for the rectangular part and $\pi \frac{w}{2}$ for the circular part, so the total circumference is $C(w, h) = (1 + \frac{\pi}{2})w + 2h$. The area is $wh$ for the rectangular part and $\pi (\frac{w}{2})^2$ for the circular part, so the total area is $A(w, h) = wh + \frac{\pi}{4} w^2$. We're trying to maximize $A$ (and thus minimize $-A$) subject to the linear constraint $C = 12$. Also, $w$ and $h$ must be greater than zero because this is a real window. In formal terms, the problem is

$$
\begin{align}
\min \qquad & -wh - \frac{\pi}{4} w^2 \\
\text{s.t.} \qquad & (1 + \frac{\pi}{2})w + 2h = 12 \\
& w, r > 0
\end{align}
$$

In this simple case it should be possible to find a solution analytically using the stationarity rule from the KKT conditions. Let's try it. The Lagrangian for this problem is

$$
L(w, h) = -wh - \frac{\pi}{4} w^2 - \lambda(1 + \frac{\pi}{2})w - 2\lambda h
$$

and its gradient

$$
\nabla L(w, h) = (-h - \frac{\pi}{2} w - \lambda(1 + \frac{\pi}{2}), -w - 2\lambda).
$$

We're looking for points where
$$
\nabla L(w, h) = \mathbf{0} \\
\begin{cases}
-h - \frac{\pi}{2}w - \lambda(1 + \frac{\pi}{2}) = 0 \\
-w - 2\lambda = 0
\end{cases} \\
w = -\frac{\lambda}{2} \\
-h + (\frac{\pi}{4} - (1 + \frac{\pi}{2}))\lambda = 0 \iff h = (-\frac{\pi}{4} - 1)\lambda
$$

Applying the constraint, we find feasible values for $\lambda$:

$$
\begin{align}
(1 + \frac{\pi}{2})w + 2h &= 12 \\
(1 + \frac{\pi}{2})(-\frac{\lambda}{2}) + 2(-\frac{\pi}{4} - 1)\lambda &= 12 \\x
(-\frac12 - \frac{\pi}{4} - \frac{\pi}{2} - 2)\lambda &= 12 \\
(-\frac{3\pi}{4} - \frac{5}{2})\lambda &= 12 \\
\lambda &= \frac{-12}{\frac{3\pi}{4} + \frac{5}{2}}
\end{align}
$$

now, applying $\lambda$ back to $w$ and $h$, we get

$$
w = -\frac{\lambda}{2} = \frac{12}{\frac{3\pi}{2} + 5} \approx 1.236 \\
h = (-\frac{\pi}{4} - 1) \lambda 
  = \frac{3\pi + 12}{\frac{3\pi}{4} + \frac{5}{2}} \approx 4.412.
$$

Analysis: Because we used the KKT conditions to come up with this solution analytically, we already know it's optimal. Just to check that the calculations were correct, you can plug $w$ and $h$ back into the constraint function to ensure it holds. Plugging said values into the objective function and flipping the sign gives the optimal area of the window, which comes out to be approximately $6.653 m^2$.

***

## Problem 2

The 10-dimensional Robsenbrock function (one of the variants) is defined as
$$
f(\mathbf{x}) = \sum_{i=1}^{9} 100 (x_{i+1} - x_i^2 )^2 + (1-x_i)^2
$$
for $x\in\mathbb R^{10}$. 

Compare at least two different optimization method's performance in minimizing this function over $\mathbb R^{10}$. You can decide the method of comparison as the one that makes most sense to you. **Analyze the results!**

***

In [9]:
# first, let's define the problem in Python

def obj(x):
    assert(len(x) == 10)
    ans = 0.0
    for i in range(0, 9):
        ans += 100*((x[i+1] - x[i]**2)**2) + (1 - x[i])**2
    return ans

The gradient of the function is also needed for a method I want to test. It's a bit complex so I'll write down some steps factorizing the function to make it easier for myself.

$$
\begin{align}
f(\mathbf{x}) &= \sum_{i=1}^9 100(x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \\
&= \sum_{i=1}^9 100(x_{i+1}^2 - 2 x_{i+1} x_i^2 + x_i^4) + (1 - 2 x_i + x_i^2) \\
&= \sum_{i=1}^9 100x_{i+1}^2 - 200 x_{i+1} x_i^2 + 100 x_i^4 + x_i^2 - 2 x_i + 1
\end{align}
$$

The expression for the gradient is so long it's hard to write down, but is fairly straightforward to calculate by iterating the sum and updating corresponding elements in the gradient with the derivatives of the summed expression. The gradient elements obtained from a single iteration of the sum are

$$
\nabla_{(i, i+1)} f(\mathbf{x}) = (-400x_{i+1}x_i + 400x_i^3 + 2x_i - 2, 200x_{i+1} - 200x_{i+1}x_i^2).
$$

In [24]:
import numpy as np

def obj_gradient(x):
    assert(len(x) == 10)
    ans = np.array([0.0] * 10)
    for i in range(0, 9):
        ans[i] += -400 * x[i+1] * x[i] + 400 * x[i]**3 + 2 * x[i] - 2
        ans[i+1] += 200 * x[i+1] - 200 * x[i+1] * x[i]**2
    return ans

Since scipy.optimize.minimize has multiple methods available and is able to print statistics, it seems like a convenient place to go for this task. Let's pick Nelder-Mead (uses no derivatives) and BFGS (uses first derivatives).

In [26]:
from scipy.optimize import minimize

first_guesses = [
    # try a few starting points varying distances away
    np.array([0.0] * 10),
    np.array([1.0] * 10),
    np.array([2.0] * 10),
    np.array([100.0] * 10),
]

for x0 in first_guesses:
    print('=' * 80)
    print(f'Starting point: {x0}')
    
    print('Nelder-Mead:')
    nm_result = minimize(
        obj,
        x0,
        method='Nelder-Mead',
        tol=1e-6,
        options={'disp': True},
    )
    print('Result:')
    print(nm_result)
    
    print('BFGS:')
    bfgs_result = minimize(
        obj,
        x0,
        method='BFGS',
        jac=obj_gradient,
        tol=1e-6,
        options={'disp': True},
    )
    print('Result:')
    print(bfgs_result)

Starting point: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
Nelder-Mead:
Result:
 final_simplex: (array([[ 0.95819942,  0.92047379,  0.83391187,  0.70387608,  0.52461411,
         0.28254586,  0.06659239,  0.08429688, -0.04589073, -0.08591718],
       [ 0.95894233,  0.9204341 ,  0.83222897,  0.70545843,  0.5254473 ,
         0.2806476 ,  0.06734214,  0.08417792, -0.04925657, -0.08365011],
       [ 0.96120856,  0.92602729,  0.84155191,  0.72115311,  0.54131841,
         0.29620842,  0.07283483,  0.08501551, -0.05263569, -0.08522277],
       [ 0.95987139,  0.92438481,  0.83994796,  0.7175543 ,  0.54184927,
         0.3029527 ,  0.07201315,  0.08518736, -0.04873201, -0.08613543],
       [ 0.95692642,  0.92085713,  0.83469867,  0.70989731,  0.53229555,
         0.28569522,  0.06754217,  0.08580009, -0.04813109, -0.08382746],
       [ 0.95642002,  0.91996378,  0.83332828,  0.70289444,  0.51999602,
         0.27972976,  0.06984152,  0.08314439, -0.04933047, -0.08598427],
       [ 0.95946257,  0.92098857

Result:
 final_simplex: (array([[0.99594434, 1.00061315, 0.99899839, 0.99894412, 1.00421332,
        1.01327232, 1.01922871, 1.03531195, 1.0651717 , 1.13458539],
       [0.99579141, 1.00105939, 0.99869556, 0.998967  , 1.0043477 ,
        1.01188324, 1.01592046, 1.02636383, 1.0466271 , 1.09758113],
       [0.99605162, 1.00083998, 0.99939916, 0.99842367, 1.00409885,
        1.0119609 , 1.01652826, 1.02638101, 1.04683471, 1.09866375],
       [0.99567335, 1.00064177, 0.99960383, 1.00014888, 1.00429823,
        1.01392193, 1.01794475, 1.03268623, 1.05995148, 1.12322089],
       [0.99696348, 1.00287916, 0.99978023, 0.99895393, 1.00450958,
        1.01203808, 1.01501725, 1.02643147, 1.04707779, 1.09823168],
       [0.99626718, 1.00122233, 0.99862005, 0.99973699, 1.0030887 ,
        1.0136106 , 1.02077926, 1.03623261, 1.0670106 , 1.1409193 ],
       [0.99588546, 1.00120275, 0.99969038, 0.99877669, 1.00350816,
        1.01356892, 1.02016055, 1.03525761, 1.06553596, 1.13659802],
       [0.996884

Both methods produce very different results depending on the chosen starting point. I think this is because both methods run out of allowed iterations before reaching convergence, and the function has a very flat minimal region (in the 2D case it's shaped like a parabola; in this case it's impossible to visualize but I assume it's something like a 10D paraboloid). Thus all points on this paraboloid-ish surface have objective function values very close to the minimum.

Nelder-Mead seems to take thousands of iterations if we start from anywhere that isn't already in the minimum region. BFGS, on the other hand, takes an order of magnitude fewer function evaluations, which is expected as it uses gradient information to find good search directions. However, BFGS doesn't appear to find better results. On the contrary, in the last example of starting at (100, ..., 100), BFGS comes up with a much worse solution than Nelder-Mead. This seems strange, but it does look like the gradient at the point it finds is close to zero. This may be because the shape of the function is very flat, but I think I've probably made a mistake in my gradient formulation. I've looked over the calculations many times and I can't find a mistake, but this result seems very strange otherwise.

Note: I calculated the gradient by hand because I don't have easy access to the `ad` library on my OS. Using that library would be an easy way to check if the results are the same with a gradient that's known to be correct.

(1, ..., 1) looks like it's an optimal solution as both methods return it if we start there.

***

## Problem 3

The task is to solve a black-box optimization problem where the objective and constraint function values can be obtained by calling an executable. The executable <i>prob3</i> will be available at the course website http://users.jyu.fi/~jhaka/opt/ along with instructions on how to use it in the <i>README</i> file. 

The format of the problem is
$$
\begin{align}
\min \ &f(x)\\
\text{s.t. }&h_1(x) = 0\\
        &h_2(x) = 0\\
        &g_1(x) \geq 0\\
        &g_2(x) \geq 0\\
        &g_3(x) \geq 0\\
        &g_4(x) \geq 0\\
        &x\in \mathbb R^4.        
\end{align}
$$
Solve the optimization problem by using the tools and optimization method of your choosing. **Analyse the results!**

***

***

## Problem 4

Study biobjective optimization problem
$$
\begin{align}
\min \ &(\|x-(1,0)\|,\|x-(0,1)\|)\\
\text{s.t. }&x\in \mathbb R^2.
\end{align}
$$
Try to generate an evenly spread representation of the Pareto front. Plot the results in both the decision and objective spaces. **Analyze the results!**

***