In [121]:
import numpy as np
import pandas as pd
import math;
np.seterr("warn");

1. Gradient Descent

Given is the function $f$:<br>
$f(x) = x_1^4 + 4x_1x_2 + 2x_2 + \frac{1}{2}x_2^2$

Its partial derivatives are given by:<br>
$\frac{\partial f}{\partial x_1} = 4x_1^3 + 4x_2$

$\frac{\partial f}{\partial x_2} = x_2 + 4x_1 + 2$

And so the gradient $\nabla f$ of $f$ is equal to:<br>
$$\begin{pmatrix} 4x_1^3 + 4x_2 \\ x_2 + 4x_1 + 2 \end{pmatrix}$$

Function that computes the gradient of $f$ at a given point $x \in \mathbb R^2$:

In [46]:
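def get_gradient_f(x):
    # Partial derivatives of f with respect to x_1 and x_2
    partialx1 = 4 * (x[0])**3 + 4 * x[1];
    partialx2 = x[1] + 4 * x[0] + 2;
    # Pass both components as one list; a second positional argument
    # to np.array would be interpreted as the dtype
    gradient = np.array([partialx1, partialx2]);
    return gradient;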
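
A hand-derived gradient is easy to get wrong, so it can help to compare it against finite differences. The snippet below is a minimal sketch of such a check; the names `f`, `grad_f` and `numerical_gradient` and the step `h` are choices made for this illustration, not part of the exercise.

```python
import numpy as np

def f(x):
    # Objective from this section
    return x[0]**4 + 4*x[0]*x[1] + 2*x[1] + 0.5*x[1]**2

def grad_f(x):
    # Analytic gradient derived above
    return np.array([4*x[0]**3 + 4*x[1], x[1] + 4*x[0] + 2])

def numerical_gradient(func, x, h=1e-6):
    # Central differences: (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

point = np.array([1.0, 1.0])
print(grad_f(point))                 # analytic gradient: [8. 7.]
print(numerical_gradient(f, point))  # should agree to several digits
```

If the two vectors differ by more than a small rounding error, the analytic derivative (or its implementation) most likely contains a mistake.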

Function 'get_f' that simply returns the function value of $f$ for a given $x \in \mathbb R^2$:

In [48]:
def get_f(x):
    return x[0]**4 + 4*x[0]*x[1] + 2*x[1] + (1/2)*x[1]**2;

A function 'eta_const' that returns a constant step size at any given time instant $t \in \mathbb{N}$:

In [12]:
def eta_const(t, c=0.1):
    return c;

A function 'eta_sqrt' that returns for any iteration $t \in \mathbb{N}$ the step size $c / \sqrt{t + 1}$:

In [66]:
def eta_sqrt(t, c=0.1):
    return c / math.sqrt(t + 1);

A function 'eta_multistep' that returns a step size that is initially set to 'eta_init' and is multiplied by the factor 'c' at each milestone:

In [79]:
def eta_multistep(t, milestones, c=0.1, eta_init=0.1):
    for i in range(len(milestones)):
        if (t < milestones[i]):
            return eta_init * c**i;
    
    return eta_init * c**(len(milestones));
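
To see what the schedule does, it can be evaluated just before and at each milestone. The sketch below repeats the function so that it runs on its own and uses the milestones `[10, 60, 90]` with `c=0.5` and `eta_init=0.1` that appear later in this notebook. The step size is 0.1 before iteration 10, 0.05 up to iteration 59, 0.025 up to iteration 89 and 0.0125 afterwards.

```python
def eta_multistep(t, milestones, c=0.1, eta_init=0.1):
    # Same logic as the function above, repeated so the snippet runs on its own
    for i, milestone in enumerate(milestones):
        if t < milestone:
            return eta_init * c**i
    return eta_init * c**len(milestones)

milestones = [10, 60, 90]
for t in [0, 9, 10, 59, 60, 89, 90, 99]:
    # Print the iteration next to the step size used at that iteration
    print(t, eta_multistep(t, milestones, c=0.5, eta_init=0.1))
```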

The gradient_descent function:

In [73]:
def gradient_descent(f, grad_f, eta, x_0, max_iter=100):
    coordinates = np.empty([max_iter + 1, 2]);
    coordinates[0] = x_0;
    for t in range(max_iter):
        coordinates[t + 1] = coordinates[t] - eta(t) * grad_f(coordinates[t]);
    # Storing every iterate is memory-inefficient, since only the last one is used
    return f(coordinates[max_iter]);
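
Since only the final iterate is needed, the trajectory array can be dropped. The following is a minimal sketch of such a variant; the name `gradient_descent_last` and the quadratic test problem are assumptions made for illustration and are not part of the exercise.

```python
import numpy as np

def gradient_descent_last(f, grad_f, eta, x_0, max_iter=100):
    # Keep only the current iterate instead of the whole trajectory
    x = np.array(x_0, dtype=float)
    for t in range(max_iter):
        x = x - eta(t) * grad_f(x)
    return f(x)

# Hypothetical test problem (not from the exercise): f(x) = 0.5 * ||x||^2
def quadratic(x):
    return 0.5 * np.dot(x, x)

def grad_quadratic(x):
    return x

value = gradient_descent_last(quadratic, grad_quadratic, lambda t: 0.1, [1, 1])
print(value)  # close to zero, since each step shrinks x by a factor of 0.9
```

The memory use is then independent of `max_iter`, at the cost of no longer being able to inspect intermediate iterates.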


Perform 100 iterations, starting at $x_0 = (1, 1)$, and return the function value at $x_{100}$ for the following step size policies:

In [10]:
x0 = np.array([1,1]);
print(x0);

[1 1]


A: eta_const:

In [51]:
print(gradient_descent(get_f, get_gradient_f, eta_const, x0, 100));

4.179738622631322e-23


B: eta_sqrt:

In [67]:
print(gradient_descent(get_f, get_gradient_f, eta_sqrt, x0, 100))

0.00023604572142682287


C: eta_multistep:

In [80]:
# gradient descent especially for multistep:
def gradient_descent_multistep(f, grad_f, x_0, max_iter=100):
    coordinates = np.empty([max_iter + 1, 2]);
    coordinates[0] = x_0;
    for t in range(max_iter):
        coordinates[t + 1] = coordinates[t] - eta_multistep(t, [10, 60, 90], c=0.5, eta_init=0.1) * grad_f(coordinates[t]);
    return f(coordinates[max_iter]);

print(gradient_descent_multistep(get_f, get_gradient_f, x0, 100))

1.4013539444174622e-09
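
A dedicated loop for the multistep schedule is not strictly necessary: the extra arguments can be bound in advance so that the schedule becomes a function of `t` alone and fits the generic `eta(t)` interface. The sketch below uses `functools.partial` for this. The compact `gradient_descent` and the quadratic test problem are assumptions for illustration, chosen so that the snippet runs on its own.

```python
from functools import partial
import numpy as np

def eta_multistep(t, milestones, c=0.1, eta_init=0.1):
    for i, milestone in enumerate(milestones):
        if t < milestone:
            return eta_init * c**i
    return eta_init * c**len(milestones)

def gradient_descent(f, grad_f, eta, x_0, max_iter=100):
    # Generic loop: eta is any callable that maps the iteration t to a step size
    x = np.array(x_0, dtype=float)
    for t in range(max_iter):
        x = x - eta(t) * grad_f(x)
    return f(x)

# Bind the schedule's extra arguments so that only t remains free
eta = partial(eta_multistep, milestones=[10, 60, 90], c=0.5, eta_init=0.1)

# Hypothetical test problem (not from the exercise): f(x) = 0.5 * ||x||^2
result = gradient_descent(lambda x: 0.5 * np.dot(x, x), lambda x: x, eta, [1, 1])
print(eta(0), eta(10), eta(60), eta(90))
print(result)
```

A `lambda t: eta_multistep(t, [10, 60, 90], c=0.5, eta_init=0.1)` would work equally well; `partial` just keeps the bound arguments visible.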


2. Coordinate Descent

Given is the function $f$:<br>
$f(x) = \frac{1}{2}x_1^4 - x_1x_2 + x_2^2 + x_2x_3 + x_3^2$

Below are all partial derivatives of $f$:<br>
$\frac{\partial f}{\partial x_1} = 2x_1^3 - x_2$

$\frac{\partial f}{\partial x_2} = -x_1 + 2x_2 + x_3$

$\frac{\partial f}{\partial x_3} = x_2 + 2x_3$

Next, we provide the implementations of the minimizer functions:

In [129]:
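def argmin_x1(x):
    # FONC: 2*x1^3 - x2 = 0, so x1 is the real cube root of x2 / 2
    # (math.pow cannot take a fractional power of a negative number,
    # hence the sign handling)
    if (x[1] >= 0):
        x1 = math.pow(x[1] / 2, float(1)/3);
    else:
        x1 = -math.pow(abs(x[1]) / 2, float(1)/3);
    # f is convex in x1 (second derivative 6*x1^2 >= 0), so this
    # stationary point is the minimum:
    return x1;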

def argmin_x2(x):
    # FONC:
    x2 = (x[0] - x[2]) / 2;
    # Only one point satisfies FONC, so must be minimum:
    return x2;

def argmin_x3(x):
    # FONC:
    x3 = -(1/2)*x[1];
    # Only one point satisfies FONC, so must be minimum:
    return x3;
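
The closed forms used in these functions follow from setting each partial derivative to zero while the other two coordinates are held fixed. As a worked derivation, based only on the partial derivatives stated above:

$$\frac{\partial f}{\partial x_1} = 0 \iff 2x_1^3 = x_2 \iff x_1 = \sqrt[3]{\frac{x_2}{2}}$$

$$\frac{\partial f}{\partial x_2} = 0 \iff 2x_2 = x_1 - x_3 \iff x_2 = \frac{x_1 - x_3}{2}$$

$$\frac{\partial f}{\partial x_3} = 0 \iff 2x_3 = -x_2 \iff x_3 = -\frac{x_2}{2}$$

The second partial derivatives are $6x_1^2 \geq 0$, $2$ and $2$, so $f$ is convex in each coordinate separately and each stationary point is the minimizer of the corresponding one-dimensional problem.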

And a (token) function to get the value of $f$ for any $x \in \mathbb{R}^3$:

In [83]:
def get_f(x):
    return (1/2)*x[0]**4 - x[0]*x[1] + x[1]**2 + x[1]*x[2] + x[2]**2;

Now, we provide a function that can be used to execute coordinate descent:

In [92]:
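def coordinate_descent(f, argmin, x_0, max_iter=100):
    # Work on a float copy so that the caller's x_0 is not modified in place
    x_t = np.array(x_0, dtype=float);
    for t in range(max_iter):
        # One sweep: minimize f exactly along each coordinate in turn
        for i in range(len(argmin)):
            x_t[i] = argmin[i](x_t);
    return x_t;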
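
Because every coordinate update is an exact one-dimensional minimization, the function value should never increase from one sweep to the next. The sketch below checks this property. The helper `coordinate_descent_trace` is a hypothetical variant written for this illustration, and the minimizers are written directly from the first-order conditions of the partial derivatives given above.

```python
import numpy as np

def f(x):
    return 0.5*x[0]**4 - x[0]*x[1] + x[1]**2 + x[1]*x[2] + x[2]**2

# Coordinate-wise minimizers obtained from the first-order conditions
argmin = [
    lambda x: np.cbrt(x[1] / 2),   # 2*x1^3 - x2 = 0
    lambda x: (x[0] - x[2]) / 2,   # -x1 + 2*x2 + x3 = 0
    lambda x: -x[1] / 2,           # x2 + 2*x3 = 0
]

def coordinate_descent_trace(f, argmin, x_0, max_iter=100):
    # Hypothetical variant that also records f after every full sweep
    x = np.array(x_0, dtype=float)
    values = [f(x)]
    for _ in range(max_iter):
        for i, minimizer in enumerate(argmin):
            x[i] = minimizer(x)
        values.append(f(x))
    return x, values

x_final, values = coordinate_descent_trace(f, argmin, [5.0, 10.0, 5.0], max_iter=20)
print(values[:4])
# Every sweep should leave f unchanged or lower it (up to rounding)
print(all(b <= a + 1e-12 for a, b in zip(values, values[1:])))
```

A trace like this is also a cheap way to spot a wrong minimizer: if one of the closed forms were incorrect, the recorded values would typically stop decreasing monotonically.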

Then, finally, we run our code to get the results:

In [133]:
x_0 = np.array([5., 10., 5.]);

xfinal = coordinate_descent(get_f, [argmin_x1, argmin_x2, argmin_x3], x_0);

print(xfinal[0]);
print("rounded: " + str(round(xfinal[0], 1)));

-0.8164965809277261
rounded: -0.8


3. 