In [1]:
import numpy as np
from math import exp as e

Flàvia Ferrús and David Rosado


## Proposed experiments

### Experiment 1

Let us implement the Sequential Quadratic Optimization (SQO) method with step size $\alpha^k=1$, iteratively updating the current point to obtain the next one. We start by defining the functions that we need. Recall that the Lagrangian is given by
\begin{align*}
\mathcal{L}(\textbf{x}, \lambda) = f(\textbf{x}) - \lambda h(\textbf{x}),\hspace{0.5cm}\textbf{x}\in\mathbb{R}^n.
\end{align*}
In our case, $n=2$, $f(x,y)=e^{3x} + e^{-4y}$ and $h(x,y)=x^2+y^2-1$.

In [2]:
#Definitions of the functions
def f(x,y):
  return e(3*x) + e(-4*y)
def h(x,y):
  return x**2 + y**2 -1
def grad_f(x,y):
  return np.array([3*e(3*x), -4*e(-4*y)])
def grad_h(x,y):
  return np.array([2*x, 2*y])
def hessian_f(x,y):
  H = np.zeros((2,2))
  H[0,0] = 9*e(3*x)
  H[1,0] = 0
  H[0,1] = 0
  H[1,1] = 16*e(-4*y)
  return H
def hessian_h(x,y):
  H = np.zeros((2,2))
  H[0,0] = 2
  H[1,0] = 0
  H[0,1] = 0
  H[1,1] = 2
  return H
def lagran(x,y,lanbda):
  return f(x,y) - lanbda*h(x,y)
def lagran_gradx(x,y,lanbda):
  return grad_f(x,y) - lanbda*grad_h(x,y)
def lagran_hessianx(x,y,lanbda):
  return hessian_f(x,y) - lanbda*hessian_h(x,y)
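
Since every later step relies on these derivatives, it can be worth checking them against finite differences. The following is a minimal sketch, not part of the original notebook: it redefines $f$, $h$ and their gradients for a vector argument so that it runs on its own, and `fd_grad` is a hypothetical helper introduced here.

```python
import numpy as np

# Same objective and constraint as in the notebook, written for a vector argument
def f(v):
    return np.exp(3 * v[0]) + np.exp(-4 * v[1])

def h(v):
    return v[0]**2 + v[1]**2 - 1

def grad_f(v):
    return np.array([3 * np.exp(3 * v[0]), -4 * np.exp(-4 * v[1])])

def grad_h(v):
    return np.array([2 * v[0], 2 * v[1]])

def fd_grad(fun, v, eps=1e-6):
    # Central finite differences (hypothetical helper, not in the notebook)
    v = np.asarray(v, dtype=float)
    g = np.zeros_like(v)
    for k in range(v.size):
        step = np.zeros_like(v)
        step[k] = eps
        g[k] = (fun(v + step) - fun(v - step)) / (2 * eps)
    return g

point = np.array([-1.0, 1.0])  # starting point used in Experiment 1
err_f = np.max(np.abs(fd_grad(f, point) - grad_f(point)))
err_h = np.max(np.abs(fd_grad(h, point) - grad_h(point)))
print(err_f, err_h)
```

Both errors should be tiny compared with the size of the gradients if the analytic expressions are right.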

Let us implement the SQO algorithm with Newton's method to solve
\begin{cases}
\text{min}\hspace{0.2cm}f(x,y)\\
\text{subject to}\hspace{0.2cm} h(x,y)=0
\end{cases}
where $f$ and $h$ are defined in the previous cell.
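
For reference, a Newton step on the conditions $\nabla_{\textbf{x}}\mathcal{L}=0$, $h=0$ can be written as the linear system below. This is a sketch in the notation used so far; $\delta_{\textbf{x}}$ and $\delta_\lambda$ are names introduced here for the increments of $\textbf{x}$ and $\lambda$.

$$
\begin{pmatrix} \nabla^2_{\textbf{xx}}\mathcal{L}(\textbf{x}^k,\lambda^k) & -\nabla h(\textbf{x}^k) \\ -\nabla h(\textbf{x}^k)^T & 0 \end{pmatrix}
\begin{pmatrix} \delta_{\textbf{x}} \\ \delta_\lambda \end{pmatrix}
=
\begin{pmatrix} -\nabla_{\textbf{x}}\mathcal{L}(\textbf{x}^k,\lambda^k) \\ h(\textbf{x}^k) \end{pmatrix},
\qquad
\textbf{x}^{k+1} = \textbf{x}^k + \alpha^k\delta_{\textbf{x}},\quad \lambda^{k+1} = \lambda^k + \alpha^k\delta_\lambda
$$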

In [3]:
#Function that implements the SQO with Newton's method
def Newton_algorithm(x0,y0,lanbda_0,alpha,max_iter,tol):
  for i in range(0,max_iter):
    #Build the matrix A of the linear system A*delta = b
    A = np.zeros((3,3))
    A[:2,:2] = lagran_hessianx(x0,y0,lanbda_0)
    A[2,:2] = -grad_h(x0,y0)
    A[:2,2] = -grad_h(x0,y0)
    #Build the vector b
    b = np.zeros(3)
    b[:2] = -lagran_gradx(x0,y0,lanbda_0)
    b[2] = h(x0,y0)
    #Solve the linear system with NumPy
    delta = np.linalg.solve(A,b)
    #Update the variables
    x0 = x0 + alpha*delta[0]
    y0 = y0 + alpha*delta[1]
    lanbda_0 = lanbda_0 + alpha*delta[2]
    #Stop when the gradient of the Lagrangian is small enough
    if np.linalg.norm(lagran_gradx(x0,y0,lanbda_0))<tol:
      print('Iterations:',i)
      print('x = (x, y) =',x0,y0)
      print('lamba=',lanbda_0)
      return x0,y0,lanbda_0
  #No convergence within max_iter iterations: return the last iterate
  return x0,y0,lanbda_0

In [4]:
x,y,lanbda = Newton_algorithm(-1,1,-1,1,100,1e-3)

Iterations: 2
x = (x, y) = -0.7483381762503777 0.663323446868971
lamba= -0.21232390186241443


According to the *pdf* file, the solution of this problem is $(x^*,y^*)=(-0.74834,0.66332)$ with $\lambda^*=-0.21233$. Notice that we reach the correct solution in two iterations with $\epsilon = 10^{-3}$. With a smaller $\epsilon$, the number of iterations can only stay the same or increase.
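
As a quick sanity check, the printed values can be plugged back into the optimality conditions $\nabla f = \lambda \nabla h$ and $h = 0$. The sketch below is self-contained and only uses the numbers from the output above; the names `stationarity` and `feasibility` are introduced here.

```python
import numpy as np

# Values printed by the run above
x, y, lam = -0.7483381762503777, 0.663323446868971, -0.21232390186241443

grad_f = np.array([3 * np.exp(3 * x), -4 * np.exp(-4 * y)])
grad_h = np.array([2 * x, 2 * y])

stationarity = np.linalg.norm(grad_f - lam * grad_h)  # norm of the gradient of the Lagrangian
feasibility = abs(x**2 + y**2 - 1)                    # |h(x, y)|
print(stationarity, feasibility)
```

Both quantities should be below the tolerance $10^{-3}$ used in the run.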

### Experiment 2

Let us choose starting points that are farther from the optimal solution to see whether the algorithm still works.

In [5]:
#Let us create random points, farther from the optimal solution, and run the algorithm
for i in range(1,11):
  #np.random.rand() returns a plain float, so no array-to-float conversion is needed
  x0 = np.random.rand() + i/2
  y0 = np.random.rand() - i/2
  lanbda_0 = np.random.rand() + i/4
  print('The starting points are (x0,y0,lanbda_0)=',(x0,y0,lanbda_0))
  Newton_algorithm(x0,y0,lanbda_0,1,100,1e-3)
  print('\n\n')

The starting points are (x0,y0,lanbda_0)= (1.1897569607285545, -0.18888208027615416, 0.8217764510629444)
Iterations: 5
x = (x, y) = 0.9949825879806212 -0.10004824333248946
lamba= 29.827861700223693



The starting points are (x0,y0,lanbda_0)= (1.700359514272526, -0.03637257311024056, 1.2836457959895375)
Iterations: 9
x = (x, y) = -0.7483594868892595 0.6633030894123038
lamba= -0.21230688436242776



The starting points are (x0,y0,lanbda_0)= (1.7738036897812932, -0.9915299122890213, 0.7993198082158964)
Iterations: 5
x = (x, y) = 0.910413052541143 -0.41370064616629076
lamba= 25.293845609301567



The starting points are (x0,y0,lanbda_0)= (2.328579083310692, -1.6307090402545716, 1.5751935995040847)
Iterations: 7
x = (x, y) = 0.9104132322159664 -0.4137000699470232
lamba= 25.293855122901604



The starting points are (x0,y0,lanbda_0)= (3.16346236271288, -1.5590286171787615, 1.2907586410378216)
Iterations: 7
x = (x, y) = 0.9949826916769633 -0.1000493219931445
lamba= 29.827804650979605




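Notice that in most cases the method does not reach the optimal solution: it stops at other points where the gradient of the Lagrangian vanishes, with $\lambda \approx 25.29$ or $\lambda \approx 29.83$. That is because Newton's algorithm only converges locally, so if we choose starting points that are far from the optimal solution, the method may not work.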
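
To see how different the limit points of these runs are, one can compare the value of $f$ at each of them. The sketch below is self-contained; the coordinates are copied (rounded) from the outputs above and the dictionary labels are introduced here for illustration.

```python
import numpy as np

def f(x, y):
    return np.exp(3 * x) + np.exp(-4 * y)

# Limit points copied (rounded) from the runs above
points = {
    "lambda ~ -0.212": (-0.74836, 0.66330),
    "lambda ~ 25.29": (0.91041, -0.41370),
    "lambda ~ 29.83": (0.99498, -0.10005),
}
values = {name: f(x, y) for name, (x, y) in points.items()}
for name, value in values.items():
    print(name, value)
```

The point with negative multiplier gives a much smaller objective value than the other two.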

### Experiment 3

Let us define the merit function $\mathcal{M}$ and perform a classical gradient descent algorithm (with backtracking), in order to deal with starting points that are far from the optimal solution.

In [6]:
#Definition of the merit function and its gradient
def merit(x, y, rho=10):
    return f(x, y) + rho * h(x, y)**2

def grad_merit(x, y, rho=10):
    return grad_f(x, y) + 2 * rho * h(x, y) * grad_h(x, y)

In [7]:
#Gradient descent with backtracking
def gradient_descent(f,grad_f,w0,w1,tol):
  x_0 = np.array([w0,w1],dtype=float)
  while True:
    grad = grad_f(x_0[0],x_0[1])
    grad_norm = np.linalg.norm(grad)
    #Stop at a (numerically) stationary point; this also avoids dividing by zero
    if grad_norm < tol:
      return x_0
    direction = grad/grad_norm
    #Backtracking: halve the step until the function value decreases
    alpha = 1
    x_k = x_0 - alpha*direction
    while f(x_k[0],x_k[1]) >= f(x_0[0],x_0[1]):
      alpha = alpha/2
      x_k = x_0 - alpha*direction
    #Stop when the decrease of the function is below the tolerance
    if abs(f(x_k[0],x_k[1]) - f(x_0[0],x_0[1])) < tol:
      return x_k
    x_0 = x_k

Let us test this method with points far from the optimal solution and observe whether the result is close to it.

In [8]:
#First test: random starting point far from the optimal solution
w0 = np.random.rand() + 34/2
w1 = np.random.rand() - 34/2
print(w0)
res1 = gradient_descent(merit, grad_merit,w0,w1,1e-3)
print('The solution of the gradient descent using the merit function is (x,y)=',(res1[0],res1[1]))
#Second test: another random starting point
w0 = np.random.rand() + 20/2
w1 = np.random.rand() - 20/2
print(w0)
res2 = gradient_descent(merit, grad_merit,w0,w1,1e-3)
print('The solution of the gradient descent using the merit function is (x,y)=',(res2[0],res2[1]))

17.967853701779216
The solution of the gradient descent using the merit function is (x,y)= (-0.3236139698916473, 0.8825391209903108)
10.973207959981872
The solution of the gradient descent using the merit function is (x,y)= (-0.5179680757923558, 0.8241081695219867)


Notice that we are getting closer to the optimal solution!!
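
To put a number on "closer", the distance from each result to the solution quoted in Experiment 1 can be computed, together with the constraint value $h$ at that point. This is a small self-contained sketch using only the values printed above; `optimum`, `results` and `distances` are names introduced here.

```python
import numpy as np

optimum = np.array([-0.74834, 0.66332])  # solution quoted in Experiment 1

# Final points copied from the output above
results = {
    "res1": np.array([-0.3236139698916473, 0.8825391209903108]),
    "res2": np.array([-0.5179680757923558, 0.8241081695219867]),
}
distances = {}
for name, p in results.items():
    distances[name] = np.linalg.norm(p - optimum)
    h_val = p[0]**2 + p[1]**2 - 1
    print(name, "distance to optimum:", distances[name], "h:", h_val)
```

Both results end up within about half a unit of the optimum, although the first coordinates of the starting points were about 18 and 11.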

### Experiment 4

As we have seen, and as stated in the *pdf* file, the minimizers of the merit function do not necessarily coincide with the minimizers of the constrained problem. Therefore, we will build an algorithm that consists of the following steps:
+ Start with the merit function to obtain an approximation to the optimal point we are looking for.
+ Once an approximation to the solution is found, use the Newton-based method to find the optimal solution.
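
One way to see why the merit function alone is not enough: at the constrained solution $h=0$, so the gradient of the merit function reduces to $\nabla f$, which is not zero there. The sketch below evaluates it with the penalty $\rho=10$ used in Experiment 3 and the solution quoted in Experiment 1; it is self-contained and the variable names are introduced here.

```python
import numpy as np

rho = 10                  # same default penalty as in the merit function of Experiment 3
x, y = -0.74834, 0.66332  # constrained solution quoted in Experiment 1

h_val = x**2 + y**2 - 1
grad_f = np.array([3 * np.exp(3 * x), -4 * np.exp(-4 * y)])
grad_h = np.array([2 * x, 2 * y])
grad_merit = grad_f + 2 * rho * h_val * grad_h

grad_norm = np.linalg.norm(grad_merit)
print(grad_norm)
```

The norm is clearly away from zero, so the constrained solution is not a stationary point of the merit function, and a second phase is needed.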

We will use the starting points from the previous experiment. Notice that the first step of the algorithm is already implemented: the approximations obtained with the merit function are stored in `res1` and `res2`. Let us now apply the Newton-based algorithm to find the optimal solution.

In [None]:
sol1 = Newton_algorithm(res1[0],res1[1], -1, 1, 100,1e-3)
sol2 = Newton_algorithm(res2[0],res2[1], -1, 1, 100,1e-3)

Iterations: 2
x = (x, y) = -0.7483353457092601 0.6633208782631613
lamba= -0.21232389024350248
Iterations: 2
x = (x, y) = -0.7483106326338553 0.6633717041241959
lamba= -0.2122554628381516


Finally, we obtain the expected result!!

## Extra experiment

We seek the minimum force needed to move a particle of mass $m$ from the initial point $x_0 = (0,0,0)$ to the final point $x_1 = (1,0,0)$ in $T=1$ seconds, in the absence of any other body forces. We can treat the problem as one-dimensional, since the reference system can be centered at the initial point of the particle and the particle can be assumed to move along the $x$-axis. Thus, the generalized coordinate is $q=x$ and the momentum is $p= m \dot{q}$. Since no non-conservative fields act on the system and there are no non-stationary constraints on the free particle, the Hamiltonian corresponds to the total energy of the system, that is,
$$ 
H = p \dot{q} - L = E_T = E_K + E_P
$$
where $E_K, E_P$ are the kinetic and potential energies of the system and $L=E_K - E_P$ is the Lagrangian. Under these assumptions we have $E_K = \frac{1}{2} m \dot{q}^2 = \frac{p^2}{2m}$, where $\dot{q} = \frac{dq}{dt}$, and $E_P = - W_F= -\int F(t)\, dr$, where $W_F$ denotes the work done by the force $F(t)$ that we apply to the particle. Since we assume that the force is conservative, so that its work does not depend on the path followed, we take it to be constant, $F(t)=f$, and therefore the Hamiltonian has the following expression:
$$
H(p,q,t) = \frac{1}{2} m \dot{q}^2 - f q 
$$
Thus, since $H=E_T$, the principle of conservation of energy gives
$$
\frac{dH}{dt} = 0 \iff m \dot{q} \ddot{q} - f\dot{q} = 0 \iff f = m \ddot{q}
$$
Note that we have recovered Newton's second law. We can therefore find $f$ by integrating this differential equation, with the particle starting at rest at the origin ($q(0)=0$, $\dot{q}(0)=0$), and then imposing the final condition:
$$
\ddot{q} = \frac{f}{m} \implies \dot{q}(t) = \int_0^t\frac{f}{m}ds = \frac{ft}{m} \implies q(t) = \int_0^t\frac{fs}{m}ds = \frac{f t^2}{2m}
$$
Imposing now the final condition $q(1)=1$ we get $f= 2m$. 
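
The value $f=2m$ can be checked by integrating $\ddot{q}=f/m$ numerically from rest. The sketch below uses a simple semi-implicit Euler scheme; the mass value and the number of steps are arbitrary choices made here for illustration.

```python
m, T = 2.0, 1.0   # arbitrary mass (assumption) and the final time T = 1
f = 2 * m         # constant force obtained above

# Integrate q'' = f/m with a semi-implicit Euler scheme, starting at rest at q = 0
n = 100000
dt = T / n
q, v = 0.0, 0.0
for _ in range(n):
    v += (f / m) * dt   # update the velocity first
    q += v * dt         # then the position with the new velocity
print(q)
```

The final position should be close to 1, up to the discretization error of the scheme.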

However, consider now the case in which we want the particle to reach the point $x_1$ and stay there, i.e. to arrive with zero momentum. Observe that the force $F(t)$ is not conservative this time, and we must take the non-stationary constraints into account. We now seek the minimum force $F(t)=f(t)$, i.e. $\min |f(t)|^2$, subject to Newton's second law written as a first-order system: $F(t) = m \ddot{q} = \dot{p} \iff \dot{q}= p/m, \ \dot{p}=f$. Introducing the Lagrange multipliers $\lambda_1, \lambda_2$ for these two constraints, we have 
$$
L' = |f(t)|^2 + \lambda_1 \Big(\dot{q} - \frac{p}{m}\Big) + \lambda_2 \big(f(t) - \dot{p}\big)
$$
Therefore, the Euler-Lagrange equations, with $q$, $p$ and $f$ as generalized variables, are given by the corresponding partial derivatives:
$$
\frac{\partial L'}{\partial f } = \frac{d}{dt} \frac{\partial L'}{\partial \dot{f} } = 0 \iff 2 f + \lambda_2 = 0 \iff f = -\lambda_2/2
$$

Now, similarly, we have
\begin{align*}
\frac{\partial L'}{\partial q} = 0 =& \frac{d}{dt}\frac{\partial L'}{\partial \dot{q}} = \frac{d}{dt} \lambda_1 = \dot{\lambda_1} \\
\frac{\partial L'}{\partial p} = & -\frac{\lambda_1}{m} = \frac{d}{dt}\frac{\partial L'}{\partial \dot{p}}  = -\dot{\lambda_2}
\end{align*}
\end{align*}
Therefore $\lambda_1 = \text{const} = a$ and $\lambda_2 = \frac{at+b}{m}$. Consequently $f=- \frac{at+b}{2m}$ and, with the particle starting at rest at the origin, integrating $\dot{p}=f$ and $\dot{q}=p/m$ gives
\begin{align*}
p(t) = \int_0^t f(s) ds &= -\frac{at^2 }{4m} - \frac{bt}{2m} \\
q(t) = \frac{1}{m}\int_0^t p(s) ds &= -\frac{1}{m} \Big[ \frac{at^3 }{12m} + \frac{bt^2}{4m} \Big]
\end{align*}
Finally, imposing the boundary conditions $p(1) = 0 $ and $q(1)= 1$ we get $b = -a/2$ and $a = 24m^2$, so $b = -12m^2$, and consequently
$$
\boxed{f(t) = -\frac{at + b}{2m} = -12mt + 6m}
$$
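
The boundary conditions can also be solved and checked numerically. The sketch below assumes only that the force is linear in time, $f(t)=\alpha t+\beta$ (with $\alpha$, $\beta$ names introduced here), and that the particle starts at rest at the origin; the mass value is an arbitrary choice for illustration.

```python
import numpy as np

m = 2.0  # arbitrary mass, chosen here only for illustration

# Unknowns (alpha, beta) of f(t) = alpha*t + beta, particle initially at rest:
#   p(1) = alpha/2 + beta           = 0
#   q(1) = (alpha/6 + beta/2) / m   = 1
M = np.array([[1 / 2, 1.0],
              [1 / (6 * m), 1 / (2 * m)]])
rhs = np.array([0.0, 1.0])
alpha, beta = np.linalg.solve(M, rhs)
print(alpha / m, beta / m)  # coefficients expressed in units of m

# Check the boundary conditions by trapezoidal integration of p' = f and q' = p/m
t = np.linspace(0.0, 1.0, 2001)
f = alpha * t + beta
p = np.concatenate(([0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(t))))
q = np.concatenate(([0.0], np.cumsum((p[1:] + p[:-1]) / 2 * np.diff(t)))) / m
print(p[-1], q[-1])
```

The solve gives $\alpha=-12m$ and $\beta=6m$, and the integrated trajectory ends at $q(1)\approx 1$ with $p(1)\approx 0$.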