# Cartpole optimal control problem

![image.png](attachment:image.png)

A cartpole is another classical control example. In this system, an unactuated pole is attached on top of a 1D actuated cart, so the system as a whole is underactuated. The goal is to raise the pole to a standing position.

The model is here:
https://en.wikipedia.org/wiki/Inverted_pendulum

We denote by $m_1$ the cart mass, $m_2$ the pole mass, $l$ the pole length, $\theta$ the pole angle w.r.t. the vertical axis, $x$ the cart position, and $g = 9.81$ the gravitational acceleration.

The system accelerations can be written as:

$$\ddot{\theta} = \frac{1}{\mu(\theta)} \big( \frac{\cos \theta}{l} f + \frac{(m_1+m_2) g}{l} \sin(\theta) - m_2 \cos(\theta) \sin(\theta) \dot{\theta}^2\big),$$

$$\ddot{x} = \frac{1}{\mu(\theta)} \big( f + m_2 \cos(\theta) \sin(\theta) g -m_2 l \sin(\theta) \dot{\theta}^2 \big),$$

with

$$\mu(\theta) = m_1+m_2 \sin(\theta)^2,$$

where $f$ represents the input command.
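As a quick sanity check of these accelerations, here is a minimal plain-Python sketch that is independent of crocoddyl. It writes the gravity term with the total mass $m_1+m_2$ and uses $\dot{\theta}^2$ in both velocity terms. The function name `cartpole_accelerations` is hypothetical, and the default parameter values are the ones used later in this notebook.

```python
import math

def cartpole_accelerations(theta, theta_dot, f, m1=1.0, m2=0.1, l=0.5, g=9.81):
    """Return (x_ddot, theta_ddot) for a pole angle, pole velocity and input force."""
    s, c = math.sin(theta), math.cos(theta)
    mu = m1 + m2 * s**2
    x_ddot = (f + m2 * c * s * g - m2 * l * s * theta_dot**2) / mu
    theta_ddot = (c * f / l + (m1 + m2) * g * s / l - m2 * c * s * theta_dot**2) / mu
    return x_ddot, theta_ddot

# With no force and no velocity, the upright pose (theta = 0) is an equilibrium.
upright = cartpole_accelerations(0.0, 0.0, 0.0)

# A small tilt accelerates the pole further away from the vertical,
# which is why the upright equilibrium is unstable.
tilted = cartpole_accelerations(0.05, 0.0, 0.0)
```

Evaluating a few such configurations by hand is a cheap way to catch sign errors before wiring the equations into a solver.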


## I. Differential Action Model

A Differential Action Model (DAM) describes the action (control/dynamics) in continuous time. In this exercise, we ask you to write the equations of motion for the cartpole.

For more details, see the instructions inside the DifferentialActionModelCartpole class.

In [None]:
import crocoddyl
import pinocchio
import numpy as np

class DifferentialActionModelCartpole(crocoddyl.DifferentialActionModelAbstract):
    def __init__(self):
        crocoddyl.DifferentialActionModelAbstract.__init__(self, crocoddyl.StateVector(4), 1, 6) # nu = 1; nr = 6
        self.unone = np.zeros(self.nu)

        self.m1 = 1.
        self.m2 = .1
        self.l  = .5
        self.g  = 9.81
        self.costWeights = [ 1., 1., 0.1, 0.001, 0.001, 1. ]  # sin, 1-cos, x, xdot, thdot, f
        
    def calc(self, data, x, u=None):
        if u is None: u = self.unone  # default to a zero control
        # Getting the state and control variables
        # np.asscalar was removed from recent NumPy releases; flatten the inputs instead
        y, th, ydot, thdot = np.asarray(x, dtype=float).ravel()
        f = float(np.asarray(u, dtype=float).ravel()[0])

        # Shortname for system parameters
        m1, m2, l, g = self.m1, self.m2, self.l, self.g
        s, c = np.sin(th), np.cos(th)

        # Defining the equations of motion
        m = m1 + m2
        mu = m1 + m2 * s**2
        xddot  = (f     + m2 * c * s * g - m2 * l * s * thdot**2 ) / mu
        thddot = (c * f / l + m * g * s / l  - m2 * c * s * thdot**2 ) / mu
        data.xout = np.matrix([ xddot,thddot ]).T

        # Computing the cost residual and value
        data.r = np.matrix(self.costWeights * np.array([ s, 1-c, y, ydot, thdot, f ])).T
        data.cost = .5 * float(np.sum(np.asarray(data.r)**2))

    def calcDiff(self,data,x,u=None,recalc=True):
        # Advanced users might implement the derivatives
        pass

You may want to check your computation. Here is how to create the model and run the calc method.

In [None]:
cartpoleDAM = DifferentialActionModelCartpole()
cartpoleData = cartpoleDAM.createData()
x = cartpoleDAM.state.rand()
u = np.zeros(1)
cartpoleDAM.calc(cartpoleData, x, u)
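To check the cost value by hand, the following plain-Python sketch reproduces the weighted residual cost of the class above, $\frac{1}{2}\sum_i (w_i r_i)^2$, without crocoddyl. The names `COST_WEIGHTS` and `cartpole_cost` are hypothetical; the weights and the residual ordering are copied from `DifferentialActionModelCartpole`.

```python
import math

# Residual ordering follows the class above: sin, 1-cos, x, xdot, thdot, f.
COST_WEIGHTS = [1.0, 1.0, 0.1, 0.001, 0.001, 1.0]

def cartpole_cost(state, f, weights=COST_WEIGHTS):
    """Return 0.5 * ||w * r||^2 for state = (x, theta, x_dot, theta_dot)."""
    x, theta, x_dot, theta_dot = state
    residual = [math.sin(theta), 1.0 - math.cos(theta), x, x_dot, theta_dot, f]
    weighted = [w * r for w, r in zip(weights, residual)]
    return 0.5 * sum(r**2 for r in weighted)

# Pole standing, cart at the origin, no force: every residual vanishes.
cost_upright = cartpole_cost((0.0, 0.0, 0.0, 0.0), 0.0)

# Pole hanging down: only the 1-cos residual is significant.
cost_hanging = cartpole_cost((0.0, math.pi, 0.0, 0.0), 0.0)
```

Comparing these values with `cartpoleData.cost` for the same state and control should reveal any mismatch in the residual or the weights.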

## II. Write the derivatives with DAMNumDiff

In the previous exercise, we didn't define the derivatives of the cartpole system. In crocoddyl, we can compute them without any additional code thanks to the DifferentialActionModelNumDiff class. This class computes the derivatives through numerical differentiation.

In the following cell, you need to create a cartpole DAM that computes the derivatives using NumDiff.

In [None]:
# Creating the cartpole DAM
cartpoleND = crocoddyl.DifferentialActionModelNumDiff(cartpoleDAM, True)

After creating your cartpole DAM with NumDiff, please answer the following questions:

 - Two columns of Fx are null. Which ones? Why?

 - Can you double-check the values of Fu?
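One way to investigate both questions is to differentiate the dynamics yourself. The sketch below uses simple forward finite differences on a standalone copy of the equations of motion. It only illustrates the idea of numerical differentiation and is not crocoddyl's NumDiff implementation: the function names, the step `eps` and the test state are assumptions made for the example.

```python
import math

def accelerations(state, f, m1=1.0, m2=0.1, l=0.5, g=9.81):
    """Return [x_ddot, theta_ddot] for state = [x, theta, x_dot, theta_dot]."""
    theta, theta_dot = state[1], state[3]
    s, c = math.sin(theta), math.cos(theta)
    mu = m1 + m2 * s**2
    x_ddot = (f + m2 * c * s * g - m2 * l * s * theta_dot**2) / mu
    theta_ddot = (c * f / l + (m1 + m2) * g * s / l - m2 * c * s * theta_dot**2) / mu
    return [x_ddot, theta_ddot]

def finite_difference_jacobians(state, f, eps=1e-6):
    """Forward differences: one perturbed evaluation per state entry and per control."""
    a0 = accelerations(state, f)
    Fx = [[0.0] * len(state) for _ in a0]
    for j in range(len(state)):
        perturbed = list(state)
        perturbed[j] += eps
        a = accelerations(perturbed, f)
        for i in range(len(a0)):
            Fx[i][j] = (a[i] - a0[i]) / eps
    a = accelerations(state, f + eps)
    Fu = [(a[i] - a0[i]) / eps for i in range(len(a0))]
    return Fx, Fu

state = [0.3, 0.4, -0.2, 0.5]  # arbitrary test state
Fx, Fu = finite_difference_jacobians(state, f=0.1)

# Closed-form derivative w.r.t. f, read directly from the equations of motion.
mu = 1.0 + 0.1 * math.sin(state[1])**2
Fu_analytic = [1.0 / mu, math.cos(state[1]) / (0.5 * mu)]
```

Printing `Fx` column by column, and `Fu` next to `Fu_analytic`, gives reference values to compare against the ones stored by the NumDiff model.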


## III. Integrate the model

After creating the DAM for the cartpole system, we need to create an Integrated Action Model (IAM). Remember that an IAM converts the continuous-time action model into a discrete-time action model. For this exercise we'll use a symplectic Euler integrator.

In [None]:
timeStep = 5e-2
cartpoleIAM = crocoddyl.IntegratedActionModelEuler(cartpoleND, timeStep)
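To make the integration scheme concrete, here is a minimal sketch of one symplectic (semi-implicit) Euler step, assuming the usual ordering where the velocity is updated first and the position is then advanced with the new velocity. The function `symplectic_euler_step` is hypothetical and only illustrates the scheme; it is not taken from crocoddyl.

```python
def symplectic_euler_step(q, v, a, dt):
    """Advance one step: update the velocity, then the position with the new velocity."""
    v_next = [vi + ai * dt for vi, ai in zip(v, a)]
    q_next = [qi + vi * dt for qi, vi in zip(q, v_next)]
    return q_next, v_next

# Constant acceleration of 2 starting from rest, with the notebook's time step.
# An explicit Euler step would leave the position at 0 here, because it would
# use the old (zero) velocity to advance the position.
q_next, v_next = symplectic_euler_step([0.0], [0.0], [2.0], 5e-2)
```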

## IV. Write the problem, create the solver

First, you need to describe your shooting problem by specifying the number of knots and their time step. For this exercise we want to use 50 knots with $dt = 5 \cdot 10^{-2}$.

Here is how we create the problem.

In [None]:
# Fill the number of knots (T) and the time step (dt)
x0 = np.matrix([ 0., 3.14, 0., 0. ]).T
T  = 50
problem = crocoddyl.ShootingProblem(x0, [ cartpoleIAM ]*T, cartpoleIAM)

The problem by itself cannot solve anything; it can only integrate the dynamics for a given control sequence:

In [None]:
us = [ pinocchio.utils.zero(cartpoleIAM.differential.nu) ]*T
xs = problem.rollout(us)
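Conceptually, a rollout applies the discrete-time model once per knot and collects the visited states. The plain-Python sketch below mimics this for the cartpole with zero controls, assuming a symplectic Euler step and the parameter values used in this notebook. The helpers `accelerations`, `step` and `rollout` are hypothetical stand-ins and not crocoddyl functions.

```python
import math

def accelerations(state, f, m1=1.0, m2=0.1, l=0.5, g=9.81):
    """Return (x_ddot, theta_ddot) for state = [x, theta, x_dot, theta_dot]."""
    theta, theta_dot = state[1], state[3]
    s, c = math.sin(theta), math.cos(theta)
    mu = m1 + m2 * s**2
    x_ddot = (f + m2 * c * s * g - m2 * l * s * theta_dot**2) / mu
    theta_ddot = (c * f / l + (m1 + m2) * g * s / l - m2 * c * s * theta_dot**2) / mu
    return x_ddot, theta_ddot

def step(state, f, dt):
    """One symplectic Euler step: velocities first, then positions."""
    x, theta, x_dot, theta_dot = state
    x_ddot, theta_ddot = accelerations(state, f)
    x_dot, theta_dot = x_dot + x_ddot * dt, theta_dot + theta_ddot * dt
    return [x + x_dot * dt, theta + theta_dot * dt, x_dot, theta_dot]

def rollout(x0, controls, dt):
    """Return the T+1 states visited when applying the T controls in order."""
    xs = [list(x0)]
    for f in controls:
        xs.append(step(xs[-1], f, dt))
    return xs

T, dt = 50, 5e-2
xs = rollout([0.0, 3.14, 0.0, 0.0], [0.0] * T, dt)
```

With zero controls and an initial angle close to $\pi$, the pole simply keeps hanging near its lowest position, which is what the animation of the rollout should show.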

In cartpole_utils, we provide the plotCartpole and animateCartpole functions.

In [None]:
%%capture
%matplotlib inline
from cartpole_utils import animateCartpole
anim = animateCartpole(xs)

# If you encounter problems, you probably need to install ffmpeg/libav-tools:
# sudo apt-get install ffmpeg
# or
# sudo apt-get install libav-tools

And let's display this rollout!

Note that to_jshtml renders the animation together with interactive playback controls.

In [None]:
from IPython.display import HTML
# HTML(anim.to_jshtml())
HTML(anim.to_html5_video())

Now we want to create the solver (SolverDDP class) and run it. Display the result. **Do you like it?**

In [None]:
# Creating the DDP solver
ddp = crocoddyl.SolverDDP(problem)
ddp.setCallbacks([crocoddyl.CallbackVerbose()])

# Solving this problem
done = ddp.solve([], [], 1000)
print(done)
print(ddp.us)

In [None]:
%%capture
%matplotlib inline

# Create animation
anim = animateCartpole(ddp.xs)

In [None]:
# HTML(anim.to_jshtml())
HTML(anim.to_html5_video())

## V. Tune the problem, solve it

Here are some hints about what to try in order to solve the problem.


 - Without a terminal model, we can see some swings but we cannot stabilize the pole. What should we do?

 - The most important thing is to reach the standing position. Can we also nullify the velocity?

 - Increasing all the weights does not work. How can we increase the penalty gradually?



In [None]:
###########################################################################
################## TODO: Tune the weights for each cost ###################
###########################################################################
terminalCartpole = DifferentialActionModelCartpole()
terminalCartpoleDAM = crocoddyl.DifferentialActionModelNumDiff(terminalCartpole, True)
terminalCartpoleIAM = crocoddyl.IntegratedActionModelEuler(terminalCartpoleDAM)

terminalCartpole.costWeights[0] = 100
terminalCartpole.costWeights[1] = 100
terminalCartpole.costWeights[2] = 1.
terminalCartpole.costWeights[3] = 0.1
terminalCartpole.costWeights[4] = 0.01
terminalCartpole.costWeights[5] = 0.0001
problem = crocoddyl.ShootingProblem(x0, [ cartpoleIAM ]*T, terminalCartpoleIAM)

In [None]:
# Creating the DDP solver
ddp = crocoddyl.SolverDDP(problem)
ddp.setCallbacks([ crocoddyl.CallbackVerbose() ])

# Solving this problem
done = ddp.solve([], [], 300)
print(done)

In [None]:
%%capture
%matplotlib inline

# Create animation
anim = animateCartpole(ddp.xs)

In [None]:
# HTML(anim.to_jshtml())
HTML(anim.to_html5_video())