# Adaptive Dynamic Programming
 - Continuous-time systems
 - On-policy

### Actor-critic scheme
 - Single NN

## Quanser 2-DOF helicopter
<center><img src="https://www.quanser.com/wp-content/uploads/2017/04/2-DOF-Helicopter_Shadow-923x766.jpg" alt="Quanser 2-DOF helicopter" width=300/>

The linearized, continuous-time state-space model is given by

<center>$\left\{ \begin{array}{l}
\dot{x}\left(t\right)=\bar{A}x\left(t\right)+\bar{B}u(t) \\ 
y\left(t\right)=Cx(t) \end{array}
\right.$

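where the system state is defined as $x = \left[\theta, \phi, \dot{\theta}, \dot{\phi}\right]^T$, with pitch angle $\theta$ and yaw angle $\phi$, the input $u(t) \in \mathbb{R}^2$ collects the pitch and yaw motor inputs, and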

<center>$\begin{array}{l}
\bar{A}=\left[ \begin{array}{cccc}
0 & 0 & 1 & 0 \\ 
0 & 0 & 0 & 1 \\ 
0 & 0 & -\frac{B_p}{J_{T_p}} & 0 \\ 
0 & 0 & 0 & -\frac{B_y}{J_{T_y}} \end{array}
\right] \\ 
\bar{B}=\left[ \begin{array}{cc}
0 & 0 \\ 
0 & 0 \\ 
\frac{K_{pp}}{J_{T_p}} & \frac{K_{py}}{J_{T_p}} \\ 
\frac{K_{yp}}{J_{T_y}} & \frac{K_{yy}}{J_{T_y}} \end{array}
\right] \\ 
C=\left[ \begin{array}{cccc}
1 & 0 & 0 & 0 \\ 
0 & 1 & 0 & 0 \end{array}
\right] \end{array}$

This model is implemented in `MATLAB/Simulink` to simulate the control strategies developed below.

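For prototyping outside Simulink, the same matrices can be assembled in Python. The sketch below assumes NumPy; the function name `heli_model` and all numerical parameter values are hypothetical placeholders (not identified Quanser parameters) and should be replaced with the plant's actual values. It also checks that $(\bar{A}, \bar{B})$ is controllable for these values.

```python
import numpy as np

def heli_model(B_p, B_y, J_Tp, J_Ty, K_pp, K_py, K_yp, K_yy):
    """Linearized 2-DOF helicopter model, state x = [theta, phi, theta_dot, phi_dot]."""
    A = np.array([
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, -B_p / J_Tp, 0.0],
        [0.0, 0.0, 0.0, -B_y / J_Ty],
    ])
    B = np.array([
        [0.0, 0.0],
        [0.0, 0.0],
        [K_pp / J_Tp, K_py / J_Tp],
        [K_yp / J_Ty, K_yy / J_Ty],
    ])
    C = np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
    ])
    return A, B, C

# Hypothetical placeholder parameters (illustration only, not identified values).
A, B, C = heli_model(B_p=0.8, B_y=0.3, J_Tp=0.04, J_Ty=0.04,
                     K_pp=0.2, K_py=0.007, K_yp=0.02, K_yy=0.07)

# Controllability matrix [B, AB, A^2 B, A^3 B]; full rank 4 means (A, B) is controllable.
ctrb = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(4)])
print(np.linalg.matrix_rank(ctrb))  # expected: 4
```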
## Adaptive optimal control law
- Single NN
- Actor-critic scheme

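For a control-affine system $\dot{x} = F(x) + B(x)u$ (for the helicopter, $F(x) = \bar{A}x$ and $B(x) = \bar{B}$), policy iteration alternates between policy evaluation, which solves the first equation for $V^{(i)}$ under the current policy $u^{(i)}$, and policy improvement, given by the second equation:
<center>$\begin{array}{c}
x^T Q x+\left(u^{(i)}\right)^T R u^{(i)} + \left(\nabla V^{(i)}(x)\right)^T \left( F(x)+B(x)u^{(i)} \right)=0,\\
u^{(i+1)}(x)=-\frac{1}{2}R^{-1}B^T(x)\nabla V^{(i)}(x).
\end{array}$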

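For the linear model, a quadratic value function $V(x) = x^T P x$ gives $\nabla V = 2Px$, so policy improvement becomes $u = -Kx$ with $K = R^{-1}\bar{B}^T P$, and policy evaluation reduces to the Lyapunov equation $(\bar{A}-\bar{B}K)^T P + P(\bar{A}-\bar{B}K) + Q + K^T R K = 0$ (Kleinman's algorithm). The sketch below is a model-based illustration of these two steps, not the online NN scheme. It assumes SciPy, uses hypothetical placeholder values for $\bar{A}$ and $\bar{B}$ with $Q = I$, $R = I$, and starts from a stabilizing gain obtained by pole placement, since the open-loop $\bar{A}$ has two zero eigenvalues and $K = 0$ is therefore not stabilizing. The result is compared with the solution of the algebraic Riccati equation (ARE).

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov
from scipy.signal import place_poles

# Hypothetical placeholder model with the structure of A_bar and B_bar.
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, -20.0, 0.0],
              [0.0, 0.0, 0.0, -7.5]])
B = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [5.0, 0.175],
              [0.5, 1.75]])
Q = np.eye(4)
R = np.eye(2)

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman policy iteration for u = -K x, starting from a stabilizing K0."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: Ak^T P + P Ak + Q + K^T R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Policy improvement: u = -1/2 R^{-1} B^T (2 P x) = -R^{-1} B^T P x
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Initial stabilizing gain via pole placement (poles chosen arbitrarily).
K0 = place_poles(A, B, [-1.0, -2.0, -3.0, -4.0]).gain_matrix
P_pi, K_pi = policy_iteration(A, B, Q, R, K0)
P_are = solve_continuous_are(A, B, Q, R)
print(np.max(np.abs(P_pi - P_are)))
```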
## Actor-critic scheme

The **Critic NN** approximates the value function $V(x)$ using a vector of activation functions $\mu_v(x)$ and a weight vector $\hat{W}_v$:
<center>$\hat{V}(x)=\hat{W}_v^T \mu_v(x)$

<!-- where it is expected to approximate
<center>$V(x)={W_v^*}^T \mu_v(x) + \varepsilon_v.$ -->

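The activation functions $\mu_v(x)$ are not specified here. One common choice, assumed in the sketch below, is the vector of quadratic monomials $x_i x_j$ ($i \le j$), which represents $V(x) = x^T P x$ exactly with 10 weights for the four helicopter states. The hypothetical helpers `mu_v`, `grad_mu_v`, and `critic` evaluate $\mu_v(x)$, its Jacobian $\nabla\mu_v(x)$, and $\hat{V}(x)=\hat{W}_v^T\mu_v(x)$; the last lines compare the Jacobian with central finite differences at an arbitrary state.

```python
import numpy as np

def mu_v(x):
    """Assumed critic activations: monomials x_i * x_j for i <= j."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def grad_mu_v(x):
    """Jacobian of mu_v with respect to x, shape (N, n)."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i] += x[j]  # d(x_i x_j)/dx_i
            row[j] += x[i]  # d(x_i x_j)/dx_j (sums to 2 x_i when i == j)
            rows.append(row)
    return np.array(rows)

def critic(W_v, x):
    """Critic output V_hat(x) = W_v^T mu_v(x)."""
    return W_v @ mu_v(x)

# Central finite-difference check of the Jacobian.
x = np.array([0.1, -0.2, 0.05, 0.3])
eps = 1e-6
J_fd = np.column_stack([(mu_v(x + eps * e) - mu_v(x - eps * e)) / (2 * eps)
                        for e in np.eye(len(x))])
print(np.max(np.abs(J_fd - grad_mu_v(x))))
```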
The running cost is quadratic:
<center>$L(x,u)=x^T Q x+u^T R u,$

where $Q \succeq 0$ and $R \succ 0$ are symmetric weighting matrices.


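Substituting $\nabla \hat{V}(x) = \left(\nabla \mu_v(x)\right)^T \hat{W}_v$, where $\nabla \mu_v(x) = \partial \mu_v / \partial x$ is the Jacobian of the activation vector, into the policy-improvement step gives the approximate optimal control law: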
<center>$\hat{u}(x)=-\frac{1}{2}R^{-1}B^T(x) \left(\nabla \mu_v(x)\right)^T \hat{W}_v$

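A sketch of the approximate control law, assuming the quadratic basis $\mu_v(x) = [x_i x_j]_{i \le j}$ and hypothetical placeholder values for $B(x) = \bar{B}$ and $R$. The helper `weights_from_P` (hypothetical) builds weights for which $\hat{V}(x) = x^T P x$; in that case $\nabla\hat{V} = 2Px$, so $\hat{u}$ should coincide with $-R^{-1}\bar{B}^T P x$, which the last line checks.

```python
import numpy as np

def grad_mu_v(x):
    """Jacobian of the assumed quadratic basis mu_v(x) = [x_i x_j, i <= j]."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i] += x[j]
            row[j] += x[i]
            rows.append(row)
    return np.array(rows)

def control_law(W_v, x, B, R):
    """u_hat(x) = -1/2 R^{-1} B^T (grad mu_v(x))^T W_v."""
    grad_V = grad_mu_v(x).T @ W_v  # gradient of the critic output V_hat
    return -0.5 * np.linalg.solve(R, B.T @ grad_V)

def weights_from_P(P):
    """Weights for which W_v^T mu_v(x) equals x^T P x (P symmetric)."""
    n = P.shape[0]
    return np.array([P[i, i] if i == j else 2.0 * P[i, j]
                     for i in range(n) for j in range(i, n)])

# Hypothetical placeholder input matrix, input weight, and value matrix.
B = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 0.175], [0.5, 1.75]])
R = np.eye(2)
P = np.diag([2.0, 1.0, 0.5, 0.25])
x = np.array([0.1, -0.2, 0.05, 0.3])

u_hat = control_law(weights_from_P(P), x, B, R)
u_lin = -np.linalg.solve(R, B.T @ P @ x)  # same control written as -R^{-1} B^T P x
print(np.allclose(u_hat, u_lin))
```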
## Normalized gradient descent: Critic update rule
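The Hamiltonian is
<center>$H(x,\nabla V,u)=x^T Q x+u^T R u+\left(\nabla V\right)^T\left(F(x)+B(x)u\right),$

and the value function $V$ of the policy $u$ satisfies $H(x,\nabla V,u)=0$.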

Define the Hamiltonian approximation error as
<center>$e_c= H(x,\nabla \hat{V},u) - H\left(x,\nabla V,u\right)$

Since $H(x,\nabla V,u)=0$, this gives
<center>$e_c = H(x,\nabla \hat{V},u) = x^T Q x + u^T R u + \left(\frac{\partial \hat{V}}{\partial x}\right)^T\left(F(x)+B(x)u\right),$

or, in terms of the critic weights,
<center>$e_c = x^T Q x + u^T R u + \hat{W}_v^T \nabla \mu_v(x) \left( F(x)+B(x)u \right).$

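For the helicopter, $F(x) = \bar{A}x$ and $B(x) = \bar{B}$. The sketch below computes $e_c$ together with the vector $\nabla \mu_v(x)\left(F(x)+B(x)u\right)$, which is the gradient of $e_c$ with respect to $\hat{W}_v$, under the assumed quadratic basis, hypothetical placeholder matrices, and $Q = I$, $R = I$. As a sanity check: the ARE solution $P$ makes $V(x) = x^T P x$ satisfy the HJB equation, so $e_c$ should vanish (up to round-off) for the matching weights and the control $u = -R^{-1}\bar{B}^T P x$, while zero weights leave $e_c = x^T Q x + u^T R u > 0$.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def grad_mu_v(x):
    """Jacobian of the assumed quadratic basis mu_v(x) = [x_i x_j, i <= j]."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i] += x[j]
            row[j] += x[i]
            rows.append(row)
    return np.array(rows)

def weights_from_P(P):
    """Weights for which W_v^T mu_v(x) equals x^T P x (P symmetric)."""
    n = P.shape[0]
    return np.array([P[i, i] if i == j else 2.0 * P[i, j]
                     for i in range(n) for j in range(i, n)])

def hamiltonian_residual(W_v, x, u, A, B, Q, R):
    """Return e_c and phi = grad_mu_v(x) (A x + B u), i.e. F(x) = A x, B(x) = B."""
    phi = grad_mu_v(x) @ (A @ x + B @ u)
    e_c = x @ Q @ x + u @ R @ u + W_v @ phi
    return e_c, phi

# Hypothetical placeholder model and cost weights.
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, -20.0, 0.0],
              [0.0, 0.0, 0.0, -7.5]])
B = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 0.175], [0.5, 1.75]])
Q, R = np.eye(4), np.eye(2)

P = solve_continuous_are(A, B, Q, R)
x = np.array([0.1, -0.2, 0.05, 0.3])
u_star = -np.linalg.solve(R, B.T @ P @ x)  # u = -1/2 R^{-1} B^T (2 P x)

e_star, phi = hamiltonian_residual(weights_from_P(P), x, u_star, A, B, Q, R)
e_zero, _ = hamiltonian_residual(np.zeros(phi.size), x, u_star, A, B, Q, R)
print(e_star, e_zero)  # e_star is round-off sized, e_zero is positive
```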
The critic network is trained by minimizing the objective function
<center>$E_c=\frac{1}{2} e_c^T e_c,$

and gradient descent on $E_c$ with learning rate $\alpha_c > 0$ gives the update rule
<center>$\begin{array}{cl}
\dot{\hat{W}}_v & = -\alpha_c \frac{\partial E_c}{\partial \hat{W}_v} \\
 & = -\alpha_c \phi e_c,
\end{array}$

where $\phi = \frac{\partial e_c}{\partial \hat{W}_v} = \nabla \mu_v(x) \left( F(x)+B(x)u \right)$.

The normalized form of the update is
<center>$\dot{\hat{W}}_v = -\alpha_c \frac{\phi}{\left(1 + \phi^T \phi \right)^2}e_c,$

where the factor $\left(1 + \phi^T \phi \right)^2$ limits the size of the update when $\phi$ is large.

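With $x$ and $u$ held fixed, $e_c$ is affine in $\hat{W}_v$ with gradient $\phi$, so a small forward-Euler step of the normalized update shrinks $|e_c|$. The sketch below illustrates one such step; the helper name `normalized_critic_step`, the running-cost value, the short regressor $\phi$, and the gains $\alpha_c$ and $dt$ are hypothetical values chosen only for illustration. In an online implementation, $\phi$ and $e_c$ would be recomputed from the measured state at every integration step.

```python
import numpy as np

def normalized_critic_step(W_v, phi, e_c, alpha_c, dt):
    """Forward-Euler step of dW_v/dt = -alpha_c * phi * e_c / (1 + phi^T phi)^2."""
    W_dot = -alpha_c * phi * e_c / (1.0 + phi @ phi) ** 2
    return W_v + dt * W_dot

# Hypothetical sample values (not from a real trajectory): running cost
# c = x^T Q x + u^T R u and regressor phi at one state, with x and u held fixed.
c = 0.3
phi = np.array([0.5, -1.2, 0.8, 2.0])
W_v = np.zeros(4)
alpha_c, dt = 10.0, 0.01

e_before = c + W_v @ phi  # e_c is affine in W_v with gradient phi
W_next = normalized_critic_step(W_v, phi, e_before, alpha_c, dt)
e_after = c + W_next @ phi
print(e_before, e_after)
```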
## Modification term in the update rule

Consider the Lyapunov function candidate
<center>${J_s} = \frac{1}{2}x^Tx.$

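Along the closed-loop trajectories under the optimal control $u^*$, the time derivative of $J_s$ should be negative for $x \neq 0$:
<center>$\dot{J_s} = \left(\nabla J_s\right)^T \dot{x} = \left(\nabla J_s\right)^T \left(F+Bu^*\right) < 0.$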

It is assumed that the following condition holds
<center>$
{\left( \nabla {J_s}\right)}^T\left(F+Bu^*\right)=-{\left( \nabla {J_s}\right)}^T \Gamma \left(\nabla {J_s}\right),$
    
where $\Gamma$ is a positive definite matrix.

Under this condition, a modification term that drives $\hat{W}_v$ in the direction of decreasing $\dot{J_s}$ is added to the critic update rule:
<center>$\dot{\hat{W}}_v = -\alpha_c \frac{\phi}{\left(1 + \phi^T \phi \right)^2}e_c - \alpha_s \frac{\partial \dot{J_s}(x)}{\partial \hat{W}_v},$
    
where

<center>$\frac{\partial \dot{J_s}(x)}{\partial \hat{W}_v} = \frac{\partial \left[\left(\nabla J_s\right)^T \left( F(x)+B(x) \hat{u}(x) \right)\right]}{\partial \hat{W}_v}.$

Substituting $\hat{u}(x)$ and switching the term on and off with the indicator $\Pi(x,\hat{u})$ gives
<center>$\dot{\hat{W}}_v = -\alpha_c \frac{\phi}{\left(1 + \phi^T \phi \right)^2}e_c + \frac{1}{2}\alpha_s \Pi(x,\hat{u})
\nabla \mu_v BR^{-1}B^T \nabla J_s,$
    
where $\Pi(x,\hat{u})$ is an indicator that activates the additional stabilizing term only when $J_s$ is not decreasing under $\hat{u}$:
<center>$
\Pi(x,\hat{u}) = 
\begin{cases}
0 & \text{ if } \dot{J_s}<0 \\
1 & \text{ else.}
\end{cases}$

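With $J_s = \frac{1}{2}x^Tx$, the gradient is $\nabla J_s = x$. The sketch below evaluates $\dot{J_s}$ under $\hat{u}$, the indicator $\Pi(x,\hat{u})$, and the stabilizing term $\frac{1}{2}\alpha_s \Pi(x,\hat{u}) \nabla \mu_v B R^{-1} B^T \nabla J_s$, assuming the quadratic basis and hypothetical placeholder values for $\bar{A}$, $\bar{B}$, $R$, $\alpha_s$, and the step size. The helper names are hypothetical. An Euler step along this term alone lowers $\dot{J_s}$ at a state where it was non-negative, and the term is zero at a state where $J_s$ is already decreasing.

```python
import numpy as np

def grad_mu_v(x):
    """Jacobian of the assumed quadratic basis mu_v(x) = [x_i x_j, i <= j]."""
    n = len(x)
    rows = []
    for i in range(n):
        for j in range(i, n):
            row = np.zeros(n)
            row[i] += x[j]
            row[j] += x[i]
            rows.append(row)
    return np.array(rows)

def js_dot(W_v, x, A, B, R):
    """dJ_s/dt = (grad J_s)^T (A x + B u_hat), where grad J_s = x."""
    u_hat = -0.5 * np.linalg.solve(R, B.T @ (grad_mu_v(x).T @ W_v))
    return x @ (A @ x + B @ u_hat)

def stabilizing_term(W_v, x, A, B, R, alpha_s):
    """(1/2) alpha_s Pi(x, u_hat) grad_mu_v(x) B R^{-1} B^T grad_J_s."""
    Pi = 0.0 if js_dot(W_v, x, A, B, R) < 0 else 1.0
    return 0.5 * alpha_s * Pi * (grad_mu_v(x) @ B @ np.linalg.solve(R, B.T @ x))

# Hypothetical placeholder model, weights, and gains.
A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, -20.0, 0.0],
              [0.0, 0.0, 0.0, -7.5]])
B = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 0.175], [0.5, 1.75]])
R = np.eye(2)
alpha_s, dt = 1.0, 0.01
W_v = np.zeros(10)

# State where J_s is not decreasing under u_hat = 0, so the term is active (Pi = 1).
x_bad = np.array([1.0, 0.0, 0.01, 0.0])
W_next = W_v + dt * stabilizing_term(W_v, x_bad, A, B, R, alpha_s)
print(js_dot(W_v, x_bad, A, B, R), js_dot(W_next, x_bad, A, B, R))

# State where J_s is already decreasing, so the term is switched off (Pi = 0).
x_good = np.array([0.0, 0.0, 1.0, 1.0])
print(stabilizing_term(W_v, x_good, A, B, R, alpha_s))
```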
<center><img src="ARE_from_HJB.jpg" alt="ARE from HJB" width=600/>