In [1]:
%reload_ext autoreload
%autoreload 2
%config IPCompleter.greedy=True
# load libraries and set plot parameters
import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline

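# IPython.display.set_matplotlib_formats is deprecated; matplotlib_inline provides the same function
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('pdf', 'png')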
plt.rcParams['savefig.dpi'] = 75

plt.rcParams['figure.autolayout'] = False
plt.rcParams['figure.figsize'] = 10, 6
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['axes.titlesize'] = 20
plt.rcParams['font.size'] = 16
plt.rcParams['lines.linewidth'] = 2.0
plt.rcParams['lines.markersize'] = 8
plt.rcParams['legend.fontsize'] = 14
plt.rcParams['text.usetex'] = True
plt.rcParams['font.family'] = "serif"
plt.rcParams['font.serif'] = "cm"

# Outline

- GMM
    - k-means + GMM (https://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/19-MachineLearning/06-unsupervised.pdf)
    - k-means implementation (https://www.kaggle.com/andyxie/k-means-clustering-implementation-in-python)
- HMM
    - HMM theory (https://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/18-ArtificialIntelligence/09-graphicalModels.pdf)
    - HMM implementation, heads + tails (https://github.com/ananthpn/pyhmm)
    - continuous-time and discrete-time plots (https://github.com/lopatovsky/HMMs)
    - HMM robot filtering (https://github.com/beneisnr/hMm-filtering)
- DTW

- pbdlib
    - Related Work:
    - no symbolic approach (look up "Natural methods for robot task learning: Instructive demonstrations, generalization and practice")
        - DRAWBACK: symbolic approaches rely on built-in biases to segment the demonstrations
    - no direct time dependence
        - DRAWBACK: aligning and scaling time-dependent sequences is a difficult task (handling spatial and temporal perturbations is hard)
    - other approaches have considered modeling the intrinsic dynamics of motion
        - BENEFIT: does not depend on an explicit time variable
        - BENEFIT: can be modulated in unseen regions
        - DRAWBACK: requires a high number of states and a smoothing procedure
    - proposed model: HMMs and GMMs
    - GMR models the joint probability density of the data (no direct regression as in GPR)
    - the regression function is then derived from the joint density model
        - ADVANTAGE: input and output components are only specified at the very last step of the process
        - ADVANTAGE: density estimation can be learned in an offline phase; the regression itself can then be computed very rapidly
        - ADVANTAGE: can handle different sources of missing data: the system can consider any combination of input/output mappings

- do PCA

## Approach

- a skill is demonstrated to the robot in slightly different situations
- each demonstration $m \in \{1, \dots, M\}$ yields a trajectory $T_m$
- a trajectory $T_m$ consists of $T$ samples of $d$-dimensional joint positions $x_t$ and velocities $\dot{x}_t$

$$D = \{\{(x_t, \dot{x}_t)\}_{t=1}^T\}_{m=1}^M$$
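As a minimal sketch of how such a dataset could be assembled, assume each demonstration is recorded as an array of positions sampled with a constant period `dt`. The helper name `build_dataset`, the synthetic demonstrations, and the use of finite differences to obtain velocities are all assumptions made for illustration; the notes do not say how velocities are measured.

```python
import numpy as np

def build_dataset(demos, dt):
    """Stack positions and finite-difference velocities of all demonstrations.

    demos: list of M arrays, each of shape (T_m, d), holding positions x_t.
    dt:    sampling period (assumed constant across demonstrations).
    Returns an array with rows [x_t, xdot_t] of shape (sum of T_m, 2 d)
    and the list of per-demonstration lengths.
    """
    rows, lengths = [], []
    for x in demos:
        x = np.asarray(x, dtype=float)
        # Central differences in the interior, one-sided at both ends
        xdot = np.gradient(x, dt, axis=0)
        rows.append(np.hstack([x, xdot]))
        lengths.append(len(x))
    return np.vstack(rows), lengths

# Hypothetical data: two noisy 2-D demonstrations of the same arc
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
demos = [np.column_stack([t, np.sin(np.pi * t)]) + 0.01 * rng.standard_normal((50, 2))
         for _ in range(2)]
D, lengths = build_dataset(demos, dt=t[1] - t[0])
print(D.shape, lengths)
```

Keeping the per-demonstration lengths alongside the stacked array makes it possible to tell sequence boundaries apart later, which matters for any sequence model trained on the data.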

- the joint distribution $\mathcal{P}(x, \dot{x})$ is encoded in a continuous HMM of $K$ states.
- the output distribution of each state is a Gaussian, which encodes local variation and correlation information
- Parameters of HMM:

$$\{\Pi, a, \mu, \Sigma\}$$

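- with initial state probabilities $\Pi$, transition probabilities $a$, and Gaussian means $\mu$ and covariances $\Sigma$; learned using the Baum-Welch algorithm (a variant of the expectation-maximization algorithm)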
- Input and output components of HMM in each state $s_i$:

$$\mu_i = \left[\begin{array}{c}\mu_i^x \\ \mu_i^{\dot{x}}\end{array}\right] \text{, } \Sigma_i = \left[\begin{array}{cc}\Sigma_i^x & \Sigma_i^{x\dot{x}}\\ \Sigma_i^{\dot{x}x}& \Sigma_i^{\dot{x}}\end{array}\right] $$

- given the current position command, a desired velocity command is estimated using Gaussian mixture regression (GMR)

$$\hat{\dot{x}} = \sum_{i=1}^K h_i(x) [ \mu_i^{\dot{x}} + \Sigma_i^{\dot{x}x}(\Sigma_i^x)^{-1} (x - \mu_i^x)]$$

- where $h_i(x)$ is used to encode the sequential information encapsulated in the HMM:

$$h_i(x_t) = \frac{(\sum_{j=1}^K h_j(x_{t-1}) a_{ji}) \mathcal{N}(x_t; \mu_i^x, \Sigma_i^x)}{\sum_{k=1}^K [(\sum_{j=1}^K h_j(x_{t-1}) a_{jk})\mathcal{N}(x_t; \mu_k^x, \Sigma_k^x)]}$$
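The weight recursion and the velocity estimate above can be sketched together in NumPy. This is an illustration under stated assumptions, not the pbdlib implementation: the function names, the two-state model, and its parameter values are hypothetical, and using $\Pi$ in place of the predicted state probabilities for the first sample is the usual forward-algorithm initialisation rather than something spelled out in these notes.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(x; mean, cov)."""
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** len(mean) * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm

def update_weights(prior, x_t, mu, sigma, d):
    """Compute h_i(x_t) from the predicted state probabilities `prior`.

    For t > 1, prior[i] = sum_j h_j(x_{t-1}) a_{ji}; for the first sample
    Pi is used instead (assumption: standard forward initialisation).
    """
    lik = np.array([gaussian_pdf(x_t, m[:d], s[:d, :d]) for m, s in zip(mu, sigma)])
    h = prior * lik
    return h / h.sum()  # breaks down if every likelihood underflows to zero

def gmr_velocity(x_t, h, mu, sigma, d):
    """sum_i h_i [mu_i^xdot + Sigma_i^{xdot x} (Sigma_i^x)^{-1} (x - mu_i^x)]."""
    xdot_hat = np.zeros(d)
    for h_i, m, s in zip(h, mu, sigma):
        xdot_hat += h_i * (m[d:] + s[d:, :d] @ np.linalg.solve(s[:d, :d], x_t - m[:d]))
    return xdot_hat

# Hypothetical two-state, one-dimensional model (d = 1).
# Rows of mu are [mu^x, mu^xdot]; sigma holds the matching 2 x 2 covariances.
d = 1
Pi = np.array([1.0, 0.0])
a = np.array([[0.9, 0.1],
              [0.0, 1.0]])
mu = np.array([[0.0, 1.0],
               [1.0, 0.0]])
sigma = np.array([[[0.1, 0.0], [0.0, 0.1]],
                  [[0.1, -0.05], [-0.05, 0.1]]])

h = None
for x_t in np.linspace(0.0, 1.0, 6)[:, None]:
    prior = Pi if h is None else h @ a
    h = update_weights(prior, x_t, mu, sigma, d)
    print(x_t, h.round(3), gmr_velocity(x_t, h, mu, sigma, d).round(3))
```

As the position moves from the first state's mean to the second's, the weights shift from state 0 to state 1 and the estimated velocity drops from the first state's mean velocity towards zero. Solving the linear system with `np.linalg.solve` avoids forming $(\Sigma_i^x)^{-1}$ explicitly.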

- since reproduction is unstable in regions that were not covered during the demonstrations, a second term is added, based on a position estimate $\hat{x}$ obtained from the current velocity:

$$\hat{x} = \sum_{i=1}^K h_i(x) [ \mu_i^{x} + \Sigma_i^{x\dot{x}}(\Sigma_i^{\dot{x}})^{-1} (\dot{x} - \mu_i^{\dot{x}})]$$


$$\ddot{x} = (\hat{\dot{x}} - \dot{x}) \kappa^{\mathcal{V}} + (\hat{x}-x)\kappa^{\mathcal{P}} $$

- show plots

- the first term allows the robot to follow the demonstrated motion profile; the second term keeps the robot from departing from known situations and forces it to come back into the subspace of the demonstrations

- with fixed gains, this may lead to oscillations

- use adaptive gains: proportional gain should decrease when the system is close to the demonstrated trajectories

- adaptive gains allow the controller to focus on the other constraints of the task
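The acceleration command can be sketched with constant gains as follows. The generic `gmr` helper, the single-state model, the gain values, and the Euler integration step are all assumptions chosen to keep the example small; the adaptive gains mentioned above are not implemented here because the notes do not give their form.

```python
import numpy as np

def gmr(query, h, mu, sigma, in_idx, out_idx):
    """Weighted sum over states of the conditional means E[out | in = query]."""
    estimate = np.zeros(len(out_idx))
    for h_i, mu_i, sigma_i in zip(h, mu, sigma):
        s_in = sigma_i[np.ix_(in_idx, in_idx)]
        s_out_in = sigma_i[np.ix_(out_idx, in_idx)]
        estimate += h_i * (mu_i[out_idx]
                           + s_out_in @ np.linalg.solve(s_in, query - mu_i[in_idx]))
    return estimate

def acceleration_command(x, xdot, h, mu, sigma, kappa_v, kappa_p):
    """xddot = (xdot_hat - xdot) kappa_v + (x_hat - x) kappa_p."""
    d = len(x)
    pos, vel = np.arange(d), np.arange(d, 2 * d)
    xdot_hat = gmr(x, h, mu, sigma, pos, vel)   # velocity estimated from position
    x_hat = gmr(xdot, h, mu, sigma, vel, pos)   # position estimated from velocity
    return (xdot_hat - xdot) * kappa_v + (x_hat - x) * kappa_p

# Hypothetical single-state model in one dimension: mean position 1, mean velocity 0,
# with a negative position/velocity correlation that pulls the state towards x = 1.
mu = np.array([[1.0, 0.0]])
sigma = np.array([[[1.0, -0.5], [-0.5, 1.0]]])
h = np.array([1.0])

x, xdot, dt = np.array([0.0]), np.array([0.0]), 0.01
for _ in range(1000):
    xddot = acceleration_command(x, xdot, h, mu, sigma, kappa_v=8.0, kappa_p=16.0)
    xdot = xdot + dt * xddot   # integrate acceleration
    x = x + dt * xdot          # integrate the updated velocity
print(x, xdot)
```

Because the input and output indices are only passed at query time, the same `gmr` function serves both estimates, which mirrors the point that input and output components are chosen at the very last step.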

## Metrics

- M1: RMS error along the motion with respect to the demonstration dataset

- M2: RMS error after DTW (dynamic time warping); spatial information is prioritized here, so the metric compares the path followed by the robot rather than the exact trajectory

- M3: norm of the jerk; the jerk (derivative of acceleration) is a good candidate for evaluating the smoothness of human motion

- M4: Computation time of learning process

- M5: Retrieval duration
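A minimal sketch of M1 to M3 follows, assuming a reproduction and a single reference demonstration given as position arrays. The function names are hypothetical, the DTW is the textbook dynamic-programming version with Euclidean local cost, and taking the RMS of the third finite difference as the "norm of jerk" is an assumption, since the notes do not define the norm precisely.

```python
import numpy as np

def rms_error(a, b):
    """M1: RMS of pointwise Euclidean distances between equally long trajectories."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

def dtw_path(a, b):
    """Textbook O(len(a) * len(b)) dynamic time warping; returns aligned index pairs."""
    n, m = len(a), len(b)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(cost[i - 1, j - 1],
                                                  cost[i - 1, j],
                                                  cost[i, j - 1])
    # Backtrack from the last pair to the first to recover the warping path
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def rms_error_dtw(a, b):
    """M2: RMS error over the DTW-aligned sample pairs."""
    idx_a, idx_b = zip(*dtw_path(a, b))
    return rms_error(a[list(idx_a)], b[list(idx_b)])

def jerk_norm(x, dt):
    """M3 (assumed definition): RMS norm of the third finite difference of position."""
    jerk = np.diff(x, n=3, axis=0) / dt ** 3
    return np.sqrt(np.mean(np.sum(jerk ** 2, axis=1)))

# Hypothetical example: the same path traversed with a different timing
t = np.linspace(0.0, 1.0, 50)
demo = np.column_stack([t, np.sin(np.pi * t)])
repro = np.column_stack([t ** 2, np.sin(np.pi * t ** 2)])
m1 = rms_error(repro, demo)
m2 = rms_error_dtw(repro, demo)
m3 = jerk_norm(repro, dt=t[1] - t[0])
print(m1, m2, m3)
```

In this example the reproduction follows the demonstrated path exactly but at a different pace, so M1 is penalized by the timing mismatch while M2 is small. M4 and M5 are wall-clock durations and could be measured by wrapping the learning and retrieval calls with `time.perf_counter`.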


## Comparison

- HMM vs TGMR (time-dependent Gaussian mixture regression):

- M1 & M2: all methods perform well, HMM performs well with a small number of states

- M3: HMM reproductions are slightly jerky

- M4: training time is less important than reproduction time

- M5: LWR (locally weighted regression) is not competitive; retrieval duration depends linearly on the number of states

- when dimensionality is low, the difficulty is to correctly handle the crossing points that can appear when randomly generating trajectories

## Results

- HMM can handle crossing points in trajectories (due to sequential information)

- HMM can be used in an unsupervised manner: several movements can be encoded in a single model without specifying any class label or number of movements

- example: an HMM encodes hitting a table tennis ball from different angles; the desired velocity is encoded in the demonstrations
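The crossing-point behaviour can be illustrated with a toy model. Everything in this sketch is hypothetical: a one-dimensional motion with four states visited in order, where states 1 and 3 share the same mean position but carry opposite mean velocities. The position/velocity cross-covariances are assumed to be zero, so the velocity estimate reduces to the weighted mean of the state velocities.

```python
import numpy as np

# Hypothetical model: states 1 and 3 both sit at x = 0 (the crossing point)
mu_x = np.array([-1.0, 0.0, 1.0, 0.0])   # mean position of each state
mu_v = np.array([1.0, 1.0, 0.0, -1.0])   # mean velocity of each state
var_x = 0.05                             # shared position variance
Pi = np.array([1.0, 0.0, 0.0, 0.0])
a = np.array([[0.8, 0.2, 0.0, 0.0],
              [0.0, 0.8, 0.2, 0.0],
              [0.0, 0.0, 0.8, 0.2],
              [0.0, 0.0, 0.0, 1.0]])

def likelihood(x):
    """N(x; mu_i^x, var_x) for every state i."""
    return np.exp(-0.5 * (x - mu_x) ** 2 / var_x) / np.sqrt(2.0 * np.pi * var_x)

def forward_weights(xs):
    """Weights h_i(x_t) along a position sequence, using the HMM transitions."""
    h = Pi * likelihood(xs[0])
    h /= h.sum()
    history = [h]
    for x in xs[1:]:
        h = (h @ a) * likelihood(x)
        h /= h.sum()
        history.append(h)
    return np.array(history)

# Move from x = -1 to x = 1, then back to x = 0: position 0 is visited twice
xs = np.concatenate([np.linspace(-1.0, 1.0, 21), np.linspace(1.0, 0.0, 11)[1:]])
h = forward_weights(xs)

first_pass, second_pass = h[10], h[-1]   # both correspond to x = 0
print(first_pass.round(3), first_pass @ mu_v)
print(second_pass.round(3), second_pass @ mu_v)
```

Both queries are at the same position, yet the weights select state 1 on the way out and state 3 on the way back, so the estimated velocity has opposite signs. Weights based on position likelihood alone could not separate the two states, since their likelihoods at the crossing point are identical.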
