
Dynamical Movement Primitives

It is assumed that you have already read the tutorials on Dynamical Systems and Function Approximation.

The core idea behind dynamical movement primitives (DMPs) is to represent movement primitives as a combination of dynamical systems. The state variables of the main dynamical system then represent trajectories for controlling, for instance, the 7 joints of a robot arm, or its 3D end-effector position. The attractor state is the end-point or goal of the movement.

The key advantage of DMPs is that they inherit the nice properties from linear dynamical systems (guaranteed convergence towards the attractor, robustness to perturbations, independence of time, etc) whilst allowing arbitrary (smooth) motions to be represented by adding a non-linear forcing term. This forcing term is often learned from demonstration, and subsequently improved through reinforcement learning.

DMPs were introduced in [ijspeert02movement]. In this section we largely follow the notation and description in [ijspeert13dynamical], but at a slower pace.

Historical remark. The term "dynamicAL movement primitives" is now preferred over "dynamic movement primitives". The newer term makes the relation to dynamicAL systems more clear, and avoids confusion about whether the output of "dynamical movement primitives" is in kinematic or dynamic space (it is usually in kinematic space).

Remark. This documentation and code focusses only on discrete movement primitives. For rhythmic movement primitives, we refer to [ijspeert13dynamical].

Basic Point-to-Point Movements: A Critically Damped Spring-Damper System

At the heart of the DMP lies a spring-damper system, as described in Spring-Damper Systems. In DMP papers, the notation of the spring-damper system is usually a bit different:

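$$
\begin{aligned}
m\ddot{x} &= -kx - c\dot{x} && \text{spring-damper system, traditional notation} \\
\tau^2\ddot{x} &= -\alpha\beta x - \alpha\tau\dot{x} && \text{with } m = \tau^2,\ k = \alpha\beta,\ c = \alpha\tau \\
\tau^2\ddot{x} &= \alpha(-\beta x - \tau\dot{x}) \\
\tau^2\ddot{y} &= \alpha(-\beta(y - y^g) - \tau\dot{y}) && \text{with attractor } y^g \\
\tau^2\ddot{y} &= \alpha(\beta(y^g - y) - \tau\dot{y}) && \text{typical DMP notation for the spring-damper system}
\end{aligned}
$$

In the last two steps, we change the attractor state from 0 to $y^g$, where $y^g$ is the goal of the movement.

To avoid overshooting or slow convergence towards the goal $y^g$, we prefer to have a critically damped spring-damper system for the DMP. For such systems $c = 2\sqrt{mk}$ must hold, see Critical Damping. In our notation this becomes $\alpha = 2\sqrt{\alpha\beta}$, which leads to $\beta = \alpha/4$. This determines the value of $\beta$ for a given value of $\alpha$ in DMPs. The influence of $\alpha$ is illustrated in the first figure here.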

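Rewriting the second order dynamical system as a first order system (see Rewriting one 2nd Order Systems as two 1st Order Systems) with expanded state $[z~y]$, where $z = \tau\dot{y}$, yields:

$$
\begin{aligned}
\dot{z} &= \alpha(\beta(y^g - y) - z)/\tau && \text{with init. state } 0 \text{ and attr. state } 0 \\
\dot{y} &= z/\tau && \text{with init. state } y_0 \text{ and attr. state } y^g
\end{aligned}
$$

Please note that in the implementation, the state is implemented as $[y~z]$. The order is inconsequential, but we use the notation above ($[z~y]$) throughout the rest of this tutorial section, for consistency with the DMP literature.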
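As a minimal sketch of the first order system described above, the following snippet integrates a critically damped spring-damper system with Euler steps. The function name, the parameter values (`alpha=20.0`, `dt=0.001`) and the use of plain Euler integration are assumptions made for illustration; they are not taken from the dmpbbo implementation.

```python
def integrate_spring_damper(y0, y_goal, tau=1.0, alpha=20.0, dt=0.001, n_steps=3000):
    """Euler-integrate z' = alpha*(beta*(y_goal - y) - z)/tau and y' = z/tau."""
    beta = alpha / 4.0  # critical damping
    z, y = 0.0, y0      # initial state: zero velocity, start position y0
    ys = [y]
    for _ in range(n_steps):
        # Compute both derivatives from the current state before updating it.
        z_dot = alpha * (beta * (y_goal - y) - z) / tau
        y_dot = z / tau
        z += dt * z_dot
        y += dt * y_dot
        ys.append(y)
    return ys


ys = integrate_spring_damper(y0=0.0, y_goal=1.0)
print(f"final y = {ys[-1]:.6f}, largest y along the way = {max(ys):.6f}")
```

With `beta = alpha / 4` the trajectory approaches the goal without overshooting it; choosing a smaller `beta` for the same `alpha` makes convergence slower, and a larger one makes the system oscillate around the goal.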

Arbitrary Smooth Movements: the Forcing Term

The representation described in the previous section has some nice properties in terms of convergence towards the attractor, robustness to perturbations, and autonomy, but it can only represent very simple movements. To achieve more complex movements, we add a time-dependent forcing term to the spring-damper system. The spring-damper system and forcing term are together known as a transformation system.

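$$
\begin{aligned}
\dot{z} &= (\alpha(\beta(y^g - y) - z) + f(t))/\tau && \text{with init. state } 0 \text{ and attr. state } 0 \\
\dot{y} &= z/\tau && \text{with init. state } y_0 \text{ and attr. state } ?
\end{aligned}
$$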

The forcing term is an open loop controller, i.e. it depends only on time. By modifying the acceleration profile of the movement with a forcing term, arbitrary smooth movements can be achieved. The function $f$ is usually a function approximator, such as locally weighted regression (LWR) or locally weighted projection regression (LWPR), see Function Approximation. The graph below shows an example of a forcing term implemented with LWR with random weights for the basis functions.

[Figure: example of a forcing term implemented with LWR, with random weights for the basis functions]
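To make the idea of a time-dependent forcing term concrete, here is a rough sketch that represents it as a normalized weighted sum of Gaussian basis functions. This is a simplified stand-in for LWR (it uses one constant per basis function instead of local linear models), and the function name, basis function placement and weights are all assumptions chosen for illustration.

```python
import math


def forcing_term(t, weights, tau=1.0):
    """Normalized weighted sum of Gaussian basis functions, evaluated at time t.

    Assumes at least two weights; the centers are spread evenly over [0, tau].
    """
    n_basis = len(weights)
    centers = [tau * i / (n_basis - 1) for i in range(n_basis)]
    width = tau / n_basis
    activations = [math.exp(-0.5 * ((t - c) / width) ** 2) for c in centers]
    weighted_sum = sum(w * a for w, a in zip(weights, activations))
    return weighted_sum / sum(activations)


weights = [0.0, 150.0, -100.0, 50.0, 120.0]  # arbitrary illustrative values
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(f"t = {t:.2f}   f(t) = {forcing_term(t, weights):8.2f}")
```

Because the activations are normalized, the output is always a weighted average of the weights, so its range is bounded by the smallest and largest weight. Changing the weights changes the shape of the forcing term, and thereby the acceleration profile of the movement.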

Ensuring Convergence to 0 of the Forcing Term: the Gating System

Since we add a forcing term to the dynamical system, we can no longer guarantee that the part of the system representing $y$ will converge towards $y^g$; perhaps the forcing term continually pushes it away from $y^g$ (perhaps it doesn't, but the point is that we cannot guarantee that it never does). That is why there is a question mark in the attractor state in the equation above.

To guarantee that the movement will always converge towards the attractor $y^g$, we need to ensure that the forcing term decreases to 0 towards the end of the movement. To do so, a gating term is added, which is 1 at the beginning of the movement, and 0 at the end. This gating term itself is determined by, of course, a dynamical system. In [ijspeert02movement], it was suggested to use an exponential system. We add this extra system to our dynamical system by expanding the state as follows:

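$$
\begin{aligned}
\dot{z} &= (\alpha(\beta(y^g - y) - z) + v \cdot f(t))/\tau && \text{with init. state } 0 \text{ and attr. state } 0 \\
\dot{y} &= z/\tau && \text{with init. state } y_0 \text{ and attr. state } y^g \\
\dot{v} &= -\alpha_v v/\tau && \text{with init. state } 1 \text{ and attr. state } 0
\end{aligned}
$$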
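The effect of the gating term can be illustrated with a small experiment. The sketch below integrates a transformation system twice with the same forcing term, once ungated and once multiplied by an exponentially decaying gate. The names `integrate` and `constant_push`, the decay rate `alpha_v=4.0` and the constant forcing term are assumptions chosen to make the effect easy to see.

```python
def integrate(forcing, gated, y0=0.0, y_goal=1.0, tau=1.0,
              alpha=20.0, alpha_v=4.0, dt=0.001, duration=3.0):
    """Euler-integrate the transformation system, optionally gating the forcing term."""
    beta = alpha / 4.0
    z, y, v, t = 0.0, y0, 1.0, 0.0  # the gate v starts at 1 and decays to 0
    for _ in range(int(round(duration / dt))):
        gate = v if gated else 1.0
        z_dot = (alpha * (beta * (y_goal - y) - z) + gate * forcing(t)) / tau
        y_dot = z / tau
        v_dot = -alpha_v * v / tau
        z, y, v, t = z + dt * z_dot, y + dt * y_dot, v + dt * v_dot, t + dt
    return y


def constant_push(t):
    return 100.0  # a forcing term that never vanishes on its own


print("end state without gating:", round(integrate(constant_push, gated=False), 4))
print("end state with gating:   ", round(integrate(constant_push, gated=True), 4))
```

Without gating, the constant push shifts the equilibrium away from the goal by `f / (alpha * beta)`. With gating, the forcing term fades out and the spring-damper system pulls the state back to the goal.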

Ensuring Autonomy of the Forcing Term: the Phase System

By introducing the dependence of the forcing term $f(t)$ on time $t$, the overall system is no longer autonomous. To achieve independence of time, we therefore let $f$ be a function of the state of an (autonomous) dynamical system rather than of $t$. This system represents the phase of the movement. [ijspeert02movement] suggested using the same dynamical system for the gating and phase, and using the term canonical system to refer to this joint gating/phase system. Thus the phase of the movement starts at 1, and converges to 0 towards the end of the movement, just like the gating system. The new formulation now is (the only difference is $f(v)$ instead of $f(t)$):

$$
\begin{aligned}
\dot{z} &= (\alpha(\beta(y^g - y) - z) + v \cdot f(v))/\tau && \text{with init. state } 0 \text{ and attr. state } 0 \\
\dot{y} &= z/\tau && \text{with init. state } y_0 \text{ and attr. state } y^g \\
\dot{v} &= -\alpha_v v/\tau && \text{with init. state } 1 \text{ and attr. state } 0
\end{aligned}
$$

Note that in most papers, the symbol for the state of the canonical system is $x$. Since this symbol is already reserved for the state of the complete DMP, we rather use $v$.
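One practical consequence of driving the forcing term with a phase instead of time is that the same movement can be executed at different speeds by changing only the time constant. The sketch below illustrates this under several assumptions: a shared exponential gating/phase state `v`, Gaussian basis functions spaced evenly in phase space, and arbitrary weights. None of these choices are taken from the dmpbbo code.

```python
import math


def phase_forcing(v, weights, width=0.15):
    """Normalized Gaussian basis functions placed in phase space (v in [0, 1])."""
    n = len(weights)
    centers = [i / (n - 1) for i in range(n)]
    acts = [math.exp(-0.5 * ((v - c) / width) ** 2) for c in centers]
    return sum(w * a for w, a in zip(weights, acts)) / sum(acts)


def integrate_dmp(tau, weights, y0=0.0, y_goal=1.0,
                  alpha=20.0, alpha_v=4.0, dt=0.0002):
    """Integrate for one nominal duration tau; time never appears explicitly."""
    beta = alpha / 4.0
    z, y, v = 0.0, y0, 1.0
    ys = [y]
    for _ in range(int(round(tau / dt))):
        z_dot = (alpha * (beta * (y_goal - y) - z) + v * phase_forcing(v, weights)) / tau
        y_dot = z / tau
        v_dot = -alpha_v * v / tau
        z, y, v = z + dt * z_dot, y + dt * y_dot, v + dt * v_dot
        ys.append(y)
    return ys


weights = [50.0, -80.0, 40.0, 100.0, -30.0]  # arbitrary illustrative values
fast = integrate_dmp(tau=1.0, weights=weights)
slow = integrate_dmp(tau=2.0, weights=weights)
# The slow movement has twice as many samples; compare time-aligned points.
max_diff = max(abs(fast[i] - slow[2 * i]) for i in range(len(fast)))
print(f"max difference between time-aligned trajectories: {max_diff:.5f}")
```

The two trajectories have the same shape, up to the numerical error of Euler integration; the second one simply takes twice as long.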

Todo: Discuss goal-dependent scaling, i.e. alt text

Multi-dimensional Dynamical Movement Primitives

Since DMPs usually have multi-dimensional states (e.g. one output $y_d$ for each of the $D$ joints), it is more accurate to use bold fonts for the state variables (except the gating/phase system, because it is always 1D) so that they represent vectors:

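$$
\begin{aligned}
\dot{\mathbf{z}} &= (\alpha(\beta(\mathbf{y}^g - \mathbf{y}) - \mathbf{z}) + v \cdot \mathbf{f}(v))/\tau && \text{with init. state } \mathbf{0} \text{ and attr. state } \mathbf{0} \\
\dot{\mathbf{y}} &= \mathbf{z}/\tau && \text{with init. state } \mathbf{y}_0 \text{ and attr. state } \mathbf{y}^g \\
\dot{v} &= -\alpha_v v/\tau && \text{with init. state } 1 \text{ and attr. state } 0
\end{aligned}
$$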

So far, the graphs have shown 1-dimensional systems. To generate D-dimensional trajectories for, for instance, the 7 joints of an arm or the 3D position of its end-effector, we simply use D transformation systems. A key principle in DMPs is to use one and the same phase system for all of the transformation systems, to ensure that the outputs of the transformation systems are synchronized in time. The image below shows the evolution of all the dynamical systems involved in integrating a multi-dimensional DMP.

[Figure: evolution of all the dynamical systems involved in integrating a multi-dimensional DMP]
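The following sketch shows how several transformation systems can share a single phase. It assumes an exponential system used for both gating and phase, a simple normalized Gaussian basis as the function approximator, and hypothetical names (`basis_output`, `integrate_multi_dim`); it is meant as an illustration rather than a description of the dmpbbo classes.

```python
import math


def basis_output(v, weights, width=0.15):
    """Normalized Gaussian basis functions over the phase v in [0, 1]."""
    n = len(weights)
    centers = [i / (n - 1) for i in range(n)]
    acts = [math.exp(-0.5 * ((v - c) / width) ** 2) for c in centers]
    return sum(w * a for w, a in zip(weights, acts)) / sum(acts)


def integrate_multi_dim(y0s, y_goals, weights_per_dim, tau=1.0,
                        alpha=20.0, alpha_v=4.0, dt=0.001, duration=3.0):
    """Integrate D transformation systems that are driven by one shared phase."""
    beta = alpha / 4.0
    n_dims = len(y0s)
    zs = [0.0] * n_dims
    ys = list(y0s)
    v = 1.0  # single gating/phase state, shared by all dimensions
    for _ in range(int(round(duration / dt))):
        for d in range(n_dims):
            f = v * basis_output(v, weights_per_dim[d])
            z_dot = (alpha * (beta * (y_goals[d] - ys[d]) - zs[d]) + f) / tau
            y_dot = zs[d] / tau
            zs[d] += dt * z_dot
            ys[d] += dt * y_dot
        v += dt * (-alpha_v * v / tau)  # updated once per step, after all dimensions
    return ys


final = integrate_multi_dim(
    y0s=[0.0, 1.0, -0.5],
    y_goals=[1.0, 0.0, 0.5],
    weights_per_dim=[[50.0, -80.0, 40.0], [0.0, 60.0, -20.0], [30.0, 30.0, -90.0]],
)
print("final positions:", [round(y, 4) for y in final])
```

Each dimension has its own weights and its own goal, but all of them read the same value of `v` at every step, which is what keeps the dimensions synchronized.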

Alternative Systems for Gating, Phase and Goals

The DMP formulation presented so far follows [ijspeert02movement]. Since then, several variations have been proposed, which have several advantages in practice. We now describe some of these variations.

Gating: Sigmoid System

A disadvantage of using an exponential system as a gating term is that the gating decreases very quickly in the beginning. Thus, the output of the function approximator $f$ needs to be very high towards the end of the movement if it is to have any effect at all. This leads to scaling issues when training the function approximator.

Therefore, sigmoid systems have more recently been proposed [kulvicius12joining] as a gating system. This leads to the following DMP formulation (since the gating and phase system are no longer shared, we introduce a new state variable $s$ for the phase term):

alt text

where the term alt text is determined by alt text
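To give a feeling for the difference between the two gating systems, the sketch below compares an exponential gate with a sigmoid gate in closed form. The sigmoid here is the solution of a logistic differential equation, $\dot{v} = -(r/\tau)\, v (1 - v/K)$, which is an assumption; the exact parameterization in [kulvicius12joining] and in the dmpbbo code may differ. The parameter names `rate` and `t_inflection` are hypothetical.

```python
import math


def exponential_gate(t, tau=1.0, alpha=4.0):
    """Exponential decay from 1 towards 0."""
    return math.exp(-alpha * t / tau)


def sigmoid_gate(t, tau=1.0, rate=20.0, t_inflection=0.8):
    """Logistic decay from 1 towards 0, passing through K/2 at t_inflection * tau."""
    # K is chosen so that the gate is exactly 1 at t=0 (assumed parameterization).
    K = 1.0 + math.exp(-rate * t_inflection)
    return K / (1.0 + (K - 1.0) * math.exp(rate * t / tau))


for t in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    print(f"t = {t:.1f}   exponential = {exponential_gate(t):.4f}   "
          f"sigmoid = {sigmoid_gate(t):.4f}")
```

The sigmoid gate stays close to 1 for most of the movement and only drops near the end, so the function approximator does not need to compensate for a gate that has already decayed.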

Phase: Constant Velocity System

In practice, using an exponential phase system may complicate imitation learning of the function approximator $f$, because samples are not equidistantly spaced in time. Therefore, we introduce a dynamical system that mimics the properties of the phase system described in [kulvicius12joining], whilst allowing for a more natural integration in the DMP formulation, and thus our code base. This system, whose state we denote by $s$, starts at 0, and has a constant velocity of $1/\tau$, which means the system reaches 1 when $t = \tau$. When this point is reached, the velocity is set to 0.

$$
\dot{s} =
\begin{cases}
1/\tau & \text{if } s < 1 \\
0 & \text{if } s \geq 1
\end{cases}
\qquad \text{with init. state } 0 \text{ and attr. state } 1
$$

This is admittedly not very elegant, as this discontinuous dynamical system leads to non-smooth velocity and acceleration profiles. However, the velocities and accelerations of this system are never used, as only the phase itself is passed to the function approximators. So it's not elegant, but it doesn't hurt. This system has been implemented in the TimeSystem class.

alt text
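A constant velocity phase system is simple enough to sketch in a few lines. The function below is an illustration only and is not the TimeSystem class itself; the clipping with `min` is an added assumption that keeps an Euler step from overshooting 1.

```python
def integrate_constant_velocity_phase(tau, dt=0.001, duration=None):
    """Phase that rises linearly from 0 to 1 in tau seconds and then stays at 1."""
    if duration is None:
        duration = 1.5 * tau
    s = 0.0
    phases = [s]
    for _ in range(int(round(duration / dt))):
        s_dot = 1.0 / tau if s < 1.0 else 0.0  # velocity is switched off at s = 1
        s = min(1.0, s + dt * s_dot)           # clip so the phase never exceeds 1
        phases.append(s)
    return phases


phases = integrate_constant_velocity_phase(tau=2.0)
print(f"phase at t = 1.0: {phases[1000]:.3f}")
print(f"phase at t = 3.0: {phases[-1]:.3f}")
```

Because the phase grows linearly, samples taken at a fixed time step are also equidistant in phase, which is the property that simplifies training the function approximator.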

With the constant velocity dynamical system the DMP formulation becomes:

alt text

Zero Initial Accelerations: the Delayed Goal System

Since the spring-damper system leads to high initial accelerations (see the graph to the right below), which is usually not desirable for robots, it was suggested to move the attractor of the system from the initial state $y_0$ to the goal state $y^g$ during the movement [kulvicius12joining]. This delayed goal attractor $y^{g_d}$ itself is represented as an exponential dynamical system that starts at $y_0$, and converges to $y^g$ (in early versions of DMPs, there was no delayed goal system, and $y^{g_d}$ was simply equal to $y^g$ throughout the movement). The combination of these two systems, listed below, leads to a movement that starts and ends with 0 velocities and accelerations, and approximately has a bell-shaped velocity profile. This representation is thus well suited to generating human-like point-to-point movements, which have similar properties.

alt text

alt text
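The effect of the delayed goal on the initial acceleration can be checked numerically. The sketch below integrates the spring-damper system with and without a delayed goal; the exponential goal system $\dot{y}^{g_d} = -\alpha_g (y^{g_d} - y^g)/\tau$, the value `alpha_g=15.0` and the function name are assumptions made for illustration.

```python
def integrate_delayed_goal(y0, y_goal, delayed, tau=1.0, alpha=20.0,
                           alpha_g=15.0, dt=0.001, duration=3.0):
    """Spring-damper system whose attractor optionally moves from y0 to y_goal."""
    beta = alpha / 4.0
    z, y = 0.0, y0
    y_gd = y0 if delayed else y_goal  # the delayed goal starts at the initial state
    accelerations = []
    for _ in range(int(round(duration / dt))):
        z_dot = alpha * (beta * (y_gd - y) - z) / tau
        y_dot = z / tau
        y_gd_dot = -alpha_g * (y_gd - y_goal) / tau
        accelerations.append(z_dot / tau)  # since y' = z/tau, y'' = z'/tau
        z, y, y_gd = z + dt * z_dot, y + dt * y_dot, y_gd + dt * y_gd_dot
    return y, accelerations


y_plain, acc_plain = integrate_delayed_goal(0.0, 1.0, delayed=False)
y_delayed, acc_delayed = integrate_delayed_goal(0.0, 1.0, delayed=True)
print(f"initial acceleration without delayed goal: {acc_plain[0]:.1f}")
print(f"initial acceleration with delayed goal:    {acc_delayed[0]:.1f}")
print(f"peak |acceleration| with delayed goal:     {max(abs(a) for a in acc_delayed):.1f}")
```

Both variants converge to the same goal, but with the delayed goal the movement starts from rest with zero acceleration and the peak acceleration is considerably lower.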

In my experience, this DMP formulation is the best for learning human-like point-to-point movements (bell-shaped velocity profile, approximately zero velocities and accelerations at the beginning and end of the movement), and generates nice normalized data for the function approximator without scaling issues. The image below shows the interactions between the spring-damper system, delayed goal system, phase system and gating system.

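[Figure: interactions between the spring-damper system, delayed goal system, phase system and gating system]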

Summary

The core idea in dynamical movement primitives is to combine dynamical systems, which have nice properties in terms of convergence towards the goal, robustness to perturbations, and independence of time, with function approximators, which allow for the generation of arbitrary (smooth) trajectories. The key enabler to this approach is to gate the output of the function approximator with a gating system, which is 1 at the beginning of the movement, and 0 towards the end.

Further enhancements can be made by making the system autonomous (by using the output of a phase system rather than time as an input to the function approximator), or having initial velocities and accelerations of 0 (by using a delayed goal system).

Multi-dimensional DMPs are achieved by using multi-dimensional dynamical systems, and learning one function approximator for each dimension. Synchronization of the different dimensions is ensured by coupling them with only one phase system.

Further reading: Bibliography

  • [ijspeert02movement] A. J. Ijspeert, J. Nakanishi, and S. Schaal. Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2002.
  • [ijspeert13dynamical] A. Ijspeert, J. Nakanishi, P. Pastor, H. Hoffmann, and S. Schaal. Dynamical Movement Primitives: Learning attractor models for motor behaviors. Neural Computation, 25(2):328-373, 2013.
  • [kulvicius12joining] Tomas Kulvicius, KeJun Ning, Minija Tamosiunaite, and Florentin Wörgötter. Joining movement sequences: Modified dynamic movement primitives for robotics applications exemplified on handwriting. IEEE Transactions on Robotics, 28(1):145-157, 2012.

Further reading: dmpbbo tutorials

The next tutorials to go to would be: