# Learning how to move a human arm

In this tutorial we will show how to train a basic biomechanical model using `keras-rl`.

## Installation

First set up the simulator by following the instructions at
https://github.com/stanfordnmbl/osim-rl#getting-started
i.e. run

    conda create -n opensim-rl -c kidzik opensim git python=2.7
    source activate opensim-rl
    pip install git+https://github.com/stanfordnmbl/osim-rl.git

Then clone the repository and install `keras`, `keras-rl` and Jupyter:

    git clone https://github.com/stanfordnmbl/osim-rl.git
    conda install keras -c conda-forge
    pip install git+https://github.com/matthiasplappert/keras-rl.git
    cd osim-rl
    conda install jupyter

Follow the prompts and, once Jupyter is installed, type

    jupyter notebook

This should open Jupyter in your browser. Navigate to this notebook, i.e. to the file `scripts/train.arm.ipynb`.
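
Before opening the notebook, it can help to confirm that the packages installed above are importable from the active environment. The following is a minimal sketch, not part of the original notebook: `missing_modules` is a hypothetical helper, and the list of names is simply the set of top-level packages imported in the first cell below.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level packages imported by the first notebook cell.
    required = ["opensim", "keras", "rl", "osim"]
    print("Missing packages: {}".format(missing_modules(required)))
```

An empty list means all four packages were found. Anything listed should be reinstalled inside the `opensim-rl` environment.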

## Preparing the environment

The following two blocks load necessary libraries and create a simulator environment.

In [1]:
# Derived from keras-rl
import opensim as osim
import numpy as np
import sys

from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Flatten, Input, concatenate
from keras.optimizers import Adam

from rl.agents import DDPGAgent
from rl.memory import SequentialMemory
from rl.random import OrnsteinUhlenbeckProcess

from osim.env.arm import ArmEnv

from keras.optimizers import RMSprop

import argparse
import math

Using Theano backend.


In [2]:
# Load the arm environment
env = ArmEnv(True)
env.reset()

# Total number of steps in training
nallsteps = 10000

nb_actions = env.action_space.shape[0]

## Creating the actor and the critic

The actor serves as a brain for controlling muscles. The critic is our approximation of how well that brain is performing at achieving the goal.

In [3]:
# Create networks for DDPG
# Next, we build a very simple model.
actor = Sequential()
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(32))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('sigmoid'))
print(actor.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 14)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                480       
_________________________________________________________________
activation_1 (Activation)    (None, 32)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
activation_2 (Activation)    (None, 32)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 32)                1056      
_________________________________________________________________
activation_3 (Activation)    (None, 32)                0         
__________
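
The parameter counts in the summary above can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. The sketch below recomputes them. The observation size of 14 is taken from the summary. The action size of 6 is taken from the `action_input` shape printed in the critic summary, and we assume it is the same `nb_actions` used for the actor's last layer. `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_units + n_units


obs_dim = 14     # flattened observation size, from the actor summary
nb_actions = 6   # assumed from the critic's action_input shape
layer_sizes = [obs_dim, 32, 32, 32, nb_actions]

# One count per Dense layer, pairing each layer's input size with its width.
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts)       # [480, 1056, 1056, 198]
print(sum(counts))  # 2790
```

The first three values match the `Param #` column printed for `dense_1` to `dense_3`.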

In [4]:
action_input = Input(shape=(nb_actions,), name='action_input')
observation_input = Input(shape=(1,) + env.observation_space.shape, name='observation_input')
flattened_observation = Flatten()(observation_input)
x = concatenate([action_input, flattened_observation])
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(64)(x)
x = Activation('relu')(x)
x = Dense(1)(x)
x = Activation('linear')(x)
critic = Model(inputs=[action_input, observation_input], outputs=x)
print(critic.summary())

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
observation_input (InputLayer)   (None, 1, 14)         0                                            
____________________________________________________________________________________________________
action_input (InputLayer)        (None, 6)             0                                            
____________________________________________________________________________________________________
flatten_2 (Flatten)              (None, 14)            0                                            
____________________________________________________________________________________________________
concatenate_1 (Concatenate)      (None, 20)            0                                            
___________________________________________________________________________________________

## Train the actor and the critic

We will now run the `keras-rl` implementation of the DDPG algorithm, which trains both networks.

In [5]:
# Set up the agent for training
memory = SequentialMemory(limit=100000, window_length=1)
random_process = OrnsteinUhlenbeckProcess(theta=.15, mu=0., sigma=.2, size=env.noutput)
agent = DDPGAgent(nb_actions=nb_actions, actor=actor, critic=critic, critic_action_input=action_input,
                  memory=memory, nb_steps_warmup_critic=100, nb_steps_warmup_actor=100,
                  random_process=random_process, gamma=.99, target_model_update=1e-3,
                  delta_clip=1.)
agent.compile(Adam(lr=.001, clipnorm=1.), metrics=['mae'])

[2017-07-22 22:32:57,326] install mkl with `conda install mkl-service`: No module named mkl
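
Two of the settings above are easier to understand with a small numerical sketch. The exploration noise is an Ornstein-Uhlenbeck process, which drifts back towards `mu` at rate `theta` while being perturbed by Gaussian noise scaled by `sigma`. We also assume that `target_model_update=1e-3` acts as the mixing rate (often called tau) of a soft target-network update. The code below is an illustration in plain NumPy, not the `keras-rl` implementation. `ou_step` and `soft_update` are hypothetical names, and the time step `dt=0.01` is an assumed value.

```python
import numpy as np


def ou_step(x_prev, rng, theta=0.15, mu=0.0, sigma=0.2, dt=0.01):
    """One Euler step of an Ornstein-Uhlenbeck process (dt is assumed)."""
    noise = rng.normal(size=np.shape(x_prev))
    return x_prev + theta * (mu - x_prev) * dt + sigma * np.sqrt(dt) * noise


def soft_update(target_weights, source_weights, tau=1e-3):
    """Move each target array a fraction tau towards the matching source array."""
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_weights, source_weights)]


rng = np.random.RandomState(0)
x = np.zeros(6)  # one noise value per action; 6 is taken from the critic summary
for _ in range(5):
    x = ou_step(x, rng)  # successive samples are correlated in time
print(x)

target = [np.zeros(3)]
source = [np.ones(3)]
print(soft_update(target, source)[0])  # each entry moves from 0 to 0.001
```

With such a small tau the target networks change slowly, which is the usual motivation for soft updates in DDPG.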


In [6]:
# Okay, now it's time to learn something! Visualization is turned off here (visualize=False)
# because it slows down training quite a lot. You can always safely abort the training
# prematurely using Ctrl + C (or by interrupting the kernel in Jupyter).
agent.fit(env, nb_steps=2000, visualize=False, verbose=0, nb_max_episode_steps=200, log_interval=10000)
# After training is done, we save the final weights.
#    agent.save_weights(args.model, overwrite=True)


Distance: 0.460324
True positions: (-1.149648,-0.048636)
Reached: (-1.566472,-0.005135)

Distance: 0.358700
True positions: (0.080656,-0.269618)
Reached: (-0.037848,-0.509815)

Distance: 2.262523
True positions: (0.010504,-0.686305)
Reached: (-1.567794,-0.002080)

Distance: 0.637941
True positions: (-0.651515,-0.027902)
Reached: (-0.421993,-0.436321)

Distance: 1.889324
True positions: (0.043834,-0.279527)
Reached: (-1.567862,-0.001899)

Distance: 0.467375
True positions: (0.068209,-0.205440)
Reached: (-0.298954,-0.305652)

Distance: 1.345630
True positions: (-1.105126,-0.895527)
Reached: (-1.564775,-0.009546)

Distance: 0.629321
True positions: (-0.601926,-0.867793)
Reached: (-0.531817,-0.308581)

Distance: 1.611037
True positions: (-0.589761,-0.644584)
Reached: (-1.565031,-0.008817)

Distance: 0.639025
True positions: (-0.590143,-0.529063)
Reached: (-0.167579,-0.312602)

Distance: 1.425042
True positions: (-0.717768,-0.583929)
Reached: (-1.565764,-0.006883)

Distance: 0.897903
True 

<keras.callbacks.History at 0x7f8aadceaf50>
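
The `Distance` values printed during training appear to be the sum of the absolute differences between the target ("True positions") and reached coordinates. This is inferred from the logged numbers only, not from the environment's source code. The sketch below recomputes the first three entries. `l1_distance` is a hypothetical helper.

```python
def l1_distance(target, reached):
    """Sum of absolute coordinate differences."""
    return sum(abs(t - r) for t, r in zip(target, reached))


# (true position, reached position, logged distance) copied from the output above
samples = [
    ((-1.149648, -0.048636), (-1.566472, -0.005135), 0.460324),
    ((0.080656, -0.269618), (-0.037848, -0.509815), 0.358700),
    ((0.010504, -0.686305), (-1.567794, -0.002080), 2.262523),
]

for target, reached, logged in samples:
    computed = l1_distance(target, reached)
    print("computed %.6f, logged %.6f" % (computed, logged))
```

The computed and logged values agree up to rounding in the last printed digit.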

## Evaluate the results

Check how our trained 'brain' performs. Below we will also load a pretrained model, which should perform better. It was trained in exactly the same way, just with a larger number of steps (parameter `nb_steps` in `agent.fit`).

In [7]:
# agent.load_weights(args.model)
# Finally, evaluate our algorithm for 2 episodes.
agent.test(env, nb_episodes=2, visualize=False, nb_max_episode_steps=1000)

Testing for 2 episodes ...

Distance: 1.744610
True positions: (-0.117997,-0.297044)
Reached: (-1.567597,-0.002033)

Distance: 0.880632
True positions: (-0.014510,-0.320320)
Reached: (0.339653,-0.846790)

Distance: 0.656555
True positions: (-0.423018,-0.626791)
Reached: (0.056300,-0.804027)

Distance: 1.092906
True positions: (-0.550707,-0.236474)
Reached: (0.175475,-0.603198)

Distance: 1.005385
True positions: (-0.443725,-0.151066)
Reached: (0.203324,-0.509402)

Distance: 0.951899
True positions: (-0.414841,-0.014396)
Reached: (0.123739,-0.427716)

Distance: 0.839426
True positions: (-0.418568,-0.693720)
Reached: (-0.001927,-1.116504)

Distance: 1.497009
True positions: (-0.894740,-0.584725)
Reached: (0.170642,-1.016351)

Distance: 1.131753
True positions: (-0.743247,-0.725087)
Reached: (0.183012,-0.930581)

Distance: 0.312900
True positions: (-0.142488,-0.953783)
Reached: (0.149220,-0.932591)
Episode 1: reward: -976.410, steps: 1000

Distance: 1.580191
True positions: (-0.732438,-0.

<keras.callbacks.History at 0x7f8aa0777410>

In [9]:
agent.load_weights("../models/example.h5f")
# Evaluate the pretrained model for 5 episodes.
agent.test(env, nb_episodes=5, visualize=False, nb_max_episode_steps=1000)

Testing for 5 episodes ...

Distance: 1.982073
True positions: (-0.365306,-0.789004)
Reached: (-1.565560,-0.007185)

Distance: 0.489316
True positions: (-1.066750,-0.081859)
Reached: (-0.828360,-0.332785)

Distance: 0.340835
True positions: (-0.821318,-0.029109)
Reached: (-0.726064,-0.274690)

Distance: 0.112628
True positions: (-0.885140,-0.309075)
Reached: (-0.880001,-0.416565)

Distance: 0.225917
True positions: (-1.052513,-0.912940)
Reached: (-0.975606,-1.061949)

Distance: 0.219576
True positions: (-0.137413,-0.078975)
Reached: (-0.165256,-0.270706)

Distance: 0.201277
True positions: (-0.692787,-0.661610)
Reached: (-0.850584,-0.705090)

Distance: 0.257323
True positions: (-0.855284,-0.132450)
Reached: (-0.740171,-0.274659)

Distance: 0.099240
True positions: (-0.470129,-0.782663)
Reached: (-0.568455,-0.781748)

Distance: 0.418505
True positions: (-1.123986,-0.081918)
Reached: (-0.917664,-0.294101)
Episode 1: reward: -340.685, steps: 1000

Distance: 0.565987
True positions: (-1.16

<keras.callbacks.History at 0x7f8aa0777890>