This project is based on the Nvidia open source project "jetson-reinforcement" developed by Dustin Franklin. The goal of the project is to create a DQN agent and define reward functions to teach a robotic arm to carry out two primary objectives:
- Have any part of the robot arm touch the object of interest, with at least a 90% accuracy.
- Have only the gripper base of the robot arm touch the object, with at least an 80% accuracy.
Run the following commands from terminal to build the project from source:
$ sudo apt-get install cmake
$ git clone https://github.com/fouliex/DeepRLArmManipulation.git
$ cd DeepRLArmManipulation
$ git submodule update --init
$ mkdir build
$ cd build
$ cmake ../
$ make
During the cmake step, Torch will be installed, so it can take a while. It will download packages and ask for your sudo password during the install.
To successfully leverage deep learning technology in robots, one needs to move to a library format that integrates with robotics and simulators. In addition, robots require real-time responses to changes in their environment, so computational performance matters. Therefore the API provides an interface to the Python code written with PyTorch, but the wrappers use Python's low-level C API to pass memory objects between the user's application and Torch without extra copies. By using a compiled language (C/C++) instead of an interpreted one, performance is improved, and it is sped up even more when GPU acceleration is leveraged.
The API stack for deep RL is from the Nvidia repo.
The robotic arm model found in the Gazebo world calls upon a Gazebo plugin called ArmPlugin. This plugin is responsible for creating the Deep Q-Network (DQN) agent and training it to learn to touch the prop.
The Gazebo plugin builds a shared object file, libgazeboArmPlugin.so, that is attached to the robot model in the Gazebo world. That object file is responsible for integrating the simulation environment with the Reinforcement Learning (RL) agent. The plugin is defined in the ArmPlugin.cpp file located in the gazebo folder.
ArmPlugin.cpp takes advantage of the C++ API. This plugin implements specific functions for the class ArmPlugin, which is defined in ArmPlugin.h.
This function is responsible for creating and initializing nodes that subscribe to two specific topics, one for the camera and one for the contact sensor for the object.
This function is the callback function for the camera subscriber. It takes the message from the camera topic, extracts the image, and saves it. The image is then passed to the DQN.
This function is the callback function for the object's contact sensor. It is used to test whether the contact sensor, called my_contact, defined for the object in the Gazebo world, observes a collision with another element/model.
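The core of that collision test can be sketched as a plain string comparison: a contact message carries the names of the two colliding elements, and the callback checks whether either one is the prop. The collision name used below is illustrative, not necessarily the name used in this project's Gazebo world.

```cpp
#include <cstring>

// Hypothetical name of the prop's collision element in the Gazebo world.
#define COLLISION_ITEM "tube::tube_link::tube_collision"

// Returns true if either collision partner in a contact pair is the prop.
// In the real callback, the two names come from the contact message.
bool contactWithProp(const char* collision1, const char* collision2)
{
    return strcmp(collision1, COLLISION_ITEM) == 0 ||
           strcmp(collision2, COLLISION_ITEM) == 0;
}
```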
This function serves to create and initialize the agent. Various parameters that are passed to the agent's Create() function are defined at the top of the file, such as:
#define INPUT_WIDTH 512
#define INPUT_HEIGHT 512
#define OPTIMIZER "None"
#define LEARNING_RATE 0.0f
#define REPLAY_MEMORY 10000
#define BATCH_SIZE 8
#define USE_LSTM false
#define LSTM_SIZE 32
This function receives the action value from the DQN and executes that action. For every frame that the camera receives, the agent needs to take an appropriate action. Because the DQN agent is discrete, the network selects one output for every frame. This output, which is the action value, can then be mapped to a specific action, thus controlling the arm joints.
There are two possible ways to control the arm joints:
- Velocity Control
- Position Control
For both of these types of control, one can increase or decrease either the joint velocity or the joint position, by a small delta value.
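Under position control, this mapping can be sketched as follows: each joint gets two discrete actions (increase and decrease), so the action value encodes both the joint index and the direction. The `DOF` and `ACTION_DELTA` values here are illustrative assumptions, not taken from the repo.

```cpp
#define DOF          3              // number of controllable arm joints (assumption)
#define ACTION_DELTA 0.15f          // per-step change in joint position (assumption)
#define NUM_ACTIONS  (DOF * 2)      // two actions (increase/decrease) per joint

// Applies action 'a' (0 .. NUM_ACTIONS-1) to the joint position array:
// even actions increase joint a/2 by ACTION_DELTA, odd actions decrease it.
// Returns the joint's new position.
float applyAction(int a, float jointPos[DOF])
{
    const int   joint = a / 2;
    const float delta = (a % 2 == 0) ? ACTION_DELTA : -ACTION_DELTA;
    jointPos[joint] += delta;
    return jointPos[joint];
}
```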
This function is utilized to issue rewards and train the DQN. It is called at every simulation iteration and can be used to update the robot joints and to issue both types of rewards based on the desired goal:
- End of Episode(EOE)
- Interim
At EOE, various parameters for the API and the plugin are reset, and the current accuracy of the agent performing the appropriate task is displayed on the terminal.
The reward information is defined in the ArmPlugin::OnUpdate() function. The arm joints are updated using position control, and for each joint there are two actions: increase or decrease the joint position.
- A reward (`REWARD_WIN * 10`) is given if any part of the arm touches the object.
- A reward (`REWARD_WIN`) is given if a positive weighted average is derived.
- A penalty (`REWARD_LOSS * 10`) is given if any part of the robot touches the ground, and the episode ends.
- A penalty (`REWARD_LOSS * distance to goal`) is given if a negative weighted average is derived.
- Any collision ends the episode.
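The interim reward based on the weighted average can be sketched like this: the per-frame change in distance to the goal is smoothed with a moving average so a single noisy frame does not dominate, and the sign of the average decides between a reward and a distance-scaled penalty. The reward magnitudes and the smoothing factor `ALPHA` are illustrative assumptions, not the repo's values.

```cpp
#define REWARD_WIN   10.0f  // assumed reward magnitude
#define REWARD_LOSS -10.0f  // assumed penalty magnitude
#define ALPHA         0.4f  // smoothing factor for the moving average (assumption)

// distDelta: change in distance to the goal since the last frame
// (positive means the arm moved closer). avgGoalDelta accumulates the
// smoothed moving average across frames.
float interimReward(float distDelta, float distToGoal, float* avgGoalDelta)
{
    *avgGoalDelta = (*avgGoalDelta * ALPHA) + (distDelta * (1.0f - ALPHA));

    if (*avgGoalDelta > 0.0f)
        return REWARD_WIN;               // moving toward the object
    return REWARD_LOSS * distToGoal;     // moving away: penalty grows with distance
}
```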
- A reward (`REWARD_WIN * 20`) is given if the gripper base of the robot touches the object.
- A penalty (`REWARD_LOSS * 5`) is given if any other part of the robot touches the object.
- A penalty (`REWARD_LOSS * 10`) is given if the robot touches the ground.
- A penalty (`REWARD_LOSS`) is added for no movement, i.e. if the absolute average goal delta is less than 0.001 for the gripper base.
- Any collision ends the episode.
Objective 1 was performed on a physical machine (Ubuntu 16.04) with a GTX 1080 GPU. Below are the hyperparameters:
- `INPUT_WIDTH` and `INPUT_HEIGHT` are set to 64.
  - Image dimensions are set to the same size as the input.
  - Training is performed on a GTX 1080, so there is no need to restrict memory usage.
- `OPTIMIZER` is set to Adam. It performs better than RMSProp in this project.
- `LEARNING_RATE` is set to 0.1 for this objective.
- `REPLAY_MEMORY` is set to 10000 for this objective.
- `BATCH_SIZE` is set to 512.
  - 512 is used because there is enough memory on the GTX 1080.
- `USE_LSTM` is set to true.
- `LSTM_SIZE` is set to 256.
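Collected together, the Objective 1 settings above correspond to defines like these at the top of ArmPlugin.cpp (a sketch assembled from the list above, not copied verbatim from the repo):

```cpp
#define INPUT_WIDTH   64
#define INPUT_HEIGHT  64
#define OPTIMIZER     "Adam"
#define LEARNING_RATE 0.1f
#define REPLAY_MEMORY 10000
#define BATCH_SIZE    512
#define USE_LSTM      true
#define LSTM_SIZE     256
```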
Objective 2 was performed on a Udacity virtual machine with a Tesla K80 GPU.
- `INPUT_WIDTH` and `INPUT_HEIGHT` are set to 64.
- `OPTIMIZER` is set to Adam. It performs better than RMSProp.
- `LEARNING_RATE` is set to 0.01 for the second objective.
- `REPLAY_MEMORY` is set to 20000 for the second objective.
  - Due to the smaller surface area, a higher replay memory is used to allow more discrete learning.
- `BATCH_SIZE` is set to 512.
  - 512 is used because there is enough memory on the Tesla K80.
- `USE_LSTM` is set to true.
- `LSTM_SIZE` is set to 256.
Both objectives were achieved, in the physical environment (Ubuntu 16.04, Objective 1) and in the virtual environment (Udacity VM, Objective 2). The earlier the robot reaches the goal within the first 50 tries, the faster the training goes.
Below are screenshots and videos showing that the goals were achieved with a minimum of 100 runs:
Any part of the robot arm touches the object of interest with at least a 90% accuracy.
The gripper base of the robot arm touches the object of interest with at least an 80% accuracy.
- Optimize the reward functions: after 50 runs, if the gripper base hasn't touched the object of interest, there is a very low chance for it to recover and learn how to touch the object.
- Experiment by increasing the hyperparameters and see how the robot performs.
- Try making the robot achieve the goals with the RMSProp optimizer instead of Adam.
To get started with the project environment, run the following:
$ cd RoboND-DeepRL-Project/build/aarch64/bin
$ chmod u+x gazebo-arm.sh
$ ./gazebo-arm.sh
The plugins which hook the learning into the simulation are located in the gazebo/ directory of the repo. The RL agent and the reward functions are to be defined in ArmPlugin.cpp.