Fast Evolution through Actor-Critic Reinforcement Learning

This is the official repository providing a refactored implementation of the data-driven design optimization method presented in the paper Data-efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning.

At the moment, the repository contains a basic implementation of the proposed algorithm and its baseline. We use particle swarm optimization on the Q-function, which serves as a surrogate function that predicts the performance of design candidates and thus avoids the need to simulate or evaluate them. The baseline also uses particle swarm optimization but evaluates design candidates in simulation instead.
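To illustrate the idea of running particle swarm optimization on a surrogate, here is a minimal, self-contained sketch. It is not the code used in this repository: `surrogate_q` is a hypothetical toy function standing in for a learned Q-function, and `pso_maximize` with all of its parameters is an illustrative assumption.

```python
import random


def surrogate_q(design):
    """Hypothetical stand-in for a learned Q-function (higher is better).

    This toy function peaks at design = (0.5, 1.0); a real surrogate
    would be a trained neural network instead.
    """
    target = (0.5, 1.0)
    return -sum((d - t) ** 2 for d, t in zip(design, target))


def pso_maximize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Maximize `objective` over box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Best position found so far by each particle, and by the whole swarm.
    best_pos = [p[:] for p in positions]
    best_val = [objective(p) for p in positions]
    g_idx = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g_idx][:], best_val[g_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (g_pos[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clip it to the allowed design range.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, positions[i][:]
                if val > g_val:
                    g_val, g_pos = val, positions[i][:]
    return g_pos, g_val


# No simulation is run here: candidates are scored by the surrogate only.
best_design, best_value = pso_maximize(surrogate_q, bounds=[(0.0, 2.0), (0.0, 2.0)])
print(best_design, best_value)
```

In the baseline setting, the same loop would apply, but `objective` would have to run a simulation rollout for every candidate, which is what makes the surrogate variant cheaper per evaluation.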

The environment currently provided is Half-Cheetah, simulated with pybullet. The task is to learn effective movement strategies together with the leg lengths that maximize the performance of the agent.

Additional methods and environments which are shown in the paper will be added over time and the structure of the repository might change in the future.


Citation

If you use this code in your research, please cite

  title={Data-efficient Co-Adaptation of Morphology and Behaviour with Deep Reinforcement Learning},
  author={Luck, Kevin Sebastian and Ben Amor, Heni and Calandra, Roberto},
  booktitle={Conference on Robot Learning},

Acknowledgements of Previous Work

This project would have been harder to implement without the great work of the developers behind rlkit and pybullet.

The reinforcement learning loop makes extensive use of rlkit, a framework developed and maintained by Vitchyr Pong. You can find this repository here. We made slight adaptations to the Soft-Actor-Critic algorithm used in this repository.

Tasks were simulated in PyBullet; the repository can be found here. Adaptations were made to the files found in pybullet_evo to enable the dynamic adaptation of design parameters during the training process.

Why do you use an older version of rlkit?

When I started working on this project, the tag v0.1.2 was the newest. There are quite a few changes from 0.1.2 to 0.2.0; I will tackle this update in the future ;)


Installation

Make sure that PyTorch is installed. You can find more information here:

First, clone this repository to your local computer as usual. Then, install the required packages via pip by executing pip3 install -r requirements.txt.

The SAC implementation used differs slightly from the original version in rlkit developed by Vitchyr Pong. For your convenience, I provide a forked repository. However, all credit for the SAC implementation goes to Vitchyr Pong.

Clone the adapted rlkit with

git clone

Now set the environment variable PYTHONPATH in your terminal with

export PYTHONPATH=/path/to/Coadaptation-rlkit/

where the folder /path/to/Coadaptation-rlkit contains the folder rlkit. This enables us to import rlkit with import rlkit.
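As a quick sanity check that the variable is set correctly, the following sketch tests whether Python can locate rlkit without actually importing it. The helper name `is_importable` is an illustrative assumption and not part of this repository.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module on the current path."""
    return importlib.util.find_spec(module_name) is not None


if is_importable("rlkit"):
    print("rlkit found")
else:
    print("rlkit not found; check that PYTHONPATH points to the folder containing rlkit/")
```

Run this in the same terminal in which you exported PYTHONPATH, since the variable only applies to that session.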

You may have to set the environment variable every time you open a new terminal.

Starting experiments

After setting the environment variable and installing the packages, you can proceed to run the experiments. Two experimental configurations are already set up for you. You can execute them with

python3 sac_pso_batch


python3 sac_pso_sim

You may change the configs or add new ones. Make sure to add new configurations to the config_dict.

Data logging

If you execute these commands, they will automatically create directories in which the performance and achieved rewards will be stored. Each experiment creates a specific folder named with the current date/time and a random string. In this folder you can find a copy of the config you executed and one CSV file for each design on which the reinforcement learning algorithm was executed. Each CSV file contains three rows: the type of the design (either 'Initial', 'Optimized' or 'Random'), the design vector, and the subsequent cumulative rewards for each episode/trial.

The file ADDVIZFILE provides a basic Jupyter notebook to visualize the collected data.
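A log file with the three rows described above could be read roughly as follows. This is a sketch under assumptions: the exact cell layout (design type in the first cell of the first row, comma-separated floats in the other two rows) is a guess, and `parse_design_log` as well as the example values are made up for illustration.

```python
import csv
import io


def parse_design_log(text):
    """Parse a three-row design log: type, design vector, episode rewards.

    Assumed layout (not confirmed by the repository):
      row 1: design type in the first cell
      row 2: design vector as floats
      row 3: cumulative reward per episode as floats
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    design_type = rows[0][0]
    design_vector = [float(x) for x in rows[1]]
    rewards = [float(x) for x in rows[2]]
    return design_type, design_vector, rewards


# Made-up example content, only to show the assumed structure.
example = "Optimized\n0.8,1.1,0.9\n12.5,30.0,41.25\n"
design_type, design_vector, rewards = parse_design_log(example)
print(design_type, design_vector, max(rewards))
```

To read a real file, pass its contents, for example via `open(path).read()`, and adjust the row handling if the actual layout differs.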
