
🌐 Synthetic Gymnax
Drop-in environment replacements that make your RL algorithm train faster.

Synthetic gymnax contains Gymnax environments that train agents within 10k time steps.

🔄 Make a one-line change ...

Simply replace

```python
import gymnax
env, params = gymnax.make("CartPole-v1")

...  # your training code
```

by

```python
import gymnax, synthetic_gymnax
env, params = gymnax.make("Synthetic-CartPole-v1")
# add 'Synthetic-' to env: ^^^^^^^^^^

...  # your training code
```
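For context, a full (toy) rollout in a synthetic environment looks like any other gymnax rollout. The following is a minimal sketch assuming the standard gymnax `reset`/`step` API; the random policy and the step count are placeholders standing in for your training code:

```python
import jax
import gymnax
import synthetic_gymnax  # importing registers the Synthetic-* environments

env, params = gymnax.make("Synthetic-CartPole-v1")

key = jax.random.PRNGKey(0)
key, key_reset = jax.random.split(key)
obs, state = env.reset(key_reset, params)

for _ in range(10):  # arbitrary number of steps for this sketch
    key, key_act, key_step = jax.random.split(key, 3)
    action = env.action_space(params).sample(key_act)  # random policy stand-in
    # gymnax auto-resets inside step when an episode terminates
    obs, state, reward, done, info = env.step(key_step, state, action, params)
```

Any training loop written against the gymnax interface should work unchanged after swapping in the synthetic environment name.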

💨 ... and enjoy fast training.

The synthetic environments are meta-learned to train agents within 10k time steps. This can be much faster than training in the real environment, even when using tuned hyperparameters!

  • 🟩 Real environment training, using tuned hyperparameters (IQM of 5 training runs)
  • 🟦 Synthetic environment training, using any reasonable hyperparameters (IQM performance of 20 training runs with random HP configurations)

🏗 Installing synthetic-gymnax

  1. Install via pip: pip install synthetic-gymnax
  2. Install from source: pip install git+https://github.com/keraJLi/synthetic-gymnax
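As a quick smoke test after installing (a sketch, not an official check shipped with the package), you can verify that a synthetic environment exposes the same spaces as its real counterpart, which is what makes it a drop-in replacement. This assumes CartPole's usual Box observations and Discrete actions, accessed through the standard gymnax space helpers:

```python
import gymnax
import synthetic_gymnax  # importing registers the Synthetic-* environments

real_env, real_params = gymnax.make("CartPole-v1")
synth_env, synth_params = gymnax.make("Synthetic-CartPole-v1")

# The synthetic environment is meant to be a drop-in replacement, so the
# observation shape and the number of discrete actions should match.
print(real_env.observation_space(real_params).shape,
      synth_env.observation_space(synth_params).shape)
print(real_env.action_space(real_params).n,
      synth_env.action_space(synth_params).n)
```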

🏅 Performance of agents after training for 10k synthetic steps

Classic control: 10k synthetic 🦶

| Environment | PPO | SAC | DQN | DDPG | TD3 |
|---|---|---|---|---|---|
| Synthetic-Acrobot-v1 | -84.1 | -85.3 | -82.6 | - | - |
| Synthetic-CartPole-v1 | 500.0 | 500.0 | 500.0 | - | - |
| Synthetic-MountainCar-v0 | -181.8 | -170.1 | -118.4 | - | - |
| Synthetic-MountainCarContinuous-v0 | 66.9 | 91.1 | - | 97.6 | 97.5 |
| Synthetic-Pendulum-v1 | -205.4 | -188.3 | - | -164.3 | -168.5 |
Brax: 10k synthetic, 5M real 🦶

| Environment | PPO synthetic | PPO real | SAC synthetic | SAC real | DDPG synthetic | DDPG real | TD3 synthetic | TD3 real |
|---|---|---|---|---|---|---|---|---|
| halfcheetah | 1657.4 | 3487.1 | 5810.4 | 7735.5 | 6162.4 | 3263.3 | 6555.8 | 13213.5 |
| hopper | 853.5 | 2521.9 | 2738.8 | 3119.4 | 3012.4 | 1536.0 | 2985.3 | 3325.8 |
| humanoidstandup | 13356.1 | 17243.5 | 21105.2 | 23808.1 | 21039.0 | 24944.8 | 20372.0 | 28376.2 |
| swimmer | 348.5 | 83.6 | 361.6 | 124.8 | 365.1 | 348.5 | 365.4 | 232.2 |
| walker2d | 858.3 | 2039.6 | 1323.1 | 4140.1 | 1304.3 | 698.3 | 1321.8 | 4605.8 |

💡 Background

The environments in this package are the result of our paper, Discovering Minimal Reinforcement Learning Environments (citation below). They are optimized with evolutionary meta-learning to maximize the performance of an agent after it has trained in the synthetic environment. In the paper, we find that

  1. The synthetic environments don't need episodes that exceed a single time step. Instead, synthetic contextual bandits are enough to train good policies (see the sketch after this list).
  2. The synthetic contextual bandits generalize to unseen network architectures and optimization schemes: while gradient-based optimization was used during meta-learning, evolutionary methods also work at evaluation time.
  3. We can speed up downstream meta-learning applications, such as Discovered Policy Optimization. For more info, have a look at the paper!
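To make the first finding concrete, here is a small illustration (not code from the repository) that steps a synthetic environment once with a random action. If the released checkpoints follow the paper's contextual-bandit structure, the episode should already be over after that single step; treat the expected output as an assumption to verify, not a guarantee:

```python
import jax
import gymnax
import synthetic_gymnax  # registers the Synthetic-* environments

env, params = gymnax.make("Synthetic-Pendulum-v1")

key_reset, key_act, key_step = jax.random.split(jax.random.PRNGKey(0), 3)
obs, state = env.reset(key_reset, params)
action = env.action_space(params).sample(key_act)
obs, state, reward, done, info = env.step(key_step, state, action, params)

print(done)  # expected: True, i.e. a single-step (contextual-bandit) episode
```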

Conceptual algorithm overview
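The conceptual overview boils down to a nested loop: an outer evolution strategy proposes synthetic-environment parameters, an inner loop trains an agent inside each candidate environment, and a candidate's fitness is the trained agent's return in the real environment. The following is a deliberately stubbed, self-contained sketch of that idea; the helper functions are hypothetical placeholders, and the actual meta-learning lives in examples/metalearn_synthenv.py:

```python
import numpy as np

def train_agent_in_synthetic_env(synth_params, rng):
    """Stub for the inner loop: RL training inside the synthetic environment."""
    return {"policy": float(synth_params.mean() + 0.01 * rng.normal())}

def evaluate_in_real_env(agent):
    """Stub for the fitness: return of the trained agent in the real environment."""
    return -abs(agent["policy"] - 1.0)

rng = np.random.default_rng(0)
synth_params = np.zeros(16)          # parameters of the synthetic environment
pop_size, sigma, lr = 8, 0.1, 0.05   # toy ES settings

for generation in range(20):
    noise = rng.normal(size=(pop_size, synth_params.size))
    candidates = synth_params + sigma * noise
    # Fitness of a candidate environment = performance of an agent trained in it
    fitness = np.array([
        evaluate_in_real_env(train_agent_in_synthetic_env(c, rng))
        for c in candidates
    ])
    advantage = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    # OpenAI-ES style update of the synthetic environment parameters
    synth_params = synth_params + lr / (pop_size * sigma) * noise.T @ advantage
```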

💫 Replicating our results

We provide the configurations used to meta-train the synthetic environment checkpoints in synthetic_gymnax/checkpoints/*environment*/config.yaml. They can be used with the meta-learning script, e.g. by calling

```
python examples/metalearn_synthenv.py --config synthetic_gymnax/checkpoints/hopper/config.yaml
```

Please note that the configs are not bundled with the package when installing via pip; clone the repository to get them.

✍ Citing and more information

If you use the provided synthetic environments in your work, please cite our paper as

@article{liesen2024discovering,
  title={Discovering Minimal Reinforcement Learning Environments}, 
  author={Jarek Liesen and Chris Lu and Andrei Lupu and Jakob N. Foerster and Henning Sprekeler and Robert T. Lange},
  year={2024},
  eprint={2406.12589},
  archivePrefix={arXiv}
}