
NeoRL2


The NeoRL2 repository extends the offline reinforcement learning benchmark NeoRL. It contains datasets for training and the corresponding environments for testing trained policies. The current datasets are collected from seven open-source environments: Pipeline, Simglucose, RocketRecovery, RandomFrictionHopper, DMSD, Fusion, and SafetyHalfCheetah. For each task, we train policies online with reinforcement learning algorithms or PID controllers and then select suboptimal policies whose returns range from 50% to 80% of the expert's return to generate the offline datasets. Datasets sampled by such suboptimal policies align better with real-world scenarios than those sampled by random or expert policies.

Install NeoRL2 interface

NeoRL2 interface can be installed as follows:

git clone https://agit.ai/Polixir/neorl2.git
cd neorl2
pip install -e .

After installation, the Pipeline, Simglucose, RocketRecovery, DMSD, and Fusion environments are available. The RandomFrictionHopper and SafetyHalfCheetah tasks additionally rely on MuJoCo; to use these two environments, obtain a license and follow the MuJoCo setup instructions, then run:

pip install -e .[mujoco]
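As a quick check that the MuJoCo-dependent tasks were installed correctly (a minimal sketch; the environment IDs come from the task table below):

import gymnasium as gym
import neorl2  # registers the NeoRL2 environments

env = gym.make("RandomFrictionHopper")  # fails here if MuJoCo is not set up
obs, info = env.reset()
print(obs.shape)  # expected (13,) according to the task table below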

Using NeoRL2

NeoRL2 uses the OpenAI Gym API. Tasks can be created as follows:

import neorl2
import gymnasium as gym

# Create an environment
env = gym.make("Pipeline")
env.reset()
env.step(env.action_space.sample())
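A full rollout follows the standard Gymnasium loop; the sketch below is plain Gymnasium usage around a NeoRL2 environment (the random policy is only a placeholder):

import gymnasium as gym
import neorl2

env = gym.make("Pipeline")
obs, info = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()  # replace with a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated  # Gymnasium splits the done flag in two
print(f"Episode return: {total_reward:.2f}")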

After creating the environment, you can use the get_dataset() function to obtain training data and validation data:

train_data, val_data = env.get_dataset()
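The returned objects are dictionaries of aligned arrays (see "Data in NeoRL2" below). A quick way to inspect them, assuming the values are NumPy arrays:

import numpy as np

for key, value in train_data.items():
    print(key, np.asarray(value).shape)
# Expected keys: obs, next_obs, action, reward, done, index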

Each environment supports getting and setting its reward function and done function, which is useful when you need to adjust the environment's behavior.

# Set reward function
env.set_reward_func(reward_func)

# Get reward function
reward_func = env.get_reward_func()

# Set done function
env.set_done_func(done_func)

# Get done function
done_func = env.get_done_func()
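The exact signature expected for a custom reward function is not documented here; the sketch below assumes a batched function of (obs, action, next_obs) and is for illustration only:

import numpy as np

def reward_func(obs, action, next_obs):
    # Hypothetical shaping term: penalize large actions.
    # The real signature and semantics may differ from this assumption.
    return -np.sum(np.square(action), axis=-1)

env.set_reward_func(reward_func)
current_reward_func = env.get_reward_func()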

You can use the following environments now:

| Env Name             | Observation Shape | Action Shape | Have Done | Max Timesteps |
|----------------------|-------------------|--------------|-----------|---------------|
| Pipeline             | 52                | 1            | False     | 1000          |
| Simglucose           | 31                | 1            | True      | 480           |
| RocketRecovery       | 7                 | 2            | True      | 500           |
| RandomFrictionHopper | 13                | 3            | True      | 1000          |
| DMSD                 | 6                 | 2            | False     | 100           |
| Fusion               | 15                | 6            | False     | 100           |
| SafetyHalfCheetah    | 18                | 6            | False     | 1000          |
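To sanity-check all tasks at once, you can iterate over the environment IDs from the table above (a minimal sketch; the MuJoCo-based tasks require the optional [mujoco] installation):

import gymnasium as gym
import neorl2

task_names = [
    "Pipeline", "Simglucose", "RocketRecovery", "RandomFrictionHopper",
    "DMSD", "Fusion", "SafetyHalfCheetah",
]
for name in task_names:
    env = gym.make(name)
    print(name, env.observation_space.shape, env.action_space.shape)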

Data in NeoRL2

In NeoRL2, the training data and validation data returned by the get_dataset() function are dicts with the same format:

  • obs: an N × obs_dim array of the current step's observations.

  • next_obs: an N × obs_dim array of the next step's observations.

  • action: an N × action_dim array of actions.

  • reward: an N-dimensional array of rewards.

  • done: an N-dimensional array of episode termination flags.

  • index: an array with one entry per trajectory; each entry marks the index at which a trajectory begins.
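Since index marks where each trajectory begins, the flat arrays can be split back into per-trajectory segments. A sketch assuming NumPy arrays and that index holds start offsets into the N transitions:

import numpy as np

starts = np.asarray(train_data["index"])
ends = np.append(starts[1:], len(train_data["obs"]))  # each trajectory ends where the next begins

trajectories = [
    {k: np.asarray(v)[s:e] for k, v in train_data.items() if k != "index"}
    for s, e in zip(starts, ends)
]
print(len(trajectories), "trajectories; first trajectory length:", len(trajectories[0]["obs"]))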

Reference

Simglucose: Jinyu Xie. Simglucose v0.2.1 (2018) [Online]. Available: https://github.com/jxx123/simglucose. Accessed: 2024-05-17.

DMSD: Char, Ian, et al. "Correlated Trajectory Uncertainty for Adaptive Sequential Decision Making." NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2023.

MuJoCo: Todorov, E., Erez, T., and Tassa, Y. "MuJoCo: A Physics Engine for Model-based Control." Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033, 2012.

Gym: Brockman, Greg, et al. "OpenAI Gym." arXiv preprint arXiv:1606.01540, 2016.

Licenses

All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.
