Maze2D

Task name
maze2d-open-v0
maze2d-umaze-v1
maze2d-medium-v1
maze2d-large-v1
maze2d-open-dense-v0
maze2d-umaze-dense-v1
maze2d-medium-dense-v1
maze2d-large-dense-v1

The Maze2D domain involves moving a force-actuated ball (along the X and Y axes) to a fixed target location. The observation consists of the ball's (x, y) location and velocities. The dataset consists of one continuous trajectory of the agent navigating to randomly sampled goal locations, and thus has no terminal states. However, so that the trajectory can be split into smaller subtrajectories, the timeouts field marks the instances when the randomly selected navigation goal has been reached.
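
Since the dataset is one long trajectory delimited only by timeouts, splitting it into episodes looks roughly like the following (a minimal sketch, assuming the standard dictionary returned by env.get_dataset()):

```python
import gym

import d4rl  # importing d4rl registers the environments with gym

env = gym.make('maze2d-umaze-v1')
dataset = env.get_dataset()

episodes, start = [], 0
for t, timeout in enumerate(dataset['timeouts']):
    # A timeout marks the step at which the randomly selected goal
    # was reached; there are no terminal states in this domain.
    if timeout:
        episodes.append({
            'observations': dataset['observations'][start:t + 1],
            'actions': dataset['actions'][start:t + 1],
            'rewards': dataset['rewards'][start:t + 1],
        })
        start = t + 1

print(len(episodes), 'subtrajectories')
```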

The four maze layouts, from left to right, are open, umaze, medium, and large.

The four environments maze2d-open-v0, maze2d-umaze-v1, maze2d-medium-v1, and maze2d-large-v1 use a sparse reward, which has a value of 1.0 when the agent (light green ball) is within a 0.5-unit radius of the target (light red ball).

Each environment has a dense reward version, which instead uses the negative exponentiated distance as the reward.
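
As a concrete reading, we take "negative exponentiated distance" to mean the exponential of the negative Euclidean distance to the target (our assumption; consult the maze model source for the exact form):

```python
import numpy as np

# Assumed form of the dense reward: the exponential of the negative
# Euclidean distance between the agent's (x, y) position and the target.
def dense_reward(agent_xy, target_xy):
    return np.exp(-np.linalg.norm(np.asarray(agent_xy) - np.asarray(target_xy)))
```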

AntMaze

Task name
antmaze-umaze-v0
antmaze-umaze-diverse-v0
antmaze-medium-diverse-v0
antmaze-medium-play-v0
antmaze-large-diverse-v0
antmaze-large-play-v0

The AntMaze domain uses the same umaze, medium, and large mazes from the Maze2D domain, but replaces the agent with the "Ant" robot from the OpenAI Gym MuJoCo benchmark.

The dataset in antmaze-umaze-v0 is generated by commanding a fixed goal location from a fixed starting location (the two lie on opposite sides of the wall in the U-shaped maze).

For the harder mazes, the "diverse" datasets are generated by commanding random goal locations in the maze and navigating the ant to them. The "play" datasets are generated by commanding specific hand-picked goal locations from hand-picked initial positions.

MiniGrid-FourRooms

Task name
minigrid-fourrooms-v0
minigrid-fourrooms-random-v0

The MiniGrid domain is a discrete analog of Maze2D.

Two datasets are provided: minigrid-fourrooms-v0, which is generated by a controller that randomly samples goal locations and navigates to them, and minigrid-fourrooms-random-v0, which samples actions uniformly at random.

Adroit

Task name
pen-human-v0/v1
pen-cloned-v0/v1
pen-expert-v0/v1
hammer-human-v0/v1
hammer-cloned-v0/v1
hammer-expert-v0/v1
door-human-v0/v1
door-cloned-v0/v1
door-expert-v0/v1
relocate-human-v0/v1
relocate-cloned-v0/v1
relocate-expert-v0/v1

The Adroit domain involves controlling a 24-DoF robotic hand. There are 4 tasks, taken from the hand_dapg repository: pen (aligning a pen with a target orientation), door (opening a door), relocate (moving a ball to a target position), and hammer (hammering a nail into a board).

There are 3 datasets for each environment.

  • Human uses the 25 human demonstrations provided in the DAPG repository.
  • Cloned uses a 50-50 split between demonstration data and 2500 trajectories sampled from a policy behavior-cloned on the demonstrations. The demonstration trajectories are copied to match the number of behavior-cloned trajectories.
  • Expert uses 5000 trajectories sampled from an expert that solves the task, provided in the DAPG repository.

The *-v0 datasets were used to generate the results reported in our whitepaper and are included for backwards compatibility. However, the *-v1 datasets have improved metadata:

  • infos/action_log_probs: the log-probabilities of the actions in the dataset (expert datasets only).
  • infos/qpos, infos/qvel, and task-specific fields: assorted information providing the full environment state.
  • metadata/policy/*: the weights of the policy used to collect the data (expert datasets only).

The *-v1 datasets also include some bugfixes:

  • The *-human tasks now have timeouts set when the demonstrations end, rather than at the maximum horizon of the environment.
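
A sketch of reading these fields from a v1 dataset (assuming the standard env.get_dataset() dictionary; as noted above, some fields exist only in the expert datasets):

```python
import gym

import d4rl  # importing d4rl registers the Adroit environments with gym

env = gym.make('pen-expert-v1')
dataset = env.get_dataset()

# Standard keys: observations, actions, rewards, terminals, timeouts.
print(dataset['observations'].shape)

# v1 metadata; action_log_probs is present only in the expert datasets.
log_probs = dataset['infos/action_log_probs']
qpos, qvel = dataset['infos/qpos'], dataset['infos/qvel']
```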

Gym

Task name
halfcheetah-random-v0/v2
halfcheetah-medium-v0/v2
halfcheetah-expert-v0/v2
halfcheetah-medium-replay-v0/v2
halfcheetah-medium-expert-v0/v2
walker2d-random-v0/v2
walker2d-medium-v0/v2
walker2d-expert-v0/v2
walker2d-medium-replay-v0/v2
walker2d-medium-expert-v0/v2
hopper-random-v0/v2
hopper-medium-v0/v2
hopper-expert-v0/v2
hopper-medium-replay-v0/v2
hopper-medium-expert-v0/v2
ant-random-v0/v2
ant-medium-v0/v2
ant-expert-v0/v2
ant-medium-replay-v0/v2
ant-medium-expert-v0/v2

  • Random uses 1M samples from a randomly initialized policy.
  • Expert uses 1M samples from a policy trained to completion with SAC.
  • Medium uses 1M samples from a policy trained to approximately 1/3 the performance of the expert.
  • Medium-Replay uses the replay buffer of a policy trained up to the performance of the medium agent. Timeouts in this dataset are not always marked when the agent reaches the max trajectory length, but rather when 1000 timesteps have been sampled for a particular training iteration.
  • Medium-Expert uses a 50-50 split of medium and expert data (slightly less than 2M samples total).
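
A minimal sketch of loading one of these datasets; d4rl.qlearning_dataset formats the raw data into aligned transition arrays suitable for Q-learning:

```python
import gym

import d4rl  # importing d4rl registers the locomotion environments with gym

env = gym.make('halfcheetah-medium-v2')

# qlearning_dataset returns aligned (observation, action,
# next_observation, reward, terminal) arrays, dropping transitions
# that cross episode boundaries.
data = d4rl.qlearning_dataset(env)
print(data['observations'].shape, data['next_observations'].shape)
```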

The *-v0 datasets were used to generate the results reported in our whitepaper and are included for backwards compatibility. However, the *-v2 datasets have improved metadata:

  • infos/action_log_probs: the log-probabilities of the actions in the dataset (medium and expert datasets only).
  • infos/qpos and infos/qvel: state information.
  • metadata/policy/*: the weights of the policy used to collect the data (medium and expert datasets only).

The *-v2 datasets also include some bugfixes:

  • All trajectories now timeout at 1000 steps.

Flow

Task name
flow-ring-random-v0
flow-ring-controller-v0
flow-merge-random-v0
flow-merge-controller-v0

The Flow domain involves controlling the acceleration of autonomous vehicles (1 in the ring, up to 5 in the merge) in order to maximize traffic flow. The two road configurations we include are a single-lane ring and a highway merge intersection. We also include two types of datasets: the "random" data consists of random accelerations commanded to the autonomous vehicles, and the "controller" data uses the intelligent driver model (IDM) to command accelerations.

FrankaKitchen

Task name
kitchen-complete-v0
kitchen-partial-v0
kitchen-mixed-v0

The goal of the FrankaKitchen environment is to interact with various objects in order to reach a desired state configuration. You can interact with the objects by moving the kettle, flipping the light switch, opening and closing the microwave and cabinet doors, or sliding the other cabinet door. The desired goal configuration for all 3 tasks is to complete 4 subtasks: open the microwave, move the kettle, flip the light switch, and slide open the cabinet door. 3 datasets are included:

  • The complete dataset includes demonstrations of all 4 target subtasks being completed, in order.
  • The partial dataset includes other tasks being performed, but there are subtrajectories where the 4 target subtasks are completed in sequence.
  • The mixed dataset contains various subtasks being performed, but the 4 target subtasks are never completed in sequence together.

CARLA

Task name
carla-lane-v0
carla-town-v0
carla-town-full-v0

We include tasks based on two map layouts within the CARLA simulator. Observations are provided as a 6912-dimensional vector, which can be reshaped into a (48, 48, 3)-dimensional RGB image.
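
For example, an image can be recovered from a flat observation like so (a short sketch using NumPy; the zero array stands in for a real observation):

```python
import numpy as np

# Each observation is a flattened 48 x 48 RGB image: 48 * 48 * 3 = 6912.
obs = np.zeros(6912)            # stand-in for one dataset observation
image = obs.reshape(48, 48, 3)  # recover the RGB image
```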

  • carla-lane-v0 is a lane-keeping task on a figure-8 road layout (CARLA Town04). The dataset is generated by a hand-coded lane-keeping controller that drives continuously while avoiding crashes with other vehicles. The total size of the dataset is 100K images.
  • carla-town-v0 is a navigation task in which the vehicle must navigate to a target location. The dataset is generated by the same lane-keeping controller as carla-lane-v0, except that the vehicle makes random turns at intersections. The total size of the dataset is 100K images.
  • carla-town-full-v0 is the same task as carla-town-v0, but with an expanded dataset of 2 million images. Because this dataset typically cannot be loaded into memory all at once, this environment provides the env.get_dataset_chunk(int) method to load the dataset one part at a time (see the sketch below). The chunk index ranges from 0 to 19 (inclusive), and each chunk loads 100K images.
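
A sketch of iterating over the chunks (assuming the environment has been registered with gym by importing d4rl, as usual):

```python
import gym

import d4rl  # importing d4rl registers the CARLA environments with gym

env = gym.make('carla-town-full-v0')

# Load the 2M-image dataset one 100K-image chunk at a time; valid
# chunk indices are 0 through 19 inclusive.
for chunk_id in range(20):
    chunk = env.get_dataset_chunk(chunk_id)
    # ...train on this chunk...
    del chunk  # free memory before loading the next chunk
```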

There is some additional setup required for CARLA, which can be found here.