## Reinforcement Learning Examples

We introduce the following reinforcement learning examples that are implemented using Isaac Sim's RL framework.

Pre-trained checkpoints can be found on the Nucleus server. To set up localhost, please refer to the Isaac Sim installation guide.

Note: All commands should be executed from `omniisaacgymenvs/omniisaacgymenvs`.

### Cartpole (`cartpole.py`)

Cartpole is a simple example that demonstrates getting and setting DOF states using `ArticulationView` from omni.isaac.core. The goal of this task is to move a cart horizontally such that the pole, which is connected to the cart via a revolute joint, stays upright.

Joint positions and joint velocities are retrieved using `get_joint_positions` and `get_joint_velocities` respectively, which are required for computing observations. Actions are applied to the cartpoles via `set_joint_efforts`. Cartpoles are reset using `set_joint_positions` and `set_joint_velocities`.
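As a rough illustration of this pattern, the sketch below shows how an `ArticulationView` might be used to build observations, apply efforts, and reset selected environments. The view name (`cartpoles`), the effort scale, and the joint ordering are illustrative assumptions, not the exact task code.

```python
import torch

def get_observations(cartpoles) -> torch.Tensor:
    # Cart position/velocity and pole angle/angular velocity, per environment
    dof_pos = cartpoles.get_joint_positions(clone=False)   # (num_envs, num_dofs)
    dof_vel = cartpoles.get_joint_velocities(clone=False)  # (num_envs, num_dofs)
    return torch.cat([dof_pos, dof_vel], dim=-1)

def apply_actions(cartpoles, actions: torch.Tensor, max_push_effort: float = 400.0):
    # Scale the (num_envs, 1) policy output and apply it as an effort on the cart DOF only
    efforts = torch.zeros_like(cartpoles.get_joint_positions())
    efforts[:, 0] = max_push_effort * actions[:, 0]
    cartpoles.set_joint_efforts(efforts)

def reset(cartpoles, env_ids: torch.Tensor):
    # Zero out DOF states for the environments being reset
    zeros = torch.zeros((len(env_ids), cartpoles.num_dof), device=env_ids.device)
    cartpoles.set_joint_positions(zeros, indices=env_ids)
    cartpoles.set_joint_velocities(zeros, indices=env_ids)
```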

Training can be launched with the command line argument `task=Cartpole`.

Inference with the pre-trained checkpoint can be run with `task=Cartpole test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/cartpole.pth`.

Config files used for this task are:

### Ant (`ant.py`)

Ant is an example of a simple locomotion task. The goal of this task is to train quadruped robots (ants) to run forward as fast as possible. This example inherits from `LocomotionTask`, a class shared with the humanoid example; this simplifies the implementation of both environments, since they compute rewards, observations, and resets in a similar manner. The shared class also makes it easy to switch between the robots used in the task.

The Ant task includes more examples of utilizing `ArticulationView` from omni.isaac.core, which provides various functions for getting and setting both DOF states and articulation root states in a tensorized fashion across all of the actors in the environment. `get_world_poses`, `get_linear_velocities`, and `get_angular_velocities` can be used to determine whether the ants are moving in the desired direction and whether they have fallen or flipped over. Actions are applied to the ants via `set_joint_efforts`, which moves the ants by applying torques to the DOFs. Force sensors are also placed on each of the legs to observe contacts with the ground plane; the sensor values can be retrieved using `get_force_sensor_forces`.
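The sketch below illustrates how these root-state and force-sensor queries might be combined into an observation and a termination check. The view name (`ants`), the observation layout, and the fall threshold are assumptions for illustration, not the exact task code.

```python
import torch

def compute_locomotion_signals(ants, targets: torch.Tensor, fall_height: float = 0.3):
    """Illustrative use of ArticulationView root-state and force-sensor queries.

    `ants` is assumed to be an ArticulationView over all ant articulations and
    `targets` a (num_envs, 3) tensor of goal positions.
    """
    positions, orientations = ants.get_world_poses(clone=False)  # (N, 3), (N, 4)
    lin_vel = ants.get_linear_velocities(clone=False)            # (N, 3)
    ang_vel = ants.get_angular_velocities(clone=False)           # (N, 3)
    contact_forces = ants.get_force_sensor_forces()              # (N, num_sensors, 6)

    # Progress toward the target: velocity projected onto the direction to the goal
    to_target = targets - positions
    heading = to_target / torch.norm(to_target, dim=-1, keepdim=True).clamp(min=1e-6)
    progress = torch.sum(lin_vel * heading, dim=-1)

    # Terminate environments whose torso has dropped below a height threshold
    fallen = positions[:, 2] < fall_height

    obs = torch.cat([positions, orientations, lin_vel, ang_vel,
                     contact_forces.flatten(start_dim=1)], dim=-1)
    return obs, progress, fallen
```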

Training can be launched with the command line argument `task=Ant`.

Inference with the pre-trained checkpoint can be run with `task=Ant test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/ant.pth`.

Config files used for this task are:

### Humanoid (`humanoid.py`)

Humanoid is another environment that uses `LocomotionTask`. It is conceptually very similar to the Ant example, where the goal for the humanoid is to run forward as fast as possible.

Training can be launched with the command line argument `task=Humanoid`.

Inference with the pre-trained checkpoint can be run with `task=Humanoid test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/humanoid.pth`.

Config files used for this task are:

### Shadow Hand Object Manipulation (`shadow_hand.py`)

The Shadow Hand task is an example of a challenging dexterity manipulation task with complex contact dynamics. It resembles OpenAI's Learning Dexterity project and Robotics Shadow Hand training environments. The goal of this task is to orient the object in the robot hand to match a random target orientation, which is visually displayed by a goal object in the scene.

This example inherits from `InHandManipulationTask`, a class shared with the Allegro Hand example. The idea behind `InHandManipulationTask` is similar to that of `LocomotionTask`: since the Shadow Hand and Allegro Hand examples differ only in the robot hand used in the task, the shared class simplifies the implementation of both.

In this example, motion of the hand is controlled using position targets with `set_joint_position_targets`. The object and the goal object are reset using `set_world_poses`; their states are retrieved via `get_world_poses` for computing observations. It is worth noting that the Shadow Hand model in this example also demonstrates the use of tendons, which are imported using the `omni.isaac.mjcf` extension.
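The sketch below illustrates this control and reset pattern. The view names (`hands`, `objects`, `goals`) and the mapping from normalized actions to joint targets are illustrative assumptions, not the exact task implementation.

```python
import torch

def apply_hand_actions(hands, actions: torch.Tensor,
                       dof_lower: torch.Tensor, dof_upper: torch.Tensor):
    # Map normalized policy actions in [-1, 1] to joint position targets
    targets = dof_lower + (actions + 1.0) * 0.5 * (dof_upper - dof_lower)
    hands.set_joint_position_targets(targets)

def reset_objects(objects, goals, env_ids: torch.Tensor, object_poses, goal_poses):
    # Teleport the manipulated object and the goal marker for the envs being reset
    obj_pos, obj_rot = object_poses    # (num_resets, 3), (num_resets, 4)
    goal_pos, goal_rot = goal_poses
    objects.set_world_poses(obj_pos, obj_rot, indices=env_ids)
    goals.set_world_poses(goal_pos, goal_rot, indices=env_ids)

def object_observations(objects, goals) -> torch.Tensor:
    # Object and goal poses feed into the orientation-matching observation
    obj_pos, obj_rot = objects.get_world_poses(clone=False)
    goal_pos, goal_rot = goals.get_world_poses(clone=False)
    return torch.cat([obj_pos, obj_rot, goal_pos, goal_rot], dim=-1)
```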

Training can be launched with the command line argument `task=ShadowHand`.

Training with Domain Randomization can be launched with the command line argument `task.domain_randomization.randomize=True`. For best training results with DR, use `num_envs=16384`.

Inference with the pre-trained checkpoint can be run with `task=ShadowHand test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/shadow_hand.pth`.

Config files used for this task are:

#### OpenAI Variant

In addition to the basic version of this task, there is a variant matching OpenAI's Learning Dexterity project. This variant feeds the `openai` observations to the policy network but asymmetric `full_state` observations to the value network. It can be launched with the command line argument `task=ShadowHandOpenAI_FF`.

Config files used for this are:

#### LSTM Training Variant

This variant uses LSTM policy and value networks instead of feed-forward networks, as well as the asymmetric LSTM critic designed for the OpenAI variant of the task. It can be launched with the command line argument `task=ShadowHandOpenAI_LSTM`.

Config files used for this are:

### Allegro Hand Object Manipulation (`allegro_hand.py`)

This example performs the same object orientation task as the Shadow Hand example, but using the Allegro hand instead of the Shadow hand.

Training can be launched with the command line argument `task=AllegroHand`.

Inference with the pre-trained checkpoint can be run with `task=AllegroHand test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/allegro_hand.pth`.

Config files used for this task are:

### ANYmal (`anymal.py`)

This example trains a model of the ANYmal quadruped robot from ANYbotics to follow randomly chosen x, y, and yaw target velocities.
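As a rough sketch of what following randomly chosen target velocities involves, the snippet below samples per-environment commands and scores velocity tracking. The command ranges, reward scales, and function names are illustrative assumptions, not the values used by the task.

```python
import torch

def sample_commands(num_envs: int, device: str = "cuda") -> torch.Tensor:
    # Per-env targets: forward velocity, lateral velocity, yaw rate (ranges assumed)
    vx = torch.empty(num_envs, device=device).uniform_(-1.0, 1.0)
    vy = torch.empty(num_envs, device=device).uniform_(-1.0, 1.0)
    yaw_rate = torch.empty(num_envs, device=device).uniform_(-1.0, 1.0)
    return torch.stack([vx, vy, yaw_rate], dim=-1)  # (num_envs, 3)

def tracking_reward(commands: torch.Tensor, base_lin_vel: torch.Tensor,
                    base_ang_vel: torch.Tensor, sigma: float = 0.25) -> torch.Tensor:
    # Exponential reward for matching commanded linear (xy) and yaw velocities
    lin_err = torch.sum((commands[:, :2] - base_lin_vel[:, :2]) ** 2, dim=-1)
    yaw_err = (commands[:, 2] - base_ang_vel[:, 2]) ** 2
    return torch.exp(-lin_err / sigma) + 0.5 * torch.exp(-yaw_err / sigma)
```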

Training can be launched with the command line argument `task=Anymal`.

Inference with the pre-trained checkpoint can be run with `task=Anymal test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/anymal.pth`.

Config files used for this task are:

### Anymal Rough Terrain (`anymal_terrain.py`)

A more complex version of the above Anymal environment that supports traversing various forms of rough terrain.

Training can be launched with the command line argument `task=AnymalTerrain`.

Inference with the pre-trained checkpoint can be run with `task=AnymalTerrain test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/anymal_terrain.pth`.

Note: at test time, use the last weights generated rather than the usual best weights. Due to curriculum training, the reward decreases as the task becomes more challenging, so the best weights do not typically correspond to the best outcome.

Note: if you use the ANYmal rough terrain environment in your work, please ensure you cite the following work:

```
@misc{rudin2021learning,
      title={Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
      author={Nikita Rudin and David Hoeller and Philipp Reist and Marco Hutter},
      year={2021},
      journal={arXiv preprint arXiv:2109.11978}
}
```
Note: the OmniIsaacGymEnvs implementation differs slightly from the implementation used in the paper above, which also uses a different RL library and PPO implementation. The original implementation is made available here. Results reported in the Isaac Gym technical paper are based on that repository, not this one.

### NASA Ingenuity Helicopter (`ingenuity.py`)

This example trains a simplified model of NASA's Ingenuity helicopter to navigate to a moving target. It showcases the use of velocity tensors and applying force vectors to rigid bodies. Note that we are applying force directly to the chassis, rather than simulating aerodynamics. This example also demonstrates using different values for gravitational forces. Ingenuity Helicopter visual 3D Model courtesy of NASA: https://mars.nasa.gov/resources/25043/mars-ingenuity-helicopter-3d-model/.
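A minimal sketch of this force-application pattern is shown below, assuming a `RigidPrimView` (from omni.isaac.core) wrapping the chassis bodies of all copters. The view name, thrust scaling, and observation layout are illustrative assumptions.

```python
import torch

def apply_thrust(chassis_view, actions: torch.Tensor, max_thrust: float = 2000.0):
    """Apply an upward thrust force to every chassis body.

    `chassis_view` is assumed to be a RigidPrimView over the chassis prims;
    `actions` is a (num_envs, 1) tensor in [-1, 1] from the policy.
    """
    num_envs = actions.shape[0]
    forces = torch.zeros((num_envs, 3), device=actions.device)
    forces[:, 2] = max_thrust * (actions[:, 0] + 1.0) * 0.5  # thrust along +z
    chassis_view.apply_forces(forces, is_global=True)

def velocity_observations(chassis_view, targets: torch.Tensor) -> torch.Tensor:
    # Velocity tensors plus the offset to the moving target as part of the observation
    lin_vel = chassis_view.get_linear_velocities(clone=False)
    positions, _ = chassis_view.get_world_poses(clone=False)
    return torch.cat([lin_vel, targets - positions], dim=-1)
```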

Training can be launched with the command line argument `task=Ingenuity`.

Inference with the pre-trained checkpoint can be run with `task=Ingenuity test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/ingenuity.pth`.

Config files used for this task are:

### Quadcopter (`quadcopter.py`)

This example trains a very simple quadcopter model to reach and hover near a fixed position.
Lift is achieved by applying thrust forces to the "rotor" bodies, which are modeled as flat cylinders.
In addition to thrust, the pitch and roll of each rotor are controlled using DOF position targets.
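The sketch below illustrates this mixed actuation scheme: forces applied to the rotor bodies plus DOF position targets for the rotor tilt. The view names, tensor shapes, and scales are illustrative assumptions rather than the exact task code.

```python
import torch

def apply_quadcopter_actions(rotors_view, copters_view,
                             thrust_actions: torch.Tensor, tilt_actions: torch.Tensor,
                             max_thrust: float = 10.0, max_tilt: float = 0.5):
    """`rotors_view` is assumed to be a RigidPrimView over all rotor bodies
    (num_envs * 4 prims) and `copters_view` an ArticulationView over the copters.
    `thrust_actions` is (num_envs, 4) in [0, 1]; `tilt_actions` is (num_envs, num_tilt_dofs).
    """
    num_envs = thrust_actions.shape[0]

    # Upward thrust on each of the four rotor bodies, in the rotor's local frame
    forces = torch.zeros((num_envs * 4, 3), device=thrust_actions.device)
    forces[:, 2] = (max_thrust * thrust_actions.clamp(0.0, 1.0)).flatten()
    rotors_view.apply_forces(forces, is_global=False)

    # Pitch/roll of each rotor driven through DOF position targets
    targets = max_tilt * tilt_actions.clamp(-1.0, 1.0)
    copters_view.set_joint_position_targets(targets)
```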

Training can be launched with the command line argument `task=Quadcopter`.

Inference with the pre-trained checkpoint can be run with `task=Quadcopter test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/quadcopter.pth`.

Config files used for this task are:

### Crazyflie (`crazyflie.py`)

This example trains the Crazyflie drone model to hover near a fixed position. Hovering is achieved by applying thrust forces to the four rotors.

Training can be launched with the command line argument `task=Crazyflie`.

Inference with the pre-trained checkpoint can be run with `task=Crazyflie test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/crazyflie.pth`.

Config files used for this task are:

### Ball Balance (`ball_balance.py`)

This example trains balancing tables to balance a ball on the table top. This is a great example to showcase the use of force and torque sensors, as well as DOF states for the table and root states for the ball. In this example, the three-legged table has a force sensor attached to each leg. We use the force sensor APIs to collect force and torque data on the legs, which guide position target outputs produced by the policy.
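A rough sketch of reading the leg force-sensor data into the observation is shown below, assuming an `ArticulationView` over the tables with one force sensor per leg and a `RigidPrimView` over the balls. The names and the exact observation layout are illustrative assumptions.

```python
import torch

def sensor_observations(tables_view, balls_view) -> torch.Tensor:
    """Combine leg force/torque readings with table DOF and ball root states."""
    # (num_envs, num_sensors, 6): force (xyz) and torque (xyz) per leg sensor
    sensor_wrench = tables_view.get_force_sensor_forces()
    leg_forces = sensor_wrench[..., 0:3].flatten(start_dim=1)
    leg_torques = sensor_wrench[..., 3:6].flatten(start_dim=1)

    # Table DOF states (the joints driven by the policy's position targets)
    dof_pos = tables_view.get_joint_positions(clone=False)
    dof_vel = tables_view.get_joint_velocities(clone=False)

    # Ball root states
    ball_pos, _ = balls_view.get_world_poses(clone=False)
    ball_vel = balls_view.get_linear_velocities(clone=False)

    return torch.cat([dof_pos, dof_vel, leg_forces, leg_torques,
                      ball_pos, ball_vel], dim=-1)
```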

Training can be launched with the command line argument `task=BallBalance`.

Inference with the pre-trained checkpoint can be run with `task=BallBalance test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/ball_balance.pth`.

Config files used for this task are:

### Franka Cabinet (`franka_cabinet.py`)

This Franka example demonstrates interaction between the Franka arm and a cabinet, as well as setting the states of objects inside the drawer. It also showcases control of the Franka arm using position targets. In this example, we use DOF state tensors to retrieve the state of the Franka arm, as well as the state of the drawer on the cabinet. Actions are applied as position targets to the Franka arm DOFs.
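A minimal sketch of this control loop is shown below, assuming `ArticulationView`s named `frankas` and `cabinets`. The incremental-target scheme, limit clamping, and scales are illustrative assumptions rather than the exact task implementation.

```python
import torch

def apply_franka_actions(frankas, prev_targets: torch.Tensor, actions: torch.Tensor,
                         dof_lower: torch.Tensor, dof_upper: torch.Tensor,
                         action_scale: float = 7.5, dt: float = 1.0 / 60.0) -> torch.Tensor:
    # Incremental position targets, clamped to the Franka's joint limits
    targets = prev_targets + action_scale * dt * actions
    targets = torch.clamp(targets, dof_lower, dof_upper)
    frankas.set_joint_position_targets(targets)
    return targets

def cabinet_observations(frankas, cabinets) -> torch.Tensor:
    # DOF state tensors for the arm and for the cabinet (including the drawer DOF)
    franka_pos = frankas.get_joint_positions(clone=False)
    franka_vel = frankas.get_joint_velocities(clone=False)
    cabinet_pos = cabinets.get_joint_positions(clone=False)
    cabinet_vel = cabinets.get_joint_velocities(clone=False)
    return torch.cat([franka_pos, franka_vel, cabinet_pos, cabinet_vel], dim=-1)
```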

Training can be launched with the command line argument `task=FrankaCabinet`.

Inference with the pre-trained checkpoint can be run with `task=FrankaCabinet test=True checkpoint=omniverse://localhost/NVIDIA/Assets/Isaac/2022.1/Isaac/Samples/OmniIsaacGymEnvs/Checkpoints/franka_cabinet.pth`.

Config files used for this task are: