Skip to content

Integration of the skrl RL library#6

Merged
Mayankm96 merged 29 commits intoisaac-sim:mainfrom
Toni-SM:skrl-integration
Jan 26, 2023
Merged

Integration of the skrl RL library#6
Mayankm96 merged 29 commits intoisaac-sim:mainfrom
Toni-SM:skrl-integration

Conversation

@Toni-SM
Copy link
Copy Markdown
Contributor

@Toni-SM Toni-SM commented Jan 22, 2023

Description

Integration of the skrl reinforcement learning library

  • Adds a wrapper function to wrap an IsaacEnv for skrl compatibility
  • Adds a trainer class that also logs episode information
  • Adds train.py and play.py in workflows to run skrl
  • Adds configuration files for currently included environments with parameters matching that of rl_games as close as possible

Dependencies:

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist

  • I have run the pre-commit checks with pre-commit run --all-files (see here instructions to set it up)
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file

@Mayankm96 Mayankm96 added the enhancement New feature or request label Jan 22, 2023
@Mayankm96
Copy link
Copy Markdown
Contributor

Mayankm96 commented Jan 22, 2023

Thanks a lot for the pull request! I'll take a look at it tomorrow.

Can you please also update the changelog and the version in the config/extension.toml file of omni.isaac.orbit_envs?

https://isaac-orbit.github.io/orbit/source/refs/contributing.html#maintaining-a-changelog

@Toni-SM
Copy link
Copy Markdown
Contributor Author

Toni-SM commented Jan 22, 2023

Done, CHANGELOG and extension.toml files updated!!!

@Mayankm96
Copy link
Copy Markdown
Contributor

Mayankm96 commented Jan 25, 2023

Hi!

I finally got time today to test this out. I am yet to try training on the classic environments but would be nice to do some evaluations while having the same configurations for PPO across the frameworks.

Right now, mostly tried training for the reach environment. I have the following comments.

Issues

  1. Currently, while training the reach environment, I am seeing a divergence in the loss curve of the value function. Is it possible the parameters are not tuned? I suppose there might be tiny differences in algorithmic implementations but don't think the value function should be diverging. One possible explanation could be that the learning iterations are too large and that's causing some kind of collapse.

    value_skrl

  2. The learned policy for reach seems to be not that stable (I tried the best_agent.pt) . It is quite shaky which I don't see when I train with rsl_rl. Maybe again the parameters are not similar or the same and that is the potential problem.

    skrl.mp4
  3. Currently, the play.py is using the runner class to run the agent-env interaction.

    • When I hit the STOP icon on the GUI, it just freezes and doesn't exit the script quickly. Is there a possible issue in the trainer that doesn't let it exit?
    • When I hit the X icon on the GUI, I observe a similar behavior.

Nice to haves

  • Is it possible to also log the episode information in the runner? Many times in robotics, we care about how each reward term is evolving (for weight tuning). The cumulative reward is often hard to decipher when having many reward terms
  • Can the play.py just load the last or best checkpoint by default? We do something similar for rsl-rl (code snippet).
  • An updated example on how to use a learned policy outside the runner loop. It should usually be simple since that is a torch model that needs to be loaded. But it would be great if we can have an example to show this. It comes quite handy when deploying a policy in a complete robot system (in this case, an extension).

Additional comments

  • We will also need to add the license of skrl inside the docs/licenses/dependencies directory for book-keeping purposes.

@Mayankm96 Mayankm96 self-requested a review January 25, 2023 10:36
@Toni-SM
Copy link
Copy Markdown
Contributor Author

Toni-SM commented Jan 25, 2023

Hi @Mayankm96


Comments:
  1. Loss curve divergence of the value function should not be a problem. rl_games also diverge... I checked the rsl_rl PPO implementation and there is a small difference in the way KL and learning rate scheduler are computed, also some training parameters are different

    Screenshot from 2023-01-25 17-49-14

  2. I mapped (as far as possible) the same parameters as .yaml files for rl_games. The stability of the policy comes mostly from the number of time steps used compared to rsl_rl.

    • skrl: 8000,
    • rl_games: 16 * 500 = 8000
    • rsl_rl: 64 * 250 = 16000

    rls_rl works just as "badly" as the others at 8000 timesteps.
    Training for 16000 timesteps generates a better curves

    Screenshot from 2023-01-25 18-17-24

  3. Solved


Nice to haves
  1. Log the episode information:
    Done

  2. Load the last or best checkpoint by default:
    Done

  3. An updated example of how to use a learned policy outside the runner loop:
    It sounds great. I am going to create an example for futures pull requests


Additional comments

  1. skrl license in docs:
    Done

Copy link
Copy Markdown
Contributor

@Mayankm96 Mayankm96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for providing more information. I just added some minor feedback on the code to understand things a bit better. Once that is cleared up, we can merge this awesome PR :)

Comment thread source/standalone/workflows/skrl/config.py Outdated
Comment thread source/standalone/workflows/skrl/train.py Outdated
Comment thread source/standalone/workflows/skrl/play.py Outdated
Comment thread source/standalone/workflows/skrl/play.py Outdated
Comment thread source/standalone/workflows/skrl/play.py Outdated
@Toni-SM
Copy link
Copy Markdown
Contributor Author

Toni-SM commented Jan 25, 2023

Done...

Summary:

  • Create a customized trainer in the file where the skrl wrapper is defined for:
    • log episode information during training
    • simplify train.py and play.py
  • Remove logging in play.py
  • Move the import statements to the beginning of the config.py file

Copy link
Copy Markdown
Contributor

@Mayankm96 Mayankm96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me now. Thanks a lot for working on skrl integration. It is great to have more libraries supported in the workflows.

Would love to try out other agents as well in skrl at some point and provide examples :)

@Mayankm96 Mayankm96 changed the title Skrl integration Integration of the skrl RL library Jan 26, 2023
@Mayankm96 Mayankm96 merged commit e9862d4 into isaac-sim:main Jan 26, 2023
@Toni-SM
Copy link
Copy Markdown
Contributor Author

Toni-SM commented Jan 26, 2023

Great!
Let's work on integrating new agents in future pull requests!

By the way... The online documentation does not reflect any changes 🤔

@Mayankm96
Copy link
Copy Markdown
Contributor

It should be up now! :)

By the way, I made some minor refactoring in this commit f7e4183 :

  • renaming of SkrlLogTrainer to SkrlSequentialLogTrainer
  • readding of the eval mode to the class for completeness
  • some docstring fixes
  • setting the agent to "eval" mode in the play.py script

@Toni-SM
Copy link
Copy Markdown
Contributor Author

Toni-SM commented Jan 26, 2023

Nice... It looks much better now

@Toni-SM Toni-SM deleted the skrl-integration branch August 18, 2023 08:06
renezurbruegg added a commit to renezurbruegg/IsaacLab that referenced this pull request Feb 8, 2026
# Description

This PR introduces an actuator model that takes care of the Serial to
Parallel conversion for the dynaarm,

(Part of the dynaarm integration isaac-sim#6)

## Type of change

- New feature (non-breaking change which adds functionality)

## Screenshots
## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Co-authored-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com>
Co-authored-by: Mayank Mittal <mittalma@leggedrobotics.com>
renezurbruegg added a commit to renezurbruegg/IsaacLab that referenced this pull request Feb 8, 2026
# Description

Adds the dynaarm asset with and without covers and the correct collision
shapes.

It depends on the Parallel Actuator model from isaac-sim#7 

To view an example: 
```python
 ./isaaclab.sh -p source/standalone/demos/dynaarms.py
 ```
![image](https://github.com/user-attachments/assets/607e16fe-088d-41ae-a2ef-3c5f251095c3)

(This PR is part of isaac-sim#6)

## Type of change
- New feature (non-breaking change which adds functionality)

## Screenshots


## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Pascal <roth.pascal@outlook.de>
renezurbruegg added a commit to renezurbruegg/IsaacLab that referenced this pull request Feb 8, 2026
# Description

```bash
./isaaclab.sh -p source/standalone/environments/random_agent.py --task Isaac-Sysid-Dynaarm-v0 --num_envs 2
./isaaclab.sh -p source/standalone/environments/random_agent.py --task Isaac-Sysid-Dynaarm-Covers-v0 --num_envs 2
```

This PR adds the sysid environments for the dynaarm.

Sysid environments are environments which only contain the standalone
assets and directly observe joint positions and velocities.

This PR depends on isaac-sim#7 and isaac-sim#8, which need to be merged first.

Related Issue: isaac-sim#6 
## Type of change

<!-- As you go through the list, delete the ones that are not
applicable. -->

- New feature (non-breaking change which adds functionality)

## Screenshots

Please attach before and after screenshots of the change if applicable.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->

---------

Co-authored-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com>
Co-authored-by: Pascal <roth.pascal@outlook.de>
pv-nvidia added a commit to pv-nvidia/IsaacLab that referenced this pull request Apr 24, 2026
Add RAII-style context managers for safe raw Fabric access:

- fabric_write(): calls PrepareForReuse on entry, update_world_xforms +
  sync on exit.  Provides world_matrices fabricarray and view_to_fabric
  mapping for custom warp kernel launches.
- fabric_read(): calls PrepareForReuse on entry (ensures valid pointers
  after topology changes), no-op on exit.

Also exposes read-only properties:
- world_matrices: the raw fabricarray of omni:fabric:worldMatrix
- view_to_fabric_mapping: the view-index to fabric-index mapping

This addresses Piotr's Issue isaac-sim#6 (reader/writer pattern) by providing
a structured way to bracket Fabric operations that ensures
PrepareForReuse and hierarchy updates are never forgotten.

Tests added:
- test_fabric_write_context_manager: validates write + readback
- test_fabric_read_context_manager: validates read without side effects

Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
pv-nvidia added a commit to pv-nvidia/IsaacLab that referenced this pull request Apr 24, 2026
Add RAII-style context managers for safe raw Fabric access:

- fabric_write(): calls PrepareForReuse on entry, update_world_xforms +
  sync on exit.  Provides world_matrices fabricarray and view_to_fabric
  mapping for custom warp kernel launches.
- fabric_read(): calls PrepareForReuse on entry (ensures valid pointers
  after topology changes), no-op on exit.

Also exposes read-only properties:
- world_matrices: the raw fabricarray of omni:fabric:worldMatrix
- view_to_fabric_mapping: the view-index to fabric-index mapping

This addresses Piotr's Issue isaac-sim#6 (reader/writer pattern) by providing
a structured way to bracket Fabric operations that ensures
PrepareForReuse and hierarchy updates are never forgotten.

Tests added:
- test_fabric_write_context_manager: validates write + readback
- test_fabric_read_context_manager: validates read without side effects

Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
pv-nvidia added a commit to pv-nvidia/IsaacLab that referenced this pull request Apr 24, 2026
Add RAII-style context managers for safe raw Fabric access:

- fabric_write(): calls PrepareForReuse on entry, update_world_xforms +
  sync on exit.  Provides world_matrices fabricarray and view_to_fabric
  mapping for custom warp kernel launches.
- fabric_read(): calls PrepareForReuse on entry (ensures valid pointers
  after topology changes), no-op on exit.

Also exposes read-only properties:
- world_matrices: the raw fabricarray of omni:fabric:worldMatrix
- view_to_fabric_mapping: the view-index to fabric-index mapping

This addresses Piotr's Issue isaac-sim#6 (reader/writer pattern) by providing
a structured way to bracket Fabric operations that ensures
PrepareForReuse and hierarchy updates are never forgotten.

Tests added:
- test_fabric_write_context_manager: validates write + readback
- test_fabric_read_context_manager: validates read without side effects

Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
pv-nvidia added a commit to pv-nvidia/IsaacLab that referenced this pull request Apr 24, 2026
Add RAII-style context managers for safe raw Fabric access:

- fabric_write(): calls PrepareForReuse on entry, update_world_xforms +
  sync on exit.  Provides world_matrices fabricarray and view_to_fabric
  mapping for custom warp kernel launches.
- fabric_read(): calls PrepareForReuse on entry (ensures valid pointers
  after topology changes), no-op on exit.

Also exposes read-only properties:
- world_matrices: the raw fabricarray of omni:fabric:worldMatrix
- view_to_fabric_mapping: the view-index to fabric-index mapping

This addresses Piotr's Issue isaac-sim#6 (reader/writer pattern) by providing
a structured way to bracket Fabric operations that ensures
PrepareForReuse and hierarchy updates are never forgotten.

Tests added:
- test_fabric_write_context_manager: validates write + readback
- test_fabric_read_context_manager: validates read without side effects

Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
pv-nvidia added a commit to pv-nvidia/IsaacLab that referenced this pull request Apr 24, 2026
Add RAII-style context managers for safe raw Fabric access:

- fabric_write(): calls PrepareForReuse on entry, update_world_xforms +
  sync on exit.  Provides world_matrices fabricarray and view_to_fabric
  mapping for custom warp kernel launches.
- fabric_read(): calls PrepareForReuse on entry (ensures valid pointers
  after topology changes), no-op on exit.

Also exposes read-only properties:
- world_matrices: the raw fabricarray of omni:fabric:worldMatrix
- view_to_fabric_mapping: the view-index to fabric-index mapping

This addresses Piotr's Issue isaac-sim#6 (reader/writer pattern) by providing
a structured way to bracket Fabric operations that ensures
PrepareForReuse and hierarchy updates are never forgotten.

Tests added:
- test_fabric_write_context_manager: validates write + readback
- test_fabric_read_context_manager: validates read without side effects

Depends on: fix/fabric-prepare-for-reuse (PR isaac-sim#5380)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants