
Update vectorized reinforcement learning #104

Merged
merged 2 commits into master from refactor/updated-vecgymne
Jun 6, 2024

Conversation

engintoklu
Collaborator

This pull request updates the vectorized reinforcement learning functionalities of EvoTorch so that they are compatible with the gymnasium 1.0.x API, while preserving compatibility with gymnasium 0.29.x.

In more detail, this pull request introduces an EvoTorch-specific SyncVectorEnv implementation as an alternative to gymnasium's SyncVectorEnv class. This custom SyncVectorEnv preserves the classical auto-reset behavior on which EvoTorch relies, allowing us to transition to gymnasium 1.0.x.
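To illustrate what "classical auto-reset" means here, the sketch below shows a vector environment in which a sub-environment that finishes its episode is reset within the same `step()` call, so the returned observation already belongs to the next episode (the behavior gymnasium 0.29.x vector envs provided). This is a hedged illustration, not EvoTorch's actual implementation; the names `DummyEnv` and `ClassicalSyncVectorEnv` are hypothetical.

```python
import numpy as np

class DummyEnv:
    """Hypothetical toy env following the gymnasium API; episode ends after 3 steps."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return np.array([0.0]), {}
    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return np.array([float(self.t)]), 1.0, terminated, False, {}

class ClassicalSyncVectorEnv:
    """Same-step ("classical") auto-reset: a finished sub-env is reset
    immediately, and the new episode's first observation is returned."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]
    def reset(self):
        return np.stack([env.reset()[0] for env in self.envs])
    def step(self, actions):
        obs_batch, rew_batch, done_batch = [], [], []
        for env, action in zip(self.envs, actions):
            obs, rew, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                obs, _ = env.reset()  # immediate (same-step) auto-reset
            obs_batch.append(obs)
            rew_batch.append(rew)
            done_batch.append(terminated or truncated)
        return np.stack(obs_batch), np.asarray(rew_batch), np.asarray(done_batch)

venv = ClassicalSyncVectorEnv([DummyEnv, DummyEnv])
venv.reset()
for _ in range(3):
    obs, rew, done = venv.step([0, 0])
# after the 3rd step, both episodes ended and were reset in the same call
print(obs.ravel().tolist(), done.tolist())  # → [0.0, 0.0] [True, True]
```

By contrast, gymnasium 1.0.x vector environments defer the reset to the following `step()` call, which is why code relying on the classical behavior needs a custom implementation.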

Having our own SyncVectorEnv also allows us to introduce these performance-related improvements:

  • observations, rewards, etc. reported by SyncVectorEnv are now moved to the device on which the policies are executed;
  • sub-environments of SyncVectorEnv that have reached their maximum number of episodes are no longer stepped.
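A rough sketch of these two ideas follows. The names (`DeviceAwareVectorEnv`, `episodes_per_env`, `DummyEnv`) are illustrative assumptions, not EvoTorch's actual API: batched results are moved to the policy's device in one call, and a sub-environment that has spent its episode budget is skipped rather than stepped.

```python
import torch

class DummyEnv:
    """Hypothetical toy env; episode ends after 3 steps."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return [0.0], {}
    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        return [float(self.t)], 1.0, terminated, False, {}

class DeviceAwareVectorEnv:
    """Sketch of (i) reporting results on the policy's device and
    (ii) skipping sub-envs that have used up their episode budget."""
    def __init__(self, env_fns, device="cpu", episodes_per_env=1):
        self.envs = [fn() for fn in env_fns]
        self.device = torch.device(device)
        self.episodes_per_env = episodes_per_env
        self.episode_counts = [0] * len(self.envs)
    def reset(self):
        self.episode_counts = [0] * len(self.envs)
        obs = torch.tensor([env.reset()[0] for env in self.envs])
        return obs.to(self.device)
    def step(self, actions):
        n = len(self.envs)
        obs = torch.zeros(n, 1)
        rew = torch.zeros(n)
        done = torch.zeros(n, dtype=torch.bool)
        for i, env in enumerate(self.envs):
            if self.episode_counts[i] >= self.episodes_per_env:
                done[i] = True  # (ii) budget spent: do not step this sub-env
                continue
            o, r, terminated, truncated, _ = env.step(actions[i])
            if terminated or truncated:
                self.episode_counts[i] += 1
            obs[i] = torch.as_tensor(o)
            rew[i] = float(r)
            done[i] = terminated or truncated
        # (i) move the batched results to the device the policy runs on
        return obs.to(self.device), rew.to(self.device), done.to(self.device)

venv = DeviceAwareVectorEnv([DummyEnv, DummyEnv], device="cpu", episodes_per_env=1)
venv.reset()
for _ in range(3):
    obs, rew, done = venv.step([0, 0])
# both sub-envs spent their single-episode budget; this step skips them
obs, rew, done = venv.step([0, 0])
print(rew.tolist(), done.tolist())  # → [0.0, 0.0] [True, True]
```

Skipping finished sub-environments matters in evolutionary RL, where each solution is typically evaluated for a fixed number of episodes and stepping a finished sub-environment would waste compute.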

Brax-related notebook examples are also refactored. Instead of including the entire brax example in a single notebook, there are now two notebooks: one focusing on training and the other on visualization. The visualization notebook is updated so that it works correctly with the latest version of brax.

@engintoklu engintoklu added the enhancement New feature or request label Jun 3, 2024
@engintoklu engintoklu self-assigned this Jun 3, 2024
Fix a typo in the code
@flukeskywalker flukeskywalker merged commit d995b97 into master Jun 6, 2024
1 of 2 checks passed
@flukeskywalker flukeskywalker deleted the refactor/updated-vecgymne branch June 6, 2024 18:32