
0.5.0 #76

Merged — 96 commits from the 0.5.0 branch into main on Aug 16, 2023
Conversation


@StoneT2000 (Member) commented Apr 1, 2023

Draft PR for upgrading ManiSkill2 to Gymnasium, as well as making various improvements as discussed with @Jiayuan-Gu

Changes:

  • Upgrade the API to Gymnasium
  • Improve reward functions by scaling to [0, 1] and verifying they work (@xuanlinli17)
  • Remove goal visuals
  • Update environment state representations
  • Update Stable Baselines notebook tutorials to Gymnasium versions
  • Update Stable Baselines scripts to Gymnasium versions
  • Add CleanRL-style baselines
  • Add semi-automated pytests on all environments (@StoneT2000)
  • Move downloadable files to Google Storage and avoid using Google Drive

Breaking Changes

  • env.render now accepts no arguments. The old render modes are separated out into their own functions; env.render calls the appropriate one based on the env.render_mode attribute (usually set upon env creation).
  • env.step returns observation, reward, terminated, truncated, info. See https://gymnasium.farama.org/content/migration-guide/#environment-step for details. For ManiSkill2, the old done signal is now called terminated, and truncated is False. All environments default to 200 max episode steps, so truncated=True after 200 steps.
  • env.reset returns a tuple observation, info. For ManiSkill2, info is always an empty dictionary. Moreover, env.reset accepts two new keyword arguments: seed: int and options: dict | None. Note that options is usually used to configure various random settings/numbers of an environment. Previously, ManiSkill2 used custom keyword arguments such as reconfigure. These keyword arguments are still usable but must be passed through an options dict, e.g. env.reset(options=dict(reconfigure=True)).
  • env.seed has now been removed in favor of using env.reset(seed=val) per the Gymnasium API.
  • The ManiSkill VectorEnv is now also modified to adhere to the Gymnasium Vector Env API. Note this means that vec_env.observation_space and vec_env.action_space are batched under the new API, and the individual environment spaces are defined as vec_env.single_observation_space and vec_env.single_action_space.
  • All reward functions are now scaled to the range [0, 1], which generally makes any value-learning approach more stable and avoids gradient explosions. On any environment, a reward of 1 indicates success, which is also indicated by the boolean stored in info["success"]. The scaled dense reward is the new default reward function and is called normalized_dense. To use the old (<0.5.0) ManiSkill2 dense rewards, set reward_mode to dense.
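Existing training loops written against the old 4-tuple step API can be adapted with a small shim. step_compat below is a hypothetical helper for illustration, not part of ManiSkill2:

```python
def step_compat(step_result):
    """Collapse a Gymnasium-style 5-tuple (obs, reward, terminated,
    truncated, info) into the old 4-tuple (obs, reward, done, info)."""
    obs, reward, terminated, truncated, info = step_result
    return obs, reward, terminated or truncated, info

# Usage in an old-style loop (env is any Gymnasium-API environment):
#   obs, reward, done, info = step_compat(env.step(action))
```

Note that collapsing terminated and truncated into a single done loses the distinction between task success/failure and a time-limit cutoff, which matters for bootstrapping in value-based RL.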

New Additions

Code

  • Environment code comes with separate render functions corresponding to the old render modes. There is now env.render_human for creating an interactive GUI and viewer, env.render_rgb_array for generating RGB images of the current env from a third-person perspective, and env.render_cameras which renders all the cameras (including rgb, depth, and segmentation if available) and compacts them into one RGB image that is returned. Note that human and rgb_array are used only for visualization purposes; they may include artifacts such as goal indicators, see PickCube-v0 or PandaAvoidObstacles-v0 for examples. The cameras mode reflects the actual visual observations returned by calls to env.reset and env.step.
  • The ManiSkill2 VecEnv creator function make_vec_env now accepts a max_episode_steps argument which overrides the default max_episode_steps specified when registering the environment. The default max_episode_steps is 200 for all environments, but note it may be more efficient for RL training and evaluation to use a smaller value as shown in the RL tutorials.
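The render dispatch described above can be sketched in plain Python. The class and return values are illustrative only, not the actual ManiSkill2 implementation:

```python
class RenderDispatchSketch:
    """Illustrative sketch: mimics how env.render() can dispatch to a
    mode-specific render function based on the render_mode attribute."""

    def __init__(self, render_mode="human"):
        self.render_mode = render_mode  # usually fixed at env creation

    def render_human(self):
        return "interactive GUI"

    def render_rgb_array(self):
        return "third-person RGB image"

    def render_cameras(self):
        return "compacted camera observations"

    def render(self):
        # Look up the render function whose name matches the configured mode.
        return getattr(self, f"render_{self.render_mode}")()
```

With this pattern, env.render() takes no arguments, and switching visualization styles means constructing the environment with a different render_mode rather than passing a mode per call.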

Tutorials

  • All tutorials have been updated to reflect the new Gymnasium API and the new Stable Baselines 3, and should be more stable on Google Colab

Not Code

  • A new CONTRIBUTING.md document has been added, with details on how to develop ManiSkill2 locally and test it

Bug Fixes

Miscellaneous Changes

  • Dockerfile now accepts a python version as an argument
  • README and documentation updated to reflect new gym API
  • The mani_skill2.examples.demo_vec_env module now accepts a --vecenv-type argument, which can be either ms2 or gym and defaults to ms2, letting users benchmark the speed difference themselves. The module was also cleaned up to print more nicely.
  • Example scripts with main functions now accept an args argument, allowing those scripts to be used from within Python rather than only from the CLI. Used for testing purposes.
  • Silenced extraneous output in some example scripts
  • Trajectory replay accepts a new --count argument that lets you specify how many trajectories to replay. There is no data shuffling, so the replayed trajectories are always the same and in the same order. The default is None, meaning all trajectories are replayed.
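The --count selection logic can be mirrored by a small helper; select_trajectories is a hypothetical function illustrating the behavior described above, not the actual replay implementation:

```python
def select_trajectories(trajectory_ids, count=None):
    """Pick which trajectories to replay. No shuffling: order is
    deterministic, and count=None means replay everything."""
    ids = list(trajectory_ids)
    if count is None:
        return ids
    return ids[:count]
```

Because selection is a simple prefix slice, repeated runs with the same --count always replay the same trajectories in the same order.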

@Jiayuan-Gu Jiayuan-Gu self-requested a review August 14, 2023 08:13
@Jiayuan-Gu Jiayuan-Gu self-assigned this Aug 14, 2023
@Jiayuan-Gu (Contributor) left a comment:

Please also format the directory with black (but look over which files get formatted, since pyproject.toml might need updating to filter which files should be formatted).

requirements.txt (outdated, review thread resolved)
mani_skill2/__init__.py (outdated, review thread resolved)
Review comment on a diff hunk in the observation wrapper (old line, then replacement lines):

    depth2 = rgbd[..., 7:8] / (2**10)
    depth1 = rgbd[..., 3:4]
    depth2 = rgbd[..., 7:8]
    if not scale_rgb_only:
A contributor commented:
You can remove this if-statement if scale_rgb_only is always True, or add a comment explaining why depth is normalized by dividing by (2 ** 10).
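As an illustration of the normalization the comment asks about (the divisor 2 ** 10 comes from the snippet above; the interpretation of the raw range is an assumption for illustration):

```python
DEPTH_SCALE = 2 ** 10  # divisor taken from the diff above

def normalize_depth(raw_depth):
    # Maps raw depth values in [0, DEPTH_SCALE) into [0.0, 1.0),
    # assuming DEPTH_SCALE is the maximum representable depth value.
    return raw_depth / DEPTH_SCALE
```

The reviewer's point stands either way: without a comment like the one above, the magic constant 2 ** 10 is opaque to future readers.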

mani_skill2/utils/wrappers/observation.py (outdated, review thread resolved)
mani_skill2/utils/registration.py (outdated, review thread resolved)
mani_skill2/vector/wrappers/sb3.py (review thread resolved)
tests/manual_test_venv.py (outdated, review thread resolved)
@Jiayuan-Gu Jiayuan-Gu marked this pull request as ready for review August 14, 2023 09:00
@Jiayuan-Gu (Contributor) left a comment:

LGTM

@StoneT2000 StoneT2000 merged commit 0a5e5b8 into main Aug 16, 2023
@StoneT2000 StoneT2000 deleted the 0.5.0 branch August 23, 2023 17:16