0.5.0 #76
Conversation
- workaround for `env.render`
- remove `make_box_space_readable`
- fix `VecRobotSegmentationObservationWrapper`
- …python floats for gymnasium
- …, so reward function is deterministic as well.
- …-vis is true, fix goal visual to now show up for rgb_array and human
Please also format the directory with black (but please take a look at which files get formatted, since pyproject.toml might need to be updated to filter which files should be formatted).
```python
depth2 = rgbd[..., 7:8] / (2**10)
depth1 = rgbd[..., 3:4]
depth2 = rgbd[..., 7:8]
if not scale_rgb_only:
```
You can remove this if-statement if scale_rgb_only is always True, or add a comment to explain why depth is normalized by dividing by (2 ** 10).
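One plausible reading of that scaling factor, sketched below. This is not the PR's actual code; the fixed-point interpretation and the `normalize_depth` helper are assumptions made for illustration only:

```python
# Hedged sketch (not the PR's actual code): if depth were stored as a
# fixed-point integer with 10 fractional bits, dividing by 2 ** 10 == 1024
# would recover a float depth value; scale_rgb_only would skip that step.
def normalize_depth(raw_depth, scale_rgb_only=False):
    """Return depth values as floats, optionally scaled by 1 / 2**10."""
    if scale_rgb_only:
        return [float(d) for d in raw_depth]
    return [d / (2 ** 10) for d in raw_depth]

print(normalize_depth([1024, 2048, 512]))            # [1.0, 2.0, 0.5]
print(normalize_depth([1024], scale_rgb_only=True))  # [1024.0]
```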
LGTM
Draft PR for upgrading ManiSkill2 to Gymnasium, as well as making various improvements as discussed with @Jiayuan-Gu
Changes:
Breaking Changes
- `env.render` now accepts no arguments. The old render functions are separated out as other functions; `env.render` calls them and chooses which one based on the `env.render_mode` attribute (usually set upon env creation).
- `env.step` returns `observation, reward, terminated, truncated, info`. See https://gymnasium.farama.org/content/migration-guide/#environment-step for details. For ManiSkill2, the old done signal is now called terminated, and truncated is False. All environments by default have 200 max episode steps, so truncated=True after 200 steps.
- `env.reset` returns a tuple `observation, info`. For ManiSkill2, info is always an empty dictionary. Moreover, `env.reset` accepts two new keyword arguments: `seed: int, options: dict | None`. Note that `options` is usually used to configure various random settings/numbers of an environment. Previously ManiSkill2 used custom keyword arguments such as `reconfigure`. These keyword arguments are still usable but must be passed through an options dict, e.g. `env.reset(options=dict(reconfigure=True))`.
- `env.seed` has now been removed in favor of using `env.reset(seed=val)` per the Gymnasium API.
- `vec_env.observation_space` and `vec_env.action_space` are batched under the new API, and the individual environment spaces are defined as `vec_env.single_observation_space` and `vec_env.single_action_space`.
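A rough illustration of the batched-versus-single distinction, using plain shape tuples rather than real gymnasium.spaces objects (the sizes here are made up):

```python
# Toy illustration of batched vs. per-env spaces; real code uses
# gymnasium.spaces objects, not bare ("Box", shape) tuples.
num_envs, obs_dim, act_dim = 4, 32, 8  # made-up sizes

single_observation_space = ("Box", (obs_dim,))    # space of one env
single_action_space = ("Box", (act_dim,))
observation_space = ("Box", (num_envs, obs_dim))  # batched over envs
action_space = ("Box", (num_envs, act_dim))

# The batched space simply prepends a num_envs dimension:
assert observation_space[1] == (num_envs,) + single_observation_space[1]
assert action_space[1] == (num_envs,) + single_action_space[1]
```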
- Task success is indicated by `info["success"]`.
- The scaled dense rewards are the new default reward function, called `normalized_dense`. To use the old (<0.5.0) ManiSkill2 dense rewards, set `reward_mode` to `dense`.
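The new reset/step signatures can be sketched with a stand-in environment. `ToyEnv` is hypothetical, not a real ManiSkill2 class; only the API shape follows the notes above:

```python
# Hypothetical stand-in env illustrating the Gymnasium-style API described
# above; not the real ManiSkill2 implementation.
class ToyEnv:
    max_episode_steps = 200  # default for all ManiSkill2 environments

    def reset(self, *, seed=None, options=None):
        # options replaces custom kwargs, e.g. options=dict(reconfigure=True)
        self._elapsed_steps = 0
        observation = 0.0
        info = {}  # ManiSkill2 returns an empty info dict from reset
        return observation, info

    def step(self, action):
        self._elapsed_steps += 1
        observation, reward = 0.0, 0.0
        terminated = False  # the old "done" signal
        # truncated flips to True once the step budget is exhausted
        truncated = self._elapsed_steps >= self.max_episode_steps
        info = {}
        return observation, reward, terminated, truncated, info

env = ToyEnv()
obs, info = env.reset(seed=0, options=dict(reconfigure=True))
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(None)
print(truncated)  # True after 200 steps
```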
New Additions
Code
- `env.render_human` for creating an interactive GUI and viewer, `env.render_rgb_array` for generating RGB images of the current env from a third-person perspective, and `env.render_cameras`, which renders all the cameras (including rgb, depth, segmentation if available) and compacts them into one RGB image that is returned. Note that human and rgb_array are used only for visualization purposes; they may include artifacts like indicators of where the goal is, see PickCube-v0 or PandaAvoidObstacles-v0 for examples. The cameras mode reflects the actual visual observations returned by calls to `env.reset` and `env.step`.
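A minimal sketch of how a single no-argument `render` can dispatch on `render_mode`. The method names follow the notes above, but `ToyRenderEnv` and its string return values are placeholders, not the real implementation:

```python
# Sketch of render-mode dispatch; the returned strings stand in for what
# the real methods produce (a GUI, an RGB array, or a tiled camera image).
class ToyRenderEnv:
    def __init__(self, render_mode=None):
        self.render_mode = render_mode  # usually set upon env creation

    def render_human(self):
        return "interactive-viewer"

    def render_rgb_array(self):
        return "third-person-rgb"

    def render_cameras(self):
        return "tiled-camera-image"

    def render(self):
        # render takes no arguments; render_mode decides the behavior
        if self.render_mode == "human":
            return self.render_human()
        if self.render_mode == "rgb_array":
            return self.render_rgb_array()
        if self.render_mode == "cameras":
            return self.render_cameras()
        raise RuntimeError("render_mode is not set")

print(ToyRenderEnv("cameras").render())  # tiled-camera-image
```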
- `make_vec_env` now accepts a `max_episode_steps` argument which overrides the default `max_episode_steps` specified when registering the environment. The default `max_episode_steps` is 200 for all environments, but note it may be more efficient for RL training and evaluation to use a smaller value, as shown in the RL tutorials.

Tutorials
Not Code
Bug Fixes
Miscellaneous Changes
- The `mani_skill2.examples.demo_vec_env` module now accepts a `--vecenv-type` argument which can be either `ms2` or `gym` and defaults to `ms2`. This lets users benchmark the speed difference themselves. The module was further cleaned up to print more nicely.
- `main` functions now accept an `args` argument, allowing those scripts to be used from within Python and not just the CLI. Used for testing purposes.
- A `--count` argument lets you specify how many trajectories to replay. There is no data shuffling, so the replayed trajectories will always be the same and in the same order. By default this is `None`, meaning all trajectories are replayed.
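The flag behavior described above can be sketched with argparse. This is an illustrative parser only; the real module's parser may differ in details:

```python
import argparse

# Illustrative parser for the flags described above (not the module's
# actual code).
def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--vecenv-type", choices=["ms2", "gym"],
                        default="ms2",
                        help="which vectorized env implementation to benchmark")
    parser.add_argument("--count", type=int, default=None,
                        help="number of trajectories to replay; None replays all")
    return parser

def main(args=None):
    # Accepting an args list allows calling from Python, not just the CLI.
    return build_parser().parse_args(args)

opts = main(["--vecenv-type", "gym", "--count", "5"])
print(opts.vecenv_type, opts.count)  # gym 5
```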