[RLlib] Gymnasium/Gym0.26.x support (new Env.reset()/step()/seed()/render() APIs). #28369

Merged
merged 216 commits into from
Dec 20, 2022

@sven1977 sven1977 commented Sep 8, 2022

Gymnasium 0.26.3 has been released with major changes to the default settings for environments. A custom gym.Env (now: gymnasium.Env) subclass is one of the most important entry points for RLlib users into our library.

To read more about gymnasium, go here: https://github.com/Farama-Foundation/Gymnasium

RLlib should therefore support the new APIs going forward (e.g. Env.reset() now returns obs AND infos, and Env.step() returns terminated and truncated flags instead of the old done flag).
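As a rough illustration of the new signatures, here is a minimal, dependency-free sketch. The class name `CountingEnv` and its dynamics are invented for this example; a real environment would subclass `gymnasium.Env` and define observation and action spaces.

```python
import random
from typing import Any, Dict, Optional, Tuple


class CountingEnv:
    """Hypothetical toy env that follows the new reset()/step() signatures.

    Deliberately NOT a gymnasium.Env subclass, to keep the sketch dependency-free.
    """

    def __init__(self, goal: int = 3, max_steps: int = 5):
        self.goal = goal
        self.max_steps = max_steps
        self._rng = random.Random()
        self._state = 0
        self._t = 0

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[Dict[str, Any]] = None,
    ) -> Tuple[int, Dict[str, Any]]:
        # Seeding happens in reset() rather than in a separate seed() method.
        if seed is not None:
            self._rng.seed(seed)
        # Pick a (seed-dependent) initial state of 0 or 1.
        self._state = self._rng.randint(0, 1)
        self._t = 0
        # New API: return (obs, infos) instead of only obs.
        return self._state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self._state += action
        self._t += 1
        # terminated: the task itself ended (goal reached).
        terminated = self._state >= self.goal
        # truncated: the episode was cut off by the step limit.
        truncated = not terminated and self._t >= self.max_steps
        reward = 1.0 if terminated else 0.0
        # New API: 5-tuple instead of the old (obs, reward, done, info).
        return self._state, reward, terminated, truncated, {}
```

Calling `env.reset(seed=0)` twice yields the same initial observation, and stepping with action `0` until the step limit produces `truncated=True` while `terminated` stays `False`.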

Users who are still using the old gym.Env APIs in their classes should either rewrite those classes to abide by the new API or use the provided wrappers (gymnasium.wrappers.EnvCompatibility or ray.rllib.env.wrappers.multi_agent_env_compatibility.py::MultiAgentEnvCompatibility).
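To make the idea behind such a compatibility wrapper concrete, here is a hand-written sketch of an old-to-new adapter. It is not the code of `EnvCompatibility` or `MultiAgentEnvCompatibility`; the names `OldApiEnv` and `OldToNewApi` are hypothetical, and the use of the `"TimeLimit.truncated"` info key to recover the truncation signal is an assumption about how the legacy env reports time limits.

```python
from typing import Any, Dict, Optional


class OldApiEnv:
    """Hypothetical legacy env: reset() -> obs, step() -> (obs, reward, done, info)."""

    def __init__(self):
        self.t = 0
        self.last_seed = None

    def seed(self, seed=None):
        self.last_seed = seed

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= 2
        # Assumed legacy convention: time-limit cut-offs are flagged inside `info`.
        info = {"TimeLimit.truncated": True} if done else {}
        return self.t, 0.0, done, info


class OldToNewApi:
    """Illustrative adapter only; not the implementation of any shipped wrapper."""

    def __init__(self, env):
        self.env = env

    def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None):
        # Forward the seed to the legacy seed() method, then add an empty infos dict.
        if seed is not None:
            self.env.seed(seed)
        return self.env.reset(), {}

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Split the single `done` flag into the two new flags.
        truncated = bool(info.pop("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info
```

With this adapter, the second step of `OldApiEnv` is reported as `truncated=True, terminated=False` rather than as an indiscriminate `done=True`.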

A detailed error message is provided if users are still on the old gym package or are using old-API gymnasium.Env subclasses.

This PR:

  • Replaces all import gym or related statements with import gymnasium as gym.
  • Alters all RLlib env APIs, such as BaseEnv, VectorEnv, MultiAgentEnv, etc., to be fully compatible with the new gymnasium APIs. For example, VectorEnv.reset_at() now returns obs AND infos and takes optional seed and options arguments.
  • Addresses the related pettingzoo, minigrid, Atari, etc. updates as well. For example, all Atari experiments now use the new "ALE/" prefix in their env setting, e.g. config.environment("ALE/Pong-v5", frameskip=1) as an equivalent to PongNoFrameskip-v4.
  • Interprets the old done flag the same as the new terminated flag (e.g. in loss functions, DONES has been replaced with TERMINATED). The new truncated flag is collected as well and is available in all train batches (even though it is mostly ignored). This should improve our loss math: we are currently, for example, setting Q-values to 0.0 even when an episode is only truncated (CartPole-v1 after 500 ts) but not really terminated!
  • The config settings horizon, soft_horizon and no_done_at_end have all been deprecated and must no longer be used. Instead, users should implement the proper logic in their environments, using the gymnasium.wrappers.TimeLimit wrapper, properly returning terminated vs truncated (instead of an indiscriminate done), and properly picking a new initial state after reset().
  • A backward compatibility check (verifying that RLlib produces the correct error messages) has been added to catch usages of the gym package as well as gymnasium Envs that still use the old APIs.
  • Seeding is done via the Env.reset() method in gymnasium. RLlib RolloutWorkers make sure this is properly implemented (instead of pre-seeding an env via its seed() method, which has been deprecated). In future PRs, we will allow users to individually set seeds and options per episode (reset(self, *, seed=.., options=..)) via callbacks.
  • This PR is already massive. No docs changes have been added so far, to keep its size in check. These will be done in follow-up PRs.
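The point about loss math in the list above can be sketched as follows. This is a simplified one-step TD target, not RLlib's actual loss code; the function name `td_targets` and its arguments are hypothetical. The key assumption is that only `terminated` zeroes out the bootstrap value, while a merely truncated transition still bootstraps from the next state.

```python
from typing import List, Sequence


def td_targets(
    rewards: Sequence[float],
    next_q_max: Sequence[float],
    terminateds: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD targets: r + gamma * max_a Q(s', a), masked by `terminated` only."""
    targets = []
    for r, q_next, terminated in zip(rewards, next_q_max, terminateds):
        # A truncated (but not terminated) episode keeps its bootstrap value;
        # only a true terminal state has zero future value.
        bootstrap = 0.0 if terminated else q_next
        targets.append(r + gamma * bootstrap)
    return targets
```

For two final transitions with identical rewards and next-state values, the truncated one (e.g. CartPole-v1 hitting its step limit) gets the full bootstrapped target, while the terminated one gets only the reward.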

Major TODOs before this can be merged:

  • Fix env runner code: resetting, soft resetting, done -> terminated translation, etc.
  • Seeding should work more transparently (which worker and which sub-env gets which seed?) and per episode (seeding now happens on reset, no longer in the env c'tor or seed() method).
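One way to make "which worker, which sub-env gets which seed" transparent is to derive every episode seed deterministically from a base seed and the indices involved. The sketch below is a hypothetical scheme, not what RLlib's RolloutWorkers do; the function name and its parameters are invented for illustration.

```python
import hashlib
from typing import Optional


def derive_episode_seed(
    base_seed: Optional[int],
    worker_index: int,
    sub_env_index: int,
    episode_index: int,
) -> Optional[int]:
    """Hypothetical scheme: map (worker, sub-env, episode) to a reproducible seed."""
    if base_seed is None:
        # No base seed configured: leave the env unseeded.
        return None
    key = f"{base_seed}:{worker_index}:{sub_env_index}:{episode_index}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 4 bytes so the result fits in an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")
```

The resulting value would then be handed to the env on each reset, e.g. `env.reset(seed=derive_episode_seed(42, worker_index, sub_env_index, episode_index))`.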

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>

sven1977 commented Sep 8, 2022

@jsuarez5341 @jkterry1 ^

May take a while to get all test cases to pass and merge this.

@avnishn avnishn left a comment

Do we need to add a config flag now called no reset on truncated?

The point of extra truncated signals is so that we can do what rllib does for SAC where we enable no reset on done.

We probably now need to do no reset on truncated.

def reset(
    self,
    seed: Optional[int] = None,
) -> Tuple[MultiAgentDict, MultiAgentDict]:
Member

Doesn't reset also now support reset kwargs? We would need to add those if that's the case

Contributor Author

You were right, this was added as options, just like in gymnasium. It's currently NOT supported for RLlib users (it won't break, but users cannot add any per-episode content to that kwargs dict).

I wanted to enable this in a follow-up PR.
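For reference, a multi-agent reset() that accepts both seed and options might look like the sketch below. `MultiAgentDict` here is a local stand-in for RLlib's alias, `TwoAgentEnv` is a made-up class, and the `"start"` option key is purely hypothetical. As noted above, RLlib itself does not yet pass any content via options, so the example calls reset() directly.

```python
import random
from typing import Any, Dict, Optional, Tuple

# Local stand-in for RLlib's MultiAgentDict type alias.
MultiAgentDict = Dict[str, Any]


class TwoAgentEnv:
    """Hypothetical multi-agent env showing the reset(seed=..., options=...) shape."""

    def __init__(self):
        self._rng = random.Random()

    def reset(
        self,
        *,
        seed: Optional[int] = None,
        options: Optional[Dict[str, Any]] = None,
    ) -> Tuple[MultiAgentDict, MultiAgentDict]:
        if seed is not None:
            self._rng.seed(seed)
        options = options or {}
        # "start" is an invented option key for this sketch.
        start = options.get("start", 0)
        obs = {"agent_0": start, "agent_1": start}
        infos = {"agent_0": {}, "agent_1": {}}
        # One obs dict and one infos dict, both keyed by agent ID.
        return obs, infos
```

Calling `TwoAgentEnv().reset(seed=1, options={"start": 5})` returns per-agent observations of `5` together with an empty info dict per agent.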

if not self.initialized:
    # TODO(sven): Should we make it possible to pass in a seed here?
Member

No, not necessary. Users should call reset themselves for that

Contributor Author

But they can't. This logic is normally entirely encapsulated in our RolloutWorker/Sampler/EnvRunner logic, which starts the endless loop right away with a poll() call (NOT a reset).

Member

oh gotcha. In that case I'd rather change it over there than over here.

@@ -71,6 +71,7 @@ def check_shape(self, observation: Any) -> None:
 )
 try:
 if not self._obs_space.contains(observation):
+print()#TODO
Member

Scratch work? Or on purpose?

Contributor Author

still in progress.

@@ -209,6 +224,7 @@ def get_type(var):
 if not env.observation_space.contains(temp_sampled_next_obs):
 raise ValueError(error)
 _check_done(done)
+_check_done(truncated)
Member

Need to add a flag to this function so that the error message states whether it is about truncated or done.
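A minimal sketch of what such a flag could look like is shown below. The function name `check_episode_flag` and its `name` parameter are hypothetical and do not reflect the actual signature of `_check_done` in RLlib; the sketch also only accepts plain Python bools.

```python
def check_episode_flag(value, name: str = "done") -> None:
    """Validate a step() flag and name the offending flag in the error message.

    Hypothetical helper; `name` would be e.g. "done", "terminated" or "truncated".
    """
    if not isinstance(value, bool):
        raise ValueError(
            f"Your step() function returned `{name}`={value!r} of type "
            f"{type(value).__name__}, but a bool was expected."
        )
```

The two call sites from the diff would then read `check_episode_flag(done, "done")` and `check_episode_flag(truncated, "truncated")`, so a bad `truncated` value is no longer reported as a problem with `done`.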

@sven1977 sven1977 changed the title [RLlib] Gym 0.26 support. [WIP; RLlib] Gym 0.26 support. Sep 8, 2022
@jsuarez5341

> @jsuarez5341 @jkterry1 ^
>
> May take a while to get all test cases to pass and merge this.

From the latest gym announcements in case you want to get something working with older envs:

"[26] comes with number of breaking changes that in previous versions were turned off. For users wanting to use the new gym version but have old gym environments, we provide the EnvStepCompatibility wrapper and gym.make(..., apply_api_compatibility=True) to using these environments."


@maxpumperla maxpumperla left a comment


approving, pending a question (likely atari)

@@ -31,7 +31,7 @@
 "To run the application, first install some dependencies.\n",
 "\n",
 "```bash\n",
-"pip install gym[atari]\n",
+"pip install gymnasium[atari] gym==0.26.2\n",
Contributor

why both?

@sven1977 sven1977 merged commit 8e680c4 into ray-project:master Dec 20, 2022

maziarg commented Dec 21, 2022

Thanks for working on this issue; having RLlib support gymnasium is something I have been waiting for. Do we know roughly when you are planning to release version 2.3?

tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…PIs). (ray-project#28369)

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
@afennelly-mitre

@sven1977 I noticed that in ray/rllib/utils/pre_checks/env.py, within the check_gym_environments() method, there is the following snippet (see line 162):

  # Raise warning if using new reset api introduces in gym 0.24
  reset_signature = inspect.signature(env.unwrapped.reset).parameters.keys()
  if any(k in reset_signature for k in ["seed", "return_info"]):
      if log_once("reset_signature"):
          logger.warning(
              "Your env reset() method appears to take 'seed' or 'return_info'"
              " arguments. Note that these are not yet supported in RLlib."
              " Seeding will take place using 'env.seed()' and the info dict"
              " will not be returned from reset."
          )

Is this still the case after this PR was merged, i.e., will seeding still take place using env.seed()? Thank you!
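For anyone who wants to check their own env against the new reset(seed=..., options=...) form, a small standalone sketch of the same inspect-based approach follows. It does not answer how RLlib behaves after the merge; `reset_accepts` and the two example classes are hypothetical names.

```python
import inspect
from typing import Dict, Iterable


def reset_accepts(env, names: Iterable[str] = ("seed", "options")) -> Dict[str, bool]:
    """Report which of `names` the env's reset() can take as keyword arguments."""
    params = inspect.signature(env.reset).parameters
    # A **kwargs parameter means any keyword is accepted.
    has_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    return {name: (name in params or has_var_kw) for name in names}


class OldStyleEnv:
    def reset(self):
        return 0


class NewStyleEnv:
    def reset(self, *, seed=None, options=None):
        return 0, {}
```

Here `reset_accepts(OldStyleEnv())` reports `False` for both names, while `reset_accepts(NewStyleEnv())` reports `True` for both.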

@sven1977 sven1977 deleted the gym_0_26_support branch June 2, 2023 20:19