
[RLlib] New API stack EnvRunners were not passing EnvContext to environments (but plain config dicts). #43818

Merged

Conversation

@sven1977 (Contributor) commented Mar 8, 2024

Fix: New API stack EnvRunners were not passing EnvContext to environments (but plain config dicts).

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <svenmika1977@gmail.com>
@simonsays1980 (Collaborator) left a comment:

LGTM. Some general questions about our design for creating environments.

    if not self.config.is_multi_agent():
        raise ValueError(
            f"Cannot use this EnvRunner class ({type(self).__name__}), if your "
            "setup is nt multi-agent! Try adding multi-agent information to your "
@simonsays1980 (Collaborator):
Tiny typo: "nt -> not"

        )

    # Perform actual gym.make call.
    self.env = gym.make("rllib-multi-agent-env-v0")
@simonsays1980 (Collaborator):
Btw., gym.make has an autoreset flag with which we can switch off the auto-resetting at the end of an episode if needed.
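What auto-resetting means in practice can be sketched with a toy environment (illustrative only: `ToyEnv` is hypothetical, and gymnasium's real `step()` returns five values, not three):

```python
class ToyEnv:
    """Toy env with episodes of fixed length 3; demonstrates auto- vs. manual reset."""

    def __init__(self, autoreset=False):
        self.autoreset = autoreset
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # observation == current timestep

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3
        obs = self.t
        if terminated and self.autoreset:
            # Auto-reset: return the first observation of the NEXT episode.
            obs = self.reset()
        return obs, 0.0, terminated


env = ToyEnv(autoreset=True)
env.reset()
for _ in range(3):
    obs, _, done = env.step(0)
print(obs, done)  # -> 0 True (episode ended, obs already from the new episode)
```

With `autoreset=False`, the caller would instead see the terminal observation (`3`) and must call `reset()` manually, which is the behavior the EnvRunner's stepping loop relies on.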

@simonsays1980 (Collaborator):

gym.register as well
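The register-then-make pattern being discussed can be sketched without gymnasium at all (the registry and `DummyEnv` below are illustrative, not gymnasium's or RLlib's actual implementation): `register()` stores an entry point plus default kwargs such as `autoreset`, and `make()` forwards them to the environment constructor.

```python
# Illustrative sketch of a gym-style registry.
_registry = {}


def register(env_id, entry_point, **default_kwargs):
    """Store an env factory together with default construction kwargs."""
    _registry[env_id] = (entry_point, default_kwargs)


def make(env_id, **overrides):
    """Instantiate a registered env; call-time kwargs override registered ones."""
    entry_point, defaults = _registry[env_id]
    return entry_point(**{**defaults, **overrides})


class DummyEnv:
    def __init__(self, autoreset=True):
        self.autoreset = autoreset


register("rllib-multi-agent-env-v0", DummyEnv, autoreset=False)
env = make("rllib-multi-agent-env-v0")
print(env.autoreset)  # -> False
```

This is why registering with `autoreset=False` would, in principle, reach the env constructor; the thread below explores why the vectorized path ignores it anyway.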

@sven1977 (Contributor, author):

Ah, ok, maybe that's useful for the callback we fixed yesterday. For multi-agent this doesn't matter right now b/c we are only using a single gym.Env here (no gym.vector.Env).
Would this disabling of auto-reset work for gym.vector.Env (for our SingleAgentEnvRunners)? I couldn't find anything in the gymnasium docs about this, only that it's always on, with no mention of an off-switch.

@sven1977 (Contributor, author):

Yeah, gym.vector.make() does not have this option; it always auto-resets.
And gym.vector.register() does not exist :(

@simonsays1980 (Collaborator):

Can't we use gym.register() with the vector_entry_point and autoreset?
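The idea behind a vector entry point can be sketched with a minimal registry (illustrative only; gymnasium's real `register()` signature differs across versions): the spec carries both a scalar and a vector entry point, and the vectorized implementation owns its own reset behavior.

```python
# Sketch: env specs with separate scalar and vector entry points.
_specs = {}


def register(env_id, entry_point=None, vector_entry_point=None):
    _specs[env_id] = {"entry": entry_point, "vector_entry": vector_entry_point}


def make(env_id):
    return _specs[env_id]["entry"]()


def make_vec(env_id, num_envs=1):
    spec = _specs[env_id]
    if spec["vector_entry"] is not None:
        # The vectorized implementation decides how (auto-)resets work.
        return spec["vector_entry"](num_envs)
    # Fallback: batch num_envs independent scalar envs.
    return [spec["entry"]() for _ in range(num_envs)]


class ScalarEnv:
    pass


class MyVectorEnv:
    def __init__(self, num_envs):
        self.num_envs = num_envs


register("demo-v0", entry_point=ScalarEnv, vector_entry_point=MyVectorEnv)
vec = make_vec("demo-v0", num_envs=4)
print(vec.num_envs)  # -> 4
```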

@sven1977 (Contributor, author):

That could work! Lemme check ...

@sven1977 (Contributor, author):

Doesn't work. I think the vector env overrides this setting again somehow :(

import gymnasium as gym
from functools import partial

from ray.rllib.env.utils import _gym_env_creator

entry_point = partial(
    _gym_env_creator,
    env_context={},
    env_descriptor="CartPole-v1",
)
gym.register(
    "rllib-single-agent-env-runner-v0",
    entry_point=entry_point,
    autoreset=False,
)

# Wrap into `VectorListInfo` wrapper to get infos as lists.
env = gym.wrappers.VectorListInfo(
    gym.vector.make(
        "rllib-single-agent-env-runner-v0",
        num_envs=2,
        asynchronous=False,
    )
)

obs, infos = env.reset()

while True:
    actions = env.action_space.sample()
    obs, rewards, terminateds, truncateds, infos = env.step(actions)

    for i in range(2):
        # Manually reset sub-env.
        if terminateds[i] or truncateds[i]:
            # If we continued stepping here, we should see a warning:
            # "You are calling 'step()' even though this environment has
            # already ...", but we don't, so the vector env DOES seem to
            # auto-reset, no matter what :(
            print(f"sub-env {i} done -> needs to be reset?")

@simonsays1980 (Collaborator):

I checked the source code. In SyncVectorEnv and AsyncVectorEnv the autoreset flag is set to False explicitly. No chance then.

"""Creates a MultiAgentEnv (is-a gymnasium env)."""
env_ctx = self.config.env_config
if not isinstance(env_ctx, EnvContext):
env_ctx = EnvContext(
@simonsays1980 (Collaborator):
Do we still need the EnvContext to start the gymnasium environments? gymnasium uses plain kwargs.

@sven1977 (Contributor, author):

Gym Envs don't need it, but RLlib users are so used to having these properties in the env config dict that it would break things for a lot of them:

    config.worker_index = ...
    config.num_workers = ...
    config["some_setting_for_env_ctor"] = ...

@simonsays1980 (Collaborator):

I see. It is sometimes a bit unintuitive to use a dictionary instead of passing variables directly, like in gymnasium.

@sven1977 (Contributor, author):

Yeah, I agree. It's historical and I don't want to break this API right now. We should discuss this in more detail. For now I just wanted to make the new stack behave like the old one in this regard.

@simonsays1980 (Collaborator):

Understandable.

Signed-off-by: Sven Mika <sven@anyscale.io>
@sven1977 sven1977 merged commit c47a374 into ray-project:master Mar 8, 2024
8 of 9 checks passed
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024