
[RLlib] Fix env rendering and recording options (for non-local mode; >0 workers; +evaluation-workers). #14796

Merged Mar 23, 2021 (15 commits)

Conversation

@sven1977 (Contributor) commented Mar 19, 2021

This PR addresses the following problem:
RLlib's env rendering and video recording options are currently buggy.

  • Allows custom gym.Envs to be rendered in an automatic window by simply returning an np.array RGB image from the render() method (see the sketch after this list).
  • Alternatively, custom Envs can take care of their own rendering mechanism via their own window handling.
  • Fixes video recording for non-local mode and num_workers > 0.
  • Adds an example script that shows how to use both options in a simple corridor env.
  • Soft-obsoletes the "monitor" config option for clarity.
  • Also works for evaluation-only rendering/recording (via the evaluation config, as shown in the new example script and in the config sketch below).
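
For illustration, here is a minimal sketch of such a render()-returning-an-RGB-array env. This is a hypothetical toy env (not the corridor env from the example script), written against the pre-0.26 gym API used at the time:

import gym
import numpy as np

class RenderableCorridor(gym.Env):
    """Toy env whose render() returns an RGB frame as an np.array.

    Returning an HxWx3 uint8 array from render() is what lets RLlib
    pop up an automatic window (or record the frames). An env that
    manages its own window can instead render internally.
    """

    def __init__(self, config=None):
        self.observation_space = gym.spaces.Discrete(10)
        self.action_space = gym.spaces.Discrete(2)
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves one step right; reaching the end terminates.
        self.pos = min(self.pos + int(action), 9)
        done = self.pos == 9
        return self.pos, 1.0 if done else 0.0, done, {}

    def render(self, mode="rgb_array"):
        # A red bar that grows as the agent walks down the corridor.
        frame = np.zeros((64, 64, 3), dtype=np.uint8)
        frame[:, : 6 * (self.pos + 1), 0] = 255
        return frame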

IMPORTANT NOTE:
A recent bug in OpenAI gym prevents RLlib's "record_env" option from recording videos properly: the produced mp4 files are only 1kb in size and corrupted. A simple fix is described here:
openai/gym#1925
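
For completeness, a minimal config sketch of how the two options fit together for evaluation-only recording. This is illustrative, not the actual example script from this PR; the algorithm and env names ("PPO", "CartPole-v0") are placeholders:

from ray import tune

tune.run(
    "PPO",
    stop={"training_iteration": 2},
    config={
        "env": "CartPole-v0",
        "num_workers": 1,
        # Record evaluation episodes as mp4 files into the "videos"
        # directory (subject to the gym bug mentioned above).
        "evaluation_interval": 1,
        "evaluation_num_workers": 1,
        "evaluation_config": {
            "record_env": "videos",
            "render_env": True,
        },
    },
)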

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 added the tests-ok label on Mar 22, 2021
@rfali (Contributor) commented Jun 1, 2021

I was trying to use the rendering and recording options with PettingZoo environments, but only rendering works (and the pygame window crashes at the end of the episode); the recorder doesn't record anything at all (no videos folder is created either). Has this patch been verified to work with custom multi-agent envs?

I installed the nightly wheels from here and also upgraded gym to the latest version:
ray: 2.0.0.dev0
gym: 0.18.3
pettingzoo: 1.8.2

First, I verified that rllib/examples/env_rendering_and_recording.py works: it renders and saves the videos, and prints a helpful message with the path where each recorded video is saved.

I then tried two PettingZoo environments (waterworld and space_invaders); both rendered, but the pygame window crashes. If rendering is set to False, training completes, but there is no videos folder nor any message that videos are being saved. Here is the code I tried, adapted from one of the RLlib examples, rllib/examples/multi_agent_parameter_sharing.py:

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.sisl import waterworld_v3

if __name__ == "__main__":

    def env_creator(args):
        return PettingZooEnv(waterworld_v3.env())

    env = env_creator({})
    register_env("waterworld", env_creator)

    obs_space = env.observation_space
    act_space = env.action_space

    policies = {"shared_policy": (None, obs_space, act_space, {})}

    # for all methods
    policy_ids = list(policies.keys())

    tune.run(
        "APEX_DDPG",
        stop={"episodes_total": 10},
        checkpoint_freq=10,
        local_dir="my_results",
        config={

            # Environment specific
            "env": "waterworld",

            # General
            "num_gpus": 1,
            "num_workers": 2,
            "num_envs_per_worker": 8,
            "learning_starts": 1000,
            "buffer_size": int(1e5),
            "compress_observations": True,
            "rollout_fragment_length": 20,
            "train_batch_size": 512,
            "gamma": .99,
            "n_step": 3,
            "lr": .0001,
            "prioritized_replay_alpha": 0.5,
            "final_prioritized_replay_beta": 1.0,
            "target_network_update_freq": 50000,
            "timesteps_per_iteration": 25000,

            # Method specific
            "multiagent": {
                "policies": policies,
                "policy_mapping_fn": (lambda agent_id: "shared_policy"),
            },
            "evaluation_interval": 1,
            "evaluation_num_episodes": 2,
            "evaluation_num_workers": 1,
            "evaluation_config": {
                "record_env": "videos",
                "render_env": False,
            },
        },
    )

This one uses the space_invaders game, and I also moved the render and record options out of the evaluation config, but the outcome did not change.

from ray import tune
from ray.tune.registry import register_env
from ray.rllib.env.wrappers.pettingzoo_env import PettingZooEnv
from pettingzoo.atari import space_invaders_v1

if __name__ == "__main__":

    def env_creator(args):
        return PettingZooEnv(space_invaders_v1.env())

    env = env_creator({})
    register_env("space_invaders", env_creator)

    obs_space = env.observation_space
    act_space = env.action_space

    policies = {"shared_policy": (None, obs_space, act_space, {})}

    # for all methods
    policy_ids = list(policies.keys())

    tune.run(
        "PPO",
        stop={"episodes_total": 10},
        checkpoint_freq=10,
        local_dir="my_results",
        config={
            # Environment specific
            "env": "space_invaders",

            # General
            "num_gpus": 1,
            "num_workers": 1,
            "num_envs_per_worker": 2,
            "record_env": "videos",
            "render_env": False,
        
        },
    )

Please let me know if I should open this as a separate issue. Thanks
