
Proposal: update rllib and pettingzoo examples to current release versions #113

Open
elliottower opened this issue Mar 1, 2023 · 33 comments

@elliottower
Contributor

I'm planning to do some testing with the newly released ray 2.3.0, and I noticed the pettingzoo example is 9 months old and still uses gym rather than gymnasium; both should probably be updated. I'll create a PR if I can get them working, but figured I'd create an issue first. I appreciate the repo and am very excited to use it.

@jagapiou
Member

jagapiou commented Mar 1, 2023

That sounds very useful, thank you! I don't know the Ray stuff that well, so @duenez is best placed to comment on any details.

@jagapiou jagapiou assigned jagapiou and duenez and unassigned jagapiou Mar 1, 2023
@duenez
Collaborator

duenez commented Mar 1, 2023

Hello Elliot,

That would be fantastic! Please let me know if there's anything I can do to help.

@elliottower
Contributor Author

elliottower commented Mar 2, 2023

Hello Elliot,

That would be fantastic! Please let me know if there's anything I can do to help.

Appreciate it, will reach out if I run into any issues. I made a good bit of progress today, although a major blocker to updating the entire pettingzoo tutorial is that stable-baselines3 doesn't officially support gymnasium yet (pettingzoo has been migrated since October 2022). There are some open pull requests which may get merged in the future (stable-baselines3, stable-baselines3-contrib), but the only way to get it working currently is manually installing the feature branch: pip install git+https://github.com/DLR-RM/stable-baselines3@feat/gymnasium-support. I can do this in the meantime for development, but I imagine merging the tutorial should wait on an official release.

Edit: looks like there's an API incompatibility with pettingzoo and the sb3 gymnasium support feature branch (link) so the pettingzoo tutorial will have to wait until that is fixed.

I got most of the pettingzoo code adapted, though. I chose to use SB3's frame stacking functionality rather than supersuit, because the latter required casting types to float, stacking frames, and then casting back to uint8 for SB3. I've also heard that supersuit will be wound down and its functionality migrated directly into gymnasium and pettingzoo, so I figure it's best not to rely on it too much if possible.

My changes so far: elliottower@64b38e9

Luckily ray/rllib 2.3.0 does have gymnasium support so that shouldn't be a blocker, planning to work on that tomorrow.

@elliottower
Contributor Author

elliottower commented Mar 2, 2023

One minor issue I had with this repo was running the pylint script: I didn't find any documentation on it, and for some reason running pylint --rcfile .pylintrc didn't work.

I'm currently running the pytest suite, but it seems to be going very slowly and says it has 4787 items. Edit: it took 34 minutes 30 seconds on a 2019 Intel MacBook Pro. The .sh install scripts worked great, though, and the tests there ran pretty quickly (less than 5 minutes, iirc).

@duenez
Collaborator

duenez commented Mar 2, 2023

Yes, some of our tests are very slow (the full substrate tests). I believe we don't run all the tests in the install.sh script.

@elliottower
Contributor Author

elliottower commented Mar 3, 2023

Is there a reason the pettingzoo env is defined with a separate superclass _ParallelEnv? I ran into some issues when trying to use wrapper functions from pettingzoo/gymnasium/SB3, because the base class you get from env.unwrapped is _ParallelEnv.

Moving the EzPickle init line into the regular class (which is of the correct type, a parallel pettingzoo env) led to an error saying it can't load the lab2d object from pickle. I don't have experience with lab2d or the EzPickle function; I imagine the specific definition of this ParallelEnv with the two superclasses (one for the env, one for EzPickle) is what allows it to load properly?

Edit: never mind, I got the EzPickle component to work by restructuring the code and using wrappers. I'll post a PR once I'm finished.

@elliottower
Contributor Author

Any idea on how to deal with these warnings? I'm getting them when running the pettingzoo API tests; I'm thinking they may be corner cases that the pettingzoo wrapper doesn't check for. @jagapiou @duenez
W @/Users/elliottower/Documents/GitHub/meltingpot//meltingpot/lua/modules/prefab_utils.lua:83] Character x not found in the charPrefabMap. Ignoring.

W @/Users/elliottower/Documents/GitHub/meltingpot//meltingpot/lua/modules/base_simulation.lua:225] GameObject 'scene' did not have a Transform component, but explicitly specifying one is strongly preferred. Using a default.

W @/Users/elliottower/Documents/GitHub/meltingpot//meltingpot/lua/modules/base_simulation.lua:219] GameObject 'scene' did not have a StateManager component, but explicitly specifying one is strongly preferred. Using a default.

Also, the PR is up: #117. I'm running final tests and will then make a PR on shimmy with melting pot compatibility. I think using shimmy is best, as it has comprehensive tests and is used for other API conversions, so I can update this PR to use shimmy once that gets added (as opposed to examples/pettingzoo/utils.py).

@duenez
Collaborator

duenez commented Mar 4, 2023 via email

@elliottower
Contributor Author

Will get the list of substrates later today when I rerun the tests.

@duenez Do you happen to know if it's possible to do rendering using PIL or something other than matplotlib? I'm not sure exactly how the rendering works internally, but on the shimmy PR they raised concerns about not wanting to import matplotlib (most Farama code uses pygame and/or PIL afaik; I think it might be a speed concern?).

And fwiw, the pickle code is still used in other Farama games like Atari, and their devs said it looks fine.

@duenez
Collaborator

duenez commented Mar 7, 2023

Yes, of course. The observation is produced as a raw numpy array of height × width × channels (3, RGB), so you can render this to anything you want. One possibility is imshow in matplotlib, but using PIL is equally possible.
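For instance, a minimal PIL sketch, using a synthetic array in place of a real substrate observation:

```python
import numpy as np
from PIL import Image

# Stand-in observation: substrates emit uint8 RGB arrays shaped
# (height, width, 3); here we fabricate one rather than build a substrate.
rgb_frame = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# PIL route, no matplotlib needed: wrap the array and upscale for viewing.
image = Image.fromarray(rgb_frame, mode="RGB")
image = image.resize((256, 256), resample=Image.NEAREST)
# image.show()  # or image.save("frame.png")
```

NEAREST resampling keeps the blocky pixel look of the raw observation instead of smoothing it.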

@elliottower
Contributor Author

@duenez thanks, I ended up searching through the code more and found an example using pygame rendering, which is what all other pettingzoo envs use, and mirrored that functionality in the shimmy wrapper.

@elliottower
Contributor Author

@duenez is it possible to reset the environment using a specific seed rather than re-initializing a new environment with a specified seed? The new gymnasium API requires that the reset() function take a seed argument; I've been trying to adapt the code, but it doesn't seem possible without initializing the environment again, which I've also had trouble figuring out how to do.

The only Python file I see using seeding is builder_test.py, but all of the example scripts and documentation seem to indicate that the higher-level substrate.build() or substrate.build_from_config() are preferred to manually calling builder.builder.

I think the correct code should be something like this, but it says that the config is missing the key simulation.

config = meltingpot.python.substrate.get_config(substrate_name)
env = builder.build(config, env_seed=seed)

I've been having a lot of difficulty understanding the inner workings of this code and whether it's even possible to do what I want. Maybe it's best to change the way the reset() function works internally to allow a seed to be specified?

@duenez
Collaborator

duenez commented Mar 8, 2023

You need to get the factory:
https://github.com/deepmind/meltingpot/blob/7de41d2db0e5eca31107312d405e20ff3a7da39e/meltingpot/python/substrate.py#L70

config = meltingpot.python.substrate.get_config(substrate_name)
factory = meltingpot.python.substrate.get_factory_from_config(config)
env = factory.build(config.default_player_roles)

If you need to explicitly set the seed, we currently don't expose it in the builder through the canonical way to build substrates. The reason we need a factory is because now the config doesn't need to know the number of players ahead of time. This means that we must have an intermediate step when we have the player roles to know what we need to build.

If this is critical, we can think about piping a seed through. What do you think?

@elliottower
Contributor Author

elliottower commented Mar 8, 2023

You need to get the factory:

https://github.com/deepmind/meltingpot/blob/7de41d2db0e5eca31107312d405e20ff3a7da39e/meltingpot/python/substrate.py#L70

config = meltingpot.python.substrate.get_config(substrate_name)
factory = meltingpot.python.substrate.get_factory_from_config(config)
env = factory.build(config.default_player_roles)

If you need to explicitly set the seed, we currently don't expose it in the builder through the canonical way to build substrates. The reason we need a factory is because now the config doesn't need to know the number of players ahead of time. This means that we must have an intermediate step when we have the player roles to know what we need to build.

If this is critical, we can think about piping a seed through. What do you think?

Thanks for the quick reply; that makes sense about the number of players. So I take it using the factory doesn't allow seeding either? It's a core piece of the current gymnasium and pettingzoo APIs, so if it's possible to make the reset method take a seed argument, that would be awesome.

In gymnasium/pettingzoo you have to call reset() on the env before using it and set the seed that way, but if your code doesn't do that, maybe it's better to pipe the seed through the factory build method, and I can make the reset method rebuild the substrate using that.

We are also planning to make a shimmy wrapper for lab2d and gymnasium, do you happen to know if that would run into the same issue?
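The rebuild-on-reset approach described above can be sketched generically; build_fn below is a hypothetical stand-in for whatever factory call constructs the substrate, not a real Melting Pot API:

```python
import random
from typing import Any, Callable, Optional


class RebuildOnResetWrapper:
    """Sketch of a gymnasium-style reset(seed=...) over a build-only API.

    build_fn is a hypothetical factory callable (not a real Melting Pot
    API) that returns a freshly built environment for a given seed.
    """

    def __init__(self, build_fn: Callable[[int], Any]):
        self._build_fn = build_fn
        self._env = None

    def reset(self, seed: Optional[int] = None) -> Any:
        if seed is None:
            # No seed requested: draw one, keeping behaviour stochastic.
            seed = random.randint(1, 2**31 - 1)
        # Rebuild from scratch, since creation-time randomness (spawn
        # points etc.) is decided when the substrate is built.
        self._env = self._build_fn(seed)
        return self._env


# Usage with a dummy builder: the same seed yields the same "environment".
env = RebuildOnResetWrapper(build_fn=lambda s: {"seed": s})
a = env.reset(seed=42)
b = env.reset(seed=42)
```

The cost is a full rebuild on every seeded reset, but it is the only way to honour the seed if creation-time randomness cannot be replayed.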

@duenez
Collaborator

duenez commented Mar 8, 2023

It depends on the shimmy API. We usually don't have a way to reset substrates without rebuilding the whole thing anyway, because some stochastic choices occur at the creation stage. This is why we have a reset wrapper that does this: https://github.com/deepmind/meltingpot/blob/7de41d2db0e5eca31107312d405e20ff3a7da39e/meltingpot/python/utils/substrates/wrappers/reset_wrapper.py

@elliottower
Contributor Author

It depends on the shimmy API. We usually don't have a way to reset substrates without rebuilding the whole thing anyway, because some stochastic choices occur at the creation stage. This is why we have a reset wrapper that does this: https://github.com/deepmind/meltingpot/blob/7de41d2db0e5eca31107312d405e20ff3a7da39e/meltingpot/python/utils/substrates/wrappers/reset_wrapper.py

Makes sense. I did some more testing and saw that the underlying dm_env code defines _rng to be a certain seed, so I tried setting env._rng in the shimmy wrapper, but then noticed what you mentioned about the stochastic environment creation. So is it possible to modify the reset wrapper to take in a seed? If you could help out with adding support for seeding, it would be greatly appreciated.

I could look into it myself as well (and potentially make a PR), but I'd need a better idea of where to look and which files to modify. It might be simpler for someone more familiar with the repo, though.

@duenez
Collaborator

duenez commented Mar 8, 2023 via email

@elliottower
Contributor Author

Appreciate it

@jagapiou
Member

jagapiou commented Mar 9, 2023

Fundamentally, there's an API incompatibility here: a gym reset(seed) cannot be forwarded to a dm_env reset(). So you'll have to rebuild the Substrate/Scenario on reset, as you say.

FYI, right now you can change the Substrate seed at training time. [EDIT: this was true in v1; it is not true in v2.]

But that won't help with Scenarios, and they'd need a seed to also be passed to the background bots when building them.

There may also be other sources of randomness (e.g. calls to random in Wrappers) or other sources of non-determinism (e.g. Python set iteration order varies per process) that might make things hard. So we'd need a bunch of determinism tests.

Finally, rather than adding an optional seed to all build methods, I think it might be better to add a seed (optional, so we're still dm_env compatible) to dmlab2d.Lab2d.reset (https://github.com/deepmind/lab2d/blob/main/dmlab2d/__init__.py), since this is where the seed is actually randomized and set. Our Substrate.reset method can then inherit and use that (and I think we can kill our ResetWrapper).

So it's not a small job. Alternatively, is there a quick hack?

  1. Ignore the seed argument and just let the environment be stochastic (trigger a warning or whatever).
  2. If reset(seed) accepts a default or a "randomize seed" sentinel, raise a NotImplementedError if a fixed seed has been requested.
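A minimal sketch of what those two hacks could look like, assuming a hypothetical wrapper class (QuickHackEnv and _internal_reset are illustrative names, not real Melting Pot APIs):

```python
import warnings
from typing import Optional


class QuickHackEnv:
    """Sketch of the two quick hacks for a reset(seed=...) signature over
    an environment that cannot honour fixed seeds."""

    def __init__(self, strict: bool = True):
        # strict=False -> option 1: warn and ignore the seed.
        # strict=True  -> option 2: reject fixed seeds outright.
        self._strict = strict

    def reset(self, seed: Optional[int] = None):
        if seed is not None:
            if self._strict:
                raise NotImplementedError(
                    "Fixed seeds are unsupported; the substrate is "
                    "re-randomized on every reset.")
            warnings.warn("reset(seed=...) ignored; env stays stochastic.")
        return self._internal_reset()

    def _internal_reset(self):
        # Stand-in for the underlying dm_env-style reset.
        return "timestep"  # placeholder for a real dm_env TimeStep
```

Option 1 silently breaks reproducibility expectations, so the warning matters; option 2 fails loudly, which is friendlier to downstream API-compliance tests.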

@jagapiou
Member

jagapiou commented May 4, 2023

Sorry, an update here: it's not currently possible to change the substrate seed. builder takes an env_seed that allows this, but it is not currently wired in.

@dimonenka
Contributor

I have locally updated the rllib example to use ray==2.5.1 instead of 2.0.0, should I submit a pull request?

@jagapiou
Member

Amazing, yes please!

@elliottower
Contributor Author

elliottower commented Jul 24, 2023

Just fwiw, when I get a chance I'm planning to do a PR updating the pettingzoo example to use shimmy and current pettingzoo. I've created official SB3 tutorials for pettingzoo in the most recent release, so it would be cool to show this on our docs site as another example using SB3.

Also, once RLlib merges my PR (they have some other blockers, but it should be soon), RLlib will be fully compatible with current gymnasium/pettingzoo, so users will be able to use RLlib through pettingzoo if they want, as well as directly like in these tutorials.

We have a full CleanRL training script for Atari games that I imagine could be similarly adapted to work on melting pot using a pettingzoo wrapper. I won't have time to fully adapt it, but when I at least update the pettingzoo tutorial I can mention something along the lines of "this enables you to use any sort of training framework, as shown in our tutorials" (we even have one using LangChain, though that's only suitable for games like chess; I'd imagine LLMs would do poorly on visual input like this).

@duenez
Collaborator

duenez commented Jul 24, 2023

Fantastic, let me know if there's anything on our side that's needed.

@jagapiou
Member

FYI: @dimonenka's PR updating rllib: #153

@itstyren

Hi, I found a minor typo in the file "example/pettingzoo/sb3_train.py" on line 86.

The environment name should be commons_harvest__open instead of commons_harvest_open. Could you please take a look? Thanks.

@elliottower
Contributor Author

Hey all, I've been planning it for a while, but I finally have time to make a PR updating this tutorial now that PettingZoo has official SB3 tutorials. I was also thinking about other possible tutorial options, such as AgileRL, which now supports PettingZoo directly.

It would also be nice to demonstrate the Shimmy conversion wrapper I made, which was based on this example but is more fully featured: it's tested against every underlying environment in our CI, supports serialization via pickle, etc. Maybe @duenez has thoughts on the best way to proceed?

@benbind

benbind commented May 17, 2024

It looks like I'm late to the party! Is it now possible to make commons_harvest__open deterministic? The "wired in" hyperlink seemed like it was useful, but it looks like that page has been taken down at this point.

@jzleibo
Collaborator

jzleibo commented May 17, 2024

It should be possible to make the substrates deterministic. You can see how the seed is set here:

if env_seed is None:
  # Select a long seed different than zero.
  env_seed = random.randint(1, _MAX_SEED)
env_seeds = (seed % (_MAX_SEED + 1) for seed in itertools.count(env_seed))

def build_environment():
  seed = next(env_seeds)
  lab2d_settings_dict["env_seed"] = str(seed)  # Sets the Lua seed.
  env_raw = dmlab2d.Lab2d(_DMLAB2D_ROOT, lab2d_settings_dict)
  observation_names = env_raw.observation_names()
  return dmlab2d.Environment(
      env=env_raw,
      observation_names=observation_names,
      seed=seed)

It should work if you just change that logic to pass a fixed seed.
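The determinism claim can be illustrated with a stdlib-only mimic of the seed generator above (no dmlab2d involved; the _MAX_SEED bound here is an assumption mirroring the snippet's constant):

```python
import itertools

_MAX_SEED = 2**31 - 1  # assumed bound; stands in for the module's _MAX_SEED


def make_seed_sequence(env_seed):
    # Same scheme as the quoted code: consecutive integers starting at
    # env_seed, wrapped into the valid seed range.
    return (seed % (_MAX_SEED + 1) for seed in itertools.count(env_seed))


# With env_seed fixed instead of drawn from random.randint, two separate
# runs draw identical seed sequences, so every build_environment() call
# in one run matches the corresponding call in the other.
gen_a = make_seed_sequence(42)
gen_b = make_seed_sequence(42)
run_a = [next(gen_a) for _ in range(3)]
run_b = [next(gen_b) for _ in range(3)]
# run_a == run_b == [42, 43, 44]
```

Note that each successive environment build still gets a different seed (42, 43, 44, ...); fixing env_seed only makes the sequence reproducible across runs, not identical across resets.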

@benbind

benbind commented May 17, 2024

Even with overwriting a seed, the agents spawn in different areas:
[screenshots: two runs showing agents spawning in different positions]

@duenez
Collaborator

duenez commented May 17, 2024

You might need to use a custom builder: the raw builder in builder.py (https://github.com/google-deepmind/meltingpot/blob/main/meltingpot/utils/substrates/builder.py) does allow a seed, and it does lead to deterministic results for me. But the normal substrate.build() does not, by design: https://github.com/google-deepmind/meltingpot/blob/main/meltingpot/substrate.py

@duenez
Collaborator

duenez commented May 17, 2024

Actually, that's where you were doing this. Don't use the zero seed, because that means to resample a seed; use a fixed, non-zero one (like 42 :) ). If that fails, try setting your Python random seed too, as some objects are built on the Python side, though I don't think avatars would be.

@benbind

benbind commented May 17, 2024

Still no dice:
[screenshot: agents still spawn in different positions]
In builder.py I'm directly setting seed = 42, as well as resetting random.seed(42) each time I call random.random().
