[RLlib] Re-instantiate `on_episode_created` callback for multi-agent (for now). #43779
Conversation
LGTM. An important detail about the new API: there is also no effort planned by Farama to change this behavior. It might work with the asynchronous vector environment.
"sub-environments when they are terminated. Override the "
"`on_episode_start` method instead, which gets fired right after "
"the `env.reset()` call."
"When using the new API stack in single-agent and with EnvRunners, "
Yeah, that's a problem I encountered when evaluating against benchmarks, i.e. running a benchmark against the policy, which means the environment needs to be reconfigured for a benchmark run. The solution is to run episodes and, after x episodes, switch the configuration and call the sampler again.
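A minimal sketch of the workaround described above, using toy stand-in classes (none of these names are RLlib APIs): run a fixed number of episodes per benchmark configuration, reconfigure the environment in between, then call the sampler again.

```python
# Toy stand-ins (NOT RLlib code) for the "switch config every x episodes" pattern.

class ToyBenchmarkEnv:
    """Hypothetical env that can be reconfigured between benchmark runs."""
    def __init__(self):
        self.cfg = None

    def reconfigure(self, cfg):
        self.cfg = cfg

def sample_episode(env):
    # Stand-in for "call the sampler again": just record the active config.
    return env.cfg

def evaluate(env, configs, episodes_per_config=2):
    results = []
    for cfg in configs:
        env.reconfigure(cfg)  # switch the benchmark config before the next batch
        results.extend(sample_episode(env) for _ in range(episodes_per_config))
    return results

env = ToyBenchmarkEnv()
print(evaluate(env, ["easy", "hard"]))  # ['easy', 'easy', 'hard', 'hard']
```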
Makes sense. Some users need this opportunity to send config information to the env before(!) it is reset, but after(!) an episode has ended.
We'll have to figure out a way around this for single-agent now, using vector Env. :|
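To illustrate the problem discussed above, here is a pure-Python sketch (toy classes, not Gymnasium or RLlib code) of `gym.vector.Env`-style auto-resetting: the vector wrapper resets a finished sub-env inside `step()`, so the caller never gets a hook between the episode ending and the next reset.

```python
# Toy sketch of vector-env auto-reset; all class names are made up.

class ToyEnv:
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # first observation of a new episode

    def step(self, action):
        self.t += 1
        done = self.t >= 3  # toy episodes last exactly 3 steps
        return self.t, done

class AutoResetVector:
    """Resets finished sub-envs under the hood, like gym.vector.Env does."""
    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [e.reset() for e in self.envs]

    def step(self, actions):
        results = []
        for env, a in zip(self.envs, actions):
            obs, done = env.step(a)
            if done:
                # Auto-reset happens HERE, hidden from the caller: there is
                # no chance to fire an `on_episode_created`-style callback
                # between episode end and the next reset.
                obs = env.reset()
            results.append((obs, done))
        return results

vec = AutoResetVector([ToyEnv(), ToyEnv()])
vec.reset()
vec.step([0, 0])
vec.step([0, 0])
out = vec.step([0, 0])  # episodes end; obs already belongs to the NEW episode
print(out)  # [(0, True), (0, True)]
```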
@@ -350,6 +350,8 @@ def _sample_timesteps(

# Create a new episode instance.
self._episode = self._new_episode()
self._make_on_episode_callback("on_episode_created", self._episode)
We could even write `self._make_on_episode_callback("on_episode_created")`, as `self._episode` is used anyways.
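A hypothetical sketch of that suggestion (names loosely follow the diff above; this is not the actual RLlib implementation): let the helper fall back to the runner's currently tracked `self._episode` when no episode argument is passed.

```python
# Hypothetical sketch only -- not RLlib's real EnvRunner.

class EnvRunnerSketch:
    def __init__(self):
        self._episode = None
        self.fired = []

    def _new_episode(self):
        return {"id": len(self.fired)}  # toy episode object

    def _make_on_episode_callback(self, which, episode=None):
        # Default to the currently tracked episode, as suggested above.
        episode = self._episode if episode is None else episode
        self.fired.append((which, episode))

runner = EnvRunnerSketch()
runner._episode = runner._new_episode()
runner._make_on_episode_callback("on_episode_created")  # no episode arg needed
print(runner.fired)  # [('on_episode_created', {'id': 0})]
```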
fixed (in next PR :) )
Re-instantiate `on_episode_created` callback for multi-agent (for now).

The callback `on_episode_created` is currently not available on the new stack due to the usage of `gym.vector.Env`, which automatically resets sub-environments under the hood (so there is no chance for RLlib to get in between an episode ending and the reset of the next one).

Solution:
Single-agent: Find a setting in gym to disable this behavior.
Multi-agent: Same as above, BUT also, RLlib currently does NOT even use `gym.vector.Env` for multi-agent, so here we might, for the time being, simply re-enable this callback.

Why are these changes needed?
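The callback ordering this PR restores for multi-agent can be sketched with toy classes (assumed names, not RLlib's actual runner code): `on_episode_created` fires before `env.reset()`, so it can push config into the env pre-reset, while `on_episode_start` fires after.

```python
# Toy sketch of the intended callback ordering; no RLlib imports.

events = []

class Env:
    def reset(self):
        events.append("env.reset")

class Callbacks:
    def on_episode_created(self, env):
        events.append("on_episode_created")
        env.difficulty = "hard"  # e.g. send config to the env BEFORE reset

    def on_episode_start(self, env):
        events.append("on_episode_start")

env, cb = Env(), Callbacks()
cb.on_episode_created(env)  # possible for multi-agent: no vector auto-reset
env.reset()
cb.on_episode_start(env)
print(events)  # ['on_episode_created', 'env.reset', 'on_episode_start']
```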
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e. `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I'm adding a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.