[RLlib] agent index is missing from policy input dict on environment reset #37521
Labels
bug
Something that is supposed to be working; but isn't
P1
Issue that should be fixed within a few weeks
rllib
RLlib related issues
rllib-samplingbackend
Issues around the sampling backend of RLlib
What happened + What you expected to happen
If I write a custom policy class that overrides compute_actions_from_input_dict(input_dict, ...), then in older versions of RLlib this input_dict always had the SampleBatch.AGENT_INDEX key set correctly. With the latest version, it is set correctly except on the first timestep after an environment reset. This is because EnvRunnerV2.__process_resetted_obs_for_eval does not add an agent_index key to the input dict, unlike EnvRunnerV2._process_observations, which does add this key. It should be very simple to fix this bug: just add the agent index to the input dict built in EnvRunnerV2.__process_resetted_obs_for_eval, the same way _process_observations does. I would submit a pull request myself, but in the past my PRs have been ignored.
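As a rough illustration of the fix described above (this is a hypothetical sketch, not the actual RLlib source; AGENT_INDEX and OBS here stand in for the SampleBatch column constants): when building the per-policy input dict from freshly reset observations, attach the agent index the same way EnvRunnerV2._process_observations does for regular steps.

```python
# Hypothetical sketch of the fix, not actual RLlib code.
AGENT_INDEX = "agent_index"  # stands in for SampleBatch.AGENT_INDEX
OBS = "obs"                  # stands in for SampleBatch.OBS

def build_reset_input_dict(obs_by_agent):
    """Map each agent's reset observation to a policy input dict,
    including the agent index so custom policies can rely on it
    even on the first timestep after an environment reset."""
    return {
        agent_id: {OBS: obs, AGENT_INDEX: agent_id}
        for agent_id, obs in obs_by_agent.items()
    }
```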
Versions / Dependencies
Ray: 2.5.1
Python: 3.9.16
OS: Ubuntu 22.04.2 LTS
Reproduction script
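The author's original script was not preserved in this copy of the issue. As a stand-in, here is a Ray-free sketch of the failure mode: a toy "runner" that, like the EnvRunnerV2 behavior described above, omits the agent index on the reset step but includes it on later steps, so a policy relying on the key trips on the first call. All names here are illustrative, not RLlib API.

```python
# Illustrative stand-in for the reported bug (no Ray required).
AGENT_INDEX = "agent_index"  # stands in for SampleBatch.AGENT_INDEX

def reset_input_dict(obs, agent_id):
    # Mirrors __process_resetted_obs_for_eval: agent index is missing.
    return {"obs": obs}

def step_input_dict(obs, agent_id):
    # Mirrors _process_observations: agent index is present.
    return {"obs": obs, AGENT_INDEX: agent_id}

def policy_expects_agent_index(input_dict):
    # A custom compute_actions_from_input_dict relying on the key.
    return input_dict[AGENT_INDEX]

try:
    policy_expects_agent_index(reset_input_dict([0.0], agent_id=0))
    failed_on_reset = False
except KeyError:
    failed_on_reset = True  # this is the reported bug

assert failed_on_reset
assert policy_expects_agent_index(step_input_dict([0.0], agent_id=0)) == 0
```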
This gives an error with Ray 2.5.1 but is fixed if I change the code snippet I mentioned above in EnvRunnerV2.
Issue Severity
Medium: It is a significant difficulty but I can work around it.