What is the problem?

Ray version: nightly build
DL package: Torch

The documentation states that the rewards column is never deleted. However, when using PGTorchPolicy and PGTorchTrainer, only the advantages column is available and there is no rewards column. This prevents using offline algorithms, e.g. re-creating the offline datasets example in the docs here.
Reproduction (REQUIRED)
Can be recreated by following the offline datasets example. Generate data with PG:
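A minimal sketch of the data-generation step, assuming the CartPole-v0 environment and the /tmp/cartpole-out output path used in the docs example (not taken from the original report):

```python
import ray
from ray.rllib.agents.pg import PGTrainer

ray.init()

# Write experience batches as JSON files under /tmp/cartpole-out
# (env and output path follow the offline datasets docs example).
trainer = PGTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "output": "/tmp/cartpole-out",
    },
)
for _ in range(10):
    trainer.train()
```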
Looking at the output JSON files from this training run, there is no rewards field. Also, when using the generated data to train DQN, it fails with KeyError: rewards (and KeyError: new_obs).
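A sketch of the failing DQN step, reading the data generated above (same hypothetical path as before):

```python
import ray
from ray.rllib.agents.dqn import DQNTrainer

ray.init()

# Train DQN purely from the previously generated offline data.
trainer = DQNTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "input": "/tmp/cartpole-out",
        # Disable off-policy estimation for simplicity; the logged
        # PG data may lack the action probabilities it needs.
        "input_evaluation": [],
    },
)
trainer.train()  # raises KeyError: rewards on the reported versions
```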
I have verified my script runs in a clean environment and reproduces the issue.
I have verified the issue also occurs with the latest wheels.
dibgerge added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage (eg: priority, bug/not-bug, and owning component)) labels on Jan 23, 2021
sven1977 added the P1 (Issue that should be fixed within a few weeks) and rllib labels, and removed the triage label, on Feb 10, 2021
Workaround for now: set `_use_trajectory_view_api=False` in the trainer config.
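A sketch of applying the workaround, assuming the same PG setup as in the reproduction above:

```python
from ray.rllib.agents.pg import PGTrainer

# Disabling the trajectory view API keeps the full set of sample
# columns (including "rewards") in the written output.
trainer = PGTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "output": "/tmp/cartpole-out",
        "_use_trajectory_view_api": False,
    },
)
```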
Sorry about the delay. I thought this was fixed in an earlier PR, but it wasn't.
This PR here (see above) fixes the issue for both torch and tf. It makes sure that certain "core" keys, such as rewards, dones, and infos, are always available in the loss/output.
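One quick way to verify the fix is to confirm the core columns appear in every written batch. This sketch assumes the JSON output files under /tmp/cartpole-out from the reproduction above; each line RLlib writes is one JSON-encoded sample batch keyed by column name:

```python
import glob
import json

# Every output line should now carry the core columns.
for path in glob.glob("/tmp/cartpole-out/*.json"):
    with open(path) as f:
        for line in f:
            batch = json.loads(line)
            assert "rewards" in batch and "dones" in batch, batch.keys()
```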