What is the problem?

Ray version: nightly build
DL package: Torch

The documentation states that the rewards column is never deleted. However, when using PGTorchPolicy and PGTorchTrainer, only the advantages column is available and there is no rewards column. This prevents using offline algorithms, e.g. re-creating the offline datasets example in the docs here.
Reproduction (REQUIRED)
Can be recreated by following the offline datasets example. Generate data with PG:
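A minimal sketch of the data-generation step, assuming the CartPole-v0 environment and the /tmp/cartpole-out output path used in the docs example (not taken from the original report):

```python
import ray
from ray.rllib.agents.pg import PGTrainer

ray.init()

# Write experience batches as JSON files under /tmp/cartpole-out
# (env and output path follow the offline datasets docs example).
trainer = PGTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "output": "/tmp/cartpole-out",
    },
)
for _ in range(10):
    trainer.train()
```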
Looking at the output JSON files from this training run, there is no rewards field. Also, when using the generated data to train DQN, it fails with KeyError: rewards (and KeyError: new_obs).
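A sketch of the failing DQN step, reading the data generated above (same hypothetical path as before):

```python
import ray
from ray.rllib.agents.dqn import DQNTrainer

ray.init()

# Train DQN purely from the previously generated offline data.
trainer = DQNTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "input": "/tmp/cartpole-out",
        # Disable off-policy estimation for simplicity; the logged
        # PG data may lack the action probabilities it needs.
        "input_evaluation": [],
    },
)
trainer.train()  # raises KeyError: rewards on the reported versions
```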
I have verified my script runs in a clean environment and reproduces the issue.
I have verified the issue also occurs with the latest wheels.
dibgerge added the bug (Something that is supposed to be working; but isn't) and triage (Needs triage (eg: priority, bug/not-bug, and owning component)) labels on Jan 23, 2021
sven1977 added the P1 (Issue that should be fixed within a few weeks) and rllib labels, and removed the triage label, on Feb 10, 2021
Workaround for now: set `_use_trajectory_view_api=False` in the trainer config.
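A sketch of applying the workaround, assuming the same PG setup as in the reproduction above:

```python
from ray.rllib.agents.pg import PGTrainer

# Disabling the trajectory view API keeps the full set of sample
# columns (including "rewards") in the written output.
trainer = PGTrainer(
    env="CartPole-v0",
    config={
        "framework": "torch",
        "output": "/tmp/cartpole-out",
        "_use_trajectory_view_api": False,
    },
)
```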
Sorry about the delay. I thought this was fixed in an earlier PR, but it wasn't.
This PR here (see above) fixes the issue for both torch and tf. It makes sure that certain "core" keys, such as rewards, dones, and infos, are always available in the loss/output.
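One quick way to verify the fix is to confirm the core columns appear in every written batch. This sketch assumes the JSON output files under /tmp/cartpole-out from the reproduction above; each line RLlib writes is one JSON-encoded sample batch keyed by column name:

```python
import glob
import json

# Every output line should now carry the core columns.
for path in glob.glob("/tmp/cartpole-out/*.json"):
    with open(path) as f:
        for line in f:
            batch = json.loads(line)
            assert "rewards" in batch and "dones" in batch, batch.keys()
```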