[rllib] rewards column is being discarded when using PG #13646

Closed
2 tasks done
dibgerge opened this issue Jan 23, 2021 · 2 comments · Fixed by #14036
Labels
bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks)

Comments

@dibgerge
Contributor

What is the problem?

Ray version: nightly build
DL package: Torch

The documentation states that the rewards column is never deleted. However, when using PGTorchPolicy and PGTorchTrainer, only the advantages column is available and there is no rewards column.

This prevents using offline algorithms, for example when re-creating the offline datasets example in the docs.

Reproduction (REQUIRED)

Can be recreated by following the offline datasets example. Generate data with PG:

rllib train \
    --run=PG \
    --env=CartPole-v0 \
    --config='{"output": "/tmp/cartpole-out", "output_max_file_size": 5000000}' \
    --stop='{"timesteps_total": 100000}'

Looking at the output JSON file for this training run, there is no rewards field. Also, when using the generated data to train DQN, it fails with KeyError: rewards (and KeyError: new_obs).
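
One way to confirm the missing fields without starting a DQN run is to scan the output file directly. The following is only a sketch: it assumes the output is written as one JSON object per line with column names as top-level keys, and `missing_columns` is a hypothetical helper, not part of RLlib. The default column list contains only the two keys named in the errors above.

```python
import json

# Columns named in the KeyErrors reported above.
REQUIRED_COLUMNS = ("rewards", "new_obs")


def missing_columns(path, required=REQUIRED_COLUMNS):
    """Return the sorted required columns absent from at least one batch.

    Assumes (not verified against RLlib) that each non-empty line of the
    file is a JSON object whose top-level keys are the batch columns.
    """
    missing = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            batch = json.loads(line)
            missing.update(col for col in required if col not in batch)
    return sorted(missing)


# Hypothetical usage; the actual file name under /tmp/cartpole-out will differ:
# print(missing_columns("/tmp/cartpole-out/<output-file>.json"))
```

An empty result would indicate that every batch in the file carries the columns DQN asked for.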

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
@dibgerge dibgerge added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jan 23, 2021
@sven1977 sven1977 added P1 Issue that should be fixed within a few weeks rllib and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Feb 10, 2021
@sven1977 sven1977 self-assigned this Feb 10, 2021
@sven1977
Contributor

sven1977 commented Feb 10, 2021

Workaround for now: _use_trajectory_view_api=False.

Sorry about the delay. I thought this was fixed in an earlier PR, but it wasn't.
The PR referenced above fixes the issue for both torch and tf. It makes sure that certain "core" keys, such as rewards, dones, and infos, are always available in the loss/output.
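
For anyone on a build without the fix, the `_use_trajectory_view_api=False` workaround could be applied to the reproduction command roughly as sketched below. This assumes the flag is accepted as a top-level key in `--config`, which follows from the comment above but has not been checked against a specific Ray version. The script only builds and prints the command; it does not run it.

```python
import json
import shlex

# Config from the reproduction command, plus the suggested workaround flag.
# Assumption: the flag is a top-level config key in the affected versions.
config = {
    "output": "/tmp/cartpole-out",
    "output_max_file_size": 5000000,
    "_use_trajectory_view_api": False,
}

command = [
    "rllib", "train",
    "--run=PG",
    "--env=CartPole-v0",
    "--config=" + json.dumps(config),
    "--stop=" + json.dumps({"timesteps_total": 100000}),
]

# shlex.quote makes the JSON arguments safe to paste into a POSIX shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

Building the JSON with `json.dumps` avoids hand-quoting mistakes, for example writing Python's `False` where JSON expects `false`.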

@sven1977
Contributor

@dibgerge, thanks for filing this!

sven1977 added a commit that referenced this issue Feb 12, 2021
… in certain situations when using the traj. view API. (#14036)