Invalid probability value in tensor when running mpo #174

Open
kan-s0 opened this issue Apr 18, 2022 · 0 comments
kan-s0 commented Apr 18, 2022

Describe the bug
A RuntimeError is raised when running MPO.

To Reproduce

python main.py --config.mpo.atari --env.name breakout --sync

When the config is changed to the values given in the paper, the error occurs sooner and more often.

Expected behavior

  • An error occurs when sampling with the multinomial method from pi, the output of the Actor network.
  • RuntimeError: probability tensor contains either inf, nan or element < 0
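The error message names three conditions that make a probability tensor unusable for sampling: inf, nan, or an element below 0. As a minimal sketch of a pre-sampling check for those same conditions, the plain-Python helper below reports which entries of pi are invalid and why. `find_invalid_probs` is a hypothetical name, not part of JORLDY, and the example values are made up for illustration; inside the agent the equivalent check would presumably be done on the tensor itself (for example with `torch.isnan` and `torch.isinf`) just before the multinomial call.

```python
import math


def find_invalid_probs(probs):
    """Return (index, value, reason) for every entry that sampling would reject.

    Hypothetical helper for illustration; mirrors the three conditions
    named in the error message (nan, inf, element < 0).
    """
    problems = []
    for i, p in enumerate(probs):
        if math.isnan(p):
            problems.append((i, p, "nan"))
        elif math.isinf(p):
            problems.append((i, p, "inf"))
        elif p < 0:
            problems.append((i, p, "negative"))
    return problems


# Made-up example values, not taken from the failing run.
pi_ok = [0.1, 0.2, 0.7]
pi_bad = [0.5, float("nan"), -0.1, float("inf")]

print(find_invalid_probs(pi_ok))
print([(i, reason) for i, _, reason in find_invalid_probs(pi_bad)])
```

Logging the result of such a check together with the current step could show whether pi turns into nan all at once or whether individual entries go bad first.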

Screenshots

training graph

Screenshot 2022-04-18 2:36:23 PM

  • The default config (green) also causes an error at 7M.

error text

Screenshot 2022-04-18 2:23:06 PM

mpo generated agent code

Screenshot 2022-04-18 2:28:12 PM

Development Env. (OS, version, libraries):

  • linux
  • V4XLARGE
  • python 3.7.11
  • jorldy:0.3.0

Additional context

  • Even with the default config, the error sometimes occurs after long training.
  • With the config set to the values given in the paper, the score is much higher early in training, but the error occurs quickly.
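Since the error can appear only after long training, it may help to record the first step at which pi stops being a valid probability vector, so the update just before it can be inspected. The sketch below assumes the per-step probabilities are available as a list of lists; `first_invalid_step` and `pi_history` are hypothetical names, not part of JORLDY, and the sample history is made up.

```python
import math


def first_invalid_step(pi_history):
    """Return the index of the first step containing a nan, inf, or negative
    probability, or None if every step is valid.

    Hypothetical debugging helper for illustration.
    """
    for step, probs in enumerate(pi_history):
        if any((not math.isfinite(p)) or p < 0 for p in probs):
            return step
    return None


# Made-up history: two valid steps, then a step where pi is all nan.
pi_history = [
    [0.25, 0.25, 0.5],
    [0.0, 0.1, 0.9],
    [float("nan"), float("nan"), float("nan")],
]

print(first_invalid_step(pi_history))  # prints 2
```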
kan-s0 added the bug label Apr 18, 2022