Invalid probability value in tensor when running mpo #174

kan-s0 · 2022-04-18T05:35:48Z

Describe the bug
A RuntimeError is raised when running MPO.

To Reproduce

python main.py --config.mpo.atari --env.name breakout --sync

When the config is changed to the values given in the paper, the error occurs sooner and more frequently.

Expected behavior

The error is raised when sampling with the multinomial method from pi, the output of the Actor network:
RuntimeError: probability tensor contains either inf, nan or element < 0
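
To narrow down which of the three conditions in the message is being hit, it may help to inspect pi just before sampling. Below is a minimal sketch in plain Python that mirrors the condition named in the error (inf, nan, or a negative element). The function name `find_invalid_probs` and the sample values are hypothetical and not part of this repository; on a PyTorch tensor, the analogous check would presumably use `torch.isfinite`.

```python
import math


def find_invalid_probs(probs):
    """Return (index, value) pairs that a multinomial sampler would reject."""
    bad = []
    for i, p in enumerate(probs):
        # NaN and +/-inf fail math.isfinite; negative entries are invalid too.
        if not math.isfinite(p) or p < 0:
            bad.append((i, p))
    return bad


# Hypothetical actor output, for illustration only.
pi = [0.7, float("nan"), -0.1, 0.4]
bad_entries = find_invalid_probs(pi)
print(bad_entries)
```

Logging the offending entries together with the training step would show whether the values turn into nan, inf, or negatives, which points to different causes.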

Screenshots

Development Env. (OS, version, libraries):

Additional context

Even with the default config, the error sometimes occurs after a long period of training.
With the config set to the values given in the paper, the agent reaches a much higher score early in training, but the error occurs soon afterwards.
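
The root cause is not identified in this report. One possibility, stated here purely as an assumption, is that the logits grow large as training progresses, so that exponentiating them overflows before normalization. The sketch below shows the usual max-subtraction form of softmax in plain Python, which stays finite for large inputs. The name `stable_softmax` and the example logits are hypothetical and not taken from this repository.

```python
import math


def stable_softmax(logits):
    """Softmax with the maximum logit subtracted before exponentiation."""
    m = max(logits)
    # Every exponent is <= 0, so math.exp cannot overflow.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical large logits, for illustration only.
probs = stable_softmax([1000.0, 999.0, 0.0])
print(probs)
```

This guards only against overflow in the exponential. If the logits themselves already contain nan (for example, from a diverged loss), the output will still be invalid and the cause lies upstream.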


kan-s0 added the bug label Apr 18, 2022

kan-s0 assigned atech-rl-kakaoenterprise Apr 18, 2022
