PPO on learned reward #2

dsbrown1331 · 2019-04-07T02:26:11Z

I was able to train the embedding reward, but when I ran PPO, training ended with a mean episode reward (`eprewmean`) of zero. Were you able to get non-zero rewards on Montezuma's Revenge?

| approxkl | 0.00054461794 |
| clipfrac | 0.1274414 |
| eplenmean | 1.38e+03 |
| eprewmean | 0 |
| explained_variance | 0.888 |
| fps | 1082 |
| nupdates | 1950 |
| policy_entropy | 2.5611527 |
| policy_loss | -0.0030562507 |
| serial_timesteps | 249600 |
| time_elapsed | 9.28e+03 |
| total_timesteps | 9984000 |
| value_loss | 0.1622782 |

MaxSobolMark · 2019-04-15T02:40:14Z

There was an error in the README: instead of running PPO on MontezumaRevengeNoFrameskip-v4, you should run it on MontezumaImmitationNoFrameskip-v4, which is the environment that takes the imitation reward into account.
That should give you some reward from the imitation. I think that when I ran PPO on the imitation environment, I also got some positive reward from the actual Montezuma environment, but then it got stuck there.
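
To illustrate why the choice of environment matters, here is a minimal sketch of an environment that "takes the imitation reward into account". It is not the code from this repository: `ImitationRewardWrapper`, `SparseToyEnv`, `imitation_reward_fn`, and `bonus_scale` are hypothetical names, and the toy environment only stands in for a sparse-reward game. It assumes the 4-tuple `step` convention `(obs, reward, done, info)`.

```python
class ImitationRewardWrapper:
    """Hypothetical wrapper: adds a learned imitation bonus to the game reward."""

    def __init__(self, env, imitation_reward_fn, bonus_scale=1.0):
        self.env = env
        self.imitation_reward_fn = imitation_reward_fn  # assumed: obs -> float
        self.bonus_scale = bonus_scale

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, game_reward, done, info = self.env.step(action)
        bonus = self.bonus_scale * self.imitation_reward_fn(obs)
        # Keep both components in info so they can be logged separately.
        info = dict(info, game_reward=game_reward, imitation_bonus=bonus)
        return obs, game_reward + bonus, done, info


class SparseToyEnv:
    """Stand-in for a sparse-reward game: the game reward is always 0."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 0.0, self.t >= self.horizon, {}


def rollout_return(env, policy=lambda obs: 0):
    """Run one episode and return the sum of rewards the agent would see."""
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


# Toy "learned" reward: pays 0.5 on even-numbered observations.
toy_bonus = lambda obs: 0.5 if obs % 2 == 0 else 0.0

base_return = rollout_return(SparseToyEnv())
shaped_return = rollout_return(ImitationRewardWrapper(SparseToyEnv(), toy_bonus))
```

In this sketch, an agent trained on the unwrapped environment never sees a non-zero reward, which would be consistent with an `eprewmean` of 0, while the wrapped environment provides a learning signal from the imitation bonus.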

MaxSobolMark closed this as completed Apr 22, 2019
