I was able to train the embedding reward, but when I ran PPO it ended with a reward of zero. Were you able to get non-zero rewards on Montezuma's Revenge?
There was an error in the README: instead of running PPO on MontezumaRevengeNoFrameskip-v4, you should run it on MontezumaImmitationNoFrameskip-v4, which is the environment that takes the imitation reward into account.
That should give you some reward from the imitation. When I ran PPO on the imitation environment, I think I also got some positive reward from the actual Montezuma environment, but training then got stuck at that point.
Here is the final PPO log from my run, with eprewmean at 0:
| approxkl | 0.00054461794 |
| clipfrac | 0.1274414 |
| eplenmean | 1.38e+03 |
| eprewmean | 0 |
| explained_variance | 0.888 |
| fps | 1082 |
| nupdates | 1950 |
| policy_entropy | 2.5611527 |
| policy_loss | -0.0030562507 |
| serial_timesteps | 249600 |
| time_elapsed | 9.28e+03 |
| total_timesteps | 9984000 |
| value_loss | 0.1622782 |
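As a quick sanity check on a log like this, the rows can be parsed and cross-checked against each other. The sketch below assumes the counters follow the usual convention where `serial_timesteps` counts steps in a single environment and `total_timesteps` counts steps across all parallel environments; `parse_log` is a hypothetical helper written for this example, and only a subset of the rows is included.

```python
LOG = """\
| eprewmean | 0 |
| nupdates | 1950 |
| serial_timesteps | 249600 |
| time_elapsed | 9.28e+03 |
| total_timesteps | 9984000 |
"""


def parse_log(text):
    """Parse '| key | value |' rows into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2:
            stats[cells[0]] = float(cells[1])
    return stats


stats = parse_log(LOG)

# Number of parallel environments implied by the two step counters.
n_envs = stats["total_timesteps"] / stats["serial_timesteps"]

# Steps collected per environment in each update.
steps_per_update = stats["serial_timesteps"] / stats["nupdates"]

# Average throughput over the whole run (the logged fps is per update).
mean_fps = stats["total_timesteps"] / stats["time_elapsed"]

print(n_envs, steps_per_update, round(mean_fps))
```

Under that assumption the numbers are consistent with 40 parallel environments and 128 steps per update, so the run itself looks healthy and the zero `eprewmean` points to the reward signal rather than to a broken training loop.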