Suggestion: Include the possible environment score ranges in the main descriptions #79

nhansendev · 2022-04-13T05:08:48Z

I found the table of possible score ranges in the appendix of the paper and think it would be a useful reference in a more visible location, such as the main GitHub page alongside the environment descriptions:

"
C. Normalization Constants
Rmin is computed by training a policy with masked out observations. This demonstrates what score is trivially achievable in each environment. Rmax is computed in several different ways. For CoinRun, Dodgeball, Miner, Jumper, Leaper, Maze, BigFish, Heist, Plunder, Ninja, and Bossfight, the maximal theoretical and practical reward is trivial to compute.

For CaveFlyer, Chaser, and Climber, we empirically determine Rmax by generating many levels and computing the average max achievable reward.

For StarPilot and FruitBot, the max practical reward is not obvious, even though it is easy to establish a theoretical bound. We choose to define Rmax in these environments as the score PPO achieves after 8 billion timesteps when trained at an 8x larger batch size than our default hyperparameters. On observing these policies, we find them very close to optimal.
"
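For anyone wondering how such a table would be used in practice: a minimal sketch, assuming the constants are applied as a plain min-max rescaling of the raw return. The function name, the environment name, and the numbers below are placeholders for illustration and are not taken from the paper's table.

```python
def normalize_return(raw_return: float, r_min: float, r_max: float) -> float:
    """Min-max rescale a raw episode return using per-environment constants.

    Assumes the normalized score is (R - Rmin) / (Rmax - Rmin), so that a
    trivial policy scores about 0 and a near-optimal policy scores about 1.
    """
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    return (raw_return - r_min) / (r_max - r_min)


# Placeholder constants for illustration only; NOT the values from the paper.
EXAMPLE_RANGES = {"HypotheticalEnv": (5.0, 10.0)}

r_min, r_max = EXAMPLE_RANGES["HypotheticalEnv"]
score = normalize_return(7.5, r_min, r_max)
print(score)
```

With the real (Rmin, Rmax) pairs listed next to each environment description, a lookup like this would make scores comparable across environments.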

What do you think?
