Suggestion: Include the possible environment score ranges in the main descriptions #79

Open
nhansendev opened this issue Apr 13, 2022 · 0 comments

@nhansendev

I found the table of possible score ranges in the appendix of the paper and thought it could be a useful reference to include in a more visible location, such as on the main GitHub page alongside the environment descriptions:

"
C. Normalization Constants
Rmin is computed by training a policy with masked out observations. This demonstrates what score is trivially achievable in each environment. Rmax is computed in several different ways. For CoinRun, Dodgeball, Miner, Jumper, Leaper, Maze, BigFish, Heist, Plunder, Ninja, and Bossfight, the maximal theoretical and practical reward is trivial to compute.

For CaveFlyer, Chaser, and Climber, we empirically determine Rmax by generating many levels and computing the average max achievable reward.

For StarPilot and FruitBot, the max practical reward is not obvious, even though it is easy to establish a theoretical bound. We choose to define Rmax in these environments as the score PPO achieves after 8 billion timesteps when trained at an 8x larger batch size than our default hyperparameters. On observing these policies, we find them very close to optimal.
"
[Image: scores table (Rmin and Rmax for each environment)]
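For context on how these constants get used: as I understand the paper, raw returns are min-max normalized as (R - Rmin) / (Rmax - Rmin), so that Rmin maps to 0 and Rmax maps to 1. Here is a minimal sketch of that formula. The function name and the constants in the usage line are hypothetical placeholders, not values from the table or an API in this repo.

```python
def normalize_return(r: float, r_min: float, r_max: float) -> float:
    """Min-max normalize a raw episode return: r_min -> 0.0, r_max -> 1.0.

    Returns below r_min come out negative, and returns above r_max come out
    greater than 1 (possible for StarPilot/FruitBot, where Rmax is empirical).
    """
    if r_max <= r_min:
        raise ValueError("r_max must be greater than r_min")
    return (r - r_min) / (r_max - r_min)


# Hypothetical constants for illustration only; use the per-environment values from the table.
R_MIN, R_MAX = 5.0, 10.0
print(normalize_return(7.5, R_MIN, R_MAX))  # 0.5
```

Having the table next to the environment descriptions would make it easy to plug in the right Rmin/Rmax without digging through the appendix.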

What do you think?
