VecNormalize vs Scaling #629
@lhorus: From my understanding, VecNormalize scales the rewards or observations in real time: the running mean and standard deviation are updated as new samples come in. However, for the typical RL environments presented here, the reward and observation ranges are actually known ahead of time, so why not scale based on those fixed ranges instead (e.g., with scikit-learn's preprocessing module)?
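For reference, here is a minimal sketch of the fixed-range scaling the question has in mind: min-max scaling in the style of scikit-learn's `MinMaxScaler`, assuming the bounds are known ahead of time (say, from the environment's observation space). The function name and bounds are illustrative, not part of any library API.

```python
import numpy as np

# Fixed-range scaling: assumes the bounds are known up front
# (e.g. from env.observation_space.low / .high) and never change.
# Maps observations into [0, 1] with a static transform.
def minmax_scale(obs, low, high):
    return (obs - low) / (high - low)

low, high = np.array([-1.0, -10.0]), np.array([1.0, 10.0])
obs = np.array([0.5, -3.0])
print(minmax_scale(obs, low, high))  # -> [0.75 0.35]
```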
Reply: Hi @lhorus! While the reward and observation ranges are known ahead of time, the actual level of reward achieved can vary quite a bit over the course of training. Consider the humanoid runner, where the reward is the distance the humanoid manages to run before falling: early in training the rewards are small (the humanoid cannot run and falls immediately), and in this regime VecNormalize amplifies the weak reward signal. Once the model has learned to walk or run, the rewards can be rather large, and there VecNormalize behaves much like scaling by the maximum of the reward range.
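To make the adaptive behaviour concrete, here is a simplified sketch of the running-statistics update that VecNormalize relies on, modeled on `baselines.common.running_mean_std` (the real VecNormalize additionally normalizes a rolling discounted return rather than raw rewards, and clips the result; those details are omitted here).

```python
import numpy as np

class RunningMeanStd:
    # Tracks a running mean and variance over everything seen so far,
    # using the parallel-variance merge formula (simplified from
    # baselines.common.running_mean_std).
    def __init__(self, shape=()):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 1e-4  # avoids division by zero on the first update

    def update(self, x):
        batch_mean, batch_var, batch_count = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        new_mean = self.mean + delta * batch_count / tot
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta**2 * self.count * batch_count / tot
        self.mean, self.var, self.count = new_mean, m2 / tot, tot

rms = RunningMeanStd(shape=(1,))
eps = 1e-8
for rewards in ([0.1, 0.2], [5.0, 6.0]):      # small early rewards, large later
    batch = np.array(rewards).reshape(-1, 1)
    rms.update(batch)
    # Rewards are divided by the running std, so the effective scale
    # adapts as the statistics evolve, unlike a fixed min-max transform.
    print(batch / np.sqrt(rms.var + eps))
```

Early in training the running std is small, so weak rewards get amplified; later, when rewards grow, the std grows with them and the normalization approaches scaling by the reward range, which is the point made above.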