
VecNormalize vs Scaling #629

Closed
lhorus opened this issue Oct 1, 2018 · 1 comment
lhorus commented Oct 1, 2018

From my understanding, VecNormalize scales the rewards or observations in real time, i.e. according to running statistics: the mean and stddev are updated as training goes on. However, for the typical RL envs presented here, the reward and observation ranges are actually known ahead of time, so why not scale based on those known ranges instead (e.g. with scikit-learn's preprocessing utilities)?
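
For reference, fixed-range scaling of the kind described above could be done with a simple observation wrapper. This is only a hypothetical sketch (the wrapper name `FixedRangeObsScaler` is made up, and it assumes the environment declares finite observation bounds), not something shipped with baselines:

```python
import gym
import numpy as np


class FixedRangeObsScaler(gym.ObservationWrapper):
    """Hypothetical wrapper: min-max scale observations to [0, 1] using the
    bounds declared by the observation space (same idea as scikit-learn's
    MinMaxScaler, but with known ranges instead of fitted statistics)."""

    def __init__(self, env):
        super().__init__(env)
        # Assumes a Box observation space with finite low/high bounds.
        self.low = np.asarray(env.observation_space.low, dtype=np.float32)
        self.high = np.asarray(env.observation_space.high, dtype=np.float32)
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32)

    def observation(self, obs):
        # Rescale each component into [0, 1] using the known bounds.
        return (np.asarray(obs, dtype=np.float32) - self.low) / (self.high - self.low)
```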


pzhokhov commented Oct 1, 2018

Hi @lhorus! While the reward and observation ranges are known ahead of time, the level of reward actually achieved can vary quite a bit throughout training. Consider the humanoid runner, where the reward is the distance the humanoid manages to run before falling: in the beginning the rewards are small (the humanoid cannot run and falls over), and in this regime VecNormalize amplifies the small reward signal. Once the model learns to walk or run successfully, the rewards can be rather large, and there VecNormalize imitates the behaviour of scaling by the maximum reward range.
That being said, VecNormalize is not always necessary (and has its own drawbacks, such as making serialization harder). By default we only apply VecNormalize to MuJoCo environments, and there are several environment wrappers that allow for constant reward and observation scaling (ScaledFloatFrame, RewardScaler).
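
For comparison, the running normalization VecNormalize performs can be sketched roughly as follows. This is a simplified, self-contained stand-in for illustration, not the actual baselines implementation:

```python
import numpy as np


class RunningMeanStd:
    """Track a running mean and variance over batches of values
    (simplified version of the statistics VecNormalize keeps)."""

    def __init__(self, shape=(), epsilon=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = epsilon

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        # Parallel-variance (Chan et al.) update of the running mean/var.
        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta ** 2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x, clip=10.0, epsilon=1e-8):
        # Early in training the std is small, so small signals get amplified;
        # once rewards grow, this approaches scaling by the reward's typical range.
        return np.clip((x - self.mean) / np.sqrt(self.var + epsilon), -clip, clip)
```

A constant scaler, by contrast, simply multiplies every reward or observation by a factor fixed up front, which is the behaviour of the wrappers mentioned above (ScaledFloatFrame, RewardScaler).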

pzhokhov closed this as completed Oct 1, 2018