VecNormalize vs Scaling #629
@lhorus: From my understanding, VecNormalize scales the rewards or observations in real time: the running mean and standard deviation are updated as new samples come in. However, for the typical RL environments presented here, the reward and observation ranges are actually known ahead of time, so why not scale based on those fixed ranges instead (e.g., with scikit-learn's preprocessing module)?
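For reference, here is a minimal sketch of the fixed-range scaling the question has in mind: min-max scaling in the style of scikit-learn's `MinMaxScaler`, assuming the bounds are known ahead of time (say, from the environment's observation space). The function name and bounds are illustrative, not part of any library API.

```python
import numpy as np

# Fixed-range scaling: assumes the bounds are known up front
# (e.g. from env.observation_space.low / .high) and never change.
# Maps observations into [0, 1] with a static transform.
def minmax_scale(obs, low, high):
    return (obs - low) / (high - low)

low, high = np.array([-1.0, -10.0]), np.array([1.0, 10.0])
obs = np.array([0.5, -3.0])
print(minmax_scale(obs, low, high))  # -> [0.75 0.35]
```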
Reply: Hi @lhorus! While the reward and observation ranges are known ahead of time, the actual level of reward achieved can vary quite a bit over the course of training. Consider the humanoid runner, where the reward is the distance the humanoid manages to run before falling: early in training the rewards are small (the humanoid cannot run and falls immediately), and in this regime VecNormalize amplifies the weak reward signal. Once the model has learned to walk or run, the rewards can be rather large, and there VecNormalize behaves much like scaling by the maximum of the reward range.
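To make the adaptive behaviour concrete, here is a simplified sketch of the running-statistics update that VecNormalize relies on, modeled on `baselines.common.running_mean_std` (the real VecNormalize additionally normalizes a rolling discounted return rather than raw rewards, and clips the result; those details are omitted here).

```python
import numpy as np

class RunningMeanStd:
    # Tracks a running mean and variance over everything seen so far,
    # using the parallel-variance merge formula (simplified from
    # baselines.common.running_mean_std).
    def __init__(self, shape=()):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 1e-4  # avoids division by zero on the first update

    def update(self, x):
        batch_mean, batch_var, batch_count = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        new_mean = self.mean + delta * batch_count / tot
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta**2 * self.count * batch_count / tot
        self.mean, self.var, self.count = new_mean, m2 / tot, tot

rms = RunningMeanStd(shape=(1,))
eps = 1e-8
for rewards in ([0.1, 0.2], [5.0, 6.0]):      # small early rewards, large later
    batch = np.array(rewards).reshape(-1, 1)
    rms.update(batch)
    # Rewards are divided by the running std, so the effective scale
    # adapts as the statistics evolve, unlike a fixed min-max transform.
    print(batch / np.sqrt(rms.var + eps))
```

Early in training the running std is small, so weak rewards get amplified; later, when rewards grow, the std grows with them and the normalization approaches scaling by the reward range, which is the point made above.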