Some questions regarding VecNormalize #698
The boundaries of the observation space do not really matter (for everything that is not images); we usually set them to [-inf, inf].
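For illustration, a minimal sketch of what that looks like for a custom (non-image) environment, using the standard `gym` API:

```python
import numpy as np
from gym import spaces

# Unbounded observation space: VecNormalize rescales observations anyway,
# so the declared bounds only need to be type-consistent, not tight.
observation_space = spaces.Box(low=-np.inf, high=np.inf,
                               shape=(2,), dtype=np.float32)
```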
Good question, the answer is there: openai#538 and openai#629
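For context (this is not part of the original reply): VecNormalize accepts a `gamma` argument that it uses to maintain a running estimate of the discounted return, which is what the reward normalization is based on. A hedged sketch, where the environment name and the choice of matching gamma to the algorithm are illustrative assumptions:

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

GAMMA = 0.99  # assumption: same value you pass to the algorithm, e.g. PPO2(..., gamma=GAMMA)

venv = DummyVecEnv([lambda: gym.make('CartPole-v1')])
# `gamma` drives VecNormalize's running estimate of the discounted return,
# which the reward normalization is based on.
env = VecNormalize(venv, norm_obs=True, norm_reward=True, gamma=GAMMA)
```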
Yes.
We should change that (we would appreciate a PR for it); it is an old example, and there is no real reason not to normalize the reward too.
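In code terms, the suggested change to the example would be as small as the following sketch, assuming `env` is already a vectorized environment:

```python
from stable_baselines.common.vec_env import VecNormalize

# Normalize the rewards as well as the observations
# (the docs example only enabled observation normalization)
env = VecNormalize(env, norm_obs=True, norm_reward=True)
```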
Thank you! One more question: is VecNormalize compatible with GoalEnv? How would I go about using VecNormalize with HER?
For now, it is not.
I would advise using DDPG and its built-in normalization in that case, or creating a […]
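For reference, DDPG in stable-baselines exposes this built-in normalization through constructor flags. A sketch, assuming `env` is an existing (continuous-action) environment; check your version's signature:

```python
from stable_baselines import DDPG

# DDPG normalizes internally, so no VecNormalize wrapper is needed here
model = DDPG('MlpPolicy', env,
             normalize_observations=True,  # running mean/std of observations
             normalize_returns=True)       # running mean/std of returns
```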
Ok, thanks!
Oh, so not all algorithms require us to normalize the observations? Are there others that have this built-in besides DDPG?
No. As mentioned in the docs, it is usually crucial for PPO and A2C, but not for SAC and TD3, for instance. DDPG is the only one with it built-in, for legacy reasons.
I wanted to follow up on this topic to ensure I am implementing VecNormalize properly. I am primarily looking at how to continue training while keeping the moving average from the previous training. I have attached some code below:

```python
env_learn = SubprocVecEnv(env_list)
env_learn = VecCheckNan(env_learn, raise_exception=True)
env_learn = VecNormalize(env_learn, training=True, norm_obs=True, norm_reward=True)

env_render = Simulator()  # custom environment used for rendering
env_render.render_environment = True

model = PPO2(policy='CustomPolicy', env=env_learn, verbose=1,
             vf_coef=VF_COEFF,
             noptepochs=EPOCHS,
             ent_coef=ENT_COEFF,
             learning_rate=LEARNING_RATE,
             tensorboard_log=tensorboard_log_location,
             n_steps=NSTEPS,
             nminibatches=MINIBATCHES)
model.save(results_folder + run_name)

# Training the model
for i in range(number_training_steps):
    logname = run_name + '_' + str(i)
    model.learn(total_timesteps=int(total_timesteps / number_training_steps),
                reset_num_timesteps=False,
                tb_log_name=logname)
    env_learn.close()
    path = results_folder + logname
    model.save(path)

    # Testing the performance of the model
    for j in range(3):
        obs = env_render.reset()
        done = False
        rewards = 0.0
        while not done:
            action, _states = model.predict(obs)
            obs, reward, done, info = env_render.step(action)
            rewards += reward
            env_render.render(mode='file')
        print(f'For training step {i} and test number {j}, the reward was: {rewards}')

    if i < number_training_steps:
        env_learn = SubprocVecEnv(env_list)
        env_learn = VecCheckNan(env_learn, raise_exception=True)
        env_learn = VecNormalize(env_learn, training=True, norm_obs=True, norm_reward=True)
        model.load(load_path=path)
        model.set_env(env_learn)
```

Curious if anyone has done an implementation like this in the past and if I am doing it correctly. I am concerned that, the way I currently have it implemented, the moving averages won't be retained for the next set of training.
Does the layer normalization used in some of the policies accomplish the same thing as VecNormalize?
Layer normalization is quite different; see the associated paper: https://arxiv.org/abs/1607.06450
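To make the contrast concrete, here is a minimal NumPy sketch of layer normalization (without the learned gain and bias of the full method): it normalizes across the features of a single sample inside the network at every forward pass, whereas VecNormalize tracks running statistics of the environment's observations over time.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Per-sample statistics over the feature dimension,
    # recomputed on every forward pass (no running average)
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)
```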
@cevans3098 I can only recommend you take a look at the RL zoo; you forgot to save and load the VecNormalize statistics.

Closing this issue, as the original question was answered.
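For completeness, the fix amounts to saving the VecNormalize statistics alongside the model and restoring them before resuming, which is what the RL zoo does. A sketch, assuming a stable-baselines version (>= 2.10) where VecNormalize exposes `save()` and a static `load()` (older versions used `save_running_average()`/`load_running_average()`); `env_list`, `model`, and `path` come from the snippet above, and the stats file name is hypothetical:

```python
from stable_baselines.common.vec_env import SubprocVecEnv, VecCheckNan, VecNormalize

stats_path = path + '_vecnormalize.pkl'  # hypothetical file name

# After model.learn(), before discarding the environment:
env_learn.save(stats_path)
env_learn.close()

# When resuming training, restore the running averages onto a fresh env:
venv = VecCheckNan(SubprocVecEnv(env_list), raise_exception=True)
env_learn = VecNormalize.load(stats_path, venv)
model.set_env(env_learn)
```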
According to the docs, when creating custom environments we should always normalize the observation space. For this, there is the VecNormalize wrapper, which keeps a moving average and then normalizes the observations.
Let's say I have 2 observations: the height (m) and weight (kg) of a person. My observation space would be something like a Box with `low = [0, 0]` and `high = [2.5, 300]`. But since I'm using VecNormalize, this isn't correct anymore, right? So should I instead change it to `low = [-10, -10]` and `high = [10, 10]`? (10 being the default clipping value for VecNormalize.)

Another question: when should we normalize the rewards as well? (In the MuJoCo example shown in the docs you chose to only normalize the observations. Why?)
Finally, what's the purpose of the discount factor? Should it be the same as the discount factor of whatever algorithm we're using?
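For what it's worth, a sketch of how these pieces fit together, based on the answers above: the declared bounds can simply be infinite, and clipping is controlled on the wrapper itself (`clip_obs` defaults to 10 in VecNormalize). The toy environment is hypothetical:

```python
import numpy as np
import gym
from gym import spaces
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

class PersonEnv(gym.Env):  # hypothetical toy environment for illustration
    def __init__(self):
        # Bounds left infinite: VecNormalize handles scaling and clipping
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self):
        return np.array([1.7, 70.0], dtype=np.float32)  # height (m), weight (kg)

    def step(self, action):
        return self.reset(), 0.0, True, {}

# Normalization and clipping live in the wrapper, not in the space definition
env = VecNormalize(DummyVecEnv([PersonEnv]),
                   norm_obs=True, norm_reward=True, clip_obs=10.0)
```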