Some questions regarding VecNormalize #698

Closed · siferati opened this issue on Feb 19, 2020 · 8 comments
Labels: question (Further information is requested)

@siferati

According to the docs, when creating custom environments, we should always normalize the observation space. For this, there is the VecNormalize wrapper, which keeps a moving average of the observations and uses it to normalize them.

Let's say I have 2 observations: height (m) and weight (kg) of a person. My observation space would be something like a Box with low = [0, 0] and high = [2.5, 300]. But since I'm using VecNormalize, this isn't correct anymore, right?

So should I instead change it to low = [-10, -10] and high = [10, 10]? (10 being the default clipping value for VecNormalize)

Another question: when should we normalize the rewards as well? (in the mujoco example shown in the docs you chose to only normalize the observations - why?)

Finally, what's the purpose of the discount factor? Should it be the same as the discount factor of whatever algorithm we're using?

@araffin added the question label on Feb 19, 2020
@araffin (Collaborator) commented on Feb 20, 2020

So should I instead change it to

The boundaries of the observation space do not really matter (for everything that is not images); we usually set them to [-inf, inf].
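As a minimal sketch (the PersonEnv class and its values are made up for illustration, they are not part of stable-baselines):

    import numpy as np
    import gym
    from gym import spaces

    from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize


    class PersonEnv(gym.Env):
        """Toy env: observations are height (m) and weight (kg)."""

        def __init__(self):
            # The bounds are informative only; VecNormalize does not use them,
            # so [-inf, inf] is fine for non-image observations.
            self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                                shape=(2,), dtype=np.float32)
            self.action_space = spaces.Discrete(2)

        def reset(self):
            return np.array([1.75, 70.0], dtype=np.float32)

        def step(self, action):
            return np.array([1.75, 70.0], dtype=np.float32), 0.0, True, {}


    env = DummyVecEnv([PersonEnv])
    # Observations are rescaled on the fly using a running mean/std
    # (and clipped to +/- 10 by default).
    env = VecNormalize(env, norm_obs=True)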

Another question: when should we normalize the rewards as well?
Finally, what's the purpose of the discount factor? Should it be the same as the discount factor of whatever algorithm we're using?

Good question, the answer is in openai#538 and openai#629.
Additional resource: #234

Should it be the same as the discount factor of whatever algorithm we're using?

yes

in the mujoco example shown in the docs you chose to only normalize the observations - why?

We should change that (we would appreciate a PR for it). It is an old example; there is no real reason not to normalize the reward too.
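An updated version of that example could look roughly like this (a sketch only: HalfCheetah-v2 requires MuJoCo, and the hyperparameter values here are placeholders). The key point is to normalize both observations and rewards and to keep the discount factor consistent between VecNormalize and the algorithm:

    import gym

    from stable_baselines import PPO2
    from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

    GAMMA = 0.99

    env = DummyVecEnv([lambda: gym.make("HalfCheetah-v2")])
    # Normalize both observations and rewards; pass the same discount factor
    # to VecNormalize and to the algorithm so the return estimate is consistent.
    env = VecNormalize(env, norm_obs=True, norm_reward=True, gamma=GAMMA)

    model = PPO2("MlpPolicy", env, gamma=GAMMA, verbose=1)
    model.learn(total_timesteps=100000)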

@siferati (Author)

Thank you!

One more question: is VecNormalize compatible with GoalEnv? How would I go about using VecNormalize with HER?

@araffin (Collaborator) commented on Feb 22, 2020

is VecNormalize compatible with GoalEnv?

For now, it is not.

How would I go about using VecNormalize with HER?

I would advise using DDPG and its built-in normalization in that case, or creating a gym.Wrapper (instead of a VecEnvWrapper) that will do the job.
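For the first option, a rough sketch (assuming a gym.GoalEnv such as FetchReach-v1, which requires MuJoCo; the extra keyword arguments shown are forwarded by HER to the wrapped model class):

    import gym

    from stable_baselines import DDPG, HER

    env = gym.make("FetchReach-v1")  # any gym.GoalEnv

    # normalize_observations enables DDPG's internal running mean/std
    # normalization, so no VecNormalize wrapper is needed here.
    model = HER("MlpPolicy", env, model_class=DDPG,
                n_sampled_goal=4, goal_selection_strategy="future",
                normalize_observations=True)
    model.learn(total_timesteps=100000)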

@siferati (Author)

Ok, thanks!

I would advise to use DDPG and its built-in normalization in that case

Oh, so not all algorithms require us to normalize the observations? Are there others that have this built in besides DDPG?

@araffin (Collaborator) commented on Feb 22, 2020

Oh, so not all algorithms require us to normalize the observations?

No, as mentioned in the doc, it is usually crucial for PPO and A2C, but not for SAC and TD3 for instance. DDPG is the only one with normalization built in, for legacy reasons.

@cevans3098

I wanted to follow up on this topic to ensure I am implementing VecNormalize properly. I am primarily looking at how to continue training while keeping the moving averages from the previous training. I have attached some code below:

    env_learn = SubprocVecEnv(env_list)
    env_learn = VecCheckNan(env_learn, raise_exception=True)
    env_learn = VecNormalize(env_learn, training=True, norm_obs=True, norm_reward=True)

    env_render = Simulator()     # custom environment used for rendering
    env_render.render_environment = True

    model = PPO2(policy = 'CustomPolicy', env = env_learn, verbose = 1,
                 vf_coef = VF_COEFF,
                 noptepochs = EPOCHS,
                 ent_coef = ENT_COEFF,
                 learning_rate = LEARNING_RATE,
                 tensorboard_log = tensorboard_log_location,
                 n_steps = NSTEPS,
                 nminibatches = MINIBATCHES)

    model.save(results_folder + run_name)

    # Training the model
    for i in range(number_training_steps):
        logname = run_name + '_' + str(i)
        model.learn(total_timesteps = int((total_timesteps/number_training_steps)),
                    reset_num_timesteps = False,
                    tb_log_name = logname)
        
        env_learn.close()
        
        path = results_folder + logname
        model.save(path)

        # testing the performance of the model
        for j in range(3):
            obs = env_render.reset()
            done = False
            rewards = 0.0
            while not done:
                action, _states = model.predict(obs)
                obs, reward, done, info = env_render.step(action)
                rewards += reward
                env_render.render(mode='file')
            print(f'For Training step {i}, and test number {j} the reward was: {rewards}')

        if i < number_training_steps:
            env_learn = SubprocVecEnv(env_list)
            env_learn = VecCheckNan(env_learn, raise_exception=True)
            env_learn = VecNormalize(env_learn, training=True, norm_obs=True, norm_reward=True)
            model.load(load_path=path)
            model.set_env(env_learn)

Curious if anyone has done an implementation like this in the past and whether I am doing it correctly. I am concerned that, the way I currently have it implemented, the moving averages won't be retained for the next set of training.

@siferati (Author)

Does the layer_norm kwarg have anything to do with this? Or is that a different kind of normalization?

@araffin (Collaborator) commented on Feb 26, 2020

Or is that a different kind of normalization?

Layer normalization is quite different; see the associated paper: https://arxiv.org/abs/1607.06450
It is there mostly because of the parameter noise exploration used with DDPG (cf. doc).

I wanted to follow up on this topic to ensure I am implementing VecNormalize properly.

@cevans3098 I can only recommend that you take a look at the rl zoo; you forgot to save and load the VecNormalize statistics in your case.
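Applied to the snippet above, that could look roughly like this (a sketch only; it assumes a stable-baselines 2.x VecNormalize that exposes save_running_average/load_running_average, while newer versions provide VecNormalize.save/VecNormalize.load instead, and it reuses the results_folder/path variables from the snippet):

    # after each model.learn(...) call, before closing the environment:
    model.save(path)
    env_learn.save_running_average(results_folder)   # writes obs_rms / ret_rms pickles
    env_learn.close()

    # when rebuilding the environment for the next round of training:
    env_learn = SubprocVecEnv(env_list)
    env_learn = VecCheckNan(env_learn, raise_exception=True)
    env_learn = VecNormalize(env_learn, training=True, norm_obs=True, norm_reward=True)
    env_learn.load_running_average(results_folder)   # restore the moving averages

    model = PPO2.load(path, env=env_learn)   # load() is a classmethod and returns a new model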

Closing this issue as the original question was answered.
