
Is VecNormalize necessary for PPO2? [question] #694

Closed
yjc765 opened this issue Feb 18, 2020 · 6 comments
Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)
yjc765 commented Feb 18, 2020

Hi, thanks for the good repo!
I trained the same agent (a human model in MuJoCo) in the same environment with PPO2, once with only DummyVecEnv and once with DummyVecEnv plus VecNormalize:

from stable_baselines.common.policies import MlpPolicy, FeedForwardPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines import PPO2
import robosuite as suite
import tensorflow as tf

env = suite.make('HumanReachMultiDir',
                 use_camera_obs=False,
                 has_renderer=False,
                 ignore_done=False,
                 has_offscreen_renderer=False,
                 horizon=500,
                 use_her=False,
                 use_indicator_object=True,
                 reward_shaping=True,
                 )
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=True, norm_reward=False,
                   clip_obs=10., gamma=0.95) # with or without VecNormalize
model = PPO2(MlpPolicy, env, gamma=0.95, n_steps=4096, learning_rate=5e-5, nminibatches=16, verbose=1, n_cpu_tf_sess=8)
model.learn(total_timesteps=int(10e7))
model.save('trained_model')
# Load saved model
model = PPO2.load('trained_model')
model.set_env(env)
obs = env.reset()
for _ in range(500):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

The result shows that the agent trained with VecNormalize is much worse than the one without it. But as it says here:

When applying RL to a custom problem, you should always normalize the input to the agent (e.g. using VecNormalize for PPO2/A2C).

I wonder whether the cause is the way I load the model and reset the environment?
Thank you in advance for your replies!

araffin added the question label on Feb 18, 2020
araffin (Collaborator) commented Feb 18, 2020

Hello,

You should also normalize the reward (but not during testing). By the way, why did you change the default values of VecNormalize?
(You may need to run some hyperparameter optimization anyway.)

EDIT: your gamma looks quite small compared to the "classic" range
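A minimal sketch of that train/test split, using CartPole-v1 as a stand-in for the custom robosuite env. The training=False and norm_reward=False keyword arguments of VecNormalize freeze the running statistics and return raw rewards at evaluation time:

import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

# Training: keep the VecNormalize defaults, which normalize both observations and rewards
train_env = VecNormalize(DummyVecEnv([lambda: gym.make('CartPole-v1')]))
model = PPO2(MlpPolicy, train_env, verbose=1)
model.learn(total_timesteps=10000)

# Testing: do not update the running statistics, and report raw (unnormalized) rewards
eval_env = VecNormalize(DummyVecEnv([lambda: gym.make('CartPole-v1')]),
                        training=False, norm_reward=False)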

araffin added the custom gym env label on Feb 18, 2020
yjc765 closed this as completed on Feb 19, 2020
yjc765 reopened this on Feb 19, 2020
yjc765 (Author) commented Feb 19, 2020

@araffin Thanks for your reply!
I followed the example from here:
[screenshot of the VecNormalize example from the documentation]
I chose gamma=0.95 because the paper I read on training a human model with PPO used that value.
Do you suggest using the default values of VecNormalize (e.g. norm_reward=True, gamma=0.99)?
By the way, is the code below correct for running the loaded model? I'm a little confused by

You should also normalize the reward (but not during testing) :

from stable_baselines.common.policies import MlpPolicy, FeedForwardPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines import PPO2
import robosuite as suite
import tensorflow as tf

env = suite.make('HumanReachMultiDir',
                 use_camera_obs=False,
                 has_renderer=True,
                 ignore_done=False,
                 has_offscreen_renderer=False,
                 horizon=500,
                 use_her=False,
                 use_indicator_object=True,
                 reward_shaping=True,
                 )
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=True, norm_reward=False,
                   clip_obs=10., gamma=0.99) # when training norm_reward = True
model = PPO2.load('trained_model')
model.set_env(env)
obs = env.reset()
for _ in range(500):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

Thank you very much!

araffin (Collaborator) commented Feb 19, 2020

Do you suggest to use default values of VecNormalize

yes

You should also normalize the reward (but not during testing) :

Your code looks right, but in fact, for your two questions, I would suggest using the RL Zoo: it does (almost) everything for you (and you can look at tuned hyperparameters for similar envs).
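For completeness, a hedged sketch of carrying the normalization statistics over from training to evaluation, so the loaded model sees observations scaled the same way as during training. The save_running_average / load_running_average helpers are an assumption about the installed stable-baselines version (newer releases expose save / load instead), and CartPole-v1 again stands in for the robosuite env:

import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

make_env = lambda: gym.make('CartPole-v1')  # stand-in for the robosuite env factory

# Training phase: learn with normalized observations and rewards, then persist everything
train_env = VecNormalize(DummyVecEnv([make_env]))
model = PPO2(MlpPolicy, train_env, verbose=1)
model.learn(total_timesteps=10000)
model.save('trained_model')
train_env.save_running_average('.')  # assumed API: pickles the running obs/return statistics

# Evaluation phase: rebuild the wrapper frozen, restore the saved statistics, reload the model
eval_env = VecNormalize(DummyVecEnv([make_env]), training=False, norm_reward=False)
eval_env.load_running_average('.')  # assumed API: restores the statistics saved above
model = PPO2.load('trained_model')
model.set_env(eval_env)
obs = eval_env.reset()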

yjc765 (Author) commented Feb 19, 2020

@araffin Thanks for your suggestions, I will check them out!
May I ask one more question?
In your implementation of PPO2 there is a learning_rate parameter. Is it the learning rate for the policy, for the value function, or the same learning rate for both?
I ask because OpenAI Spinning Up has two parameters: pi_lr (float), the learning rate for the policy optimizer, and vf_lr (float), the learning rate for the value-function optimizer.
So I'm a little confused by this.
Thanks!

araffin (Collaborator) commented Feb 19, 2020

Is it the learning rate for the policy, for the value function, or the same learning rate for both?

The best way to answer such questions is to look at the code, but yes, it is used for both.
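To make that concrete, here is a hedged sketch. In this implementation a single Adam optimizer minimizes one combined objective (the clipped policy loss plus vf_coef times the value loss, minus the entropy bonus), so the single learning_rate drives both the policy and the value-function updates, and the relative weight of the value update is controlled through vf_coef rather than a separate vf_lr:

from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# One learning_rate is shared by the policy and the value function because they are
# trained jointly on a single combined loss; vf_coef (not a second learning rate)
# sets how strongly the value-function error contributes to that loss.
model = PPO2(MlpPolicy, 'CartPole-v1',
             learning_rate=2.5e-4,  # used by the single optimizer for both parts
             vf_coef=0.5,           # weight of the value loss in the combined objective
             ent_coef=0.01)         # weight of the entropy bonus
model.learn(total_timesteps=10000)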

araffin (Collaborator) commented Feb 20, 2020

Related: #698

araffin closed this as completed on Feb 24, 2020