
Is VecNormalize necessary for PPO2? [question] #694

Closed
yjc765 opened this issue Feb 18, 2020 · 6 comments
Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)
yjc765 commented Feb 18, 2020

Hi, thanks for the good repo!
I trained the same agent (a human model in MuJoCo) in the same environment with PPO2, once with only DummyVecEnv and once with DummyVecEnv plus VecNormalize:

from stable_baselines.common.policies import MlpPolicy, FeedForwardPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines import PPO2
import robosuite as suite
import tensorflow as tf

env = suite.make('HumanReachMultiDir',
                 use_camera_obs=False,
                 has_renderer=False,
                 ignore_done=False,
                 has_offscreen_renderer=False,
                 horizon=500,
                 use_her=False,
                 use_indicator_object=True,
                 reward_shaping=True,
                 )
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=True, norm_reward=False,
                   clip_obs=10., gamma=0.95) # with or without VecNormalize
model = PPO2(MlpPolicy, env, gamma=0.95, n_steps=4096, learning_rate=5e-5, nminibatches=16, verbose=1, n_cpu_tf_sess=8)
model.learn(total_timesteps=int(10e7))
model.save('trained_model')
# Load saved model
model = PPO2.load('trained_model')
model.set_env(env)
obs = env.reset()
for _ in range(500):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

The result shows that the agent trained with VecNormalize is much worse than the one without it. But as it says here:

When applying RL to a custom problem, you should always normalize the input to the agent (e.g. using VecNormalize for PPO2/A2C).

I wonder whether the cause is the way I load the model and reset the environment?
Thank you in advance for your replies!

araffin added the question label on Feb 18, 2020
araffin (Collaborator) commented Feb 18, 2020

Hello,

You should also normalize the reward (but not during testing). By the way, why did you change the default values of VecNormalize?
(You may need to run some hyperparameter optimization anyway.)

EDIT: your gamma looks quite small compared to the "classic" range
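A minimal sketch of that train/test split, using CartPole-v1 as a stand-in for the custom robosuite env. The training=False and norm_reward=False keyword arguments of VecNormalize freeze the running statistics and return raw rewards at evaluation time:

import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

# Training: keep the VecNormalize defaults, which normalize both observations and rewards
train_env = VecNormalize(DummyVecEnv([lambda: gym.make('CartPole-v1')]))
model = PPO2(MlpPolicy, train_env, verbose=1)
model.learn(total_timesteps=10000)

# Testing: do not update the running statistics, and report raw (unnormalized) rewards
eval_env = VecNormalize(DummyVecEnv([lambda: gym.make('CartPole-v1')]),
                        training=False, norm_reward=False)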

araffin added the custom gym env label on Feb 18, 2020
yjc765 closed this as completed on Feb 19, 2020
yjc765 reopened this on Feb 19, 2020
yjc765 (Author) commented Feb 19, 2020

@araffin Thanks for your reply!
I followed the example from here:
[screenshot of the VecNormalize example from the documentation]
I chose gamma=0.95 because the paper I read on training a human model with PPO used that value.
Do you suggest using the default values of VecNormalize (e.g. norm_reward=True, gamma=0.99)?
By the way, is the code below correct for running the loaded model? I'm a little confused by

You should also normalize the reward (but not during testing) :

from stable_baselines.common.policies import MlpPolicy, FeedForwardPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines import PPO2
import robosuite as suite
import tensorflow as tf

env = suite.make('HumanReachMultiDir',
                 use_camera_obs=False,
                 has_renderer=True,
                 ignore_done=False,
                 has_offscreen_renderer=False,
                 horizon=500,
                 use_her=False,
                 use_indicator_object=True,
                 reward_shaping=True,
                 )
env = DummyVecEnv([lambda: env])
env = VecNormalize(env, norm_obs=True, norm_reward=False,
                   clip_obs=10., gamma=0.99) # when training norm_reward = True
model = PPO2.load('trained_model')
model.set_env(env)
obs = env.reset()
for _ in range(500):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

Thank you very much!

araffin (Collaborator) commented Feb 19, 2020

Do you suggest to use default values of VecNormalize

yes

You should also normalize the reward (but not during testing) :

Your code looks right, but in fact, for your two questions, I would suggest using the RL Zoo: it does (almost) everything for you (and you can look at tuned hyperparameters for similar envs).
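For completeness, a hedged sketch of carrying the normalization statistics over from training to evaluation, so the loaded model sees observations scaled the same way as during training. The save_running_average / load_running_average helpers are an assumption about the installed stable-baselines version (newer releases expose save / load instead), and CartPole-v1 again stands in for the robosuite env:

import gym
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize

make_env = lambda: gym.make('CartPole-v1')  # stand-in for the robosuite env factory

# Training phase: learn with normalized observations and rewards, then persist everything
train_env = VecNormalize(DummyVecEnv([make_env]))
model = PPO2(MlpPolicy, train_env, verbose=1)
model.learn(total_timesteps=10000)
model.save('trained_model')
train_env.save_running_average('.')  # assumed API: pickles the running obs/return statistics

# Evaluation phase: rebuild the wrapper frozen, restore the saved statistics, reload the model
eval_env = VecNormalize(DummyVecEnv([make_env]), training=False, norm_reward=False)
eval_env.load_running_average('.')  # assumed API: restores the statistics saved above
model = PPO2.load('trained_model')
model.set_env(eval_env)
obs = eval_env.reset()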

yjc765 (Author) commented Feb 19, 2020

@araffin Thanks for your suggestions, I will check them out!
May I ask one more question?
In your implementation of PPO2 there is a learning_rate parameter. Is it the learning rate for the policy, for the value function, or the same learning rate for both?
I ask because OpenAI Spinning Up has two parameters: pi_lr (float), the learning rate for the policy optimizer, and vf_lr (float), the learning rate for the value-function optimizer.
So I'm a little confused by this.
Thanks!

araffin (Collaborator) commented Feb 19, 2020

Is it the learning rate for the policy, for the value function, or the same learning rate for both?

The best way to answer such questions is to look at the code, but yes, it is used for both.
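To make that concrete, here is a hedged sketch. In this implementation a single Adam optimizer minimizes one combined objective (the clipped policy loss plus vf_coef times the value loss, minus the entropy bonus), so the single learning_rate drives both the policy and the value-function updates, and the relative weight of the value update is controlled through vf_coef rather than a separate vf_lr:

from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

# One learning_rate is shared by the policy and the value function because they are
# trained jointly on a single combined loss; vf_coef (not a second learning rate)
# sets how strongly the value-function error contributes to that loss.
model = PPO2(MlpPolicy, 'CartPole-v1',
             learning_rate=2.5e-4,  # used by the single optimizer for both parts
             vf_coef=0.5,           # weight of the value loss in the combined objective
             ent_coef=0.01)         # weight of the entropy bonus
model.learn(total_timesteps=10000)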

araffin (Collaborator) commented Feb 20, 2020

Related: #698

araffin closed this as completed on Feb 24, 2020