PandaPush-v2 does not learn with SB3 #21
Comments
Hi, what do you mean by "does not work"? Is an error raised during code execution? Or do you mean that the results you obtain do not match the curves in the paper? The code I used for the paper is provided in this openai/baselines fork; it will let you reproduce the paper's results exactly. Nevertheless, since OpenAI has stopped maintaining that repo, I strongly advise you to use maintained RL code like stable-baselines3, even if you will probably not be able to reproduce the paper's results exactly. If you still want to use openai/baselines to strictly reproduce my results, please note that I used the v0 version of panda-gym (and not the v2 version I released in the meantime). I don't think the changes between these two versions affect the curves much, but I can't guarantee it.
Thanks for your reply. By "does not work", I meant that the learning curves did not match (no error in execution). I trained for almost 3×10^6 timesteps, but the success rate for PandaPush-v2 was stuck at 0.15 (the learning curves in the paper converge to a success rate of ~1). Thanks for your suggestions, I will try them in the meantime. Did you use a sparse reward for the curves?
I did. You can also check the baseline results on the rl-baselines3-zoo repo. For Push, convergence occurs well before 1e6 timesteps.
Can you please post a snippet for PandaPickAndPlace-v2 that learns using DDPG from SB3, to reproduce the results in the paper? I realize it might not be exactly equivalent to the results from the paper, but anything that learns would work for me. I've tried this, but it does not work:

```python
import gym
import panda_gym  # registers the Panda environments
from stable_baselines3 import DDPG, HerReplayBuffer

env = gym.make("PandaPickAndPlace-v2")
model = DDPG(policy="MultiInputPolicy", env=env, replay_buffer_class=HerReplayBuffer,
             verbose=1, batch_size=2048, buffer_size=1000000)
model.learn(total_timesteps=4000000)
```

Thanks!
You can use rl-baselines3-zoo to train PandaPush-v2. You just need to paste these hyperparameters under the PandaPush-v2 entry:

```yaml
env_wrapper: sb3_contrib.common.wrappers.TimeFeatureWrapper
n_timesteps: !!float 1e6
policy: 'MultiInputPolicy'
buffer_size: 1000000
batch_size: 2048
gamma: 0.95
learning_rate: !!float 1e-3
noise_type: 'normal'
noise_std: 0.1
replay_buffer_class: HerReplayBuffer
replay_buffer_kwargs: "dict(
  online_sampling=True,
  goal_selection_strategy='future',
  n_sampled_goal=4,
)"
policy_kwargs: "dict(net_arch=[512, 512, 512], n_critics=2)"
```

then run:

```
python train.py --algo ddpg --env PandaPush-v2
```

Here is the result you will get: [learning-curve plot]
Hi,
I am trying to recreate your results from the paper 'panda-gym: Open-source goal-conditioned environments for robotic learning', and the code given in train_push.py does not seem to work with the default parameters.
Can you point me to the RL code you used to get those results? Also, are the learning curves in the paper from the sparse or dense reward setting?
Thanks!