
Replace ContinuousMLPQFunction #892

Merged: 2 commits into master from replace_continuous_mlp_q_function_with_model, Sep 27, 2019

Conversation

@ahtsan (Contributor) commented Sep 23, 2019

This PR does the following:

  • Remove all occurrences of ContinuousMLPQFunction (the old version)
  • Rename ContinuousMLPQFunctionWithModel to ContinuousMLPQFunction
  • Remove the corresponding transition test (test_continuous_mlp_q_function_with_model_transit.py)

Summary: in most cases the model version performs better than, or at least as well as, the old one.

For the benchmark scripts, check out the branch origin/benchmark_continuous_mlp_q_function.
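The mechanical part of the change, removing every occurrence of the old class name, could be reproduced with a grep-and-sed sweep. This is a hypothetical sketch, not the PR's actual commands; a throwaway demo file is created so the commands run self-contained.

```shell
# Hypothetical sketch of a project-wide rename like the one in this PR.
# A temporary demo file stands in for the repo; in a real checkout you
# would run the grep/sed over src/ and tests/ instead.
set -e
demo=$(mktemp -d)
printf 'from garage.tf.q_functions import ContinuousMLPQFunctionWithModel\n' \
  > "$demo/example.py"

# Rewrite every file that mentions the old class name (GNU sed -i).
grep -rl 'ContinuousMLPQFunctionWithModel' "$demo" \
  | xargs sed -i 's/ContinuousMLPQFunctionWithModel/ContinuousMLPQFunction/g'

cat "$demo/example.py"
```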

Benchmark results:
[7 benchmark plot images]

This and #888 will close #526.

@ahtsan requested a review from a team as a code owner on September 23, 2019 07:56
@krzentner (Contributor) left a comment


LGTM

@ryanjulian (Member)

Overall this LGTM.

I'm a little concerned about a couple of the results on HalfCheetah and Walker2d (the hardest environments). A couple of cases seem inconsistent. I don't think it's indicative of a problem with the primitive; it may instead be the algorithm.

Can you check that we're using the same exploration strategy for both the baseline and our algo? I think @utkarshjp7 may have also unearthed some subtle differences between our DDPG and the baselines implementation, which might explain some of the differences.

@ahtsan (Contributor, Author) commented Sep 24, 2019

> Overall this LGTM. […] Can you check that we're using the same exploration strategy for both the baseline and our algo?

Utkarsh actually thought DDPG was fine; he only found an issue with TD3. I will take another pass at it, run the benchmark on these two harder environments, and see if the inconsistency persists.

@ahtsan (Contributor, Author) commented Sep 24, 2019

> Overall this LGTM. […] Can you check that we're using the same exploration strategy for both the baseline and our algo?

I ran the benchmark again and it looks better. Perhaps it's a matter of the random seed?
[2 updated benchmark plot images]

@ryanjulian (Member)

Yes, I suspect it has something to do with these environments' sensitivity to initialization and/or exploration.

Here's a way you can check (which will also help people running benchmarks in the future):

Run the same two experiments, but with 30 random seeds. Plot the mean results (across all 30 seeds) for garage and garage_models (that's two curves) with a confidence interval. This is very easy to do with seaborn and pandas.

I suspect we will find that the confidence interval is wider than the variation we are seeing in the plots.
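The multi-seed aggregation described above can be sketched as follows. This is a hypothetical sketch, not the PR's benchmark code: the column names (branch, seed, epoch, return) and the synthetic learning curves are assumptions.

```python
# Hypothetical sketch (not the PR's benchmark code): aggregate per-seed
# learning curves with pandas and compute a mean curve plus a 95% CI per
# branch. Column names and the synthetic data below are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = []
for branch in ("garage", "garage_models"):
    for seed in range(30):
        # Stand-in for one run's average-return curve over 50 epochs.
        curve = np.cumsum(rng.normal(1.0, 0.5, size=50))
        for epoch, ret in enumerate(curve):
            rows.append({"branch": branch, "seed": seed,
                         "epoch": epoch, "return": ret})
df = pd.DataFrame(rows)

# Mean and normal-approximation 95% CI across the 30 seeds at each epoch.
stats = (df.groupby(["branch", "epoch"])["return"]
           .agg(["mean", "std", "count"])
           .reset_index())
stats["ci95"] = 1.96 * stats["std"] / np.sqrt(stats["count"])

# With seaborn the same picture is a single call, since lineplot
# aggregates over repeated x values and draws a confidence band:
#   seaborn.lineplot(data=df, x="epoch", y="return", hue="branch")
print(stats.head())
```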

@ahtsan (Contributor, Author) commented Sep 25, 2019

I ran 10 trials and the model version outperforms the old version.

HalfCheetah-v2:
Old: [benchmark plot image]
Model: [benchmark plot image]

Walker2d-v2:
Old: [benchmark plot image]
Model: [benchmark plot image]

Code is in origin/benchmark_continuous_mlp_q_function

@ryanjulian (Member)

Awesome! Feel free to merge this.

If you can somehow add the code you used to produce these plots to master (and put both curves on the same plot), I think it would be super helpful for future benchmarking. At least post a branch or a gist for the people currently working on the models transition.

@ahtsan force-pushed the replace_continuous_mlp_q_function_with_model branch 2 times, most recently from 8b7e6e0 to ea36f13, on September 26, 2019 05:45
@ahtsan force-pushed the replace_continuous_mlp_q_function_with_model branch from ea36f13 to cefde74 on September 26, 2019 23:10
@codecov (bot) commented Sep 27, 2019

Codecov Report

Merging #892 into master will increase coverage by 23.23%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           master     #892       +/-   ##
===========================================
+ Coverage   50.84%   74.08%   +23.23%     
===========================================
  Files         181      180        -1     
  Lines       10301    10250       -51     
  Branches     1323     1317        -6     
===========================================
+ Hits         5238     7594     +2356     
+ Misses       4814     2323     -2491     
- Partials      249      333       +84
Impacted Files Coverage Δ
src/garage/tf/algos/td3.py 92.23% <100%> (+81.44%) ⬆️
src/garage/tf/algos/ddpg.py 88.59% <100%> (+4.13%) ⬆️
...garage/tf/q_functions/continuous_mlp_q_function.py 100% <100%> (+16.98%) ⬆️
src/garage/tf/q_functions/base.py 47.36% <0%> (-26.32%) ⬇️
src/garage/misc/krylov.py 17.94% <0%> (ø) ⬆️
...rc/garage/tf/optimizers/penalty_lbfgs_optimizer.py 95.06% <0%> (+1.23%) ⬆️
src/garage/envs/util.py 20% <0%> (+1.66%) ⬆️
src/garage/tf/algos/batch_polopt.py 94.79% <0%> (+2.08%) ⬆️
src/garage/tf/algos/dqn.py 96.8% <0%> (+2.12%) ⬆️
src/garage/plotter/plotter.py 31.81% <0%> (+2.27%) ⬆️
... and 106 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bd0465a...97afa00.

@ahtsan merged commit d2b072e into master on Sep 27, 2019
@ahtsan deleted the replace_continuous_mlp_q_function_with_model branch on September 27, 2019 15:13
Merging this pull request may close the linked issue: Refactor ContinuousMLPPolicy & ContinuousMLPQFunction to use garage.tf.Model.