
Replace ContinuousMLPQFunction #892

Merged: 2 commits into master from replace_continuous_mlp_q_function_with_model, Sep 27, 2019

Conversation

@ahtsan (Contributor) commented Sep 23, 2019

This PR does the following:

  • Remove all occurrences of ContinuousMLPQFunction (the old version)
  • Rename ContinuousMLPQFunctionWithModel to ContinuousMLPQFunction
  • Remove the corresponding transition test (test_continuous_mlp_q_function_with_model_transit.py)

Summary: in most cases the model version performs better than, or at least as well as, the old one.

For the benchmark scripts, check out the branch origin/benchmark_continuous_mlp_q_function.
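The mechanical part of the change, removing every occurrence of the old class name, could be reproduced with a grep-and-sed sweep. This is a hypothetical sketch, not the PR's actual commands; a throwaway demo file is created so the commands run self-contained.

```shell
# Hypothetical sketch of a project-wide rename like the one in this PR.
# A temporary demo file stands in for the repo; in a real checkout you
# would run the grep/sed over src/ and tests/ instead.
set -e
demo=$(mktemp -d)
printf 'from garage.tf.q_functions import ContinuousMLPQFunctionWithModel\n' \
  > "$demo/example.py"

# Rewrite every file that mentions the old class name (GNU sed -i).
grep -rl 'ContinuousMLPQFunctionWithModel' "$demo" \
  | xargs sed -i 's/ContinuousMLPQFunctionWithModel/ContinuousMLPQFunction/g'

cat "$demo/example.py"
```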

Benchmark results:
[7 benchmark plot images]

This and #888 will close #526.

@ahtsan requested a review from a team as a code owner on September 23, 2019 07:56
@krzentner (Contributor) left a comment


LGTM

@ryanjulian (Member)

Overall this LGTM.

I'm a little concerned about a couple of the results on HalfCheetah and Walker2d (the hardest environments). A couple of cases seem inconsistent. I don't think it's indicative of a problem with the primitive; it may instead be the algorithm.

Can you check that we're using the same exploration strategy for both the baseline and our algo? I think @utkarshjp7 may have also unearthed some subtle differences between our DDPG and the baselines implementation, which might explain some of the differences.

@ahtsan (Contributor, Author) commented Sep 24, 2019

> Overall this LGTM. […] Can you check that we're using the same exploration strategy for both the baseline and our algo?

Utkarsh actually thought DDPG was fine; he only found an issue with TD3. I will take another pass at it, run the benchmark on these two harder environments, and see if the inconsistency persists.

@ahtsan (Contributor, Author) commented Sep 24, 2019

> Overall this LGTM. […] Can you check that we're using the same exploration strategy for both the baseline and our algo?

I ran the benchmark again and it looks better. Perhaps it's a matter of the random seed?
[2 updated benchmark plot images]

@ryanjulian (Member)

Yes, I suspect it has something to do with these environments' sensitivity to initialization and/or exploration.

Here's a way you can check (which will also help people running benchmarks in the future):

Run the same two experiments, but with 30 random seeds. Plot the mean results (across all 30 seeds) for garage and garage_models (that's two curves) with a confidence interval. This is very easy to do with seaborn and pandas.

I suspect we will find that the confidence interval is wider than the variation we are seeing in the plots.
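The multi-seed aggregation described above can be sketched as follows. This is a hypothetical sketch, not the PR's benchmark code: the column names (branch, seed, epoch, return) and the synthetic learning curves are assumptions.

```python
# Hypothetical sketch (not the PR's benchmark code): aggregate per-seed
# learning curves with pandas and compute a mean curve plus a 95% CI per
# branch. Column names and the synthetic data below are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
rows = []
for branch in ("garage", "garage_models"):
    for seed in range(30):
        # Stand-in for one run's average-return curve over 50 epochs.
        curve = np.cumsum(rng.normal(1.0, 0.5, size=50))
        for epoch, ret in enumerate(curve):
            rows.append({"branch": branch, "seed": seed,
                         "epoch": epoch, "return": ret})
df = pd.DataFrame(rows)

# Mean and normal-approximation 95% CI across the 30 seeds at each epoch.
stats = (df.groupby(["branch", "epoch"])["return"]
           .agg(["mean", "std", "count"])
           .reset_index())
stats["ci95"] = 1.96 * stats["std"] / np.sqrt(stats["count"])

# With seaborn the same picture is a single call, since lineplot
# aggregates over repeated x values and draws a confidence band:
#   seaborn.lineplot(data=df, x="epoch", y="return", hue="branch")
print(stats.head())
```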

@ahtsan (Contributor, Author) commented Sep 25, 2019

I ran 10 trials and the model version outperforms the old version.

HalfCheetah-v2:
Old: [benchmark plot image]
Model: [benchmark plot image]

Walker2d-v2:
Old: [benchmark plot image]
Model: [benchmark plot image]

Code is in origin/benchmark_continuous_mlp_q_function

@ryanjulian (Member)

Awesome! Feel free to merge this.

If you can somehow add the code you used to produce these plots to master (and put both curves on the same plot), I think it would be super helpful for future benchmarking. At least post a branch or a gist for the people currently working on the models transition.

@ahtsan force-pushed the replace_continuous_mlp_q_function_with_model branch 2 times, most recently from 8b7e6e0 to ea36f13, on September 26, 2019 05:45
@ahtsan force-pushed the replace_continuous_mlp_q_function_with_model branch from ea36f13 to cefde74 on September 26, 2019 23:10
@codecov (bot) commented Sep 27, 2019

Codecov Report

Merging #892 into master will increase coverage by 23.23%.
The diff coverage is 100%.


@@             Coverage Diff             @@
##           master     #892       +/-   ##
===========================================
+ Coverage   50.84%   74.08%   +23.23%     
===========================================
  Files         181      180        -1     
  Lines       10301    10250       -51     
  Branches     1323     1317        -6     
===========================================
+ Hits         5238     7594     +2356     
+ Misses       4814     2323     -2491     
- Partials      249      333       +84
Impacted Files Coverage Δ
src/garage/tf/algos/td3.py 92.23% <100%> (+81.44%) ⬆️
src/garage/tf/algos/ddpg.py 88.59% <100%> (+4.13%) ⬆️
...garage/tf/q_functions/continuous_mlp_q_function.py 100% <100%> (+16.98%) ⬆️
src/garage/tf/q_functions/base.py 47.36% <0%> (-26.32%) ⬇️
src/garage/misc/krylov.py 17.94% <0%> (ø) ⬆️
...rc/garage/tf/optimizers/penalty_lbfgs_optimizer.py 95.06% <0%> (+1.23%) ⬆️
src/garage/envs/util.py 20% <0%> (+1.66%) ⬆️
src/garage/tf/algos/batch_polopt.py 94.79% <0%> (+2.08%) ⬆️
src/garage/tf/algos/dqn.py 96.8% <0%> (+2.12%) ⬆️
src/garage/plotter/plotter.py 31.81% <0%> (+2.27%) ⬆️
... and 106 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bd0465a...97afa00.

@ahtsan merged commit d2b072e into master on Sep 27, 2019
@ahtsan deleted the replace_continuous_mlp_q_function_with_model branch on September 27, 2019 15:13
Merging this pull request may close the linked issue: Refactor ContinuousMLPPolicy & ContinuousMLPQFunction to use garage.tf.Model.