
Conversation


@CatherineSue CatherineSue commented Oct 29, 2018

This commit fixes policy entropy. It now supports the following three
options:
• maximum entropy (augmented reward)
• with a logli estimator (-logli(action|policy))
• with the gradient turned off (tf.stop_gradient(entropy))

See: #281
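A minimal sketch of how these three options could fit together, assuming a diagonal Gaussian policy. This is not the actual npo.py code; the flags use_logli_entropy, use_softplus_entropy, and stop_entropy_gradient simply mirror the constructor arguments discussed in the review below.

```python
import numpy as np
import tensorflow as tf


def entropy_term(mean, log_std, action,
                 use_logli_entropy=True,
                 use_softplus_entropy=True,
                 stop_entropy_gradient=True):
    """Per-timestep entropy bonus for a diagonal Gaussian policy (sketch)."""
    if use_logli_entropy:
        # Single-sample estimator: -log pi(a|s), whose expectation under the
        # policy equals the true entropy.
        z = (action - mean) / tf.exp(log_std)
        log_li = -0.5 * tf.reduce_sum(
            tf.square(z) + 2.0 * log_std + np.log(2.0 * np.pi), axis=-1)
        entropy = -log_li
    else:
        # Analytic entropy of a diagonal Gaussian.
        entropy = tf.reduce_sum(
            log_std + 0.5 * np.log(2.0 * np.pi * np.e), axis=-1)
    if use_softplus_entropy:
        # Keep the bonus non-negative.
        entropy = tf.nn.softplus(entropy)
    if stop_entropy_gradient:
        # Use the bonus purely as reward shaping; no gradient flows through it.
        entropy = tf.stop_gradient(entropy)
    return entropy


# Maximum-entropy option: augment the per-step rewards with the bonus, e.g.
# rewards = rewards + policy_ent_coeff * entropy_term(mean, log_std, action)
```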

@CatherineSue CatherineSue requested a review from a team as a code owner October 29, 2018 17:18

codecov bot commented Oct 29, 2018

Codecov Report

Merging #368 into master will increase coverage by 0.1%.
The diff coverage is 91.66%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #368     +/-   ##
=========================================
+ Coverage   64.62%   64.72%   +0.1%     
=========================================
  Files         212      212             
  Lines       13977    13985      +8     
=========================================
+ Hits         9032     9052     +20     
+ Misses       4945     4933     -12
Impacted Files Coverage Δ
garage/tf/algos/npo.py 95.02% <91.66%> (-0.32%) ⬇️
garage/tf/distributions/diagonal_gaussian.py 66.66% <0%> (-5%) ⬇️
garage/envs/mujoco/gather/gather_env.py 57.57% <0%> (+0.67%) ⬆️
.../theano/optimizers/conjugate_gradient_optimizer.py 74.05% <0%> (+1.26%) ⬆️
garage/envs/mujoco/maze/maze_env.py 82.21% <0%> (+1.44%) ⬆️
garage/theano/optimizers/first_order_optimizer.py 85.5% <0%> (+1.44%) ⬆️
garage/tf/distributions/categorical.py 66.66% <0%> (+1.51%) ⬆️
garage/tf/distributions/recurrent_categorical.py 79.16% <0%> (+4.16%) ⬆️
garage/envs/mujoco/point_env.py 77.77% <0%> (+11.11%) ⬆️

Continue to review full report at Codecov.

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e89ce7a...3589a48.


@ryanjulian ryanjulian left a comment


Did this fix the behavior where the entropy term has no effect?

-    policy_entropy = tf.reduce_mean(policy_entropy * i.valid_var)
+    policy_entropy = policy_entropy * i.valid_var

if self._stop_entropy_gradients:

stop_entropy_gradient
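To illustrate the change being reviewed above, here is a standalone sketch with made-up tensors (not the npo.py variables themselves): keeping the entropy per time step and masking it with the valid flags, instead of collapsing it with tf.reduce_mean, lets the bonus be added to each step's reward, and tf.stop_gradient turns it into pure reward shaping.

```python
import tensorflow as tf

# Hypothetical per-step entropies and rewards for two padded trajectories.
entropy = tf.constant([[0.5, 0.7, 0.9],
                       [0.4, 0.6, 0.8]])   # shape [batch, max_path_length]
valid = tf.constant([[1., 1., 0.],
                     [1., 1., 1.]])        # 1 = real step, 0 = padding
rewards = tf.constant([[1.0, 0.0, 0.0],
                       [0.5, 0.5, 0.0]])

policy_entropy = entropy * valid                    # per-step, padding zeroed
policy_entropy = tf.stop_gradient(policy_entropy)   # optional: no gradient
                                                    # flows through the bonus

policy_ent_coeff = 1e-2
augmented_rewards = rewards + policy_ent_coeff * policy_entropy
```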

policy=None,
policy_ent_coeff=1e-2,
use_softplus_entropy=True,
log_estimate_entropy=True,

use_logli_entropy

name="NPO",
policy=None,
policy_ent_coeff=1e-2,
use_softplus_entropy=True,

is this still needed?

Member Author

Yes


coveralls commented Oct 30, 2018

Coverage Status

Coverage increased (+0.09%) to 65.171% when pulling 3589a48 on fix_pol_ent into e89ce7a on master.

@ryanjulian ryanjulian requested a review from zhanpenghe October 30, 2018 18:06

@zhanpenghe zhanpenghe left a comment


Looks good to me.


@zhanpenghe zhanpenghe left a comment


Can you please check the example files and the benchmarking comments? I believe there should be some changes to the hyperparameters.

When not using the negative log likelihood as the entropy term, since you take out the mean now, the scale of the coefficient might be different too.
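A back-of-the-envelope illustration of this scale argument, with purely hypothetical numbers: once the mean is no longer taken, the bonus enters the return at every valid time step instead of once as an average, so a coefficient tuned for the old form can end up roughly path-length times too large.

```python
# Hypothetical numbers only, ignoring discounting.
policy_ent_coeff = 1e-2
mean_step_entropy = 0.7   # assumed average per-step entropy
path_length = 100         # assumed number of valid steps in a trajectory

old_term = policy_ent_coeff * mean_step_entropy                # one averaged term
new_term = policy_ent_coeff * mean_step_entropy * path_length  # summed over the path

print(old_term, new_term)  # 0.007 vs 0.7
```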

@ryanjulian

Do any of the examples even use entropy?


zhanpenghe commented Oct 30, 2018

policy_ent_coeff was set to 1e-2 by default before, so any example that does not explicitly set it to 0 would use entropy.


CatherineSue commented Oct 30, 2018

The default value of use_logli_entropy is True, so every related example and benchmark will be using the negative log likelihood as the entropy term unless it sets policy_ent_coeff to 0.
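For reference, a small numerical check (standalone NumPy, not garage code) that the negative log likelihood of sampled actions does estimate the policy entropy, i.e. E[-log pi(a|s)] = H(pi(.|s)), here for a one-dimensional Gaussian policy:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 0.5
actions = rng.normal(mu, sigma, size=100_000)

# -log N(a; mu, sigma), averaged over sampled actions.
neg_logli = (0.5 * ((actions - mu) / sigma) ** 2
             + np.log(sigma) + 0.5 * np.log(2.0 * np.pi))
analytic_entropy = 0.5 * np.log(2.0 * np.pi * np.e * sigma ** 2)

print(neg_logli.mean())   # ~0.726
print(analytic_entropy)   # ~0.726
```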

@CatherineSue CatherineSue force-pushed the fix_pol_ent branch 3 times, most recently from 3d832b6 to c8c02e8 Compare October 30, 2018 21:55
@CatherineSue
Member Author

I changed the default policy_ent_coeff to 0.0.


@zhanpenghe zhanpenghe left a comment


Ok. Please update the comments about benchmarking in the examples.

This commit fixes policy entropy. It now supports the following three
options:
• maximum entropy (augmented reward)
• with a logli estimator (-logli(action|policy))
• with the gradient turned off (tf.stop_gradient(entropy))
@CatherineSue CatherineSue merged commit 26c14a3 into master Oct 30, 2018
@zhanpenghe zhanpenghe deleted the fix_pol_ent branch October 31, 2018 17:58