Fix policy entropy #368
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master     #368     +/-   ##
=========================================
+ Coverage   64.62%   64.72%    +0.1%
=========================================
  Files         212      212
  Lines       13977    13985       +8
=========================================
+ Hits         9032     9052      +20
+ Misses       4945     4933      -12
Continue to review full report at Codecov.
ryanjulian left a comment
Did this fix the behavior where the entropy term has no effect?
garage/tf/algos/npo.py (Outdated)

        policy_entropy = tf.reduce_mean(policy_entropy * i.valid_var)
        policy_entropy = policy_entropy * i.valid_var

        if self._stop_entropy_gradients:
Rename this to stop_entropy_gradient.
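For context, a minimal sketch of what the per-timestep entropy handling in the diff above amounts to, assuming policy_entropy and valid_var are (batch, max_path_length) tensors with a 0/1 validity mask; the stop_entropy_gradient flag name follows the suggestion above and is illustrative, not necessarily the final garage API:

    import tensorflow as tf

    # Illustrative placeholders: per-timestep entropy and a 0/1 validity mask.
    policy_entropy = tf.placeholder(tf.float32, shape=(None, None))
    valid_var = tf.placeholder(tf.float32, shape=(None, None))

    # Keep entropy per timestep (no reduce_mean), masked to valid timesteps,
    # so it can be added to the reward at each step.
    policy_entropy = policy_entropy * valid_var

    stop_entropy_gradient = True
    if stop_entropy_gradient:
        # Treat the entropy purely as a bonus: no gradient flows through it.
        policy_entropy = tf.stop_gradient(policy_entropy)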
garage/tf/algos/npo.py (Outdated)

        policy=None,
        policy_ent_coeff=1e-2,
        use_softplus_entropy=True,
        log_estimate_entropy=True,
Rename this to use_logli_entropy.
        name="NPO",
        policy=None,
        policy_ent_coeff=1e-2,
        use_softplus_entropy=True,
is this still needed?
Yes
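For reference, the softplus option appears to be there to keep the entropy estimate non-negative (a log-likelihood estimate can go negative). A minimal sketch, assuming it is applied directly to the per-timestep entropy tensor; this is illustrative, not the exact garage code:

    import tensorflow as tf

    policy_entropy = tf.placeholder(tf.float32, shape=(None, None))

    use_softplus_entropy = True
    if use_softplus_entropy:
        # Softplus maps the (possibly negative) entropy estimate to a
        # positive, smooth value.
        policy_entropy = tf.nn.softplus(policy_entropy)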
Force-pushed 79ab598 to ca1a136
zhanpenghe left a comment
Looks good to me.
Can you please check the example files and the benchmarking comments? I believe some hyperparameters will need to change.
Also, when not using the negative log likelihood as the entropy term, the mean is no longer taken, so the scale of the coefficient might be different too.
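To illustrate the scale concern with made-up numbers (the shapes and values below are hypothetical, not taken from the benchmarks):

    import numpy as np

    # Hypothetical per-timestep entropies for one batch of 100-step paths.
    entropy = np.full((32, 100), 0.5)
    policy_ent_coeff = 1e-2

    # Previous behaviour: one mean-entropy scalar added to the objective.
    old_bonus = policy_ent_coeff * entropy.mean()              # ~0.005

    # New behaviour: entropy is added to the reward at every timestep, so the
    # total bonus per 100-step path is roughly 100x larger.
    new_bonus = policy_ent_coeff * entropy.sum(axis=1).mean()  # ~0.5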
Do any of the examples even use entropy?
The default value of …
Force-pushed 3d832b6 to c8c02e8
I changed the default policy_ent_coeff to 0.0.
zhanpenghe left a comment
OK. Please update the comments about benchmarking in the examples.
Force-pushed c8c02e8 to 3589a48
This commit fixes policy entropy. It currently supports the following three
options:
• maximum entropy (the entropy bonus is added to the reward)
• a log-likelihood entropy estimator (-logli(action|policy))
• entropy with its gradient turned off (tf.stop_gradient(entropy))
See: #281
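A rough sketch of how the three options could fit together, assuming a tf.distributions-style policy distribution; the function and argument names below are illustrative and are not the exact garage API:

    import tensorflow as tf

    def apply_entropy_options(dist, action_var, rewards,
                              policy_ent_coeff=0.0,
                              maximum_entropy=False,
                              use_neg_logli_entropy=False,
                              stop_entropy_gradient=False):
        # Option 2: estimate entropy as -log pi(a|s) of the sampled actions,
        # otherwise use the distribution's analytic entropy.
        if use_neg_logli_entropy:
            policy_entropy = -dist.log_prob(action_var)
        else:
            policy_entropy = dist.entropy()

        # Option 3: block gradients through the entropy term itself.
        if stop_entropy_gradient:
            policy_entropy = tf.stop_gradient(policy_entropy)

        # Option 1: maximum entropy, i.e. augment the reward with the bonus.
        if maximum_entropy:
            rewards = rewards + policy_ent_coeff * policy_entropy

        return rewards, policy_entropy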