Fix PPO #47

jvmncs · 2019-01-13T22:22:49Z

Fixes a bug in the PPO loss calculation in the PPOBase.learn method.
Normalizes advantages before they are used in the PPO loss calculation (see Normalize rewards #46).
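
For readers unfamiliar with the technique, here is a minimal sketch of advantage normalization feeding a clipped PPO surrogate loss. It is written in plain Python for illustration and is not the code from this PR: the function names `normalize_advantages` and `ppo_clip_loss`, the `eps` and `clip_eps` defaults, and the per-batch normalization are all assumptions, and the actual `PPOBase.learn` implementation may differ.

```python
import math
from statistics import fmean, pstdev


def normalize_advantages(advantages, eps=1e-8):
    """Shift a batch of advantages to zero mean and scale to unit std.

    `eps` (an assumed default) guards against division by zero when all
    advantages in the batch are identical.
    """
    mean = fmean(advantages)
    std = pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (a quantity to minimize), batch-averaged.

    Advantages are normalized first, then each sample contributes
    min(ratio * A, clip(ratio, 1 - clip_eps, 1 + clip_eps) * A).
    """
    advantages = normalize_advantages(advantages)
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    # Negate because the surrogate objective is maximized.
    return -fmean(terms)
```

Normalizing per batch keeps the scale of the policy gradient roughly constant regardless of the reward magnitudes in a given environment, which is the usual motivation for this step.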


jvmncs added 2 commits January 13, 2019 16:50

fix PPO learning loop shape

f44a7ae

normalize advantages

317b849

jvmncs merged commit 5ae2860 into master Jan 13, 2019

jvmncs mentioned this pull request Jan 13, 2019

Normalize rewards #46

Closed

alok added a commit to alok/safe-grid-agents that referenced this pull request Jan 16, 2019

Merge branch 'master' into tune

7a95b2e

* master:
  exclude entropy bonus coeff in tensorboard entropy monitoring (jvmncs#48)
  Fix PPO (jvmncs#47)
  Small improvements (jvmncs#45)
