Fix PPO #47

Merged: 2 commits, Jan 13, 2019
jvmncs (Owner) commented Jan 13, 2019

  • Fixes a bug in the PPO loss calculation in the PPOBase.learn method
  • Normalizes advantages before using them in the PPO loss calculation (Normalize rewards #46)
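
For readers unfamiliar with the second change, here is a minimal sketch of normalizing advantages before a clipped PPO surrogate loss. It is not the code from this pull request: the function names, the `eps` and `clip_eps` parameters, and the use of plain Python lists are all assumptions made for illustration, and the description above does not say what the original bug in `PPOBase.learn` was.

```python
import math
from statistics import fmean, pstdev


def normalize_advantages(advantages, eps=1e-8):
    """Shift advantages to zero mean and scale them to unit standard deviation.

    `eps` (an assumed default) guards against division by zero when all
    advantages in the batch are equal.
    """
    mean = fmean(advantages)
    std = pstdev(advantages)  # population standard deviation over the batch
    return [(a - mean) / (std + eps) for a in advantages]


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss, averaged over the batch (hypothetical helper).

    Advantages are normalized first; the result is negated so that
    minimizing the loss maximizes the surrogate objective.
    """
    advantages = normalize_advantages(advantages)
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped objectives.
        terms.append(min(ratio * adv, clipped * adv))
    return -fmean(terms)
```

Normalizing per batch keeps the scale of the policy gradient roughly constant across environments with very different reward magnitudes, which is one common motivation for this kind of change.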

@jvmncs jvmncs merged commit 5ae2860 into master Jan 13, 2019
@jvmncs jvmncs mentioned this pull request Jan 13, 2019
alok added a commit to alok/safe-grid-agents that referenced this pull request Jan 16, 2019
* master:
  exclude entropy bonus coeff in tensorboard entropy monitoring (jvmncs#48)
  Fix PPO (jvmncs#47)
  Small improvements (jvmncs#45)