
About Variance Reduction Method #20

Closed
daiquanyu opened this issue Mar 21, 2018 · 1 comment

Comments

@daiquanyu

In section 2.1 of the paper, the authors mention that the reward term is replaced with its advantage function. I have read the source code, but I still have some questions about the implementation of the variance reduction method.

  1. How is the variance reduction implemented in the code?
  2. If the baseline is set to a constant as I found (0.5 in the code?), how was this constant obtained?
  3. Since the parameters are updated during training, should the baseline be set to different values over time?
  4. What happens if we just use the plain policy gradient without a baseline?

Looking forward to your reply. Thanks.

@LantaoYu
Collaborator

Yes, in this implementation we simply use a constant baseline (0.5), which is approximately the average reward over all actions. Although the parameters are continuously updated, the expected reward remains close to 0.5, and empirically using this expected reward as the baseline performs well. The optimal baseline would actually be the expected reward weighted by the gradient magnitudes, but for simplicity we did not use that. In any case, it is well known that the naive policy gradient suffers from high variance, and subtracting a baseline (i.e., using an advantage function) is an effective way to reduce it.
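
For reference, here is a minimal sketch of what such a constant-baseline REINFORCE update looks like. This is illustrative Python/NumPy only, not the repository's actual code; the function and variable names are hypothetical:

```python
import numpy as np

def policy_gradient_loss(log_probs, rewards, baseline=0.5):
    """REINFORCE-style loss with a constant baseline (illustrative helper).

    log_probs : log pi(y_t | Y_{1:t-1}) for the sampled tokens, shape (T,)
    rewards   : discriminator-based rewards for those tokens, shape (T,)
    baseline  : constant subtracted from the reward; 0.5 approximates the
                average discriminator output, so (reward - baseline) acts
                as an advantage centered near zero.
    """
    advantages = rewards - baseline
    # Maximize E[log pi * advantage]; return the negative as a loss to minimize.
    return -np.sum(log_probs * advantages)

# Example: rewards near 0.5 give advantages near zero, which lowers the
# variance of the gradient estimate. A constant baseline does not bias the
# expected gradient, since E[grad log pi] = 0.
log_probs = np.log(np.array([0.2, 0.4, 0.3]))
rewards = np.array([0.6, 0.45, 0.55])
loss = policy_gradient_loss(log_probs, rewards)
```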
