About the operation in updating policy net #16
Comments
Hi, the RL part is a common implementation of policy gradient with a baseline. The overall implementation is aligned with the formulation in the paper. Can you be more specific about your questions? Thanks.
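For readers following this thread, here is a minimal sketch of how a value loss and an action loss usually arise in policy gradient with a baseline. It is the generic textbook form, not the repository's code: the function name `policy_gradient_losses`, its arguments, and the toy numbers are all hypothetical, and the actual implementation may weight or combine the terms differently. The action loss is the negative log-probability of the chosen action scaled by the advantage (return minus baseline), and the value loss is the squared error that trains the baseline itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_losses(logits, actions, returns, values):
    """Return (action_loss, value_loss), each averaged over the batch.

    logits  : per-step action logits from the policy head (list of lists)
    actions : index of the action taken at each step
    returns : observed return (reward-to-go) at each step
    values  : baseline V(s) predicted by the value head at each step
    """
    n = len(actions)
    action_loss = 0.0
    value_loss = 0.0
    for step_logits, a, ret, v in zip(logits, actions, returns, values):
        log_prob = math.log(softmax(step_logits)[a])
        # Advantage: how much better the return was than the baseline predicted.
        # In an autodiff framework this is treated as a constant in the action loss.
        advantage = ret - v
        action_loss += -log_prob * advantage
        # The baseline is trained by regression toward the observed return.
        value_loss += (ret - v) ** 2
    return action_loss / n, value_loss / n


# Toy batch of two steps with two possible actions (made-up numbers).
action_loss, value_loss = policy_gradient_losses(
    logits=[[0.0, 0.0], [1.0, -1.0]],
    actions=[0, 1],
    returns=[1.0, 0.0],
    values=[0.5, 0.5],
)
```

Minimizing the action loss follows the policy-gradient direction, while the value loss only improves the baseline, which reduces the variance of that gradient without changing its expectation. Implementations commonly sum the two terms, sometimes with a coefficient on the value loss, so the update can look different from a paper's formula while computing the same gradient for the policy.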
Sorry, my previous statement was not clear. My questions are as follows.
Can you explain the purpose of computing the value loss and the action loss when you update the policy net? I don't think the way the net is updated is consistent with the formula in your paper.
Or is there something I am misunderstanding?