Actor model finetuning code based on reward and policy gradient #13

Closed
parshinsh opened this issue Sep 1, 2022 · 1 comment

Comments


parshinsh commented Sep 1, 2022

Thanks for the great work! Could you share the code for the full RL finetuning framework (Actor and Critic updates based on the reward defined in the paper) for better reproducibility? For example, the code that updates the Actor network using the reward and policy gradient is missing.

@parshinsh parshinsh changed the title Actor model finetuning code Actor model finetuning code based on reward and policy gradient Nov 18, 2022
@henryhungle
Collaborator

@parshinsh We have updated the code for finetuning the actor model with synthetic samples and their return estimates.

Thank you for your patience!
