Actor model finetuning code based on reward and policy gradient #13

parshinsh · 2022-09-01T02:36:04Z

Thanks for the great work! Could you share the code for the whole RL finetuning framework (actor and critic updates based on the reward defined in the paper) for better reproducibility? For example, the code for updating the actor network based on the reward and policy gradient is missing.

henryhungle · 2022-11-18T16:20:59Z

@parshinsh we updated the code for finetuning the actor model with synthetic samples and their return estimates.

Thank you for your patience!
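
For readers arriving at this thread, here is a minimal sketch of the general idea being discussed: a REINFORCE-style loss that weights the log-probability of a sampled sequence by a return estimate. This is not the repository's code. The function names, the scalar `return_estimate` argument, and the plain-Python list representation of logits are all assumptions made for illustration; the actual implementation may compute returns and losses differently.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def policy_gradient_loss(step_logits, sampled_tokens, return_estimate):
    """REINFORCE-style surrogate loss for one sampled sequence.

    step_logits:     list of per-step logits over the vocabulary (hypothetical layout)
    sampled_tokens:  token ids that were sampled from the actor at each step
    return_estimate: scalar return for the whole sample (assumed to be given,
                     e.g. a reward with a baseline subtracted)
    """
    # Sum the log-probabilities of the tokens that were actually sampled.
    log_prob = 0.0
    for logits, token in zip(step_logits, sampled_tokens):
        log_prob += log_softmax(logits)[token]
    # Minimizing this value raises the probability of samples with a positive
    # return and lowers it for samples with a negative return.
    return -return_estimate * log_prob
```

In a real training loop the same quantity would be computed on tensors with an autodiff framework, so that the gradient of the loss with respect to the actor's parameters is the policy gradient estimate.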

parshinsh changed the title ~~Actor model finetuning code~~ Actor model finetuning code based on reward and policy gradient Nov 18, 2022

henryhungle mentioned this issue Nov 18, 2022

Finetuned model checkpoints #2

Closed

henryhungle closed this as completed Mar 1, 2023
