RL A2C #460

Open
junxnone opened this issue Dec 4, 2023 · 0 comments

Advantage Actor Critic

  • A2C
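  • Q(s, a) = V(s) + A(s, a)
    • V(s) is the state-value function
    • A(s, a) is the advantage value
  • The advantage function measures how much better an action is than the other actions available in a given state
  • A parallel architecture is introduced: each worker interacts independently with its own environment and collects its own sampled experience. Because these experiences are independent of one another, the correlation between samples is broken, which achieves an effect comparable to Experience Replay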
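
The decomposition above can be rearranged to A(s, a) = Q(s, a) - V(s). The following is a minimal sketch of that arithmetic, not a full A2C implementation. It assumes (this is not stated in the notes) that Q(s, a) is estimated by a discounted n-step return bootstrapped from the critic's value of the last state, which is a common choice. All function names and the rollout layout are hypothetical, and the "workers" are represented simply as independent lists of rewards and value predictions.

```python
def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for one worker's rollout.

    bootstrap_value is the critic's estimate V(s) for the state that
    follows the last reward (use 0.0 if the episode terminated).
    """
    returns = []
    running = bootstrap_value
    # Walk the rollout backwards: R_t = r_t + gamma * R_{t+1}
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """A(s_t, a_t) ~= R_t - V(s_t), with R_t standing in for Q(s_t, a_t)."""
    returns = discounted_returns(rewards, bootstrap_value, gamma)
    return [ret - value for ret, value in zip(returns, values)]


def collect_batch(rollouts, gamma=0.99):
    """Merge advantages from several independent workers into one batch.

    rollouts is a list of (rewards, values, bootstrap_value) tuples,
    one tuple per worker (a hypothetical layout for this sketch).
    """
    batch = []
    for rewards, values, bootstrap_value in rollouts:
        batch.extend(advantages(rewards, values, bootstrap_value, gamma))
    return batch


# Two workers with unrelated rollouts; gamma=0.5 keeps the numbers simple.
worker_rollouts = [
    ([1.0, 1.0], [0.5, 0.5], 0.0),  # worker 0: two steps, episode ended
    ([0.0], [1.0], 2.0),            # worker 1: one step, bootstrapped
]
batch_advantages = collect_batch(worker_rollouts, gamma=0.5)
print(batch_advantages)
```

A positive entry means the sampled action did better than the critic expected for that state, and a negative entry means it did worse. Because each worker's rollout comes from its own environment, the merged batch mixes samples that are not consecutive steps of a single trajectory.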
