RL A2C #460

Open

junxnone opened this issue Dec 4, 2023 · 0 comments

Advantage Actor Critic

A2C
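Q(s, a) = V(s) + A(s, a)
- V(s) is the state-value function
- A(s, a) is the advantage value
  - The advantage function measures how much better an action is than the other available actions in a given state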
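As a small illustration of the decomposition above, the sketch below rearranges it to A(s, a) = Q(s, a) - V(s) for a single state. It assumes V(s) is the policy-weighted average of the Q-values, which is the standard definition but is not stated in these notes. The one-step estimate `r + gamma * V(s') - V(s)` is likewise an assumption about how the advantage is commonly approximated when Q is not available. All function names and numbers are made up for the example.

```python
def advantages(q_values, policy):
    """Return V(s) and the list of A(s, a) for one state.

    q_values[i] is Q(s, a_i); policy[i] is the probability of taking a_i.
    """
    # Assumption: V(s) is the expected Q-value under the policy.
    v = sum(p * q for p, q in zip(policy, q_values))
    # Rearranged decomposition: A(s, a) = Q(s, a) - V(s).
    return v, [q - v for q in q_values]


def td_advantage(reward, v_state, v_next, gamma):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_state


# Illustrative values only.
q_values = [1.0, 2.0, 4.0]
policy = [0.5, 0.25, 0.25]
v, adv = advantages(q_values, policy)
print(v, adv)

td = td_advantage(reward=1.0, v_state=1.5, v_next=2.0, gamma=0.5)
print(td)
```

A positive advantage marks an action that is better than the policy's average behaviour in that state, and a negative one marks a worse action. Weighted by the policy, the advantages sum to zero.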
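It also introduces a parallel architecture: each worker interacts independently with its own copy of the environment and collects its own sampled experience. Because experience from different workers is mutually independent, this breaks the coupling between samples and achieves an effect comparable to Experience Replay.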
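The worker setup described above can be sketched roughly as follows. This is a toy under stated assumptions, not a real A2C implementation: `ToyEnv` is a hypothetical random-walk environment, the policy is a random placeholder, and the workers are stepped one after another in a loop, whereas a real implementation would run them in parallel processes or threads. The point is only that each worker owns a separate environment and that the resulting transitions are pooled into one batch.

```python
import random


class ToyEnv:
    """Hypothetical 1-D random-walk environment, for illustration only."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # each environment has its own random stream
        self.state = 0

    def step(self, action):
        # action is -1 or +1; the noise makes trajectories differ between workers.
        self.state += action + self.rng.choice([-1, 0, 1])
        reward = -abs(self.state)  # reward is highest at the origin
        return self.state, reward


def collect(num_workers=4, steps=5):
    """Step every worker's own environment each round and pool the transitions."""
    envs = [ToyEnv(seed=i) for i in range(num_workers)]
    policy_rng = random.Random(0)
    batch = []
    for _ in range(steps):
        for worker_id, env in enumerate(envs):
            state = env.state
            action = policy_rng.choice([-1, 1])  # placeholder for a learned policy
            next_state, reward = env.step(action)
            batch.append((worker_id, state, action, reward, next_state))
    return batch


batch = collect()
print(len(batch), "transitions from", len({t[0] for t in batch}), "workers")
```

Each batch mixes transitions from several separately evolving trajectories rather than consecutive steps of a single one. That mixing is the decorrelation effect the note compares to Experience Replay.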
