
Why don't you use two networks for stabilization? #1

Open
Fjoelsak opened this issue Jun 30, 2022 · 0 comments

@Fjoelsak
Hi,

I thought DQN would be tricky to get to converge, and that you more or less need fixed Q-targets and experience replay as workarounds. However, looking at your code I see a replay buffer, but you use the same network both for online action selection and for computing the one-step lookahead targets when updating your Q-values. Is there a reason for that?
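
For reference, here is a minimal sketch of the fixed-target pattern I mean (hypothetical PyTorch code, not taken from your repo): the target network is a frozen copy of the online network that supplies the bootstrap values and is only synced periodically.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical Q-network; stands in for whatever architecture the repo uses.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Fixed-target trick: a frozen copy of the online network.
target_net = copy.deepcopy(q_net)
target_net.eval()

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def td_update(states, actions, rewards, next_states, dones):
    """One DQN update that bootstraps from the *target* network."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # One-step lookahead uses the frozen target net, not the online net.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    # Called every N gradient steps to refresh the frozen copy.
    target_net.load_state_dict(q_net.state_dict())
```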
