
Why don't you use two networks for stabilization? #1

Open
Fjoelsak opened this issue Jun 30, 2022 · 0 comments

@Fjoelsak
Hi,

I thought DQN would be tricky to get to converge, and that you more or less need fixed Q-targets and experience replay as workarounds. However, looking at your code I see a replay buffer, but you use the same network both for online action selection and for computing the one-step lookahead targets when updating your Q-values. Is there a reason for that?
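
For reference, here is a minimal sketch of the fixed-target pattern I mean (hypothetical PyTorch code, not taken from your repo): the target network is a frozen copy of the online network that supplies the bootstrap values and is only synced periodically.

```python
import copy
import torch
import torch.nn as nn

# Hypothetical Q-network; stands in for whatever architecture the repo uses.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Fixed-target trick: a frozen copy of the online network.
target_net = copy.deepcopy(q_net)
target_net.eval()

optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def td_update(states, actions, rewards, next_states, dones):
    """One DQN update that bootstraps from the *target* network."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # One-step lookahead uses the frozen target net, not the online net.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    loss = nn.functional.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def sync_target():
    # Called every N gradient steps to refresh the frozen copy.
    target_net.load_state_dict(q_net.state_dict())
```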
