I thought DQN would be tricky to get to converge, and that you more or less need fixed Q-targets and experience replay as workarounds. Looking at your code, I do see a replay buffer, but you use the same network both for choosing actions online and for computing the one-step lookahead target in the Q-value update. Is there a reason for that?
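For reference, this is roughly the fixed Q-target setup I had in mind: a frozen copy of the online weights computes the bootstrap target and is only synced periodically. A minimal sketch with a hypothetical linear Q-function (all names and sizes are illustrative, not from your code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny linear Q-network: Q(s) = s @ W  (4 state dims, 2 actions)
W_online = rng.normal(size=(4, 2))
W_target = W_online.copy()          # frozen copy: the "fixed Q-target" network

gamma, lr = 0.99, 0.1
sync_every = 100                    # how often to copy online -> target

for step in range(1, 301):
    # fake transition (s, a, r, s'), standing in for a sample from the replay buffer
    s, s_next = rng.normal(size=4), rng.normal(size=4)
    a, r = rng.integers(2), rng.normal()

    # one-step lookahead target uses the *target* network, not the online one
    target = r + gamma * np.max(s_next @ W_target)

    # gradient step on the online network only (squared TD error)
    td_error = (s @ W_online)[a] - target
    W_online[:, a] -= lr * td_error * s

    if step % sync_every == 0:
        W_target = W_online.copy()  # periodic hard sync of the target network
```

Using the same network for both roles makes the target move with every update, which is the instability the frozen copy is meant to avoid.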