using target network to calculate last state value #93

backpropper · 2020-08-20T19:10:14Z

Is there a reference explaining why a separate target network is used to calculate the last state value?

Line 88 in 19db772

prediction = self.target_network(self.states)

ShangtongZhang · 2020-08-20T19:18:25Z

No, that's an ad-hoc decision.

backpropper · 2020-08-20T19:20:15Z

I see. Did it help reduce variance, or was it for some other reason?

ShangtongZhang · 2020-08-20T19:20:58Z

Not sure. I didn't test the version without the target network; I just followed DQN.
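
For illustration, the two variants being compared can be sketched roughly as follows. This is a minimal, dependency-free sketch and not the repository's code: it assumes the last state value is used to bootstrap the returns of a short rollout, and `LinearValue`, `n_step_returns`, and all the numbers are hypothetical stand-ins for the real networks and data.

```python
import copy


class LinearValue:
    """Hypothetical stand-in for a value network: V(s) = w * s + b."""

    def __init__(self, w, b):
        self.w = w
        self.b = b

    def __call__(self, state):
        return self.w * state + self.b


def n_step_returns(rewards, dones, last_state, value_fn, gamma=0.99):
    """Discounted returns for a rollout, bootstrapped from value_fn(last_state)."""
    ret = value_fn(last_state)
    returns = []
    # Walk the rollout backwards, cutting the bootstrap at episode ends.
    for reward, done in zip(reversed(rewards), reversed(dones)):
        ret = reward + gamma * (0.0 if done else ret)
        returns.append(ret)
    returns.reverse()
    return returns


online = LinearValue(w=1.0, b=0.0)
target = copy.deepcopy(online)  # periodically synced copy, as in DQN
online.w = 2.0  # pretend the online network was updated after the last sync

rewards = [1.0, 1.0]
dones = [False, False]
last_state = 3.0

# Variant discussed in this issue: bootstrap from the target network.
returns_target = n_step_returns(rewards, dones, last_state, target, gamma=0.5)
# Untested alternative: bootstrap from the online network.
returns_online = n_step_returns(rewards, dones, last_state, online, gamma=0.5)

print(returns_target)
print(returns_online)
```

The only difference between the two calls is which set of parameters evaluates the last state. With a target network, the bootstrap value changes only when the copy is synced, which is the usual DQN motivation; whether that matters here was not tested.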

backpropper · 2020-08-20T19:21:15Z

ok thanks!

backpropper closed this as completed Aug 20, 2020
