Closed; I was confused by the different versions of the Double DQN.
It is explained here:
What makes this network a Double DQN?
The Bellman update used to calculate the Q values for training the online network is:
```
value = reward + discount_factor * target_network.predict(next_state)[argmax(online_network.predict(next_state))]
```
The Bellman update used in the original (vanilla) DQN [1] is:
```
value = reward + discount_factor * max(target_network.predict(next_state))
```
The difference is that, in the terminology of the field, the second equation uses the target network for both SELECTING and EVALUATING the action, whereas the first uses the online network for SELECTING the action and the target network for EVALUATING it. Selection means choosing which action to take; evaluation means getting the projected Q value for that action. This form of the Bellman update is what makes the agent a Double DQN rather than a plain DQN, and it was introduced in [2].
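For concreteness, here is a minimal sketch of both targets as runnable code. It assumes Keras-style `online_network` and `target_network` models whose `predict` returns one row of Q values per state; the function names, shapes, and discount value are illustrative, and terminal-state handling is omitted:

```python
import numpy as np

def double_dqn_target(reward, next_state, online_network, target_network,
                      discount_factor=0.99):
    """Double DQN target: online network selects, target network evaluates."""
    # Online network SELECTS the action with the highest predicted Q value.
    online_q = online_network.predict(next_state[np.newaxis, :], verbose=0)[0]
    best_action = np.argmax(online_q)
    # Target network EVALUATES that selected action.
    target_q = target_network.predict(next_state[np.newaxis, :], verbose=0)[0]
    return reward + discount_factor * target_q[best_action]

def vanilla_dqn_target(reward, next_state, target_network, discount_factor=0.99):
    """Vanilla DQN target: the target network both selects and evaluates."""
    target_q = target_network.predict(next_state[np.newaxis, :], verbose=0)[0]
    return reward + discount_factor * np.max(target_q)
```

Decoupling selection from evaluation is what reduces the maximization bias: the online network's argmax can be wrong, but the target network's independent estimate is less likely to overestimate that same action.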
You are doing Q-Learning:
reinforcement-learning/2-cartpole/2-double-dqn/cartpole_ddqn.py, line 111 (commit 2fe6984)
But isn't that SARSA?
Is that a mistake, or is it a valid approach? I'm new to RL...
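For reference, the usual way the distinction is drawn: Q-learning bootstraps from the max over next-state Q values regardless of which action is actually taken next, while SARSA bootstraps from the Q value of the action the agent actually takes. A minimal sketch with illustrative names (`next_q_values` is the Q vector for the next state; terminal handling omitted):

```python
import numpy as np

def q_learning_target(reward, next_q_values, discount_factor=0.99):
    # Off-policy: bootstrap from the greedy (max) next action,
    # regardless of which action the behavior policy takes next.
    return reward + discount_factor * np.max(next_q_values)

def sarsa_target(reward, next_q_values, next_action, discount_factor=0.99):
    # On-policy: bootstrap from the Q value of the action the
    # behavior policy actually chose in the next state.
    return reward + discount_factor * next_q_values[next_action]
```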