
Why are you using SARSA instead of Q-Learning? #94

Closed

laz8 opened this issue Mar 19, 2020 · 1 comment

Comments


laz8 commented Mar 19, 2020

You are doing Q-Learning:

            # get action for the current state and go one step in environment
            action = agent.get_action(state)
            next_state, reward, done, info = env.step(action)

target[i][action[i]] = reward[i] + self.discount_factor * (

But isn't that SARSA?

                a = np.argmax(target_next[i])
                target[i][action[i]] = reward[i] + self.discount_factor * (target_val[i][a])

Is that a mistake or is that a valid approach? I'm new to RL...
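
For reference, a minimal tabular sketch of the two update rules I'm comparing; the names (q, gamma, next_action) are placeholders and not taken from this repo:

    import numpy as np

    def q_learning_target(q, reward, next_state, gamma):
        # Q-learning (off-policy): bootstrap with the greedy action value,
        # i.e. the max over the next state's action values.
        return reward + gamma * np.max(q[next_state])

    def sarsa_target(q, reward, next_state, next_action, gamma):
        # SARSA (on-policy): bootstrap with the action the behaviour policy
        # actually takes in the next state.
        return reward + gamma * q[next_state, next_action]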


laz8 commented May 31, 2020

Closed; I was confused by the different versions of a DDQN.

It is explained here:

What makes this network a Double DQN?

The Bellman equation used to calculate the Q values that update the online network is:

value = reward + discount_factor * target_network.predict(next_state)[argmax(online_network.predict(next_state))]

The Bellman equation used to calculate the Q value updates in the original (vanilla) DQN[1] is:

value = reward + discount_factor * max(target_network.predict(next_state))

The difference is that, using the terminology of the field, the second equation uses the target network for both SELECTING and EVALUATING the action to take, whereas the first equation uses the online network for SELECTING the action and the target network for EVALUATING it. Selection here means choosing which action to take, and evaluation means getting the projected Q value for that action. This form of the Bellman equation is what makes this agent a Double DQN rather than a plain DQN; it was introduced in [2].

https://medium.com/@leosimmons/double-dqn-implementation-to-solve-openai-gyms-cartpole-v-0-df554cd0614d
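
As a rough sketch of those two equations in code (assuming Keras-style online_network and target_network models with a batched .predict(); the names are hypothetical, and the terminal-state case is ignored for brevity):

    import numpy as np

    def vanilla_dqn_target(reward, next_state, gamma, target_network):
        # Vanilla DQN: the target network both SELECTS and EVALUATES
        # the next action (max over its own Q estimates).
        q_next = target_network.predict(next_state[np.newaxis])[0]
        return reward + gamma * np.max(q_next)

    def double_dqn_target(reward, next_state, gamma, online_network, target_network):
        # Double DQN: the online network SELECTS the next action (argmax),
        # and the target network EVALUATES that action's Q value.
        a = np.argmax(online_network.predict(next_state[np.newaxis])[0])
        q_next = target_network.predict(next_state[np.newaxis])[0]
        return reward + gamma * q_next[a]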

The names also confused me: everything is called a target, and a lot of things were renamed, which makes the code harder to understand.

But it seems to be correct.

laz8 closed this as completed May 31, 2020