
Regarding target calculations in DuelDDQN and indices #8

Open
EXJUSTICE opened this issue Sep 27, 2020 · 0 comments
Hi Phil,

I implemented your DuelDDQN architecture for myself, and I'm curious about the following snippet from the learning function, as my question wasn't addressed in the course.

        q_pred = T.add(V_s,
                        (A_s - A_s.mean(dim=1, keepdim=True)))[indices, actions]

        q_next = T.add(V_s_, (A_s_ - A_s_.mean(dim=1, keepdim=True)))

        q_eval = T.add(V_s_eval, (A_s_eval - A_s_eval.mean(dim=1,keepdim=True)))

Why is it that only q_pred is indexed by the action tensor? Is it because it represents the actions we have just taken in the current state? Are all of these Q-value matrices of the same dimensions?
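For context, here is a minimal sketch of how I understand the shapes, with placeholder batch_size and n_actions values (not taken from the repo), showing how the [indices, actions] indexing collapses the per-action Q table down to one value per transition:

        import torch as T

        batch_size, n_actions = 32, 6  # illustrative dimensions only

        # Dueling-head outputs: V is (batch, 1), A is (batch, n_actions)
        V_s = T.randn(batch_size, 1)
        A_s = T.randn(batch_size, n_actions)

        indices = T.arange(batch_size)                     # 0 .. batch_size-1
        actions = T.randint(0, n_actions, (batch_size,))   # actions taken in the batch

        # Full Q estimate for every action: shape (batch, n_actions)
        q_full = T.add(V_s, (A_s - A_s.mean(dim=1, keepdim=True)))

        # Advanced indexing picks one Q-value per row: shape (batch,)
        q_pred = q_full[indices, actions]

        print(q_full.shape)   # torch.Size([32, 6])
        print(q_pred.shape)   # torch.Size([32])

So my reading is that q_next and q_eval stay (batch, n_actions) while q_pred is reduced to (batch,). Is that correct?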
