Open
Description
The Q-learning update formula (18.3.10) appears to apply only to non-terminal states.
If S_t is a terminal state (gold or a trap), its Q-table entries should not be updated and should keep their initial values (zeros).
The code in the _learn method of the Agent class could be revised as follows:
    if done:
        q_target = r
    else:
        q_target = r + self.gamma*np.max(q_table[next_s])
    q_table[s][a] += self.lr * (q_target - q_val)
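For illustration, here is a minimal standalone sketch of the same terminal-aware update. It is not the Agent class from the repository: the function name q_update, its default lr and gamma values, and the 3-state, 2-action table are all assumptions made up for this example, and plain Python lists replace NumPy to keep it dependency-free.

```python
def q_update(q_table, s, a, r, next_s, done, lr=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a).

    Hypothetical helper for illustration only; lr and gamma defaults
    are arbitrary assumptions, not values taken from the original code.
    """
    q_val = q_table[s][a]
    if done:
        # next_s is terminal: no future return, so do not bootstrap.
        q_target = r
    else:
        # Non-terminal: bootstrap from the best action value in next_s.
        q_target = r + gamma * max(q_table[next_s])
    q_table[s][a] += lr * (q_target - q_val)
    return q_table[s][a]


# Toy table: 3 states x 2 actions, where state 2 is assumed terminal.
q_table = [[0.0, 0.0] for _ in range(3)]
q_table[1] = [0.5, 1.0]

# Non-terminal transition 0 -> 1 with reward 0.
q_update(q_table, s=0, a=1, r=0.0, next_s=1, done=False)

# Terminal transition 1 -> 2 with reward 1; the target is just r.
q_update(q_table, s=1, a=0, r=1.0, next_s=2, done=True)

print(q_table)
```

Because an update is only ever applied to the state the agent leaves, and an episode ends once a terminal state is reached, the row for the terminal state (index 2 here) is never written and stays at its initial zeros.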