Q-learning formula #158

@Unamu7simure

The Q-learning update formula (18.3.10) seems to apply only to non-terminal states.
If S_t is a terminal state (gold or a trap), its Q-table entries should not be updated and should keep their initial values (zeros).
The code in the _learn method of the Agent class could be revised as follows:
if done:
    q_target = r
else:
    q_target = r + self.gamma*np.max(q_table[next_s])
|-->q_table[s][a] += self.lr * (q_target - q_val)
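
For illustration, here is a minimal, self-contained sketch of a tabular Q-learning step with a terminal-state check. It assumes a transition tuple (s, a, r, next_s, done) and a dictionary-backed Q table. The helper name q_learning_update and the default values of lr and gamma are hypothetical and not taken from the Agent class. The sketch applies the update on every transition, which is only one possible reading of the snippet above. Under that reading the row of a terminal state keeps its initial zeros, because a terminal state never appears as s.

```python
from collections import defaultdict

import numpy as np


def q_learning_update(q_table, transition, lr=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place (hypothetical helper)."""
    s, a, r, next_s, done = transition
    q_val = q_table[s][a]
    if done:
        # next_s is terminal: there is no future return to bootstrap from.
        q_target = r
    else:
        # Non-terminal: bootstrap from the best action value in next_s.
        q_target = r + gamma * np.max(q_table[next_s])
    q_table[s][a] += lr * (q_target - q_val)


n_actions = 4  # assumed number of actions, for illustration only
q_table = defaultdict(lambda: np.zeros(n_actions))

# Transition from state 0 into terminal state 1 with reward +1.
q_learning_update(q_table, (0, 2, 1.0, 1, True))

# Transition from state 5 into non-terminal state 0 with reward 0.
q_learning_update(q_table, (5, 1, 0.0, 0, False))
```

In this sketch the terminal state 1 is never read or written when done is True, so no entry is created for it in q_table.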
