Q-learning formula

The Q-learning update formula (18.3.10) appears to apply only to non-terminal states.
If S_t is a terminal state (gold or a trap), its Q-table entries should not be updated and should keep their initial values (zeros).
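For reference, a sketch of the standard tabular Q-learning update with the terminal case written out. Eq. 18.3.10 is not reproduced in this issue, so the notation here is an assumption: \alpha is the learning rate, \gamma the discount factor, and y_t a name introduced only for the target.

$$
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ y_t - Q(S_t, A_t) \right],
\qquad
y_t =
\begin{cases}
R_{t+1} & \text{if } S_{t+1} \text{ is terminal} \\
R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) & \text{otherwise}
\end{cases}
$$

Because no action is taken from a terminal state, its own row is never the one being updated and stays at its initial zeros.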
The code in the _learn method of the Agent class could be revised as follows:
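    if done:
        # Terminal transition: do not bootstrap from next_s
        q_target = r
    else:
        q_target = r + self.gamma * np.max(self.q_table[next_s])
    # Move the current estimate toward the target in both cases
    self.q_table[s][a] += self.lr * (q_target - q_val)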
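
The snippet above depends on the surrounding Agent class, so here is a minimal standalone sketch of the same update that can be run on its own. The function name `q_update`, the toy 3-state table, and the default `lr` and `gamma` values are hypothetical and chosen only for illustration; they are not taken from the repository.

```python
import numpy as np


def q_update(q_table, s, a, r, next_s, done, lr=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the target.

    Hypothetical helper mirroring the proposed _learn logic.
    """
    q_val = q_table[s][a]
    if done:
        # Terminal transition: the target is just the reward
        q_target = r
    else:
        # Non-terminal: bootstrap from the best action in next_s
        q_target = r + gamma * np.max(q_table[next_s])
    q_table[s][a] += lr * (q_target - q_val)
    return q_target


# Toy table: 3 states x 2 actions; state 2 plays the role of a terminal state
q_table = np.zeros((3, 2))
q_table[1] = [0.5, 2.0]

# Non-terminal step: target = 0.0 + 0.9 * 2.0
t1 = q_update(q_table, s=0, a=1, r=0.0, next_s=1, done=False)

# Terminal step: target = 1.0, the row of state 2 is never read or written
t2 = q_update(q_table, s=1, a=0, r=1.0, next_s=2, done=True)
```

In this sketch the row for the terminal state stays at zeros because it only ever appears as `next_s` with `done=True`, never as `s`.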



Q-learning formula #158
