PASSIVE-TD-AGENT

AIMA3e

function Passive-TD-Agent(percept) returns an action
inputs: percept, a percept indicating the current state s' and reward signal r'
persistent: π, a fixed policy
       U, a table of utilities, initially empty
       Ns, a table of frequencies for states, initially zero
       s, a, r, the previous state, action, and reward, initially null

if s' is new then U[s'] ← r'
if s is not null then
   increment Ns[s]
   U[s] ← U[s] + α(Ns[s])(r + γ U[s'] - U[s])
if s'.Terminal? then s, a, r ← null else s, a, r ← s', π[s'], r'
 return a


Figure ?? A passive reinforcement learning agent that learns utility estimates using temporal differences. The step-size function α(n) is chosen to ensure convergence, as described in the text.
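
The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `PassiveTDAgent`, the explicit `terminal` flag (standing in for s'.Terminal?), the dictionary-based policy, and the default step size α(n) = 1/n are all choices made here for the example. The text only requires that α(n) be chosen to ensure convergence.

```python
from collections import defaultdict


class PassiveTDAgent:
    """Sketch of a passive TD agent following a fixed policy (names are hypothetical)."""

    def __init__(self, policy, gamma=0.9, alpha=lambda n: 1.0 / n):
        self.policy = policy          # π: mapping from state to action
        self.gamma = gamma            # γ: discount factor
        self.alpha = alpha            # α(n): step size (1/n is an assumed default)
        self.U = {}                   # U: utility table, initially empty
        self.Ns = defaultdict(int)    # Ns: state visit counts, initially zero
        self.s = self.a = self.r = None  # previous state, action, reward

    def __call__(self, s1, r1, terminal):
        """Process a percept (state s1, reward r1) and return the next action."""
        # if s' is new then U[s'] <- r'
        if s1 not in self.U:
            self.U[s1] = r1
        # TD update for the previous state, if there is one
        if self.s is not None:
            s, r = self.s, self.r
            self.Ns[s] += 1
            td_error = r + self.gamma * self.U[s1] - self.U[s]
            self.U[s] += self.alpha(self.Ns[s]) * td_error
        # Reset at terminal states; otherwise remember the current percept
        if terminal:
            self.s = self.a = self.r = None
        else:
            self.s, self.a, self.r = s1, self.policy[s1], r1
        return self.a
```

To run a trial, call the agent once per step with the observed state, reward, and terminal flag; the utility estimates accumulate in `agent.U` across trials. Passing the terminal flag explicitly keeps the sketch independent of any particular environment class.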