Documentation: Question about expected return #376

aflgit · 2022-11-14T12:32:10Z

Hi,

going through the RL Intro I stumbled upon something that is not yet clear to me. On https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#the-rl-problem
the expected return is given as
.. math:: J(\pi) = \int_{\tau} P(\tau|\pi) R(\tau) = \underE{\tau\sim \pi}{R(\tau)}
where I would have expected
.. math:: J(\pi) = \int_{\tau} P(\tau|\pi) R(\tau) = \underE{\tau\sim P}{R(\tau)}

My understanding is that \tau is a random variable distributed according to P, and only the actions are drawn from \pi, as is later clearly differentiated on https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#bellman-equations

Could someone please explain to me why it says \tau\sim \pi?

Thank you very much in advance!

Alberto-Hache · 2022-11-17T17:08:24Z

Hi, @aflgit. I'll try to explain how I understand this:

P does determine the next state of the environment, given the previous state and the action taken, which is why you have P(s_{t+1} | s_t, a_t).
However, the expression you highlight is the expected return over trajectories generated by an agent that follows a given policy pi, i.e. that samples its actions from it. The shorthand \tau\sim \pi stands for sampling \tau from P(\tau|\pi), the same distribution that appears inside the integral. The fact that the trajectory also depends on other factors, such as P, is simply left implicit in the notation.
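
To make this concrete, here is a minimal sketch on a toy problem. Everything in it (the two-state MDP, the numbers in `RHO0`, `P` and `PI`, the reward, and all function names) is made up for illustration and is not taken from Spinning Up. It computes J(\pi) twice: once as the sum over all trajectories weighted by P(\tau|\pi), and once by sampling trajectories, where the start state comes from the start-state distribution, actions come from pi, and next states come from P.

```python
import itertools
import random

# Hypothetical toy MDP (illustrative numbers only): 2 states, 2 actions.
RHO0 = [0.5, 0.5]                      # start-state distribution rho_0(s)
P = {                                  # transition probabilities P[s][a][s_next]
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}
PI = {0: [0.7, 0.3], 1: [0.4, 0.6]}    # policy probabilities PI[s][a]
HORIZON = 2                            # number of actions per trajectory


def reward(s, a, s_next):
    """Hypothetical reward: 1 whenever the agent lands in state 1."""
    return 1.0 if s_next == 1 else 0.0


def trajectory_prob(states, actions):
    """P(tau|pi) = rho_0(s_0) * prod_t pi(a_t|s_t) * P(s_{t+1}|s_t, a_t)."""
    prob = RHO0[states[0]]
    for t, a in enumerate(actions):
        prob *= PI[states[t]][a] * P[states[t]][a][states[t + 1]]
    return prob


def trajectory_return(states, actions):
    """Undiscounted finite-horizon return R(tau)."""
    return sum(reward(states[t], a, states[t + 1]) for t, a in enumerate(actions))


def all_trajectories():
    """Enumerate every (states, actions) pair of length HORIZON."""
    for states in itertools.product([0, 1], repeat=HORIZON + 1):
        for actions in itertools.product([0, 1], repeat=HORIZON):
            yield states, actions


def exact_expected_return():
    """J(pi) as a sum over trajectories (discrete analogue of the integral)."""
    return sum(trajectory_prob(s, a) * trajectory_return(s, a)
               for s, a in all_trajectories())


def sample_trajectory(rng):
    """Draw tau: s_0 from rho_0, a_t from pi, s_{t+1} from P."""
    s = rng.choices([0, 1], weights=RHO0)[0]
    states, actions = [s], []
    for _ in range(HORIZON):
        a = rng.choices([0, 1], weights=PI[s])[0]
        s_next = rng.choices([0, 1], weights=P[s][a])[0]
        actions.append(a)
        states.append(s_next)
        s = s_next
    return states, actions


def monte_carlo_expected_return(n_samples, seed=0):
    """Estimate J(pi) by averaging R(tau) over sampled trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        states, actions = sample_trajectory(rng)
        total += trajectory_return(states, actions)
    return total / n_samples


if __name__ == "__main__":
    print("exact J(pi):      ", exact_expected_return())
    print("Monte Carlo J(pi):", monte_carlo_expected_return(20000))
```

Note that `sample_trajectory` uses both `PI` and `P`, so the sampled trajectories follow P(\tau|\pi). Writing \tau\sim \pi only names the part of that distribution the agent controls.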

Hope it helps.

Alberto
