Flawed Management of internal search environments in tree search planning #43
Hi @amarildolikmeta, that's absolutely right. I initially experimented a lot with deterministic MDPs, which is why I didn't really notice this issue. I am planning to merge this branch soon, and will then check that everything is in order.
Hi @eleurent, thanks for the quick response.
It should be fine now: https://github.com/eleurent/rl-agents/search?q=%22state.seed%22&unscoped_q=%22state.seed%22
@amarildolikmeta / @eleurent, could you clarify how seeding the environment object prior to each tree search iteration creates randomness? I am not sure what is actually being random. Thanks.
Depends on what you call randomness. What it achieves is that the planner does not "see the future". Without changing the seed, the planner's environments have the same seeds as the true environment, which means that what the planner is actually doing is just choosing the best possible realization, knowing what the outcome will be. If you change the seed, the realizations of the transitions inside the planning tree will be different from the ones in the "true" environment, and the planner will do what it is supposed to: optimize the values in expectation. At the same time, this still keeps the results reproducible.
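To make the point concrete, here is a minimal sketch with a toy stochastic environment. The `ToyStochasticEnv` class and its `seed()`/`step()` methods are illustrative stand-ins, not the actual rl-agents API:

```python
import copy
import random

class ToyStochasticEnv:
    """Toy stand-in for a stochastic environment (not the rl-agents API)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def seed(self, seed):
        self.rng = random.Random(seed)

    def step(self, action):
        # Stochastic transition: the outcome depends on the internal RNG.
        self.state += action + self.rng.choice([-1, 0, 1])
        return self.state

true_env = ToyStochasticEnv(seed=42)

# Flawed: the copy shares the true environment's RNG state, so the
# planner "sees the future" -- its rollout reproduces the exact
# realization the true environment will later produce.
clairvoyant_env = copy.deepcopy(true_env)
assert clairvoyant_env.step(1) == true_env.step(1)

# Fixed: re-seed the copy after duplicating it. Its transitions are now
# independent samples from the same distribution, not the true future.
true_env = ToyStochasticEnv(seed=42)
planner_env = copy.deepcopy(true_env)
planner_env.seed(123)  # any seed decoupled from the true environment's
```

With the shared RNG state, the first `assert` always holds, which is exactly the "choosing the best realization" problem described above.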
Thanks for the reply, really appreciate it!
You see, seeding the environment makes sense to me in an RL setting. During training, you want to seed the environment so that (a) the agent is not trained on the same scenario all the time, and (b) you can reproduce experiments.
However, in a tree search setting I am still missing something. I do not understand how seeding the environment object will randomize the transitions during planning (which is what we want, in order to optimize values in expectation). Transitions result from the actions of agents in the scene, so unless there is randomness in those actions, I cannot see how the transitions become stochastic. I might be missing something very obvious/silly!
@amarildolikmeta thanks for the answer!
There are stochastic environments which, when stepped from a given state, can yield different next states. Now, Monte-Carlo tree search algorithms are based on the idea that it is possible to sample random trajectories from the current state.
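A sketch of this sampling idea, using a hypothetical one-step environment rather than the rl-agents interface: averaging rewards over independently re-seeded copies estimates the expectation, instead of replaying one privileged realization.

```python
import copy
import random

class NoisyRewardEnv:
    """Hypothetical stochastic environment: reward = action + Gaussian noise."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def seed(self, s):
        self.rng = random.Random(s)

    def step(self, action):
        return action + self.rng.gauss(0.0, 1.0)

def estimate_value(env, action, n_samples=1000):
    """Monte-Carlo estimate of the expected reward of `action`."""
    total = 0.0
    for i in range(n_samples):
        rollout_env = copy.deepcopy(env)
        rollout_env.seed(i)  # seeds independent of the true environment's
        total += rollout_env.step(action)
    return total / n_samples

# Converges to the expectation (2.0 here), regardless of which single
# realization the true environment's own seed would have produced.
value = estimate_value(NoisyRewardEnv(seed=42), action=2.0)
```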
Thank you both for taking the time. This makes sense.
Some of the implemented tree search algorithms may report misleadingly high performance because of how the environments are managed inside the tree search. Specifically, to conduct the search, the environment is copied and passed to the planners, but the environment's seed is copied as well. This results in a kind of "foreseeing the future", because the planners optimize over the actual random realizations instead of in expectation. This happens in the OLOP planner and also in the deterministic planner (ODP). In the deterministic planner it is not that serious, since it is designed for deterministic transitions; but in practice, if you run this planner on a stochastic environment, the effect is that it performs amazingly well (because it can "predict" the exact future realizations).
This can easily be fixed by setting a random seed on the environments after copying them for the planners, e.g. by adding the seeding in the plan method of the planner.
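The proposed fix might be sketched as follows. This is a skeleton only: the `Planner` class, its `plan` method, and the `seed()` call are assumptions about the interface, not the exact rl-agents code.

```python
import copy
import random

class Planner:
    """Skeleton planner showing where the re-seeding belongs
    (names are illustrative, not the exact rl-agents interface)."""
    def __init__(self, env):
        self.env = env

    def plan(self, state):
        # Copy the true environment so planning does not disturb it...
        search_env = copy.deepcopy(self.env)
        # ...then give the copy its own seed, so the search samples fresh
        # realizations instead of replaying the true environment's future.
        search_env.seed(random.randrange(2**31))
        # ...run the tree search / rollouts with search_env here...
        return search_env
```

Seeding inside `plan` (rather than once at copy time) ensures every planning call breaks the RNG coupling with the true environment.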