Flawed Management of internal search environments in tree search planning #43
Hi @amarildolikmeta, that's absolutely right. I initially experimented a lot with deterministic MDPs, which is why I didn't really notice this issue. I am planning to merge this branch soon, and will then check that everything is in order.
Hi @eleurent, thanks for the quick response.
It should be fine now: https://github.com/eleurent/rl-agents/search?q=%22state.seed%22&unscoped_q=%22state.seed%22
@amarildolikmeta / @eleurent, could you clarify how seeding the environment object prior to each tree search iteration creates randomness? I am not sure what is actually being random. Thanks.
Depends on what you call randomness. What it achieves is that the planner does not "see the future". Without changing the seed, the planner's environments have the same seeds as the true environment, which means that what the planner is actually doing is just choosing the best possible realization, knowing what the outcome will be. If you change the seed, the realizations of the transitions inside the planning tree will be different from the ones in the "true" environment, and the planner will do what it is supposed to: optimize the values in expectation. At the same time, this still keeps the results reproducible.
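To make the point concrete, here is a minimal sketch with a toy stochastic environment. The `ToyStochasticEnv` class and its `seed()`/`step()` methods are illustrative stand-ins, not the actual rl-agents API:

```python
import copy
import random

class ToyStochasticEnv:
    """Toy stand-in for a stochastic environment (not the rl-agents API)."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def seed(self, seed):
        self.rng = random.Random(seed)

    def step(self, action):
        # Stochastic transition: the outcome depends on the internal RNG.
        self.state += action + self.rng.choice([-1, 0, 1])
        return self.state

true_env = ToyStochasticEnv(seed=42)

# Flawed: the copy shares the true environment's RNG state, so the
# planner "sees the future" -- its rollout reproduces the exact
# realization the true environment will later produce.
clairvoyant_env = copy.deepcopy(true_env)
assert clairvoyant_env.step(1) == true_env.step(1)

# Fixed: re-seed the copy after duplicating it. Its transitions are now
# independent samples from the same distribution, not the true future.
true_env = ToyStochasticEnv(seed=42)
planner_env = copy.deepcopy(true_env)
planner_env.seed(123)  # any seed decoupled from the true environment's
```

With the shared RNG state, the first `assert` always holds, which is exactly the "choosing the best realization" problem described above.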
Thanks for the reply, really appreciate it!
You see, seeding the environment makes sense to me in an RL setting. During training, you want to seed the environment so that (a) the agent is not trained on the same scenario all the time, and (b) you can reproduce experiments.
However, in a tree search setting I am still missing something. I do not understand how seeding the environment object will randomize the transitions during planning (which is what we want, in order to optimize values in expectation). Transitions result from the actions of agents in the scene, so unless there is randomness in those actions, I cannot see how the transitions become stochastic. I might be missing something very obvious/silly!
@amarildolikmeta thanks for the answer!
There are stochastic environments which, when stepped from a given state, can yield different next states. Now, Monte-Carlo tree search algorithms are based on the idea that it is possible to sample random trajectories from the current state.
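A sketch of this sampling idea, using a hypothetical one-step environment rather than the rl-agents interface: averaging rewards over independently re-seeded copies estimates the expectation, instead of replaying one privileged realization.

```python
import copy
import random

class NoisyRewardEnv:
    """Hypothetical stochastic environment: reward = action + Gaussian noise."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def seed(self, s):
        self.rng = random.Random(s)

    def step(self, action):
        return action + self.rng.gauss(0.0, 1.0)

def estimate_value(env, action, n_samples=1000):
    """Monte-Carlo estimate of the expected reward of `action`."""
    total = 0.0
    for i in range(n_samples):
        rollout_env = copy.deepcopy(env)
        rollout_env.seed(i)  # seeds independent of the true environment's
        total += rollout_env.step(action)
    return total / n_samples

# Converges to the expectation (2.0 here), regardless of which single
# realization the true environment's own seed would have produced.
value = estimate_value(NoisyRewardEnv(seed=42), action=2.0)
```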
Thank you both for taking the time. This makes sense.
Some of the implemented tree search algorithms may report misleadingly high performance because of how the environments are managed inside the tree search. Specifically, to conduct the search, the environment is copied and passed to the planners, but the environment's seed is copied as well. This results in a kind of "foreseeing the future", because the planners optimize over the actual random realizations instead of in expectation. This happens in the OLOP planner and also in the deterministic planner (ODP). In the deterministic planner it is not that serious, since it is designed for deterministic transitions; but in practice, if you run this planner on a stochastic environment, the effect is that it performs amazingly well (because it can "predict" the exact future realizations).
This can easily be fixed by setting a random seed on the environments after copying them for the planners, e.g. by adding the seeding in the plan method of the planner.
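The proposed fix might be sketched as follows. This is a skeleton only: the `Planner` class, its `plan` method, and the `seed()` call are assumptions about the interface, not the exact rl-agents code.

```python
import copy
import random

class Planner:
    """Skeleton planner showing where the re-seeding belongs
    (names are illustrative, not the exact rl-agents interface)."""
    def __init__(self, env):
        self.env = env

    def plan(self, state):
        # Copy the true environment so planning does not disturb it...
        search_env = copy.deepcopy(self.env)
        # ...then give the copy its own seed, so the search samples fresh
        # realizations instead of replaying the true environment's future.
        search_env.seed(random.randrange(2**31))
        # ...run the tree search / rollouts with search_env here...
        return search_env
```

Seeding inside `plan` (rather than once at copy time) ensures every planning call breaks the RNG coupling with the true environment.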