Flawed Management of internal search environments in tree search planning #43

Closed
amarildolikmeta opened this issue Jun 24, 2020 · 8 comments

@amarildolikmeta

Some of the implemented tree search algorithms may show misleadingly high performance because of how environments are managed inside the tree search. Specifically, to conduct the search, the environment is copied and passed to the planners, but the environment's seed is copied as well. This amounts to a kind of "foreseeing the future": the planners optimize over the particular random realizations instead of in expectation. This happens in the OLOP planner and also in the deterministic planner (ODP). For the deterministic planner this is less serious, since it is designed for deterministic transitions; in practice, however, if you run it on a stochastic environment, it performs amazingly well because it can "predict" the exact future realizations.
This can easily be fixed by setting a new random seed on the environments after copying them for the planners, e.g. by adding the seeding to the planner's plan method.
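A minimal Python sketch of the problem and the proposed fix follows. It assumes a hypothetical toy environment (`NoisyChainEnv`) whose noise comes from its own `random.Random` instance, and a hypothetical `sample_outcomes` helper standing in for a planner's rollouts; neither is the repository's actual code. Deep-copying the environment also copies its RNG state, so every copy replays identical noise, while reseeding each copy restores genuinely different samples.

```python
import copy
import random


class NoisyChainEnv:
    """Hypothetical toy environment with stochastic transitions.

    Each step moves the state by the action plus a noise term in {-1, 0, +1},
    drawn from an RNG that the environment owns (as seeded gym-style envs do).
    """

    def __init__(self, seed=None):
        self.state = 0
        self.rng = random.Random(seed)

    def seed(self, seed=None):
        # Replace the internal RNG so that future transitions draw fresh noise.
        self.rng = random.Random(seed)

    def step(self, action):
        self.state += action + self.rng.choice([-1, 0, 1])
        return self.state


def sample_outcomes(root_env, actions, n_copies, reseed, seed_rng):
    """Roll out `actions` on n_copies copies of root_env, as a planner would.

    Returns the set of distinct trajectories observed.
    """
    outcomes = set()
    for _ in range(n_copies):
        sim = copy.deepcopy(root_env)  # copies the state AND the RNG state
        if reseed:
            # The fix: give each copy its own seed, drawn from a planner-owned RNG
            # so that planning as a whole stays reproducible.
            sim.seed(seed_rng.randrange(2**32))
        outcomes.add(tuple(sim.step(a) for a in actions))
    return outcomes


root = NoisyChainEnv(seed=0)
plan = [1, 1, 1]
print(len(sample_outcomes(root, plan, 50, reseed=False, seed_rng=random.Random(1))))  # 1
print(len(sample_outcomes(root, plan, 50, reseed=True, seed_rng=random.Random(1))))   # several distinct trajectories
```

Drawing the per-copy seeds from a planner-owned RNG, rather than from an unseeded source, keeps whole planning runs reproducible while still decorrelating the copies from each other and from the root environment.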

@eleurent
Owner

Hi @amarildolikmeta, that's absolutely right.

I initially experimented a lot with deterministic MDPs, which is why I didn't really notice this issue.
I eventually fixed it on a development branch (see e.g. a80279b), and completely forgot to backport the fix to master.

I am planning to merge this branch soon, and will check then that everything is in order.

@amarildolikmeta
Author

Hi @eleurent, thanks for the quick response.
Perfect, then I will close the issue once the branches are merged.

@eleurent
Owner

@saArbabi
Contributor

@amarildolikmeta, @eleurent,
Could you clarify how seeding the environment object before each tree search iteration creates randomness? I am not sure what is actually random.
Thanks.

@amarildolikmeta
Author

amarildolikmeta commented Jul 13, 2020 via email

@saArbabi
Contributor

saArbabi commented Jul 14, 2020 via email

@eleurent
Owner

@amarildolikmeta thanks for the answer!
I will elaborate.

I do not understand how seeding the environment object will randomize the transitions during planning (which is what we need in order to optimize values in expectation). Transitions result from the actions of agents in the scene, so unless there is randomness in the actions, I cannot see how the transitions would become stochastic. I might be missing something very obvious/silly!

There are stochastic environments which, when stepped from a given state s with an action a, randomly transition to a next state s'. This randomness is inherent to the environment transitions (typically used to represent noise, perturbations, or unmodelled effects) and does not stem from randomness in the actions: a deterministic policy will still yield random trajectories. This randomness is controlled by a seed for reproducibility.
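To make the distinction concrete, here is a minimal Python sketch assuming a hypothetical one-dimensional environment (`NoisyWalk`, not part of the repository) whose transition adds Gaussian noise drawn from the environment's own seeded RNG. The policy is deterministic (it always pushes right), yet trajectories differ across seeds and are reproducible for a given seed.

```python
import random


class NoisyWalk:
    """Hypothetical environment: s' = s + a + noise, with noise from a seeded RNG."""

    def __init__(self, seed):
        self.state = 0.0
        self.rng = random.Random(seed)  # the randomness lives in the environment

    def step(self, action):
        self.state += action + self.rng.gauss(0.0, 1.0)
        return self.state


def policy(state):
    return 1  # deterministic: always push right, whatever the state


def trajectory(seed, horizon=5):
    env = NoisyWalk(seed)
    return [env.step(policy(env.state)) for _ in range(horizon)]


print(trajectory(0) == trajectory(0))  # True: same seed, same trajectory
print(trajectory(0) == trajectory(1))  # False: same policy, different noise
```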

Now, Monte Carlo tree search algorithms rely on the ability to sample random trajectories from the current state s (this requires a so-called generative model). In this repository, this is implemented by copying the full environment object, which contains its internal state but also its seed. Thus, when sampling trajectories from copies of the current state at the root, you always end up in the same states, as if the environment were deterministic, since the RandomState/seed is fixed.
Reseeding the copied environments changes the future outcomes of the sampled trajectories, thus reproducing the stochasticity of the dynamics during planning.
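The effect on decisions can be sketched on a hypothetical one-step problem (not taken from the repository): a "safe" action always yields reward 0.5, while a "risky" action yields 1 with probability 0.4 and 0 otherwise, so the safe action is optimal in expectation. The sketch assumes a toy `OneStepEnv` with its own `random.Random`, and a simple Monte Carlo estimator that averages rewards over deep copies of the root environment, with or without reseeding each copy.

```python
import copy
import random


class OneStepEnv:
    """Hypothetical stochastic environment with its own seeded RNG."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def seed(self, seed):
        self.rng = random.Random(seed)

    def step(self, action):
        if action == "safe":
            return 0.5
        return 1.0 if self.rng.random() < 0.4 else 0.0  # "risky": expected reward 0.4


def estimate_value(root_env, action, n_samples, reseed, seed_rng):
    """Average the reward of `action` over copies of the root environment."""
    total = 0.0
    for _ in range(n_samples):
        sim = copy.deepcopy(root_env)  # the copy inherits the root's RNG state
        if reseed:
            sim.seed(seed_rng.randrange(2**32))  # independent noise for this sample
        total += sim.step(action)
    return total / n_samples


def best_action(root_env, reseed, n_samples=2000, seed_rng=None):
    if seed_rng is None:
        seed_rng = random.Random(0)
    values = {a: estimate_value(root_env, a, n_samples, reseed, seed_rng)
              for a in ("safe", "risky")}
    return max(values, key=values.get), values


root = OneStepEnv(seed=7)
print(best_action(root, reseed=False))  # "risky" estimate is exactly 0.0 or 1.0
print(best_action(root, reseed=True))   # "risky" estimate near 0.4, so "safe" wins
```

Without reseeding, the risky action's estimate is exactly 0.0 or 1.0 regardless of the number of samples, because every copy replays the root's next draw. Whenever that draw is a success, the planner prefers the risky action: it is optimizing on one realization instead of in expectation.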

@saArbabi
Contributor

Thank you both for taking the time. This makes sense.
