need for more parameters in reset(), step(), etc. #337

osigaud · 2016-09-09T06:50:46Z

Hi,

I'm currently refactoring a more complicated environment to match gym's API and I'm running into the limits of the current API.

For reset(), I may want a deterministic reset(), which always starts from the same point, or a stochastic one (the current behavior). That would mean adding a boolean "deterministic" parameter. Or I may have a list of starting states and want to pass the index of the state from which I want to restart. Or, even more complicated, I may perform active learning and choose any state to start from (in that case, I might use an additional set_state() method rather than do it through reset()).

For step(), I may want to pass additional information (in my case, a target_size to check whether the current reaching movement was a hit and to compute the reward). I believe many environments will require these kinds of extensions, and at the moment the only option is to move away from the standard API, making any pull request difficult.

I see two possible solutions:

Add a general-purpose "setContext(dictionary)" method through which you can pass anything to your environment, providing additional information that any API method can use through dedicated attributes.
Add a (generally empty) dictionary parameter to all the standard methods (in the spirit of the "info" output of the step() method) that the programmer can use for their own purposes.

Everything coded so far would still work with either option.
I tend to believe the latter option is preferable, but of course the gym team may think differently (or find a different solution)!
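
To make the difference between the two options concrete, here is a minimal sketch. It does not use gym itself: `ToyEnv`, `set_context`, the `params` argument, and the reward rule are all hypothetical names and choices made up for illustration.

```python
import random


class ToyEnv:
    """Stand-in for a gym-style environment with a 1-D state (hypothetical)."""

    def __init__(self):
        self.context = {}  # option 1: filled through set_context()
        self.state = 0.0

    # Option 1: one general-purpose method that stores extra information.
    def set_context(self, context):
        self.context.update(context)

    # Option 2: an optional dictionary on each standard method.
    def reset(self, params=None):
        merged = {**self.context, **(params or {})}
        if merged.get("deterministic", False):
            self.state = 0.0  # always restart from the same point
        else:
            self.state = random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action, params=None):
        merged = {**self.context, **(params or {})}
        target_size = merged.get("target_size", 0.1)
        self.state += action
        hit = abs(self.state) <= target_size  # did the movement reach the target?
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}


env = ToyEnv()
env.set_context({"deterministic": True})  # option 1
obs = env.reset()
obs, reward, done, info = env.step(0.05, params={"target_size": 0.1})  # option 2
```

In both cases, code that never passes the extra information keeps working, because the defaults reproduce the old behavior.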

Looking forward to hearing your thoughts on this,
Olivier Sigaud

gdb · 2016-09-09T06:56:43Z

Off the top of my head:

For step(), can you just make your action space more complicated, and include whatever info is required?
For reset(), though mildly hacky, why not have the first .step() of your environment be passing in configuration information like this?

Both approaches fit with the existing paradigm: everything your agent does to alter the environment is encapsulated in the action. This means all existing wrappers and expectations around the semantics of various calls will remain valid. And there's a decent argument that this is the semantically correct thing to do.
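
One possible reading of these two suggestions, sketched without gym itself. `ReachEnv`, the `(command, target_size)` action layout, and the `start_state` key are assumptions made for illustration, not anything gym defines.

```python
class ReachEnv:
    """Toy reaching task in which the action carries the extra information."""

    def __init__(self):
        self.position = 0.0
        self.configured = False

    def reset(self):
        self.position = 0.0
        self.configured = False
        return self.position

    def step(self, action):
        if not self.configured:
            # First step after reset: the "action" is configuration data.
            self.position = float(action["start_state"])
            self.configured = True
            return self.position, 0.0, False, {"configured": True}
        command, target_size = action  # composite action
        self.position += command
        hit = abs(self.position - 1.0) <= target_size  # target sits at 1.0
        return self.position, (1.0 if hit else 0.0), hit, {}


env = ReachEnv()
env.reset()
env.step({"start_state": 0.5})  # configuration passed as the first action
obs, reward, done, info = env.step((0.45, 0.1))  # (command, target_size)
```

The signatures of reset() and step() stay untouched, at the cost that whatever feeds the learner has to separate the motor command from the rest of the action.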

What do you think?

osigaud · 2016-09-09T07:05:14Z

Thanks for the immediate reaction!

For step(), can you just make your action space more complicated, and include whatever info is required?

Well, to me this is ugly because from a machine learning point of view, keeping the action space as small as possible is a good idea. Thus I will have to extract the "true action" from what I send to step() before sending it to the learner...

For reset(), though mildly hacky, why not have the first .step() of your environment be passing in configuration information like this?

Even worse, because I will have to build a very specific "action" variable just for that step, that will contain for instance the state where I want to start...

If you don't like the idea of adding a general purpose parameter everywhere, I find adding the "setContext()" method less hacky than what you suggest. And for you, in terms of effort, this is just adding one method in the generic env class...

Still not convinced? ;)
Olivier

osigaud · 2016-09-09T07:16:01Z

A further thought: rather than calling it "setContext()", the method could be called "setEnvAttributes()". The idea is that there might be several varying features in your environment (is it deterministic? where is the initial state? what is the target size?...) and you may want the agent (or another experimental scheduling process) to have some control over these features...

Olivier

gdb · 2016-09-09T07:21:10Z

Another option would be to initialize a new environment each time.

(FWIW, you're welcome to add whatever methods you want to a specific environment.)

gdb · 2016-09-09T07:23:32Z

Err, sorry, early send. But the thing I was going to say is, we're definitely going to try our hardest to keep the core interface as simple as possible. We've found there's a huge amount of power in having a simple reset/step. Specific environments are welcome to grow more functionality, though they stop being automatically comparable in quite the same ways.

It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.

osigaud · 2016-09-09T07:31:46Z

Another option would be to initialize a new environment each time.

OK: the init is my "setEnvAttributes()" method... But then I need to send parameters to my init.

(FWIW, you're welcome to add whatever methods you want to a specific environment.)

Yes, but can these additional methods be called on an environment created with
env = gym.make('myEnv')?

It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.

Yes, I understand this and I agree with this general philosophy. It seems to me that adding my "setEnvAttributes()" will add a lot of power (thus unification) with nearly no loss in flexibility (all previous environments will just ignore it). But I won't insist any further ;)

Olivier

gdb · 2016-09-09T07:36:02Z

Gotcha. I almost didn't want to mention it but there's also configure, which on closer read is actually probably exactly the same as setEnvAttributes.

I hoped it would not be the answer since the intent was only to use it for things which don't change the semantics of the environment -- thus we could be sure that two envs with the same ID are always comparable.

Let me know if that looks like what you want :).

osigaud · 2016-09-09T07:47:00Z

Yes, this is exactly what I need!

Through configure(), I can tell my environment: "now, your reset is deterministic", or "now, you start from that state", "now your target is that large", etc.
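
A sketch of that usage pattern. This is not gym's own implementation of configure(); `ArmEnv` and the keyword names (`deterministic`, `start_state`, `target_size`) are hypothetical and exist only to illustrate the three cases above.

```python
import random


class ArmEnv:
    """Toy environment whose behaviour is adjusted through configure()."""

    ALLOWED = ("deterministic", "start_state", "target_size")

    def __init__(self):
        self.deterministic = False
        self.start_state = None
        self.target_size = 0.1
        self.state = 0.0

    def configure(self, **kwargs):
        # Accept only known settings so that typos fail loudly.
        for key, value in kwargs.items():
            if key not in self.ALLOWED:
                raise ValueError("unknown setting: %s" % key)
            setattr(self, key, value)

    def reset(self):
        if self.start_state is not None:
            self.state = self.start_state  # "now, you start from that state"
        elif self.deterministic:
            self.state = 0.0  # "now, your reset is deterministic"
        else:
            self.state = random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action
        hit = abs(self.state) <= self.target_size  # "now your target is that large"
        return self.state, (1.0 if hit else 0.0), hit, {}


env = ArmEnv()
env.configure(deterministic=True)
first = env.reset()
env.configure(start_state=0.3, target_size=0.5)
second = env.reset()
```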

I see the danger of using it too much, but I believe I have use cases where it is truly the solution.

Thanks!
Olivier

hholst80 · 2016-09-15T17:18:46Z

Can you explain in more detail what a semantically compatible vs. incompatible use of configure would be?

tlbtlbtlb · 2016-09-15T19:19:17Z

'Semantically compatible' means the change doesn't affect the action/observation/reward behavior, only something else, such as the way the environment renders its visualization.

Compatible: env.configure(display=':0')
Incompatible: env.configure(gravity=9.7)

Since we want to be able to make rigorous comparisons between agents on the same environment, it's important that changes to the environment's semantics be clearly distinguished.
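
One way to enforce that distinction in a specific environment, as a sketch only: `PhysicsEnv` and `NON_SEMANTIC_KEYS` are hypothetical names, and gym itself does not perform this check.

```python
class PhysicsEnv:
    """Toy environment separating display settings from task semantics."""

    # Settings that cannot affect actions, observations, or rewards.
    NON_SEMANTIC_KEYS = frozenset({"display"})

    def __init__(self, gravity=9.8):
        # Semantics are fixed at construction time.
        self.gravity = gravity
        self.display = None

    def configure(self, **kwargs):
        # Validate everything first so a bad call changes nothing.
        for key in kwargs:
            if key not in self.NON_SEMANTIC_KEYS:
                raise ValueError(
                    "%r changes the environment's semantics; "
                    "pass it to the constructor instead" % key
                )
        for key, value in kwargs.items():
            setattr(self, key, value)


env = PhysicsEnv()
env.configure(display=":0")  # compatible: only rendering changes

try:
    env.configure(gravity=9.7)  # incompatible: the dynamics would change
except ValueError as err:
    rejected = str(err)
```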

hholst80 · 2016-09-15T19:35:23Z

OK. I realise that we are abusing the API if we are, for instance, setting frame_skip in ALE via configure. How should such arguments be passed on to the environment? Is it impossible to use gym.make directly if we need to have this flexibility?

tlbtlbtlb · 2016-09-15T19:52:19Z

Define 2 separately named environments, with different kwargs to the constructor. See, for example, FrozenLake-v0 and FrozenLake8x8-v0.

For a more complex environment, see how kwargs are generated programmatically for the standard Atari environments.
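
The pattern can be sketched with a toy registry. gym's real registration call takes an `id`, an `entry_point`, and `kwargs`; everything below (`REGISTRY`, `register`, `make`, `GridEnv`, and the ID names) is a self-contained stand-in written for illustration, not gym's code.

```python
REGISTRY = {}


def register(env_id, entry_point, kwargs=None):
    """Bind an ID to a constructor plus fixed keyword arguments."""
    if env_id in REGISTRY:
        raise ValueError("already registered: %s" % env_id)
    REGISTRY[env_id] = (entry_point, dict(kwargs or {}))


def make(env_id):
    """Build the environment exactly as it was registered."""
    entry_point, kwargs = REGISTRY[env_id]
    return entry_point(**kwargs)


class GridEnv:
    """Hypothetical environment with two constructor parameters."""

    def __init__(self, size=4, frame_skip=1):
        self.size = size
        self.frame_skip = frame_skip


# Two IDs, one class: each ID pins down one set of semantics.
register("Grid-v0", GridEnv, {"size": 4})
register("Grid8x8-v0", GridEnv, {"size": 8})

# Kwargs can also be generated in a loop for a family of variants.
for skip in (2, 4):
    register("GridSkip%d-v0" % skip, GridEnv, {"frame_skip": skip})

small, large = make("Grid-v0"), make("Grid8x8-v0")
```

Because the parameters are fixed per ID, two agents evaluated on the same ID are always facing the same task.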

tlbtlbtlb closed this as completed Oct 12, 2016

RedTachyon mentioned this issue Sep 6, 2021

[Proposal] Custom arguments in step and reset methods #2399
