need for more parameters in reset(), step(), etc. #337

Closed
osigaud opened this issue Sep 9, 2016 · 12 comments

Comments

@osigaud
Contributor

osigaud commented Sep 9, 2016

Hi,

I'm currently refactoring a more complicated environment to match gym's API and I'm meeting the limits of the current API.

For reset(), I may want a deterministic reset() that always starts from the same point, or a stochastic one (the current behavior), which suggests adding a boolean "deterministic" parameter. Or I may have a list of starting states and want to pass the index of the state from which to restart. Or, even more complicated, I may perform active learning and choose any state from which to start (in that case, I might rather use an additional set_state() method than do it through reset()).

For step(), I may want to pass additional information (in my case, a target_size used to check whether the current reaching movement was a hit and to compute the reward). I believe many environments will require these kinds of extensions, and at the moment the only option is to move away from the standard API, which makes any pull request difficult.

I see two possible solutions:

  • add a general-purpose "setContext(dictionary)" method through which you can pass anything to your environment, providing additional information that any API method can then read from dedicated attributes
  • add a (generally empty) dictionary parameter to all the standard methods (in the spirit of the "info" output of the step() method) that programmers can use for their own purposes.

Everything coded so far would still work with both methods.
I tend to believe the latter option is preferable, but of course the gym team may think differently (or find a different solution)!
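
The two proposals above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ReachingEnv`, `set_context`, the `options` parameter, and the option names are hypothetical, the class does not subclass gym's `Env`, and nothing here is part of gym's API.

```python
import random


class ReachingEnv:
    """Toy 1-D environment. NOT a gym.Env; every name here is hypothetical."""

    def __init__(self):
        self.context = {}  # option 1: filled through set_context()
        self.state = 0.0

    # Option 1: one general-purpose setter that every method can read from.
    def set_context(self, context):
        self.context.update(context)

    # Option 2: an optional (usually empty) dictionary on each standard method.
    def reset(self, options=None):
        merged = {**self.context, **(options or {})}
        if merged.get("deterministic", False):
            self.state = merged.get("start_state", 0.0)
        else:
            self.state = random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action, options=None):
        merged = {**self.context, **(options or {})}
        target_size = merged.get("target_size", 0.1)
        self.state += action
        # The target is assumed to be centered on 0.
        hit = abs(self.state) <= target_size / 2
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}
```

In this sketch, per-call `options` override whatever was stored through `set_context()`, so both styles can coexist, and code that passes neither behaves like a plain reset/step environment.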

Looking forward to hearing your thoughts on this,
Olivier Sigaud

@gdb
Collaborator

gdb commented Sep 9, 2016

Off the top of my head:

  • For step(), can you just make your action space more complicated, and include whatever info is required?
  • For reset(), though mildly hacky, why not have the first .step() of your environment be passing in configuration information like this?

Both approaches fit with the existing paradigm: everything your agent does to alter the environment is encapsulated in the action. This means all existing wrappers and expectations around the semantics of various calls will remain valid. And there's a decent argument that this is the semantically correct thing to do.
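
As a rough illustration of this suggestion, the sketch below puts everything the agent controls into the action, including a configuration-only first step after reset(). The class name and the action keys (`start_state`, `move`, `target_size`) are hypothetical, and the class is a standalone toy rather than a gym environment.

```python
class ConfigurableReachingEnv:
    """Toy 1-D environment. NOT a gym.Env; every name here is hypothetical."""

    def __init__(self):
        self.state = 0.0
        self.configured = False

    def reset(self):
        self.state = 0.0
        self.configured = False
        return self.state

    def step(self, action):
        # The action is a dict, so extra information travels with it.
        if not self.configured:
            # First step after reset(): read configuration, do not move.
            self.state = action.get("start_state", 0.0)
            self.configured = True
            return self.state, 0.0, False, {}
        self.state += action["move"]
        # The target is assumed to be centered on 0.
        hit = abs(self.state) <= action["target_size"] / 2
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}
```

The signatures of reset() and step() stay unchanged here, which is the point of the suggestion; the cost is that the learner has to handle a richer action than the movement alone.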

What do you think?

@osigaud
Contributor Author

osigaud commented Sep 9, 2016

Thanks for the immediate reaction!

> For step(), can you just make your action space more complicated, and include whatever info is required?

Well, to me this is ugly because from a machine learning point of view, keeping the action space as small as possible is a good idea. Thus I will have to extract the "true action" from what I send to step() before sending it to the learner...

> For reset(), though mildly hacky, why not have the first .step() of your environment be passing in configuration information like this?

Even worse, because I will have to build a very specific "action" variable just for that step, that will contain for instance the state where I want to start...

If you don't like the idea of adding a general purpose parameter everywhere, I find adding the "setContext()" method less hacky than what you suggest. And for you, in terms of effort, this is just adding one method in the generic env class...

Still not convinced? ;)
Olivier

@osigaud
Contributor Author

osigaud commented Sep 9, 2016

A further thought: rather than calling it "setContext()", the method could be called "setEnvAttributes()". The idea is that there might be several varying features in your environment (is it deterministic? where is the initial state? what is the target size?...) and you may want the agent (or another experimental scheduling process) to have some control over these features...

Olivier

@gdb
Collaborator

gdb commented Sep 9, 2016

Another option would be to initialize a new environment each time.

(FWIW, you're welcome to add whatever methods you want to a specific environment.)

@gdb
Collaborator

gdb commented Sep 9, 2016

Err, sorry, early send. But the thing I was going to say is that we're definitely going to try our hardest to keep the core interface as simple as possible. We've found there's a huge amount of power in having a simple reset/step. Specific environments are welcome to grow more functionality, though they stop being automatically comparable in quite the same ways.

It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.

@osigaud
Contributor Author

osigaud commented Sep 9, 2016

> Another option would be to initialize a new environment each time.

OK: the init is my "setEnvAttributes()" method... But then I need to send parameters to my init.

> (FWIW, you're welcome to add whatever methods you want to a specific environment.)

Yes, but can these additional methods be called on an environment created with
env = gym.make('myEnv')?

> It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.

Yes, I understand this and I agree with this general philosophy. It seems to me that adding my "setEnvAttributes()" will add a lot of power (thus unification) with nearly no loss in flexibility (all previous environments will just ignore it). But I won't insist more ;)

Olivier

@gdb
Collaborator

gdb commented Sep 9, 2016

Gotcha. I almost didn't want to mention it, but there's also configure, which on closer read is actually probably exactly the same as setEnvAttributes.

I hoped it would not be the answer since the intent was only to use it for things which don't change the semantics of the environment -- thus we could be sure that two envs with the same ID are always comparable.

Let me know if that looks like what you want :).


@osigaud
Contributor Author

osigaud commented Sep 9, 2016

Yes, this is exactly what I need!

Through configure(), I can tell my environment: "now, your reset is deterministic", or "now, you start from that state", "now your target is that large", etc.
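
A minimal sketch of that usage, assuming a keyword-argument `configure()`: the class below is a standalone toy, not a gym environment, and the option names (`deterministic`, `start_state`, `target_size`) are hypothetical. The `configure()` signature shown is an assumption made for illustration and may not match gym's.

```python
import random


class ReachingEnv:
    """Toy 1-D environment. NOT a gym.Env; option names are hypothetical."""

    OPTIONS = ("deterministic", "start_state", "target_size")

    def __init__(self):
        self.deterministic = False
        self.start_state = 0.0
        self.target_size = 0.1
        self.state = 0.0

    def configure(self, **kwargs):
        # Accept only known option names, so typos fail loudly.
        for name, value in kwargs.items():
            if name not in self.OPTIONS:
                raise TypeError(f"unknown option: {name}")
            setattr(self, name, value)

    def reset(self):
        if self.deterministic:
            self.state = self.start_state
        else:
            self.state = random.uniform(-1.0, 1.0)
        return self.state

    def step(self, action):
        self.state += action
        # The target is assumed to be centered on 0.
        hit = abs(self.state) <= self.target_size / 2
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}
```

With this shape, reset() and step() keep their standard signatures, and the experiment script calls `env.configure(deterministic=True, start_state=0.25)` between episodes whenever it wants to change the setup.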

I see the danger of using it too much, but I believe I have use cases where it is truly the solution.

Thanks!
Olivier

@hholst80

Can you explain in more detail what a semantically compatible vs. incompatible use of configure would be?

@tlbtlbtlb
Contributor

'Semantically compatible' means it doesn't change the action/observation/reward behavior, but something else like the way it renders the visualization.

Compatible: env.configure(display=':0')
Incompatible: env.configure(gravity=9.7)

Since we want to be able to make rigorous comparisons between agents on the same environment, it's important that changes to the environment's semantics be clearly distinguished.
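
One way to make that distinction explicit in code is sketched below, purely as an assumption about how an environment author might enforce it: semantic parameters are fixed at construction, and `configure()` accepts only presentation options. The class name, the `NON_SEMANTIC_OPTIONS` set, and the error behavior are hypothetical and do not describe gym's implementation.

```python
class PendulumLikeEnv:
    """Toy class. NOT a gym.Env; every name here is hypothetical."""

    # Options that affect only presentation, never dynamics or reward.
    NON_SEMANTIC_OPTIONS = {"display"}

    def __init__(self, gravity=9.8):
        self.gravity = gravity  # semantic: fixed when the env is built
        self.display = None     # non-semantic: may change at any time

    def configure(self, **kwargs):
        semantic = sorted(set(kwargs) - self.NON_SEMANTIC_OPTIONS)
        if semantic:
            # Changing these would make two envs with one ID incomparable.
            raise ValueError(f"semantic options not allowed: {semantic}")
        for name, value in kwargs.items():
            setattr(self, name, value)
```

Under this sketch, `env.configure(display=':0')` succeeds, while `env.configure(gravity=9.7)` raises, which pushes semantic changes toward a separately constructed environment.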

@hholst80

Ok. I realise that we are abusing the API if we are, for instance, setting frame_skip in ALE via configure. How should such arguments be passed on to the environment? Is it impossible to use gym.make directly if we need this flexibility?

@tlbtlbtlb
Contributor

Define 2 separately named environments, with different kwargs to the constructor. See, for example, FrozenLake-v0 and FrozenLake8x8-v0.

For a more complex environment, see how kwargs are generated programmatically for the standard Atari environments.
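
The pattern of two separately named environments sharing one class can be sketched with a toy registry. The `register` and `make` functions below are self-contained stand-ins written for this illustration; they only mimic the idea of binding an ID to constructor kwargs and do not reproduce gym's registration code. `GridEnv` and the IDs `Grid-v0` and `Grid8x8-v0` are hypothetical.

```python
_REGISTRY = {}


def register(env_id, entry_point, kwargs=None):
    """Bind an environment ID to a class and fixed constructor kwargs."""
    if env_id in _REGISTRY:
        raise ValueError(f"already registered: {env_id}")
    _REGISTRY[env_id] = (entry_point, dict(kwargs or {}))


def make(env_id):
    """Build a fresh environment from its registered class and kwargs."""
    entry_point, kwargs = _REGISTRY[env_id]
    return entry_point(**kwargs)


class GridEnv:
    """Toy grid world whose size is a semantic constructor argument."""

    def __init__(self, map_name="4x4"):
        self.map_name = map_name
        self.size = int(map_name.split("x")[0])


# Same class, different kwargs, different IDs: results stay comparable per ID.
register("Grid-v0", GridEnv, kwargs={"map_name": "4x4"})
register("Grid8x8-v0", GridEnv, kwargs={"map_name": "8x8"})
```

Because each ID pins its kwargs, `make("Grid8x8-v0")` always yields the same semantics, and a variant such as a different frame skip would get its own ID rather than a call to configure.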
