need for more parameters in reset(), step(), etc. #337
Off the top of my head:
Both approaches fit with the existing paradigm: everything your agent does to alter the environment is encapsulated in the … What do you think?
Thanks for the immediate reaction!
Well, to me this is ugly because from a machine learning point of view, keeping the action space as small as possible is a good idea. Thus I will have to extract the "true action" from what I send to step() before sending it to the learner...
Even worse, because I will have to build a very specific "action" variable just for that step, that will contain for instance the state where I want to start... If you don't like the idea of adding a general-purpose parameter everywhere, I find adding a "setContext()" method less hacky than what you suggest. And for you, in terms of effort, this is just adding one method in the generic env class... Still not convinced? ;)
A further thought: rather than calling it "setContext()", the method could be called "setEnvAttributes()". The idea is that there might be several varying features in your environment (is it deterministic? where is the initial state? what is the target size?...) and you may want the agent (or another experimental scheduling process) to have some control over these features... Olivier
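A minimal sketch of the "setEnvAttributes()" idea described above. None of this is part of the Gym API: the environment class, the attribute names, and the method itself are all hypothetical, and the reset logic is a stand-in.

```python
# Hypothetical sketch of a set_env_attributes() method: an experiment
# script can vary environment features between episodes without touching
# the agent's action space. ReachingEnv and its attributes are invented
# for illustration; they are not part of Gym.
class ReachingEnv:
    def __init__(self):
        self.deterministic_reset = False
        self.target_size = 0.05
        self.state = 0.0

    def set_env_attributes(self, **attributes):
        # Only attributes the environment already declares are accepted,
        # so a typo fails loudly instead of silently doing nothing.
        for name, value in attributes.items():
            if not hasattr(self, name):
                raise AttributeError("unknown attribute: %s" % name)
            setattr(self, name, value)

    def reset(self):
        # Stand-in logic: a fixed start when deterministic, an arbitrary
        # other value in place of real stochastic sampling.
        self.state = 0.0 if self.deterministic_reset else 0.1
        return self.state

env = ReachingEnv()
env.set_env_attributes(deterministic_reset=True, target_size=0.1)
env.reset()  # now always returns the same start state
```

Environments that do not implement the method would simply not have it, so existing code would be unaffected.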
Another option would be to initialize a new environment each time. (FWIW, you're welcome to add whatever methods you want to a specific environment.)
Err, sorry, early send. But the thing I was going to say is: we're definitely going to try our hardest to keep the core interface as simple as possible. We've found there's a huge amount of power in having simple reset/step. Specific environments are welcome to grow more functionality, though they stop being automatically comparable in quite the same ways. It's definitely a tradeoff, and we ultimately need to strike a balance between being flexible and being unified.
OK: the init is my "setEnvAttributes()" method... But then I need to send parameters to my init.
Yes, but can these additional methods be called on an environment created with …
Yes, I understand this and I agree with this general philosophy. It seems to me that adding my "setEnvAttributes()" will add a lot of power (thus unification) with nearly no loss in flexibility (all previous environments will just ignore it). But I won't insist more ;) Olivier
Gotcha. I almost didn't want to mention it, but there's also configure. I hoped it would not be the answer, since the intent was only to use it for … Let me know if that looks like what you want :).
Yes, this is exactly what I need! Through configure(), I can tell my environment: "now, your reset is deterministic", or "now, you start from that state", or "now your target is that large", etc. I see the danger of using it too much, but I believe I have use cases where it is truly the solution. Thanks!
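A sketch of how configure() could carry the settings described in this comment. Early Gym versions did expose an Env.configure() method (it was later removed); the environment class and the specific keyword arguments below are hypothetical.

```python
# Hypothetical environment using a configure() method in the style of
# early Gym. The class name and keyword arguments are illustrative only.
class ReachingEnv:
    def __init__(self):
        self.deterministic_reset = False
        self.start_state = None
        self.target_size = 0.05

    def configure(self, deterministic_reset=None, start_state=None,
                  target_size=None):
        # Each setting is optional; unspecified ones keep their value.
        if deterministic_reset is not None:
            self.deterministic_reset = deterministic_reset
        if start_state is not None:
            self.start_state = start_state
        if target_size is not None:
            self.target_size = target_size

env = ReachingEnv()
# "now, your reset is deterministic", "now, you start from that state",
# "now your target is that large":
env.configure(deterministic_reset=True, start_state=3, target_size=0.1)
```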
Can you explain more what a semantically compatible vs. incompatible use of configure would be?
'Semantically compatible' means it doesn't change the action/observation/reward behavior, only something else, such as the way it renders the visualization. Since we want to be able to make rigorous comparisons between agents on the same environment, it's important that changes to the environment's semantics be clearly distinguished.
OK. I realise that we are abusing the API if we are, for instance, setting frame_skip in ALE via configure. How should such arguments to the environment be passed on to it? Is it impossible to use gym.make directly if we need this flexibility?
Define two separately named environments, with different … For a more complex environment, see how kwargs are generated programmatically for the standard Atari environments.
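The two-environments approach can be sketched with Gym's registry (API as of this thread's era). The module path, class name, and kwargs below are hypothetical; only register() and its id/entry_point/kwargs parameters are Gym's.

```python
# Register the same (hypothetical) environment class twice under
# different ids, with different constructor kwargs.
from gym.envs.registration import register

register(
    id='Reaching-v0',
    entry_point='my_package.envs:MyReachingEnv',   # hypothetical module/class
    kwargs={'deterministic_reset': False, 'target_size': 0.05},
)
register(
    id='ReachingDeterministic-v0',
    entry_point='my_package.envs:MyReachingEnv',
    kwargs={'deterministic_reset': True, 'target_size': 0.05},
)

# gym.make('ReachingDeterministic-v0') now builds the same class with
# different constructor arguments, and the two ids stay comparable
# within themselves.
```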
Hi,
I'm currently refactoring a more complicated environment to match gym's API, and I'm running into the limits of the current API.
For reset(), I may want a deterministic reset(), which always starts from the same point, or a stochastic one (the current behavior), thus adding a boolean "deterministic" parameter. Or I may have a list of starting states and want to pass the index of the state from which I want to restart. Or, even more complicated, I may perform active learning and choose any state from which to start (in that case, I might rather use an additional set_state() method than do it through reset()).
For step(), I may want to pass additional information (in my case, a target_size to check whether the current reaching movement was a hit and compute the reward). I believe many environments will require these kinds of extensions, and at the moment the only option is to move away from the standard API, making any pull request difficult.
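The extensions described above can be sketched as a small environment. Everything here is hypothetical (the class, the states, the reward logic); the point is only where the extra parameters would live in the reset/step/set_state signatures.

```python
# Hypothetical environment illustrating the extended signatures:
# reset() with deterministic/start-index options, set_state() for active
# learning, and step() taking a target_size used in the reward.
import random

class ReachingEnv:
    def __init__(self):
        self.start_states = [0.0, 1.0, 2.0]  # illustrative start states
        self.state = 0.0

    def reset(self, deterministic=False, start_index=None):
        if deterministic:
            self.state = self.start_states[0]      # always the same point
        elif start_index is not None:
            self.state = self.start_states[start_index]  # chosen restart
        else:
            self.state = random.choice(self.start_states)  # stochastic
        return self.state

    def set_state(self, state):
        # Active learning: jump to an arbitrary state before an episode.
        self.state = state

    def step(self, action, target_size=0.05):
        # target_size decides whether the movement counts as a hit.
        self.state += action
        hit = abs(self.state) <= target_size
        reward = 1.0 if hit else 0.0
        return self.state, reward, hit, {}

env = ReachingEnv()
env.reset(deterministic=True)
obs, reward, done, info = env.step(-0.02, target_size=0.1)
```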
I see two possible solutions:
Everything coded so far would still work with both methods.
I tend to believe the latter option is preferable, but of course the gym team may think differently (or find a different solution)!
Looking forward to reading your thoughts on this,
Olivier Sigaud