* Make environments seedable
* Fix monitor bugs
- Set monitor_id before setting the infix. This was a bug that would yield incorrect results with multiple monitors.
- Remove extra pid from stats recorder filename. This should be purely cosmetic.
* Start uploading seeds in episode_batch
* Fix _bigint_from_bytes for python3
* Set seed explicitly in random_agent
* Pass through seed argument
* Also pass through random state to spaces
* Pass random state into the observation/action spaces
* Make all _seed methods return the list of used seeds (see the sketch after this list)
* Switch over to np.random where possible
* Start hashing seeds, and also seed doom engine
* Fixup seeding determinism in many cases
* Seed before loading the ROM
* Make seeding more Python3 friendly
* Make the MuJoCo skipping a bit more forgiving
* Remove debugging PDB calls
* Make setInt argument into raw bytes
* Validate and upload seeds
* Skip box2d
* Make seeds smaller, and change representation of seeds in upload
* Handle long seeds
* Fix RandomAgent example to be deterministic
* Handle integer types correctly in Python2 and Python3
* Try caching pip
* Try adding swap
* Add df and free calls
* Bump swap
* Bump swap size
* Try setting overcommit
* Try other sysctls
* Try fixing overcommit
* Try just setting overcommit_memory=1
* Add explanatory comment
* Add what's new section to readme
* BUG: Mark ElevatorAction-ram-v0 as non-deterministic for now
* Document seed
* Move nondeterministic check into spec
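The seeding pattern these commits converge on can be sketched as follows. This is a minimal sketch, assuming the `gym.utils.seeding` helper module these commits introduce; exact signatures may differ:

```python
import numpy as np

from gym.utils import seeding


class ExampleEnv(object):
    def _seed(self, seed=None):
        # seeding.np_random hashes the given seed (or draws one from an
        # OS randomness source when seed is None) and returns both a
        # numpy RandomState and the seed that was actually used.
        self.np_random, seed = seeding.np_random(seed)
        # Every _seed returns the list of used seeds so the monitor can
        # record and upload them.
        return [seed]
```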
@@ -251,3 +251,10 @@ We are using `nose2 <https://github.com/nose-devs/nose2>`_ for tests. You can ru
nose2
You can also run tests in a specific directory by using the ``-s`` option, or by passing in the specific name of the test. See the `nose2 docs <http://nose2.readthedocs.org/en/latest/usage.html#naming-tests>`_ for more details.
What's new
----------
- 2016-05-28: For controlled reproducibility, envs now support seeding
(cf #91 and #135). The monitor records which seeds are used. We will
soon add seed information to the display on the scoreboard.
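A minimal usage sketch of the feature this entry describes, assuming a registered env id like ``CartPole-v0`` (illustrative here) and the public ``seed()``/``reset()`` calls named above:

```python
import gym

env = gym.make('CartPole-v0')
seeds = env.seed(0)        # returns the list of seeds actually used
observation = env.reset()  # initial state sampling is now deterministic
```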
"""Seeds the 'random' and 'numpy.random' generators. By default,
Python seeds these with the system time. Call this if you are
using multiple processes.
Notes:
SECURITY SENSITIVE: a bug here would allow people to generate fake results. Please let us know if you find one :).
Args:
a (Optional[int, str]): None or no argument seeds from an operating-system-specific randomness source. If an int or str is passed, then all of its bits are used.
"""
# Adapted from https://svn.python.org/projects/python/tags/r32/Lib/random.py
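Judging from the commit messages above ("Start hashing seeds", "Fix _bigint_from_bytes for python3", "Handle integer types correctly in Python2 and Python3"), the seed-handling helpers have roughly the following shape. A sketch, not the exact implementation:

```python
import hashlib
import struct


def _bigint_from_bytes(bt):
    # Fold raw bytes into one (possibly very long) integer, in a way
    # that works on both Python 2 and Python 3.
    sizeof_int = 4
    bt += b'\0' * (-len(bt) % sizeof_int)  # pad to a whole number of words
    int_count = len(bt) // sizeof_int
    accum = 0
    for i, val in enumerate(struct.unpack("{}I".format(int_count), bt)):
        accum += 2 ** (sizeof_int * 8 * i) * val
    return accum


def hash_seed(seed):
    # Hash the seed so that similar inputs (e.g. seed and seed+1) do not
    # produce correlated streams in the downstream generators.
    digest = hashlib.sha512(str(seed).encode('utf8')).digest()
    return _bigint_from_bytes(digest[:8])
```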
It's non-intuitive to me that action/observation spaces would require a random seed initializer. Looking closer, the reason it's required is sample(), which IMO is an API for agents rather than part of envs.
What do you think about making sample() take a seed / numpy.RandomState directly (sketched below)? It doesn't solve the problem of allowing fully deterministic reproductions (because we don't have a way to record the seeds used for other sources of agent randomness), but it does a better job of separating agent/env randomness, and it still allows fully deterministic environments (in the sense that if you supply the same sequence of actions, you'll get the same trajectory).
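For concreteness, a hypothetical version of that proposal — the Discrete class and np_random parameter below are illustrative, not the current API:

```python
import numpy as np


class Discrete(object):
    def __init__(self, n):
        self.n = n

    def sample(self, np_random=None):
        # Agent randomness comes from the caller's RandomState, so the
        # env's seeded stream is never consumed by sampling actions.
        rng = np_random if np_random is not None else np.random
        return rng.randint(self.n)
```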
I can definitely see the argument. I leaned towards drawing the boundaries at Gym code vs user code since randomness is very hard to get right, and it seems worth us going as far out of our way to help the user as possible. (And also to be deterministic if the user has no randomness in their code.)
However, bringing the action/observation spaces into the fold definitely complicated the code, and as you say, it might be cleaner to think about env code vs agent code.
Another (half-baked) idea: add a gym.sample_action(env) which takes care of initializing and passing a seed to action_space.sample() as well as recording the seed in the monitor
edit: I'm starting to be more convinced that separating env/agent randomness is the right way to go -- with the current approach, even if you initialize the env with the same seed and supply the same actions, you will get different environment observations depending on how many calls to action_space.sample() you make.
What use case do you have in mind? The only one I really know is having a fully reproducible run. Would you ever expect the user would make varied calls to action_space.sample() but still make the exact same sequence of actions? Maybe for some model-based exploration?
One use case is debugging. I'm writing an agent and I see it make a bad action in a particular state, so I'd like to force the env back into that state so I can tweak the algorithm to get better performance. I record the seed and actions and replay them. As I'm tweaking I add some extra calls to action_space.sample(); suddenly I can't get back to the state I'm debugging, even though it seems like the initialization and state changes are all the same. (Calling action_space.sample() effectively re-seeds the environment under the hood, because it draws from the same RNG the env uses.)
It's great that now the initial state sampling is deterministic. But it seems like there's a fair amount of clutter in the code (setting self.observation_space / self.action_space in _seed) for a very uncommon use case -- needing deterministic numbers to come out of action_space.sample(). How about we just use a global RandomState object, say, spaces.randomstate? (It's a step up from using the global numpy RandomState, which is what we were doing before.)
That would neatly also address Jie's point. One downside is it means not correctly supporting multiple monitored envs in a single process.
I could see another option, which would be to just reseed the RandomState in place rather than create a new one from scratch each time (sketched below). I suspect this would fix a lot of the code complexity, since _seed() would not have to recreate any objects, and we could move most of the code back to where it was.
I think making the final choice probably depends on the use cases. I don't have a great sense of when one would use action_space.sample() -- given you can't control the distribution, how often does it end up being useful in practice? Would adding hoops like passing your own PRNG or writing your own sampler if you want determinism be annoying?
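A sketch of that in-place reseeding option, assuming a np_random attribute shared with the spaces (names illustrative):

```python
import numpy as np


class ExampleEnv(object):
    def __init__(self):
        # Created once; the spaces can safely hold a reference to it.
        self.np_random = np.random.RandomState()

    def _seed(self, seed=None):
        # Reseed the existing RandomState in place instead of building a
        # new one, so _seed() never has to recreate the spaces.
        self.np_random.seed(seed)
        return [seed]
```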
Using a global RNG doesn't rule out determinism. My TRPO and CEM implementations were fully deterministic, as long as you call np.random.seed, and as long as the environment isn't using some other source of randomness.
Using action_space.sample() isn't that common a use case, but I can imagine using it for epsilon-greedy exploration (a sketch follows). But it's rare that someone would both use this feature and need determinism.
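A sketch of that epsilon-greedy use, with agent randomness drawn from the agent's own seeded generator (greedy_action and the rng parameter are illustrative):

```python
import numpy as np


def epsilon_greedy(env, greedy_action, rng, epsilon=0.1):
    # With probability epsilon explore via the space's sampler;
    # otherwise take the agent's greedy action.
    if rng.uniform() < epsilon:
        return env.action_space.sample()
    return greedy_action
```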