Commit
* Make environments seedable
* Fix monitor bugs
  - Set monitor_id before setting the infix. This was a bug that would yield incorrect results with multiple monitors.
  - Remove extra pid from stats recorder filename. This should be purely cosmetic.
* Start uploading seeds in episode_batch
* Fix _bigint_from_bytes for python3
* Set seed explicitly in random_agent
* Pass through seed argument
* Also pass through random state to spaces
* Pass random state into the observation/action spaces
* Make all _seed methods return the list of used seeds
* Switch over to np.random where possible
* Start hashing seeds, and also seed doom engine
* Fixup seeding determinism in many cases
* Seed before loading the ROM
* Make seeding more Python3 friendly
* Make the MuJoCo skipping a bit more forgiving
* Remove debugging PDB calls
* Make setInt argument into raw bytes
* Validate and upload seeds
* Skip box2d
* Make seeds smaller, and change representation of seeds in upload
* Handle long seeds
* Fix RandomAgent example to be deterministic
* Handle integer types correctly in Python2 and Python3
* Try caching pip
* Try adding swap
* Add df and free calls
* Bump swap
* Bump swap size
* Try setting overcommit
* Try other sysctls
* Try fixing overcommit
* Try just setting overcommit_memory=1
* Add explanatory comment
* Add what's new section to readme
* BUG: Mark ElevatorAction-ram-v0 as non-deterministic for now
* Document seed
* Move nondeterministic check into spec
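The core of the change is a per-environment `_seed` hook built on `gym.utils.seeding`, which the diff below applies to `AlgorithmicEnv`. A minimal sketch of that contract, using a hypothetical `MyEnv` purely for illustration (the space constructors accepting `np_random` mirror the diff; the rest is assumed):

```python
from gym import Env
from gym.spaces import Discrete
from gym.utils import seeding

class MyEnv(Env):  # hypothetical environment, for illustration only
    def __init__(self):
        self._seed()

    def _seed(self, seed=None):
        # seeding.np_random returns a numpy RandomState plus the seed it actually used
        self.np_random, seed = seeding.np_random(seed)
        # the spaces share the env's RandomState, so their sample() calls are reproducible too
        self.action_space = Discrete(2, np_random=self.np_random)
        self.observation_space = Discrete(5, np_random=self.np_random)
        return [seed]  # every _seed returns the list of seeds it used
```

Seeding the spaces from the same RandomState is what lets `action_space.sample()` participate in deterministic replays instead of drawing from the global RNG.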
@@ -1,8 +1,7 @@
 from gym import Env
 from gym.spaces import Discrete, Tuple
-from gym.utils import colorize
+from gym.utils import colorize, seeding
 import numpy as np
-import random
 from six import StringIO
 import sys
 import math
@@ -17,6 +16,7 @@ class AlgorithmicEnv(Env):
 
     def __init__(self, inp_dim=1, base=10, chars=False):
         global hash_base
+
         hash_base = 50 ** np.arange(inp_dim)
         self.base = base
         self.last = 10
@@ -27,10 +27,17 @@ def __init__(self, inp_dim=1, base=10, chars=False):
         self.inp_dim = inp_dim
         AlgorithmicEnv.current_length = 2
         tape_control = []
-        self.action_space = Tuple(([Discrete(2 * inp_dim), Discrete(2), Discrete(self.base)]))
-        self.observation_space = Discrete(self.base + 1)
+        self._seed()
         self.reset()
 
+    def _seed(self, seed=None):
+        self.np_random, seed = seeding.np_random(seed)
+        self.action_space = Tuple(([Discrete(2 * self.inp_dim, np_random=self.np_random), Discrete(2, np_random=self.np_random), Discrete(self.base, np_random=self.np_random)]))
+        self.observation_space = Discrete(self.base + 1, np_random=self.np_random)
+        return [seed]
+
     def _get_obs(self, pos=None):
         if pos is None:
             pos = self.x
@@ -198,6 +205,6 @@ def _reset(self):
         AlgorithmicEnv.sum_rewards = []
         self.sum_reward = 0.0
         self.time = 0
-        self.total_len = random.randrange(3) + AlgorithmicEnv.current_length
+        self.total_len = self.np_random.randint(3) + AlgorithmicEnv.current_length
         self.set_data()
         return self._get_obs()
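For a caller, the payoff is reproducible rollouts. A rough sketch, assuming the algorithmic `Copy-v0` registration and driving the `_seed` hook from the diff directly (the public seeding entry point isn't shown in this diff):

```python
import gym

def rollout(seed, n_steps=10):
    env = gym.make('Copy-v0')
    env._seed(seed)  # hook added in this commit; returns the list of seeds used
    obs = env.reset()
    history = [obs]
    for _ in range(n_steps):
        action = env.action_space.sample()  # drawn from env.np_random now, not the global RNG
        obs, reward, done, info = env.step(action)
        history.append(obs)
        if done:
            break
    return history

# Identical seeds should now yield identical trajectories.
assert rollout(0) == rollout(0)
```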
It's non-intuitive to me that action/observation spaces would require a random seed initializer. Looking closer, the reason it's required is sample(), which IMO is an API for agents rather than part of envs.

What do you think about making sample() take a seed / numpy.RandomState directly? It doesn't solve the problem of allowing fully deterministic reproductions (because we don't have a way to record seeds used for other sources of agent randomness), but it does a better job of separating agent / env randomness, and it still allows fully deterministic environments (in the sense that if you supply the same sequence of actions you'll get the same trajectory).
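To make that suggestion concrete, one hypothetical shape of a space whose sample() takes the random source per call, rather than holding an np_random passed in at construction (a sketch of the proposal, not the implemented gym API):

```python
import numpy as np

class Discrete(object):
    """Sketch: sampling randomness is injected by the caller, not stored on the space."""
    def __init__(self, n):
        self.n = n

    def sample(self, np_random=None):
        if np_random is None:
            np_random = np.random  # fall back to the global RNG
        return np_random.randint(self.n)

# The agent owns its RandomState; the env stays deterministic given the actions it receives.
agent_rng = np.random.RandomState(0)
space = Discrete(4)
action = space.sample(agent_rng)
```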