Spaces class support 'sample without replacement' method? #497

Closed
rockingdingo opened this issue Feb 13, 2017 · 4 comments

@rockingdingo

Hi OpenAI team,

I am quite new to the gym package and want to know whether it would be reasonable to add methods to the Space-related classes to support a 'sample without replacement' situation.

Right now, spaces classes like Discrete/Box assume the action_space is fixed and always valid, and the sample() method randomly chooses one action from the [0, n) action space (sampling with replacement), so the same action can be sampled many times during one episode. This fits games like the Atari ones, whose action spaces don't change.

'Sample with replacement'

import gym
from gym.spaces import prng

class Discrete(gym.Space):
    def __init__(self, n):
        self.n = n

    def sample(self):
        # uniform over {0, ..., n-1}; nothing is ever excluded
        return prng.np_random.randint(self.n)

However, there are many other games in which the set of valid actions keeps changing.
In Go, for example, you can't play on positions where there are already stones on the board; such a move is illegal and ends the episode too early. The sample() method should only sample from the remaining valid actions. In poker, likewise, the valid action space is limited to your hand cards and keeps shrinking.

Do you think there is a need to add a method to the Space class that keeps track of the remaining valid actions, so that the env class can sample only from those?

I am working on a Gomoku (five-in-a-row on a Go board) environment. For now I use a workaround: add a remove() method to the Space, so that whenever an action is taken it can be eliminated from the valid action space, and sample() only samples from the remaining valid ones.

Something like:
'Sample without replacement'

from gym import spaces
from gym.spaces import prng

class DiscreteWrapper(spaces.Discrete):
    def __init__(self, n):
        self.n = n
        self.valid_spaces = list(range(n))

    def sample(self):
        # only sample from the remaining valid_spaces
        i = prng.np_random.randint(len(self.valid_spaces))
        return self.valid_spaces[i]

    def remove(self, s):
        '''Remove action s from the valid spaces.'''
        if s in self.valid_spaces:
            self.valid_spaces.remove(s)
        else:
            print("space %d is not in valid spaces" % s)
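A minimal usage sketch (the 9x9 board size and the call sites are just illustrative, not a gym API):

space = DiscreteWrapper(9 * 9)  # e.g. one action per cell of a 9x9 Gomoku board

# whenever a move is played (by either player), eliminate it from the space
action = space.sample()
space.remove(action)              # this position is now occupied
opponent_action = space.sample()  # cannot collide with `action`
space.remove(opponent_action)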
@tlbtlbtlb
Contributor

I don't think it belongs in the action spaces, which are intended to remain simple. Logic like keeping track of available moves belongs in the agent. If you want the agent to take random samples while eliminating illegal moves, rejection sampling is efficient in all but the most pathological situations:

    # keep drawing until a legal action comes up
    while True:
        action = space.sample()
        if is_valid_action(action):  # env-specific legality check
            break
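For concreteness, one way is_valid_action might look for a Gomoku-style board; the occupied-set bookkeeping here is an assumption kept by the agent's own loop, not part of gym:

    occupied = set()  # positions already played, updated by the agent

    def is_valid_action(action):
        # a move is legal iff that cell is still empty (hypothetical rule)
        return action not in occupied

    # after env.step(action) succeeds:
    # occupied.add(action)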

@rockingdingo
Author

Hi @tlbtlbtlb,
Thank you so much for your advice. Yes, rejection sampling works well for training the agent. A further question: what if there is an opponent inside the env, like the 'white' player in the Go game, and that opponent's random sample() can land on illegal actions, so the game always finishes earlier than normal and is thus incomplete?

'Go' environment:
I tried to reproduce the top algorithm on the evaluations website for the Go game, whose reward is always 1. It seems the black 'X' policy plays the same fixed positions in every episode. The trick is that once the opponent 'white' randomly sample()s a move from the whole action space and that position is already taken, the move is illegal and the env always raises the 'lose' status. That's how the game always ends with reward 1, while a real-world Go game never ends like this. The agent can't truly learn from such incomplete games, right?

Any ideas?

Top Evaluations on website:
https://gym.openai.com/evaluations/eval_JIYm7FoWQlu1s1KIoijdAQ

@rockingdingo
Author

By the way, there are only 3 games in the 'board_game' category now, and I just finished another board game, 'Gomoku' (five-in-a-row) on the Go board. Would it be reasonable to contribute new environments to enrich this category?

I followed the guidance, and 'gym_gomoku' is already working. Repo:
https://github.com/rockingdingo/gym-gomoku

Thank you.

@tlbtlbtlb
Contributor

tlbtlbtlb commented Feb 17, 2017

env = gym.make('Go9x9-v0') returns a Go env with illegal_move_mode='lose'. A good RL agent will learn not to make illegal moves the same way it learns not to make other bad moves: by associating them with losing.

If you want a version of the env that reports illegal moves instead of treating them as a loss, you can call:

import pachi_py, gym.envs.board_game.go
...
env = gym.envs.board_game.go.GoEnv(player_color='black', opponent='pachi:uct:_2400', observation_type='image3c', illegal_move_mode='raise', board_size=9)

Calling .step() with an illegal move will then raise a pachi_py.IllegalMove exception, which you can catch with a try/except block:

while True:
    a = env.action_space.sample()
    try:
        observation, reward, done, info = env.step(a)
    except pachi_py.IllegalMove:
        # resample and try again
        continue
    break
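Putting the two snippets together, a minimal end-to-end sketch (assuming the early-2017 gym that still ships the board_game envs and pachi_py):

import pachi_py
import gym.envs.board_game.go

env = gym.envs.board_game.go.GoEnv(player_color='black', opponent='pachi:uct:_2400',
                                   observation_type='image3c', illegal_move_mode='raise',
                                   board_size=9)
observation = env.reset()
done = False
while not done:
    # rejection-sample until a legal move is accepted by the env
    while True:
        a = env.action_space.sample()
        try:
            observation, reward, done, info = env.step(a)
        except pachi_py.IllegalMove:
            continue
        break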
