Spaces class support 'sample without replacement' method? #497
Comments
I don't think it belongs in the action spaces, which are intended to remain simple. Logic like keeping track of available moves belongs in the agent. If you want the agent to take random samples while eliminating illegal moves, rejection sampling is efficient in all but the most pathological situations.
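For readers following along, rejection sampling here could look roughly like the sketch below. This is an illustration only, not the snippet originally posted in this comment: `sample_legal_action`, `is_legal`, and `max_tries` are hypothetical names, and the standard-library `random` module stands in for whatever sampler the space uses.

```python
import random

def sample_legal_action(n_actions, is_legal, rng=random, max_tries=10_000):
    """Draw uniformly from [0, n_actions) and retry until the draw is legal.

    `is_legal` is a hypothetical callback supplied by the agent; it is not
    part of any gym API.
    """
    for _ in range(max_tries):
        action = rng.randrange(n_actions)
        if is_legal(action):
            return action
    # Reached only if (almost) no legal action exists, e.g. a full board.
    raise RuntimeError("no legal action found within max_tries")

# Example: a 3x3 board flattened to 9 cells, with cells 0, 4 and 8 occupied.
occupied = {0, 4, 8}
action = sample_legal_action(9, lambda a: a not in occupied)
```

The expected number of draws is the total number of actions divided by the number of legal ones, so the loop only becomes slow when very few legal moves remain.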
Hi @tlbtlbtlb ‘Go’ Environment: Any ideas? Top Evaluations on website:
Btw, there are only 3 games in the ‘board_game’ category now, and I just finished another board game, ‘Gomoku’ (five-in-a-row), played on the Go board. Would it be reasonable to contribute new environments to enrich this category? I followed the guidance, and ‘gym_gomoku’ is already working. Repo: Thank you.
If you want to use a version of the env that reports illegal moves rather than losing, you can call
Calling
Hi OpenAI team,
I am quite new to the gym package and want to know whether it is reasonable to add methods to the Space-related classes to support a ‘sample without replacement’ situation.
Right now, space classes like Discrete/Box assume the action_space is fixed and valid all the time, and the sample() method randomly chooses one action from the [0, n) action_space (sampling with replacement). You can sample the same action many times during one episode. These games include ‘atari’ and others whose action spaces don’t change.
'Sample with replacement'
However, there are a lot of other games in which the valid action space keeps changing.
For example, in ‘Go’ you can’t take an action on a position where there is already a stone on the board; otherwise the move is illegal and the episode ends too early. In such games the sample() method should only sample from the remaining valid actions. In other games, like ‘poker’, the valid action space is limited to the cards in your hand and keeps getting smaller.
Do you think there is a need to add a method to the Space class to keep track of the remaining space, so that the env class can know and only sample from the valid actions?
I am working on a Gomoku (five-in-a-row on a Go board) environment. For now I use a workaround: I add a remove() method to the Space, so that whenever an action is taken, it can be eliminated from the valid_action_space, and sample() will only sample from the valid ones.
something like:
'Sample without replacement'
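A minimal sketch of the remove() workaround described above is shown here. It is a reconstruction for illustration, not the code from the original post: the class name `DiscreteWithRemoval` and its attributes are hypothetical, and it is written as a standalone class rather than a subclass of gym's Discrete so that it runs without gym installed.

```python
import random

class DiscreteWithRemoval:
    """Hypothetical Discrete-like space that tracks the remaining valid actions."""

    def __init__(self, n, seed=None):
        self.n = n
        self._valid = list(range(n))       # actions not yet taken
        self._rng = random.Random(seed)

    def sample(self):
        # Sample only from the actions that are still valid.
        if not self._valid:
            raise ValueError("no valid actions left")
        return self._rng.choice(self._valid)

    def remove(self, action):
        # Raises ValueError if the action was already removed.
        self._valid.remove(action)

    def contains(self, action):
        return action in self._valid

# Play out a 3x3 board: every cell is sampled exactly once.
space = DiscreteWithRemoval(9, seed=0)
taken = []
for _ in range(9):
    a = space.sample()
    space.remove(a)
    taken.append(a)
```

After the loop, `taken` is a permutation of 0..8 and a further `space.sample()` raises `ValueError`, which an environment could treat as the end of the episode.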