Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

Closed
lanctot opened this issue Jun 22, 2020 · 17 comments
Closed

~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd #251

lanctot opened this issue Jun 22, 2020 · 17 comments

Comments

@lanctot
Copy link
Collaborator

lanctot commented Jun 22, 2020

Hello everyone,

Starting today until July 3rd, there will be concentrated effort to add functionality to OpenSpiel. During this time, we will be updating github every day (possibly even many times per day).

We will be more available than usual to answer questions that you may have about OpenSpiel. Please use this thread.

Among the list of planned additions:

  • Public states sub-API (for implementing algorithms such as DeepStack)
  • Card games and RL (Hearts, Skat, Bridge, Tarok)
  • More games! Clobber, Emergent Communication, Emerald Mines / Boulderdash, Hyper-backgammon)
  • Add bidding to Skat
  • Get C++ AlphaZero working externally
  • AlphaZero in Hex; Hex & AlphaZero improvements
  • Initial investigation in convergence of CFR in n-player and general-sum games

We will also try to run some algorithms and might report a few results.

@lanctot lanctot changed the title OpenSpiel Extravaganza Jun 22nd - Jul 3rd ~!~ OpenSpiel Extravaganza ~!~ Jun 22nd - Jul 3rd Jun 22, 2020
@lanctot lanctot pinned this issue Jun 22, 2020
@lanctot
Copy link
Collaborator Author

lanctot commented Jun 22, 2020

@findmyway did you know about AlphaZero.jl project? I posted a suggestion to support OpenSpiel games: jonathan-laurent/AlphaZero.jl#15 . Would that be something you'd be interested in trying out or looking into, even if you don't have time between this specific period? Would be a neat collaboration between projects!

@findmyway
Copy link
Collaborator

findmyway commented Jun 23, 2020

Hi @lanctot ,
Yeah, in fact, I contacted @jonathan-laurent just after I decided to write this Julia wrapper. And we had some nice discussions several months ago jonathan-laurent/AlphaZero.jl#4 . I'll provide more info in the issue you created there. jonathan-laurent/AlphaZero.jl#15


By the way, I'm also working on porting some other algorithms (CFR related). Will keep you synced.

@lanctot
Copy link
Collaborator Author

lanctot commented Jun 30, 2020

Ok so we will use this thread as a way to describe our project in a bit more detail and show any relevant progress.

I will start. I implemented HyperBackgammon, which is a simple variant where each side has only 3 checkers. However, it is not the full game because we still don't have the doubling cube implemented (note to self: we should do that :-p)

Here's a screenshot of the initial state:

+------|------+
|......|...ooo|
|......|......|
|......|......|
|......|......|
|......|......|
|      |      |
|......|......|
|......|......|
|......|......|
|......|......|
|......|...xxx|
+------|------+
Turn: *
Dice: 
Bar:
Scores, X: 0, O: 0

It is implemented as a variant of Backgammon, you can load it using the game string backgammon(hyper_backgammon=true)

@lanctot
Copy link
Collaborator Author

lanctot commented Jun 30, 2020

With the help of several people @fzvinicius @finbarrtimbers @jblespiau , I did the final stages of getting OpenSpiel fully supported with TF 2.2 and Python 3.8. As a result, we now fully support Ubuntu 20.04. For details see #166 and #249.

image

@inejc
Copy link
Contributor

inejc commented Jul 2, 2020

I finalized de/serialization of CFR and MCCFR C++ solvers and exposed the functionality in Python. For details see #222.

Below is the output of the Python example where an MCCFR solver is pickled halfway through training and then the process is continued with the loaded solver:

$ python open_spiel/python/examples/mccfr_cpp_example.py --sampling=external --iterations=20
Iteration 0 exploitability: 0.583333
Iteration 1 exploitability: 0.604167
Iteration 2 exploitability: 0.611111
Iteration 3 exploitability: 0.562500
Iteration 4 exploitability: 0.529167
Iteration 5 exploitability: 0.465278
Iteration 6 exploitability: 0.443452
Iteration 7 exploitability: 0.432292
Iteration 8 exploitability: 0.425926
Iteration 9 exploitability: 0.420833
Persisting the model...
Loading the model...
Exploitability of the loaded model: 0.420833
Iteration 10 exploitability: 0.416667
Iteration 11 exploitability: 0.411458
Iteration 12 exploitability: 0.400819
Iteration 13 exploitability: 0.378803
Iteration 14 exploitability: 0.359722
Iteration 15 exploitability: 0.343027
Iteration 16 exploitability: 0.330226
Iteration 17 exploitability: 0.318848
Iteration 18 exploitability: 0.305478
Iteration 19 exploitability: 0.291289

@inejc
Copy link
Contributor

inejc commented Jul 2, 2020

@TimSmole and I implemented the Slovenian Tarok card game and started porting it to the OpenSpiel repository. For details see #274.

Below is part of the initial output of the Python script that enables one to play the full game:

$ python tarok/python/play_game.py                                          
Game phase: GamePhase.CARD_DEALING
Selected contract: Contract.NOT_SELECTED
Current player: -1

Player cards: []

Legal actions: [('Deal', 0)]

Enter action: 0
---------------------------------------------------------------------- 

Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 1

Player cards: [('V', 4), ('VI', 5), ('VII', 6), ('XII', 11), ('XIV', 13), ...]

Legal actions: [('Pass', 0), ('Two', 3), ('One', 4), ('Beggar', 8), ...]

Enter action: 0
---------------------------------------------------------------------- 

Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 2

Player cards: [('Pagat', 0), ('II', 1), ('IIII', 3), ('VIII', 7), ...]

Legal actions: [('Pass', 0), ('Two', 3), ('One', 4), ('Beggar', 8), ...]

Enter action: 0
---------------------------------------------------------------------- 

Game phase: GamePhase.BIDDING
Selected contract: Contract.NOT_SELECTED
Current player: 0

Player cards: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]

Legal actions: [('Klop', 1), ('Three', 2), ('Two', 3), ('One', 4), ...]

Enter action: 1
---------------------------------------------------------------------- 

Game phase: GamePhase.TRICKS_PLAYING
Selected contract: Contract.KLOP
Current player: 0

Player cards: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]

Trick cards: []

Legal actions: [('III', 2), ('X', 9), ('XIII', 12), ('XV', 14), ...]

Enter action: 2
---------------------------------------------------------------------- 

Game phase: GamePhase.TRICKS_PLAYING
Selected contract: Contract.KLOP
Current player: 1

Player cards: [('IIII', 3), ('V', 4), ('IX', 8), ('XVII', 16), ...]

Trick cards: [('III', 2)]

Legal actions: [('IIII', 3), ('V', 4), ('IX', 8), ('XVII', 16), ...]

@ryanbhayward
Copy link

I've been following what Bedir and Mohammad-R have been doing. (It's not yet directly related to openspiel, but I've been writing some postscript so that I can generate board position pix for the game of Y, especially on the geodesic-dome-style boards introduced by Craige Schensted)
Screen Shot 2020-07-02 at 12 19 52 PM
Screen Shot 2020-07-02 at 12 20 56 PM

@christianjans
Copy link
Contributor

I just have added the standard implementation of the game Clobber.

It supports up to a 99x26 board (99 rows, 26 columns), although boards for humans and computers are usually kept to 5x6, 8x8, and 10x10.

For this original version, beginning from a custom board is not supported in Python, and the option to invert the default checkerboard pattern of the pieces is also not implemented. However, these shouldn't be too difficult to get going and can be done soon.

Below is a full game of me (Player 1, 'x') losing to an MCTS player (Player 0, 'o') on a 4x4 board.

$ python3 open_spiel/python/examples/mcts.py --game=clobber\(rows=4,columns=4\) --player1=mcts --player2=human
Initial state:
4xoxo
3oxox
2xoxo
1oxox
 abcd

Player 0 sampled action: c1b1
Next state:
4xoxo
3oxox
2xoxo
1oo.x
 abcd

Choose an action (empty to print legal actions): a4a3
Player 1 sampled action: a4a3
Next state:
4.oxo
3xxox
2xoxo
1oo.x
 abcd

Player 0 sampled action: d2d3
Next state:
4.oxo
3xxoo
2xox.
1oo.x
 abcd

Choose an action (empty to print legal actions): c4b4
Player 1 sampled action: c4b4
Next state:
4.x.o
3xxoo
2xox.
1oo.x
 abcd

Player 0 sampled action: b2c2
Next state:
4.x.o
3xxoo
2x.o.
1oo.x
 abcd

Choose an action (empty to print legal actions): b3c3
Player 1 sampled action: b3c3
Next state:
4.x.o
3x.xo
2x.o.
1oo.x
 abcd

Player 0 sampled action: c2c3
Next state:
4.x.o
3x.oo
2x...
1oo.x
 abcd

Choose an action (empty to print legal actions): a2a1
Player 1 sampled action: a2a1
Next state:
4.x.o
3x.oo
2....
1xo.x
 abcd

Player 0 sampled action: b1a1
Next state:
4.x.o
3x.oo
2....
1o..x
 abcd

Returns: 1.0 -1.0 , Game actions: c1b1 a4a3 d2d3 c4b4 b2c2 b3c3 c2c3 a2a1 b1a1
Number of games played: 1
Number of distinct games played: 1
Players: mcts human
Overall wins [1, 0]
Overall returns [1.0, -1.0]

@tuero
Copy link
Contributor

tuero commented Jul 2, 2020

I implemented a bridged version of Boulder Dash/Emerald Mines #281 (see here for a description, and here for a playthrough). The mechanics of the game are simple: collect enough diamonds to open the exit, while avoiding enemies. This game offers many interesting challenges for AI

  • Long delays between rewards
  • Dodging enemies by exploiting their predictive movement
  • Dropping boulders onto enemies to reveal diamonds (understanding mechanics, consequences of actions, and timing)
  • While the agent moves through the level, dirt is removed which causes items to fall, which can result in deadlocks (blocking the exit or diamonds)
  • It is also quite fun to play :)

Many of the original elements are supported, and custom boards can be taken in as arguments. I have also converted many of the original Boulder Dash levels into the specified format, which you can find here (althrough these will be quite hard on their own for AI to tackle). I will also be adding simplified versions of the levels to that repo, which tackle different complexities the game offers while being easier to solve.

I have attached a gif below of the ASCII board.
bd_mines_small

@vbhatt-cs
Copy link
Contributor

I implemented a modified version of the Lewis Signaling Game (https://en.wikipedia.org/wiki/Lewis_signaling_game) with an option for arbitrary payoffs. This game can be used as a starting point for multi-agent communication algorithms.

With certain payoff matrices, even a game with 3 states and 3 messages can be challenging for decentralized algorithms. Below are the results for centralized and decentralized tabular Q-Learning and decentralized DQN on identity payoff matrices. Since ε is decayed linearly from 1 to 0, the curve looks nearly linear in the centralized case.

The following figures show the final joint policy for each state. The counts for each (state, action) pair denotes the number of runs that had that particular joint policy.

With the following payoff matrix, the decentralized methods fail to reach the optimal policy.

@jhtschultz
Copy link
Member

jhtschultz commented Jul 3, 2020

As part of the Hearts project, we (@nathansttt, @solinas, @jhtschultz) implemented the classic card game Hearts in OpenSpiel, open sourced the current state of the art Hearts program xinxin (https://github.com/nathansttt/hearts), and built an interface that allows xinxin to play as an OpenSpiel bot.

Example OpenSpiel Hearts state:
Screen Shot 2020-07-02 at 10 25 24 PM
Xinxin screenshot:
xinxin copy

We also added information state resampling to Hearts, which enables algorithms like IS-MCTS to run on the game. Along with xinxin, this provides another useful baseline to compare learning algorithms against. Beyond establishing the first open source benchmarks for Hearts, OpenSpiel-xinxin enables us to generate high-quality games which can be used to learn supervised policies. We wrote a python example of a script that does this. Going forward, we could use these policies to seed learning algorithms, or as a faster alternative to xinxin for move generation in MCTS rollouts.

Hearts is an important representative of the class of trick taking card games. Unlike Bridge, there are no teams in Hearts, making it a multiplayer general sum game. As such, it inherits many theoretical complexities and ambiguities. Although RL research on hearts dates back to at least 1997, it still remains a domain in which algorithms have yet to achieve superhuman performance. With this project we aim to preserve an important part of AI history, and use it to facilitate progress in the active research area of multi-agent RL.

@ElnazDavoodi
Copy link
Collaborator

ElnazDavoodi commented Jul 3, 2020

I worked on Public states sub-API to add factored-observation support. In imperfect information games, each agent gets different private information, as well as some public information. The agent's observation is factored to "private observation" and "public observation". During an imperfect information game, each agent should have a non-empty "private observation" at some states (e.g. in Kuhn-poker, agents are dealt cards and the card of each agent is its private observation), and both agents have "public observation" in all states of the game (e.g. in Kuhn-poker, both agents can observe all agent's actions such as bet, call, etc.).
Also, a version of Kuhn-poker was added which supports factored-observations.

@lanctot
Copy link
Collaborator Author

lanctot commented Jul 3, 2020

Hi all,

So other than importing PRs and running stuff, I managed to get a small but fun thing done: progress on CFR's empirical convergence outside two-player zero-sum games.

I had previously implemented the EFCCEDist and EFCEDist functions in algorithms/corr_dist.* to be analogues of NashConv: distances to extensive-form coarse-correlated equilibria (EFCCE) and to extensive-form correlated equilibria (EFCE). Motivated by my enthusiasm after reading Celli et al. 2019 and Farina et al. '2019, I wanted to know what CFR's emprical joint policy converges to.

So, during the extravaganza, I added the following (to be submitted in an update in the coming weeks): the ability to extract CFR's current joint policy and corr_dev_builder, a helper class that builds up correlation devices (distributions over joint pure policies) incrementally over time, with the ability to sample pure joint policies from CFR's stochastic policy.

I ran CFR on the general-sum variant of 3-card Goofspiel described in the above papers, the specific game string being turn_based_simultaneous_game(game=goofspiel(num_cards=3,points_order=descending,returns_type=total_points)) . This graph shows the EFCCEDist and EFCEDist of CFR's joint average policy (to build this I sampled 100 joint policies per CFR's mixed strategy and weighed them all equally):

image

And this graph shows the expect payoffs to each player:

image

@BedirT
Copy link
Contributor

BedirT commented Jul 3, 2020

Hi all,

I took the first steps on modified AlphaZero. I added the first version for alpha-beta search functionality instead of MCTS on AlphaZero. The main idea is to be able to compare the results and see the differences between the search methods. The policy is not used in the search yet (to be added), for now only values are used. It is significantly slower, and depth limited, therefore I couldn't get the results of the experiments yet, which I will be posting when I hopefully do.

@michalsustr
Copy link
Collaborator

I worked with Elnaz on adding support for factored observations, which are the basis of the Factored Observation Games (FOG) formalism. We implemented factored observations for Kuhn Poker with arbitrary number of players. We added a special PublicState interface and implemented it for Kuhn Poker as well. See Public State API for more details. This interface provides data structures and methods necessary for algorithms that solve large-scale games of imperfect information, like Heads-Up, No-Limit Texas Hold'em Poker (a game with ~10^160 states). We added python bindings that support efficient vectorized operations via numpy (resp. Eigen) arrays without copy overhead. This allows to create relatively efficient algorithms also in Python, expanding the accessibility of this research area.

We implemented a PublicState <-> State visualization that nicely shows the relationship between the game structures:

Kuhn Poker for 2 players

We have a work-in-progress implementation of the CFR algorithm that runs on the public tree. We expect to publish it within a few days, maybe even tomorrow. We also added a number of consistency tests between the PublicState and State API.

There remains much to do, but this was a nice first step. Hopefully it will be followed by contributions of many authors, and many scientific papers! :)

@mrdaliri
Copy link
Contributor

mrdaliri commented Jul 3, 2020

Hi,
During Extravaganza, I worked on getting C++ AlphaZero to work externally. The current version of AlphaZero is implemented using Tensorflow, which is known to have a tricky C/C++ API.

After many trials and errors, and with the help of many great people, I was able to fix all compile and linking errors for all four TF/C++ targets of OpenSpiel (alpha_zero library, vpnet_test, alpha_zero_example and tf_trajectories_example) and build them using Tensorflow 2.2.0. Since TF doesn't have a native C++ API (it only has C API), we utilize tensorflow_cc project to build TF using CMake and Bazel from scratch.

Right now, we still have two runtime errors reported by TF checks. I guess that they are caused by some version mismatch between external libraries (TF/Eigen) and can be fixed by spending more time digging into TF/Eigen source code and release notes.

I'll continue to work on these issues, and I hope we can have a working C++ AlphaZero in a couple of weeks. You can follow updates on this project in issue #172.

@flostim
Copy link
Collaborator

flostim commented Jul 3, 2020

Hi,

I spent most of the Extravaganza adding a GUI for Skat and Hearts to the soon to be released OpenSpiel GUI framework by @bart-devylder. They allow you to play the games against bots and see all the observations that a bot would see. Additionally they can be used to review games played by bots.

Skat GUI
skat-screenshot

Hearts GUI
hearts-screenshot

Skat Bidding
I also started working on adding full bidding to Skat. This turned out to be a bit more complicated than I thought and I will make the game configurable to allow both simplified and full bidding.

@lanctot lanctot unpinned this issue Jul 6, 2020
@lanctot lanctot closed this as completed Jul 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests