Public states support #1

michalsustr · 2019-08-27T14:24:08Z

I've looked over the code and the arxiv paper but I didn't see a mention of public states. What is the status on these? Are you planning to support them in the library? If yes, are you considering also introducing "shared" states between players? And how about the FOG formalism?

We are developing a game theoretic library at CTU as well. It seems like wasteful redundant effort to make two libraries, however for the algorithms we develop public trees / infoset trees are needed.

I'm very happy to open a discussion on this topic! :) It would save us tremendous engineering efforts if we do not have to duplicate code like this.

lanctot · 2019-08-28T01:57:33Z

Hi Michal,

You are correct, we do not currently have support for public states.

I see two possible levels of support we could provide for them (please, suggest more if you think there is a better way forward):

Have a State::PublicInformationState() string and vector for all games, similar to InformationState and Observation.
Have the implementations separate public information and private information which then get exposed via the API.

1 seems fine but I'm not sure how much it helps. I imagine most algorithms will need more than that, like the ability to enumerate private states in a public state, etc.

The problem with 2 is that it makes the implementations of the games fairly complex by default, and maybe only a subset of algorithms or research directions will require or make use of this structure. So, the question is: is the trade-off worth it? I like that the games are fairly easy to implement now, but I agree that it would be nice to support public states and the FOG formalism as well and it would be great if we could prevent overlapping engineering efforts!

I have a potential compromise: why not have a subdirectory called public_state (or FOG) which formalizes the games with these information structures, but supports multiple modes / views. One of those views could be the "history + infostate view" and then the current OpenSpiel API could be supported in this view. Another view would be the "public state view" where more information structure is available to the algorithms. It would be fantastic if we were able to support these infomation structures in the core API (and only have certain games support them) but I'm not sure how to do that in a clean way. If we can do that in a clean way, I think it's a win-win situation. I hope you agree.

I would also point this thread to the following people, who I expect would be interested in the topic: Ed Lockhart, Neil Burch, Martin Schmid, Viliam Lisy.

elkhrt · 2019-08-28T17:40:28Z

One possibility would be to create a new class (both Python and C++) called FOGame / FOState, which needs sub-classes to implement a factored observation, and which implements the Game & State interface in terms of it.

Then we could have algorithms which depend on the FOG formalism operate only on FOGames, but on the other hand things implemented as FOGames could be used by all the existing algorithms. WDYT?

Hacing done that, we could move relevant games over to the FOG formalism one at a time with zero impact.

finbarrtimbers · 2019-08-28T21:00:22Z

Another possibility is to do 1. as Marc proposed, but then add a PublicTree class that is similar to the HistoryTree class which could provide the methods we want. That would make it easy to add the following methods:

Enumerate private states for a given public state
Calculate the range transition from one node to a child.

This seems relatively clean to me, as it could sit on top of the existing states, while not requiring them to be implemented differently.

michalsustr · 2019-09-03T14:21:21Z

Do you want to support games with imperfect recall? I don't think the RL community is interested in such games, and if we drop imperfect recall implementation will become much more simple.

I'd suggest that State should have direct support for private and public observations, i.e. State should be an FOState. FOState is more general and most games can be implemented as a special case, for example with only one "stochastic" outcome, or only one player acting in given state. All the technical points about FOGs are explained in the paper https://arxiv.org/abs/1906.11110, as you're probably aware.

I'd make this discussion about two points:

How to make forward mappings, i.e. "History -> Public state" ?
How to make inverse mappings, i.e. "Public state -> History" ?

1) Forward mappings

If we drop imp. recall, we can have information sets be based solely on sequence of action-private observation histories (AOH).
Public states can based on sequence of public observations.

These can be functions implemented in the state, returning vectors of AOH or PubObs.

2) Inverse mappings

How do you currently define mapping of "Information set -> Histories"? Your answer to this question can guide the design to the similar question of implementing

"Public state -> Histories"
"Public state -> Player's infosets"
"Public state -> Opponents's (aug)infosets"

In GTLib2 these transformations are done in following fashion:

There is a "cache" which stores these partial maps, and the mappings may be incomplete for sampling algorithms.
It is possible to use a domain-independent algorithm that recreates the infoset/public state by targetted "sampling" (expansion of the tree).
Use a domain-dependent algorithm (more efficient, but harder to write).

lanctot · 2019-09-03T18:42:40Z

Hi Michal,

Thanks for the pointers to FOG.

Yes, we do want to support imperfect recall. It has been an important abstraction technique for scaling to large games such as Texas Hold'em. I disagree that it would not be important to RL researchers. State aggregation is a major research topic and depending how algorithms map observations to policies, several algorithms/agents may want to purposely merge observation histories.

We currently do not have mapping of infostates to histories.

We are happy to support public states and richer structure, but we do not plan to make such major changes to the core API at this time. I suggest that we discuss one of the compromises suggested by myself, Ed, and Finbarr. We can add a subdirectory for now and develop a sub-API that supports extensions to address these new formalisms, and we can discuss generalizing the core API once we are convinced that the implementations of the games are not too complex to be worth the trade-offs.

michalsustr · 2019-09-10T13:16:15Z

Thanks for all the effort you're putting in, and for being open to discuss these! I appreciate it a lot.

I took some time to implement something in the lib to gain more understanding.

Yes, we do want to support imperfect recall. It has been an important abstraction technique for scaling to large games such as Texas Hold'em.

I think there was a misunderstanding. By imperfect recall I didn't mean how the algorithm decides what to do with observations, and how to compute the strategy (i.e. to do some imp. recall abstraction). I meant that the environment imposes this onto agents, and that it is common knowledge. For example, in the real world somebody comes in, takes the player and forcibly erases his memory, and all players know when this happens in the game and adjust their strategies to account for this.

The "strategic imperfect recall" you talk about is fine with FOGs, but "enviromental imperfect recall" where judge erases memory is not, because it makes the game non-timeable. If the player wears a watch, he can distinguish the histories because of different timing, and therefore the environment cannot force this "enviromental imperfect recall".

About the implementation:

I understand you don't want to add this to the Core API, and would like to take time to design these things properly.

I think a subdir or an experimental branch is fine. I have some input you might consider useful.

Motivation

These additional structures (infoset trees, public trees) are introduced so that algorithms can take efficient advantage of them -- the aim is mainly towards MC algorithms which may need arbitrary forward/inverse mappings. These are the particular use cases I have currently in mind:

Monte Carlo algorithms: When playing with MC, only parts of the history tree are sampled, and only equivalent parts of infoset forest/public tree are sampled (via State method calls). The inverse mappings do not provide all of the consistent histories. In large games it may happen that we are told by the judge we are in an infoset for which we have no histories (the inverse mapping is empty). We may need to sample consistent histories with this new infoset to be able to continue in the game.
MC generating histories: We may impose a memory limit on how many histories we store per infoset (1000), and if the infoset is not built completely we can sample additional consistent histories one by one, up to this (computational) limit.
Public state variance reduction: Terminals consistent with a particular public state trajectory can serve as a baseline for the control variates. We need to recover histories which belong to the leaf public state.
Continual resolving (CR): When using CR in conjuction with MC, we may not sample some infosets within a public state. We may want to recover these infosets to construct gadget game and resolve from it.

Games

This can be an experimental subset of games that implements this new API. I suggest Poker, II-goofspiel and PTTT:

Poker has nice public states and infosets, and it's easy to make a domain-dependent implementation that either recovers the histories, or works on a specialized form (the previous CFR papers with vectorized utilities / public state var. reductions, etc.)
II-goofspiel has nice structure of public states (3 possible outcomes per round), but nasty infosets, so it's non-trivial to infer histories within an infoset.
PTTT has the unusual aspect of player moving repeatedly in his round, resulting into "thick" public states. The infosets are also non-trivial.

Additionally I'd like to suggest Stratego (currently it's not implemented in OpenSpiel, but we have code for it in GTLib). The public states are identified by the moves in the game and revealed ranks of figures in combat, and the size of information sets is decreasing towards the end of the game.

For even more challenging game, we can use variants Kriegspiel (we have code for it in GTLib).

Game implementation

I think it is sufficient to introduce State::PublicInformationState() which builds up on PublicObservation() method implemented by the game, which can be experimentally done via subclassing as suggested by @locked-deepmind.

I'll discuss the modes / views @lanctot talks about in the following section about structures. I think @finbarrtimbers's notion of HistoryTree / PublicTree can be abstracted into an unified approach:

Structures

The structures in the games are all forests (history tree, infoset forest per player, public state tree). They can be modeled as such - Node which has one parent (and can remember him).
This can be all template-based code (i.e Node<History>, Node<Infoset>, Node<PubState>), which will have a generic hashcode/equals (by traversing up the parent until the root).
I'd like to stress we are talking about forests (so there are multiple root nodes): "infoset forest" has no chance nodes, the initial chance outcomes and their associated observations create roots of the trees (for example particular hand cards in poker).

The children of nodes will be provided by an expander.

Expander for history tree is simple - it's just applying LegalActions of a State. For the other cases, the expander is something algorithm can call externally (independetly) from the Node.

There may be multiple versions of expanders:

a generic expander which traverses the history tree and may expand the states inefficiently
a domain specific expander

Expanders can be written using templates specified for a game. If there exists a domain specific expander it can be implemented as a template specialization.
Alternatively, expander can be injected to the algorithm along with the game instance.

There should be a possibility to use a node cache, where the expander will be called only for non-cached (non-expanded) children.

Edge between parent and child Node<X> corresponds to:

Action for Node<State>
(Action, Private observation) for Node<InformationState>
Public observation for Node<PublicState>

It may be difficult to even know how many children there are for the node! So this can be also part of the expander definition.

I found it useful to think about these structures in somewhat categorical terms [footnote 1] : The tree (EFG/infoset forest/public tree) represents a category which is a cone. Fully expanded tree is apex of the cone, and the morphisms represent all the consistent ways that a tree can be in a partial, not-fully-expanded state.

Mappings

The forward/inverse mappings I wrote previously are parametrized functors (parametrized by the current cache) from one cone to another. The mapping cache may contain only partial mapping, similarly how node cache had non-expanded children.

Similarly to expander, mappings can be again supplied domain-independently via templates and domain-specifically via template specializations.

I have some code which does the above in GTLib, but it's not fully finished (I had to rewrite it several times to be satisfied).

I'm happy discuss this in more details and hope I didn't forget anything important.

Thanks again. I think you're doing a great service by leading the way towards a unified framework for games!

[footnote 1] I'm not a category theorist, so very likely I have made mistakes in this formulation, but hopefully it gives at least some insight.

lanctot · 2019-09-13T15:56:25Z

As I stated in the email, I am mostly ok with this. Seems like a good compromise.

The only thing I'd say is avoid the templated types, because (reasons I'm sure are clear to you now). There's probably a better way to do it than templates, like using OOP. Maybe not.

Unofrtunately, I'm not sure I have enough cycles to be able to respond as quickly as you're working. You might want to consider forking and contributing upstream over time. I'll follow-up by personal email.

updating from base

Update from master

merge changes from official repo into fork

test Julia wrapper on mac

Xinxin return check

Updating fork

Fix ALE link in games.md

Pulling upstream

Add recent changes

michalsustr mentioned this issue Sep 13, 2019

Question: information state targetting #49

Closed

elkhrt mentioned this issue Sep 15, 2019

Laser tag #56

Merged

OpenSpiel pushed a commit that referenced this issue Sep 23, 2019

Merge pull request #1 from deepmind/master

cb95396

updating from base

jblespiau added the contribution welcome It's a nice feature! But we do not have the time to do it ourselves. Contribution welcomed! label Oct 1, 2019

lanctot closed this as completed Oct 17, 2019

OpenSpiel pushed a commit that referenced this issue Nov 25, 2019

Merge pull request #1 from deepmind/master

185d777

Update from master

OpenSpiel pushed a commit that referenced this issue Dec 2, 2019

Merge pull request #1 from deepmind/master

cdd51cb

merge changes from official repo into fork

OpenSpiel pushed a commit that referenced this issue Jan 14, 2020

Mac support with julia (#1)

f79238e

test Julia wrapper on mac

OpenSpiel pushed a commit that referenced this issue Jun 26, 2020

Merge pull request #1 from solinas/xinxin-return-check

901f0ef

Xinxin return check

OpenSpiel pushed a commit that referenced this issue Apr 26, 2021

Merge pull request #1 from deepmind/master

8884f15

Updating fork

GaoZiHong mentioned this issue Oct 19, 2022

A question about laser_tag #953

Closed

OpenSpiel pushed a commit that referenced this issue Nov 24, 2022

Merge pull request #1 from gregdeon/patch-1

3f2f24d

Fix ALE link in games.md

lanctot pushed a commit that referenced this issue Jul 4, 2023

Merge pull request #1 from deepmind/master

5eab354

Pulling upstream

lanctot pushed a commit that referenced this issue May 31, 2024

Merge pull request #1 from google-deepmind/master

0ef1ed4

Add recent changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Public states support #1

Public states support #1

michalsustr commented Aug 27, 2019

lanctot commented Aug 28, 2019

elkhrt commented Aug 28, 2019

finbarrtimbers commented Aug 28, 2019

michalsustr commented Sep 3, 2019

lanctot commented Sep 3, 2019 •

edited

michalsustr commented Sep 10, 2019 •

edited

lanctot commented Sep 13, 2019 •

edited

Public states support #1

Public states support #1

Comments

michalsustr commented Aug 27, 2019

lanctot commented Aug 28, 2019

elkhrt commented Aug 28, 2019

finbarrtimbers commented Aug 28, 2019

michalsustr commented Sep 3, 2019

1) Forward mappings

2) Inverse mappings

lanctot commented Sep 3, 2019 • edited

michalsustr commented Sep 10, 2019 • edited

Motivation

Games

Game implementation

Structures

Mappings

lanctot commented Sep 13, 2019 • edited

lanctot commented Sep 3, 2019 •

edited

michalsustr commented Sep 10, 2019 •

edited

lanctot commented Sep 13, 2019 •

edited