-
Notifications
You must be signed in to change notification settings - Fork 895
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Public states support #1
Comments
Hi Michal, You are correct, we do not currently have support for public states. I see two possible levels of support we could provide for them (please, suggest more if you think there is a better way forward):
1 seems fine but I'm not sure how much it helps. I imagine most algorithms will need more than that, like the ability to enumerate private states in a public state, etc. The problem with 2 is that it makes the implementations of the games fairly complex by default, and maybe only a subset of algorithms or research directions will require or make use of this structure. So, the question is: is the trade-off worth it? I like that the games are fairly easy to implement now, but I agree that it would be nice to support public states and the FOG formalism as well and it would be great if we could prevent overlapping engineering efforts! I have a potential compromise: why not have a subdirectory called public_state (or FOG) which formalizes the games with these information structures, but supports multiple modes / views. One of those views could be the "history + infostate view" and then the current OpenSpiel API could be supported in this view. Another view would be the "public state view" where more information structure is available to the algorithms. It would be fantastic if we were able to support these infomation structures in the core API (and only have certain games support them) but I'm not sure how to do that in a clean way. If we can do that in a clean way, I think it's a win-win situation. I hope you agree. I would also point this thread to the following people, who I expect would be interested in the topic: Ed Lockhart, Neil Burch, Martin Schmid, Viliam Lisy. |
One possibility would be to create a new class (both Python and C++) called FOGame / FOState, which needs sub-classes to implement a factored observation, and which implements the Game & State interface in terms of it. Then we could have algorithms which depend on the FOG formalism operate only on FOGames, but on the other hand things implemented as FOGames could be used by all the existing algorithms. WDYT? Hacing done that, we could move relevant games over to the FOG formalism one at a time with zero impact. |
Another possibility is to do 1. as Marc proposed, but then add a PublicTree class that is similar to the HistoryTree class which could provide the methods we want. That would make it easy to add the following methods:
This seems relatively clean to me, as it could sit on top of the existing states, while not requiring them to be implemented differently. |
Do you want to support games with imperfect recall? I don't think the RL community is interested in such games, and if we drop imperfect recall implementation will become much more simple. I'd suggest that State should have direct support for private and public observations, i.e. State should be an FOState. FOState is more general and most games can be implemented as a special case, for example with only one "stochastic" outcome, or only one player acting in given state. All the technical points about FOGs are explained in the paper https://arxiv.org/abs/1906.11110, as you're probably aware. I'd make this discussion about two points:
1) Forward mappingsIf we drop imp. recall, we can have information sets be based solely on sequence of action-private observation histories (AOH). These can be functions implemented in the state, returning vectors of AOH or PubObs. 2) Inverse mappingsHow do you currently define mapping of "Information set -> Histories"? Your answer to this question can guide the design to the similar question of implementing
In GTLib2 these transformations are done in following fashion:
|
Hi Michal, Thanks for the pointers to FOG. Yes, we do want to support imperfect recall. It has been an important abstraction technique for scaling to large games such as Texas Hold'em. I disagree that it would not be important to RL researchers. State aggregation is a major research topic and depending how algorithms map observations to policies, several algorithms/agents may want to purposely merge observation histories. We currently do not have mapping of infostates to histories. We are happy to support public states and richer structure, but we do not plan to make such major changes to the core API at this time. I suggest that we discuss one of the compromises suggested by myself, Ed, and Finbarr. We can add a subdirectory for now and develop a sub-API that supports extensions to address these new formalisms, and we can discuss generalizing the core API once we are convinced that the implementations of the games are not too complex to be worth the trade-offs. |
Thanks for all the effort you're putting in, and for being open to discuss these! I appreciate it a lot. I took some time to implement something in the lib to gain more understanding.
I think there was a misunderstanding. By imperfect recall I didn't mean how the algorithm decides what to do with observations, and how to compute the strategy (i.e. to do some imp. recall abstraction). I meant that the environment imposes this onto agents, and that it is common knowledge. For example, in the real world somebody comes in, takes the player and forcibly erases his memory, and all players know when this happens in the game and adjust their strategies to account for this. The "strategic imperfect recall" you talk about is fine with FOGs, but "enviromental imperfect recall" where judge erases memory is not, because it makes the game non-timeable. If the player wears a watch, he can distinguish the histories because of different timing, and therefore the environment cannot force this "enviromental imperfect recall". About the implementation: I understand you don't want to add this to the Core API, and would like to take time to design these things properly. I think a subdir or an experimental branch is fine. I have some input you might consider useful. MotivationThese additional structures (infoset trees, public trees) are introduced so that algorithms can take efficient advantage of them -- the aim is mainly towards MC algorithms which may need arbitrary forward/inverse mappings. These are the particular use cases I have currently in mind:
GamesThis can be an experimental subset of games that implements this new API. I suggest Poker, II-goofspiel and PTTT:
Additionally I'd like to suggest Stratego (currently it's not implemented in OpenSpiel, but we have code for it in GTLib). The public states are identified by the moves in the game and revealed ranks of figures in combat, and the size of information sets is decreasing towards the end of the game. For even more challenging game, we can use variants Kriegspiel (we have code for it in GTLib). Game implementationI think it is sufficient to introduce State::PublicInformationState() which builds up on PublicObservation() method implemented by the game, which can be experimentally done via subclassing as suggested by @locked-deepmind. I'll discuss the modes / views @lanctot talks about in the following section about structures. I think @finbarrtimbers's notion of HistoryTree / PublicTree can be abstracted into an unified approach: StructuresThe structures in the games are all forests (history tree, infoset forest per player, public state tree). They can be modeled as such - Node which has one parent (and can remember him). The children of nodes will be provided by an expander. Expander for history tree is simple - it's just applying LegalActions of a State. For the other cases, the expander is something algorithm can call externally (independetly) from the Node. There may be multiple versions of expanders:
Expanders can be written using templates specified for a game. If there exists a domain specific expander it can be implemented as a template specialization. There should be a possibility to use a node cache, where the expander will be called only for non-cached (non-expanded) children. Edge between parent and child
It may be difficult to even know how many children there are for the node! So this can be also part of the expander definition. I found it useful to think about these structures in somewhat categorical terms [footnote 1] : The tree (EFG/infoset forest/public tree) represents a category which is a cone. Fully expanded tree is apex of the cone, and the morphisms represent all the consistent ways that a tree can be in a partial, not-fully-expanded state. MappingsThe forward/inverse mappings I wrote previously are parametrized functors (parametrized by the current cache) from one cone to another. The mapping cache may contain only partial mapping, similarly how node cache had non-expanded children. Similarly to expander, mappings can be again supplied domain-independently via templates and domain-specifically via template specializations. I have some code which does the above in GTLib, but it's not fully finished (I had to rewrite it several times to be satisfied). I'm happy discuss this in more details and hope I didn't forget anything important. Thanks again. I think you're doing a great service by leading the way towards a unified framework for games! [footnote 1] I'm not a category theorist, so very likely I have made mistakes in this formulation, but hopefully it gives at least some insight. |
As I stated in the email, I am mostly ok with this. Seems like a good compromise. The only thing I'd say is avoid the templated types, because (reasons I'm sure are clear to you now). There's probably a better way to do it than templates, like using OOP. Maybe not. Unofrtunately, I'm not sure I have enough cycles to be able to respond as quickly as you're working. You might want to consider forking and contributing upstream over time. I'll follow-up by personal email. |
merge changes from official repo into fork
I've looked over the code and the arxiv paper but I didn't see a mention of public states. What is the status on these? Are you planning to support them in the library? If yes, are you considering also introducing "shared" states between players? And how about the FOG formalism?
We are developing a game theoretic library at CTU as well. It seems like wasteful redundant effort to make two libraries, however for the algorithms we develop public trees / infoset trees are needed.
I'm very happy to open a discussion on this topic! :) It would save us tremendous engineering efforts if we do not have to duplicate code like this.
The text was updated successfully, but these errors were encountered: