Game Theory II: Extensive Form
=======

Shane Steinert-Threlkeld  
https://www.shane.st  
S.N.M.Steinert-Threlkeld AT uva DOT nl  

Last Time
----

* Games in _normal form_: simultaneous action of multiple agents
* Solution concepts: Pareto optimality, Iterated Removal of Dominated Strategies, Nash equilibria
    - tension between individual and collective rationality
* Searching for NE is hard

Sequential Games
--------

![AlphaGo Nature paper](imgs/AlphaGo_cover.jpg)

Source: Nature Publishing Group

Imperfect Information
-------

![Phil Ivey](imgs/philivey.jpg)

An Example Game
------

> A brother and sister can share two cookies.  First, the sister suggests a division of the cookies; the brother accepts (in which case they get the suggested outcome) or rejects.

![Sharing Game](imgs/SLB_5.1.png)

Source: Shoham and Leyton-Brown

Extensive Form, Defined
------

An _extensive form game of perfect information_ is a tuple $\langle N, A, H, Z, \chi, \rho, \sigma, \{ u_i \} \rangle$: 

* $N$: set of $n$ players
* $A$: a _single_ set of actions

* $H$: non-terminal, aka _choice_ nodes
* $Z$: terminal (aka final) nodes  
    Note: $Z \cap H = \emptyset$

* $\chi : H \to \mathcal{P}(A)$: action function  
    Assigns to each choice node the set of _available actions_ at that node.  
    
* $\rho : H \to N$: player function  
    Assigns to each choice node the player who gets to choose (i.e. whose "turn" it is) at that node.

* $\sigma : H \times A \to H \cup Z$: the successor function, i.e. what provides the tree structure

    (To enforce tree: if $\sigma(h_1 , a_1) = \sigma(h_2 , a_2)$, then $h_1 = h_2$ and $a_1 = a_2$.)

* $u_i : Z \to \mathbb{R}$: agent $i$'s utility function

    Note: assigns utility only to _terminal_ nodes.
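
As a rough illustration, the tuple can be written down directly as a small data structure. The sketch below encodes the cookie-sharing game; the class name `ExtensiveGame`, the node labels, and the assumption that a rejected offer leaves both players with nothing are all choices made here for illustration, not part of the definition.

```python
from dataclasses import dataclass


@dataclass
class ExtensiveGame:
    players: tuple    # N
    chi: dict         # choice node -> tuple of available actions
    rho: dict         # choice node -> player whose turn it is
    sigma: dict       # (choice node, action) -> successor node
    utility: dict     # terminal node -> tuple of utilities, one per player

    @property
    def choice_nodes(self):      # H
        return set(self.rho)

    @property
    def terminal_nodes(self):    # Z
        return set(self.utility)

    def is_tree(self):
        # sigma must be injective: no node is reached in two different ways
        targets = list(self.sigma.values())
        return len(targets) == len(set(targets))


def sharing_game():
    # payoffs are ordered (sister, brother)
    payoff = {"2-0": (2, 0), "1-1": (1, 1), "0-2": (0, 2)}
    chi, rho = {"root": tuple(payoff)}, {"root": "sister"}
    sigma, utility = {}, {}
    for split, accepted in payoff.items():
        chi[split], rho[split] = ("yes", "no"), "brother"
        sigma[("root", split)] = split
        for reply in chi[split]:
            leaf = f"{split}/{reply}"
            sigma[(split, reply)] = leaf
            # assumption: a rejected offer leaves both players with nothing
            utility[leaf] = accepted if reply == "yes" else (0, 0)
    return ExtensiveGame(("sister", "brother"), chi, rho, sigma, utility)


game = sharing_game()
print(len(game.choice_nodes), len(game.terminal_nodes), game.is_tree())
```

Storing $\chi$, $\rho$ and $\sigma$ as dictionaries keeps the code close to the definition; $H$ and $Z$ are then simply the key sets of `rho` and `utility`.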

Strategies and Equilibria
---------

* $S_i := \prod_{h \in H : \rho(h) = i} \chi(h)$

    In other words: player $i$'s _pure_ strategies specify one chosen action (element from $\chi(h)$) at each of that player's choice nodes (those where $\rho(h) = i$).

* Best Response and Nash Equilibrium: exactly as in the normal form case.
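
The product in the definition of $S_i$ can be enumerated mechanically. The following is a minimal sketch using `itertools.product`; the dictionaries `chi` and `rho` are a hypothetical encoding of the cookie-sharing game, with node labels chosen here for illustration.

```python
from itertools import product


def pure_strategies(chi, rho, i):
    """All pure strategies of player i, each as a dict: choice node -> action."""
    own_nodes = sorted(h for h in chi if rho[h] == i)
    return [dict(zip(own_nodes, choice))
            for choice in product(*(chi[h] for h in own_nodes))]


# Hypothetical encoding of the cookie-sharing game: node -> actions / player.
chi = {"root": ("2-0", "1-1", "0-2"),
       "2-0": ("yes", "no"), "1-1": ("yes", "no"), "0-2": ("yes", "no")}
rho = {"root": "sister", "2-0": "brother", "1-1": "brother", "0-2": "brother"}

sister = pure_strategies(chi, rho, "sister")
brother = pure_strategies(chi, rho, "brother")
print(len(sister), len(brother))
```

The sister has one choice node with three actions, so three pure strategies; the brother has three choice nodes with two actions each, so $2^3 = 8$. A strategy must specify an action even at nodes that are never reached.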

Converting to Normal Form
-------

![](imgs/SLB_5.2.png)

Source: Shoham and Leyton-Brown

![](imgs/SLB_5.3.png)

Source: Shoham and Leyton-Brown

![](imgs/SLB_5.4.png)

Source: Shoham and Leyton-Brown

Pure Strategy NE
------

**Theorem.** Every finite extensive-form game of perfect information has a _pure strategy_ Nash equilibrium.

Recall from last lecture: normal form games in general are only guaranteed to have a Nash equilibrium in _mixed_ strategies.
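
The theorem can be checked by brute force on a small example: build the induced normal form and search it for pure equilibria. The sketch below does this for the cookie-sharing game, assuming (as an illustration, not something the slides state) that a rejected offer gives both players zero; all names are hypothetical.

```python
from itertools import product

SPLITS = {"2-0": (2, 0), "1-1": (1, 1), "0-2": (0, 2)}  # (sister, brother)


def outcome(offer, replies):
    # assumption: a rejected offer gives both players zero
    return SPLITS[offer] if replies[offer] == "yes" else (0, 0)


sister_strategies = list(SPLITS)
brother_strategies = [dict(zip(SPLITS, r))
                      for r in product(("yes", "no"), repeat=len(SPLITS))]

# induced normal form: one payoff pair per pure strategy profile
normal_form = {(s, j): outcome(s, b)
               for s in sister_strategies
               for j, b in enumerate(brother_strategies)}


def is_pure_nash(s, j):
    u_sister, u_brother = normal_form[(s, j)]
    # neither player can gain by unilaterally switching strategy
    no_better_offer = all(normal_form[(t, j)][0] <= u_sister
                          for t in sister_strategies)
    no_better_reply = all(normal_form[(s, k)][1] <= u_brother
                          for k in range(len(brother_strategies)))
    return no_better_offer and no_better_reply


equilibria = [(s, brother_strategies[j])
              for (s, j) in normal_form if is_pure_nash(s, j)]
print(len(normal_form), len(equilibria))
```

Under these assumptions the search finds several pure equilibria, including ones in which the brother would reject offers that are never actually made.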

Extensive vs. Normal Form
-------

So: is the extensive form just a graphically nicer way of looking at a normal form game?  Yes and no.

* Much more compact representation (exponentially so)
* _Novel solution concepts_ that exploit the temporal structure

A Non-Credible Threat?
------

![](imgs/SLB_5.5.png)

Source: Shoham and Leyton-Brown

Subgame Perfect Equilibrium
------

* Subgame of $G$ rooted at $h$: restrict $G$ to only $h$ and its descendants.

* $s$ is a _subgame perfect equilibrium_: for any subgame $G'$ of $G$, $s$ restricted to $G'$ is a Nash equilibrium of $G'$

* $\{ (B, H), (C, E) \}$ is _not_ subgame perfect, since $H$ is not a N.E. of the game restricted to $1$'s last choice node.  
    This captures the sense in which the threat is not credible.
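
The definition can be tested directly by checking the Nash condition in every subgame. The sketch below does so by brute force; the tree and payoffs are assumed to match the figure from Shoham and Leyton-Brown and should be treated as illustrative, and the node labels (`r`, `a`, `b`, `f`) are made up here.

```python
from itertools import product

# choice node -> (player, {action: child}); players are indexed 0 and 1
TREE = {
    "r": (0, {"A": "a", "B": "b"}),
    "a": (1, {"C": "z1", "D": "z2"}),
    "b": (1, {"E": "z3", "F": "f"}),
    "f": (0, {"G": "z4", "H": "z5"}),
}
# terminal node -> (utility of player 0, utility of player 1); assumed values
LEAVES = {"z1": (3, 8), "z2": (8, 3), "z3": (5, 5), "z4": (2, 10), "z5": (1, 0)}


def play(profile, node):
    """Follow the profile (choice node -> action) from node to a leaf."""
    while node in TREE:
        node = TREE[node][1][profile[node]]
    return LEAVES[node]


def is_nash_from(profile, root):
    """No player gains by deviating, in the subgame rooted at root."""
    for i in (0, 1):
        own = [h for h in TREE if TREE[h][0] == i]
        for choice in product(*(TREE[h][1] for h in own)):
            deviation = {**profile, **dict(zip(own, choice))}
            if play(deviation, root)[i] > play(profile, root)[i]:
                return False
    return True


def is_subgame_perfect(profile):
    return all(is_nash_from(profile, h) for h in TREE)


threat = {"r": "B", "f": "H", "a": "C", "b": "E"}    # {(B, H), (C, E)}
credible = {"r": "A", "f": "G", "a": "C", "b": "F"}  # {(A, G), (C, F)}
print(is_nash_from(threat, "r"), is_subgame_perfect(threat))
print(is_subgame_perfect(credible))
```

With these payoffs, the threat profile passes the Nash check at the root but fails it in the subgame rooted at player 1's last choice node, where $G$ beats $H$.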

Backward Induction
-----

Intuitively:
* start at terminal nodes
* traverse "up" the tree, assigning the maximum attainable utility at each node
* lifts utility from terminal to all nodes
* equilibrium: at each choice node, take an action with highest utility!

Backward Induction: pseudo-code
------

In [None]:
import numpy as np

def backward_induction(game, node):
    # base case of recursion: terminal nodes carry one utility per player
    if node in game.terminal:
        return game.utility(node)
    # the player whose turn it is at this node
    chooser = game.player(node)
    # best utility vector found so far, one entry per agent
    best_util = [-np.inf] * game.num_players
    for action in game.actions(node):
        # recurse into the child reached by taking this action
        util_of_action = backward_induction(game, game.successor(node, action))
        # keep the utility vector that's best for the choosing player
        if util_of_action[chooser] > best_util[chooser]:
            best_util = util_of_action
    return best_util

Backward Induction
-----

The strategy profile computed by backward induction (i.e. at each choice node, the agent chooses an action which maximizes the utilities computed by BI) will always be _subgame perfect_.
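
To read off the strategy profile and not only the utilities, the recursion can record the chosen action at each choice node. A minimal self-contained sketch follows; the tree and payoffs are assumed to match the non-credible-threat figure from Shoham and Leyton-Brown, and the function and node names are hypothetical.

```python
# choice node -> (player, {action: child}); players are indexed 0 and 1
TREE = {
    "r": (0, {"A": "a", "B": "b"}),
    "a": (1, {"C": "z1", "D": "z2"}),
    "b": (1, {"E": "z3", "F": "f"}),
    "f": (0, {"G": "z4", "H": "z5"}),
}
# terminal node -> utilities; assumed values, for illustration only
LEAVES = {"z1": (3, 8), "z2": (8, 3), "z3": (5, 5), "z4": (2, 10), "z5": (1, 0)}


def backward_induction_with_strategy(node, strategy):
    """Return the utility vector at node and fill strategy for its subtree."""
    if node in LEAVES:
        return LEAVES[node]
    player, children = TREE[node]
    best_action, best_util = None, None
    for action, child in children.items():
        util = backward_induction_with_strategy(child, strategy)
        # ties are broken in favour of the first action encountered
        if best_util is None or util[player] > best_util[player]:
            best_action, best_util = action, util
    strategy[node] = best_action
    return best_util


strategy = {}
value = backward_induction_with_strategy("r", strategy)
print(value, strategy)
```

Because every choice node receives an action that is optimal given the choices made below it, the resulting profile prescribes equilibrium play in each subgame, not just along the path actually played.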

Backward Induction: Problems, I
-------

![AlphaGo Nature paper](imgs/AlphaGo_cover.jpg)

Source: Nature Publishing Group

In principle, Backward Induction actually can solve Go!

Unfortunately: the corresponding game tree has more nodes than there are atoms in the observable universe.

AlphaGo's insight: use neural networks, trained by reinforcement learning, to prune the tree, only visiting nodes that are likely to have high utility.

Backward Induction: Problems, II
-------

![](imgs/SLB_5.9.png)

Source: Shoham and Leyton-Brown

Imperfect Information
-------

![Phil Ivey](imgs/philivey.jpg)

An Example Game
-----

![](imgs/SLB_5.10.png)

Source: Shoham and Leyton-Brown

Definition of Imperfect Information
----------

An extensive game of _imperfect information_ is a game of perfect information together with
* for each agent, an _equivalence relation_ $I_i$ on $\{ h : \rho(h) = i \}$, such that:  
    $$h_1 I_i h_2 \Rightarrow \chi(h_1) = \chi(h_2)$$
    $h_1 I_i h_2$, intuitively: player $i$ cannot distinguish between being at node $h_1$ and $h_2$  
    A dashed line between $h_1$ and $h_2$ indicates that $h_1 I_i h_2$.

Strategies and Equilibria
-----

* Pure strategies: instead of choosing an action at each choice node, specify an action at each _equivalence class_ (a.k.a. information set)
* Writing $[ h ]_i := \{ h' : h I_i h' \}$ and $H_i := \{ [ h ]_i : \rho(h) = i \}$:
    $$S_i := \prod_{[ h ]_i \in H_i} \chi([ h ]_i)$$
    where $\chi([ h ]_i)$ is well-defined thanks to the condition that indistinguishable nodes have the same action set.
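
A small sketch of how grouping nodes into information sets changes the strategy count. The node labels and action names below are hypothetical: one player moves at a root node and later at two nodes that they cannot tell apart.

```python
from itertools import product


def pure_strategies(info_sets, chi):
    """Pure strategies as dicts: information set -> action.

    info_sets: list of frozensets partitioning one player's choice nodes.
    chi: choice node -> tuple of available actions.
    """
    action_sets = []
    for cell in info_sets:
        available = {frozenset(chi[h]) for h in cell}
        # indistinguishable nodes must offer exactly the same actions
        if len(available) != 1:
            raise ValueError(f"action sets differ within {set(cell)}")
        action_sets.append(sorted(available.pop()))
    return [dict(zip(info_sets, choice)) for choice in product(*action_sets)]


# Hypothetical example: the player moves at the root and at two later nodes.
chi = {"root": ("L", "R"), "after_A": ("l", "r"), "after_B": ("l", "r")}

imperfect = [frozenset({"root"}), frozenset({"after_A", "after_B"})]
perfect = [frozenset({h}) for h in chi]

print(len(pure_strategies(imperfect, chi)), len(pure_strategies(perfect, chi)))
```

With the two later nodes merged into one information set the player has $2 \times 2 = 4$ pure strategies; if the nodes were distinguishable there would be $2^3 = 8$.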

Another Example of Imperfect Information
----------

TODO: get tikzpicture in notebook; copy tree from Stockholm slides of MAD