Signaling Games
=======

Shane Steinert-Threlkeld  
https://www.shane.st  
S.N.M.Steinert-Threlkeld AT uva DOT nl  

In [1]:
%%HTML
<style type="text/css">
.rendered_html tbody tr td:first-child {
    border-right: 1px solid black;
}
    
.rendered_html table {
    font-size: 28px;
}
</style>

Last Time
-----

Low rationality game theory:
* Evolution:
    - stable strategies, replicator dynamics
* Learning:
    - reinforcement

Today
------

_Signaling games_: simple models for the emergence of communication
* evolution
* reinforcement

Afterwards: overview of the course final project, replicating a recent deep RL approach to signaling

A Deep Philosophical Issue
-------

Can language be a _convention_?

* Quine: NO! Language is required to establish conventions.

* Lewis: yes. Language can arise via coordination on Nash equilibria in _signaling games_.

    - for him: coordination via salience; later: evolution and learning

A Motivating Story
------

![](imgs/Paul_Revere_Ride.jpg)

Simplest Signaling Game
------

The simplest signaling game is an example of a _Bayesian game_, i.e. a game with random moves by a "third player" that we call "Nature".

The Sender has private information that she needs to communicate to a Receiver, who can act in the world and with whom she shares a common interest.

Simplest Signaling Game
------

1. Nature chooses one of the possible states $s \in S$ of the world, and informs Sender what state obtains.
2. The Sender chooses one of its messages $m \in M$ and sends it to the Receiver.
3. The Receiver sees $m$ (but _not_ $s$), and chooses an action $a \in A$.
4. Both players "win" if the Receiver chooses the "right" action.

| &#160; | $a_1$ | $a_2$ |
| ----- | ----- | ----- |
| $s_1$ | 1, 1 | 0, 0 |
| $s_2$ | 0, 0 | 1, 1 |
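
The four steps above can be sketched as a single round of play. This is a minimal illustration, not part of the original slides: the function name `play_round`, the list encoding of pure strategies, and the assumption that Nature picks states uniformly at random are all choices made here for the example.

```python
import numpy as np

# Payoff table from above: rows = states, columns = actions.
# Both players receive the same payoff, so one matrix suffices.
payoff = np.eye(2)

def play_round(sender_map, receiver_map, rng):
    """Play one round with pure strategies (hypothetical helper).

    sender_map[s] is the message sent in state s;
    receiver_map[m] is the action taken on message m.
    """
    state = rng.integers(len(sender_map))  # 1. Nature picks a state (assumed uniform)
    message = sender_map[state]            # 2. Sender sees the state, sends a message
    action = receiver_map[message]         # 3. Receiver sees only the message
    return payoff[state, action]           # 4. shared payoff: 1 if the action is "right"

rng = np.random.default_rng(0)
signaling = ([0, 1], [0, 1])  # a different message per state, decoded correctly
pooling = ([0, 0], [0, 0])    # same message in both states, same action always
print(np.mean([play_round(*signaling, rng) for _ in range(1000)]))
print(np.mean([play_round(*pooling, rng) for _ in range(1000)]))
```

The first pair succeeds in every round; the second can only be right in one of the two states, so it succeeds about half the time.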

2x2 Signaling Game in Extensive Form
------

![](imgs/signaling_extensive.png)

2x2 Signaling Game in Normal Form
-----

![](imgs/signaling_normal.png)

Nash Equilibria
-----

* $(\sigma_1, \rho_1)$
* $(\sigma_2, \rho_2)$

* All combinations of $\sigma_3, \sigma_4$ and $\rho_3, \rho_4$: these are called "pooling" or "babbling" equilibria
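
One way to check this list is to enumerate all pure strategy pairs in the 2x2 game and test each for profitable unilateral deviations. The sketch below assumes equiprobable states and encodes strategies as tuples; the indices it prints are its own and are not meant to match the $\sigma_i, \rho_i$ labels in the figure.

```python
import itertools
import numpy as np

payoff = np.eye(2)            # rows = states, columns = actions
prior = np.array([0.5, 0.5])  # assumption: equiprobable states

# A pure strategy is a tuple: sender[s] = message, receiver[m] = action.
strategies = list(itertools.product(range(2), repeat=2))

def expected_utility(sender, receiver):
    """Expected shared payoff of a pure sender/receiver pair."""
    return sum(prior[s] * payoff[s, receiver[sender[s]]] for s in range(2))

def is_nash(sender, receiver):
    """True if neither player gains by deviating to another pure strategy."""
    u = expected_utility(sender, receiver)
    sender_ok = all(expected_utility(s2, receiver) <= u for s2 in strategies)
    receiver_ok = all(expected_utility(sender, r2) <= u for r2 in strategies)
    return sender_ok and receiver_ok

nash = [(s, r) for s in strategies for r in strategies if is_nash(s, r)]
for s, r in nash:
    print("sender", s, "receiver", r, "payoff", expected_utility(s, r))
```

Under these assumptions the enumeration finds six pure equilibria: two with payoff 1 (the signaling systems) and four with payoff 0.5, in which the sender ignores the state and the receiver ignores the message.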

General Case: Three Types of Nash
------

![](imgs/signaling_pooling.png)

![](imgs/signaling_partialpool.png)

![](imgs/signaling_signaling.png)

ESS in Signaling Games
-----

To apply the concept of ESS, we have to _symmetrize_ the game.  Now strategies are _pairs_ $(\sigma, \rho)$ of a sender and receiver strategy.  Assuming agents are paired uniformly at random, in each role half the time, we have:

$$u((\sigma, \rho), (\sigma', \rho')) = \frac{1}{2}(u(\sigma, \rho') + u(\sigma', \rho))$$

In the basic game: only the _signaling systems_ (which are the strict Nash equilibria) are ESS.

**Theorem (Warneryd 1993).** $(\sigma, \rho)$ is an ESS if and only if it is a signaling system.
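
As a rough illustration of the theorem in the 2x2 case, the sketch below builds the symmetrized payoff from the formula above and tests the two ESS conditions for every pure strategy pair. It assumes equiprobable states and only checks against pure mutants, so it is a sanity check rather than a proof; the function names are made up for this example.

```python
import itertools
import numpy as np

payoff = np.eye(2)            # rows = states, columns = actions
prior = np.array([0.5, 0.5])  # assumption: equiprobable states
maps = list(itertools.product(range(2), repeat=2))
pairs = list(itertools.product(maps, maps))  # all (sigma, rho) pairs

def u_role(sigma, rho):
    """Expected payoff when sigma is used as sender and rho as receiver."""
    return sum(prior[s] * payoff[s, rho[sigma[s]]] for s in range(2))

def u_sym(x, y):
    """Symmetrized payoff to x against y: each role half the time."""
    (sigma, rho), (sigma2, rho2) = x, y
    return 0.5 * (u_role(sigma, rho2) + u_role(sigma2, rho))

def is_ess_vs_pure(x):
    """ESS conditions, checked only against pure mutants y != x."""
    for y in pairs:
        if y == x:
            continue
        strict = u_sym(x, x) > u_sym(y, x)
        stable = u_sym(x, x) == u_sym(y, x) and u_sym(x, y) > u_sym(y, y)
        if not (strict or stable):
            return False
    return True

ess = [x for x in pairs if is_ess_vs_pure(x)]
print(ess)
```

Of the sixteen pairs, only the two signaling systems pass. A pooling pair fails the second condition: a signaling-system mutant does equally well against the incumbent but strictly better against itself.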

Replicator Dynamic
-----

**Theorem (Huttegger 2007).** In the 2x2 signaling game with equiprobable states, the replicator dynamic converges to a signaling system from almost every initial condition (i.e. outside a set of measure 0).

**Theorem (Huttegger 2007, Pawlowitsch 2008).** When the number of states, messages, and actions is $n > 2$, the replicator dynamic can converge to partial pooling equilibria.
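
To get a feel for the 2x2 result, here is a minimal numerical sketch. It makes several assumptions that are not from the theorems themselves: a two-population version of the dynamic (one population of senders, one of receivers), populations over pure strategies only, equiprobable states, and a simple Euler discretization with a hypothetical step size `dt`.

```python
import itertools
import numpy as np

payoff = np.eye(2)            # rows = states, columns = actions
prior = np.array([0.5, 0.5])  # assumption: equiprobable states
maps = list(itertools.product(range(2), repeat=2))

# U[i, j]: expected shared payoff when sender strategy i meets receiver strategy j
U = np.array([[sum(prior[s] * payoff[s, r[m[s]]] for s in range(2))
               for r in maps] for m in maps])

def replicator(x, y, steps=5000, dt=0.1):
    """Euler steps of a two-population replicator dynamic.

    x, y are population shares over the four pure sender / receiver strategies.
    """
    for _ in range(steps):
        fx = U @ y      # fitness of each sender strategy
        fy = U.T @ x    # fitness of each receiver strategy
        mean = x @ fx   # average payoff (shared by both populations)
        x = x + dt * x * (fx - mean)
        y = y + dt * y * (fy - mean)
    return x, y

rng = np.random.default_rng(0)
x, y = replicator(rng.dirichlet(np.ones(4)), rng.dirichlet(np.ones(4)))
print(np.round(x, 3), np.round(y, 3), x @ U @ y)
```

From a random interior starting point, the shares should concentrate on one matching sender/receiver pair and the average payoff should approach 1. Extending `maps` and `payoff` to three states is one way to look for partial pooling outcomes numerically.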

Reinforcement Learning
--------

In [None]:
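import numpy as np

def belief_to_prob(beliefs, temp=None):
    """Turn nonnegative weights into probabilities (soft-max if temp is given)."""
    beliefs = np.asarray(beliefs, dtype=float)
    if temp:
        # subtract the max before exponentiating, for numerical stability
        beliefs = np.exp((beliefs - np.max(beliefs)) / temp)
    return beliefs / np.sum(beliefs)

def choose(beliefs, temp=None):
    """Sample an index with probability proportional to its (transformed) weight."""
    return np.random.choice(len(beliefs), p=belief_to_prob(beliefs, temp=temp))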

In [None]:
N = 2  # number of states, messages, and actions
sender = np.ones((N, N))  # rows = states, cols = messages
receiver = np.ones((N, N))  # rows = messages, cols = actions

utility = np.eye(N)

for _ in range(2000):
    state = np.random.randint(N)  # get state
    message = choose(sender[state])  # get message
    action = choose(receiver[message])  # get act
    reward = utility[state, action]  # get reward (rows = states, cols = actions)
    sender[state, message] += reward  # reinforce sender
    receiver[message, action] += reward  # reinforce receiver
    
print(sender)
print(receiver)

Reinforcement Learning
-----

In the 2x2 game, this simple form of RL (sometimes called Roth-Erev) _provably converges_ to a signaling system with probability 1, and to each of the two signaling systems with probability 0.5.

This guarantee does not hold once $N > 2$.
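
A small experiment can make the contrast visible. The sketch below wraps the same learning rule in a function and measures the expected success rate of the learned propensities across several runs. The helper names `run_roth_erev` and `expected_success`, the run counts, and the step budget are choices made here for illustration, and finite runs only approximate the limiting behavior.

```python
import numpy as np

def run_roth_erev(n, steps, rng):
    """Run basic Roth-Erev reinforcement in an n-state signaling game."""
    sender = np.ones((n, n))    # rows = states, cols = messages
    receiver = np.ones((n, n))  # rows = messages, cols = actions
    for _ in range(steps):
        state = rng.integers(n)  # uniform states (assumption)
        message = rng.choice(n, p=sender[state] / sender[state].sum())
        action = rng.choice(n, p=receiver[message] / receiver[message].sum())
        reward = float(action == state)  # identity payoff, as in np.eye(n)
        sender[state, message] += reward
        receiver[message, action] += reward
    return sender, receiver

def expected_success(sender, receiver):
    """Probability that the action matches the state under current propensities."""
    p_msg = sender / sender.sum(axis=1, keepdims=True)      # P(message | state)
    p_act = receiver / receiver.sum(axis=1, keepdims=True)  # P(action | message)
    # (p_msg @ p_act)[s, a] = P(action a | state s); success means a == s
    return np.trace(p_msg @ p_act) / len(sender)

rng = np.random.default_rng(0)
results = {}
for n in (2, 3):
    results[n] = [expected_success(*run_roth_erev(n, 2000, rng)) for _ in range(10)]
    print(n, np.round(np.mean(results[n]), 3), np.round(np.min(results[n]), 3))
```

With $N = 2$ the runs should all head toward success rates near 1. With $N = 3$, some runs may stall near 2/3, which is the signature of a partial pooling outcome.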

Richer Settings for Signaling
------

* More senders, more receivers, more states

![](imgs/signaling_partialcommon.png)

![](imgs/Zollman2005.png)

Zollman 2005, "Talking to Neighbors"

![](imgs/voronoi.png)

![](imgs/OConnor2014.png)

O'Connor 2014, "The Evolution of Vagueness"

In [None]:
%%HTML

<iframe width="560" height="315" src="https://www.youtube.com/embed/liVFy7ZO4OA?start=57" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

<p>Mordatch and Abbeel 2018, "Emergence of Grounded Compositional Language in Multi-Agent Populations"</p>

Wrapping Up
-------

Signaling games provide tractable models to explore the emergence of communication.

In simple cases, we can prove results about the resulting systems.

There is growing interest within AI in the learning dynamics of signaling games in richer environments (see, e.g., the [2017](https://sites.google.com/site/emecom2017/) and [2018](https://sites.google.com/site/emecom2018/home?authuser=0) Emergent Communication Workshops at NeurIPS).

You'll get some hands-on experience with this kind of work in your final project!