# Reinforcement learning creates categories

## Consumer choice results in elite polarization

## Hypothesis

Consider a network with two node types, one that communicates and one that votes, with payoffs that depend on observed coherence and vote accumulation, respectively. Agents must predict what agents of the opposite type will do based on past behavior. The update equations for _citizens_, the agents who vote, are

$$
\begin{aligned}
c_{i,t+1} &\leftarrow c_{i,t} + \delta^c u^c_{i,t} \\
u^c_{i,t} &= \sum_{j} \frac{c_{i,t}^{T}c_{j,t}}{\lVert c_{i,t} \rVert_2 \lVert c_{j,t} \rVert_2} \left\lvert P(c_{j,t}) - P(c_{i,t}) \right\rvert
\end{aligned}
$$

where $\delta^c$ is the citizen learning rate. The citizen payoff that appears in the update equation depends on the signal produced by the _elite_ nodes, among whom the citizens distribute their votes and who communicate with citizens with the goal of earning those votes. The citizens gain if elites "speak their language"

$$
P(c_{i,t}) = A \sum_j \frac{c_{i,t}^{T}\epsilon_{j,t}}{\lVert c_{i,t} \rVert_2 \lVert \epsilon_{j,t} \rVert_2}
$$

where we'll set $A=1$ to start, though its value could make a difference.
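To make the citizen side concrete, here is a minimal NumPy sketch of the payoff $P(c_{i,t})$ and the update signal $u^c_{i,t}$ as written above. The function names, the row-per-agent array layout, and the toy vectors are assumptions for illustration, not part of the model definition. The sketch stops at the scalar signal and does not apply the update step itself.

```python
import numpy as np

def cosine_matrix(X, Y):
    """Pairwise cosine similarity between the rows of X and the rows of Y."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

def citizen_payoffs(C, E, A=1.0):
    """P(c_i) = A * sum_j cos(c_i, eps_j): one payoff per citizen."""
    return A * cosine_matrix(C, E).sum(axis=1)

def citizen_update_signal(C, E, A=1.0):
    """u^c_i = sum_j cos(c_i, c_j) * |P(c_j) - P(c_i)|."""
    P = citizen_payoffs(C, E, A)
    gap = np.abs(P[None, :] - P[:, None])  # gap[i, j] = |P(c_j) - P(c_i)|
    return (cosine_matrix(C, C) * gap).sum(axis=1)

# Hypothetical toy data: 3 citizens and 2 elites in a 2-d concept space.
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
E = np.array([[1.0, 0.0], [0.0, 2.0]])

P = citizen_payoffs(C, E)        # payoff per citizen
u = citizen_update_signal(C, E)  # update signal per citizen
print(P, u)
```

In this toy case the third citizen overlaps with both elites and so earns the largest payoff; the first two citizens have equal payoffs, so their signals come only from their similarity to the third.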

The conceptual language spoken by the elite node $i$ at time $t$ is $\epsilon_{i,t}$. This conceptual system has its own update equations, based on the fraction of income $B$ that each citizen distributes to each elite.

$$
\begin{aligned}
\epsilon_{i,t+1} &\leftarrow \epsilon_{i,t} + \delta^\epsilon u^\epsilon_{i,t} \\
u^\epsilon_{i,t} &= \sum_{j} \frac{\epsilon_{i,t}^T\epsilon_{j,t}}{\lVert \epsilon_{i,t} \rVert_2 \lVert \epsilon_{j,t}\rVert_2} \left\lvert P(\epsilon_{j,t}) - P(\epsilon_{i,t}) \right\rvert
\end{aligned}
$$

The payoff to a given elite node $i$ at a particular timestep is the sum of vote fractions received from all citizen nodes

$$
P(\epsilon_{i,t}) = \sum_j f_{ji,t} 
$$

Here $B$ is the total benefit each citizen can give away to elite nodes, which we set to one for simplicity to start, but other values should be explored. To close the loop, the fraction of "votes" that citizen node $i$ gives to elite node $j$ is

$$
f_{ij,t} = \frac{e^{\epsilon_{j,t}^T c_{i,t}}}{\sum_k e^{\epsilon_{k,t}^T c_{i,t}}}
$$

Note that $0 \leq f_{ij,t} \leq B$; with $B=1$, the fractions each citizen gives away sum to one. It is not clear that the value of $B$ matters, but it could be interesting to test.
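The elite side can be sketched the same way: vote fractions as a softmax over elites, elite payoffs as column sums, and the elite update signal $u^\epsilon_{i,t}$. The names below, the `F[citizen, elite]` layout, and the toy vectors are assumptions for illustration. Scaling the softmax by `B` is also an assumption; with `B=1` it reduces to the expression for $f_{ij,t}$ above.

```python
import numpy as np

def vote_fractions(C, E, B=1.0):
    """F[citizen, elite]: softmax over elites of the dot products eps^T c."""
    logits = C @ E.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    W = np.exp(logits)
    # Assumption: the budget B simply rescales the softmax shares.
    return B * W / W.sum(axis=1, keepdims=True)

def elite_payoffs(F):
    """Payoff per elite: vote fractions summed over all citizens."""
    return F.sum(axis=0)

def elite_update_signal(E, P_eps):
    """u^eps_i = sum_j cos(eps_i, eps_j) * |P(eps_j) - P(eps_i)|."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    gap = np.abs(P_eps[None, :] - P_eps[:, None])
    return ((En @ En.T) * gap).sum(axis=1)

# Hypothetical toy data: 3 citizens and 2 elites in a 2-d concept space.
C = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
E = np.array([[1.0, 0.0], [1.0, 1.0]])

F = vote_fractions(C, E)
P_eps = elite_payoffs(F)
u_eps = elite_update_signal(E, P_eps)
print(F, P_eps, u_eps)
```

Each row of `F` sums to `B`, so the elite payoffs always add up to `B` times the number of citizens; the elites compete over a fixed total.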

The payoffs...

## Based on...

Iterated learning and the evolution of language; see, e.g., Simon & Kirby (2008), Kalish & Griffiths (2007), and Simon (2017) for reviews. The aim is to get a dynamical model of language, and to see how, to take one of many example hypotheses about the source of polarization, ...

## Data analysis

After the model converges, I want to do k-NN clustering to identify groups sharing categories, like Lakoff's "strict father" and "nurturant parent" stereotypes. Although Lakoff admits that the conceptual relationships among the members of each group differ, he asserts that there is a set of features that is maximally common among the members while also being maximally distinct from non-members.
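As a rough sketch of what the grouping step could look like, the block below clusters a set of converged citizen vectors. Note the swap: it uses plain k-means (Lloyd's algorithm) as a stand-in, because it needs no labels, and it is not the k-NN procedure named above. The function name, the toy vectors, the choice of two clusters, and the unit-length scaling are all assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if empty.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical converged citizen vectors: two loose groups of three.
C_final = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, -0.1],
                    [0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]])
# Assumption: scale to unit length so Euclidean distance tracks cosine similarity.
C_unit = C_final / np.linalg.norm(C_final, axis=1, keepdims=True)

labels, centroids = kmeans(C_unit, k=2)
print(labels)
```

The centroid of each cluster is one candidate for the "maximally common" feature set of a group, and the distance between centroids is one crude measure of how distinct the groups are.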

```python
# matrix dot product testing
import numpy as np
a = np.array([[1, 1], [2, 3]])
b = np.array([[0, 1], [0, 1]])
# Frobenius norm of the matrix product a.dot(b) = [[0, 2], [0, 5]]
np.linalg.norm(a.dot(b))
```

```
5.3851648071345037
```