# Iterative Prisoner's Dilemma


### Description

The [Prisoner's Dilemma](https://en.wikipedia.org/wiki/Prisoner%27s_dilemma) (PD) is a classical game analyzed in game theory and widely used to (attempt to) model social and economic interaction. It is a "dilemma" because, if used to explain the emergence of altruism in human (or, more generally, animal) societies, it fails badly at first glance: individually rational play leads both players to an outcome that is worse for each of them than mutual cooperation.

The classical representation of the PD involves two prisoners whose sentences depend on whether they cooperate with each other. It is easier to understand, though, when illustrated as a trading game (the closed-bag exchange):

*Two people meet and exchange closed bags, with the understanding that one bag contains money and the other contains a purchase. Either player can choose to honor the deal by putting into his or her bag what was agreed, or to defect by handing over an empty bag.*

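In a single exchange, the winning strategy for both players is NOT to cooperate: if the partner honors the deal, defecting gets the goods for free; if the partner defects, defecting at least avoids giving something for nothing.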

Things change when the interaction between the two individuals is iterated: in that case a more altruistic attitude (strategy) is expected to emerge. The goal of this project is to test this hypothesis.

Mathematically, the PD can be expressed with very basic linear algebra. The key component is the **payoff matrix** $M$, which quantifies the reward each player gets depending on whether she cooperated or defected:

$$
M = 
\begin{pmatrix} 
R & S \\
T & P 
\end{pmatrix}
$$

with $T,R,S,P$ integers that satisfy the following conditions:

$$
T>R>P>S; \quad 2R > T+S
$$

for example $T=3$, $R=2$, $P=1$, $S=0$, or $T=5$, $R=3$, $P=2$, $S=0$. Each player's choice (move) can be represented by one of the two axes of ${\rm I\!R}^2$, i.e. $u_C=\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ or $u_D=\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, where the first coordinate stands for *Cooperate* and the second for *Defect* (so the rows of $M$correspond to the player's own move and the columns to the opponent's). Given the two players' moves $u_1$ and $u_2$, their rewards $r_1$ and $r_2$ are computed as:

$$
r_1 = u_1^T M u_2
\quad
\quad
r_2 = u_2^T M u_1
$$
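As a quick numerical check of the formulas above, the sketch below builds $M$ with the example values $T=5$, $R=3$, $P=2$, $S=0$ and evaluates $r_1$ and $r_2$ for all four move combinations with NumPy. The helper name `rewards` is ours, not part of the project's modules.

```python
import numpy as np

# Example payoffs from the text: T > R > P > S and 2R > T + S
T, R, P, S = 5, 3, 2, 0
M = np.array([[R, S],
              [T, P]])

# Moves as unit vectors: first coordinate = Cooperate, second = Defect
u_C = np.array([1, 0])
u_D = np.array([0, 1])

def rewards(u1, u2):
    """Return (r_1, r_2) with r_1 = u_1^T M u_2 and r_2 = u_2^T M u_1."""
    return int(u1 @ M @ u2), int(u2 @ M @ u1)

moves = {"C": u_C, "D": u_D}
for name1, u1 in moves.items():
    for name2, u2 in moves.items():
        print(name1, name2, rewards(u1, u2))
```

Reading the results row by row reproduces the one-shot argument: against a cooperator, defecting earns $T=5$ instead of $R=3$; against a defector, it earns $P=2$ instead of $S=0$; yet mutual defection leaves each player with $P=2 < R=3$.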

In an Iterative Prisoner's Dilemma (IPD), two players play the prisoner's dilemma repeatedly, remembering their opponent's previous actions and adapting their strategy accordingly. The winning strategy is the one that yields the larger cumulative reward at the end of the IPD.

A strategy can be represented as a function that outputs either $u_C$ or $u_D$. Such a function can depend on the opponent's history of moves, on the player's own history of moves, on the number of moves played so far, and so on; it can also be stochastic, i.e. draw its output from a probability distribution. Possible strategies are:

* **Nice guy**: always cooperate (the function's output is always $u_C$)
* **Bad guy**: always defect (the function's output is always $u_D$)
* **Mainly nice**: randomly defect $k\%$ of the time and cooperate $(100-k)\%$ of the time, with $k<50$
* **Mainly bad**: randomly defect $k\%$ of the time and cooperate $(100-k)\%$ of the time, with $k>50$
* **tit-for-tat**: start by cooperating, then repeat what the opponent has done in the previous move 
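The strategies listed above can be sketched as plain Python functions together with a match loop. This is a minimal illustration, not the implementation in `it_pris_dil_func`: moves are encoded as the strings `"C"`/`"D"` rather than the vectors $u_C$/$u_D$, payoffs use the example values $(T,R,P,S)=(5,3,2,0)$, and the names `play_ipd` and `mainly` are hypothetical.

```python
import random

# Row player's reward for (my_move, their_move), i.e. u_1^T M u_2 with
# (T, R, P, S) = (5, 3, 2, 0)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}

# Each strategy maps (own history, opponent history, rng) -> "C" or "D".
def nice(mine, theirs, rng):
    return "C"

def bad(mine, theirs, rng):
    return "D"

def mainly(k):
    """Defect with probability k/100: k < 50 is 'mainly nice', k > 50 'mainly bad'."""
    def strategy(mine, theirs, rng):
        return "D" if rng.random() < k / 100 else "C"
    return strategy

def tit_for_tat(mine, theirs, rng):
    # Cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else "C"

def play_ipd(strat1, strat2, n_rounds=50, seed=0):
    """Play n_rounds of PD and return the two cumulative scores."""
    rng = random.Random(seed)
    h1, h2 = [], []
    score1 = score2 = 0
    for _ in range(n_rounds):
        m1 = strat1(h1, h2, rng)
        m2 = strat2(h2, h1, rng)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2
```

For instance, `play_ipd(tit_for_tat, bad)` returns `(98, 103)`: tit-for-tat loses only the first round (0 against 5) and then earns $P=2$ per round, like its opponent, for the remaining 49 rounds.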

Many more and much more complex strategies can be implemented. The strategy can even change during the IPD.
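Two of the strategies used in the code below, `grim` and `pavlov`, show how behaviour can change during an IPD. The sketch assumes their usual textbook definitions (grim trigger: cooperate until the opponent defects once, then defect forever; Pavlov, or win-stay/lose-shift: keep the previous move after earning $R$ or $T$, switch after $S$ or $P$); whether `it_pris_dil_func` defines them identically is an assumption. `defect_once_at`, `always_defect` and `play` are hypothetical helpers.

```python
def grim(mine, theirs):
    # Grim trigger: cooperate until the opponent has defected at least once.
    return "D" if "D" in theirs else "C"

def pavlov(mine, theirs):
    # Win-stay, lose-shift. Staying after R (C,C) or T (D,C) and switching after
    # S (C,D) or P (D,D) is the same as "cooperate iff both last moves matched".
    if not mine:
        return "C"
    return "C" if mine[-1] == theirs[-1] else "D"

def defect_once_at(t):
    # Hypothetical probe: cooperate in every round except round t (0-based).
    def strategy(mine, theirs):
        return "D" if len(mine) == t else "C"
    return strategy

def always_defect(mine, theirs):
    return "D"

def play(strat1, strat2, n_rounds):
    """Return both move sequences as strings."""
    h1, h2 = [], []
    for _ in range(n_rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        h1.append(m1)
        h2.append(m2)
    return "".join(h1), "".join(h2)

print(play(grim, defect_once_at(2), 6))   # ('CCCDDD', 'CCDCCC')
print(play(pavlov, always_defect, 6))     # ('CDCDCD', 'DDDDDD')
```

A single defection switches `grim` to permanent defection, while `pavlov` keeps alternating against an unconditional defector because every mutual defection ($P$) triggers a switch back to cooperation.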


### Assignments

* Implement a simple IPD between two players adopting two given strategies. Study how the game evolves over the tournament when different strategies face each other, and study the overall outcome in the different configurations.
* Implement a multiple-player IPD (MPIPD) where several strategies play against each other in a round-robin scheme.
* Iterate the previous task (repeated MPIPD, rMPIPD), increasing the population adopting a given strategy depending on the results that strategy achieved in the previous iteration.
* (*difficult*) Implement an rMPIPD where strategies are allowed to mutate. The goal is to simulate the effect of genetic mutations and of natural selection. A parameter (gene) should encode an individual's attitude to cooperate; this gene can mutate randomly, and the corresponding phenotypes should compete in the MPIPD so that the fittest is determined.


In [1]:
import numpy as np
import numpy.random as npr
import pandas as pd
from IPython.display import display
import it_pris_dil_func as pris_dil
import graph_func as grf

In [None]:
#Point 1)

s = ['nice','bad','m_nice','m_bad','tit_tat',
    'random','grim','f_tit_tat','sus_tit_tat',
    'pavlov','reactive_nice','reactive_bad',
    'hard_joss','soft_joss']

#s = ['nice','bad','m_nice','m_bad','tit_tat']

result = np.zeros((len(s),len(s)))
for i in range(len(s)):
    for j in range(i,len(s)):
        p1, p2 = pris_dil.fight(s[i],s[j])
        result[i,j] = p1
        result[j,i] = p2

df = pd.DataFrame(result,dtype=int,columns=s,index=s)

# Display results: green = the row strategy outscored the column strategy,
# red = it was outscored, gray = tie
def colorcell(x):
    data = x.copy()
    data.loc[:,:] = ''
    for i in range(len(s)):
        for j in range(len(s)):
            if result[i,j] < result[j,i]: data.iloc[i,j] = 'color : red'
            elif result[i,j] > result[j,i]: data.iloc[i,j] = 'color : green'
            else: data.iloc[i,j] = 'color : gray'
    return data

# Optional: add a column with each strategy's total score
# total = np.sum(result, axis=1)
# df['result'] = total
# df = df.astype({'result': int})
df = df.style.apply(colorcell,axis=None)

display(df)


In [None]:
res1, res2 = pris_dil.fight('m_bad','tit_tat',N=50,graph=True)#,all_outcome=True)

In [None]:
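# note: p1 and p2 here are the scores left over from the last pairing of the
# Point 1 loop, not the outputs of the previous cell
grf.fight_grid(p1,p2,range=[2,30])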

In [None]:
#Point 2)
'''
s = ['nice','bad','m_nice','m_bad','tit_tat',
    'random','grim','f_tit_tat','sus_tit_tat',
    'pavlov','reactive_nice','reactive_bad',
    'hard_joss','soft_joss']

s = ['nice','m_nice','reactive_nice','soft_joss',
    'bad','m_bad','reactive_bad','hard_joss',
    'tit_tat','f_tit_tat','sus_tit_tat','grim']
s_good = ['nice','m_nice','reactive_nice','soft_joss']
s_bad = ['bad','m_bad','reactive_bad','hard_joss']
s_tit_tat = ['tit_tat','f_tit_tat','sus_tit_tat']'''
s = ['nice','bad','m_nice','m_bad','tit_tat',
    'random','grim','f_tit_tat','sus_tit_tat',
    'pavlov','reactive_nice','reactive_bad',
    'hard_joss','soft_joss']

# 15 players for each of the 14 strategies
h = pris_dil.h_build([15] * len(s))

unique, n_strategies = np.unique(h,return_counts=True)

for a,b in zip(unique,n_strategies): print(s[a],b)

unique, media, n_strat = pris_dil.round_robin(h,s,ord=True)

s_unique = [s[val] for val in unique]

grf.graph_bar(media,unique,n_strat,s)
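The round-robin step of Point 2 can be illustrated independently of `pris_dil.round_robin`, whose internals are not shown here. In this hypothetical sketch every pair of players meets once, each player's total score is accumulated, and the totals are averaged per strategy (the role `media`, Italian for "average", appears to play in the cell above). Only three deterministic strategies are included, with the example payoffs $(T,R,P,S)=(5,3,2,0)$; `round_robin_average` and `match` are names chosen for illustration.

```python
import itertools
from collections import defaultdict

# Row player's reward for (my_move, their_move), with (T, R, P, S) = (5, 3, 2, 0)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}

STRATEGIES = {
    "nice": lambda mine, theirs: "C",
    "bad": lambda mine, theirs: "D",
    "tit_tat": lambda mine, theirs: theirs[-1] if theirs else "C",
}

def match(strat1, strat2, n_rounds):
    """Cumulative scores of two strategies over n_rounds of PD."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(n_rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2

def round_robin_average(population, n_rounds=50):
    """population: one strategy name per player. Returns mean total score per strategy."""
    totals = [0] * len(population)
    for i, j in itertools.combinations(range(len(population)), 2):
        a, b = match(STRATEGIES[population[i]], STRATEGIES[population[j]], n_rounds)
        totals[i] += a
        totals[j] += b
    per_strategy = defaultdict(list)
    for name, total in zip(population, totals):
        per_strategy[name].append(total)
    return {name: sum(v) / len(v) for name, v in per_strategy.items()}

print(round_robin_average(["nice", "bad", "tit_tat"], n_rounds=10))
print(round_robin_average(["bad"] + ["tit_tat"] * 4))
```

With one player per strategy over 10 rounds, `bad` scores best (73 against 48 for tit-for-tat and 30 for `nice`); with four tit-for-tat players and one defector over 50 rounds, tit-for-tat averages 548 against 412, because tit-for-tat players earn $R$ from each other while the defector exploits each of them only once.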


In [None]:
#Point 3)

s = ['nice','bad','m_nice','m_bad','tit_tat',
    'random','grim','f_tit_tat','sus_tit_tat',
    'pavlov','reactive_nice','reactive_bad',
    'hard_joss','soft_joss']

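#s = ['nice','bad','m_nice','m_bad','tit_tat']
#s = ['reactive_nice','reactive_bad','pavlov','grim','hard_joss','soft_joss']

# This run only uses three strategies (overrides the full list above)
s = ['bad','m_bad','tit_tat']
# 30 players, each assigned a random strategy index
h = npr.randint(0,len(s),size=30)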
#h = pris_dil.h_build([10,10,10,10,10,10,10,10,10,10,10,10,10,10])

unique, n_strategies = np.unique(h,return_counts=True)

for a,b in zip(unique,n_strategies): print(s[a],b)

iterations = 20
n_ma3, val_ma3, iterations = pris_dil.tournament(h,'update_5',s,it=iterations,n_change=2)
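The population update used in Point 3 is selected by the string `'update_5'`, whose rule is not documented here. As a hypothetical stand-in, the sketch below applies a simple imitation rule: after each round robin, up to `n_change` players of the worst-scoring strategy switch to the best-scoring one. It is self-contained, with only the two deterministic strategies `bad` and `tit_tat` and payoffs $(T,R,P,S)=(5,3,2,0)$; `average_scores` and `update` are illustrative names.

```python
import itertools
from collections import Counter, defaultdict

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 2}
STRATEGIES = {
    "bad": lambda mine, theirs: "D",
    "tit_tat": lambda mine, theirs: theirs[-1] if theirs else "C",
}

def match(strat1, strat2, n_rounds):
    """Cumulative scores of two strategies over n_rounds of PD."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(n_rounds):
        m1, m2 = strat1(h1, h2), strat2(h2, h1)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2

def average_scores(population, n_rounds):
    """Round robin over all pairs; mean total score per strategy."""
    totals = [0] * len(population)
    for i, j in itertools.combinations(range(len(population)), 2):
        a, b = match(STRATEGIES[population[i]], STRATEGIES[population[j]], n_rounds)
        totals[i] += a
        totals[j] += b
    per_strategy = defaultdict(list)
    for name, total in zip(population, totals):
        per_strategy[name].append(total)
    return {name: sum(v) / len(v) for name, v in per_strategy.items()}

def update(population, n_change=2, n_rounds=50):
    """Hypothetical imitation rule: up to n_change players of the worst strategy adopt the best."""
    avg = average_scores(population, n_rounds)
    best, worst = max(avg, key=avg.get), min(avg, key=avg.get)
    new = list(population)
    if best != worst:
        losers = [i for i, name in enumerate(new) if name == worst][:n_change]
        for i in losers:
            new[i] = best
    return new

population = ["bad"] * 6 + ["tit_tat"] * 4
for generation in range(4):
    print(generation, Counter(population))
    population = update(population)
```

Starting from 6 defectors and 4 tit-for-tat players, the population in this sketch becomes all tit-for-tat within three generations, whereas `update(["bad"] * 9 + ["tit_tat"])` turns the lone tit-for-tat player into a defector: in this toy model the outcome depends on the initial mix.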

In [None]:
#grf.graph_average(h,val_ma3,s,iterations)
grf.graph_population(n_ma3,iterations,s)

In [None]:
grf.gif(s,n_ma3,'population_3')

In [None]:
#Point 4)

'''s = ['nice','bad','m_nice','m_bad','tit_tat',
    'random','grim','f_tit_tat','sus_tit_tat',
    'pavlov','reactive_nice','reactive_bad',
    'hard_joss','soft_joss']'''
s=['hard_joss','grim','pavlov','reactive_nice','reactive_bad','soft_joss']
    
start_s = len(s)
#n_players = 14
#h = np.zeros((2,np.sum(numbers)))
#h[0] = npr.randint(0,len(s),size=n_players)
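h = pris_dil.h_build([17,16,17,17,17,16],mutation = True)
# Caution: h[0] holds the strategy indices (see np.unique(h[0]) below), so this
# slice also writes 0.4 into h[0,0]; that player then shows up as a separate
# "hard_joss" entry in the printout
h[0:7,0] = 0.4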
unique, n_strategies = np.unique(h[0],return_counts=True)

for a,b in zip(unique,n_strategies): print(s[int(a)],b)

iterations = 100
n_ma4, val_ma4, iteration = pris_dil.tournament(h,'update_5',s,it=iterations,mutation_prob=0.05, n_change = 2)

hard_joss 16
hard_joss 1
grim 16
pavlov 17
reactive_nice 17
reactive_bad 17
soft_joss 16
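For Point 4, one way to make the "gene" of the assignment concrete, sketched here under our own assumptions and independent of `pris_dil.tournament`, is to let each individual cooperate with probability $p$ regardless of history. Its mixed move is then $x = p\,u_C + (1-p)\,u_D$, and the expected reward against an opponent playing $x'$ follows from the same bilinear form, $x^T M x'$. Each generation, the `n_change` least fit individuals copy the gene of the fittest, and every gene mutates with probability `mutation_prob` by Gaussian noise of width `sigma`; the parameter names echo the cell above, while the noise model and the helpers `fitness` and `generation` are hypothetical.

```python
import numpy as np

T, R, P, S = 5, 3, 2, 0
M = np.array([[R, S], [T, P]], dtype=float)

def fitness(genes):
    """Expected round-robin score of each individual; gene p = P(cooperate)."""
    X = np.column_stack([genes, 1.0 - genes])   # row i: x_i = p_i u_C + (1 - p_i) u_D
    pair = X @ M @ X.T                          # pair[i, j] = x_i^T M x_j
    return pair.sum(axis=1) - np.diag(pair)     # exclude self-play

def generation(genes, rng, n_change=2, mutation_prob=0.05, sigma=0.1):
    """Selection (worst copy the fittest) followed by random mutation."""
    order = np.argsort(fitness(genes))               # indices, least fit first
    new = genes.copy()
    new[order[:n_change]] = genes[order[-1]]         # worst n_change copy the fittest gene
    mutate = rng.random(new.size) < mutation_prob    # which genes mutate this generation
    new[mutate] += rng.normal(0.0, sigma, size=int(mutate.sum()))
    return np.clip(new, 0.0, 1.0)                    # keep probabilities in [0, 1]

rng = np.random.default_rng(0)
genes = rng.random(30)                 # 30 individuals with random initial genes
print("initial mean p:", genes.mean())
for _ in range(50):
    genes = generation(genes, rng)
print("final mean p:", genes.mean())
```

Because $x^T M x'$ is linear in $p$ with slope $q(R-T) + (1-q)(S-P) < 0$ for any opponent cooperation probability $q$, selection in this memoryless sketch always favours lower $p$, mirroring the one-shot argument.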


In [None]:
grf.graph_average(h,val_ma4,s,iteration,start_s=start_s)
grf.graph_population(n_ma4,iteration,s,start_s=start_s)