# Colonel Blotto Implementation
(based on the "Open-ended Learning in Symmetric Zero-sum Games" paper)

## 1. Introduction

**Colonel Blotto** is a resource allocation game that is often used as a model for electoral competition. Each of two players has a budget of $c$ coins, which they simultaneously distribute over a fixed number of areas. Area $a_i$ is won by the player who places more coins on $a_i$. The player who wins the most areas wins the game. Since Blotto is not differentiable, maximum a posteriori policy optimization (MPO) ("Maximum a Posteriori Policy Optimisation", Abdolmaleki et al., 2018) is used as the best-response oracle. MPO is an inference-based policy optimization algorithm; many other reinforcement learning algorithms could be used instead.

For this implementation,

- $c = 10$ coins per player
- $3$ areas, so $i \in \{1, 2, 3\}$
- $k = 1000$ games
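
With these settings the set of pure strategies is small enough to list in full. The following is a minimal sketch, not part of the original notebook, that enumerates every way of splitting the coins over the areas; the helper name `all_allocations` is an assumption made for illustration.

```python
from itertools import product

def all_allocations(coins=10, areas=3):
    """Return every way to split `coins` whole coins over `areas` areas."""
    allocations = []
    # choose the first areas-1 entries freely; the last entry takes the remainder
    for head in product(range(coins + 1), repeat=areas - 1):
        remainder = coins - sum(head)
        if remainder >= 0:
            allocations.append(head + (remainder,))
    return allocations

strategies = all_allocations()
print(len(strategies))  # 66
```

For 10 coins and 3 areas this gives 66 allocations, which matches the stars-and-bars count $\binom{12}{2}$.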

## 2. Code Implementation of the game

Importing all necessary packages

In [1]:
import numpy as np
from random import randint
from scipy.optimize import linprog

*blotto(A,B)*: Play a single colonel blotto battle between strategy A and B

- *Input:* strategy A and B (each strategy is a vector of 3 numbers that add up to 10)
- *Output:* 

    - 1 if A wins
    - -1 if B wins
    - 0 if tie

In [14]:
def blotto(A,B):
    """Play a single Colonel Blotto battle between players A and B.

    A and B are equal-length sequences of coin counts, one entry per area.
    Returns 1 if A wins, -1 if B wins, and 0 for a tie."""
    # both strategies must cover the same number of areas
    if len(A) != len(B):
        raise ValueError('fronts mismatch')

    battles = [0,0]  # areas won by A and by B

    for i in range(len(A)):
        if A[i] > B[i]: battles[0] += 1
        if A[i] < B[i]: battles[1] += 1

    if battles[0] > battles[1]:
        return 1
    if battles[0] < battles[1]:
        return -1
    return 0
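
The winning rule can also be written compactly. The sketch below is an illustrative restatement under the same rule, not a replacement for `blotto`; the name `blotto_outcome` is hypothetical. It also shows that swapping the players flips the sign of the result.

```python
def blotto_outcome(A, B):
    """Return 1 if A wins more areas, -1 if B does, 0 for a tie."""
    if len(A) != len(B):
        raise ValueError("both strategies must cover the same number of areas")
    # (a > b) - (a < b) is +1, -1 or 0 for each area; the sum is A's lead in areas won
    lead = sum((a > b) - (a < b) for a, b in zip(A, B))
    return (lead > 0) - (lead < 0)

print(blotto_outcome([2, 2, 6], [1, 5, 4]))  # 1
print(blotto_outcome([1, 5, 4], [2, 2, 6]))  # -1
print(blotto_outcome([2, 2, 6], [2, 6, 2]))  # 0
```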

*payoff_matrices(A_strategies, B_strategies)*:

*Input*: a set of strategies from A and a set from B

*Output*: payoff matrices for A and B, with one entry for every pairing of an A strategy with a B strategy

In [19]:
def payoff_matrices(A_strategies, B_strategies):
    '''Generate the payoff matrices for A and B over every pairing of their strategies.'''
    nA = len(A_strategies)
    nB = len(B_strategies)
    matrixA = np.zeros((nA,nB))
    matrixB_temp = np.zeros((nA,nB))
    for i in range(nA):
        for j in range(nB):
            #play each pairing once and record the result for both players
            result = blotto(A_strategies[i],B_strategies[j])
            if result == 1:
                matrixA[i][j] = 1
                matrixB_temp[i][j] = -1
            elif result == -1:
                matrixA[i][j] = -1
                matrixB_temp[i][j] = 1
    #transpose so that the rows of matrixB index B's strategies
    matrixB = matrixB_temp.T
    return matrixA, matrixB #as this is a zero-sum game, matrixA + matrixB.T = zero matrix

Example 1:

- Initialize 2 random strategies each for A and B.
- Generate the payoff matrices.

In [16]:
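def initialise_strategy(n = 10, size = 3):
    #initialise a random strategy: `size` non-negative whole numbers that add up to n
    strategy = np.zeros((size,))
    remaining = n
    for i in range(size - 1):
        #randint needs integer bounds, so track the remaining coins as an int
        strategy[i] = randint(0, remaining)
        remaining -= int(strategy[i])
    strategy[size - 1] = remaining
    return strategy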
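
Drawing the areas one after another, as above, does not give every allocation the same probability: the first area alone receives all 10 coins with probability 1/11, while the last area does so with probability 1/121. If a uniform draw over all allocations is wanted, one possible alternative is a stars-and-bars sampler. This is a sketch under that assumption, not the notebook's method; `uniform_strategy` and its `rng` parameter are hypothetical names.

```python
import random

def uniform_strategy(n=10, size=3, rng=random):
    """Sample an allocation of n coins over `size` areas, uniformly over all allocations."""
    # stars and bars: place size-1 bars among n+size-1 slots; the gaps are the coin counts
    bars = sorted(rng.sample(range(n + size - 1), size - 1))
    strategy = []
    previous = -1
    for bar in bars:
        strategy.append(bar - previous - 1)
        previous = bar
    # coins after the last bar
    strategy.append(n + size - 2 - previous)
    return strategy

print(uniform_strategy())  # a list of 3 non-negative integers summing to 10; varies between runs
```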

In [21]:
A_policies = [initialise_strategy(), initialise_strategy()]
B_policies = [initialise_strategy(), initialise_strategy()]
matrixA,matrixB = payoff_matrices(A_policies, B_policies)
print(A_policies)
print(B_policies)
print(matrixA)
print(matrixB)

[array([2., 2., 6.]), array([5., 5., 0.])]
[array([1., 5., 4.]), array([2., 6., 2.])]
[[ 1.  0.]
 [ 0. -1.]]
[[-1.  0.]
 [ 0.  1.]]
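
The printed matrices can be checked by hand. The sketch below recomputes them in plain Python for the four strategies shown in the output above and confirms the zero-sum relation; the helper `outcome` is a hypothetical stand-in for `blotto`.

```python
def outcome(A, B):
    """1 if A wins more areas than B, -1 if fewer, 0 for a tie."""
    lead = sum((a > b) - (a < b) for a, b in zip(A, B))
    return (lead > 0) - (lead < 0)

A_policies = [[2, 2, 6], [5, 5, 0]]
B_policies = [[1, 5, 4], [2, 6, 2]]

# A's payoffs: rows index A's strategies, columns index B's
matrixA = [[outcome(a, b) for b in B_policies] for a in A_policies]
# B's payoffs: rows index B's strategies, columns index A's
matrixB = [[outcome(b, a) for a in A_policies] for b in B_policies]

print(matrixA)  # [[1, 0], [0, -1]]
print(matrixB)  # [[-1, 0], [0, 1]]

# zero-sum check: matrixA[i][j] + matrixB[j][i] == 0 for every pairing
assert all(matrixA[i][j] + matrixB[j][i] == 0 for i in range(2) for j in range(2))
```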


-------------------

*Nash_eq(A)*:
- Input: antisymmetric payoff matrix A of a population
- Output: Nash equilibrium p, a probability vector over the population

In [27]:
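def Nash_eq(A):
    '''Input: antisymmetric payoff matrix A of a population
       Output: Nash equilibrium p (a probability vector over the population)'''
    n = A.shape[0]
    #rows of -A.T enforce A.T @ p >= 0; the last two rows together enforce sum(p) = 1
    A_ub = np.concatenate((-A.T,[np.ones(n,),-np.ones(n,)]), axis = 0)
    b_ub = np.append(np.zeros((n,)),[1,-1])
    #zero objective: the solver only has to find a feasible point
    soln = linprog(c = np.zeros((n,)), A_ub = A_ub, b_ub = b_ub, bounds = (0,1))
    return soln.x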
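
Read as a linear program, the `linprog` call above is a pure feasibility problem, since its objective vector is zero. Written out, with $n$ the size of the population:

$$
\text{find } p \in \mathbb{R}^n \quad \text{such that} \quad A^\top p \ge 0, \qquad \mathbf{1}^\top p = 1, \qquad 0 \le p_i \le 1 .
$$

The equality appears in the code as the pair of inequalities $\mathbf{1}^\top p \le 1$ and $-\mathbf{1}^\top p \le -1$. Assuming $A$ is antisymmetric, so that the value of the game is zero, the condition $A^\top p \ge 0$ says that the mixture $p$ earns a non-negative expected payoff against every member of the population.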

Rock-paper-scissors (RPS) example:

In [30]:
A = np.array([[0,1,-1],[-1,0,1],[1,-1,0]])
print(Nash_eq(A))

[0.33333333 0.33333333 0.33333333]
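
The result can be checked against the constraints directly. The following sketch uses exact fractions to verify that the uniform mixture satisfies them for the RPS matrix above, while a pure strategy does not; the helper name `is_equilibrium` is an assumption for illustration.

```python
from fractions import Fraction

def is_equilibrium(A, p):
    """Check p >= 0, sum(p) == 1 and A^T p >= 0 for a square payoff matrix A."""
    n = len(A)
    if any(x < 0 for x in p) or sum(p) != 1:
        return False
    # expected payoff of the mixture p against each pure strategy j
    return all(sum(p[i] * A[i][j] for i in range(n)) >= 0 for j in range(n))

A = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
third = Fraction(1, 3)
print(is_equilibrium(A, [third, third, third]))  # True
print(is_equilibrium(A, [1, 0, 0]))              # False
```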


-------------------