# Introduction
In a previous exercise we solved rock paper scissors (RPS) using counterfactual regret minimization. Now we will apply the principles learned from RPS to a more intriguing exercise.

We will be solving a toy instance of the more general [Colonel Blotto](https://en.wikipedia.org/wiki/Blotto_game) resource allocation problem. In this game, two military generals must each allocate S soldiers across N possible battlefields. Allocating more soldiers to a battlefield than one's opponent results in a conquest. Allocating an equal number of soldiers results in a tie. The player with the most conquests wins the round.

# Problem Statement
This problem was originally posed as an exercise in [An Introduction to Counterfactual Regret Minimization, Todd W. Neller and Marc Lanctot](http://modelai.gettysburg.edu/2013/cfr/cfr.pdf). I could not find an available solution to this problem online, so I took it upon myself to make one.

## Colonel Blotto Toy Game (S, N) = (5, 3)
Write a program that solves the Colonel Blotto problem for S = 5 soldiers and N = 3 battlefields.

## Mixed vs Pure Strategy
In the context of this game, a 'pure' strategy is a single allocation of soldiers, e.g. (3,1,1). A mixed strategy is a probability distribution over pure strategies, for example choosing (3,1,1) 30% of the time and (5,0,0) 70% of the time. We will begin by implementing a function that generates all possible pure strategies.

A valid allocation can be modeled as a linear equation in N variables whose sum is S:
$$
\sum_{i = 1}^{N} X_{i} = S
$$
In the case of this game, N = 3 and S = 5:
$$
\sum_{i = 1}^{3} X_{i} = 5
$$
where each $$X_{i}$$ is an integer and $$0 \leq X_{i} \leq 5$$.

Knowing this, we will write a function that generates all possible solutions to the equation under these constraints. Each solution will be a possible pure strategy.
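As a sanity check on how many pure strategies to expect, the number of non-negative integer solutions can be counted with the standard stars-and-bars argument. This count is not part of the original exercise text; it is only a quick worked check:
$$
\binom{S + N - 1}{N - 1} = \binom{5 + 3 - 1}{3 - 1} = \binom{7}{2} = 21
$$
So a correct generator for (S, N) = (5, 3) should return exactly 21 allocations.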

## Implementation
Note: We will be implementing this in a functional style rather than an object oriented one. This format works better when explaining math in conjunction with code.

In [84]:
import random

### Generate Strategies
Since our problem only contains 3 battlefields, we can straightforwardly generate all solutions to the equation using three nested loops.

In [85]:
#Generate all pure strategies for n = 3, S = 5
def generateStrategies():
    s = 5
    strats = []
    for x1 in range(s + 1):
        for x2 in range((s + 1) - x1):
            for x3 in range((s + 1) - (x1 + x2)):
                #Keep only allocations that use all s soldiers
                if (x1 + x2 + x3) == s:
                    strats.append([x1,x2,x3])
    return strats

Running the function will generate 21 possible unique allocations. 

In [86]:
generateStrategies()

[[0, 0, 5],
 [0, 1, 4],
 [0, 2, 3],
 [0, 3, 2],
 [0, 4, 1],
 [0, 5, 0],
 [1, 0, 4],
 [1, 1, 3],
 [1, 2, 2],
 [1, 3, 1],
 [1, 4, 0],
 [2, 0, 3],
 [2, 1, 2],
 [2, 2, 1],
 [2, 3, 0],
 [3, 0, 2],
 [3, 1, 1],
 [3, 2, 0],
 [4, 0, 1],
 [4, 1, 0],
 [5, 0, 0]]
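
The nested loops above are tied to N = 3. As a minimal sketch of how the same enumeration could be generalized to any (S, N), the recursive helper below fixes the first battlefield and recurses on the rest. The name `generateAllocations` is hypothetical and not part of the original notebook.

```python
def generateAllocations(soldiers, battlefields):
    """Return every way to split `soldiers` across `battlefields` (hypothetical helper)."""
    # Base case: one battlefield left, so it must take all remaining soldiers
    if battlefields == 1:
        return [[soldiers]]
    allocations = []
    for first in range(soldiers + 1):
        # Distribute whatever is left over the remaining battlefields
        for rest in generateAllocations(soldiers - first, battlefields - 1):
            allocations.append([first] + rest)
    return allocations

allocations = generateAllocations(5, 3)
print(len(allocations))                  # 21
print(allocations[0], allocations[-1])   # [0, 0, 5] [5, 0, 0]
```

For (5, 3) this should produce the same 21 allocations in the same order as the listing above.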

### Compute Utility
Computing utility for a round of Colonel Blotto is as simple as counting how many battlefields a player wins in that round. The arguments we pass to the function will be lists of length 3, each representing one player's allocation (a pure strategy).

In [87]:
#Compute utility for player 1: the number of battlefields player 1 wins
#Assume p1 and p2 are lists of length 3 (one allocation per player)
def getUtility(p1,p2):
    u = 0
    for i in range(len(p1)):
        if p1[i] > p2[i]:
            u+=1
    return u
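
The introduction says the player with the most conquests wins the round, whereas the utility above is the raw count of battlefields won. As a hedged sketch of how a win/tie/loss payoff could be derived from the same counting rule, the helpers below compare both players' counts. The names `countWins` and `roundOutcome` are hypothetical and are not used elsewhere in this notebook.

```python
def countWins(p1, p2):
    """Number of battlefields where p1 strictly outnumbers p2 (same rule as getUtility)."""
    return sum(1 for a, b in zip(p1, p2) if a > b)

def roundOutcome(p1, p2):
    """+1 if p1 takes more battlefields than p2, -1 if fewer, 0 if equal."""
    mine = countWins(p1, p2)
    theirs = countWins(p2, p1)
    return (mine > theirs) - (mine < theirs)

print(countWins([3, 1, 1], [5, 0, 0]))     # 2
print(roundOutcome([3, 1, 1], [5, 0, 0]))  # 1
print(roundOutcome([2, 2, 1], [2, 2, 1]))  # 0
```

Here (3,1,1) loses the first battlefield to (5,0,0) but wins the other two, so it takes the round.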

### Get Action According To Current Strategy
Next, we will want to create a function that randomly allocates soldiers to battlefields based on a given mixed strategy, which we will call a strategy profile. A strategy profile can be represented as a list of 21 real numbers $$X$$, where $$0 \leq X_{i} \leq 1$$. Each element of the list is the probability with which the corresponding allocation is chosen. As such, the elements of the list must sum to 1:
$$
\sum_{i = 1}^{21}X_{i} = 1
$$

Let us first make a function that generates a default strategy profile where each action is equally weighted.

In [88]:
#This function generates a default strategy where each action profile has equal weighting
def defaultStrat():
    actionProfiles = generateStrategies()
    strat = []
    for i in range(len(actionProfiles)):
        strat.append(1/len(actionProfiles))
    return strat

In [89]:
defaultStrat()

[0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616,
 0.047619047619047616]

Now we will write a function that randomly chooses an allocation given a strategy.

In [146]:
#[List of 21 probabilities] -> [X1,X2,X3]
def getAction(strategy):
    rand = random.random()
    actions = generateStrategies()
    cumulative = 0
    for i in range(len(strategy)):
        cumulative += strategy[i]
        #Return the first action whose cumulative probability exceeds rand
        if rand < cumulative:
            return actions[i]
    #Rounding can leave the total just under 1, so fall back to the last action
    return actions[-1]

In [147]:
getAction(defaultStrat())

[0, 2, 3]
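
As an alternative sketch, the standard library's `random.choices` accepts a `weights` argument and performs the same kind of weighted draw. The helper name `sampleAction` and the three-action toy strategy below are assumptions made for illustration only; the loop checks empirically that the sampled frequencies track the strategy.

```python
import random

def sampleAction(strategy, actions):
    """Draw one action with probability proportional to its weight in `strategy`."""
    # random.choices returns a list of samples (one by default), so take the first
    return random.choices(actions, weights=strategy)[0]

# Toy example: three allocations with hand-picked probabilities
actions = [[5, 0, 0], [3, 1, 1], [0, 0, 5]]
strategy = [0.7, 0.3, 0.0]

random.seed(0)
counts = {tuple(a): 0 for a in actions}
for _ in range(10000):
    counts[tuple(sampleAction(strategy, actions))] += 1
print(counts)  # roughly 70% / 30% / 0%
```

Passing the list of actions in explicitly also avoids regenerating all 21 allocations on every draw.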

### Get Regret Matched Strategy
We will now write a function that computes a regret-matched strategy from a regret sum and updates a strategy sum. The regret sum is a list of 21 elements where each element is the regret accumulated so far for one allocation, that is, how much better that allocation would have done than the allocations we actually played. The strategySum is the running sum of the normalized strategies from every iteration.

In [148]:
##Returns the adjusted strategy after an iteration
def getStrategy(regretSum,strategySum):
    actions = 21
    normalizingSum = 0
    strategy = [0] * actions
    #normalizingSum is the sum of positive regrets.
    #This ensures we do not 'over-adjust' our strategy
    for i in range(0,actions):
        if regretSum[i] > 0:
            strategy[i] = regretSum[i]
        else:
            strategy[i] = 0
        normalizingSum += strategy[i]
    ##This loop normalizes our updated strategy
    for i in range(0,actions):
        if normalizingSum > 0:
            strategy[i] = strategy[i]/normalizingSum
        else:
            #No positive regret: default to uniform (1/21 per action)
            strategy[i] = 1.0 / actions
        strategySum[i] += strategy[i]
    return (strategy,strategySum)
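
To make the regret matching step concrete, here is a small worked example on three actions instead of 21. The standalone helper `regretMatch` is hypothetical; it is meant to mirror only the clipping and normalization done above, without the strategy sum bookkeeping.

```python
def regretMatch(regretSum):
    """Turn accumulated regrets into a probability distribution over actions."""
    # Negative regrets are clipped to zero so those actions get no weight
    positive = [max(r, 0) for r in regretSum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret anywhere: fall back to the uniform strategy
    return [1.0 / len(regretSum)] * len(regretSum)

print(regretMatch([2, -1, 6]))   # [0.25, 0.0, 0.75]
print(regretMatch([-3, 0, -1]))  # uniform over the three actions
```

With regrets of 2, -1 and 6, the positive regrets sum to 8, so the actions are played with probabilities 2/8, 0 and 6/8.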

### Training Algorithm
We will write a training algorithm that allows both agents to converge to a game-theory-optimal solution, also known as a [Nash equilibrium](https://en.wikipedia.org/wiki/Nash_equilibrium).

In [None]:
def train(iterations,regretSum,oppStrategy):
    actionProfiles = generateStrategies()
    actions = len(actionProfiles)
    strategySum = [0] * actions
    for _ in range(0,iterations):
        ##Retrieve Actions
        t = getStrategy(regretSum,strategySum)
        strategy = t[0]
        strategySum = t[1]
        myAction = getAction(strategy)
        #Sample the opponent's allocation from an arbitrary opponent strategy
        otherAction = getAction(oppStrategy)
        #Utility each of our allocations would have earned against otherAction
        actionUtility = [getUtility(a,otherAction) for a in actionProfiles]
        #Index of the allocation we actually played
        myIndex = actionProfiles.index(myAction)
        #Add the regrets from this decision
        for a in range(0,actions):
            regretSum[a] += actionUtility[a] - actionUtility[myIndex]
    return strategySum
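
Note that `train` returns the raw strategy sum rather than a probability distribution. In regret matching it is the average strategy over all iterations, not the final one, that is expected to approach equilibrium, so a normalization step is usually applied afterwards. The helper below is a minimal sketch of that step; the name `getAverageStrategy` is an assumption and does not appear in the code above.

```python
def getAverageStrategy(strategySum):
    """Normalize an accumulated strategy sum into a probability distribution."""
    total = sum(strategySum)
    n = len(strategySum)
    if total > 0:
        return [s / total for s in strategySum]
    # Nothing accumulated yet: fall back to the uniform strategy
    return [1.0 / n] * n

print(getAverageStrategy([10, 30, 0, 0]))  # [0.25, 0.75, 0.0, 0.0]
```

Applied to the list returned by `train`, this would give the frequency with which each of the 21 allocations was played on average.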