In [389]:
# Import packages needed to run code
import sys
import importlib
import plotly
import random
from statistics import mean, stdev

sys.path.append('../source') # module path
import pig
import visualisation
importlib.reload(pig) # avoid restarting kernel in each module modification
importlib.reload(visualisation)

# Import specific class and functions
from pig import Pig
from visualisation import plot_pig_policy, plot_reachable_states

Find optimal policy using value iteration

In [2]:
result_pig = Pig(T=100)
result_pig.value_iteration(tol=1e-6)

Define hold at twenty and optimal policy functions

In [376]:
def hold_at_twenty(i, j, k):
    if k < 20:
        return 'roll'
    else:
        return 'hold'

In [308]:
def optimal_policy(i, j, k):
    return result_pig.policy[i, j, k]

Pig game and tournament functions. The tournament compares the optimal policy with itself and with the hold at twenty policy, varying which policy goes first. It simulates $n$ games for each of three matchups: optimal vs optimal, optimal vs hold at twenty where optimal goes first, and optimal vs hold at twenty where hold at twenty goes first. The proportion of games won by the policy that goes first is then calculated and used to estimate the probability that it wins.

In [346]:
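def game(strats):
    """Play one game of Pig and return the index (0 or 1) of the winning policy."""
    scores = [0, 0]
    while max(scores) < 100:
        for i in range(2):
            round_score = 0
            roll = 0
            policy = strats[i]
            # Keep rolling while the policy says 'roll' and no 1 has been rolled
            while policy(scores[i], scores[1-i], round_score) == 'roll' and roll != 1:
                roll = random.randint(1, 6)
                # Rolling a 1 forfeits the points accumulated this turn
                round_score = 0 if roll == 1 else round_score + roll
                if scores[i] + round_score >= 100 and roll != 1:
                    return i
            # Bank the turn total (0 if the turn ended on a 1)
            scores[i] = scores[i] + round_score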

In [377]:
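def tournament(n):
    """Return the first player's win proportion over n games for each matchup."""
    # game() returns the winner's index, so 1 - game(...) is 1 when the first player wins
    opt_v_opt = [1 - game([optimal_policy, optimal_policy]) for _ in range(n)]
    opt_v_hold = [1 - game([optimal_policy, hold_at_twenty]) for _ in range(n)]
    hold_v_opt = [1 - game([hold_at_twenty, optimal_policy]) for _ in range(n)]
    return [mean(opt_v_opt), mean(opt_v_hold), mean(hold_v_opt)]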

Simulate 10000 games in each scenario to get point estimates of the probabilities. The paper gives $[0.5306, 0.5874, 0.4776]$ for the probabilities that, when going first, optimal beats optimal, optimal beats hold at twenty, and hold at twenty beats optimal, respectively.

In [383]:
PI_simulation = tournament(10000)

In [384]:
PI_simulation

[0.5302, 0.5665, 0.4943]

So the point estimate for optimal vs optimal closely matches that in the paper, while the simulated estimates of the optimal policy beating the hold at twenty policy are lower than the paper's regardless of which policy goes first: $0.5665$ against $0.5874$ when optimal goes first, and hold at twenty wins with proportion $0.4943$ against $0.4776$ when it goes first. The method used in the paper is unclear, as simulation is not mentioned and they describe doing 'the same technique' as before. Explore this further by calculating 95% confidence intervals for each probability: compute point estimates from $1000$ games, repeat this $100$ times, and build each interval from the mean and standard error of the $100$ estimates.
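As a rough guide to how large a gap between a simulated proportion and the paper's value could be put down to sampling noise, the sketch below computes a binomial standard error for each point estimate. It assumes the games are independent and that each win indicator is a Bernoulli draw; `proportion_se` is a hypothetical helper that is not part of the notebook's modules, and the numbers are copied from the cells above.

```python
from math import sqrt

def proportion_se(p_hat, n):
    """Standard error of a sample proportion under a binomial model (assumed here)."""
    return sqrt(p_hat * (1 - p_hat) / n)

n_games = 10000
simulated = [0.5302, 0.5665, 0.4943]  # point estimates printed above
paper = [0.5306, 0.5874, 0.4776]      # values quoted from the paper

for sim, ref in zip(simulated, paper):
    se = proportion_se(sim, n_games)
    # Difference from the paper's value, expressed in standard errors
    z_score = (sim - ref) / se
    print(f"simulated={sim:.4f} paper={ref:.4f} se={se:.4f} z={z_score:+.2f}")
```

With $10000$ games the standard error is close to $0.005$ for proportions near one half, so a difference of a few thousandths is unremarkable while a difference of a few hundredths is not.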

In [385]:
N = 100  # number of repeated tournaments
CI_simulation = [tournament(1000) for _ in range(N)]

In [388]:
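z = 1.96  # normal quantile for a 95% confidence interval
# Mean and standard deviation of the N point estimates for each of the three matchups
means = [mean([CI_simulation[i][j] for i in range(N)]) for j in range(3)]
sds = [stdev([CI_simulation[i][j] for i in range(N)]) for j in range(3)]
# Interval: mean -/+ z * standard error, with standard error sd / sqrt(N)
CIs = [[means[i]+u*z*sds[i]/N**(1/2) for u in [-1, 1]] for i in range(3)]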

In [387]:
CIs

[[0.5292358523338704, 0.5347041476661297],
 [0.5785143529153822, 0.5846056470846177],
 [0.47507128551777505, 0.48154871448222497]]

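So the paper's probabilities for optimal vs optimal ($0.5306$) and for hold at twenty going first against optimal ($0.4776$) fall inside their intervals, while the paper's probability for optimal going first against hold at twenty ($0.5874$) lies just above the upper bound of its interval.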
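A quick way to compare the paper's values with the intervals printed above is a direct membership check. This is a minimal sketch: the numbers are copied from this notebook, and `in_interval` and `labels` are hypothetical names introduced only for illustration.

```python
paper = [0.5306, 0.5874, 0.4776]  # values quoted from the paper
intervals = [
    [0.5292358523338704, 0.5347041476661297],
    [0.5785143529153822, 0.5846056470846177],
    [0.47507128551777505, 0.48154871448222497],
]
labels = [
    "optimal vs optimal",
    "optimal first vs hold at twenty",
    "hold at twenty first vs optimal",
]

def in_interval(value, interval):
    """Return True if value lies within the closed interval [low, high]."""
    low, high = interval
    return low <= value <= high

inside = [in_interval(p, ci) for p, ci in zip(paper, intervals)]
for label, p, flag in zip(labels, paper, inside):
    print(f"{label}: paper value {p} inside interval: {flag}")
```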