This is a global analysis to determine important bounds and metrics for the Santa 2020 contest. It is not a discussion of how to build the best AI for it.

In [None]:
# package loading
import sys
import numpy as np
import matplotlib.pyplot as pl
from scipy.stats import linregress

np.random.seed(714)

In [None]:
# main dimensions

NB_SLOT = 100
DECAY = 0.97
NB_TURN = 2000

# Initial probabilities

Initially, each of the 100 machines has a random probability of delivering a candy cane. Let $P_i$ denote the initial delivery probability of the $i^{th}$ machine. The machine probabilities are independent random variables drawn uniformly from **[0, 1]**.
So every machine has an expected $P_i = 0.5$ with standard deviation $\sigma = \sqrt{\frac{1}{12}} = 0.289$.

* As the $P_i$ are independent, the average over the 100 machines, $P_{mean} = mean(P_i)$, has an expected value of $0.5$ with a standard deviation of $\sigma_{100} = \frac{\sigma}{\sqrt{100}} = 0.029$ (have a look at the shapes of the [Bates distribution](https://en.wikipedia.org/wiki/Bates_distribution)).
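
As a quick sanity check of these figures, the following sketch draws many sets of 100 uniform probabilities and measures the spread of their average. The number of simulated draws (`n_games`) and the local generator `rng` are illustrative choices, not part of the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)  # local generator, seed chosen arbitrarily
n_games = 20000                 # number of simulated initial draws (assumption)
n_slot = 100                    # machines per game, as in the contest

# one row per simulated game, one column per machine
p = rng.random((n_games, n_slot))
game_means = p.mean(axis=1)

print("single machine     : mean %.3f std %.3f" % (p.mean(), p.std()))
print("100-machine average: mean %.3f std %.3f" % (game_means.mean(), game_means.std()))
print("theoretical std of the average: %.3f" % np.sqrt(1.0 / 12.0 / n_slot))
```

The simulated standard deviation of the average should land close to $\sqrt{\frac{1}{12 \times 100}}$.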

# Decay process

In [None]:
# compute cumulative decaying
decaying = np.power(DECAY, np.arange(NB_TURN*2))
decaying_cumsum = np.cumsum(decaying)




fig = pl.figure()
ax = fig.gca()
ax.grid()
ax.plot(decaying[:200])
ax.set_ylabel("decay")
ax2 = ax.twinx()
ax2.plot(decaying_cumsum[:200], color="r")
ax2.axhline(33.333333, color="g")
ax2.set_ylabel("decay cumsum")



The convergence of the cumulative decay sum gives us an interesting piece of information: the expected total number of candy canes a machine can deliver is $33.333 P_i$. It can be calculated analytically with the geometric series: $\sum_{k=0}^{\infty} 0.97^k = \frac{1}{1-0.97} = 33.333$.
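
The limit can also be checked numerically. The short sketch below compares partial sums of the decay series with the closed form $\frac{1}{1-0.97}$; the variable names are illustrative.

```python
import numpy as np

decay = 0.97
# decay factor applied after k pulls, for k = 0 .. 999
factors = decay ** np.arange(1000)
partial_sums = np.cumsum(factors)
limit = 1.0 / (1.0 - decay)  # closed form of the geometric series

for n in (10, 50, 100, 200, 1000):
    print("after %4d pulls: %.3f (limit %.3f)" % (n, partial_sums[n - 1], limit))

# expected candy canes held by a machine with initial probability 0.5
print("expected total for P_i = 0.5: %.2f" % (0.5 * limit))
```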

Thus, statistically, in an average game ($P_{mean} = 0.5$) there are $1666.67$ candy canes available ($833.33$ per player). Note that this bound assumes pulls are not simultaneous, which should not make a big difference (as both players pulling the same machine on the same turn is quite rare). And we can estimate that $13000$ pulls would be

# Depleting

When the probability is lower than $1 \%$, fewer than $0.333$ expected candy canes remain in the machine, which can reasonably be considered depleted (this is an arbitrary threshold).

In [None]:
# pull_to_deplete[j]: number of pulls for a machine starting at j% to decay to the 1% threshold
pull_to_deplete = np.zeros((101), dtype="i4")
pull_to_deplete[1:] = len(decaying)-1-np.searchsorted(decaying[::-1], 1./np.arange(100,0,-1))[::-1]
# the index is the starting probability in percent
print("decay 100% to 1%:", pull_to_deplete[100])
print("decay 50% to 1%:", pull_to_deplete[50])
print("decay 10% to 1%:", pull_to_deplete[10])
print("mean %.2f std %.2f" %(pull_to_deplete.mean(), pull_to_deplete.std()))


pl.figure()
pl.grid()
pl.plot(range(101), pull_to_deplete)
pl.xlabel("$P_i$")
pl.ylabel("pull to deplete")
pl.show()


It takes $152$ and $129$ pulls to deplete machines starting at $P_i = 1.0$ and $P_i = 0.5$ respectively. On average, $117.72$ pulls are necessary to deplete a machine, and so $11772$ pulls to deplete all machines. Thus a game of 2000 steps, i.e. 4000 pulls, is far from depleting all available candy canes.
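
The pull counts can be cross-checked by solving $P_i \cdot 0.97^n < 0.01$ for $n$. The helper `pulls_to_deplete` below is a hypothetical name introduced for this sketch. It uses the strict inequality, so it returns the first pull count at which the probability falls below the threshold.

```python
import math

DECAY = 0.97
THRESHOLD = 0.01  # arbitrary depletion threshold (1%)

def pulls_to_deplete(p_init, decay=DECAY, threshold=THRESHOLD):
    """Smallest n such that p_init * decay**n < threshold."""
    if p_init < threshold:
        return 0
    # p_init * decay**n < threshold  <=>  n > log(threshold / p_init) / log(decay)
    return math.floor(math.log(threshold / p_init) / math.log(decay)) + 1

for p in (1.0, 0.5, 0.1):
    print("P_i = %.1f -> %d pulls" % (p, pulls_to_deplete(p)))
```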

# Game simulator

In [None]:
class Configuration(object):
    """ configuration interaction object"""
    
    def __init__(self):
        self.decayRate = DECAY
        self.episodeSteps = NB_TURN
        self.banditCount = NB_SLOT
        self.actTimeout = 0.25
        self.sampleResolution = 100

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

class Observation(object):
    """ observation interaction object"""
    
    def __init__(self, step, reward, last_turn, probas):
        self.step = step
        self.reward = reward
        self.last_turn = last_turn
        self.probas = probas
        self.remainingOverageTime = 60
        
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#


def one_game(configuration, bots, probas0=None):
    """ launch a game 
    return scores
    """
    
    # initialize    
    nb_player = len(bots)
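    # "probas0 or ..." raises ValueError when probas0 is a numpy array
    if probas0 is None:
        probas0 = np.round(np.random.rand(NB_SLOT), 2)
    probas0 = np.asarray(probas0, dtype=float)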
    probas = probas0.copy()
    scores = [0]*nb_player
    expected_scores = [0]*nb_player
    last_pick = [None]*nb_player    
    
    plays = [[] for p in range(nb_player)]
    
    
    def give_turn(p):
        """ give turn to player p"""
        
        observation = Observation(turn, scores[p], last_pick, probas)
        observation.magic = probas
        ind = bots[p](observation, configuration)
        plays[p].append(ind)
        get = np.random.rand() < probas[ind]
        scores[p] += int(get)
        expected_scores[p] += probas[ind]
        pick.append(ind)
        return get

    # loop on turns
    for turn in range(NB_TURN):
        pick = []
        
        for p in range(nb_player):
            give_turn(p)
        for ind in pick:
            probas[ind] *= DECAY
        last_pick = pick
    
    #print("BC0", np.bincount(plays[0]))
    #print("BC1", np.bincount(plays[1]))
    #print("scores", scores)
    return scores, expected_scores, probas0.mean(), np.sqrt(np.square(probas0).mean())         


This simulator can handle any number of bots. Single-player games are useful for obtaining accurate metrics on a bot's performance. It returns the actual scores but also tracks the expected scores, i.e. the sum of the probabilities at the time of each pull. Now let's create some facilities to launch series of games and build statistics on them.

In [None]:
class Statistics(object):
    """ Multi game statistics"""

    def __init__(self, players, nb_run):
        
        self.players = players
        self.nb_player = len(players)
        self.nb_run = nb_run
        
        # arrays
        self.scores = np.ones((self.nb_player, nb_run), "i4")*-1
        self.exp_scores = np.ones((self.nb_player, nb_run), "f8")*-1
        self.mean_probas = np.ones((nb_run), "f8")*-1
        self.mean_probas2 = np.ones((nb_run), "f8")*-1        
        self.bias = np.ones((self.nb_player, nb_run), "f8")*-1
        
    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#        
    def __repr__(self):    
        # statistics
        txt = ""
        for p in range(self.nb_player):
            txt += "%d) score    mean %.2f std %.2f\n" % (p, self.scores[p].mean(), self.scores[p].std())
            txt += "%d) expscore mean %.2f std %.2f\n" % (p, self.exp_scores[p].mean(), self.exp_scores[p].std()) 
            txt += "%d) bias    mean %.2f std %.2f\n" % (p, self.bias[p].mean(), self.bias[p].std())
        return txt
        
    #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
        
    def acquisition(self, ind, results, print_round=False):
        """ acquire a one game result """
        
        scores, exp_scores, gameval, gameval2 = results

        self.mean_probas[ind] = gameval
        self.mean_probas2[ind] = gameval2
        for p in range(self.nb_player):
            self.scores[p, ind] = scores[p]
            self.exp_scores[p, ind] = exp_scores[p]
            self.bias[p, ind] = scores[p]-exp_scores[p]
    
        if print_round:
            print("game %d / %d meanproba %.2f" % (ind+1, self.nb_run, gameval), "scores :", scores, "expscores", np.round(exp_scores, 1))
        
        
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#


def multi_game(configuration, bots, N):
    """ launch several games and collect statistics"""
    
    # initialization
    nb_bot = len(bots)
    stats = Statistics(bots, N)

    # loop on games
    for i in range(N):
        results = one_game(configuration, bots)
        print_round = i % max(1, N//10) == 0  # avoid a zero modulus when N < 10
        stats.acquisition(i, results, print_round)
        
    return stats

# Cheater bot with optimized exploitation

Note that I added the current delivery probabilities to the observation. This way, we can simulate a bot that knows the current probabilities (a cheater) to see how it performs. In the multi-armed bandit paradigm, this is like having the exploration phase fully completed and only exploitation left to do. In fact, it is an upper bound on how well a bot can perform, as only luck can let another bot beat it.

So the bot code is quite simple, isn't it?

In [None]:
def cheater(observation, configuration):
    """ bot knowing exact current probabilities,
    greedily pick the best.
    """
    return np.argmax(observation.probas)

Let's run some games with a single cheater bot and a duel with two of them. 

In [None]:
# single player games
configuration = Configuration()
bots = [cheater] 

single_stats = multi_game(configuration, bots, 1600)
print("SINGLE GAME")
print(single_stats)


# 2-players games

configuration = Configuration()
bots = [cheater, cheater] 

duel_stats = multi_game(configuration, bots, 1600)
print("DUEL GAME")
print(duel_stats)
dbias = duel_stats.bias[0]-duel_stats.bias[1]

print("bias1-bias2 mean %.2f std %.2f" % (dbias.mean(), dbias.std()))

So, a single "perfect" bot gets $912$ candy canes on average. When there are two of them, each gets $652$ candy canes, for a total of about $1300$ candy canes. The initial pool of $1666$ candy canes is indeed substantially depleted. This also shows that the competition between the two players matters a lot. A given bot may perform very differently against a strong opponent and against a weak one. I tested different bots and was surprised to discover that the ones performing well in single-player games were not necessarily good in duels. Furthermore, the duel performances did not let me establish a clear ranking (the cheater bot always wins, don't worry!).

In [None]:
pl.figure()
pl.grid()
pl.hist(dbias, 30, density=True, edgecolor="k")
pl.xlabel("relative bias")

Averaged over these many games, the bias between score and deserved score is very small, but on a single game its quadratic mean is about $20$! The bias on the relative scores of the two players reaches a quadratic mean of $27.4$ (about $\sqrt 2$ times more), which is huge.

Just keep this in mind when you run a single game to judge the efficiency of a new feature... Moreover, regular bots may also be misled during exploration phases. Thus, one should expect this bias to be even larger.
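
A bias of this size is roughly what Bernoulli sampling alone produces. If each pull succeeds independently with probability $p_t$, the variance of the score around its expectation is $\sum_t p_t (1 - p_t)$, and the difference between two independent players has $\sqrt 2$ times the standard deviation. The sketch below checks this on an assumed, made-up sequence of pull probabilities (`pull_probas`), not on the probabilities actually seen by the cheater bot.

```python
import numpy as np

rng = np.random.default_rng(1)  # local generator, seed chosen arbitrarily
n_pulls = 2000
n_games = 2000

# assumed probabilities at pull time: slowly decreasing from 0.9 to 0.3
pull_probas = np.linspace(0.9, 0.3, n_pulls)

def simulate_bias():
    """Return score - expected score for n_games independent games."""
    draws = rng.random((n_games, n_pulls)) < pull_probas
    return draws.sum(axis=1) - pull_probas.sum()

bias_a = simulate_bias()
bias_b = simulate_bias()
theory = np.sqrt(np.sum(pull_probas * (1.0 - pull_probas)))

print("theoretical std          %.2f" % theory)
print("simulated std            %.2f" % bias_a.std())
print("std of bias_a - bias_b   %.2f" % (bias_a - bias_b).std())
```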

# Reduced scores ?

The global expected scores vary a lot from game to game. Clearly the initial probability distribution plays an important role in this. Let's try to figure out how. Hereinafter $P_{mean}$ is the mean initial probability (as before) and $P_{mean,2}$ is the quadratic mean.



In [None]:
fig, axs = pl.subplots(2, 2, figsize=(12,12))

X = np.array([0.4, 0.6])
X2 = np.array([0.48, 0.67])

slope, intercept, r_value, p_value, std_err = linregress(single_stats.mean_probas, single_stats.exp_scores[0])
axs[0, 0].set_title("single $P_{mean}$ C=%.3f (a=%.1f, b=%.1f)" % (r_value, slope, intercept))
axs[0, 0].set_xlim(X)
axs[0, 0].grid()
axs[0, 0].scatter(single_stats.mean_probas, single_stats.exp_scores[0], marker="+")
axs[0, 0].plot(X, slope*X+intercept, "r")

slope, intercept, r_value, p_value, std_err = linregress(single_stats.mean_probas2, single_stats.exp_scores[0])
axs[0, 1].set_title("single $P_{mean, 2}$ C=%.3f (a=%.1f, b=%.1f)" % (r_value, slope, intercept))
axs[0, 1].set_xlim(X2)
axs[0, 1].grid()
axs[0, 1].scatter(single_stats.mean_probas2, single_stats.exp_scores[0], marker="+")
axs[0, 1].plot(X2, slope*X2+intercept, "r")

slope, intercept, r_value, p_value, std_err = linregress(duel_stats.mean_probas, duel_stats.exp_scores[0])
axs[1, 0].set_title("duel $P_{mean}$ C=%.3f (a=%.1f, b=%.1f)" % (r_value, slope, intercept))
axs[1, 0].grid()
axs[1, 0].set_xlim(X)
axs[1, 0].scatter(duel_stats.mean_probas, duel_stats.exp_scores[0], marker="+")
axs[1, 0].set_xlabel("$P_{mean}$")
axs[1, 0].plot(X, slope*X+intercept, "r")

slope, intercept, r_value, p_value, std_err = linregress(duel_stats.mean_probas2, duel_stats.exp_scores[0])
axs[1, 1].set_title("duel $P_{mean, 2}$ C=%.3f (a=%.1f, b=%.1f)" % (r_value, slope, intercept))
axs[1, 1].grid()
axs[1, 1].set_xlim(X2)
axs[1, 1].scatter(duel_stats.mean_probas2, duel_stats.exp_scores[0], marker="+")
axs[1, 1].set_xlabel("$P_{mean, 2}$")
axs[1, 1].plot(X2, slope*X2+intercept, "r")

pl.show()

The global expected scores seem to be mainly driven by the quadratic mean. It indeed takes into account that it is more efficient to exploit two machines $(P_i=1.0\ \&\ P_j=0.0)$ than two machines $(P_i=0.5\ \&\ P_j=0.5)$. Nevertheless, it is interesting to notice that the mean has significantly more influence in duel games than in single-player games. I assume this is because depletion matters more and the bots more often have to use machines with low probability. If we imagine longer games of something like 5000 steps, nearly all the machines would be depleted at the end and the global score would be driven only by the mean.
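
The two-machine example can be made concrete. The sketch below computes both means for the two configurations, together with the expected score of a single greedy player who always pulls the currently best machine. The helper `greedy_expected_score` and the horizon of 20 pulls are illustrative assumptions.

```python
import numpy as np

DECAY = 0.97

def greedy_expected_score(probas0, n_pulls, decay=DECAY):
    """Expected score of a single player always pulling the best machine."""
    probas = np.array(probas0, dtype=float)
    total = 0.0
    for _ in range(n_pulls):
        ind = int(np.argmax(probas))
        total += probas[ind]
        probas[ind] *= decay  # the pulled machine decays after each pull
    return total

for probas0 in ([1.0, 0.0], [0.5, 0.5]):
    p = np.array(probas0)
    mean = p.mean()
    quad_mean = np.sqrt(np.mean(p ** 2))
    score = greedy_expected_score(probas0, 20)
    print(probas0, "mean %.3f quadratic mean %.3f expected score %.2f" % (mean, quad_mean, score))
```

Both configurations share the same mean, but the first one has the higher quadratic mean and the higher expected score over a short horizon.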