# Rationale

The first version of ACS (ant colony system) for MAB (multi-armed bandits) was far too unsophisticated.

The new idea here is to use a directed graph (an asymmetric TSP) in which each edge leading away from a city represents an arm. Each city needs $k$ outgoing edges, one to every other city, so a $k$-arm MAB requires $k+1$ cities.

We also have to make sure that the edges sharing the same $\theta$ form a tour, so that the shortest tour identifies the arm with the smallest $\theta$. That is, only $k$ of the $k!$ possible directed tours on $k+1$ cities actually represent arms.

Denote the true thetas by $\{ \theta_i \}_{i=1}^k$.

The cost of each edge leading out from an arbitrary node $u$ is given by
$$
\delta(u,v_i) = \theta_i
$$
(Note that the index $i$ in the above expression is determined by which $\theta_i$ is assigned to the end node $v_i$, not the other way around.)

Now we redefine
$$
\mathbb{E} [\eta(u, v_i)] = e^{\delta(u, v_i)} = e^{\theta_i}
$$
The exponentiation avoids distributional issues at $\theta_i = 0$, since a Gamma distribution needs a strictly positive mean. Then, using a $\Gamma$ distribution with shape $\alpha=10$ (this choice of $\alpha$ is arbitrary) and rate $\beta$, whose mean is $\alpha / \beta$,
$$
\begin{aligned}
\mathbb{E} [\eta(u, v_i)] = e^{\theta_i} &= \frac{10}{\beta} \\
\implies \beta &= 10 e^{- \theta_i}
\end{aligned}
$$
such that
$$
\eta (u, v_i) \sim \Gamma ( \alpha, \alpha e^{- \theta_i} )
$$
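The parameterisation above can be sanity-checked numerically. The sketch below is written for Python 3 and uses a hypothetical helper, `sample_eta`, that is not part of the notebook. The one thing to watch is that NumPy's gamma sampler takes a shape and a *scale*, so the rate $\alpha e^{-\theta_i}$ has to be inverted.

```python
import numpy as np

ALPHA = 10.0  # shape parameter from the text; any positive value would do


def sample_eta(theta, size=None, alpha=ALPHA, rng=None):
    """Draw eta ~ Gamma(shape=alpha, rate=alpha * exp(-theta)).

    `sample_eta` is an illustrative name, not something defined in the notebook.
    """
    rng = np.random.default_rng() if rng is None else rng
    rate = alpha * np.exp(-theta)
    # NumPy parameterises the gamma by shape and scale, with scale = 1 / rate.
    return rng.gamma(shape=alpha, scale=1.0 / rate, size=size)


rng = np.random.default_rng(0)
theta = 0.3
draws = sample_eta(theta, size=200_000, rng=rng)

# The empirical mean should sit close to exp(theta), and the empirical
# variance close to alpha / rate**2 = exp(2 * theta) / alpha.
print(draws.mean(), np.exp(theta))
print(draws.var(), np.exp(2 * theta) / ALPHA)
```

A larger $\alpha$ keeps the mean at $e^{\theta_i}$ but shrinks the variance $e^{2\theta_i}/\alpha$, so $\alpha$ acts as a noise knob on the edge observations.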

On top of this, Thompson sampling remains the same: choosing an edge means pulling the corresponding arm. This gives us a valid way to quantify regret.
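As a rough illustration of how regret could be computed once edge choices are mapped to arm pulls, here is a minimal Python 3 sketch. The function name `cumulative_regret` and its `maximise` flag are assumptions made for this example. In particular, the notes do not pin down whether a larger or a smaller $\theta$ counts as the better arm, so both readings are supported.

```python
import numpy as np


def cumulative_regret(thetas, pulls, maximise=True):
    """Cumulative expected regret for a sequence of pulled arm indices.

    thetas   : true per-arm parameters (e.g. TRUE_THETAS)
    pulls    : 0-based arm index chosen at each step
    maximise : ASSUMPTION. True treats a larger theta as better; pass False
               if theta is read as a cost (shortest tour is best).
    """
    thetas = np.asarray(thetas, dtype=float)
    chosen = thetas[np.asarray(pulls, dtype=int)]
    if maximise:
        gap = thetas.max() - chosen   # shortfall relative to the best arm
    else:
        gap = chosen - thetas.min()   # excess cost relative to the cheapest arm
    return np.cumsum(gap)


thetas = np.array([0.2, 0.5, 0.9])
pulls = [0, 1, 2, 2]
print(cumulative_regret(thetas, pulls))                  # reward reading
print(cumulative_regret(thetas, pulls, maximise=False))  # cost reading
```

Under either reading the curve flattens once the sampler settles on the best arm, which is the behaviour one would want to see from Thompson sampling.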


# Code

In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
plt.style.use(['ggplot', 'seaborn-poster'])

In [2]:
K = 100     # number of arms
ANTS = 10   # number of ants

In [3]:
TRUE_THETAS = np.random.uniform(size=K)  # true theta_i for each arm, uniform on [0, 1)

In [56]:
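def generate_arm_tours(k):
    """
    Build the (k+1) x (k+1) edge-label matrix for a k-arm bandit.

    k: number of arms
    k+1: number of cities

    Entry (u, v) is the arm label (1..k) of edge u -> v; the diagonal is -1.
    """
    U = -np.eye(k+1)
    for i in xrange(k+1):
        # Row i: write labels 1..k into columns i+1, ..., i+k (wrapping around),
        # so edge u -> v carries label (v - u) mod (k+1).
        np.put(U[i,:], xrange(i+1, i+1+k), xrange(1,k+1), mode='wrap')
    return U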

In [57]:
generate_arm_tours(K)

array([[  -1.,    1.,    2., ...,   98.,   99.,  100.],
       [ 100.,   -1.,    1., ...,   97.,   98.,   99.],
       [  99.,  100.,   -1., ...,   96.,   97.,   98.],
       ..., 
       [   3.,    4.,    5., ...,   -1.,    1.,    2.],
       [   2.,    3.,    4., ...,  100.,   -1.,    1.],
       [   1.,    2.,    3., ...,   99.,  100.,   -1.]])
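The requirement that same-labelled edges form a tour can be checked directly on this matrix. The Python 3 sketch below uses two hypothetical helpers that are not in the notebook: `arm_labels` rebuilds the same label matrix with modular arithmetic (as integers), and `is_tour` follows one label from city 0. It also attaches costs $\delta(u, v) = \theta_i$ and confirms that the cheapest arm tour belongs to the smallest $\theta$. One thing the check makes visible: the edges labelled $a$ step from city $u$ to $(u + a) \bmod (k+1)$, so they form a single tour only when $a$ and $k+1$ are coprime. With $K = 100$ there are 101 cities, a prime, so every label gives a tour.

```python
import numpy as np


def arm_labels(k):
    """Label matrix matching generate_arm_tours(k), built without a loop.

    Entry (u, v) is (v - u) mod (k + 1), i.e. the arm label of edge u -> v.
    """
    n = k + 1
    idx = np.arange(n)
    labels = (idx[None, :] - idx[:, None]) % n
    labels[idx, idx] = -1  # no self-loops, as on the diagonal above
    return labels


def is_tour(labels, arm):
    """Follow the edges labelled `arm` from city 0 until a city repeats."""
    n = labels.shape[0]
    city, seen = 0, set()
    while city not in seen:
        seen.add(city)
        city = int(np.flatnonzero(labels[city] == arm)[0])
    return city == 0 and len(seen) == n


k = 6  # small example: k + 1 = 7 cities, a prime
labels = arm_labels(k)
thetas = np.random.default_rng(0).uniform(size=k)

print(all(is_tour(labels, a) for a in range(1, k + 1)))

# Edge cost delta(u, v) = theta of the label on u -> v; the diagonal is unused.
cost = np.where(labels > 0, thetas[np.clip(labels, 1, k) - 1], np.inf)

# Each arm's tour uses k + 1 edges of equal cost, so its length is (k + 1) * theta.
tour_costs = np.array([cost[labels == a].sum() for a in range(1, k + 1)])
print(np.argmin(tour_costs) == np.argmin(thetas))

# Counter-example with 4 cities: label 2 splits into two 2-cycles.
print(is_tour(arm_labels(3), 2))
```

If $k+1$ is not prime, some labels split into several short cycles, so either $k$ should be chosen with $k+1$ prime or the labelling would need to change.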

In [47]:
range(2,10)

[2, 3, 4, 5, 6, 7, 8, 9]

In [54]:
np.put(U[1,:], xrange(2,2+K), xrange(1,K+1), mode='wrap')

In [55]:
U

array([[  -1.,   -0.,   -0., ...,   -0.,   -0.,   -0.],
       [ 100.,   -1.,    1., ...,   97.,   98.,   99.],
       [  -0.,   -0.,   -1., ...,   -0.,   -0.,   -0.],
       ..., 
       [  -0.,   -0.,   -0., ...,   -1.,   -0.,   -0.],
       [  -0.,   -0.,   -0., ...,   -0.,   -1.,   -0.],
       [  -0.,   -0.,   -0., ...,   -0.,   -0.,   -1.]])