# HDP Gibbs Samplers

## *( 5.1 )* Posterior Sampling in the Chinese Restaurant Franchise

### Brief Overview

The Hierarchical Dirichlet Process mixture model is given by
$$\begin{aligned}
G_0 | \gamma, H &\sim DP(\gamma, H) \\
G_j | \alpha_0, G_0 &\sim DP(\alpha_0, G_0) \\
\theta_{ji} | G_j &\sim G_j \\
x_{ji} | \theta_{ji} &\sim F(\theta_{ji})
\end{aligned} $$

This model clusters each group's data non-parametrically while sharing information both between and within groups.  A draw from a Dirichlet process is a discrete distribution whose atoms are drawn from a (not necessarily discrete) base measure $H$ and whose weights, which decrease in expectation, are generated by the "stick-breaking process."  In the HDP, each group-level measure $G_j$ is itself drawn from a DP whose base measure is the DP draw $G_0$, so every $G_j$ contains the same atoms as $G_0$ but with different weights:
$$\begin{aligned}
G_0 &= \sum_{k=1}^{\infty} \beta_k \delta(\phi_k) \\
G_j &= \sum_{k=1}^{\infty} \pi_{jk} \delta(\phi_k) \\
\phi_k | H &\sim H
\end{aligned} $$
Additionally, if we define $\beta, \pi_j$ as the collected weights above, it can be shown that these vectors encode a distribution over $\mathbb{Z}^+$ such that $\beta | \gamma \sim GEM(\gamma)$ and $\pi_j | \alpha_0, \beta \sim DP(\alpha_0, \beta)$.
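
The weight vectors $\beta$ and $\pi_j$ can be simulated approximately by truncating the stick-breaking construction. The sketch below is only an illustration under stated assumptions: the truncation level `K`, the helper names `gem_weights` and `group_weights`, and the parameter values are hypothetical and not part of the sampler in this notebook. It relies on the fact that when $\beta$ is supported on finitely many atoms, $DP(\alpha_0, \beta)$ reduces to a Dirichlet distribution with parameter $\alpha_0 \beta$.

```python
import numpy as np

def gem_weights(gamma, K, rng):
    """Truncated stick-breaking draw: K weights approximating beta ~ GEM(gamma)."""
    v = rng.beta(1.0, gamma, size=K)   # stick-breaking proportions
    v[-1] = 1.0                        # assumption: close the stick so the weights sum to 1
    # remaining[k] = length of stick left before break k
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def group_weights(alpha0, beta, rng):
    """Finite-dimensional stand-in for pi_j ~ DP(alpha0, beta)."""
    return rng.dirichlet(alpha0 * beta)

rng = np.random.default_rng(0)
beta = gem_weights(gamma=1.0, K=15, rng=rng)
# Three groups share the same atoms but reweight them
pi = np.array([group_weights(alpha0=5.0, beta=beta, rng=rng) for _ in range(3)])
```

Larger values of `alpha0` make each row of `pi` stay closer to `beta`; smaller values let the groups differ more strongly.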

Successive draws from a DP exhibit clustering behavior, since the probability of taking a certain value grows with the number of previous draws of that value.  The *Chinese restaurant franchise* process describes this behavior in the hierarchical setting.  Imagine a franchise of Chinese restaurants, one per group $j$, each with some number of tables and all sharing one menu.  Let $\phi_k$ be the global dishes, drawn from $H$; $\psi_{jt}$ be the table-specific dishes, drawn from $G_0$; and $\theta_{ji}$ be the customer-specific dishes, drawn from $G_j$.  Denote by $z_{ji}$ the index of the dish eaten by customer $ji$; by $t_{ji}$ the index of the table where customer $ji$ sits; by $k_{jt}$ the index of the dish served at table $jt$; by $n_{jtk}$ the number of customers in restaurant $j$ at table $t$ eating dish $k$; and by $m_{jk}$ the number of tables in restaurant $j$ serving dish $k$.  A dot in place of an index denotes a sum over that index.  Then:

$$\begin{aligned}
\theta_{ji} | \text{other } \theta, \alpha_0, G_0 &\sim
    \sum_{t=1}^{m_{j\cdot}} \frac{n_{jt\cdot}}{i-1+\alpha_0} \delta(\psi_{jt}) +
                            \frac{\alpha_0}{i-1+\alpha_0} G_0 \\
\psi_{jt} | \text{other } \psi, \gamma, H &\sim
    \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot \cdot} + \gamma} \delta(\phi_k) +
                            \frac{\gamma}{m_{\cdot \cdot} + \gamma} H
\end{aligned} $$
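
The two predictive rules above can be simulated directly, which gives a feel for how tables and dishes accumulate under the prior. This is a minimal sketch, assuming customers are seated one restaurant at a time; the function name `simulate_crf` and the chosen group sizes are hypothetical and used only for illustration.

```python
import numpy as np

def simulate_crf(group_sizes, alpha0, gamma, rng):
    """Seat customers restaurant by restaurant; return table and dish indices."""
    m_k = []                       # m_{.k}: tables serving dish k across the franchise
    t_all, z_all = [], []
    for n_j in group_sizes:
        n_jt, k_jt = [], []        # customers per table and dish per table in restaurant j
        t_j = np.empty(n_j, dtype=int)
        for i in range(n_j):
            # Existing table t with weight n_jt, new table with weight alpha0
            w = np.array(n_jt + [alpha0], dtype=float)
            t = rng.choice(len(w), p=w / w.sum())
            if t == len(n_jt):     # new table: choose its dish from the shared menu
                wk = np.array(m_k + [gamma], dtype=float)
                k = rng.choice(len(wk), p=wk / wk.sum())
                if k == len(m_k):  # brand-new dish
                    m_k.append(0)
                m_k[k] += 1
                n_jt.append(0)
                k_jt.append(k)
            n_jt[t] += 1
            t_j[i] = t
        t_all.append(t_j)
        z_all.append(np.array(k_jt, dtype=int)[t_j])  # z_ji = k_{j, t_ji}
    return t_all, z_all, np.array(m_k)

rng = np.random.default_rng(1)
t_all, z_all, m_k = simulate_crf([30, 30, 30], alpha0=1.0, gamma=1.0, rng=rng)
```

Here `m_k` holds the table counts $m_{\cdot k}$, so its sum equals the total number of occupied tables over all restaurants.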

### Full Conditionals

Choose a base density $h(\cdot)$ for $H$ and a data-generating density $f(\cdot | \theta)$ for $F$ such that $h$ is conjugate to $f$.  The key quantities are $f_k^{-x_{ji}}(x_{ji})$, the conditional density of customer $ji$'s observation under dish $k$ given all other observations currently assigned to $k$, and $f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt})$, the analogous joint density of all observations at table $jt$.  Both are obtained by integrating out $\phi_k$, for example:

$$\begin{aligned}
f_k^{-x_{ji}}(x_{ji}) &= \frac { \int f(x_{ji} | \phi_k) g(k)d\phi_k } { \int g(k)d\phi_k } \\
g(k) &= h(\phi_k) \prod_{j'i' \neq ji, z_{j'i'} = k} f(x_{j'i'} | \phi_k) 
\end{aligned} $$

The corresponding mixture components for a previously unused dish $k^*$ are denoted $f_{k^*}^{-x_{ji}}(x_{ji})$ and $f_{k^*}^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt})$, which are special cases of the respective $f_k$ components in which no data points have $z_{ji} = k^*$.

Using this, we first compute the likelihood of a point $x_{ji}$ when customer $ji$ is seated at a new table $t^*$, marginalizing over the dish served there:
$$
p(x_{ji} | t^{-ji}, t_{ji} = t^*, k) =
    \sum_{k=1}^{K} \frac{m_{\cdot k}}{m_{\cdot \cdot} + \gamma} f_k^{-x_{ji}}(x_{ji}) +
                            \frac{\gamma}{m_{\cdot \cdot} + \gamma} f_{k^*}^{-x_{ji}}(x_{ji})
$$

For efficiency, the Gibbs scheme implemented below samples only the $t$ and $k$ indexes (from which the actual parameters can be recovered afterwards).  The state space of the $k$ values is technically infinite, and the number of tables and dishes in use is not fixed in advance, so we keep a running list of active $t$ and $k$ values.  In each update step, each customer is assigned either to one of the existing tables or to a new table; if a customer is assigned to a new table, a corresponding $k$ value is drawn for that table.  Similarly, each table is assigned a dish, either one of the existing dishes or a new dish.  If a table or dish is left with no customers, it is removed from its respective list.  The full conditionals for these updates are:

$$ \begin{aligned}
p(t_{ji} = t | t^{-ji}, k, ...) &\propto \begin{cases}
    n_{jt\cdot}^{-ji} f_{k_{jt}}^{-x_{ji}}(x_{ji}) & t\text{ used}\\
    \alpha_0 p(x_{ji} | ...) & t\text{ new}
    \end{cases} \\
p(k_{jt} = k | t, k^{-jt}) &\propto \begin{cases}
    m_{\cdot k}^{-jt} f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & k\text{ used}\\
    \gamma f_{k^*}^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) & k\text{ new}
    \end{cases} \\
\end{aligned} $$
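
Both full conditionals are known only up to a normalizing constant, and the $f_k$ terms are products of many small densities that can underflow. One common way to draw from such a distribution is to keep the weights in log-space and normalize with `logsumexp`. The helper below is a hedged sketch of that idea; the name `sample_from_log_weights` is hypothetical, and this is a swapped-in technique rather than the normalization used in the code cells of this notebook.

```python
import numpy as np
from scipy.special import logsumexp

def sample_from_log_weights(log_w, rng):
    """Draw an index with probability proportional to exp(log_w)."""
    log_w = np.asarray(log_w, dtype=float)
    probs = np.exp(log_w - logsumexp(log_w))   # stable normalization
    return rng.choice(len(probs), p=probs), probs

rng = np.random.default_rng(0)
# e.g. log(n_jt) + log f_k for two tables; -inf marks an option with zero weight
log_w = np.array([-1000.0, -1001.0, -np.inf])
t_new, probs = sample_from_log_weights(log_w, rng)
```

Exponentiating `log_w` directly would give all zeros here, whereas the normalized `probs` are well defined.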

### Distribution-Specific Mixture Components

The only part of this sampling algorithm that depends on the choice of the measures $H$ and $F$ is the set of mixture components $f_k$, so this is the only part that needs to be rewritten for each type of model.  Let
$$ \begin{aligned}
V_{kji} &= \{ j'i' : j'i' \neq ji, z_{j'i'} = k \} \\
W_{kjt} &= \{ j'i' : j't_{j'i'} \neq jt, k_{j't_{j'i'}} = k \} \\
T_{jt} &= \{ j'i': j't_{j'i'} = jt \} \\
\end{aligned} $$
$V$ is the set of all customers (excluding customer $ji$) eating dish $k$; $W$ is the set of all customers eating dish $k$ at tables other than $jt$; and $T$ is the set of customers seated at table $jt$.  $V$ and $W$ index the product terms in the mixture components.  By conjugacy, matching each integrand to the kernel of a known density, each $f_k$ can be expressed as a function of these sets.  Each $f_{k^*}$ can be found by using the corresponding $f_k$ formula with $V$ or $W$ set to the empty set.
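
When the assignments are stored as flat arrays with one entry per customer, these sets can be represented as boolean masks. The following is a small sketch under that assumption; the function `index_sets` and the array names `j`, `t`, `z` (group, table, and dish label of each customer) are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

def index_sets(i, j, t, z, k):
    """Boolean masks for V_{kji}, W_{kjt}, T_{jt}, where customer i sits at table (j[i], t[i])."""
    idx = np.arange(len(j))
    T = (j == j[i]) & (t == t[i])   # customers at the same table as customer i
    V = (z == k) & (idx != i)       # customers eating dish k, excluding customer i
    W = (z == k) & ~T               # customers eating dish k at other tables
    return V, W, T

# Five customers in two groups; z[i] is the dish served at customer i's table
j = np.array([0, 0, 0, 1, 1])
t = np.array([0, 0, 1, 0, 0])
z = np.array([2, 2, 1, 2, 2])
V, W, T = index_sets(0, j, t, z, k=2)
```

Sums and counts over a mask, such as `x[V].sum()` and `V.sum()`, then give the sufficient statistics that enter each $f_k$.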

*F = Poisson, H = Gamma*

$$ \begin{aligned}
f(x | \phi_k) &\sim Poisson(\phi_k) \\
h(\phi_k) &\sim Gamma(\alpha, \beta) \\
\\
f_k^{-x_{ji}}(x_{ji}) &= \frac{1}{x_{ji}!} \cdot
    \frac{\Gamma(x_{ji} + \alpha_v)}{(1 + \beta_v)^{x_{ji} + \alpha_v}} \cdot
    \frac{(\beta_v)^{\alpha_v}}{\Gamma(\alpha_v)} \\
f_k^{-\mathbf{x}_{jt}}(\mathbf{x}_{jt}) &= \frac{1}{\prod_T x_t!} \cdot
    \frac{\Gamma(\sum_T x_t + \alpha_w)}{(|T| + \beta_w)^{\sum_T x_t + \alpha_w}} \cdot
    \frac{(\beta_w)^{\alpha_w}}{\Gamma(\alpha_w)} \\
\alpha_v &= \sum_V x_v + \alpha \quad , \quad \beta_v = |V| + \beta \\
\alpha_w &= \sum_W x_w + \alpha \quad , \quad \beta_w = |W| + \beta \\
\end{aligned} $$
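
The customer-level formula can be checked numerically: a Poisson likelihood mixed over a $Gamma(\alpha_v, \beta_v)$ distribution (with $\beta_v$ a rate) is a negative binomial with size $\alpha_v$ and success probability $\beta_v / (1 + \beta_v)$. The sketch below compares the closed form with `scipy.stats.nbinom`; the function name `log_fk_cust` and the toy data are hypothetical, and `gammaln` is used here in place of the `loggamma` import in the code cells.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

def log_fk_cust(x_new, x_others, a, b):
    """log f_k(x_new) given the other observations assigned to dish k."""
    av = np.sum(x_others) + a      # alpha_v
    bv = len(x_others) + b         # beta_v
    return (-gammaln(x_new + 1) + gammaln(x_new + av) - gammaln(av)
            - (x_new + av) * np.log1p(bv) + av * np.log(bv))

x_others = np.array([3, 5, 4])     # toy counts already assigned to dish k
closed = np.exp(log_fk_cust(4, x_others, a=1.0, b=1.0))
reference = nbinom.pmf(4, 13.0, 4.0 / 5.0)   # alpha_v = 13, beta_v = 4

# The density should sum to one over the support (truncated at 200 here)
total = np.exp(log_fk_cust(np.arange(200), x_others, 1.0, 1.0)).sum()
```

Passing an empty `x_others` reproduces the $f_{k^*}$ case, since then $\alpha_v = \alpha$ and $\beta_v = \beta$.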

*F = Multinomial, H = Dirichlet*

Let $\mathbf{x}$ be a feature vector of length $L$.  The Multinomial/Dirichlet model is given by
$$ \begin{aligned}
f(\mathbf{x} | n, \mathbf{\phi}_k) &\sim Multinomial(n, \mathbf{\phi}_k) \\
h(\mathbf{\phi}_k) &\sim Dirichlet(L, \mathbf{\alpha}) \\
\\
f_k^{-\mathbf{x}_{ji}}(\mathbf{x}_{ji}) &=
    \frac{n_{ji}!}{\prod_{\ell=1}^L (\mathbf{x}_{ji})_\ell!} \cdot
    \frac{ \prod \Gamma(\mathbf{\alpha}_{\ell}^{top}) }{ \Gamma(\sum \mathbf{\alpha}_{\ell}^{top}) } \cdot
    \frac{ \Gamma(\sum \mathbf{\alpha}_{\ell}^{bottom}) }{ \prod \Gamma(\mathbf{\alpha}_{\ell}^{bottom}) } \\
\mathbf{\alpha}_{\ell}^{bottom} &= \sum_V (\mathbf{x}_v)_{\ell} + \mathbf{\alpha}_{\ell} \\
\mathbf{\alpha}_{\ell}^{top} &= (\mathbf{x}_{ji})_{\ell} + \mathbf{\alpha}_{\ell}^{bottom} \\
\\
f_k^{-\mathbf{X}_{jt}}(\mathbf{X}_{jt}) &=
    \prod_T \frac{ n_t! }{ \prod_{\ell=1}^L (\mathbf{x}_t)_\ell! } \cdot
    \frac{ \prod \Gamma(\mathbf{\alpha}_{\ell}^{top}) }{ \Gamma(\sum \mathbf{\alpha}_{\ell}^{top}) } \cdot
    \frac{ \Gamma(\sum \mathbf{\alpha}_{\ell}^{bottom}) }{ \prod \Gamma(\mathbf{\alpha}_{\ell}^{bottom}) } \\
\mathbf{\alpha}_{\ell}^{bottom} &= \sum_W (\mathbf{x}_w)_{\ell} + \mathbf{\alpha}_{\ell} \\
\mathbf{\alpha}_{\ell}^{top} &= \sum_T (\mathbf{x}_t)_\ell + \mathbf{\alpha}_{\ell}^{bottom} \\
\end{aligned} $$
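
The Multinomial/Dirichlet components can be evaluated in log-space along the same lines as the Poisson case. This is a minimal sketch, assuming observations are stored as rows of count vectors; the names `log_beta_fn` and `log_fk_multinomial` are hypothetical and this code is not part of the sampler class in this notebook. Passing a single row gives the customer-level component, and passing several rows gives the table-level one.

```python
import numpy as np
from scipy.special import gammaln

def log_beta_fn(a):
    """Log of the multivariate beta function: sum(log Gamma(a)) - log Gamma(sum(a))."""
    return np.sum(gammaln(a)) - gammaln(np.sum(a))

def log_fk_multinomial(X_new, X_others, alpha):
    """log f_k of the rows of X_new given the rows of X_others assigned to dish k."""
    X_new = np.atleast_2d(X_new)
    # Multinomial coefficients: one n!/prod(x_l!) factor per row of X_new
    coef = np.sum(gammaln(X_new.sum(axis=1) + 1) - gammaln(X_new + 1).sum(axis=1))
    bottom = alpha + np.sum(X_others, axis=0)   # alpha^bottom
    top = bottom + X_new.sum(axis=0)            # alpha^top
    return coef + log_beta_fn(top) - log_beta_fn(bottom)

alpha = np.array([1.0, 1.0])
no_data = np.zeros((0, 2))                      # empty set: the f_{k*} case
p_single = np.exp(log_fk_multinomial(np.array([1, 0]), no_data, alpha))

# With n = 3 and L = 2 the four possible count vectors should have total mass 1
outcomes = [np.array([c, 3 - c]) for c in range(4)]
total = sum(np.exp(log_fk_multinomial(o, no_data, alpha)) for o in outcomes)
```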

To be continued...

In [43]:
import numpy as np
import pandas as pd
from scipy.special import loggamma as logg
from sklearn.preprocessing import normalize

In [87]:
def pois_fk_cust(i, x, k, Kmax, ha, hb, new=False):
    """
    Computes the mixture components for a given customer across all k values.
    MODEL: base measure H ~ Gamma(ha, hb), F(x|phi) ~ Poisson(phi)
    All components are calculated exactly in log-space and then exponentiated.
    
    returns: (Kmax,) vector; if new=True, all entries will be the same
    """
    
    x = x.flatten()  # reshape to 1D, since gibbs routine passes in a 2D array
    
    # Calculate the case where k has no members
    fknew_cust = np.exp( -logg(x[i] + 1) + logg(x[i] + ha) - logg(ha) -
                         (x[i] + ha)*np.log(1 + hb) + ha*np.log(hb) )
    if new: return np.full(Kmax, fknew_cust)
    
    x_kks = [x[k == kk] for kk in range(Kmax)]  # subset of customers eating kk
    xi_in = np.zeros(Kmax)                      # offset if x[i] is in this subset
    xi_in[k[i]] = 1
      
    # Compute (a,b) params from gamma kernel tricks done in fk function
    av = np.array(list(map(np.sum, x_kks))) - xi_in*x[i] + ha
    bv = np.array(list(map(len, x_kks))) - xi_in + hb
    fk_cust = np.exp( -logg(x[i] + 1) + logg(x[i] + av) - logg(av) -
                      (x[i] + av)*np.log(1 + bv) + av*np.log(bv) )
     
    return fk_cust


def pois_fk_tabl(jj, tt, x, j, t, k, Kmax, ha, hb, new=False):
    """
    Computes the mixture components for a given table across all k values.
    MODEL: base measure H ~ Gamma(ha, hb), F(x|phi) ~ Poisson(phi)
    All components are calculated exactly in log-space and then exponentiated.
    
    returns: (Kmax,) vector; if new=True, all entries will be the same
    """
    
    x = x.flatten()  # reshape to 1D, since gibbs routine passes in a 2D array
    at_table = np.logical_and(j == jj, t == tt)  # customers seated at table (jj, tt)
    x_jt = x[at_table]
    k_jt = k[at_table]                           # their dish labels (all equal)
    
    # Calculate the case where k has no members
    fknew_tabl = np.exp( -np.sum(logg(x_jt + 1)) + logg(np.sum(x_jt) + ha) - logg(ha) -
                         (np.sum(x_jt) + ha)*np.log(len(x_jt) + hb) + ha*np.log(hb) )
    # If table jt doesn't exist, just return the "new" mixture component
    if len(x_jt) == 0:
        print(f"WARNING: table {(jj, tt)} does not exist currently")
        new = True
    if new: return np.full(Kmax, fknew_tabl)
    
    x_kks = [x[k == kk] for kk in range(Kmax)]  # subset of customers at tables serving kk
    xi_in = np.zeros(Kmax)                      # offset if table x_jt is in this subset
    xi_in[k_jt[0]] = 1
      
    # Compute (a,b) params from gamma kernel tricks done in fk function
    av = np.array(list(map(np.sum, x_kks))) - xi_in*np.sum(x_jt) + ha
    bv = np.array(list(map(len, x_kks))) - xi_in*len(x_jt) + hb
    # The last term uses av (not ha), matching beta_w^alpha_w in the formula above
    fk_tabl = np.exp( -np.sum(logg(x_jt + 1)) + logg(np.sum(x_jt) + av) - logg(av) -
                       (np.sum(x_jt) + av)*np.log(len(x_jt) + bv) + av*np.log(bv) )
     
    return fk_tabl

In [149]:
class CFRP:
    """
    Model implementing the Chinese restaurant franchise process.
    
    CONSTRUCTOR PARAMETERS
    - gamma, alpha0: concentration parameters > 0 of DP(gamma, H) and DP(alpha0, G0)
    - f: string naming the distribution of the data; h is chosen to be conjugate
    - hypers: tuple of hyperparameter values specific to f/h scheme chosen
    
    PRIVATE ATTRIBUTES (volatile)
    - tk_map_: (J x Tmax) matrix of k values for each (j,t) pair
    - n_: (J x Tmax) matrix specifying counts of customers
    - m_: (J x Kmax) matrix specifying counts of tables
    - fk_cust_, fk_tabl_: functions to compute mixing components for Gibbs sampling
    
    PUBLIC ATTRIBUTES
    post_samples: (iters+1 x N x 4) integer array; columns 0-2 hold the (j, t, k)
                  values for each data point i at each iteration (column 3 is unused);
                  exists only after gibbs() has been called
    """
    
    def __init__(self, gamma=1, alpha0=1, f='poisson', hypers=None):
        self.g_ = gamma
        self.a0_ = alpha0
        self.set_priors(f, hypers)
        
    def set_priors(self, f, hypers):
        """
        Initializes the data-generating distribution f and its conjugate base measure h.
        Also sets hypers_, the relevant hyperparameters, and
                  fk_cust_ / fk_tabl_, the functions to compute mixing components.
        """
        if f == 'poisson':
            # Specify parameters of H ~ Gamma(a,b)
            if hypers is None:
                self.hypers_ = (1,1)
            else: self.hypers_ = hypers
            self.fk_cust_ = pois_fk_cust
            self.fk_tabl_ = pois_fk_tabl
        else:
            # Only the Poisson/Gamma scheme is implemented so far
            raise ValueError(f"unsupported data distribution: {f!r}")
    
    
    def tally_up(self, it, which=None):
        """
        Helper function for computing maps and counts in gibbs().
        Given a current iteration in the post_samples attribute, does a full
        recount of customer/table allocations, updating n_ and m_.
        Set which = 'n' or 'm' to only tally up that portion
        """
        
        jt_pairs = self.post_samples[it,:,0:2]
        
        if which != 'm':
            # Count customers at each table (jt)
            cust_counts = pd.Series(map(tuple, jt_pairs)).value_counts()
            j_idx, t_idx = tuple(map(np.array, zip(*cust_counts.index)))
            self.n_ *= 0
            self.n_[j_idx, t_idx] = cust_counts
            
        if which != 'n':
            # First filter by unique tables (jt), then count tables with each k value
            jt_unique, k_idx = np.unique(jt_pairs, axis=0, return_index=True)
            jk_pairs = np.c_[self.post_samples[it, k_idx, 0],
                             self.post_samples[it, k_idx, 2]]
            #print(jk_pairs)
            tabl_counts = pd.Series(map(tuple, jk_pairs)).value_counts()
            #print(tabl_counts)
            j_idx, k_idx = tuple(map(np.array, zip(*tabl_counts.index)))
            self.m_ *= 0
            self.m_[j_idx, k_idx] = tabl_counts
    
    
    def draw_t(self, it, i, Tmax):
        """
        Helper function which draws from the t_ij full conditional.
        Returns the drawn t, updates the counts and the samples matrices.
        If the drawn t is unused for j, will also draw a k.
        """
        ## CURRENTLY IMPLEMENTED IN GIBBS (TODO?)
    
    
    def draw_k(self, it, i, Tmax):
        """
        Helper function which draws from the k_jt full conditional.
        Returns the drawn k, updates the counts and the samples matrices.
        """
        ## CURRENTLY IMPLEMENTED IN GIBBS (TODO?)
 
        
    def gibbs(self, x, j, iters, Tmax=5, Kmax=10, verbose=False):
        """
        Runs the Gibbs sampler to generate posterior estimates of t and k.
        x: data matrix, stored row-wise if multidimensional
        j: vector of group labels; must have same #rows as x
        iters: number of iterations to run
        Tmax: maximum number of clusters for each group
        Kmax: maximum number of atoms to draw from base measure H
        
        returns: this CFRP object with post_samples attribute
        """
            
        group_counts = pd.Series(j).value_counts()
        # number of tables cannot exceed size of max group
        J, N = np.max(j) + 1, len(j)
        self.n_ = np.zeros((J, Tmax))
        self.m_ = np.zeros((J, Kmax))
        self.post_samples = np.zeros((iters+1, N, 4), dtype='int')
        self.post_samples[:,:,0] = j
        
        # Set random initial values for t and k assignments
        t0, k0 = self.post_samples[0,:,1], self.post_samples[0,:,2]
        t0[:] = np.random.randint(1, Tmax, size=N)
        self.tk_map_ = np.random.randint(1, Kmax//2, (J, Tmax))
        self.tally_up(it=0, which='n')
        for jj in range(J):
            for tt in np.where(self.n_[jj, :] > 0)[0]:
                #print(f"mapping: {(jj, tt)} -> {self.tk_map_[jj, tt]}")
                k0[np.logical_and(j == jj, t0 == tt)] = self.tk_map_[jj, tt]
        self.tally_up(it=0, which='m')
        
        
        for s in range(iters):
            if verbose: print(f"----------------\n ITERATION {s}\n----------------")
            t_prev, k_prev = self.post_samples[s,:,1], self.post_samples[s,:,2]
            t_next, k_next = self.post_samples[s+1,:,1], self.post_samples[s+1,:,2]
            
            ##############
            t_next[:], k_next[:] = t_prev, k_prev
            # Cycle through each t value of each customer, conditioning on everything
            # Randomize the order in which updates occur
            for i in np.random.permutation(N):
                jj, tt0, kk0 = j[i], t_next[i], k_next[i]
                
                # Get vector of customer f_k values (dependent on model specification)
                old_mixes = self.fk_cust_(i, x, k_next, Kmax, *self.hypers_) 
                new_mixes = self.fk_cust_(i, x, k_next, Kmax, *self.hypers_, new=True) 
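                # Calculate pointwise likelihoods p(x_ji | ...)
                M = np.sum(self.m_)
                Mk = np.sum(self.m_, axis=0)   # number of tables serving k
                # new_mixes repeats the same f_k* value, so one entry is enough;
                # (the @ operator does not accept a scalar operand)
                lik = (old_mixes @ Mk + self.g_ * new_mixes[0]) / (M + self.g_)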
                
                cust_offset = np.zeros(Tmax)
                cust_offset[tt0] = 1
                old_t = (self.n_[jj, :] - cust_offset) * old_mixes[self.tk_map_[jj, :]]      
                new_t = self.a0_ * lik
                # If a table is in use, prob comes from old_t; otherwise, from new_t
                t_used = self.n_[jj, :] > 0
                t_dist = old_t * t_used.astype('int') + new_t * np.logical_not(t_used).astype('int')
                """TEMPORARY FIX (bug should be found later):
                   Remove nans and add epsilon so that distribution is all positive"""
                t_dist[np.isnan(t_dist)] = 0
                t_dist += 1e-6
                
                tt1 = np.random.choice(Tmax, p=t_dist/np.sum(t_dist))
                t_next[i] = tt1
                self.tally_up(it=s+1, which='n')
                
                # If this table was previously unoccupied, we need to select a k
                if self.n_[jj, tt1] == 1 and tt0 != tt1:
                    old_k = np.sum(self.m_, axis=0) * old_mixes
                    new_k = self.g_ * new_mixes
                    k_used = np.sum(self.m_, axis=0) > 0
                    k_dist = old_k * k_used.astype('int') + new_k * np.logical_not(k_used).astype('int')
                    
                    kk1 = np.random.choice(Kmax, p=k_dist/np.sum(k_dist))
                    self.tk_map_[jj, tt1] = kk1
                # The customer always eats the dish served at its (possibly new) table
                k_next[i] = self.tk_map_[jj, tt1]
                self.tally_up(it=s+1, which='m')

                if verbose: print(f"~ customer (j,i) = {(jj,i)}" +
                                  f" moves table: {tt0} -> {t_next[i]}, k: {kk0} -> {k_next[i]}")  
            
            ##############
            # Similarly, cycle through the k values of each table
            j_idx, t_idx = np.where(self.n_ > 0)   # find the occupied tables
            for i in np.random.permutation(len(j_idx)):
                jj, tt = j_idx[i], t_idx[i]
                kk0 = self.tk_map_[jj, tt]
                
                # Get vector of table f_k values (dependent on model specification)
                old_mixes = self.fk_tabl_(jj, tt, x, j, t_next, k_next, Kmax, *self.hypers_) 
                new_mixes = self.fk_tabl_(jj, tt, x, j, t_next, k_next, Kmax, *self.hypers_, new=True) 
                
                tabl_offset = np.zeros(Kmax)
                tabl_offset[kk0] = 1
                old_k = (np.sum(self.m_, axis=0) - tabl_offset) * old_mixes
                new_k = self.g_ * new_mixes
                k_used = np.sum(self.m_, axis=0) > 0
                k_dist = old_k * k_used.astype('int') + new_k * np.logical_not(k_used).astype('int')
                """TEMPORARY FIX (bug should be found later):
                   Remove nans and add epsilon so that distribution is all positive"""
                k_dist[np.isnan(k_dist)] = 0
                k_dist += 1e-6
                
                #print(f"{old_k.round(3)}\n{new_k.round(3)}\n{k_used}\n{k_dist.round(3)}")
                kk1 = np.random.choice(Kmax, p=k_dist/np.sum(k_dist))
                self.tk_map_[jj, tt] = kk1
                k_next[np.logical_and(j == jj, t_next == tt)] = kk1
                self.tally_up(it=s+1, which='m')

                if verbose: print(f"~~ table (j,t) = {(jj,tt)} changes dish: {kk0} -> {kk1}")
        
        return self
    

In [162]:
# Simulated data
N = 25
np.random.seed(0)
j = np.random.randint(0, 9, N)
x = np.random.poisson(j, N)
data = np.c_[x, j]

c = CFRP(hypers=(1,10)).gibbs(x[:,None], j, iters=5, verbose = True)

----------------
 ITERATION 0
----------------
~ customer (j,i) = (3, 5) moves table: 4 -> 3, k: 2 -> 1
~ customer (j,i) = (3, 2) moves table: 2 -> 0, k: 2 -> 2
~ customer (j,i) = (0, 1) moves table: 4 -> 5, k: 2 -> 6
~ customer (j,i) = (7, 16) moves table: 3 -> 1, k: 3 -> 2
~ customer (j,i) = (8, 12) moves table: 4 -> 5, k: 2 -> 2
~ customer (j,i) = (1, 13) moves table: 2 -> 1, k: 4 -> 4
~ customer (j,i) = (3, 3) moves table: 2 -> 1, k: 2 -> 2
~ customer (j,i) = (2, 7) moves table: 3 -> 5, k: 4 -> 4
~ customer (j,i) = (0, 23) moves table: 3 -> 2, k: 2 -> 1
~ customer (j,i) = (4, 8) moves table: 5 -> 3, k: 2 -> 2
~ customer (j,i) = (7, 4) moves table: 2 -> 0, k: 2 -> 2
~ customer (j,i) = (7, 15) moves table: 3 -> 5, k: 3 -> 2
~ customer (j,i) = (3, 24) moves table: 1 -> 4, k: 3 -> 3
~ customer (j,i) = (5, 6) moves table: 3 -> 2, k: 2 -> 2
~ customer (j,i) = (8, 20) moves table: 5 -> 4, k: 2 -> 2
~ customer (j,i) = (6, 10) moves table: 2 -> 4, k: 2 -> 2
~ customer (j,i) = (5, 19) moves 
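
Once a sampler has produced a `post_samples` array laid out as described in the class docstring (columns 0 to 2 holding $j$, $t$, $k$), a quick diagnostic is to count the occupied tables and active dishes at each iteration. The helper below is a hedged sketch: `summarize_samples` is a hypothetical name, and the tiny array is made-up toy input used only to show the layout, not output from the run above.

```python
import numpy as np

def summarize_samples(post_samples):
    """Count occupied (j, t) tables and distinct dishes k at every iteration."""
    n_tables, n_dishes = [], []
    for draw in post_samples:
        n_tables.append(len(np.unique(draw[:, 0:2], axis=0)))  # distinct (j, t) pairs
        n_dishes.append(len(np.unique(draw[:, 2])))            # distinct k values
    return np.array(n_tables), np.array(n_dishes)

# Toy input: 2 iterations, 4 customers, columns (j, t, k)
toy = np.array([
    [[0, 1, 2], [0, 1, 2], [1, 1, 3], [1, 2, 2]],
    [[0, 1, 2], [0, 2, 2], [1, 1, 2], [1, 1, 2]],
])
n_tables, n_dishes = summarize_samples(toy)
```

Applied to a fitted object, the call would look like `summarize_samples(c.post_samples)`.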

## *( 5.2 )* Posterior Sampling with Augmented Representation

## *( 5.3 )* Posterior Sampling by Direct Assignment