### Bayesian Naive Bayes

The data set consists of $n$ invoices

$$ ((x^{(1)}, r_1, s_1), ..., (x^{(n)}, r_n, s_n)) $$

$x^{(i)}$ is the vector of word counts in invoice $i$. There are $V$ different words in the data set.

$$ x^{(i)} = (x^{(i)}_1, ..., x^{(i)}_V) $$

$s_i$ is the sender of invoice $i$. There are $S$ different senders in the data set.

$$ s_i \in \{1,...,S\} $$

$r_i$ is the receiver of invoice $i$. There are $R$ different receivers in the data set.

$$ r_i \in \{1,...,R\} $$

The distribution for the receiver $r$ is a categorical distribution parameterized by $\lambda$, the probabilities of $r$ being each of the $R$ different receivers.

$$ 
p(r|\lambda) =
Cat(r|\lambda) =
\prod_{l=1}^R 
\lambda_l^{[l=r]}
$$

The distribution for the sender $s$ is a categorical distribution parameterized by $\theta_r$, the probabilities of $s$ being each of the $S$ different senders given the receiver $r$.

$$ 
p(s|r, \theta) =
Cat(s|r, \theta) =
\prod_{l=1}^R 
\prod_{k=1}^S 
\theta_{lk}^{[l=r][k=s]}
$$

The distribution for the counts of words $x$ in a document is the multinomial distribution parameterized by $\phi_{rsj}$, the probability of word $j$ appearing in documents to receiver $r$ from sender $s$.

$$ 
p(x|s,r,\phi) =
Mult(x|r,s,\phi) =
\frac
{\left(\sum_{j=1}^V x_j\right)!}
{\prod_{j=1}^V x_j!}
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{x_j[l=r][k=s]} 
$$
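To make the generative story concrete, here is a minimal sketch that draws one invoice from the three distributions above. It assumes 0-based indices for receivers, senders and words, and it takes the document length `n_words` as given, since the multinomial only describes how a fixed number of words is split across the vocabulary. The function name and the parameter values are made up for illustration.

```python
import random

def sample_invoice(lam, theta, phi, n_words, rng):
    """Draw (x, r, s) from the model; all indices are 0-based (an assumption)."""
    R, S, V = len(lam), len(theta[0]), len(phi[0][0])
    r = rng.choices(range(R), weights=lam)[0]        # r ~ Cat(lambda)
    s = rng.choices(range(S), weights=theta[r])[0]   # s | r ~ Cat(theta_r)
    x = [0] * V
    # x | r, s ~ Mult(n_words, phi_rs): draw words one by one and count them
    for j in rng.choices(range(V), weights=phi[r][s], k=n_words):
        x[j] += 1
    return x, r, s

rng = random.Random(0)
lam = [0.7, 0.3]                                  # R = 2 receivers
theta = [[0.5, 0.5], [0.1, 0.9]]                  # S = 2 senders per receiver
phi = [[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],        # V = 3 words
       [[0.1, 0.1, 0.8], [1 / 3, 1 / 3, 1 / 3]]]
x, r, s = sample_invoice(lam, theta, phi, n_words=20, rng=rng)
print(x, r, s)
```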

For the priors on $\lambda$, $\theta$ and $\phi$ we'll use Dirichlet distributions, as they're the conjugate priors for both the categorical and the multinomial distribution. The priors are parameterized by the hyperparameters $\eta$, $\alpha$ and $\beta$ respectively.

$$
p(\lambda) = 
Dir(\lambda|\eta) \propto
\prod_{l=1}^R \lambda_l^{\eta_l-1}
$$

$$ 
p(\theta) = 
\prod_{l=1}^R
Dir(\theta_l|\alpha_l) \propto
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk}-1} 
$$

$$ 
p(\phi) =
\prod_{l=1}^R
\prod_{k=1}^S 
Dir(\phi_{lk}|\beta_{lk}) \propto
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj}-1} 
$$

We'll find the posterior of the parameters first

$$
p(\lambda, \theta, \phi | D) \propto p(\lambda)p(\theta)p(\phi)p(D|\theta, \phi, \lambda)
$$

Assuming the data is i.i.d

$$
p(\lambda, \theta, \phi | D) \propto p(\lambda)p(\theta)p(\phi)\prod_{i=1}^n p(x^{(i)}|s_i, r_i, \phi)p(s_i|r_i, \theta)p(r_i|\lambda)
$$

Re-arranging

$$
p(\lambda, \theta, \phi | D) \propto
\left[
p(\lambda)
\prod_{i=1}^n 
p(r_i|\lambda)
\right]
\left[
p(\theta)
\prod_{i=1}^n 
p(s_i|r_i, \theta)
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

We'll handle the $\lambda$ bracket first. Inserting the definitions.

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l-1}
\prod_{i=1}^n 
\prod_{l=1}^R \lambda_l^{[l=r_i]}
\right]
\left[
p(\theta)
\prod_{i=1}^n 
p(s_i|r_i, \theta)
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

Introducing $m_l = \sum_{i=1}^n[r_i=l]$, the count of receiver $l$ in the data set

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l-1}
\prod_{l=1}^R \lambda_l^{m_l}
\right]
\left[
p(\theta)
\prod_{i=1}^n 
p(s_i|r_i, \theta)
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

Joining the products

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
p(\theta)
\prod_{i=1}^n 
p(s_i|r_i, \theta)
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

And now for the $\theta$ bracket. Inserting the definitions

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk}-1} 
\prod_{i=1}^n 
\prod_{l=1}^R 
\prod_{k=1}^S 
\theta_{lk}^{[l=r_i][k=s_i]}
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

Introducing $c_{lk} = \sum_{i=1}^n [l=r_i][k=s_i]$, the count of documents to receiver $l$ from sender $k$

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk}-1} 
\prod_{l=1}^R 
\prod_{k=1}^S 
\theta_{lk}^{c_{lk}}
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

Joining the products

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} - 1} 
\right]
\left[
p(\phi)
\prod_{i=1}^n 
p(x^{(i)}|s_i, r_i, \phi)
\right]
$$

And now for the $\phi$ bracket. Inserting the definitions

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} - 1} 
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj}-1} 
\prod_{i=1}^n
\frac
{\left(\sum_{j=1}^V x^{(i)}_j\right)!}
{\prod_{j=1}^V x^{(i)}_j!}
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{x^{(i)}_j[l=r_i][k=s_i]} 
\right]
$$

Dropping the constant term

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} - 1} 
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj}-1} 
\prod_{i=1}^n
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{x^{(i)}_j[l=r_i][k=s_i]} 
\right]
$$

Introducing $w_{lkj} = \sum_{i=1}^n x^{(i)}_j[l=r_i][k=s_i]$, the total count of word $j$ in all documents to receiver $l$ from sender $k$

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} - 1} 
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj}-1} 
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{w_{lkj}} 
\right]
$$

Joining the products

$$
p(\lambda, \theta, \phi | D) \propto
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} - 1} 
\right]
\left[
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj} + w_{lkj} - 1} 
\right]
$$
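The posterior is again a product of Dirichlet distributions, so fitting the model amounts to adding the counts $m_l$, $c_{lk}$ and $w_{lkj}$ to the hyperparameters. A minimal sketch of that update, assuming 0-based receiver and sender indices and nested Python lists for the parameters (the function name and the toy data are hypothetical):

```python
def posterior_params(data, eta, alpha, beta):
    """Return (eta + m, alpha + c, beta + w) for data = [(x, r, s), ...]."""
    R, S, V = len(eta), len(alpha[0]), len(beta[0][0])
    eta_post = list(eta)
    alpha_post = [list(row) for row in alpha]
    beta_post = [[list(beta[l][k]) for k in range(S)] for l in range(R)]
    for x, r, s in data:
        eta_post[r] += 1                 # contributes to m_l
        alpha_post[r][s] += 1            # contributes to c_lk
        for j in range(V):
            beta_post[r][s][j] += x[j]   # contributes to w_lkj
    return eta_post, alpha_post, beta_post

# Toy data set: R = 2 receivers, S = 2 senders, V = 3 words
data = [([2, 0, 1], 0, 1), ([0, 3, 0], 0, 1), ([1, 1, 1], 1, 0)]
eta = [1.0, 1.0]
alpha = [[1.0, 1.0], [1.0, 1.0]]
beta = [[[1.0] * 3 for _ in range(2)] for _ in range(2)]
eta_post, alpha_post, beta_post = posterior_params(data, eta, alpha, beta)
```

Only the cells $(l, k)$ that actually occur in the data move away from the prior; every other cell keeps its original hyperparameters.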

We're interested in the posterior predictive distribution, i.e. given a new input $\tilde{x}$ what is the joint distribution of receivers $\tilde{r}$ and senders $\tilde{s}$, conditioned on the data set $D$.

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
\int
\int
\int
p(\tilde{x}, \tilde{r}, \tilde{s} | \lambda, \theta, \phi, D)
p(\lambda, \theta, \phi | D) 
d\lambda
d\theta 
d\phi
$$

Re-arranging

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
\int
p(\lambda|D)
p(\tilde{r} | \lambda)
d\lambda
\int
p(\theta|D)
p(\tilde{s} | \tilde{r}, \theta)
d\theta 
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

We'll handle the integral over $\lambda$ first. Inserting the definitions

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
\int
\left[
\prod_{l=1}^R \lambda_l^{\eta_l + m_l - 1}
\right]
\left[
\prod_{l=1}^R
\lambda_l^{[l=\tilde{r}]}
\right]
d\lambda
\int
p(\theta|D)
p(\tilde{s} | \tilde{r}, \theta)
d\theta 
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Joining the products

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
\int
\prod_{l=1}^R \lambda_l^{\eta_l + m_l + [l=\tilde{r}] - 1}
d\lambda
\int
p(\theta|D)
p(\tilde{s} | \tilde{r}, \theta)
d\theta 
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

This is an integral over the unnormalized $Dir(\lambda|\eta_l + m_l + [l=\tilde{r}])$ distribution, so it evaluates to that distribution's normalizing constant.

Using $\int \frac{1}{Z}p(x)dx = 1 \implies \int p(x) dx = Z$

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
\frac
{\prod_{l=1}^R\Gamma(\eta_l + m_l + [l=\tilde{r}])}
{\Gamma \left( \sum_{l=1}^R \eta_l + m_l + [l=\tilde{r}] \right)}
\int
p(\theta|D)
p(\tilde{s} | \tilde{r}, \theta)
d\theta 
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Using $\Gamma(x+1) = x\Gamma(x)$

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
\frac
{\eta_\tilde{r} + m_\tilde{r}}
{\sum_{l=1}^R \eta_l + m_l}
\frac
{\prod_{l=1}^R\Gamma(\eta_l + m_l)}
{\Gamma \left( \sum_{l=1}^R \eta_l + m_l\right)}
\int
p(\theta|D)
p(\tilde{s} | \tilde{r}, \theta)
d\theta 
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Dropping the constant terms

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\int
p(\theta|D)
p(\tilde{s} | \tilde{r}, \theta)
d\theta 
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$
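The receiver factor can be checked numerically: the ratio of the two Dirichlet normalizing constants should equal $(\eta_{\tilde{r}} + m_{\tilde{r}}) / \sum_l (\eta_l + m_l)$. The sketch below is only a sanity check under made-up values; `eta_post` stands for the vector $\eta + m$ and the helper names are hypothetical.

```python
import math

def log_dirichlet_norm(a):
    """log of prod_l Gamma(a_l) / Gamma(sum_l a_l)."""
    return sum(math.lgamma(v) for v in a) - math.lgamma(sum(a))

def receiver_predictive(eta_post, r):
    """Ratio of normalizers after adding one observation of receiver r."""
    bumped = list(eta_post)
    bumped[r] += 1
    return math.exp(log_dirichlet_norm(bumped) - log_dirichlet_norm(eta_post))

eta_post = [3.0, 2.0, 0.5]   # eta_l + m_l, illustrative values
probs = [receiver_predictive(eta_post, r) for r in range(len(eta_post))]
closed_form = [v / sum(eta_post) for v in eta_post]
```

The two lists agree up to floating point error, and `probs` sums to one, which is why the denominator can be dropped as a constant.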

Now for the integral over $\theta$. Inserting the definitions

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\int
\left[
\prod_{l=1}^R
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} - 1} 
\right]
\left[
\prod_{l=1}^R 
\prod_{k=1}^S 
\theta_{lk}^{[l=\tilde{r}][k=\tilde{s}]}
\right]
d\theta
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Joining the products, and moving the product over $l$ outside the integral

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\prod_{l=1}^R
\int
\left[
\prod_{k=1}^S 
\theta_{lk}^{\alpha_{lk} + c_{lk} + [l=\tilde{r}][k=\tilde{s}] - 1} 
\right]
d\theta
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

This is an integral over the unnormalized $Dir(\theta_l|\alpha_l+c_l+[l=\tilde{r}][k=\tilde{s}])$

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\prod_{l=1}^R
\frac
{\prod_{k=1}^S\Gamma(\alpha_{lk}+c_{lk}+[l=\tilde{r}][k=\tilde{s}])}
{\Gamma \left( \sum_{k=1}^S \alpha_{lk}+c_{lk}+[l=\tilde{r}][k=\tilde{s}] \right)}
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Splitting the sum in the denominator

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\prod_{l=1}^R
\frac
{\prod_{k=1}^S\Gamma(\alpha_{lk}+c_{lk}+[l=\tilde{r}][k=\tilde{s}])}
{\Gamma \left( \sum_{k=1}^S (\alpha_{lk}+c_{lk})+\sum_{k=1}^S[l=\tilde{r}][k=\tilde{s}] \right)}
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Using $\Gamma(x+n) = x^{(n)}\Gamma(x)$, where $x^{(n)} = x(x+1)\cdots(x+n-1)$ denotes the rising factorial

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\prod_{l=1}^R
\frac
{\prod_{k=1}^S (\alpha_{lk}+c_{lk})^{([l=\tilde{r}][k=\tilde{s}])}}
{\left( \sum_{k=1}^S \alpha_{lk}+c_{lk}\right)^{(\sum_{k=1}^S[l=\tilde{r}][k=\tilde{s}])}}
\frac
{\prod_{k=1}^S\Gamma(\alpha_{lk}+c_{lk})}
{\Gamma \left( \sum_{k=1}^S (\alpha_{lk}+c_{lk})\right)}
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$
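The rising factorial identity used in this step is easy to verify for a concrete case. A small check, assuming a non-negative integer $n$ (the helper name is hypothetical):

```python
import math

def rising_factorial(x, n):
    """x (x+1) ... (x+n-1); the empty product for n = 0 is 1."""
    out = 1.0
    for i in range(n):
        out *= x + i
    return out

x, n = 2.5, 4
lhs = math.gamma(x + n)
rhs = rising_factorial(x, n) * math.gamma(x)
```

The cases $n = 0$ and $n = 1$ give $1$ and $x$ respectively, which is what the following steps rely on.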

Dropping the constant term

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\prod_{l=1}^R
\frac
{\prod_{k=1}^S (\alpha_{lk}+c_{lk})^{([l=\tilde{r}][k=\tilde{s}])}}
{\left( \sum_{k=1}^S \alpha_{lk}+c_{lk}\right)^{(\sum_{k=1}^S[l=\tilde{r}][k=\tilde{s}])}}
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

Evaluating the product over $l$: every factor with $l \neq \tilde{r}$ equals 1, since $x^{(0)}=1$ by definition

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\prod_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})^{([k=\tilde{s}])}}
{\left( \sum_{k=1}^S \alpha_{\tilde{r}k}+c_{\tilde{r}k}\right)^{(\sum_{k=1}^S[k=\tilde{s}])}}
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$

The $\sum_{k=1}^S[k=\tilde{s}]$ in the denominator is equal to 1, and by definition $x^{(1)} = x$. Every factor in the numerator is 1, except for the single case where $k=\tilde{s}$.

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\int
p(\phi|D)
p(\tilde{x} | \tilde{r}, \tilde{s}, \phi)
d\phi
$$
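As a small worked example of the sender factor, with numbers made up for illustration: take $S = 3$, a symmetric prior $\alpha_{\tilde{r}k} = 1$ and counts $(c_{\tilde{r}1}, c_{\tilde{r}2}, c_{\tilde{r}3}) = (3, 0, 1)$ for some receiver $\tilde{r}$. The factor then evaluates to

$$
\frac{\alpha_{\tilde{r}1}+c_{\tilde{r}1}}{\sum_{k=1}^3 (\alpha_{\tilde{r}k}+c_{\tilde{r}k})} = \frac{1+3}{3+4} = \frac{4}{7},
\qquad
\frac{\alpha_{\tilde{r}2}+c_{\tilde{r}2}}{\sum_{k=1}^3 (\alpha_{\tilde{r}k}+c_{\tilde{r}k})} = \frac{1+0}{7} = \frac{1}{7},
\qquad
\frac{\alpha_{\tilde{r}3}+c_{\tilde{r}3}}{\sum_{k=1}^3 (\alpha_{\tilde{r}k}+c_{\tilde{r}k})} = \frac{1+1}{7} = \frac{2}{7}
$$

The three values sum to one, and a sender never seen together with this receiver still gets non-zero mass, which is the smoothing effect of the Dirichlet prior.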

And now for the last integral over $\phi$. Inserting the definitions

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\int
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj} + w_{lkj} - 1}
\frac
{\left(\sum_{j=1}^V \tilde{x}_j\right)!}
{\prod_{j=1}^V \tilde{x}_j!}
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\tilde{x}_j[l=\tilde{r}][k=\tilde{s}]} 
d\phi
$$

Dropping the constant term

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\int
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj} + w_{lkj} - 1}
\prod_{l=1}^R
\prod_{k=1}^S
\prod_{j=1}^V 
\phi_{lkj}^{\tilde{x}_j[l=\tilde{r}][k=\tilde{s}]} 
d\phi
$$

Joining the products, and moving the products over $l$ and $k$ outside the integral

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\prod_{l=1}^R
\prod_{k=1}^S
\int
\prod_{j=1}^V 
\phi_{lkj}^{\beta_{lkj} + w_{lkj} + \tilde{x}_j[l=\tilde{r}][k=\tilde{s}] - 1} 
d\phi
$$

This is an integral over the unnormalized $Dir(\phi_{lk}|\beta_{lk} + w_{lk} + \tilde{x}[l=\tilde{r}][k=\tilde{s}])$ distribution, where $\tilde{x}$ is the full vector of word counts.

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\prod_{l=1}^R
\prod_{k=1}^S
\frac
{\prod_{j=1}^V \Gamma(\beta_{lkj} + w_{lkj} + \tilde{x}_j[l=\tilde{r}][k=\tilde{s}])}
{\Gamma \left( \sum_{j=1}^V \beta_{lkj} + w_{lkj} + \tilde{x}_j[l=\tilde{r}][k=\tilde{s}] \right)}
$$

Splitting the sum in the denominator

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\prod_{l=1}^R
\prod_{k=1}^S
\frac
{\prod_{j=1}^V \Gamma(\beta_{lkj} + w_{lkj} + \tilde{x}_j[l=\tilde{r}][k=\tilde{s}])}
{\Gamma \left( \sum_{j=1}^V (\beta_{lkj} + w_{lkj}) + \sum_{j=1}^V \tilde{x}_j[l=\tilde{r}][k=\tilde{s}] \right)}
$$

Using $\Gamma(x+n) = x^{(n)}\Gamma(x)$

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\prod_{l=1}^R
\prod_{k=1}^S
\frac
{\prod_{j=1}^V (\beta_{lkj} + w_{lkj})^{(\tilde{x}_j[l=\tilde{r}][k=\tilde{s}])}}
{\left(\sum_{j=1}^V \beta_{lkj} + w_{lkj}\right)^{(\sum_{j=1}^V \tilde{x}_j[l=\tilde{r}][k=\tilde{s}])}}
\frac
{\prod_{j=1}^V \Gamma(\beta_{lkj} + w_{lkj})}
{\Gamma \left( \sum_{j=1}^V (\beta_{lkj} + w_{lkj}) \right)}
$$

Dropping the constant term

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\prod_{l=1}^R
\prod_{k=1}^S
\frac
{\prod_{j=1}^V (\beta_{lkj} + w_{lkj})^{(\tilde{x}_j[l=\tilde{r}][k=\tilde{s}])}}
{\left(\sum_{j=1}^V \beta_{lkj} + w_{lkj}\right)^{(\sum_{j=1}^V \tilde{x}_j[l=\tilde{r}][k=\tilde{s}])}}
$$

Evaluating the products

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\frac
{\prod_{j=1}^V (\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j})^{(\tilde{x}_j)}}
{\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j}\right)^{(\sum_{j=1}^V \tilde{x}_j)}}
$$

Using $x^{(n)} = \frac{\Gamma(x+n)}{\Gamma(x)}$

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\frac
{
  \prod_{j=1}^V
  \frac{\Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j)}{\Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j})}
}
{
  \frac{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j\right)}{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j}\right)}
}
$$

Multiplying by the reciprocal of the denominator

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\prod_{j=1}^V
\frac
{\Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j)}
{\Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j})}
\frac
{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j}\right)}
{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j\right)}
$$

Writing the product over $j$ separately in the numerator and the denominator

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\frac
{\prod_{j=1}^V \Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j)}
{\prod_{j=1}^V \Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j})}
\frac
{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j}\right)}
{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j\right)}
$$

Regrouping the fractions

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\frac
{\prod_{j=1}^V \Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j)}
{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j} + \tilde{x}_j\right)}
\frac
{\Gamma\left(\sum_{j=1}^V \beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j}\right)}
{\prod_{j=1}^V \Gamma(\beta_{\tilde{r}\tilde{s}j} + w_{\tilde{r}\tilde{s}j})}
$$

Simplifying, where $B(a) = \frac{\prod_{j=1}^V \Gamma(a_j)}{\Gamma\left(\sum_{j=1}^V a_j\right)}$ is the generalized (multivariate) beta function, and $\beta_{\tilde{r}\tilde{s}}$, $w_{\tilde{r}\tilde{s}}$ and $\tilde{x}$ are vectors over the $V$ words

$$
p(\tilde{r}, \tilde{s} | \tilde{x}, D) \propto
(\eta_\tilde{r} + m_\tilde{r})
\frac
{\alpha_{\tilde{r}\tilde{s}}+c_{\tilde{r}\tilde{s}}}
{\sum_{k=1}^S (\alpha_{\tilde{r}k}+c_{\tilde{r}k})}
\frac
{B\left( \beta_{\tilde{r}\tilde{s}} + w_{\tilde{r}\tilde{s}} + \tilde{x} \right)}
{B\left( \beta_{\tilde{r}\tilde{s}} + w_{\tilde{r}\tilde{s}} \right)}
$$
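The final expression is best evaluated in log space, because the gamma functions overflow quickly for realistic word counts. The following is a minimal sketch, not a reference implementation: it assumes 0-based indices, takes the posterior parameters $\eta + m$, $\alpha + c$ and $\beta + w$ as nested lists, and normalizes over all $(\tilde{r}, \tilde{s})$ pairs at the end. The function names and the numbers are hypothetical.

```python
import math

def log_multi_beta(a):
    """log B(a) = sum_j lgamma(a_j) - lgamma(sum_j a_j)."""
    return sum(math.lgamma(v) for v in a) - math.lgamma(sum(a))

def predictive(x_new, eta_post, alpha_post, beta_post):
    """Normalized p(r, s | x_new, D) as a dict keyed by (r, s)."""
    log_scores = {}
    for r, eta_r in enumerate(eta_post):
        alpha_sum = sum(alpha_post[r])
        for s, alpha_rs in enumerate(alpha_post[r]):
            b = beta_post[r][s]
            bumped = [bj + xj for bj, xj in zip(b, x_new)]
            log_scores[(r, s)] = (
                math.log(eta_r)                                # receiver factor
                + math.log(alpha_rs) - math.log(alpha_sum)     # sender factor
                + log_multi_beta(bumped) - log_multi_beta(b)   # word factor
            )
    top = max(log_scores.values())   # subtract the max before exponentiating
    weights = {key: math.exp(v - top) for key, v in log_scores.items()}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}

eta_post = [3.0, 2.0]
alpha_post = [[1.0, 3.0], [2.0, 1.0]]
beta_post = [[[1.0, 1.0, 1.0], [3.0, 4.0, 2.0]],
             [[2.0, 2.0, 2.0], [1.0, 1.0, 1.0]]]
post = predictive([0, 2, 0], eta_post, alpha_post, beta_post)
best = max(post, key=post.get)
```

With these numbers the most probable pair is receiver 0 and sender 1, the cell whose training documents contained the second word most often. The multinomial coefficient is omitted because it does not depend on $\tilde{r}$ or $\tilde{s}$ and cancels in the normalization.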