# Bayesian Functional Overlapping Clusters

Bayesian functional overlapping clustering provides an alternative to traditional clustering methods in which each observation can belong to multiple clusters. We will use a latent variable $z_{(i,j)} \in \{0, 1\}$ to denote the $i^{th}$ observation's membership in the $j^{th}$ cluster. The model assumes an additive structure of the clusters and assumes that the number of clusters ($K$) is known. We will give guidance on how to pick $K$ using information criteria.

In [1]:
%%javascript
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } }
});


## Latent Factor Model

Let $\mathcal{T}$ be the domain of interest for the functions that we wish to cluster. We will assume that $\mathcal{T}$ is a compact set in $\mathbb{R}$ and that the domain is the same across all of the functions that we wish to model. Let $y_i(\mathbf{t}_i)$, $i = 1, \dots, N$, be the observations, at the observed time points $\mathbf{t}_i$, of the second-order stochastic processes (meaning the mean and covariance functions are continuous) we wish to model. We will assume that $y_i(.) \in L^2(\mathcal{T})$.

We will start by assuming that there are $K$ different clusters of interest. Let $f^{(j)}(.)$, for $j = 1, \dots ,K$, be second-order stochastic processes with mean $\boldsymbol{\mu}^{(j)}(.)$ and covariance kernel $C_{(i,j)}(s_i, t_j) = Cov\left(f^{(i)}(s_i), f^{(j)}(t_j)\right)$ where $s_i, t_j \in \mathcal{T}$. We will assume that $f^{(j)}(.) \in L^2(\mathcal{T})$. Let
$$f(\mathbf{t}) = \left(f^{(1)}(t_1), \dots, f^{(K)}(t_K)\right)'\;\;\; \mathbf{t} \in \mathcal{T}^K$$

Let $\mathcal{H}:= L^2(\mathcal{T}) \times L^2(\mathcal{T}) \times \dots \times L^2(\mathcal{T}) = \left(L^2(\mathcal{T})\right)^K$. Using the following definition of an inner product
$$\langle f, g \rangle = \sum_{j=1}^K \int_{\mathcal{T}} f^{(j)}(t_j) g^{(j)}(t_j)\text{d}t_j \;\;\; f, g \in \mathcal{H},$$


we have that $\mathcal{H}$ is a Hilbert space with respect to $\langle .,. \rangle$. We can define the covariance operator, $\mathcal{K}$, element-wise in the following way:
$$\left(\mathcal{K}f\right)^{(i)}(\mathbf{t}) = \sum_{j=1}^K \int_{\mathcal{T}} C_{(j,i)}(s_j,t_i) f^{(j)}(s_j) \text{d}s_j,$$
where $f \in \mathcal{H}$.

Since we assumed that $f(\mathbf{t})$ is a second-order stochastic process, $\mathcal{K}$ is a linear, self-adjoint, positive, and compact operator (Clara Maria Happ Ch.3 Prop 2). Thus there exists a complete orthonormal basis of eigenfunctions $\boldsymbol{\psi}_m \in \mathcal{H}$ such that
$$\mathcal{K}\boldsymbol{\psi}_m = \lambda_m \boldsymbol{\psi}_m, \text{ where } \lambda_m > 0 \;\forall \;m \text{ and } \lambda_m \rightarrow 0 \text{ as } m \rightarrow \infty$$


Thus, using the Multivariate Karhunen-Loeve Theorem (Clara Maria Happ Ch.3 Proposition 4), we have that 
$$f(\mathbf{t}) = \boldsymbol{\mu}(\mathbf{t}) + \sum_{m=1}^\infty \rho_m \boldsymbol{\psi}_m(\mathbf{t}), \;\;\; \mathbf{t} \in \mathcal{T}^K$$
where $\rho_m = \langle f(\mathbf{t}) - \boldsymbol{\mu}(\mathbf{t}), \boldsymbol{\psi}_m \rangle$ is a zero-mean random variable with $Cov(\rho_m, \rho_n) = \lambda_m \delta_{mn}$. Letting $\chi_m = \langle f(\mathbf{t}) - \boldsymbol{\mu}(\mathbf{t}), \lambda_m^{-1/2}\boldsymbol{\psi}_m \rangle$ and $\boldsymbol{\psi}_m'(\mathbf{t}) = \lambda_m^{1/2}\boldsymbol{\psi}_m(\mathbf{t})$, we get that $Cov(\chi_m, \chi_n) = \delta_{mn}$.

Taking a truncated Multivariate Karhunen-Loeve Expansion, we have
\begin{equation} 
f(\mathbf{t}) \approx \boldsymbol{\mu}(\mathbf{t}) + \sum_{m=1}^M \chi_m \boldsymbol{\psi}'_m(\mathbf{t}), \;\;\; \mathbf{t} \in \mathcal{T}^K  \label{eq1}
\end{equation}


***


We will assume that $f^{(i)}(.)$ are smooth functions that can be approximated by the following basis expansion:

$$f^{(i)}(\mathbf{t}) = \sum_{j=1}^P \theta_{(i,j)} b_{j}(t_i) = B'(t_i)\boldsymbol{\theta}_i, \;\;\; \mathbf{t} \in \mathcal{T}^K$$

Because $f^{(i)}(.)$ can be represented by a basis expansion, we can simplify a few things. The covariance kernel for functions $i$ and $j$ can be represented as
$$C_{(i,j)}(s_i, t_j) = Cov\left(f^{(i)}(s_i), f^{(j)}(t_j)\right) = Cov\left(B'(s_i)\boldsymbol{\theta}_i, B'(t_j)\boldsymbol{\theta}_j\right) = B'(s_i)Cov\left(\boldsymbol{\theta}_i, \boldsymbol{\theta}_j\right)B(t_j)$$
Thus the Covariance Operator $\mathcal{K}$ can be written as
$$\begin{aligned}
\left(\mathcal{K}f\right)^{(i)}(\mathbf{t}) &= \sum_{j=1}^K \int_{\mathcal{T}} C_{(j,i)}(s_j,t_i) f^{(j)}(s_j) \text{d}s_j \\
 & = \sum_{j=1}^K \int_{\mathcal{T}} B'(s_j)Cov\left(\boldsymbol{\theta}_j, \boldsymbol{\theta}_i\right)B(t_i) f^{(j)}(s_j)\text{d}s_j\\
&  =\sum_{j=1}^K \int_{\mathcal{T}}  B'(t_i)Cov\left(\boldsymbol{\theta}_i, \boldsymbol{\theta}_j\right)B(s_j) f^{(j)}(s_j)\text{d}s_j\\
& = B'(t_i) \sum_{j=1}^K \left(Cov\left(\boldsymbol{\theta}_i, \boldsymbol{\theta}_j\right)
\begin{bmatrix}
\int_{\mathcal{T}} b_1(s_j) f^{(j)}(s_j)\text{d}s_j\\
\vdots \\
\int_{\mathcal{T}} b_P(s_j) f^{(j)}(s_j)\text{d}s_j\\
\end{bmatrix}\right)
\end{aligned}$$
Thus looking at the eigen analysis of $\mathcal{K}$, we have
$$\mathcal{K} \mathbf{\Psi}(\mathbf{t})  = \lambda \mathbf{\Psi}(\mathbf{t}) \;\;\; \forall \; \mathbf{t} \in \mathcal{T}^K$$
Element-wise, that implies that
$$B'(t_i) \sum_{j=1}^K \left(Cov\left(\boldsymbol{\theta}_i, \boldsymbol{\theta}_j\right)
\begin{bmatrix}
\int_{\mathcal{T}} b_1(s_j) \mathbf{\Psi}^{(j)}(s_j)\text{d}s_j\\
\vdots \\
\int_{\mathcal{T}} b_P(s_j) \mathbf{\Psi}^{(j)}(s_j)\text{d}s_j\\
\end{bmatrix}\right)  = \lambda \mathbf{\Psi}^{(i)}(\mathbf{t}), \;\;\; i = 1, \dots, K$$
Thus we can see that for $\lambda_j > 0$,  we have that $\mathbf{\Psi}^{(i)}_j(\mathbf{t}) =  B'(t_i) \boldsymbol{\phi}_{ij}$ for some $\boldsymbol{\phi}_{ij} \in \mathbb{R}^P$.

Thus rewriting it in matrix form, we have
$$\mathbf{\Psi}_m = \begin{bmatrix}
\mathbf{\Psi}^{(1)}_m(\mathbf{t}) \\
\vdots \\
\mathbf{\Psi}^{(K)}_m(\mathbf{t}) \\
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{\phi}_{1m}' & \mathbf{0} & \dots & \mathbf{0} \\
\mathbf{0} & \boldsymbol{\phi}_{2m}' & \dots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \dots & \boldsymbol{\phi}_{Km}'  \\
\end{bmatrix} \begin{bmatrix}
B(t_1) \\
\vdots \\
B(t_K)\\
\end{bmatrix} = \boldsymbol{\Phi}'_mS(\mathbf{t})$$



Since we assume that $f^{(i)}(.)$ can be well approximated using a basis expansion, we will also approximate the mean function using these basis functions. Thus we have
$$\boldsymbol{\mu}^{(i)}(\mathbf{t}) = \sum_{j=1}^P \nu_{(i,j)} b_j(t_i) = B'(t_i)\boldsymbol{\nu}_i$$

In matrix form, we have
$$\boldsymbol{\mu}(\mathbf{t}) = \begin{bmatrix}
\boldsymbol{\mu}^{(1)}(\mathbf{t}) \\
\vdots \\
\boldsymbol{\mu}^{(K)}(\mathbf{t}) \\
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{\nu}_1' & \mathbf{0} & \dots & \mathbf{0} \\
\mathbf{0} & \boldsymbol{\nu}_2' & \dots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \dots & \boldsymbol{\nu}_K' \\
\end{bmatrix}\begin{bmatrix}
B(t_1)\\
\vdots \\
B(t_K)\\
\end{bmatrix}
 = \boldsymbol{\nu}'S(\mathbf{t})$$
 
In order to use $\eqref{eq1}$ as the expansion, we will let the $\lambda_m^{1/2}$ term be absorbed into $\boldsymbol{\Phi}_m$. Thus we have:
 $$\begin{equation}
 f(\mathbf{t}) = \boldsymbol{\nu}'S(\mathbf{t}) + \sum_{m=1}^P\chi_m\boldsymbol{\Phi}'_mS(\mathbf{t})
 \label{KL_expansion_f}
 \end{equation}$$
 
We will introduce a set of latent variables $z_{ij} \in \{0, 1\}$ that indicates the $i^{th}$ observation's membership to the $j^{th}$ cluster, for $i =1, \dots, N$ and $j = 1, \dots, K$. Using a truncated version of $\eqref{KL_expansion_f}$ and adding a random error component, we can use the following to model $y_i(t)$:
$$\begin{equation}
y_i(t) = \mathbf{z}_i\left(f_i(\mathbf{t}) \right) + \epsilon_i(t) = \mathbf{z}_i\left(\boldsymbol{\nu}'S(\mathbf{t}) + \sum_{m=1}^M \chi_{im} \boldsymbol{\Phi}_m'S(\mathbf{t}) \right) + \epsilon_i(t)
\end{equation}$$


where $\mathbf{t} = [t, \dots, t]'$, $\mathbf{z}_i$ is a row vector of length $K$, and $\epsilon_i \sim GP\left(\mathbf{0}, K(.,.)\right)$ where $K(x,y) = \sigma^2 \delta_{xy}$ for $i = 1, \dots, N$. By letting $\boldsymbol{\chi}_i \sim \mathcal{N}(0, \mathbf{I}_M)$, we have that $$y_i(t) \sim \mathcal{N}\left( \mathbf{z}_i\boldsymbol{\nu}'S(\mathbf{t}), \sum_{m=1}^M \mathbf{z}_i\left[\boldsymbol{\Phi}_m'S(\mathbf{t})S'(\mathbf{t})\boldsymbol{\Phi}_m\right]\mathbf{z}_i' + \sigma^2\right)$$
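As a rough illustration of the generative model above, the following sketch simulates one curve $y_i(t)$ given the memberships $\mathbf{z}_i$, the mean coefficients $\boldsymbol{\nu}_k$, and the loadings $\boldsymbol{\phi}_{km}$. The basis is not specified in the text, so the piecewise-linear `hat_basis` on $[0, 1]$, the function names, and all numeric values are assumptions made only for this example.

```python
import random


def hat_basis(t, P):
    """Hypothetical basis B(t): P piecewise-linear hat functions on [0, 1]."""
    width = 1.0 / (P - 1)
    return [max(0.0, 1.0 - abs(t - p * width) / width) for p in range(P)]


def simulate_curve(times, z, nu, phi, sigma2, rng):
    """Draw y_i at `times` from the truncated model.

    z[k]      : 0/1 membership indicator for cluster k
    nu[k]     : list of P mean coefficients (nu_k)
    phi[k][m] : list of P coefficients (phi_{km}), lambda^{1/2} already absorbed
    """
    K, P, M = len(z), len(nu[0]), len(phi[0])
    chi = [rng.gauss(0.0, 1.0) for _ in range(M)]  # chi_i ~ N(0, I_M)
    y = []
    for t in times:
        B = hat_basis(t, P)
        mean = 0.0
        for k in range(K):
            if z[k]:  # additive contribution of every cluster the curve belongs to
                for r in range(P):
                    coef = nu[k][r] + sum(chi[m] * phi[k][m][r] for m in range(M))
                    mean += coef * B[r]
        y.append(mean + rng.gauss(0.0, sigma2 ** 0.5))  # white-noise error
    return y, chi


rng = random.Random(0)
nu = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.0]]          # K = 2, P = 3 (illustrative)
phi = [[[0.3, 0.0, -0.3]], [[0.1, 0.1, 0.1]]]     # M = 1
y, chi = simulate_curve([0.0, 0.25, 0.5, 1.0], [1, 1], nu, phi, 0.01, rng)
```

Note that the same scores $\chi_{im}$ are shared across all clusters the curve belongs to, which matches the additive structure of the model.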

***

If we wanted the covariance of $\mathbf{y}_i(\mathbf{t})|\mathbf{z}_i, \boldsymbol{\Phi}_m', \boldsymbol{\nu}$, where $\mathbf{t} = (t_1, \dots, t_n)$, we would have to first find the covariance matrix of $\left[\mathbf{z}_i(f_i(\mathbf{t}_1)), \dots , \mathbf{z}_i(f_i(\mathbf{t}_n)) \right]$.
The elements can be computed by letting
$$\begin{aligned}
Var(\mathbf{z}_if_i(\mathbf{t}_j)) & = \sum_{m=1}^M \mathbf{z}_i \left[ \boldsymbol{\Phi}_m' S(\mathbf{t}_j)S'(\mathbf{t}_j) \boldsymbol{\Phi}_m\right]\mathbf{z}_i'\\
Cov\left(\mathbf{z}_if_i(\mathbf{t}_j), \mathbf{z}_if_i(\mathbf{t}_k)\right) & = \sum_{m=1}^M \mathbf{z}_i \left[ \boldsymbol{\Phi}_m' S(\mathbf{t}_j)S'(\mathbf{t}_k) \boldsymbol{\Phi}_m\right]\mathbf{z}_i'
\end{aligned}$$

Letting $\boldsymbol{\Sigma}$ be the covariance matrix of $\left[\mathbf{z}_i(f_i(\mathbf{t}_1)), \dots , \mathbf{z}_i(f_i(\mathbf{t}_n)) \right]$, we have that
$$Cov(\mathbf{y}_i(\mathbf{t})|\mathbf{z}_i, \boldsymbol{\Phi}_m', \boldsymbol{\nu}) = \boldsymbol{\Sigma} + \sigma^2\mathbf{I}_n$$
***

Since our main goal for this paper is to produce overlapping clusters rather than to perform functional principal component analysis, we will relax the restriction that $\langle \boldsymbol{\Phi}'_m S(\mathbf{t}) , \boldsymbol{\Phi}'_n S(\mathbf{t}) \rangle = 0$ for $m \ne n$. However, since $\boldsymbol{\Phi}'_m$ absorbed the $\lambda_m^{1/2}$ term, we know that $||\boldsymbol{\Phi}_m|| > ||\boldsymbol{\Phi}_n||$ for $\lambda_m > \lambda_n$. Thus we will use an extension of the multiplicative gamma prior (Bhattacharya and Dunson).

Let $\phi_{jrm}$ be the $r^{th}$ element of $\boldsymbol{\phi}_{jm}$ for $j = 1, \dots, K$, $r = 1, \dots, P$, and $m = 1, \dots, M$.
Thus we have the following:

$$\phi_{jrm}|\gamma_{jrm}, \tilde{\tau}_{m} \sim \mathcal{N}(0, \gamma_{jrm}^{-1} \tilde{\tau}_{m}^{-1}); \;\;\; j = 1, \dots, K, \;\;\;r = 1, \dots, P,
\;\;\; m = 1, \dots, M$$
$$\gamma_{jrm} \sim Gamma(\nu_{\gamma}/2, \nu_{\gamma}/2); \;\;\; j = 1,\dots, K, \;\;\; r = 1, \dots, P,\;\;\; m = 1,\dots, M$$
$$\tilde{\tau}_{m} = \prod_{n=1}^m \delta_{n}$$
$$\delta_{1} \sim Gamma(a_{1}, 1)$$
$$\delta_{j} \sim Gamma(a_{2}, 1); \;\;\; j = 2, \dots, M$$
$$a_{1} \sim Gamma(\alpha_{1}, \beta_{1})$$
$$a_{2} \sim Gamma(\alpha_{2}, \beta_{2})$$

In Dunson, $\alpha_{i}= \alpha_{i}^*= 2$, $\beta_{i} = \beta_{i}^* = 1$, and $\nu_{\gamma} = 3$.
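To make the shrinkage structure of this prior concrete, here is a minimal sketch that draws the loadings $\phi_{jrm}$ from the prior. It assumes the Gamma distributions are parameterized by shape and rate, and it treats $a_1$ and $a_2$ as fixed inputs rather than drawing them from their hyperpriors; the function name and the array layout `phi[j][r][m]` are illustrative choices.

```python
import random


def sample_mgp_loadings(K, P, M, a1, a2, nu_gamma, rng):
    """Draw (delta, tau_tilde, phi) from the multiplicative gamma prior."""
    # random.gammavariate takes (shape, scale), so a rate of 1 is a scale of 1.
    delta = [rng.gammavariate(a1, 1.0)]
    delta += [rng.gammavariate(a2, 1.0) for _ in range(M - 1)]

    # tau_tilde_m is the running product delta_1 * ... * delta_m.
    tau = []
    running = 1.0
    for d in delta:
        running *= d
        tau.append(running)

    phi = [[[0.0] * M for _ in range(P)] for _ in range(K)]
    for j in range(K):
        for r in range(P):
            for m in range(M):
                # Gamma(nu/2, rate nu/2) corresponds to scale 2/nu.
                gamma_jrm = rng.gammavariate(nu_gamma / 2.0, 2.0 / nu_gamma)
                sd = (gamma_jrm * tau[m]) ** -0.5  # prior precision gamma * tau_tilde
                phi[j][r][m] = rng.gauss(0.0, sd)
    return delta, tau, phi


rng = random.Random(1)
delta, tau, phi = sample_mgp_loadings(K=2, P=3, M=4, a1=2.0, a2=3.0,
                                      nu_gamma=3.0, rng=rng)
```

When the $\delta_n$ tend to exceed one, $\tilde{\tau}_m$ grows with $m$, so later components are shrunk more strongly toward zero.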

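For the mean parameter, we will use a prior that penalizes the squared differences between adjacent mean coefficients, since $\boldsymbol{\nu}_l'\mathbf{P}\boldsymbol{\nu}_l = \sum_{r=1}^{P-1}(\nu_{(r+1)l} - \nu_{rl})^2$ for the matrix $\mathbf{P}$ given below. This in turn will help ensure that the mean parameters are not over-fit and will favor smoother mean functions. Thus we have the following: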
$$P(\boldsymbol{\nu}_l|\tau_l) \propto exp\left(-\frac{\tau_l}{2}\boldsymbol{\nu}_l'\mathbf{P}\boldsymbol{\nu}_l\right); \;\;\; l = 1,\dots, K$$
$$\tau_l \sim Gamma(\alpha, \beta); \;\;\; l = 1,\dots, K$$
where $\mathbf{P}= \begin{bmatrix}
1 & -1 & 0 &  & \\
-1 & 2 & -1 &  &  \\
 & \ddots & \ddots & \ddots&  \\
 &  & -1 & 2 & -1  \\
 &  &  & -1 & 1 \\
\end{bmatrix}$

In this setting, $\tau_l$ acts as a tuning parameter for the penalty that we put on the coefficients (similar to $\lambda$ in ridge regression).
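The quadratic form $\boldsymbol{\nu}_l'\mathbf{P}\boldsymbol{\nu}_l$ with the matrix $\mathbf{P}$ above equals the sum of squared differences between adjacent coefficients. The short check below builds $\mathbf{P}$ and verifies this numerically; the helper names and the coefficient values are made up for illustration.

```python
def first_difference_penalty(size):
    """Build the (size x size) penalty matrix P as nested lists."""
    P = [[0.0] * size for _ in range(size)]
    for r in range(size - 1):
        # Each adjacent pair (r, r + 1) contributes (nu_r - nu_{r+1})^2.
        P[r][r] += 1.0
        P[r + 1][r + 1] += 1.0
        P[r][r + 1] -= 1.0
        P[r + 1][r] -= 1.0
    return P


def quadratic_form(nu, P):
    """Return nu' P nu."""
    n = len(nu)
    return sum(nu[a] * P[a][b] * nu[b] for a in range(n) for b in range(n))


nu = [0.5, 1.0, 3.0, 2.0]  # hypothetical mean coefficients
P = first_difference_penalty(len(nu))
penalty = quadratic_form(nu, P)
squared_diffs = sum((nu[r + 1] - nu[r]) ** 2 for r in range(len(nu) - 1))
```

A constant coefficient vector has zero penalty, so the prior does not shrink the overall level of the mean function, only its roughness.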

We will start with the case where $K$ is a fixed parameter in the model. Thus we will have the following distribution over $\mathbf{Z} \in \{0, 1\}^{N \times K}$
$$z_{il}|\pi_l \sim Bernoulli(\pi_l) \;\; i = 1,\dots, N \;\; l = 1, \dots, K$$
$$\pi_l \sim Beta(\alpha_3/K, 1)\;\; l = 1, \dots, K$$

The last parameter that has not yet been assigned a prior is $\sigma^2$. We will let $\sigma^2$ have the following prior:
$$\sigma^2 \sim IG(\alpha_0, \beta_0)$$
***

Let $\boldsymbol{\zeta}$ be the collection of all the random variables in this model, and let $\boldsymbol{\zeta}_{-\theta}$ be the collection of all random variables in the model except $\theta$. Let $\mathbf{t}_i = (t_{i1}, \dots, t_{in_i})'$ be the observed time points of the $i^{th}$ function. Let $\mathbf{t}_i^* = (t_{i1}^*, \dots, t_{in_i^*}^*)'$ be the unobserved time points at which we want to conduct inference. Thus we have
$$\boldsymbol{\phi}_{jm} | \boldsymbol{\zeta}_{-\boldsymbol{\phi}_{jm}} = \boldsymbol{\phi}_{jm}| y_i(\mathbf{t}_i), y_i(\mathbf{t}^*_i) , \gamma_{jrm},  \tilde{\tau}_m, \mathbf{z}_i, \boldsymbol{\chi}_i, \boldsymbol{\nu}, \boldsymbol{\phi}_{j'm'}, \sigma^2;\;\;\; i = 1, \dots, N  \;\;\; r = 1, \dots, P\;\;\; (j, m) \ne (j', m')$$


$$\begin{aligned}
p\left(\boldsymbol{\phi}_{jm} | \boldsymbol{\zeta}_{-\boldsymbol{\phi}_{jm}}\right) & \propto  \prod_{i = 1}^N \prod_{l=1}^{n_i} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}) -  \sum_{k=1}^K z_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il})\right)\right)^2\right\}\\
& \times \prod_{i = 1}^N \prod_{l=1}^{n_i^*} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}^*) -  \sum_{k=1}^Kz_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}^*) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*)\right)\right)^2\right\}\\
& \times exp \left\{-\frac{1}{2}\boldsymbol{\phi}_{jm}'\mathbf{D}_{jm}^{-1}\boldsymbol{\phi}_{jm} \right\}\\
& \propto \prod_{i = 1}^N \prod_{l=1}^{n_i} exp\left\{-\frac{1}{2\sigma^2}\left(\boldsymbol{\phi}_{jm}'\left(z_{ij}\chi^2_{im} B(t_{il})B'(t_{il}) \right)\boldsymbol{\phi}_{jm}  \right. \right.  \\
 & \left. \left.-2\boldsymbol{\phi}'_{jm}B(t_{il})\chi_{im} \left(y_i(t_{il})z_{ij} -  z_{ij} \boldsymbol{\nu}_{j}'B(t_{il}) - z_{ij}\sum_{n \ne m}\left[\chi_{in} \boldsymbol{\phi}_{jn}' B(t_{il})\right] - z_{ij}\sum_{k \ne j} z_{ik}\left[\boldsymbol{\nu}_{k}' B(t_{il}) + \sum_{n=1}^M \chi_{in} \boldsymbol{\phi}_{kn}'B(t_{il}) \right] \right)\right)\right\}\\
 & \times \prod_{i = 1}^N \prod_{l=1}^{n_i^*} exp\left\{-\frac{1}{2\sigma^2}\left(\boldsymbol{\phi}_{jm}'\left(z_{ij}\chi^2_{im} B(t_{il}^*)B'(t_{il}^*) \right)\boldsymbol{\phi}_{jm}  \right. \right.  \\
 & \left. \left.-2\boldsymbol{\phi}'_{jm}B(t_{il}^*)\chi_{im} \left(y_i(t_{il}^*)z_{ij} -  z_{ij} \boldsymbol{\nu}_{j}'B(t_{il}^*) - z_{ij}\sum_{n \ne m}\left[\chi_{in} \boldsymbol{\phi}_{jn}' B(t_{il}^*)\right] - z_{ij}\sum_{k \ne j} z_{ik}\left[\boldsymbol{\nu}_{k}' B(t_{il}^*) + \sum_{n=1}^M \chi_{in} \boldsymbol{\phi}_{kn}'B(t_{il}^*) \right] \right)\right)\right\}\\
& \times exp \left\{-\frac{1}{2}\boldsymbol{\phi}_{jm}'\mathbf{D}_{jm}^{-1}\boldsymbol{\phi}_{jm} \right\}\\
\end{aligned}$$


where $\mathbf{D}_{jm} = \tilde{\tau}_m^{-1} diag\left(\gamma_{j1m}^{-1}, \dots, \gamma_{jPm}^{-1}\right)$. The second proportionality uses $z_{ij}^2 = z_{ij}$ and drops every term that does not involve $\boldsymbol{\phi}_{jm}$.
Letting 
$$\mathbf{m}_1 = \frac{1}{\sigma^2} \sum_{i=1}^N \sum_{l = 1}^{n_i}\left(B(t_{il})\chi_{im} \left(y_i(t_{il})z_{ij} -  z_{ij} \boldsymbol{\nu}_{j}'B(t_{il}) - z_{ij}\sum_{n \ne m}\left[\chi_{in} \boldsymbol{\phi}_{jn}' B(t_{il})\right] - z_{ij}\sum_{k \ne j} z_{ik}\left[\boldsymbol{\nu}_{k}' B(t_{il}) + \sum_{n=1}^M \chi_{in} \boldsymbol{\phi}_{kn}'B(t_{il}) \right] \right) \right)$$
$$\mathbf{m}_2 = \frac{1}{\sigma^2} \sum_{i=1}^N \sum_{l = 1}^{n_i^*}\left(B(t_{il}^*)\chi_{im} \left(y_i(t_{il}^*)z_{ij} -  z_{ij} \boldsymbol{\nu}_{j}'B(t_{il}^*) - z_{ij}\sum_{n \ne m}\left[\chi_{in} \boldsymbol{\phi}_{jn}' B(t_{il}^*)\right] - z_{ij}\sum_{k \ne j} z_{ik}\left[\boldsymbol{\nu}_{k}' B(t_{il}^*) + \sum_{n=1}^M \chi_{in} \boldsymbol{\phi}_{kn}'B(t_{il}^*) \right] \right) \right)$$ 
$$\mathbf{M}^{-1} = \frac{1}{\sigma^2}\sum_{i=1}^N \sum_{l = 1 }^{n_i} \left(z_{ij}\chi_{im}^2B(t_{il})B'(t_{il})\right) + \frac{1}{\sigma^2}\sum_{i=1}^N \sum_{l = 1 }^{n_i^*} \left(z_{ij}\chi_{im}^2B(t_{il}^*)B'(t_{il}^*)\right) + \mathbf{D}_{jm}^{-1}$$
and $\mathbf{m} = \mathbf{m}_1 + \mathbf{m}_2$, we have
$$\boldsymbol{\phi}_{jm} | \boldsymbol{\zeta}_{-\boldsymbol{\phi}_{jm}} \sim \mathcal{N}(\mathbf{M}\mathbf{m}, \mathbf{M})$$


***
$$\delta_1 | \boldsymbol{\zeta}_{-\delta_1} = \delta_1| \delta_{2}, \dots, \delta_{M}, \boldsymbol{\Phi}_1, \dots, \boldsymbol{\Phi}_M, \boldsymbol{\Gamma}_{1}, \dots,\boldsymbol{\Gamma}_{M}, a_1$$

where $\boldsymbol{\Gamma}_{i} = \{ \gamma_{n, m, i}| 1 \le n \le K, 1 \le m \le P\}$.


$$\begin{aligned}
\delta_1 | \boldsymbol{\zeta}_{-\delta_1} & \propto \delta_1^{a_1 -1}exp\{-\delta_1\}  \\
& \times \prod_{k =1}^K \prod_{r = 1}^{P} \prod_{m =1}^M (\gamma_{k,r,m}\prod_{i=1}^m\delta_i)^{1/2} exp \left\{ -\frac{\gamma_{k,r,m}\prod_{i=1}^m\delta_i}{2}(\phi_{k,r, m})^2\right\} \\
& \propto \delta_1^{a_1 -1}exp\{-\delta_1\} (\delta_1)^{(PMK/ 2)}exp \left\{-\delta_1\left(\frac{1}{2}\sum_{k = 1}^K \sum_{r=1}^P \gamma_{k,r,1}\phi_{k,r,1}^2 + \frac{1}{2}\sum_{m=2}^M\sum_{k=1}^K \sum_{r=1}^P \gamma_{k,r,m}\phi_{k,r,m}^2\left( \prod_{j=2}^m \delta_{j} \right)\right) \right\}
\end{aligned}$$

Thus we can see that $$\delta_1 | \boldsymbol{\zeta}_{-\delta_1} \sim Gamma\left(a_1 + (PMK/2), 1 + \frac{1}{2}\sum_{k = 1}^K \sum_{r=1}^P \gamma_{k,r,1}\phi_{k,r,1}^2 + \frac{1}{2}\sum_{m=2}^M\sum_{k=1}^K \sum_{r=1}^P \gamma_{k,r,m}\phi_{k,r,m}^2\left( \prod_{j=2}^m \delta_{j} \right)\right)$$

***
$$\delta_i | \boldsymbol{\zeta}_{-\delta_i} = \delta_i| \delta_{1}, \dots, \delta_{i-1}, \delta_{i+1}, \dots, \delta_{M}, \boldsymbol{\Phi}_i, \dots, \boldsymbol{\Phi}_M, \boldsymbol{\Gamma}_{i}, \dots,\boldsymbol{\Gamma}_{M}, a_2; \;\;\; i =2,  \dots, M$$

$$\begin{aligned}
\delta_i | \boldsymbol{\zeta}_{-\delta_i} & \propto \delta_i^{a_2 -1}exp\{-\delta_i\}  \\
& \times \prod_{k =1}^K \prod_{r = 1}^{P} \prod_{m =i}^M (\gamma_{k,r,m}\prod_{n=1}^m\delta_n)^{1/2} exp \left\{ -\frac{\gamma_{k,r,m}\prod_{n=1}^m\delta_n}{2}(\phi_{k,r, m})^2\right\} \\
& \propto \delta_i^{a_2 -1}exp\{-\delta_i\} (\delta_i)^{(P(M-i+1)K/ 2)}exp \left\{-\delta_i\left(\frac{1}{2}\sum_{m = i}^M\sum_{k=1}^K \sum_{r=1}^P \gamma_{k,r,m}\phi_{k,r,m}^2\left( \prod_{j=1; j \ne i}^m \delta_{j} \right)\right) \right\}
\end{aligned}$$

Thus we can see that $$\delta_i | \boldsymbol{\zeta}_{-\delta_i} \sim Gamma\left(a_2 + (P(M - i + 1)K/2), 1 + \frac{1}{2}\sum_{m = i}^M\sum_{k=1}^K \sum_{r=1}^P \gamma_{k,r,m}\phi_{k,r,m}^2\left( \prod_{j=1; j \ne i}^m \delta_{j} \right)\right)$$
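Both Gamma full conditionals for $\delta_1$ and $\delta_i$, $i \ge 2$, share one form, so a single routine can serve both. The sketch below computes the shape and rate and then draws a value; the function names and the layout `gamma[k][r][m]`, `phi[k][r][m]` (with zero-based storage for $m$) are assumptions of this example.

```python
import random


def delta_conditional_params(i, delta, gamma, phi, a1, a2):
    """Shape and rate of the Gamma full conditional of delta_i (i is 1-based)."""
    K, P, M = len(phi), len(phi[0]), len(delta)
    shape = (a1 if i == 1 else a2) + P * K * (M - i + 1) / 2.0
    rate = 1.0
    for m in range(i, M + 1):  # only components m >= i involve delta_i
        prod_others = 1.0
        for j in range(1, m + 1):
            if j != i:
                prod_others *= delta[j - 1]
        weighted_ss = sum(gamma[k][r][m - 1] * phi[k][r][m - 1] ** 2
                          for k in range(K) for r in range(P))
        rate += 0.5 * prod_others * weighted_ss
    return shape, rate


def sample_delta(i, delta, gamma, phi, a1, a2, rng):
    """Draw delta_i from its full conditional."""
    shape, rate = delta_conditional_params(i, delta, gamma, phi, a1, a2)
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate expects a scale


# Tiny illustrative configuration: K = 1, P = 1, M = 2.
phi = [[[2.0, 2.0]]]
gamma = [[[1.0, 1.0]]]
delta = [3.0, 5.0]
params_1 = delta_conditional_params(1, delta, gamma, phi, a1=2.0, a2=3.0)
params_2 = delta_conditional_params(2, delta, gamma, phi, a1=2.0, a2=3.0)
new_delta_1 = sample_delta(1, delta, gamma, phi, 2.0, 3.0, random.Random(0))
```

For $i = 1$ the inner product over $j \ne i$ is empty when $m = 1$, which reproduces the first sum in the rate of the $\delta_1$ conditional.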

***
$$a_{1}| \boldsymbol{\zeta}_{-a_{1}} = a_{1} | \delta_{1}$$

$$P(a_{1}|\delta_{1}) \propto \frac{1}{\Gamma(a_{1})}\delta_{1}^{a_{1} -1} a_{1}^{\alpha_{1} -1} exp \left\{-a_{1}\beta_{1} \right\}$$


Since this is not the kernel of a known distribution, we will use a Metropolis-Hastings step. Consider the proposal distribution $Q(a_{1}'| a_{1}) = \mathcal{N}\left(a_{1}, \epsilon\beta_{1}^{-1}, 0, + \infty\right)$ (Truncated Normal) for some small $\epsilon > 0$. Thus the probability of accepting any step is

$$A(a_{1}',a_{1}) = \min \left\{1, \frac{P\left(a_{1}'| \boldsymbol{\zeta}_{-a_{1}}\right)}{P\left(a_{1}| \boldsymbol{\zeta}_{-a_{1}}\right)} \frac{Q\left(a_{1}|a_{1}'\right)}{Q\left(a_{1}'|a_{1}\right)}\right\}$$

***


$$a_{2}| \boldsymbol{\zeta}_{-a_{2}} = a_{2}| \delta_{2}, \dots, \delta_{M}$$
$$P(a_{2} | \delta_{2}, \dots, \delta_{M}) \propto \frac{1}{\Gamma(a_{2})^{M-1}}\left(\prod_{i=2}^M\delta_{i}^{a_{2} -1}\right) a_{2}^{\alpha_{2} -1} exp \left\{-a_{2}\beta_{2} \right\}$$

Since this is not the kernel of a known distribution, we will use a Metropolis-Hastings step. Consider the proposal distribution $Q(a_{2}'| a_{2}) = \mathcal{N}\left(a_{2}, \epsilon\beta_{2}^{-1}, 0, + \infty\right)$ (Truncated Normal) for some small $\epsilon > 0$. Thus the probability of accepting any step is

$$A(a_{2}',a_{2}) = \min \left\{1, \frac{P\left(a_{2}'| \boldsymbol{\zeta}_{-a_{2}}\right)}{P\left(a_{2}| \boldsymbol{\zeta}_{-a_{2}}\right)} \frac{Q\left(a_{2}|a_{2}'\right)}{Q\left(a_{2}'|a_{2}\right)}\right\}$$
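The Metropolis-Hastings steps for $a_1$ and $a_2$ differ only in which $\delta$'s enter the target, so one routine can cover both. The following is a sketch under two assumptions that the text does not pin down: the second argument of the truncated normal, $\epsilon\beta^{-1}$, is read as a variance, and the truncated proposal is drawn by simple rejection. The function names are illustrative.

```python
import math
import random


def log_target(a, deltas, alpha, beta):
    """Log of the unnormalized conditional of a shape parameter.

    Pass deltas = [delta_1] for a_1 and deltas = [delta_2, ..., delta_M] for a_2.
    """
    n = len(deltas)
    return (-n * math.lgamma(a)
            + (a - 1.0) * sum(math.log(d) for d in deltas)
            + (alpha - 1.0) * math.log(a)
            - beta * a)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mh_step_shape(a, deltas, alpha, beta, eps, rng):
    """One Metropolis-Hastings update with a normal proposal truncated to (0, inf)."""
    sd = math.sqrt(eps / beta)  # assumption: eps / beta is the proposal variance
    while True:  # rejection sampling from the truncated normal
        a_new = rng.gauss(a, sd)
        if a_new > 0.0:
            break
    # The Gaussian kernels cancel; only the truncation normalizers remain:
    # Q(a | a_new) / Q(a_new | a) = Phi(a / sd) / Phi(a_new / sd).
    log_q_ratio = math.log(normal_cdf(a / sd)) - math.log(normal_cdf(a_new / sd))
    log_accept = (log_target(a_new, deltas, alpha, beta)
                  - log_target(a, deltas, alpha, beta)
                  + log_q_ratio)
    if math.log(1.0 - rng.random()) < log_accept:
        return a_new
    return a


rng = random.Random(3)
a = 2.0
draws = []
for _ in range(500):
    a = mh_step_shape(a, [1.5, 0.7, 2.2], alpha=2.0, beta=1.0, eps=0.5, rng=rng)
    draws.append(a)
```

Working on the log scale avoids overflow in $\Gamma(a)$, and the proposal correction matters most when the current value is close to zero.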

***

$$\gamma_{j,r,m}| \boldsymbol{\zeta}_{-\gamma_{j,r,m}} = \gamma_{j,r,m}| \phi_{j,r,m}, \tilde{\tau}_{m}$$

$$\begin{aligned}
P(\gamma_{j,r,m}| \boldsymbol{\zeta}_{-\gamma_{j,r,m}}) & \propto (\gamma_{j,r,m}\tilde{\tau}_m)^{1/2}exp \left\{-\frac{\gamma_{j,r,m}\tilde{\tau}_{m}}{2}\phi_{j,r,m}^2 \right\} \\
& \times \gamma_{j,r,m}^{(\nu_\gamma /2) - 1} exp\{-\gamma_{j,r,m}\nu_\gamma / 2\}\\
& \propto \gamma_{j,r,m}^{((\nu_\gamma + 1)/2) - 1}exp\left\{-\gamma_{j,r,m} \left(\frac{\phi_{j,r,m}^2\tilde{\tau}_m + \nu_\gamma}{2} \right)\right\}
\end{aligned}$$

Thus we can see that 
$$\gamma_{j,r,m}| \boldsymbol{\zeta}_{-\gamma_{j,r,m}} \sim Gamma\left(\frac{\nu_\gamma + 1}{2},\frac{\phi_{j,r,m}^2\tilde{\tau}_m + \nu_\gamma}{2} \right)$$
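This conjugate update is a single Gamma draw per element. A minimal sketch, assuming the shape and rate parameterization used above (the function name and the numeric inputs are illustrative):

```python
import random


def sample_local_precision(phi_jrm, tau_m, nu_gamma, rng):
    """Draw gamma_{j,r,m} from its Gamma full conditional."""
    shape = (nu_gamma + 1.0) / 2.0
    rate = (phi_jrm ** 2 * tau_m + nu_gamma) / 2.0
    return rng.gammavariate(shape, 1.0 / rate)  # gammavariate expects a scale


rng = random.Random(7)
# With phi = 1, tau = 2, nu_gamma = 3: shape = 2 and rate = 2.5.
draws = [sample_local_precision(1.0, 2.0, 3.0, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
```

The empirical mean of the draws should be close to shape divided by rate, which gives a quick sanity check on the parameterization.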

***


$$\mathbf{z}_{i}| \boldsymbol{\zeta}_{-\mathbf{z}_{i}} = \mathbf{z}_{i} |\boldsymbol{\nu}, \boldsymbol{\Phi}_1, \dots, \boldsymbol{\Phi}_M, \sigma^2, \chi_{i1}, \dots, \chi_{iM}, \pi_1, \dots, \pi_K, y_{i}(\mathbf{t}), y_i(\mathbf{t}^*); \;\;\; i = 1, \dots, N$$

$$\begin{aligned}
P(\mathbf{z}_{i}| \boldsymbol{\zeta}_{-\mathbf{z}_{i}}) & \propto \prod_{l=1}^K \pi_l^{z_{il}} (1-\pi_l)^{(1 - z_{il})}\\
& \times \prod_{l=1}^{n_i} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}) -  \sum_{k=1}^K z_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il})\right)\right)^2\right\}\\
& \times \prod_{l=1}^{n_i^*} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}^*) -  \sum_{k=1}^Kz_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}^*) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*)\right)\right)^2\right\}
\end{aligned}$$



Since this is not proportional to a known distribution, we will perform a Metropolis-Hastings update. Let the proposal distribution of $z_{ij}'$ be $Q(z_{ij}'|z_{ij}) = Bernoulli(p = z_{ij} \rho + (1-z_{ij})(1-\rho))$, where the $z_{ij}'$ are proposed independently, so that each indicator keeps its current value with probability $\rho$ and flips with probability $1 - \rho$. Because this proposal is symmetric, the proposal ratio cancels and the probability of accepting $\mathbf{z}_{i}'$ is
$$A(\mathbf{z}_i', \mathbf{z}_i)= \min\left\{1, \frac{P(\mathbf{z}_i'| \boldsymbol{\zeta}_{-\mathbf{z}_{i}})}{P(\mathbf{z}_i| \boldsymbol{\zeta}_{-\mathbf{z}_{i}})} \right\}$$
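The membership update can be sketched as follows. The example assumes the per-cluster fitted values $\boldsymbol{\nu}_k'B(t_{il}) + \sum_n \chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il})$ have already been computed and stored in `fitted[k][l]`, and that the observed and imputed time points have been concatenated into one list `y`; these names and the toy data are hypothetical.

```python
import math
import random


def log_conditional_z(z, y, fitted, pi, sigma2):
    """Log of the unnormalized full conditional of z_i."""
    log_prior = sum(math.log(pi[k]) if z[k] else math.log(1.0 - pi[k])
                    for k in range(len(z)))
    sse = 0.0
    for l, y_l in enumerate(y):
        mean_l = sum(z[k] * fitted[k][l] for k in range(len(z)))
        sse += (y_l - mean_l) ** 2
    return log_prior - sse / (2.0 * sigma2)


def mh_step_z(z, y, fitted, pi, sigma2, rho, rng):
    """Keep each indicator with probability rho, flip it otherwise, then accept or reject."""
    z_new = [zk if rng.random() < rho else 1 - zk for zk in z]
    # The proposal is symmetric, so only the target ratio enters.
    log_accept = (log_conditional_z(z_new, y, fitted, pi, sigma2)
                  - log_conditional_z(z, y, fitted, pi, sigma2))
    if math.log(1.0 - rng.random()) < log_accept:
        return z_new
    return z


# Toy data: the curve matches cluster 0 exactly and is far from cluster 1.
y = [1.0, 2.0, 3.0]
fitted = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
pi = [0.5, 0.5]
rng = random.Random(11)
z = [0, 1]
for _ in range(200):
    z = mh_step_z(z, y, fitted, pi, sigma2=0.1, rho=0.5, rng=rng)
```

Smaller values of $\rho$ propose more flips per step, which explores faster but lowers the acceptance rate once the chain sits near a well-supported configuration.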

***

$$\pi_{l}| \boldsymbol{\zeta}_{-\pi_{l}} = \pi_{l} | z_{(1,l)}, \dots, z_{(N,l)}; \;\;\;  l = 1, \dots, K$$

$$\begin{aligned}
P(\pi_{l}| \boldsymbol{\zeta}_{-\pi_{l}}) &\propto \prod_{i=1}^N \left[\left(\pi_l\right)^{z_{il}}\left(1 - \pi_l\right)^{1 - z_{il}}\right]  (\pi_l)^{\alpha_3/K -1}\\
 & \propto \left(\pi_l\right)^{ \left(\alpha_3/K + \sum_{i=1}^N z_{il}\right) -1} \left(1 - \pi_l\right)^{N - \sum_{i=1}^N z_{il}}\\
\end{aligned}$$

Thus we have 
$$\pi_{l}| \boldsymbol{\zeta}_{-\pi_{l}}\sim Beta \left(\alpha_3/K + \sum_{i=1}^N z_{il}, N - \sum_{i=1}^N z_{il} + 1\right)$$
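Since this conditional is conjugate, the update only needs the column sums of $\mathbf{Z}$. A small sketch, with a hypothetical function name and toy membership matrix:

```python
import random


def sample_pi(Z, alpha3, rng):
    """Draw each pi_l from its Beta full conditional given the N x K matrix Z."""
    N, K = len(Z), len(Z[0])
    pi = []
    for l in range(K):
        members = sum(Z[i][l] for i in range(N))  # number of curves in cluster l
        pi.append(rng.betavariate(alpha3 / K + members, N - members + 1.0))
    return pi


# Toy example: every curve is in cluster 0 and none is in cluster 1.
Z = [[1, 0] for _ in range(50)]
pi = sample_pi(Z, alpha3=1.0, rng=random.Random(5))
```

With all fifty curves in the first cluster and none in the second, the draws concentrate near one and near zero respectively.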

***
$$\boldsymbol{\nu}_j| \boldsymbol{\zeta}_{-\boldsymbol{\nu}_j} = \boldsymbol{\nu}_j|\boldsymbol{\nu}_1, \dots , \boldsymbol{\nu}_{j-1}, \boldsymbol{\nu}_{j+1}, \dots, \boldsymbol{\nu}_K, y_1(\mathbf{t}_1), \dots y_N(\mathbf{t}_N), y_1(\mathbf{t}^*_1), \dots, y_N(\mathbf{t}^*_N), \tau_j, \boldsymbol{\Phi}_1, \dots, \boldsymbol{\Phi}_M, \mathbf{z}_1, \dots, \mathbf{z}_N, \sigma^2; \;\;\; j= 1, \dots, K$$

$$\begin{aligned}
P(\boldsymbol{\nu}_j| \boldsymbol{\zeta}_{-\boldsymbol{\nu}_j}) & \propto exp\left(-\frac{\tau_j}{2}\boldsymbol{\nu}_j'\mathbf{P}\boldsymbol{\nu}_j\right)\\
& \times \prod_{i = 1}^N \prod_{l=1}^{n_i} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}) -  \sum_{k=1}^K z_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il})\right)\right)^2\right\}\\
& \times \prod_{i = 1}^N \prod_{l=1}^{n_i^*} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}^*) -  \sum_{k=1}^Kz_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}^*) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*)\right)\right)^2\right\}\\
& \propto exp\left\{- \frac{1}{2}\left(\boldsymbol{\nu}_j'\left(\tau_j\mathbf{P} + \frac{1}{\sigma^2} \sum_{i =1}^N \sum_{l=1}^{n_i}z_{ij}B(t_{il})B'(t_{il}) + \frac{1}{\sigma^2} \sum_{i =1}^N \sum_{l=1}^{n_i^*}z_{ij}B(t_{il}^*)B'(t_{il}^*) \right)\boldsymbol{\nu}_j \right. \right.\\
& \left.\left. -2\boldsymbol{\nu}_j'\left(\frac{1}{\sigma^2}\sum_{i=1}^N\sum_{l=1}^{n_i}z_{ij}B(t_{il})\left(y_i(t_{il}) - \left(\sum_{k\ne j}z_{ik}\boldsymbol{\nu}'_{k}B(t_{il})\right)  - \left(\sum_{k=1}^K z_{ik}\sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}) \right)\right)\right) \right.\right.\\
& \left.\left. -2\boldsymbol{\nu}_j'\left(\frac{1}{\sigma^2}\sum_{i=1}^N\sum_{l=1}^{n_i^*}z_{ij}B(t_{il}^*)\left(y_i(t_{il}^*) - \left(\sum_{k\ne j}z_{ik}\boldsymbol{\nu}'_{k}B(t_{il}^*)\right)  - \left(\sum_{k=1}^K z_{ik}\sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*) \right)\right)\right) \right)\right\}
\end{aligned}$$

Letting 
$$\mathbf{B}_j = \left( \tau_j\mathbf{P} + \frac{1}{\sigma^2} \sum_{i =1}^N \sum_{l=1}^{n_i}z_{ij}B(t_{il})B'(t_{il}) + \frac{1}{\sigma^2} \sum_{i =1}^N \sum_{l=1}^{n_i^*}z_{ij}B(t_{il}^*)B'(t_{il}^*)  \right)^{-1},$$
$$\mathbf{b}_{1j} = \frac{1}{\sigma^2}\sum_{i=1}^N\sum_{l=1}^{n_i}z_{ij}B(t_{il})\left(y_i(t_{il}) - \left(\sum_{k\ne j}z_{ik}\boldsymbol{\nu}'_{k}B(t_{il})\right)  - \left(\sum_{k=1}^K z_{ik}\sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}) \right)\right),$$
$$\mathbf{b}_{2j} = \frac{1}{\sigma^2}\sum_{i=1}^N\sum_{l=1}^{n_i^*}z_{ij}B(t_{il}^*)\left(y_i(t_{il}^*) - \left(\sum_{k\ne j}z_{ik}\boldsymbol{\nu}'_{k}B(t_{il}^*)\right)  - \left(\sum_{k=1}^K z_{ik}\sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*) \right)\right),$$
and $\mathbf{b}_j = \mathbf{b}_{1j} + \mathbf{b}_{2j}$, then we have that 
$$\boldsymbol{\nu}_j| \boldsymbol{\zeta}_{-\boldsymbol{\nu}_j} \sim \mathcal{N}(\mathbf{B}_j\mathbf{b}_j, \mathbf{B}_j)$$

***

$$\tau_l| \boldsymbol{\zeta}_{-\tau_l} = \tau_l | \boldsymbol{\nu}_l; \;\;\; l = 1, \dots, K$$

$$\begin{aligned}
P(\tau_l| \boldsymbol{\zeta}_{-\tau_l}) & \propto \tau_l^{\alpha - 1} exp\left\{-\tau_l\beta\right\}\\
& \times |\tau_l \mathbf{P}|^{1/2} exp\left\{-\frac{\tau_l}{2}\boldsymbol{\nu}'_l\mathbf{P}\boldsymbol{\nu}_l\right\}\\
& \propto \tau_l^{(\alpha +  P / 2) - 1} exp \left\{-\tau_l\left( \beta + \frac{1}{2}\boldsymbol{\nu}'_l\mathbf{P}\boldsymbol{\nu}_l\right) \right\}\\
\end{aligned}$$

The exponent $P/2$ arises because $\boldsymbol{\nu}_l \in \mathbb{R}^P$, so that $|\tau_l \mathbf{P}|^{1/2} \propto \tau_l^{P/2}$ as a function of $\tau_l$. Thus we have that $$\tau_l| \boldsymbol{\zeta}_{-\tau_l} \sim Gamma\left(\alpha + P/2, \beta + \frac{1}{2}\boldsymbol{\nu}'_l\mathbf{P}\boldsymbol{\nu}_l\right)$$

***
$$\sigma^2| \boldsymbol{\zeta}_{-\sigma^2} = \sigma^2 | y_1(\mathbf{t}_1), \dots, y_N(\mathbf{t}_N), y_1(\mathbf{t}_1^*), \dots, y_N(\mathbf{t}_N^*),  \boldsymbol{\nu}, \boldsymbol{\Phi}_1, \dots, \boldsymbol{\Phi}_M, \mathbf{Z},\chi_1, \dots \chi_M$$

$$\begin{aligned}
P(\sigma^2| \boldsymbol{\zeta}_{-\sigma^2}) & \propto (\sigma^2)^{-\alpha_0 - 1}exp \left\{- \frac{\beta_0}{\sigma^2}\right\}\\
& \times \prod_{i = 1}^N \prod_{l=1}^{n_i}\frac{1}{(\sigma^2)^{1/2}} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}) -  \sum_{k=1}^K z_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il})\right)\right)^2\right\}\\
& \times \prod_{i = 1}^N \prod_{l=1}^{n_i^*}\frac{1}{(\sigma^2)^{1/2}} exp\left\{-\frac{1}{2\sigma^2}\left(y_i(t_{il}^*) -  \sum_{k=1}^Kz_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}^*) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*)\right)\right)^2\right\}\\
& \propto (\sigma^2)^{-\left(\alpha_0 + \left(\sum_{i=1}^N n_i\right)/2 + \left(\sum_{i=1}^N n_i^*\right)/2\right)  - 1}exp \left\{- \frac{\beta_0 + \beta_{01} +\beta_{02}}{\sigma^2}\right\}\\
\end{aligned}$$
where $$\beta_{01} =\frac{1}{2}\sum_{i=1}^N\sum_{l=1}^{n_i}\left(y_i(t_{il}) -  \sum_{k=1}^K z_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il})\right)\right)^2$$

$$\beta_{02} = \frac{1}{2}\sum_{i=1}^N\sum_{l=1}^{n_i^*}\left(y_i(t_{il}^*) -  \sum_{k=1}^K z_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}^*) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*)\right)\right)^2$$

Thus we  can see that 
$$\sigma^2| \boldsymbol{\zeta}_{-\sigma^2}  \sim  IG\left(\alpha_0 + \frac{\sum_{i=1}^N n_i}{2} +\frac{\sum_{i=1}^N n_i^*}{2}, \beta_0 +\beta_{01} + \beta_{02}\right)$$
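Drawing from this inverse-gamma conditional can be done by inverting a Gamma draw. The sketch below assumes the residuals at the observed and imputed time points have already been computed and pooled into one list; the function name and the toy residuals are illustrative.

```python
import random


def sample_sigma2(residuals, alpha0, beta0, rng):
    """Draw sigma^2 from IG(alpha0 + n/2, beta0 + sum(residual^2)/2)."""
    shape = alpha0 + len(residuals) / 2.0
    rate = beta0 + 0.5 * sum(r * r for r in residuals)
    # If X ~ Gamma(shape, rate), then 1 / X ~ IG(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


# Toy residuals whose squared values all equal 4.
residuals = [2.0, -2.0] * 500
sigma2 = sample_sigma2(residuals, alpha0=1.0, beta0=1.0, rng=random.Random(9))
```

With one thousand residuals of squared size four, the draw should land close to four, since the data dominate the weak prior.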

***
$$y_i(t_{il}^*)| \boldsymbol{\zeta}_{-y_i(t_{il}^*)} = y_i(t_{il}^*) | \sigma^2,  \boldsymbol{\nu}, \boldsymbol{\Phi}_1, \dots, \boldsymbol{\Phi}_M, \mathbf{z}_i,\chi_{i1}, \dots, \chi_{iM}$$

$$y_i(t_{il}^*)| \boldsymbol{\zeta}_{-y_i(t_{il}^*)}  \sim \mathcal{N}\left(\sum_{k=1}^Kz_{ik}\left(\boldsymbol{\nu}_k'B(t_{il}^*) + \sum_{n=1}^M\chi_{in}\boldsymbol{\phi}_{kn}'B(t_{il}^*)\right),\sigma^2\right)$$