# Calculating the Likelihood of Biomarker Measurements

Suppose we have a participant's data, for example, @fig-sample-data. Now, the question is:

> What is the likelihood of this participant having this sequence of biomarker data, given that we know $S, k_j, \theta, \phi$?

In the following, we explain how to calculate this likelihood in two scenarios: (1) known $k_j$ and (2) unknown $k_j$.

## Known $k_j$

$$
p(X_{j} | S, z_j = 1, k_j) = \underbrace{\prod_{i=1}^{k_j}{p(X_{S(i)j} \mid \theta_{S(i)} )}}_{\text{Affected biomarker likelihood}} \, 
\underbrace{\prod_{i=k_j+1}^N{p(X_{S(i)j} \mid \phi_{S(i)})}}_{\text{Non-affected biomarker likelihood}}
$$ {#eq-known-kj}

This equation computes the likelihood of the observed biomarker data of a specific participant, given that we know the disease stage this participant is at ($k_j$).

- $S$ is an **ordered array** of biomarkers that are affected by the disease, for example, $[b, a, d, c]$. This means that biomarker $b$ is affected at stage 1. At stage 2, biomarkers $b$ and $a$ are both affected.

- $S(i)$ is the $i^{th}$ biomarker according to $S$. For example, $S(1)$ is biomarker $b$.

- $k_j$ indicates the stage the patient is at, for example, $k_j = 2$. This means that the disease has affected biomarkers $b$ and $a$. Biomarkers $d$ and $c$ have not been affected yet.

- $\theta_{S(i)}$ denotes the parameters of the probability density function (PDF) of the observed value of biomarker $S(i)$ when this biomarker has been affected by the disease. Let's assume this distribution is Gaussian, with means of $[45, 50, 55, 60]$ for biomarkers $b$, $a$, $d$, and $c$, respectively, and a common standard deviation of $5$.

- $\phi_{S(i)}$ denotes the parameters of the PDF of the observed value of biomarker $S(i)$ when this biomarker has **NOT** been affected by the disease. Let's assume this distribution is Gaussian, with means of $[25, 30, 35, 40]$ for biomarkers $b$, $a$, $d$, and $c$, respectively, and a common standard deviation of $3$.

- $X_j$ is an array representing the patient's observed data for all biomarkers. Assume the data is $[77, 45, 53, 90]$ for biomarkers $b$, $a$, $d$, and $c$, respectively.

We assume that the patient is at stage $2$ of this disease; hence $k_j = 2$. 

Next, we are going to calculate $p(X_j|S, z_j = 1, k_j)$:

When $i = 1$, we have $S(i) = b$ and $X_{S(i)j} = X_b = 77$. So

$$p(X_{S(i)j} \mid \theta_{S(i)}) = \frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{X_b - \mu}{\sigma} \right)^2}$$

Because $k_j = 2$, biomarkers $b$ and $a$ are affected. Biomarker $b$ therefore follows the distribution parameterized by $\theta_b$, so we plug $\mu = 45$ and $\sigma = 5$ into the equation above.
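As a minimal sketch of this step, the Gaussian density can be evaluated in Python with the standard library alone. The function name `gaussian_pdf` is our own choice for illustration, and the observed value for biomarker $b$ is read from the data array $X_j = [77, 45, 53, 90]$ defined above.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Value of the Gaussian density with mean mu and std sigma at point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Biomarker b is affected at stage k_j = 2, so we use theta_b = (mu=45, sigma=5).
x_b = 77  # first entry of X_j, which is ordered as b, a, d, c
density_b_affected = gaussian_pdf(x_b, mu=45, sigma=5)
print(density_b_affected)
```

The result is a density value, not a probability. It is very small here because the observed value lies more than six standard deviations above the mean.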

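We can do the same for $i = 2, 3, 4$, using $\theta_a$ for biomarker $a$ (affected) and $\phi_d$ and $\phi_c$ for biomarkers $d$ and $c$ (not yet affected).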

So

$$p(X_j | S, k_j = 2) = p (X_b | \theta_b) \times p (X_a | \theta_a) \times p (X_d | \phi_d) \times p (X_c | \phi_c)$$

The above is the likelihood of the given biomarker data when $k_j = 2$.
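The product above can be sketched in Python as follows. This is only an illustration under the assumptions of this example (Gaussian distributions with the means and standard deviations listed earlier). The names `gaussian_pdf` and `likelihood_known_stage` and the dictionary layout are hypothetical helpers chosen for this sketch, not part of any library.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Value of the Gaussian density with mean mu and std sigma at point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Example values from the text; every name below is illustrative.
S = ["b", "a", "d", "c"]                                          # biomarker ordering
theta = {"b": (45, 5), "a": (50, 5), "d": (55, 5), "c": (60, 5)}  # affected (mu, sigma)
phi = {"b": (25, 3), "a": (30, 3), "d": (35, 3), "c": (40, 3)}    # non-affected (mu, sigma)
X_j = {"b": 77, "a": 45, "d": 53, "c": 90}                        # observed data


def likelihood_known_stage(X, S, k, theta, phi):
    """p(X_j | S, k_j = k): product of densities over all biomarkers."""
    likelihood = 1.0
    for position, biomarker in enumerate(S, start=1):
        # The first k biomarkers in S are affected (theta); the rest are not (phi).
        mu, sigma = theta[biomarker] if position <= k else phi[biomarker]
        likelihood *= gaussian_pdf(X[biomarker], mu, sigma)
    return likelihood


L_2 = likelihood_known_stage(X_j, S, 2, theta, phi)
print(L_2)
```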

Note that $p (X_b | \theta_b)$ is a probability density, that is, the value of a probability density function at a specific point; it is not itself a probability.

**Multiplying multiple probability densities will give us a likelihood**. 

## Unknown $k_j$

$$
P(X_{j} | S) = \sum_{k_j=0}^N{P(k_j) p(X_{j} \mid S, k_j)}
$$ {#eq-unknown-kj}

Suppose we have the same information as above, except that we do not know which disease stage the patient is at, i.e., we do not know $k_j$. We have the observed biomarker data: $X_j = [77, 45, 53, 90]$. The question is: what is the likelihood of seeing this specific observed data?

We assume that all five stages (including $k_j = 0$) are equally likely. 

Because we do not know $k_j$, we average the likelihood of the biomarker data over all possible stages, weighting each stage by its prior probability $P(k_j)$.

Based on @eq-known-kj, we can calculate the following:

$L_1 = p(X_j | S, k_j = 1)$

$L_2 = p(X_j | S, k_j = 2)$

$L_3 = p(X_j | S, k_j = 3)$

$L_4 = p(X_j | S, k_j = 4)$

Note that we also need $L_0$, because the sum in @eq-unknown-kj starts at $k_j = 0$. Written out in full, the five terms are:

$$L_0 = p(X_j | S, k_j = 0) = p (X_b | \phi_b) \times p (X_a | \phi_a) \times p (X_d | \phi_d) \times p (X_c | \phi_c)$$

$$L_1 = p(X_j | S, k_j = 1) = p (X_b | \theta_b) \times p (X_a | \phi_a) \times p (X_d | \phi_d) \times p (X_c | \phi_c)$$

$$L_2 = p(X_j | S, k_j = 2) = p (X_b | \theta_b) \times p (X_a | \theta_a) \times p (X_d | \phi_d) \times p (X_c | \phi_c)$$

$$L_3 = p(X_j | S, k_j = 3) = p (X_b | \theta_b) \times p (X_a | \theta_a) \times p (X_d | \theta_d) \times p (X_c | \phi_c)$$

$$L_4 = p(X_j | S, k_j = 4) = p (X_b | \theta_b) \times p (X_a | \theta_a) \times p (X_d | \theta_d) \times p (X_c | \theta_c)$$

$P(k_j)$ is the prior probability of being at stage $k_j$. **Event-based models assume a uniform prior on $k_j$**. Therefore:

$P(X_{j} | z_j=1, S) = \frac{1}{5} \left(L_0 + L_1 + L_2 + L_3 + L_4 \right)$
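The whole calculation can be sketched in Python as follows, again under the Gaussian assumptions and example values used in this section. The helper names (`gaussian_pdf`, `likelihood_known_stage`) and the dictionary layout are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Value of the Gaussian density with mean mu and std sigma at point x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_known_stage(X, S, k, theta, phi):
    """p(X_j | S, k_j = k): theta for the first k biomarkers in S, phi for the rest."""
    likelihood = 1.0
    for position, biomarker in enumerate(S, start=1):
        mu, sigma = theta[biomarker] if position <= k else phi[biomarker]
        likelihood *= gaussian_pdf(X[biomarker], mu, sigma)
    return likelihood


# Example values from the text; every name below is illustrative.
S = ["b", "a", "d", "c"]
theta = {"b": (45, 5), "a": (50, 5), "d": (55, 5), "c": (60, 5)}
phi = {"b": (25, 3), "a": (30, 3), "d": (35, 3), "c": (40, 3)}
X_j = {"b": 77, "a": 45, "d": 53, "c": 90}

n_stages = len(S) + 1        # stages 0, 1, ..., N
prior = 1.0 / n_stages       # uniform prior P(k_j) = 1/5

# L_0, ..., L_4: likelihood of the data under each candidate stage.
stage_likelihoods = [
    likelihood_known_stage(X_j, S, k, theta, phi) for k in range(n_stages)
]

# Marginal likelihood: prior-weighted sum over all stages.
marginal_likelihood = sum(prior * L_k for L_k in stage_likelihoods)

for k, L_k in enumerate(stage_likelihoods):
    print(f"L_{k} = {L_k:.3e}")
print(f"P(X_j | S) = {marginal_likelihood:.3e}")
```

Because the prior is uniform, the result is simply the arithmetic mean of $L_0, \dots, L_4$.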