# Bayes Factor

The [Bayes factor](https://academic.oup.com/ije/article/37/3/641/742885) can be used as an alternative to the $p$-value for assessing significance. Here we go over the [asymptotic Bayes factor](https://onlinelibrary.wiley.com/doi/abs/10.1002/gepi.20359) (ABF) for ranking associations from genome-wide association studies (GWAS). When exactly one causal variant is assumed within a given genomic region, ABF works well for prioritizing variants (i.e. fine-mapping). With more than one causal variant, however, more sophisticated Bayesian models are needed. The derivation of ABF is based on the asymptotic properties of the maximum likelihood estimator (MLE).

### Derivation

The Bayes factor is the ratio of the marginal likelihoods of the data under two competing models. For SNP$_j$, the null model $H_0$ is that $\beta_j = 0$, while the alternative model $H_1$ is that $\beta_j \sim N(0, \sigma_b^2)$.

$$
\text{BF}_j = \frac{p(\mathbf{y} \mid H_1)}{p(\mathbf{y} \mid H_0)}
$$

In a case-control design, in which the phenotype is disease status, logistic regression can be applied:

$$
\frac{p_i}{1 - p_i} = e^{\boldsymbol{b}_i^T \boldsymbol{\alpha} + x_{ij} \beta_j},
$$

where $p_i$ is the probability that individual $i$ is a case, $\boldsymbol{b}_i \in \mathbb{R}^{p \times 1}$ is a vector of covariates for individual $i$, $\boldsymbol{\alpha} \in \mathbb{R}^{p \times 1}$ is the vector of corresponding parameters, $x_{ij}$ is the genotype of individual $i$ at SNP$_j$, and $\beta_j$ is its effect size. The MLE of these $p+1$ parameters is then asymptotically distributed as:

$$
\begin{bmatrix} 
\hat{\boldsymbol{\alpha}} \\
\hat{\beta}_j \end{bmatrix} 
\sim N_{p+1} 
\left(
\begin{bmatrix} 
\boldsymbol{\alpha} \\
\beta_j 
\end{bmatrix}, 
\begin{bmatrix} 
\boldsymbol{I}_{00} & \boldsymbol{I}_{01} \\
\boldsymbol{I}_{01}^T & I_{11} \end{bmatrix}^{-1} \right),
$$

where $\boldsymbol{I}_{00} \in \mathbb{R}^{p \times p}$ is the expected Fisher information concerning $\boldsymbol{\alpha}$ and $I_{11}$ is the expected information concerning $\beta_j$. To remove the off-diagonal block $\boldsymbol{I}_{01} \in \mathbb{R}^{p \times 1}$, we reparameterize the model as

$$
\begin{bmatrix} 
\hat{\boldsymbol{\gamma}} \\
\hat{\beta}_j \end{bmatrix} =
\begin{bmatrix} 
\mathbf{I}_{p \times p} & \boldsymbol{I}_{00}^{-1} \boldsymbol{I}_{01} \\
\mathbf{0}_{1 \times p} & 1 
\end{bmatrix} 
\begin{bmatrix} 
\hat{\boldsymbol{\alpha}} \\
\hat{\beta}_j \end{bmatrix},
$$

which gives

$$
\begin{bmatrix} 
\hat{\boldsymbol{\gamma}} \\
\hat{\beta}_j \end{bmatrix} 
\sim N_{p+1} 
\left(
\begin{bmatrix} 
\boldsymbol{\gamma} \\
\beta_j 
\end{bmatrix}, 
\begin{bmatrix} 
\boldsymbol{I}_{00} & \boldsymbol{0}_{p \times 1} \\
\boldsymbol{0}_{1 \times p} & I_{11}^* \end{bmatrix}^{-1} \right),
$$

where $I_{11}^* = I_{11} - \boldsymbol{I}_{01}^T \boldsymbol{I}_{00}^{-1} \boldsymbol{I}_{01}$, so that $1 / I_{11}^*$ is the asymptotic variance of $\hat{\beta}_j$.

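Hence, asymptotically, $\hat{\boldsymbol{\gamma}}$ and $\hat{\beta}_j$ are independent. If we also assume independent priors on $\boldsymbol{\gamma}$ and $\beta_j$, and approximate the likelihood by the asymptotic distribution of the MLE, the marginal likelihoods under $H_1$ and $H_0$ factorize: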

$$
p(\mathbf{y} \mid H_1) = p(\hat{\boldsymbol{\gamma}}, \hat{\beta}_j \mid H_1) = \int p(\mathbf{y} \mid \boldsymbol{\gamma}, \beta_j) p(\boldsymbol{\gamma}, \beta_j) d\boldsymbol{\gamma} d\beta_j
$$

$$
= \int p(\hat{\boldsymbol{\gamma}}, \hat{\beta}_j \mid \boldsymbol{\gamma}, \beta_j) p(\boldsymbol{\gamma}, \beta_j) d\boldsymbol{\gamma} d\beta_j = \int p(\hat{\boldsymbol{\gamma}} \mid \boldsymbol{\gamma}) p(\boldsymbol{\gamma}) d\boldsymbol{\gamma} \cdot \int p(\hat{\beta}_j \mid \beta_j) p(\beta_j) d\beta_j
$$

$$
p(\mathbf{y} \mid H_0) = p(\hat{\boldsymbol{\gamma}}, \hat{\beta}_j \mid H_0) = \int p(\mathbf{y} \mid \boldsymbol{\gamma}, \beta_j = 0) p(\boldsymbol{\gamma}) d\boldsymbol{\gamma}
$$

$$
= \int p(\hat{\boldsymbol{\gamma}}, \hat{\beta}_j \mid \boldsymbol{\gamma}, \beta_j = 0) p(\boldsymbol{\gamma}) d\boldsymbol{\gamma} = \int p(\hat{\boldsymbol{\gamma}} \mid \boldsymbol{\gamma}) p(\boldsymbol{\gamma}) d\boldsymbol{\gamma} \cdot p(\hat{\beta}_j \mid \beta_j = 0)
$$

$$
\therefore \text{ABF}_j  = \frac{\int p(\hat{\beta}_j \mid \beta_j) p(\beta_j) d\beta_j}{p(\hat{\beta}_j \mid \beta_j = 0)}
$$

$$
= \sqrt{\frac{s_j^2}{s_j^2 + \sigma_b^2}} \exp \left[ \frac {z^2_j}{2} \frac{\sigma_b^2}{s_j^2 + \sigma_b^2}\right]
$$

The last equality holds after integrating out $\beta_j$ and substituting $z_j = \hat{\beta}_j / s_j$, where $s_j = \text{se}(\hat{\beta}_j)$ is the standard error of $\hat{\beta}_j$. Note that the $\boldsymbol{\gamma}$ terms cancel out, so there is no need to specify a prior on $\boldsymbol{\gamma}$.
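The closed form follows because, marginally, $\hat{\beta}_j \sim N(0, s_j^2 + \sigma_b^2)$ under $H_1$ and $\hat{\beta}_j \sim N(0, s_j^2)$ under $H_0$. As a minimal sketch of the whole pipeline, the Python below fits the logistic model by Newton-Raphson for the simplest case of a single covariate (the intercept, so $p = 1$), reads $s_j$ off the inverse Fisher information, and evaluates ABF on the log scale. The function names `fit_logistic` and `log_abf`, the toy data, and the prior standard deviation `sigma_b=0.2` are assumptions made for illustration and are not part of the original derivation.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Newton-Raphson MLE for logit(p_i) = alpha + x_i * beta.

    Assumes a single covariate (the intercept). Returns (beta_hat, se), where
    se**2 is the (beta, beta) entry of the inverse Fisher information.
    """

    def score_and_info(alpha, beta):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + xi * beta)))
            w = p * (1.0 - p)
            g0 += yi - p          # score for alpha
            g1 += (yi - p) * xi   # score for beta
            i00 += w              # information entries
            i01 += w * xi
            i11 += w * xi * xi
        return g0, g1, i00, i01, i11

    alpha = beta = 0.0
    for _ in range(n_iter):
        g0, g1, i00, i01, i11 = score_and_info(alpha, beta)
        det = i00 * i11 - i01 * i01
        # Newton step: (alpha, beta) += I^{-1} * score, 2x2 inverse written out
        alpha += (i11 * g0 - i01 * g1) / det
        beta += (i00 * g1 - i01 * g0) / det

    # Information at the final estimate; Var(beta_hat) = I00 / det(I)
    _, _, i00, i01, i11 = score_and_info(alpha, beta)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return beta, se


def log_abf(beta_hat, se, sigma_b):
    """Natural log of ABF_j (H1 over H0), computed on the log scale."""
    z2 = (beta_hat / se) ** 2
    shrink = sigma_b**2 / (se**2 + sigma_b**2)  # sigma_b^2 / (s_j^2 + sigma_b^2)
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z2 * shrink


# Toy case-control data: 20/50 cases among non-carriers, 30/50 among carriers
x = [0] * 50 + [1] * 50
y = [1] * 20 + [0] * 30 + [1] * 30 + [0] * 20
beta_hat, se = fit_logistic(x, y)
print(beta_hat, se, math.exp(log_abf(beta_hat, se, sigma_b=0.2)))
```

In practice `log_abf` only needs $\hat{\beta}_j$ and $s_j$, so it can be applied directly to published summary statistics. Working on the log scale avoids overflow when $z_j^2$ is large.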

### Interpretation

1. ABF can be calculated using summary statistics.
2. ABF is appropriate when the sample size is large, which is usually the case for GWAS.
3. ABF has a closed form, so no multidimensional integration is needed.
4. For a fixed $z_j$, ABF is not monotonic in $s_j^2$. For example, when $s_j^2$ is very small, which corresponds to high power, ABF for $H_1$ is small. For $|z_j| > 1$, ABF is maximized at $s_j^2 = \sigma_b^2/(z_j^2 - 1)$.
5. It is straightforward to combine ABF from multiple studies.
6. Different priors on $\beta_j$ can lead to different ABF.