# Analysis of toy data via DAP

This notebook explores running the [DAP](https://github.com/xqwen/dap) software directly on our CNV data, using a toy example. It largely follows [this tutorial](https://github.com/xqwen/dap/wiki/Case-study:-multi-SNP-fine-mapping).

In [1]:
# ! cd ~/GIT/software; git clone https://github.com/xqwen/dap

The Python code below prepares DAP input data from a matrix whose first column is the response and whose remaining columns are the regressors. In our toy data, the first column is disease status and the remaining columns indicate whether each gene harbors a CNV.

In [13]:
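import feather
# run_dap_lite is a local helper defined in this project's utils module
from utils import run_dap_lite

# Input matrix (first column: response, remaining columns: regressors)
fn = "data/toy_n2000_p0.5_causal5.feather"
# File passed to run_dap_lite as its second argument
fout = "data/toy_n2000_p0.5_causal5_test.dap"

input_file = feather.read_dataframe(fn)
# Rows are samples; columns are the response plus one column per gene
print(input_file.shape)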


(2000, 11)
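
As a rough illustration of the reshaping step, the sketch below writes a matrix of this shape as one line per variable, with the response first. The function name `write_dap_input`, the group label `toy`, and the `pheno`/`geno` line layout are assumptions made for this example: they reflect our reading of the tutorial and are not taken from `run_dap_lite`, so check them against the DAP documentation before relying on them.

```python
import io


def write_dap_input(columns, rows, handle, group="toy"):
    """Write one line per column: the response first, then each regressor.

    Assumed layout (hypothetical, verify against the DAP documentation):
        pheno <name> <group> y_1 ... y_n
        geno  <name> <group> x_1 ... x_n
    """
    for j, name in enumerate(columns):
        kind = "pheno" if j == 0 else "geno"
        values = " ".join(str(row[j]) for row in rows)
        handle.write(f"{kind} {name} {group} {values}\n")


# Made-up data: disease status followed by CNV indicators for three genes.
columns = ["phenotype", "gene1", "gene2", "gene3"]
rows = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]

buffer = io.StringIO()
write_dap_input(columns, rows, buffer)
dap_lines = buffer.getvalue().splitlines()
print("\n".join(dap_lines))
```

For a data frame such as `input_file`, the same function could be called with `list(input_file.columns)` and `input_file.values.tolist()`.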


To run DAP:

In [14]:
run_dap_lite(input_file, fout)

2017-08-06 16:46:21.754220
2017-08-06 16:46:26.263577


%preview -n data/toy_4genes_n_1280.dap

    1   7.2437e-01    2      7.465   [chr6.100001] [chr6.100003]
    2   2.1527e-01    1      6.938   [chr6.100001]

Posterior expected model size: 1.664 (sd = 0.586)

LogNC = 17.51028 ( Log10NC = 7.605 )

Posterior inclusion probability

    1 chr6.100001   9.39636e-01      7.914
    2 chr6.100003   7.24368e-01     -1.392

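The first section of the output ranks the high-probability association models, one per line. Each line gives the model's rank, its posterior probability, its size (the number of genes in the model), its unnormalized posterior score, and the genes it contains.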

The posterior probability of the association model for [chr6.100001] and [chr6.100003] is $0.72437$.

The unnormalized posterior score, $\log_{10}(\text{prior}) + \log_{10}(\text{BF})$, is $7.465$.
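
The numbers in the output above appear consistent with the posterior probability being the score normalized by the reported `Log10NC`, that is, posterior $= 10^{\text{score} - \text{Log10NC}}$. This relationship is inferred from the printed values rather than taken from the DAP documentation, so the check below is a sanity test under that assumption.

```python
# Values copied from the DAP output shown above.
log10_nc = 7.605  # "Log10NC"
models = [
    # (unnormalized log10 score, reported posterior probability)
    (7.465, 7.2437e-01),
    (6.938, 2.1527e-01),
]

# Assumed relationship: posterior = 10 ** (score - Log10NC)
posteriors = [10 ** (score - log10_nc) for score, _ in models]

for (score, reported), computed in zip(models, posteriors):
    print(f"score {score}: computed {computed:.4f}, reported {reported:.4f}")
```

The computed and reported values agree up to the rounding of the printed scores.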

The last section of the output lists the posterior inclusion probabilities (PIPs) of the top-ranked genes. For example, the PIP of the first gene, chr6.100001, is $0.9396$.

The last column gives the $\log_{10}$ Bayes factor of the corresponding SNP/gene in single-SNP/gene testing.
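
To work with these results programmatically, the sketch below parses the output text shown above into a list of models and a table of PIPs. It then recomputes each PIP as the sum of the posterior probabilities of the listed models containing that gene, and the expected model size as the posterior-weighted sum of model sizes. The parsing rules are assumptions based only on the layout of this example, not a documented DAP format, and the recomputed values are approximate whenever lower-probability models are missing from the listing.

```python
import re

# Output text copied from the preview above.
dap_output = """\
    1   7.2437e-01    2      7.465   [chr6.100001] [chr6.100003]
    2   2.1527e-01    1      6.938   [chr6.100001]

Posterior expected model size: 1.664 (sd = 0.586)

LogNC = 17.51028 ( Log10NC = 7.605 )

Posterior inclusion probability

    1 chr6.100001   9.39636e-01      7.914
    2 chr6.100003   7.24368e-01     -1.392
"""

models = []  # (posterior probability, model size, member names)
pips = {}    # name -> (PIP, single-variable log10 Bayes factor)
for line in dap_output.splitlines():
    fields = line.split()
    if len(fields) >= 5 and fields[0].isdigit() and fields[4].startswith("["):
        # Model line: rank, posterior, size, score, [member] ...
        members = re.findall(r"\[([^\]]+)\]", line)
        models.append((float(fields[1]), int(fields[2]), members))
    elif len(fields) == 4 and fields[0].isdigit():
        # PIP line: rank, name, PIP, log10 Bayes factor
        pips[fields[1]] = (float(fields[2]), float(fields[3]))

# Recompute PIPs by summing the posteriors of the models containing each gene.
pip_from_models = {}
for posterior, _, members in models:
    for name in members:
        pip_from_models[name] = pip_from_models.get(name, 0.0) + posterior

# Posterior-weighted model size over the listed models.
expected_size = sum(posterior * size for posterior, size, _ in models)

for name, (pip, log10_bf) in pips.items():
    print(f"{name}: reported PIP {pip:.4f}, recomputed {pip_from_models[name]:.4f}")
print(f"expected model size: {expected_size:.3f}")
```

In this example the recomputed PIPs and expected model size match the reported values to about three decimal places, which suggests the two listed models carry nearly all of the posterior mass assigned to non-empty models.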