Dynamic Statistical Comparisons for MOUTHWASH simulations.
The computational environment must be installed on a 64-bit Linux (linux-64) machine. Below are instructions for using either Docker or conda.
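Before following either route, you can confirm that a machine matches conda's linux-64 platform with a short Python check. This is a minimal sketch: the function name `is_linux_64` is hypothetical and not part of this repository, and it assumes that linux-64 means Linux running on x86_64 hardware.

```python
import platform


def is_linux_64() -> bool:
    """Return True when running on Linux with an x86_64 CPU (conda's linux-64)."""
    return platform.system() == "Linux" and platform.machine() == "x86_64"


if __name__ == "__main__":
    # Report whether this machine matches the required platform.
    print("linux-64" if is_linux_64() else "not linux-64")
```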
The Docker image is based on continuumio/miniconda3.
```bash
# Build image (from the directory containing the Dockerfile)
docker build -t dsc-mouthwash .
# Run container
docker run --rm -it dsc-mouthwash
# Run a Jupyter notebook server on port 8888 (uses the base continuumio/miniconda3 image)
docker run -i -t -p 8888:8888 continuumio/miniconda3 /bin/bash -c \
  "/opt/conda/bin/conda install jupyter -y --quiet && mkdir /opt/notebooks && /opt/conda/bin/jupyter notebook --notebook-dir=/opt/notebooks --ip='0.0.0.0' --port=8888 --no-browser --allow-root"
```
Alternatively, if you have conda installed, you can create the environment directly on your machine:
```bash
conda create -n dsc-mouthwash \
  -c conda-forge -c bioconda -c defaults -c jdblischak \
  --file requirements/misc \
  --file requirements/python \
  --file requirements/r
source activate dsc-mouthwash
```
Simulation settings (from Section 4.1):

- $p$ (genes) - 1000
- $\pi_0$ (proportion of null genes) - 0.5, 0.9, 1
- $n$ (samples) - 6, 10, 20, 40
- $m$ (control genes) - 10, 100
Number of combinations = 3 * 4 * 2 = 24 (with $p$ fixed at 1000).

Each combination is replicated 500 times, giving 500 * 24 = 12,000 simulated datasets.
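The arithmetic above can be reproduced by enumerating the full factorial grid of settings. This is a minimal Python sketch for illustration only: the variable and function names are hypothetical, and this is not how the DSC configuration in this repository declares the settings.

```python
from itertools import product

# Settings listed above (Section 4.1); names are illustrative only.
P_GENES = [1000]
PI0 = [0.5, 0.9, 1]
N_SAMPLES = [6, 10, 20, 40]
M_CONTROL = [10, 100]
N_REPLICATES = 500


def simulation_grid():
    """Yield one dict per combination of the simulation settings."""
    for p, pi0, n, m in product(P_GENES, PI0, N_SAMPLES, M_CONTROL):
        yield {"p": p, "pi0": pi0, "n": n, "m": m}


combos = list(simulation_grid())
print(len(combos))                 # 24
print(len(combos) * N_REPLICATES)  # 12000
```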
Methods:

- OLS
- SVA
- CATErr
- CATErr+MAD
- RUV2
- RUV3
- CATEnc
- CATEnc+MAD
- CATEnc+Cal
- MOUTHWASH
- BACKWASH (we currently plan to skip this method because it takes too long to run without a sufficient improvement in performance)
Metrics:

- AUC - ability to distinguish null from non-null genes
- $\pi_0$ - estimated proportion of null genes
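For reference, AUC can be computed as the probability that a randomly chosen non-null gene is ranked as more significant than a randomly chosen null gene (the Mann-Whitney formulation, with ties counted as 1/2). This minimal sketch assumes each method returns a per-gene score where smaller means stronger evidence of being non-null, such as a p-value. The function name `auc` is hypothetical and not taken from this repository's code.

```python
from typing import Sequence


def auc(scores: Sequence[float], is_null: Sequence[bool]) -> float:
    """AUC for separating non-null from null genes (smaller score = more significant).

    Returns the fraction of (non-null, null) pairs in which the non-null gene
    has the smaller score; tied pairs count as 1/2.
    """
    nonnull = [s for s, flag in zip(scores, is_null) if not flag]
    null = [s for s, flag in zip(scores, is_null) if flag]
    if not nonnull or not null:
        # With pi0 = 1 every gene is null, so AUC is undefined.
        raise ValueError("AUC needs at least one null and one non-null gene")
    wins = 0.0
    for a in nonnull:
        for b in null:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(nonnull) * len(null))


print(auc([0.01, 0.2, 0.03, 0.8], [False, True, False, True]))  # 1.0
```

Because the grid includes $\pi_0 = 1$, AUC has no defined value in those settings, so the sketch raises an error rather than returning a misleading number.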
GTEx data (from Section 4.3):

- GTEx tissues - 23
- Lists of control genes - 2

This gives 23 * 2 = 46 datasets.
Methods: same as in the simulation study above.

Metrics:

- $s$ - number of sex-chromosome genes among the 100 most significant results
- $\pi_0$ - estimated proportion of null genes
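The $s$ metric amounts to ranking genes by significance and counting how many of the top 100 lie on a sex chromosome. This is a minimal sketch, assuming per-gene p-values (smaller is more significant) and chromosome labels written as `X`/`Y` or `chrX`/`chrY`. The function name and the label set are hypothetical and do not come from this repository's implementation.

```python
from typing import Sequence

# Assumed chromosome labels for sex chromosomes; adjust to the annotation used.
SEX_CHROMOSOMES = {"X", "Y", "chrX", "chrY"}


def sex_chrom_in_top(pvalues: Sequence[float], chromosomes: Sequence[str], top: int = 100) -> int:
    """Count sex-chromosome genes among the `top` genes with the smallest p-values."""
    if len(pvalues) != len(chromosomes):
        raise ValueError("pvalues and chromosomes must have the same length")
    # sorted() is stable, so tied p-values keep their input order.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    return sum(1 for i in order[:top] if chromosomes[i] in SEX_CHROMOSOMES)


print(sex_chrom_in_top([0.01, 0.02, 0.5, 0.03], ["X", "1", "Y", "Y"], top=3))  # 2
```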