This repository contains scripts and (limited) results from multi-scale analyses of simulated and real data.
We have multiple ongoing projects that analyze simulated data or different types of high-throughput sequencing data, e.g., DNase-seq (Boyle et al., 2008; Hesselberth et al., 2009), ATAC-seq (Buenrostro et al., 2013), Ribo-seq (Ingolia et al., 2011), RNA-seq (Mortazavi et al., 2008; Wang et al., 2008; Marioni et al., 2008), and ChIP-seq (Johnson et al., 2007; Barski et al., 2007; Mikkelsen et al., 2007), using different multi-scale approaches (e.g., WaveQTL, multiseq, and WaveHMT). These analyses share scripts and results to some extent, so we have collected them in one repository. For now, the repository is mainly a place for our collaborators to share scripts and results and to replicate analyses. Because some data sets are not publicly available and all analyses are work in progress, we include only limited results. Once the data sets become publicly available and the projects are close to completion, we will share all scripts, data, and results. If you are interested in our analyses (the results, contributing to the analyses, or performing similar analyses for other applications), contact hjshim at gmail dot com.
Compare multiseq to WaveQTL on simulated data
We simulated null and alternative data sets from the 578 dsQTLs identified at FDR=0.01 in Shim and Stephens 2014 by the wavelet-based approach, the 100bp window approach, or both, using a procedure similar to the one described in the Supplementary Material of Shim and Stephens 2014. We then compared the performance of the two approaches (WaveQTL and multiseq) on simulated data sets with different sample sizes or different read depths. As expected, multiseq outperforms WaveQTL at smaller sample sizes. Even at larger sample sizes (e.g., 70), multiseq outperforms WaveQTL unless library read depths are very high.
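To make the two simulation knobs (sample size and read depth) concrete, here is a minimal Python sketch. It is not the procedure from Shim and Stephens 2014 and not code from this repository. It assumes that lower read depth can be mimicked by binomial thinning of observed read counts, and that null data sets can be obtained by shuffling genotypes to break the genotype-signal association. The function names (thin_counts, simulate_dataset) and the toy input are hypothetical.

```python
import random


def thin_counts(counts, keep_prob, rng):
    """Binomial thinning: keep each read independently with probability keep_prob.

    Assumption: this stands in for a lower library read depth.
    """
    return [sum(rng.random() < keep_prob for _ in range(c)) for c in counts]


def simulate_dataset(profiles, genotypes, n_samples, keep_prob, null, seed=0):
    """Build one simulated data set for a single region.

    profiles  : per-individual read counts along the region (list of lists)
    genotypes : per-individual genotype at the candidate QTL
    n_samples : number of individuals to draw without replacement
    keep_prob : fraction of reads kept (1.0 leaves the depth unchanged)
    null      : if True, shuffle genotypes so that no association remains
                (an assumption made for this sketch)
    """
    rng = random.Random(seed)
    idx = rng.sample(range(len(profiles)), n_samples)
    geno = [genotypes[i] for i in idx]
    if null:
        rng.shuffle(geno)
    data = [thin_counts(profiles[i], keep_prob, rng) for i in idx]
    return data, geno


# Toy input: 6 individuals, 4 positions each (made-up numbers).
toy_profiles = [
    [12, 30, 28, 9],
    [10, 25, 31, 11],
    [8, 14, 15, 7],
    [9, 16, 13, 8],
    [5, 6, 7, 4],
    [6, 5, 8, 5],
]
toy_genotypes = [0, 0, 1, 1, 2, 2]

# One alternative and one null data set with 4 individuals at half the depth.
alt_data, alt_geno = simulate_dataset(toy_profiles, toy_genotypes, 4, 0.5, null=False, seed=1)
null_data, null_geno = simulate_dataset(toy_profiles, toy_genotypes, 4, 0.5, null=True, seed=1)
```

Varying n_samples and keep_prob over a grid, and running each method on the resulting null and alternative sets, gives the kind of comparison across sample sizes and read depths described above.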
Analysis of ATAC-seq data measuring chromatin accessibility in copper-treated and control samples
Pique-Regi et al. are interested in understanding the potential mechanism underlying gene-environment interactions, and have therefore measured chromatin accessibility using ATAC-seq on treated and control samples. Here, we focused on the ATAC-seq data from copper-treated samples and control samples. These data have a small sample size (3 vs 3) but potentially contain a very strong signal. We applied two multi-scale methods, multiseq and WaveQTL, and a window-based approach, DESeq, to the ATAC-seq data. We found that multiseq detected substantially more differences in chromatin accessibility between the two conditions than either WaveQTL or DESeq.