ExAC analysis scripts

Analysis and figures from ExAC flagship paper, bioRxiv 2015

This package can recreate the figures from the main ExAC paper.

It requires the following packages:

install.packages(c('binom', 'plyr', 'ggplot2', 'data.table', 'reshape', 'plotrix', 'dplyr', 'Hmisc', 'gdata', 'magrittr', 'vioplot'))
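
If some of these packages are already installed, you may prefer to check which ones are missing before installing. The following is a minimal sketch, not part of the repository; missing_packages is a hypothetical helper name, and the install call is left commented out so that nothing is installed without your say-so.

```r
# Packages listed in this README.
required <- c('binom', 'plyr', 'ggplot2', 'data.table', 'reshape', 'plotrix',
              'dplyr', 'Hmisc', 'gdata', 'magrittr', 'vioplot')

# Hypothetical helper: returns the packages in 'pkgs' that are not found
# in the current library paths.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message('Missing packages: ', paste(to_install, collapse = ', '))
  # install.packages(to_install)  # uncomment to install the missing ones
}
```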

Basic usage

Warning: this process requires a substantial amount of memory, ideally a machine with at least 16 GB of RAM (newer MacBook Pros are fine, but MacBook Airs are not). On Linux and other machines with stricter memory policies, we recommend 32 GB or even 64 GB to allow for overhead in loading.

source('exac_constants.R')
exac = load_exac_data()

exac is then a data frame of all ~10M variants in the dataset, with upwards of 100 annotations per variant. Each variant is its own row, unlike a typical VCF, where multi-allelic variants are combined into a single line. Note that these are unfiltered variant calls. Make sure to filter either to PASS sites or, better yet, to our criteria for high-quality calls (given by the column use; see this blog post for details):

filtered_calls = subset(exac, filter == 'PASS')

or:

use_data = subset(exac, use)
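
To see how the two filters behave without loading the full dataset, here is a small sketch on a toy data frame. Only the column names filter and use come from this README; the pos column, the 'FAIL' label, and all values are made up for illustration and say nothing about how the real use column relates to PASS.

```r
# Toy stand-in for the exac data frame (values are invented).
toy <- data.frame(
  pos    = c(101, 202, 303, 404),
  filter = c('PASS', 'PASS', 'FAIL', 'PASS'),
  use    = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rows whose filter value is exactly 'PASS'.
pass_only <- subset(toy, filter == 'PASS')

# Rows where the logical column 'use' is TRUE.
high_qual <- subset(toy, use)

nrow(pass_only)  # 3
nrow(high_qual)  # 2
```

In this toy example the two subsets differ, which is why it matters which one a downstream analysis starts from.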

The code for figures is in exac_figures.R. This process is very memory intensive, so depending on your system, it may be easier to open the script and run sections of the code independently.

Additional analyses

Note for ExAC analysts: When writing additional analyses, start your script with the following for consistency (including exac, constraint, etc., as needed):

source('exac_constants.R')
if (!("exac" %in% ls(globalenv()))) {
  exac = load_exac_data()
}
if (!("use_data" %in% ls(globalenv()))) {
  use_data = subset(exac, use)
}
if (!("constraint" %in% ls(globalenv()))) {
  constraint = load_constraint_data()
}

Then, use exac as the full ExAC data frame, and use_data as the high-confidence variant set.

Alternatively, particularly for scripts that will be used on arbitrary subsets of data, wrap your scripts in functions that take an exac variable as an argument.
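
As a rough sketch of that approach, the function below takes the data frame as an argument instead of reading exac from the global environment, so the same code runs on any subset. The function name count_variants_by, its by parameter, and the toy data are hypothetical and not part of the repository.

```r
# Hypothetical analysis function: counts variants per value of one column.
# 'exac' can be the full data frame or any subset of it.
count_variants_by <- function(exac, by = 'filter') {
  stopifnot(is.data.frame(exac), by %in% names(exac))
  counts <- table(exac[[by]])
  data.frame(value = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy stand-in for the exac data frame (values are invented).
toy <- data.frame(
  filter = c('PASS', 'PASS', 'FAIL', 'PASS'),
  use    = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

all_counts <- count_variants_by(toy)                # whole toy data set
use_counts <- count_variants_by(subset(toy, use))   # high-confidence subset only
```

Because the function never touches global state, it can be called on exac, use_data, or any other subset without modification.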

If you are using source('script.R'), assume the user is running the code from the root of exac_papers.

Repository: macarthur-lab/exac_2015