frankwalbert/promoterVariants

promoterVariants

Code, objects and annotations for Renganaath, Cheung, et al., 2020

The code is divided into the six major areas listed below. Area 7 contains additional code written during revision for eLife (https://elifesciences.org/articles/62669).

Code in areas 1 to 3 and some of area 4 was written by Frank Albert. Code in area 5 and the rest of area 4 was written by Kaushik Renganaath. Code in area 6 is from https://www.nature.com/articles/s41587-019-0315-8, with minor modifications by Kaushik Renganaath.

Each area has one code folder. Areas 1 to 4 also have folders with various objects, annotations, etc., that are used by the code. In some cases, objects are used by several analysis areas (such as the oligo design file, gene annotations, and barcode counts). Be sure to look through all object folders and adjust your paths as necessary. Some of the annotation files are gzipped and need to be unzipped before use with the R code.
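As a minimal sketch of the unzipping step, the base R snippet below decompresses a gzipped file to a plain copy next to it. The helper name `gunzip_copy` and the demo file are hypothetical and not part of this repository; any other gunzip tool works just as well.

```r
# Hypothetical helper (not part of this repository): decompress a .gz file
# to a plain copy using base R connections only.
gunzip_copy <- function(gz_path, out_path = sub("\\.gz$", "", gz_path)) {
  con <- gzfile(gz_path, open = "rb")
  on.exit(close(con))
  out <- file(out_path, open = "wb")
  on.exit(close(out), add = TRUE)
  repeat {
    chunk <- readBin(con, what = raw(), n = 1e6)  # read up to 1 MB at a time
    if (length(chunk) == 0) break
    writeBin(chunk, out)
  }
  invisible(out_path)
}

# Demo with a throwaway file (made-up content, not a real annotation file)
tmp_gz <- tempfile(fileext = ".txt.gz")
con <- gzfile(tmp_gz, open = "wt")
writeLines(c("chr\tpos", "chrI\t100"), con)
close(con)

tmp_txt <- gunzip_copy(tmp_gz)
demo_annotation <- read.table(tmp_txt, header = TRUE, sep = "\t")
```

For the real annotation files, replace `tmp_gz` with the path to the gzipped file in the relevant object folder.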

Throughout, the code files are numbered in the order in which they should be used. A later file may fail if it is run before an earlier one, because it needs intermediate objects saved by the earlier files. We provide the most important of these intermediate objects, but some minor files may need to be created using the code here. Some very large files are also not included in this repository (e.g. barcode-level counts for all barcodes, not just those mapped to designed oligos). These are available at the GEO repository listed in the paper or at https://figshare.com/s/1c23d927e17fc203ac3b

A general note: Many "save" or "write.table" commands are commented out to prevent accidental overwriting of files. Check for these commented lines. You will need to uncomment and run them once you have confirmed that no important files will be overwritten.
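One way to guard against overwriting when re-enabling a "save" call is to check for the target file first. The sketch below is only an illustration: `save_if_absent` and the demo object are hypothetical names and do not appear in the repository code.

```r
# Hypothetical helper (not part of this repository): save the named objects
# only if the target file does not exist yet.
save_if_absent <- function(object_names, file, envir = parent.frame()) {
  if (file.exists(file)) {
    message("Not overwriting existing file: ", file)
    return(invisible(FALSE))
  }
  save(list = object_names, file = file, envir = envir)
  invisible(TRUE)
}

# Demo with a made-up object and a temporary path
demo_result <- data.frame(variant = c("v1", "v2"), effect = c(0.5, -0.2))
out_file <- tempfile(fileext = ".RData")

first_attempt <- save_if_absent("demo_result", out_file)   # file is written
second_attempt <- save_if_absent("demo_result", out_file)  # file is left alone
```

The same check (`file.exists`) can be placed in front of a "write.table" call.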

The analysis areas are:

1. Oligo library design

a) TSS_preparations.R: Make TSS annotations from Pelechano 2013 data

b) The actual design code

c) Reverse-complement oligos with more A than T nucleotides

d) Analyze the design to get descriptive statistics for the paper
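To illustrate step c), here is a minimal base R sketch that reverse-complements every oligo containing more A than T nucleotides. It is not the design code from this repository; the function names and the demo sequences are made up for illustration, and the only rule taken from the list above is "more A than T".

```r
# Reverse-complement DNA sequences (vectorized over a character vector)
reverse_complement <- function(seq) {
  complemented <- chartr("ACGTacgt", "TGCAtgca", seq)
  vapply(strsplit(complemented, ""),
         function(bases) paste(rev(bases), collapse = ""),
         character(1))
}

# Count occurrences of a single base in each sequence
count_base <- function(seq, base) {
  vapply(strsplit(seq, ""), function(bases) sum(bases == base), integer(1))
}

# Flip only those oligos that have strictly more A than T
orient_oligos <- function(oligos) {
  flip <- count_base(oligos, "A") > count_base(oligos, "T")
  oligos[flip] <- reverse_complement(oligos[flip])
  oligos
}

# Made-up demo sequences
demo_oligos <- c("AAACG", "TTTCG", "ACGT")
oriented <- orient_oligos(demo_oligos)
```

In this demo only the first sequence has more A than T, so it is the only one that changes.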

2. Annotation runs

Map barcodes to oligos, count the combinations, and map them to the MPRA design. There is one code file for the TSS library and one for the Upstream library. For each library, the code produces three files that will be needed in downstream analyses. Four of these six files are big (TSS library: R_bcCountsAssigned_160629.RData and R_bcCountsTopOligoOnly_160629.RData; Upstream library: R_bcCountsAssigned_UpStream.RData and R_bcCountsTopOligoOnly_161203_UpStream.RData). They are available at https://figshare.com/s/1c23d927e17fc203ac3b.
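Because the names of the objects stored inside these .RData files are not listed here, it can help to load a file into its own environment and inspect what it contains before using it. The sketch below is an assumption about how one might do that: `load_into_env` is a hypothetical helper, and the demo file is a stand-in for a downloaded file.

```r
# Hypothetical helper (not part of this repository): load an .RData file
# into a fresh environment so it cannot overwrite objects in the workspace.
load_into_env <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the object names
  message("Loaded objects: ", paste(loaded, collapse = ", "))
  env
}

# Demo with a small stand-in file (made-up content)
demo_counts <- data.frame(barcode = c("AAC", "GGT"), count = c(12L, 7L))
demo_file <- tempfile(fileext = ".RData")
save(demo_counts, file = demo_file)

bc_env <- load_into_env(demo_file)
object_names <- ls(bc_env)
restored <- get("demo_counts", envir = bc_env)
```

For the real files, pass the local path of the downloaded file instead of `demo_file`.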

3. Test for causal variants and do basic analyses on them

"01_countAndcombineAllSamples": For every replicate sample, counts the barcodes and aggregates them into one big table. This uses a lot of RAM and runs for hours. The key output is "R_countsMappedToDesignedOligos_180616.RData", which is available at https://figshare.com/s/1c23d927e17fc203ac3b.
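The idea of counting barcodes per sample and aggregating them into one table can be sketched on toy data as follows. This is not the code from "01_countAndcombineAllSamples"; the function names, sample names, and barcodes are invented, and this simple merge-based approach is unlikely to scale to the real data.

```r
# Count how often each barcode occurs in one sample
count_barcodes <- function(barcodes) {
  counts <- table(barcodes)
  data.frame(barcode = names(counts),
             count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Combine per-sample counts into one table with one column per sample
combine_samples <- function(sample_list) {
  per_sample <- lapply(names(sample_list), function(sample_name) {
    d <- count_barcodes(sample_list[[sample_name]])
    names(d)[2] <- sample_name
    d
  })
  combined <- Reduce(function(x, y) merge(x, y, by = "barcode", all = TRUE),
                     per_sample)
  combined[is.na(combined)] <- 0L  # barcode absent from a sample: count 0
  combined
}

# Made-up demo: two replicate samples with a few barcode reads each
demo_samples <- list(rep1 = c("AAC", "AAC", "GGT"),
                     rep2 = c("GGT", "TTA"))
demo_table <- combine_samples(demo_samples)
```

Each row of `demo_table` is one barcode and each remaining column holds that barcode's count in one replicate.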

The three files starting "0x_mpralm..." contain the statistical analyses of single variants and epistasis.

"06_aggregateVariantResults.R" combines the results across the TSS and Upstream libraries, assesses the reproducibility of single variants, and makes the big volcano plot.

"07_variantsPerGene_and_eQTLs.R" compares the single variant effects to local eQTLs.

4. Variant annotation

Make and gather various features. While all code needed to recapitulate how these features were assembled is here, it is messy because it was written over multiple years, with multiple updates to various intermediate objects. It may be more efficient to just work with the final product ("R_resultsAndAnnotations_200719.RData"). This file has all non-TFBS annotations and variant results, which were then fed into feature association testing.

5. Feature analyses & prediction

Single features and the various multiple regression models

6. Application of the de Boer 2020 model to our data

The model is described here: https://www.nature.com/articles/s41587-019-0315-8

For this area, all required code and data are in a zipped folder structure. The code is almost entirely from de Boer et al., 2020. We made minor modifications.

7. Code for analyses during revision for eLife
