Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate custom r script into snakemake pipeline #2

Merged
merged 1 commit into from Feb 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
23 changes: 23 additions & 0 deletions Snakefile
Expand Up @@ -86,6 +86,7 @@ rule make_summary:
homolog_escape=nb_markdown('homolog_escape.ipynb'),
make_supp_data=nb_markdown('make_supp_data.ipynb'),
structural_contacts='results/summary/annotate_structural_contacts.md',
custom_plots='results/summary/custom-plots_LY555.md',
output:
summary = os.path.join(config['summary_dir'], 'summary.md')
run:
Expand Down Expand Up @@ -146,6 +147,8 @@ rule make_summary:
which are [here]({path(config['supp_data_dir'])}). These include
`dms-view` input files.

16. Make custom plots for this project that fall out of the core pipeline, summarzied in [this notebook]({path(input.custom_plots)})

"""
).strip())

Expand All @@ -158,6 +161,26 @@ rule make_rulegraph:
shell:
"snakemake --forceall --rulegraph | dot -Tsvg > {output}"

rule custom_plots:
input:
config['escape_fracs'],
config['gisaid_mutation_counts']
output:
md='results/summary/custom-plots_LY555.md',
md_files=directory('results/summary/custom-plots_LY555_files')
envmodules:
'R/3.6.2-foss-2019b'
params:
nb='custom-plots_LY555.Rmd',
md='custom-plots_LY555.md',
md_files='custom-plots_LY555_files'
shell:
"""
R -e \"rmarkdown::render(input=\'{params.nb}\')\";
mv {params.md} {output.md};
mv {params.md_files} {output.md_files}
"""

rule structural_contacts:
input:
config['structural_contacts_config']
Expand Down
6 changes: 6 additions & 0 deletions cluster.yaml
Expand Up @@ -31,3 +31,9 @@ structural_contacts:
time: 0-1
mem: 50000
constraint: gizmok

custom_plots:
cpus: 4
time: 0-1
mem: 50000
constraint: gizmok
1 change: 1 addition & 0 deletions config.yaml
Expand Up @@ -179,3 +179,4 @@ structural_contacts_dir: results/structural_contacts
structural_contacts: results/structural_contacts/structural_contacts.csv
antibody_contacts: results/structural_contacts/antibody_contacts.csv
escape_selections_dir: results/escape_selections
custom_plots_dir: results/custom_plots_LY
Expand Up @@ -28,13 +28,13 @@ if(any(installed_packages == F)){
invisible(lapply(packages, library, character.only=T))

#read in config file
config <- read_yaml("../config.yaml")
config <- read_yaml("./config.yaml")

#read in escape profiles file
profiles_config <- read_yaml(file=paste("../",config$escape_profiles_config,sep=""))
profiles_config <- read_yaml(file=config$escape_profiles_config)

#make output directory
output_dir <- "../results/custom_plots_LY"
output_dir <- config$custom_plots_dir
if(!file.exists(output_dir)){
dir.create(file.path(output_dir))
}
Expand All @@ -49,7 +49,7 @@ sessionInfo()
Read in escape fractions, rename some things to make them easier to work with.

```{r data input}
scores <- data.table(read.csv(file=paste("../",config$escape_fracs,sep=""),stringsAsFactors=F))
scores <- data.table(read.csv(file=config$escape_fracs,stringsAsFactors=F))

scores <- scores[selection %in% names(profiles_config$`LY_cocktail`$conditions) & library=="average", .(selection,condition,site,protein_site,wildtype,mutation,mut_escape_frac_epistasis_model,site_total_escape_frac_epistasis_model)]

Expand All @@ -72,7 +72,7 @@ We read in table reporting circulating variants. We add new columns to our data

```{r mutation_escape_v_freq, echo=T, fig.width=10, fig.height=3.75, fig.align="center", dpi=300,dev="png"}
#read in table giving mutant frequencies on GISAID
counts <- read.csv(paste("../",config$gisaid_mutation_counts,sep=""),stringsAsFactors=F)
counts <- read.csv(config$gisaid_mutation_counts,stringsAsFactors=F)
#add to scores table
scores[,count:=0];scores[,n_countries:=0];scores[,frequency:=0]
for(i in 1:nrow(counts)){
Expand Down Expand Up @@ -108,7 +108,7 @@ For contextualizing ACE2 and expression deficits of antigenic mutations, what is

```{r GISAID_DFE, echo=T, fig.width=7, fig.height=4, fig.align="center", dpi=300,dev="png"}
#read in DMS scores
dms <- data.table(read.csv(file=paste("../",config$mut_bind_expr,sep=""),stringsAsFactors = F))
dms <- data.table(read.csv(file=config$mut_bind_expr,stringsAsFactors = F))
setnames(dms,"site_SARS2","site")
dms <- dms[mutant != "*",]
dms <- dms[as.character(wildtype) != as.character(mutant),]
Expand Down Expand Up @@ -160,8 +160,8 @@ We want to try a few plots -- at the per-mut and per-site level, compare distrib
First, load pdbs, and generate list of residues that are <4A or 4-8A from each mAb

```{r define_structural_contacts}
pdb_LY555 <- read.pdb(file="../data/pdbs/7kmg_single.pdb")
pdb_LY016 <- read.pdb(file="../data/pdbs/7c01_single.pdb")
pdb_LY555 <- read.pdb(file="./data/pdbs/7kmg_single.pdb")
pdb_LY016 <- read.pdb(file="./data/pdbs/7c01_single.pdb")

contacts_LY555_4A <- binding.site(pdb_LY555,
a.inds=atom.select(pdb_LY555, chain="C"),
Expand Down
15 changes: 0 additions & 15 deletions custom_analyses/README.md

This file was deleted.