Parallel pathway file generation #19

Merged
29 commits
8853266
Changes to generate depmap pathways in batches
johnbradley Dec 19, 2019
0515909
fix makefile variable rename bug
johnbradley Dec 19, 2019
9574c9e
remove pathway generation data limiting debug changes
johnbradley Dec 19, 2019
0618bc8
save RData in merge_depmap_pathways
johnbradley Dec 19, 2019
3572b75
add retry enrichr failures in generate pathways
johnbradley Dec 20, 2019
aaf358d
consistent naming for pathway group size
johnbradley Dec 20, 2019
587f0e2
better break up pathways files
johnbradley Dec 20, 2019
7ee3c5d
remove data limiting debug lines
johnbradley Dec 20, 2019
8f5dc2b
handle case where no correlated genes are found
johnbradley Dec 22, 2019
3cd5d8e
Merge branch 'master' of github.com:johnbradley/depmap into 16-batch-…
johnbradley Dec 23, 2019
e5009fa
simplify fix for arrange error
johnbradley Dec 23, 2019
d36c3af
fix enrichr_loop call chain
johnbradley Dec 23, 2019
d052d82
put enricher sleep back to 0.5 seconds
johnbradley Dec 23, 2019
c9ad118
better comment and organize generate_depmap_pathways
johnbradley Dec 23, 2019
db47431
Comment how to run generate_depmap_pathways and update README
johnbradley Dec 23, 2019
58d635c
update slurm script to build in parallel
johnbradley Dec 23, 2019
ea86b7c
tweak README
johnbradley Dec 23, 2019
96bad35
improve README and Makefile comments
johnbradley Dec 23, 2019
cd514c0
more comment improvements
johnbradley Dec 23, 2019
a9dd975
add newlines at end of files
johnbradley Dec 23, 2019
8ee5d20
Merge branch 'master' into 16-batch-pathway-generation
johnbradley Dec 24, 2019
99de220
simplify makefile for pathway generation
johnbradley Jan 9, 2020
f194c3d
Merge branch 'master' of https://github.com/hirscheylab/depmap into 1…
johnbradley Jan 9, 2020
ef5f80f
simplify Makefile for pathway generation
johnbradley Jan 9, 2020
7fd7932
better comments for generate_pathways.sh
johnbradley Jan 9, 2020
f73c3b6
simplify build-slurm make command
johnbradley Jan 9, 2020
98d4d70
revert un-necessary build-slurm.sh change
johnbradley Jan 10, 2020
61c545c
Improve README for generate_depmap_pathways
johnbradley Jan 10, 2020
5434f2d
Simplify readme further for generating pathway data
johnbradley Jan 10, 2020
15 changes: 12 additions & 3 deletions Makefile
@@ -4,6 +4,10 @@
# singularity exec singularity/depmap.sif RScript
RSCRIPT_CMD ?= Rscript

+# Defines the number of intermediate pathway data files that will be created. This allows the pathway data to be generated in parallel.
+# Changing this number requires an update to the data/master_positive.RData and data/master_negative.RData rules below.
+NUM_SUBSET_FILES ?= 10

# The first target is the default, it makes "all" the data. Does not include container_image
all: gene_summary depmap_data depmap_stats depmap_tables depmap_pathways

@@ -46,6 +50,11 @@ data/master_top_table.RData data/master_bottom_table.RData: code/generate_depmap
	@echo "Creating depmap tables"
	$(RSCRIPT_CMD) code/generate_depmap_tables.R

-data/master_positive.RData data/master_negative.RData: code/generate_depmap_pathways.R data/gene_summary.RData data/gene_summary.RData data/19Q4_achilles_cor.RData data/achilles_lower.Rds data/achilles_upper.Rds
-	@echo "Creating depmap pathways"
-	$(RSCRIPT_CMD) code/generate_depmap_pathways.R
+data/master_positive.RData: code/generate_pathways.sh code/generate_depmap_pathways.R code/merge_depmap_pathways.R
+	@echo "Creating positive pathways data"
+	RSCRIPT_CMD=$(RSCRIPT_CMD) PATHWAY_TYPE=positive NUM_SUBSET_FILES=$(NUM_SUBSET_FILES) ./code/generate_pathways.sh

+data/master_negative.RData: code/generate_pathways.sh code/generate_depmap_pathways.R code/merge_depmap_pathways.R
+	@echo "Creating negative pathways data"
+	RSCRIPT_CMD=$(RSCRIPT_CMD) PATHWAY_TYPE=negative NUM_SUBSET_FILES=$(NUM_SUBSET_FILES) ./code/generate_pathways.sh
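
The body of code/generate_pathways.sh is not shown in this diff view. The following is a minimal sketch of how such a driver could consume the three environment variables the Makefile rules pass in. It assumes the script starts one generate_depmap_pathways.R process per subset index in the background and then runs merge_depmap_pathways.R once. The function name run_pathways is hypothetical, and the flags are taken from the options defined in code/depmap_pathways_util.R; whether the merge script accepts exactly these flags is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; this is NOT the code/generate_pathways.sh from the PR.

run_pathways() {
    # Fall back to the Makefile defaults when a variable is unset.
    local rscript_cmd="${RSCRIPT_CMD:-Rscript}"
    local pathway_type="${PATHWAY_TYPE:-positive}"
    local num_files="${NUM_SUBSET_FILES:-10}"
    local pids="" pid failed=0 idx=1

    # Start one background job per subset file.
    # $rscript_cmd is left unquoted on purpose so that a multi-word command
    # such as "singularity exec singularity/depmap.sif Rscript" still works.
    while [ "$idx" -le "$num_files" ]; do
        $rscript_cmd code/generate_depmap_pathways.R \
            --type "$pathway_type" --num-subset-files "$num_files" --idx "$idx" &
        pids="$pids $!"
        idx=$((idx + 1))
    done

    # Wait for every job and remember whether any of them failed.
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    if [ "$failed" -ne 0 ]; then
        echo "at least one subset job failed; skipping merge" >&2
        return 1
    fi

    # Assumed merge step that combines the subset files into the master file.
    $rscript_cmd code/merge_depmap_pathways.R \
        --type "$pathway_type" --num-subset-files "$num_files"
}
```

Setting RSCRIPT_CMD=echo turns the sketch into a dry run that prints the commands it would execute, which is a cheap way to check the flag wiring before submitting a long job.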

4 changes: 1 addition & 3 deletions README.md
@@ -46,6 +46,4 @@ To generate the data files, run:
2. code/generate_depmap_data.R
3. code/generate_depmap_stats.R
4. code/generate_depmap_tables.R
-5. code/generate_depmap_pathways.R

-The files generated in steps 1-3 are required for steps 4 and 5. Step 4 takes about 60' to run locally. Step 5 requires some parallelization, and you'll see objects dec(ile)1-10 that could be run in parallel. The code for step 5 has `gene_group <- sample` so it can be tested.
+5. code/generate_pathways.sh
1 change: 1 addition & 0 deletions build-slurm.sh
@@ -1,5 +1,6 @@
#!/usr/bin/env bash
#SBATCH --mem=32G
#SBATCH --cpus-per-task=10
#SBATCH --output=logs/ddh-%j.out

source config.sh
49 changes: 49 additions & 0 deletions code/depmap_pathways_util.R
@@ -0,0 +1,49 @@
# Utility functions for use with depmap pathway generation
library(here)
library(optparse) # provides make_option, OptionParser, and parse_args used below

dpu_pathways_positive_type <- "positive"
dpu_pathways_negative_type <- "negative"

# Get the path to a subset file within the data directory for either positive or negative pathway data
dpu_get_subset_filepath <- function(pathways_type, file_idx, num_subset_files) {
  # create a filename like 'positive_subset_1_of_10.Rds' or 'negative_subset_1_of_10.Rds'
  subset_filename <- paste0(pathways_type, "_subset", "_", file_idx, "_of_", num_subset_files, ".Rds")
  here::here("data", subset_filename)
}

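# Get the paths to all subset files within the data directory for either positive or negative pathway data
dpu_get_all_pathways_subset_filepaths <- function(pathways_type, num_subset_files) {
  # file_idx is a vector here; paste0 is vectorized, so one path is returned per index.
  # seq_len avoids the c(1, 0) sequence that seq(0) would produce.
  dpu_get_subset_filepath(pathways_type, seq_len(num_subset_files), num_subset_files)
}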

# parse command line for --type, --num-subset-files and optionally --idx
dpu_parse_command_line <- function(include_idx) {
  option_list <- list(
    make_option(c("--type"), type="character", default="positive", dest='pathways_type',
                help="Type of pathways file to create either 'positive' or 'negative'"),
    make_option(c("--num-subset-files"), type="integer", dest='num_subset_files',
                help="Number of subset pathways files")
  )
  if (include_idx) {
    option_list <- c(
      option_list,
      make_option(c("--idx"), type="integer", dest='subset_file_idx',
                  help="Specifies single pathways subset file to create.")
    )
  }
  opt_parser <- OptionParser(option_list=option_list)
  opt <- parse_args(opt_parser)
  # --type must be one of the two known pathway types
  if (opt$pathways_type != dpu_pathways_positive_type && opt$pathways_type != dpu_pathways_negative_type) {
    message("The --type argument must be either '", dpu_pathways_positive_type, "' or '", dpu_pathways_negative_type, "'")
    q(status=1)
  }
  # --num-subset-files has no default, so it must be supplied
  if (is.null(opt$num_subset_files)) {
    message("The --num-subset-files flag is required.")
    q(status=1)
  }
  # --idx is only required when the caller asked for it
  if (include_idx && is.null(opt$subset_file_idx)) {
    message("The --idx flag is required.")
    q(status=1)
  }
  opt
}
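
The naming scheme used by dpu_get_subset_filepath can be reproduced outside R, for example to check that every subset file exists before a merge is attempted. The following is a minimal shell sketch, assuming the files live under data/ relative to the project root and that the command is run from that root. The helper names subset_filepaths and missing_subsets are hypothetical and are not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the filename pattern built in dpu_get_subset_filepath.

subset_filepaths() {
    # $1: pathways type ("positive" or "negative")
    # $2: total number of subset files
    local idx=1
    while [ "$idx" -le "$2" ]; do
        printf 'data/%s_subset_%s_of_%s.Rds\n' "$1" "$idx" "$2"
        idx=$((idx + 1))
    done
}

missing_subsets() {
    # Print every expected subset file that does not exist yet.
    subset_filepaths "$1" "$2" | while IFS= read -r path; do
        [ -e "$path" ] || printf '%s\n' "$path"
    done
}
```

Running missing_subsets positive 10 from the project root prints nothing once all ten positive subset files have been written.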