[WIP] STAR Workflow v2.0.0 #1
Merged
56 commits
- `5787bb6` Getting acquainted with WDL. Sketch of STAR workflow. (claymcleod)
- `f28631c` Draft of star-alignment dockerfile. (claymcleod)
- `0e3190a` Syncing of latest STAR dockerfile for Andrew. (claymcleod)
- `8b02f71` Add tasks to star.wdl (a-frantz)
- `3fe0ad3` Move new tasks in star.wdl to qc.wdl (a-frantz)
- `e98be0f` Add star-qc workflow (a-frantz)
- `427178c` Refactor star-qc.wdl (a-frantz)
- `eb79ac0` Reorganize wdl tools (adthrasher)
- `c52cdcc` Adding samtools tasks (adthrasher)
- `f4e48df` Use `glob()` instead of ugly Array[File] hack (a-frantz)
- `24cca66` Make samtools split fail on unaccounted reads, unless workflow overrides (a-frantz)
- `581e685` Remove extra bracket (a-frantz)
- `f1da9b0` Updated split function for samtools (adthrasher)
- `774b7ac` Merging changes (adthrasher)
- `47def83` Adding RSeQC infer_experiment call. (adthrasher)
- `fa5bd42` feat(docker): Make improvements to the docker images included here (s… (claymcleod)
- `ea7c349` chore(gitignore): Add log files to .gitignore (produced by `dive`) (claymcleod)
- `4b783c1` Fixing variable references (adthrasher)
- `900f72f` Merge branch 'rnaseq-workflow' of https://github.com/stjude/sjcloud-w… (adthrasher)
- `6dc5dc8` Implement steps 1-4 of RFC in bam_to_fasqs.wdl (a-frantz)
- `b1be8a0` Merge branch 'rnaseq-workflow' of github.com:stjude/sjcloud-workflows… (a-frantz)
- `0e01dcf` feat(docker): Update bioinformatics-base based on Michael's comments (claymcleod)
- `1d60b2b` Adding md5sum wdl (adthrasher)
- `69ac21a` Adding htseq-count (adthrasher)
- `bb78cbd` Apply suggestions from code review (adthrasher)
- `024c187` Changes to star alignment and star db build (a-frantz)
- `b089935` Begin creating full workflow, from start to finish (a-frantz)
- `cbfed24` Add to .gitignore (a-frantz)
- `a7b9d65` Adding output files (adthrasher)
- `78d37a7` Updated workflow that runs through infer_experiment.py (adthrasher)
- `3a948ac` Create fastqc output directory before run, otherwise fastqc complains… (adthrasher)
- `141be7f` Adding qualimap rnaseq qc task (adthrasher)
- `c038a9f` Updating full workflow to run through htseq-count step (adthrasher)
- `7e994f4` Adding a utility tool for handling tool output parsing and pre-proces… (adthrasher)
- `c0c6ecd` Capturing htseq-count output (adthrasher)
- `132f109` Adding samtools flagstat, index, and md5sum steps (adthrasher)
- `d1b36aa` Redirect desired stdout to files and use read_string(File) for output (a-frantz)
- `be30cbc` Implement multiqc (a-frantz)
- `dcb3d41` Use gtf instead of gff for htseq-count (a-frantz)
- `3726440` Make var name changes to make wdltool validate happy (a-frantz)
- `a2df3b3` Add runtime memory parameters to heavy tasks (a-frantz)
- `3d73ed8` Bumping the star alignment memory requirement and setting a limit for… (adthrasher)
- `381f7c6` Adding lsf.conf for Cromwell backend. Setting backend to use MB for m… (adthrasher)
- `41fe74e` Adding fq lib to the Docker image. (adthrasher)
- `52a1587` Renaming variables from basename for toil compatibility. (adthrasher)
- `552eaad` Installing fq to /usr/local/ (adthrasher)
- `c0ee835` Moving fq lib to a separate build layer. (adthrasher)
- `d31370f` Updating LSF conf to allow singularity wrapper if a docker image is s… (adthrasher)
- `955df89` Merge branch 'rnaseq-workflow' of https://github.com/stjude/sjcloud-w… (adthrasher)
- `925bbe1` Adding docker runtime (adthrasher)
- `f98ad5e` Adding v0.3.1 tag to fq lib install (adthrasher)
- `67d38ed` Adding a template configuration file for running on AWS with Cromwell. (adthrasher)
- `f749943` Adding documentation to workflows and licensing information. (adthrasher)
- `0b9d066` Adding additional documentation. (adthrasher)
- `fea42d2` Adding deeptools to Docker image. (adthrasher)
- `4aafcac` Adding bigwig generation step with deeptools. (adthrasher)
```diff
@@ -1,4 +1,9 @@
 # Blacklist common bioinformatics formats used in these workflows.
 **/*.fastq.gz
 **/*.fa
-**/*.gtf
+**/*.gtf
+**/*.bam
+**/*.log
+
+# Blacklist Cromwell files
+**/cromwell*
```
`@@ -0,0 +1,9 @@`

```text
MIT License

Copyright (c) 2019 St. Jude Children's Research Hospital

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```
`@@ -0,0 +1,51 @@`

```hocon
include required(classpath("application"))

aws {

  application-name = "cromwell"
  auths = [
    {
      name = "default"
      scheme = "default"
    }
  ]
  region = "<your-region>"
}

engine {
  filesystems {
    s3.auth = "default"
  }
}

backend {
  default = "AWSBatch"
  providers {
    AWSBatch {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {

        numSubmitAttempts = 6
        numCreateDefinitionAttempts = 6

        // Base bucket for workflow executions
        root = "s3://<your-s3-bucket-name>/cromwell-execution"

        // A reference to an auth defined in the `aws` stanza at the top. This auth is used to create
        // Jobs and manipulate auth JSONs.
        auth = "default"

        default-runtime-attributes {
          queueArn: "<your arn here>"
        }

        filesystems {
          s3 {
            // A reference to a potentially different auth for manipulating files via engine functions.
            auth = "default"
          }
        }
      }
    }
  }
}
```
`@@ -0,0 +1,59 @@`

```hocon
include required(classpath("application"))

call-caching {
  enabled = true
}

backend {
  default = LSF
  providers {
    LSF {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        Int cpu = 1
        Int hosts = 1
        Float? memory_mb = 3800
        String lsf_queue = "standard"
        String? lsf_job_group
        String? docker
        """

        #submit = """bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} -R rusage[mem=${memory_mb + "MB"}] -R "span[hosts=1]" -n ${cpu} ${"-q " + lsf_queue} /usr/bin/env bash ${script}"""

        submit = """
        bsub \
          -q ${lsf_queue} \
          -n ${cpu} \
          ${"-g " + lsf_job_group} \
          -R "rusage[mem=${memory_mb}] span[hosts=${hosts}]" \
          -J ${job_name} \
          -cwd ${cwd} \
          -o ${out} \
          -e ${err} \
          /usr/bin/env bash ${script}
        """

        submit-docker = """
        bsub \
          -q ${lsf_queue} \
          -n ${cpu} \
          ${"-g " + lsf_job_group} \
          -R "select[singularity] rusage[mem=${memory_mb}] span[hosts=${hosts}]" \
          -J ${job_name} \
          -cwd ${cwd} \
          -o ${cwd}/execution/stdout \
          -e ${cwd}/execution/stderr \
          "singularity exec --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${script}"
        """


        kill = "bkill ${job_id}"
        check-alive = "bjobs ${job_id}"
        job-id-regex = "Job <(\\d+)>.*"

        exit-code-timeout-seconds = 120
      }
    }
  }
}
```
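For reference, Cromwell pulls the LSF job ID out of `bsub`'s stdout with `job-id-regex` and substitutes it into the `kill` and `check-alive` commands. The Python sketch below shows what that pattern captures, assuming the usual LSF confirmation line (`Job <123456> is submitted to queue <standard>.`). `parse_job_id` and `kill_command` are hypothetical helpers for illustration only, not part of Cromwell, whose JVM-side regex handling may differ in detail.

```python
import re

# Same pattern as `job-id-regex`, after HOCON unescapes "\\d" to "\d".
JOB_ID_REGEX = re.compile(r"Job <(\d+)>.*")


def parse_job_id(bsub_stdout: str) -> str:
    """Return the job ID from the first line of bsub output that matches the pattern."""
    for line in bsub_stdout.splitlines():
        match = JOB_ID_REGEX.match(line)
        if match:
            return match.group(1)
    raise ValueError("no LSF job ID found in bsub output")


def kill_command(job_id: str) -> str:
    """Render the `kill` template ("bkill ${job_id}") for a parsed ID."""
    return f"bkill {job_id}"


if __name__ == "__main__":
    job_id = parse_job_id("Job <123456> is submitted to queue <standard>.")
    print(kill_command(job_id))  # bkill 123456
```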
This file was deleted.
`@@ -0,0 +1,39 @@`

```dockerfile
FROM rust:1.36.0 as fqlib-builder
RUN cargo install \
    --git https://github.com/stjude/fqlib.git \
    --tag v0.3.1 \
    --root /opt/fqlib/

FROM ubuntu:18.04 as builder

ENV PATH /opt/conda/bin:$PATH

RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install wget -y && \
    apt-get install gcc -y && \
    rm -r /var/lib/apt/lists/*

RUN wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" -O miniconda.sh && \
    bash miniconda.sh -b -p /opt/conda/ && \
    rm miniconda.sh

RUN conda update -n base -c defaults conda -y && \
    conda install \
      -c conda-forge \
      -c bioconda \
      coreutils==8.31 \
      picard==2.20.2 \
      samtools==1.9 \
      bwa==0.7.17 \
      star==2.7.1a \
      fastqc==0.11.8 \
      qualimap==2.2.2c \
      multiqc==1.7 \
      rseqc==3.0.0 \
      htseq==0.11.2 \
      deeptools==3.3.1 \
      -y && \
    conda clean --all -y

COPY --from=fqlib-builder /opt/fqlib/bin/fq /usr/local/bin/
```
`@@ -0,0 +1 @@`

```dockerfile
FROM stjudecloud/bioinformatics-base:bleeding-edge
```
`@@ -0,0 +1,27 @@`

```wdl
## Description:
##
## This WDL tool wraps deepTools (https://deeptools.readthedocs.io/en/develop/index.html).
## deepTools is a suite of Python tools for the analysis of high-throughput sequencing data.

task bamCoverage {
  File bam
  File bai
  String prefix = basename(bam, ".bam")

  command {
    if [ ! -e ${bam}.bai ]
    then
      ln -s ${bai} ${bam}.bai
    fi

    bamCoverage --bam ${bam} --outFileName ${prefix}.bw --outFileFormat bigwig --numberOfProcessors "max"
  }

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    File bigwig = "${prefix}.bw"
  }
}
```
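Two steps in `bamCoverage` are easy to misread: `prefix` comes from WDL's `basename(bam, ".bam")`, and the command block links the supplied index to `${bam}.bai` only when nothing already exists at that path (deepTools appears to look for the index beside the BAM, which would explain the link). A rough Python sketch of both steps; `wdl_basename` and `ensure_bai_next_to_bam` are hypothetical names, and the sketch only approximates what the engine and shell do.

```python
import os
import posixpath


def wdl_basename(path: str, suffix: str = "") -> str:
    """Approximate WDL basename(): drop the directory, then `suffix` if the name ends with it."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


def ensure_bai_next_to_bam(bam: str, bai: str) -> str:
    """Mirror the command block: symlink `bai` to <bam>.bai unless that path already exists."""
    expected = bam + ".bai"
    if not os.path.exists(expected):  # like `[ ! -e ${bam}.bai ]`
        os.symlink(bai, expected)
    return expected


if __name__ == "__main__":
    prefix = wdl_basename("/data/sample1.bam", ".bam")
    print(prefix + ".bw")  # sample1.bw
```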
`@@ -0,0 +1,26 @@`

```wdl
## Description:
##
## This WDL tool wraps the FastQC tool (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
## FastQC generates quality control metrics for sequencing pipelines.

task fastqc {
  File bam
  Int ncpu
  String prefix = basename(bam, ".bam")

  command {
    mkdir ${prefix}_fastqc_results
    fastqc -f bam \
      -o ${prefix}_fastqc_results \
      -t ${ncpu} \
      ${bam}
  }

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    Array[File] out_files = glob("${prefix}_fastqc_results/*")
  }
}
```
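The `mkdir` runs first because, per commit `3a948ac`, fastqc complains if the `-o` directory does not exist yet; `glob()` then collects whatever FastQC wrote there. A minimal Python approximation of those two steps, assuming the outputs are regular files directly inside `${prefix}_fastqc_results`. `prepare_output_dir` and `collect_outputs` are hypothetical helpers, and the results are sorted here only to make the sketch deterministic.

```python
import glob
import os
from typing import List


def prepare_output_dir(prefix: str, workdir: str = ".") -> str:
    """Equivalent of `mkdir ${prefix}_fastqc_results`, done before FastQC runs."""
    path = os.path.join(workdir, f"{prefix}_fastqc_results")
    os.mkdir(path)
    return path


def collect_outputs(prefix: str, workdir: str = ".") -> List[str]:
    """Approximate glob("${prefix}_fastqc_results/*"): regular files directly in the results dir."""
    pattern = os.path.join(workdir, f"{prefix}_fastqc_results", "*")
    return sorted(path for path in glob.glob(pattern) if os.path.isfile(path))
```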
`@@ -0,0 +1,33 @@`

```wdl
## Description:
##
## This WDL tool wraps the fq tool (https://github.com/stjude/fqlib).
## The fq library provides methods for manipulating Illumina-generated
## FastQ files.

task print_version {
  command {
    fq --version
  }

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    String out = read_string(stdout())
  }

}

task fqlint {
  File read1
  File read2

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  command {
    fq lint ${read1} ${read2}
  }
}
```
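`fqlint` relies on `fq lint` to validate a read pair and, presumably, to fail the task through a non-zero exit on malformed input. The sketch below is a deliberately simplified stand-in, not fq's actual rule set: it checks only the four-line record layout, equal sequence and quality lengths, matching record counts, and that read names agree between read1 and read2 once a trailing `/1` or `/2` is dropped. All function names are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, List, Tuple

Record = Tuple[str, str, str, str]


def records(lines: List[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records, raising ValueError on a basic layout problem."""
    if len(lines) % 4 != 0:
        raise ValueError("truncated record: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        name, seq, plus, qual = lines[i : i + 4]
        if not name.startswith("@"):
            raise ValueError(f"record {i // 4}: name line must start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {i // 4}: separator line must start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record {i // 4}: sequence and quality lengths differ")
        yield name, seq, plus, qual


def read_name(header: str) -> str:
    """Drop the leading '@', any description after whitespace, and a trailing /1 or /2."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def lint_pair(read1: str, read2: str) -> int:
    """Return the number of read pairs, or raise ValueError on the first problem found."""
    pairs = 0
    for a, b in zip_longest(records(read1.splitlines()), records(read2.splitlines())):
        if a is None or b is None:
            raise ValueError("read1 and read2 have different numbers of records")
        if read_name(a[0]) != read_name(b[0]):
            raise ValueError(f"name mismatch: {a[0]} vs {b[0]}")
        pairs += 1
    return pairs
```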
`@@ -0,0 +1,24 @@`

```wdl
## Description:
##
## This WDL tool wraps the htseq tool (https://github.com/simon-anders/htseq).
## HTSeq is a Python library for analyzing sequencing data.

task count {
  File bam
  File gtf
  String strand = "reverse"
  String outfile = basename(bam, ".bam") + ".counts.txt"

  command {
    htseq-count -f bam -r pos -s ${strand} -m union -i gene_name --secondary-alignments ignore --supplementary-alignments ignore ${bam} ${gtf} > ${outfile}
  }

  runtime {
    memory: "8G"
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    File out = "${outfile}"
  }
}
```
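The `.counts.txt` file that `count` produces holds one tab-separated `name<TAB>count` line per feature (gene names here, because of `-i gene_name`), followed by summary counters whose names start with `__`, such as `__no_feature` and `__ambiguous`. A small Python parser sketch that separates the two groups; `parse_htseq_counts` is a hypothetical name, and the sample input in the usage line is made up.

```python
from typing import Dict, Tuple


def parse_htseq_counts(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split htseq-count output into per-feature counts and the '__' summary counters."""
    features: Dict[str, int] = {}
    summary: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        name, count = line.rsplit("\t", 1)
        target = summary if name.startswith("__") else features
        target[name] = int(count)
    return features, summary


if __name__ == "__main__":
    genes, counters = parse_htseq_counts("GENE_A\t10\nGENE_B\t0\n__no_feature\t5\n")
    print(sum(genes.values()), counters["__no_feature"])  # 10 5
```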
`@@ -0,0 +1,50 @@`

```wdl
## Description:
##
## This WDL tool wraps the md5sum tool from the GNU core
## utilities (https://github.com/coreutils/coreutils).
## md5sum is a utility for generating and verifying MD5
## hashes.

task print_version {
  command {
    md5sum --version
  }

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    String out = read_string(stdout())
  }
}
task compute_checksum {
  File infile

  command {
    md5sum ${infile} > stdout.txt
  }

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    String out = read_string("stdout.txt")
  }
}
task check_checksum {
  File infile

  command {
    md5sum -c ${infile} > stdout.txt
  }

  runtime {
    docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
  }

  output {
    String out = read_string("stdout.txt")
  }
}
```
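`compute_checksum` captures `md5sum`'s `<digest>  <path>` line, and `check_checksum` hands a file of such lines to `md5sum -c`. A `hashlib`-based Python sketch of the same round trip, assuming text-mode lines (two spaces between digest and path); binary-mode ` *path` lines are not handled. Both function names are hypothetical and only mirror the task names.

```python
import hashlib
from typing import Dict


def compute_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return a line in md5sum's text-mode format: '<hex digest>  <path>'."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large BAM/FASTQ files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}  {path}"


def check_checksums(checksum_file: str) -> Dict[str, bool]:
    """Rough analogue of `md5sum -c`: map each listed path to True (OK) or False (FAILED)."""
    results: Dict[str, bool] = {}
    with open(checksum_file) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            expected, path = line.split("  ", 1)
            actual = compute_checksum(path).split("  ", 1)[0]
            results[path] = actual == expected
    return results
```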
Assume the base image is already up-to-date.
Can you explain a little further why this would be a best practice? Just curious from your perspective.
It's the responsibility of the base image to maintain and update core packages periodically. Anything else can be updated/installed individually.