Merged
Commits
56 commits
5787bb6
Getting acquainted with WDL. Sketch of STAR workflow.
claymcleod Jun 30, 2019
f28631c
Draft of star-alignment dockerfile.
claymcleod Jul 1, 2019
0e3190a
Syncing of latest STAR dockerfile for Andrew.
claymcleod Jul 10, 2019
8b02f71
Add tasks to star.wdl
a-frantz Jul 15, 2019
3fe0ad3
Move new tasks in star.wdl to qc.wdl
a-frantz Jul 15, 2019
e98be0f
Add star-qc workflow
a-frantz Jul 15, 2019
427178c
Refactor star-qc.wdl
a-frantz Jul 15, 2019
eb79ac0
Reorganize wdl tools
adthrasher Jul 18, 2019
c52cdcc
Adding samtools tasks
adthrasher Jul 18, 2019
f4e48df
Use `glob()` instead of ugly Array[File] hack
a-frantz Jul 18, 2019
24cca66
Make samtools split fail on unaccounted reads, unless workflow overrides
a-frantz Jul 18, 2019
581e685
Remove extra bracket
a-frantz Jul 18, 2019
f1da9b0
Updated split function for samtools
adthrasher Jul 18, 2019
774b7ac
Merging changes
adthrasher Jul 18, 2019
47def83
Adding RSeQC infer_experiment call.
adthrasher Jul 19, 2019
fa5bd42
feat(docker): Make improvements to the docker images included here (s…
claymcleod Jul 19, 2019
ea7c349
chore(gitignore): Add log files to .gitignore (produced by `dive`)
claymcleod Jul 19, 2019
4b783c1
Fixing variable references
adthrasher Jul 19, 2019
900f72f
Merge branch 'rnaseq-workflow' of https://github.com/stjude/sjcloud-w…
adthrasher Jul 19, 2019
6dc5dc8
Implement steps 1-4 of RFC in bam_to_fasqs.wdl
a-frantz Jul 20, 2019
b1be8a0
Merge branch 'rnaseq-workflow' of github.com:stjude/sjcloud-workflows…
a-frantz Jul 20, 2019
0e01dcf
feat(docker): Update bioinformatics-base based on Michael's comments
claymcleod Jul 20, 2019
1d60b2b
Adding md5sum wdl
adthrasher Jul 22, 2019
69ac21a
Adding htseq-count
adthrasher Jul 22, 2019
bb78cbd
Apply suggestions from code review
adthrasher Jul 22, 2019
024c187
Changes to star alignment and star db build
a-frantz Jul 23, 2019
b089935
Begin creating full workflow, from start to finish
a-frantz Jul 23, 2019
cbfed24
Add to .gitignore
a-frantz Jul 23, 2019
a7b9d65
Adding output files
adthrasher Jul 24, 2019
78d37a7
Updated workflow that runs through infer_experiment.py
adthrasher Jul 26, 2019
3a948ac
Create fastqc output directory before run, otherwise fastqc complains…
adthrasher Jul 29, 2019
141be7f
Adding qualimap rnaseq qc task
adthrasher Jul 29, 2019
c038a9f
Updating full workflow to run through htseq-count step
adthrasher Jul 29, 2019
7e994f4
Adding a utility tool for handling tool output parsing and pre-proces…
adthrasher Jul 29, 2019
c0c6ecd
Capturing htseq-count output
adthrasher Jul 29, 2019
132f109
Adding samtools flagstat, index, and md5sum steps
adthrasher Jul 29, 2019
d1b36aa
Redirect desired stdout to files and use read_string(File) for output
a-frantz Jul 30, 2019
be30cbc
Implement multiqc
a-frantz Jul 31, 2019
dcb3d41
Use gtf instead of gff for htseq-count
a-frantz Jul 31, 2019
3726440
Make var name changes to make wdltool validate happy
a-frantz Jul 31, 2019
a2df3b3
Add runtime memory parameters to heavy tasks
a-frantz Aug 1, 2019
3d73ed8
Bumping the star alignment memory requirement and setting a limit for…
adthrasher Aug 7, 2019
381f7c6
Adding lsf.conf for Cromwell backend. Setting backend to use MB for m…
adthrasher Aug 8, 2019
41fe74e
Adding fq lib to the Docker image.
adthrasher Aug 9, 2019
52a1587
Renaming variables from basename for toil compatibility.
adthrasher Aug 9, 2019
552eaad
Installing fq to /usr/local/
adthrasher Aug 9, 2019
c0ee835
Moving fq lib to a separate build layer.
adthrasher Aug 12, 2019
d31370f
Updating LSF conf to allow singularity wrapper if a docker image is s…
adthrasher Aug 14, 2019
955df89
Merge branch 'rnaseq-workflow' of https://github.com/stjude/sjcloud-w…
adthrasher Aug 14, 2019
925bbe1
Adding docker runtime
adthrasher Aug 14, 2019
f98ad5e
Adding v0.3.1 tag to fq lib install
adthrasher Aug 14, 2019
67d38ed
Adding a template configuration file for running on AWS with Cromwell.
adthrasher Aug 21, 2019
f749943
Adding documentation to workflows and licensing information.
adthrasher Aug 27, 2019
0b9d066
Adding additional documentation.
adthrasher Aug 27, 2019
fea42d2
Adding deeptools to Docker image.
adthrasher Sep 22, 2019
4aafcac
Adding bigwig generation step with deeptools.
adthrasher Sep 23, 2019
7 changes: 6 additions & 1 deletion .gitignore
@@ -1,4 +1,9 @@
# Blacklist common bioinformatics formats used in these workflows.
**/*.fastq.gz
**/*.fa
**/*.gtf
**/*.bam
**/*.log

# Blacklist Cromwell files
**/cromwell*
9 changes: 9 additions & 0 deletions LICENSE.txt
@@ -0,0 +1,9 @@
MIT License

Copyright (c) 2019 St. Jude Children's Research Hospital

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8 changes: 7 additions & 1 deletion README.md
@@ -3,4 +3,10 @@
This repository is intended to contain all source code related to bioinformatics
workflows in St. Jude Cloud. This includes but is not limited to Docker container
definition, WDL/CWL tools and workflows, and relevant documentation. The repo is
currently under construction and should be considered in an experimental state.

## RNA-seq workflow V2
The RNA-seq workflow used by St. Jude Cloud to generate BAM files is implemented
in WDL. It uses the bioinformatics-base Docker image found in this repository.
Configuration files for the Cromwell runner have been included to facilitate running
on LSF in addition to local execution.
51 changes: 51 additions & 0 deletions conf/aws.conf
@@ -0,0 +1,51 @@
include required(classpath("application"))

aws {

application-name = "cromwell"
auths = [
{
name = "default"
scheme = "default"
}
]
region = "<your-region>"
}

engine {
filesystems {
s3.auth = "default"
}
}

backend {
default = "AWSBatch"
providers {
AWSBatch {
actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
config {

numSubmitAttempts = 6
numCreateDefinitionAttempts = 6

// Base bucket for workflow executions
root = "s3://<your-s3-bucket-name>/cromwell-execution"

// A reference to an auth defined in the `aws` stanza at the top. This auth is used to create
// Jobs and manipulate auth JSONs.
auth = "default"

default-runtime-attributes {
queueArn: "<your arn here>"
}

filesystems {
s3 {
// A reference to a potentially different auth for manipulating files via engine functions.
auth = "default"
}
}
}
}
}
}
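The template above leaves three values to be filled in before Cromwell can use it: `<your-region>`, `<your-s3-bucket-name>`, and `<your arn here>`. A minimal Python sketch of a pre-flight check for leftover placeholders follows. The helper name `find_placeholders` and the `<your...>` pattern are assumptions made for illustration and are not part of this PR.

```python
import re

# Assumption: unfilled template values look like "<your-...>" or "<your ...>",
# which covers the three placeholders in conf/aws.conf.
PLACEHOLDER = re.compile(r"<your[^>\n]*>")


def find_placeholders(conf_text):
    """Return (line_number, placeholder) pairs for every unfilled value."""
    hits = []
    for lineno, line in enumerate(conf_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# A few lines copied from the template, plus one line with nothing to replace.
sample = '''region = "<your-region>"
root = "s3://<your-s3-bucket-name>/cromwell-execution"
queueArn: "<your arn here>"
auth = "default"
'''

for lineno, placeholder in find_placeholders(sample):
    print(f"line {lineno}: replace {placeholder}")
```

Running a check like this before launching Cromwell would catch a half-edited copy of the template early, rather than at job submission time.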
59 changes: 59 additions & 0 deletions conf/lsf.conf
@@ -0,0 +1,59 @@
include required(classpath("application"))

call-caching {
enabled = true
}

backend {
default = LSF
providers {
LSF {
actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
config {
runtime-attributes = """
Int cpu = 1
Int hosts = 1
Float? memory_mb = 3800
String lsf_queue = "standard"
String? lsf_job_group
String? docker
"""

#submit = """bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} -R rusage[mem=${memory_mb + "MB"}] -R "span[hosts=1]" -n ${cpu} ${"-q " + lsf_queue} /usr/bin/env bash ${script}"""

submit = """
bsub \
-q ${lsf_queue} \
-n ${cpu} \
${"-g " + lsf_job_group} \
-R "rusage[mem=${memory_mb}] span[hosts=${hosts}]" \
-J ${job_name} \
-cwd ${cwd} \
-o ${out} \
-e ${err} \
/usr/bin/env bash ${script}
"""

submit-docker = """
bsub \
-q ${lsf_queue} \
-n ${cpu} \
${"-g " + lsf_job_group} \
-R "select[singularity] rusage[mem=${memory_mb}] span[hosts=${hosts}]" \
-J ${job_name} \
-cwd ${cwd} \
-o ${cwd}/execution/stdout \
-e ${cwd}/execution/stderr \
"singularity exec --bind ${cwd}:${docker_cwd} docker://${docker} ${job_shell} ${script}"
"""


kill = "bkill ${job_id}"
check-alive = "bjobs ${job_id}"
job-id-regex = "Job <(\\d+)>.*"

exit-code-timeout-seconds = 120
}
}
}
}
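The `job-id-regex` setting above tells Cromwell how to pull the LSF job ID out of what `bsub` prints, so that `kill` and `check-alive` can refer to it later. The Python sketch below shows what the pattern captures. Cromwell applies the pattern with its own (JVM) regex engine, so the use of `re.search` here is an assumption, as is the sample `bsub` message and the helper name `extract_job_id`.

```python
import re

# Same pattern as `job-id-regex` in conf/lsf.conf: the HOCON string
# "Job <(\\d+)>.*" unescapes to the regex below.
JOB_ID_REGEX = re.compile(r"Job <(\d+)>.*")


def extract_job_id(bsub_stdout):
    """Return the job ID captured by group 1, or None if nothing matches."""
    match = JOB_ID_REGEX.search(bsub_stdout)
    return match.group(1) if match else None


# Assumed example of the message bsub prints after a successful submission.
print(extract_job_id("Job <123456> is submitted to queue <standard>."))
```

If a site wrapper changes the submission message, the pattern would need to be adjusted so that the first capture group still contains only the numeric ID.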
27 changes: 0 additions & 27 deletions docker/bio-base.Dockerfile

This file was deleted.

39 changes: 39 additions & 0 deletions docker/bioinformatics-base/Dockerfile
@@ -0,0 +1,39 @@
FROM rust:1.36.0 as fqlib-builder
RUN cargo install \
--git https://github.com/stjude/fqlib.git \
--tag v0.3.1 \
--root /opt/fqlib/

FROM ubuntu:18.04 as builder

ENV PATH /opt/conda/bin:$PATH

RUN apt-get update && \
apt-get upgrade -y && \
Contributor

Suggested change
apt-get upgrade -y && \

Assume the base image is already up-to-date.

Member Author

Can you explain a little further why this would be a best practice? Just curious from your perspective.

Contributor

It's the responsibility of the base image to maintain and update core packages periodically. Anything else can be updated/installed individually.

apt-get install wget -y && \
apt-get install gcc -y && \
rm -r /var/lib/apt/lists/*

RUN wget "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh" -O miniconda.sh && \
bash miniconda.sh -b -p /opt/conda/ && \
rm miniconda.sh

RUN conda update -n base -c defaults conda -y && \
conda install \
-c conda-forge \
-c bioconda \
coreutils==8.31 \
picard==2.20.2 \
samtools==1.9 \
bwa==0.7.17 \
star==2.7.1a \
fastqc==0.11.8 \
qualimap==2.2.2c \
multiqc==1.7 \
rseqc==3.0.0 \
htseq==0.11.2 \
deeptools==3.3.1 \
-y && \
conda clean --all -y

COPY --from=fqlib-builder /opt/fqlib/bin/fq /usr/local/bin/
1 change: 1 addition & 0 deletions docker/star/Dockerfile
@@ -0,0 +1 @@
FROM stjudecloud/bioinformatics-base:bleeding-edge
27 changes: 27 additions & 0 deletions tools/deeptools.wdl
@@ -0,0 +1,27 @@
## Description:
##
## This WDL tool wraps the DeepTools tool (https://deeptools.readthedocs.io/en/develop/index.html).
## DeepTools is a suite of Python tools for the analysis of high-throughput sequencing data.

task bamCoverage {
File bam
File bai
String prefix = basename(bam, ".bam")

command {
if [ ! -e ${bam}.bai ]
then
ln -s ${bai} ${bam}.bai
fi

bamCoverage --bam ${bam} --outFileName ${prefix}.bw --outFileFormat bigwig --numberOfProcessors "max"
}

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
File bigwig = "${prefix}.bw"
}
}
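Several tasks in this PR derive their output names from `basename(bam, ".bam")`. As a rough illustration of how that prefix is formed, the Python sketch below approximates the WDL function: drop the directory, then drop the suffix if it is present. The function name `wdl_basename` and the input path are hypothetical, and the sketch is not a substitute for the WDL specification.

```python
import posixpath


def wdl_basename(path, suffix=""):
    """Approximate WDL's basename(): drop the directory, then an optional suffix."""
    name = posixpath.basename(path)
    if suffix and name.endswith(suffix):
        name = name[: -len(suffix)]
    return name


# Hypothetical input path, used only to show the derived output name.
prefix = wdl_basename("/data/inputs/sample1.bam", ".bam")
print(prefix + ".bw")  # name of the bigwig file the bamCoverage task declares
```

Under this reading, an input whose name does not end in `.bam` keeps its full file name as the prefix.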
26 changes: 26 additions & 0 deletions tools/fastqc.wdl
@@ -0,0 +1,26 @@
## Description:
##
## This WDL tool wraps the FastQC tool (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
## FastQC generates quality control metrics for sequencing pipelines.

task fastqc {
File bam
Int ncpu
String prefix = basename(bam, ".bam")

command {
mkdir ${prefix}_fastqc_results
fastqc -f bam \
-o ${prefix}_fastqc_results \
-t ${ncpu} \
${bam}
}

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
Array[File] out_files = glob("${prefix}_fastqc_results/*")
}
}
33 changes: 33 additions & 0 deletions tools/fq.wdl
@@ -0,0 +1,33 @@
## Description:
##
## This WDL tool wraps the fq tool (https://github.com/stjude/fqlib).
## The fq library provides methods for manipulating Illumina generated
## FastQ files.

task print_version {
command {
fq --version
}

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
String out = read_string(stdout())
}

}

task fqlint {
File read1
File read2

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

command {
fq lint ${read1} ${read2}
}
}
24 changes: 24 additions & 0 deletions tools/htseq.wdl
@@ -0,0 +1,24 @@
## Description:
##
## This WDL tool wraps the htseq tool (https://github.com/simon-anders/htseq).
## HTSeq is a Python library for analyzing sequencing data.

task count {
File bam
File gtf
String strand = "reverse"
String outfile = basename(bam, ".bam") + ".counts.txt"

command {
htseq-count -f bam -r pos -s ${strand} -m union -i gene_name --secondary-alignments ignore --supplementary-alignments ignore ${bam} ${gtf} > ${outfile}
}

runtime {
memory: "8G"
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
File out = "${outfile}"
}
}
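The `count` task redirects the `htseq-count` table to `${outfile}`. That table is tab-separated, with one feature per line followed by summary counters whose names start with `__` (for example `__no_feature`). A minimal Python sketch for reading such a file downstream is given here. The function name `parse_htseq_counts` and the example values are made up for illustration.

```python
def parse_htseq_counts(text):
    """Split htseq-count output into per-feature counts and summary counters."""
    counts, summary = {}, {}
    for line in text.splitlines():
        name, sep, value = line.rpartition("\t")
        if not sep:
            continue  # skip blank or malformed lines
        # htseq-count reports its special counters with a leading "__".
        target = summary if name.startswith("__") else counts
        target[name] = int(value)
    return counts, summary


# Made-up example content; with `-i gene_name` the first column holds gene names.
example = "TP53\t120\nMYC\t45\n__no_feature\t10\n__ambiguous\t3\n"
counts, summary = parse_htseq_counts(example)
print(counts)
print(summary)
```

Keeping the `__` counters separate avoids mistaking them for genes when the counts are loaded into a downstream expression matrix.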
50 changes: 50 additions & 0 deletions tools/md5sum.wdl
@@ -0,0 +1,50 @@
## Description:
##
## This WDL tool wraps the md5sum tool from the GNU core
## utilities (https://github.com/coreutils/coreutils).
## md5sum is a utility for generating and verifying MD5
## hashes.

task print_version {
command {
md5sum --version
}

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
String out = read_string(stdout())
}
}

task compute_checksum {
File infile

command {
md5sum ${infile} > stdout.txt
}

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
String out = read_string("stdout.txt")
}
}

task check_checksum {
File infile

command {
md5sum -c ${infile} > stdout.txt
}

runtime {
docker: 'stjudecloud/bioinformatics-base:bleeding-edge'
}

output {
String out = read_string("stdout.txt")
}
}
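Note that `compute_checksum` returns the whole `md5sum` output line as a `String`, that is, the hex digest followed by the file name, not the digest alone. A caller that only wants the digest has to split the line. The Python sketch below illustrates that layout and the split. It assumes the default text-mode output (digest, two spaces, name), and the helper names `md5sum_line` and `split_md5sum_line` are hypothetical.

```python
import hashlib


def md5sum_line(data, name):
    """Format a digest the way md5sum does by default: digest, two spaces, name."""
    return f"{hashlib.md5(data).hexdigest()}  {name}"


def split_md5sum_line(line):
    """Separate the hex digest from the file name in one md5sum output line."""
    digest, name = line.rstrip("\n").split(None, 1)
    return digest, name


# The MD5 digest of empty input is a well-known constant.
line = md5sum_line(b"", "empty.txt")
print(line)
print(split_md5sum_line(line)[0])
```

The same layout is what `md5sum -c` expects in the file passed to `check_checksum`.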