From 46348e4cd1e75ef599b8bc33209cda1cdd70b523 Mon Sep 17 00:00:00 2001
From: adthrasher <adthrasher@users.noreply.github.com>
Date: Thu, 23 May 2024 15:59:47 +0000
Subject: [PATCH] deploy: 2970ed4ffec95cbc3c94b3e85543d46c4e730f71

---
 index.html                                 |   2 +-
 search/search_index.json                   |   2 +-
 sitemap.xml.gz                             | Bin 127 -> 127 bytes
 workflows/dnaseq-core/index.html           |   1 +
 workflows/dnaseq-standard-fastq/index.html |   3 ++-
 workflows/dnaseq-standard/index.html       |   2 +-
 6 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/index.html b/index.html
index 72ef050d..124e4c29 100644
--- a/index.html
+++ b/index.html
@@ -265,5 +265,5 @@ <h2 id="license">📝 License</h2>
 
 <!--
 MkDocs version : 1.6.0
-Build Date UTC : 2024-05-13 17:33:31.987922+00:00
+Build Date UTC : 2024-05-23 15:59:46.573858+00:00
 -->
diff --git a/search/search_index.json b/search/search_index.json
index 6062d0ac..884b0699 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"This repository contains all bioinformatics workflows used on the St. Jude Cloud project. Officially, the repository is in beta \u2014 the project is adding workflows as they are developed and put into production. Resources requirements have been optimized to minimize failures in our computing environment, but they may not reflect the best settings for your use case. Please ensure that you tailor these parameters to fit your needs. \ud83c\udfe0 Homepage Please excuse the state of our documentation. We are working on some big changes around here, and with those changes will come much improved documentation. Repository Structure The repository is laid out as follows: workflows/ - Directory containing all end-to-end bioinformatics workflows. tools/ - All tools we have wrapped as individual WDL tasks. data_structures/ - WDL struct definitions and tasks or workflows related to their construction, parsing, or validation. docker/ - Dockerfiles used in our workflows. All docker images are published to the GitHub Container Registy as a part of our CI and are versioned. tests/ - Home to all of our testing infrastructure. We use pytest-workflow for validating our code. bin/ - no longer in use Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf with Cromwell. conf/ - no longer in use Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements. Bootstrap guide This repository implements workflows using the Workflow Description Language (WDL). If unfamiliar with WDL, a short overview is available in the WDL spec . The workflows and tasks in this repository should require minimal set-up and configuration before you're ready to run. You don't even need to clone the repo! 
The bare minimum requirements are a locally installed WDL runner and an internet connection. The exact steps for installation, configuration, and execution are going to depend on you environment and preferred engine. There are a variety of WDL engines you could use, though our team prefers miniwdl . We also make use of the miniwdl-lsf plugin for running on our LSF cluster. Most WDL runners are capable of running a WDL file from a URL. This is how we most commonly execute our workflows and tasks. The below command could be used to submit a run of our rnaseq-standard workflow using miniwdl : miniwdl run --verbose --input inputs.json https://raw.githubusercontent.com/stjudecloud/workflows/rnaseq-standard/v3.0.1/workflows/rnaseq/rnaseq-standard.wdl For an introduction to WDL, there are many guides, one of which is from Terra . Author \ud83d\udc64 St. Jude Cloud Team Website: https://stjude.cloud Github: @stjudecloud Twitter: @StJudeResearch Tests Every task in this repository is covered by at least one test (see all of our tests in tests/tools/ ). These are run using pytest-workflow . The command for running our tests should be executed at the root of the repo: python -m pytest --kwdof --git-aware \ud83e\udd1d Contributing Contributions, issues and feature requests are welcome! Feel free to check issues page . You can also take a look at the contributing guide . Links worth checking out The OpenWDL GitHub Our preferred WDL runner: miniwdl Most of our tasks are run inside a BioContainers image Our tasks are validated using pytest-workflow \ud83d\udcdd License Copyright \u00a9 2020-Present St. Jude Cloud Team . This project is MIT licensed.","title":"Home"},{"location":"#homepage","text":"Please excuse the state of our documentation. 
We are working on some big changes around here, and with those changes will come much improved documentation.","title":"\ud83c\udfe0 Homepage"},{"location":"#repository-structure","text":"The repository is laid out as follows: workflows/ - Directory containing all end-to-end bioinformatics workflows. tools/ - All tools we have wrapped as individual WDL tasks. data_structures/ - WDL struct definitions and tasks or workflows related to their construction, parsing, or validation. docker/ - Dockerfiles used in our workflows. All docker images are published to the GitHub Container Registy as a part of our CI and are versioned. tests/ - Home to all of our testing infrastructure. We use pytest-workflow for validating our code. bin/ - no longer in use Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf with Cromwell. conf/ - no longer in use Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements.","title":"Repository Structure"},{"location":"#bootstrap-guide","text":"This repository implements workflows using the Workflow Description Language (WDL). If unfamiliar with WDL, a short overview is available in the WDL spec . The workflows and tasks in this repository should require minimal set-up and configuration before you're ready to run. You don't even need to clone the repo! The bare minimum requirements are a locally installed WDL runner and an internet connection. The exact steps for installation, configuration, and execution are going to depend on you environment and preferred engine. There are a variety of WDL engines you could use, though our team prefers miniwdl . We also make use of the miniwdl-lsf plugin for running on our LSF cluster. Most WDL runners are capable of running a WDL file from a URL. This is how we most commonly execute our workflows and tasks. 
The below command could be used to submit a run of our rnaseq-standard workflow using miniwdl : miniwdl run --verbose --input inputs.json https://raw.githubusercontent.com/stjudecloud/workflows/rnaseq-standard/v3.0.1/workflows/rnaseq/rnaseq-standard.wdl For an introduction to WDL, there are many guides, one of which is from Terra .","title":"Bootstrap guide"},{"location":"#author","text":"\ud83d\udc64 St. Jude Cloud Team Website: https://stjude.cloud Github: @stjudecloud Twitter: @StJudeResearch","title":"Author"},{"location":"#tests","text":"Every task in this repository is covered by at least one test (see all of our tests in tests/tools/ ). These are run using pytest-workflow . The command for running our tests should be executed at the root of the repo: python -m pytest --kwdof --git-aware","title":"Tests"},{"location":"#contributing","text":"Contributions, issues and feature requests are welcome! Feel free to check issues page . You can also take a look at the contributing guide .","title":"\ud83e\udd1d Contributing"},{"location":"#links-worth-checking-out","text":"The OpenWDL GitHub Our preferred WDL runner: miniwdl Most of our tasks are run inside a BioContainers image Our tasks are validated using pytest-workflow","title":"Links worth checking out"},{"location":"#license","text":"Copyright \u00a9 2020-Present St. Jude Cloud Team . This project is MIT licensed.","title":"\ud83d\udcdd License"},{"location":"build_for_dnanexus/","text":"Building WDL workflows for DNAnexus Obtain dxWDL JAR Retrieve the latest dxWDL JAR release from GitHub: https://github.com/dnanexus/dxWDL/releases Optional workflow parameters for dxWDL -project - Specify a project to compile the workflow. This is optional and otherwise uses the currently selected project. 
-archive - Archive older versions of the workflow and applets -defaults - Set default options for certain parameters -verbose - Detailed build information -locked - Creates a one stage worklfow that is cleaner in the interface -extras - JSON formatted file with options primarily for the DNAnexus platform settings Build Interactive t-SNE workflow for DNAnexus Commands for building the t-SNE workflows are included below. Your version of dxWDL may differ from the version included below. Several optional parameters are included. -defaults specifies DNAnexus paths to reference data for the workflow. -extras specifies that tasks should be retried by default on failure. Build workflow running htseq-count on BAM input java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_bams.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_bam.json -extras workflows/interactive-tsne/inputs/extras.json -locked Build workflow from HTSeq counts data java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_counts.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_counts.json -extras workflows/interactive-tsne/inputs/extras.json -locked Build workflow with RNA-Seq V2 remapping of BAM input java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive-tsne.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Building WDL workflows for DNAnexus"},{"location":"build_for_dnanexus/#building-wdl-workflows-for-dnanexus","text":"","title":"Building WDL workflows for DNAnexus"},{"location":"build_for_dnanexus/#obtain-dxwdl-jar","text":"Retrieve the latest dxWDL JAR release from GitHub: https://github.com/dnanexus/dxWDL/releases","title":"Obtain dxWDL 
JAR"},{"location":"build_for_dnanexus/#optional-workflow-parameters-for-dxwdl","text":"-project - Specify a project to compile the workflow. This is optional and otherwise uses the currently selected project. -archive - Archive older versions of the workflow and applets -defaults - Set default options for certain parameters -verbose - Detailed build information -locked - Creates a one stage worklfow that is cleaner in the interface -extras - JSON formatted file with options primarily for the DNAnexus platform settings","title":"Optional workflow parameters for dxWDL"},{"location":"build_for_dnanexus/#build-interactive-t-sne-workflow-for-dnanexus","text":"Commands for building the t-SNE workflows are included below. Your version of dxWDL may differ from the version included below. Several optional parameters are included. -defaults specifies DNAnexus paths to reference data for the workflow. -extras specifies that tasks should be retried by default on failure.","title":"Build Interactive t-SNE workflow for DNAnexus"},{"location":"build_for_dnanexus/#build-workflow-running-htseq-count-on-bam-input","text":"java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_bams.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_bam.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Build workflow running htseq-count on BAM input"},{"location":"build_for_dnanexus/#build-workflow-from-htseq-counts-data","text":"java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_counts.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_counts.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Build workflow from HTSeq counts data"},{"location":"build_for_dnanexus/#build-workflow-with-rna-seq-v2-remapping-of-bam-input","text":"java -jar dxWDL-v1.46.2.jar 
compile workflows/interactive-tsne/interactive-tsne.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Build workflow with RNA-Seq V2 remapping of BAM input"},{"location":"tasks/bwa/","text":"Homepage bwa_aln description Maps Single-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'} Inputs Required _runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. fastq (File, required ): Input FASTQ file to align with bwa Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam (File) bwa_aln_pe description Maps Paired-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'} Inputs Required _runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. 
read_one_fastq_gz (File, required ); description : Input gzipped FASTQ read one file to align with bwa; stream : false read_two_fastq_gz (File, required ); description : Input gzipped FASTQ read two file to align with bwa; stream : false Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam (File) bwa_mem description Maps FASTQ files to BAM format using bwa mem outputs {'bam': 'Aligned BAM format file'} Inputs Required _runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. read_one_fastq_gz (File, required ): Input gzipped FASTQ read one file to align with bwa Optional read_two_fastq_gz (File?): Input gzipped FASTQ read two file to align with bwa Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. 
BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam (File) build_bwa_db description Creates a BWA index and returns it as a compressed tar archive outputs {'bwa_db_tar_gz': 'Tarballed bwa reference files'} Inputs Required _runtime (Any, required ) reference_fasta (File, required ): Input reference Fasta file to index with bwa. Should be compressed with gzip. Defaults db_name (String, default=\"bwa_db\"); description : Name of the output gzipped tar archive of the bwa reference files. The extension .tar.gz will be added.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs bwa_db_tar_gz (File)","title":"Bwa"},{"location":"tasks/bwa/#bwa_aln","text":"description Maps Single-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'}","title":"bwa_aln"},{"location":"tasks/bwa/#inputs","text":"","title":"Inputs"},{"location":"tasks/bwa/#required","text":"_runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. fastq (File, required ): Input FASTQ file to align with bwa","title":"Required"},{"location":"tasks/bwa/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. 
BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/bwa/#outputs","text":"bam (File)","title":"Outputs"},{"location":"tasks/bwa/#bwa_aln_pe","text":"description Maps Paired-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'}","title":"bwa_aln_pe"},{"location":"tasks/bwa/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/bwa/#required_1","text":"_runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. read_one_fastq_gz (File, required ); description : Input gzipped FASTQ read one file to align with bwa; stream : false read_two_fastq_gz (File, required ); description : Input gzipped FASTQ read two file to align with bwa; stream : false","title":"Required"},{"location":"tasks/bwa/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/bwa/#outputs_1","text":"bam (File)","title":"Outputs"},{"location":"tasks/bwa/#bwa_mem","text":"description Maps FASTQ files to BAM format using bwa mem outputs {'bam': 'Aligned BAM format file'}","title":"bwa_mem"},{"location":"tasks/bwa/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/bwa/#required_2","text":"_runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. read_one_fastq_gz (File, required ): Input gzipped FASTQ read one file to align with bwa","title":"Required"},{"location":"tasks/bwa/#optional","text":"read_two_fastq_gz (File?): Input gzipped FASTQ read two file to align with bwa","title":"Optional"},{"location":"tasks/bwa/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/bwa/#outputs_2","text":"bam (File)","title":"Outputs"},{"location":"tasks/bwa/#build_bwa_db","text":"description Creates a BWA index and returns it as a compressed tar archive outputs {'bwa_db_tar_gz': 'Tarballed bwa reference files'}","title":"build_bwa_db"},{"location":"tasks/bwa/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/bwa/#required_3","text":"_runtime (Any, required ) reference_fasta (File, required ): Input reference Fasta file to index with bwa. Should be compressed with gzip.","title":"Required"},{"location":"tasks/bwa/#defaults_3","text":"db_name (String, default=\"bwa_db\"); description : Name of the output gzipped tar archive of the bwa reference files. The extension .tar.gz will be added.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/bwa/#outputs_3","text":"bwa_db_tar_gz (File)","title":"Outputs"},{"location":"tasks/cellranger/","text":"Cell Ranger This WDL file wrap the 10x Genomics Cell Ranger tool. Cell Ranger is a tool for handling scRNA-Seq data. count description This WDL task runs Cell Ranger count to generate an aligned BAM and feature counts from scRNA-Seq data. Inputs Required _runtime (Any, required ) fastqs_tar_gz (File, required ): Path to the FASTQ folder archive in .tar.gz format id (String, required ): A unique run ID transcriptome_tar_gz (File, required ): Path to Cell Ranger-compatible transcriptome reference in .tar.gz format Defaults memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. Outputs bam (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) cloupe (File) bamtofastq description This WDL task runs the 10x bamtofastq tool to convert Cell Ranger generated BAM files back to FASTQ files Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM to convert to Cell Ranger compatible fastqs Defaults cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
Outputs fastqs (Array[File]) fastqs_archive (File) read_one_fastq_gz (Array[File]) read_two_fastq_gz (Array[File])","title":"Cellranger"},{"location":"tasks/cellranger/#count","text":"description This WDL task runs Cell Ranger count to generate an aligned BAM and feature counts from scRNA-Seq data.","title":"count"},{"location":"tasks/cellranger/#inputs","text":"","title":"Inputs"},{"location":"tasks/cellranger/#required","text":"_runtime (Any, required ) fastqs_tar_gz (File, required ): Path to the FASTQ folder archive in .tar.gz format id (String, required ): A unique run ID transcriptome_tar_gz (File, required ): Path to Cell Ranger-compatible transcriptome reference in .tar.gz format","title":"Required"},{"location":"tasks/cellranger/#defaults","text":"memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? 
Recommended for cloud environments.","title":"Defaults"},{"location":"tasks/cellranger/#outputs","text":"bam (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) cloupe (File)","title":"Outputs"},{"location":"tasks/cellranger/#bamtofastq","text":"description This WDL task runs the 10x bamtofastq tool to convert Cell Ranger generated BAM files back to FASTQ files","title":"bamtofastq"},{"location":"tasks/cellranger/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/cellranger/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM to convert to Cell Ranger compatible fastqs","title":"Required"},{"location":"tasks/cellranger/#defaults_1","text":"cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? 
Recommended for cloud environments.","title":"Defaults"},{"location":"tasks/cellranger/#outputs_1","text":"fastqs (Array[File]) fastqs_archive (File) read_one_fastq_gz (Array[File]) read_two_fastq_gz (Array[File])","title":"Outputs"},{"location":"tasks/deeptools/","text":"Homepage bam_coverage description Generates a BigWig coverage track using bamCoverage from DeepTools outputs {'bigwig': 'BigWig format coverage file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BigWig file. The extension .bw will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bigwig (File)","title":"Deeptools"},{"location":"tasks/deeptools/#bam_coverage","text":"description Generates a BigWig coverage track using bamCoverage from DeepTools outputs {'bigwig': 'BigWig format coverage file'}","title":"bam_coverage"},{"location":"tasks/deeptools/#inputs","text":"","title":"Inputs"},{"location":"tasks/deeptools/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM","title":"Required"},{"location":"tasks/deeptools/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BigWig file. The extension .bw will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/deeptools/#outputs","text":"bigwig (File)","title":"Outputs"},{"location":"tasks/estimate/","text":"Homepage run_estimate description [DEPRECATED] Given a gene expression file, run the ESTIMATE software package outputs {'estimate_file': 'The results file of the ESTIMATE software package'} deprecated true Inputs Required _runtime (Any, required ) gene_expression_file (File, required ): A 2 column headered TSV file with 'Gene name' in the first column and gene expression values (as floats) in the second column. Can be generated with the calc_tpm task. Defaults disk_size_gb (Int, default=10): Disk space to allocate for task, specified in GB max_retries (Int, default=1): Number of times to retry in case of failure memory_gb (Int, default=4): RAM to allocate for task, specified in GB outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\"): Name of the ESTIMATE output file Outputs estimate_file (File)","title":"Estimate"},{"location":"tasks/estimate/#run_estimate","text":"description [DEPRECATED] Given a gene expression file, run the ESTIMATE software package outputs {'estimate_file': 'The results file of the ESTIMATE software package'} deprecated true","title":"run_estimate"},{"location":"tasks/estimate/#inputs","text":"","title":"Inputs"},{"location":"tasks/estimate/#required","text":"_runtime (Any, required ) gene_expression_file (File, required ): A 2 column headered TSV file with 'Gene name' in the first column and gene expression values (as floats) in the second column. 
Can be generated with the calc_tpm task.","title":"Required"},{"location":"tasks/estimate/#defaults","text":"disk_size_gb (Int, default=10): Disk space to allocate for task, specified in GB max_retries (Int, default=1): Number of times to retry in case of failure memory_gb (Int, default=4): RAM to allocate for task, specified in GB outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\"): Name of the ESTIMATE output file","title":"Defaults"},{"location":"tasks/estimate/#outputs","text":"estimate_file (File)","title":"Outputs"},{"location":"tasks/fastqc/","text":"Homepage fastqc description Generates a FastQC quality control metrics report for the input BAM file outputs {'raw_data': 'A zip archive of raw FastQC data. Can be parsed by MultiQC.', 'results': 'A gzipped tar archive of all FastQC output files'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to run FastQC on Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fastqc_results\"): Prefix for the FastQC results directory. The extension .tar.gz will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs raw_data (File) results (File)","title":"Fastqc"},{"location":"tasks/fastqc/#fastqc","text":"description Generates a FastQC quality control metrics report for the input BAM file outputs {'raw_data': 'A zip archive of raw FastQC data. 
Can be parsed by MultiQC.', 'results': 'A gzipped tar archive of all FastQC output files'}","title":"fastqc"},{"location":"tasks/fastqc/#inputs","text":"","title":"Inputs"},{"location":"tasks/fastqc/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to run FastQC on","title":"Required"},{"location":"tasks/fastqc/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fastqc_results\"): Prefix for the FastQC results directory. The extension .tar.gz will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/fastqc/#outputs","text":"raw_data (File) results (File)","title":"Outputs"},{"location":"tasks/fq/","text":"Homepage fqlint description Performs quality control on the input FASTQs to ensure proper formatting outputs {'validated_read1': 'The unmodified input read one FASTQ after it has been successfully validated', 'validated_read2': 'The unmodified input read two FASTQ after it has been successfully validated'} Inputs Required _runtime (Any, required ) read_one_fastq (File, required ); description : Input FASTQ with read one. Can be gzipped or uncompressed.; stream : true Optional read_two_fastq (File?); description : Input FASTQ with read two. 
Can be gzipped or uncompressed.; stream : true Defaults disable_validator_codes (Array[String], default=[]); description : Array of codes to disable specific validators; choices : {'S001': \"Plus line starts with a '+'\", 'S002': \"All characters in sequence line are one of 'ACGTN', case-insensitive\", 'S003': \"Name line starts with an '@'\", 'S004': 'All four record lines (name, sequence, plus line, and quality) are present', 'S005': 'Sequence and quality lengths are the same', 'S006': \"All characters in quality line are between '!' and '~' (ordinal values)\", 'S007': 'All record names are unique', 'P001': 'Each paired read name is the same, excluding interleave'} modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. paired_read_validation_level (String, default=\"high\"); description : Only use paired read validators up to a given level; choices : ['low', 'medium', 'high'] panic (Boolean, default=true); description : Panic on first error (true) or log all errors (false)?; common : true single_read_validation_level (String, default=\"high\"); description : Only use single read validators up to a given level; choices : ['low', 'medium', 'high'] Outputs check (String) subsample description Subsamples the input FASTQ(s) outputs {'subsampled_read1': 'Gzipped FASTQ file containing subsampled read1', 'subsampled_read2': 'Gzipped FASTQ file containing subsampled read2'} Inputs Required _runtime (Any, required ) read_one_fastq (File, required ): Input FASTQ with read one. Can be gzipped or uncompressed. Optional read_two_fastq (File?): Input FASTQ with read two. Can be gzipped or uncompressed. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the output FASTQ file(s). The extension _R1.subsampled.fastq.gz and _R2.subsampled.fastq.gz will be added. probability (Float, default=1.0); description : The probability a record is kept, as a decimal (0.0, 1.0). Cannot be used with record-count . Any probability<=0.0 or probability>=1.0 to disable.; common : true record_count (Int, default=-1); description : The exact number of records to keep. Cannot be used with probability . Any record_count<=0 to disable.; common : true Outputs subsampled_read1 (File) subsampled_read2 (File?)","title":"Fq"},{"location":"tasks/fq/#fqlint","text":"description Performs quality control on the input FASTQs to ensure proper formatting outputs {'validated_read1': 'The unmodified input read one FASTQ after it has been successfully validated', 'validated_read2': 'The unmodified input read two FASTQ after it has been successfully validated'}","title":"fqlint"},{"location":"tasks/fq/#inputs","text":"","title":"Inputs"},{"location":"tasks/fq/#required","text":"_runtime (Any, required ) read_one_fastq (File, required ); description : Input FASTQ with read one. Can be gzipped or uncompressed.; stream : true","title":"Required"},{"location":"tasks/fq/#optional","text":"read_two_fastq (File?); description : Input FASTQ with read two. 
Can be gzipped or uncompressed.; stream : true","title":"Optional"},{"location":"tasks/fq/#defaults","text":"disable_validator_codes (Array[String], default=[]); description : Array of codes to disable specific validators; choices : {'S001': \"Plus line starts with a '+'\", 'S002': \"All characters in sequence line are one of 'ACGTN', case-insensitive\", 'S003': \"Name line starts with an '@'\", 'S004': 'All four record lines (name, sequence, plus line, and quality) are present', 'S005': 'Sequence and quality lengths are the same', 'S006': \"All characters in quality line are between '!' and '~' (ordinal values)\", 'S007': 'All record names are unique', 'P001': 'Each paired read name is the same, excluding interleave'} modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. 
paired_read_validation_level (String, default=\"high\"); description : Only use paired read validators up to a given level; choices : ['low', 'medium', 'high'] panic (Boolean, default=true); description : Panic on first error (true) or log all errors (false)?; common : true single_read_validation_level (String, default=\"high\"); description : Only use single read validators up to a given level; choices : ['low', 'medium', 'high']","title":"Defaults"},{"location":"tasks/fq/#outputs","text":"check (String)","title":"Outputs"},{"location":"tasks/fq/#subsample","text":"description Subsamples the input FASTQ(s) outputs {'subsampled_read1': 'Gzipped FASTQ file containing subsampled read1', 'subsampled_read2': 'Gzipped FASTQ file containing subsampled read2'}","title":"subsample"},{"location":"tasks/fq/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/fq/#required_1","text":"_runtime (Any, required ) read_one_fastq (File, required ): Input FASTQ with read one. Can be gzipped or uncompressed.","title":"Required"},{"location":"tasks/fq/#optional_1","text":"read_two_fastq (File?): Input FASTQ with read two. Can be gzipped or uncompressed.","title":"Optional"},{"location":"tasks/fq/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the output FASTQ file(s). The extension _R1.subsampled.fastq.gz and _R2.subsampled.fastq.gz will be added. probability (Float, default=1.0); description : The probability a record is kept, as a decimal (0.0, 1.0). Cannot be used with record-count . Any probability<=0.0 or probability>=1.0 to disable.; common : true record_count (Int, default=-1); description : The exact number of records to keep. Cannot be used with probability . 
Any record_count<=0 to disable.; common : true","title":"Defaults"},{"location":"tasks/fq/#outputs_1","text":"subsampled_read1 (File) subsampled_read2 (File?)","title":"Outputs"},{"location":"tasks/gatk4/","text":"Homepage split_n_cigar_reads description Splits reads that contain Ns in their CIGAR strings into multiple reads. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads outputs {'split_n_reads_bam': 'BAM file with reads split at N CIGAR elements and updated CIGAR strings.', 'split_n_reads_bam_index': 'Index file for the split BAM', 'split_n_reads_bam_md5': 'MD5 checksum for the split BAM'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to with unsplit reads containing Ns in their CIGAR strings. bam_index (File, required ): BAM index file corresponding to the input BAM dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format. Must be uncompressed. fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ): Interval list indicating regions in which to split reads Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the BAM file. The extension .bam will be added. Outputs split_n_reads_bam (File) split_n_reads_bam_index (File) split_n_reads_bam_md5 (File) base_recalibrator description Generates recalibration report for base quality score recalibration. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA outputs {'recalibration_report': 'Recalibration report file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file on which to recabilbrate base quality scores bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome known_indels_sites_VCFs (Array[File], required ): List of VCF files containing known indels known_indels_sites_indices (Array[File], required ): List of VCF index files corresponding to the VCF files in known_indels_sites_VCFs Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\"): Name for the output recalibration report. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores. Outputs recalibration_report (File) apply_bqsr description Applies base quality score recalibration to a BAM file. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040097972-ApplyBQSRSpark-BETA outputs {'recalibrated_bam': 'Recalibrated BAM file', 'recalibrated_bam_index': 'Index file for the recalibrated BAM'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file on which to apply base quality score recalibration bam_index (File, required ): BAM index file corresponding to the input BAM recalibration_report (File, required ): Recalibration report file Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output recalibrated BAM. The extension .bqsr.bam will be added. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores. Outputs recalibrated_bam (File) recalibrated_bam_index (File) haplotype_caller description Calls germline SNPs and indels via local re-assembly of haplotypes. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller outputs {'vcf': 'VCF file containing called variants', 'vcf_index': 'Index file for the VCF'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file on which to call variants bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ); description : Interval list indicating regions in which to call variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output VCF. The extension .vcf.gz will be added. stand_call_conf (Int, default=20); description : Minimum confidence threshold for calling variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--standard-min-confidence-threshold-for-calling use_soft_clipped_bases (Boolean, default=false): Use soft clipped bases in variant calling. Default is to ignore soft clipped bases. Outputs vcf (File) vcf_index (File) variant_filtration description Filters variants based on specified criteria. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration outputs {'vcf_filtered': 'Filtered VCF file', 'vcf_filtered_index': 'Index file for the filtered VCF'} Inputs Required _runtime (Any, required ) dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome vcf (File, required ): Input VCF format file to filter vcf_index (File, required ): VCF index file corresponding to the input VCF Defaults cluster (Int, default=3): Number of SNPs that must be present in a window to filter filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]); description : Expressions for the filters; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-expression filter_names (Array[String], default=[\"FS\", \"QD\"]); description : Names of the filters to apply; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-name modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(vcf,\".vcf.gz\")): Prefix for the output filtered VCF. The extension .filtered.vcf.gz will be added. window (Int, default=35): Size of the window (in bases) for filtering Outputs vcf_filtered (File) vcf_filtered_index (File)","title":"Gatk4"},{"location":"tasks/gatk4/#split_n_cigar_reads","text":"description Splits reads that contain Ns in their CIGAR strings into multiple reads. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads outputs {'split_n_reads_bam': 'BAM file with reads split at N CIGAR elements and updated CIGAR strings.', 'split_n_reads_bam_index': 'Index file for the split BAM', 'split_n_reads_bam_md5': 'MD5 checksum for the split BAM'}","title":"split_n_cigar_reads"},{"location":"tasks/gatk4/#inputs","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to with unsplit reads containing Ns in their CIGAR strings. bam_index (File, required ): BAM index file corresponding to the input BAM dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format. Must be uncompressed. fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ): Interval list indicating regions in which to split reads","title":"Required"},{"location":"tasks/gatk4/#defaults","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the BAM file. The extension .bam will be added.","title":"Defaults"},{"location":"tasks/gatk4/#outputs","text":"split_n_reads_bam (File) split_n_reads_bam_index (File) split_n_reads_bam_md5 (File)","title":"Outputs"},{"location":"tasks/gatk4/#base_recalibrator","text":"description Generates recalibration report for base quality score recalibration. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA outputs {'recalibration_report': 'Recalibration report file'}","title":"base_recalibrator"},{"location":"tasks/gatk4/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file on which to recabilbrate base quality scores bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome known_indels_sites_VCFs (Array[File], required ): List of VCF files containing known indels known_indels_sites_indices (Array[File], required ): List of VCF index files corresponding to the VCF files in known_indels_sites_VCFs","title":"Required"},{"location":"tasks/gatk4/#defaults_1","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\"): Name for the output recalibration report. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores.","title":"Defaults"},{"location":"tasks/gatk4/#outputs_1","text":"recalibration_report (File)","title":"Outputs"},{"location":"tasks/gatk4/#apply_bqsr","text":"description Applies base quality score recalibration to a BAM file. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040097972-ApplyBQSRSpark-BETA outputs {'recalibrated_bam': 'Recalibrated BAM file', 'recalibrated_bam_index': 'Index file for the recalibrated BAM'}","title":"apply_bqsr"},{"location":"tasks/gatk4/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file on which to apply base quality score recalibration bam_index (File, required ): BAM index file corresponding to the input BAM recalibration_report (File, required ): Recalibration report file","title":"Required"},{"location":"tasks/gatk4/#defaults_2","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output recalibrated BAM. The extension .bqsr.bam will be added. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores.","title":"Defaults"},{"location":"tasks/gatk4/#outputs_2","text":"recalibrated_bam (File) recalibrated_bam_index (File)","title":"Outputs"},{"location":"tasks/gatk4/#haplotype_caller","text":"description Calls germline SNPs and indels via local re-assembly of haplotypes. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller outputs {'vcf': 'VCF file containing called variants', 'vcf_index': 'Index file for the VCF'}","title":"haplotype_caller"},{"location":"tasks/gatk4/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_3","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file on which to call variants bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ); description : Interval list indicating regions in which to call variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists","title":"Required"},{"location":"tasks/gatk4/#defaults_3","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output VCF. The extension .vcf.gz will be added. stand_call_conf (Int, default=20); description : Minimum confidence threshold for calling variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--standard-min-confidence-threshold-for-calling use_soft_clipped_bases (Boolean, default=false): Use soft clipped bases in variant calling. 
Default is to ignore soft clipped bases.","title":"Defaults"},{"location":"tasks/gatk4/#outputs_3","text":"vcf (File) vcf_index (File)","title":"Outputs"},{"location":"tasks/gatk4/#variant_filtration","text":"description Filters variants based on specified criteria. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration outputs {'vcf_filtered': 'Filtered VCF file', 'vcf_filtered_index': 'Index file for the filtered VCF'}","title":"variant_filtration"},{"location":"tasks/gatk4/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_4","text":"_runtime (Any, required ) dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome vcf (File, required ): Input VCF format file to filter vcf_index (File, required ): VCF index file corresponding to the input VCF","title":"Required"},{"location":"tasks/gatk4/#defaults_4","text":"cluster (Int, default=3): Number of SNPs that must be present in a window to filter filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]); description : Expressions for the filters; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-expression filter_names (Array[String], default=[\"FS\", \"QD\"]); description : Names of the filters to apply; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-name modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(vcf,\".vcf.gz\")): Prefix for the output filtered VCF. The extension .filtered.vcf.gz will be added. 
window (Int, default=35): Size of the window (in bases) for filtering","title":"Defaults"},{"location":"tasks/gatk4/#outputs_4","text":"vcf_filtered (File) vcf_filtered_index (File)","title":"Outputs"},{"location":"tasks/htseq/","text":"Homepage count description Performs read counting for a set of features in the input BAM file outputs {'feature_counts': 'A two column headerless TSV file. First column is feature names and second column is counts.'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for gtf (File, required ): Input genomic features in gzipped GTF format to count reads for strandedness (String, required ); description : Strandedness protocol of the RNA-Seq experiment; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#cmdoption-htseq-count-s; choices : ['yes', 'reverse', 'no'] Defaults feature_type (String, default=\"exon\"); description : Feature type (3rd column in GTF file) to be used, all features of other type are ignored; common : true idattr (String, default=\"gene_name\"); description : GFF attribute to be used as feature ID; common : true include_custom_header (Boolean, default=false); description : Include a custom header for the output file? This is not an official feature of HTSeq. If true, the first line of the output file will be feature ~{prefix} . This may break downstream tools that expect the typical headerless HTSeq output format.; common : true minaqual (Int, default=10); description : Skip all reads with alignment quality lower than the given minimum value; common : true mode (String, default=\"union\"); description : Mode to handle reads overlapping more than one feature. 
union is recommended for most use-cases.; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#htseq-count-counting-reads-within-features; choices : ['union', 'intersection-strict', 'intersection-nonempty'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. nonunique (Boolean, default=false); description : Score reads that align to or are assigned to more than one feature?; common : true pos_sorted (Boolean, default=true); description : Is the BAM position sorted (true) or name sorted (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the feature counts file. The extension .feature-counts.txt will be added. secondary_alignments (Boolean, default=false); description : Score secondary alignments (SAM flag 0x100)?; common : true supplementary_alignments (Boolean, default=false); description : Score supplementary/chimeric alignments (SAM flag 0x800)?; common : true Outputs feature_counts (File) calc_tpm description Given a gene counts file and a gene lengths file, calculate Transcripts Per Million (TPM) outputs {'tpm_file': 'Transcripts Per Million (TPM) file. A two column headered TSV file.'} Inputs Required _runtime (Any, required ) counts (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with the count task. gene_lengths (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with the calc_gene_lengths task in util.wdl . 
Defaults prefix (String, default=basename(counts,\".feature-counts.txt\")): Prefix for the TPM file. The extension .TPM.txt will be added. Outputs tpm_file (File)","title":"Htseq"},{"location":"tasks/htseq/#count","text":"description Performs read counting for a set of features in the input BAM file outputs {'feature_counts': 'A two column headerless TSV file. First column is feature names and second column is counts.'}","title":"count"},{"location":"tasks/htseq/#inputs","text":"","title":"Inputs"},{"location":"tasks/htseq/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for gtf (File, required ): Input genomic features in gzipped GTF format to count reads for strandedness (String, required ); description : Strandedness protocol of the RNA-Seq experiment; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#cmdoption-htseq-count-s; choices : ['yes', 'reverse', 'no']","title":"Required"},{"location":"tasks/htseq/#defaults","text":"feature_type (String, default=\"exon\"); description : Feature type (3rd column in GTF file) to be used, all features of other type are ignored; common : true idattr (String, default=\"gene_name\"); description : GFF attribute to be used as feature ID; common : true include_custom_header (Boolean, default=false); description : Include a custom header for the output file? This is not an official feature of HTSeq. If true, the first line of the output file will be feature ~{prefix} . This may break downstream tools that expect the typical headerless HTSeq output format.; common : true minaqual (Int, default=10); description : Skip all reads with alignment quality lower than the given minimum value; common : true mode (String, default=\"union\"); description : Mode to handle reads overlapping more than one feature. 
union is recommended for most use-cases.; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#htseq-count-counting-reads-within-features; choices : ['union', 'intersection-strict', 'intersection-nonempty'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. nonunique (Boolean, default=false); description : Score reads that align to or are assigned to more than one feature?; common : true pos_sorted (Boolean, default=true); description : Is the BAM position sorted (true) or name sorted (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the feature counts file. The extension .feature-counts.txt will be added. secondary_alignments (Boolean, default=false); description : Score secondary alignments (SAM flag 0x100)?; common : true supplementary_alignments (Boolean, default=false); description : Score supplementary/chimeric alignments (SAM flag 0x800)?; common : true","title":"Defaults"},{"location":"tasks/htseq/#outputs","text":"feature_counts (File)","title":"Outputs"},{"location":"tasks/htseq/#calc_tpm","text":"description Given a gene counts file and a gene lengths file, calculate Transcripts Per Million (TPM) outputs {'tpm_file': 'Transcripts Per Million (TPM) file. A two column headered TSV file.'}","title":"calc_tpm"},{"location":"tasks/htseq/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/htseq/#required_1","text":"_runtime (Any, required ) counts (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with the count task. 
gene_lengths (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with the calc_gene_lengths task in util.wdl .","title":"Required"},{"location":"tasks/htseq/#defaults_1","text":"prefix (String, default=basename(counts,\".feature-counts.txt\")): Prefix for the TPM file. The extension .TPM.txt will be added.","title":"Defaults"},{"location":"tasks/htseq/#outputs_1","text":"tpm_file (File)","title":"Outputs"},{"location":"tasks/kraken2/","text":"Homepage download_taxonomy description Downloads the NCBI taxonomy which Kraken2 uses to create a tree and taxon map during the database build outputs {'taxonomy': 'The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'} Inputs Required _runtime (Any, required ) Defaults protein (Boolean, default=false): Construct a protein database? Outputs taxonomy (File) download_library description Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here outputs {'library': 'A library of reference genomes, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'} Inputs Required _runtime (Any, required ) library_name (String, required ); description : Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core'] Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name . 
Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database. More disk space may be required depending on when in the future this task is run. protein (Boolean, default=false): Construct a protein database? Outputs library (File) create_library_from_fastas description Adds custom entries from FASTA files to a Kraken2 DB outputs {'custom_library': 'Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'} Inputs Required _runtime (Any, required ) fastas_gz (Array[File], required ): Array of gzipped FASTA files. Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. protein (Boolean, default=false): Construct a protein database? Outputs custom_library (File) build_db description Builds a custom Kraken2 database outputs {'built_db': 'A complete Kraken2 database'} Inputs Required _runtime (Any, required ) tarballs (Array[File], required ): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory. Defaults db_name (String, default=\"kraken2_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true kmer_len (Int, default=if protein then 15 else 35): K-mer length in bp that will be used to build the database max_db_size_gb (Int, default=-1): Maximum number of GBs for Kraken 2 hash table; if the Kraken 2 estimator determines more would normally be needed, the reference library will be downsampled to fit. 
minimizer_len (Int, default=if protein then 12 else 31): Minimizer length in bp that will be used to build the database minimizer_spaces (Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true protein (Boolean, default=false): Construct a protein database? use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs built_db (File) kraken description Runs Kraken2 on a pair of fastq files outputs {'report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'sequences': {'description': 'Detailed Kraken2 output that has been gzipped', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}} Inputs Required _runtime (Any, required ) db (File, required ): Kraken2 database. Can be generated with make-qc-reference.wdl . Must be a tarball without a root directory. read_one_fastq_gz (File, required ): Gzipped FASTQ file with 1st reads in pair read_two_fastq_gz (File, required ): Gzipped FASTQ file with 2nd reads in pair Defaults min_base_quality (Int, default=0): Minimum base quality used in classification modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. 
Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added. store_sequences (Boolean, default=false); description : Store and output main Kraken2 output in addition to the summary report?; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_names (Boolean, default=true): Print scientific names instead of just taxids? Outputs report (File) sequences (File?)","title":"Kraken2"},{"location":"tasks/kraken2/#download_taxonomy","text":"description Downloads the NCBI taxonomy which Kraken2 uses to create a tree and taxon map during the database build outputs {'taxonomy': 'The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}","title":"download_taxonomy"},{"location":"tasks/kraken2/#inputs","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required","text":"_runtime (Any, required )","title":"Required"},{"location":"tasks/kraken2/#defaults","text":"protein (Boolean, default=false): Construct a protein database?","title":"Defaults"},{"location":"tasks/kraken2/#outputs","text":"taxonomy (File)","title":"Outputs"},{"location":"tasks/kraken2/#download_library","text":"description Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here outputs {'library': 'A library of reference genomes, which is needed by the build_db task. 
This output is not human-readable or meant for anything other than building a Kraken2 database.'}","title":"download_library"},{"location":"tasks/kraken2/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_1","text":"_runtime (Any, required ) library_name (String, required ); description : Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core']","title":"Required"},{"location":"tasks/kraken2/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name . Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database. More disk space may be required depending on when in the future this task is run. protein (Boolean, default=false): Construct a protein database?","title":"Defaults"},{"location":"tasks/kraken2/#outputs_1","text":"library (File)","title":"Outputs"},{"location":"tasks/kraken2/#create_library_from_fastas","text":"description Adds custom entries from FASTA files to a Kraken2 DB outputs {'custom_library': 'Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}","title":"create_library_from_fastas"},{"location":"tasks/kraken2/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_2","text":"_runtime (Any, required ) fastas_gz (Array[File], required ): Array of gzipped FASTA files. 
Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid","title":"Required"},{"location":"tasks/kraken2/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. protein (Boolean, default=false): Construct a protein database?","title":"Defaults"},{"location":"tasks/kraken2/#outputs_2","text":"custom_library (File)","title":"Outputs"},{"location":"tasks/kraken2/#build_db","text":"description Builds a custom Kraken2 database outputs {'built_db': 'A complete Kraken2 database'}","title":"build_db"},{"location":"tasks/kraken2/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_3","text":"_runtime (Any, required ) tarballs (Array[File], required ): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory.","title":"Required"},{"location":"tasks/kraken2/#defaults_3","text":"db_name (String, default=\"kraken2_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true kmer_len (Int, default=if protein then 15 else 35): K-mer length in bp that will be used to build the database max_db_size_gb (Int, default=-1): Maximum number of GBs for Kraken 2 hash table; if the Kraken 2 estimator determines more would normally be needed, the reference library will be downsampled to fit. minimizer_len (Int, default=if protein then 12 else 31): Minimizer length in bp that will be used to build the database minimizer_spaces (Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true protein (Boolean, default=false): Construct a protein database? use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/kraken2/#outputs_3","text":"built_db (File)","title":"Outputs"},{"location":"tasks/kraken2/#kraken","text":"description Runs Kraken2 on a pair of fastq files outputs {'report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'sequences': {'description': 'Detailed Kraken2 output that has been gzipped', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}}","title":"kraken"},{"location":"tasks/kraken2/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_4","text":"_runtime (Any, required ) db (File, required ): Kraken2 database. Can be generated with make-qc-reference.wdl . Must be a tarball without a root directory. read_one_fastq_gz (File, required ): Gzipped FASTQ file with 1st reads in pair read_two_fastq_gz (File, required ): Gzipped FASTQ file with 2nd reads in pair","title":"Required"},{"location":"tasks/kraken2/#defaults_4","text":"min_base_quality (Int, default=0): Minimum base quality used in classification modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. 
Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added. store_sequences (Boolean, default=false); description : Store and output main Kraken2 output in addition to the summary report?; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_names (Boolean, default=true): Print scientific names instead of just taxids?","title":"Defaults"},{"location":"tasks/kraken2/#outputs_4","text":"report (File) sequences (File?)","title":"Outputs"},{"location":"tasks/librarian/","text":"librarian librarian description Runs the librarian tool to derive the likely Illumina library preparation protocol used to generate a pair of FASTQ files. help WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Oriented). This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. This version of librarian has been trained on \"read one\" data of Paired-End sequencing data. It is not intended for use with Single-End data, even though it only accepts a single FASTQ. output {'report': 'A tar archive containing the librarian report and raw data.', 'raw_data': 'The raw data that can be processed by MultiQC.'} Inputs Required _runtime (Any, required ) read_one_fastq (File, required ): Read one FASTQ of a Paired-End sample to analyze. May be uncompressed or gzipped. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Name of the output tar archive. The extension .tar.gz will be added. Outputs report (File) raw_data (File)","title":"librarian"},{"location":"tasks/librarian/#librarian","text":"","title":"librarian"},{"location":"tasks/librarian/#librarian_1","text":"description Runs the librarian tool to derive the likely Illumina library preparation protocol used to generate a pair of FASTQ files. help WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Oriented). This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. This version of librarian has been trained on \"read one\" data of Paired-End sequencing data. It is not intended for use with Single-End data, even though it only accepts a single FASTQ. output {'report': 'A tar archive containing the librarian report and raw data.', 'raw_data': 'The raw data that can be processed by MultiQC.'}","title":"librarian"},{"location":"tasks/librarian/#inputs","text":"","title":"Inputs"},{"location":"tasks/librarian/#required","text":"_runtime (Any, required ) read_one_fastq (File, required ): Read one FASTQ of a Paired-End sample to analyze. May be uncompressed or gzipped.","title":"Required"},{"location":"tasks/librarian/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Name of the output tar archive. 
The extension .tar.gz will be added.","title":"Defaults"},{"location":"tasks/librarian/#outputs","text":"report (File) raw_data (File)","title":"Outputs"},{"location":"tasks/md5sum/","text":"Homepage compute_checksum description Generates an MD5 checksum for the input file outputs {'md5sum': 'STDOUT of the md5sum command that has been redirected to a file'} Inputs Required _runtime (Any, required ) file (File, required ): Input file to generate MD5 checksum for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs md5sum (File)","title":"Md5sum"},{"location":"tasks/md5sum/#compute_checksum","text":"description Generates an MD5 checksum for the input file outputs {'md5sum': 'STDOUT of the md5sum command that has been redirected to a file'}","title":"compute_checksum"},{"location":"tasks/md5sum/#inputs","text":"","title":"Inputs"},{"location":"tasks/md5sum/#required","text":"_runtime (Any, required ) file (File, required ): Input file to generate MD5 checksum for","title":"Required"},{"location":"tasks/md5sum/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/md5sum/#outputs","text":"md5sum (File)","title":"Outputs"},{"location":"tasks/mosdepth/","text":"Homepage coverage description Runs the Mosdepth tool for calculating coverage outputs {'summary': 'A summary of mean depths per chromosome and within specified regions per chromosome', 'global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. 
It does this for each chromosome, and for the whole genome.', 'region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by coverage_bed that were covered for at least a given coverage value'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to calculate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM Optional coverage_bed (File?): BED file to pass to the -b flag of mosdepth . This will restrict coverage analysis to regions defined by the BED file. Defaults min_mapping_quality (Int, default=20); description : Minimum mapping quality to pass to the -Q flag of mosdepth ; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,'.bam')): Prefix for the mosdepth report files. The extensions .mosdepth.summary.txt , .mosdepth.global.dist.txt and .mosdepth.region.dist.txt will be added. use_fast_mode (Boolean, default=true): Use Mosdepth's 'fast mode'? This enables the -x flag. Outputs summary (File) global_dist (File) region_dist (File?)","title":"Mosdepth"},{"location":"tasks/mosdepth/#coverage","text":"description Runs the Mosdepth tool for calculating coverage outputs {'summary': 'A summary of mean depths per chromosome and within specified regions per chromosome', 'global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. 
It does this for each chromosome, and for the whole genome.', 'region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by coverage_bed that were covered for at least a given coverage value'}","title":"coverage"},{"location":"tasks/mosdepth/#inputs","text":"","title":"Inputs"},{"location":"tasks/mosdepth/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to calculate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM","title":"Required"},{"location":"tasks/mosdepth/#optional","text":"coverage_bed (File?): BED file to pass to the -b flag of mosdepth . This will restrict coverage analysis to regions defined by the BED file.","title":"Optional"},{"location":"tasks/mosdepth/#defaults","text":"min_mapping_quality (Int, default=20); description : Minimum mapping quality to pass to the -Q flag of mosdepth ; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,'.bam')): Prefix for the mosdepth report files. The extensions .mosdepth.summary.txt , .mosdepth.global.dist.txt and .mosdepth.region.dist.txt will be added. use_fast_mode (Boolean, default=true): Use Mosdepth's 'fast mode'? This enables the -x flag.","title":"Defaults"},{"location":"tasks/mosdepth/#outputs","text":"summary (File) global_dist (File) region_dist (File?)","title":"Outputs"},{"location":"tasks/multiqc/","text":"Homepage multiqc description Generates a MultiQC quality control metrics report summary from input QC result files outputs {'multiqc_report': 'A gzipped tar archive of all MultiQC output files'} Inputs Required _runtime (Any, required ) input_files (Array[File], required ): An array of files for MultiQC to compile into a report. 
Invalid files will be gracefully ignored by MultiQC. prefix (String, required ): A string for the MultiQC output directory: / and .tar.gz Optional config (File?): YAML file for configuring generated report Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs multiqc_report (File)","title":"Multiqc"},{"location":"tasks/multiqc/#multiqc","text":"description Generates a MultiQC quality control metrics report summary from input QC result files outputs {'multiqc_report': 'A gzipped tar archive of all MultiQC output files'}","title":"multiqc"},{"location":"tasks/multiqc/#inputs","text":"","title":"Inputs"},{"location":"tasks/multiqc/#required","text":"_runtime (Any, required ) input_files (Array[File], required ): An array of files for MultiQC to compile into a report. Invalid files will be gracefully ignored by MultiQC. prefix (String, required ): A string for the MultiQC output directory: / and .tar.gz","title":"Required"},{"location":"tasks/multiqc/#optional","text":"config (File?): YAML file for configuring generated report","title":"Optional"},{"location":"tasks/multiqc/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/multiqc/#outputs","text":"multiqc_report (File)","title":"Outputs"},{"location":"tasks/ngsderive/","text":"Homepage strandedness description Derives the experimental strandedness protocol used to generate the input RNA-Seq BAM file. Reports evidence supporting final results. 
outputs {'strandedness_file': 'TSV file containing the ngsderive strandedness report', 'strandedness': 'The derived strandedness, in string format'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive strandedness for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file Defaults min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads_per_gene (Int, default=10); description : Filter any genes that don't have at least min_reads_per_gene reads mapping to them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_genes (Int, default=1000); description : How many genes to sample; common : true outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\"): Name for the strandedness TSV file split_by_rg (Boolean, default=false); description : Contain one entry in the output TSV per read group, in addition to an overall entry; common : true Outputs strandedness_string (String) strandedness_file (File) instrument description Derives the instrument used to sequence the input BAM file. Reports evidence supporting final results. outputs {'instrument_file': 'TSV file containing the ngsderive isntrument report for the input BAM file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive instrument for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=10000); description : How many reads to analyze from the start of the file. 
Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".instrument.tsv\"): Name for the instrument TSV file Outputs instrument_file (File) instrument_string (String) read_length description Derives the original experimental read length of the input BAM. Reports evidence supporting final results. outputs {'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive read length for bam_index (File, required ): BAM index file corresponding to the input BAM Defaults majority_vote_cutoff (Float, default=0.7); description : To call a majority readlen, the maximum read length must have at least majority-vote-cutoff % reads in support; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".readlength.tsv\"): Name for the readlen TSV file Outputs read_length_file (File) encoding description Derives the encoding of the input NGS file(s). Reports evidence supporting final results. outputs {'encoding_file': 'TSV file containing the ngsderive encoding report for all input files', 'inferred_encoding': 'The most permissive encoding found among the input files, in string format'} Inputs Required _runtime (Any, required ) ngs_files (Array[File], required ): An array of FASTQs and/or BAMs for which to derive encoding outfile_name (String, required ): Name for the encoding TSV file Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
num_reads (Int, default=1000000); description : How many reads to analyze from the start of the file(s). Any n < 1 to parse whole file(s).; common : true Outputs inferred_encoding (String) encoding_file (File) junction_annotation description Annotates junctions found in an RNA-Seq BAM as known, novel, or partially novel external_help https://stjudecloud.github.io/ngsderive/subcommands/junction_annotation/ outputs {'junction_summary': 'TSV file containing the ngsderive junction-annotation summary', 'junctions': 'TSV file containing a detailed list of annotated junctions'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to annotate junctions for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file Defaults fuzzy_junction_match_range (Int, default=0); description : Consider found splices within +-k bases of a known splice event annotated; common : true min_intron (Int, default=50); description : Minimum size of intron to be considered a splice; common : true min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads (Int, default=2); description : Filter any junctions that don't have at least min_reads reads supporting them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the summary TSV and junction files. The extensions .junction_summary.tsv and .junctions.tsv will be added. Outputs junction_summary (File) junctions (File) endedness description Derives the endedness of the input BAM file. Reports evidence for final result. 
outputs {'endedness_file': 'TSV file containing the ngsderive endedness report'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive endedness from Defaults calc_rpt (Boolean, default=false); description : Calculate and output Reads-Per-Template. This will produce a more sophisticated estimate for endedness, but uses substantially more memory (can reach up to 200% of BAM size in memory consumption for some inputs).; common : true lenient (Boolean, default=false); description : Return a zero exit code on unknown results; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by value of calc_rpt and the size of the input. Specified in GB. num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".endedness.tsv\"): Name for the endedness TSV file paired_deviance (Float, default=0.0); description : Distance from 0.5 split between number of f+l- reads and f-l+ reads allowed to be called 'Paired-End'. Default of 0.0 only appropriate if the whole file is being processed.; common : true round_rpt (Boolean, default=false); description : Round RPT to the nearest INT before comparing to expected values. Appropriate if using --num-reads > 0.; common : true split_by_rg (Boolean, default=false); description : Contain one entry per read group; common : true Outputs endedness_file (File)","title":"Ngsderive"},{"location":"tasks/ngsderive/#strandedness","text":"description Derives the experimental strandedness protocol used to generate the input RNA-Seq BAM file. Reports evidence supporting final results. 
outputs {'strandedness_file': 'TSV file containing the ngsderive strandedness report', 'strandedness': 'The derived strandedness, in string format'}","title":"strandedness"},{"location":"tasks/ngsderive/#inputs","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive strandedness for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file","title":"Required"},{"location":"tasks/ngsderive/#defaults","text":"min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads_per_gene (Int, default=10); description : Filter any genes that don't have at least min_reads_per_gene reads mapping to them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_genes (Int, default=1000); description : How many genes to sample; common : true outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\"): Name for the strandedness TSV file split_by_rg (Boolean, default=false); description : Contain one entry in the output TSV per read group, in addition to an overall entry; common : true","title":"Defaults"},{"location":"tasks/ngsderive/#outputs","text":"strandedness_string (String) strandedness_file (File)","title":"Outputs"},{"location":"tasks/ngsderive/#instrument","text":"description Derives the instrument used to sequence the input BAM file. Reports evidence supporting final results. 
outputs {'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file'}","title":"instrument"},{"location":"tasks/ngsderive/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive instrument for","title":"Required"},{"location":"tasks/ngsderive/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=10000); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".instrument.tsv\"): Name for the instrument TSV file","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_1","text":"instrument_file (File) instrument_string (String)","title":"Outputs"},{"location":"tasks/ngsderive/#read_length","text":"description Derives the original experimental read length of the input BAM. Reports evidence supporting final results. outputs {'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file'}","title":"read_length"},{"location":"tasks/ngsderive/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive read length for bam_index (File, required ): BAM index file corresponding to the input BAM","title":"Required"},{"location":"tasks/ngsderive/#defaults_2","text":"majority_vote_cutoff (Float, default=0.7); description : To call a majority readlen, the maximum read length must have at least majority-vote-cutoff % reads in support; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".readlength.tsv\"): Name for the readlen TSV file","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_2","text":"read_length_file (File)","title":"Outputs"},{"location":"tasks/ngsderive/#encoding","text":"description Derives the encoding of the input NGS file(s). Reports evidence supporting final results. outputs {'encoding_file': 'TSV file containing the ngsderive encoding report for all input files', 'inferred_encoding': 'The most permissive encoding found among the input files, in string format'}","title":"encoding"},{"location":"tasks/ngsderive/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_3","text":"_runtime (Any, required ) ngs_files (Array[File], required ): An array of FASTQs and/or BAMs for which to derive encoding outfile_name (String, required ): Name for the encoding TSV file","title":"Required"},{"location":"tasks/ngsderive/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=1000000); description : How many reads to analyze from the start of the file(s). 
Any n < 1 to parse whole file(s).; common : true","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_3","text":"inferred_encoding (String) encoding_file (File)","title":"Outputs"},{"location":"tasks/ngsderive/#junction_annotation","text":"description Annotates junctions found in an RNA-Seq BAM as known, novel, or partially novel external_help https://stjudecloud.github.io/ngsderive/subcommands/junction_annotation/ outputs {'junction_summary': 'TSV file containing the ngsderive junction-annotation summary', 'junctions': 'TSV file containing a detailed list of annotated junctions'}","title":"junction_annotation"},{"location":"tasks/ngsderive/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to annotate junctions for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file","title":"Required"},{"location":"tasks/ngsderive/#defaults_4","text":"fuzzy_junction_match_range (Int, default=0); description : Consider found splices within +-k bases of a known splice event annotated; common : true min_intron (Int, default=50); description : Minimum size of intron to be considered a splice; common : true min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads (Int, default=2); description : Filter any junctions that don't have at least min_reads reads supporting them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the summary TSV and junction files. 
The extensions .junction_summary.tsv and .junctions.tsv will be added.","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_4","text":"junction_summary (File) junctions (File)","title":"Outputs"},{"location":"tasks/ngsderive/#endedness","text":"description Derives the endedness of the input BAM file. Reports evidence for final result. outputs {'endedness_file': 'TSV file containing the ngsderive endedness report'}","title":"endedness"},{"location":"tasks/ngsderive/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_5","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive endedness from","title":"Required"},{"location":"tasks/ngsderive/#defaults_5","text":"calc_rpt (Boolean, default=false); description : Calculate and output Reads-Per-Template. This will produce a more sophisticated estimate for endedness, but uses substantially more memory (can reach up to 200% of BAM size in memory consumption for some inputs).; common : true lenient (Boolean, default=false); description : Return a zero exit code on unknown results; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by value of calc_rpt and the size of the input. Specified in GB. num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".endedness.tsv\"): Name for the endedness TSV file paired_deviance (Float, default=0.0); description : Distance from 0.5 split between number of f+l- reads and f-l+ reads allowed to be called 'Paired-End'. 
Default of 0.0 only appropriate if the whole file is being processed.; common : true round_rpt (Boolean, default=false); description : Round RPT to the nearest INT before comparing to expected values. Appropriate if using --num-reads > 0.; common : true split_by_rg (Boolean, default=false); description : Contain one entry per read group; common : true","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_5","text":"endedness_file (File)","title":"Outputs"},{"location":"tasks/picard/","text":"Homepage TODO looks like this file was missed when converting from a memory_gb parameter to a \"softcoded\" runtime block. When moving those, check tests/tools/test_picard.yaml . mark_duplicates description Marks duplicate reads in the input BAM file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard- help For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information. outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'duplicate_marked_bam_index': 'The .bai BAM index file associated with duplicate_marked_bam ', 'duplicate_marked_bam_md5': 'The md5sum of duplicate_marked_bam ', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates Defaults clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? For increased performance, if the input BAM does not have the DT tag, set to false . create_bam (Boolean, default=true); description : Enable BAM creation (true)? 
Or only output MarkDuplicates metrics (false)?; common : true duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\"); description : Strategy for scoring duplicates.; choices : ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB. optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex . prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the MarkDuplicates result files. The extensions .bam , .bam.bai , .bam.md5 , and .metrics.txt will be added. read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names. remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true , the output BAM will not contain any duplicate reads. remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true , the output BAM will not contain any sequencing duplicates (optical duplicates). 
tagging_policy (String, default=\"All\"); description : Tagging policy for the output BAM.; choices : ['DontTag', 'OpticalOnly', 'All'] validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs duplicate_marked_bam (File?) duplicate_marked_bam_index (File?) duplicate_marked_bam_md5 (File?) mark_duplicates_metrics (File) validate_bam description Validates the input BAM file for correct formatting using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard- outputs {'validate_report': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'validated_bam': 'The unmodified input BAM after it has been successfully validated'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to validate Optional reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation. Defaults ignore_list (Array[String], default=[]); description : List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help ).; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common : true index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE ? max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile . The Picard default is 100, a lower number can enable fast fail behavior memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\"): Name for the ValidateSamFile report file succeed_on_errors (Boolean, default=false); description : Succeed the task even if errors and/or warnings are detected; common : true succeed_on_warnings (Boolean, default=true); description : Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors ; common : true summary_mode (Boolean, default=false); description : Enable SUMMARY mode?; common : true validation_stringency (String, default=\"LENIENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs validate_report (File) sort description Sorts the input BAM file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard- outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order ', 'sorted_bam_index': 'The .bai BAM index file associated with sorted_bam ', 'sorted_bam_md5': 'The md5sum of sorted_bam '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to sort Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. 
sort_order (String, default=\"coordinate\"); description : Order by which to sort the input BAM; choices : ['queryname', 'coordinate', 'duplicate']; common : true validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs sorted_bam (File) sorted_bam_index (File) sorted_bam_md5 (File) merge_sam_files description Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order . external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard- outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs', 'merged_bam_index': 'The .bai BAM index file associated with merged_bam ', 'merged_bam_md5': 'The md5sum of merged_bam '} Inputs Required _runtime (Any, required ) bams (Array[File], required ): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order . prefix (String, required ): Prefix for the merged BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. Defaults memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. sort_order (String, default=\"coordinate\"); description : Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices : ['unsorted', 'queryname', 'coordinate', 'duplicate', 'unknown']; common : true threading (Boolean, default=true): Option to create a background thread to encode, compress and write to disk the output file. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Sets runtime.cpu = 2 if true . runtime.cpu = 1 if false . 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs merged_bam (File) merged_bam_index (File) merged_bam_md5 (File) clean_sam description Cleans the input BAM file. Cleans soft-clipping beyond end-of-reference, sets MAPQ=0 for unmapped reads. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard- outputs {'cleaned_bam': 'A cleaned version of the input BAM', 'cleaned_bam_index': 'The .bai BAM index file associated with cleaned_bam ', 'cleaned_bam_md5': 'The md5sum of cleaned_bam '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to clean Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".cleaned\"): Prefix for the cleaned BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs cleaned_bam (File) cleaned_bam_index (File) cleaned_bam_md5 (File) collect_wgs_metrics description Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard- outputs {'wgs_metrics': {'description': 'Output report of picard CollectWgsMetrics ', 'external_help': 'https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics'}} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate WGS metrics reference_fasta (File, required ): Gzipped reference genome in FASTA format Defaults memory_gb (Int, default=12): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
outfile_name (String, default=basename(bam,\".bam\") + \".CollectWgsMetrics.txt\"): Name for the metrics result file validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs wgs_metrics (File) collect_alignment_summary_metrics description Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard- outputs {'alignment_metrics': {'description': 'The text file output of CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of CollectAlignmentSummaryMetrics '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate alignment metrics Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectAlignmentSummaryMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs alignment_metrics (File) alignment_metrics_pdf (File) collect_gc_bias_metrics description Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard- outputs {'gc_bias_metrics': {'description': 'The full text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics'}, 'gc_bias_metrics_summary': {'description': 'The summary text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics'}, 'gc_bias_metrics_pdf': 'The PDF file output of CollectGcBiasMetrics '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate GC bias metrics reference_fasta (File, required ): Reference sequences in FASTA format Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectGcBiasMetrics\"): Prefix for the output report files. The extensions .txt , .summary.txt , and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs gc_bias_metrics (File) gc_bias_metrics_summary (File) gc_bias_metrics_pdf (File) collect_insert_size_metrics description Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard- outputs {'insert_size_metrics': {'description': 'The text file output of CollectInsertSizeMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of CollectInsertSizeMetrics '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate insert size metrics Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectInsertSizeMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs insert_size_metrics (File) insert_size_metrics_pdf (File) quality_score_distribution description Runs picard QualityScoreDistribution to calculate the range of quality scores and creates an accompanying chart external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard- outputs {'quality_score_distribution_txt': 'The text file output of QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of QualityScoreDistribution '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate quality score distribution Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".QualityScoreDistribution\"): Prefix for the output report files. The extensions .txt and .pdf will be added. validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs quality_score_distribution_txt (File) quality_score_distribution_pdf (File) bam_to_fastq description [Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq which is more performant and doesn't error on 'illegal mate states'. 
deprecated true Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ Defaults memory_gb (Int, default=56): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. paired (Boolean, default=true); description : Is the data Paired-End (true) or Single-End (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the file. The extension <extension> will be added. Outputs read_one_fastq_gz (File) read_two_fastq_gz (File?) merge_vcfs description Merges the input VCF files into a single VCF file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard outputs {'output_vcf': 'The merged VCF file', 'output_vcf_index': 'The index file associated with the merged VCF file'} Inputs Required _runtime (Any, required ) output_vcf_name (String, required ): Name for the merged VCF file vcfs (Array[File], required ): Input VCF format files to merge. May be gzipped or binary compressed. vcfs_indexes (Array[File], required ): Index files associated with the input VCF files Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
Outputs output_vcf (File) output_vcf_index (File) scatter_interval_list description Splits an interval list into smaller interval lists for parallel processing external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard outputs {'out': 'The split interval lists', 'interval_count': 'The number of split interval lists'} Inputs Required _runtime (Any, required ) interval_list (File, required ): Input interval list to split scatter_count (Int, required ): Number of interval lists to create Defaults sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate. subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\"); description : How to subdivide the intervals; choices : ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION'] unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. Merges overlapping or adjacent intervals. Outputs interval_lists_scatter (Array[File]) interval_count (Int) create_sequence_dictionary description Creates a sequence dictionary for the input FASTA file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard- outputs {'dictionary': 'Sequence dictionary produced by picard CreateSequenceDictionary .'} Inputs Required _runtime (Any, required ) fasta (File, required ): Input FASTA format file from which to create dictionary Optional assembly_name (String?): Value to put in AS field of sequence dictionary fasta_url (String?): Value to put in UR field of sequence dictionary species (String?): Value to put in SP field of sequence dictionary Defaults memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(fasta,\".fa\") + \".dict\"): Name for the CreateSequenceDictionary dictionary file Outputs dictionary (File)","title":"Picard"},{"location":"tasks/picard/#mark_duplicates","text":"description Marks duplicate reads in the input BAM file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard- help For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information. outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'duplicate_marked_bam_index': 'The .bai BAM index file associated with duplicate_marked_bam ', 'duplicate_marked_bam_md5': 'The md5sum of duplicate_marked_bam ', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}}","title":"mark_duplicates"},{"location":"tasks/picard/#inputs","text":"","title":"Inputs"},{"location":"tasks/picard/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates","title":"Required"},{"location":"tasks/picard/#defaults","text":"clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? For increased performance, if the input BAM does not have the DT tag, set to false . create_bam (Boolean, default=true); description : Enable BAM creation (true)? Or only output MarkDuplicates metrics (false)?; common : true duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\"); description : Strategy for scoring duplicates.; choices : ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB. optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex . prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the MarkDuplicates result files. The extensions .bam , .bam.bai , .bam.md5 , and .metrics.txt will be added. read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names. remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true , the output BAM will not contain any duplicate reads. remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true , the output BAM will not contain any sequencing duplicates (optical duplicates). 
tagging_policy (String, default=\"All\"); description : Tagging policy for the output BAM.; choices : ['DontTag', 'OpticalOnly', 'All'] validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs","text":"duplicate_marked_bam (File?) duplicate_marked_bam_index (File?) duplicate_marked_bam_md5 (File?) mark_duplicates_metrics (File)","title":"Outputs"},{"location":"tasks/picard/#validate_bam","text":"description Validates the input BAM file for correct formatting using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard- outputs {'validate_report': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'validated_bam': 'The unmodified input BAM after it has been successfully validated'}","title":"validate_bam"},{"location":"tasks/picard/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/picard/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to validate","title":"Required"},{"location":"tasks/picard/#optional","text":"reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation.","title":"Optional"},{"location":"tasks/picard/#defaults_1","text":"ignore_list (Array[String], default=[]); description : List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help ).; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common : true index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE ? max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile . 
The Picard default is 100; a lower number can enable fast-fail behavior. memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\"): Name for the ValidateSamFile report file succeed_on_errors (Boolean, default=false); description : Succeed the task even if errors and/or warnings are detected; common : true succeed_on_warnings (Boolean, default=true); description : Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors ; common : true summary_mode (Boolean, default=false); description : Enable SUMMARY mode?; common : true validation_stringency (String, default=\"LENIENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_1","text":"validate_report (File)","title":"Outputs"},{"location":"tasks/picard/#sort","text":"description Sorts the input BAM file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard- outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order ', 'sorted_bam_index': 'The .bai BAM index file associated with sorted_bam ', 'sorted_bam_md5': 'The md5sum of sorted_bam '}","title":"sort"},{"location":"tasks/picard/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/picard/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to sort","title":"Required"},{"location":"tasks/picard/#defaults_2","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. sort_order (String, default=\"coordinate\"); description : Order by which to sort the input BAM; choices : ['queryname', 'coordinate', 'duplicate']; common : true validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_2","text":"sorted_bam (File) sorted_bam_index (File) sorted_bam_md5 (File)","title":"Outputs"},{"location":"tasks/picard/#merge_sam_files","text":"description Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order . external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard- outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs', 'merged_bam_index': 'The .bai BAM index file associated with merged_bam ', 'merged_bam_md5': 'The md5sum of merged_bam '}","title":"merge_sam_files"},{"location":"tasks/picard/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/picard/#required_3","text":"_runtime (Any, required ) bams (Array[File], required ): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order . prefix (String, required ): Prefix for the merged BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added.","title":"Required"},{"location":"tasks/picard/#defaults_3","text":"memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
sort_order (String, default=\"coordinate\"); description : Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices : ['unsorted', 'queryname', 'coordinate', 'duplicate', 'unknown']; common : true threading (Boolean, default=true): Option to create a background thread to encode, compress and write to disk the output file. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Sets runtime.cpu = 2 if true . runtime.cpu = 1 if false . validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_3","text":"merged_bam (File) merged_bam_index (File) merged_bam_md5 (File)","title":"Outputs"},{"location":"tasks/picard/#clean_sam","text":"description Cleans the input BAM file. Cleans soft-clipping beyond end-of-reference, sets MAPQ=0 for unmapped reads. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard- outputs {'cleaned_bam': 'A cleaned version of the input BAM', 'cleaned_bam_index': 'The .bai BAM index file associated with cleaned_bam ', 'cleaned_bam_md5': 'The md5sum of cleaned_bam '}","title":"clean_sam"},{"location":"tasks/picard/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/picard/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to clean","title":"Required"},{"location":"tasks/picard/#defaults_4","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".cleaned\"): Prefix for the cleaned BAM file and accessory files. 
The extensions .bam , .bam.bai , and .bam.md5 will be added. validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_4","text":"cleaned_bam (File) cleaned_bam_index (File) cleaned_bam_md5 (File)","title":"Outputs"},{"location":"tasks/picard/#collect_wgs_metrics","text":"description Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard- outputs {'wgs_metrics': {'description': 'Output report of picard CollectWgsMetrics ', 'external_help': 'https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics'}}","title":"collect_wgs_metrics"},{"location":"tasks/picard/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/picard/#required_5","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate WGS metrics reference_fasta (File, required ): Gzipped reference genome in FASTA format","title":"Required"},{"location":"tasks/picard/#defaults_5","text":"memory_gb (Int, default=12): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
outfile_name (String, default=basename(bam,\".bam\") + \".CollectWgsMetrics.txt\"): Name for the metrics result file validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_5","text":"wgs_metrics (File)","title":"Outputs"},{"location":"tasks/picard/#collect_alignment_summary_metrics","text":"description Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard- outputs {'alignment_metrics': {'description': 'The text file output of CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of CollectAlignmentSummaryMetrics '}","title":"collect_alignment_summary_metrics"},{"location":"tasks/picard/#inputs_6","text":"","title":"Inputs"},{"location":"tasks/picard/#required_6","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate alignment metrics","title":"Required"},{"location":"tasks/picard/#defaults_6","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectAlignmentSummaryMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_6","text":"alignment_metrics (File) alignment_metrics_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#collect_gc_bias_metrics","text":"description Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard- outputs {'gc_bias_metrics': {'description': 'The full text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics'}, 'gc_bias_metrics_summary': {'description': 'The summary text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics'}, 'gc_bias_metrics_pdf': 'The PDF file output of CollectGcBiasMetrics '}","title":"collect_gc_bias_metrics"},{"location":"tasks/picard/#inputs_7","text":"","title":"Inputs"},{"location":"tasks/picard/#required_7","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate GC bias metrics reference_fasta (File, required ): Reference sequences in FASTA format","title":"Required"},{"location":"tasks/picard/#defaults_7","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectGcBiasMetrics\"): Prefix for the output report files. The extensions .txt , .summary.txt , and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_7","text":"gc_bias_metrics (File) gc_bias_metrics_summary (File) gc_bias_metrics_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#collect_insert_size_metrics","text":"description Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard- outputs {'insert_size_metrics': {'description': 'The text file output of CollectInsertSizeMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of CollectInsertSizeMetrics '}","title":"collect_insert_size_metrics"},{"location":"tasks/picard/#inputs_8","text":"","title":"Inputs"},{"location":"tasks/picard/#required_8","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate insert size metrics","title":"Required"},{"location":"tasks/picard/#defaults_8","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectInsertSizeMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_8","text":"insert_size_metrics (File) insert_size_metrics_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#quality_score_distribution","text":"description Runs picard QualityScoreDistribution to calculate the range of quality scores and creates an accompanying chart external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard- outputs {'quality_score_distribution_txt': 'The text file output of QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of QualityScoreDistribution '}","title":"quality_score_distribution"},{"location":"tasks/picard/#inputs_9","text":"","title":"Inputs"},{"location":"tasks/picard/#required_9","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate quality score distribution","title":"Required"},{"location":"tasks/picard/#defaults_9","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".QualityScoreDistribution\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_9","text":"quality_score_distribution_txt (File) quality_score_distribution_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#bam_to_fastq","text":"description [Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq which is more performant and doesn't error on 'illegal mate states'. deprecated true","title":"bam_to_fastq"},{"location":"tasks/picard/#inputs_10","text":"","title":"Inputs"},{"location":"tasks/picard/#required_10","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ","title":"Required"},{"location":"tasks/picard/#defaults_10","text":"memory_gb (Int, default=56): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. paired (Boolean, default=true); description : Is the data Paired-End (true) or Single-End (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the file. 
The extension <extension> will be added.","title":"Defaults"},{"location":"tasks/picard/#outputs_10","text":"read_one_fastq_gz (File) read_two_fastq_gz (File?)","title":"Outputs"},{"location":"tasks/picard/#merge_vcfs","text":"description Merges the input VCF files into a single VCF file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard outputs {'output_vcf': 'The merged VCF file', 'output_vcf_index': 'The index file associated with the merged VCF file'}","title":"merge_vcfs"},{"location":"tasks/picard/#inputs_11","text":"","title":"Inputs"},{"location":"tasks/picard/#required_11","text":"_runtime (Any, required ) output_vcf_name (String, required ): Name for the merged VCF file vcfs (Array[File], required ): Input VCF format files to merge. May be gzipped or binary compressed. vcfs_indexes (Array[File], required ): Index files associated with the input VCF files","title":"Required"},{"location":"tasks/picard/#defaults_11","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/picard/#outputs_11","text":"output_vcf (File) output_vcf_index (File)","title":"Outputs"},{"location":"tasks/picard/#scatter_interval_list","text":"description Splits an interval list into smaller interval lists for parallel processing external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard outputs {'out': 'The split interval lists', 'interval_count': 'The number of split interval lists'}","title":"scatter_interval_list"},{"location":"tasks/picard/#inputs_12","text":"","title":"Inputs"},{"location":"tasks/picard/#required_12","text":"_runtime (Any, required ) interval_list (File, required ): Input interval list to split scatter_count (Int, required ): Number of interval lists to create","title":"Required"},{"location":"tasks/picard/#defaults_12","text":"sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate. subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\"); description : How to subdivide the intervals; choices : ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION'] unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. 
Merges overlapping or adjacent intervals.","title":"Defaults"},{"location":"tasks/picard/#outputs_12","text":"interval_lists_scatter (Array[File]) interval_count (Int)","title":"Outputs"},{"location":"tasks/picard/#create_sequence_dictionary","text":"description Creates a sequence dictionary for the input FASTA file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard- outputs {'dictionary': 'Sequence dictionary produced by picard CreateSequenceDictionary .'}","title":"create_sequence_dictionary"},{"location":"tasks/picard/#inputs_13","text":"","title":"Inputs"},{"location":"tasks/picard/#required_13","text":"_runtime (Any, required ) fasta (File, required ): Input FASTA format file from which to create dictionary","title":"Required"},{"location":"tasks/picard/#optional_1","text":"assembly_name (String?): Value to put in AS field of sequence dictionary fasta_url (String?): Value to put in UR field of sequence dictionary species (String?): Value to put in SP field of sequence dictionary","title":"Optional"},{"location":"tasks/picard/#defaults_13","text":"memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(fasta,\".fa\") + \".dict\"): Name for the CreateSequenceDictionary dictionary file","title":"Defaults"},{"location":"tasks/picard/#outputs_13","text":"dictionary (File)","title":"Outputs"},{"location":"tasks/qualimap/","text":"Homepage rnaseq description Runs QualiMap's rnaseq tool on the input BAM file. Note that we don't expose the -p parameter. This is used to set strandedness protocol of the sample, however in practice it only disables certain calculations. We do not expose the parameter so that the full suite of calculations is always performed. 
outputs {'raw_summary': \"Raw text summary of QualiMap's results. Can be parsed by MultiQC.\", 'raw_coverage': \"Raw text of QualiMap's coverage analysis results. Can be parsed by MultiQC.\", 'results': 'Gzipped tar archive of all QualiMap output files'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap rnaseq on gtf (File, required ): GTF features file. Gzipped or uncompressed. Defaults memory_gb (Int, default=16): RAM to allocate for task modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Is the BAM name sorted? QualiMap has an inefficient sorting algorithm. In order to save resources we recommend collating your input BAM before QualiMap and setting this parameter to true.; common : true paired_end (Boolean, default=true); description : Is the BAM paired end?; common : true prefix (String, default=basename(bam,\".bam\") + \".qualimap_rnaseq_results\"): Prefix for the results directory and output tarball. The extension .qualimap_rnaseq_results.tar.gz will be added. Outputs raw_summary (File) raw_coverage (File) results (File) bamqc description [Deprecated] This WDL task runs QualiMap's bamqc tool on the input BAM file. This task has been deprecated due to memory leak issues. Use at your own risk, for some samples can consume over 1TB of RAM. deprecated true Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap bamqc on Defaults memory_gb (Int, default=32): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the file. 
The extension <extension> will be added. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. Outputs results (File)","title":"Qualimap"},{"location":"tasks/qualimap/#rnaseq","text":"description Runs QualiMap's rnaseq tool on the input BAM file. Note that we don't expose the -p parameter. This is used to set strandedness protocol of the sample, however in practice it only disables certain calculations. We do not expose the parameter so that the full suite of calculations is always performed. outputs {'raw_summary': \"Raw text summary of QualiMap's results. Can be parsed by MultiQC.\", 'raw_coverage': \"Raw text of QualiMap's coverage analysis results. Can be parsed by MultiQC.\", 'results': 'Gzipped tar archive of all QualiMap output files'}","title":"rnaseq"},{"location":"tasks/qualimap/#inputs","text":"","title":"Inputs"},{"location":"tasks/qualimap/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap rnaseq on gtf (File, required ): GTF features file. Gzipped or uncompressed.","title":"Required"},{"location":"tasks/qualimap/#defaults","text":"memory_gb (Int, default=16): RAM to allocate for task modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Is the BAM name sorted? QualiMap has an inefficient sorting algorithm. In order to save resources we recommend collating your input BAM before QualiMap and setting this parameter to true.; common : true paired_end (Boolean, default=true); description : Is the BAM paired end?; common : true prefix (String, default=basename(bam,\".bam\") + \".qualimap_rnaseq_results\"): Prefix for the results directory and output tarball. 
The extension .qualimap_rnaseq_results.tar.gz will be added.","title":"Defaults"},{"location":"tasks/qualimap/#outputs","text":"raw_summary (File) raw_coverage (File) results (File)","title":"Outputs"},{"location":"tasks/qualimap/#bamqc","text":"description [Deprecated] This WDL task runs QualiMap's bamqc tool on the input BAM file. This task has been deprecated due to memory leak issues. Use at your own risk, for some samples can consume over 1TB of RAM. deprecated true","title":"bamqc"},{"location":"tasks/qualimap/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/qualimap/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap bamqc on","title":"Required"},{"location":"tasks/qualimap/#defaults_1","text":"memory_gb (Int, default=32): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the file. The extension <extension> will be added. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments.","title":"Defaults"},{"location":"tasks/qualimap/#outputs_1","text":"results (File)","title":"Outputs"},{"location":"tasks/read_group/","text":"Read groups are defined in the SAM spec ID: \"Read group identifier. Each Read Group must have a unique ID. The value of ID is used in the RG tags of alignment records.\", BC: \"Barcode sequence identifying the sample or library. This value is the expected barcode bases as read by the sequencing machine in the absence of errors. 
If there are several barcodes for the sample/library (e.g., one on each end of the template), the recommended implementation concatenates all the barcodes separating them with hyphens ( - ).\", CN: \"Name of sequencing center producing the read.\", DS: \"Description.\", DT: \"Date the run was produced (ISO8601 date or date/time).\", FO: \"Flow order. The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\\*|[ACMGRSVTWYHKDBN]+/\", KS: \"The array of nucleotide bases that correspond to the key sequence of each read.\", LB: \"Library.\", PG: \"Programs used for processing the read group.\", PI: \"Predicted median insert size, rounded to the nearest integer.\", PL: \"Platform/technology used to produce the reads. Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID, and ULTIMA. This field should be omitted when the technology is not in this list (though the PM field may still be present in this case) or is unknown.\", PM: \"Platform model. Free-form text providing further details of the platform/technology used.\", PU: \"Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.\", SM: \"Sample. 
Use pool name where a pool is being sequenced.\" An example input JSON entry for read_group might look like this: { \"read_group\": { \"ID\": \"rg1\", \"PI\": 150, \"PL\": \"ILLUMINA\", \"SM\": \"Sample\", \"LB\": \"Sample\" } } ReadGroup_to_string description Stringifies a ReadGroup struct outputs {'stringified_read_group': 'Input ReadGroup as a string'} Inputs Required _runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to stringify Outputs stringified_read_group (String) get_ReadGroups description Gets read group information from a BAM file and writes it out as JSON which is converted to a WDL struct. outputs {'read_groups': 'An array of ReadGroup structs containing read group information.'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to get read groups from Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs read_groups (Array[ReadGroup]) validate_ReadGroup description Validate a ReadGroup struct's fields are defined outputs {'check': 'Dummy output to indicate success and enable call-caching'} Inputs Required _runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to validate Defaults required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified. restrictive (Boolean, default=true): If true, run a less permissive validation of field values. Otherwise, check against SAM spec-defined values. 
Outputs check (String)","title":"Read group"},{"location":"tasks/read_group/#readgroup_to_string","text":"description Stringifies a ReadGroup struct outputs {'stringified_read_group': 'Input ReadGroup as a string'}","title":"ReadGroup_to_string"},{"location":"tasks/read_group/#inputs","text":"","title":"Inputs"},{"location":"tasks/read_group/#required","text":"_runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to stringify","title":"Required"},{"location":"tasks/read_group/#outputs","text":"stringified_read_group (String)","title":"Outputs"},{"location":"tasks/read_group/#get_readgroups","text":"description Gets read group information from a BAM file and writes it out as JSON which is converted to a WDL struct. outputs {'read_groups': 'An array of ReadGroup structs containing read group information.'}","title":"get_ReadGroups"},{"location":"tasks/read_group/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/read_group/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to get read groups from","title":"Required"},{"location":"tasks/read_group/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/read_group/#outputs_1","text":"read_groups (Array[ReadGroup])","title":"Outputs"},{"location":"tasks/read_group/#validate_readgroup","text":"description Validate a ReadGroup struct's fields are defined outputs {'check': 'Dummy output to indicate success and enable call-caching'}","title":"validate_ReadGroup"},{"location":"tasks/read_group/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/read_group/#required_2","text":"_runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to validate","title":"Required"},{"location":"tasks/read_group/#defaults_1","text":"required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified. restrictive (Boolean, default=true): If true, run a less permissive validation of field values. Otherwise, check against SAM spec-defined values.","title":"Defaults"},{"location":"tasks/read_group/#outputs_2","text":"check (String)","title":"Outputs"},{"location":"tasks/sambamba/","text":"Homepage index description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to index Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. 
Not recommended for cluster environments.; common : true Outputs bam_index (File) merge description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} Inputs Required _runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true Outputs merged_bam (File) sort description Sorts the input BAM file outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to sort Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file. The extension .bam will be added. queryname_sort (Boolean, default=false); description : If true, sort the BAM by queryname. 
If false, sort by coordinate.; common : true Outputs sorted_bam (File) markdup description Marks duplicate reads in the input BAM file outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'mark_duplicates_metrics': 'Duplicate marking metrics output from sambamba'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the markdup result files. The extensions markdup.bam will be added. remove_duplicates (Boolean, default=false); description : If true, remove duplicates instead of marking them.; common : true Outputs duplicate_marked_bam (File) duplicate_marked_bam_index (File) markdup_log (File) flagstat description Produces a report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' sambamba flagstat STDOUT redirected to a file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. 
Not recommended for cluster environments.; common : true Outputs flagstat_report (File)","title":"Sambamba"},{"location":"tasks/sambamba/#index","text":"description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"}","title":"index"},{"location":"tasks/sambamba/#inputs","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to index","title":"Required"},{"location":"tasks/sambamba/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs","text":"bam_index (File)","title":"Outputs"},{"location":"tasks/sambamba/#merge","text":"description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'}","title":"merge"},{"location":"tasks/sambamba/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_1","text":"_runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added.","title":"Required"},{"location":"tasks/sambamba/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_1","text":"merged_bam (File)","title":"Outputs"},{"location":"tasks/sambamba/#sort","text":"description Sorts the input BAM file outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order '}","title":"sort"},{"location":"tasks/sambamba/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to sort","title":"Required"},{"location":"tasks/sambamba/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file. The extension .bam will be added. queryname_sort (Boolean, default=false); description : If true, sort the BAM by queryname. 
If false, sort by coordinate.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_2","text":"sorted_bam (File)","title":"Outputs"},{"location":"tasks/sambamba/#markdup","text":"description Marks duplicate reads in the input BAM file outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'mark_duplicates_metrics': 'Duplicate marking metrics output from sambamba'}","title":"markdup"},{"location":"tasks/sambamba/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_3","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates","title":"Required"},{"location":"tasks/sambamba/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the markdup result files. The extensions markdup.bam will be added. 
remove_duplicates (Boolean, default=false); description : If true, remove duplicates instead of marking them.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_3","text":"duplicate_marked_bam (File) duplicate_marked_bam_index (File) markdup_log (File)","title":"Outputs"},{"location":"tasks/sambamba/#flagstat","text":"description Produces a report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' sambamba flagstat STDOUT redirected to a file'}","title":"flagstat"},{"location":"tasks/sambamba/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for","title":"Required"},{"location":"tasks/sambamba/#defaults_4","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_4","text":"flagstat_report (File)","title":"Outputs"},{"location":"tasks/samtools/","text":"Homepage quickcheck description Runs Samtools quickcheck on the input BAM file. This checks that the BAM file appears to be intact, e.g. header exists and the end-of-file marker exists. outputs {'check': 'Dummy output to enable caching'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to quickcheck Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. Outputs check (String) split description Runs Samtools split on the input BAM file. This splits the BAM by read group into one or more output files. It optionally errors if there are reads present that do not belong to a read group. Inputs Required _runtime (Any, required ) bam (File, required ); description : Input BAM format file to split; stream : true Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the split BAM files. The extensions will contain read group IDs, and will end in .bam . reject_unaccounted (Boolean, default=true); description : If true, error if there are reads present that do not have read group information.; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs split_bams (Array[File]) flagstat description Produces a samtools flagstat report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' samtools flagstat STDOUT redirected to a file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true Outputs flagstat_report (File) index description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to index Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam_index (File) subsample description Randomly subsamples the input BAM, in order to produce an output BAM with approximately the desired number of reads. help A desired_reads greater than zero must be supplied. A desired_reads <= 0 will result in task failure. Sampling is probabalistic and will be approximate to desired_reads . Read count will not be exact. A sampled_bam will not be produced if the input BAM read count is less than or equal to desired_reads . outputs {'orig_read_count': 'A TSV report containing the original read count before subsampling. If subsampling was requested but the input BAM had less than desired_reads , no read count will be filled in (instead there will be a dash ).', 'sampled_bam': 'The subsampled input BAM. Only present if subsampling was performed.'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to subsample desired_reads (Int, required ): How many reads should be in the ouput BAM? Output BAM read count will be approximate to this value. Must be greater than zero. A desired_reads <= 0 will result in task failure. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .sampled.bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs orig_read_count (File) sampled_bam (File?) filter description Filters a BAM based on its bitwise flag value. help This task is a wrapper around samtools view . This task will fail if there are no reads in the output BAM. This can happen either because the input BAM was empty or because the supplied bitwise_filter was too strict. If you want to down-sample a BAM, use the subsample task instead. Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to filter bitwise_filter (FlagFilter, required ): A set of 4 possible read filters to apply. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".filtered\"): Prefix for the filtered BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs filtered_bam (File) merge description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} Inputs Required _runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. 
Optional new_header (File?): Use the lines of FILE as @ headers to be copied to the merged BAM, replacing any header lines that would otherwise be copied from the first BAM file in the list. (File may actually be in SAM format, though any alignment records it may contain are ignored.) Defaults attach_rg (Boolean, default=true); description : Attach an RG tag to each alignment. The tag value is inferred from file names.; common : true combine_pg (Boolean, default=true); description : Similarly to combine_rg : for each @PG ID in the set of files to merge, use the @PG line of the first file we find that ID in rather than adding a suffix to differentiate similar IDs.; common : true combine_rg (Boolean, default=true); description : When several input files contain @RG headers with the same ID, emit only one of them (namely, the header line from the first file we find that ID in) to the merged output file. Combining these similar headers is usually the right thing to do when the files being merged originated from the same file. Without -c , all @RG headers appear in the output file, with random suffixes added to their IDs where necessary to differentiate them.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Are all input BAMs queryname sorted (true)? Or are all input BAMs coordinate sorted (false)?; common : true ncpu (Int, default=2); description : Number of cores to allocate for task; common : true region (String, default=\"\"): Merge files in the specified region (Format: chr:start-end ) use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true Outputs merged_bam (File) addreplacerg description Adds or replaces read group tags outputs {'tagged_bam': 'The transformed input BAM after read group modifications have been applied'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to add read group information Optional read_group_id (String?): Allows you to specify the read group ID of an existing @RG line and applies it to the reads specified by the orphan_only option Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true orphan_only (Boolean, default=true); description : Only add RG tags to orphans (true)? Or also overwrite all existing RG tags (including any in the header) (false)?; common : true overwrite_header_record (Boolean, default=false); description : Overwrite an existing @RG line, if a new one with the same ID value is provided?; common : true prefix (String, default=basename(bam,\".bam\") + \".addreplacerg\"): Prefix for the BAM file. The extension .bam will be added. read_group_line (Array[String], default=[]); description : Allows you to specify a read group line to append to (or replace in) the header and applies it to the reads specified by the orphan_only option. Each String in the Array should correspond to one field of the read group line. Tab literals will be inserted between each entry in the final BAM. Only one read group line can be supplied per invocation of this task.; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs tagged_bam (File) collate description Runs samtools collate on the input BAM file. Shuffles and groups reads together by their names. 
outputs {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order)'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to collate Defaults fast_mode (Boolean, default=true); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".collated\"): Prefix for the collated BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs collated_bam (File) bam_to_fastq description Converts an input BAM file into FASTQ(s) using samtools fastq . help If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads. An exit-code of 42 indicates that no reads were present in the output FASTQs. An exit-code of 43 indicates that unexpected reads were discovered in the input BAM. output {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order). Only generated if retain_collated_bam and paired_end are both true. Has the name ~{prefix}.collated.bam .', 'read_one_fastq_gz': 'Gzipped FASTQ file with 1st reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R1.fastq.gz .', 'read_two_fastq_gz': 'Gzipped FASTQ file with 2nd reads in pair. Only generated if paired_end is true and interleaved is false. 
Has the name ~{prefix}.R2.fastq.gz .', 'singleton_reads_fastq_gz': 'Gzipped FASTQ containing singleton reads. Only generated if paired_end and output_singletons are both true. Has the name ~{prefix}.singleton.fastq.gz .', 'interleaved_reads_fastq_gz': 'Interleaved gzipped Paired-End FASTQ. Only generated if paired_end and interleaved are both true. Has the name ~{prefix}.fastq.gz . The conditions under which this output and single_end_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).', 'single_end_reads_fastq_gz': 'A gzipped FASTQ containing all reads. Only generated if paired_end is false. Has the name ~{prefix}.fastq.gz . The conditions under which this output and interleaved_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ(s) Defaults append_read_number (Boolean, default=true); description : Append /1 and /2 suffixes to read names?; common : true bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): A set of 4 possible read filters to apply during conversion to FASTQ. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the output FASTQs. collated (Boolean, default=false); description : Is the BAM collated (or name-sorted)? If collated == true , then the input BAM will be run through samtools fastq without preprocessing. If collated == false , then samtools collate must be run on the input BAM before conversion to FASTQ. 
Ignored if paired_end == false .; common : true fail_on_unexpected_reads (Boolean, default=false): The definition of 'unexpected' depends on whether the values of paired_end and output_singletons are true or false. If paired_end is false , no reads are considered unexpected, and every read (not caught by bitwise_filter ) will be present in the resulting FASTQ regardless of first / last bit settings. This setting will be ignored in that case. If paired_end is true then reads that don't satisfy first XOR last are considered unexpected (i.e. reads that have neither first nor last set or reads that have both first and last set). If output_singletons is false , singleton reads are considered unexpected. A singleton read is a read with either the first or the last bit set (but not both) and that possesses a unique QNAME; i.e. it is a read without a pair when all reads are expected to be paired. But if output_singletons is true , these singleton reads will be output as their own FASTQ instead of causing the task to fail. If fail_on_unexpected_reads is false , then all the above cases will be ignored. Any 'unexpected' reads will be silently discarded.; description : Should the task fail if reads with an unexpected first / last bit setting are discovered?; common : true fast_mode (Boolean, default=!retain_collated_bam); description : Fast mode for samtools collate ? If true , this removes secondary and supplementary reads during the collate step. If false , secondary and supplementary reads will be retained in the collated_bam output (if created). Defaults to the opposite of retain_collated_bam . Ignored if collated == true or paired_end == false .; common : true interleaved (Boolean, default=false); description : Create an interleaved FASTQ file from Paired-End data? Ignored if paired_end == false .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true output_singletons (Boolean, default=false): Output singleton reads as their own FASTQ? Ignored if paired_end == false . paired_end (Boolean, default=true); description : Is the data Paired-End? If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads.; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the collated BAM and FASTQ files. The extensions .collated.bam and [,.R1,.R2,.singleton].fastq.gz will be added. retain_collated_bam (Boolean, default=false); description : Save the collated BAM to disk and output it (true)? This slows performance and substantially increases storage requirements. Be aware that collated BAMs occupy much more space than either position sorted or name sorted BAMs (due to the compression algorithm). Ignored if collated == true or paired_end == false .; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs collated_bam (File?) read_one_fastq_gz (File?) read_two_fastq_gz (File?) singleton_reads_fastq_gz (File?) interleaved_reads_fastq_gz (File?) single_end_reads_fastq_gz (File?) fixmate description Runs samtools fixmate on the name-collated input BAM file. This fills in mate coordinates and insert size fields among other tags and fields. help This task assumes a name-sorted or name-collated input BAM. If you have a position-sorted BAM, please use the position_sorted_fixmate task. This task runs fixmate and outputs a BAM in the same order as the input. 
outputs {'fixmate_bam': 'The BAM resulting from running samtools fixmate on the input BAM'} Inputs Required _runtime (Any, required ) bam (File, required ); description : Input BAM format file to add mate information. Must be name-sorted or name-collated.; stream : true Defaults add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair] extension (String, default=\".bam\"); description : File format extension to use for output file.; choices : ['.bam', '.cram']; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension specified with the extension parameter will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs fixmate_bam (File) position_sorted_fixmate description Runs samtools fixmate on the position-sorted input BAM file and output a position-sorted BAM. fixmate fills in mate coordinates and insert size fields among other tags and fields. samtools fixmate assumes a name-sorted or name-collated input BAM. If you already have a collated BAM, please use the fixmate task. 
This task collates the input BAM, runs fixmate , and then resorts the output into a position-sorted BAM. Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to add mate information. Must be position-sorted. Defaults add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair]? fast_mode (Boolean, default=false); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension .bam will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs fixmate_bam (File) markdup description [DEPRECATED] Runs samtools markdup on the position-sorted input BAM file. This creates a report and optionally a new BAM with duplicate reads marked. help This task assumes samtools fixmate has already been run on the input BAM. If it has not, then the output may be incorrect. 
A name-sorted or collated BAM can be run through the fixmate task (and then position-sorted prior to this task) or a position-sorted BAM can be run through the position_sorted_fixmate task. Deprecated due to extremely high memory usage for certain RNA-Seq samples when searching for optical duplicates. Use mark_duplicates in ./picard.wdl instead. deprecated true Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to mark duplicates in Defaults coordinates_order (String, default=\"txy\"); description : The order of the elements captured in the read_coords_regex regular expression. Default is txy where t is a part of the read name selected for string comparison and x / y are the coordinates used for optical duplicate detection. Ignored if optical_distance == 0 .; choices : ['txy', 'tyx', 'xyt', 'yxt', 'xty', 'ytx', 'xy', 'yx'] create_bam (Boolean, default=true): Create a new BAM with duplicate reads marked? If false , then only a markdup report will be generated. duplicate_count (Boolean, default=false): Record the original primary read duplication count (include itself) in a dc tag? Ignored if create_bam == false . duplicates_of_duplicates_check (Boolean, default=false): Check duplicates of duplicates for correctness? Performs further checks to make sure all optical duplicates are found. Also operates on mark_duplicates_with_do_tag tagging where reads may be tagged with the best quality read. Disabling this option can speed up duplicate marking when there are a great many duplicates for each original read. Ignored if create_bam == false or optical_distance == 0 . include_qc_fails (Boolean, default=false): Include reads that have the QC-failed flag set in duplicate marking? This can increase the number of duplicates found. Ignored if create_bam == false . json (Boolean, default=false): Output a JSON report instead of a text report? Either are parseable by MultiQC. 
mark_duplicates_with_do_tag (Boolean, default=false): Mark duplicates with the do ( d uplicate o riginal) tag? The do tag contains the name of the \"original\" read that was duplicated. Ignored if create_bam == false . mark_supp_or_sec_or_unmapped_as_duplicates (Boolean, default=false): Mark supplementary, secondary, or unmapped alignments of duplicates as duplicates? As this takes a quick second pass over the data it will increase running time. Ignored if create_bam == false . max_readlen (Int, default=300): Expected maximum read length. modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. When set above 0 , duplicate reads are tagged with dt:Z:SQ for optical duplicates and dt:Z:LB otherwise. Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_coords_regex . If changing read_coords_regex , make sure that coordinates_order matches. prefix (String, default=basename(bam,\".bam\") + \".markdup\"): Prefix for the output file. TODO read_coords_regex (String, default=\"[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)\"); description : Regular expression to extract read coordinates from the QNAME field. 
This takes a POSIX regular expression for at least x and y to be used in optical duplicate marking It can also include another part of the read name to test for equality, eg lane:tile elements. Elements wanted are captured with parentheses. The default is meant to capture information from Illumina style read names. Ignored if optical_distance == 0 . If changing read_coords_regex , make sure that coordinates_order matches.; tool_default : ([!-9;-?A-~]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+):([0-9]+):([0-9]+) remove_duplicates (Boolean, default=false): Remove duplicates from the output BAM? Ignored if create_bam == false . use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_read_groups (Boolean, default=false): Only mark duplicates within the same Read Group? Ignored if create_bam == false . Outputs markdup_report (File) markdup_bam (File?) faidx description Creates a .fai FASTA index for the input FASTA outputs {'fasta_index': \"A .fai FASTA index associated with the input FASTA. Filename will be basename(fasta) + '.fai' .\"} Inputs Required _runtime (Any, required ) fasta (File, required ): Input FASTA format file to index. Optionally gzip compressed. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true Outputs fasta_index (File)","title":"Samtools"},{"location":"tasks/samtools/#quickcheck","text":"description Runs Samtools quickcheck on the input BAM file. This checks that the BAM file appears to be intact, e.g. header exists and the end-of-file marker exists. 
outputs {'check': 'Dummy output to enable caching'}","title":"quickcheck"},{"location":"tasks/samtools/#inputs","text":"","title":"Inputs"},{"location":"tasks/samtools/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to quickcheck","title":"Required"},{"location":"tasks/samtools/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/samtools/#outputs","text":"check (String)","title":"Outputs"},{"location":"tasks/samtools/#split","text":"description Runs Samtools split on the input BAM file. This splits the BAM by read group into one or more output files. It optionally errors if there are reads present that do not belong to a read group.","title":"split"},{"location":"tasks/samtools/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_1","text":"_runtime (Any, required ) bam (File, required ); description : Input BAM format file to split; stream : true","title":"Required"},{"location":"tasks/samtools/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the split BAM files. The extensions will contain read group IDs, and will end in .bam . reject_unaccounted (Boolean, default=true); description : If true, error if there are reads present that do not have read group information.; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_1","text":"split_bams (Array[File])","title":"Outputs"},{"location":"tasks/samtools/#flagstat","text":"description Produces a samtools flagstat report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' samtools flagstat STDOUT redirected to a file'}","title":"flagstat"},{"location":"tasks/samtools/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for","title":"Required"},{"location":"tasks/samtools/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_2","text":"flagstat_report (File)","title":"Outputs"},{"location":"tasks/samtools/#index","text":"description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"}","title":"index"},{"location":"tasks/samtools/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_3","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to index","title":"Required"},{"location":"tasks/samtools/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_3","text":"bam_index (File)","title":"Outputs"},{"location":"tasks/samtools/#subsample","text":"description Randomly subsamples the input BAM, in order to produce an output BAM with approximately the desired number of reads. help A desired_reads greater than zero must be supplied. A desired_reads <= 0 will result in task failure. Sampling is probabilistic and will be approximate to desired_reads . Read count will not be exact. A sampled_bam will not be produced if the input BAM read count is less than or equal to desired_reads . outputs {'orig_read_count': 'A TSV report containing the original read count before subsampling. If subsampling was requested but the input BAM had less than desired_reads , no read count will be filled in (instead there will be a dash ).', 'sampled_bam': 'The subsampled input BAM. Only present if subsampling was performed.'}","title":"subsample"},{"location":"tasks/samtools/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to subsample desired_reads (Int, required ): How many reads should be in the output BAM? Output BAM read count will be approximate to this value. Must be greater than zero. A desired_reads <= 0 will result in task failure.","title":"Required"},{"location":"tasks/samtools/#defaults_4","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file.
The extension .sampled.bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_4","text":"orig_read_count (File) sampled_bam (File?)","title":"Outputs"},{"location":"tasks/samtools/#filter","text":"description Filters a BAM based on its bitwise flag value. help This task is a wrapper around samtools view . This task will fail if there are no reads in the output BAM. This can happen either because the input BAM was empty or because the supplied bitwise_filter was too strict. If you want to down-sample a BAM, use the subsample task instead.","title":"filter"},{"location":"tasks/samtools/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_5","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to filter bitwise_filter (FlagFilter, required ): A set of 4 possible read filters to apply. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information).","title":"Required"},{"location":"tasks/samtools/#defaults_5","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".filtered\"): Prefix for the filtered BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_5","text":"filtered_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#merge","text":"description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'}","title":"merge"},{"location":"tasks/samtools/#inputs_6","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_6","text":"_runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added.","title":"Required"},{"location":"tasks/samtools/#optional","text":"new_header (File?): Use the lines of FILE as @ headers to be copied to the merged BAM, replacing any header lines that would otherwise be copied from the first BAM file in the list. (File may actually be in SAM format, though any alignment records it may contain are ignored.)","title":"Optional"},{"location":"tasks/samtools/#defaults_6","text":"attach_rg (Boolean, default=true); description : Attach an RG tag to each alignment. The tag value is inferred from file names.; common : true combine_pg (Boolean, default=true); description : Similarly to combine_rg : for each @PG ID in the set of files to merge, use the @PG line of the first file we find that ID in rather than adding a suffix to differentiate similar IDs.; common : true combine_rg (Boolean, default=true); description : When several input files contain @RG headers with the same ID, emit only one of them (namely, the header line from the first file we find that ID in) to the merged output file. Combining these similar headers is usually the right thing to do when the files being merged originated from the same file. 
Without -c , all @RG headers appear in the output file, with random suffixes added to their IDs where necessary to differentiate them.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Are all input BAMs queryname sorted (true)? Or are all input BAMs coordinate sorted (false)?; common : true ncpu (Int, default=2); description : Number of cores to allocate for task; common : true region (String, default=\"\"): Merge files in the specified region (Format: chr:start-end ) use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_6","text":"merged_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#addreplacerg","text":"description Adds or replaces read group tags outputs {'tagged_bam': 'The transformed input BAM after read group modifications have been applied'}","title":"addreplacerg"},{"location":"tasks/samtools/#inputs_7","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_7","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to add read group information","title":"Required"},{"location":"tasks/samtools/#optional_1","text":"read_group_id (String?): Allows you to specify the read group ID of an existing @RG line and applies it to the reads specified by the orphan_only option","title":"Optional"},{"location":"tasks/samtools/#defaults_7","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true orphan_only (Boolean, default=true); description : Only add RG tags to orphans (true)? 
Or also overwrite all existing RG tags (including any in the header) (false)?; common : true overwrite_header_record (Boolean, default=false); description : Overwrite an existing @RG line, if a new one with the same ID value is provided?; common : true prefix (String, default=basename(bam,\".bam\") + \".addreplacerg\"): Prefix for the BAM file. The extension .bam will be added. read_group_line (Array[String], default=[]); description : Allows you to specify a read group line to append to (or replace in) the header and applies it to the reads specified by the orphan_only option. Each String in the Array should correspond to one field of the read group line. Tab literals will be inserted between each entry in the final BAM. Only one read group line can be supplied per invocation of this task.; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_7","text":"tagged_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#collate","text":"description Runs samtools collate on the input BAM file. Shuffles and groups reads together by their names. outputs {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order)'}","title":"collate"},{"location":"tasks/samtools/#inputs_8","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_8","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to collate","title":"Required"},{"location":"tasks/samtools/#defaults_8","text":"fast_mode (Boolean, default=true); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. 
Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".collated\"): Prefix for the collated BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_8","text":"collated_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#bam_to_fastq","text":"description Converts an input BAM file into FASTQ(s) using samtools fastq . help If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads. An exit-code of 42 indicates that no reads were present in the output FASTQs. An exit-code of 43 indicates that unexpected reads were discovered in the input BAM. output {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order). Only generated if retain_collated_bam and paired_end are both true. Has the name ~{prefix}.collated.bam .', 'read_one_fastq_gz': 'Gzipped FASTQ file with 1st reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R1.fastq.gz .', 'read_two_fastq_gz': 'Gzipped FASTQ file with 2nd reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R2.fastq.gz .', 'singleton_reads_fastq_gz': 'Gzipped FASTQ containing singleton reads. Only generated if paired_end and output_singletons are both true. Has the name ~{prefix}.singleton.fastq.gz .', 'interleaved_reads_fastq_gz': 'Interleaved gzipped Paired-End FASTQ. Only generated if paired_end and interleaved are both true. Has the name ~{prefix}.fastq.gz . 
The conditions under which this output and single_end_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).', 'single_end_reads_fastq_gz': 'A gzipped FASTQ containing all reads. Only generated if paired_end is false. Has the name ~{prefix}.fastq.gz . The conditions under which this output and interleaved_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).'}","title":"bam_to_fastq"},{"location":"tasks/samtools/#inputs_9","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_9","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ(s)","title":"Required"},{"location":"tasks/samtools/#defaults_9","text":"append_read_number (Boolean, default=true); description : Append /1 and /2 suffixes to read names?; common : true bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): A set of 4 possible read filters to apply during conversion to FASTQ. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the output FASTQs. collated (Boolean, default=false); description : Is the BAM collated (or name-sorted)? If collated == true , then the input BAM will be run through samtools fastq without preprocessing. If collated == false , then samtools collate must be run on the input BAM before conversion to FASTQ. Ignored if paired_end == false .; common : true fail_on_unexpected_reads (Boolean, default=false): The definition of 'unexpected' depends on whether the values of paired_end and output_singletons are true or false. 
If paired_end is false , no reads are considered unexpected, and every read (not caught by bitwise_filter ) will be present in the resulting FASTQ regardless of first / last bit settings. This setting will be ignored in that case. If paired_end is true then reads that don't satisfy first XOR last are considered unexpected (i.e. reads that have neither first nor last set or reads that have both first and last set). If output_singletons is false , singleton reads are considered unexpected. A singleton read is a read with either the first or the last bit set (but not both) and that possesses a unique QNAME; i.e. it is a read without a pair when all reads are expected to be paired. But if output_singletons is true , these singleton reads will be output as their own FASTQ instead of causing the task to fail. If fail_on_unexpected_reads is false , then all the above cases will be ignored. Any 'unexpected' reads will be silently discarded.; description : Should the task fail if reads with an unexpected first / last bit setting are discovered?; common : true fast_mode (Boolean, default=!retain_collated_bam); description : Fast mode for samtools collate ? If true , this removes secondary and supplementary reads during the collate step. If false , secondary and supplementary reads will be retained in the collated_bam output (if created). Defaults to the opposite of retain_collated_bam . Ignored if collated == true or paired_end == false .; common : true interleaved (Boolean, default=false); description : Create an interleaved FASTQ file from Paired-End data? Ignored if paired_end == false .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true output_singletons (Boolean, default=false): Output singleton reads as their own FASTQ? Ignored if paired_end == false . paired_end (Boolean, default=true); description : Is the data Paired-End? If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads.; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the collated BAM and FASTQ files. The extensions .collated.bam and [,.R1,.R2,.singleton].fastq.gz will be added. retain_collated_bam (Boolean, default=false); description : Save the collated BAM to disk and output it (true)? This slows performance and substantially increases storage requirements. Be aware that collated BAMs occupy much more space than either position sorted or name sorted BAMs (due to the compression algorithm). Ignored if collated == true or paired_end == false .; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_9","text":"collated_bam (File?) read_one_fastq_gz (File?) read_two_fastq_gz (File?) singleton_reads_fastq_gz (File?) interleaved_reads_fastq_gz (File?) single_end_reads_fastq_gz (File?)","title":"Outputs"},{"location":"tasks/samtools/#fixmate","text":"description Runs samtools fixmate on the name-collated input BAM file. This fills in mate coordinates and insert size fields among other tags and fields. help This task assumes a name-sorted or name-collated input BAM. If you have a position-sorted BAM, please use the position_sorted_fixmate task. This task runs fixmate and outputs a BAM in the same order as the input. 
outputs {'fixmate_bam': 'The BAM resulting from running samtools fixmate on the input BAM'}","title":"fixmate"},{"location":"tasks/samtools/#inputs_10","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_10","text":"_runtime (Any, required ) bam (File, required ); description : Input BAM format file to add mate information. Must be name-sorted or name-collated.; stream : true","title":"Required"},{"location":"tasks/samtools/#defaults_10","text":"add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair] extension (String, default=\".bam\"); description : File format extension to use for output file.; choices : ['.bam', '.cram']; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension specified with the extension parameter will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_10","text":"fixmate_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#position_sorted_fixmate","text":"description Runs samtools fixmate on the position-sorted input BAM file and output a position-sorted BAM. fixmate fills in mate coordinates and insert size fields among other tags and fields. samtools fixmate assumes a name-sorted or name-collated input BAM. If you already have a collated BAM, please use the fixmate task. This task collates the input BAM, runs fixmate , and then resorts the output into a position-sorted BAM.","title":"position_sorted_fixmate"},{"location":"tasks/samtools/#inputs_11","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_11","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to add mate information. Must be position-sorted.","title":"Required"},{"location":"tasks/samtools/#defaults_11","text":"add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair]? fast_mode (Boolean, default=false); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension .bam will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_11","text":"fixmate_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#markdup","text":"description [DEPRECATED] Runs samtools markdup on the position-sorted input BAM file. This creates a report and optionally a new BAM with duplicate reads marked. help This task assumes samtools fixmate has already been run on the input BAM. If it has not, then the output may be incorrect. A name-sorted or collated BAM can be run through the fixmate task (and then position-sorted prior to this task) or a position-sorted BAM can be run through the position_sorted_fixmate task. Deprecated due to extremely high memory usage for certain RNA-Seq samples when searching for optical duplicates. Use mark_duplicates in ./picard.wdl instead. deprecated true","title":"markdup"},{"location":"tasks/samtools/#inputs_12","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_12","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to mark duplicates in","title":"Required"},{"location":"tasks/samtools/#defaults_12","text":"coordinates_order (String, default=\"txy\"); description : The order of the elements captured in the read_coords_regex regular expression. Default is txy where t is a part of the read name selected for string comparison and x / y are the coordinates used for optical duplicate detection. 
Ignored if optical_distance == 0 .; choices : ['txy', 'tyx', 'xyt', 'yxt', 'xty', 'ytx', 'xy', 'yx'] create_bam (Boolean, default=true): Create a new BAM with duplicate reads marked? If false , then only a markdup report will be generated. duplicate_count (Boolean, default=false): Record the original primary read duplication count (include itself) in a dc tag? Ignored if create_bam == false . duplicates_of_duplicates_check (Boolean, default=false): Check duplicates of duplicates for correctness? Performs further checks to make sure all optical duplicates are found. Also operates on mark_duplicates_with_do_tag tagging where reads may be tagged with the best quality read. Disabling this option can speed up duplicate marking when there are a great many duplicates for each original read. Ignored if create_bam == false or optical_distance == 0 . include_qc_fails (Boolean, default=false): Include reads that have the QC-failed flag set in duplicate marking? This can increase the number of duplicates found. Ignored if create_bam == false . json (Boolean, default=false): Output a JSON report instead of a text report? Either are parseable by MultiQC. mark_duplicates_with_do_tag (Boolean, default=false): Mark duplicates with the do ( d uplicate o riginal) tag? The do tag contains the name of the \"original\" read that was duplicated. Ignored if create_bam == false . mark_supp_or_sec_or_unmapped_as_duplicates (Boolean, default=false): Mark supplementary, secondary, or unmapped alignments of duplicates as duplicates? As this takes a quick second pass over the data it will increase running time. Ignored if create_bam == false . max_readlen (Int, default=300): Expected maximum read length. modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. 
Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. When set above 0 , duplicate reads are tagged with dt:Z:SQ for optical duplicates and dt:Z:LB otherwise. Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_coords_regex . If changing read_coords_regex , make sure that coordinates_order matches. prefix (String, default=basename(bam,\".bam\") + \".markdup\"): Prefix for the output file. TODO read_coords_regex (String, default=\"[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)\"); description : Regular expression to extract read coordinates from the QNAME field. This takes a POSIX regular expression for at least x and y to be used in optical duplicate marking It can also include another part of the read name to test for equality, eg lane:tile elements. Elements wanted are captured with parentheses. The default is meant to capture information from Illumina style read names. Ignored if optical_distance == 0 . If changing read_coords_regex , make sure that coordinates_order matches.; tool_default : ([!-9;-?A-~]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+):([0-9]+):([0-9]+) remove_duplicates (Boolean, default=false): Remove duplicates from the output BAM? Ignored if create_bam == false . use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_read_groups (Boolean, default=false): Only mark duplicates within the same Read Group? 
Ignored if create_bam == false .","title":"Defaults"},{"location":"tasks/samtools/#outputs_12","text":"markdup_report (File) markdup_bam (File?)","title":"Outputs"},{"location":"tasks/samtools/#faidx","text":"description Creates a .fai FASTA index for the input FASTA outputs {'fasta_index': \"A .fai FASTA index associated with the input FASTA. Filename will be basename(fasta) + '.fai' .\"}","title":"faidx"},{"location":"tasks/samtools/#inputs_13","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_13","text":"_runtime (Any, required ) fasta (File, required ): Input FASTA format file to index. Optionally gzip compressed.","title":"Required"},{"location":"tasks/samtools/#defaults_13","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_13","text":"fasta_index (File)","title":"Outputs"},{"location":"tasks/star/","text":"Homepage build_star_db description Runs STAR's build command to generate a STAR format reference for alignment outputs {'star_db': 'A gzipped TAR file containing the STAR reference files. Suitable as the star_db_tar_gz input to the alignment task.'} Inputs Required _runtime (Any, required ) gtf (File, required ): GTF format feature file reference_fasta (File, required ): The FASTA format reference file for the genome Defaults db_name (String, default=\"star_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true genomeChrBinNbits (Int, default=18): =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. 
For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). genomeSAindexNbases (Int, default=14): length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1) . genomeSAsparseD (Int, default=1): suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction. genomeSuffixLengthMax (Int, default=-1): maximum length of the suffixes, has to be longer than read length. -1 = infinite. memory_gb (Int, default=50): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8); description : Number of cores to allocate for task; common : true sjdbGTFchrPrefix (String, default=\"-\"); description : prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes); common : true sjdbGTFfeatureExon (String, default=\"exon\"): feature type in GTF file to be used as exons for building transcripts sjdbGTFtagExonParentGene (String, default=\"gene_id\"): GTF attribute name for parent gene ID sjdbGTFtagExonParentGeneName (String, default=\"gene_name\"): GTF attrbute name for parent gene name sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\"): GTF attrbute name for parent gene type sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\"): GTF attribute name for parent transcript ID sjdbOverhang (Int, default=125); description : length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1). [STAR default] : 100 . 
[WDL default] : 125 .; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs star_db (File) alignment description Runs the STAR aligner on a set of RNA-Seq FASTQ files external_help https://github.com/alexdobin/STAR/blob/2.7.11b/doc/STARmanual.pdf outputs {'star_log': 'Summary mapping statistics after mapping job is complete. The statistics are calculated for each read (Single- or Paired-End) and then summed or averaged over all reads. Note that STAR counts a Paired-End read as one read. Most of the information is collected about the UNIQUE mappers. Each splicing is counted in the numbers of splices, which would correspond to summing the counts in SJ.out.tab. The mismatch/indel error rates are calculated on a per base basis, i.e. as total number of mismatches/indels in all unique mappers divided by the total number of mapped bases.', 'star_bam': 'STAR aligned BAM', 'star_junctions': 'File contains high confidence collapsed splice junctions in tab-delimited format. Note that STAR defines the junction start/end as intronic bases, while many other software define them as exonic bases. See meta.external_help for file specification.', 'star_chimeric_junctions': 'Tab delimited file containing chimeric reads and associated metadata. See meta.external_help for file specification.'} Inputs Required _runtime (Any, required ) prefix (String, required ): Prefix for the BAM and other STAR files. The extensions .Aligned.out.bam , .Log.final.out , .SJ.out.tab , and .Chimeric.out.junction will be added. read_one_fastqs_gz (Array[File], required ): An array of gzipped FASTQ files containing read one information star_db_tar_gz (File, required ): A gzipped TAR file containing the STAR reference files. The name of the root directory which was archived must match the archive's filename without the .tar.gz extension. 
Optional read_groups (String?): A string containing the read group information to output in the BAM file. If including multiple read group fields per-read group, they should be space delimited. Read groups should be comma separated, with a space on each side (i.e. ' , '). The ID field must come first for each read group and must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. Example: ID:rg1 PU:flowcell1.lane1 SM:sample1 PL:illumina LB:sample1_lib1 , ID:rg2 PU:flowcell1.lane2 SM:sample1 PL:illumina LB:sample1_lib1 . These two read groups could be associated with the following four FASTQs: sample1.rg1_R1.fastq,sample1.rg2_R1.fastq and sample1.rg1_R2.fastq,sample1.rg2_R2.fastq Defaults alignEndsProtrude (Pair[Int,String], default=(0, \"ConcordantPair\")); description : allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate. left : maximum number of protrusion bases allowed. right : see choices below.; choices : {'ConcordantPair': 'report alignments with non-zero protrusion as concordant pairs', 'DiscordantPair': 'report alignments with non-zero protrusion as discordant pairs'} alignEndsType (String, default=\"Local\"); description : type of read ends alignment; choices : {'Local': 'standard local alignment with soft-clipping allowed', 'EndToEnd': 'force end-to-end read alignment, do not soft-clip', 'Extend5pOfRead1': 'fully extend only the 5p of the read1, all other ends: local alignment', 'Extend5pOfReads12': 'fully extend only the 5p of the both read1 and read2, all other ends: local alignment'} alignInsertionFlush (String, default=\"None\"); description : how to flush ambiguous insertion positions; choices : {'None': 'insertions are not flushed', 'Right': 'insertions are flushed to the right'}; common : true alignIntronMax (Int, default=500000); description : maximum intron size, if 0, max intron size will be determined by (2^winBinNbits) winAnchorDistNbins. 
[STAR default] : 0 . [WDL default] : 500000 .; common *: true alignIntronMin (Int, default=21); description : minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion; common : true alignMatesGapMax (Int, default=1000000); description : maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits) winAnchorDistNbins. [STAR default] : 0 . [WDL default] : 1000000 ; common *: true alignSJDBoverhangMin (Int, default=1); description : minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments. [STAR default] : 3 . [WDL default] : 1 .; common : true alignSJoverhangMin (Int, default=5); description : minimum overhang (i.e. block size) for spliced alignments; common : true alignSJstitchMismatchNmax (SJ_Motifs, default={\"noncanonical_motifs\": 0, \"GT_AG_and_CT_AC_motif\": -1, \"GC_AG_and_CT_GC_motif\": 0, \"AT_AC_and_GT_AT_motif\": 0}): maximum number of mismatches for stitching of the splice junctions (-1: no limit) for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif alignSoftClipAtReferenceEnds (String, default=\"Yes\"); description : allow the soft-clipping of the alignments past the end of the chromosomes; choices : {'Yes': 'allow', 'No': 'prohibit, useful for compatibility with Cufflinks'}; common : true alignSplicedMateMapLmin (Int, default=0): minimum mapped length for a read mate that is spliced alignSplicedMateMapLminOverLmate (Float, default=0.66): alignSplicedMateMapLmin normalized to mate length alignTranscriptsPerReadNmax (Int, default=10000): max number of different alignments per read to consider alignTranscriptsPerWindowNmax (Int, default=100): max number of transcripts per window alignWindowsPerReadNmax (Int, default=10000): max number of windows per read chimFilter (String, default=\"banGenomicN\"); description : different filters for chimeric alignments; choices : {'None': 'no filtering', 
'banGenomicN': 'Ns are not allowed in the genome sequence around the chimeric junction'} chimJunctionOverhangMin (Int, default=20); description : minimum overhang for a chimeric junction; common : true chimMainSegmentMultNmax (Int, default=10); description : maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.; common : true chimMultimapNmax (Int, default=0); description : maximum number of chimeric multi-alignments. 0 : use the old scheme for chimeric detection which only considered unique alignments; common : true chimMultimapScoreRange (Int, default=1): the score range for multi-mapping chimeras below the best chimeric score. Only works with --chimMultimapNmax > 1. chimNonchimScoreDropMin (Int, default=20): to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value chimOutJunctionFormat (String, default=\"plain\"); description : formatting type for the Chimeric.out.junction file; choices : {'plain': 'no comment lines/headers', 'comments': 'comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping'}; common : true chimOutType (String, default=\"Junctions\"); description : type of chimeric output; choices : {'Junctions': 'Chimeric.out.junction', 'WithinBAM_HardClip': 'output into main aligned BAM files (Aligned. .bam). Hard-clipping in the CIGAR for supplemental chimeric alignments.', 'WithinBAM_SoftClip': 'output into main aligned BAM files (Aligned. .bam). 
Soft-clipping in the CIGAR for supplemental chimeric alignments.'}; common : true chimScoreDropMax (Int, default=20); description : max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length; common : true chimScoreJunctionNonGTAG (Int, default=-1): penalty for a non-GT/AG chimeric junction chimScoreMin (Int, default=0); description : minimum total (summed) score of the chimeric segments; common : true chimScoreSeparation (Int, default=10): minimum difference (separation) between the best chimeric score and the next one chimSegmentMin (Int, default=0); description : minimum length of chimeric segment length, if ==0, no chimeric output; common : true chimSegmentReadGapMax (Int, default=0); description : maximum gap in the read sequence between chimeric segments; common : true clip3pAdapterMMp (Pair[Float,Float], default=(0.1, 0.1)): max proportion of mismatches for 3p adapter clipping for each mate. left applies to read one and right applies to read two. clip3pAdapterSeq (Pair[String,String], default=(\"None\", \"None\")); description : adapter sequences to clip from 3p of each mate. left applies to read one and right applies to read two.; choices : {'None': 'No 3p adapter trimming will be performed', 'sequence': 'A nucleotide sequence string of any length, matching the regex /[ATCG]+/ ', 'polyA': 'polyA sequence with the length equal to read length'}; common : true clip3pAfterAdapterNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate after the adapter clipping. left applies to read one and right applies to read two. clip3pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate. left applies to read one and right applies to read two. clip5pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 5p of each mate. left applies to read one and right applies to read two. 
clipAdapterType (String, default=\"Hamming\"); description : adapter clipping type; choices : {'Hamming': 'adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp', 'CellRanger4': '5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin \u0160o\u0161i\u0107: https://github.com/Martinsos/opal', 'None': 'no adapter clipping, all other clip* parameters are disregarded'} limitOutSJcollapsed (Int, default=1000000): max number of collapsed junctions limitOutSJoneRead (Int, default=1000): max number of junctions for one read (including all multi-mappers) limitSjdbInsertNsj (Int, default=1000000): maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8); description : Number of cores to allocate for task; common : true outFilterIntronMotifs (String, default=\"None\"); description : filter alignment using their motifs; choices : {'None': 'no filtering', 'RemoveNoncanonical': 'filter out alignments that contain non-canonical junctions', 'RemoveNoncanonicalUnannotated': 'filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. 
The annotated non-canonical junctions will be kept.'}; common : true outFilterIntronStrands (String, default=\"RemoveInconsistentStrands\"); description : filter alignments; choices : {'None': 'no filtering', 'RemoveInconsistentStrands': 'remove alignments that have junctions with inconsistent strands'}; common : true outFilterMatchNmin (Int, default=0); description : alignment will be output only if the number of matched bases is higher than or equal to this value; common : true outFilterMatchNminOverLread (Float, default=0.66): same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for Paired-End reads) outFilterMismatchNmax (Int, default=10); description : alignment will be output only if it has no more mismatches than this value; common : true outFilterMismatchNoverLmax (Float, default=0.3): alignment will be output only if its ratio of mismatches to mapped length is less than or equal to this value outFilterMismatchNoverReadLmax (Float, default=1.0): alignment will be output only if its ratio of mismatches to read length is less than or equal to this value outFilterMultimapNmax (Int, default=20); description : maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as 'mapped to too many loci' in the Log.final.out. [STAR default] : 10 . 
[WDL default] : 20 .; common : true outFilterMultimapScoreRange (Int, default=1): the score range below the maximum score for multimapping alignments outFilterScoreMin (Int, default=0); description : alignment will be output only if its score is higher than or equal to this value; common : true outFilterScoreMinOverLread (Float, default=0.66): same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for Paired-End reads) outFilterType (String, default=\"Normal\"); description : type of filtering; choices : {'Normal': 'standard filtering using only current alignment', 'BySJout': 'keep only those reads that contain junctions that passed filtering into SJ.out.tab'}; common : true outQSconversionAdd (Int, default=0): add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31) outSAMattrIHstart (Int, default=1): start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. outSAMattributes (String, default=\"NH HI AS nM NM MD XS\"); description : a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. [STAR defaults] : NH HI AS nM . [WDL default] : NH HI AS nM NM MD XS .; choices : {'NH': 'number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.', 'HI': 'multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.', 'AS': 'local alignment score, +1/-1 for matches/mismateches, score penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag.', 'nM': 'number of mismatches. For PE reads, sum over two mates.', 'NM': 'edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.', 'MD': 'string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.', 'jM': 'intron motifs for all junctions (i.e. 
N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.', 'jI': 'start and end of introns for all junctions (1-based).', 'XS': 'alignment strand according to --outSAMstrandField.', 'MC': \"mate's CIGAR string. Standard SAM tag.\", 'ch': 'marks all segments of all chimeric alignments for --chimOutType WithinBAM output.', 'cN': \"number of bases clipped from the read ends: 5' and 3'\"}; common *: true outSAMflagAND (Int, default=65535): 0-65535 : sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. outSAMflagOR (Int, default=0): 0-65535 : sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. outSAMmapqUnique (Int, default=254): 0-255 : the MAPQ value for unique mappers. Please note the STAR default (255) produces errors downstream, as a MAPQ value of 255 is reserved to indicate a missing value. The default of this task is 254, which is the highest valid MAPQ value, and possibly what the author of STAR intended. [STAR default] : 255 . [WDL default] : 254 . 
outSAMorder (String, default=\"Paired\"); description : type of sorting for the SAM output; choices : {'Paired': 'one mate after the other for all paired alignments', 'PairedKeepInputOrder': 'one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files'} outSAMreadID (String, default=\"Standard\"); description : read ID record type; choices : {'Standard': 'first word (until space) from the FASTx read ID line, removing /1,/2 from the end', 'Number': 'read number (index) in the FASTx file'} outSAMstrandField (String, default=\"intronMotif\"); description : Cufflinks-like strand field flag; choices : {'None': 'not used', 'intronMotif': 'strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out.'}; common : true outSAMtlen (String, default=\"left_plus\"); description : calculation method for the TLEN field in the SAM/BAM files; choices : {'left_plus': 'leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate', 'left_any': 'leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from left_plus for overlapping mates with protruding ends'} outSAMunmapped (String, default=\"Within\"); description : output of unmapped reads in the SAM format.; choices : {'None': 'no output [STAR default] ', 'Within': 'output unmapped reads within the main SAM file (i.e. Aligned.out.sam) [WDL default] '} outSJfilterCountTotalMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. 
Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterCountUniqueMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterDistToOtherSJmin (SJ_Motifs, default={\"noncanonical_motifs\": 10, \"GT_AG_and_CT_AC_motif\": 0, \"GC_AG_and_CT_GC_motif\": 5, \"AT_AC_and_GT_AT_motif\": 10}): minimum allowed distance to other junctions' donor/acceptor for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. Does not apply to annotated junctions. outSJfilterIntronMaxVsReadN (Array[Int], default=[50000, 100000, 200000]): maximum gap allowed for junctions supported by 1,2,3,,,N reads. i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000b. by >=4 reads any gap <=alignIntronMax. Does not apply to annotated junctions. outSJfilterOverhangMin (SJ_Motifs, default={\"noncanonical_motifs\": 30, \"GT_AG_and_CT_AC_motif\": 12, \"GC_AG_and_CT_GC_motif\": 12, \"AT_AC_and_GT_AT_motif\": 12}): minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Does not apply to annotated junctions. 
outSJfilterReads (String, default=\"All\"); description : which reads to consider for collapsed splice junctions output; choices : {'All': 'all reads, unique- and multi-mappers', 'Unique': 'uniquely mapping reads only'}; common : true peOverlapMMp (Float, default=0.01): maximum proportion of mismatched bases in the overlap area peOverlapNbasesMin (Int, default=0): minimum number of overlap bases to trigger mates merging and realignment. Specify >0 value to switch on the 'merging of overlapping mates' algorithm. readMapNumber (Int, default=-1); description : number of reads to map from the beginning of the file. -1 to map all reads; common : true readNameSeparator (String, default=\"/\"): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) readQualityScoreBase (Int, default=33): number to be subtracted from the ASCII code to get Phred quality score read_two_fastqs_gz (Array[File], default=[]); description : An array of gzipped FASTQ files containing read two information; common : true runRNGseed (Int, default=777); description : random number generator seed; common : true scoreDelBase (Int, default=-2): deletion extension penalty per base (in addition to scoreDelOpen) scoreDelOpen (Int, default=-2): deletion open penalty scoreGap (Int, default=0): splice junction penalty (independent on intron motif) scoreGapATAC (Int, default=-8): AT/AC and GT/AT junction penalty (in addition to scoreGap) scoreGapGCAG (Int, default=-4): GC/AG and CT/GC junction penalty (in addition to scoreGap) scoreGapNoncan (Int, default=-8): non-canonical junction penalty (in addition to scoreGap) scoreGenomicLengthLog2scale (Float, default=-0.25): extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength) scoreInsBase (Int, default=-2): insertion extension penalty per base (in addition to scoreInsOpen) scoreInsOpen (Int, default=-2): insertion open penalty 
scoreStitchSJshift (Int, default=1): maximum score reduction while searching for SJ boundaries in the stitching step seedMapMin (Int, default=5): min length of seeds to be mapped seedMultimapNmax (Int, default=10000): only pieces that map fewer than this value are utilized in the stitching procedure seedNoneLociPerWindow (Int, default=10): max number of one seed loci per window seedPerReadNmax (Int, default=1000): max number of seeds per read seedPerWindowNmax (Int, default=50): max number of seeds per window seedSearchLmax (Int, default=0): defines the maximum length of the seeds, if =0 seed length is not limited seedSearchStartLmax (Int, default=50): defines the search start point through the read - the read is split into pieces no longer than this value seedSearchStartLmaxOverLread (Float, default=1.0): seedSearchStartLmax normalized to read length (sum of mates' lengths for Paired-End reads) seedSplitMin (Int, default=12): min length of the seed sequences split by Ns or mate gap sjdbScore (Int, default=2); description : extra alignment score for alignments that cross database junctions; common : true twopass1readsN (Int, default=-1); description : number of reads to process for the 1st step. Use default ( -1 ) to map all reads in the first step; common : true twopassMode (String, default=\"Basic\"); description : 2-pass mapping mode; choices : {'None': '1-pass mapping [STAR default] ', 'Basic': 'basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly [WDL default] '}; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true winAnchorDistNbins (Int, default=9): max number of bins between two anchors that allows aggregation of anchors into one window winAnchorMultimapNmax (Int, default=50): max number of loci anchors are allowed to map to winBinNbits (Int, default=16): =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins winFlankNbins (Int, default=4): =log2(winFlank), where winFlank is the size of the left and right flanking regions for each window Outputs star_log (File) star_bam (File) star_junctions (File) star_chimeric_junctions (File?)","title":"Star"},{"location":"tasks/star/#build_star_db","text":"description Runs STAR's build command to generate a STAR format reference for alignment outputs {'star_db': 'A gzipped TAR file containing the STAR reference files. Suitable as the star_db_tar_gz input to the alignment task.'}","title":"build_star_db"},{"location":"tasks/star/#inputs","text":"","title":"Inputs"},{"location":"tasks/star/#required","text":"_runtime (Any, required ) gtf (File, required ): GTF format feature file reference_fasta (File, required ): The FASTA format reference file for the genome","title":"Required"},{"location":"tasks/star/#defaults","text":"db_name (String, default=\"star_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true genomeChrBinNbits (Int, default=18): =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). genomeSAindexNbases (Int, default=14): length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. 
For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1) . genomeSAsparseD (Int, default=1): suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction. genomeSuffixLengthMax (Int, default=-1): maximum length of the suffixes, has to be longer than read length. -1 = infinite. memory_gb (Int, default=50): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8); description : Number of cores to allocate for task; common : true sjdbGTFchrPrefix (String, default=\"-\"); description : prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes); common : true sjdbGTFfeatureExon (String, default=\"exon\"): feature type in GTF file to be used as exons for building transcripts sjdbGTFtagExonParentGene (String, default=\"gene_id\"): GTF attribute name for parent gene ID sjdbGTFtagExonParentGeneName (String, default=\"gene_name\"): GTF attrbute name for parent gene name sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\"): GTF attrbute name for parent gene type sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\"): GTF attribute name for parent transcript ID sjdbOverhang (Int, default=125); description : length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1). [STAR default] : 100 . [WDL default] : 125 .; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/star/#outputs","text":"star_db (File)","title":"Outputs"},{"location":"tasks/star/#alignment","text":"description Runs the STAR aligner on a set of RNA-Seq FASTQ files external_help https://github.com/alexdobin/STAR/blob/2.7.11b/doc/STARmanual.pdf outputs {'star_log': 'Summary mapping statistics after mapping job is complete. The statistics are calculated for each read (Single- or Paired-End) and then summed or averaged over all reads. Note that STAR counts a Paired-End read as one read. Most of the information is collected about the UNIQUE mappers. Each splicing is counted in the numbers of splices, which would correspond to summing the counts in SJ.out.tab. The mismatch/indel error rates are calculated on a per base basis, i.e. as total number of mismatches/indels in all unique mappers divided by the total number of mapped bases.', 'star_bam': 'STAR aligned BAM', 'star_junctions': 'File contains high confidence collapsed splice junctions in tab-delimited format. Note that STAR defines the junction start/end as intronic bases, while many other software define them as exonic bases. See meta.external_help for file specification.', 'star_chimeric_junctions': 'Tab delimited file containing chimeric reads and associated metadata. See meta.external_help for file specification.'}","title":"alignment"},{"location":"tasks/star/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/star/#required_1","text":"_runtime (Any, required ) prefix (String, required ): Prefix for the BAM and other STAR files. The extensions .Aligned.out.bam , .Log.final.out , .SJ.out.tab , and .Chimeric.out.junction will be added. read_one_fastqs_gz (Array[File], required ): An array of gzipped FASTQ files containing read one information star_db_tar_gz (File, required ): A gzipped TAR file containing the STAR reference files. 
The name of the root directory which was archived must match the archive's filename without the .tar.gz extension.","title":"Required"},{"location":"tasks/star/#optional","text":"read_groups (String?): A string containing the read group information to output in the BAM file. If including multiple read group fields per-read group, they should be space delimited. Read groups should be comma separated, with a space on each side (i.e. ' , '). The ID field must come first for each read group and must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. Example: ID:rg1 PU:flowcell1.lane1 SM:sample1 PL:illumina LB:sample1_lib1 , ID:rg2 PU:flowcell1.lane2 SM:sample1 PL:illumina LB:sample1_lib1 . These two read groups could be associated with the following four FASTQs: sample1.rg1_R1.fastq,sample1.rg2_R1.fastq and sample1.rg1_R2.fastq,sample1.rg2_R2.fastq","title":"Optional"},{"location":"tasks/star/#defaults_1","text":"alignEndsProtrude (Pair[Int,String], default=(0, \"ConcordantPair\")); description : allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate. left : maximum number of protrusion bases allowed. 
right : see choices below.; choices : {'ConcordantPair': 'report alignments with non-zero protrusion as concordant pairs', 'DiscordantPair': 'report alignments with non-zero protrusion as discordant pairs'} alignEndsType (String, default=\"Local\"); description : type of read ends alignment; choices : {'Local': 'standard local alignment with soft-clipping allowed', 'EndToEnd': 'force end-to-end read alignment, do not soft-clip', 'Extend5pOfRead1': 'fully extend only the 5p of the read1, all other ends: local alignment', 'Extend5pOfReads12': 'fully extend only the 5p of the both read1 and read2, all other ends: local alignment'} alignInsertionFlush (String, default=\"None\"); description : how to flush ambiguous insertion positions; choices : {'None': 'insertions are not flushed', 'Right': 'insertions are flushed to the right'}; common : true alignIntronMax (Int, default=500000); description : maximum intron size, if 0, max intron size will be determined by (2^winBinNbits) winAnchorDistNbins. [STAR default] : 0 . [WDL default] : 500000 .; common *: true alignIntronMin (Int, default=21); description : minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion; common : true alignMatesGapMax (Int, default=1000000); description : maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits) winAnchorDistNbins. [STAR default] : 0 . [WDL default] : 1000000 ; common *: true alignSJDBoverhangMin (Int, default=1); description : minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments. [STAR default] : 3 . [WDL default] : 1 .; common : true alignSJoverhangMin (Int, default=5); description : minimum overhang (i.e. 
block size) for spliced alignments; common : true alignSJstitchMismatchNmax (SJ_Motifs, default={\"noncanonical_motifs\": 0, \"GT_AG_and_CT_AC_motif\": -1, \"GC_AG_and_CT_GC_motif\": 0, \"AT_AC_and_GT_AT_motif\": 0}): maximum number of mismatches for stitching of the splice junctions (-1: no limit) for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif alignSoftClipAtReferenceEnds (String, default=\"Yes\"); description : allow the soft-clipping of the alignments past the end of the chromosomes; choices : {'Yes': 'allow', 'No': 'prohibit, useful for compatibility with Cufflinks'}; common : true alignSplicedMateMapLmin (Int, default=0): minimum mapped length for a read mate that is spliced alignSplicedMateMapLminOverLmate (Float, default=0.66): alignSplicedMateMapLmin normalized to mate length alignTranscriptsPerReadNmax (Int, default=10000): max number of different alignments per read to consider alignTranscriptsPerWindowNmax (Int, default=100): max number of transcripts per window alignWindowsPerReadNmax (Int, default=10000): max number of windows per read chimFilter (String, default=\"banGenomicN\"); description : different filters for chimeric alignments; choices : {'None': 'no filtering', 'banGenomicN': 'Ns are not allowed in the genome sequence around the chimeric junction'} chimJunctionOverhangMin (Int, default=20); description : minimum overhang for a chimeric junction; common : true chimMainSegmentMultNmax (Int, default=10); description : maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.; common : true chimMultimapNmax (Int, default=0); description : maximum number of chimeric multi-alignments. 0 : use the old scheme for chimeric detection which only considered unique alignments; common : true chimMultimapScoreRange (Int, default=1): the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1. chimNonchimScoreDropMin (Int, default=20): to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value chimOutJunctionFormat (String, default=\"plain\"); description : formatting type for the Chimeric.out.junction file; choices : {'plain': 'no comment lines/headers', 'comments': 'comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping'}; common : true chimOutType (String, default=\"Junctions\"); description : type of chimeric output; choices : {'Junctions': 'Chimeric.out.junction', 'WithinBAM_HardClip': 'output into main aligned BAM files (Aligned. .bam). Hard-clipping in the CIGAR for supplemental chimeric alignments.', 'WithinBAM_SoftClip': 'output into main aligned BAM files (Aligned. .bam). Soft-clipping in the CIGAR for supplemental chimeric alignments.'}; common : true chimScoreDropMax (Int, default=20); description : max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length; common : true chimScoreJunctionNonGTAG (Int, default=-1): penalty for a non-GT/AG chimeric junction chimScoreMin (Int, default=0); description : minimum total (summed) score of the chimeric segments; common : true chimScoreSeparation (Int, default=10): minimum difference (separation) between the best chimeric score and the next one chimSegmentMin (Int, default=0); description : minimum length of chimeric segment length, if ==0, no chimeric output; common : true chimSegmentReadGapMax (Int, default=0); description : maximum gap in the read sequence between chimeric segments; common : true clip3pAdapterMMp (Pair[Float,Float], default=(0.1, 0.1)): max proportion of mismatches for 3p adapter clipping for each mate. left applies to read one and right applies to read two. 
clip3pAdapterSeq (Pair[String,String], default=(\"None\", \"None\")); description : adapter sequences to clip from 3p of each mate. left applies to read one and right applies to read two.; choices : {'None': 'No 3p adapter trimming will be performed', 'sequence': 'A nucleotide sequence string of any length, matching the regex /[ATCG]+/ ', 'polyA': 'polyA sequence with the length equal to read length'}; common : true clip3pAfterAdapterNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate after the adapter clipping. left applies to read one and right applies to read two. clip3pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate. left applies to read one and right applies to read two. clip5pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 5p of each mate. left applies to read one and right applies to read two. clipAdapterType (String, default=\"Hamming\"); description : adapter clipping type; choices : {'Hamming': 'adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp', 'CellRanger4': '5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin \u0160o\u0161i\u0107: https://github.com/Martinsos/opal', 'None': 'no adapter clipping, all other clip* parameters are disregarded'} limitOutSJcollapsed (Int, default=1000000): max number of collapsed junctions limitOutSJoneRead (Int, default=1000): max number of junctions for one read (including all multi-mappers) limitSjdbInsertNsj (Int, default=1000000): maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=8); description : Number of cores to allocate for task; common : true outFilterIntronMotifs (String, default=\"None\"); description : filter alignment using their motifs; choices : {'None': 'no filtering', 'RemoveNoncanonical': 'filter out alignments that contain non-canonical junctions', 'RemoveNoncanonicalUnannotated': 'filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.'}; common : true outFilterIntronStrands (String, default=\"RemoveInconsistentStrands\"); description : filter alignments; choices : {'None': 'no filtering', 'RemoveInconsistentStrands': 'remove alignments that have junctions with inconsistent strands'}; common : true outFilterMatchNmin (Int, default=0); description : alignment will be output only if the number of matched bases is higher than or equal to this value; common : true outFilterMatchNminOverLread (Float, default=0.66): same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for Paired-End reads) outFilterMismatchNmax (Int, default=10); description : alignment will be output only if it has no more mismatches than this value; common : true outFilterMismatchNoverLmax (Float, default=0.3): alignment will be output only if its ratio of mismatches to mapped length is less than or equal to this value outFilterMismatchNoverReadLmax (Float, default=1.0): alignment will be output only if its ratio of mismatches to read length is less than or equal to this value outFilterMultimapNmax (Int, default=20); description : maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as 'mapped to too many loci' in the Log.final.out. [STAR default] : 10 . 
[WDL default] : 20 .; common : true outFilterMultimapScoreRange (Int, default=1): the score range below the maximum score for multimapping alignments outFilterScoreMin (Int, default=0); description : alignment will be output only if its score is higher than or equal to this value; common : true outFilterScoreMinOverLread (Float, default=0.66): same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for Paired-End reads) outFilterType (String, default=\"Normal\"); description : type of filtering; choices : {'Normal': 'standard filtering using only current alignment', 'BySJout': 'keep only those reads that contain junctions that passed filtering into SJ.out.tab'}; common : true outQSconversionAdd (Int, default=0): add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31) outSAMattrIHstart (Int, default=1): start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. outSAMattributes (String, default=\"NH HI AS nM NM MD XS\"); description : a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. [STAR defaults] : NH HI AS nM . [WDL default] : NH HI AS nM NM MD XS .; choices : {'NH': 'number of loci the read maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.', 'HI': 'multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.', 'AS': 'local alignment score, +1/-1 for matches/mismatches, score penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag.', 'nM': 'number of mismatches. For PE reads, sum over two mates.', 'NM': 'edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.', 'MD': 'string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.', 'jM': 'intron motifs for all junctions (i.e. 
N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.', 'jI': 'start and end of introns for all junctions (1-based).', 'XS': 'alignment strand according to --outSAMstrandField.', 'MC': \"mate's CIGAR string. Standard SAM tag.\", 'ch': 'marks all segments of all chimeric alignments for --chimOutType WithinBAM output.', 'cN': \"number of bases clipped from the read ends: 5' and 3'\"}; common *: true outSAMflagAND (Int, default=65535): 0-65535 : sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. outSAMflagOR (Int, default=0): 0-65535 : sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. outSAMmapqUnique (Int, default=254): 0-255 : the MAPQ value for unique mappers. Please note the STAR default (255) produces errors downstream, as a MAPQ value of 255 is reserved to indicate a missing value. The default of this task is 254, which is the highest valid MAPQ value, and possibly what the author of STAR intended. [STAR default] : 255 . [WDL default] : 254 . 
outSAMorder (String, default=\"Paired\"); description : type of sorting for the SAM output; choices : {'Paired': 'one mate after the other for all paired alignments', 'PairedKeepInputOrder': 'one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files'} outSAMreadID (String, default=\"Standard\"); description : read ID record type; choices : {'Standard': 'first word (until space) from the FASTx read ID line, removing /1,/2 from the end', 'Number': 'read number (index) in the FASTx file'} outSAMstrandField (String, default=\"intronMotif\"); description : Cufflinks-like strand field flag; choices : {'None': 'not used', 'intronMotif': 'strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out.'}; common : true outSAMtlen (String, default=\"left_plus\"); description : calculation method for the TLEN field in the SAM/BAM files; choices : {'left_plus': 'leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate', 'left_any': 'leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from left_plus for overlapping mates with protruding ends'} outSAMunmapped (String, default=\"Within\"); description : output of unmapped reads in the SAM format.; choices : {'None': 'no output [STAR default] ', 'Within': 'output unmapped reads within the main SAM file (i.e. Aligned.out.sam) [WDL default] '} outSJfilterCountTotalMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. 
Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterCountUniqueMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterDistToOtherSJmin (SJ_Motifs, default={\"noncanonical_motifs\": 10, \"GT_AG_and_CT_AC_motif\": 0, \"GC_AG_and_CT_GC_motif\": 5, \"AT_AC_and_GT_AT_motif\": 10}): minimum allowed distance to other junctions' donor/acceptor for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. Does not apply to annotated junctions. outSJfilterIntronMaxVsReadN (Array[Int], default=[50000, 100000, 200000]): maximum gap allowed for junctions supported by 1,2,3,,,N reads. i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000b. by >=4 reads any gap <=alignIntronMax. Does not apply to annotated junctions. outSJfilterOverhangMin (SJ_Motifs, default={\"noncanonical_motifs\": 30, \"GT_AG_and_CT_AC_motif\": 12, \"GC_AG_and_CT_GC_motif\": 12, \"AT_AC_and_GT_AT_motif\": 12}): minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Does not apply to annotated junctions. 
outSJfilterReads (String, default=\"All\"); description : which reads to consider for collapsed splice junctions output; choices : {'All': 'all reads, unique- and multi-mappers', 'Unique': 'uniquely mapping reads only'}; common : true peOverlapMMp (Float, default=0.01): maximum proportion of mismatched bases in the overlap area peOverlapNbasesMin (Int, default=0): minimum number of overlap bases to trigger mates merging and realignment. Specify >0 value to switch on the 'merging of overlapping mates' algorithm. readMapNumber (Int, default=-1); description : number of reads to map from the beginning of the file. -1 to map all reads; common : true readNameSeparator (String, default=\"/\"): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) readQualityScoreBase (Int, default=33): number to be subtracted from the ASCII code to get Phred quality score read_two_fastqs_gz (Array[File], default=[]); description : An array of gzipped FASTQ files containing read two information; common : true runRNGseed (Int, default=777); description : random number generator seed; common : true scoreDelBase (Int, default=-2): deletion extension penalty per base (in addition to scoreDelOpen) scoreDelOpen (Int, default=-2): deletion open penalty scoreGap (Int, default=0): splice junction penalty (independent on intron motif) scoreGapATAC (Int, default=-8): AT/AC and GT/AT junction penalty (in addition to scoreGap) scoreGapGCAG (Int, default=-4): GC/AG and CT/GC junction penalty (in addition to scoreGap) scoreGapNoncan (Int, default=-8): non-canonical junction penalty (in addition to scoreGap) scoreGenomicLengthLog2scale (Float, default=-0.25): extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength) scoreInsBase (Int, default=-2): insertion extension penalty per base (in addition to scoreInsOpen) scoreInsOpen (Int, default=-2): insertion open penalty 
scoreStitchSJshift (Int, default=1): maximum score reduction while searching for SJ boundaries in the stitching step seedMapMin (Int, default=5): min length of seeds to be mapped seedMultimapNmax (Int, default=10000): only pieces that map fewer than this value are utilized in the stitching procedure seedNoneLociPerWindow (Int, default=10): max number of one seed loci per window seedPerReadNmax (Int, default=1000): max number of seeds per read seedPerWindowNmax (Int, default=50): max number of seeds per window seedSearchLmax (Int, default=0): defines the maximum length of the seeds, if =0 seed length is not limited seedSearchStartLmax (Int, default=50): defines the search start point through the read - the read is split into pieces no longer than this value seedSearchStartLmaxOverLread (Float, default=1.0): seedSearchStartLmax normalized to read length (sum of mates' lengths for Paired-End reads) seedSplitMin (Int, default=12): min length of the seed sequences split by Ns or mate gap sjdbScore (Int, default=2); description : extra alignment score for alignments that cross database junctions; common : true twopass1readsN (Int, default=-1); description : number of reads to process for the 1st step. Use default ( -1 ) to map all reads in the first step; common : true twopassMode (String, default=\"Basic\"); description : 2-pass mapping mode; choices : {'None': '1-pass mapping [STAR default] ', 'Basic': 'basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly [WDL default] '}; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true winAnchorDistNbins (Int, default=9): max number of bins between two anchors that allows aggregation of anchors into one window winAnchorMultimapNmax (Int, default=50): max number of loci anchors are allowed to map to winBinNbits (Int, default=16): =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins winFlankNbins (Int, default=4): =log2(winFlank), where winFlank is the size of the left and right flanking regions for each window","title":"Defaults"},{"location":"tasks/star/#outputs_1","text":"star_log (File) star_bam (File) star_junctions (File) star_chimeric_junctions (File?)","title":"Outputs"},{"location":"tasks/util/","text":"Utilities download description Uses wget to download a file from a remote URL to the local filesystem outputs {'downloaded_file': 'File downloaded from provided URL'} Inputs Required _runtime (Any, required ) disk_size_gb (Int, required ): Disk space to allocate for task, specified in GB outfile_name (String, required ): Name of the output file url (String, required ): URL of the file to download Optional md5sum (String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap. Outputs downloaded_file (File) get_read_groups description Gets read group information from a BAM file and writes it out as a string outputs {'read_groups': 'An array of strings containing read group information. If format_for_star = true , all found read groups are contained in one string ( read_groups[0] ). 
If format_for_star = false , each found @RG line will be its own entry in output array read_groups .'} Inputs Required _runtime (Any, required ) bam (File, required ); description : Input BAM format file to get read groups from; stream : true Defaults format_for_star (Boolean, default=true); description : Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? STAR formatted results will be an array of length 1, where all found read groups are contained in one string ( read_groups[0] ). If no processing is selected, each found @RG line will be its own entry in output array read_groups .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs read_groups (Array[String]) split_string description Split a string into an array of strings based on a delimiter outputs {'split_string': 'Split string as an array'} Inputs Required _runtime (Any, required ) input_string (String, required ): String to split on occurrences of delimiter Defaults delimiter (String, default=\" , \"); description : Delimiter on which to split input_string ; common : true Outputs split_strings (Array[String]) calc_gene_lengths description Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm help The non-overlapping exonic length algorithm can be implemented as the sum of each base covered by at least one exon; where each base is given a value of 1 regardless of how many exons overlap it. outputs {'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'} Inputs Required _runtime (Any, required ) gtf (File, required ): GTF feature file Defaults idattr (String, default=\"gene_name\"); description : GTF attribute to be used as feature ID. 
The value of this attribute will be used as the first column in the output file.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(gtf,\".gtf.gz\") + \".genelengths.txt\"): Name of the gene lengths file Outputs gene_lengths (File) compression_integrity description Checks the compression integrity of a bgzipped file outputs {'check': 'Dummy output to indicate success and to enable call-caching'} Inputs Required _runtime (Any, required ) bgzipped_file (File, required ): Input bgzipped file to check integrity of Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs check (String) add_to_bam_header description Adds another line of text to the bottom of a BAM header outputs {'reheadered_bam': 'The BAM after its header has been modified'} Inputs Required _runtime (Any, required ) additional_header (String, required ): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header. bam (File, required ): Input BAM format file which will have its header added to Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".reheader\"): Prefix for the reheadered BAM. The extension .bam will be added. Outputs reheadered_bam (File) unpack_tarball description Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored. 
outputs {'tarball_contents': 'An array of files found in the input tarball'} Inputs Required _runtime (Any, required ) tarball (File, required ): A .tar.gz archive to unpack into individual files Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs tarball_contents (Array[File]) make_coverage_regions_beds description Takes in a GTF file, converts it to BED, then filters it down to two 3 column BED files: one of only 'exons', one of only 'CDS' regions outputs {'bed': 'Input GTF converted into BED format using the gtf2bed program', 'exon_bed': \"3 column BED file corresponding to all 'exons' found in the input GTF\", 'CDS_bed': \"3 column BED file corresponding to all 'CDS' regions found in the input GTF\"} Inputs Required _runtime (Any, required ) gtf (File, required ): GTF feature file from which to derive coverage regions BED files Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs bed (File) exon_bed (File) CDS_bed (File) global_phred_scores description Calculates statistics about PHRED scores of the input BAM outputs {'phred_scores': 'Headered TSV file containing PHRED score statistics'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to calculate PHRED score statistics for Defaults fast_mode (Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)? modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added. 
Outputs phred_scores (File) qc_summary description [OUT OF DATE] This WDL task pulls out keys metrics that can provide a high level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive. Inputs Required _runtime (Any, required ) multiqc_tar_gz (File, required ): MultiQC report tarball from which to extract key metrics Defaults outfile_name (String, default=basename(multiqc_tar_gz,\".multiqc.tar.gz\") + \".qc_summary.json\"): Name for the JSON file Outputs summary (File) split_fastq description Splits a FASTQ into multiple files based on the number of reads per file outputs {'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'} Inputs Required _runtime (Any, required ) fastq (File, required ); description : Gzipped FASTQ file to split; stream : true Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the FASTQ file. The extension .fq.gz will be added. 
reads_per_file (Int, default=10000000): Number of reads to include in each output FASTQ file Outputs fastqs (Array[File])","title":"Utilities"},{"location":"tasks/util/#utilities","text":"","title":"Utilities"},{"location":"tasks/util/#download","text":"description Uses wget to download a file from a remote URL to the local filesystem outputs {'downloaded_file': 'File downloaded from provided URL'}","title":"download"},{"location":"tasks/util/#inputs","text":"","title":"Inputs"},{"location":"tasks/util/#required","text":"_runtime (Any, required ) disk_size_gb (Int, required ): Disk space to allocate for task, specified in GB outfile_name (String, required ): Name of the output file url (String, required ): URL of the file to download","title":"Required"},{"location":"tasks/util/#optional","text":"md5sum (String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap.","title":"Optional"},{"location":"tasks/util/#outputs","text":"downloaded_file (File)","title":"Outputs"},{"location":"tasks/util/#get_read_groups","text":"description Gets read group information from a BAM file and writes it out as a string outputs {'read_groups': 'An array of strings containing read group information. If format_for_star = true , all found read groups are contained in one string ( read_groups[0] ). If format_for_star = false , each found @RG line will be its own entry in output array read_groups .'}","title":"get_read_groups"},{"location":"tasks/util/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/util/#required_1","text":"_runtime (Any, required ) bam (File, required ); description : Input BAM format file to get read groups from; stream : true","title":"Required"},{"location":"tasks/util/#defaults","text":"format_for_star (Boolean, default=true); description : Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? 
STAR formatted results will be an array of length 1, where all found read groups are contained in one string ( read_groups[0] ). If no processing is selected, each found @RG line will be its own entry in output array read_groups .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_1","text":"read_groups (Array[String])","title":"Outputs"},{"location":"tasks/util/#split_string","text":"description Split a string into an array of strings based on a delimiter outputs {'split_string': 'Split string as an array'}","title":"split_string"},{"location":"tasks/util/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/util/#required_2","text":"_runtime (Any, required ) input_string (String, required ): String to split on occurences of delimiter","title":"Required"},{"location":"tasks/util/#defaults_1","text":"delimiter (String, default=\" , \"); description : Delimiter on which to split input_string ; common : true","title":"Defaults"},{"location":"tasks/util/#outputs_2","text":"split_strings (Array[String])","title":"Outputs"},{"location":"tasks/util/#calc_gene_lengths","text":"description Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm help The non-overlapping exonic length algorithm can be implemented as the sum of each base covered by at least one exon; where each base is given a value of 1 regardless of how many exons overlap it. 
outputs {'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'}","title":"calc_gene_lengths"},{"location":"tasks/util/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/util/#required_3","text":"_runtime (Any, required ) gtf (File, required ): GTF feature file","title":"Required"},{"location":"tasks/util/#defaults_2","text":"idattr (String, default=\"gene_name\"); description : GTF attribute to be used as feature ID. The value of this attribute will be used as the first column in the output file.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(gtf,\".gtf.gz\") + \".genelengths.txt\"): Name of the gene lengths file","title":"Defaults"},{"location":"tasks/util/#outputs_3","text":"gene_lengths (File)","title":"Outputs"},{"location":"tasks/util/#compression_integrity","text":"description Checks the compression integrity of a bgzipped file outputs {'check': 'Dummy output to indicate success and to enable call-caching'}","title":"compression_integrity"},{"location":"tasks/util/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/util/#required_4","text":"_runtime (Any, required ) bgzipped_file (File, required ): Input bgzipped file to check integrity of","title":"Required"},{"location":"tasks/util/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_4","text":"check (String)","title":"Outputs"},{"location":"tasks/util/#add_to_bam_header","text":"description Adds another line of text to the bottom of a BAM header outputs {'reheadered_bam': 'The BAM after its header has been modified'}","title":"add_to_bam_header"},{"location":"tasks/util/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/util/#required_5","text":"_runtime (Any, required ) additional_header (String, required ): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header. bam (File, required ): Input BAM format file which will have its header added to","title":"Required"},{"location":"tasks/util/#defaults_4","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".reheader\"): Prefix for the reheadered BAM. The extension .bam will be added.","title":"Defaults"},{"location":"tasks/util/#outputs_5","text":"reheadered_bam (File)","title":"Outputs"},{"location":"tasks/util/#unpack_tarball","text":"description Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored. outputs {'tarball_contents': 'An array of files found in the input tarball'}","title":"unpack_tarball"},{"location":"tasks/util/#inputs_6","text":"","title":"Inputs"},{"location":"tasks/util/#required_6","text":"_runtime (Any, required ) tarball (File, required ): A .tar.gz archive to unpack into individual files","title":"Required"},{"location":"tasks/util/#defaults_5","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_6","text":"tarball_contents (Array[File])","title":"Outputs"},{"location":"tasks/util/#make_coverage_regions_beds","text":"description Takes in a GTF file, converts it to BED, then filters it down to two 3 column BED files: one of only 'exons', one of only 'CDS' regions outputs {'bed': 'Input GTF converted into BED format using the gtf2bed program', 'exon_bed': \"3 column BED file corresponding to all 'exons' found in the input GTF\", 'CDS_bed': \"3 column BED file corresponding to all 'CDS' regions found in the input GTF\"}","title":"make_coverage_regions_beds"},{"location":"tasks/util/#inputs_7","text":"","title":"Inputs"},{"location":"tasks/util/#required_7","text":"_runtime (Any, required ) gtf (File, required ): GTF feature file from which to derive coverage regions BED files","title":"Required"},{"location":"tasks/util/#defaults_6","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_7","text":"bed (File) exon_bed (File) CDS_bed (File)","title":"Outputs"},{"location":"tasks/util/#global_phred_scores","text":"description Calculates statistics about PHRED scores of the input BAM outputs {'phred_scores': 'Headered TSV file containing PHRED score statistics'}","title":"global_phred_scores"},{"location":"tasks/util/#inputs_8","text":"","title":"Inputs"},{"location":"tasks/util/#required_8","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to calculate PHRED score statistics for","title":"Required"},{"location":"tasks/util/#defaults_7","text":"fast_mode (Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)? modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added.","title":"Defaults"},{"location":"tasks/util/#outputs_8","text":"phred_scores (File)","title":"Outputs"},{"location":"tasks/util/#qc_summary","text":"description [OUT OF DATE] This WDL task pulls out keys metrics that can provide a high level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive.","title":"qc_summary"},{"location":"tasks/util/#inputs_9","text":"","title":"Inputs"},{"location":"tasks/util/#required_9","text":"_runtime (Any, required ) multiqc_tar_gz (File, required ): MultiQC report tarball from which to extract key metrics","title":"Required"},{"location":"tasks/util/#defaults_8","text":"outfile_name (String, default=basename(multiqc_tar_gz,\".multiqc.tar.gz\") + \".qc_summary.json\"): Name for the JSON file","title":"Defaults"},{"location":"tasks/util/#outputs_9","text":"summary (File)","title":"Outputs"},{"location":"tasks/util/#split_fastq","text":"description Splits a FASTQ into multiple files based on the number of reads per file outputs {'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'}","title":"split_fastq"},{"location":"tasks/util/#inputs_10","text":"","title":"Inputs"},{"location":"tasks/util/#required_10","text":"_runtime (Any, required ) fastq (File, required ); description : Gzipped FASTQ file to split; stream : true","title":"Required"},{"location":"tasks/util/#defaults_9","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the FASTQ file. 
The extension .fq.gz will be added. reads_per_file (Int, default=10000000): Number of reads to include in each output FASTQ file","title":"Defaults"},{"location":"tasks/util/#outputs_10","text":"fastqs (Array[File])","title":"Outputs"},{"location":"workflows/10x-bam-to-fastqs/","text":"Cell Ranger BAM to FASTQs This WDL workflow converts an input BAM file to a set of FASTQ files. It performs QC checks along the way to validate the input and output. Output: read1s an array of files with the first read in the pair read2s an array of files with the second read in the pair fastqs an array of files sufficient for localizing in Cell Ranger's expected format fastqs_archive a compressed archive containing the array of FASTQ files LICENSING: MIT License Copyright 2020-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. cell_ranger_bam_to_fastqs Inputs Required bam (File, required ): BAM file to split into FASTQs. 
bamtofastq._runtime (Any, required ) fqlint._runtime (Any, required ) quickcheck._runtime (Any, required ) Defaults cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 use_all_cores (Boolean, default=false): Use all cores for multi-core steps? bamtofastq.memory_gb (Int, default=40) bamtofastq.modify_disk_size_gb (Int, default=0) bamtofastq.ncpu (Int, default=1) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0) Outputs fastqs (Array[File]) fastqs_archive (File) read1s (Array[File]) read2s (Array[File]) parse_input Inputs Required _runtime (Any, required ) cellranger11 (Boolean, required ): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, required ): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, required ): Convert a BAM produced by Longranger 2.0 Outputs input_check (String)","title":"Cell Ranger BAM to FASTQs"},{"location":"workflows/10x-bam-to-fastqs/#cell-ranger-bam-to-fastqs","text":"This WDL workflow converts an input BAM file to a set of FASTQ files. 
It performs QC checks along the way to validate the input and output.","title":"Cell Ranger BAM to FASTQs"},{"location":"workflows/10x-bam-to-fastqs/#output","text":"read1s an array of files with the first read in the pair read2s an array of files with the second read in the pair fastqs an array of files sufficient for localizing in Cell Ranger's expected format fastqs_archive a compressed archive containing the array of FASTQ files","title":"Output:"},{"location":"workflows/10x-bam-to-fastqs/#licensing","text":"","title":"LICENSING:"},{"location":"workflows/10x-bam-to-fastqs/#mit-license","text":"Copyright 2020-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"workflows/10x-bam-to-fastqs/#cell_ranger_bam_to_fastqs","text":"","title":"cell_ranger_bam_to_fastqs"},{"location":"workflows/10x-bam-to-fastqs/#inputs","text":"","title":"Inputs"},{"location":"workflows/10x-bam-to-fastqs/#required","text":"bam (File, required ): BAM file to split into FASTQs. bamtofastq._runtime (Any, required ) fqlint._runtime (Any, required ) quickcheck._runtime (Any, required )","title":"Required"},{"location":"workflows/10x-bam-to-fastqs/#defaults","text":"cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 use_all_cores (Boolean, default=false): Use all cores for multi-core steps? 
bamtofastq.memory_gb (Int, default=40) bamtofastq.modify_disk_size_gb (Int, default=0) bamtofastq.ncpu (Int, default=1) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/10x-bam-to-fastqs/#outputs","text":"fastqs (Array[File]) fastqs_archive (File) read1s (Array[File]) read2s (Array[File])","title":"Outputs"},{"location":"workflows/10x-bam-to-fastqs/#parse_input","text":"","title":"parse_input"},{"location":"workflows/10x-bam-to-fastqs/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/10x-bam-to-fastqs/#required_1","text":"_runtime (Any, required ) cellranger11 (Boolean, required ): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, required ): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, required ): Convert a BAM produced by Longranger 2.0","title":"Required"},{"location":"workflows/10x-bam-to-fastqs/#outputs_1","text":"input_check (String)","title":"Outputs"},{"location":"workflows/ESTIMATE/","text":"estimate description [DEPRECATED] Runs the ESTIMATE software package on a feature counts file external_help https://bioinformatics.mdanderson.org/estimate/ outputs {'tpm': 'Transcripts Per Million file', 'estimate_result': 'Final output of ESTIMATE'} deprecated true Inputs Required counts_file (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with htseq.wdl . 
gene_lengths_file (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with calc-gene-lengths.wdl . calc_tpm._runtime (Any, required ) run_estimate._runtime (Any, required ) Defaults calc_tpm.prefix (String, default=basename(counts,\".feature-counts.txt\")) run_estimate.disk_size_gb (Int, default=10) run_estimate.max_retries (Int, default=1) run_estimate.memory_gb (Int, default=4) run_estimate.outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\") Outputs tpm (File) estimate_result (File)","title":"ESTIMATE"},{"location":"workflows/ESTIMATE/#estimate","text":"description [DEPRECATED] Runs the ESTIMATE software package on a feature counts file external_help https://bioinformatics.mdanderson.org/estimate/ outputs {'tpm': 'Transcripts Per Million file', 'estimate_result': 'Final output of ESTIMATE'} deprecated true","title":"estimate"},{"location":"workflows/ESTIMATE/#inputs","text":"","title":"Inputs"},{"location":"workflows/ESTIMATE/#required","text":"counts_file (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with htseq.wdl . gene_lengths_file (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with calc-gene-lengths.wdl . 
calc_tpm._runtime (Any, required ) run_estimate._runtime (Any, required )","title":"Required"},{"location":"workflows/ESTIMATE/#defaults","text":"calc_tpm.prefix (String, default=basename(counts,\".feature-counts.txt\")) run_estimate.disk_size_gb (Int, default=10) run_estimate.max_retries (Int, default=1) run_estimate.memory_gb (Int, default=4) run_estimate.outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\")","title":"Defaults"},{"location":"workflows/ESTIMATE/#outputs","text":"tpm (File) estimate_result (File)","title":"Outputs"},{"location":"workflows/bam-to-fastqs/","text":"bam_to_fastqs description Converts an input BAM file to one or more FASTQ files, performing QC checks along the way outputs {'read1s': 'Array of FASTQ files corresponding to either first reads (if paired_end = true ) or all reads (if paired_end = false )', 'read2s': 'Array of FASTQ files corresponding to last reads (if paired_end = true )'} allowNestedInputs true Inputs Required bam (File, required ): BAM file to split into FASTQs bam_to_fastq._runtime (Any, required ) fqlint._runtime (Any, required ) quickcheck._runtime (Any, required ) split._runtime (Any, required ) Defaults paired_end (Boolean, default=true): Is the data Paired-End (true) or Single-End (false)? use_all_cores (Boolean, default=false): Use all cores for multi-core steps? 
bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastq.prefix (String, default=basename(bam,\".bam\")) bam_to_fastq.retain_collated_bam (Boolean, default=false) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0) split.modify_disk_size_gb (Int, default=0) split.ncpu (Int, default=2) split.prefix (String, default=basename(bam,\".bam\")) split.reject_unaccounted (Boolean, default=true) Outputs read1s (Array[File]) read2s (Array[File?])","title":"Bam to fastqs"},{"location":"workflows/bam-to-fastqs/#bam_to_fastqs","text":"description Converts an input BAM file to one or more FASTQ files, performing QC checks along the way outputs {'read1s': 'Array of FASTQ files corresponding to either first reads (if paired_end = true ) or all reads (if paired_end = false )', 'read2s': 'Array of FASTQ files corresponding to last reads (if paired_end = true )'} allowNestedInputs true","title":"bam_to_fastqs"},{"location":"workflows/bam-to-fastqs/#inputs","text":"","title":"Inputs"},{"location":"workflows/bam-to-fastqs/#required","text":"bam (File, required ): BAM file to split into FASTQs bam_to_fastq._runtime (Any, required ) fqlint._runtime (Any, 
required ) quickcheck._runtime (Any, required ) split._runtime (Any, required )","title":"Required"},{"location":"workflows/bam-to-fastqs/#defaults","text":"paired_end (Boolean, default=true): Is the data Paired-End (true) or Single-End (false)? use_all_cores (Boolean, default=false): Use all cores for multi-core steps? bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastq.prefix (String, default=basename(bam,\".bam\")) bam_to_fastq.retain_collated_bam (Boolean, default=false) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0) split.modify_disk_size_gb (Int, default=0) split.ncpu (Int, default=2) split.prefix (String, default=basename(bam,\".bam\")) split.reject_unaccounted (Boolean, default=true)","title":"Defaults"},{"location":"workflows/bam-to-fastqs/#outputs","text":"read1s (Array[File]) read2s (Array[File?])","title":"Outputs"},{"location":"workflows/bwa-db-build/","text":"bwa_db_build description Generates a set of genome reference files usable by the BWA aligner from an input reference file in FASTA format. 
outputs {'reference_fa': 'FASTA format reference file used to generate bwa_db_tar_gz ', 'bwa_db_tar_gz': 'Gzipped tar archive of the BWA reference files. Files are at the root of the archive.'} allowNestedInputs true Inputs Required reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from. build_bwa_db._runtime (Any, required ) reference_download._runtime (Any, required ) Optional reference_fa_md5 (String?): Expected md5sum of reference FASTA file Defaults reference_fa_disk_size_gb (Int, default=10): Disk size in GB to allocate for the reference FASTA file. build_bwa_db.db_name (String, default=\"bwa_db\") build_bwa_db.modify_disk_size_gb (Int, default=0) Outputs reference_fa (File) bwa_db_tar_gz (File)","title":"Bwa db build"},{"location":"workflows/bwa-db-build/#bwa_db_build","text":"description Generates a set of genome reference files usable by the BWA aligner from an input reference file in FASTA format. outputs {'reference_fa': 'FASTA format reference file used to generate bwa_db_tar_gz ', 'bwa_db_tar_gz': 'Gzipped tar archive of the BWA reference files. Files are at the root of the archive.'} allowNestedInputs true","title":"bwa_db_build"},{"location":"workflows/bwa-db-build/#inputs","text":"","title":"Inputs"},{"location":"workflows/bwa-db-build/#required","text":"reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from. build_bwa_db._runtime (Any, required ) reference_download._runtime (Any, required )","title":"Required"},{"location":"workflows/bwa-db-build/#optional","text":"reference_fa_md5 (String?): Expected md5sum of reference FASTA file","title":"Optional"},{"location":"workflows/bwa-db-build/#defaults","text":"reference_fa_disk_size_gb (Int, default=10): Disk size in GB to allocate for the reference FASTA file. 
build_bwa_db.db_name (String, default=\"bwa_db\") build_bwa_db.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/bwa-db-build/#outputs","text":"reference_fa (File) bwa_db_tar_gz (File)","title":"Outputs"},{"location":"workflows/dnaseq-core/","text":"WARNING: this workflow is experimental! Use at your own risk! dnaseq_core_experimental description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true Inputs Required bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ): An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a pair of FASTQ files. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz . Only the ID field is required, and it must be unique for each read group defined. See data_structures/read_group.wdl for help formatting your input JSON. read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align ReadGroup_to_string._runtime (Any, required ) bwa_aln_pe._runtime (Any, required ) bwa_mem._runtime (Any, required ) index._runtime (Any, required ) read_ones._runtime (Any, required ) read_twos._runtime (Any, required ) sort._runtime (Any, required ) rg_merge.basic_merge._runtime (Any, required ) rg_merge.final_merge._runtime (Any, required ) rg_merge.inner_merge._runtime (Any, required ) Optional rg_merge.basic_merge.new_header (File?) 
rg_merge.final_merge.new_header (File?) rg_merge.inner_merge.new_header (File?) Defaults aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. bwa_aln_pe.modify_disk_size_gb (Int, default=0) bwa_aln_pe.ncpu (Int, default=4) bwa_mem.modify_disk_size_gb (Int, default=0) bwa_mem.ncpu (Int, default=4) index.modify_disk_size_gb (Int, default=0) index.ncpu (Int, default=2) index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. read_ones.modify_disk_size_gb (Int, default=0) read_ones.ncpu (Int, default=2) read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_twos.modify_disk_size_gb (Int, default=0) read_twos.ncpu (Int, default=2) read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. rg_merge.max_length (Int, default=100) sort.memory_gb (Int, default=25) sort.modify_disk_size_gb (Int, default=0) sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. 
sort.sort_order (String, default=\"coordinate\") sort.validation_stringency (String, default=\"SILENT\") rg_merge.basic_merge.combine_rg (Boolean, default=true) rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) rg_merge.basic_merge.name_sorted (Boolean, default=false) rg_merge.basic_merge.ncpu (Int, default=2) rg_merge.basic_merge.region (String, default=\"\") rg_merge.final_merge.modify_disk_size_gb (Int, default=0) rg_merge.final_merge.name_sorted (Boolean, default=false) rg_merge.final_merge.ncpu (Int, default=2) rg_merge.final_merge.region (String, default=\"\") rg_merge.inner_merge.combine_rg (Boolean, default=true) rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) rg_merge.inner_merge.name_sorted (Boolean, default=false) rg_merge.inner_merge.ncpu (Int, default=2) rg_merge.inner_merge.region (String, default=\"\") Outputs harmonized_bam (File) harmonized_bam_index (File)","title":"Dnaseq core"},{"location":"workflows/dnaseq-core/#dnaseq_core_experimental","text":"description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true","title":"dnaseq_core_experimental"},{"location":"workflows/dnaseq-core/#inputs","text":"","title":"Inputs"},{"location":"workflows/dnaseq-core/#required","text":"bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ): An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a pair of FASTQ files. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz . 
Only the ID field is required, and it must be unique for each read group defined. See data_structures/read_group.wdl for help formatting your input JSON. read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align ReadGroup_to_string._runtime (Any, required ) bwa_aln_pe._runtime (Any, required ) bwa_mem._runtime (Any, required ) index._runtime (Any, required ) read_ones._runtime (Any, required ) read_twos._runtime (Any, required ) sort._runtime (Any, required ) rg_merge.basic_merge._runtime (Any, required ) rg_merge.final_merge._runtime (Any, required ) rg_merge.inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/dnaseq-core/#optional","text":"rg_merge.basic_merge.new_header (File?) rg_merge.final_merge.new_header (File?) rg_merge.inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/dnaseq-core/#defaults","text":"aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. bwa_aln_pe.modify_disk_size_gb (Int, default=0) bwa_aln_pe.ncpu (Int, default=4) bwa_mem.modify_disk_size_gb (Int, default=0) bwa_mem.ncpu (Int, default=4) index.modify_disk_size_gb (Int, default=0) index.ncpu (Int, default=2) index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. read_ones.modify_disk_size_gb (Int, default=0) read_ones.ncpu (Int, default=2) read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
read_twos.modify_disk_size_gb (Int, default=0) read_twos.ncpu (Int, default=2) read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. rg_merge.max_length (Int, default=100) sort.memory_gb (Int, default=25) sort.modify_disk_size_gb (Int, default=0) sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. sort.sort_order (String, default=\"coordinate\") sort.validation_stringency (String, default=\"SILENT\") rg_merge.basic_merge.combine_rg (Boolean, default=true) rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) rg_merge.basic_merge.name_sorted (Boolean, default=false) rg_merge.basic_merge.ncpu (Int, default=2) rg_merge.basic_merge.region (String, default=\"\") rg_merge.final_merge.modify_disk_size_gb (Int, default=0) rg_merge.final_merge.name_sorted (Boolean, default=false) rg_merge.final_merge.ncpu (Int, default=2) rg_merge.final_merge.region (String, default=\"\") rg_merge.inner_merge.combine_rg (Boolean, default=true) rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) rg_merge.inner_merge.name_sorted (Boolean, default=false) rg_merge.inner_merge.ncpu (Int, default=2) rg_merge.inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/dnaseq-core/#outputs","text":"harmonized_bam (File) harmonized_bam_index (File)","title":"Outputs"},{"location":"workflows/dnaseq-standard-fastq/","text":"WARNING: this workflow is experimental! Use at your own risk! dnaseq_standard_fastq_experimental description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true Inputs Required bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. 
The extension .bam will be added. read_groups (Array[ReadGroup], required ); description : An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz if non-zero. Only the ID field is required, and it must be unique for each read group defined. See data_structures/read_group.wdl for help formatting your input JSON.; external_help : https://samtools.github.io/hts-specs/SAMv1.pdf read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align fqlint._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required ) Optional dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?) 
Defaults aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") subsample.modify_disk_size_gb (Int, default=0) subsample.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. subsample.probability (Float, default=1.0) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\") Outputs harmonized_bam (File) harmonized_bam_index (File) parse_input description Parses and validates the 
dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching'} Inputs Required _runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln'] array_lengths (Array[Int], required ) Outputs check (String)","title":"Dnaseq standard fastq"},{"location":"workflows/dnaseq-standard-fastq/#dnaseq_standard_fastq_experimental","text":"description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true","title":"dnaseq_standard_fastq_experimental"},{"location":"workflows/dnaseq-standard-fastq/#inputs","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard-fastq/#required","text":"bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ); description : An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz if non-zero. Only the ID field is required, and it must be unique for each read group defined. 
See data_structures/read_group.wdl for help formatting your input JSON.; external_help : https://samtools.github.io/hts-specs/SAMv1.pdf read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align fqlint._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/dnaseq-standard-fastq/#optional","text":"dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/dnaseq-standard-fastq/#defaults","text":"aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") subsample.modify_disk_size_gb (Int, default=0) subsample.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. subsample.probability (Float, default=1.0) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/dnaseq-standard-fastq/#outputs","text":"harmonized_bam (File) harmonized_bam_index (File)","title":"Outputs"},{"location":"workflows/dnaseq-standard-fastq/#parse_input","text":"description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable 
call-caching'}","title":"parse_input"},{"location":"workflows/dnaseq-standard-fastq/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard-fastq/#required_1","text":"_runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln'] array_lengths (Array[Int], required )","title":"Required"},{"location":"workflows/dnaseq-standard-fastq/#outputs_1","text":"check (String)","title":"Outputs"},{"location":"workflows/dnaseq-standard/","text":"WARNING: this workflow is experimental! Use at your own risk! dnaseq_standard_experimental description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true Inputs Required bam (File, required ): Input BAM to realign bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. get_ReadGroups._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) validate_input_bam._runtime (Any, required ) bam_to_fastqs.bam_to_fastq._runtime (Any, required ) bam_to_fastqs.fqlint._runtime (Any, required ) bam_to_fastqs.quickcheck._runtime (Any, required ) bam_to_fastqs.split._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required ) Optional 
validate_input_bam.reference_fasta (File?) dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?) Defaults aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? dnaseq_core_experimental.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. get_ReadGroups.modify_disk_size_gb (Int, default=0) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode (Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") bam_to_fastqs.bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastqs.bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastqs.bam_to_fastq.collated (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastqs.bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.ncpu (Int, default=2) bam_to_fastqs.bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastqs.bam_to_fastq.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
bam_to_fastqs.bam_to_fastq.retain_collated_bam (Boolean, default=false) bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") bam_to_fastqs.fqlint.panic (Boolean, default=true) bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.ncpu (Int, default=2) bam_to_fastqs.split.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. bam_to_fastqs.split.reject_unaccounted (Boolean, default=true) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\") Outputs harmonized_bam (File) harmonized_bam_index (File) parse_input description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching'} Inputs Required _runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln'] Outputs check (String)","title":"Dnaseq 
standard"},{"location":"workflows/dnaseq-standard/#dnaseq_standard_experimental","text":"description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true","title":"dnaseq_standard_experimental"},{"location":"workflows/dnaseq-standard/#inputs","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard/#required","text":"bam (File, required ): Input BAM to realign bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. get_ReadGroups._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) validate_input_bam._runtime (Any, required ) bam_to_fastqs.bam_to_fastq._runtime (Any, required ) bam_to_fastqs.fqlint._runtime (Any, required ) bam_to_fastqs.quickcheck._runtime (Any, required ) bam_to_fastqs.split._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/dnaseq-standard/#optional","text":"validate_input_bam.reference_fasta (File?) dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) 
dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/dnaseq-standard/#defaults","text":"aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? dnaseq_core_experimental.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. get_ReadGroups.modify_disk_size_gb (Int, default=0) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode (Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") bam_to_fastqs.bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastqs.bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastqs.bam_to_fastq.collated (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastqs.bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.ncpu (Int, default=2) bam_to_fastqs.bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastqs.bam_to_fastq.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
bam_to_fastqs.bam_to_fastq.retain_collated_bam (Boolean, default=false) bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") bam_to_fastqs.fqlint.panic (Boolean, default=true) bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.ncpu (Int, default=2) bam_to_fastqs.split.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. bam_to_fastqs.split.reject_unaccounted (Boolean, default=true) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/dnaseq-standard/#outputs","text":"harmonized_bam (File) harmonized_bam_index (File)","title":"Outputs"},{"location":"workflows/dnaseq-standard/#parse_input","text":"description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable 
call-caching'}","title":"parse_input"},{"location":"workflows/dnaseq-standard/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard/#required_1","text":"_runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln']","title":"Required"},{"location":"workflows/dnaseq-standard/#outputs_1","text":"check (String)","title":"Outputs"},{"location":"workflows/flag_filter/","text":"FlagFilter A struct to represent the filtering flags used in various samtools commands. The order of precedence is include_if_all , exclude_if_any , include_if_any , and exclude_if_all . These four fields correspond to the samtools flags -f , -F , --rf , and -G respectively. The values of these fields are strings that represent a 12bit bitwise flag. These strings must evaluate to an integer less than 4096 (2^12). They can be in octal, decimal, or hexadecimal format. Please see the meta.help of validate_string_is_12bit_oct_dec_or_hex for more information on the valid formats. The validate_FlagFilter workflow can be used to validate a FlagFilter struct. WARNING The validate_FlagFilter workflow will only check that all the fields can be parsed as integers less than 4096. It will not check if the flags are sensible input to samtools fastq . samtools fastq also employs very little error checking on the flags. So it is possible to pass in flags that produce nonsensical output. For example, it is possible to pass in flags that produce no output. Please exhibit caution while modifying any default values of a FlagFilter . We suggest using the Broad Institute's SAM flag explainer to construct the flags. Find it here . Example input JSON { \"flags\": { \"include_if_all\": \"0x3\", \"exclude_if_any\": \"0xF04\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\" } } Explanation The above example JSON represents a FlagFilter struct being passed to parameter named flags . 
The include_if_all field is set to 0x3 which is 3 in decimal. The exclude_if_any field is set to 0xF04 which is 3844 in decimal. The include_if_any field is set to 0x0 which is 0 in decimal. The exclude_if_all field is set to 0x0 which is 0 in decimal. 3 in decimal can be represented as 000000000011 in 12bit binary. This number means that to be included a read must have the 1st and 2nd bits set. Those bits correspond to the read paired and read mapped in proper pair flags. 3844 in decimal can be represented as 111100000100 in 12bit binary. This number means that to be excluded a read must have any of the 3rd, 9th, 10th, 11th, or 12th bits set. We won't go through what all those bits mean here, but you can find the meanings of the bits in the SAM flag explainer . In short, those are all flags corresponding to the quality of the read and them being true may indicate that the read is of low quality and should be excluded. validate_FlagFilter description Validates a FlagFilter struct. output {'check': 'Dummy output to enable caching.'} Inputs Required flags (FlagFilter, required ): FlagFilter struct to validate validate_exclude_if_all._runtime (Any, required ) validate_exclude_if_any._runtime (Any, required ) validate_include_if_all._runtime (Any, required ) validate_include_if_any._runtime (Any, required ) Outputs check (String) validate_string_is_12bit_oct_dec_or_hex description Validates that a string is an octal, decimal, or hexadecimal number and less than 2^12. help Hexadecimal numbers must be prefixed with '0x' and only contain the characters [0-9A-F] to be valid (i.e. [a-f] is not allowed). Octal numbers must start with '0' and only contain the characters [0-7] to be valid. And decimal numbers must start with a digit between 1-9 and only contain the characters [0-9] to be valid. outputs {'check': 'Dummy output to enable caching.'} Inputs Required _runtime (Any, required ) number (String, required ): The number to validate. See task meta.help for accepted formats. 
Outputs check (String)","title":"FlagFilter"},{"location":"workflows/flag_filter/#flagfilter","text":"A struct to represent the filtering flags used in various samtools commands. The order of precedence is include_if_all , exclude_if_any , include_if_any , and exclude_if_all . These four fields correspond to the samtools flags -f , -F , --rf , and -G respectively. The values of these fields are strings that represent a 12bit bitwise flag. These strings must evaluate to an integer less than 4096 (2^12). They can be in octal, decimal, or hexadecimal format. Please see the meta.help of validate_string_is_12bit_oct_dec_or_hex for more information on the valid formats. The validate_FlagFilter workflow can be used to validate a FlagFilter struct. WARNING The validate_FlagFilter workflow will only check that all the fields can be parsed as integers less than 4096. It will not check if the flags are sensible input to samtools fastq . samtools fastq also employs very little error checking on the flags. So it is possible to pass in flags that produce nonsensical output. For example, it is possible to pass in flags that produce no output. Please exhibit caution while modifying any default values of a FlagFilter . We suggest using the Broad Institute's SAM flag explainer to construct the flags. Find it here .","title":"FlagFilter"},{"location":"workflows/flag_filter/#example-input-json","text":"{ \"flags\": { \"include_if_all\": \"0x3\", \"exclude_if_any\": \"0xF04\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\" } }","title":"Example input JSON"},{"location":"workflows/flag_filter/#explanation","text":"The above example JSON represents a FlagFilter struct being passed to parameter named flags . The include_if_all field is set to 0x3 which is 3 in decimal. The exclude_if_any field is set to 0xF04 which is 3844 in decimal. The include_if_any field is set to 0x0 which is 0 in decimal. The exclude_if_all field is set to 0x0 which is 0 in decimal. 
3 in decimal can be represented as 000000000011 in 12bit binary. This number means that to be included a read must have the 1st and 2nd bits set. Those bits correspond to the read paired and read mapped in proper pair flags. 3844 in decimal can be represented as 111100000100 in 12bit binary. This number means that to be excluded a read must have any of the 3rd, 9th, 10th, 11th, or 12th bits set. We won't go through what all those bits mean here, but you can find the meanings of the bits in the SAM flag explainer . In short, those are all flags corresponding to the quality of the read and them being true may indicate that the read is of low quality and should be excluded.","title":"Explanation"},{"location":"workflows/flag_filter/#validate_flagfilter","text":"description Validates a FlagFilter struct. output {'check': 'Dummy output to enable caching.'}","title":"validate_FlagFilter"},{"location":"workflows/flag_filter/#inputs","text":"","title":"Inputs"},{"location":"workflows/flag_filter/#required","text":"flags (FlagFilter, required ): FlagFilter struct to validate validate_exclude_if_all._runtime (Any, required ) validate_exclude_if_any._runtime (Any, required ) validate_include_if_all._runtime (Any, required ) validate_include_if_any._runtime (Any, required )","title":"Required"},{"location":"workflows/flag_filter/#outputs","text":"check (String)","title":"Outputs"},{"location":"workflows/flag_filter/#validate_string_is_12bit_oct_dec_or_hex","text":"description Validates that a string is an octal, decimal, or hexadecimal number and less than 2^12. help Hexadecimal numbers must be prefixed with '0x' and only contain the characters [0-9A-F] to be valid (i.e. [a-f] is not allowed). Octal numbers must start with '0' and only contain the characters [0-7] to be valid. And decimal numbers must start with a digit between 1-9 and only contain the characters [0-9] to be valid. 
outputs {'check': 'Dummy output to enable caching.'}","title":"validate_string_is_12bit_oct_dec_or_hex"},{"location":"workflows/flag_filter/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/flag_filter/#required_1","text":"_runtime (Any, required ) number (String, required ): The number to validate. See task meta.help for accepted formats.","title":"Required"},{"location":"workflows/flag_filter/#outputs_1","text":"check (String)","title":"Outputs"},{"location":"workflows/gatk-reference/","text":"gatk_reference description Fetches reference files for GATK. outputs {'fasta': 'FASTA file for the reference genome.', 'fasta_index': 'Index for the FASTA file for the reference genome.', 'fasta_dict': 'Sequence dictionary for the reference genome.', 'dbSNP_vcf': 'dbSNP VCF file for the reference genome.', 'dbSNP_vcf_index': 'Index for the dbSNP VCF file for the reference genome.', 'interval_list': 'List of intervals that will be used when computing variants.', 'knownVCFs': 'VCF files with known variants to use with variant calling.'} allowNestedInputs true Inputs Required dbSNP_vcf_name (String, required ): Name of the dbSNP VCF file. dbSNP_vcf_url (String, required ): URL from which to retrieve the dbSNP VCF file. knownVCF_names (Array[String], required ): Names of the VCF files with known variants. Order should match that of knownVCF_urls . knownVCF_urls (Array[String], required ): URLs from which to retrieve VCF files with known variants. reference_fa_md5 (String, required ): MD5 checksum for the reference FASTA file. reference_fa_name (String, required ): Name of the output reference FASTA file. reference_fa_url (String, required ): URL from which to retrieve the reference FASTA file. 
create_sequence_dictionary._runtime (Any, required ) dbsnp._runtime (Any, required ) dbsnp.disk_size_gb (Int, required ) dbsnp_index._runtime (Any, required ) dbsnp_index.disk_size_gb (Int, required ) faidx._runtime (Any, required ) fasta_download._runtime (Any, required ) fasta_download.disk_size_gb (Int, required ) intervals._runtime (Any, required ) intervals.disk_size_gb (Int, required ) knownVCF._runtime (Any, required ) knownVCF.disk_size_gb (Int, required ) Optional dbSNP_vcf_index_name (String?): Name of the index for the dbSNP VCF file. dbSNP_vcf_index_url (String?): URL from which to retrieve the index for the dbSNP VCF file. interval_list_name (String?): Name of the list of intervals to use when computing variants. interval_list_url (String?): URL from which to retrieve the list of intervals to use when computing variants. create_sequence_dictionary.assembly_name (String?) create_sequence_dictionary.fasta_url (String?) create_sequence_dictionary.species (String?) dbsnp.md5sum (String?) dbsnp_index.md5sum (String?) intervals.md5sum (String?) knownVCF.md5sum (String?) Defaults create_sequence_dictionary.memory_gb (Int, default=16) create_sequence_dictionary.modify_disk_size_gb (Int, default=0) create_sequence_dictionary.outfile_name (String, default=basename(fasta,\".fa\") + \".dict\") faidx.modify_disk_size_gb (Int, default=0) faidx.use_all_cores (Boolean, default=false) Outputs fasta (File) fasta_index (File) fasta_dict (File) dbSNP_vcf (File?) dbSNP_vcf_index (File?) interval_list (File?) knownVCFs (Array[File])","title":"Gatk reference"},{"location":"workflows/gatk-reference/#gatk_reference","text":"description Fetches reference files for GATK. 
outputs {'fasta': 'FASTA file for the reference genome.', 'fasta_index': 'Index for the FASTA file for the reference genome.', 'fasta_dict': 'Sequence dictionary for the reference genome.', 'dbSNP_vcf': 'dbSNP VCF file for the reference genome.', 'dbSNP_vcf_index': 'Index for the dbSNP VCF file for the reference genome.', 'interval_list': 'List of intervals that will be used when computing variants.', 'knownVCFs': 'VCF files with known variants to use with variant calling.'} allowNestedInputs true","title":"gatk_reference"},{"location":"workflows/gatk-reference/#inputs","text":"","title":"Inputs"},{"location":"workflows/gatk-reference/#required","text":"dbSNP_vcf_name (String, required ): Name of the dbSNP VCF file. dbSNP_vcf_url (String, required ): URL from which to retrieve the dbSNP VCF file. knownVCF_names (Array[String], required ): Names of the VCF files with known variants. Order should match that of knownVCF_urls . knownVCF_urls (Array[String], required ): URLs from which to retrieve VCF files with known variants. reference_fa_md5 (String, required ): MD5 checksum for the reference FASTA file. reference_fa_name (String, required ): Name of the output reference FASTA file. reference_fa_url (String, required ): URL from which to retrieve the reference FASTA file. create_sequence_dictionary._runtime (Any, required ) dbsnp._runtime (Any, required ) dbsnp.disk_size_gb (Int, required ) dbsnp_index._runtime (Any, required ) dbsnp_index.disk_size_gb (Int, required ) faidx._runtime (Any, required ) fasta_download._runtime (Any, required ) fasta_download.disk_size_gb (Int, required ) intervals._runtime (Any, required ) intervals.disk_size_gb (Int, required ) knownVCF._runtime (Any, required ) knownVCF.disk_size_gb (Int, required )","title":"Required"},{"location":"workflows/gatk-reference/#optional","text":"dbSNP_vcf_index_name (String?): Name of the index for the dbSNP VCF file. 
dbSNP_vcf_index_url (String?): URL from which to retrieve the index for the dbSNP VCF file. interval_list_name (String?): Name of the list of intervals to use when computing variants. interval_list_url (String?): URL from which to retrieve the list of intervals to use when computing variants. create_sequence_dictionary.assembly_name (String?) create_sequence_dictionary.fasta_url (String?) create_sequence_dictionary.species (String?) dbsnp.md5sum (String?) dbsnp_index.md5sum (String?) intervals.md5sum (String?) knownVCF.md5sum (String?)","title":"Optional"},{"location":"workflows/gatk-reference/#defaults","text":"create_sequence_dictionary.memory_gb (Int, default=16) create_sequence_dictionary.modify_disk_size_gb (Int, default=0) create_sequence_dictionary.outfile_name (String, default=basename(fasta,\".fa\") + \".dict\") faidx.modify_disk_size_gb (Int, default=0) faidx.use_all_cores (Boolean, default=false)","title":"Defaults"},{"location":"workflows/gatk-reference/#outputs","text":"fasta (File) fasta_index (File) fasta_dict (File) dbSNP_vcf (File?) dbSNP_vcf_index (File?) interval_list (File?) knownVCFs (Array[File])","title":"Outputs"},{"location":"workflows/make-qc-reference/","text":"make_qc_reference description Downloads and creates all reference files needed to run the quality_check workflow outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'exon_bed': '3 column BED file defining the regions of the exome. Derived from gtf .', 'CDS_bed': '3 column BED file defining the regions of the coding domain. 
Derived from gtf .', 'kraken_db': 'A complete Kraken2 database'} allowNestedInputs true Inputs Required gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from create_library_from_fastas._runtime (Any, required ) download_library._runtime (Any, required ) download_taxonomy._runtime (Any, required ) fastas_download._runtime (Any, required ) gtf_download._runtime (Any, required ) kraken_build_db._runtime (Any, required ) make_coverage_regions_beds._runtime (Any, required ) reference_download._runtime (Any, required ) Optional fastas_download.md5sum (String?) gtf_download.md5sum (String?) reference_download.md5sum (String?) Defaults gtf_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the GTF file kraken_fasta_urls (Array[String], default=[]): URLs for any additional FASTA files in NCBI format to download and include in the Kraken2 database. This allows the addition of individual genomes (or other sequences) of interest. kraken_fastas (Array[File], default=[]): Array of gzipped FASTA files. Each sequence's ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid kraken_fastas_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the FASTA files kraken_libraries (Array[String], default=[\"archaea\", \"bacteria\", \"plasmid\", \"viral\", \"human\", \"fungi\", \"protozoa\", \"UniVec_Core\"]); description : List of kraken libraries to download; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'UniVec', 'UniVec_Core'] protein (Boolean, default=false): Construct a protein database? 
reference_fa_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the reference FASTA file create_library_from_fastas.modify_disk_size_gb (Int, default=0) download_library.modify_disk_size_gb (Int, default=0) kraken_build_db.db_name (String, default=\"kraken2_db\") kraken_build_db.kmer_len (Int, default=if protein then 15 else 35) kraken_build_db.max_db_size_gb (Int, default=-1) kraken_build_db.minimizer_len (Int, default=if protein then 12 else 31) kraken_build_db.minimizer_spaces (Int, default=if protein then 0 else 7) kraken_build_db.modify_disk_size_gb (Int, default=0) kraken_build_db.modify_memory_gb (Int, default=0) kraken_build_db.ncpu (Int, default=4) kraken_build_db.use_all_cores (Boolean, default=false) make_coverage_regions_beds.modify_disk_size_gb (Int, default=0) Outputs reference_fa (File) gtf (File) exon_bed (File) CDS_bed (File) kraken_db (File)","title":"Make qc reference"},{"location":"workflows/make-qc-reference/#make_qc_reference","text":"description Downloads and creates all reference files needed to run the quality_check workflow outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'exon_bed': '3 column BED file defining the regions of the exome. Derived from gtf .', 'CDS_bed': '3 column BED file defining the regions of the coding domain. 
Derived from gtf .', 'kraken_db': 'A complete Kraken2 database'} allowNestedInputs true","title":"make_qc_reference"},{"location":"workflows/make-qc-reference/#inputs","text":"","title":"Inputs"},{"location":"workflows/make-qc-reference/#required","text":"gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from create_library_from_fastas._runtime (Any, required ) download_library._runtime (Any, required ) download_taxonomy._runtime (Any, required ) fastas_download._runtime (Any, required ) gtf_download._runtime (Any, required ) kraken_build_db._runtime (Any, required ) make_coverage_regions_beds._runtime (Any, required ) reference_download._runtime (Any, required )","title":"Required"},{"location":"workflows/make-qc-reference/#optional","text":"fastas_download.md5sum (String?) gtf_download.md5sum (String?) reference_download.md5sum (String?)","title":"Optional"},{"location":"workflows/make-qc-reference/#defaults","text":"gtf_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the GTF file kraken_fasta_urls (Array[String], default=[]): URLs for any additional FASTA files in NCBI format to download and include in the Kraken2 database. This allows the addition of individual genomes (or other sequences) of interest. kraken_fastas (Array[File], default=[]): Array of gzipped FASTA files. 
Each sequence's ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid kraken_fastas_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the FASTA files kraken_libraries (Array[String], default=[\"archaea\", \"bacteria\", \"plasmid\", \"viral\", \"human\", \"fungi\", \"protozoa\", \"UniVec_Core\"]); description : List of kraken libraries to download; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'UniVec', 'UniVec_Core'] protein (Boolean, default=false): Construct a protein database? reference_fa_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the reference FASTA file create_library_from_fastas.modify_disk_size_gb (Int, default=0) download_library.modify_disk_size_gb (Int, default=0) kraken_build_db.db_name (String, default=\"kraken2_db\") kraken_build_db.kmer_len (Int, default=if protein then 15 else 35) kraken_build_db.max_db_size_gb (Int, default=-1) kraken_build_db.minimizer_len (Int, default=if protein then 12 else 31) kraken_build_db.minimizer_spaces (Int, default=if protein then 0 else 7) kraken_build_db.modify_disk_size_gb (Int, default=0) kraken_build_db.modify_memory_gb (Int, default=0) kraken_build_db.ncpu (Int, default=4) kraken_build_db.use_all_cores (Boolean, default=false) make_coverage_regions_beds.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/make-qc-reference/#outputs","text":"reference_fa (File) gtf (File) exon_bed (File) CDS_bed (File) kraken_db (File)","title":"Outputs"},{"location":"workflows/markdups-post/","text":"MarkDuplicates Post An investigation of all our QC tools was conducted when duplicate marking was introduced to our pipeline. Most tools do not take into consideration whether a read is a duplicate or not. But the tasks called below produce different results depending on whether the input BAM has been duplicate marked or not. 
markdups_post description Runs QC analyses which are impacted by duplicate marking outputs {'insert_size_metrics': ' *.txt output file of picard collectInsertSizeMetrics ', 'insert_size_metrics_pdf': ' *.pdf output file of picard collectInsertSizeMetrics ', 'flagstat_report': ' samtools flagstat report', 'mosdepth_global_summary': 'Summary of whole genome coverage produced by mosdepth ', 'mosdepth_global_dist': 'Distribution of whole genome coverage produced by mosdepth ', 'mosdepth_region_summary': 'Summaries of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth ', 'mosdepth_region_dist': 'Distributions of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth '} allowNestedInputs true Inputs Required markdups_bam (File, required ): Input BAM format file to quality check. Duplicates being marked is not necessary for a successful run of this workflow. markdups_bam_index (File, required ): BAM index file corresponding to the input BAM collect_insert_size_metrics._runtime (Any, required ) flagstat._runtime (Any, required ) regions_coverage._runtime (Any, required ) wg_coverage._runtime (Any, required ) Optional wg_coverage.coverage_bed (File?) Defaults coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. 
prefix (String, default=basename(markdups_bam,\".bam\")): Prefix for all results files collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true) Outputs insert_size_metrics (File) insert_size_metrics_pdf (File) flagstat_report (File) mosdepth_global_summary (File) mosdepth_global_dist (File) mosdepth_region_summary (Array[File]) mosdepth_region_dist (Array[File?])","title":"MarkDuplicates Post"},{"location":"workflows/markdups-post/#markduplicates-post","text":"An investigation of all our QC tools was conducted when duplicate marking was introduced to our pipeline. Most tools do not take into consideration whether a read is a duplicate or not. 
But the tasks called below produce different results depending on whether the input BAM has been duplicate marked or not.","title":"MarkDuplicates Post"},{"location":"workflows/markdups-post/#markdups_post","text":"description Runs QC analyses which are impacted by duplicate marking outputs {'insert_size_metrics': ' *.txt output file of picard collectInsertSizeMetrics ', 'insert_size_metrics_pdf': ' *.pdf output file of picard collectInsertSizeMetrics ', 'flagstat_report': ' samtools flagstat report', 'mosdepth_global_summary': 'Summary of whole genome coverage produced by mosdepth ', 'mosdepth_global_dist': 'Distribution of whole genome coverage produced by mosdepth ', 'mosdepth_region_summary': 'Summaries of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth ', 'mosdepth_region_dist': 'Distributions of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth '} allowNestedInputs true","title":"markdups_post"},{"location":"workflows/markdups-post/#inputs","text":"","title":"Inputs"},{"location":"workflows/markdups-post/#required","text":"markdups_bam (File, required ): Input BAM format file to quality check. Duplicates being marked is not necessary for a successful run of this workflow. 
markdups_bam_index (File, required ): BAM index file corresponding to the input BAM collect_insert_size_metrics._runtime (Any, required ) flagstat._runtime (Any, required ) regions_coverage._runtime (Any, required ) wg_coverage._runtime (Any, required )","title":"Required"},{"location":"workflows/markdups-post/#optional","text":"wg_coverage.coverage_bed (File?)","title":"Optional"},{"location":"workflows/markdups-post/#defaults","text":"coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. prefix (String, default=basename(markdups_bam,\".bam\")): Prefix for all results files collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true)","title":"Defaults"},{"location":"workflows/markdups-post/#outputs","text":"insert_size_metrics (File) insert_size_metrics_pdf (File) flagstat_report (File) mosdepth_global_summary (File) mosdepth_global_dist (File) mosdepth_region_summary (Array[File]) mosdepth_region_dist (Array[File?])","title":"Outputs"},{"location":"workflows/quality-check-standard/","text":"quality_check description Performs comprehensive quality checks, aggregating all 
analyses and metrics into a final MultiQC report. help Assumes that input BAM is position-sorted. external_help https://multiqc.info/ outputs {'bam_checksum': 'STDOUT of the md5sum command run on the input BAM that has been redirected to a file', 'validate_sam_file': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'flagstat_report': ' samtools flagstat STDOUT redirected to a file. If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'fastqc_results': 'A gzipped tar archive of all FastQC output files', 'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file', 'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file', 'inferred_encoding': 'TSV file containing the ngsderive encoding report for the input BAM file', 'alignment_metrics': {'description': 'The text file output of picard CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of picard CollectAlignmentSummaryMetrics ', 'insert_size_metrics': {'description': 'The text file output of picard CollectInsertSizeMetrics . If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of picard CollectInsertSizeMetrics . 
If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'quality_score_distribution_txt': 'The text file output of picard QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of picard QualityScoreDistribution ', 'phred_scores': 'Headered TSV file containing PHRED score statistics', 'kraken_report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'mosdepth_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. It does this for each chromosome, and for the whole genome.', 'mosdepth_global_summary': 'A summary of mean depths per chromosome', 'mosdepth_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file.', 'mosdepth_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file.', 'multiqc_report': 'A gzipped tar archive of all MultiQC output files', 'orig_read_count': 'A TSV report containing the original read count before subsampling. Only present if subsample_n_reads > 0 .', 'kraken_sequences': {'description': 'Detailed Kraken2 output that has been gzipped. Only present if store_kraken_sequences == true .', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}, 'comparative_kraken_report': 'Kraken2 summary report for only the alternatively filtered reads. 
Only present if run_comparative_kraken == true .', 'comparative_kraken_sequences': 'Detailed Kraken2 output for only the alternatively filtered reads. Only present if run_comparative_kraken == true && store_kraken_sequences == true .', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates . Only present if mark_duplicates == true && optical_distance > 0 .', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}, 'mosdepth_dups_marked_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. It does this for each chromosome, and for the whole genome. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_global_summary': 'A summary of mean depths per chromosome. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'inferred_strandedness': 'TSV file containing the ngsderive strandedness report. 
Only present if rna == true .', 'qualimap_rnaseq_results': 'Gzipped tar archive of all QualiMap output files. Only present if rna == true .', 'junction_summary': 'TSV file containing the ngsderive junction-annotation summary. Only present if rna == true ', 'junctions': 'TSV file containing a detailed list of annotated junctions. Only present if rna == true .', 'librarian_report': 'A tar archive containing the librarian report and raw data. Only present if run_librarian == true .', 'IntermediateFiles': 'Any and all files produced as intermediate during pipeline processing. Only output if output_intermediate_files == true .'} allowNestedInputs true Inputs Required bam (File, required ): Input BAM format file to quality check bam_index (File, required ): BAM index file corresponding to the input BAM kraken_db (File, required ): Kraken2 database. Can be generated with ../reference/make-qc-reference.wdl . Must be a tarball without a root directory. alt_filtered_fastq._runtime (Any, required ) alt_filtered_fqlint._runtime (Any, required ) bam_to_fastq._runtime (Any, required ) collect_alignment_summary_metrics._runtime (Any, required ) collect_insert_size_metrics._runtime (Any, required ) comparative_kraken._runtime (Any, required ) compression_integrity._runtime (Any, required ) compute_checksum._runtime (Any, required ) encoding._runtime (Any, required ) endedness._runtime (Any, required ) fastqc._runtime (Any, required ) flagstat._runtime (Any, required ) fqlint._runtime (Any, required ) global_phred_scores._runtime (Any, required ) instrument._runtime (Any, required ) junction_annotation._runtime (Any, required ) kraken._runtime (Any, required ) librarian._runtime (Any, required ) markdups._runtime (Any, required ) multiqc._runtime (Any, required ) parse_input._runtime (Any, required ) qualimap_rnaseq._runtime (Any, required ) quality_score_distribution._runtime (Any, required ) quickcheck._runtime (Any, required ) read_length._runtime (Any, required ) 
regions_coverage._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) subsample_index._runtime (Any, required ) validate_bam._runtime (Any, required ) wg_coverage._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) comparative_kraken_filter_validator.validate_include_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_include_if_any._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) kraken_filter_validator.validate_include_if_all._runtime (Any, required ) kraken_filter_validator.validate_include_if_any._runtime (Any, required ) markdups_post.collect_insert_size_metrics._runtime (Any, required ) markdups_post.flagstat._runtime (Any, required ) markdups_post.regions_coverage._runtime (Any, required ) markdups_post.wg_coverage._runtime (Any, required ) Optional gtf (File?): GTF features file. Gzipped or uncompressed. Required for RNA-Seq data. validate_bam.reference_fasta (File?) wg_coverage.coverage_bed (File?) markdups_post.wg_coverage.coverage_bed (File?) Defaults comparative_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x904\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while performing a second FASTQ conversion, before running Kraken2 another time. This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove unmapped, secondary, and supplementary reads from the created FASTQs. WARNING These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. 
coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions. Any regional analysis enabled by this option is in addition to whole genome coverage, which is calculated regardless of this setting. An exon BED and a Coding Sequence BED are examples of regions you may wish to restrict coverage analysis to. Those two BEDs can be created with the workflow in ../reference/make-qc-reference.wdl . coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. If using the BEDs created by ../reference/make-qc-reference.wdl , the labels [\"exon\", \"CDS\"] are appropriate. Make sure to provide the coverage BEDs in the same order as the labels. extra_multiqc_inputs (Array[File], default=[]): An array of additional files to pass directly into MultiQC mark_duplicates (Boolean, default=rna): Mark duplicates before select analyses? Default behavior is to set this to the value of the rna parameter. This is because DNA files are often duplicate marked already, and RNA-Seq files are usually not duplicate marked. If set to true , a BAM will be generated and passed to selected downstream analyses. For more details about what analyses are run, review ./markdups-post.wdl . WARNING, this duplicate marked BAM is not output by default. If you would like to output this file, set output_intermediate_files = true . multiqc_config (File, default=\"https://raw.githubusercontent.com/stjudecloud/workflows/main/workflows/qc/inputs/multiqc_config_hg38.yaml\"): YAML file for configuring MultiQC optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates instead of library duplicates (e.g. PCR duplicates). If mark_duplicates == false , this parameter is ignored. 
If 0 , then optical duplicate marking is disabled and only traditional duplicate marking will be performed. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without a custom regex for tile-data extraction. Review the mark_duplicates task in ../../tools/picard.wdl for more information. output_intermediate_files (Boolean, default=false): Output intermediate files? FASTQs; if rna == true a collated BAM; if mark_duplicates == true a duplicate marked BAM, various accessory files like indexes and md5sums; if subsampling was requested and performed then a sampled BAM and associated index. WARNING, these files can be large. prefix (String, default=basename(bam,\".bam\")): Prefix for all results files rna (Boolean, default=false): Is the sequenced molecule RNA? Enabling this option adds RNA-Seq specific analyses to the workflow. If true , a GTF file must be provided. If false , the GTF file is ignored. run_comparative_kraken (Boolean, default=false): Run Kraken2 a second time with different FASTQ filtering? If true , comparative_filter is used in a second run of BAM->FASTQ conversion, resulting in differently filtered FASTQs analyzed by Kraken2. If false , comparative_filter is ignored. run_librarian (Boolean, default=rna); description : Run the librarian tool to generate a report of the likely Illumina library prep kit used to generate the data. WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Omnibus). 
This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. By default, this tool is run when rna == true .; external_help : https://f1000research.com/articles/11-1122/v2 standard_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while converting to FASTQ, before running Kraken2 and librarian (if run_librarian == true ). This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the created FASTQs. WARNING: These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. WARNING: If you have set run_librarian to true , we strongly recommend leaving this filter at the default value. librarian is trained on a specific set of reads, and changing this filter may produce nonsensical results. store_kraken_sequences (Boolean, default=false): Store the Kraken2 sequences output? This will apply to all runs of Kraken2 (see parameter_meta.run_comparative_kraken ). WARNING these files can be very large. subsample_n_reads (Int, default=-1): Only process a random sampling of approximately n reads. Use any n <= 0 to process the entire input. Subsampling is done probabilistically so the exact number of reads in the output will have some variation. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
alt_filtered_fastq.append_read_number (Boolean, default=true) alt_filtered_fastq.collated (Boolean, default=false) alt_filtered_fastq.fail_on_unexpected_reads (Boolean, default=false) alt_filtered_fastq.modify_disk_size_gb (Int, default=0) alt_filtered_fastq.modify_memory_gb (Int, default=0) alt_filtered_fastq.ncpu (Int, default=2) alt_filtered_fastq.output_singletons (Boolean, default=false) alt_filtered_fqlint.disable_validator_codes (Array[String], default=[]) alt_filtered_fqlint.modify_disk_size_gb (Int, default=0) alt_filtered_fqlint.modify_memory_gb (Int, default=0) alt_filtered_fqlint.paired_read_validation_level (String, default=\"high\") alt_filtered_fqlint.panic (Boolean, default=true) alt_filtered_fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) collect_alignment_summary_metrics.memory_gb (Int, default=8) collect_alignment_summary_metrics.modify_disk_size_gb (Int, default=0) collect_alignment_summary_metrics.validation_stringency (String, default=\"SILENT\") collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") comparative_kraken.min_base_quality (Int, default=0) comparative_kraken.modify_disk_size_gb (Int, default=0) comparative_kraken.modify_memory_gb (Int, default=0) comparative_kraken.ncpu (Int, default=4) comparative_kraken.use_names (Boolean, default=true) compression_integrity.modify_disk_size_gb (Int, default=0) compute_checksum.modify_disk_size_gb (Int, default=0) encoding.modify_disk_size_gb (Int, default=0) endedness.calc_rpt (Boolean, 
default=false) endedness.modify_disk_size_gb (Int, default=0) endedness.modify_memory_gb (Int, default=0) endedness.num_reads (Int, default=-1) endedness.paired_deviance (Float, default=0.0) endedness.round_rpt (Boolean, default=false) endedness.split_by_rg (Boolean, default=false) fastqc.modify_disk_size_gb (Int, default=0) fastqc.ncpu (Int, default=4) flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") global_phred_scores.fast_mode (Boolean, default=true) global_phred_scores.modify_disk_size_gb (Int, default=0) instrument.modify_disk_size_gb (Int, default=0) instrument.num_reads (Int, default=10000) junction_annotation.fuzzy_junction_match_range (Int, default=0) junction_annotation.min_intron (Int, default=50) junction_annotation.min_mapq (Int, default=30) junction_annotation.min_reads (Int, default=2) junction_annotation.modify_disk_size_gb (Int, default=0) kraken.min_base_quality (Int, default=0) kraken.modify_disk_size_gb (Int, default=0) kraken.modify_memory_gb (Int, default=0) kraken.ncpu (Int, default=4) kraken.use_names (Boolean, default=true) librarian.modify_disk_size_gb (Int, default=0) librarian.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Prefix for all results files markdups.clear_dt (Boolean, default=true) markdups.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") markdups.modify_disk_size_gb (Int, default=0) markdups.modify_memory_gb (Int, default=0) markdups.read_name_regex (String, 
default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") markdups.remove_duplicates (Boolean, default=false) markdups.remove_sequencing_duplicates (Boolean, default=false) markdups.tagging_policy (String, default=\"All\") markdups.validation_stringency (String, default=\"SILENT\") multiqc.modify_disk_size_gb (Int, default=0) qualimap_rnaseq.memory_gb (Int, default=16) qualimap_rnaseq.modify_disk_size_gb (Int, default=0) quality_score_distribution.memory_gb (Int, default=8) quality_score_distribution.modify_disk_size_gb (Int, default=0) quality_score_distribution.validation_stringency (String, default=\"SILENT\") quickcheck.modify_disk_size_gb (Int, default=0) read_length.majority_vote_cutoff (Float, default=0.7) read_length.modify_disk_size_gb (Int, default=0) read_length.num_reads (Int, default=-1) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample_index.modify_disk_size_gb (Int, default=0) subsample_index.ncpu (Int, default=2) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.validation_stringency (String, default=\"LENIENT\") wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true) markdups_post.collect_insert_size_metrics.memory_gb (Int, default=8) 
markdups_post.collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) markdups_post.collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") markdups_post.flagstat.modify_disk_size_gb (Int, default=0) markdups_post.flagstat.ncpu (Int, default=2) markdups_post.flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. markdups_post.regions_coverage.min_mapping_quality (Int, default=20) markdups_post.regions_coverage.modify_disk_size_gb (Int, default=0) markdups_post.regions_coverage.use_fast_mode (Boolean, default=true) markdups_post.wg_coverage.min_mapping_quality (Int, default=20) markdups_post.wg_coverage.modify_disk_size_gb (Int, default=0) markdups_post.wg_coverage.use_fast_mode (Boolean, default=true) Outputs bam_checksum (File) validate_sam_file (File) flagstat_report (File) fastqc_results (File) instrument_file (File) read_length_file (File) inferred_encoding (File) inferred_endedness (File) alignment_metrics (File) alignment_metrics_pdf (File) insert_size_metrics (File) insert_size_metrics_pdf (File) quality_score_distribution_txt (File) quality_score_distribution_pdf (File) phred_scores (File) kraken_report (File) mosdepth_global_dist (File) mosdepth_global_summary (File) mosdepth_region_dist (Array[File]) mosdepth_region_summary (Array[File]) multiqc_report (File) orig_read_count (File?) kraken_sequences (File?) comparative_kraken_report (File?) comparative_kraken_sequences (File?) mosdepth_dups_marked_global_dist (File?) mosdepth_dups_marked_global_summary (File?) mosdepth_dups_marked_region_summary (Array[File]?) mosdepth_dups_marked_region_dist (Array[File?]?) mark_duplicates_metrics (File?) inferred_strandedness (File?) qualimap_rnaseq_results (File?) junction_summary (File?) junctions (File?) librarian_report (File?) intermediate_files (IntermediateFiles?) 
parse_input description Parses and validates the quality_check workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching', 'labels': 'An array of labels to use on the result coverage files associated with each coverage BED'} Inputs Required _runtime (Any, required ) coverage_beds_len (Int, required ): Length of the provided coverage_beds array coverage_labels (Array[String], required ): An array of equal length to coverage_beds_len which determines the prefix label applied to coverage output files. If an empty array is supplied, defaults of regions1 , regions2 , etc. will be used. gtf_provided (Boolean, required ): Was a GTF supplied by the user? Must be true if rna == true . rna (Boolean, required ): Is the sequenced molecule RNA? Outputs labels (Array[String])","title":"Quality check standard"},{"location":"workflows/quality-check-standard/#quality_check","text":"description Performs comprehensive quality checks, aggregating all analyses and metrics into a final MultiQC report. help Assumes that input BAM is position-sorted. external_help https://multiqc.info/ outputs {'bam_checksum': 'STDOUT of the md5sum command run on the input BAM that has been redirected to a file', 'validate_sam_file': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'flagstat_report': ' samtools flagstat STDOUT redirected to a file. 
If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'fastqc_results': 'A gzipped tar archive of all FastQC output files', 'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file', 'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file', 'inferred_encoding': 'TSV file containing the ngsderive encoding report for the input BAM file', 'alignment_metrics': {'description': 'The text file output of picard CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of picard CollectAlignmentSummaryMetrics ', 'insert_size_metrics': {'description': 'The text file output of picard CollectInsertSizeMetrics . If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of picard CollectInsertSizeMetrics . If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'quality_score_distribution_txt': 'The text file output of picard QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of picard QualityScoreDistribution ', 'phred_scores': 'Headered TSV file containing PHRED score statistics', 'kraken_report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'mosdepth_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. 
It does this for each chromosome, and for the whole genome.', 'mosdepth_global_summary': 'A summary of mean depths per chromosome', 'mosdepth_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file.', 'mosdepth_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file.', 'multiqc_report': 'A gzipped tar archive of all MultiQC output files', 'orig_read_count': 'A TSV report containing the original read count before subsampling. Only present if subsample_n_reads > 0 .', 'kraken_sequences': {'description': 'Detailed Kraken2 output that has been gzipped. Only present if store_kraken_sequences == true .', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}, 'comparative_kraken_report': 'Kraken2 summary report for only the alternatively filtered reads. Only present if run_comparative_kraken == true .', 'comparative_kraken_sequences': 'Detailed Kraken2 output for only the alternatively filtered reads. Only present if run_comparative_kraken == true && store_kraken_sequences == true .', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates . Only present if mark_duplicates == true && optical_distance > 0 .', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}, 'mosdepth_dups_marked_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. It does this for each chromosome, and for the whole genome. 
This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_global_summary': 'A summary of mean depths per chromosome. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'inferred_strandedness': 'TSV file containing the ngsderive strandedness report. Only present if rna == true .', 'qualimap_rnaseq_results': 'Gzipped tar archive of all QualiMap output files. Only present if rna == true .', 'junction_summary': 'TSV file containing the ngsderive junction-annotation summary. Only present if rna == true ', 'junctions': 'TSV file containing a detailed list of annotated junctions. Only present if rna == true .', 'librarian_report': 'A tar archive containing the librarian report and raw data. Only present if run_librarian == true .', 'IntermediateFiles': 'Any and all files produced as intermediate during pipeline processing. 
Only output if output_intermediate_files == true .'} allowNestedInputs true","title":"quality_check"},{"location":"workflows/quality-check-standard/#inputs","text":"","title":"Inputs"},{"location":"workflows/quality-check-standard/#required","text":"bam (File, required ): Input BAM format file to quality check bam_index (File, required ): BAM index file corresponding to the input BAM kraken_db (File, required ): Kraken2 database. Can be generated with ../reference/make-qc-reference.wdl . Must be a tarball without a root directory. alt_filtered_fastq._runtime (Any, required ) alt_filtered_fqlint._runtime (Any, required ) bam_to_fastq._runtime (Any, required ) collect_alignment_summary_metrics._runtime (Any, required ) collect_insert_size_metrics._runtime (Any, required ) comparative_kraken._runtime (Any, required ) compression_integrity._runtime (Any, required ) compute_checksum._runtime (Any, required ) encoding._runtime (Any, required ) endedness._runtime (Any, required ) fastqc._runtime (Any, required ) flagstat._runtime (Any, required ) fqlint._runtime (Any, required ) global_phred_scores._runtime (Any, required ) instrument._runtime (Any, required ) junction_annotation._runtime (Any, required ) kraken._runtime (Any, required ) librarian._runtime (Any, required ) markdups._runtime (Any, required ) multiqc._runtime (Any, required ) parse_input._runtime (Any, required ) qualimap_rnaseq._runtime (Any, required ) quality_score_distribution._runtime (Any, required ) quickcheck._runtime (Any, required ) read_length._runtime (Any, required ) regions_coverage._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) subsample_index._runtime (Any, required ) validate_bam._runtime (Any, required ) wg_coverage._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) 
comparative_kraken_filter_validator.validate_include_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_include_if_any._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) kraken_filter_validator.validate_include_if_all._runtime (Any, required ) kraken_filter_validator.validate_include_if_any._runtime (Any, required ) markdups_post.collect_insert_size_metrics._runtime (Any, required ) markdups_post.flagstat._runtime (Any, required ) markdups_post.regions_coverage._runtime (Any, required ) markdups_post.wg_coverage._runtime (Any, required )","title":"Required"},{"location":"workflows/quality-check-standard/#optional","text":"gtf (File?): GTF features file. Gzipped or uncompressed. Required for RNA-Seq data. validate_bam.reference_fasta (File?) wg_coverage.coverage_bed (File?) markdups_post.wg_coverage.coverage_bed (File?)","title":"Optional"},{"location":"workflows/quality-check-standard/#defaults","text":"comparative_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x904\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while performing a second FASTQ conversion, before running Kraken2 another time. This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove unmapped, secondary, and supplementary reads from the created FASTQs. WARNING These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions. Any regional analysis enabled by this option is in addition to whole genome coverage, which is calculated regardless of this setting. 
An exon BED and a Coding Sequence BED are examples of regions you may wish to restrict coverage analysis to. Those two BEDs can be created with the workflow in ../reference/make-qc-reference.wdl . coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. If using the BEDs created by ../reference/make-qc-reference.wdl , the labels [\"exon\", \"CDS\"] are appropriate. Make sure to provide the coverage BEDs in the same order as the labels. extra_multiqc_inputs (Array[File], default=[]): An array of additional files to pass directly into MultiQC mark_duplicates (Boolean, default=rna): Mark duplicates before select analyses? Default behavior is to set this to the value of the rna parameter. This is because DNA files are often duplicate marked already, and RNA-Seq files are usually not duplicate marked. If set to true , a BAM will be generated and passed to selected downstream analyses. For more details about what analyses are run, review ./markdups-post.wdl . WARNING, this duplicate marked BAM is not output by default. If you would like to output this file, set output_intermediate_files = true . multiqc_config (File, default=\"https://raw.githubusercontent.com/stjudecloud/workflows/main/workflows/qc/inputs/multiqc_config_hg38.yaml\"): YAML file for configuring MultiQC optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates instead of library duplicates (e.g. PCR duplicates). If mark_duplicates == false , this parameter is ignored. If 0 , then optical duplicate marking is disabled and only traditional duplicate marking will be performed. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). 
Calculation of distance depends on coordinate data embedded in the read names, typically produced by Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without a custom regex for tile-data extraction. Review the mark_duplicates task in ../../tools/picard.wdl for more information. output_intermediate_files (Boolean, default=false): Output intermediate files? FASTQs; if rna == true a collated BAM; if mark_duplicates == true a duplicate marked BAM, various accessory files like indexes and md5sums; if subsampling was requested and performed then a sampled BAM and associated index. WARNING, these files can be large. prefix (String, default=basename(bam,\".bam\")): Prefix for all results files rna (Boolean, default=false): Is the sequenced molecule RNA? Enabling this option adds RNA-Seq specific analyses to the workflow. If true , a GTF file must be provided. If false , the GTF file is ignored. run_comparative_kraken (Boolean, default=false): Run Kraken2 a second time with different FASTQ filtering? If true , comparative_filter is used in a second run of BAM->FASTQ conversion, resulting in differently filtered FASTQs analyzed by Kraken2. If false , comparative_filter is ignored. run_librarian (Boolean, default=rna); description : Run the librarian tool to generate a report of the likely Illumina library prep kit used to generate the data. WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Omnibus). This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. 
By default, this tool is run when rna == true .; external_help : https://f1000research.com/articles/11-1122/v2 standard_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while converting to FASTQ, before running Kraken2 and librarian (if run_librarian == true ). This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the created FASTQs. WARNING: These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. WARNING: If you have set run_librarian to true , we strongly recommend leaving this filter at the default value. librarian is trained on a specific set of reads, and changing this filter may produce nonsensical results. store_kraken_sequences (Boolean, default=false): Store the Kraken2 sequences output? This will apply to all runs of Kraken2 (see parameter_meta.run_comparative_kraken ). WARNING these files can be very large. subsample_n_reads (Int, default=-1): Only process a random sampling of approximately n reads. Use any n <= 0 to process the entire input. Subsampling is done probabilistically so the exact number of reads in the output will have some variation. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
alt_filtered_fastq.append_read_number (Boolean, default=true) alt_filtered_fastq.collated (Boolean, default=false) alt_filtered_fastq.fail_on_unexpected_reads (Boolean, default=false) alt_filtered_fastq.modify_disk_size_gb (Int, default=0) alt_filtered_fastq.modify_memory_gb (Int, default=0) alt_filtered_fastq.ncpu (Int, default=2) alt_filtered_fastq.output_singletons (Boolean, default=false) alt_filtered_fqlint.disable_validator_codes (Array[String], default=[]) alt_filtered_fqlint.modify_disk_size_gb (Int, default=0) alt_filtered_fqlint.modify_memory_gb (Int, default=0) alt_filtered_fqlint.paired_read_validation_level (String, default=\"high\") alt_filtered_fqlint.panic (Boolean, default=true) alt_filtered_fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) collect_alignment_summary_metrics.memory_gb (Int, default=8) collect_alignment_summary_metrics.modify_disk_size_gb (Int, default=0) collect_alignment_summary_metrics.validation_stringency (String, default=\"SILENT\") collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") comparative_kraken.min_base_quality (Int, default=0) comparative_kraken.modify_disk_size_gb (Int, default=0) comparative_kraken.modify_memory_gb (Int, default=0) comparative_kraken.ncpu (Int, default=4) comparative_kraken.use_names (Boolean, default=true) compression_integrity.modify_disk_size_gb (Int, default=0) compute_checksum.modify_disk_size_gb (Int, default=0) encoding.modify_disk_size_gb (Int, default=0) endedness.calc_rpt (Boolean, 
default=false) endedness.modify_disk_size_gb (Int, default=0) endedness.modify_memory_gb (Int, default=0) endedness.num_reads (Int, default=-1) endedness.paired_deviance (Float, default=0.0) endedness.round_rpt (Boolean, default=false) endedness.split_by_rg (Boolean, default=false) fastqc.modify_disk_size_gb (Int, default=0) fastqc.ncpu (Int, default=4) flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") global_phred_scores.fast_mode (Boolean, default=true) global_phred_scores.modify_disk_size_gb (Int, default=0) instrument.modify_disk_size_gb (Int, default=0) instrument.num_reads (Int, default=10000) junction_annotation.fuzzy_junction_match_range (Int, default=0) junction_annotation.min_intron (Int, default=50) junction_annotation.min_mapq (Int, default=30) junction_annotation.min_reads (Int, default=2) junction_annotation.modify_disk_size_gb (Int, default=0) kraken.min_base_quality (Int, default=0) kraken.modify_disk_size_gb (Int, default=0) kraken.modify_memory_gb (Int, default=0) kraken.ncpu (Int, default=4) kraken.use_names (Boolean, default=true) librarian.modify_disk_size_gb (Int, default=0) librarian.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Prefix for all results files markdups.clear_dt (Boolean, default=true) markdups.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") markdups.modify_disk_size_gb (Int, default=0) markdups.modify_memory_gb (Int, default=0) markdups.read_name_regex (String, 
default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") markdups.remove_duplicates (Boolean, default=false) markdups.remove_sequencing_duplicates (Boolean, default=false) markdups.tagging_policy (String, default=\"All\") markdups.validation_stringency (String, default=\"SILENT\") multiqc.modify_disk_size_gb (Int, default=0) qualimap_rnaseq.memory_gb (Int, default=16) qualimap_rnaseq.modify_disk_size_gb (Int, default=0) quality_score_distribution.memory_gb (Int, default=8) quality_score_distribution.modify_disk_size_gb (Int, default=0) quality_score_distribution.validation_stringency (String, default=\"SILENT\") quickcheck.modify_disk_size_gb (Int, default=0) read_length.majority_vote_cutoff (Float, default=0.7) read_length.modify_disk_size_gb (Int, default=0) read_length.num_reads (Int, default=-1) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample_index.modify_disk_size_gb (Int, default=0) subsample_index.ncpu (Int, default=2) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.validation_stringency (String, default=\"LENIENT\") wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true) markdups_post.collect_insert_size_metrics.memory_gb (Int, default=8) 
markdups_post.collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) markdups_post.collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") markdups_post.flagstat.modify_disk_size_gb (Int, default=0) markdups_post.flagstat.ncpu (Int, default=2) markdups_post.flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. markdups_post.regions_coverage.min_mapping_quality (Int, default=20) markdups_post.regions_coverage.modify_disk_size_gb (Int, default=0) markdups_post.regions_coverage.use_fast_mode (Boolean, default=true) markdups_post.wg_coverage.min_mapping_quality (Int, default=20) markdups_post.wg_coverage.modify_disk_size_gb (Int, default=0) markdups_post.wg_coverage.use_fast_mode (Boolean, default=true)","title":"Defaults"},{"location":"workflows/quality-check-standard/#outputs","text":"bam_checksum (File) validate_sam_file (File) flagstat_report (File) fastqc_results (File) instrument_file (File) read_length_file (File) inferred_encoding (File) inferred_endedness (File) alignment_metrics (File) alignment_metrics_pdf (File) insert_size_metrics (File) insert_size_metrics_pdf (File) quality_score_distribution_txt (File) quality_score_distribution_pdf (File) phred_scores (File) kraken_report (File) mosdepth_global_dist (File) mosdepth_global_summary (File) mosdepth_region_dist (Array[File]) mosdepth_region_summary (Array[File]) multiqc_report (File) orig_read_count (File?) kraken_sequences (File?) comparative_kraken_report (File?) comparative_kraken_sequences (File?) mosdepth_dups_marked_global_dist (File?) mosdepth_dups_marked_global_summary (File?) mosdepth_dups_marked_region_summary (Array[File]?) mosdepth_dups_marked_region_dist (Array[File?]?) mark_duplicates_metrics (File?) inferred_strandedness (File?) qualimap_rnaseq_results (File?) junction_summary (File?) junctions (File?) librarian_report (File?) 
intermediate_files (IntermediateFiles?)","title":"Outputs"},{"location":"workflows/quality-check-standard/#parse_input","text":"description Parses and validates the quality_check workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching', 'labels': 'An array of labels to use on the result coverage files associated with each coverage BED'}","title":"parse_input"},{"location":"workflows/quality-check-standard/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/quality-check-standard/#required_1","text":"_runtime (Any, required ) coverage_beds_len (Int, required ): Length of the provided coverage_beds array coverage_labels (Array[String], required ): An array of equal length to coverage_beds_len which determines the prefix label applied to coverage output files. If an empty array is supplied, defaults of regions1 , regions2 , etc. will be used. gtf_provided (Boolean, required ): Was a GTF supplied by the user? Must be true if rna == true . rna (Boolean, required ): Is the sequenced molecule RNA?","title":"Required"},{"location":"workflows/quality-check-standard/#outputs_1","text":"labels (Array[String])","title":"Outputs"},{"location":"workflows/rnaseq-variant-calling/","text":"rnaseq_variant_calling description Call short germline variants from RNA-Seq data. Produces a VCF file of variants. Based on GATK RNA-Seq short variant calling best practices pipeline. outputs {'recalibrated_bam': 'BAM that has undergone recalibration of base quality scores', 'recalibrated_bam_index': 'Index file for recalibrated BAM file', 'variant_filtered_vcf': 'VCF file after variant filters have been applied', 'variant_filtered_vcf_index': 'Index for filtered variant VCF file'} Inputs Required bam (File, required ): BAM file of aligned RNA-Seq reads bam_index (File, required ): Index file for BAM file calling_interval_list (File, required ): Interval list of regions from which to call variants. Used for parallelization. 
dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): Index file for dbSNP VCF file dict (File, required ): Sequence dictionary for reference FASTA file fasta (File, required ): Reference FASTA file fasta_index (File, required ): Index file for reference FASTA file known_vcf_indexes (Array[File], required ): Array of index files for known indels VCF files known_vcfs (Array[File], required ): Array of known indels VCF files apply_bqsr._runtime (Any, required ) base_recalibrator._runtime (Any, required ) haplotype_caller._runtime (Any, required ) mark_duplicates._runtime (Any, required ) merge_vcfs._runtime (Any, required ) scatter_interval_list._runtime (Any, required ) split_n_cigar_reads._runtime (Any, required ) variant_filtration._runtime (Any, required ) Defaults bam_is_dup_marked (Boolean, default=false): Whether the input BAM file has duplicates marked. prefix (String, default=basename(bam,'.bam')): Prefix for the output files. scatter_count (Int, default=6): Number of intervals to scatter over. This should typically be set to 5-20. Higher values will increase parallelism and speed up the workflow, but increase overhead in provisioning resources. apply_bqsr.memory_gb (Int, default=25) apply_bqsr.modify_disk_size_gb (Int, default=0) apply_bqsr.ncpu (Int, default=4) apply_bqsr.use_original_quality_scores (Boolean, default=false) base_recalibrator.memory_gb (Int, default=25) base_recalibrator.modify_disk_size_gb (Int, default=0) base_recalibrator.ncpu (Int, default=4) base_recalibrator.outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\") base_recalibrator.use_original_quality_scores (Boolean, default=false) haplotype_caller.memory_gb (Int, default=25) haplotype_caller.modify_disk_size_gb (Int, default=0) haplotype_caller.ncpu (Int, default=4) haplotype_caller.prefix (String, default=basename(bam,\".bam\")): Prefix for the output files. 
haplotype_caller.stand_call_conf (Int, default=20) haplotype_caller.use_soft_clipped_bases (Boolean, default=false) mark_duplicates.clear_dt (Boolean, default=true) mark_duplicates.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") mark_duplicates.modify_disk_size_gb (Int, default=0) mark_duplicates.modify_memory_gb (Int, default=0) mark_duplicates.optical_distance (Int, default=0) mark_duplicates.prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the output files. mark_duplicates.read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") mark_duplicates.remove_duplicates (Boolean, default=false) mark_duplicates.remove_sequencing_duplicates (Boolean, default=false) mark_duplicates.tagging_policy (String, default=\"All\") mark_duplicates.validation_stringency (String, default=\"SILENT\") merge_vcfs.modify_disk_size_gb (Int, default=0) scatter_interval_list.sort (Boolean, default=true) scatter_interval_list.subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\") scatter_interval_list.unique (Boolean, default=true) split_n_cigar_reads.memory_gb (Int, default=25) split_n_cigar_reads.modify_disk_size_gb (Int, default=0) split_n_cigar_reads.ncpu (Int, default=8) split_n_cigar_reads.prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the output files. 
variant_filtration.cluster (Int, default=3) variant_filtration.filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]) variant_filtration.filter_names (Array[String], default=[\"FS\", \"QD\"]) variant_filtration.modify_disk_size_gb (Int, default=0) variant_filtration.ncpu (Int, default=1) variant_filtration.window (Int, default=35) Outputs recalibrated_bam (File) recalibrated_bam_index (File) variant_filtered_vcf (File) variant_filtered_vcf_index (File)","title":"Rnaseq variant calling"},{"location":"workflows/rnaseq-variant-calling/#rnaseq_variant_calling","text":"description Call short germline variants from RNA-Seq data. Produces a VCF file of variants. Based on GATK RNA-Seq short variant calling best practices pipeline. outputs {'recalibrated_bam': 'BAM that has undergone recalibration of base quality scores', 'recalibrated_bam_index': 'Index file for recalibrated BAM file', 'variant_filtered_vcf': 'VCF file after variant filters have been applied', 'variant_filtered_vcf_index': 'Index for filtered variant VCF file'}","title":"rnaseq_variant_calling"},{"location":"workflows/rnaseq-variant-calling/#inputs","text":"","title":"Inputs"},{"location":"workflows/rnaseq-variant-calling/#required","text":"bam (File, required ): BAM file of aligned RNA-Seq reads bam_index (File, required ): Index file for BAM file calling_interval_list (File, required ): Interval list of regions from which to call variants. Used for parallelization. 
dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): Index file for dbSNP VCF file dict (File, required ): Sequence dictionary for reference FASTA file fasta (File, required ): Reference FASTA file fasta_index (File, required ): Index file for reference FASTA file known_vcf_indexes (Array[File], required ): Array of index files for known indels VCF files known_vcfs (Array[File], required ): Array of known indels VCF files apply_bqsr._runtime (Any, required ) base_recalibrator._runtime (Any, required ) haplotype_caller._runtime (Any, required ) mark_duplicates._runtime (Any, required ) merge_vcfs._runtime (Any, required ) scatter_interval_list._runtime (Any, required ) split_n_cigar_reads._runtime (Any, required ) variant_filtration._runtime (Any, required )","title":"Required"},{"location":"workflows/rnaseq-variant-calling/#defaults","text":"bam_is_dup_marked (Boolean, default=false): Whether the input BAM file has duplicates marked. prefix (String, default=basename(bam,'.bam')): Prefix for the output files. scatter_count (Int, default=6): Number of intervals to scatter over. This should typically be set to 5-20. Higher values will increase parallelism and speed up the workflow, but increase overhead in provisioning resources. apply_bqsr.memory_gb (Int, default=25) apply_bqsr.modify_disk_size_gb (Int, default=0) apply_bqsr.ncpu (Int, default=4) apply_bqsr.use_original_quality_scores (Boolean, default=false) base_recalibrator.memory_gb (Int, default=25) base_recalibrator.modify_disk_size_gb (Int, default=0) base_recalibrator.ncpu (Int, default=4) base_recalibrator.outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\") base_recalibrator.use_original_quality_scores (Boolean, default=false) haplotype_caller.memory_gb (Int, default=25) haplotype_caller.modify_disk_size_gb (Int, default=0) haplotype_caller.ncpu (Int, default=4) haplotype_caller.prefix (String, default=basename(bam,\".bam\")): Prefix for the output files. 
haplotype_caller.stand_call_conf (Int, default=20) haplotype_caller.use_soft_clipped_bases (Boolean, default=false) mark_duplicates.clear_dt (Boolean, default=true) mark_duplicates.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") mark_duplicates.modify_disk_size_gb (Int, default=0) mark_duplicates.modify_memory_gb (Int, default=0) mark_duplicates.optical_distance (Int, default=0) mark_duplicates.prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the output files. mark_duplicates.read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") mark_duplicates.remove_duplicates (Boolean, default=false) mark_duplicates.remove_sequencing_duplicates (Boolean, default=false) mark_duplicates.tagging_policy (String, default=\"All\") mark_duplicates.validation_stringency (String, default=\"SILENT\") merge_vcfs.modify_disk_size_gb (Int, default=0) scatter_interval_list.sort (Boolean, default=true) scatter_interval_list.subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\") scatter_interval_list.unique (Boolean, default=true) split_n_cigar_reads.memory_gb (Int, default=25) split_n_cigar_reads.modify_disk_size_gb (Int, default=0) split_n_cigar_reads.ncpu (Int, default=8) split_n_cigar_reads.prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the output files. 
variant_filtration.cluster (Int, default=3) variant_filtration.filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]) variant_filtration.filter_names (Array[String], default=[\"FS\", \"QD\"]) variant_filtration.modify_disk_size_gb (Int, default=0) variant_filtration.ncpu (Int, default=1) variant_filtration.window (Int, default=35)","title":"Defaults"},{"location":"workflows/rnaseq-variant-calling/#outputs","text":"recalibrated_bam (File) recalibrated_bam_index (File) variant_filtered_vcf (File) variant_filtered_vcf_index (File)","title":"Outputs"},{"location":"workflows/samtools-merge/","text":"WARNING: this workflow is experimental! Use at your own risk! samtools_merge description Runs samtools merge , with optional iteration to avoid maximum command line argument length outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} allowNestedInputs true Inputs Required bams (Array[File], required ): BAMs to merge into a final BAM basic_merge._runtime (Any, required ) final_merge._runtime (Any, required ) inner_merge._runtime (Any, required ) Optional basic_merge.new_header (File?) final_merge.new_header (File?) inner_merge.new_header (File?) Defaults max_length (Int, default=100): Maximum number of BAMs to merge before using iteration prefix (String, default=basename(bams[0],\".bam\")): Prefix for output BAM. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
basic_merge.combine_rg (Boolean, default=true) basic_merge.modify_disk_size_gb (Int, default=0) basic_merge.name_sorted (Boolean, default=false) basic_merge.ncpu (Int, default=2) basic_merge.region (String, default=\"\") final_merge.modify_disk_size_gb (Int, default=0) final_merge.name_sorted (Boolean, default=false) final_merge.ncpu (Int, default=2) final_merge.region (String, default=\"\") inner_merge.combine_rg (Boolean, default=true) inner_merge.modify_disk_size_gb (Int, default=0) inner_merge.name_sorted (Boolean, default=false) inner_merge.ncpu (Int, default=2) inner_merge.region (String, default=\"\") Outputs merged_bam (File)","title":"Samtools merge"},{"location":"workflows/samtools-merge/#samtools_merge","text":"description Runs samtools merge , with optional iteration to avoid maximum command line argument length outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} allowNestedInputs true","title":"samtools_merge"},{"location":"workflows/samtools-merge/#inputs","text":"","title":"Inputs"},{"location":"workflows/samtools-merge/#required","text":"bams (Array[File], required ): BAMs to merge into a final BAM basic_merge._runtime (Any, required ) final_merge._runtime (Any, required ) inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/samtools-merge/#optional","text":"basic_merge.new_header (File?) final_merge.new_header (File?) inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/samtools-merge/#defaults","text":"max_length (Int, default=100): Maximum number of BAMs to merge before using iteration prefix (String, default=basename(bams[0],\".bam\")): Prefix for output BAM. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
basic_merge.combine_rg (Boolean, default=true) basic_merge.modify_disk_size_gb (Int, default=0) basic_merge.name_sorted (Boolean, default=false) basic_merge.ncpu (Int, default=2) basic_merge.region (String, default=\"\") final_merge.modify_disk_size_gb (Int, default=0) final_merge.name_sorted (Boolean, default=false) final_merge.ncpu (Int, default=2) final_merge.region (String, default=\"\") inner_merge.combine_rg (Boolean, default=true) inner_merge.modify_disk_size_gb (Int, default=0) inner_merge.name_sorted (Boolean, default=false) inner_merge.ncpu (Int, default=2) inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/samtools-merge/#outputs","text":"merged_bam (File)","title":"Outputs"},{"location":"workflows/scrnaseq-standard/","text":"scRNA-Seq Standard This WDL workflow runs the Cell Ranger scRNA-Seq alignment workflow for St. Jude Cloud. The workflow takes an input BAM file and splits it into FASTQ files for each read in the pair. The read pairs are then passed through Cell Ranger to generate a BAM file and perform quantification. Strandedness is inferred using ngsderive. File validation is performed at several steps, including immediately preceeding output. LICENSING MIT License Copyright 2022-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. scrnaseq_standard Inputs Required bam (File, required ): Input BAM format file to quality check gtf (File, required ): Gzipped GTF feature file transcriptome_tar_gz (File, required ): Database of reference files for Cell Ranger. Can be downloaded from 10x Genomics. compute_checksum._runtime (Any, required ) count._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) validate_bam._runtime (Any, required ) validate_input_bam._runtime (Any, required ) cell_ranger_bam_to_fastqs.bamtofastq._runtime (Any, required ) cell_ranger_bam_to_fastqs.fqlint._runtime (Any, required ) cell_ranger_bam_to_fastqs.quickcheck._runtime (Any, required ) Optional validate_bam.reference_fasta (File?) validate_input_bam.reference_fasta (File?) Defaults prefix (String, default=basename(bam,\".bam\")): Prefix for output files subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. <= 0 for processing entire input BAM. use_all_cores (Boolean, default=false): Use all cores for multi-core steps? validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? 
cell_ranger_bam_to_fastqs.cellranger11 (Boolean, default=false) cell_ranger_bam_to_fastqs.gemcode (Boolean, default=false) cell_ranger_bam_to_fastqs.longranger20 (Boolean, default=false) compute_checksum.modify_disk_size_gb (Int, default=0) count.memory_gb (Int, default=16) count.modify_disk_size_gb (Int, default=0) count.ncpu (Int, default=1) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\") strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for output files validate_bam.ignore_list (Array[String], default=[]) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_bam.succeed_on_errors (Boolean, default=false) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.summary_mode (Boolean, default=false) validate_bam.validation_stringency (String, default=\"LENIENT\") validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode 
(Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") cell_ranger_bam_to_fastqs.bamtofastq.memory_gb (Int, default=40) cell_ranger_bam_to_fastqs.bamtofastq.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.bamtofastq.ncpu (Int, default=1) cell_ranger_bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) cell_ranger_bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.fqlint.panic (Boolean, default=true) cell_ranger_bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0) Outputs harmonized_bam (File) bam_checksum (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) inferred_strandedness (File)","title":"Scrnaseq standard"},{"location":"workflows/scrnaseq-standard/#licensing","text":"","title":"LICENSING"},{"location":"workflows/scrnaseq-standard/#mit-license","text":"Copyright 2022-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"workflows/scrnaseq-standard/#scrnaseq_standard","text":"","title":"scrnaseq_standard"},{"location":"workflows/scrnaseq-standard/#inputs","text":"","title":"Inputs"},{"location":"workflows/scrnaseq-standard/#required","text":"bam (File, required ): Input BAM format file to quality check gtf (File, required ): Gzipped GTF feature file transcriptome_tar_gz (File, required ): Database of reference files for Cell Ranger. Can be downloaded from 10x Genomics. compute_checksum._runtime (Any, required ) count._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) validate_bam._runtime (Any, required ) validate_input_bam._runtime (Any, required ) cell_ranger_bam_to_fastqs.bamtofastq._runtime (Any, required ) cell_ranger_bam_to_fastqs.fqlint._runtime (Any, required ) cell_ranger_bam_to_fastqs.quickcheck._runtime (Any, required )","title":"Required"},{"location":"workflows/scrnaseq-standard/#optional","text":"validate_bam.reference_fasta (File?) validate_input_bam.reference_fasta (File?)","title":"Optional"},{"location":"workflows/scrnaseq-standard/#defaults","text":"prefix (String, default=basename(bam,\".bam\")): Prefix for output files subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. <= 0 for processing entire input BAM. use_all_cores (Boolean, default=false): Use all cores for multi-core steps? validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? 
cell_ranger_bam_to_fastqs.cellranger11 (Boolean, default=false) cell_ranger_bam_to_fastqs.gemcode (Boolean, default=false) cell_ranger_bam_to_fastqs.longranger20 (Boolean, default=false) compute_checksum.modify_disk_size_gb (Int, default=0) count.memory_gb (Int, default=16) count.modify_disk_size_gb (Int, default=0) count.ncpu (Int, default=1) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\") strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for output files validate_bam.ignore_list (Array[String], default=[]) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_bam.succeed_on_errors (Boolean, default=false) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.summary_mode (Boolean, default=false) validate_bam.validation_stringency (String, default=\"LENIENT\") validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode 
(Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") cell_ranger_bam_to_fastqs.bamtofastq.memory_gb (Int, default=40) cell_ranger_bam_to_fastqs.bamtofastq.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.bamtofastq.ncpu (Int, default=1) cell_ranger_bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) cell_ranger_bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.fqlint.panic (Boolean, default=true) cell_ranger_bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/scrnaseq-standard/#outputs","text":"harmonized_bam (File) bam_checksum (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) inferred_strandedness (File)","title":"Outputs"},{"location":"workflows/star-db-build/","text":"star_db_build description Builds a database suitable for running the STAR alignment program outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'star_db_tar_gz': 'A gzipped TAR file containing the STAR reference files'} allowNestedInputs true Inputs Required gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from build_star_db._runtime (Any, required ) gtf_download._runtime (Any, required ) reference_download._runtime (Any, required ) Optional gtf_md5 (String?): Expected 
md5sum of GTF file reference_fa_md5 (String?): Expected md5sum of reference FASTA file Defaults gtf_disk_size_gb (Int, default=10): Disk space to allocate the GTF download task reference_fa_disk_size_gb (Int, default=10): Disk space to allocate the FASTA download task build_star_db.db_name (String, default=\"star_db\") build_star_db.genomeChrBinNbits (Int, default=18) build_star_db.genomeSAindexNbases (Int, default=14) build_star_db.genomeSAsparseD (Int, default=1) build_star_db.genomeSuffixLengthMax (Int, default=-1) build_star_db.memory_gb (Int, default=50) build_star_db.modify_disk_size_gb (Int, default=0) build_star_db.ncpu (Int, default=8) build_star_db.sjdbGTFchrPrefix (String, default=\"-\") build_star_db.sjdbGTFfeatureExon (String, default=\"exon\") build_star_db.sjdbGTFtagExonParentGene (String, default=\"gene_id\") build_star_db.sjdbGTFtagExonParentGeneName (String, default=\"gene_name\") build_star_db.sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\") build_star_db.sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\") build_star_db.sjdbOverhang (Int, default=125) build_star_db.use_all_cores (Boolean, default=false) Outputs reference_fa (File) gtf (File) star_db_tar_gz (File)","title":"Star db build"},{"location":"workflows/star-db-build/#star_db_build","text":"description Builds a database suitable for running the STAR alignment program outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'star_db_tar_gz': 'A gzipped TAR file containing the STAR reference files'} allowNestedInputs true","title":"star_db_build"},{"location":"workflows/star-db-build/#inputs","text":"","title":"Inputs"},{"location":"workflows/star-db-build/#required","text":"gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve 
the reference FASTA file from build_star_db._runtime (Any, required ) gtf_download._runtime (Any, required ) reference_download._runtime (Any, required )","title":"Required"},{"location":"workflows/star-db-build/#optional","text":"gtf_md5 (String?): Expected md5sum of GTF file reference_fa_md5 (String?): Expected md5sum of reference FASTA file","title":"Optional"},{"location":"workflows/star-db-build/#defaults","text":"gtf_disk_size_gb (Int, default=10): Disk space to allocate the GTF download task reference_fa_disk_size_gb (Int, default=10): Disk space to allocate the FASTA download task build_star_db.db_name (String, default=\"star_db\") build_star_db.genomeChrBinNbits (Int, default=18) build_star_db.genomeSAindexNbases (Int, default=14) build_star_db.genomeSAsparseD (Int, default=1) build_star_db.genomeSuffixLengthMax (Int, default=-1) build_star_db.memory_gb (Int, default=50) build_star_db.modify_disk_size_gb (Int, default=0) build_star_db.ncpu (Int, default=8) build_star_db.sjdbGTFchrPrefix (String, default=\"-\") build_star_db.sjdbGTFfeatureExon (String, default=\"exon\") build_star_db.sjdbGTFtagExonParentGene (String, default=\"gene_id\") build_star_db.sjdbGTFtagExonParentGeneName (String, default=\"gene_name\") build_star_db.sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\") build_star_db.sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\") build_star_db.sjdbOverhang (Int, default=125) build_star_db.use_all_cores (Boolean, default=false)","title":"Defaults"},{"location":"workflows/star-db-build/#outputs","text":"reference_fa (File) gtf (File) star_db_tar_gz (File)","title":"Outputs"}]}
\ No newline at end of file
+{"config":{"indexing":"full","lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"This repository contains all bioinformatics workflows used on the St. Jude Cloud project. Officially, the repository is in beta \u2014 the project is adding workflows as they are developed and put into production. Resources requirements have been optimized to minimize failures in our computing environment, but they may not reflect the best settings for your use case. Please ensure that you tailor these parameters to fit your needs. \ud83c\udfe0 Homepage Please excuse the state of our documentation. We are working on some big changes around here, and with those changes will come much improved documentation. Repository Structure The repository is laid out as follows: workflows/ - Directory containing all end-to-end bioinformatics workflows. tools/ - All tools we have wrapped as individual WDL tasks. data_structures/ - WDL struct definitions and tasks or workflows related to their construction, parsing, or validation. docker/ - Dockerfiles used in our workflows. All docker images are published to the GitHub Container Registy as a part of our CI and are versioned. tests/ - Home to all of our testing infrastructure. We use pytest-workflow for validating our code. bin/ - no longer in use Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf with Cromwell. conf/ - no longer in use Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements. Bootstrap guide This repository implements workflows using the Workflow Description Language (WDL). If unfamiliar with WDL, a short overview is available in the WDL spec . The workflows and tasks in this repository should require minimal set-up and configuration before you're ready to run. You don't even need to clone the repo! 
The bare minimum requirements are a locally installed WDL runner and an internet connection. The exact steps for installation, configuration, and execution are going to depend on you environment and preferred engine. There are a variety of WDL engines you could use, though our team prefers miniwdl . We also make use of the miniwdl-lsf plugin for running on our LSF cluster. Most WDL runners are capable of running a WDL file from a URL. This is how we most commonly execute our workflows and tasks. The below command could be used to submit a run of our rnaseq-standard workflow using miniwdl : miniwdl run --verbose --input inputs.json https://raw.githubusercontent.com/stjudecloud/workflows/rnaseq-standard/v3.0.1/workflows/rnaseq/rnaseq-standard.wdl For an introduction to WDL, there are many guides, one of which is from Terra . Author \ud83d\udc64 St. Jude Cloud Team Website: https://stjude.cloud Github: @stjudecloud Twitter: @StJudeResearch Tests Every task in this repository is covered by at least one test (see all of our tests in tests/tools/ ). These are run using pytest-workflow . The command for running our tests should be executed at the root of the repo: python -m pytest --kwdof --git-aware \ud83e\udd1d Contributing Contributions, issues and feature requests are welcome! Feel free to check issues page . You can also take a look at the contributing guide . Links worth checking out The OpenWDL GitHub Our preferred WDL runner: miniwdl Most of our tasks are run inside a BioContainers image Our tasks are validated using pytest-workflow \ud83d\udcdd License Copyright \u00a9 2020-Present St. Jude Cloud Team . This project is MIT licensed.","title":"Home"},{"location":"#homepage","text":"Please excuse the state of our documentation. 
We are working on some big changes around here, and with those changes will come much improved documentation.","title":"\ud83c\udfe0 Homepage"},{"location":"#repository-structure","text":"The repository is laid out as follows: workflows/ - Directory containing all end-to-end bioinformatics workflows. tools/ - All tools we have wrapped as individual WDL tasks. data_structures/ - WDL struct definitions and tasks or workflows related to their construction, parsing, or validation. docker/ - Dockerfiles used in our workflows. All docker images are published to the GitHub Container Registy as a part of our CI and are versioned. tests/ - Home to all of our testing infrastructure. We use pytest-workflow for validating our code. bin/ - no longer in use Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf with Cromwell. conf/ - no longer in use Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements.","title":"Repository Structure"},{"location":"#bootstrap-guide","text":"This repository implements workflows using the Workflow Description Language (WDL). If unfamiliar with WDL, a short overview is available in the WDL spec . The workflows and tasks in this repository should require minimal set-up and configuration before you're ready to run. You don't even need to clone the repo! The bare minimum requirements are a locally installed WDL runner and an internet connection. The exact steps for installation, configuration, and execution are going to depend on you environment and preferred engine. There are a variety of WDL engines you could use, though our team prefers miniwdl . We also make use of the miniwdl-lsf plugin for running on our LSF cluster. Most WDL runners are capable of running a WDL file from a URL. This is how we most commonly execute our workflows and tasks. 
The below command could be used to submit a run of our rnaseq-standard workflow using miniwdl : miniwdl run --verbose --input inputs.json https://raw.githubusercontent.com/stjudecloud/workflows/rnaseq-standard/v3.0.1/workflows/rnaseq/rnaseq-standard.wdl For an introduction to WDL, there are many guides, one of which is from Terra .","title":"Bootstrap guide"},{"location":"#author","text":"\ud83d\udc64 St. Jude Cloud Team Website: https://stjude.cloud Github: @stjudecloud Twitter: @StJudeResearch","title":"Author"},{"location":"#tests","text":"Every task in this repository is covered by at least one test (see all of our tests in tests/tools/ ). These are run using pytest-workflow . The command for running our tests should be executed at the root of the repo: python -m pytest --kwdof --git-aware","title":"Tests"},{"location":"#contributing","text":"Contributions, issues and feature requests are welcome! Feel free to check issues page . You can also take a look at the contributing guide .","title":"\ud83e\udd1d Contributing"},{"location":"#links-worth-checking-out","text":"The OpenWDL GitHub Our preferred WDL runner: miniwdl Most of our tasks are run inside a BioContainers image Our tasks are validated using pytest-workflow","title":"Links worth checking out"},{"location":"#license","text":"Copyright \u00a9 2020-Present St. Jude Cloud Team . This project is MIT licensed.","title":"\ud83d\udcdd License"},{"location":"build_for_dnanexus/","text":"Building WDL workflows for DNAnexus Obtain dxWDL JAR Retrieve the latest dxWDL JAR release from GitHub: https://github.com/dnanexus/dxWDL/releases Optional workflow parameters for dxWDL -project - Specify a project to compile the workflow. This is optional and otherwise uses the currently selected project. 
-archive - Archive older versions of the workflow and applets -defaults - Set default options for certain parameters -verbose - Detailed build information -locked - Creates a one stage worklfow that is cleaner in the interface -extras - JSON formatted file with options primarily for the DNAnexus platform settings Build Interactive t-SNE workflow for DNAnexus Commands for building the t-SNE workflows are included below. Your version of dxWDL may differ from the version included below. Several optional parameters are included. -defaults specifies DNAnexus paths to reference data for the workflow. -extras specifies that tasks should be retried by default on failure. Build workflow running htseq-count on BAM input java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_bams.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_bam.json -extras workflows/interactive-tsne/inputs/extras.json -locked Build workflow from HTSeq counts data java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_counts.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_counts.json -extras workflows/interactive-tsne/inputs/extras.json -locked Build workflow with RNA-Seq V2 remapping of BAM input java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive-tsne.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Building WDL workflows for DNAnexus"},{"location":"build_for_dnanexus/#building-wdl-workflows-for-dnanexus","text":"","title":"Building WDL workflows for DNAnexus"},{"location":"build_for_dnanexus/#obtain-dxwdl-jar","text":"Retrieve the latest dxWDL JAR release from GitHub: https://github.com/dnanexus/dxWDL/releases","title":"Obtain dxWDL 
JAR"},{"location":"build_for_dnanexus/#optional-workflow-parameters-for-dxwdl","text":"-project - Specify a project to compile the workflow. This is optional and otherwise uses the currently selected project. -archive - Archive older versions of the workflow and applets -defaults - Set default options for certain parameters -verbose - Detailed build information -locked - Creates a one stage worklfow that is cleaner in the interface -extras - JSON formatted file with options primarily for the DNAnexus platform settings","title":"Optional workflow parameters for dxWDL"},{"location":"build_for_dnanexus/#build-interactive-t-sne-workflow-for-dnanexus","text":"Commands for building the t-SNE workflows are included below. Your version of dxWDL may differ from the version included below. Several optional parameters are included. -defaults specifies DNAnexus paths to reference data for the workflow. -extras specifies that tasks should be retried by default on failure.","title":"Build Interactive t-SNE workflow for DNAnexus"},{"location":"build_for_dnanexus/#build-workflow-running-htseq-count-on-bam-input","text":"java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_bams.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_bam.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Build workflow running htseq-count on BAM input"},{"location":"build_for_dnanexus/#build-workflow-from-htseq-counts-data","text":"java -jar dxWDL-v1.46.2.jar compile workflows/interactive-tsne/interactive_tsne_from_counts.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults_counts.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Build workflow from HTSeq counts data"},{"location":"build_for_dnanexus/#build-workflow-with-rna-seq-v2-remapping-of-bam-input","text":"java -jar dxWDL-v1.46.2.jar 
compile workflows/interactive-tsne/interactive-tsne.wdl -project project-FjFfvV89F80QvvxJ8131yzpB -archive -verbose -defaults workflows/interactive-tsne/inputs/defaults.json -extras workflows/interactive-tsne/inputs/extras.json -locked","title":"Build workflow with RNA-Seq V2 remapping of BAM input"},{"location":"tasks/bwa/","text":"Homepage bwa_aln description Maps Single-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'} Inputs Required _runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. fastq (File, required ): Input FASTQ file to align with bwa Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam (File) bwa_aln_pe description Maps Paired-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'} Inputs Required _runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. 
read_one_fastq_gz (File, required ); description : Input gzipped FASTQ read one file to align with bwa; stream : false read_two_fastq_gz (File, required ); description : Input gzipped FASTQ read two file to align with bwa; stream : false Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam (File) bwa_mem description Maps FASTQ files to BAM format using bwa mem outputs {'bam': 'Aligned BAM format file'} Inputs Required _runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. read_one_fastq_gz (File, required ): Input gzipped FASTQ read one file to align with bwa Optional read_two_fastq_gz (File?): Input gzipped FASTQ read two file to align with bwa Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. 
BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam (File) build_bwa_db description Creates a BWA index and returns it as a compressed tar archive outputs {'bwa_db_tar_gz': 'Tarballed bwa reference files'} Inputs Required _runtime (Any, required ) reference_fasta (File, required ): Input reference Fasta file to index with bwa. Should be compressed with gzip. Defaults db_name (String, default=\"bwa_db\"); description : Name of the output gzipped tar archive of the bwa reference files. The extension .tar.gz will be added.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs bwa_db_tar_gz (File)","title":"Bwa"},{"location":"tasks/bwa/#bwa_aln","text":"description Maps Single-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'}","title":"bwa_aln"},{"location":"tasks/bwa/#inputs","text":"","title":"Inputs"},{"location":"tasks/bwa/#required","text":"_runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. fastq (File, required ): Input FASTQ file to align with bwa","title":"Required"},{"location":"tasks/bwa/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. 
BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/bwa/#outputs","text":"bam (File)","title":"Outputs"},{"location":"tasks/bwa/#bwa_aln_pe","text":"description Maps Paired-End FASTQ files to BAM format using bwa aln outputs {'bam': 'Aligned BAM format file'}","title":"bwa_aln_pe"},{"location":"tasks/bwa/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/bwa/#required_1","text":"_runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. read_one_fastq_gz (File, required ); description : Input gzipped FASTQ read one file to align with bwa; stream : false read_two_fastq_gz (File, required ); description : Input gzipped FASTQ read two file to align with bwa; stream : false","title":"Required"},{"location":"tasks/bwa/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/bwa/#outputs_1","text":"bam (File)","title":"Outputs"},{"location":"tasks/bwa/#bwa_mem","text":"description Maps FASTQ files to BAM format using bwa mem outputs {'bam': 'Aligned BAM format file'}","title":"bwa_mem"},{"location":"tasks/bwa/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/bwa/#required_2","text":"_runtime (Any, required ) bwa_db_tar_gz (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. read_one_fastq_gz (File, required ): Input gzipped FASTQ read one file to align with bwa","title":"Required"},{"location":"tasks/bwa/#optional","text":"read_two_fastq_gz (File?): Input gzipped FASTQ read two file to align with bwa","title":"Optional"},{"location":"tasks/bwa/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_group (String, default=\"\"); description : Read group information for BWA to insert into the header. BWA format: '@RG ID:foo SM:bar'; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/bwa/#outputs_2","text":"bam (File)","title":"Outputs"},{"location":"tasks/bwa/#build_bwa_db","text":"description Creates a BWA index and returns it as a compressed tar archive outputs {'bwa_db_tar_gz': 'Tarballed bwa reference files'}","title":"build_bwa_db"},{"location":"tasks/bwa/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/bwa/#required_3","text":"_runtime (Any, required ) reference_fasta (File, required ): Input reference Fasta file to index with bwa. Should be compressed with gzip.","title":"Required"},{"location":"tasks/bwa/#defaults_3","text":"db_name (String, default=\"bwa_db\"); description : Name of the output gzipped tar archive of the bwa reference files. The extension .tar.gz will be added.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/bwa/#outputs_3","text":"bwa_db_tar_gz (File)","title":"Outputs"},{"location":"tasks/cellranger/","text":"Cell Ranger This WDL file wrap the 10x Genomics Cell Ranger tool. Cell Ranger is a tool for handling scRNA-Seq data. count description This WDL task runs Cell Ranger count to generate an aligned BAM and feature counts from scRNA-Seq data. Inputs Required _runtime (Any, required ) fastqs_tar_gz (File, required ): Path to the FASTQ folder archive in .tar.gz format id (String, required ): A unique run ID transcriptome_tar_gz (File, required ): Path to Cell Ranger-compatible transcriptome reference in .tar.gz format Defaults memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. Outputs bam (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) cloupe (File) bamtofastq description This WDL task runs the 10x bamtofastq tool to convert Cell Ranger generated BAM files back to FASTQ files Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM to convert to Cell Ranger compatible fastqs Defaults cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
Outputs fastqs (Array[File]) fastqs_archive (File) read_one_fastq_gz (Array[File]) read_two_fastq_gz (Array[File])","title":"Cellranger"},{"location":"tasks/cellranger/#count","text":"description This WDL task runs Cell Ranger count to generate an aligned BAM and feature counts from scRNA-Seq data.","title":"count"},{"location":"tasks/cellranger/#inputs","text":"","title":"Inputs"},{"location":"tasks/cellranger/#required","text":"_runtime (Any, required ) fastqs_tar_gz (File, required ): Path to the FASTQ folder archive in .tar.gz format id (String, required ): A unique run ID transcriptome_tar_gz (File, required ): Path to Cell Ranger-compatible transcriptome reference in .tar.gz format","title":"Required"},{"location":"tasks/cellranger/#defaults","text":"memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? 
Recommended for cloud environments.","title":"Defaults"},{"location":"tasks/cellranger/#outputs","text":"bam (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) cloupe (File)","title":"Outputs"},{"location":"tasks/cellranger/#bamtofastq","text":"description This WDL task runs the 10x bamtofastq tool to convert Cell Ranger generated BAM files back to FASTQ files","title":"bamtofastq"},{"location":"tasks/cellranger/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/cellranger/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM to convert to Cell Ranger compatible fastqs","title":"Required"},{"location":"tasks/cellranger/#defaults_1","text":"cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task use_all_cores (Boolean, default=false): Use all cores? 
Recommended for cloud environments.","title":"Defaults"},{"location":"tasks/cellranger/#outputs_1","text":"fastqs (Array[File]) fastqs_archive (File) read_one_fastq_gz (Array[File]) read_two_fastq_gz (Array[File])","title":"Outputs"},{"location":"tasks/deeptools/","text":"Homepage bam_coverage description Generates a BigWig coverage track using bamCoverage from DeepTools outputs {'bigwig': 'BigWig format coverage file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BigWig file. The extension .bw will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bigwig (File)","title":"Deeptools"},{"location":"tasks/deeptools/#bam_coverage","text":"description Generates a BigWig coverage track using bamCoverage from DeepTools outputs {'bigwig': 'BigWig format coverage file'}","title":"bam_coverage"},{"location":"tasks/deeptools/#inputs","text":"","title":"Inputs"},{"location":"tasks/deeptools/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM","title":"Required"},{"location":"tasks/deeptools/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BigWig file. The extension .bw will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/deeptools/#outputs","text":"bigwig (File)","title":"Outputs"},{"location":"tasks/estimate/","text":"Homepage run_estimate description [DEPRECATED] Given a gene expression file, run the ESTIMATE software package outputs {'estimate_file': 'The results file of the ESTIMATE software package'} deprecated true Inputs Required _runtime (Any, required ) gene_expression_file (File, required ): A 2 column headered TSV file with 'Gene name' in the first column and gene expression values (as floats) in the second column. Can be generated with the calc_tpm task. Defaults disk_size_gb (Int, default=10): Disk space to allocate for task, specified in GB max_retries (Int, default=1): Number of times to retry in case of failure memory_gb (Int, default=4): RAM to allocate for task, specified in GB outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\"): Name of the ESTIMATE output file Outputs estimate_file (File)","title":"Estimate"},{"location":"tasks/estimate/#run_estimate","text":"description [DEPRECATED] Given a gene expression file, run the ESTIMATE software package outputs {'estimate_file': 'The results file of the ESTIMATE software package'} deprecated true","title":"run_estimate"},{"location":"tasks/estimate/#inputs","text":"","title":"Inputs"},{"location":"tasks/estimate/#required","text":"_runtime (Any, required ) gene_expression_file (File, required ): A 2 column headered TSV file with 'Gene name' in the first column and gene expression values (as floats) in the second column. 
Can be generated with the calc_tpm task.","title":"Required"},{"location":"tasks/estimate/#defaults","text":"disk_size_gb (Int, default=10): Disk space to allocate for task, specified in GB max_retries (Int, default=1): Number of times to retry in case of failure memory_gb (Int, default=4): RAM to allocate for task, specified in GB outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\"): Name of the ESTIMATE output file","title":"Defaults"},{"location":"tasks/estimate/#outputs","text":"estimate_file (File)","title":"Outputs"},{"location":"tasks/fastqc/","text":"Homepage fastqc description Generates a FastQC quality control metrics report for the input BAM file outputs {'raw_data': 'A zip archive of raw FastQC data. Can be parsed by MultiQC.', 'results': 'A gzipped tar archive of all FastQC output files'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to run FastQC on Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fastqc_results\"): Prefix for the FastQC results directory. The extension .tar.gz will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs raw_data (File) results (File)","title":"Fastqc"},{"location":"tasks/fastqc/#fastqc","text":"description Generates a FastQC quality control metrics report for the input BAM file outputs {'raw_data': 'A zip archive of raw FastQC data. 
Can be parsed by MultiQC.', 'results': 'A gzipped tar archive of all FastQC output files'}","title":"fastqc"},{"location":"tasks/fastqc/#inputs","text":"","title":"Inputs"},{"location":"tasks/fastqc/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to run FastQC on","title":"Required"},{"location":"tasks/fastqc/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fastqc_results\"): Prefix for the FastQC results directory. The extension .tar.gz will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/fastqc/#outputs","text":"raw_data (File) results (File)","title":"Outputs"},{"location":"tasks/fq/","text":"Homepage fqlint description Performs quality control on the input FASTQs to ensure proper formatting outputs {'validated_read1': 'The unmodified input read one FASTQ after it has been successfully validated', 'validated_read2': 'The unmodified input read two FASTQ after it has been successfully validated'} Inputs Required _runtime (Any, required ) read_one_fastq (File, required ); description : Input FASTQ with read one. Can be gzipped or uncompressed.; stream : true Optional read_two_fastq (File?); description : Input FASTQ with read two. 
Can be gzipped or uncompressed.; stream : true Defaults disable_validator_codes (Array[String], default=[]); description : Array of codes to disable specific validators; choices : {'S001': \"Plus line starts with a '+'\", 'S002': \"All characters in sequence line are one of 'ACGTN', case-insensitive\", 'S003': \"Name line starts with an '@'\", 'S004': 'All four record lines (name, sequence, plus line, and quality) are present', 'S005': 'Sequence and quality lengths are the same', 'S006': \"All characters in quality line are between '!' and '~' (ordinal values)\", 'S007': 'All record names are unique', 'P001': 'Each paired read name is the same, excluding interleave'} modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. paired_read_validation_level (String, default=\"high\"); description : Only use paired read validators up to a given level; choices : ['low', 'medium', 'high'] panic (Boolean, default=true); description : Panic on first error (true) or log all errors (false)?; common : true single_read_validation_level (String, default=\"high\"); description : Only use single read validators up to a given level; choices : ['low', 'medium', 'high'] Outputs check (String) subsample description Subsamples the input FASTQ(s) outputs {'subsampled_read1': 'Gzipped FASTQ file containing subsampled read1', 'subsampled_read2': 'Gzipped FASTQ file containing subsampled read2'} Inputs Required _runtime (Any, required ) read_one_fastq (File, required ): Input FASTQ with read one. Can be gzipped or uncompressed. Optional read_two_fastq (File?): Input FASTQ with read two. Can be gzipped or uncompressed. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the output FASTQ file(s). The extension _R1.subsampled.fastq.gz and _R2.subsampled.fastq.gz will be added. probability (Float, default=1.0); description : The probability a record is kept, as a decimal (0.0, 1.0). Cannot be used with record-count . Any probability<=0.0 or probability>=1.0 to disable.; common : true record_count (Int, default=-1); description : The exact number of records to keep. Cannot be used with probability . Any record_count<=0 to disable.; common : true Outputs subsampled_read1 (File) subsampled_read2 (File?)","title":"Fq"},{"location":"tasks/fq/#fqlint","text":"description Performs quality control on the input FASTQs to ensure proper formatting outputs {'validated_read1': 'The unmodified input read one FASTQ after it has been successfully validated', 'validated_read2': 'The unmodified input read two FASTQ after it has been successfully validated'}","title":"fqlint"},{"location":"tasks/fq/#inputs","text":"","title":"Inputs"},{"location":"tasks/fq/#required","text":"_runtime (Any, required ) read_one_fastq (File, required ); description : Input FASTQ with read one. Can be gzipped or uncompressed.; stream : true","title":"Required"},{"location":"tasks/fq/#optional","text":"read_two_fastq (File?); description : Input FASTQ with read two. 
Can be gzipped or uncompressed.; stream : true","title":"Optional"},{"location":"tasks/fq/#defaults","text":"disable_validator_codes (Array[String], default=[]); description : Array of codes to disable specific validators; choices : {'S001': \"Plus line starts with a '+'\", 'S002': \"All characters in sequence line are one of 'ACGTN', case-insensitive\", 'S003': \"Name line starts with an '@'\", 'S004': 'All four record lines (name, sequence, plus line, and quality) are present', 'S005': 'Sequence and quality lengths are the same', 'S006': \"All characters in quality line are between '!' and '~' (ordinal values)\", 'S007': 'All record names are unique', 'P001': 'Each paired read name is the same, excluding interleave'} modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. 
paired_read_validation_level (String, default=\"high\"); description : Only use paired read validators up to a given level; choices : ['low', 'medium', 'high'] panic (Boolean, default=true); description : Panic on first error (true) or log all errors (false)?; common : true single_read_validation_level (String, default=\"high\"); description : Only use single read validators up to a given level; choices : ['low', 'medium', 'high']","title":"Defaults"},{"location":"tasks/fq/#outputs","text":"check (String)","title":"Outputs"},{"location":"tasks/fq/#subsample","text":"description Subsamples the input FASTQ(s) outputs {'subsampled_read1': 'Gzipped FASTQ file containing subsampled read1', 'subsampled_read2': 'Gzipped FASTQ file containing subsampled read2'}","title":"subsample"},{"location":"tasks/fq/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/fq/#required_1","text":"_runtime (Any, required ) read_one_fastq (File, required ): Input FASTQ with read one. Can be gzipped or uncompressed.","title":"Required"},{"location":"tasks/fq/#optional_1","text":"read_two_fastq (File?): Input FASTQ with read two. Can be gzipped or uncompressed.","title":"Optional"},{"location":"tasks/fq/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the output FASTQ file(s). The extension _R1.subsampled.fastq.gz and _R2.subsampled.fastq.gz will be added. probability (Float, default=1.0); description : The probability a record is kept, as a decimal (0.0, 1.0). Cannot be used with record-count . Any probability<=0.0 or probability>=1.0 to disable.; common : true record_count (Int, default=-1); description : The exact number of records to keep. Cannot be used with probability . 
Any record_count<=0 to disable.; common : true","title":"Defaults"},{"location":"tasks/fq/#outputs_1","text":"subsampled_read1 (File) subsampled_read2 (File?)","title":"Outputs"},{"location":"tasks/gatk4/","text":"Homepage split_n_cigar_reads description Splits reads that contain Ns in their CIGAR strings into multiple reads. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads outputs {'split_n_reads_bam': 'BAM file with reads split at N CIGAR elements and updated CIGAR strings.', 'split_n_reads_bam_index': 'Index file for the split BAM', 'split_n_reads_bam_md5': 'MD5 checksum for the split BAM'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file with unsplit reads containing Ns in their CIGAR strings. bam_index (File, required ): BAM index file corresponding to the input BAM dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format. Must be uncompressed. fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ): Interval list indicating regions in which to split reads Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the BAM file. The extension .bam will be added. Outputs split_n_reads_bam (File) split_n_reads_bam_index (File) split_n_reads_bam_md5 (File) base_recalibrator description Generates recalibration report for base quality score recalibration. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA outputs {'recalibration_report': 'Recalibration report file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file on which to recalibrate base quality scores bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome known_indels_sites_VCFs (Array[File], required ): List of VCF files containing known indels known_indels_sites_indices (Array[File], required ): List of VCF index files corresponding to the VCF files in known_indels_sites_VCFs Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\"): Name for the output recalibration report. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores. Outputs recalibration_report (File) apply_bqsr description Applies base quality score recalibration to a BAM file. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040097972-ApplyBQSRSpark-BETA outputs {'recalibrated_bam': 'Recalibrated BAM file', 'recalibrated_bam_index': 'Index file for the recalibrated BAM'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file on which to apply base quality score recalibration bam_index (File, required ): BAM index file corresponding to the input BAM recalibration_report (File, required ): Recalibration report file Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output recalibrated BAM. The extension .bqsr.bam will be added. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores. Outputs recalibrated_bam (File) recalibrated_bam_index (File) haplotype_caller description Calls germline SNPs and indels via local re-assembly of haplotypes. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller outputs {'vcf': 'VCF file containing called variants', 'vcf_index': 'Index file for the VCF'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file on which to call variants bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ); description : Interval list indicating regions in which to call variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output VCF. The extension .vcf.gz will be added. stand_call_conf (Int, default=20); description : Minimum confidence threshold for calling variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--standard-min-confidence-threshold-for-calling use_soft_clipped_bases (Boolean, default=false): Use soft clipped bases in variant calling. Default is to ignore soft clipped bases. Outputs vcf (File) vcf_index (File) variant_filtration description Filters variants based on specified criteria. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration outputs {'vcf_filtered': 'Filtered VCF file', 'vcf_filtered_index': 'Index file for the filtered VCF'} Inputs Required _runtime (Any, required ) dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome vcf (File, required ): Input VCF format file to filter vcf_index (File, required ): VCF index file corresponding to the input VCF Defaults cluster (Int, default=3): Number of SNPs that must be present in a window to filter filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]); description : Expressions for the filters; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-expression filter_names (Array[String], default=[\"FS\", \"QD\"]); description : Names of the filters to apply; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-name modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(vcf,\".vcf.gz\")): Prefix for the output filtered VCF. The extension .filtered.vcf.gz will be added. window (Int, default=35): Size of the window (in bases) for filtering Outputs vcf_filtered (File) vcf_filtered_index (File)","title":"Gatk4"},{"location":"tasks/gatk4/#split_n_cigar_reads","text":"description Splits reads that contain Ns in their CIGAR strings into multiple reads. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036858811-SplitNCigarReads outputs {'split_n_reads_bam': 'BAM file with reads split at N CIGAR elements and updated CIGAR strings.', 'split_n_reads_bam_index': 'Index file for the split BAM', 'split_n_reads_bam_md5': 'MD5 checksum for the split BAM'}","title":"split_n_cigar_reads"},{"location":"tasks/gatk4/#inputs","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file with unsplit reads containing Ns in their CIGAR strings. bam_index (File, required ): BAM index file corresponding to the input BAM dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format. Must be uncompressed. fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ): Interval list indicating regions in which to split reads","title":"Required"},{"location":"tasks/gatk4/#defaults","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the BAM file. The extension .bam will be added.","title":"Defaults"},{"location":"tasks/gatk4/#outputs","text":"split_n_reads_bam (File) split_n_reads_bam_index (File) split_n_reads_bam_md5 (File)","title":"Outputs"},{"location":"tasks/gatk4/#base_recalibrator","text":"description Generates recalibration report for base quality score recalibration. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897372-BaseRecalibratorSpark-BETA outputs {'recalibration_report': 'Recalibration report file'}","title":"base_recalibrator"},{"location":"tasks/gatk4/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file on which to recalibrate base quality scores bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome known_indels_sites_VCFs (Array[File], required ): List of VCF files containing known indels known_indels_sites_indices (Array[File], required ): List of VCF index files corresponding to the VCF files in known_indels_sites_VCFs","title":"Required"},{"location":"tasks/gatk4/#defaults_1","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\"): Name for the output recalibration report. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores.","title":"Defaults"},{"location":"tasks/gatk4/#outputs_1","text":"recalibration_report (File)","title":"Outputs"},{"location":"tasks/gatk4/#apply_bqsr","text":"description Applies base quality score recalibration to a BAM file. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040097972-ApplyBQSRSpark-BETA outputs {'recalibrated_bam': 'Recalibrated BAM file', 'recalibrated_bam_index': 'Index file for the recalibrated BAM'}","title":"apply_bqsr"},{"location":"tasks/gatk4/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file on which to apply base quality score recalibration bam_index (File, required ): BAM index file corresponding to the input BAM recalibration_report (File, required ): Recalibration report file","title":"Required"},{"location":"tasks/gatk4/#defaults_2","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output recalibrated BAM. The extension .bqsr.bam will be added. use_original_quality_scores (Boolean, default=false): Use original quality scores from the input BAM. Default is to use recalibrated quality scores.","title":"Defaults"},{"location":"tasks/gatk4/#outputs_2","text":"recalibrated_bam (File) recalibrated_bam_index (File)","title":"Outputs"},{"location":"tasks/gatk4/#haplotype_caller","text":"description Calls germline SNPs and indels via local re-assembly of haplotypes. 
external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller outputs {'vcf': 'VCF file containing called variants', 'vcf_index': 'Index file for the VCF'}","title":"haplotype_caller"},{"location":"tasks/gatk4/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_3","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file on which to call variants bam_index (File, required ): BAM index file corresponding to the input BAM dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): dbSNP VCF index file dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome interval_list (File, required ); description : Interval list indicating regions in which to call variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists","title":"Required"},{"location":"tasks/gatk4/#defaults_3","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the output VCF. The extension .vcf.gz will be added. stand_call_conf (Int, default=20); description : Minimum confidence threshold for calling variants; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--standard-min-confidence-threshold-for-calling use_soft_clipped_bases (Boolean, default=false): Use soft clipped bases in variant calling. 
Default is to ignore soft clipped bases.","title":"Defaults"},{"location":"tasks/gatk4/#outputs_3","text":"vcf (File) vcf_index (File)","title":"Outputs"},{"location":"tasks/gatk4/#variant_filtration","text":"description Filters variants based on specified criteria. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration outputs {'vcf_filtered': 'Filtered VCF file', 'vcf_filtered_index': 'Index file for the filtered VCF'}","title":"variant_filtration"},{"location":"tasks/gatk4/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/gatk4/#required_4","text":"_runtime (Any, required ) dict (File, required ): Dictionary file for FASTA format genome fasta (File, required ): Reference genome in FASTA format fasta_index (File, required ): Index for FASTA format genome vcf (File, required ): Input VCF format file to filter vcf_index (File, required ): VCF index file corresponding to the input VCF","title":"Required"},{"location":"tasks/gatk4/#defaults_4","text":"cluster (Int, default=3): Number of SNPs that must be present in a window to filter filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]); description : Expressions for the filters; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-expression filter_names (Array[String], default=[\"FS\", \"QD\"]); description : Names of the filters to apply; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360037434691-VariantFiltration#--filter-name modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(vcf,\".vcf.gz\")): Prefix for the output filtered VCF. The extension .filtered.vcf.gz will be added. 
window (Int, default=35): Size of the window (in bases) for filtering","title":"Defaults"},{"location":"tasks/gatk4/#outputs_4","text":"vcf_filtered (File) vcf_filtered_index (File)","title":"Outputs"},{"location":"tasks/htseq/","text":"Homepage count description Performs read counting for a set of features in the input BAM file outputs {'feature_counts': 'A two column headerless TSV file. First column is feature names and second column is counts.'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for gtf (File, required ): Input genomic features in gzipped GTF format to count reads for strandedness (String, required ); description : Strandedness protocol of the RNA-Seq experiment; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#cmdoption-htseq-count-s; choices : ['yes', 'reverse', 'no'] Defaults feature_type (String, default=\"exon\"); description : Feature type (3rd column in GTF file) to be used, all features of other type are ignored; common : true idattr (String, default=\"gene_name\"); description : GFF attribute to be used as feature ID; common : true include_custom_header (Boolean, default=false); description : Include a custom header for the output file? This is not an official feature of HTSeq. If true, the first line of the output file will be feature ~{prefix} . This may break downstream tools that expect the typical headerless HTSeq output format.; common : true minaqual (Int, default=10); description : Skip all reads with alignment quality lower than the given minimum value; common : true mode (String, default=\"union\"); description : Mode to handle reads overlapping more than one feature. 
union is recommended for most use-cases.; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#htseq-count-counting-reads-within-features; choices : ['union', 'intersection-strict', 'intersection-nonempty'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. nonunique (Boolean, default=false); description : Score reads that align to or are assigned to more than one feature?; common : true pos_sorted (Boolean, default=true); description : Is the BAM position sorted (true) or name sorted (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the feature counts file. The extension .feature-counts.txt will be added. secondary_alignments (Boolean, default=false); description : Score secondary alignments (SAM flag 0x100)?; common : true supplementary_alignments (Boolean, default=false); description : Score supplementary/chimeric alignments (SAM flag 0x800)?; common : true Outputs feature_counts (File) calc_tpm description Given a gene counts file and a gene lengths file, calculate Transcripts Per Million (TPM) outputs {'tpm_file': 'Transcripts Per Million (TPM) file. A two column headered TSV file.'} Inputs Required _runtime (Any, required ) counts (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with the count task. gene_lengths (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with the calc_gene_lengths task in util.wdl . 
Defaults prefix (String, default=basename(counts,\".feature-counts.txt\")): Prefix for the TPM file. The extension .TPM.txt will be added. Outputs tpm_file (File)","title":"Htseq"},{"location":"tasks/htseq/#count","text":"description Performs read counting for a set of features in the input BAM file outputs {'feature_counts': 'A two column headerless TSV file. First column is feature names and second column is counts.'}","title":"count"},{"location":"tasks/htseq/#inputs","text":"","title":"Inputs"},{"location":"tasks/htseq/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate coverage for gtf (File, required ): Input genomic features in gzipped GTF format to count reads for strandedness (String, required ); description : Strandedness protocol of the RNA-Seq experiment; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#cmdoption-htseq-count-s; choices : ['yes', 'reverse', 'no']","title":"Required"},{"location":"tasks/htseq/#defaults","text":"feature_type (String, default=\"exon\"); description : Feature type (3rd column in GTF file) to be used, all features of other type are ignored; common : true idattr (String, default=\"gene_name\"); description : GFF attribute to be used as feature ID; common : true include_custom_header (Boolean, default=false); description : Include a custom header for the output file? This is not an official feature of HTSeq. If true, the first line of the output file will be feature ~{prefix} . This may break downstream tools that expect the typical headerless HTSeq output format.; common : true minaqual (Int, default=10); description : Skip all reads with alignment quality lower than the given minimum value; common : true mode (String, default=\"union\"); description : Mode to handle reads overlapping more than one feature. 
union is recommended for most use-cases.; external_help : https://htseq.readthedocs.io/en/latest/htseqcount.html#htseq-count-counting-reads-within-features; choices : ['union', 'intersection-strict', 'intersection-nonempty'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. nonunique (Boolean, default=false); description : Score reads that align to or are assigned to more than one feature?; common : true pos_sorted (Boolean, default=true); description : Is the BAM position sorted (true) or name sorted (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the feature counts file. The extension .feature-counts.txt will be added. secondary_alignments (Boolean, default=false); description : Score secondary alignments (SAM flag 0x100)?; common : true supplementary_alignments (Boolean, default=false); description : Score supplementary/chimeric alignments (SAM flag 0x800)?; common : true","title":"Defaults"},{"location":"tasks/htseq/#outputs","text":"feature_counts (File)","title":"Outputs"},{"location":"tasks/htseq/#calc_tpm","text":"description Given a gene counts file and a gene lengths file, calculate Transcripts Per Million (TPM) outputs {'tpm_file': 'Transcripts Per Million (TPM) file. A two column headered TSV file.'}","title":"calc_tpm"},{"location":"tasks/htseq/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/htseq/#required_1","text":"_runtime (Any, required ) counts (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with the count task. 
gene_lengths (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with the calc_gene_lengths task in util.wdl .","title":"Required"},{"location":"tasks/htseq/#defaults_1","text":"prefix (String, default=basename(counts,\".feature-counts.txt\")): Prefix for the TPM file. The extension .TPM.txt will be added.","title":"Defaults"},{"location":"tasks/htseq/#outputs_1","text":"tpm_file (File)","title":"Outputs"},{"location":"tasks/kraken2/","text":"Homepage download_taxonomy description Downloads the NCBI taxonomy which Kraken2 uses to create a tree and taxon map during the database build outputs {'taxonomy': 'The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'} Inputs Required _runtime (Any, required ) Defaults protein (Boolean, default=false): Construct a protein database? Outputs taxonomy (File) download_library description Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here outputs {'library': 'A library of reference genomes, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'} Inputs Required _runtime (Any, required ) library_name (String, required ); description : Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core'] Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name . 
Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database. More disk space may be required depending on when in the future this task is run. protein (Boolean, default=false): Construct a protein database? Outputs library (File) create_library_from_fastas description Adds custom entries from FASTA files to a Kraken2 DB outputs {'custom_library': 'Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'} Inputs Required _runtime (Any, required ) fastas_gz (Array[File], required ): Array of gzipped FASTA files. Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. protein (Boolean, default=false): Construct a protein database? Outputs custom_library (File) build_db description Builds a custom Kraken2 database outputs {'built_db': 'A complete Kraken2 database'} Inputs Required _runtime (Any, required ) tarballs (Array[File], required ): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory. Defaults db_name (String, default=\"kraken2_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true kmer_len (Int, default=if protein then 15 else 35): K-mer length in bp that will be used to build the database max_db_size_gb (Int, default=-1): Maximum number of GBs for Kraken 2 hash table; if the Kraken 2 estimator determines more would normally be needed, the reference library will be downsampled to fit. 
minimizer_len (Int, default=if protein then 12 else 31): Minimizer length in bp that will be used to build the database minimizer_spaces (Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true protein (Boolean, default=false): Construct a protein database? use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs built_db (File) kraken description Runs Kraken2 on a pair of fastq files outputs {'report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'sequences': {'description': 'Detailed Kraken2 output that has been gzipped', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}} Inputs Required _runtime (Any, required ) db (File, required ): Kraken2 database. Can be generated with make-qc-reference.wdl . Must be a tarball without a root directory. read_one_fastq_gz (File, required ): Gzipped FASTQ file with 1st reads in pair read_two_fastq_gz (File, required ): Gzipped FASTQ file with 2nd reads in pair Defaults min_base_quality (Int, default=0): Minimum base quality used in classification modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. 
Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added. store_sequences (Boolean, default=false); description : Store and output main Kraken2 output in addition to the summary report?; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_names (Boolean, default=true): Print scientific names instead of just taxids? Outputs report (File) sequences (File?)","title":"Kraken2"},{"location":"tasks/kraken2/#download_taxonomy","text":"description Downloads the NCBI taxonomy which Kraken2 uses to create a tree and taxon map during the database build outputs {'taxonomy': 'The NCBI taxonomy, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}","title":"download_taxonomy"},{"location":"tasks/kraken2/#inputs","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required","text":"_runtime (Any, required )","title":"Required"},{"location":"tasks/kraken2/#defaults","text":"protein (Boolean, default=false): Construct a protein database?","title":"Defaults"},{"location":"tasks/kraken2/#outputs","text":"taxonomy (File)","title":"Outputs"},{"location":"tasks/kraken2/#download_library","text":"description Downloads a predefined library of reference genomes from NCBI. Detailed organism list for libraries (except nt) available here outputs {'library': 'A library of reference genomes, which is needed by the build_db task. 
This output is not human-readable or meant for anything other than building a Kraken2 database.'}","title":"download_library"},{"location":"tasks/kraken2/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_1","text":"_runtime (Any, required ) library_name (String, required ); description : Library to download. Note that protein must equal true if downloading the nr library, and protein must equal false if downloading the UniVec or UniVec_Core library.; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'nr', 'UniVec', 'UniVec_Core']","title":"Required"},{"location":"tasks/kraken2/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation, specified in GB. Default disk size is determined dynamically based on library_name . Note that the default sizes are adequate as of April 2023, but new genomes are constantly being added to the NCBI database. More disk space may be required depending on when in the future this task is run. protein (Boolean, default=false): Construct a protein database?","title":"Defaults"},{"location":"tasks/kraken2/#outputs_1","text":"library (File)","title":"Outputs"},{"location":"tasks/kraken2/#create_library_from_fastas","text":"description Adds custom entries from FASTA files to a Kraken2 DB outputs {'custom_library': 'Kraken2 compatible library, which is needed by the build_db task. This output is not human-readable or meant for anything other than building a Kraken2 database.'}","title":"create_library_from_fastas"},{"location":"tasks/kraken2/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_2","text":"_runtime (Any, required ) fastas_gz (Array[File], required ): Array of gzipped FASTA files. 
Each FASTA sequence ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid","title":"Required"},{"location":"tasks/kraken2/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. protein (Boolean, default=false): Construct a protein database?","title":"Defaults"},{"location":"tasks/kraken2/#outputs_2","text":"custom_library (File)","title":"Outputs"},{"location":"tasks/kraken2/#build_db","text":"description Builds a custom Kraken2 database outputs {'built_db': 'A complete Kraken2 database'}","title":"build_db"},{"location":"tasks/kraken2/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_3","text":"_runtime (Any, required ) tarballs (Array[File], required ): Tarballs containing the NCBI taxonomy (generated by the download_taxonomy task) and at least one library (generated by the download_library or create_library_from_fastas task). Tarballs must not have a root directory.","title":"Required"},{"location":"tasks/kraken2/#defaults_3","text":"db_name (String, default=\"kraken2_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true kmer_len (Int, default=if protein then 15 else 35): K-mer length in bp that will be used to build the database max_db_size_gb (Int, default=-1): Maximum number of GBs for Kraken 2 hash table; if the Kraken 2 estimator determines more would normally be needed, the reference library will be downsampled to fit. minimizer_len (Int, default=if protein then 12 else 31): Minimizer length in bp that will be used to build the database minimizer_spaces (Int, default=if protein then 0 else 7): Number of characters in minimizer that are ignored in comparisons modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true protein (Boolean, default=false): Construct a protein database? use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/kraken2/#outputs_3","text":"built_db (File)","title":"Outputs"},{"location":"tasks/kraken2/#kraken","text":"description Runs Kraken2 on a pair of fastq files outputs {'report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'sequences': {'description': 'Detailed Kraken2 output that has been gzipped', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}}","title":"kraken"},{"location":"tasks/kraken2/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/kraken2/#required_4","text":"_runtime (Any, required ) db (File, required ): Kraken2 database. Can be generated with make-qc-reference.wdl . Must be a tarball without a root directory. read_one_fastq_gz (File, required ): Gzipped FASTQ file with 1st reads in pair read_two_fastq_gz (File, required ): Gzipped FASTQ file with 2nd reads in pair","title":"Required"},{"location":"tasks/kraken2/#defaults_4","text":"min_base_quality (Int, default=0): Minimum base quality used in classification modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. 
Specified in GB. ncpu (Int, default=4); description : Number of cores to allocate for task; common : true prefix (String, default=sub(basename(read_one_fastq_gz),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the Kraken2 output files. The extensions .kraken2.txt and .kraken2.sequences.txt.gz will be added. store_sequences (Boolean, default=false); description : Store and output main Kraken2 output in addition to the summary report?; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_names (Boolean, default=true): Print scientific names instead of just taxids?","title":"Defaults"},{"location":"tasks/kraken2/#outputs_4","text":"report (File) sequences (File?)","title":"Outputs"},{"location":"tasks/librarian/","text":"librarian librarian description Runs the librarian tool to derive the likely Illumina library preparation protocol used to generate a pair of FASTQ files. help WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Oriented). This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. This version of librarian has been trained on \"read one\" data of Paired-End sequencing data. It is not intended for use with Single-End data, even though it only accepts a single FASTQ. output {'report': 'A tar archive containing the librarian report and raw data.', 'raw_data': 'The raw data that can be processed by MultiQC.'} Inputs Required _runtime (Any, required ) read_one_fastq (File, required ): Read one FASTQ of a Paired-End sample to analyze. May be uncompressed or gzipped. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Name of the output tar archive. The extension .tar.gz will be added. Outputs report (File) raw_data (File)","title":"librarian"},{"location":"tasks/librarian/#librarian","text":"","title":"librarian"},{"location":"tasks/librarian/#librarian_1","text":"description Runs the librarian tool to derive the likely Illumina library preparation protocol used to generate a pair of FASTQ files. help WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Oriented). This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. This version of librarian has been trained on \"read one\" data of Paired-End sequencing data. It is not intended for use with Single-End data, even though it only accepts a single FASTQ. output {'report': 'A tar archive containing the librarian report and raw data.', 'raw_data': 'The raw data that can be processed by MultiQC.'}","title":"librarian"},{"location":"tasks/librarian/#inputs","text":"","title":"Inputs"},{"location":"tasks/librarian/#required","text":"_runtime (Any, required ) read_one_fastq (File, required ): Read one FASTQ of a Paired-End sample to analyze. May be uncompressed or gzipped.","title":"Required"},{"location":"tasks/librarian/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Name of the output tar archive. 
The extension .tar.gz will be added.","title":"Defaults"},{"location":"tasks/librarian/#outputs","text":"report (File) raw_data (File)","title":"Outputs"},{"location":"tasks/md5sum/","text":"Homepage compute_checksum description Generates an MD5 checksum for the input file outputs {'md5sum': 'STDOUT of the md5sum command that has been redirected to a file'} Inputs Required _runtime (Any, required ) file (File, required ): Input file to generate MD5 checksum for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs md5sum (File)","title":"Md5sum"},{"location":"tasks/md5sum/#compute_checksum","text":"description Generates an MD5 checksum for the input file outputs {'md5sum': 'STDOUT of the md5sum command that has been redirected to a file'}","title":"compute_checksum"},{"location":"tasks/md5sum/#inputs","text":"","title":"Inputs"},{"location":"tasks/md5sum/#required","text":"_runtime (Any, required ) file (File, required ): Input file to generate MD5 checksum for","title":"Required"},{"location":"tasks/md5sum/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/md5sum/#outputs","text":"md5sum (File)","title":"Outputs"},{"location":"tasks/mosdepth/","text":"Homepage coverage description Runs the Mosdepth tool for calculating coverage outputs {'summary': 'A summary of mean depths per chromosome and within specified regions per chromosome', 'global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. 
It does this for each chromosome, and for the whole genome.', 'region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by coverage_bed that were covered for at least a given coverage value'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to calculate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM Optional coverage_bed (File?): BED file to pass to the -b flag of mosdepth . This will restrict coverage analysis to regions defined by the BED file. Defaults min_mapping_quality (Int, default=20); description : Minimum mapping quality to pass to the -Q flag of mosdepth ; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,'.bam')): Prefix for the mosdepth report files. The extensions .mosdepth.summary.txt , .mosdepth.global.dist.txt and .mosdepth.region.dist.txt will be added. use_fast_mode (Boolean, default=true): Use Mosdepth's 'fast mode'? This enables the -x flag. Outputs summary (File) global_dist (File) region_dist (File?)","title":"Mosdepth"},{"location":"tasks/mosdepth/#coverage","text":"description Runs the Mosdepth tool for calculating coverage outputs {'summary': 'A summary of mean depths per chromosome and within specified regions per chromosome', 'global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. 
It does this for each chromosome, and for the whole genome.', 'region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by coverage_bed that were covered for at least a given coverage value'}","title":"coverage"},{"location":"tasks/mosdepth/#inputs","text":"","title":"Inputs"},{"location":"tasks/mosdepth/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to calculate coverage for bam_index (File, required ): BAM index file corresponding to the input BAM","title":"Required"},{"location":"tasks/mosdepth/#optional","text":"coverage_bed (File?): BED file to pass to the -b flag of mosdepth . This will restrict coverage analysis to regions defined by the BED file.","title":"Optional"},{"location":"tasks/mosdepth/#defaults","text":"min_mapping_quality (Int, default=20); description : Minimum mapping quality to pass to the -Q flag of mosdepth ; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,'.bam')): Prefix for the mosdepth report files. The extensions .mosdepth.summary.txt , .mosdepth.global.dist.txt and .mosdepth.region.dist.txt will be added. use_fast_mode (Boolean, default=true): Use Mosdepth's 'fast mode'? This enables the -x flag.","title":"Defaults"},{"location":"tasks/mosdepth/#outputs","text":"summary (File) global_dist (File) region_dist (File?)","title":"Outputs"},{"location":"tasks/multiqc/","text":"Homepage multiqc description Generates a MultiQC quality control metrics report summary from input QC result files outputs {'multiqc_report': 'A gzipped tar archive of all MultiQC output files'} Inputs Required _runtime (Any, required ) input_files (Array[File], required ): An array of files for MultiQC to compile into a report. 
Invalid files will be gracefully ignored by MultiQC. prefix (String, required ): A string for the MultiQC output directory: / and .tar.gz Optional config (File?): YAML file for configuring generated report Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs multiqc_report (File)","title":"Multiqc"},{"location":"tasks/multiqc/#multiqc","text":"description Generates a MultiQC quality control metrics report summary from input QC result files outputs {'multiqc_report': 'A gzipped tar archive of all MultiQC output files'}","title":"multiqc"},{"location":"tasks/multiqc/#inputs","text":"","title":"Inputs"},{"location":"tasks/multiqc/#required","text":"_runtime (Any, required ) input_files (Array[File], required ): An array of files for MultiQC to compile into a report. Invalid files will be gracefully ignored by MultiQC. prefix (String, required ): A string for the MultiQC output directory: / and .tar.gz","title":"Required"},{"location":"tasks/multiqc/#optional","text":"config (File?): YAML file for configuring generated report","title":"Optional"},{"location":"tasks/multiqc/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/multiqc/#outputs","text":"multiqc_report (File)","title":"Outputs"},{"location":"tasks/ngsderive/","text":"Homepage strandedness description Derives the experimental strandedness protocol used to generate the input RNA-Seq BAM file. Reports evidence supporting final results. 
outputs {'strandedness_file': 'TSV file containing the ngsderive strandedness report', 'strandedness': 'The derived strandedness, in string format'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive strandedness for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file Defaults min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads_per_gene (Int, default=10); description : Filter any genes that don't have at least min_reads_per_gene reads mapping to them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_genes (Int, default=1000); description : How many genes to sample; common : true outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\"): Name for the strandedness TSV file split_by_rg (Boolean, default=false); description : Contain one entry in the output TSV per read group, in addition to an overall entry; common : true Outputs strandedness_string (String) strandedness_file (File) instrument description Derives the instrument used to sequence the input BAM file. Reports evidence supporting final results. outputs {'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive instrument for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=10000); description : How many reads to analyze from the start of the file. 
Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".instrument.tsv\"): Name for the instrument TSV file Outputs instrument_file (File) instrument_string (String) read_length description Derives the original experimental read length of the input BAM. Reports evidence supporting final results. outputs {'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive read length for bam_index (File, required ): BAM index file corresponding to the input BAM Defaults majority_vote_cutoff (Float, default=0.7); description : To call a majority readlen, the maximum read length must have at least majority-vote-cutoff % reads in support; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".readlength.tsv\"): Name for the readlen TSV file Outputs read_length_file (File) encoding description Derives the encoding of the input NGS file(s). Reports evidence supporting final results. outputs {'encoding_file': 'TSV file containing the ngsderive encoding report for all input files', 'inferred_encoding': 'The most permissive encoding found among the input files, in string format'} Inputs Required _runtime (Any, required ) ngs_files (Array[File], required ): An array of FASTQs and/or BAMs for which to derive encoding outfile_name (String, required ): Name for the encoding TSV file Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
num_reads (Int, default=1000000); description : How many reads to analyze from the start of the file(s). Any n < 1 to parse whole file(s).; common : true Outputs inferred_encoding (String) encoding_file (File) junction_annotation description Annotates junctions found in an RNA-Seq BAM as known, novel, or partially novel external_help https://stjudecloud.github.io/ngsderive/subcommands/junction_annotation/ outputs {'junction_summary': 'TSV file containing the ngsderive junction-annotation summary', 'junctions': 'TSV file containing a detailed list of annotated junctions'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to annotate junctions for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file Defaults fuzzy_junction_match_range (Int, default=0); description : Consider found splices within +-k bases of a known splice event annotated; common : true min_intron (Int, default=50); description : Minimum size of intron to be considered a splice; common : true min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads (Int, default=2); description : Filter any junctions that don't have at least min_reads reads supporting them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the summary TSV and junction files. The extensions .junction_summary.tsv and .junctions.tsv will be added. Outputs junction_summary (File) junctions (File) endedness description Derives the endedness of the input BAM file. Reports evidence for final result. 
outputs {'endedness_file': 'TSV file containing the ngsderive endedness report'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to derive endedness from Defaults calc_rpt (Boolean, default=false); description : Calculate and output Reads-Per-Template. This will produce a more sophisticated estimate for endedness, but uses substantially more memory (can reach up to 200% of BAM size in memory consumption for some inputs).; common : true lenient (Boolean, default=false); description : Return a zero exit code on unknown results; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by value of calc_rpt and the size of the input. Specified in GB. num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".endedness.tsv\"): Name for the endedness TSV file paired_deviance (Float, default=0.0); description : Distance from 0.5 split between number of f+l- reads and f-l+ reads allowed to be called 'Paired-End'. Default of 0.0 only appropriate if the whole file is being processed.; common : true round_rpt (Boolean, default=false); description : Round RPT to the nearest INT before comparing to expected values. Appropriate if using --num-reads > 0.; common : true split_by_rg (Boolean, default=false); description : Contain one entry per read group; common : true Outputs endedness_file (File)","title":"Ngsderive"},{"location":"tasks/ngsderive/#strandedness","text":"description Derives the experimental strandedness protocol used to generate the input RNA-Seq BAM file. Reports evidence supporting final results. 
outputs {'strandedness_file': 'TSV file containing the ngsderive strandedness report', 'strandedness': 'The derived strandedness, in string format'}","title":"strandedness"},{"location":"tasks/ngsderive/#inputs","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive strandedness for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file","title":"Required"},{"location":"tasks/ngsderive/#defaults","text":"min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads_per_gene (Int, default=10); description : Filter any genes that don't have at least min_reads_per_gene reads mapping to them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_genes (Int, default=1000); description : How many genes to sample; common : true outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\"): Name for the strandedness TSV file split_by_rg (Boolean, default=false); description : Contain one entry in the output TSV per read group, in addition to an overall entry; common : true","title":"Defaults"},{"location":"tasks/ngsderive/#outputs","text":"strandedness_string (String) strandedness_file (File)","title":"Outputs"},{"location":"tasks/ngsderive/#instrument","text":"description Derives the instrument used to sequence the input BAM file. Reports evidence supporting final results. 
outputs {'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file'}","title":"instrument"},{"location":"tasks/ngsderive/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive instrument for","title":"Required"},{"location":"tasks/ngsderive/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=10000); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".instrument.tsv\"): Name for the instrument TSV file","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_1","text":"instrument_file (File) instrument_string (String)","title":"Outputs"},{"location":"tasks/ngsderive/#read_length","text":"description Derives the original experimental read length of the input BAM. Reports evidence supporting final results. outputs {'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file'}","title":"read_length"},{"location":"tasks/ngsderive/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive read length for bam_index (File, required ): BAM index file corresponding to the input BAM","title":"Required"},{"location":"tasks/ngsderive/#defaults_2","text":"majority_vote_cutoff (Float, default=0.7); description : To call a majority readlen, the maximum read length must have at least majority-vote-cutoff % reads in support; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".readlength.tsv\"): Name for the readlen TSV file","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_2","text":"read_length_file (File)","title":"Outputs"},{"location":"tasks/ngsderive/#encoding","text":"description Derives the encoding of the input NGS file(s). Reports evidence supporting final results. outputs {'encoding_file': 'TSV file containing the ngsderive encoding report for all input files', 'inferred_encoding': 'The most permissive encoding found among the input files, in string format'}","title":"encoding"},{"location":"tasks/ngsderive/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_3","text":"_runtime (Any, required ) ngs_files (Array[File], required ): An array of FASTQs and/or BAMs for which to derive encoding outfile_name (String, required ): Name for the encoding TSV file","title":"Required"},{"location":"tasks/ngsderive/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. num_reads (Int, default=1000000); description : How many reads to analyze from the start of the file(s). 
Any n < 1 to parse whole file(s).; common : true","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_3","text":"inferred_encoding (String) encoding_file (File)","title":"Outputs"},{"location":"tasks/ngsderive/#junction_annotation","text":"description Annotates junctions found in an RNA-Seq BAM as known, novel, or partially novel external_help https://stjudecloud.github.io/ngsderive/subcommands/junction_annotation/ outputs {'junction_summary': 'TSV file containing the ngsderive junction-annotation summary', 'junctions': 'TSV file containing a detailed list of annotated junctions'}","title":"junction_annotation"},{"location":"tasks/ngsderive/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to annotate junctions for bam_index (File, required ): BAM index file corresponding to the input BAM gene_model (File, required ): Gene model as a GFF/GTF file","title":"Required"},{"location":"tasks/ngsderive/#defaults_4","text":"fuzzy_junction_match_range (Int, default=0); description : Consider found splices within +-k bases of a known splice event annotated; common : true min_intron (Int, default=50); description : Minimum size of intron to be considered a splice; common : true min_mapq (Int, default=30); description : Minimum MAPQ to consider for supporting reads; common : true min_reads (Int, default=2); description : Filter any junctions that don't have at least min_reads reads supporting them; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the summary TSV and junction files. 
The extensions .junction_summary.tsv and .junctions.tsv will be added.","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_4","text":"junction_summary (File) junctions (File)","title":"Outputs"},{"location":"tasks/ngsderive/#endedness","text":"description Derives the endedness of the input BAM file. Reports evidence for final result. outputs {'endedness_file': 'TSV file containing the ngsderive endedness report'}","title":"endedness"},{"location":"tasks/ngsderive/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/ngsderive/#required_5","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to derive endedness from","title":"Required"},{"location":"tasks/ngsderive/#defaults_5","text":"calc_rpt (Boolean, default=false); description : Calculate and output Reads-Per-Template. This will produce a more sophisticated estimate for endedness, but uses substantially more memory (can reach up to 200% of BAM size in memory consumption for some inputs).; common : true lenient (Boolean, default=false); description : Return a zero exit code on unknown results; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by value of calc_rpt and the size of the input. Specified in GB. num_reads (Int, default=-1); description : How many reads to analyze from the start of the file. Any n < 1 to parse whole file.; common : true outfile_name (String, default=basename(bam,\".bam\") + \".endedness.tsv\"): Name for the endedness TSV file paired_deviance (Float, default=0.0); description : Distance from 0.5 split between number of f+l- reads and f-l+ reads allowed to be called 'Paired-End'. 
Default of 0.0 only appropriate if the whole file is being processed.; common : true round_rpt (Boolean, default=false); description : Round RPT to the nearest INT before comparing to expected values. Appropriate if using --num-reads > 0.; common : true split_by_rg (Boolean, default=false); description : Contain one entry per read group; common : true","title":"Defaults"},{"location":"tasks/ngsderive/#outputs_5","text":"endedness_file (File)","title":"Outputs"},{"location":"tasks/picard/","text":"Homepage TODO looks like this file was missed when converting from a memory_gb parameter to a \"softcoded\" runtime block. When moving those, check tests/tools/test_picard.yaml . mark_duplicates description Marks duplicate reads in the input BAM file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard- help For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information. outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'duplicate_marked_bam_index': 'The .bai BAM index file associated with duplicate_marked_bam ', 'duplicate_marked_bam_md5': 'The md5sum of duplicate_marked_bam ', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates Defaults clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? For increased performance, if the input BAM does not have the DT tag, set to false . create_bam (Boolean, default=true); description : Enable BAM creation (true)? 
Or only output MarkDuplicates metrics (false)?; common : true duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\"); description : Strategy for scoring duplicates.; choices : ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB. optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex . prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the MarkDuplicates result files. The extensions .bam , .bam.bai , .bam.md5 , and .metrics.txt will be added. read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names. remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true , the output BAM will not contain any duplicate reads. remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true , the output BAM will not contain any sequencing duplicates (optical duplicates). 
tagging_policy (String, default=\"All\"); description : Tagging policy for the output BAM.; choices : ['DontTag', 'OpticalOnly', 'All'] validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs duplicate_marked_bam (File?) duplicate_marked_bam_index (File?) duplicate_marked_bam_md5 (File?) mark_duplicates_metrics (File) validate_bam description Validates the input BAM file for correct formatting using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard- outputs {'validate_report': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'validated_bam': 'The unmodified input BAM after it has been successfully validated'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to validate Optional reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation. Defaults ignore_list (Array[String], default=[]); description : List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help ).; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common : true index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE ? max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile . The Picard default is 100, a lower number can enable fast fail behavior memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\"): Name for the ValidateSamFile report file succeed_on_errors (Boolean, default=false); description : Succeed the task even if errors and/or warnings are detected; common : true succeed_on_warnings (Boolean, default=true); description : Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors ; common : true summary_mode (Boolean, default=false); description : Enable SUMMARY mode?; common : true validation_stringency (String, default=\"LENIENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs validate_report (File) sort description Sorts the input BAM file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard- outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order ', 'sorted_bam_index': 'The .bai BAM index file associated with sorted_bam ', 'sorted_bam_md5': 'The md5sum of sorted_bam '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to sort Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. 
sort_order (String, default=\"coordinate\"); description : Order by which to sort the input BAM; choices : ['queryname', 'coordinate', 'duplicate']; common : true validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs sorted_bam (File) sorted_bam_index (File) sorted_bam_md5 (File) merge_sam_files description Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order . external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard- outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs', 'merged_bam_index': 'The .bai BAM index file associated with merged_bam ', 'merged_bam_md5': 'The md5sum of merged_bam '} Inputs Required _runtime (Any, required ) bams (Array[File], required ): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order . prefix (String, required ): Prefix for the merged BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. Defaults memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. sort_order (String, default=\"coordinate\"); description : Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices : ['unsorted', 'queryname', 'coordinate', 'duplicate', 'unknown']; common : true threading (Boolean, default=true): Option to create a background thread to encode, compress and write to disk the output file. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Sets runtime.cpu = 2 if true . runtime.cpu = 1 if false . 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs merged_bam (File) merged_bam_index (File) merged_bam_md5 (File) clean_sam description Cleans the input BAM file. Cleans soft-clipping beyond end-of-reference, sets MAPQ=0 for unmapped reads. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard- outputs {'cleaned_bam': 'A cleaned version of the input BAM', 'cleaned_bam_index': 'The .bai BAM index file associated with cleaned_bam ', 'cleaned_bam_md5': 'The md5sum of cleaned_bam '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to clean Defaults memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".cleaned\"): Prefix for the cleaned BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs cleaned_bam (File) cleaned_bam_index (File) cleaned_bam_md5 (File) collect_wgs_metrics description Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard- outputs {'wgs_metrics': {'description': 'Output report of picard CollectWgsMetrics ', 'external_help': 'https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics'}} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate WGS metrics reference_fasta (File, required ): Gzipped reference genome in FASTA format Defaults memory_gb (Int, default=12): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
outfile_name (String, default=basename(bam,\".bam\") + \".CollectWgsMetrics.txt\"): Name for the metrics result file validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs wgs_metrics (File) collect_alignment_summary_metrics description Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard- outputs {'alignment_metrics': {'description': 'The text file output of CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of CollectAlignmentSummaryMetrics '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate alignment metrics Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectAlignmentSummaryMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs alignment_metrics (File) alignment_metrics_pdf (File) collect_gc_bias_metrics description Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard- outputs {'gc_bias_metrics': {'description': 'The full text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics'}, 'gc_bias_metrics_summary': {'description': 'The summary text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics'}, 'gc_bias_metrics_pdf': 'The PDF file output of CollectGcBiasMetrics '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate GC bias metrics reference_fasta (File, required ): Reference sequences in FASTA format Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectGcBiasMetrics\"): Prefix for the output report files. The extensions .txt , .summary.txt , and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs gc_bias_metrics (File) gc_bias_metrics_summary (File) gc_bias_metrics_pdf (File) collect_insert_size_metrics description Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard- outputs {'insert_size_metrics': {'description': 'The text file output of CollectInsertSizeMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of CollectInsertSizeMetrics '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate insert size metrics Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectInsertSizeMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs insert_size_metrics (File) insert_size_metrics_pdf (File) quality_score_distribution description Runs picard QualityScoreDistribution to calculate the range of quality scores and creates an accompanying chart external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard- outputs {'quality_score_distribution_txt': 'The text file output of QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of QualityScoreDistribution '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate quality score distribution Defaults memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".QualityScoreDistribution\"): Prefix for the output report files. The extensions .txt and .pdf will be added. validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT Outputs quality_score_distribution_txt (File) quality_score_distribution_pdf (File) bam_to_fastq description [Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq which is more performant and doesn't error on 'illegal mate states'. 
deprecated true Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ Defaults memory_gb (Int, default=56): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. paired (Boolean, default=true); description : Is the data Paired-End (true) or Single-End (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the file. The extension <extension> will be added. Outputs read_one_fastq_gz (File) read_two_fastq_gz (File?) merge_vcfs description Merges the input VCF files into a single VCF file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard outputs {'output_vcf': 'The merged VCF file', 'output_vcf_index': 'The index file associated with the merged VCF file'} Inputs Required _runtime (Any, required ) output_vcf_name (String, required ): Name for the merged VCF file vcfs (Array[File], required ): Input VCF format files to merge. May be gzipped or binary compressed. vcfs_indexes (Array[File], required ): Index files associated with the input VCF files Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
Outputs output_vcf (File) output_vcf_index (File) scatter_interval_list description Splits an interval list into smaller interval lists for parallel processing external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard outputs {'out': 'The split interval lists', 'interval_count': 'The number of split interval lists'} Inputs Required _runtime (Any, required ) interval_list (File, required ): Input interval list to split scatter_count (Int, required ): Number of interval lists to create Defaults sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate. subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\"); description : How to subdivide the intervals; choices : ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION'] unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. Merges overlapping or adjacent intervals. Outputs interval_lists_scatter (Array[File]) interval_count (Int) create_sequence_dictionary description Creates a sequence dictionary for the input FASTA file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard- outputs {'dictionary': 'Sequence dictionary produced by picard CreateSequenceDictionary .'} Inputs Required _runtime (Any, required ) fasta (File, required ): Input FASTA format file from which to create dictionary Optional assembly_name (String?): Value to put in AS field of sequence dictionary fasta_url (String?): Value to put in UR field of sequence dictionary species (String?): Value to put in SP field of sequence dictionary Defaults memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(fasta,\".fa\") + \".dict\"): Name for the CreateSequenceDictionary dictionary file Outputs dictionary (File)","title":"Picard"},{"location":"tasks/picard/#mark_duplicates","text":"description Marks duplicate reads in the input BAM file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037052812-MarkDuplicates-Picard- help For non-primary reads and unmapped mates of duplicate reads to be included in duplicate analysis, input BAM must be collated. See external_help for more information. outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'duplicate_marked_bam_index': 'The .bai BAM index file associated with duplicate_marked_bam ', 'duplicate_marked_bam_md5': 'The md5sum of duplicate_marked_bam ', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}}","title":"mark_duplicates"},{"location":"tasks/picard/#inputs","text":"","title":"Inputs"},{"location":"tasks/picard/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates","title":"Required"},{"location":"tasks/picard/#defaults","text":"clear_dt (Boolean, default=true): Clear the DT tag from the input BAM? For increased performance, if the input BAM does not have the DT tag, set to false . create_bam (Boolean, default=true); description : Enable BAM creation (true)? Or only output MarkDuplicates metrics (false)?; common : true duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\"); description : Strategy for scoring duplicates.; choices : ['SUM_OF_BASE_QUALITIES', 'TOTAL_MAPPED_REFERENCE_LENGTH', 'RANDOM'] modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from the default memory allocation. Default memory allocation is determined by the size of the input BAM. Specified in GB. optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_name_regex . prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the MarkDuplicates result files. The extensions .bam , .bam.bai , .bam.md5 , and .metrics.txt will be added. read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\"): Regular expression for extracting tile names, x coordinates, and y coordinates from read names. The default works for typical Illumina read names. remove_duplicates (Boolean, default=false): Remove duplicate reads from the output BAM? If true , the output BAM will not contain any duplicate reads. remove_sequencing_duplicates (Boolean, default=false): Remove sequencing duplicates (i.e. optical duplicates) from the output BAM? If true , the output BAM will not contain any sequencing duplicates (optical duplicates). 
tagging_policy (String, default=\"All\"); description : Tagging policy for the output BAM.; choices : ['DontTag', 'OpticalOnly', 'All'] validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs","text":"duplicate_marked_bam (File?) duplicate_marked_bam_index (File?) duplicate_marked_bam_md5 (File?) mark_duplicates_metrics (File)","title":"Outputs"},{"location":"tasks/picard/#validate_bam","text":"description Validates the input BAM file for correct formatting using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440611-ValidateSamFile-Picard- outputs {'validate_report': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'validated_bam': 'The unmodified input BAM after it has been successfully validated'}","title":"validate_bam"},{"location":"tasks/picard/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/picard/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to validate","title":"Required"},{"location":"tasks/picard/#optional","text":"reference_fasta (File?): Reference genome in FASTA format. Presence of the reference FASTA allows for NM tag validation.","title":"Optional"},{"location":"tasks/picard/#defaults_1","text":"ignore_list (Array[String], default=[]); description : List of Picard errors and warnings to ignore. Possible values can be found on the GATK website (see external_help ).; external_help : https://gatk.broadinstitute.org/hc/en-us/articles/360035891231-Errors-in-SAM-or-BAM-files-can-be-diagnosed-with-ValidateSamFile; common : true index_validation_stringency_less_exhaustive (Boolean, default=false): Set INDEX_VALIDATION_STRINGENCY=LESS_EXHAUSTIVE ? max_errors (Int, default=2147483647): Set the value of MAX_OUTPUT for picard ValidateSamFile . 
The Picard default is 100; a lower number can enable fast-fail behavior. memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\"): Name for the ValidateSamFile report file succeed_on_errors (Boolean, default=false); description : Succeed the task even if errors and/or warnings are detected; common : true succeed_on_warnings (Boolean, default=true); description : Succeed the task if warnings are detected and there are no errors. Overridden by succeed_on_errors ; common : true summary_mode (Boolean, default=false); description : Enable SUMMARY mode?; common : true validation_stringency (String, default=\"LENIENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_1","text":"validate_report (File)","title":"Outputs"},{"location":"tasks/picard/#sort","text":"description Sorts the input BAM file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036510732-SortSam-Picard- outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order ', 'sorted_bam_index': 'The .bai BAM index file associated with sorted_bam ', 'sorted_bam_md5': 'The md5sum of sorted_bam '}","title":"sort"},{"location":"tasks/picard/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/picard/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to sort","title":"Required"},{"location":"tasks/picard/#defaults_2","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added. sort_order (String, default=\"coordinate\"); description : Order by which to sort the input BAM; choices : ['queryname', 'coordinate', 'duplicate']; common : true validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_2","text":"sorted_bam (File) sorted_bam_index (File) sorted_bam_md5 (File)","title":"Outputs"},{"location":"tasks/picard/#merge_sam_files","text":"description Merges the input BAM files into a single BAM file. All input BAMs are assumed to be sorted according to sort_order . external_help https://gatk.broadinstitute.org/hc/en-us/articles/360057440751-MergeSamFiles-Picard- outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs', 'merged_bam_index': 'The .bai BAM index file associated with merged_bam ', 'merged_bam_md5': 'The md5sum of merged_bam '}","title":"merge_sam_files"},{"location":"tasks/picard/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/picard/#required_3","text":"_runtime (Any, required ) bams (Array[File], required ): Input BAMs to merge. All BAMs are assumed to be sorted according to sort_order . prefix (String, required ): Prefix for the merged BAM file and accessory files. The extensions .bam , .bam.bai , and .bam.md5 will be added.","title":"Required"},{"location":"tasks/picard/#defaults_3","text":"memory_gb (Int, default=40): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
sort_order (String, default=\"coordinate\"); description : Sort order for the output merged BAM. It is assumed all input BAMs share this order.; choices : ['unsorted', 'queryname', 'coordinate', 'duplicate', 'unknown']; common : true threading (Boolean, default=true): Option to create a background thread to encode, compress and write to disk the output file. The threaded version uses about 20% more CPU and decreases runtime by ~20% when writing out a compressed BAM file. Sets runtime.cpu = 2 if true . runtime.cpu = 1 if false . validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_3","text":"merged_bam (File) merged_bam_index (File) merged_bam_md5 (File)","title":"Outputs"},{"location":"tasks/picard/#clean_sam","text":"description Cleans the input BAM file. Cleans soft-clipping beyond end-of-reference, sets MAPQ=0 for unmapped reads. external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036885571-CleanSam-Picard- outputs {'cleaned_bam': 'A cleaned version of the input BAM', 'cleaned_bam_index': 'The .bai BAM index file associated with cleaned_bam ', 'cleaned_bam_md5': 'The md5sum of cleaned_bam '}","title":"clean_sam"},{"location":"tasks/picard/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/picard/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to clean","title":"Required"},{"location":"tasks/picard/#defaults_4","text":"memory_gb (Int, default=25): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".cleaned\"): Prefix for the cleaned BAM file and accessory files. 
The extensions .bam , .bam.bai , and .bam.md5 will be added. validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_4","text":"cleaned_bam (File) cleaned_bam_index (File) cleaned_bam_md5 (File)","title":"Outputs"},{"location":"tasks/picard/#collect_wgs_metrics","text":"description Runs picard CollectWgsMetrics to collect metrics about the fractions of reads that pass base- and mapping-quality filters as well as coverage (read-depth) levels external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037226132-CollectWgsMetrics-Picard- outputs {'wgs_metrics': {'description': 'Output report of picard CollectWgsMetrics ', 'external_help': 'https://broadinstitute.github.io/picard/picard-metric-definitions.html#CollectWgsMetrics.WgsMetrics'}}","title":"collect_wgs_metrics"},{"location":"tasks/picard/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/picard/#required_5","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate WGS metrics reference_fasta (File, required ): Gzipped reference genome in FASTA format","title":"Required"},{"location":"tasks/picard/#defaults_5","text":"memory_gb (Int, default=12): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
outfile_name (String, default=basename(bam,\".bam\") + \".CollectWgsMetrics.txt\"): Name for the metrics result file validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_5","text":"wgs_metrics (File)","title":"Outputs"},{"location":"tasks/picard/#collect_alignment_summary_metrics","text":"description Runs picard CollectAlignmentSummaryMetrics to calculate metrics detailing the quality of the read alignments as well as the proportion of the reads that passed machine signal-to-noise threshold quality filters external_help https://gatk.broadinstitute.org/hc/en-us/articles/360040507751-CollectAlignmentSummaryMetrics-Picard- outputs {'alignment_metrics': {'description': 'The text file output of CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of CollectAlignmentSummaryMetrics '}","title":"collect_alignment_summary_metrics"},{"location":"tasks/picard/#inputs_6","text":"","title":"Inputs"},{"location":"tasks/picard/#required_6","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate alignment metrics","title":"Required"},{"location":"tasks/picard/#defaults_6","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectAlignmentSummaryMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_6","text":"alignment_metrics (File) alignment_metrics_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#collect_gc_bias_metrics","text":"description Runs picard CollectGcBiasMetrics to collect information about the relative proportions of guanine (G) and cytosine (C) nucleotides external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037593931-CollectGcBiasMetrics-Picard- outputs {'gc_bias_metrics': {'description': 'The full text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasDetailMetrics'}, 'gc_bias_metrics_summary': {'description': 'The summary text file output of CollectGcBiasMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#GcBiasSummaryMetrics'}, 'gc_bias_metrics_pdf': 'The PDF file output of CollectGcBiasMetrics '}","title":"collect_gc_bias_metrics"},{"location":"tasks/picard/#inputs_7","text":"","title":"Inputs"},{"location":"tasks/picard/#required_7","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate GC bias metrics reference_fasta (File, required ): Reference sequences in FASTA format","title":"Required"},{"location":"tasks/picard/#defaults_7","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectGcBiasMetrics\"): Prefix for the output report files. The extensions .txt , .summary.txt , and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_7","text":"gc_bias_metrics (File) gc_bias_metrics_summary (File) gc_bias_metrics_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#collect_insert_size_metrics","text":"description Runs picard CollectInsertSizeMetrics to collect metrics for validating library construction including the insert size distribution and read orientation of Paired-End libraries external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037055772-CollectInsertSizeMetrics-Picard- outputs {'insert_size_metrics': {'description': 'The text file output of CollectInsertSizeMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of CollectInsertSizeMetrics '}","title":"collect_insert_size_metrics"},{"location":"tasks/picard/#inputs_8","text":"","title":"Inputs"},{"location":"tasks/picard/#required_8","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate insert size metrics","title":"Required"},{"location":"tasks/picard/#defaults_8","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".CollectInsertSizeMetrics\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_8","text":"insert_size_metrics (File) insert_size_metrics_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#quality_score_distribution","text":"description Runs picard QualityScoreDistribution to calculate the range of quality scores and creates an accompanying chart external_help https://gatk.broadinstitute.org/hc/en-us/articles/360037057312-QualityScoreDistribution-Picard- outputs {'quality_score_distribution_txt': 'The text file output of QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of QualityScoreDistribution '}","title":"quality_score_distribution"},{"location":"tasks/picard/#inputs_9","text":"","title":"Inputs"},{"location":"tasks/picard/#required_9","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file for which to calculate quality score distribution","title":"Required"},{"location":"tasks/picard/#defaults_9","text":"memory_gb (Int, default=8): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".QualityScoreDistribution\"): Prefix for the output report files. The extensions .txt and .pdf will be added. 
validation_stringency (String, default=\"SILENT\"); description : Validation stringency for parsing the input BAM.; choices : ['STRICT', 'LENIENT', 'SILENT']; tool_default : STRICT","title":"Defaults"},{"location":"tasks/picard/#outputs_9","text":"quality_score_distribution_txt (File) quality_score_distribution_pdf (File)","title":"Outputs"},{"location":"tasks/picard/#bam_to_fastq","text":"description [Deprecated] This WDL task converts the input BAM file into FASTQ format files. This task has been deprecated in favor of samtools.bam_to_fastq which is more performant and doesn't error on 'illegal mate states'. deprecated true","title":"bam_to_fastq"},{"location":"tasks/picard/#inputs_10","text":"","title":"Inputs"},{"location":"tasks/picard/#required_10","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ","title":"Required"},{"location":"tasks/picard/#defaults_10","text":"memory_gb (Int, default=56): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. paired (Boolean, default=true); description : Is the data Paired-End (true) or Single-End (false)?; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the file. 
The extension <extension> will be added.","title":"Defaults"},{"location":"tasks/picard/#outputs_10","text":"read_one_fastq_gz (File) read_two_fastq_gz (File?)","title":"Outputs"},{"location":"tasks/picard/#merge_vcfs","text":"description Merges the input VCF files into a single VCF file external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036713331-MergeVcfs-Picard outputs {'output_vcf': 'The merged VCF file', 'output_vcf_index': 'The index file associated with the merged VCF file'}","title":"merge_vcfs"},{"location":"tasks/picard/#inputs_11","text":"","title":"Inputs"},{"location":"tasks/picard/#required_11","text":"_runtime (Any, required ) output_vcf_name (String, required ): Name for the merged VCF file vcfs (Array[File], required ): Input VCF format files to merge. May be gzipped or binary compressed. vcfs_indexes (Array[File], required ): Index files associated with the input VCF files","title":"Required"},{"location":"tasks/picard/#defaults_11","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/picard/#outputs_11","text":"output_vcf (File) output_vcf_index (File)","title":"Outputs"},{"location":"tasks/picard/#scatter_interval_list","text":"description Splits an interval list into smaller interval lists for parallel processing external_help https://gatk.broadinstitute.org/hc/en-us/articles/360036897212-IntervalListTools-Picard outputs {'out': 'The split interval lists', 'interval_count': 'The number of split interval lists'}","title":"scatter_interval_list"},{"location":"tasks/picard/#inputs_12","text":"","title":"Inputs"},{"location":"tasks/picard/#required_12","text":"_runtime (Any, required ) interval_list (File, required ): Input interval list to split scatter_count (Int, required ): Number of interval lists to create","title":"Required"},{"location":"tasks/picard/#defaults_12","text":"sort (Boolean, default=true): Should the output interval lists be sorted? Sorts by coordinate. subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\"); description : How to subdivide the intervals; choices : ['BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW', 'INTERVAL_SUBDIVISION', 'BALANCING_WITHOUT_INTERVAL_SUBDIVISION'] unique (Boolean, default=true): Should the output interval lists contain unique intervals? Implies sort=true. 
Merges overlapping or adjacent intervals.","title":"Defaults"},{"location":"tasks/picard/#outputs_12","text":"interval_lists_scatter (Array[File]) interval_count (Int)","title":"Outputs"},{"location":"tasks/picard/#create_sequence_dictionary","text":"description Creates a sequence dictionary for the input FASTA file using Picard external_help https://gatk.broadinstitute.org/hc/en-us/articles/13832748622491-CreateSequenceDictionary-Picard- outputs {'dictionary': 'Sequence dictionary produced by picard CreateSequenceDictionary .'}","title":"create_sequence_dictionary"},{"location":"tasks/picard/#inputs_13","text":"","title":"Inputs"},{"location":"tasks/picard/#required_13","text":"_runtime (Any, required ) fasta (File, required ): Input FASTA format file from which to create dictionary","title":"Required"},{"location":"tasks/picard/#optional_1","text":"assembly_name (String?): Value to put in AS field of sequence dictionary fasta_url (String?): Value to put in UR field of sequence dictionary species (String?): Value to put in SP field of sequence dictionary","title":"Optional"},{"location":"tasks/picard/#defaults_13","text":"memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(fasta,\".fa\") + \".dict\"): Name for the CreateSequenceDictionary dictionary file","title":"Defaults"},{"location":"tasks/picard/#outputs_13","text":"dictionary (File)","title":"Outputs"},{"location":"tasks/qualimap/","text":"Homepage rnaseq description Runs QualiMap's rnaseq tool on the input BAM file. Note that we don't expose the -p parameter. It sets the strandedness protocol of the sample; in practice, however, it only disables certain calculations. We do not expose the parameter so that the full suite of calculations is always performed. 
outputs {'raw_summary': \"Raw text summary of QualiMap's results. Can be parsed by MultiQC.\", 'raw_coverage': \"Raw text of QualiMap's coverage analysis results. Can be parsed by MultiQC.\", 'results': 'Gzipped tar archive of all QualiMap output files'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap rnaseq on gtf (File, required ): GTF features file. Gzipped or uncompressed. Defaults memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Is the BAM name sorted? QualiMap has an inefficient sorting algorithm. To save resources, we recommend collating your input BAM before QualiMap and setting this parameter to true.; common : true paired_end (Boolean, default=true); description : Is the BAM paired end?; common : true prefix (String, default=basename(bam,\".bam\") + \".qualimap_rnaseq_results\"): Prefix for the results directory and output tarball. The extension .qualimap_rnaseq_results.tar.gz will be added. Outputs raw_summary (File) raw_coverage (File) results (File) bamqc description [Deprecated] This WDL task runs QualiMap's bamqc tool on the input BAM file. This task has been deprecated due to memory leak issues. Use at your own risk: some samples can consume over 1 TB of RAM. deprecated true Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap bamqc on Defaults memory_gb (Int, default=32): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the file. 
The extension <extension> will be added. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. Outputs results (File)","title":"Qualimap"},{"location":"tasks/qualimap/#rnaseq","text":"description Runs QualiMap's rnaseq tool on the input BAM file. Note that we don't expose the -p parameter. It sets the strandedness protocol of the sample; in practice, however, it only disables certain calculations. We do not expose the parameter so that the full suite of calculations is always performed. outputs {'raw_summary': \"Raw text summary of QualiMap's results. Can be parsed by MultiQC.\", 'raw_coverage': \"Raw text of QualiMap's coverage analysis results. Can be parsed by MultiQC.\", 'results': 'Gzipped tar archive of all QualiMap output files'}","title":"rnaseq"},{"location":"tasks/qualimap/#inputs","text":"","title":"Inputs"},{"location":"tasks/qualimap/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap rnaseq on gtf (File, required ): GTF features file. Gzipped or uncompressed.","title":"Required"},{"location":"tasks/qualimap/#defaults","text":"memory_gb (Int, default=16): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Is the BAM name sorted? QualiMap has an inefficient sorting algorithm. To save resources, we recommend collating your input BAM before QualiMap and setting this parameter to true.; common : true paired_end (Boolean, default=true); description : Is the BAM paired end?; common : true prefix (String, default=basename(bam,\".bam\") + \".qualimap_rnaseq_results\"): Prefix for the results directory and output tarball. 
The extension .qualimap_rnaseq_results.tar.gz will be added.","title":"Defaults"},{"location":"tasks/qualimap/#outputs","text":"raw_summary (File) raw_coverage (File) results (File)","title":"Outputs"},{"location":"tasks/qualimap/#bamqc","text":"description [Deprecated] This WDL task runs QualiMap's bamqc tool on the input BAM file. This task has been deprecated due to memory leak issues. Use at your own risk: some samples can consume over 1 TB of RAM. deprecated true","title":"bamqc"},{"location":"tasks/qualimap/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/qualimap/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to run qualimap bamqc on","title":"Required"},{"location":"tasks/qualimap/#defaults_1","text":"memory_gb (Int, default=32): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=1): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the file. The extension <extension> will be added. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments.","title":"Defaults"},{"location":"tasks/qualimap/#outputs_1","text":"results (File)","title":"Outputs"},{"location":"tasks/read_group/","text":"Read groups are defined in the SAM spec ID: \"Read group identifier. Each Read Group must have a unique ID. The value of ID is used in the RG tags of alignment records.\", BC: \"Barcode sequence identifying the sample or library. This value is the expected barcode bases as read by the sequencing machine in the absence of errors. 
If there are several barcodes for the sample/library (e.g., one on each end of the template), the recommended implementation concatenates all the barcodes separating them with hyphens ( - ).\", CN: \"Name of sequencing center producing the read.\", DS: \"Description.\", DT: \"Date the run was produced (ISO8601 date or date/time).\", FO: \"Flow order. The array of nucleotide bases that correspond to the nucleotides used for each flow of each read. Multi-base flows are encoded in IUPAC format, and non-nucleotide flows by various other characters. Format: /\\*|[ACMGRSVTWYHKDBN]+/\", KS: \"The array of nucleotide bases that correspond to the key sequence of each read.\", LB: \"Library.\", PG: \"Programs used for processing the read group.\", PI: \"Predicted median insert size, rounded to the nearest integer.\", PL: \"Platform/technology used to produce the reads. Valid values: CAPILLARY, DNBSEQ (MGI/BGI), ELEMENT, HELICOS, ILLUMINA, IONTORRENT, LS454, ONT (Oxford Nanopore), PACBIO (Pacific Biosciences), SINGULAR, SOLID, and ULTIMA. This field should be omitted when the technology is not in this list (though the PM field may still be present in this case) or is unknown.\", PM: \"Platform model. Free-form text providing further details of the platform/technology used.\", PU: \"Platform unit (e.g., flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.\", SM: \"Sample. 
Use pool name where a pool is being sequenced.\" An example input JSON entry for read_group might look like this: { \"read_group\": { \"ID\": \"rg1\", \"PI\": 150, \"PL\": \"ILLUMINA\", \"SM\": \"Sample\", \"LB\": \"Sample\" } } ReadGroup_to_string description Stringifies a ReadGroup struct outputs {'stringified_read_group': 'Input ReadGroup as a string'} Inputs Required _runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to stringify Outputs stringified_read_group (String) get_ReadGroups description Gets read group information from a BAM file and writes it out as JSON which is converted to a WDL struct. outputs {'read_groups': 'An array of ReadGroup structs containing read group information.'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to get read groups from Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs read_groups (Array[ReadGroup]) validate_ReadGroup description Validate a ReadGroup struct's fields are defined outputs {'check': 'Dummy output to indicate success and enable call-caching'} Inputs Required _runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to validate Defaults required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified. restrictive (Boolean, default=true): If true, run a less permissive validation of field values. Otherwise, check against SAM spec-defined values. 
Outputs check (String)","title":"Read group"},{"location":"tasks/read_group/#readgroup_to_string","text":"description Stringifies a ReadGroup struct outputs {'stringified_read_group': 'Input ReadGroup as a string'}","title":"ReadGroup_to_string"},{"location":"tasks/read_group/#inputs","text":"","title":"Inputs"},{"location":"tasks/read_group/#required","text":"_runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to stringify","title":"Required"},{"location":"tasks/read_group/#outputs","text":"stringified_read_group (String)","title":"Outputs"},{"location":"tasks/read_group/#get_readgroups","text":"description Gets read group information from a BAM file and writes it out as JSON which is converted to a WDL struct. outputs {'read_groups': 'An array of ReadGroup structs containing read group information.'}","title":"get_ReadGroups"},{"location":"tasks/read_group/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/read_group/#required_1","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to get read groups from","title":"Required"},{"location":"tasks/read_group/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/read_group/#outputs_1","text":"read_groups (Array[ReadGroup])","title":"Outputs"},{"location":"tasks/read_group/#validate_readgroup","text":"description Validate a ReadGroup struct's fields are defined outputs {'check': 'Dummy output to indicate success and enable call-caching'}","title":"validate_ReadGroup"},{"location":"tasks/read_group/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/read_group/#required_2","text":"_runtime (Any, required ) read_group (ReadGroup, required ): ReadGroup struct to validate","title":"Required"},{"location":"tasks/read_group/#defaults_1","text":"required_fields (Array[String], default=[]): Array of read group fields that must be defined. The ID field is always required and does not need to be specified. restrictive (Boolean, default=true): If true, run a less permissive validation of field values. Otherwise, check against SAM spec-defined values.","title":"Defaults"},{"location":"tasks/read_group/#outputs_2","text":"check (String)","title":"Outputs"},{"location":"tasks/sambamba/","text":"Homepage index description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to index Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. 
Not recommended for cluster environments.; common : true Outputs bam_index (File) merge description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} Inputs Required _runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true Outputs merged_bam (File) sort description Sorts the input BAM file outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order '} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to sort Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file. The extension .bam will be added. queryname_sort (Boolean, default=false); description : If true, sort the BAM by queryname. 
If false, sort by coordinate.; common : true Outputs sorted_bam (File) markdup description Marks duplicate reads in the input BAM file outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'mark_duplicates_metrics': 'Duplicate marking metrics output from sambamba'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the markdup result files. The extensions markdup.bam will be added. remove_duplicates (Boolean, default=false); description : If true, remove duplicates instead of marking them.; common : true Outputs duplicate_marked_bam (File) duplicate_marked_bam_index (File) markdup_log (File) flagstat description Produces a report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' sambamba flagstat STDOUT redirected to a file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. 
Not recommended for cluster environments.; common : true Outputs flagstat_report (File)","title":"Sambamba"},{"location":"tasks/sambamba/#index","text":"description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"}","title":"index"},{"location":"tasks/sambamba/#inputs","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to index","title":"Required"},{"location":"tasks/sambamba/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs","text":"bam_index (File)","title":"Outputs"},{"location":"tasks/sambamba/#merge","text":"description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'}","title":"merge"},{"location":"tasks/sambamba/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_1","text":"_runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added.","title":"Required"},{"location":"tasks/sambamba/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_1","text":"merged_bam (File)","title":"Outputs"},{"location":"tasks/sambamba/#sort","text":"description Sorts the input BAM file outputs {'sorted_bam': 'The input BAM after it has been sorted according to sort_order '}","title":"sort"},{"location":"tasks/sambamba/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to sort","title":"Required"},{"location":"tasks/sambamba/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the sorted BAM file. The extension .bam will be added. queryname_sort (Boolean, default=false); description : If true, sort the BAM by queryname. 
If false, sort by coordinate.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_2","text":"sorted_bam (File)","title":"Outputs"},{"location":"tasks/sambamba/#markdup","text":"description Marks duplicate reads in the input BAM file outputs {'duplicate_marked_bam': 'The input BAM with computationally determined duplicates marked.', 'mark_duplicates_metrics': 'Duplicate marking metrics output from sambamba'}","title":"markdup"},{"location":"tasks/sambamba/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_3","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file in which to mark duplicates","title":"Required"},{"location":"tasks/sambamba/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=basename(bam,\".bam\")): Prefix for the markdup result files. The extensions markdup.bam will be added. 
remove_duplicates (Boolean, default=false); description : If true, remove duplicates instead of marking them.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_3","text":"duplicate_marked_bam (File) duplicate_marked_bam_index (File) markdup_log (File)","title":"Outputs"},{"location":"tasks/sambamba/#flagstat","text":"description Produces a report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' sambamba flagstat STDOUT redirected to a file'}","title":"flagstat"},{"location":"tasks/sambamba/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/sambamba/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for","title":"Required"},{"location":"tasks/sambamba/#defaults_4","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/sambamba/#outputs_4","text":"flagstat_report (File)","title":"Outputs"},{"location":"tasks/samtools/","text":"Homepage quickcheck description Runs Samtools quickcheck on the input BAM file. This checks that the BAM file appears to be intact, e.g. header exists and the end-of-file marker exists. outputs {'check': 'Dummy output to enable caching'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to quickcheck Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. Outputs check (String) split description Runs Samtools split on the input BAM file. This splits the BAM by read group into one or more output files. It optionally errors if there are reads present that do not belong to a read group. Inputs Required _runtime (Any, required ) bam (File, required ); description : Input BAM format file to split; stream : true Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the split BAM files. The extensions will contain read group IDs, and will end in .bam . reject_unaccounted (Boolean, default=true); description : If true, error if there are reads present that do not have read group information.; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs split_bams (Array[File]) flagstat description Produces a samtools flagstat report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' samtools flagstat STDOUT redirected to a file'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true Outputs flagstat_report (File) index description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to index Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs bam_index (File) subsample description Randomly subsamples the input BAM, in order to produce an output BAM with approximately the desired number of reads. help A desired_reads greater than zero must be supplied. A desired_reads <= 0 will result in task failure. Sampling is probabalistic and will be approximate to desired_reads . Read count will not be exact. A sampled_bam will not be produced if the input BAM read count is less than or equal to desired_reads . outputs {'orig_read_count': 'A TSV report containing the original read count before subsampling. If subsampling was requested but the input BAM had less than desired_reads , no read count will be filled in (instead there will be a dash ).', 'sampled_bam': 'The subsampled input BAM. Only present if subsampling was performed.'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to subsample desired_reads (Int, required ): How many reads should be in the ouput BAM? Output BAM read count will be approximate to this value. Must be greater than zero. A desired_reads <= 0 will result in task failure. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .sampled.bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs orig_read_count (File) sampled_bam (File?) filter description Filters a BAM based on its bitwise flag value. help This task is a wrapper around samtools view . This task will fail if there are no reads in the output BAM. This can happen either because the input BAM was empty or because the supplied bitwise_filter was too strict. If you want to down-sample a BAM, use the subsample task instead. Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to filter bitwise_filter (FlagFilter, required ): A set of 4 possible read filters to apply. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".filtered\"): Prefix for the filtered BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs filtered_bam (File) merge description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} Inputs Required _runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. 
Optional new_header (File?): Use the lines of FILE as @ headers to be copied to the merged BAM, replacing any header lines that would otherwise be copied from the first BAM file in the list. (File may actually be in SAM format, though any alignment records it may contain are ignored.) Defaults attach_rg (Boolean, default=true); description : Attach an RG tag to each alignment. The tag value is inferred from file names.; common : true combine_pg (Boolean, default=true); description : Similarly to combine_rg : for each @PG ID in the set of files to merge, use the @PG line of the first file we find that ID in rather than adding a suffix to differentiate similar IDs.; common : true combine_rg (Boolean, default=true); description : When several input files contain @RG headers with the same ID, emit only one of them (namely, the header line from the first file we find that ID in) to the merged output file. Combining these similar headers is usually the right thing to do when the files being merged originated from the same file. Without -c , all @RG headers appear in the output file, with random suffixes added to their IDs where necessary to differentiate them.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Are all input BAMs queryname sorted (true)? Or are all input BAMs coordinate sorted (false)?; common : true ncpu (Int, default=2); description : Number of cores to allocate for task; common : true region (String, default=\"\"): Merge files in the specified region (Format: chr:start-end ) use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true Outputs merged_bam (File) addreplacerg description Adds or replaces read group tags outputs {'tagged_bam': 'The transformed input BAM after read group modifications have been applied'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to add read group information Optional read_group_id (String?): Allows you to specify the read group ID of an existing @RG line and applies it to the reads specified by the orphan_only option Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true orphan_only (Boolean, default=true); description : Only add RG tags to orphans (true)? Or also overwrite all existing RG tags (including any in the header) (false)?; common : true overwrite_header_record (Boolean, default=false); description : Overwrite an existing @RG line, if a new one with the same ID value is provided?; common : true prefix (String, default=basename(bam,\".bam\") + \".addreplacerg\"): Prefix for the BAM file. The extension .bam will be added. read_group_line (Array[String], default=[]); description : Allows you to specify a read group line to append to (or replace in) the header and applies it to the reads specified by the orphan_only option. Each String in the Array should correspond to one field of the read group line. Tab literals will be inserted between each entry in the final BAM. Only one read group line can be supplied per invocation of this task.; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs tagged_bam (File) collate description Runs samtools collate on the input BAM file. Shuffles and groups reads together by their names. 
outputs {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order)'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to collate Defaults fast_mode (Boolean, default=true); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".collated\"): Prefix for the collated BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs collated_bam (File) bam_to_fastq description Converts an input BAM file into FASTQ(s) using samtools fastq . help If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads. An exit-code of 42 indicates that no reads were present in the output FASTQs. An exit-code of 43 indicates that unexpected reads were discovered in the input BAM. output {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order). Only generated if retain_collated_bam and paired_end are both true. Has the name ~{prefix}.collated.bam .', 'read_one_fastq_gz': 'Gzipped FASTQ file with 1st reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R1.fastq.gz .', 'read_two_fastq_gz': 'Gzipped FASTQ file with 2nd reads in pair. Only generated if paired_end is true and interleaved is false. 
Has the name ~{prefix}.R2.fastq.gz .', 'singleton_reads_fastq_gz': 'Gzipped FASTQ containing singleton reads. Only generated if paired_end and output_singletons are both true. Has the name ~{prefix}.singleton.fastq.gz .', 'interleaved_reads_fastq_gz': 'Interleaved gzipped Paired-End FASTQ. Only generated if paired_end and interleaved are both true. Has the name ~{prefix}.fastq.gz . The conditions under which this output and single_end_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).', 'single_end_reads_fastq_gz': 'A gzipped FASTQ containing all reads. Only generated if paired_end is false. Has the name ~{prefix}.fastq.gz . The conditions under which this output and interleaved_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ(s) Defaults append_read_number (Boolean, default=true); description : Append /1 and /2 suffixes to read names?; common : true bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): A set of 4 possible read filters to apply during conversion to FASTQ. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the output FASTQs. collated (Boolean, default=false); description : Is the BAM collated (or name-sorted)? If collated == true , then the input BAM will be run through samtools fastq without preprocessing. If collated == false , then samtools collate must be run on the input BAM before conversion to FASTQ. 
Ignored if paired_end == false .; common : true fail_on_unexpected_reads (Boolean, default=false): The definition of 'unexpected' depends on whether the values of paired_end and output_singletons are true or false. If paired_end is false , no reads are considered unexpected, and every read (not caught by bitwise_filter ) will be present in the resulting FASTQ regardless of first / last bit settings. This setting will be ignored in that case. If paired_end is true then reads that don't satisfy first XOR last are considered unexpected (i.e. reads that have neither first nor last set or reads that have both first and last set). If output_singletons is false , singleton reads are considered unexpected. A singleton read is a read with either the first or the last bit set (but not both) and that possesses a unique QNAME; i.e. it is a read without a pair when all reads are expected to be paired. But if output_singletons is true , these singleton reads will be output as their own FASTQ instead of causing the task to fail. If fail_on_unexpected_reads is false , then all the above cases will be ignored. Any 'unexpected' reads will be silently discarded.; description : Should the task fail if reads with an unexpected first / last bit setting are discovered?; common : true fast_mode (Boolean, default=!retain_collated_bam); description : Fast mode for samtools collate ? If true , this removes secondary and supplementary reads during the collate step. If false , secondary and supplementary reads will be retained in the collated_bam output (if created). Defaults to the opposite of retain_collated_bam . Ignored if collated == true or paired_end == false .; common : true interleaved (Boolean, default=false); description : Create an interleaved FASTQ file from Paired-End data? Ignored if paired_end == false .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true output_singletons (Boolean, default=false): Output singleton reads as their own FASTQ? Ignored if paired_end == false . paired_end (Boolean, default=true); description : Is the data Paired-End? If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads.; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the collated BAM and FASTQ files. The extensions .collated.bam and [,.R1,.R2,.singleton].fastq.gz will be added. retain_collated_bam (Boolean, default=false); description : Save the collated BAM to disk and output it (true)? This slows performance and substantially increases storage requirements. Be aware that collated BAMs occupy much more space than either position sorted or name sorted BAMs (due to the compression algorithm). Ignored if collated == true or paired_end == false .; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs collated_bam (File?) read_one_fastq_gz (File?) read_two_fastq_gz (File?) singleton_reads_fastq_gz (File?) interleaved_reads_fastq_gz (File?) single_end_reads_fastq_gz (File?) fixmate description Runs samtools fixmate on the name-collated input BAM file. This fills in mate coordinates and insert size fields among other tags and fields. help This task assumes a name-sorted or name-collated input BAM. If you have a position-sorted BAM, please use the position_sorted_fixmate task. This task runs fixmate and outputs a BAM in the same order as the input. 
outputs {'fixmate_bam': 'The BAM resulting from running samtools fixmate on the input BAM'} Inputs Required _runtime (Any, required ) bam (File, required ); description : Input BAM format file to add mate information. Must be name-sorted or name-collated.; stream : true Defaults add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair] extension (String, default=\".bam\"); description : File format extension to use for output file.; choices : ['.bam', '.cram']; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension specified with the extension parameter will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs fixmate_bam (File) position_sorted_fixmate description Runs samtools fixmate on the position-sorted input BAM file and output a position-sorted BAM. fixmate fills in mate coordinates and insert size fields among other tags and fields. samtools fixmate assumes a name-sorted or name-collated input BAM. If you already have a collated BAM, please use the fixmate task. 
This task collates the input BAM, runs fixmate , and then re-sorts the output into a position-sorted BAM. Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to add mate information. Must be position-sorted. Defaults add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair]? fast_mode (Boolean, default=false); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension .bam will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs fixmate_bam (File) markdup description [DEPRECATED] Runs samtools markdup on the position-sorted input BAM file. This creates a report and optionally a new BAM with duplicate reads marked. help This task assumes samtools fixmate has already been run on the input BAM. If it has not, then the output may be incorrect. 
A name-sorted or collated BAM can be run through the fixmate task (and then position-sorted prior to this task) or a position-sorted BAM can be run through the position_sorted_fixmate task. Deprecated due to extremely high memory usage for certain RNA-Seq samples when searching for optical duplicates. Use mark_duplicates in ./picard.wdl instead. deprecated true Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to mark duplicates in Defaults coordinates_order (String, default=\"txy\"); description : The order of the elements captured in the read_coords_regex regular expression. Default is txy where t is a part of the read name selected for string comparison and x / y are the coordinates used for optical duplicate detection. Ignored if optical_distance == 0 .; choices : ['txy', 'tyx', 'xyt', 'yxt', 'xty', 'ytx', 'xy', 'yx'] create_bam (Boolean, default=true): Create a new BAM with duplicate reads marked? If false , then only a markdup report will be generated. duplicate_count (Boolean, default=false): Record the original primary read duplication count (include itself) in a dc tag? Ignored if create_bam == false . duplicates_of_duplicates_check (Boolean, default=false): Check duplicates of duplicates for correctness? Performs further checks to make sure all optical duplicates are found. Also operates on mark_duplicates_with_do_tag tagging where reads may be tagged with the best quality read. Disabling this option can speed up duplicate marking when there are a great many duplicates for each original read. Ignored if create_bam == false or optical_distance == 0 . include_qc_fails (Boolean, default=false): Include reads that have the QC-failed flag set in duplicate marking? This can increase the number of duplicates found. Ignored if create_bam == false . json (Boolean, default=false): Output a JSON report instead of a text report? Either are parseable by MultiQC. 
mark_duplicates_with_do_tag (Boolean, default=false): Mark duplicates with the do ( d uplicate o riginal) tag? The do tag contains the name of the \"original\" read that was duplicated. Ignored if create_bam == false . mark_supp_or_sec_or_unmapped_as_duplicates (Boolean, default=false): Mark supplementary, secondary, or unmapped alignments of duplicates as duplicates? As this takes a quick second pass over the data it will increase running time. Ignored if create_bam == false . max_readlen (Int, default=300): Expected maximum read length. modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. When set above 0 , duplicate reads are tagged with dt:Z:SQ for optical duplicates and dt:Z:LB otherwise. Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_coords_regex . If changing read_coords_regex , make sure that coordinates_order matches. prefix (String, default=basename(bam,\".bam\") + \".markdup\"): Prefix for the output file. TODO read_coords_regex (String, default=\"[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)\"); description : Regular expression to extract read coordinates from the QNAME field. 
This takes a POSIX regular expression for at least x and y to be used in optical duplicate marking. It can also include another part of the read name to test for equality, e.g. lane:tile elements. Elements wanted are captured with parentheses. The default is meant to capture information from Illumina style read names. Ignored if optical_distance == 0 . If changing read_coords_regex , make sure that coordinates_order matches.; tool_default : ([!-9;-?A-~]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+):([0-9]+):([0-9]+) remove_duplicates (Boolean, default=false): Remove duplicates from the output BAM? Ignored if create_bam == false . use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_read_groups (Boolean, default=false): Only mark duplicates within the same Read Group? Ignored if create_bam == false . Outputs markdup_report (File) markdup_bam (File?) faidx description Creates a .fai FASTA index for the input FASTA outputs {'fasta_index': \"A .fai FASTA index associated with the input FASTA. Filename will be basename(fasta) + '.fai' .\"} Inputs Required _runtime (Any, required ) fasta (File, required ): Input FASTA format file to index. Optionally gzip compressed. Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true Outputs fasta_index (File)","title":"Samtools"},{"location":"tasks/samtools/#quickcheck","text":"description Runs Samtools quickcheck on the input BAM file. This checks that the BAM file appears to be intact, e.g. header exists and the end-of-file marker exists. 
outputs {'check': 'Dummy output to enable caching'}","title":"quickcheck"},{"location":"tasks/samtools/#inputs","text":"","title":"Inputs"},{"location":"tasks/samtools/#required","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to quickcheck","title":"Required"},{"location":"tasks/samtools/#defaults","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/samtools/#outputs","text":"check (String)","title":"Outputs"},{"location":"tasks/samtools/#split","text":"description Runs Samtools split on the input BAM file. This splits the BAM by read group into one or more output files. It optionally errors if there are reads present that do not belong to a read group.","title":"split"},{"location":"tasks/samtools/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_1","text":"_runtime (Any, required ) bam (File, required ); description : Input BAM format file to split; stream : true","title":"Required"},{"location":"tasks/samtools/#defaults_1","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the split BAM files. The extensions will contain read group IDs, and will end in .bam . reject_unaccounted (Boolean, default=true); description : If true, error if there are reads present that do not have read group information.; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_1","text":"split_bams (Array[File])","title":"Outputs"},{"location":"tasks/samtools/#flagstat","text":"description Produces a samtools flagstat report containing statistics about the alignments based on the bit flags set in the BAM outputs {'flagstat_report': ' samtools flagstat STDOUT redirected to a file'}","title":"flagstat"},{"location":"tasks/samtools/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_2","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to generate flagstat for","title":"Required"},{"location":"tasks/samtools/#defaults_2","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true outfile_name (String, default=basename(bam,\".bam\") + \".flagstat.txt\"): Name for the flagstat report file use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_2","text":"flagstat_report (File)","title":"Outputs"},{"location":"tasks/samtools/#index","text":"description Creates a .bai BAM index for the input BAM outputs {'bam_index': \"A .bai BAM index associated with the input BAM. Filename will be basename(bam) + '.bai' .\"}","title":"index"},{"location":"tasks/samtools/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_3","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to index","title":"Required"},{"location":"tasks/samtools/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_3","text":"bam_index (File)","title":"Outputs"},{"location":"tasks/samtools/#subsample","text":"description Randomly subsamples the input BAM, in order to produce an output BAM with approximately the desired number of reads. help A desired_reads greater than zero must be supplied. A desired_reads <= 0 will result in task failure. Sampling is probabilistic and will be approximate to desired_reads . Read count will not be exact. A sampled_bam will not be produced if the input BAM read count is less than or equal to desired_reads . outputs {'orig_read_count': 'A TSV report containing the original read count before subsampling. If subsampling was requested but the input BAM had fewer than desired_reads , no read count will be filled in (instead there will be a dash ).', 'sampled_bam': 'The subsampled input BAM. Only present if subsampling was performed.'}","title":"subsample"},{"location":"tasks/samtools/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_4","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to subsample desired_reads (Int, required ): How many reads should be in the output BAM? Output BAM read count will be approximate to this value. Must be greater than zero. A desired_reads <= 0 will result in task failure.","title":"Required"},{"location":"tasks/samtools/#defaults_4","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. 
The extension .sampled.bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_4","text":"orig_read_count (File) sampled_bam (File?)","title":"Outputs"},{"location":"tasks/samtools/#filter","text":"description Filters a BAM based on its bitwise flag value. help This task is a wrapper around samtools view . This task will fail if there are no reads in the output BAM. This can happen either because the input BAM was empty or because the supplied bitwise_filter was too strict. If you want to down-sample a BAM, use the subsample task instead.","title":"filter"},{"location":"tasks/samtools/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_5","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to filter bitwise_filter (FlagFilter, required ): A set of 4 possible read filters to apply. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information).","title":"Required"},{"location":"tasks/samtools/#defaults_5","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".filtered\"): Prefix for the filtered BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_5","text":"filtered_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#merge","text":"description Merges multiple sorted BAMs into a single BAM outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'}","title":"merge"},{"location":"tasks/samtools/#inputs_6","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_6","text":"_runtime (Any, required ) bams (Array[File], required ): An array of BAMs to merge into one combined BAM prefix (String, required ): Prefix for the BAM file. The extension .bam will be added.","title":"Required"},{"location":"tasks/samtools/#optional","text":"new_header (File?): Use the lines of FILE as @ headers to be copied to the merged BAM, replacing any header lines that would otherwise be copied from the first BAM file in the list. (File may actually be in SAM format, though any alignment records it may contain are ignored.)","title":"Optional"},{"location":"tasks/samtools/#defaults_6","text":"attach_rg (Boolean, default=true); description : Attach an RG tag to each alignment. The tag value is inferred from file names.; common : true combine_pg (Boolean, default=true); description : Similarly to combine_rg : for each @PG ID in the set of files to merge, use the @PG line of the first file we find that ID in rather than adding a suffix to differentiate similar IDs.; common : true combine_rg (Boolean, default=true); description : When several input files contain @RG headers with the same ID, emit only one of them (namely, the header line from the first file we find that ID in) to the merged output file. Combining these similar headers is usually the right thing to do when the files being merged originated from the same file. 
Without -c , all @RG headers appear in the output file, with random suffixes added to their IDs where necessary to differentiate them.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. name_sorted (Boolean, default=false); description : Are all input BAMs queryname sorted (true)? Or are all input BAMs coordinate sorted (false)?; common : true ncpu (Int, default=2); description : Number of cores to allocate for task; common : true region (String, default=\"\"): Merge files in the specified region (Format: chr:start-end ) use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_6","text":"merged_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#addreplacerg","text":"description Adds or replaces read group tags outputs {'tagged_bam': 'The transformed input BAM after read group modifications have been applied'}","title":"addreplacerg"},{"location":"tasks/samtools/#inputs_7","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_7","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to add read group information","title":"Required"},{"location":"tasks/samtools/#optional_1","text":"read_group_id (String?): Allows you to specify the read group ID of an existing @RG line and applies it to the reads specified by the orphan_only option","title":"Optional"},{"location":"tasks/samtools/#defaults_7","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true orphan_only (Boolean, default=true); description : Only add RG tags to orphans (true)? 
Or also overwrite all existing RG tags (including any in the header) (false)?; common : true overwrite_header_record (Boolean, default=false); description : Overwrite an existing @RG line, if a new one with the same ID value is provided?; common : true prefix (String, default=basename(bam,\".bam\") + \".addreplacerg\"): Prefix for the BAM file. The extension .bam will be added. read_group_line (Array[String], default=[]); description : Allows you to specify a read group line to append to (or replace in) the header and applies it to the reads specified by the orphan_only option. Each String in the Array should correspond to one field of the read group line. Tab literals will be inserted between each entry in the final BAM. Only one read group line can be supplied per invocation of this task.; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_7","text":"tagged_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#collate","text":"description Runs samtools collate on the input BAM file. Shuffles and groups reads together by their names. outputs {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order)'}","title":"collate"},{"location":"tasks/samtools/#inputs_8","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_8","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to collate","title":"Required"},{"location":"tasks/samtools/#defaults_8","text":"fast_mode (Boolean, default=true); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. 
Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".collated\"): Prefix for the collated BAM file. The extension .bam will be added. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_8","text":"collated_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#bam_to_fastq","text":"description Converts an input BAM file into FASTQ(s) using samtools fastq . help If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads. An exit-code of 42 indicates that no reads were present in the output FASTQs. An exit-code of 43 indicates that unexpected reads were discovered in the input BAM. output {'collated_bam': 'A collated BAM (reads sharing a name next to each other, no other guarantee of sort order). Only generated if retain_collated_bam and paired_end are both true. Has the name ~{prefix}.collated.bam .', 'read_one_fastq_gz': 'Gzipped FASTQ file with 1st reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R1.fastq.gz .', 'read_two_fastq_gz': 'Gzipped FASTQ file with 2nd reads in pair. Only generated if paired_end is true and interleaved is false. Has the name ~{prefix}.R2.fastq.gz .', 'singleton_reads_fastq_gz': 'Gzipped FASTQ containing singleton reads. Only generated if paired_end and output_singletons are both true. Has the name ~{prefix}.singleton.fastq.gz .', 'interleaved_reads_fastq_gz': 'Interleaved gzipped Paired-End FASTQ. Only generated if paired_end and interleaved are both true. Has the name ~{prefix}.fastq.gz . 
The conditions under which this output and single_end_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).', 'single_end_reads_fastq_gz': 'A gzipped FASTQ containing all reads. Only generated if paired_end is false. Has the name ~{prefix}.fastq.gz . The conditions under which this output and interleaved_reads_fastq_gz are created are mutually exclusive, but since they share the same literal filename they will always evaluate to the same file (or undefined if neither are created).'}","title":"bam_to_fastq"},{"location":"tasks/samtools/#inputs_9","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_9","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to convert to FASTQ(s)","title":"Required"},{"location":"tasks/samtools/#defaults_9","text":"append_read_number (Boolean, default=true); description : Append /1 and /2 suffixes to read names?; common : true bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): A set of 4 possible read filters to apply during conversion to FASTQ. This is a FlagFilter object (see ../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the output FASTQs. collated (Boolean, default=false); description : Is the BAM collated (or name-sorted)? If collated == true , then the input BAM will be run through samtools fastq without preprocessing. If collated == false , then samtools collate must be run on the input BAM before conversion to FASTQ. Ignored if paired_end == false .; common : true fail_on_unexpected_reads (Boolean, default=false): The definition of 'unexpected' depends on whether the values of paired_end and output_singletons are true or false. 
If paired_end is false , no reads are considered unexpected, and every read (not caught by bitwise_filter ) will be present in the resulting FASTQ regardless of first / last bit settings. This setting will be ignored in that case. If paired_end is true then reads that don't satisfy first XOR last are considered unexpected (i.e. reads that have neither first nor last set or reads that have both first and last set). If output_singletons is false , singleton reads are considered unexpected. A singleton read is a read with either the first or the last bit set (but not both) and that possesses a unique QNAME; i.e. it is a read without a pair when all reads are expected to be paired. But if output_singletons is true , these singleton reads will be output as their own FASTQ instead of causing the task to fail. If fail_on_unexpected_reads is false , then all the above cases will be ignored. Any 'unexpected' reads will be silently discarded.; description : Should the task fail if reads with an unexpected first / last bit setting are discovered?; common : true fast_mode (Boolean, default=!retain_collated_bam); description : Fast mode for samtools collate ? If true , this removes secondary and supplementary reads during the collate step. If false , secondary and supplementary reads will be retained in the collated_bam output (if created). Defaults to the opposite of retain_collated_bam . Ignored if collated == true or paired_end == false .; common : true interleaved (Boolean, default=false); description : Create an interleaved FASTQ file from Paired-End data? Ignored if paired_end == false .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true output_singletons (Boolean, default=false): Output singleton reads as their own FASTQ? Ignored if paired_end == false . paired_end (Boolean, default=true); description : Is the data Paired-End? If paired_end == false , then all reads in the BAM will be output to a single FASTQ file. Use bitwise_filter argument to remove any unwanted reads.; common : true prefix (String, default=basename(bam,\".bam\")): Prefix for the collated BAM and FASTQ files. The extensions .collated.bam and [,.R1,.R2,.singleton].fastq.gz will be added. retain_collated_bam (Boolean, default=false); description : Save the collated BAM to disk and output it (true)? This slows performance and substantially increases storage requirements. Be aware that collated BAMs occupy much more space than either position sorted or name sorted BAMs (due to the compression algorithm). Ignored if collated == true or paired_end == false .; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_9","text":"collated_bam (File?) read_one_fastq_gz (File?) read_two_fastq_gz (File?) singleton_reads_fastq_gz (File?) interleaved_reads_fastq_gz (File?) single_end_reads_fastq_gz (File?)","title":"Outputs"},{"location":"tasks/samtools/#fixmate","text":"description Runs samtools fixmate on the name-collated input BAM file. This fills in mate coordinates and insert size fields among other tags and fields. help This task assumes a name-sorted or name-collated input BAM. If you have a position-sorted BAM, please use the position_sorted_fixmate task. This task runs fixmate and outputs a BAM in the same order as the input. 
outputs {'fixmate_bam': 'The BAM resulting from running samtools fixmate on the input BAM'}","title":"fixmate"},{"location":"tasks/samtools/#inputs_10","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_10","text":"_runtime (Any, required ) bam (File, required ); description : Input BAM format file to add mate information. Must be name-sorted or name-collated.; stream : true","title":"Required"},{"location":"tasks/samtools/#defaults_10","text":"add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair] extension (String, default=\".bam\"); description : File format extension to use for output file.; choices : ['.bam', '.cram']; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension specified with the extension parameter will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_10","text":"fixmate_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#position_sorted_fixmate","text":"description Runs samtools fixmate on the position-sorted input BAM file and outputs a position-sorted BAM. fixmate fills in mate coordinates and insert size fields among other tags and fields. samtools fixmate assumes a name-sorted or name-collated input BAM. If you already have a collated BAM, please use the fixmate task. This task collates the input BAM, runs fixmate , and then re-sorts the output into a position-sorted BAM.","title":"position_sorted_fixmate"},{"location":"tasks/samtools/#inputs_11","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_11","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to add mate information. Must be position-sorted.","title":"Required"},{"location":"tasks/samtools/#defaults_11","text":"add_cigar (Boolean, default=true); description : Add template cigar ct tag; tool_default : false; common : true add_mate_score (Boolean, default=true); description : Add mate score tags. These are used by markdup to select the best reads to keep.; tool_default : false; common : true disable_flag_sanitization (Boolean, default=false): Disable all flag sanitization? disable_proper_pair_check (Boolean, default=false): Disable proper pair check [ensure one forward and one reverse read in each pair]? fast_mode (Boolean, default=false); description : Use fast mode (output primary alignments only)?; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. Default memory is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=2); description : Number of cores to allocate for task; common : true prefix (String, default=basename(bam,\".bam\") + \".fixmate\"): Prefix for the output file. The extension .bam will be added. remove_unaligned_and_secondary (Boolean, default=false): Remove unmapped and secondary reads use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_11","text":"fixmate_bam (File)","title":"Outputs"},{"location":"tasks/samtools/#markdup","text":"description [DEPRECATED] Runs samtools markdup on the position-sorted input BAM file. This creates a report and optionally a new BAM with duplicate reads marked. help This task assumes samtools fixmate has already been run on the input BAM. If it has not, then the output may be incorrect. A name-sorted or collated BAM can be run through the fixmate task (and then position-sorted prior to this task) or a position-sorted BAM can be run through the position_sorted_fixmate task. Deprecated due to extremely high memory usage for certain RNA-Seq samples when searching for optical duplicates. Use mark_duplicates in ./picard.wdl instead. deprecated true","title":"markdup"},{"location":"tasks/samtools/#inputs_12","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_12","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to mark duplicates in","title":"Required"},{"location":"tasks/samtools/#defaults_12","text":"coordinates_order (String, default=\"txy\"); description : The order of the elements captured in the read_coords_regex regular expression. Default is txy where t is a part of the read name selected for string comparison and x / y are the coordinates used for optical duplicate detection. 
Ignored if optical_distance == 0 .; choices : ['txy', 'tyx', 'xyt', 'yxt', 'xty', 'ytx', 'xy', 'yx'] create_bam (Boolean, default=true): Create a new BAM with duplicate reads marked? If false , then only a markdup report will be generated. duplicate_count (Boolean, default=false): Record the original primary read duplication count (include itself) in a dc tag? Ignored if create_bam == false . duplicates_of_duplicates_check (Boolean, default=false): Check duplicates of duplicates for correctness? Performs further checks to make sure all optical duplicates are found. Also operates on mark_duplicates_with_do_tag tagging where reads may be tagged with the best quality read. Disabling this option can speed up duplicate marking when there are a great many duplicates for each original read. Ignored if create_bam == false or optical_distance == 0 . include_qc_fails (Boolean, default=false): Include reads that have the QC-failed flag set in duplicate marking? This can increase the number of duplicates found. Ignored if create_bam == false . json (Boolean, default=false): Output a JSON report instead of a text report? Either are parseable by MultiQC. mark_duplicates_with_do_tag (Boolean, default=false): Mark duplicates with the do ( d uplicate o riginal) tag? The do tag contains the name of the \"original\" read that was duplicated. Ignored if create_bam == false . mark_supp_or_sec_or_unmapped_as_duplicates (Boolean, default=false): Mark supplementary, secondary, or unmapped alignments of duplicates as duplicates? As this takes a quick second pass over the data it will increase running time. Ignored if create_bam == false . max_readlen (Int, default=300): Expected maximum read length. modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. modify_memory_gb (Int, default=0): Add to or subtract from dynamic memory allocation. 
Default memory is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2); description : Number of cores to allocate for task; common : true optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates. If 0 , then optical duplicate marking is disabled. Suggested settings of 100 for HiSeq style platforms or about 2500 for NovaSeq ones. When set above 0 , duplicate reads are tagged with dt:Z:SQ for optical duplicates and dt:Z:LB otherwise. Calculation of distance depends on coordinate data embedded in the read names, typically produced by the Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without modifying read_coords_regex . If changing read_coords_regex , make sure that coordinates_order matches. prefix (String, default=basename(bam,\".bam\") + \".markdup\"): Prefix for the output file. TODO read_coords_regex (String, default=\"[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)\"); description : Regular expression to extract read coordinates from the QNAME field. This takes a POSIX regular expression for at least x and y to be used in optical duplicate marking. It can also include another part of the read name to test for equality, e.g. lane:tile elements. Elements wanted are captured with parentheses. The default is meant to capture information from Illumina style read names. Ignored if optical_distance == 0 . If changing read_coords_regex , make sure that coordinates_order matches.; tool_default : ([!-9;-?A-~]+:[0-9]+:[0-9]+:[0-9]+:[0-9]+):([0-9]+):([0-9]+) remove_duplicates (Boolean, default=false): Remove duplicates from the output BAM? Ignored if create_bam == false . use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true use_read_groups (Boolean, default=false): Only mark duplicates within the same Read Group? 
Ignored if create_bam == false .","title":"Defaults"},{"location":"tasks/samtools/#outputs_12","text":"markdup_report (File) markdup_bam (File?)","title":"Outputs"},{"location":"tasks/samtools/#faidx","text":"description Creates a .fai FASTA index for the input FASTA outputs {'fasta_index': \"A .fai FASTA index associated with the input FASTA. Filename will be basename(fasta) + '.fai' .\"}","title":"faidx"},{"location":"tasks/samtools/#inputs_13","text":"","title":"Inputs"},{"location":"tasks/samtools/#required_13","text":"_runtime (Any, required ) fasta (File, required ): Input FASTA format file to index. Optionally gzip compressed.","title":"Required"},{"location":"tasks/samtools/#defaults_13","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments. Not recommended for cluster environments.; common : true","title":"Defaults"},{"location":"tasks/samtools/#outputs_13","text":"fasta_index (File)","title":"Outputs"},{"location":"tasks/star/","text":"Homepage build_star_db description Runs STAR's build command to generate a STAR format reference for alignment outputs {'star_db': 'A gzipped TAR file containing the STAR reference files. Suitable as the star_db_tar_gz input to the alignment task.'} Inputs Required _runtime (Any, required ) gtf (File, required ): GTF format feature file reference_fasta (File, required ): The FASTA format reference file for the genome Defaults db_name (String, default=\"star_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true genomeChrBinNbits (Int, default=18): =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. 
For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). genomeSAindexNbases (Int, default=14): length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1) . genomeSAsparseD (Int, default=1): suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction. genomeSuffixLengthMax (Int, default=-1): maximum length of the suffixes, has to be longer than read length. -1 = infinite. memory_gb (Int, default=50): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8); description : Number of cores to allocate for task; common : true sjdbGTFchrPrefix (String, default=\"-\"); description : prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSEMBL annotations with UCSC genomes); common : true sjdbGTFfeatureExon (String, default=\"exon\"): feature type in GTF file to be used as exons for building transcripts sjdbGTFtagExonParentGene (String, default=\"gene_id\"): GTF attribute name for parent gene ID sjdbGTFtagExonParentGeneName (String, default=\"gene_name\"): GTF attribute name for parent gene name sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\"): GTF attribute name for parent gene type sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\"): GTF attribute name for parent transcript ID sjdbOverhang (Int, default=125); description : length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1). [STAR default] : 100 . 
[WDL default] : 125 .; common : true use_all_cores (Boolean, default=false); description : Use all cores? Recommended for cloud environments.; common : true Outputs star_db (File) alignment description Runs the STAR aligner on a set of RNA-Seq FASTQ files external_help https://github.com/alexdobin/STAR/blob/2.7.11b/doc/STARmanual.pdf outputs {'star_log': 'Summary mapping statistics after mapping job is complete. The statistics are calculated for each read (Single- or Paired-End) and then summed or averaged over all reads. Note that STAR counts a Paired-End read as one read. Most of the information is collected about the UNIQUE mappers. Each splicing is counted in the numbers of splices, which would correspond to summing the counts in SJ.out.tab. The mismatch/indel error rates are calculated on a per base basis, i.e. as total number of mismatches/indels in all unique mappers divided by the total number of mapped bases.', 'star_bam': 'STAR aligned BAM', 'star_junctions': 'File contains high confidence collapsed splice junctions in tab-delimited format. Note that STAR defines the junction start/end as intronic bases, while many other software define them as exonic bases. See meta.external_help for file specification.', 'star_chimeric_junctions': 'Tab delimited file containing chimeric reads and associated metadata. See meta.external_help for file specification.'} Inputs Required _runtime (Any, required ) prefix (String, required ): Prefix for the BAM and other STAR files. The extensions .Aligned.out.bam , .Log.final.out , .SJ.out.tab , and .Chimeric.out.junction will be added. read_one_fastqs_gz (Array[File], required ): An array of gzipped FASTQ files containing read one information star_db_tar_gz (File, required ): A gzipped TAR file containing the STAR reference files. The name of the root directory which was archived must match the archive's filename without the .tar.gz extension. 
Optional read_groups (String?): A string containing the read group information to output in the BAM file. If including multiple read group fields per-read group, they should be space delimited. Read groups should be comma separated, with a space on each side (i.e. ' , '). The ID field must come first for each read group and must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. Example: ID:rg1 PU:flowcell1.lane1 SM:sample1 PL:illumina LB:sample1_lib1 , ID:rg2 PU:flowcell1.lane2 SM:sample1 PL:illumina LB:sample1_lib1 . These two read groups could be associated with the following four FASTQs: sample1.rg1_R1.fastq,sample1.rg2_R1.fastq and sample1.rg1_R2.fastq,sample1.rg2_R2.fastq Defaults alignEndsProtrude (Pair[Int,String], default=(0, \"ConcordantPair\")); description : allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate. left : maximum number of protrusion bases allowed. right : see choices below.; choices : {'ConcordantPair': 'report alignments with non-zero protrusion as concordant pairs', 'DiscordantPair': 'report alignments with non-zero protrusion as discordant pairs'} alignEndsType (String, default=\"Local\"); description : type of read ends alignment; choices : {'Local': 'standard local alignment with soft-clipping allowed', 'EndToEnd': 'force end-to-end read alignment, do not soft-clip', 'Extend5pOfRead1': 'fully extend only the 5p of the read1, all other ends: local alignment', 'Extend5pOfReads12': 'fully extend only the 5p of the both read1 and read2, all other ends: local alignment'} alignInsertionFlush (String, default=\"None\"); description : how to flush ambiguous insertion positions; choices : {'None': 'insertions are not flushed', 'Right': 'insertions are flushed to the right'}; common : true alignIntronMax (Int, default=500000); description : maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins. 
[STAR default] : 0 . [WDL default] : 500000 .; common : true alignIntronMin (Int, default=21); description : minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion; common : true alignMatesGapMax (Int, default=1000000); description : maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins. [STAR default] : 0 . [WDL default] : 1000000 ; common : true alignSJDBoverhangMin (Int, default=1); description : minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments. [STAR default] : 3 . [WDL default] : 1 .; common : true alignSJoverhangMin (Int, default=5); description : minimum overhang (i.e. block size) for spliced alignments; common : true alignSJstitchMismatchNmax (SJ_Motifs, default={\"noncanonical_motifs\": 0, \"GT_AG_and_CT_AC_motif\": -1, \"GC_AG_and_CT_GC_motif\": 0, \"AT_AC_and_GT_AT_motif\": 0}): maximum number of mismatches for stitching of the splice junctions (-1: no limit) for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif alignSoftClipAtReferenceEnds (String, default=\"Yes\"); description : allow the soft-clipping of the alignments past the end of the chromosomes; choices : {'Yes': 'allow', 'No': 'prohibit, useful for compatibility with Cufflinks'}; common : true alignSplicedMateMapLmin (Int, default=0): minimum mapped length for a read mate that is spliced alignSplicedMateMapLminOverLmate (Float, default=0.66): alignSplicedMateMapLmin normalized to mate length alignTranscriptsPerReadNmax (Int, default=10000): max number of different alignments per read to consider alignTranscriptsPerWindowNmax (Int, default=100): max number of transcripts per window alignWindowsPerReadNmax (Int, default=10000): max number of windows per read chimFilter (String, default=\"banGenomicN\"); description : different filters for chimeric alignments; choices : {'None': 'no filtering', 
'banGenomicN': 'Ns are not allowed in the genome sequence around the chimeric junction'} chimJunctionOverhangMin (Int, default=20); description : minimum overhang for a chimeric junction; common : true chimMainSegmentMultNmax (Int, default=10); description : maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.; common : true chimMultimapNmax (Int, default=0); description : maximum number of chimeric multi-alignments. 0 : use the old scheme for chimeric detection which only considered unique alignments; common : true chimMultimapScoreRange (Int, default=1): the score range for multi-mapping chimeras below the best chimeric score. Only works with --chimMultimapNmax > 1. chimNonchimScoreDropMin (Int, default=20): to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value chimOutJunctionFormat (String, default=\"plain\"); description : formatting type for the Chimeric.out.junction file; choices : {'plain': 'no comment lines/headers', 'comments': 'comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping'}; common : true chimOutType (String, default=\"Junctions\"); description : type of chimeric output; choices : {'Junctions': 'Chimeric.out.junction', 'WithinBAM_HardClip': 'output into main aligned BAM files (Aligned.*.bam). Hard-clipping in the CIGAR for supplemental chimeric alignments.', 'WithinBAM_SoftClip': 'output into main aligned BAM files (Aligned.*.bam). 
Soft-clipping in the CIGAR for supplemental chimeric alignments.'}; common : true chimScoreDropMax (Int, default=20); description : max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length; common : true chimScoreJunctionNonGTAG (Int, default=-1): penalty for a non-GT/AG chimeric junction chimScoreMin (Int, default=0); description : minimum total (summed) score of the chimeric segments; common : true chimScoreSeparation (Int, default=10): minimum difference (separation) between the best chimeric score and the next one chimSegmentMin (Int, default=0); description : minimum length of chimeric segment length, if ==0, no chimeric output; common : true chimSegmentReadGapMax (Int, default=0); description : maximum gap in the read sequence between chimeric segments; common : true clip3pAdapterMMp (Pair[Float,Float], default=(0.1, 0.1)): max proportion of mismatches for 3p adapter clipping for each mate. left applies to read one and right applies to read two. clip3pAdapterSeq (Pair[String,String], default=(\"None\", \"None\")); description : adapter sequences to clip from 3p of each mate. left applies to read one and right applies to read two.; choices : {'None': 'No 3p adapter trimming will be performed', 'sequence': 'A nucleotide sequence string of any length, matching the regex /[ATCG]+/ ', 'polyA': 'polyA sequence with the length equal to read length'}; common : true clip3pAfterAdapterNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate after the adapter clipping. left applies to read one and right applies to read two. clip3pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate. left applies to read one and right applies to read two. clip5pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 5p of each mate. left applies to read one and right applies to read two. 
clipAdapterType (String, default=\"Hamming\"); description : adapter clipping type; choices : {'Hamming': 'adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp', 'CellRanger4': '5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin \u0160o\u0161i\u0107: https://github.com/Martinsos/opal', 'None': 'no adapter clipping, all other clip* parameters are disregarded'} limitOutSJcollapsed (Int, default=1000000): max number of collapsed junctions limitOutSJoneRead (Int, default=1000): max number of junctions for one read (including all multi-mappers) limitSjdbInsertNsj (Int, default=1000000): maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8); description : Number of cores to allocate for task; common : true outFilterIntronMotifs (String, default=\"None\"); description : filter alignment using their motifs; choices : {'None': 'no filtering', 'RemoveNoncanonical': 'filter out alignments that contain non-canonical junctions', 'RemoveNoncanonicalUnannotated': 'filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. 
The annotated non-canonical junctions will be kept.'}; common : true outFilterIntronStrands (String, default=\"RemoveInconsistentStrands\"); description : filter alignments; choices : {'None': 'no filtering', 'RemoveInconsistentStrands': 'remove alignments that have junctions with inconsistent strands'}; common : true outFilterMatchNmin (Int, default=0); description : alignment will be output only if the number of matched bases is higher than or equal to this value; common : true outFilterMatchNminOverLread (Float, default=0.66): same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for Paired-End reads) outFilterMismatchNmax (Int, default=10); description : alignment will be output only if it has no more mismatches than this value; common : true outFilterMismatchNoverLmax (Float, default=0.3): alignment will be output only if its ratio of mismatches to mapped length is less than or equal to this value outFilterMismatchNoverReadLmax (Float, default=1.0): alignment will be output only if its ratio of mismatches to read length is less than or equal to this value outFilterMultimapNmax (Int, default=20); description : maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as 'mapped to too many loci' in the Log.final.out. [STAR default] : 10 . 
[WDL default] : 20 .; common : true outFilterMultimapScoreRange (Int, default=1): the score range below the maximum score for multimapping alignments outFilterScoreMin (Int, default=0); description : alignment will be output only if its score is higher than or equal to this value; common : true outFilterScoreMinOverLread (Float, default=0.66): same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for Paired-End reads) outFilterType (String, default=\"Normal\"); description : type of filtering; choices : {'Normal': 'standard filtering using only current alignment', 'BySJout': 'keep only those reads that contain junctions that passed filtering into SJ.out.tab'}; common : true outQSconversionAdd (Int, default=0): add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31) outSAMattrIHstart (Int, default=1): start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. outSAMattributes (String, default=\"NH HI AS nM NM MD XS\"); description : a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. [STAR defaults] : NH HI AS nM . [WDL default] : NH HI AS nM NM MD XS .; choices : {'NH': 'number of loci the read maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.', 'HI': 'multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.', 'AS': 'local alignment score, +1/-1 for matches/mismatches, score penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag.', 'nM': 'number of mismatches. For PE reads, sum over two mates.', 'NM': 'edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.', 'MD': 'string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.', 'jM': 'intron motifs for all junctions (i.e. 
N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.', 'jI': 'start and end of introns for all junctions (1-based).', 'XS': 'alignment strand according to --outSAMstrandField.', 'MC': \"mate's CIGAR string. Standard SAM tag.\", 'ch': 'marks all segments of all chimeric alignments for --chimOutType WithinBAM output.', 'cN': \"number of bases clipped from the read ends: 5' and 3'\"}; common : true outSAMflagAND (Int, default=65535): 0-65535 : sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagAND. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. outSAMflagOR (Int, default=0): 0-65535 : sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. outSAMmapqUnique (Int, default=254): 0-255 : the MAPQ value for unique mappers. Please note the STAR default (255) produces errors downstream, as a MAPQ value of 255 is reserved to indicate a missing value. The default of this task is 254, which is the highest valid MAPQ value, and possibly what the author of STAR intended. [STAR default] : 255 . [WDL default] : 254 . 
outSAMorder (String, default=\"Paired\"); description : type of sorting for the SAM output; choices : {'Paired': 'one mate after the other for all paired alignments', 'PairedKeepInputOrder': 'one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files'} outSAMreadID (String, default=\"Standard\"); description : read ID record type; choices : {'Standard': 'first word (until space) from the FASTx read ID line, removing /1,/2 from the end', 'Number': 'read number (index) in the FASTx file'} outSAMstrandField (String, default=\"intronMotif\"); description : Cufflinks-like strand field flag; choices : {'None': 'not used', 'intronMotif': 'strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out.'}; common : true outSAMtlen (String, default=\"left_plus\"); description : calculation method for the TLEN field in the SAM/BAM files; choices : {'left_plus': 'leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate', 'left_any': 'leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from left_plus for overlapping mates with protruding ends'} outSAMunmapped (String, default=\"Within\"); description : output of unmapped reads in the SAM format.; choices : {'None': 'no output [STAR default] ', 'Within': 'output unmapped reads within the main SAM file (i.e. Aligned.out.sam) [WDL default] '} outSJfilterCountTotalMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. 
Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterCountUniqueMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterDistToOtherSJmin (SJ_Motifs, default={\"noncanonical_motifs\": 10, \"GT_AG_and_CT_AC_motif\": 0, \"GC_AG_and_CT_GC_motif\": 5, \"AT_AC_and_GT_AT_motif\": 10}): minimum allowed distance to other junctions' donor/acceptor for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. Does not apply to annotated junctions. outSJfilterIntronMaxVsReadN (Array[Int], default=[50000, 100000, 200000]): maximum gap allowed for junctions supported by 1,2,3,,,N reads. i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000b. by >=4 reads any gap <=alignIntronMax. Does not apply to annotated junctions. outSJfilterOverhangMin (SJ_Motifs, default={\"noncanonical_motifs\": 30, \"GT_AG_and_CT_AC_motif\": 12, \"GC_AG_and_CT_GC_motif\": 12, \"AT_AC_and_GT_AT_motif\": 12}): minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Does not apply to annotated junctions. 
outSJfilterReads (String, default=\"All\"); description : which reads to consider for collapsed splice junctions output; choices : {'All': 'all reads, unique- and multi-mappers', 'Unique': 'uniquely mapping reads only'}; common : true peOverlapMMp (Float, default=0.01): maximum proportion of mismatched bases in the overlap area peOverlapNbasesMin (Int, default=0): minimum number of overlap bases to trigger mates merging and realignment. Specify >0 value to switch on the 'merging of overlapping mates' algorithm. readMapNumber (Int, default=-1); description : number of reads to map from the beginning of the file. -1 to map all reads; common : true readNameSeparator (String, default=\"/\"): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) readQualityScoreBase (Int, default=33): number to be subtracted from the ASCII code to get Phred quality score read_two_fastqs_gz (Array[File], default=[]); description : An array of gzipped FASTQ files containing read two information; common : true runRNGseed (Int, default=777); description : random number generator seed; common : true scoreDelBase (Int, default=-2): deletion extension penalty per base (in addition to scoreDelOpen) scoreDelOpen (Int, default=-2): deletion open penalty scoreGap (Int, default=0): splice junction penalty (independent on intron motif) scoreGapATAC (Int, default=-8): AT/AC and GT/AT junction penalty (in addition to scoreGap) scoreGapGCAG (Int, default=-4): GC/AG and CT/GC junction penalty (in addition to scoreGap) scoreGapNoncan (Int, default=-8): non-canonical junction penalty (in addition to scoreGap) scoreGenomicLengthLog2scale (Float, default=-0.25): extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength) scoreInsBase (Int, default=-2): insertion extension penalty per base (in addition to scoreInsOpen) scoreInsOpen (Int, default=-2): insertion open penalty 
scoreStitchSJshift (Int, default=1): maximum score reduction while searching for SJ boundaries in the stitching step seedMapMin (Int, default=5): min length of seeds to be mapped seedMultimapNmax (Int, default=10000): only pieces that map fewer than this value are utilized in the stitching procedure seedNoneLociPerWindow (Int, default=10): max number of one seed loci per window seedPerReadNmax (Int, default=1000): max number of seeds per read seedPerWindowNmax (Int, default=50): max number of seeds per window seedSearchLmax (Int, default=0): defines the maximum length of the seeds, if =0 seed length is not limited seedSearchStartLmax (Int, default=50): defines the search start point through the read - the read is split into pieces no longer than this value seedSearchStartLmaxOverLread (Float, default=1.0): seedSearchStartLmax normalized to read length (sum of mates' lengths for Paired-End reads) seedSplitMin (Int, default=12): min length of the seed sequences split by Ns or mate gap sjdbScore (Int, default=2); description : extra alignment score for alignments that cross database junctions; common : true twopass1readsN (Int, default=-1); description : number of reads to process for the 1st step. Use default ( -1 ) to map all reads in the first step; common : true twopassMode (String, default=\"Basic\"); description : 2-pass mapping mode; choices : {'None': '1-pass mapping [STAR default] ', 'Basic': 'basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly [WDL default] '}; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true winAnchorDistNbins (Int, default=9): max number of bins between two anchors that allows aggregation of anchors into one window winAnchorMultimapNmax (Int, default=50): max number of loci anchors are allowed to map to winBinNbits (Int, default=16): =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins winFlankNbins (Int, default=4): =log2(winFlank), where winFlank is the size of the left and right flanking regions for each window Outputs star_log (File) star_bam (File) star_junctions (File) star_chimeric_junctions (File?)","title":"Star"},{"location":"tasks/star/#build_star_db","text":"description Runs STAR's build command to generate a STAR format reference for alignment outputs {'star_db': 'A gzipped TAR file containing the STAR reference files. Suitable as the star_db_tar_gz input to the alignment task.'}","title":"build_star_db"},{"location":"tasks/star/#inputs","text":"","title":"Inputs"},{"location":"tasks/star/#required","text":"_runtime (Any, required ) gtf (File, required ): GTF format feature file reference_fasta (File, required ): The FASTA format reference file for the genome","title":"Required"},{"location":"tasks/star/#defaults","text":"db_name (String, default=\"star_db\"); description : Name for output in compressed, archived format. The suffix .tar.gz will be added.; common : true genomeChrBinNbits (Int, default=18): =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]). genomeSAindexNbases (Int, default=14): length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. 
For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1) . genomeSAsparseD (Int, default=1): suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction. genomeSuffixLengthMax (Int, default=-1): maximum length of the suffixes, has to be longer than read length. -1 = infinite. memory_gb (Int, default=50): RAM to allocate for task, specified in GB modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=8); description : Number of cores to allocate for task; common : true sjdbGTFchrPrefix (String, default=\"-\"); description : prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes); common : true sjdbGTFfeatureExon (String, default=\"exon\"): feature type in GTF file to be used as exons for building transcripts sjdbGTFtagExonParentGene (String, default=\"gene_id\"): GTF attribute name for parent gene ID sjdbGTFtagExonParentGeneName (String, default=\"gene_name\"): GTF attrbute name for parent gene name sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\"): GTF attrbute name for parent gene type sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\"): GTF attribute name for parent transcript ID sjdbOverhang (Int, default=125); description : length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1). [STAR default] : 100 . [WDL default] : 125 .; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true","title":"Defaults"},{"location":"tasks/star/#outputs","text":"star_db (File)","title":"Outputs"},{"location":"tasks/star/#alignment","text":"description Runs the STAR aligner on a set of RNA-Seq FASTQ files external_help https://github.com/alexdobin/STAR/blob/2.7.11b/doc/STARmanual.pdf outputs {'star_log': 'Summary mapping statistics after mapping job is complete. The statistics are calculated for each read (Single- or Paired-End) and then summed or averaged over all reads. Note that STAR counts a Paired-End read as one read. Most of the information is collected about the UNIQUE mappers. Each splicing is counted in the numbers of splices, which would correspond to summing the counts in SJ.out.tab. The mismatch/indel error rates are calculated on a per base basis, i.e. as total number of mismatches/indels in all unique mappers divided by the total number of mapped bases.', 'star_bam': 'STAR aligned BAM', 'star_junctions': 'File contains high confidence collapsed splice junctions in tab-delimited format. Note that STAR defines the junction start/end as intronic bases, while many other software define them as exonic bases. See meta.external_help for file specification.', 'star_chimeric_junctions': 'Tab delimited file containing chimeric reads and associated metadata. See meta.external_help for file specification.'}","title":"alignment"},{"location":"tasks/star/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/star/#required_1","text":"_runtime (Any, required ) prefix (String, required ): Prefix for the BAM and other STAR files. The extensions .Aligned.out.bam , .Log.final.out , .SJ.out.tab , and .Chimeric.out.junction will be added. read_one_fastqs_gz (Array[File], required ): An array of gzipped FASTQ files containing read one information star_db_tar_gz (File, required ): A gzipped TAR file containing the STAR reference files. 
The name of the root directory which was archived must match the archive's filename without the .tar.gz extension.","title":"Required"},{"location":"tasks/star/#optional","text":"read_groups (String?): A string containing the read group information to output in the BAM file. If including multiple read group fields per-read group, they should be space delimited. Read groups should be comma separated, with a space on each side (i.e. ' , '). The ID field must come first for each read group and must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. Example: ID:rg1 PU:flowcell1.lane1 SM:sample1 PL:illumina LB:sample1_lib1 , ID:rg2 PU:flowcell1.lane2 SM:sample1 PL:illumina LB:sample1_lib1 . These two read groups could be associated with the following four FASTQs: sample1.rg1_R1.fastq,sample1.rg2_R1.fastq and sample1.rg1_R2.fastq,sample1.rg2_R2.fastq","title":"Optional"},{"location":"tasks/star/#defaults_1","text":"alignEndsProtrude (Pair[Int,String], default=(0, \"ConcordantPair\")); description : allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of the start (end) of the -strand mate. left : maximum number of protrusion bases allowed. 
right : see choices below.; choices : {'ConcordantPair': 'report alignments with non-zero protrusion as concordant pairs', 'DiscordantPair': 'report alignments with non-zero protrusion as discordant pairs'} alignEndsType (String, default=\"Local\"); description : type of read ends alignment; choices : {'Local': 'standard local alignment with soft-clipping allowed', 'EndToEnd': 'force end-to-end read alignment, do not soft-clip', 'Extend5pOfRead1': 'fully extend only the 5p of the read1, all other ends: local alignment', 'Extend5pOfReads12': 'fully extend only the 5p of the both read1 and read2, all other ends: local alignment'} alignInsertionFlush (String, default=\"None\"); description : how to flush ambiguous insertion positions; choices : {'None': 'insertions are not flushed', 'Right': 'insertions are flushed to the right'}; common : true alignIntronMax (Int, default=500000); description : maximum intron size, if 0, max intron size will be determined by (2^winBinNbits) winAnchorDistNbins. [STAR default] : 0 . [WDL default] : 500000 .; common *: true alignIntronMin (Int, default=21); description : minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion; common : true alignMatesGapMax (Int, default=1000000); description : maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits) winAnchorDistNbins. [STAR default] : 0 . [WDL default] : 1000000 ; common *: true alignSJDBoverhangMin (Int, default=1); description : minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments. [STAR default] : 3 . [WDL default] : 1 .; common : true alignSJoverhangMin (Int, default=5); description : minimum overhang (i.e. 
block size) for spliced alignments; common : true alignSJstitchMismatchNmax (SJ_Motifs, default={\"noncanonical_motifs\": 0, \"GT_AG_and_CT_AC_motif\": -1, \"GC_AG_and_CT_GC_motif\": 0, \"AT_AC_and_GT_AT_motif\": 0}): maximum number of mismatches for stitching of the splice junctions (-1: no limit) for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif alignSoftClipAtReferenceEnds (String, default=\"Yes\"); description : allow the soft-clipping of the alignments past the end of the chromosomes; choices : {'Yes': 'allow', 'No': 'prohibit, useful for compatibility with Cufflinks'}; common : true alignSplicedMateMapLmin (Int, default=0): minimum mapped length for a read mate that is spliced alignSplicedMateMapLminOverLmate (Float, default=0.66): alignSplicedMateMapLmin normalized to mate length alignTranscriptsPerReadNmax (Int, default=10000): max number of different alignments per read to consider alignTranscriptsPerWindowNmax (Int, default=100): max number of transcripts per window alignWindowsPerReadNmax (Int, default=10000): max number of windows per read chimFilter (String, default=\"banGenomicN\"); description : different filters for chimeric alignments; choices : {'None': 'no filtering', 'banGenomicN': 'Ns are not allowed in the genome sequence around the chimeric junction'} chimJunctionOverhangMin (Int, default=20); description : minimum overhang for a chimeric junction; common : true chimMainSegmentMultNmax (Int, default=10); description : maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.; common : true chimMultimapNmax (Int, default=0); description : maximum number of chimeric multi-alignments. 0 : use the old scheme for chimeric detection which only considered unique alignments; common : true chimMultimapScoreRange (Int, default=1): the score range for multi-mapping chimeras below the best chimeric score. 
Only works with --chimMultimapNmax > 1. chimNonchimScoreDropMin (Int, default=20): to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value chimOutJunctionFormat (String, default=\"plain\"); description : formatting type for the Chimeric.out.junction file; choices : {'plain': 'no comment lines/headers', 'comments': 'comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping'}; common : true chimOutType (String, default=\"Junctions\"); description : type of chimeric output; choices : {'Junctions': 'Chimeric.out.junction', 'WithinBAM_HardClip': 'output into main aligned BAM files (Aligned. .bam). Hard-clipping in the CIGAR for supplemental chimeric alignments.', 'WithinBAM_SoftClip': 'output into main aligned BAM files (Aligned. .bam). Soft-clipping in the CIGAR for supplemental chimeric alignments.'}; common : true chimScoreDropMax (Int, default=20); description : max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length; common : true chimScoreJunctionNonGTAG (Int, default=-1): penalty for a non-GT/AG chimeric junction chimScoreMin (Int, default=0); description : minimum total (summed) score of the chimeric segments; common : true chimScoreSeparation (Int, default=10): minimum difference (separation) between the best chimeric score and the next one chimSegmentMin (Int, default=0); description : minimum length of chimeric segment length, if ==0, no chimeric output; common : true chimSegmentReadGapMax (Int, default=0); description : maximum gap in the read sequence between chimeric segments; common : true clip3pAdapterMMp (Pair[Float,Float], default=(0.1, 0.1)): max proportion of mismatches for 3p adapter clipping for each mate. left applies to read one and right applies to read two. 
clip3pAdapterSeq (Pair[String,String], default=(\"None\", \"None\")); description : adapter sequences to clip from 3p of each mate. left applies to read one and right applies to read two.; choices : {'None': 'No 3p adapter trimming will be performed', 'sequence': 'A nucleotide sequence string of any length, matching the regex /[ATCG]+/ ', 'polyA': 'polyA sequence with the length equal to read length'}; common : true clip3pAfterAdapterNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate after the adapter clipping. left applies to read one and right applies to read two. clip3pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 3p of each mate. left applies to read one and right applies to read two. clip5pNbases (Pair[Int,Int], default=(0, 0)): number of bases to clip from 5p of each mate. left applies to read one and right applies to read two. clipAdapterType (String, default=\"Hamming\"); description : adapter clipping type; choices : {'Hamming': 'adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp', 'CellRanger4': '5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin \u0160o\u0161i\u0107: https://github.com/Martinsos/opal', 'None': 'no adapter clipping, all other clip* parameters are disregarded'} limitOutSJcollapsed (Int, default=1000000): max number of collapsed junctions limitOutSJoneRead (Int, default=1000): max number of junctions for one read (including all multi-mappers) limitSjdbInsertNsj (Int, default=1000000): maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. 
ncpu (Int, default=8); description : Number of cores to allocate for task; common : true outFilterIntronMotifs (String, default=\"None\"); description : filter alignment using their motifs; choices : {'None': 'no filtering', 'RemoveNoncanonical': 'filter out alignments that contain non-canonical junctions', 'RemoveNoncanonicalUnannotated': 'filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.'}; common : true outFilterIntronStrands (String, default=\"RemoveInconsistentStrands\"); description : filter alignments; choices : {'None': 'no filtering', 'RemoveInconsistentStrands': 'remove alignments that have junctions with inconsistent strands'}; common : true outFilterMatchNmin (Int, default=0); description : alignment will be output only if the number of matched bases is higher than or equal to this value; common : true outFilterMatchNminOverLread (Float, default=0.66): same as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for Paired-End reads) outFilterMismatchNmax (Int, default=10); description : alignment will be output only if it has no more mismatches than this value; common : true outFilterMismatchNoverLmax (Float, default=0.3): alignment will be output only if its ratio of mismatches to mapped length is less than or equal to this value outFilterMismatchNoverReadLmax (Float, default=1.0): alignment will be output only if its ratio of mismatches to read length is less than or equal to this value outFilterMultimapNmax (Int, default=20); description : maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if the read maps to no more loci than this value. Otherwise no alignments will be output, and the read will be counted as 'mapped to too many loci' in the Log.final.out. [STAR default] : 10 . 
[WDL default] : 20 .; common : true outFilterMultimapScoreRange (Int, default=1): the score range below the maximum score for multimapping alignments outFilterScoreMin (Int, default=0); description : alignment will be output only if its score is higher than or equal to this value; common : true outFilterScoreMinOverLread (Float, default=0.66): same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for Paired-End reads) outFilterType (String, default=\"Normal\"); description : type of filtering; choices : {'Normal': 'standard filtering using only current alignment', 'BySJout': 'keep only those reads that contain junctions that passed filtering into SJ.out.tab'}; common : true outQSconversionAdd (Int, default=0): add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31) outSAMattrIHstart (Int, default=1): start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie. outSAMattributes (String, default=\"NH HI AS nM NM MD XS\"); description : a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in any combination/order. [STAR defaults] : NH HI AS nM . [WDL default] : NH HI AS nM NM MD XS .; choices : {'NH': 'number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.', 'HI': 'multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.', 'AS': 'local alignment score, +1/-1 for matches/mismateches, score penalties for indels and gaps. For PE reads, total score for two mates. Standard SAM tag.', 'nM': 'number of mismatches. For PE reads, sum over two mates.', 'NM': 'edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.', 'MD': 'string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.', 'jM': 'intron motifs for all junctions (i.e. 
N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junction is annotated, 20 is added to its motif value.', 'jI': 'start and end of introns for all junctions (1-based).', 'XS': 'alignment strand according to --outSAMstrandField.', 'MC': \"mate's CIGAR string. Standard SAM tag.\", 'ch': 'marks all segments of all chimeric alignments for --chimOutType WithinBAM output.', 'cN': \"number of bases clipped from the read ends: 5' and 3'\"}; common *: true outSAMflagAND (Int, default=65535): 0-65535 : sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise. outSAMflagOR (Int, default=0): 0-65535 : sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is applied after all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise. outSAMmapqUnique (Int, default=254): 0-255 : the MAPQ value for unique mappers. Please note the STAR default (255) produces errors downstream, as a MAPQ value of 255 is reserved to indicate a missing value. The default of this task is 254, which is the highest valid MAPQ value, and possibly what the author of STAR intended. [STAR default] : 255 . [WDL default] : 254 . 
outSAMorder (String, default=\"Paired\"); description : type of sorting for the SAM output; choices : {'Paired': 'one mate after the other for all paired alignments', 'PairedKeepInputOrder': 'one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files'} outSAMreadID (String, default=\"Standard\"); description : read ID record type; choices : {'Standard': 'first word (until space) from the FASTx read ID line, removing /1,/2 from the end', 'Number': 'read number (index) in the FASTx file'} outSAMstrandField (String, default=\"intronMotif\"); description : Cufflinks-like strand field flag; choices : {'None': 'not used', 'intronMotif': 'strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out.'}; common : true outSAMtlen (String, default=\"left_plus\"); description : calculation method for the TLEN field in the SAM/BAM files; choices : {'left_plus': 'leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate', 'left_any': 'leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from left_plus for overlapping mates with protruding ends'} outSAMunmapped (String, default=\"Within\"); description : output of unmapped reads in the SAM format.; choices : {'None': 'no output [STAR default] ', 'Within': 'output unmapped reads within the main SAM file (i.e. Aligned.out.sam) [WDL default] '} outSJfilterCountTotalMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. 
Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterCountUniqueMin (SJ_Motifs, default={\"noncanonical_motifs\": 3, \"GT_AG_and_CT_AC_motif\": 1, \"GC_AG_and_CT_GC_motif\": 1, \"AT_AC_and_GT_AT_motif\": 1}): minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied. Does not apply to annotated junctions. outSJfilterDistToOtherSJmin (SJ_Motifs, default={\"noncanonical_motifs\": 10, \"GT_AG_and_CT_AC_motif\": 0, \"GC_AG_and_CT_GC_motif\": 5, \"AT_AC_and_GT_AT_motif\": 10}): minimum allowed distance to other junctions' donor/acceptor for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. Does not apply to annotated junctions. outSJfilterIntronMaxVsReadN (Array[Int], default=[50000, 100000, 200000]): maximum gap allowed for junctions supported by 1,2,3,,,N reads. i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000b. by >=4 reads any gap <=alignIntronMax. Does not apply to annotated junctions. outSJfilterOverhangMin (SJ_Motifs, default={\"noncanonical_motifs\": 30, \"GT_AG_and_CT_AC_motif\": 12, \"GC_AG_and_CT_GC_motif\": 12, \"AT_AC_and_GT_AT_motif\": 12}): minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif. Does not apply to annotated junctions. 
outSJfilterReads (String, default=\"All\"); description : which reads to consider for collapsed splice junctions output; choices : {'All': 'all reads, unique- and multi-mappers', 'Unique': 'uniquely mapping reads only'}; common : true peOverlapMMp (Float, default=0.01): maximum proportion of mismatched bases in the overlap area peOverlapNbasesMin (Int, default=0): minimum number of overlap bases to trigger mates merging and realignment. Specify >0 value to switch on the 'merging of overlapping mates' algorithm. readMapNumber (Int, default=-1); description : number of reads to map from the beginning of the file. -1 to map all reads; common : true readNameSeparator (String, default=\"/\"): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed) readQualityScoreBase (Int, default=33): number to be subtracted from the ASCII code to get Phred quality score read_two_fastqs_gz (Array[File], default=[]); description : An array of gzipped FASTQ files containing read two information; common : true runRNGseed (Int, default=777); description : random number generator seed; common : true scoreDelBase (Int, default=-2): deletion extension penalty per base (in addition to scoreDelOpen) scoreDelOpen (Int, default=-2): deletion open penalty scoreGap (Int, default=0): splice junction penalty (independent on intron motif) scoreGapATAC (Int, default=-8): AT/AC and GT/AT junction penalty (in addition to scoreGap) scoreGapGCAG (Int, default=-4): GC/AG and CT/GC junction penalty (in addition to scoreGap) scoreGapNoncan (Int, default=-8): non-canonical junction penalty (in addition to scoreGap) scoreGenomicLengthLog2scale (Float, default=-0.25): extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength) scoreInsBase (Int, default=-2): insertion extension penalty per base (in addition to scoreInsOpen) scoreInsOpen (Int, default=-2): insertion open penalty 
scoreStitchSJshift (Int, default=1): maximum score reduction while searching for SJ boundaries in the stitching step seedMapMin (Int, default=5): min length of seeds to be mapped seedMultimapNmax (Int, default=10000): only pieces that map fewer than this value are utilized in the stitching procedure seedNoneLociPerWindow (Int, default=10): max number of one seed loci per window seedPerReadNmax (Int, default=1000): max number of seeds per read seedPerWindowNmax (Int, default=50): max number of seeds per window seedSearchLmax (Int, default=0): defines the maximum length of the seeds, if =0 seed length is not limited seedSearchStartLmax (Int, default=50): defines the search start point through the read - the read is split into pieces no longer than this value seedSearchStartLmaxOverLread (Float, default=1.0): seedSearchStartLmax normalized to read length (sum of mates' lengths for Paired-End reads) seedSplitMin (Int, default=12): min length of the seed sequences split by Ns or mate gap sjdbScore (Int, default=2); description : extra alignment score for alignments that cross database junctions; common : true twopass1readsN (Int, default=-1); description : number of reads to process for the 1st step. Use default ( -1 ) to map all reads in the first step; common : true twopassMode (String, default=\"Basic\"); description : 2-pass mapping mode; choices : {'None': '1-pass mapping [STAR default] ', 'Basic': 'basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly [WDL default] '}; common : true use_all_cores (Boolean, default=false); description : Use all cores? 
Recommended for cloud environments.; common : true winAnchorDistNbins (Int, default=9): max number of bins between two anchors that allows aggregation of anchors into one window winAnchorMultimapNmax (Int, default=50): max number of loci anchors are allowed to map to winBinNbits (Int, default=16): =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins winFlankNbins (Int, default=4): =log2(winFlank), where winFlank is the size of the left and right flanking regions for each window","title":"Defaults"},{"location":"tasks/star/#outputs_1","text":"star_log (File) star_bam (File) star_junctions (File) star_chimeric_junctions (File?)","title":"Outputs"},{"location":"tasks/util/","text":"Utilities download description Uses wget to download a file from a remote URL to the local filesystem outputs {'downloaded_file': 'File downloaded from provided URL'} Inputs Required _runtime (Any, required ) disk_size_gb (Int, required ): Disk space to allocate for task, specified in GB outfile_name (String, required ): Name of the output file url (String, required ): URL of the file to download Optional md5sum (String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap. Outputs downloaded_file (File) get_read_groups description Gets read group information from a BAM file and writes it out to as a string outputs {'read_groups': 'An array of strings containing read group information. If format_for_star = true , all found read groups are contained in one string ( read_groups[0] ). 
If format_for_star = false , each found @RG line will be its own entry in output array read_groups .'} Inputs Required _runtime (Any, required ) bam (File, required ); description : Input BAM format file to get read groups from; stream : true Defaults format_for_star (Boolean, default=true); description : Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? STAR formatted results will be an array of length 1, where all found read groups are contained in one string ( read_groups[0] ). If no processing is selected, each found @RG line will be its own entry in output array read_groups .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs read_groups (Array[String]) split_string description Split a string into an array of strings based on a delimiter outputs {'split_string': 'Split string as an array'} Inputs Required _runtime (Any, required ) input_string (String, required ): String to split on occurences of delimiter Defaults delimiter (String, default=\" , \"); description : Delimiter on which to split input_string ; common : true Outputs split_strings (Array[String]) calc_gene_lengths description Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm help The non-overlapping exonic length algorithm can be implemented as the sum of each base covered by at least one exon; where each base is given a value of 1 regardless of how many exons overlap it. outputs {'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'} Inputs Required _runtime (Any, required ) gtf (File, required ): GTF feature file Defaults idattr (String, default=\"gene_name\"); description : GTF attribute to be used as feature ID. 
The value of this attribute will be used as the first column in the output file.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(gtf,\".gtf.gz\") + \".genelengths.txt\"): Name of the gene lengths file Outputs gene_lengths (File) compression_integrity description Checks the compression integrity of a bgzipped file outputs {'check': 'Dummy output to indicate success and to enable call-caching'} Inputs Required _runtime (Any, required ) bgzipped_file (File, required ): Input bgzipped file to check integrity of Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs check (String) add_to_bam_header description Adds another line of text to the bottom of a BAM header outputs {'reheadered_bam': 'The BAM after its header has been modified'} Inputs Required _runtime (Any, required ) additional_header (String, required ): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header. bam (File, required ): Input BAM format file which will have its header added to Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".reheader\"): Prefix for the reheadered BAM. The extension .bam will be added. Outputs reheadered_bam (File) unpack_tarball description Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored. 
outputs {'tarball_contents': 'An array of files found in the input tarball'} Inputs Required _runtime (Any, required ) tarball (File, required ): A .tar.gz archive to unpack into individual files Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs tarball_contents (Array[File]) make_coverage_regions_beds description Takes in a GTF file, converts it to BED, then filters it down to two 3 column BED files: one of only 'exons', one of only 'CDS' regions outputs {'bed': 'Input GTF converted into BED format using the gtf2bed program', 'exon_bed': \"3 column BED file corresponding to all 'exons' found in the input GTF\", 'CDS_bed': \"3 column BED file corresponding to all 'CDS' regions found in the input GTF\"} Inputs Required _runtime (Any, required ) gtf (File, required ): GTF feature file from which to derive coverage regions BED files Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. Outputs bed (File) exon_bed (File) CDS_bed (File) global_phred_scores description Calculates statistics about PHRED scores of the input BAM outputs {'phred_scores': 'Headered TSV file containing PHRED score statistics'} Inputs Required _runtime (Any, required ) bam (File, required ): Input BAM format file to calculate PHRED score statistics for Defaults fast_mode (Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)? modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added. 
Outputs phred_scores (File) qc_summary description [OUT OF DATE] This WDL task pulls out keys metrics that can provide a high level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive. Inputs Required _runtime (Any, required ) multiqc_tar_gz (File, required ): MultiQC report tarball from which to extract key metrics Defaults outfile_name (String, default=basename(multiqc_tar_gz,\".multiqc.tar.gz\") + \".qc_summary.json\"): Name for the JSON file Outputs summary (File) split_fastq description Splits a FASTQ into multiple files based on the number of reads per file outputs {'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'} Inputs Required _runtime (Any, required ) fastq (File, required ); description : Gzipped FASTQ file to split; stream : true Defaults modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the FASTQ file. The extension .fq.gz will be added. 
reads_per_file (Int, default=10000000): Number of reads to include in each output FASTQ file Outputs fastqs (Array[File])","title":"Utilities"},{"location":"tasks/util/#utilities","text":"","title":"Utilities"},{"location":"tasks/util/#download","text":"description Uses wget to download a file from a remote URL to the local filesystem outputs {'downloaded_file': 'File downloaded from provided URL'}","title":"download"},{"location":"tasks/util/#inputs","text":"","title":"Inputs"},{"location":"tasks/util/#required","text":"_runtime (Any, required ) disk_size_gb (Int, required ): Disk space to allocate for task, specified in GB outfile_name (String, required ): Name of the output file url (String, required ): URL of the file to download","title":"Required"},{"location":"tasks/util/#optional","text":"md5sum (String?): Optional md5sum to check against downloaded file. Recommended to use in order to catch corruption or an unintentional file swap.","title":"Optional"},{"location":"tasks/util/#outputs","text":"downloaded_file (File)","title":"Outputs"},{"location":"tasks/util/#get_read_groups","text":"description Gets read group information from a BAM file and writes it out to as a string outputs {'read_groups': 'An array of strings containing read group information. If format_for_star = true , all found read groups are contained in one string ( read_groups[0] ). If format_for_star = false , each found @RG line will be its own entry in output array read_groups .'}","title":"get_read_groups"},{"location":"tasks/util/#inputs_1","text":"","title":"Inputs"},{"location":"tasks/util/#required_1","text":"_runtime (Any, required ) bam (File, required ); description : Input BAM format file to get read groups from; stream : true","title":"Required"},{"location":"tasks/util/#defaults","text":"format_for_star (Boolean, default=true); description : Format read group information for the STAR aligner (true) or output @RG lines of the header without further processing (false)? 
STAR formatted results will be an array of length 1, where all found read groups are contained in one string ( read_groups[0] ). If no processing is selected, each found @RG line will be its own entry in output array read_groups .; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_1","text":"read_groups (Array[String])","title":"Outputs"},{"location":"tasks/util/#split_string","text":"description Split a string into an array of strings based on a delimiter outputs {'split_string': 'Split string as an array'}","title":"split_string"},{"location":"tasks/util/#inputs_2","text":"","title":"Inputs"},{"location":"tasks/util/#required_2","text":"_runtime (Any, required ) input_string (String, required ): String to split on occurences of delimiter","title":"Required"},{"location":"tasks/util/#defaults_1","text":"delimiter (String, default=\" , \"); description : Delimiter on which to split input_string ; common : true","title":"Defaults"},{"location":"tasks/util/#outputs_2","text":"split_strings (Array[String])","title":"Outputs"},{"location":"tasks/util/#calc_gene_lengths","text":"description Calculate gene lengths from a GTF feature file using the non-overlapping exonic length algorithm help The non-overlapping exonic length algorithm can be implemented as the sum of each base covered by at least one exon; where each base is given a value of 1 regardless of how many exons overlap it. 
outputs {'gene_lengths': 'A two column headered TSV file with gene names in the first column and feature lengths (as integers) in the second column'}","title":"calc_gene_lengths"},{"location":"tasks/util/#inputs_3","text":"","title":"Inputs"},{"location":"tasks/util/#required_3","text":"_runtime (Any, required ) gtf (File, required ): GTF feature file","title":"Required"},{"location":"tasks/util/#defaults_2","text":"idattr (String, default=\"gene_name\"); description : GTF attribute to be used as feature ID. The value of this attribute will be used as the first column in the output file.; common : true modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. outfile_name (String, default=basename(gtf,\".gtf.gz\") + \".genelengths.txt\"): Name of the gene lengths file","title":"Defaults"},{"location":"tasks/util/#outputs_3","text":"gene_lengths (File)","title":"Outputs"},{"location":"tasks/util/#compression_integrity","text":"description Checks the compression integrity of a bgzipped file outputs {'check': 'Dummy output to indicate success and to enable call-caching'}","title":"compression_integrity"},{"location":"tasks/util/#inputs_4","text":"","title":"Inputs"},{"location":"tasks/util/#required_4","text":"_runtime (Any, required ) bgzipped_file (File, required ): Input bgzipped file to check integrity of","title":"Required"},{"location":"tasks/util/#defaults_3","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_4","text":"check (String)","title":"Outputs"},{"location":"tasks/util/#add_to_bam_header","text":"description Adds another line of text to the bottom of a BAM header outputs {'reheadered_bam': 'The BAM after its header has been modified'}","title":"add_to_bam_header"},{"location":"tasks/util/#inputs_5","text":"","title":"Inputs"},{"location":"tasks/util/#required_5","text":"_runtime (Any, required ) additional_header (String, required ): A string to add as a new line in the BAM header. No format checking is done, so please ensure you do not invalidate your BAM with this task. Add only spec compliant entries to the header. bam (File, required ): Input BAM format file which will have its header added to","title":"Required"},{"location":"tasks/util/#defaults_4","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\") + \".reheader\"): Prefix for the reheadered BAM. The extension .bam will be added.","title":"Defaults"},{"location":"tasks/util/#outputs_5","text":"reheadered_bam (File)","title":"Outputs"},{"location":"tasks/util/#unpack_tarball","text":"description Accepts a .tar.gz archive and converts it into a flat array of files. Any directory structure of the archive is ignored. outputs {'tarball_contents': 'An array of files found in the input tarball'}","title":"unpack_tarball"},{"location":"tasks/util/#inputs_6","text":"","title":"Inputs"},{"location":"tasks/util/#required_6","text":"_runtime (Any, required ) tarball (File, required ): A .tar.gz archive to unpack into individual files","title":"Required"},{"location":"tasks/util/#defaults_5","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. 
Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_6","text":"tarball_contents (Array[File])","title":"Outputs"},{"location":"tasks/util/#make_coverage_regions_beds","text":"description Takes in a GTF file, converts it to BED, then filters it down to two 3 column BED files: one of only 'exons', one of only 'CDS' regions outputs {'bed': 'Input GTF converted into BED format using the gtf2bed program', 'exon_bed': \"3 column BED file corresponding to all 'exons' found in the input GTF\", 'CDS_bed': \"3 column BED file corresponding to all 'CDS' regions found in the input GTF\"}","title":"make_coverage_regions_beds"},{"location":"tasks/util/#inputs_7","text":"","title":"Inputs"},{"location":"tasks/util/#required_7","text":"_runtime (Any, required ) gtf (File, required ): GTF feature file from which to derive coverage regions BED files","title":"Required"},{"location":"tasks/util/#defaults_6","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB.","title":"Defaults"},{"location":"tasks/util/#outputs_7","text":"bed (File) exon_bed (File) CDS_bed (File)","title":"Outputs"},{"location":"tasks/util/#global_phred_scores","text":"description Calculates statistics about PHRED scores of the input BAM outputs {'phred_scores': 'Headered TSV file containing PHRED score statistics'}","title":"global_phred_scores"},{"location":"tasks/util/#inputs_8","text":"","title":"Inputs"},{"location":"tasks/util/#required_8","text":"_runtime (Any, required ) bam (File, required ): Input BAM format file to calculate PHRED score statistics for","title":"Required"},{"location":"tasks/util/#defaults_7","text":"fast_mode (Boolean, default=true): Enable fast mode (true) or calculate statistics for every base in the BAM (false)? modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. 
Default disk size is determined by the size of the inputs. Specified in GB. prefix (String, default=basename(bam,\".bam\")): Prefix for the output TSV file. The extension .global_PHRED_scores.tsv will be added.","title":"Defaults"},{"location":"tasks/util/#outputs_8","text":"phred_scores (File)","title":"Outputs"},{"location":"tasks/util/#qc_summary","text":"description [OUT OF DATE] This WDL task pulls out keys metrics that can provide a high level overview of the sample, without needing to examine the entire MultiQC report. Currently, these key metrics come from Qualimap and ngsderive.","title":"qc_summary"},{"location":"tasks/util/#inputs_9","text":"","title":"Inputs"},{"location":"tasks/util/#required_9","text":"_runtime (Any, required ) multiqc_tar_gz (File, required ): MultiQC report tarball from which to extract key metrics","title":"Required"},{"location":"tasks/util/#defaults_8","text":"outfile_name (String, default=basename(multiqc_tar_gz,\".multiqc.tar.gz\") + \".qc_summary.json\"): Name for the JSON file","title":"Defaults"},{"location":"tasks/util/#outputs_9","text":"summary (File)","title":"Outputs"},{"location":"tasks/util/#split_fastq","text":"description Splits a FASTQ into multiple files based on the number of reads per file outputs {'fastqs': 'Array of FASTQ files, each containing a subset of the input FASTQ'}","title":"split_fastq"},{"location":"tasks/util/#inputs_10","text":"","title":"Inputs"},{"location":"tasks/util/#required_10","text":"_runtime (Any, required ) fastq (File, required ); description : Gzipped FASTQ file to split; stream : true","title":"Required"},{"location":"tasks/util/#defaults_9","text":"modify_disk_size_gb (Int, default=0): Add to or subtract from dynamic disk space allocation. Default disk size is determined by the size of the inputs. Specified in GB. ncpu (Int, default=2): Number of cores to allocate for task prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the FASTQ file. 
The extension .fq.gz will be added. reads_per_file (Int, default=10000000): Number of reads to include in each output FASTQ file","title":"Defaults"},{"location":"tasks/util/#outputs_10","text":"fastqs (Array[File])","title":"Outputs"},{"location":"workflows/10x-bam-to-fastqs/","text":"Cell Ranger BAM to FASTQs This WDL workflow converts an input BAM file to a set of FASTQ files. It performs QC checks along the way to validate the input and output. Output: read1s an array of files with the first read in the pair read2s an array of files with the second read in the pair fastqs an array of files sufficient for localizing in Cell Ranger's expected format fastqs_archive a compressed archive containing the array of FASTQ files LICENSING: MIT License Copyright 2020-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. cell_ranger_bam_to_fastqs Inputs Required bam (File, required ): BAM file to split into FASTQs. 
bamtofastq._runtime (Any, required ) fqlint._runtime (Any, required ) quickcheck._runtime (Any, required ) Defaults cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 use_all_cores (Boolean, default=false): Use all cores for multi-core steps? bamtofastq.memory_gb (Int, default=40) bamtofastq.modify_disk_size_gb (Int, default=0) bamtofastq.ncpu (Int, default=1) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0) Outputs fastqs (Array[File]) fastqs_archive (File) read1s (Array[File]) read2s (Array[File]) parse_input Inputs Required _runtime (Any, required ) cellranger11 (Boolean, required ): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, required ): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, required ): Convert a BAM produced by Longranger 2.0 Outputs input_check (String)","title":"Cell Ranger BAM to FASTQs"},{"location":"workflows/10x-bam-to-fastqs/#cell-ranger-bam-to-fastqs","text":"This WDL workflow converts an input BAM file to a set of FASTQ files. 
It performs QC checks along the way to validate the input and output.","title":"Cell Ranger BAM to FASTQs"},{"location":"workflows/10x-bam-to-fastqs/#output","text":"read1s an array of files with the first read in the pair read2s an array of files with the second read in the pair fastqs an array of files sufficient for localizing in Cell Ranger's expected format fastqs_archive a compressed archive containing the array of FASTQ files","title":"Output:"},{"location":"workflows/10x-bam-to-fastqs/#licensing","text":"","title":"LICENSING:"},{"location":"workflows/10x-bam-to-fastqs/#mit-license","text":"Copyright 2020-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"workflows/10x-bam-to-fastqs/#cell_ranger_bam_to_fastqs","text":"","title":"cell_ranger_bam_to_fastqs"},{"location":"workflows/10x-bam-to-fastqs/#inputs","text":"","title":"Inputs"},{"location":"workflows/10x-bam-to-fastqs/#required","text":"bam (File, required ): BAM file to split into FASTQs. bamtofastq._runtime (Any, required ) fqlint._runtime (Any, required ) quickcheck._runtime (Any, required )","title":"Required"},{"location":"workflows/10x-bam-to-fastqs/#defaults","text":"cellranger11 (Boolean, default=false): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, default=false): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, default=false): Convert a BAM produced by Longranger 2.0 use_all_cores (Boolean, default=false): Use all cores for multi-core steps? 
bamtofastq.memory_gb (Int, default=40) bamtofastq.modify_disk_size_gb (Int, default=0) bamtofastq.ncpu (Int, default=1) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/10x-bam-to-fastqs/#outputs","text":"fastqs (Array[File]) fastqs_archive (File) read1s (Array[File]) read2s (Array[File])","title":"Outputs"},{"location":"workflows/10x-bam-to-fastqs/#parse_input","text":"","title":"parse_input"},{"location":"workflows/10x-bam-to-fastqs/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/10x-bam-to-fastqs/#required_1","text":"_runtime (Any, required ) cellranger11 (Boolean, required ): Convert a BAM produced by Cell Ranger 1.0-1.1 gemcode (Boolean, required ): Convert a BAM produced from GemCode data (Longranger 1.0 - 1.3) longranger20 (Boolean, required ): Convert a BAM produced by Longranger 2.0","title":"Required"},{"location":"workflows/10x-bam-to-fastqs/#outputs_1","text":"input_check (String)","title":"Outputs"},{"location":"workflows/ESTIMATE/","text":"estimate description [DEPRECATED] Runs the ESTIMATE software package on a feature counts file external_help https://bioinformatics.mdanderson.org/estimate/ outputs {'tpm': 'Transcripts Per Million file', 'estimate_result': 'Final output of ESTIMATE'} deprecated true Inputs Required counts_file (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with htseq.wdl . 
gene_lengths_file (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with calc-gene-lengths.wdl . calc_tpm._runtime (Any, required ) run_estimate._runtime (Any, required ) Defaults calc_tpm.prefix (String, default=basename(counts,\".feature-counts.txt\")) run_estimate.disk_size_gb (Int, default=10) run_estimate.max_retries (Int, default=1) run_estimate.memory_gb (Int, default=4) run_estimate.outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\") Outputs tpm (File) estimate_result (File)","title":"ESTIMATE"},{"location":"workflows/ESTIMATE/#estimate","text":"description [DEPRECATED] Runs the ESTIMATE software package on a feature counts file external_help https://bioinformatics.mdanderson.org/estimate/ outputs {'tpm': 'Transcripts Per Million file', 'estimate_result': 'Final output of ESTIMATE'} deprecated true","title":"estimate"},{"location":"workflows/ESTIMATE/#inputs","text":"","title":"Inputs"},{"location":"workflows/ESTIMATE/#required","text":"counts_file (File, required ): A two column headerless TSV file with gene names in the first column and counts (as integers) in the second column. Entries starting with '__' will be discarded. Can be generated with htseq.wdl . gene_lengths_file (File, required ): A two column headered TSV file with gene names (matching those in the counts file) in the first column and feature lengths (as integers) in the second column. Can be generated with calc-gene-lengths.wdl . 
calc_tpm._runtime (Any, required ) run_estimate._runtime (Any, required )","title":"Required"},{"location":"workflows/ESTIMATE/#defaults","text":"calc_tpm.prefix (String, default=basename(counts,\".feature-counts.txt\")) run_estimate.disk_size_gb (Int, default=10) run_estimate.max_retries (Int, default=1) run_estimate.memory_gb (Int, default=4) run_estimate.outfile_name (String, default=basename(gene_expression_file,\".TPM.txt\") + \".ESTIMATE.gct\")","title":"Defaults"},{"location":"workflows/ESTIMATE/#outputs","text":"tpm (File) estimate_result (File)","title":"Outputs"},{"location":"workflows/bam-to-fastqs/","text":"bam_to_fastqs description Converts an input BAM file to one or more FASTQ files, performing QC checks along the way outputs {'read1s': 'Array of FASTQ files corresponding to either first reads (if paired_end = true ) or all reads (if paired_end = false )', 'read2s': 'Array of FASTQ files corresponding to last reads (if paired_end = true )'} allowNestedInputs true Inputs Required bam (File, required ): BAM file to split into FASTQs bam_to_fastq._runtime (Any, required ) fqlint._runtime (Any, required ) quickcheck._runtime (Any, required ) split._runtime (Any, required ) Defaults paired_end (Boolean, default=true): Is the data Paired-End (true) or Single-End (false)? use_all_cores (Boolean, default=false): Use all cores for multi-core steps? 
bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastq.prefix (String, default=basename(bam,\".bam\")) bam_to_fastq.retain_collated_bam (Boolean, default=false) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0) split.modify_disk_size_gb (Int, default=0) split.ncpu (Int, default=2) split.prefix (String, default=basename(bam,\".bam\")) split.reject_unaccounted (Boolean, default=true) Outputs read1s (Array[File]) read2s (Array[File?])","title":"Bam to fastqs"},{"location":"workflows/bam-to-fastqs/#bam_to_fastqs","text":"description Converts an input BAM file to one or more FASTQ files, performing QC checks along the way outputs {'read1s': 'Array of FASTQ files corresponding to either first reads (if paired_end = true ) or all reads (if paired_end = false )', 'read2s': 'Array of FASTQ files corresponding to last reads (if paired_end = true )'} allowNestedInputs true","title":"bam_to_fastqs"},{"location":"workflows/bam-to-fastqs/#inputs","text":"","title":"Inputs"},{"location":"workflows/bam-to-fastqs/#required","text":"bam (File, required ): BAM file to split into FASTQs bam_to_fastq._runtime (Any, required ) fqlint._runtime (Any, 
required ) quickcheck._runtime (Any, required ) split._runtime (Any, required )","title":"Required"},{"location":"workflows/bam-to-fastqs/#defaults","text":"paired_end (Boolean, default=true): Is the data Paired-End (true) or Single-End (false)? use_all_cores (Boolean, default=false): Use all cores for multi-core steps? bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastq.prefix (String, default=basename(bam,\".bam\")) bam_to_fastq.retain_collated_bam (Boolean, default=false) fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") quickcheck.modify_disk_size_gb (Int, default=0) split.modify_disk_size_gb (Int, default=0) split.ncpu (Int, default=2) split.prefix (String, default=basename(bam,\".bam\")) split.reject_unaccounted (Boolean, default=true)","title":"Defaults"},{"location":"workflows/bam-to-fastqs/#outputs","text":"read1s (Array[File]) read2s (Array[File?])","title":"Outputs"},{"location":"workflows/bwa-db-build/","text":"bwa_db_build description Generates a set of genome reference files usable by the BWA aligner from an input reference file in FASTA format. 
outputs {'reference_fa': 'FASTA format reference file used to generate bwa_db_tar_gz ', 'bwa_db_tar_gz': 'Gzipped tar archive of the BWA reference files. Files are at the root of the archive.'} allowNestedInputs true Inputs Required reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from. build_bwa_db._runtime (Any, required ) reference_download._runtime (Any, required ) Optional reference_fa_md5 (String?): Expected md5sum of reference FASTA file Defaults reference_fa_disk_size_gb (Int, default=10): Disk size in GB to allocate for the reference FASTA file. build_bwa_db.db_name (String, default=\"bwa_db\") build_bwa_db.modify_disk_size_gb (Int, default=0) Outputs reference_fa (File) bwa_db_tar_gz (File)","title":"Bwa db build"},{"location":"workflows/bwa-db-build/#bwa_db_build","text":"description Generates a set of genome reference files usable by the BWA aligner from an input reference file in FASTA format. outputs {'reference_fa': 'FASTA format reference file used to generate bwa_db_tar_gz ', 'bwa_db_tar_gz': 'Gzipped tar archive of the BWA reference files. Files are at the root of the archive.'} allowNestedInputs true","title":"bwa_db_build"},{"location":"workflows/bwa-db-build/#inputs","text":"","title":"Inputs"},{"location":"workflows/bwa-db-build/#required","text":"reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from. build_bwa_db._runtime (Any, required ) reference_download._runtime (Any, required )","title":"Required"},{"location":"workflows/bwa-db-build/#optional","text":"reference_fa_md5 (String?): Expected md5sum of reference FASTA file","title":"Optional"},{"location":"workflows/bwa-db-build/#defaults","text":"reference_fa_disk_size_gb (Int, default=10): Disk size in GB to allocate for the reference FASTA file. 
build_bwa_db.db_name (String, default=\"bwa_db\") build_bwa_db.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/bwa-db-build/#outputs","text":"reference_fa (File) bwa_db_tar_gz (File)","title":"Outputs"},{"location":"workflows/dnaseq-core/","text":"WARNING: this workflow is experimental! Use at your own risk! dnaseq_core_experimental description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true Inputs Required bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ): An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a pair of FASTQ files. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz . Only the ID field is required, and it must be unique for each read group defined. See data_structures/read_group.wdl for help formatting your input JSON. 
read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align ReadGroup_to_string._runtime (Any, required ) bwa_aln_pe._runtime (Any, required ) bwa_mem._runtime (Any, required ) index._runtime (Any, required ) read_ones._runtime (Any, required ) read_twos._runtime (Any, required ) sort._runtime (Any, required ) rg_merge.basic_merge._runtime (Any, required ) rg_merge.final_merge._runtime (Any, required ) rg_merge.inner_merge._runtime (Any, required ) Optional sample_override (String?): Value to override the SM field of every read group. rg_merge.basic_merge.new_header (File?) rg_merge.final_merge.new_header (File?) rg_merge.inner_merge.new_header (File?) Defaults aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. bwa_aln_pe.modify_disk_size_gb (Int, default=0) bwa_aln_pe.ncpu (Int, default=4) bwa_mem.modify_disk_size_gb (Int, default=0) bwa_mem.ncpu (Int, default=4) index.modify_disk_size_gb (Int, default=0) index.ncpu (Int, default=2) index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. read_ones.modify_disk_size_gb (Int, default=0) read_ones.ncpu (Int, default=2) read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_twos.modify_disk_size_gb (Int, default=0) read_twos.ncpu (Int, default=2) read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
rg_merge.max_length (Int, default=100) sort.memory_gb (Int, default=25) sort.modify_disk_size_gb (Int, default=0) sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. sort.sort_order (String, default=\"coordinate\") sort.validation_stringency (String, default=\"SILENT\") rg_merge.basic_merge.combine_rg (Boolean, default=true) rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) rg_merge.basic_merge.name_sorted (Boolean, default=false) rg_merge.basic_merge.ncpu (Int, default=2) rg_merge.basic_merge.region (String, default=\"\") rg_merge.final_merge.modify_disk_size_gb (Int, default=0) rg_merge.final_merge.name_sorted (Boolean, default=false) rg_merge.final_merge.ncpu (Int, default=2) rg_merge.final_merge.region (String, default=\"\") rg_merge.inner_merge.combine_rg (Boolean, default=true) rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) rg_merge.inner_merge.name_sorted (Boolean, default=false) rg_merge.inner_merge.ncpu (Int, default=2) rg_merge.inner_merge.region (String, default=\"\") Outputs harmonized_bam (File) harmonized_bam_index (File)","title":"Dnaseq core"},{"location":"workflows/dnaseq-core/#dnaseq_core_experimental","text":"description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true","title":"dnaseq_core_experimental"},{"location":"workflows/dnaseq-core/#inputs","text":"","title":"Inputs"},{"location":"workflows/dnaseq-core/#required","text":"bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ): An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. 
Each read group ID must be contained in the basename of a pair of FASTQ files. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz . Only the ID field is required, and it must be unique for each read group defined. See data_structures/read_group.wdl for help formatting your input JSON. read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align ReadGroup_to_string._runtime (Any, required ) bwa_aln_pe._runtime (Any, required ) bwa_mem._runtime (Any, required ) index._runtime (Any, required ) read_ones._runtime (Any, required ) read_twos._runtime (Any, required ) sort._runtime (Any, required ) rg_merge.basic_merge._runtime (Any, required ) rg_merge.final_merge._runtime (Any, required ) rg_merge.inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/dnaseq-core/#optional","text":"sample_override (String?): Value to override the SM field of every read group. rg_merge.basic_merge.new_header (File?) rg_merge.final_merge.new_header (File?) rg_merge.inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/dnaseq-core/#defaults","text":"aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. bwa_aln_pe.modify_disk_size_gb (Int, default=0) bwa_aln_pe.ncpu (Int, default=4) bwa_mem.modify_disk_size_gb (Int, default=0) bwa_mem.ncpu (Int, default=4) index.modify_disk_size_gb (Int, default=0) index.ncpu (Int, default=2) index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
read_ones.modify_disk_size_gb (Int, default=0) read_ones.ncpu (Int, default=2) read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. read_twos.modify_disk_size_gb (Int, default=0) read_twos.ncpu (Int, default=2) read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. rg_merge.max_length (Int, default=100) sort.memory_gb (Int, default=25) sort.modify_disk_size_gb (Int, default=0) sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. sort.sort_order (String, default=\"coordinate\") sort.validation_stringency (String, default=\"SILENT\") rg_merge.basic_merge.combine_rg (Boolean, default=true) rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) rg_merge.basic_merge.name_sorted (Boolean, default=false) rg_merge.basic_merge.ncpu (Int, default=2) rg_merge.basic_merge.region (String, default=\"\") rg_merge.final_merge.modify_disk_size_gb (Int, default=0) rg_merge.final_merge.name_sorted (Boolean, default=false) rg_merge.final_merge.ncpu (Int, default=2) rg_merge.final_merge.region (String, default=\"\") rg_merge.inner_merge.combine_rg (Boolean, default=true) rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) rg_merge.inner_merge.name_sorted (Boolean, default=false) rg_merge.inner_merge.ncpu (Int, default=2) rg_merge.inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/dnaseq-core/#outputs","text":"harmonized_bam (File) harmonized_bam_index (File)","title":"Outputs"},{"location":"workflows/dnaseq-standard-fastq/","text":"WARNING: this workflow is experimental! Use at your own risk! 
dnaseq_standard_fastq_experimental description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true Inputs Required bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ); description : An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz if non-zero. Only the ID field is required, and it must be unique for each read group defined. See data_structures/read_group.wdl for help formatting your input JSON.; external_help : https://samtools.github.io/hts-specs/SAMv1.pdf read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align fqlint._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) 
dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required ) Optional dnaseq_core_experimental.sample_override (String?) dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?) Defaults aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. validate_input (Boolean, default=true): Ensure input FASTQs ares well-formed before beginning harmonization? fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") subsample.modify_disk_size_gb (Int, default=0) subsample.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. subsample.probability (Float, default=1.0) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) 
dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\") Outputs harmonized_bam (File) harmonized_bam_index (File) parse_input description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching'} Inputs Required _runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln'] array_lengths (Array[Int], required ) Outputs check (String)","title":"Dnaseq standard fastq"},{"location":"workflows/dnaseq-standard-fastq/#dnaseq_standard_fastq_experimental","text":"description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true","title":"dnaseq_standard_fastq_experimental"},{"location":"workflows/dnaseq-standard-fastq/#inputs","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard-fastq/#required","text":"bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. prefix (String, required ): Prefix for the BAM file. The extension .bam will be added. read_groups (Array[ReadGroup], required ); description : An Array of structs defining read groups to include in the harmonized BAM. Must correspond to input FASTQs. Each read group ID must be contained in the basename of a FASTQ file or pair of FASTQ files if Paired-End. This requirement means the length of read_groups must equal the length of read_one_fastqs_gz and the length of read_two_fastqs_gz if non-zero. Only the ID field is required, and it must be unique for each read group defined. 
See data_structures/read_group.wdl for help formatting your input JSON.; external_help : https://samtools.github.io/hts-specs/SAMv1.pdf read_one_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 1st read in pair to align read_two_fastqs_gz (Array[File], required ): Input gzipped FASTQ format file(s) with 2nd read in pair to align fqlint._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/dnaseq-standard-fastq/#optional","text":"dnaseq_core_experimental.sample_override (String?) dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/dnaseq-standard-fastq/#defaults","text":"aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
validate_input (Boolean, default=true): Ensure input FASTQs ares well-formed before beginning harmonization? fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") subsample.modify_disk_size_gb (Int, default=0) subsample.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\")): Prefix for the BAM file. The extension .bam will be added. subsample.probability (Float, default=1.0) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/dnaseq-standard-fastq/#outputs","text":"harmonized_bam (File) harmonized_bam_index (File)","title":"Outputs"},{"location":"workflows/dnaseq-standard-fastq/#parse_input","text":"description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable 
call-caching'}","title":"parse_input"},{"location":"workflows/dnaseq-standard-fastq/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard-fastq/#required_1","text":"_runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln'] array_lengths (Array[Int], required )","title":"Required"},{"location":"workflows/dnaseq-standard-fastq/#outputs_1","text":"check (String)","title":"Outputs"},{"location":"workflows/dnaseq-standard/","text":"WARNING: this workflow is experimental! Use at your own risk! dnaseq_standard_experimental description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true Inputs Required bam (File, required ): Input BAM to realign bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. get_ReadGroups._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) validate_input_bam._runtime (Any, required ) bam_to_fastqs.bam_to_fastq._runtime (Any, required ) bam_to_fastqs.fqlint._runtime (Any, required ) bam_to_fastqs.quickcheck._runtime (Any, required ) bam_to_fastqs.split._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required ) Optional 
sample_override (String?): Value to override the SM field of every read group. validate_input_bam.reference_fasta (File?) dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?) Defaults aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? get_ReadGroups.modify_disk_size_gb (Int, default=0) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode (Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") bam_to_fastqs.bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastqs.bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastqs.bam_to_fastq.collated (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastqs.bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.ncpu (Int, default=2) bam_to_fastqs.bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastqs.bam_to_fastq.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
bam_to_fastqs.bam_to_fastq.retain_collated_bam (Boolean, default=false) bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") bam_to_fastqs.fqlint.panic (Boolean, default=true) bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.ncpu (Int, default=2) bam_to_fastqs.split.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. bam_to_fastqs.split.reject_unaccounted (Boolean, default=true) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\") Outputs harmonized_bam (File) harmonized_bam_index (File) parse_input description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching'} Inputs Required _runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln'] Outputs check (String)","title":"Dnaseq 
standard"},{"location":"workflows/dnaseq-standard/#dnaseq_standard_experimental","text":"description Aligns DNA reads using bwa outputs {'harmonized_bam': 'Harmonized DNA-Seq BAM, aligned with bwa', 'harmonized_bam_index': 'Index for the harmonized DNA-Seq BAM file'} allowNestedInputs true","title":"dnaseq_standard_experimental"},{"location":"workflows/dnaseq-standard/#inputs","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard/#required","text":"bam (File, required ): Input BAM to realign bwa_db (File, required ): Gzipped tar archive of the bwa reference files. Files should be at the root of the archive. get_ReadGroups._runtime (Any, required ) parse_input._runtime (Any, required ) subsample._runtime (Any, required ) validate_input_bam._runtime (Any, required ) bam_to_fastqs.bam_to_fastq._runtime (Any, required ) bam_to_fastqs.fqlint._runtime (Any, required ) bam_to_fastqs.quickcheck._runtime (Any, required ) bam_to_fastqs.split._runtime (Any, required ) dnaseq_core_experimental.ReadGroup_to_string._runtime (Any, required ) dnaseq_core_experimental.bwa_aln_pe._runtime (Any, required ) dnaseq_core_experimental.bwa_mem._runtime (Any, required ) dnaseq_core_experimental.index._runtime (Any, required ) dnaseq_core_experimental.read_ones._runtime (Any, required ) dnaseq_core_experimental.read_twos._runtime (Any, required ) dnaseq_core_experimental.sort._runtime (Any, required ) dnaseq_core_experimental.rg_merge.basic_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.final_merge._runtime (Any, required ) dnaseq_core_experimental.rg_merge.inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/dnaseq-standard/#optional","text":"sample_override (String?): Value to override the SM field of every read group. validate_input_bam.reference_fasta (File?) dnaseq_core_experimental.rg_merge.basic_merge.new_header (File?) dnaseq_core_experimental.rg_merge.final_merge.new_header (File?) 
dnaseq_core_experimental.rg_merge.inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/dnaseq-standard/#defaults","text":"aligner (String, default=\"mem\"); description : BWA aligner to use; choices : ['mem', 'aln'] prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. reads_per_file (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel. subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. Any n <= 0 for processing entire input. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? get_ReadGroups.modify_disk_size_gb (Int, default=0) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode (Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") bam_to_fastqs.bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastqs.bam_to_fastq.bitwise_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}) bam_to_fastqs.bam_to_fastq.collated (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastqs.bam_to_fastq.fast_mode (Boolean, default=!retain_collated_bam) bam_to_fastqs.bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastqs.bam_to_fastq.ncpu (Int, default=2) bam_to_fastqs.bam_to_fastq.output_singletons (Boolean, default=false) bam_to_fastqs.bam_to_fastq.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. 
bam_to_fastqs.bam_to_fastq.retain_collated_bam (Boolean, default=false) bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") bam_to_fastqs.fqlint.panic (Boolean, default=true) bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.modify_disk_size_gb (Int, default=0) bam_to_fastqs.split.ncpu (Int, default=2) bam_to_fastqs.split.prefix (String, default=basename(bam,\".bam\")): Prefix for the BAM file. The extension .bam will be added. bam_to_fastqs.split.reject_unaccounted (Boolean, default=true) dnaseq_core_experimental.bwa_aln_pe.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_aln_pe.ncpu (Int, default=4) dnaseq_core_experimental.bwa_mem.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.bwa_mem.ncpu (Int, default=4) dnaseq_core_experimental.index.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.index.ncpu (Int, default=2) dnaseq_core_experimental.index.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. dnaseq_core_experimental.read_ones.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_ones.ncpu (Int, default=2) dnaseq_core_experimental.read_ones.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.read_twos.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.read_twos.ncpu (Int, default=2) dnaseq_core_experimental.read_twos.prefix (String, default=sub(basename(fastq),\"(fastq|fq)\\.gz$\",\"\")): Prefix for the BAM file. The extension .bam will be added. 
dnaseq_core_experimental.rg_merge.max_length (Int, default=100) dnaseq_core_experimental.sort.memory_gb (Int, default=25) dnaseq_core_experimental.sort.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.sort.prefix (String, default=basename(bam,\".bam\") + \".sorted\"): Prefix for the BAM file. The extension .bam will be added. dnaseq_core_experimental.sort.sort_order (String, default=\"coordinate\") dnaseq_core_experimental.sort.validation_stringency (String, default=\"SILENT\") dnaseq_core_experimental.rg_merge.basic_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.basic_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.basic_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.basic_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.basic_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.final_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.final_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.final_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.final_merge.region (String, default=\"\") dnaseq_core_experimental.rg_merge.inner_merge.combine_rg (Boolean, default=true) dnaseq_core_experimental.rg_merge.inner_merge.modify_disk_size_gb (Int, default=0) dnaseq_core_experimental.rg_merge.inner_merge.name_sorted (Boolean, default=false) dnaseq_core_experimental.rg_merge.inner_merge.ncpu (Int, default=2) dnaseq_core_experimental.rg_merge.inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/dnaseq-standard/#outputs","text":"harmonized_bam (File) harmonized_bam_index (File)","title":"Outputs"},{"location":"workflows/dnaseq-standard/#parse_input","text":"description Parses and validates the dnaseq_standard workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable 
call-caching'}","title":"parse_input"},{"location":"workflows/dnaseq-standard/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/dnaseq-standard/#required_1","text":"_runtime (Any, required ) aligner (String, required ); description : BWA aligner to use; choices : ['mem', 'aln']","title":"Required"},{"location":"workflows/dnaseq-standard/#outputs_1","text":"check (String)","title":"Outputs"},{"location":"workflows/flag_filter/","text":"FlagFilter A struct to represent the filtering flags used in various samtools commands. The order of precedence is include_if_all , exclude_if_any , include_if_any , and exclude_if_all . These four fields correspond to the samtools flags -f , -F , --rf , and -G respectively. The values of these fields are strings that represent a 12bit bitwise flag. These strings must evaluate to an integer less than 4096 (2^12). They can be in octal, decimal, or hexadecimal format. Please see the meta.help of validate_string_is_12bit_oct_dec_or_hex for more information on the valid formats. The validate_FlagFilter workflow can be used to validate a FlagFilter struct. WARNING The validate_FlagFilter workflow will only check that all the fields can be parsed as integers less than 4096. It will not check if the flags are sensible input to samtools fastq . samtools fastq also employs very little error checking on the flags. So it is possible to pass in flags that produce nonsensical output. For example, it is possible to pass in flags that produce no output. Please exhibit caution while modifying any default values of a FlagFilter . We suggest using the Broad Institute's SAM flag explainer to construct the flags. Find it here . Example input JSON { \"flags\": { \"include_if_all\": \"0x3\", \"exclude_if_any\": \"0xF04\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\" } } Explanation The above example JSON represents a FlagFilter struct being passed to parameter named flags . 
The include_if_all field is set to 0x3 which is 3 in decimal. The exclude_if_any field is set to 0xF04 which is 3844 in decimal. The include_if_any field is set to 0x0 which is 0 in decimal. The exclude_if_all field is set to 0x0 which is 0 in decimal. 3 in decimal can be represented as 000000000011 in 12bit binary. This number means that to be included a read must have the 1st and 2nd bits set. Those bits correspond to the read paired and read mapped in proper pair flags. 3844 in decimal can be represented as 111100000100 in 12bit binary. This number means that to be excluded a read must have any of the 3rd, 9th, 10th, 11th, or 12th bits set. We won't go through what all those bits mean here, but you can find the meanings of the bits in the SAM flag explainer . In short, those are all flags corresponding to the quality of the read; any of them being set may indicate that the read is of low quality and should be excluded. validate_FlagFilter description Validates a FlagFilter struct. output {'check': 'Dummy output to enable caching.'} Inputs Required flags (FlagFilter, required ): FlagFilter struct to validate validate_exclude_if_all._runtime (Any, required ) validate_exclude_if_any._runtime (Any, required ) validate_include_if_all._runtime (Any, required ) validate_include_if_any._runtime (Any, required ) Outputs check (String) validate_string_is_12bit_oct_dec_or_hex description Validates that a string is an octal, decimal, or hexadecimal number and less than 2^12. help Hexadecimal numbers must be prefixed with '0x' and only contain the characters [0-9A-F] to be valid (i.e. [a-f] is not allowed). Octal numbers must start with '0' and only contain the characters [0-7] to be valid. And decimal numbers must start with a digit between 1-9 and only contain the characters [0-9] to be valid. outputs {'check': 'Dummy output to enable caching.'} Inputs Required _runtime (Any, required ) number (String, required ): The number to validate. See task meta.help for accepted formats. 
Outputs check (String)","title":"FlagFilter"},{"location":"workflows/flag_filter/#flagfilter","text":"A struct to represent the filtering flags used in various samtools commands. The order of precedence is include_if_all , exclude_if_any , include_if_any , and exclude_if_all . These four fields correspond to the samtools flags -f , -F , --rf , and -G respectively. The values of these fields are strings that represent a 12bit bitwise flag. These strings must evaluate to an integer less than 4096 (2^12). They can be in octal, decimal, or hexadecimal format. Please see the meta.help of validate_string_is_12bit_oct_dec_or_hex for more information on the valid formats. The validate_FlagFilter workflow can be used to validate a FlagFilter struct. WARNING The validate_FlagFilter workflow will only check that all the fields can be parsed as integers less than 4096. It will not check if the flags are sensible input to samtools fastq . samtools fastq also employs very little error checking on the flags. So it is possible to pass in flags that produce nonsensical output. For example, it is possible to pass in flags that produce no output. Please exhibit caution while modifying any default values of a FlagFilter . We suggest using the Broad Institute's SAM flag explainer to construct the flags. Find it here .","title":"FlagFilter"},{"location":"workflows/flag_filter/#example-input-json","text":"{ \"flags\": { \"include_if_all\": \"0x3\", \"exclude_if_any\": \"0xF04\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\" } }","title":"Example input JSON"},{"location":"workflows/flag_filter/#explanation","text":"The above example JSON represents a FlagFilter struct being passed to parameter named flags . The include_if_all field is set to 0x3 which is 3 in decimal. The exclude_if_any field is set to 0xF04 which is 3844 in decimal. The include_if_any field is set to 0x0 which is 0 in decimal. The exclude_if_all field is set to 0x0 which is 0 in decimal. 
3 in decimal can be represented as 000000000011 in 12bit binary. This number means that to be included a read must have the 1st and 2nd bits set. Those bits correspond to the read paired and read mapped in proper pair flags. 3844 in decimal can be represented as 111100000100 in 12bit binary. This number means that to be excluded a read must have any of the 3rd, 9th, 10th, 11th, or 12th bits set. We won't go through what all those bits mean here, but you can find the meanings of the bits in the SAM flag explainer . In short, those are all flags corresponding to the quality of the read; any of them being set may indicate that the read is of low quality and should be excluded.","title":"Explanation"},{"location":"workflows/flag_filter/#validate_flagfilter","text":"description Validates a FlagFilter struct. output {'check': 'Dummy output to enable caching.'}","title":"validate_FlagFilter"},{"location":"workflows/flag_filter/#inputs","text":"","title":"Inputs"},{"location":"workflows/flag_filter/#required","text":"flags (FlagFilter, required ): FlagFilter struct to validate validate_exclude_if_all._runtime (Any, required ) validate_exclude_if_any._runtime (Any, required ) validate_include_if_all._runtime (Any, required ) validate_include_if_any._runtime (Any, required )","title":"Required"},{"location":"workflows/flag_filter/#outputs","text":"check (String)","title":"Outputs"},{"location":"workflows/flag_filter/#validate_string_is_12bit_oct_dec_or_hex","text":"description Validates that a string is an octal, decimal, or hexadecimal number and less than 2^12. help Hexadecimal numbers must be prefixed with '0x' and only contain the characters [0-9A-F] to be valid (i.e. [a-f] is not allowed). Octal numbers must start with '0' and only contain the characters [0-7] to be valid. And decimal numbers must start with a digit between 1-9 and only contain the characters [0-9] to be valid. 
outputs {'check': 'Dummy output to enable caching.'}","title":"validate_string_is_12bit_oct_dec_or_hex"},{"location":"workflows/flag_filter/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/flag_filter/#required_1","text":"_runtime (Any, required ) number (String, required ): The number to validate. See task meta.help for accepted formats.","title":"Required"},{"location":"workflows/flag_filter/#outputs_1","text":"check (String)","title":"Outputs"},{"location":"workflows/gatk-reference/","text":"gatk_reference description Fetches reference files for GATK. outputs {'fasta': 'FASTA file for the reference genome.', 'fasta_index': 'Index for the FASTA file for the reference genome.', 'fasta_dict': 'Sequence dictionary for the reference genome.', 'dbSNP_vcf': 'dbSNP VCF file for the reference genome.', 'dbSNP_vcf_index': 'Index for the dbSNP VCF file for the reference genome.', 'interval_list': 'List of intervals that will be used when computing variants.', 'knownVCFs': 'VCF files with known variants to use with variant calling.'} allowNestedInputs true Inputs Required dbSNP_vcf_name (String, required ): Name of the dbSNP VCF file. dbSNP_vcf_url (String, required ): URL from which to retrieve the dbSNP VCF file. knownVCF_names (Array[String], required ): Names of the VCF files with known variants. Order should match that of knownVCF_urls . knownVCF_urls (Array[String], required ): URLs from which to retrieve VCF files with known variants. reference_fa_md5 (String, required ): MD5 checksum for the reference FASTA file. reference_fa_name (String, required ): Name of the output reference FASTA file. reference_fa_url (String, required ): URL from which to retrieve the reference FASTA file. 
create_sequence_dictionary._runtime (Any, required ) dbsnp._runtime (Any, required ) dbsnp.disk_size_gb (Int, required ) dbsnp_index._runtime (Any, required ) dbsnp_index.disk_size_gb (Int, required ) faidx._runtime (Any, required ) fasta_download._runtime (Any, required ) fasta_download.disk_size_gb (Int, required ) intervals._runtime (Any, required ) intervals.disk_size_gb (Int, required ) knownVCF._runtime (Any, required ) knownVCF.disk_size_gb (Int, required ) Optional dbSNP_vcf_index_name (String?): Name of the index for the dbSNP VCF file. dbSNP_vcf_index_url (String?): URL from which to retrieve the index for the dbSNP VCF file. interval_list_name (String?): Name of the list of intervals to use when computing variants. interval_list_url (String?): URL from which to retrieve the list of intervals to use when computing variants. create_sequence_dictionary.assembly_name (String?) create_sequence_dictionary.fasta_url (String?) create_sequence_dictionary.species (String?) dbsnp.md5sum (String?) dbsnp_index.md5sum (String?) intervals.md5sum (String?) knownVCF.md5sum (String?) Defaults create_sequence_dictionary.memory_gb (Int, default=16) create_sequence_dictionary.modify_disk_size_gb (Int, default=0) create_sequence_dictionary.outfile_name (String, default=basename(fasta,\".fa\") + \".dict\") faidx.modify_disk_size_gb (Int, default=0) faidx.use_all_cores (Boolean, default=false) Outputs fasta (File) fasta_index (File) fasta_dict (File) dbSNP_vcf (File?) dbSNP_vcf_index (File?) interval_list (File?) knownVCFs (Array[File])","title":"Gatk reference"},{"location":"workflows/gatk-reference/#gatk_reference","text":"description Fetches reference files for GATK. 
outputs {'fasta': 'FASTA file for the reference genome.', 'fasta_index': 'Index for the FASTA file for the reference genome.', 'fasta_dict': 'Sequence dictionary for the reference genome.', 'dbSNP_vcf': 'dbSNP VCF file for the reference genome.', 'dbSNP_vcf_index': 'Index for the dbSNP VCF file for the reference genome.', 'interval_list': 'List of intervals that will be used when computing variants.', 'knownVCFs': 'VCF files with known variants to use with variant calling.'} allowNestedInputs true","title":"gatk_reference"},{"location":"workflows/gatk-reference/#inputs","text":"","title":"Inputs"},{"location":"workflows/gatk-reference/#required","text":"dbSNP_vcf_name (String, required ): Name of the dbSNP VCF file. dbSNP_vcf_url (String, required ): URL from which to retrieve the dbSNP VCF file. knownVCF_names (Array[String], required ): Names of the VCF files with known variants. Order should match that of knownVCF_urls . knownVCF_urls (Array[String], required ): URLs from which to retrieve VCF files with known variants. reference_fa_md5 (String, required ): MD5 checksum for the reference FASTA file. reference_fa_name (String, required ): Name of the output reference FASTA file. reference_fa_url (String, required ): URL from which to retrieve the reference FASTA file. create_sequence_dictionary._runtime (Any, required ) dbsnp._runtime (Any, required ) dbsnp.disk_size_gb (Int, required ) dbsnp_index._runtime (Any, required ) dbsnp_index.disk_size_gb (Int, required ) faidx._runtime (Any, required ) fasta_download._runtime (Any, required ) fasta_download.disk_size_gb (Int, required ) intervals._runtime (Any, required ) intervals.disk_size_gb (Int, required ) knownVCF._runtime (Any, required ) knownVCF.disk_size_gb (Int, required )","title":"Required"},{"location":"workflows/gatk-reference/#optional","text":"dbSNP_vcf_index_name (String?): Name of the index for the dbSNP VCF file. 
dbSNP_vcf_index_url (String?): URL from which to retrieve the index for the dbSNP VCF file. interval_list_name (String?): Name of the list of intervals to use when computing variants. interval_list_url (String?): URL from which to retrieve the list of intervals to use when computing variants. create_sequence_dictionary.assembly_name (String?) create_sequence_dictionary.fasta_url (String?) create_sequence_dictionary.species (String?) dbsnp.md5sum (String?) dbsnp_index.md5sum (String?) intervals.md5sum (String?) knownVCF.md5sum (String?)","title":"Optional"},{"location":"workflows/gatk-reference/#defaults","text":"create_sequence_dictionary.memory_gb (Int, default=16) create_sequence_dictionary.modify_disk_size_gb (Int, default=0) create_sequence_dictionary.outfile_name (String, default=basename(fasta,\".fa\") + \".dict\") faidx.modify_disk_size_gb (Int, default=0) faidx.use_all_cores (Boolean, default=false)","title":"Defaults"},{"location":"workflows/gatk-reference/#outputs","text":"fasta (File) fasta_index (File) fasta_dict (File) dbSNP_vcf (File?) dbSNP_vcf_index (File?) interval_list (File?) knownVCFs (Array[File])","title":"Outputs"},{"location":"workflows/make-qc-reference/","text":"make_qc_reference description Downloads and creates all reference files needed to run the quality_check workflow outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'exon_bed': '3 column BED file defining the regions of the exome. Derived from gtf .', 'CDS_bed': '3 column BED file defining the regions of the coding domain. 
Derived from gtf .', 'kraken_db': 'A complete Kraken2 database'} allowNestedInputs true Inputs Required gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from create_library_from_fastas._runtime (Any, required ) download_library._runtime (Any, required ) download_taxonomy._runtime (Any, required ) fastas_download._runtime (Any, required ) gtf_download._runtime (Any, required ) kraken_build_db._runtime (Any, required ) make_coverage_regions_beds._runtime (Any, required ) reference_download._runtime (Any, required ) Optional fastas_download.md5sum (String?) gtf_download.md5sum (String?) reference_download.md5sum (String?) Defaults gtf_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the GTF file kraken_fasta_urls (Array[String], default=[]): URLs for any additional FASTA files in NCBI format to download and include in the Kraken2 database. This allows the addition of individual genomes (or other sequences) of interest. kraken_fastas (Array[File], default=[]): Array of gzipped FASTA files. Each sequence's ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid kraken_fastas_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the FASTA files kraken_libraries (Array[String], default=[\"archaea\", \"bacteria\", \"plasmid\", \"viral\", \"human\", \"fungi\", \"protozoa\", \"UniVec_Core\"]); description : List of kraken libraries to download; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'UniVec', 'UniVec_Core'] protein (Boolean, default=false): Construct a protein database? 
reference_fa_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the reference FASTA file create_library_from_fastas.modify_disk_size_gb (Int, default=0) download_library.modify_disk_size_gb (Int, default=0) kraken_build_db.db_name (String, default=\"kraken2_db\") kraken_build_db.kmer_len (Int, default=if protein then 15 else 35) kraken_build_db.max_db_size_gb (Int, default=-1) kraken_build_db.minimizer_len (Int, default=if protein then 12 else 31) kraken_build_db.minimizer_spaces (Int, default=if protein then 0 else 7) kraken_build_db.modify_disk_size_gb (Int, default=0) kraken_build_db.modify_memory_gb (Int, default=0) kraken_build_db.ncpu (Int, default=4) kraken_build_db.use_all_cores (Boolean, default=false) make_coverage_regions_beds.modify_disk_size_gb (Int, default=0) Outputs reference_fa (File) gtf (File) exon_bed (File) CDS_bed (File) kraken_db (File)","title":"Make qc reference"},{"location":"workflows/make-qc-reference/#make_qc_reference","text":"description Downloads and creates all reference files needed to run the quality_check workflow outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'exon_bed': '3 column BED file defining the regions of the exome. Derived from gtf .', 'CDS_bed': '3 column BED file defining the regions of the coding domain. 
Derived from gtf .', 'kraken_db': 'A complete Kraken2 database'} allowNestedInputs true","title":"make_qc_reference"},{"location":"workflows/make-qc-reference/#inputs","text":"","title":"Inputs"},{"location":"workflows/make-qc-reference/#required","text":"gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from create_library_from_fastas._runtime (Any, required ) download_library._runtime (Any, required ) download_taxonomy._runtime (Any, required ) fastas_download._runtime (Any, required ) gtf_download._runtime (Any, required ) kraken_build_db._runtime (Any, required ) make_coverage_regions_beds._runtime (Any, required ) reference_download._runtime (Any, required )","title":"Required"},{"location":"workflows/make-qc-reference/#optional","text":"fastas_download.md5sum (String?) gtf_download.md5sum (String?) reference_download.md5sum (String?)","title":"Optional"},{"location":"workflows/make-qc-reference/#defaults","text":"gtf_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the GTF file kraken_fasta_urls (Array[String], default=[]): URLs for any additional FASTA files in NCBI format to download and include in the Kraken2 database. This allows the addition of individual genomes (or other sequences) of interest. kraken_fastas (Array[File], default=[]): Array of gzipped FASTA files. 
Each sequence's ID must contain either an NCBI accession number or an explicit assignment of the taxonomy ID using kraken:taxid kraken_fastas_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the FASTA files kraken_libraries (Array[String], default=[\"archaea\", \"bacteria\", \"plasmid\", \"viral\", \"human\", \"fungi\", \"protozoa\", \"UniVec_Core\"]); description : List of kraken libraries to download; choices : ['archaea', 'bacteria', 'plasmid', 'viral', 'human', 'fungi', 'plant', 'protozoa', 'nt', 'UniVec', 'UniVec_Core'] protein (Boolean, default=false): Construct a protein database? reference_fa_disk_size_gb (Int, default=10): Disk size (in GB) to allocate for downloading the reference FASTA file create_library_from_fastas.modify_disk_size_gb (Int, default=0) download_library.modify_disk_size_gb (Int, default=0) kraken_build_db.db_name (String, default=\"kraken2_db\") kraken_build_db.kmer_len (Int, default=if protein then 15 else 35) kraken_build_db.max_db_size_gb (Int, default=-1) kraken_build_db.minimizer_len (Int, default=if protein then 12 else 31) kraken_build_db.minimizer_spaces (Int, default=if protein then 0 else 7) kraken_build_db.modify_disk_size_gb (Int, default=0) kraken_build_db.modify_memory_gb (Int, default=0) kraken_build_db.ncpu (Int, default=4) kraken_build_db.use_all_cores (Boolean, default=false) make_coverage_regions_beds.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/make-qc-reference/#outputs","text":"reference_fa (File) gtf (File) exon_bed (File) CDS_bed (File) kraken_db (File)","title":"Outputs"},{"location":"workflows/markdups-post/","text":"MarkDuplicates Post An investigation of all our QC tools was conducted when duplicate marking was introduced to our pipeline. Most tools do not take into consideration whether a read is a duplicate or not. But the tasks called below produce different results depending on whether the input BAM has been duplicate marked or not. 
markdups_post description Runs QC analyses which are impacted by duplicate marking outputs {'insert_size_metrics': ' *.txt output file of picard collectInsertSizeMetrics ', 'insert_size_metrics_pdf': ' *.pdf output file of picard collectInsertSizeMetrics ', 'flagstat_report': ' samtools flagstat report', 'mosdepth_global_summary': 'Summary of whole genome coverage produced by mosdepth ', 'mosdepth_global_dist': 'Distribution of whole genome coverage produced by mosdepth ', 'mosdepth_region_summary': 'Summaries of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth ', 'mosdepth_region_dist': 'Distributions of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth '} allowNestedInputs true Inputs Required markdups_bam (File, required ): Input BAM format file to quality check. Duplicates being marked is not necessary for a successful run of this workflow. markdups_bam_index (File, required ): BAM index file corresponding to the input BAM collect_insert_size_metrics._runtime (Any, required ) flagstat._runtime (Any, required ) regions_coverage._runtime (Any, required ) wg_coverage._runtime (Any, required ) Optional wg_coverage.coverage_bed (File?) Defaults coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. 
prefix (String, default=basename(markdups_bam,\".bam\")): Prefix for all results files collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true) Outputs insert_size_metrics (File) insert_size_metrics_pdf (File) flagstat_report (File) mosdepth_global_summary (File) mosdepth_global_dist (File) mosdepth_region_summary (Array[File]) mosdepth_region_dist (Array[File?])","title":"MarkDuplicates Post"},{"location":"workflows/markdups-post/#markduplicates-post","text":"An investigation of all our QC tools was conducted when duplicate marking was introduced to our pipeline. Most tools do not take into consideration whether a read is a duplicate or not. 
But the tasks called below produce different results depending on whether the input BAM has been duplicate marked or not.","title":"MarkDuplicates Post"},{"location":"workflows/markdups-post/#markdups_post","text":"description Runs QC analyses which are impacted by duplicate marking outputs {'insert_size_metrics': ' *.txt output file of picard collectInsertSizeMetrics ', 'insert_size_metrics_pdf': ' *.pdf output file of picard collectInsertSizeMetrics ', 'flagstat_report': ' samtools flagstat report', 'mosdepth_global_summary': 'Summary of whole genome coverage produced by mosdepth ', 'mosdepth_global_dist': 'Distribution of whole genome coverage produced by mosdepth ', 'mosdepth_region_summary': 'Summaries of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth ', 'mosdepth_region_dist': 'Distributions of coverage corresponding to the regions defined by coverage_beds input, produced by mosdepth '} allowNestedInputs true","title":"markdups_post"},{"location":"workflows/markdups-post/#inputs","text":"","title":"Inputs"},{"location":"workflows/markdups-post/#required","text":"markdups_bam (File, required ): Input BAM format file to quality check. Duplicates being marked is not necessary for a successful run of this workflow. 
markdups_bam_index (File, required ): BAM index file corresponding to the input BAM collect_insert_size_metrics._runtime (Any, required ) flagstat._runtime (Any, required ) regions_coverage._runtime (Any, required ) wg_coverage._runtime (Any, required )","title":"Required"},{"location":"workflows/markdups-post/#optional","text":"wg_coverage.coverage_bed (File?)","title":"Optional"},{"location":"workflows/markdups-post/#defaults","text":"coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. prefix (String, default=basename(markdups_bam,\".bam\")): Prefix for all results files collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true)","title":"Defaults"},{"location":"workflows/markdups-post/#outputs","text":"insert_size_metrics (File) insert_size_metrics_pdf (File) flagstat_report (File) mosdepth_global_summary (File) mosdepth_global_dist (File) mosdepth_region_summary (Array[File]) mosdepth_region_dist (Array[File?])","title":"Outputs"},{"location":"workflows/quality-check-standard/","text":"quality_check description Performs comprehensive quality checks, aggregating all 
analyses and metrics into a final MultiQC report. help Assumes that input BAM is position-sorted. external_help https://multiqc.info/ outputs {'bam_checksum': 'STDOUT of the md5sum command run on the input BAM that has been redirected to a file', 'validate_sam_file': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'flagstat_report': ' samtools flagstat STDOUT redirected to a file. If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'fastqc_results': 'A gzipped tar archive of all FastQC output files', 'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file', 'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file', 'inferred_encoding': 'TSV file containing the ngsderive encoding report for the input BAM file', 'alignment_metrics': {'description': 'The text file output of picard CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of picard CollectAlignmentSummaryMetrics ', 'insert_size_metrics': {'description': 'The text file output of picard CollectInsertSizeMetrics . If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of picard CollectInsertSizeMetrics . 
If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'quality_score_distribution_txt': 'The text file output of picard QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of picard QualityScoreDistribution ', 'phred_scores': 'Headered TSV file containing PHRED score statistics', 'kraken_report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'mosdepth_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. It does this for each chromosome, and for the whole genome.', 'mosdepth_global_summary': 'A summary of mean depths per chromosome', 'mosdepth_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file.', 'mosdepth_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file.', 'multiqc_report': 'A gzipped tar archive of all MultiQC output files', 'orig_read_count': 'A TSV report containing the original read count before subsampling. Only present if subsample_n_reads > 0 .', 'kraken_sequences': {'description': 'Detailed Kraken2 output that has been gzipped. Only present if store_kraken_sequences == true .', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}, 'comparative_kraken_report': 'Kraken2 summary report for only the alternatively filtered reads. 
Only present if run_comparative_kraken == true .', 'comparative_kraken_sequences': 'Detailed Kraken2 output for only the alternatively filtered reads. Only present if run_comparative_kraken == true && store_kraken_sequences == true .', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates . Only present if mark_duplicates == true && optical_distance > 0 .', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}, 'mosdepth_dups_marked_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. It does this for each chromosome, and for the whole genome. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_global_summary': 'A summary of mean depths per chromosome. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'inferred_strandedness': 'TSV file containing the ngsderive strandedness report. 
Only present if rna == true .', 'qualimap_rnaseq_results': 'Gzipped tar archive of all QualiMap output files. Only present if rna == true .', 'junction_summary': 'TSV file containing the ngsderive junction-annotation summary. Only present if rna == true ', 'junctions': 'TSV file containing a detailed list of annotated junctions. Only present if rna == true .', 'librarian_report': 'A tar archive containing the librarian report and raw data. Only present if run_librarian == true .', 'IntermediateFiles': 'Any and all files produced as intermediate during pipeline processing. Only output if output_intermediate_files == true .'} allowNestedInputs true Inputs Required bam (File, required ): Input BAM format file to quality check bam_index (File, required ): BAM index file corresponding to the input BAM kraken_db (File, required ): Kraken2 database. Can be generated with ../reference/make-qc-reference.wdl . Must be a tarball without a root directory. alt_filtered_fastq._runtime (Any, required ) alt_filtered_fqlint._runtime (Any, required ) bam_to_fastq._runtime (Any, required ) collect_alignment_summary_metrics._runtime (Any, required ) collect_insert_size_metrics._runtime (Any, required ) comparative_kraken._runtime (Any, required ) compression_integrity._runtime (Any, required ) compute_checksum._runtime (Any, required ) encoding._runtime (Any, required ) endedness._runtime (Any, required ) fastqc._runtime (Any, required ) flagstat._runtime (Any, required ) fqlint._runtime (Any, required ) global_phred_scores._runtime (Any, required ) instrument._runtime (Any, required ) junction_annotation._runtime (Any, required ) kraken._runtime (Any, required ) librarian._runtime (Any, required ) markdups._runtime (Any, required ) multiqc._runtime (Any, required ) parse_input._runtime (Any, required ) qualimap_rnaseq._runtime (Any, required ) quality_score_distribution._runtime (Any, required ) quickcheck._runtime (Any, required ) read_length._runtime (Any, required ) 
regions_coverage._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) subsample_index._runtime (Any, required ) validate_bam._runtime (Any, required ) wg_coverage._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) comparative_kraken_filter_validator.validate_include_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_include_if_any._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) kraken_filter_validator.validate_include_if_all._runtime (Any, required ) kraken_filter_validator.validate_include_if_any._runtime (Any, required ) markdups_post.collect_insert_size_metrics._runtime (Any, required ) markdups_post.flagstat._runtime (Any, required ) markdups_post.regions_coverage._runtime (Any, required ) markdups_post.wg_coverage._runtime (Any, required ) Optional gtf (File?): GTF features file. Gzipped or uncompressed. Required for RNA-Seq data. validate_bam.reference_fasta (File?) wg_coverage.coverage_bed (File?) markdups_post.wg_coverage.coverage_bed (File?) Defaults comparative_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x904\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while performing a second FASTQ conversion, before running Kraken2 another time. This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove unmapped, secondary, and supplementary reads from the created FASTQs. WARNING These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. 
coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions. Any regional analysis enabled by this option is in addition to whole genome coverage, which is calculated regardless of this setting. An exon BED and a Coding Sequence BED are examples of regions you may wish to restrict coverage analysis to. Those two BEDs can be created with the workflow in ../reference/make-qc-reference.wdl . coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. If using the BEDs created by ../reference/make-qc-reference.wdl , the labels [\"exon\", \"CDS\"] are appropriate. Make sure to provide the coverage BEDs in the same order as the labels. extra_multiqc_inputs (Array[File], default=[]): An array of additional files to pass directly into MultiQC mark_duplicates (Boolean, default=rna): Mark duplicates before select analyses? Default behavior is to set this to the value of the rna parameter. This is because DNA files are often duplicate marked already, and RNA-Seq files are usually not duplicate marked. If set to true , a BAM will be generated and passed to selected downstream analyses. For more details about what analyses are run, review ./markdups-post.wdl . WARNING, this duplicate marked BAM is not output by default. If you would like to output this file, set output_intermediate_files = true . multiqc_config (File, default=\"https://raw.githubusercontent.com/stjudecloud/workflows/main/workflows/qc/inputs/multiqc_config_hg38.yaml\"): YAML file for configuring MultiQC optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates instead of library duplicates (e.g. PCR duplicates). If mark_duplicates == false , this parameter is ignored. 
If 0 , then optical duplicate marking is disabled and only traditional duplicate marking will be performed. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). Calculation of distance depends on coordinate data embedded in the read names, typically produced by Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without a custom regex for tile-data extraction. Review the mark_duplicates task in ../../tools/picard.wdl for more information. output_intermediate_files (Boolean, default=false): Output intermediate files? FASTQs; if rna == true a collated BAM; if mark_duplicates == true a duplicate marked BAM, various accessory files like indexes and md5sums; if subsampling was requested and performed then a sampled BAM and associated index. WARNING, these files can be large. prefix (String, default=basename(bam,\".bam\")): Prefix for all results files rna (Boolean, default=false): Is the sequenced molecule RNA? Enabling this option adds RNA-Seq specific analyses to the workflow. If true , a GTF file must be provided. If false , the GTF file is ignored. run_comparative_kraken (Boolean, default=false): Run Kraken2 a second time with different FASTQ filtering? If true , comparative_filter is used in a second run of BAM->FASTQ conversion, resulting in differently filtered FASTQs analyzed by Kraken2. If false , comparative_filter is ignored. run_librarian (Boolean, default=rna); description : Run the librarian tool to generate a report of the likely Illumina library prep kit used to generate the data. WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Omnibus). 
This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. By default, this tool is run when rna == true .; external_help : https://f1000research.com/articles/11-1122/v2 standard_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while converting to FASTQ, before running Kraken2 and librarian (if run_librarian == true ). This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the created FASTQs. WARNING: These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. WARNING: If you have set run_librarian to true , we strongly recommend leaving this filter at the default value. librarian is trained on a specific set of reads, and changing this filter may produce nonsensical results. store_kraken_sequences (Boolean, default=false): Store the Kraken2 sequences output? This will apply to all runs of Kraken2 (see parameter_meta.run_comparative_kraken ). WARNING these files can be very large. subsample_n_reads (Int, default=-1): Only process a random sampling of approximately n reads. Use any n <= 0 to process the entire input. Subsampling is done probabilistically, so the exact number of reads in the output will have some variation. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
alt_filtered_fastq.append_read_number (Boolean, default=true) alt_filtered_fastq.collated (Boolean, default=false) alt_filtered_fastq.fail_on_unexpected_reads (Boolean, default=false) alt_filtered_fastq.modify_disk_size_gb (Int, default=0) alt_filtered_fastq.modify_memory_gb (Int, default=0) alt_filtered_fastq.ncpu (Int, default=2) alt_filtered_fastq.output_singletons (Boolean, default=false) alt_filtered_fqlint.disable_validator_codes (Array[String], default=[]) alt_filtered_fqlint.modify_disk_size_gb (Int, default=0) alt_filtered_fqlint.modify_memory_gb (Int, default=0) alt_filtered_fqlint.paired_read_validation_level (String, default=\"high\") alt_filtered_fqlint.panic (Boolean, default=true) alt_filtered_fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) collect_alignment_summary_metrics.memory_gb (Int, default=8) collect_alignment_summary_metrics.modify_disk_size_gb (Int, default=0) collect_alignment_summary_metrics.validation_stringency (String, default=\"SILENT\") collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") comparative_kraken.min_base_quality (Int, default=0) comparative_kraken.modify_disk_size_gb (Int, default=0) comparative_kraken.modify_memory_gb (Int, default=0) comparative_kraken.ncpu (Int, default=4) comparative_kraken.use_names (Boolean, default=true) compression_integrity.modify_disk_size_gb (Int, default=0) compute_checksum.modify_disk_size_gb (Int, default=0) encoding.modify_disk_size_gb (Int, default=0) endedness.calc_rpt (Boolean, 
default=false) endedness.modify_disk_size_gb (Int, default=0) endedness.modify_memory_gb (Int, default=0) endedness.num_reads (Int, default=-1) endedness.paired_deviance (Float, default=0.0) endedness.round_rpt (Boolean, default=false) endedness.split_by_rg (Boolean, default=false) fastqc.modify_disk_size_gb (Int, default=0) fastqc.ncpu (Int, default=4) flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") global_phred_scores.fast_mode (Boolean, default=true) global_phred_scores.modify_disk_size_gb (Int, default=0) instrument.modify_disk_size_gb (Int, default=0) instrument.num_reads (Int, default=10000) junction_annotation.fuzzy_junction_match_range (Int, default=0) junction_annotation.min_intron (Int, default=50) junction_annotation.min_mapq (Int, default=30) junction_annotation.min_reads (Int, default=2) junction_annotation.modify_disk_size_gb (Int, default=0) kraken.min_base_quality (Int, default=0) kraken.modify_disk_size_gb (Int, default=0) kraken.modify_memory_gb (Int, default=0) kraken.ncpu (Int, default=4) kraken.use_names (Boolean, default=true) librarian.modify_disk_size_gb (Int, default=0) librarian.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Prefix for all results files markdups.clear_dt (Boolean, default=true) markdups.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") markdups.modify_disk_size_gb (Int, default=0) markdups.modify_memory_gb (Int, default=0) markdups.read_name_regex (String, 
default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") markdups.remove_duplicates (Boolean, default=false) markdups.remove_sequencing_duplicates (Boolean, default=false) markdups.tagging_policy (String, default=\"All\") markdups.validation_stringency (String, default=\"SILENT\") multiqc.modify_disk_size_gb (Int, default=0) qualimap_rnaseq.memory_gb (Int, default=16) qualimap_rnaseq.modify_disk_size_gb (Int, default=0) quality_score_distribution.memory_gb (Int, default=8) quality_score_distribution.modify_disk_size_gb (Int, default=0) quality_score_distribution.validation_stringency (String, default=\"SILENT\") quickcheck.modify_disk_size_gb (Int, default=0) read_length.majority_vote_cutoff (Float, default=0.7) read_length.modify_disk_size_gb (Int, default=0) read_length.num_reads (Int, default=-1) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample_index.modify_disk_size_gb (Int, default=0) subsample_index.ncpu (Int, default=2) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.validation_stringency (String, default=\"LENIENT\") wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true) markdups_post.collect_insert_size_metrics.memory_gb (Int, default=8) 
markdups_post.collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) markdups_post.collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") markdups_post.flagstat.modify_disk_size_gb (Int, default=0) markdups_post.flagstat.ncpu (Int, default=2) markdups_post.flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. markdups_post.regions_coverage.min_mapping_quality (Int, default=20) markdups_post.regions_coverage.modify_disk_size_gb (Int, default=0) markdups_post.regions_coverage.use_fast_mode (Boolean, default=true) markdups_post.wg_coverage.min_mapping_quality (Int, default=20) markdups_post.wg_coverage.modify_disk_size_gb (Int, default=0) markdups_post.wg_coverage.use_fast_mode (Boolean, default=true) Outputs bam_checksum (File) validate_sam_file (File) flagstat_report (File) fastqc_results (File) instrument_file (File) read_length_file (File) inferred_encoding (File) inferred_endedness (File) alignment_metrics (File) alignment_metrics_pdf (File) insert_size_metrics (File) insert_size_metrics_pdf (File) quality_score_distribution_txt (File) quality_score_distribution_pdf (File) phred_scores (File) kraken_report (File) mosdepth_global_dist (File) mosdepth_global_summary (File) mosdepth_region_dist (Array[File]) mosdepth_region_summary (Array[File]) multiqc_report (File) orig_read_count (File?) kraken_sequences (File?) comparative_kraken_report (File?) comparative_kraken_sequences (File?) mosdepth_dups_marked_global_dist (File?) mosdepth_dups_marked_global_summary (File?) mosdepth_dups_marked_region_summary (Array[File]?) mosdepth_dups_marked_region_dist (Array[File?]?) mark_duplicates_metrics (File?) inferred_strandedness (File?) qualimap_rnaseq_results (File?) junction_summary (File?) junctions (File?) librarian_report (File?) intermediate_files (IntermediateFiles?) 
parse_input description Parses and validates the quality_check workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching', 'labels': 'An array of labels to use on the result coverage files associated with each coverage BED'} Inputs Required _runtime (Any, required ) coverage_beds_len (Int, required ): Length of the provided coverage_beds array coverage_labels (Array[String], required ): An array of equal length to coverage_beds_len which determines the prefix label applied to coverage output files. If an empty array is supplied, defaults of regions1 , regions2 , etc. will be used. gtf_provided (Boolean, required ): Was a GTF supplied by the user? Must be true if rna == true . rna (Boolean, required ): Is the sequenced molecule RNA? Outputs labels (Array[String])","title":"Quality check standard"},{"location":"workflows/quality-check-standard/#quality_check","text":"description Performs comprehensive quality checks, aggregating all analyses and metrics into a final MultiQC report. help Assumes that input BAM is position-sorted. external_help https://multiqc.info/ outputs {'bam_checksum': 'STDOUT of the md5sum command run on the input BAM that has been redirected to a file', 'validate_sam_file': 'Validation report produced by picard ValidateSamFile . Validation warnings and errors are logged.', 'flagstat_report': ' samtools flagstat STDOUT redirected to a file. 
If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'fastqc_results': 'A gzipped tar archive of all FastQC output files', 'instrument_file': 'TSV file containing the ngsderive instrument report for the input BAM file', 'read_length_file': 'TSV file containing the ngsderive readlen report for the input BAM file', 'inferred_encoding': 'TSV file containing the ngsderive encoding report for the input BAM file', 'alignment_metrics': {'description': 'The text file output of picard CollectAlignmentSummaryMetrics ', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#AlignmentSummaryMetrics'}, 'alignment_metrics_pdf': 'The PDF file output of picard CollectAlignmentSummaryMetrics ', 'insert_size_metrics': {'description': 'The text file output of picard CollectInsertSizeMetrics . If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#InsertSizeMetrics'}, 'insert_size_metrics_pdf': 'The PDF file output of picard CollectInsertSizeMetrics . If mark_duplicates is true , then this result will be generated from the duplicate marked BAM.', 'quality_score_distribution_txt': 'The text file output of picard QualityScoreDistribution ', 'quality_score_distribution_pdf': 'The PDF file output of picard QualityScoreDistribution ', 'phred_scores': 'Headered TSV file containing PHRED score statistics', 'kraken_report': {'description': 'A Kraken2 summary report', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#sample-report-output-format'}, 'mosdepth_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. 
It does this for each chromosome, and for the whole genome.', 'mosdepth_global_summary': 'A summary of mean depths per chromosome', 'mosdepth_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file.', 'mosdepth_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file.', 'multiqc_report': 'A gzipped tar archive of all MultiQC output files', 'orig_read_count': 'A TSV report containing the original read count before subsampling. Only present if subsample_n_reads > 0 .', 'kraken_sequences': {'description': 'Detailed Kraken2 output that has been gzipped. Only present if store_kraken_sequences == true .', 'external_help': 'https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown#standard-kraken-output-format'}, 'comparative_kraken_report': 'Kraken2 summary report for only the alternatively filtered reads. Only present if run_comparative_kraken == true .', 'comparative_kraken_sequences': 'Detailed Kraken2 output for only the alternatively filtered reads. Only present if run_comparative_kraken == true && store_kraken_sequences == true .', 'mark_duplicates_metrics': {'description': 'The METRICS_FILE result of picard MarkDuplicates . Only present if mark_duplicates == true && optical_distance > 0 .', 'external_help': 'http://broadinstitute.github.io/picard/picard-metric-definitions.html#DuplicationMetrics'}, 'mosdepth_dups_marked_global_dist': 'The $prefix.mosdepth.global.dist.txt file contains a cumulative distribution indicating the proportion of total bases that were covered for at least a given coverage value. It does this for each chromosome, and for the whole genome. 
This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_global_summary': 'A summary of mean depths per chromosome. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_dist': 'The $prefix.mosdepth.region.dist.txt file contains a cumulative distribution indicating the proportion of total bases in the region(s) defined by the coverage_bed that were covered for at least a given coverage value. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'mosdepth_dups_marked_region_summary': 'A summary of mean depths per chromosome and within specified regions per chromosome. There will be one file in this array for each coverage_beds input file. This file is produced from analyzing the duplicate marked BAM. Only present if mark_duplicates == true .', 'inferred_strandedness': 'TSV file containing the ngsderive strandedness report. Only present if rna == true .', 'qualimap_rnaseq_results': 'Gzipped tar archive of all QualiMap output files. Only present if rna == true .', 'junction_summary': 'TSV file containing the ngsderive junction-annotation summary. Only present if rna == true ', 'junctions': 'TSV file containing a detailed list of annotated junctions. Only present if rna == true .', 'librarian_report': 'A tar archive containing the librarian report and raw data. Only present if run_librarian == true .', 'IntermediateFiles': 'Any and all files produced as intermediate during pipeline processing. 
Only output if output_intermediate_files == true .'} allowNestedInputs true","title":"quality_check"},{"location":"workflows/quality-check-standard/#inputs","text":"","title":"Inputs"},{"location":"workflows/quality-check-standard/#required","text":"bam (File, required ): Input BAM format file to quality check bam_index (File, required ): BAM index file corresponding to the input BAM kraken_db (File, required ): Kraken2 database. Can be generated with ../reference/make-qc-reference.wdl . Must be a tarball without a root directory. alt_filtered_fastq._runtime (Any, required ) alt_filtered_fqlint._runtime (Any, required ) bam_to_fastq._runtime (Any, required ) collect_alignment_summary_metrics._runtime (Any, required ) collect_insert_size_metrics._runtime (Any, required ) comparative_kraken._runtime (Any, required ) compression_integrity._runtime (Any, required ) compute_checksum._runtime (Any, required ) encoding._runtime (Any, required ) endedness._runtime (Any, required ) fastqc._runtime (Any, required ) flagstat._runtime (Any, required ) fqlint._runtime (Any, required ) global_phred_scores._runtime (Any, required ) instrument._runtime (Any, required ) junction_annotation._runtime (Any, required ) kraken._runtime (Any, required ) librarian._runtime (Any, required ) markdups._runtime (Any, required ) multiqc._runtime (Any, required ) parse_input._runtime (Any, required ) qualimap_rnaseq._runtime (Any, required ) quality_score_distribution._runtime (Any, required ) quickcheck._runtime (Any, required ) read_length._runtime (Any, required ) regions_coverage._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) subsample_index._runtime (Any, required ) validate_bam._runtime (Any, required ) wg_coverage._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) 
comparative_kraken_filter_validator.validate_include_if_all._runtime (Any, required ) comparative_kraken_filter_validator.validate_include_if_any._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_all._runtime (Any, required ) kraken_filter_validator.validate_exclude_if_any._runtime (Any, required ) kraken_filter_validator.validate_include_if_all._runtime (Any, required ) kraken_filter_validator.validate_include_if_any._runtime (Any, required ) markdups_post.collect_insert_size_metrics._runtime (Any, required ) markdups_post.flagstat._runtime (Any, required ) markdups_post.regions_coverage._runtime (Any, required ) markdups_post.wg_coverage._runtime (Any, required )","title":"Required"},{"location":"workflows/quality-check-standard/#optional","text":"gtf (File?): GTF features file. Gzipped or uncompressed. Required for RNA-Seq data. validate_bam.reference_fasta (File?) wg_coverage.coverage_bed (File?) markdups_post.wg_coverage.coverage_bed (File?)","title":"Optional"},{"location":"workflows/quality-check-standard/#defaults","text":"comparative_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x904\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while performing a second FASTQ conversion, before running Kraken2 another time. This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove unmapped, secondary, and supplementary reads from the created FASTQs. WARNING These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. coverage_beds (Array[File], default=[]): An array of 3 column BEDs which are passed to the -b flag of mosdepth, in order to restrict coverage analysis to select regions. Any regional analysis enabled by this option is in addition to whole genome coverage, which is calculated regardless of this setting. 
An exon BED and a Coding Sequence BED are examples of regions you may wish to restrict coverage analysis to. Those two BEDs can be created with the workflow in ../reference/make-qc-reference.wdl . coverage_labels (Array[String], default=[]): An array of equal length to coverage_beds which determines the prefix label applied to the output files. If omitted, defaults of regions1 , regions2 , etc. will be used. If using the BEDs created by ../reference/make-qc-reference.wdl , the labels [\"exon\", \"CDS\"] are appropriate. Make sure to provide the coverage BEDs in the same order as the labels. extra_multiqc_inputs (Array[File], default=[]): An array of additional files to pass directly into MultiQC mark_duplicates (Boolean, default=rna): Mark duplicates before select analyses? Default behavior is to set this to the value of the rna parameter. This is because DNA files are often duplicate marked already, and RNA-Seq files are usually not duplicate marked. If set to true , a BAM will be generated and passed to selected downstream analyses. For more details about what analyses are run, review ./markdups-post.wdl . WARNING, this duplicate marked BAM is not output by default. If you would like to output this file, set output_intermediate_files = true . multiqc_config (File, default=\"https://raw.githubusercontent.com/stjudecloud/workflows/main/workflows/qc/inputs/multiqc_config_hg38.yaml\"): YAML file for configuring MultiQC optical_distance (Int, default=0): Maximum distance between read coordinates to consider them optical duplicates instead of library duplicates (e.g. PCR duplicates). If mark_duplicates == false , this parameter is ignored. If 0 , then optical duplicate marking is disabled and only traditional duplicate marking will be performed. Suggested settings of 100 for unpatterned versions of the Illumina platform (e.g. HiSeq) or 2500 for patterned flowcell models (e.g. NovaSeq). 
Calculation of distance depends on coordinate data embedded in the read names, typically produced by Illumina sequencing machines. Optical duplicate detection will not work on non-standard names without a custom regex for tile-data extraction. Review the mark_duplicates task in ../../tools/picard.wdl for more information. output_intermediate_files (Boolean, default=false): Output intermediate files? FASTQs; if rna == true a collated BAM; if mark_duplicates == true a duplicate marked BAM, various accessory files like indexes and md5sums; if subsampling was requested and performed then a sampled BAM and associated index. WARNING, these files can be large. prefix (String, default=basename(bam,\".bam\")): Prefix for all results files rna (Boolean, default=false): Is the sequenced molecule RNA? Enabling this option adds RNA-Seq specific analyses to the workflow. If true , a GTF file must be provided. If false , the GTF file is ignored. run_comparative_kraken (Boolean, default=false): Run Kraken2 a second time with different FASTQ filtering? If true , comparative_filter is used in a second run of BAM->FASTQ conversion, resulting in differently filtered FASTQs analyzed by Kraken2. If false , comparative_filter is ignored. run_librarian (Boolean, default=rna); description : Run the librarian tool to generate a report of the likely Illumina library prep kit used to generate the data. WARNING this tool is not guaranteed to work on all data, and may produce nonsensical results. librarian was trained on a limited set of GEO read data (Gene Expression Omnibus). This means the input data should be Paired-End, of mouse or human origin, read length should be >50bp, and derived from a library prep kit that is in the librarian database. 
By default, this tool is run when rna == true .; external_help : https://f1000research.com/articles/11-1122/v2 standard_filter (FlagFilter, default={\"include_if_all\": \"0x0\", \"exclude_if_any\": \"0x900\", \"include_if_any\": \"0x0\", \"exclude_if_all\": \"0x0\"}): Filter to apply to the input BAM while converting to FASTQ, before running Kraken2 and librarian (if run_librarian == true ). This is a FlagFilter object (see ../../data_structures/flag_filter.wdl for more information). By default, it will remove secondary and supplementary reads from the created FASTQs. WARNING: These filters can be tricky to configure; please read documentation thoroughly before changing the defaults. WARNING: If you have set run_librarian to true , we strongly recommend leaving this filter at the default value. librarian is trained on a specific set of reads, and changing this filter may produce nonsensical results. store_kraken_sequences (Boolean, default=false): Store the Kraken2 sequences output? This will apply to all runs of Kraken2 (see parameter_meta.run_comparative_kraken ). WARNING these files can be very large. subsample_n_reads (Int, default=-1): Only process a random sampling of approximately n reads. Any n <= 0 for processing entire input. Subsampling is done probabilistically so the exact number of reads in the output will have some variation. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
alt_filtered_fastq.append_read_number (Boolean, default=true) alt_filtered_fastq.collated (Boolean, default=false) alt_filtered_fastq.fail_on_unexpected_reads (Boolean, default=false) alt_filtered_fastq.modify_disk_size_gb (Int, default=0) alt_filtered_fastq.modify_memory_gb (Int, default=0) alt_filtered_fastq.ncpu (Int, default=2) alt_filtered_fastq.output_singletons (Boolean, default=false) alt_filtered_fqlint.disable_validator_codes (Array[String], default=[]) alt_filtered_fqlint.modify_disk_size_gb (Int, default=0) alt_filtered_fqlint.modify_memory_gb (Int, default=0) alt_filtered_fqlint.paired_read_validation_level (String, default=\"high\") alt_filtered_fqlint.panic (Boolean, default=true) alt_filtered_fqlint.single_read_validation_level (String, default=\"high\") bam_to_fastq.append_read_number (Boolean, default=true) bam_to_fastq.collated (Boolean, default=false) bam_to_fastq.fail_on_unexpected_reads (Boolean, default=false) bam_to_fastq.modify_disk_size_gb (Int, default=0) bam_to_fastq.modify_memory_gb (Int, default=0) bam_to_fastq.ncpu (Int, default=2) bam_to_fastq.output_singletons (Boolean, default=false) collect_alignment_summary_metrics.memory_gb (Int, default=8) collect_alignment_summary_metrics.modify_disk_size_gb (Int, default=0) collect_alignment_summary_metrics.validation_stringency (String, default=\"SILENT\") collect_insert_size_metrics.memory_gb (Int, default=8) collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") comparative_kraken.min_base_quality (Int, default=0) comparative_kraken.modify_disk_size_gb (Int, default=0) comparative_kraken.modify_memory_gb (Int, default=0) comparative_kraken.ncpu (Int, default=4) comparative_kraken.use_names (Boolean, default=true) compression_integrity.modify_disk_size_gb (Int, default=0) compute_checksum.modify_disk_size_gb (Int, default=0) encoding.modify_disk_size_gb (Int, default=0) endedness.calc_rpt (Boolean, 
default=false) endedness.modify_disk_size_gb (Int, default=0) endedness.modify_memory_gb (Int, default=0) endedness.num_reads (Int, default=-1) endedness.paired_deviance (Float, default=0.0) endedness.round_rpt (Boolean, default=false) endedness.split_by_rg (Boolean, default=false) fastqc.modify_disk_size_gb (Int, default=0) fastqc.ncpu (Int, default=4) flagstat.modify_disk_size_gb (Int, default=0) flagstat.ncpu (Int, default=2) flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. fqlint.disable_validator_codes (Array[String], default=[]) fqlint.modify_disk_size_gb (Int, default=0) fqlint.modify_memory_gb (Int, default=0) fqlint.paired_read_validation_level (String, default=\"high\") fqlint.panic (Boolean, default=true) fqlint.single_read_validation_level (String, default=\"high\") global_phred_scores.fast_mode (Boolean, default=true) global_phred_scores.modify_disk_size_gb (Int, default=0) instrument.modify_disk_size_gb (Int, default=0) instrument.num_reads (Int, default=10000) junction_annotation.fuzzy_junction_match_range (Int, default=0) junction_annotation.min_intron (Int, default=50) junction_annotation.min_mapq (Int, default=30) junction_annotation.min_reads (Int, default=2) junction_annotation.modify_disk_size_gb (Int, default=0) kraken.min_base_quality (Int, default=0) kraken.modify_disk_size_gb (Int, default=0) kraken.modify_memory_gb (Int, default=0) kraken.ncpu (Int, default=4) kraken.use_names (Boolean, default=true) librarian.modify_disk_size_gb (Int, default=0) librarian.prefix (String, default=sub(basename(read_one_fastq),\"([_\\.][rR][12])?(\\.subsampled)?\\.(fastq|fq)(\\.gz)?$\",\"\") + \".librarian\"): Prefix for all results files markdups.clear_dt (Boolean, default=true) markdups.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") markdups.modify_disk_size_gb (Int, default=0) markdups.modify_memory_gb (Int, default=0) markdups.read_name_regex (String, 
default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") markdups.remove_duplicates (Boolean, default=false) markdups.remove_sequencing_duplicates (Boolean, default=false) markdups.tagging_policy (String, default=\"All\") markdups.validation_stringency (String, default=\"SILENT\") multiqc.modify_disk_size_gb (Int, default=0) qualimap_rnaseq.memory_gb (Int, default=16) qualimap_rnaseq.modify_disk_size_gb (Int, default=0) quality_score_distribution.memory_gb (Int, default=8) quality_score_distribution.modify_disk_size_gb (Int, default=0) quality_score_distribution.validation_stringency (String, default=\"SILENT\") quickcheck.modify_disk_size_gb (Int, default=0) read_length.majority_vote_cutoff (Float, default=0.7) read_length.modify_disk_size_gb (Int, default=0) read_length.num_reads (Int, default=-1) regions_coverage.min_mapping_quality (Int, default=20) regions_coverage.modify_disk_size_gb (Int, default=0) regions_coverage.use_fast_mode (Boolean, default=true) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample_index.modify_disk_size_gb (Int, default=0) subsample_index.ncpu (Int, default=2) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.validation_stringency (String, default=\"LENIENT\") wg_coverage.min_mapping_quality (Int, default=20) wg_coverage.modify_disk_size_gb (Int, default=0) wg_coverage.use_fast_mode (Boolean, default=true) markdups_post.collect_insert_size_metrics.memory_gb (Int, default=8) 
markdups_post.collect_insert_size_metrics.modify_disk_size_gb (Int, default=0) markdups_post.collect_insert_size_metrics.validation_stringency (String, default=\"SILENT\") markdups_post.flagstat.modify_disk_size_gb (Int, default=0) markdups_post.flagstat.ncpu (Int, default=2) markdups_post.flagstat.use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. markdups_post.regions_coverage.min_mapping_quality (Int, default=20) markdups_post.regions_coverage.modify_disk_size_gb (Int, default=0) markdups_post.regions_coverage.use_fast_mode (Boolean, default=true) markdups_post.wg_coverage.min_mapping_quality (Int, default=20) markdups_post.wg_coverage.modify_disk_size_gb (Int, default=0) markdups_post.wg_coverage.use_fast_mode (Boolean, default=true)","title":"Defaults"},{"location":"workflows/quality-check-standard/#outputs","text":"bam_checksum (File) validate_sam_file (File) flagstat_report (File) fastqc_results (File) instrument_file (File) read_length_file (File) inferred_encoding (File) inferred_endedness (File) alignment_metrics (File) alignment_metrics_pdf (File) insert_size_metrics (File) insert_size_metrics_pdf (File) quality_score_distribution_txt (File) quality_score_distribution_pdf (File) phred_scores (File) kraken_report (File) mosdepth_global_dist (File) mosdepth_global_summary (File) mosdepth_region_dist (Array[File]) mosdepth_region_summary (Array[File]) multiqc_report (File) orig_read_count (File?) kraken_sequences (File?) comparative_kraken_report (File?) comparative_kraken_sequences (File?) mosdepth_dups_marked_global_dist (File?) mosdepth_dups_marked_global_summary (File?) mosdepth_dups_marked_region_summary (Array[File]?) mosdepth_dups_marked_region_dist (Array[File?]?) mark_duplicates_metrics (File?) inferred_strandedness (File?) qualimap_rnaseq_results (File?) junction_summary (File?) junctions (File?) librarian_report (File?) 
intermediate_files (IntermediateFiles?)","title":"Outputs"},{"location":"workflows/quality-check-standard/#parse_input","text":"description Parses and validates the quality_check workflow's provided inputs outputs {'check': 'Dummy output to indicate success and to enable call-caching', 'labels': 'An array of labels to use on the result coverage files associated with each coverage BED'}","title":"parse_input"},{"location":"workflows/quality-check-standard/#inputs_1","text":"","title":"Inputs"},{"location":"workflows/quality-check-standard/#required_1","text":"_runtime (Any, required ) coverage_beds_len (Int, required ): Length of the provided coverage_beds array coverage_labels (Array[String], required ): An array of equal length to coverage_beds_len which determines the prefix label applied to coverage output files. If an empty array is supplied, defaults of regions1 , regions2 , etc. will be used. gtf_provided (Boolean, required ): Was a GTF supplied by the user? Must be true if rna == true . rna (Boolean, required ): Is the sequenced molecule RNA?","title":"Required"},{"location":"workflows/quality-check-standard/#outputs_1","text":"labels (Array[String])","title":"Outputs"},{"location":"workflows/rnaseq-variant-calling/","text":"rnaseq_variant_calling description Call short germline variants from RNA-Seq data. Produces a VCF file of variants. Based on GATK RNA-Seq short variant calling best practices pipeline. outputs {'recalibrated_bam': 'BAM that has undergone recalibration of base quality scores', 'recalibrated_bam_index': 'Index file for recalibrated BAM file', 'variant_filtered_vcf': 'VCF file after variant filters have been applied', 'variant_filtered_vcf_index': 'Index for filtered variant VCF file'} Inputs Required bam (File, required ): BAM file of aligned RNA-Seq reads bam_index (File, required ): Index file for BAM file calling_interval_list (File, required ): Interval list of regions from which to call variants. Used for parallelization. 
dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): Index file for dbSNP VCF file dict (File, required ): Sequence dictionary for reference FASTA file fasta (File, required ): Reference FASTA file fasta_index (File, required ): Index file for reference FASTA file known_vcf_indexes (Array[File], required ): Array of index files for known indels VCF files known_vcfs (Array[File], required ): Array of known indels VCF files apply_bqsr._runtime (Any, required ) base_recalibrator._runtime (Any, required ) haplotype_caller._runtime (Any, required ) mark_duplicates._runtime (Any, required ) merge_vcfs._runtime (Any, required ) scatter_interval_list._runtime (Any, required ) split_n_cigar_reads._runtime (Any, required ) variant_filtration._runtime (Any, required ) Defaults bam_is_dup_marked (Boolean, default=false): Whether the input BAM file has duplicates marked. prefix (String, default=basename(bam,'.bam')): Prefix for the output files. scatter_count (Int, default=6): Number of intervals to scatter over. This should typically be set to 5-20. Higher values will increase parallelism and speed up the workflow, but increase overhead in provisioning resources. apply_bqsr.memory_gb (Int, default=25) apply_bqsr.modify_disk_size_gb (Int, default=0) apply_bqsr.ncpu (Int, default=4) apply_bqsr.use_original_quality_scores (Boolean, default=false) base_recalibrator.memory_gb (Int, default=25) base_recalibrator.modify_disk_size_gb (Int, default=0) base_recalibrator.ncpu (Int, default=4) base_recalibrator.outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\") base_recalibrator.use_original_quality_scores (Boolean, default=false) haplotype_caller.memory_gb (Int, default=25) haplotype_caller.modify_disk_size_gb (Int, default=0) haplotype_caller.ncpu (Int, default=4) haplotype_caller.prefix (String, default=basename(bam,\".bam\")): Prefix for the output files. 
haplotype_caller.stand_call_conf (Int, default=20) haplotype_caller.use_soft_clipped_bases (Boolean, default=false) mark_duplicates.clear_dt (Boolean, default=true) mark_duplicates.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") mark_duplicates.modify_disk_size_gb (Int, default=0) mark_duplicates.modify_memory_gb (Int, default=0) mark_duplicates.optical_distance (Int, default=0) mark_duplicates.prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the output files. mark_duplicates.read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") mark_duplicates.remove_duplicates (Boolean, default=false) mark_duplicates.remove_sequencing_duplicates (Boolean, default=false) mark_duplicates.tagging_policy (String, default=\"All\") mark_duplicates.validation_stringency (String, default=\"SILENT\") merge_vcfs.modify_disk_size_gb (Int, default=0) scatter_interval_list.sort (Boolean, default=true) scatter_interval_list.subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\") scatter_interval_list.unique (Boolean, default=true) split_n_cigar_reads.memory_gb (Int, default=25) split_n_cigar_reads.modify_disk_size_gb (Int, default=0) split_n_cigar_reads.ncpu (Int, default=8) split_n_cigar_reads.prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the output files. 
variant_filtration.cluster (Int, default=3) variant_filtration.filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]) variant_filtration.filter_names (Array[String], default=[\"FS\", \"QD\"]) variant_filtration.modify_disk_size_gb (Int, default=0) variant_filtration.ncpu (Int, default=1) variant_filtration.window (Int, default=35) Outputs recalibrated_bam (File) recalibrated_bam_index (File) variant_filtered_vcf (File) variant_filtered_vcf_index (File)","title":"Rnaseq variant calling"},{"location":"workflows/rnaseq-variant-calling/#rnaseq_variant_calling","text":"description Call short germline variants from RNA-Seq data. Produces a VCF file of variants. Based on GATK RNA-Seq short variant calling best practices pipeline. outputs {'recalibrated_bam': 'BAM that has undergone recalibration of base quality scores', 'recalibrated_bam_index': 'Index file for recalibrated BAM file', 'variant_filtered_vcf': 'VCF file after variant filters have been applied', 'variant_filtered_vcf_index': 'Index for filtered variant VCF file'}","title":"rnaseq_variant_calling"},{"location":"workflows/rnaseq-variant-calling/#inputs","text":"","title":"Inputs"},{"location":"workflows/rnaseq-variant-calling/#required","text":"bam (File, required ): BAM file of aligned RNA-Seq reads bam_index (File, required ): Index file for BAM file calling_interval_list (File, required ): Interval list of regions from which to call variants. Used for parallelization. 
dbSNP_vcf (File, required ): dbSNP VCF file dbSNP_vcf_index (File, required ): Index file for dbSNP VCF file dict (File, required ): Sequence dictionary for reference FASTA file fasta (File, required ): Reference FASTA file fasta_index (File, required ): Index file for reference FASTA file known_vcf_indexes (Array[File], required ): Array of index files for known indels VCF files known_vcfs (Array[File], required ): Array of known indels VCF files apply_bqsr._runtime (Any, required ) base_recalibrator._runtime (Any, required ) haplotype_caller._runtime (Any, required ) mark_duplicates._runtime (Any, required ) merge_vcfs._runtime (Any, required ) scatter_interval_list._runtime (Any, required ) split_n_cigar_reads._runtime (Any, required ) variant_filtration._runtime (Any, required )","title":"Required"},{"location":"workflows/rnaseq-variant-calling/#defaults","text":"bam_is_dup_marked (Boolean, default=false): Whether the input BAM file has duplicates marked. prefix (String, default=basename(bam,'.bam')): Prefix for the output files. scatter_count (Int, default=6): Number of intervals to scatter over. This should typically be set to 5-20. Higher values will increase parallelism and speed up the workflow, but increase overhead in provisioning resources. apply_bqsr.memory_gb (Int, default=25) apply_bqsr.modify_disk_size_gb (Int, default=0) apply_bqsr.ncpu (Int, default=4) apply_bqsr.use_original_quality_scores (Boolean, default=false) base_recalibrator.memory_gb (Int, default=25) base_recalibrator.modify_disk_size_gb (Int, default=0) base_recalibrator.ncpu (Int, default=4) base_recalibrator.outfile_name (String, default=basename(bam,\".bam\") + \".recal.txt\") base_recalibrator.use_original_quality_scores (Boolean, default=false) haplotype_caller.memory_gb (Int, default=25) haplotype_caller.modify_disk_size_gb (Int, default=0) haplotype_caller.ncpu (Int, default=4) haplotype_caller.prefix (String, default=basename(bam,\".bam\")): Prefix for the output files. 
haplotype_caller.stand_call_conf (Int, default=20) haplotype_caller.use_soft_clipped_bases (Boolean, default=false) mark_duplicates.clear_dt (Boolean, default=true) mark_duplicates.duplicate_scoring_strategy (String, default=\"SUM_OF_BASE_QUALITIES\") mark_duplicates.modify_disk_size_gb (Int, default=0) mark_duplicates.modify_memory_gb (Int, default=0) mark_duplicates.optical_distance (Int, default=0) mark_duplicates.prefix (String, default=basename(bam,\".bam\") + \".MarkDuplicates\"): Prefix for the output files. mark_duplicates.read_name_regex (String, default=\"^[!-9;-?A-~:]+:([!-9;-?A-~]+):([0-9]+):([0-9]+)$\") mark_duplicates.remove_duplicates (Boolean, default=false) mark_duplicates.remove_sequencing_duplicates (Boolean, default=false) mark_duplicates.tagging_policy (String, default=\"All\") mark_duplicates.validation_stringency (String, default=\"SILENT\") merge_vcfs.modify_disk_size_gb (Int, default=0) scatter_interval_list.sort (Boolean, default=true) scatter_interval_list.subdivision_mode (String, default=\"BALANCING_WITHOUT_INTERVAL_SUBDIVISION_WITH_OVERFLOW\") scatter_interval_list.unique (Boolean, default=true) split_n_cigar_reads.memory_gb (Int, default=25) split_n_cigar_reads.modify_disk_size_gb (Int, default=0) split_n_cigar_reads.ncpu (Int, default=8) split_n_cigar_reads.prefix (String, default=basename(bam,\".bam\") + \".split\"): Prefix for the output files. 
variant_filtration.cluster (Int, default=3) variant_filtration.filter_expressions (Array[String], default=[\"FS > 30.0\", \"QD < 2.0\"]) variant_filtration.filter_names (Array[String], default=[\"FS\", \"QD\"]) variant_filtration.modify_disk_size_gb (Int, default=0) variant_filtration.ncpu (Int, default=1) variant_filtration.window (Int, default=35)","title":"Defaults"},{"location":"workflows/rnaseq-variant-calling/#outputs","text":"recalibrated_bam (File) recalibrated_bam_index (File) variant_filtered_vcf (File) variant_filtered_vcf_index (File)","title":"Outputs"},{"location":"workflows/samtools-merge/","text":"WARNING: this workflow is experimental! Use at your own risk! samtools_merge description Runs samtools merge , with optional iteration to avoid maximum command line argument length outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} allowNestedInputs true Inputs Required bams (Array[File], required ): BAMs to merge into a final BAM basic_merge._runtime (Any, required ) final_merge._runtime (Any, required ) inner_merge._runtime (Any, required ) Optional basic_merge.new_header (File?) final_merge.new_header (File?) inner_merge.new_header (File?) Defaults max_length (Int, default=100): Maximum number of BAMs to merge before using iteration prefix (String, default=basename(bams[0],\".bam\")): Prefix for output BAM. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
basic_merge.combine_rg (Boolean, default=true) basic_merge.modify_disk_size_gb (Int, default=0) basic_merge.name_sorted (Boolean, default=false) basic_merge.ncpu (Int, default=2) basic_merge.region (String, default=\"\") final_merge.modify_disk_size_gb (Int, default=0) final_merge.name_sorted (Boolean, default=false) final_merge.ncpu (Int, default=2) final_merge.region (String, default=\"\") inner_merge.combine_rg (Boolean, default=true) inner_merge.modify_disk_size_gb (Int, default=0) inner_merge.name_sorted (Boolean, default=false) inner_merge.ncpu (Int, default=2) inner_merge.region (String, default=\"\") Outputs merged_bam (File)","title":"Samtools merge"},{"location":"workflows/samtools-merge/#samtools_merge","text":"description Runs samtools merge , with optional iteration to avoid maximum command line argument length outputs {'merged_bam': 'The BAM resulting from merging all the input BAMs'} allowNestedInputs true","title":"samtools_merge"},{"location":"workflows/samtools-merge/#inputs","text":"","title":"Inputs"},{"location":"workflows/samtools-merge/#required","text":"bams (Array[File], required ): BAMs to merge into a final BAM basic_merge._runtime (Any, required ) final_merge._runtime (Any, required ) inner_merge._runtime (Any, required )","title":"Required"},{"location":"workflows/samtools-merge/#optional","text":"basic_merge.new_header (File?) final_merge.new_header (File?) inner_merge.new_header (File?)","title":"Optional"},{"location":"workflows/samtools-merge/#defaults","text":"max_length (Int, default=100): Maximum number of BAMs to merge before using iteration prefix (String, default=basename(bams[0],\".bam\")): Prefix for output BAM. use_all_cores (Boolean, default=false): Use all cores? Recommended for cloud environments. 
basic_merge.combine_rg (Boolean, default=true) basic_merge.modify_disk_size_gb (Int, default=0) basic_merge.name_sorted (Boolean, default=false) basic_merge.ncpu (Int, default=2) basic_merge.region (String, default=\"\") final_merge.modify_disk_size_gb (Int, default=0) final_merge.name_sorted (Boolean, default=false) final_merge.ncpu (Int, default=2) final_merge.region (String, default=\"\") inner_merge.combine_rg (Boolean, default=true) inner_merge.modify_disk_size_gb (Int, default=0) inner_merge.name_sorted (Boolean, default=false) inner_merge.ncpu (Int, default=2) inner_merge.region (String, default=\"\")","title":"Defaults"},{"location":"workflows/samtools-merge/#outputs","text":"merged_bam (File)","title":"Outputs"},{"location":"workflows/scrnaseq-standard/","text":"scRNA-Seq Standard This WDL workflow runs the Cell Ranger scRNA-Seq alignment workflow for St. Jude Cloud. The workflow takes an input BAM file and splits it into FASTQ files for each read in the pair. The read pairs are then passed through Cell Ranger to generate a BAM file and perform quantification. Strandedness is inferred using ngsderive. File validation is performed at several steps, including immediately preceding output. LICENSING MIT License Copyright 2022-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. scrnaseq_standard Inputs Required bam (File, required ): Input BAM format file to quality check gtf (File, required ): Gzipped GTF feature file transcriptome_tar_gz (File, required ): Database of reference files for Cell Ranger. Can be downloaded from 10x Genomics. compute_checksum._runtime (Any, required ) count._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) validate_bam._runtime (Any, required ) validate_input_bam._runtime (Any, required ) cell_ranger_bam_to_fastqs.bamtofastq._runtime (Any, required ) cell_ranger_bam_to_fastqs.fqlint._runtime (Any, required ) cell_ranger_bam_to_fastqs.quickcheck._runtime (Any, required ) Optional validate_bam.reference_fasta (File?) validate_input_bam.reference_fasta (File?) Defaults prefix (String, default=basename(bam,\".bam\")): Prefix for output files subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. <= 0 for processing entire input BAM. use_all_cores (Boolean, default=false): Use all cores for multi-core steps? validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? 
cell_ranger_bam_to_fastqs.cellranger11 (Boolean, default=false) cell_ranger_bam_to_fastqs.gemcode (Boolean, default=false) cell_ranger_bam_to_fastqs.longranger20 (Boolean, default=false) compute_checksum.modify_disk_size_gb (Int, default=0) count.memory_gb (Int, default=16) count.modify_disk_size_gb (Int, default=0) count.ncpu (Int, default=1) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\") strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for output files validate_bam.ignore_list (Array[String], default=[]) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_bam.succeed_on_errors (Boolean, default=false) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.summary_mode (Boolean, default=false) validate_bam.validation_stringency (String, default=\"LENIENT\") validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode 
(Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") cell_ranger_bam_to_fastqs.bamtofastq.memory_gb (Int, default=40) cell_ranger_bam_to_fastqs.bamtofastq.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.bamtofastq.ncpu (Int, default=1) cell_ranger_bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) cell_ranger_bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.fqlint.panic (Boolean, default=true) cell_ranger_bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0) Outputs harmonized_bam (File) bam_checksum (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) inferred_strandedness (File)","title":"Scrnaseq standard"},{"location":"workflows/scrnaseq-standard/#licensing","text":"","title":"LICENSING"},{"location":"workflows/scrnaseq-standard/#mit-license","text":"Copyright 2022-Present St. Jude Children's Research Hospital Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.","title":"MIT License"},{"location":"workflows/scrnaseq-standard/#scrnaseq_standard","text":"","title":"scrnaseq_standard"},{"location":"workflows/scrnaseq-standard/#inputs","text":"","title":"Inputs"},{"location":"workflows/scrnaseq-standard/#required","text":"bam (File, required ): Input BAM format file to quality check gtf (File, required ): Gzipped GTF feature file transcriptome_tar_gz (File, required ): Database of reference files for Cell Ranger. Can be downloaded from 10x Genomics. compute_checksum._runtime (Any, required ) count._runtime (Any, required ) strandedness._runtime (Any, required ) subsample._runtime (Any, required ) validate_bam._runtime (Any, required ) validate_input_bam._runtime (Any, required ) cell_ranger_bam_to_fastqs.bamtofastq._runtime (Any, required ) cell_ranger_bam_to_fastqs.fqlint._runtime (Any, required ) cell_ranger_bam_to_fastqs.quickcheck._runtime (Any, required )","title":"Required"},{"location":"workflows/scrnaseq-standard/#optional","text":"validate_bam.reference_fasta (File?) validate_input_bam.reference_fasta (File?)","title":"Optional"},{"location":"workflows/scrnaseq-standard/#defaults","text":"prefix (String, default=basename(bam,\".bam\")): Prefix for output files subsample_n_reads (Int, default=-1): Only process a random sampling of n reads. <= 0 for processing entire input BAM. use_all_cores (Boolean, default=false): Use all cores for multi-core steps? validate_input (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization? 
cell_ranger_bam_to_fastqs.cellranger11 (Boolean, default=false) cell_ranger_bam_to_fastqs.gemcode (Boolean, default=false) cell_ranger_bam_to_fastqs.longranger20 (Boolean, default=false) compute_checksum.modify_disk_size_gb (Int, default=0) count.memory_gb (Int, default=16) count.modify_disk_size_gb (Int, default=0) count.ncpu (Int, default=1) strandedness.min_mapq (Int, default=30) strandedness.min_reads_per_gene (Int, default=10) strandedness.modify_disk_size_gb (Int, default=0) strandedness.num_genes (Int, default=1000) strandedness.outfile_name (String, default=basename(bam,\".bam\") + \".strandedness.tsv\") strandedness.split_by_rg (Boolean, default=false) subsample.modify_disk_size_gb (Int, default=0) subsample.ncpu (Int, default=2) subsample.prefix (String, default=basename(bam,\".bam\")): Prefix for output files validate_bam.ignore_list (Array[String], default=[]) validate_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_bam.max_errors (Int, default=2147483647) validate_bam.memory_gb (Int, default=16) validate_bam.modify_disk_size_gb (Int, default=0) validate_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_bam.succeed_on_errors (Boolean, default=false) validate_bam.succeed_on_warnings (Boolean, default=true) validate_bam.summary_mode (Boolean, default=false) validate_bam.validation_stringency (String, default=\"LENIENT\") validate_input_bam.ignore_list (Array[String], default=[]) validate_input_bam.index_validation_stringency_less_exhaustive (Boolean, default=false) validate_input_bam.max_errors (Int, default=2147483647) validate_input_bam.memory_gb (Int, default=16) validate_input_bam.modify_disk_size_gb (Int, default=0) validate_input_bam.outfile_name (String, default=basename(bam,\".bam\") + \".ValidateSamFile.txt\") validate_input_bam.succeed_on_errors (Boolean, default=false) validate_input_bam.succeed_on_warnings (Boolean, default=true) validate_input_bam.summary_mode 
(Boolean, default=false) validate_input_bam.validation_stringency (String, default=\"LENIENT\") cell_ranger_bam_to_fastqs.bamtofastq.memory_gb (Int, default=40) cell_ranger_bam_to_fastqs.bamtofastq.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.bamtofastq.ncpu (Int, default=1) cell_ranger_bam_to_fastqs.fqlint.disable_validator_codes (Array[String], default=[]) cell_ranger_bam_to_fastqs.fqlint.modify_disk_size_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.modify_memory_gb (Int, default=0) cell_ranger_bam_to_fastqs.fqlint.paired_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.fqlint.panic (Boolean, default=true) cell_ranger_bam_to_fastqs.fqlint.single_read_validation_level (String, default=\"high\") cell_ranger_bam_to_fastqs.quickcheck.modify_disk_size_gb (Int, default=0)","title":"Defaults"},{"location":"workflows/scrnaseq-standard/#outputs","text":"harmonized_bam (File) bam_checksum (File) bam_index (File) qc (File) barcodes (File) features (File) matrix (File) filtered_gene_h5 (File) raw_gene_h5 (File) raw_barcodes (File) raw_features (File) raw_matrix (File) mol_info_h5 (File) web_summary (File) inferred_strandedness (File)","title":"Outputs"},{"location":"workflows/star-db-build/","text":"star_db_build description Builds a database suitable for running the STAR alignment program outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'star_db_tar_gz': 'A gzipped TAR file containing the STAR reference files'} allowNestedInputs true Inputs Required gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve the reference FASTA file from build_star_db._runtime (Any, required ) gtf_download._runtime (Any, required ) reference_download._runtime (Any, required ) Optional gtf_md5 (String?): Expected 
md5sum of GTF file reference_fa_md5 (String?): Expected md5sum of reference FASTA file Defaults gtf_disk_size_gb (Int, default=10): Disk space to allocate the GTF download task reference_fa_disk_size_gb (Int, default=10): Disk space to allocate the FASTA download task build_star_db.db_name (String, default=\"star_db\") build_star_db.genomeChrBinNbits (Int, default=18) build_star_db.genomeSAindexNbases (Int, default=14) build_star_db.genomeSAsparseD (Int, default=1) build_star_db.genomeSuffixLengthMax (Int, default=-1) build_star_db.memory_gb (Int, default=50) build_star_db.modify_disk_size_gb (Int, default=0) build_star_db.ncpu (Int, default=8) build_star_db.sjdbGTFchrPrefix (String, default=\"-\") build_star_db.sjdbGTFfeatureExon (String, default=\"exon\") build_star_db.sjdbGTFtagExonParentGene (String, default=\"gene_id\") build_star_db.sjdbGTFtagExonParentGeneName (String, default=\"gene_name\") build_star_db.sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\") build_star_db.sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\") build_star_db.sjdbOverhang (Int, default=125) build_star_db.use_all_cores (Boolean, default=false) Outputs reference_fa (File) gtf (File) star_db_tar_gz (File)","title":"Star db build"},{"location":"workflows/star-db-build/#star_db_build","text":"description Builds a database suitable for running the STAR alignment program outputs {'reference_fa': 'FASTA format reference file', 'gtf': 'GTF feature file', 'star_db_tar_gz': 'A gzipped TAR file containing the STAR reference files'} allowNestedInputs true","title":"star_db_build"},{"location":"workflows/star-db-build/#inputs","text":"","title":"Inputs"},{"location":"workflows/star-db-build/#required","text":"gtf_name (String, required ): Name of output GTF file gtf_url (String, required ): URL to retrieve the reference GTF file from reference_fa_name (String, required ): Name of output reference FASTA file reference_fa_url (String, required ): URL to retrieve 
the reference FASTA file from build_star_db._runtime (Any, required ) gtf_download._runtime (Any, required ) reference_download._runtime (Any, required )","title":"Required"},{"location":"workflows/star-db-build/#optional","text":"gtf_md5 (String?): Expected md5sum of GTF file reference_fa_md5 (String?): Expected md5sum of reference FASTA file","title":"Optional"},{"location":"workflows/star-db-build/#defaults","text":"gtf_disk_size_gb (Int, default=10): Disk space to allocate the GTF download task reference_fa_disk_size_gb (Int, default=10): Disk space to allocate the FASTA download task build_star_db.db_name (String, default=\"star_db\") build_star_db.genomeChrBinNbits (Int, default=18) build_star_db.genomeSAindexNbases (Int, default=14) build_star_db.genomeSAsparseD (Int, default=1) build_star_db.genomeSuffixLengthMax (Int, default=-1) build_star_db.memory_gb (Int, default=50) build_star_db.modify_disk_size_gb (Int, default=0) build_star_db.ncpu (Int, default=8) build_star_db.sjdbGTFchrPrefix (String, default=\"-\") build_star_db.sjdbGTFfeatureExon (String, default=\"exon\") build_star_db.sjdbGTFtagExonParentGene (String, default=\"gene_id\") build_star_db.sjdbGTFtagExonParentGeneName (String, default=\"gene_name\") build_star_db.sjdbGTFtagExonParentGeneType (String, default=\"gene_type gene_biotype\") build_star_db.sjdbGTFtagExonParentTranscript (String, default=\"transcript_id\") build_star_db.sjdbOverhang (Int, default=125) build_star_db.use_all_cores (Boolean, default=false)","title":"Defaults"},{"location":"workflows/star-db-build/#outputs","text":"reference_fa (File) gtf (File) star_db_tar_gz (File)","title":"Outputs"}]}
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 6cad19e5937d39616d2abb4bd83182f5c60833d8..6bec0f225b41e923e249be0eb2e755d09bd6699a 100644
GIT binary patch
delta 13
Ucmb=gXP58h;9zL?o5)@P02yWjr2qf`

delta 13
Ucmb=gXP58h;9!VwoXB1Q02oIDX#fBK

diff --git a/workflows/dnaseq-core/index.html b/workflows/dnaseq-core/index.html
index 37dfec20..4724406f 100644
--- a/workflows/dnaseq-core/index.html
+++ b/workflows/dnaseq-core/index.html
@@ -197,6 +197,7 @@ <h4 id="required">Required</h4>
 </ul>
 <h4 id="optional">Optional</h4>
 <ul>
+<li><code>sample_override</code> (String?): Value to override the SM field of <em>every</em> read group.</li>
 <li><code>rg_merge.basic_merge.new_header</code> (File?)</li>
 <li><code>rg_merge.final_merge.new_header</code> (File?)</li>
 <li><code>rg_merge.inner_merge.new_header</code> (File?)</li>
diff --git a/workflows/dnaseq-standard-fastq/index.html b/workflows/dnaseq-standard-fastq/index.html
index 86ba2073..8c9bb400 100644
--- a/workflows/dnaseq-standard-fastq/index.html
+++ b/workflows/dnaseq-standard-fastq/index.html
@@ -200,6 +200,7 @@ <h4 id="required">Required</h4>
 </ul>
 <h4 id="optional">Optional</h4>
 <ul>
+<li><code>dnaseq_core_experimental.sample_override</code> (String?)</li>
 <li><code>dnaseq_core_experimental.rg_merge.basic_merge.new_header</code> (File?)</li>
 <li><code>dnaseq_core_experimental.rg_merge.final_merge.new_header</code> (File?)</li>
 <li><code>dnaseq_core_experimental.rg_merge.inner_merge.new_header</code> (File?)</li>
@@ -210,7 +211,7 @@ <h4 id="defaults">Defaults</h4>
 <li><code>reads_per_file</code> (Int, default=10000000): Controls the number of reads per FASTQ file for internal split to run BWA in parallel.</li>
 <li><code>subsample_n_reads</code> (Int, default=-1): Only process a random sampling of <code>n</code> reads. Any <code>n</code>&lt;=<code>0</code> for processing entire input.</li>
 <li><code>use_all_cores</code> (Boolean, default=false): Use all cores? Recommended for cloud environments.</li>
-<li><code>validate_input</code> (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization?</li>
+<li><code>validate_input</code> (Boolean, default=true): Ensure input FASTQs are well-formed before beginning harmonization?</li>
 <li><code>fqlint.disable_validator_codes</code> (Array[String], default=[])</li>
 <li><code>fqlint.modify_disk_size_gb</code> (Int, default=0)</li>
 <li><code>fqlint.modify_memory_gb</code> (Int, default=0)</li>
diff --git a/workflows/dnaseq-standard/index.html b/workflows/dnaseq-standard/index.html
index 758e898a..758e726e 100644
--- a/workflows/dnaseq-standard/index.html
+++ b/workflows/dnaseq-standard/index.html
@@ -202,6 +202,7 @@ <h4 id="required">Required</h4>
 </ul>
 <h4 id="optional">Optional</h4>
 <ul>
+<li><code>sample_override</code> (String?): Value to override the SM field of <em>every</em> read group.</li>
 <li><code>validate_input_bam.reference_fasta</code> (File?)</li>
 <li><code>dnaseq_core_experimental.rg_merge.basic_merge.new_header</code> (File?)</li>
 <li><code>dnaseq_core_experimental.rg_merge.final_merge.new_header</code> (File?)</li>
@@ -215,7 +216,6 @@ <h4 id="defaults">Defaults</h4>
 <li><code>subsample_n_reads</code> (Int, default=-1): Only process a random sampling of <code>n</code> reads. Any <code>n</code>&lt;=<code>0</code> for processing entire input.</li>
 <li><code>use_all_cores</code> (Boolean, default=false): Use all cores? Recommended for cloud environments.</li>
 <li><code>validate_input</code> (Boolean, default=true): Ensure input BAM is well-formed before beginning harmonization?</li>
-<li><code>dnaseq_core_experimental.use_all_cores</code> (Boolean, default=false): Use all cores? Recommended for cloud environments.</li>
 <li><code>get_ReadGroups.modify_disk_size_gb</code> (Int, default=0)</li>
 <li><code>subsample.modify_disk_size_gb</code> (Int, default=0)</li>
 <li><code>subsample.ncpu</code> (Int, default=2)</li>