diff --git a/README.md b/README.md index f52c19d..449a07d 100644 --- a/README.md +++ b/README.md @@ -1,22 +1,14 @@ # somatic-cnvs -Workflows for somatic copy number variant analysis -## Running the Somatic CNV WDL +### Purpose : +Workflows for somatic copy number variant analysis. -### Which WDL should you use? +### cnv_somatic_panel_workflow : +Builds a panel of normals (PoN) for the CNV pair workflow. -- Building a panel of normals (PoN): ``cnv_somatic_panel_workflow.wdl`` -- Running a matched pair: ``cnv_somatic_pair_workflow.wdl`` +#### Requirements/Expectations -#### Setting up parameter json file for a run - -To get started, create the json template (using ``java -jar wdltool.jar inputs ``) for the workflow you wish to run and adjust parameters accordingly. - -*Please note that there are optional workflow-level and task-level parameters that do not appear in the template file. These are set to reasonable values by default, but can also be adjusted if desired.* - -#### Required parameters in the somatic panel workflow - -Important: The normal_bams samples in the json can be used test the wdl, they are NOT to be used to create a panel of normals for sequence analysis. For instructions on creating a proper PON please refer to user the documents https://software.broadinstitute.org/gatk/documentation/ . +Important: The normal_bams samples in the json may be used to test the WDL; they are NOT to be used to create a panel of normals for sequence analysis. For instructions on creating a proper PoN, please refer to the documents [Panel of Normals](https://software.broadinstitute.org/gatk/documentation/article?id=11053) and [Generate a CNV panel of normals with CreateReadCountPanelOfNormals](https://gatkforums.broadinstitute.org/dsde/discussion/11682#2). The reference used must be the same between the PoN and case samples.
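The requirement above, that the reference (and intervals) be identical between the PoN and case samples, exists because denoising divides a case sample's per-interval coverage by statistics the panel computed interval-by-interval. A minimal pure-Python sketch of this idea (a conceptual simplification, not GATK's CreateReadCountPanelOfNormals/DenoiseReadCounts implementation; function names are hypothetical):

```python
from statistics import median

# Conceptual sketch (NOT the GATK algorithm): median-normalize a case sample
# against a panel of normals. The division is positional, so any mismatch in
# reference or interval ordering silently misaligns every interval.

def panel_medians(normal_counts):
    """normal_counts: per-sample lists of read counts over the SAME intervals."""
    totals = [sum(sample) for sample in normal_counts]
    fractions = [[c / t for c in sample] for sample, t in zip(normal_counts, totals)]
    # per-interval median fractional coverage across the panel
    return [median(col) for col in zip(*fractions)]

def copy_ratios(case_counts, medians):
    """Copy ratio per interval: case fractional coverage over panel median."""
    total = sum(case_counts)
    return [(c / total) / m for c, m in zip(case_counts, medians)]
```

A case sample whose coverage profile matches the panel comes out with copy ratios near 1.0, which is why the panel must be built from normals processed identically to the case samples.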
@@ -37,7 +29,14 @@ In additional, there are optional workflow-level and task-level parameters that Further explanation of other task-level parameters may be found by invoking the ``--help`` documentation available in the gatk.jar for each tool. -#### Required parameters in the somatic pair workflow +#### Outputs +- Read count PoN in HDF5 format +- Additional metrics + +### cnv_somatic_pair_workflow : +Runs a matched tumor-normal pair to obtain somatic copy number variants. + +#### Requirements/Expectations The reference and bins (if specified) must be the same between PoN and case samples. @@ -62,8 +61,27 @@ To invoke Oncotator on the called tumor copy-ratio segments: Further explanation of these task-level parameters may be found by invoking the ``--help`` documentation available in the gatk.jar for each tool. -##Important -- The data in gs://gatk-test-data/1kgp are from the 1000 Genomes Project (http://www.internationalgenome.org/home) and are provided as is. - If you have questions on the data, please direct them to the 1000 Genomes Project email at info@1000genomes.org. Do NOT post questions about the data to the GATK forum. +#### Outputs +- Modeled segments for tumor and normal +- Modeled segments plot for tumor and normal +- Denoised copy ratios for tumor and normal +- Denoised copy ratios plot for tumor and normal +- Denoised copy ratios lim 4 plot for tumor and normal +- Additional metrics + +### Software version requirements : +- GATK 4.1 or later + +Cromwell version support +- Successfully tested on v37 + +### Important Note : - Runtime parameters are optimized for Broad's Google Cloud Platform implementation. -- For help running workflows on the Google Cloud Platform or locally please view the following tutorial [(How to) Execute Workflows from the gatk-workflows Git Organization](https://software.broadinstitute.org/gatk/documentation/article?id=12521). 
+- For help running workflows on the Google Cloud Platform or locally, please view the following tutorial: [(How to) Execute Workflows from the gatk-workflows Git Organization](https://software.broadinstitute.org/gatk/documentation/article?id=12521). +- The following material is provided by the GATK Team. Please post any questions or concerns to one of our forum sites: [GATK](https://gatkforums.broadinstitute.org/gatk/categories/ask-the-team/), [FireCloud](https://gatkforums.broadinstitute.org/firecloud/categories/ask-the-firecloud-team), [Terra](https://broadinstitute.zendesk.com/hc/en-us/community/topics/360000500432-General-Discussion), or [WDL/Cromwell](https://gatkforums.broadinstitute.org/wdl/categories/ask-the-wdl-team). +- Please visit the [User Guide](https://software.broadinstitute.org/gatk/documentation/) site for further documentation on our workflows and tools. + +### LICENSING : +Copyright Broad Institute, 2019 | BSD-3 +This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note, however, that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script. diff --git a/cnv_common_tasks.wdl b/cnv_common_tasks.wdl index f5ffae4..a46a039 100644 --- a/cnv_common_tasks.wdl +++ b/cnv_common_tasks.wdl @@ -56,6 +56,11 @@ task AnnotateIntervals { File ref_fasta File ref_fasta_fai File ref_fasta_dict + File? mappability_track_bed + File? mappability_track_bed_idx + File? segmental_duplication_track_bed + File? segmental_duplication_track_bed_idx + Int? feature_query_lookahead File?
gatk4_jar_override # Runtime parameters @@ -68,6 +73,10 @@ task AnnotateIntervals { Int machine_mem_mb = select_first([mem_gb, 2]) * 1000 Int command_mem_mb = machine_mem_mb - 500 + + # Determine output filename + String filename = select_first([intervals, "wgs.preprocessed"]) + String base_filename = basename(filename, ".interval_list") command <<< set -e @@ -76,8 +85,11 @@ task AnnotateIntervals { gatk --java-options "-Xmx${command_mem_mb}m" AnnotateIntervals \ -L ${intervals} \ --reference ${ref_fasta} \ + ${"--mappability-track " + mappability_track_bed} \ + ${"--segmental-duplication-track " + segmental_duplication_track_bed} \ + --feature-query-lookahead ${default=1000000 feature_query_lookahead} \ --interval-merging-rule OVERLAPPING_ONLY \ - --output annotated_intervals.tsv + --output ${base_filename}.annotated.tsv >>> runtime { @@ -89,7 +101,77 @@ task AnnotateIntervals { } output { - File annotated_intervals = "annotated_intervals.tsv" + File annotated_intervals = "${base_filename}.annotated.tsv" + } +} + +task FilterIntervals { + File intervals + File? blacklist_intervals + File? annotated_intervals + Array[File]? read_count_files + Float? minimum_gc_content + Float? maximum_gc_content + Float? minimum_mappability + Float? maximum_mappability + Float? minimum_segmental_duplication_content + Float? maximum_segmental_duplication_content + Int? low_count_filter_count_threshold + Float? low_count_filter_percentage_of_samples + Float? extreme_count_filter_minimum_percentile + Float? extreme_count_filter_maximum_percentile + Float? extreme_count_filter_percentage_of_samples + File? gatk4_jar_override + + # Runtime parameters + String gatk_docker + Int? mem_gb + Int? disk_space_gb + Boolean use_ssd = false + Int? cpu + Int? 
preemptible_attempts + + Int machine_mem_mb = select_first([mem_gb, 7]) * 1000 + Int command_mem_mb = machine_mem_mb - 500 + + # Determine output filename + String filename = select_first([intervals, "wgs.preprocessed"]) + String base_filename = basename(filename, ".interval_list") + + command <<< + set -e + export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk4_jar_override} + + gatk --java-options "-Xmx${command_mem_mb}m" FilterIntervals \ + -L ${intervals} \ + ${"-XL " + blacklist_intervals} \ + ${"--annotated-intervals " + annotated_intervals} \ + ${if defined(read_count_files) then "--input " else ""} ${sep=" --input " read_count_files} \ + --minimum-gc-content ${default="0.1" minimum_gc_content} \ + --maximum-gc-content ${default="0.9" maximum_gc_content} \ + --minimum-mappability ${default="0.9" minimum_mappability} \ + --maximum-mappability ${default="1.0" maximum_mappability} \ + --minimum-segmental-duplication-content ${default="0.0" minimum_segmental_duplication_content} \ + --maximum-segmental-duplication-content ${default="0.5" maximum_segmental_duplication_content} \ + --low-count-filter-count-threshold ${default="5" low_count_filter_count_threshold} \ + --low-count-filter-percentage-of-samples ${default="90.0" low_count_filter_percentage_of_samples} \ + --extreme-count-filter-minimum-percentile ${default="1.0" extreme_count_filter_minimum_percentile} \ + --extreme-count-filter-maximum-percentile ${default="99.0" extreme_count_filter_maximum_percentile} \ + --extreme-count-filter-percentage-of-samples ${default="90.0" extreme_count_filter_percentage_of_samples} \ + --interval-merging-rule OVERLAPPING_ONLY \ + --output ${base_filename}.filtered.interval_list + >>> + + runtime { + docker: "${gatk_docker}" + memory: machine_mem_mb + " MB" + disks: "local-disk " + select_first([disk_space_gb, 50]) + if use_ssd then " SSD" else " HDD" + cpu: select_first([cpu, 1]) + preemptible: select_first([preemptible_attempts, 5]) + } + + output { + File 
filtered_intervals = "${base_filename}.filtered.interval_list" } } @@ -200,6 +282,8 @@ task CollectAllelicCounts { task ScatterIntervals { File interval_list Int num_intervals_per_scatter + String? output_dir + File? gatk4_jar_override # Runtime parameters String gatk_docker @@ -210,16 +294,35 @@ task ScatterIntervals { Int? preemptible_attempts Int machine_mem_mb = select_first([mem_gb, 2]) * 1000 + Int command_mem_mb = machine_mem_mb - 500 + + # If optional output_dir not specified, use "out"; + String output_dir_ = select_first([output_dir, "out"]) String base_filename = basename(interval_list, ".interval_list") command <<< set -e - - grep @ ${interval_list} > header.txt - grep -v @ ${interval_list} > all_intervals.txt - split -l ${num_intervals_per_scatter} --numeric-suffixes all_intervals.txt ${base_filename}.scattered. - for i in ${base_filename}.scattered.*; do cat header.txt $i > $i.interval_list; done + mkdir ${output_dir_} + export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk4_jar_override} + + { + >&2 echo "Attempting to run IntervalListTools..." + gatk --java-options "-Xmx${command_mem_mb}m" IntervalListTools \ + --INPUT ${interval_list} \ + --SUBDIVISION_MODE INTERVAL_COUNT \ + --SCATTER_CONTENT ${num_intervals_per_scatter} \ + --OUTPUT ${output_dir_} && + # output files are named output_dir_/temp_0001_of_N/scattered.interval_list, etc. (N = the number of shards); + # we rename them as output_dir_/base_filename.scattered.0001.interval_list, etc. + ls ${output_dir_}/*/scattered.interval_list | \ + cat -n | \ + while read n filename; do mv $filename ${output_dir_}/${base_filename}.scattered.$(printf "%04d" $n).interval_list; done + } || { + # if only a single shard is required, then we can just copy the original interval list + >&2 echo "IntervalListTools failed because only a single shard is required. Copying original interval list..."
+ cp ${interval_list} ${output_dir_}/${base_filename}.scattered.1.interval_list + } >>> runtime { @@ -231,7 +334,7 @@ task ScatterIntervals { } output { - Array[File] scattered_interval_lists = glob("${base_filename}.scattered.*.interval_list") + Array[File] scattered_interval_lists = glob("${output_dir_}/${base_filename}.scattered.*.interval_list") } } @@ -239,6 +342,10 @@ task PostprocessGermlineCNVCalls { String entity_id Array[File] gcnv_calls_tars Array[File] gcnv_model_tars + Array[File] calling_configs + Array[File] denoising_configs + Array[File] gcnvkernel_version + Array[File] sharded_interval_lists File contig_ploidy_calls_tar Array[String]? allosomal_contigs Int ref_copy_number_autosomal_contigs @@ -258,7 +365,8 @@ task PostprocessGermlineCNVCalls { String genotyped_intervals_vcf_filename = "genotyped-intervals-${entity_id}.vcf.gz" String genotyped_segments_vcf_filename = "genotyped-segments-${entity_id}.vcf.gz" - Boolean allosomal_contigs_specified = defined(allosomal_contigs) && length(select_first([allosomal_contigs, []])) > 0 + + Array[String] allosomal_contigs_args = if defined(allosomal_contigs) then prefix("--allosomal-contig ", select_first([allosomal_contigs])) else [] String dollar = "$" #WDL workaround for using array[@], see https://github.com/broadinstitute/cromwell/issues/1819 @@ -266,13 +374,23 @@ task PostprocessGermlineCNVCalls { set -e export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk4_jar_override} + + # untar calls to CALLS_0, CALLS_1, etc. directories and build the command line + # also copy over shard config and interval files gcnv_calls_tar_array=(${sep=" " gcnv_calls_tars}) + calling_configs_array=(${sep=" " calling_configs}) + denoising_configs_array=(${sep=" " denoising_configs}) + gcnvkernel_version_array=(${sep=" " gcnvkernel_version}) + sharded_interval_lists_array=(${sep=" " sharded_interval_lists}) calls_args="" for index in
${dollar}{!gcnv_calls_tar_array[@]}; do gcnv_calls_tar=${dollar}{gcnv_calls_tar_array[$index]} - mkdir CALLS_$index - tar xzf $gcnv_calls_tar -C CALLS_$index + mkdir -p CALLS_$index/SAMPLE_${sample_index} + tar xzf $gcnv_calls_tar -C CALLS_$index/SAMPLE_${sample_index} + cp ${dollar}{calling_configs_array[$index]} CALLS_$index/ + cp ${dollar}{denoising_configs_array[$index]} CALLS_$index/ + cp ${dollar}{gcnvkernel_version_array[$index]} CALLS_$index/ + cp ${dollar}{sharded_interval_lists_array[$index]} CALLS_$index/ calls_args="$calls_args --calls-shard-path CALLS_$index" done @@ -289,17 +408,18 @@ task PostprocessGermlineCNVCalls { mkdir extracted-contig-ploidy-calls tar xzf ${contig_ploidy_calls_tar} -C extracted-contig-ploidy-calls - allosomal_contigs_args="--allosomal-contig ${sep=" --allosomal-contig " allosomal_contigs}" - gatk --java-options "-Xmx${command_mem_mb}m" PostprocessGermlineCNVCalls \ $calls_args \ $model_args \ - ${true="$allosomal_contigs_args" false="" allosomal_contigs_specified} \ + ${sep=" " allosomal_contigs_args} \ --autosomal-ref-copy-number ${ref_copy_number_autosomal_contigs} \ --contig-ploidy-calls extracted-contig-ploidy-calls \ --sample-index ${sample_index} \ --output-genotyped-intervals ${genotyped_intervals_vcf_filename} \ --output-genotyped-segments ${genotyped_segments_vcf_filename} + + rm -r CALLS_* + rm -r MODEL_* >>> runtime { diff --git a/cnv_somatic_pair_workflow.b37.inputs.json b/cnv_somatic_pair_workflow.b37.inputs.json index 920ef7d..2a61ece 100644 --- a/cnv_somatic_pair_workflow.b37.inputs.json +++ b/cnv_somatic_pair_workflow.b37.inputs.json @@ -14,7 +14,7 @@ "CNVSomaticPairWorkflow.intervals": "gs://gatk-test-data/cnv/somatic/ice_targets.tsv.interval_list", "##_COMMENT3": "Docker", - "CNVSomaticPairWorkflow.gatk_docker": "broadinstitute/gatk:4.0.8.0", + "CNVSomaticPairWorkflow.gatk_docker": "broadinstitute/gatk:4.1.0.0", "##CNVSomaticPairWorkflow.oncotator_docker": "(optional) String?", "##_COMMENT4": "Memory 
Optional", diff --git a/cnv_somatic_pair_workflow.wdl b/cnv_somatic_pair_workflow.wdl index 8335631..17ee7ed 100644 --- a/cnv_somatic_pair_workflow.wdl +++ b/cnv_somatic_pair_workflow.wdl @@ -4,8 +4,8 @@ # # - The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the # GATK -L argument (see https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists). -# These intervals will be padded on both sides by the amount specified by PreprocessIntervals.padding (default 250) -# and split into bins of length specified by PreprocessIntervals.bin_length (default 1000; specify 0 to skip binning, +# These intervals will be padded on both sides by the amount specified by padding (default 250) +# and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning, # e.g., for WES). For WGS, the intervals should simply cover the autosomal chromosomes (sex chromosomes may be # included, but care should be taken to 1) avoid creating panels of mixed sex, and 2) denoise case samples only # with panels containing only individuals of the same sex as the case samples). @@ -16,6 +16,10 @@ # This may be useful for excluding centromeric regions, etc. from analysis. Alternatively, these regions may # be manually filtered from the final callset. # +# A reasonable blacklist for excluded intervals (-XL) can be found at: +# hg19: gs://gatk-best-practices/somatic-b37/CNV_and_centromere_blacklist.hg19.list +# hg38: gs://gatk-best-practices/somatic-hg38/CNV_and_centromere_blacklist.hg38liftover.list (untested) +# # - The sites file (common_sites) should be a Picard or GATK-style interval list. This is a list of sites # of known variation at which allelic counts will be collected for use in modeling minor-allele fractions. 
# @@ -25,8 +29,8 @@ # ############# -import "cnv_common_tasks.wdl" as CNVTasks -import "cnv_somatic_oncotator_workflow.wdl" as CNVOncotator +import "https://raw.githubusercontent.com/gatk-workflows/gatk4-somatic-cnvs/1.2/cnv_common_tasks.wdl" as CNVTasks +import "https://raw.githubusercontent.com/gatk-workflows/gatk4-somatic-cnvs/1.2/cnv_somatic_oncotator_workflow.wdl" as CNVOncotator workflow CNVSomaticPairWorkflow { @@ -66,7 +70,7 @@ workflow CNVSomaticPairWorkflow { ############################################## #### optional arguments for CollectCounts #### ############################################## - String? format + String? collect_counts_format Int? mem_gb_for_collect_counts ##################################################### @@ -86,6 +90,7 @@ workflow CNVSomaticPairWorkflow { ############################################## Int? max_num_segments_per_chromosome Int? min_total_allele_count + Int? min_total_allele_count_normal Float? genotyping_homozygous_log_ratio_threshold Float? genotyping_base_error_rate Float? 
kernel_variance_copy_ratio @@ -165,7 +170,7 @@ workflow CNVSomaticPairWorkflow { ref_fasta = ref_fasta, ref_fasta_fai = ref_fasta_fai, ref_fasta_dict = ref_fasta_dict, - format = format, + format = collect_counts_format, gatk4_jar_override = gatk4_jar_override, gatk_docker = gatk_docker, mem_gb = mem_gb_for_collect_counts, @@ -214,6 +219,7 @@ workflow CNVSomaticPairWorkflow { normal_allelic_counts = CollectAllelicCountsNormal.allelic_counts, max_num_segments_per_chromosome = max_num_segments_per_chromosome, min_total_allele_count = min_total_allele_count, + min_total_allele_count_normal = min_total_allele_count_normal, genotyping_homozygous_log_ratio_threshold = genotyping_homozygous_log_ratio_threshold, genotyping_base_error_rate = genotyping_base_error_rate, kernel_variance_copy_ratio = kernel_variance_copy_ratio, @@ -295,7 +301,7 @@ workflow CNVSomaticPairWorkflow { ref_fasta = ref_fasta, ref_fasta_fai = ref_fasta_fai, ref_fasta_dict = ref_fasta_dict, - format = format, + format = collect_counts_format, gatk4_jar_override = gatk4_jar_override, gatk_docker = gatk_docker, mem_gb = mem_gb_for_collect_counts, @@ -341,7 +347,7 @@ workflow CNVSomaticPairWorkflow { denoised_copy_ratios = DenoiseReadCountsNormal.denoised_copy_ratios, allelic_counts = CollectAllelicCountsNormal.allelic_counts, max_num_segments_per_chromosome = max_num_segments_per_chromosome, - min_total_allele_count = min_total_allele_count, + min_total_allele_count = min_total_allele_count_normal, genotyping_homozygous_log_ratio_threshold = genotyping_homozygous_log_ratio_threshold, genotyping_base_error_rate = genotyping_base_error_rate, kernel_variance_copy_ratio = kernel_variance_copy_ratio, @@ -446,6 +452,7 @@ workflow CNVSomaticPairWorkflow { File copy_ratio_parameters_tumor = ModelSegmentsTumor.copy_ratio_parameters File allele_fraction_parameters_tumor = ModelSegmentsTumor.allele_fraction_parameters File called_copy_ratio_segments_tumor = CallCopyRatioSegmentsTumor.called_copy_ratio_segments + 
File called_copy_ratio_legacy_segments_tumor = CallCopyRatioSegmentsTumor.called_copy_ratio_legacy_segments File denoised_copy_ratios_plot_tumor = PlotDenoisedCopyRatiosTumor.denoised_copy_ratios_plot File denoised_copy_ratios_lim_4_plot_tumor = PlotDenoisedCopyRatiosTumor.denoised_copy_ratios_lim_4_plot File standardized_MAD_tumor = PlotDenoisedCopyRatiosTumor.standardized_MAD @@ -472,6 +479,7 @@ workflow CNVSomaticPairWorkflow { File? copy_ratio_parameters_normal = ModelSegmentsNormal.copy_ratio_parameters File? allele_fraction_parameters_normal = ModelSegmentsNormal.allele_fraction_parameters File? called_copy_ratio_segments_normal = CallCopyRatioSegmentsNormal.called_copy_ratio_segments + File? called_copy_ratio_legacy_segments_normal = CallCopyRatioSegmentsNormal.called_copy_ratio_legacy_segments File? denoised_copy_ratios_plot_normal = PlotDenoisedCopyRatiosNormal.denoised_copy_ratios_plot File? denoised_copy_ratios_lim_4_plot_normal = PlotDenoisedCopyRatiosNormal.denoised_copy_ratios_lim_4_plot File? standardized_MAD_normal = PlotDenoisedCopyRatiosNormal.standardized_MAD @@ -536,6 +544,7 @@ task ModelSegments { File? normal_allelic_counts Int? max_num_segments_per_chromosome Int? min_total_allele_count + Int? min_total_allele_count_normal Float? genotyping_homozygous_log_ratio_threshold Float? genotyping_base_error_rate Float? 
kernel_variance_copy_ratio @@ -571,6 +580,11 @@ task ModelSegments { # If optional output_dir not specified, use "out" String output_dir_ = select_first([output_dir, "out"]) + # default values are min_total_allele_count_ = 0 in matched-normal mode + # = 30 in case-only mode + Int default_min_total_allele_count = if defined(normal_allelic_counts) then 0 else 30 + Int min_total_allele_count_ = select_first([min_total_allele_count, default_min_total_allele_count]) + command <<< set -e mkdir ${output_dir_} @@ -580,7 +594,8 @@ task ModelSegments { --denoised-copy-ratios ${denoised_copy_ratios} \ --allelic-counts ${allelic_counts} \ ${"--normal-allelic-counts " + normal_allelic_counts} \ - --minimum-total-allele-count ${default="30" min_total_allele_count} \ + --minimum-total-allele-count-case ${min_total_allele_count_} \ + --minimum-total-allele-count-normal ${default="30" min_total_allele_count_normal} \ --genotyping-homozygous-log-ratio-threshold ${default="-10.0" genotyping_homozygous_log_ratio_threshold} \ --genotyping-base-error-rate ${default="0.05" genotyping_base_error_rate} \ --maximum-number-of-segments-per-chromosome ${default="1000" max_num_segments_per_chromosome} \ @@ -591,14 +606,14 @@ task ModelSegments { --window-size ${sep=" --window-size " window_sizes} \ --number-of-changepoints-penalty-factor ${default="1.0" num_changepoints_penalty_factor} \ --minor-allele-fraction-prior-alpha ${default="25.0" minor_allele_fraction_prior_alpha} \ - --number-of-samples-copy-ratio ${default=100 num_samples_copy_ratio} \ - --number-of-burn-in-samples-copy-ratio ${default=50 num_burn_in_copy_ratio} \ - --number-of-samples-allele-fraction ${default=100 num_samples_allele_fraction} \ - --number-of-burn-in-samples-allele-fraction ${default=50 num_burn_in_allele_fraction} \ + --number-of-samples-copy-ratio ${default="100" num_samples_copy_ratio} \ + --number-of-burn-in-samples-copy-ratio ${default="50" num_burn_in_copy_ratio} \ + --number-of-samples-allele-fraction 
${default="100" num_samples_allele_fraction} \ + --number-of-burn-in-samples-allele-fraction ${default="50" num_burn_in_allele_fraction} \ --smoothing-credible-interval-threshold-copy-ratio ${default="2.0" smoothing_threshold_copy_ratio} \ --smoothing-credible-interval-threshold-allele-fraction ${default="2.0" smoothing_threshold_allele_fraction} \ - --maximum-number-of-smoothing-iterations ${default=10 max_num_smoothing_iterations} \ - --number-of-smoothing-iterations-per-fit ${default=0 num_smoothing_iterations_per_fit} \ + --maximum-number-of-smoothing-iterations ${default="10" max_num_smoothing_iterations} \ + --number-of-smoothing-iterations-per-fit ${default="0" num_smoothing_iterations_per_fit} \ --output ${output_dir_} \ --output-prefix ${entity_id} @@ -673,6 +688,7 @@ task CallCopyRatioSegments { output { File called_copy_ratio_segments = "${entity_id}.called.seg" + File called_copy_ratio_legacy_segments = "${entity_id}.called.igv.seg" } } diff --git a/cnv_somatic_panel_workflow.b37.inputs.json b/cnv_somatic_panel_workflow.b37.inputs.json index 5e06649..41c3ced 100644 --- a/cnv_somatic_panel_workflow.b37.inputs.json +++ b/cnv_somatic_panel_workflow.b37.inputs.json @@ -11,7 +11,7 @@ "CNVSomaticPanelWorkflow.intervals": "gs://gatk-test-data/cnv/somatic/ice_targets.tsv.interval_list", "##_COMMENT3": "Docker", - "CNVSomaticPanelWorkflow.gatk_docker": "broadinstitute/gatk:4.0.8.0", + "CNVSomaticPanelWorkflow.gatk_docker": "broadinstitute/gatk:4.1.0.0", "##_COMMENT4": "Disk Size Optional", "##CNVSomaticPanelWorkflow.AnnotateIntervals.disk_space_gb": "(optional) Int?", diff --git a/cnv_somatic_panel_workflow.wdl b/cnv_somatic_panel_workflow.wdl index 4a8c350..9e7dd43 100644 --- a/cnv_somatic_panel_workflow.wdl +++ b/cnv_somatic_panel_workflow.wdl @@ -4,8 +4,8 @@ # # - The intervals argument is required for both WGS and WES workflows and accepts formats compatible with the # GATK -L argument (see 
https://gatkforums.broadinstitute.org/gatk/discussion/11009/intervals-and-interval-lists). -# These intervals will be padded on both sides by the amount specified by PreprocessIntervals.padding (default 250) -# and split into bins of length specified by PreprocessIntervals.bin_length (default 1000; specify 0 to skip binning, +# These intervals will be padded on both sides by the amount specified by padding (default 250) +# and split into bins of length specified by bin_length (default 1000; specify 0 to skip binning, # e.g., for WES). For WGS, the intervals should simply cover the autosomal chromosomes (sex chromosomes may be # included, but care should be taken to 1) avoid creating panels of mixed sex, and 2) denoise case samples only # with panels containing only individuals of the same sex as the case samples). @@ -16,13 +16,17 @@ # This may be useful for excluding centromeric regions, etc. from analysis. Alternatively, these regions may # be manually filtered from the final callset. # +# A reasonable blacklist for excluded intervals (-XL) can be found at: +# hg19: gs://gatk-best-practices/somatic-b37/CNV_and_centromere_blacklist.hg19.list +# hg38: gs://gatk-best-practices/somatic-hg38/CNV_and_centromere_blacklist.hg38liftover.list (untested) +# # - Example invocation: # # java -jar cromwell.jar run cnv_somatic_panel_workflow.wdl -i my_parameters.json # ############# -import "cnv_common_tasks.wdl" as CNVTasks +import "https://raw.githubusercontent.com/gatk-workflows/gatk4-somatic-cnvs/1.2/cnv_common_tasks.wdl" as CNVTasks workflow CNVSomaticPanelWorkflow { @@ -58,12 +62,17 @@ workflow CNVSomaticPanelWorkflow { ################################################## #### optional arguments for AnnotateIntervals #### ################################################## + File? mappability_track_bed + File? mappability_track_bed_idx + File? segmental_duplication_track_bed + File? segmental_duplication_track_bed_idx + Int? feature_query_lookahead Int? 
mem_gb_for_annotate_intervals ############################################## #### optional arguments for CollectCounts #### ############################################## - String? format + String? collect_counts_format Int? mem_gb_for_collect_counts ############################################################## @@ -103,6 +112,11 @@ workflow CNVSomaticPanelWorkflow { ref_fasta = ref_fasta, ref_fasta_fai = ref_fasta_fai, ref_fasta_dict = ref_fasta_dict, + mappability_track_bed = mappability_track_bed, + mappability_track_bed_idx = mappability_track_bed_idx, + segmental_duplication_track_bed = segmental_duplication_track_bed, + segmental_duplication_track_bed_idx = segmental_duplication_track_bed_idx, + feature_query_lookahead = feature_query_lookahead, gatk4_jar_override = gatk4_jar_override, gatk_docker = gatk_docker, mem_gb = mem_gb_for_annotate_intervals, @@ -119,7 +133,7 @@ workflow CNVSomaticPanelWorkflow { ref_fasta = ref_fasta, ref_fasta_fai = ref_fasta_fai, ref_fasta_dict = ref_fasta_dict, - format = format, + format = collect_counts_format, gatk4_jar_override = gatk4_jar_override, gatk_docker = gatk_docker, mem_gb = mem_gb_for_collect_counts,
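For orientation, the FilterIntervals defaults added above (e.g. ``--low-count-filter-count-threshold 5`` with ``--low-count-filter-percentage-of-samples 90.0``) drop intervals that are poorly covered across most panel samples before the PoN is built. A rough pure-Python sketch of that one filter (a simplified reading of the tool's behavior, not its exact semantics; the function name is hypothetical):

```python
# Simplified sketch of a low-count interval filter: an interval is discarded
# when at least `percentage_of_samples` percent of panel samples have read
# counts below `count_threshold` there. Defaults mirror the WDL task above.

def low_count_filter(counts_by_sample, count_threshold=5, percentage_of_samples=90.0):
    """counts_by_sample: per-sample count lists over identical intervals.
    Returns the indices of intervals to keep."""
    num_samples = len(counts_by_sample)
    keep = []
    # zip(*...) iterates interval-by-interval across all samples
    for i, column in enumerate(zip(*counts_by_sample)):
        num_low = sum(1 for c in column if c < count_threshold)
        if 100.0 * num_low / num_samples < percentage_of_samples:
            keep.append(i)
    return keep
```

The extreme-count percentile filters work analogously, discarding intervals whose counts fall outside the ``--extreme-count-filter-*-percentile`` range in too many samples; the real tool applies all enabled filters plus the GC/mappability/segmental-duplication annotations.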