{ "class": "Workflow", "cwlVersion": "sbg:draft-2", "id": "admin/sbg-public-data/whole-genome-analysis-bwa-gatk-2-3-9-lite/58", "label": "Whole Genome Analysis - BWA + GATK 2.3.9-Lite (with Metrics)", "description": "The WGS pipeline is used to study the complete DNA sequence of an organism (its genome). Although WGS generally has lower coverage than WES, it can detect variants outside protein-coding regions, including changes affecting regulatory regions and other control mechanisms. This allows wider application of the pipeline, especially where novel variants are expected. For example, WGS can be used when the phenotype or family history strongly implicates a genetic etiology but the phenotype does not correspond to any specific disorder for which gene-targeted testing is clinically available, or when a genetic disorder shows a high degree of genetic heterogeneity (H. L. Rehm, S. J. Bale et al., ACMG clinical laboratory standards for next-generation sequencing, Genet Med. 2013 Sep;15(9):733–747, doi:10.1038/gim.2013.92). \nThe pipeline follows the Broad Institute's best practices and uses the Broad Institute's GATK tools. A separate step assesses the quality of the sequenced reads using the Babraham Institute's FastQC tool. \nSequenced reads are aligned with BWA, after which duplicates are removed. The next steps use algorithms developed by the Broad Institute to improve the alignment around indels and then to recalibrate the base quality scores. The generated alignment files are pooled together and variant calling is performed. 
Detected variants are subjected to additional analysis, resulting in a refined, high-quality set of variants (for more information on how variant calling is performed, please refer to the [Broad Institute's web site](https://www.broadinstitute.org/gatk/guide/topic?name=methods)).\n\nTo make optimal use of the computational instance’s resources, the analysis is divided into a number of jobs equal to the number of “chromosomal” regions in the input BED file, plus one job for the much smaller mitochondrial and global contigs. The Target BED file is split into several smaller BED files by the SBG Pass Intervals tool. GATK RealignerTargetCreator scatters (parallelizes execution) over these BED files and, for each execution, outputs an intervals file used by GATK IndelRealigner, which in turn scatters over these intervals files and outputs one BAM file per interval. GATK BaseRecalibrator collects all the BAM files and uses only those covered by the BQSR intervals string input to build the model for base quality score recalibration (BQSR). If the BQSR intervals string is not set, GATK BaseRecalibrator can run for more than 20 hours on a whole-genome sample. For that reason this input is set to \"required\" with the **default value of 20**, meaning that only chromosome 20 is used to build the BQSR model. GATK PrintReads applies the recalibration table received from GATK BaseRecalibrator to the BAMs received from GATK IndelRealigner. It also runs in scatter mode over the “reads” input (one job per BAM file). GATK UnifiedGenotyper scatters over the BAM files received from GATK PrintReads, performs variant calling on each of them and outputs a raw variant call file (VCF). The final steps of the workflow are variant recalibration and annotation.\nThe Whole Genome Sequencing workflow can process several pairs of FASTQ files, provided that they all come from the same sample (sequenced on different lanes). 
It is not designed to process FASTQ files from different samples together; instead, each sample should be processed individually by using \"batch by the sample\" on the FASTQ files input and setting the FASTQ metadata correctly. The SBG Pair Fastqs by metadata tool groups the FASTQ files by lane and passes each group to BWA-MEM as a separate job. The resulting alignments are later merged in GATK IndelRealigner.\n\nFor the workflow to complete, the following fields must be set in the FASTQ metadata: **Paired-end, Sample ID, Platform and Library**.", "inputs": [ { "type": [ "File" ], "label": "SnpEff database", "sbg:fileTypes": "ZIP", "id": "#database_1", "sbg:suggestedValue": { "class": "File", "name": "GRCh37.75.zip", "path": "5772b6be507c1752674486c6" }, "sbg:x": 3039.9998648299047, "sbg:y": 356.6619691979336 }, { "type": [ { "type": "array", "items": "File" } ], "label": "FASTQ", "sbg:fileTypes": "FASTQ, FASTQ.GZ, FQ, FQ.GZ", "id": "#fastq", "batchType": [ "metadata.sample_id" ], "sbg:includeInPorts": true, "sbg:x": -43.333329757054656, "sbg:y": 254.66667691866573 }, { "type": [ "File" ], "label": "Reference or TAR with BWA reference indices", "sbg:fileTypes": "FASTA, FA, FA.GZ, FASTA.GZ, TAR", "id": "#reference", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "human_g1k_v37_decoy.fasta.tar", "path": "5772b6d9507c1752674486e7" }, "sbg:x": -83.99998560216626, "sbg:y": 21.33333672417546 }, { "type": [ "File" ], "label": "Target BED", "sbg:fileTypes": "BED", "id": "#bed_file_1", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "human_g1k_v37_decoy.breakpoints.bed", "path": "5772b6d8507c1752674486e5" }, "sbg:x": 133.33333332008812, "sbg:y": 1183.3286236522697 }, { "type": [ { "type": "array", "items": "File" } ], "label": "dbSNP", "sbg:fileTypes": "VCF, BED, TXT", "id": "#dbsnp", "sbg:includeInPorts": true, "sbg:suggestedValue": [ { "class": 
"File", "name": "dbsnp_137.b37.vcf", "path": "5772b6cd507c1752674486d8" } ], "sbg:x": 405.00015678671264, "sbg:y": 796.6620611745267 }, { "type": [ "File" ], "label": "1000g phase1 snps", "sbg:fileTypes": "VCF", "id": "#1000g_p1_snps", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "1000G_phase1.snps.high_confidence.b37.vcf", "path": "578cf947507c17681a3117d0" }, "sbg:x": 2051.6670735677094, "sbg:y": -146.6666793823243 }, { "type": [ "File" ], "label": "1000g Omni", "sbg:fileTypes": "VCF", "id": "#1000g_omni", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "1000G_omni2.5.b37.vcf", "path": "578cf946507c17681a3117cb" }, "sbg:x": 2051.6673787434906, "sbg:y": -326.6667683919273 }, { "type": [ "File" ], "label": "HapMap", "sbg:fileTypes": "VCF", "id": "#hapmap", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "hapmap_3.3.b37.vcf", "path": "5772b6d3507c1752674486df" }, "sbg:x": 2180.000508626303, "sbg:y": -231.666717529297 }, { "type": [ "string" ], "label": "BQSR intervals optimal value is 20 or chr20", "id": "#bqsr_intervals", "sbg:includeInPorts": true, "sbg:x": 413.3332543108168, "sbg:y": 994.9941705862676 }, { "type": [ "File" ], "label": "Mills", "description": "Mills", "sbg:fileTypes": "VCF, BED, TXT", "id": "#mills", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "Mills_and_1000G_gold_standard.indels.b37.sites.vcf", "path": "5772b6c9507c1752674486d4" }, "sbg:x": 625.0012461344405, "sbg:y": 508.33353678385447 }, { "type": [ "File" ], "label": "1000g p1 indels", "description": "1000g indels", "sbg:fileTypes": "VCF", "id": "#1000g_indels", "sbg:includeInPorts": true, "sbg:suggestedValue": { "class": "File", "name": "1000G_phase1.indels.b37.vcf", "path": "578cf947507c17681a3117ce" }, "sbg:x": 604.0883758797295, "sbg:y": 640.9030805932947 } ], "outputs": [ { "id": "#b64html", "label": "FastQC report", "source": [ "#SBG_Html2b64.b64html" ], "type": [ 
"null", "File" ], "sbg:fileTypes": "HTML, B64HTML", "required": false, "sbg:includeInPorts": true, "sbg:x": 518.3334852059734, "sbg:y": 444.0000584655357 }, { "id": "#summary_metrics", "label": "Picard Alignment Metrics", "source": [ "#Picard_CollectAlignmentSummaryMetrics.summary_metrics" ], "type": [ "File" ], "sbg:fileTypes": "TXT", "required": true, "sbg:includeInPorts": true, "sbg:x": 2745.0003536145005, "sbg:y": -523.3333742088753 }, { "id": "#plot_pdf", "label": "BaseRecalibrator Plot", "source": [ "#GATK_BaseRecalibrator.plot_pdf" ], "type": [ "null", "File" ], "sbg:fileTypes": "PDF", "required": false, "sbg:includeInPorts": true, "sbg:x": 1800.0004876984663, "sbg:y": 615.0001635419057 }, { "id": "#summary_text", "label": "SnpEff Summary text", "source": [ "#SnpEff.summary_text" ], "type": [ "null", "File" ], "sbg:fileTypes": "TXT", "required": false, "sbg:includeInPorts": true, "sbg:x": 3690.0001456340174, "sbg:y": 59.99999346997966 }, { "id": "#annotated", "label": "Annotated VCF", "source": [ "#SnpEff.annotated" ], "type": [ "null", "File" ], "sbg:fileTypes": "VCF, TXT, GATK, BED, BEDANN", "required": false, "sbg:includeInPorts": true, "sbg:x": 3673.333559420379, "sbg:y": 371.66670515802275 }, { "id": "#combined_vcf", "label": "Raw VCF", "source": [ "#GATK_CombineVariants.combined_vcf" ], "type": [ "File" ], "sbg:fileTypes": "VCF", "required": true, "sbg:includeInPorts": true, "sbg:x": 2531.6667673985175, "sbg:y": 648.3333466317924 }, { "id": "#summary", "label": "SnpEff summary HTML", "source": [ "#SnpEff.summary" ], "type": [ "null", "File" ], "sbg:fileTypes": "HTML, CSV", "required": false, "sbg:includeInPorts": true, "sbg:x": 3710.000015523695, "sbg:y": 219.99421000457625 }, { "id": "#summary_1", "label": "Genome Coverage", "source": [ "#SBG_Genome_Coverage.summary" ], "type": [ "null", "File" ], "required": false, "sbg:includeInPorts": true, "sbg:x": 3229.9999191098727, "sbg:y": -750.0057737959752 }, { "id": "#per_interval", "label": "Coverage Per 
Interval", "source": [ "#SBG_Genome_Coverage.per_interval" ], "type": [ "null", "File" ], "required": false, "sbg:includeInPorts": true, "sbg:x": 3226.6665857765397, "sbg:y": -583.3391089969207 } ], "steps": [ { "id": "#GATK_CombineVariants", "inputs": [ { "id": "#GATK_CombineVariants.variants", "source": [ "#GATK_UnifiedGenotyper.raw_vcf" ] }, { "id": "#GATK_CombineVariants.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] } ], "outputs": [ { "id": "#GATK_CombineVariants.combined_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-combinevariants/13", "label": "GATK CombineVariants", "description": "Overview\n\nCombineVariants reads in variant records from separate ROD (Reference-Ordered Data) sources and combines them into a single VCF. Any (unique) name can be used to bind your ROD and any number of sources can be input. This tool aims to fulfill two main possible use cases, reflected by the two combination options (MERGE and UNION), for merging records at the variant level (the first 8 fields of the VCF) or at the genotype level (the rest).\n\nMERGE: combines multiple variant records present at the same site in the different input sources into a single variant record in the output. If sample names overlap, then they are \"uniquified\" by default, which means a suffix is appended to make them unique. Note that in version 3.3, the automatic uniquifying was disabled (unintentionally), and required setting `-genotypeMergeOptions UNIQUIFY` manually.\nUNION: assumes that each ROD source represents the same set of samples (although this is not enforced). 
It uses the priority list (if provided) to emit a single record instance at every position represented in the input RODs.\nCombineVariants will emit a record for every site that was present in any of your input VCF files, and will annotate (in the set attribute in the INFO field) whether the record had a PASS or FILTER status in each input ROD. In effect, CombineVariants always produces a union of the input VCFs. However, any part of the Venn diagram of the merged VCFs can be extracted using JEXL expressions on the set attribute using SelectVariants. If you want to extract just the records in common between two VCFs, you would first run CombineVariants on the two files to generate a single VCF and then run SelectVariants to extract the common records with `-select 'set == \"Intersection\"'`, as worked out in the detailed example in the documentation guide.\n\nInput\nTwo or more variant sets to combine.\n\nOutput\nA combined VCF.\n\nUsage examples\n\nMerge two separate callsets\n java -jar GenomeAnalysisTK.jar \\\n -T CombineVariants \\\n -R reference.fasta \\\n --variant input1.vcf \\\n --variant input2.vcf \\\n -o output.vcf \\\n -genotypeMergeOptions UNIQUIFY\n \nGet the union of calls made on the same samples\n java -jar GenomeAnalysisTK.jar \\\n -T CombineVariants \\\n -R reference.fasta \\\n --variant:foo input1.vcf \\\n --variant:bar input2.vcf \\\n -o output.vcf \\\n -genotypeMergeOptions PRIORITIZE \\\n -priority foo,bar\n \nCaveats\n\nThis tool is not intended to manipulate GVCFs! To combine GVCF files output by HaplotypeCaller, use CombineGVCFs.\nTo join intermediate VCFs produced by running jobs in parallel by interval (e.g. by chromosome), use CatVariants.\n\nAdditional notes\n\nUsing this tool's multi-threaded parallelism capability is particularly useful when converting from VCF to BCF2, which can be time-consuming. 
In this case each thread spends CPU time doing the conversion, and the GATK engine is smart enough to merge the partial BCF2 blocks together efficiently. However, since this merge runs in only one thread, you can quickly reach diminishing returns with the number of parallel threads. In our hands, `-nt 4` works well but `-nt 8` tends to be too much.\nSince GATK 2.1, when merging multiple VCF records at a site, the combined VCF record has the QUAL of the first VCF record with a non-MISSING QUAL value. The previous behavior was to take the max QUAL, which could result in strange downstream confusion.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. 
To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "CombineVariants" ], "inputs": [ { "required": true, "sbg:altPrefix": "-V", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--variant", "separate": true, "sbg:cmdInclude": true }, "label": "Variants", "description": "Input VCF file.", "sbg:fileTypes": "VCF", "id": "#variants" }, { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy 
downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-suppressCommandLineHeader", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--suppressCommandLineHeader", "separate": true, "sbg:cmdInclude": true }, "label": "Suppress Command Line Header", "description": "If true, do not output the header containing the command line used.", "id": "#suppress_command_line_header" }, { "sbg:altPrefix": "-setKey", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "set", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--setKey", "separate": true, "sbg:cmdInclude": true }, "label": "Set Key", "description": "Key used in the INFO key=value tag emitted describing which set the combined VCF record came from.", "id": "#set_key" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", 
"sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching : or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", 
"ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-printComplexMerges", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--printComplexMerges", "separate": true, "sbg:cmdInclude": true }, "label": "Print Complex Merges", "description": "Print out interesting sites requiring complex compatibility merging.", "id": "#print_complex_merges" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default, can be NO_ET so nothing is posted to the run repository. 
Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non-deterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-minN", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--minimumN", "separate": true, "sbg:cmdInclude": true }, "label": "Minimum N", "description": "Combine variants and output a site only if the variant is present in at least N input files.", "id": "#minimum_n" }, { "sbg:altPrefix": "-minimalVCF", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { 
"position": 0, "prefix": "--minimalVCF", "separate": true, "sbg:cmdInclude": true }, "label": "Minimal Vcf", "description": "If true, then the output VCF will contain no INFO or genotype FORMAT fields.", "id": "#minimal_vcf" }, { "sbg:altPrefix": "-mergeInfoWithMaxAC", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--mergeInfoWithMaxAC", "separate": true, "sbg:cmdInclude": true }, "label": "Merge Info With Max Ac", "description": "If true, when VCF records overlap the info field is taken from the one with the max AC instead of only taking the fields which are identical across the overlapping records.", "id": "#merge_info_with_max_ac" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM memory to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This parameter value is added to the Memory per job parameter value. This results in the allocation of the sum total (Memory per job and Memory overhead per job) amount of memory per job. 
By default the memory per job parameter value is set to 2048 megabytes, unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, GATK will stop execution cleanly as soon as maxRuntime has been exceeded, truncating the run but not exiting with a failure. By default the value is interpreted in minutes, but this can be changed by maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. 
Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "VCF, BED, TXT", "id": "#intervals_file" }, { "sbg:altPrefix": "--intervals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many basepairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-genotypeMergeOptions", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "UNIQUIFY", 
"UNSORTED", "REQUIRE_UNIQUE" ], "name": "genotypemergeoption" } ], "inputBinding": { "position": 0, "prefix": "--genotypemergeoption", "separate": true, "sbg:cmdInclude": true }, "label": "Genotypemergeoption", "description": "Determines how we should merge genotype records for samples shared across the ROD files.", "id": "#genotypemergeoption" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "sbg:altPrefix": "-filteredRecordsMergeType", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "KEEP_IF_ANY_UNFILTERED", "type": [ "null", { "type": "enum", "symbols": [ "KEEP_IF_ANY_UNFILTERED", "KEEP_IF_ALL_UNFILTERED", "KEEP_UNCONDITIONAL" ], "name": "filteredrecordsmergetype" } ], "inputBinding": { "position": 0, "prefix": "--filteredrecordsmergetype", "separate": true, "sbg:cmdInclude": true }, "label": "Filteredrecordsmergetype", "description": "Determines how we should handle records seen at the same site in the VCF, but with different FILTER fields.", "id": "#filteredrecordsmergetype" }, { "sbg:altPrefix": "-filteredAreUncalled", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], 
"inputBinding": { "position": 0, "prefix": "--filteredAreUncalled", "separate": true, "sbg:cmdInclude": true }, "label": "Filtered Are Uncalled", "description": "If true, then filtered VCFs are treated as uncalled, so that filtered set annotations don't appear in the combined VCF.", "id": "#filtered_are_uncalled" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. Can be an .intervals file or a rod file.", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. 
Reads will be selected randomly to be removed from the pile based on the method described here.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
Turns off printing of the base insertion and base deletion tags when using the -BQSR argument, so that only the base substitution qualities are produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred Scaled). Default value is 40. 
30 is perhaps better for whole genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "-assumeIdenticalSamples", "sbg:category": "Combine Variants", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--assumeIdenticalSamples", "separate": true, "sbg:cmdInclude": true }, "label": "Assume Identical Samples", "description": "If true, assume input VCFs have identical sample sets and disjoint calls.", "id": "#assume_identical_samples" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "File" ], "label": "Output Combined VCF", "description": "File to which variants should be written.", "sbg:fileTypes": "VCF", "outputBinding": { "glob": "*.vcf", "sbg:inheritMetadataFrom": "#variants", "secondaryFiles": [ ".idx" ] }, "id": "#combined_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": 
"Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job\n }\n return 1 \n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n var input_file = [].concat($job.inputs.variants)[0]\n var meta = input_file.metadata\n if(meta){\n \tvar sample = meta.sample_id ? meta.sample_id : 'Unknown'\n \tvar library = meta.library_id ? meta.library_id : 'Unknown'\n \tvar platform_unit = meta.platform_unit_id ? 
meta.platform_unit_id : 'Unknown'\n if(sample !== 'Unknown' || library !== 'Unknown' || platform_unit !== 'Unknown'){\n \treturn ['Sample_' + sample, 'Library_' + library, 'Platform_Unit_' + platform_unit, 'combined', 'vcf'].join('.')\n \t}\n }\n var variant_name = input_file.path.replace(/^.*[\\\\\\/]/, '').split('.')\n var variant_namebase = variant_name.slice(0, -1).join('.')\n return variant_namebase + '.combined.vcf'\n}" } } ], "sbg:job": { "inputs": { "variants": [ { "metadata": { "sample_id": "XY" }, "path": "variant.vcf", "secondaryFiles": [] } ], "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "tag": null, "suppress_command_line_header": null, "set_key": null, "remove_program_records": null, "reference": null, "read_group_black_list": [], "read_filter": [], "print_complex_merges": null, "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "non_deterministic_random_seed": null, "minimum_n": null, "minimal_vcf": null, "merge_info_with_max_ac": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "keep_program_records": null, "intervals_file": null, "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "genotypemergeoption": null, "gatk_key": null, "fix_misencoded_quals": null, "filteredrecordsmergetype": null, "filtered_are_uncalled": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "cpu_per_job": null, "baq_gap_open_penalty": null, "baq": null, "assume_identical_samples": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "java -Xmx2048M 
-jar /opt/GenomeAnalysisTKLite.jar --analysis_type CombineVariants --variant variant.vcf --out Sample_XY.Library_Unknown.Platform_Unit_Unknown.combined.vcf", "sbg:contributors": [ "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911447, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-combinevariants/13", "sbg:image_url": null, "sbg:latestRevision": 9, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source Code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php", "label": "Documentation" } ], "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1458841427, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 9, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911447, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911448, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911449, "sbg:revision": 2 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911450, "sbg:revision": 3 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911451, "sbg:revision": 4 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911452, "sbg:revision": 5 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911453, "sbg:revision": 6 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911455, "sbg:revision": 7 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911456, "sbg:revision": 8 }, { "sbg:modifiedBy": "bix-demo", 
"sbg:modifiedOn": 1458841427, "sbg:revision": 9 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 2171.078624463554, "y": 435.23531175594775 }, "label": "GATK CombineVariants", "sbg:x": 2171.078624463554, "sbg:y": 435.23531175594775 }, { "id": "#GATK_VariantRecalibrator", "inputs": [ { "id": "#GATK_VariantRecalibrator.variants", "source": [ "#GATK_CombineVariants.combined_vcf" ] }, { "id": "#GATK_VariantRecalibrator.use_annotation", "default": [ "QD", "MQRankSum", "FS", "DP", "ReadPosRankSum", "HaplotypeScore" ] }, { "id": "#GATK_VariantRecalibrator.threads_per_job", "default": 32 }, { "id": "#GATK_VariantRecalibrator.resources_files", "source": [ "#SBG_Prepare_VQSR_Omni.output_vcf", "#SBG_Prepare_VQSR_dbSNP.output_vcf", "#SBG_Prepare_VQSR_1000G.output_vcf", "#SBG_Prepare_VQSR_HapMap.output_vcf" ] }, { "id": "#GATK_VariantRecalibrator.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_VariantRecalibrator.mode", "default": "SNP" }, { "id": "#GATK_VariantRecalibrator.memory_per_job", "default": 20000 }, { "id": "#GATK_VariantRecalibrator.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_VariantRecalibrator.tranches_plot" }, { "id": "#GATK_VariantRecalibrator.tranches_file" }, { "id": "#GATK_VariantRecalibrator.rscript_file" }, { "id": "#GATK_VariantRecalibrator.recal_file" }, { "id": "#GATK_VariantRecalibrator.R_plots" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-variantrecalibrator/6", "label": "SNP GATK VariantRecalibrator", "description": "Overview\n\nThis tool performs the first pass in a two-stage process called VQSR; the second pass is performed by the ApplyRecalibration tool. 
In brief, the first pass consists of creating a Gaussian mixture model by looking at the distribution of annotation values over a high-quality subset of the input call set, and then scoring all input variants according to the model. The second pass consists of filtering variants based on score cutoffs identified in the first pass.\n\nThe purpose of the variant recalibrator is to assign a well-calibrated probability to each variant call in a call set. You can then create highly accurate call sets by filtering based on this single estimate for the accuracy of each call. The approach taken by variant quality score recalibration is to develop a continuous, covarying estimate of the relationship between SNP call annotations (such as QD, MQ, and ReadPosRankSum) and the probability that a SNP is a true genetic variant versus a sequencing or data processing artifact. This model is determined adaptively based on \"true sites\" provided as input, typically HapMap 3 sites and those sites found to be polymorphic on the Omni 2.5M SNP chip array (in humans). This adaptive error model can then be applied to both known and novel variation discovered in the call set of interest to evaluate the probability that each call is real. The score that gets added to the INFO field of each variant is called the VQSLOD. It is the log odds of being a true variant versus being false under the trained Gaussian mixture model.\n\nVQSR is probably the hardest part of the Best Practices to get right, so be sure to read the method documentation, parameter recommendations and tutorial to really understand what these tools do and how to use them for best results on your own data.\n\nInputs\nThe input raw variants to be recalibrated.\nKnown, truth, and training sets to be used by the algorithm. 
See the method documentation for more details.\n\nOutputs\nA recalibration table file that will be used by the ApplyRecalibration tool.\nA tranches file which shows various metrics of the recalibration callset for slices of the data.\n\nUsage example\n\nRecalibrating SNPs in exome data\n java -Xmx4g -jar GenomeAnalysisTK.jar \\\n -T VariantRecalibrator \\\n -R reference.fasta \\\n -input raw_variants.vcf \\\n -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \\\n -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \\\n -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.vcf \\\n -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.b37.vcf \\\n -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \\\n -mode SNP \\\n -recalFile output.recal \\\n -tranchesFile output.tranches \\\n -rscriptFile output.plots.R\n \nCaveats\n\nThe values used in the example above are only meant to show how the command lines are composed. They are not meant to be taken as specific recommendations of values to use in your own work, and they may be different from the values cited elsewhere in our documentation. For the latest and greatest recommendations on how to set parameter values for your own analyses, please read the Best Practices section of the documentation, especially the FAQ document on VQSR parameters.\nWhole genomes and exomes take slightly different parameters, so make sure you adapt your commands accordingly! See the documents linked above for details.\nIf you work with small datasets (e.g. targeted capture experiments or a small number of exomes), you will run into problems. Read the docs linked above for advice on how to deal with those issues.\nIn order to create the model reporting plots, Rscript needs to be in your environment PATH (this is the scripting version of R, not the interactive version). 
See http://www.r-project.org for more info on how to download and install R.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M');\n }\n else{\n \treturn '-Xmx2048M';\n }\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "VariantRecalibrator", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nt '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nt '.concat(8)\n }\n}" } ], "inputs": [ { "required": true, "sbg:altPrefix": "-input", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input", "separate": true, "sbg:cmdInclude": true }, "label": "Variants", "description": "The raw input variants to be recalibrated.", "sbg:fileTypes": "VCF", "id": "#variants" }, { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": 
"#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-an", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "[]", "type": [ { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--use_annotation", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.use_annotation.length == 0){\n \treturn ['QD', 'MQRankSum', 'FS', 'DP', 'ReadPosRankSum', 'HaplotypeScore']\n }\n else\n return $job.inputs.use_annotation\n\n\n}" }, "sbg:cmdInclude": true }, "label": "Use Annotation", "description": "The names of the annotations which should be used for calculations (from input VCF INFO fields).", "id": "#use_annotation" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": 
"Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": "-ts_filter_level", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "99.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--ts_filter_level", "separate": true, "sbg:cmdInclude": true }, "label": "Ts Filter Level", "description": "The truth sensitivity level at which to start filtering, used here to indicate filtered variants in the model reporting plots.", "id": "#ts_filter_level" }, { "sbg:altPrefix": "-allPoly", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--trustAllPolymorphic", "separate": true, "sbg:cmdInclude": true }, "label": "Trust All Polymorphic", "description": "Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation.", "id": "#trust_all_polymorphic" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "8", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-titv", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "2.15", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--target_titv", "separate": true, "sbg:cmdInclude": true }, "label": "Target Titv", "description": "The expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on the optimization curve output figures. (approx 2.15 for whole genome experiments). 
ONLY USED FOR PLOTTING PURPOSES!", "id": "#target_titv" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-tranche", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "[100.0, 99.9, 99.0, 90.0]", "type": [ "null", { "type": "array", "items": "float" } ], "inputBinding": { "position": 0, "prefix": "--TStranche", "separate": true, "sbg:cmdInclude": true }, "label": "T Stranche", "description": "The levels of novel false discovery rate (FDR, implied by ti/tv) at which to slice the data. (in percent, that is 1.0 for 1 percent).", "id": "#t_stranche" }, { "sbg:altPrefix": "-std", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "14.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--stdThreshold", "separate": true, "sbg:cmdInclude": true }, "label": "Std Threshold", "description": "If a variant has annotations more than -std standard deviations away from the mean, then don't use it for building the Gaussian mixture model.", "id": "#std_threshold" }, { "sbg:altPrefix": "-shrinkage", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "1.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--shrinkage", "separate": true, "sbg:cmdInclude": true }, "label": "Shrinkage", "description": "The shrinkage parameter in the variational Bayes algorithm.", "id": "#shrinkage" }, { "required": true, "sbg:altPrefix": null, "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "label": "Resources", "description": "Resources.", "sbg:fileTypes": "VCF", "id": "#resources_files" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", 
"sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching : or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", 
"ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-qual", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "80.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--qualThreshold", "separate": true, "sbg:cmdInclude": true }, "label": "Qual Threshold", "description": "If a known variant has raw QUAL value less than -qual then don't use it for building the Gaussian mixture model.", "id": "#qual_threshold" }, { "sbg:altPrefix": "-priorCounts", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "20.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--priorCounts", "separate": true, "sbg:cmdInclude": true }, "label": "Prior Counts", "description": "The number of prior counts to use in the variational Bayes algorithm.", "id": "#prior_counts" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? 
STANDARD is the default; set it to NO_ET so that nothing is posted to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-percentBad", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "0.03", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--percentBadVariants", "separate": true, "sbg:cmdInclude": true }, "label": "Percent Bad Variants", "description": "What percentage of the worst-scoring variants to use when building the Gaussian mixture model of bad variants. 0.07 means the bottom 7 percent.", "id": "#percent_bad_variants" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-nKM", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "30", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--numKMeans", "separate": true, "sbg:cmdInclude": true }, "label": "Num K Means", "description": "The number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model.", "id": "#num_k_means" }, { "sbg:altPrefix": 
"-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non-deterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-mode", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "SNP", "type": [ "null", { "type": "enum", "symbols": [ "SNP", "INDEL", "BOTH" ], "name": "mode" } ], "inputBinding": { "position": 0, "prefix": "--mode", "separate": true, "sbg:cmdInclude": true }, "label": "Mode", "description": "Recalibration mode to employ: 1) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2) INDEL for recalibrating only indels; and 3) BOTH for recalibrating both SNPs and indels simultaneously.", "id": "#mode" }, { "sbg:altPrefix": "-minNumBad", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "2500", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--minNumBadVariants", "separate": true, "sbg:cmdInclude": true }, "label": "Min Num Bad Variants", "description": "The minimum number of worst-scoring variants to use when building the Gaussian mixture model of bad variants. Will override the -percentBad argument if necessary.", "id": "#min_num_bad_variants" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM (in MB) to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This parameter value is added to the Memory per job parameter value. 
The total memory allocated per job is therefore the sum of Memory per job and Memory overhead per job. By default, Memory per job is set to 2048 megabytes unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, the GATK will stop execution cleanly as soon as possible after maxRuntime has been exceeded, truncating the run but not exiting with a failure. By default the value is interpreted in minutes, but this can be changed with maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-mI", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "100", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxIterations", "separate": true, "sbg:cmdInclude": true }, "label": "Max Iterations", "description": "The maximum number of VBEM iterations to be performed in the variational Bayes algorithm. 
The procedure will normally end when convergence is detected.", "id": "#max_iterations" }, { "sbg:altPrefix": "-mG", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "10", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxGaussians", "separate": true, "sbg:cmdInclude": true }, "label": "Max Gaussians", "description": "The maximum number of Gaussians to try during the variational Bayes algorithm.", "id": "#max_gaussians" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. 
Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": "--intervals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-ignoreFilter", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], 
"inputBinding": { "position": 0, "prefix": "--ignore_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Ignore Filter", "description": "If specified the variant recalibrator will use variants even if the specified filter name is marked in the input VCF file.", "id": "#ignore_filter" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. 
Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of read downsampling to employ at a given locus. Reads will be selected randomly for removal from the pile.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. 
For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework, where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
Only the base substitution qualities will be produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-dirichlet", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "0.001", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--dirichlet", "separate": true, "sbg:cmdInclude": true }, "label": "Dirichlet", "description": "The Dirichlet parameter in the variational Bayes algorithm.", "id": "#dirichlet" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred-scaled). Default value is 40. 
30 is perhaps better for whole-genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Tranches Plot", "description": "PDF file containing tranches plot generated by VariantRecalibrator.", "sbg:fileTypes": "PDF", "outputBinding": { "glob": "*.tranches.pdf", "sbg:inheritMetadataFrom": "#variants" }, "id": "#tranches_plot" }, { "type": [ "null", "File" ], "label": "Tranches File", "description": "The output tranches file used by ApplyRecalibration.", "sbg:fileTypes": "TRANCHES", "outputBinding": { "glob": "*.tranches", "sbg:inheritMetadataFrom": "#variants" }, "id": "#tranches_file" }, { "type": [ "null", "File" ], "label": "Rscript File", "description": "The output rscript file generated by the VQSR to aid in visualization of the input data and learned model.", "sbg:fileTypes": "R", "outputBinding": { "glob": "*.recal.R", "sbg:inheritMetadataFrom": "#variants" }, "id": "#rscript_file" }, { "type": [ "File" ], "label": "Recal File", "description": "The output recal file 
used by ApplyRecalibration.", "sbg:fileTypes": "RECAL", "outputBinding": { "glob": "*.recal", "sbg:inheritMetadataFrom": "#variants" }, "id": "#recal_file" }, { "type": [ "null", "File" ], "label": "R Plots", "description": "PDF file containing plots generated by VariantRecalibrator.", "sbg:fileTypes": "PDF", "outputBinding": { "glob": "*.recal.R.pdf", "sbg:inheritMetadataFrom": "#variants" }, "id": "#R_plots" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job;\n }\n else{\n \treturn 1;\n }\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--recal_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = [].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase.concat('.recal')\n}" } }, { "position": 0, "prefix": "--rscript_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = 
[].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase.concat('.recal.R')\n}" } }, { "position": 0, "prefix": "--tranches_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = [].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase.concat('.tranches')\n}" } }, { "position": 1, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n res = []\n for(i=0; i<$job.inputs.resources_files.length; i++){\n \tres.push($job.inputs.resources_files[i].metadata.resources, $job.inputs.resources_files[i].path);\n }\n return res.join(' ');\n}" } } ], "sbg:job": { "inputs": { "variants": [ { "path": "/f/some.vcf" }, { "path": "/f/some_other.vcf" } ], "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "use_annotation": [], "unsafe": null, "ts_filter_level": null, "trust_all_polymorphic": null, "threads_per_job": null, "target_titv": null, "tag": null, "t_stranche": [], "std_threshold": null, "shrinkage": null, "resources_files": [ { "metadata": { "resources": "-resource:dbsnp,known=false,training=true,truth=false,prior=10.2", "some": "value" }, "path": "/dbsnp.vcf", "secondaryFiles": [] }, { "metadata": { "resources": "-resource:omni,known=false,training=true,truth=false,prior=10.2" }, "path": "/known.vcf", "secondaryFiles": [] } ], "remove_program_records": null, "reference": { "path": "/folder/reference.fasta" }, "read_group_black_list": [], "read_filter": [], "qual_threshold": null, "prior_counts": null, "preserve_qscores_less_than": null, "phone_home": null, "percent_bad_variants": null, "pedigree_validation_type": null, "pedigree_string": [], "num_k_means": null, 
"non_deterministic_random_seed": null, "mode": null, "min_num_bad_variants": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "max_iterations": null, "max_gaussians": null, "keep_program_records": null, "intervals_file": null, "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "ignore_filter": [], "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "dirichlet": null, "default_base_qualities": null, "cpu_per_job": null, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type VariantRecalibrator -nt 8 --reference_sequence /folder/reference.fasta --input /f/some.vcf --input /f/some_other.vcf --use_annotation QD,MQRankSum,FS,DP,ReadPosRankSum,HaplotypeScore --recal_file some.recal --rscript_file some.recal.R --tranches_file some.tranches -resource:dbsnp,known=false,training=true,truth=false,prior=10.2 /dbsnp.vcf -resource:omni,known=false,training=true,truth=false,prior=10.2 /known.vcf", "sbg:contributors": [ "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911440, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-variantrecalibrator/6", "sbg:image_url": null, "sbg:latestRevision": 5, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": 
"https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantrecalibration_VariantRecalibrator.php", "label": "Documentation" } ], "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911446, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 5, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911440, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911441, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911442, "sbg:revision": 2 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911443, "sbg:revision": 3 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911444, "sbg:revision": 4 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911446, "sbg:revision": 5 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 2796.76532909997, "y": -117.15687204030633 }, "label": "SNP GATK VariantRecalibrator", "sbg:x": 2796.76532909997, "sbg:y": -117.15687204030633 }, { "id": "#GATK_VariantRecalibrator_1", "inputs": [ { "id": "#GATK_VariantRecalibrator_1.variants", "source": [ "#GATK_CombineVariants.combined_vcf" ] }, { "id": "#GATK_VariantRecalibrator_1.use_annotation", "default": [ "QD", "DP", "FS", "ReadPosRankSum", "MQRankSum" ] }, { "id": "#GATK_VariantRecalibrator_1.threads_per_job", "default": 32 }, { "id": "#GATK_VariantRecalibrator_1.resources_files", "source": [ "#SBG_Prepare_VQSR_Mills.output_vcf", "#SBG_Prepare_VQSR_dbSNP.output_vcf" ] }, { "id": "#GATK_VariantRecalibrator_1.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_VariantRecalibrator_1.mode", "default": "INDEL" 
}, { "id": "#GATK_VariantRecalibrator_1.min_num_bad_variants", "default": 1000 }, { "id": "#GATK_VariantRecalibrator_1.memory_per_job", "default": 20000 }, { "id": "#GATK_VariantRecalibrator_1.max_gaussians", "default": 4 }, { "id": "#GATK_VariantRecalibrator_1.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_VariantRecalibrator_1.tranches_plot" }, { "id": "#GATK_VariantRecalibrator_1.tranches_file" }, { "id": "#GATK_VariantRecalibrator_1.rscript_file" }, { "id": "#GATK_VariantRecalibrator_1.recal_file" }, { "id": "#GATK_VariantRecalibrator_1.R_plots" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-variantrecalibrator/6", "label": "INDEL GATK VariantRecalibrator", "description": "Overview\n\nThis tool performs the first pass in a two-stage process called VQSR; the second pass is performed by the ApplyRecalibration tool. In brief, the first pass consists of creating a Gaussian mixture model by looking at the distribution of annotation values over a high quality subset of the input call set, and then scoring all input variants according to the model. The second pass consists of filtering variants based on score cutoffs identified in the first pass.\n\nThe purpose of the variant recalibrator is to assign a well-calibrated probability to each variant call in a call set. You can then create highly accurate call sets by filtering based on this single estimate for the accuracy of each call. The approach taken by variant quality score recalibration is to develop a continuous, covarying estimate of the relationship between SNP call annotations (such as QD, MQ, and ReadPosRankSum, for example) and the probability that a SNP is a true genetic variant versus a sequencing or data processing artifact. This model is determined adaptively based on \"true sites\" provided as input, typically HapMap 3 sites and those sites found to be polymorphic on the Omni 2.5M SNP chip array (in humans). 
This adaptive error model can then be applied to both known and novel variation discovered in the call set of interest to evaluate the probability that each call is real. The score that gets added to the INFO field of each variant is called the VQSLOD. It is the log odds of being a true variant versus being false under the trained Gaussian mixture model.\n\nVQSR is probably the hardest part of the Best Practices to get right, so be sure to read the method documentation, parameter recommendations and tutorial to really understand what these tools do and how to use them for best results on your own data.\n\nInputs\nThe input raw variants to be recalibrated.\nKnown, truth, and training sets to be used by the algorithm. See the method documentation for more details.\n\nOutputs\nA recalibration table file that will be used by the ApplyRecalibration tool.\nA tranches file which shows various metrics of the recalibration callset for slices of the data.\n\nUsage example\n\nRecalibrating SNPs in exome data\n java -Xmx4g -jar GenomeAnalysisTK.jar \\\n -T VariantRecalibrator \\\n -R reference.fasta \\\n -input raw_variants.vcf \\\n -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.b37.sites.vcf \\\n -resource:omni,known=false,training=true,truth=false,prior=12.0 1000G_omni2.5.b37.sites.vcf \\\n -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G_phase1.snps.high_confidence.vcf \\\n -resource:dbsnp,known=true,training=false,truth=false,prior=6.0 dbsnp_135.b37.vcf \\\n -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an InbreedingCoeff \\\n -mode SNP \\\n -recalFile output.recal \\\n -tranchesFile output.tranches \\\n -rscriptFile output.plots.R\n \nCaveats\n\nThe values used in the example above are only meant to show how the command lines are composed. They are not meant to be taken as specific recommendations of values to use in your own work, and they may be different from the values cited elsewhere in our documentation. 
For the latest and greatest recommendations on how to set parameter values for your own analyses, please read the Best Practices section of the documentation, especially the FAQ document on VQSR parameters.\nWhole genomes and exomes take slightly different parameters, so make sure you adapt your commands accordingly! See the Best Practices documents mentioned above for details.\nIf you work with small datasets (e.g. targeted capture experiments or a small number of exomes), you will run into problems. Read the Best Practices documents mentioned above for advice on how to deal with those issues.\nIn order to create the model reporting plots, Rscript needs to be in your environment PATH (this is the scripting version of R, not the interactive version). See http://www.r-project.org for more info on how to download and install R.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. 
To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M');\n }\n else{\n \treturn '-Xmx2048M';\n }\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "VariantRecalibrator", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nt '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nt '.concat(8)\n }\n}" } ], "inputs": [ { "required": true, "sbg:altPrefix": "-input", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input", "separate": true, "sbg:cmdInclude": true }, "label": "Variants", "description": "The raw input variants to be recalibrated.", "sbg:fileTypes": "VCF", "id": "#variants" }, { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK 
General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-an", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "[]", "type": [ { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--use_annotation", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.use_annotation.length == 0){\n \treturn ['QD', 'MQRankSum', 'FS', 'DP', 'ReadPosRankSum', 'HaplotypeScore']\n }\n else\n return $job.inputs.use_annotation\n\n\n}" }, "sbg:cmdInclude": true }, "label": "Use Annotation", "description": "The names of the annotations which should be used for calculations (from input VCF INFO fields). If left empty, QD, MQRankSum, FS, DP, ReadPosRankSum and HaplotypeScore are used.", "id": "#use_annotation" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. 
We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": "-ts_filter_level", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "99.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--ts_filter_level", "separate": true, "sbg:cmdInclude": true }, "label": "Ts Filter Level", "description": "The truth sensitivity level at which to start filtering, used here to indicate filtered variants in the model reporting plots.", "id": "#ts_filter_level" }, { "sbg:altPrefix": "-allPoly", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--trustAllPolymorphic", "separate": true, "sbg:cmdInclude": true }, "label": "Trust All Polymorphic", "description": "Trust that all the input training sets' unfiltered records contain only polymorphic sites to drastically speed up the computation.", "id": "#trust_all_polymorphic" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "8", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-titv", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "2.15", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--target_titv", "separate": true, "sbg:cmdInclude": true }, "label": "Target Titv", "description": "The expected novel Ti/Tv ratio to use when calculating FDR tranches and for display on the optimization curve output figures. (approx 2.15 for whole genome experiments). 
ONLY USED FOR PLOTTING PURPOSES!", "id": "#target_titv" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-tranche", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "[100.0, 99.9, 99.0, 90.0]", "type": [ "null", { "type": "array", "items": "float" } ], "inputBinding": { "position": 0, "prefix": "--TStranche", "separate": true, "sbg:cmdInclude": true }, "label": "T Stranche", "description": "The levels of novel false discovery rate (FDR, implied by Ti/Tv) at which to slice the data (in percent, that is, 1.0 for 1 percent).", "id": "#t_stranche" }, { "sbg:altPrefix": "-std", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "14.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--stdThreshold", "separate": true, "sbg:cmdInclude": true }, "label": "Std Threshold", "description": "If a variant has annotations more than -std standard deviations away from the mean, don't use it for building the Gaussian mixture model.", "id": "#std_threshold" }, { "sbg:altPrefix": "-shrinkage", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "1.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--shrinkage", "separate": true, "sbg:cmdInclude": true }, "label": "Shrinkage", "description": "The shrinkage parameter in the variational Bayes algorithm.", "id": "#shrinkage" }, { "required": true, "sbg:altPrefix": null, "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "label": "Resources", "description": "Known, truth, and training resource VCF files. The -resource argument for each file is taken from the file's 'resources' metadata field.", "sbg:fileTypes": "VCF", "id": "#resources_files" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", 
"sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching TAG:STRING, or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", 
"ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-qual", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "80.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--qualThreshold", "separate": true, "sbg:cmdInclude": true }, "label": "Qual Threshold", "description": "If a known variant has raw QUAL value less than -qual then don't use it for building the Gaussian mixture model.", "id": "#qual_threshold" }, { "sbg:altPrefix": "-priorCounts", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "20.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--priorCounts", "separate": true, "sbg:cmdInclude": true }, "label": "Prior Counts", "description": "The number of prior counts to use in the variational Bayes algorithm.", "id": "#prior_counts" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? 
STANDARD is the default, can be NO_ET so nothing is posted to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-percentBad", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "0.03", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--percentBadVariants", "separate": true, "sbg:cmdInclude": true }, "label": "Percent Bad Variants", "description": "What percentage of the worst scoring variants to use when building the Gaussian mixture model of bad variants. 0.07 means bottom 7 percent.", "id": "#percent_bad_variants" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?.", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-nKM", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "30", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--numKMeans", "separate": true, "sbg:cmdInclude": true }, "label": "Num K Means", "description": "The number of k-means iterations to perform in order to initialize the means of the Gaussians in the Gaussian mixture model.", "id": "#num_k_means" }, { "sbg:altPrefix": 
"-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non-deterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-mode", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "SNP", "type": [ "null", { "type": "enum", "symbols": [ "SNP", "INDEL", "BOTH" ], "name": "mode" } ], "inputBinding": { "position": 0, "prefix": "--mode", "separate": true, "sbg:cmdInclude": true }, "label": "Mode", "description": "Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.) BOTH for recalibrating both SNPs and indels simultaneously.", "id": "#mode" }, { "sbg:altPrefix": "-minNumBad", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "2500", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--minNumBadVariants", "separate": true, "sbg:cmdInclude": true }, "label": "Min Num Bad Variants", "description": "The minimum number of worst-scoring variants to use when building the Gaussian mixture model of bad variants. Will override the -percentBad argument if necessary.", "id": "#min_num_bad_variants" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM (in megabytes) to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This parameter value is added to the Memory per job parameter value. 
The job is then allocated the sum of Memory per job and Memory overhead per job. By default the Memory per job parameter value is set to 2048 megabytes, unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, the GATK will stop execution cleanly as soon as possible after maxRuntime has been exceeded, truncating the run but not exiting with a failure. By default the value is interpreted in minutes, but this can be changed with maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-mI", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "100", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxIterations", "separate": true, "sbg:cmdInclude": true }, "label": "Max Iterations", "description": "The maximum number of VBEM iterations to be performed in the variational Bayes algorithm. 
Procedure will normally end when convergence is detected.", "id": "#max_iterations" }, { "sbg:altPrefix": "-mG", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "10", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxGaussians", "separate": true, "sbg:cmdInclude": true }, "label": "Max Gaussians", "description": "The maximum number of Gaussians to try during variational Bayes algorithm.", "id": "#max_gaussians" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. 
Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": "--intervals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-ignoreFilter", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], 
"inputBinding": { "position": 0, "prefix": "--ignore_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Ignore Filter", "description": "If specified the variant recalibrator will use variants even if the specified filter name is marked in the input VCF file.", "id": "#ignore_filter" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. 
Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. Reads will be selected randomly to be removed from the pile based on the method described here.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. 
For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework, where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
Turns off printing of the base insertion and base deletion tags when using the -BQSR argument and only the base substitution qualities will be produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-dirichlet", "sbg:category": "Variant Recalibrator", "sbg:toolDefaultValue": "0.001", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--dirichlet", "separate": true, "sbg:cmdInclude": true }, "label": "Dirichlet", "description": "The dirichlet parameter in the variational Bayes algorithm.", "id": "#dirichlet" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred Scaled). Default value is 40. 
30 is perhaps better for whole-genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Tranches Plot", "description": "PDF file containing tranches plot generated by VariantRecalibrator.", "sbg:fileTypes": "PDF", "outputBinding": { "glob": "*.tranches.pdf", "sbg:inheritMetadataFrom": "#variants" }, "id": "#tranches_plot" }, { "type": [ "null", "File" ], "label": "Tranches File", "description": "The output tranches file used by ApplyRecalibration.", "sbg:fileTypes": "TRANCHES", "outputBinding": { "glob": "*.tranches", "sbg:inheritMetadataFrom": "#variants" }, "id": "#tranches_file" }, { "type": [ "null", "File" ], "label": "Rscript File", "description": "The output Rscript file generated by VQSR to aid in visualization of the input data and learned model.", "sbg:fileTypes": "R", "outputBinding": { "glob": "*.recal.R", "sbg:inheritMetadataFrom": "#variants" }, "id": "#rscript_file" }, { "type": [ "File" ], "label": "Recal File", "description": "The output recal file 
used by ApplyRecalibration.", "sbg:fileTypes": "RECAL", "outputBinding": { "glob": "*.recal", "sbg:inheritMetadataFrom": "#variants" }, "id": "#recal_file" }, { "type": [ "null", "File" ], "label": "R Plots", "description": "PDF file containing plots generated by VariantRecalibrator.", "sbg:fileTypes": "PDF", "outputBinding": { "glob": "*.recal.R.pdf", "sbg:inheritMetadataFrom": "#variants" }, "id": "#R_plots" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job;\n }\n else{\n \treturn 1;\n }\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--recal_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = [].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase.concat('.recal')\n}" } }, { "position": 0, "prefix": "--rscript_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = 
[].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase.concat('.recal.R')\n}" } }, { "position": 0, "prefix": "--tranches_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = [].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase.concat('.tranches')\n}" } }, { "position": 1, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n res = []\n for(i=0; i<$job.inputs.resources_files.length; i++){\n \tres.push($job.inputs.resources_files[i].metadata.resources, $job.inputs.resources_files[i].path);\n }\n return res.join(' ');\n}" } } ], "sbg:job": { "inputs": { "variants": [ { "path": "/f/some.vcf" }, { "path": "/f/some_other.vcf" } ], "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "use_annotation": [], "unsafe": null, "ts_filter_level": null, "trust_all_polymorphic": null, "threads_per_job": null, "target_titv": null, "tag": null, "t_stranche": [], "std_threshold": null, "shrinkage": null, "resources_files": [ { "metadata": { "resources": "-resource:dbsnp,known=false,training=true,truth=false,prior=10.2", "some": "value" }, "path": "/dbsnp.vcf", "secondaryFiles": [] }, { "metadata": { "resources": "-resource:omni,known=false,training=true,truth=false,prior=10.2" }, "path": "/known.vcf", "secondaryFiles": [] } ], "remove_program_records": null, "reference": { "path": "/folder/reference.fasta" }, "read_group_black_list": [], "read_filter": [], "qual_threshold": null, "prior_counts": null, "preserve_qscores_less_than": null, "phone_home": null, "percent_bad_variants": null, "pedigree_validation_type": null, "pedigree_string": [], "num_k_means": null, 
"non_deterministic_random_seed": null, "mode": null, "min_num_bad_variants": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "max_iterations": null, "max_gaussians": null, "keep_program_records": null, "intervals_file": null, "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "ignore_filter": [], "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "dirichlet": null, "default_base_qualities": null, "cpu_per_job": null, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type VariantRecalibrator -nt 8 --reference_sequence /folder/reference.fasta --input /f/some.vcf --input /f/some_other.vcf --use_annotation QD,MQRankSum,FS,DP,ReadPosRankSum,HaplotypeScore --recal_file some.recal --rscript_file some.recal.R --tranches_file some.tranches -resource:dbsnp,known=false,training=true,truth=false,prior=10.2 /dbsnp.vcf -resource:omni,known=false,training=true,truth=false,prior=10.2 /known.vcf", "sbg:contributors": [ "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911440, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-variantrecalibrator/6", "sbg:image_url": null, "sbg:latestRevision": 5, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": 
"https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantrecalibration_VariantRecalibrator.php", "label": "Documentation" } ], "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911446, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 5, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911440, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911441, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911442, "sbg:revision": 2 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911443, "sbg:revision": 3 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911444, "sbg:revision": 4 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911446, "sbg:revision": 5 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 2816.3733208857607, "y": 135.19613669433062 }, "label": "INDEL GATK VariantRecalibrator", "sbg:x": 2816.3733208857607, "sbg:y": 135.19613669433062 }, { "id": "#GATK_ApplyRecalibration", "inputs": [ { "id": "#GATK_ApplyRecalibration.variants", "source": [ "#GATK_CombineVariants.combined_vcf" ] }, { "id": "#GATK_ApplyRecalibration.ts_filter_level", "default": 99 }, { "id": "#GATK_ApplyRecalibration.tranches_file", "source": [ "#GATK_VariantRecalibrator.tranches_file" ] }, { "id": "#GATK_ApplyRecalibration.threads_per_job", "default": 32 }, { "id": "#GATK_ApplyRecalibration.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_ApplyRecalibration.recal_file", "source": [ "#GATK_VariantRecalibrator.recal_file" ] }, { "id": "#GATK_ApplyRecalibration.mode", "default": "SNP" 
}, { "id": "#GATK_ApplyRecalibration.memory_per_job", "default": 20000 }, { "id": "#GATK_ApplyRecalibration.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_ApplyRecalibration.vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-applyrecalibration/6", "label": "SNP GATK ApplyRecalibration", "description": "Overview\n\nThis tool performs the second pass in a two-stage process called VQSR; the first pass is performed by the VariantRecalibrator tool. In brief, the first pass consists of creating a Gaussian mixture model by looking at the distribution of annotation values over a high quality subset of the input call set, and then scoring all input variants according to the model. The second pass consists of filtering variants based on score cutoffs identified in the first pass.\n\nUsing the tranche file and recalibration table generated by the previous step, the ApplyRecalibration tool looks at each variant's VQSLOD value and decides which tranche it falls in. Variants in tranches that fall below the specified truth sensitivity filter level have their FILTER field annotated with the corresponding tranche level. 
This will result in a call set that is filtered to the desired level but retains the information necessary to increase sensitivity if needed.\n\nTo be clear, please note that by \"filtered\", we mean that variants failing the requested tranche cutoff are marked as filtered in the output VCF; they are not discarded.\n\nVQSR is probably the hardest part of the Best Practices to get right, so be sure to read the method documentation, parameter recommendations and tutorial to really understand what these tools do and how to use them for best results on your own data.\n\nInput\nThe raw input variants to be filtered.\nThe recalibration table file that was generated by the VariantRecalibrator tool.\nThe tranches file that was generated by the VariantRecalibrator tool.\n\nOutput\nA recalibrated VCF file in which each variant of the requested type is annotated with its VQSLOD and marked as filtered if the score is below the desired quality level.\n\nUsage example for filtering SNPs\n\n java -Xmx3g -jar GenomeAnalysisTK.jar \\\n -T ApplyRecalibration \\\n -R reference.fasta \\\n -input NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.b37.vcf \\\n --ts_filter_level 99.0 \\\n -tranchesFile path/to/output.tranches \\\n -recalFile path/to/output.recal \\\n -mode SNP \\\n -o path/to/output.recalibrated.filtered.vcf\n \nCaveats\n\nThe tranche value used in the example above is only a general example. You should determine the level of sensitivity that is appropriate for your specific project. Remember that higher sensitivity (more power to detect variants, yay!) comes at the cost of specificity (more false positives, boo!). You have to choose at what point you want to set the tradeoff.\nIn order to create the tranche reporting plots (which are only generated for SNPs, not indels!) Rscript needs to be in your environment PATH (this is the scripting version of R, not the interactive version). 
See http://www.r-project.org for more info on how to download and install R.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "ApplyRecalibration", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nt '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nt '.concat(8)\n }\n}" } ], "inputs": [ { "required": true, "sbg:altPrefix": "-input", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--input", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input", "description": "The raw input variants to be recalibrated.", "sbg:fileTypes": "VCF", "id": "#variants" }, { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { 
"sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. 
We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": "-ts_filter_level", "sbg:category": "Apply Recalibration", "sbg:toolDefaultValue": "99.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--ts_filter_level", "separate": true, "sbg:cmdInclude": true }, "label": "Ts Filter Level", "description": "The truth sensitivity level at which to start filtering.", "id": "#ts_filter_level" }, { "required": true, "sbg:altPrefix": "-tranchesFile", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--tranches_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Tranches File", "description": "The input tranches file describing where to cut the data.", "sbg:fileTypes": "TRANCHES", "id": "#tranches_file" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "8", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], 
"inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "required": true, "sbg:altPrefix": "-recalFile", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--recal_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Recal File", "description": "The input recal file used by ApplyRecalibration.", "sbg:fileTypes": "RECAL", "id": "#recal_file" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching <TAG>:<STRING>, or those listed in a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", "ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] }
} ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default; set it to NO_ET so that nothing is posted to the run repository.
Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes GATK behave nondeterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-mode", "sbg:category": "Apply Recalibration", "sbg:toolDefaultValue": "SNP", "type": [ "null", { "type": "enum", "symbols": [ "SNP", "INDEL", "BOTH" ], "name": "mode" } ], "inputBinding": { "position": 0, "prefix": "--mode", "separate": true, "sbg:cmdInclude": true }, "label": "Mode", "description": "Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.)
BOTH for recalibrating both SNPs and indels simultaneously.", "id": "#mode" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM, in megabytes, to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job, in megabytes. Defaults to '0' (zero megabytes). This value is added to the Memory per job value, and the sum of the two is allocated per job. Unless specified otherwise, Memory per job defaults to 2048 megabytes.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, GATK will stop execution cleanly as soon as possible after maxRuntime has been exceeded, truncating the run but not exiting with a failure.
By default, the value is interpreted in minutes, but this can be changed with maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header?", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": "--intervals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding",
"separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-ignoreFilter", "sbg:category": "Apply Recalibration", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--ignore_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Ignore Filter", "description": "If specified, the variant recalibrator will use variants even if the specified filter name is marked in the input VCF file.", "id": "#ignore_filter" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "GATK Key", "description": "GATK Key file. Required if running with -et NO_ET.
Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. 
Reads will be selected randomly for removal from the pileup according to the chosen downsampling method.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR).
Turns off printing of the base insertion and base deletion tags when using the -BQSR argument and only the base substitution qualities will be produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred Scaled). Default value is 40. 
30 is perhaps better for whole genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "File" ], "label": "VCF", "description": "File to which variants should be written.", "outputBinding": { "glob": "*.vcf", "sbg:inheritMetadataFrom": "#variants" }, "id": "#vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job\n }\n return 1 \n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job &&
$job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = [].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase + '.recalibrated.vcf'\n}" } } ], "sbg:job": { "inputs": { "variants": { "class": "File", "path": "variants.ext", "secondaryFiles": [], "size": 0 }, "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "ts_filter_level": null, "tranches_file": { "class": "File", "path": "tranches_file.ext", "secondaryFiles": [], "size": 0 }, "threads_per_job": null, "tag": null, "remove_program_records": null, "reference": { "path": "." 
}, "recal_file": { "class": "File", "path": "recal_file.ext", "secondaryFiles": [], "size": 0 }, "read_group_black_list": [], "read_filter": [], "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "non_deterministic_random_seed": null, "mode": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "keep_program_records": null, "intervals_file": null, "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "ignore_filter": [], "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "cpu_per_job": null, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type ApplyRecalibration -nt 8 --reference_sequence . 
--input variants.ext --recal_file recal_file.ext --tranches_file tranches_file.ext --out variants.recalibrated.vcf", "sbg:contributors": [ "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911340, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-applyrecalibration/6", "sbg:image_url": null, "sbg:latestRevision": 6, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source Code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantrecalibration_ApplyRecalibration.php", "label": "Documentation" } ], "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911345, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 6, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911340, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911341, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911342, "sbg:revision": 2 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911342, "sbg:revision": 3 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911344, "sbg:revision": 4 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911344, "sbg:revision": 5 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911345, "sbg:revision": 6 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 3041.569298872784, "y": -5.784310809147737 }, "label": "SNP GATK 
ApplyRecalibration", "sbg:x": 3041.569298872784, "sbg:y": -5.784310809147737 }, { "id": "#GATK_ApplyRecalibration_1", "inputs": [ { "id": "#GATK_ApplyRecalibration_1.variants", "source": [ "#GATK_ApplyRecalibration.vcf" ] }, { "id": "#GATK_ApplyRecalibration_1.ts_filter_level", "default": 99 }, { "id": "#GATK_ApplyRecalibration_1.tranches_file", "source": [ "#GATK_VariantRecalibrator_1.tranches_file" ] }, { "id": "#GATK_ApplyRecalibration_1.threads_per_job", "default": 32 }, { "id": "#GATK_ApplyRecalibration_1.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_ApplyRecalibration_1.recal_file", "source": [ "#GATK_VariantRecalibrator_1.recal_file" ] }, { "id": "#GATK_ApplyRecalibration_1.mode", "default": "INDEL" }, { "id": "#GATK_ApplyRecalibration_1.memory_per_job", "default": 20000 }, { "id": "#GATK_ApplyRecalibration_1.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_ApplyRecalibration_1.vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-applyrecalibration/6", "label": "INDEL GATK ApplyRecalibration", "description": "Overview\n\nThis tool performs the second pass in a two-stage process called VQSR; the first pass is performed by the VariantRecalibrator tool. In brief, the first pass consists of creating a Gaussian mixture model by looking at the distribution of annotation values over a high quality subset of the input call set, and then scoring all input variants according to the model. The second pass consists of filtering variants based on score cutoffs identified in the first pass.\n\nUsing the tranche file and recalibration table generated by the previous step, the ApplyRecalibration tool looks at each variant's VQSLOD value and decides which tranche it falls in. Variants in tranches that fall below the specified truth sensitivity filter level have their FILTER field annotated with the corresponding tranche level. 
This will result in a call set that is filtered to the desired level but retains the information necessary to increase sensitivity if needed.\n\nTo be clear, please note that by \"filtered\", we mean that variants failing the requested tranche cutoff are marked as filtered in the output VCF; they are not discarded.\n\nVQSR is probably the hardest part of the Best Practices to get right, so be sure to read the method documentation, parameter recommendations and tutorial to really understand what these tools do and how to use them for best results on your own data.\n\nInput\nThe raw input variants to be filtered.\nThe recalibration table file that was generated by the VariantRecalibrator tool.\nThe tranches file that was generated by the VariantRecalibrator tool.\n\nOutput\nA recalibrated VCF file in which each variant of the requested type is annotated with its VQSLOD and marked as filtered if the score is below the desired quality level.\n\nUsage example for filtering SNPs\n\n java -Xmx3g -jar GenomeAnalysisTK.jar \\\n -T ApplyRecalibration \\\n -R reference.fasta \\\n -input NA12878.HiSeq.WGS.bwa.cleaned.raw.subset.b37.vcf \\\n --ts_filter_level 99.0 \\\n -tranchesFile path/to/output.tranches \\\n -recalFile path/to/output.recal \\\n -mode SNP \\\n -o path/to/output.recalibrated.filtered.vcf\n \nCaveats\n\nThe tranche value used in the example above is only a general example. You should determine the level of sensitivity that is appropriate for your specific project. Remember that higher sensitivity (more power to detect variants, yay!) comes at the cost of specificity (more false positives, boo!). You have to choose at what point you want to set the tradeoff.\nIn order to create the tranche reporting plots (which are only generated for SNPs, not indels!), Rscript needs to be in your environment PATH (this is the scripting version of R, not the interactive version).
See http://www.r-project.org for more info on how to download and install R.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "ApplyRecalibration", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nt '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nt '.concat(8)\n }\n}" } ], "inputs": [ { "required": true, "sbg:altPrefix": "-input", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--input", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input", "description": "The raw input variants to be recalibrated.", "sbg:fileTypes": "VCF", "id": "#variants" }, { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { 
"sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. 
We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": "-ts_filter_level", "sbg:category": "Apply Recalibration", "sbg:toolDefaultValue": "99.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--ts_filter_level", "separate": true, "sbg:cmdInclude": true }, "label": "Ts Filter Level", "description": "The truth sensitivity level at which to start filtering.", "id": "#ts_filter_level" }, { "required": true, "sbg:altPrefix": "-tranchesFile", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--tranches_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Tranches File", "description": "The input tranches file describing where to cut the data.", "sbg:fileTypes": "TRANCHES", "id": "#tranches_file" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "8", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], 
"inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "required": true, "sbg:altPrefix": "-recalFile", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--recal_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Recal File", "description": "The input recal file used by ApplyRecalibration.", "sbg:fileTypes": "RECAL", "id": "#recal_file" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching TAG:STRING, or those listed in a .txt file containing one filter string per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", "ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } 
} ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default, can be NO_ET so nothing is posted to the run repository. 
Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non-deterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-mode", "sbg:category": "Apply Recalibration", "sbg:toolDefaultValue": "SNP", "type": [ "null", { "type": "enum", "symbols": [ "SNP", "INDEL", "BOTH" ], "name": "mode" } ], "inputBinding": { "position": 0, "prefix": "--mode", "separate": true, "sbg:cmdInclude": true }, "label": "Mode", "description": "Recalibration mode to employ: 1.) SNP for recalibrating only SNPs (emitting indels untouched in the output VCF); 2.) INDEL for indels; and 3.) 
BOTH for recalibrating both SNPs and indels simultaneously.", "id": "#mode" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM, in megabytes, to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job, in megabytes. The default is '0' (zero megabytes). This value is added to the Memory per job value, and the sum of the two is allocated for each job. Unless specified otherwise, Memory per job defaults to 2048 megabytes.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, the GATK will stop execution cleanly as soon as possible after maxRuntime has been exceeded, truncating the run but not exiting with a failure. 
By default the value is interpreted in minutes, but this can be changed by maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": "--intervals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", 
"separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-ignoreFilter", "sbg:category": "Apply Recalibration", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--ignore_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Ignore Filter", "description": "If specified, the variant recalibrator will use variants even if the specified filter name is marked in the input VCF file.", "id": "#ignore_filter" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "GATK Key", "description": "GATK Key file. Required if running with -et NO_ET. 
Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. 
Reads will be selected randomly to be removed from the pile.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. Intended mostly for use in the testing framework, where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
Only the base substitution qualities will be produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred-scaled). Default value is 40. 
30 is perhaps better for whole-genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "File" ], "label": "VCF", "description": "File to which variants should be written.", "outputBinding": { "glob": "*.vcf", "sbg:inheritMetadataFrom": "#variants" }, "id": "#vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job\n }\n return 1 \n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && 
$job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n variant_name = [].concat($job.inputs.variants)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n variant_namebase = variant_name.slice(0, variant_name.length-1).join('.')\n return variant_namebase + '.recalibrated.vcf'\n}" } } ], "sbg:job": { "inputs": { "variants": { "class": "File", "path": "variants.ext", "secondaryFiles": [], "size": 0 }, "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "ts_filter_level": null, "tranches_file": { "class": "File", "path": "tranches_file.ext", "secondaryFiles": [], "size": 0 }, "threads_per_job": null, "tag": null, "remove_program_records": null, "reference": { "path": "." 
}, "recal_file": { "class": "File", "path": "recal_file.ext", "secondaryFiles": [], "size": 0 }, "read_group_black_list": [], "read_filter": [], "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "non_deterministic_random_seed": null, "mode": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "keep_program_records": null, "intervals_file": null, "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "ignore_filter": [], "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "cpu_per_job": null, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type ApplyRecalibration -nt 8 --reference_sequence . 
--input variants.ext --recal_file recal_file.ext --tranches_file tranches_file.ext --out variants.recalibrated.vcf", "sbg:contributors": [ "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911340, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-applyrecalibration/6", "sbg:image_url": null, "sbg:latestRevision": 6, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source Code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantrecalibration_ApplyRecalibration.php", "label": "Documentation" } ], "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911345, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 6, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911340, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911341, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911342, "sbg:revision": 2 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911342, "sbg:revision": 3 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911344, "sbg:revision": 4 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911344, "sbg:revision": 5 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911345, "sbg:revision": 6 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 3227.9416940321503, "y": 163.23535943966522 }, "label": "INDEL GATK 
ApplyRecalibration", "sbg:x": 3227.9416940321503, "sbg:y": 163.23535943966522 }, { "id": "#SBG_FASTA_Indices", "inputs": [ { "id": "#SBG_FASTA_Indices.reference", "source": [ "#SBG_Untar_fasta.output_fasta" ] } ], "outputs": [ { "id": "#SBG_FASTA_Indices.fasta_reference" }, { "id": "#SBG_FASTA_Indices.fasta_index" }, { "id": "#SBG_FASTA_Indices.fasta_dict" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-fasta-indices/9", "label": "SBG FASTA Indices", "description": "This tool creates the FASTA dictionary and the FASTA index simultaneously; both are required for running GATK tools. This version of the tool uses the SAMtools faidx command (toolkit version 0.1.19) for indexing and Picard CreateSequenceDictionary (toolkit version 1.140) for the FASTA dictionary.", "baseCommand": [ "python", "/opt/sbg-fasta-indices.py" ], "inputs": [ { "required": true, "sbg:stageInput": "link", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--REFERENCE", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "FASTA file", "description": "FASTA file to be indexed.", "id": "#reference" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Reference", "sbg:fileTypes": "FASTA", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n return $job.inputs.reference.path.split(\"/\").pop()\n}" }, "sbg:inheritMetadataFrom": "#reference", "secondaryFiles": [ ".fai", "^.dict" ] }, "id": "#fasta_reference", "fileTypes": "FASTA, FA" }, { "type": [ "null", "File" ], "label": "FASTA Index", "sbg:fileTypes": "FAI", "outputBinding": { "glob": "*.fai" }, "id": "#fasta_index" }, { "type": [ "null", "File" ], "label": "FASTA Dictionary", "sbg:fileTypes": "DICT", "outputBinding": { "glob": "*.dict" }, "id": "#fasta_dict" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": 
"rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": 2500 }, { "class": "DockerRequirement", "dockerImageId": "b177f5bd06db", "dockerPull": "images.sbgenomics.com/djordje_klisic/sbg-fasta-indices:1.0" } ], "arguments": [ { "position": 0, "separate": true, "valueFrom": "--dict" }, { "position": 0, "separate": true, "valueFrom": "--fai" } ], "sbg:job": { "inputs": { "reference": { "class": "File", "path": "/path/to/reference.ext", "secondaryFiles": [], "size": 0 } }, "allocatedResources": { "mem": 2500, "cpu": 1 } }, "sbg:categories": [ "Indexing" ], "sbg:cmdPreview": "python /opt/sbg-fasta-indices.py --REFERENCE /path/to/reference.ext --dict --fai", "sbg:contributors": [ "markop", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911283, "sbg:id": "admin/sbg-public-data/sbg-fasta-indices/9", "sbg:image_url": null, "sbg:latestRevision": 5, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "markop", "sbg:modifiedOn": 1458669249, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 5, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911283, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911283, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1458655509, "sbg:revision": 3 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1458655251, "sbg:revision": 2 }, { "sbg:modifiedBy": "markop", "sbg:modifiedOn": 1458658019, "sbg:revision": 4 }, { "sbg:modifiedBy": "markop", "sbg:modifiedOn": 1458669249, "sbg:revision": 5 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Sanja Mijalkovic, Seven Bridges Genomics, ", "sbg:toolkit": "SBGTools", "sbg:validationErrors": [], "x": 733.3336788416034, "y": 368.3333593606951 }, "label": "SBG FASTA Indices", "sbg:x": 733.3336788416034, "sbg:y": 368.3333593606951 }, { "id": "#SBG_Html2b64", "inputs": [ { "id": "#SBG_Html2b64.input_file", "source": [ 
"#FastQC.report_zip" ] } ], "outputs": [ { "id": "#SBG_Html2b64.b64html" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-html2b64/5", "label": "SBG Html2b64", "description": "Tool for converting the archived HTML output of FastQC and similar tools to b64html, so that it can be easily displayed in web browsers or on the SBG platform.", "baseCommand": [ "python", "/opt/sbg_html_to_b64.py" ], "inputs": [ { "required": false, "sbg:category": "File input.", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--input", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input file", "description": "Compressed archive.", "sbg:fileTypes": "ZIP", "id": "#input_file" } ], "outputs": [ { "type": [ "null", "File" ], "label": "B64html", "description": "Output file, b64html.", "sbg:fileTypes": "HTML, B64HTML", "outputBinding": { "glob": "*b64html", "sbg:inheritMetadataFrom": "#input_file" }, "id": "#b64html" } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerImageId": "8c35d2a2d8d1", "dockerPull": "images.sbgenomics.com/mladenlsbg/sbg-html-to-b64:1.0.1" } ], "sbg:job": { "inputs": { "input_file": { "class": "File", "path": "input_file.ext", "secondaryFiles": [], "size": 0 } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "Converters", "Plotting-and-Rendering" ], "sbg:cmdPreview": "python /opt/sbg_html_to_b64.py", "sbg:contributors": [ "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911294, "sbg:id": "admin/sbg-public-data/sbg-html2b64/5", "sbg:image_url": null, "sbg:latestRevision": 2, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1459963571, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 2, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911294, "sbg:revision": 0 }, { 
"sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911294, "sbg:revision": 1 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1459963571, "sbg:revision": 2 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "1.0", "sbg:validationErrors": [], "x": 349.33335738711946, "y": 442.3333612812897 }, "label": "SBG Html2b64", "scatter": "#SBG_Html2b64.input_file", "sbg:x": 349.33335738711946, "sbg:y": 442.3333612812897 }, { "id": "#SBG_Untar_fasta", "inputs": [ { "id": "#SBG_Untar_fasta.input_tar_with_reference", "source": [ "#reference" ] } ], "outputs": [ { "id": "#SBG_Untar_fasta.output_fasta" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-untar-fasta/8", "label": "SBG Untar fasta", "description": "SBG Untar fasta outputs FA/FASTA/FA.GZ/FASTA.GZ from TAR.", "baseCommand": [ { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.input_tar_with_reference.path.split('/')[$job.inputs.input_tar_with_reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar'){\n return 'tar -xf ' + reference_file \n }\n else{\n return 'echo Passing input file '\n }\n}" } ], "inputs": [ { "required": true, "sbg:stageInput": "link", "type": [ "File" ], "label": "Input archive file with fasta", "description": "The input archive file to be unpacked.", "sbg:fileTypes": "TAR, FA, FASTA, FA.GZ, FASTA.GZ", "id": "#input_tar_with_reference" } ], "outputs": [ { "type": [ "File" ], "label": "Unpacked fasta file", "description": "Unpacked fasta file from the input archive.", "outputBinding": { "glob": "{*.fasta,*.fa,*.fa.gz,*.fasta.gz}", "sbg:inheritMetadataFrom": "#input_tar_with_reference" }, "id": "#output_fasta" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": 
"rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerImageId": "58b79c627f95", "dockerPull": "images.sbgenomics.com/markop/sbg-decompressor:1.0" } ], "sbg:job": { "inputs": { "input_tar_with_reference": { "class": "File", "path": "input_file.fasta", "secondaryFiles": [], "size": 0 } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "Other" ], "sbg:cmdPreview": "echo Passing input file", "sbg:contributors": [ "vladimirk", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1466002722, "sbg:homepage": "https://igor.sbgenomics.com/", "sbg:id": "admin/sbg-public-data/sbg-untar-fasta/8", "sbg:image_url": null, "sbg:latestRevision": 7, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466077480, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 7, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1466002722, "sbg:revision": 0 }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1466003214, "sbg:revision": 1 }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466004710, "sbg:revision": 2 }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466004986, "sbg:revision": 3 }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466005081, "sbg:revision": 4 }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466005599, "sbg:revision": 5 }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466007588, "sbg:revision": 6 }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466077480, "sbg:revision": 7 } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Vladimir Kovacevic, Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "v1.0", "sbg:validationErrors": [], "x": 293.3333436648053, "y": -95.57292397444452 }, "label": "SBG Untar fasta", "sbg:x": 293.3333436648053, "sbg:y": -95.57292397444452 }, { "id": "#BWA_INDEX", "inputs": [ { 
"id": "#BWA_INDEX.reference", "source": [ "#reference" ] } ], "outputs": [ { "id": "#BWA_INDEX.indexed_reference" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/bwa-index/32", "label": "BWA INDEX", "description": "BWA INDEX constructs the FM-index (Full-text index in Minute space) for the reference genome.\nThe generated index files are used by the BWA MEM, BWA ALN, BWA SAMPE and BWA SAMSE tools.\n\nIf the input reference file has a TAR extension, it is assumed to already contain the BWA indices, and BWA INDEX only passes that TAR to the output. Otherwise, the BWA indices are created and packed into a TAR file together with the reference.", "baseCommand": [ { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar'){\n return 'echo Index files passed without any processing!'\n }\n else{\n index_cmd = '/opt/bwa-0.7.13/bwa index '+ reference_file + ' '\n return index_cmd\n }\n}" } ], "inputs": [ { "sbg:category": "Configuration", "type": [ "null", "int" ], "label": "Total memory [GB]", "description": "Total memory [GB] to be reserved for the tool (Default value is 1.5 x size_of_the_reference).", "id": "#total_memory" }, { "required": true, "sbg:category": "File input", "sbg:stageInput": "link", "type": [ "File" ], "label": "Reference", "description": "Input reference FASTA, or a TAR file containing the reference and its indices.", "sbg:fileTypes": "FASTA, FA, FA.GZ, FASTA.GZ, TAR", "id": "#reference" }, { "sbg:category": "Configuration", "type": [ "null", "string" ], "label": "Prefix of the index to be output", "description": "Prefix of the index [same as fasta name].", "id": "#prefix_of_the_index_to_be_output" }, { "sbg:category": "Configuration", "sbg:toolDefaultValue": "auto", "type": [ "null", { "type": 
"enum", "symbols": [ "bwtsw", "is", "div" ], "name": "bwt_construction" } ], "inputBinding": { "position": 0, "prefix": "-a", "separate": true, "sbg:cmdInclude": true }, "label": "BWT construction", "description": "Algorithm for constructing the BWT index. Available options are:\n- is: linear-time algorithm for constructing the suffix array. It requires 5.37N memory, where N is the size of the database. IS is moderately fast, but does not work with databases larger than 2GB. IS is the default algorithm due to its simplicity. The current code for the IS algorithm was reimplemented by Yuta Mori.\n- bwtsw: algorithm implemented in BWT-SW. This method works with the whole human genome.\nWarning: `-a bwtsw' does not work for short genomes, while `-a is' and `-a div' do not work for long genomes.", "id": "#bwt_construction" }, { "sbg:category": "Configuration", "sbg:toolDefaultValue": "10000000", "type": [ "null", "int" ], "label": "Block size", "description": "Block size for the bwtsw algorithm (effective with -a bwtsw).", "id": "#block_size" }, { "sbg:category": "Configuration", "type": [ "null", "boolean" ], "label": "Output index files renamed by adding 64", "description": "Index files are named <in.fasta>.64.* instead of <in.fasta>.*.", "id": "#add_64_to_fasta_name" } ], "outputs": [ { "type": [ "null", "File" ], "label": "TARed fasta with its BWA indices", "description": "TARed fasta with its BWA indices.", "sbg:fileTypes": "TAR", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar'){\n return reference_file\n }\n else{\n return reference_file + '.tar'\n }\n}\n" }, "sbg:inheritMetadataFrom": "#reference" }, "id": "#indexed_reference", "fileTypes": "TAR" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": 
"DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n GB_1 = 1024*1024*1024\n reads_size = $job.inputs.reference.size\n\n if(!reads_size) { reads_size = GB_1 }\n \n if($job.inputs.total_memory){\n return $job.inputs.total_memory * 1024\n } else {\n return (parseInt(1.5 * reads_size / (1024*1024)))\n }\n}" } }, { "class": "DockerRequirement", "dockerImageId": "2f813371e803", "dockerPull": "images.sbgenomics.com/vladimirk/bwa:0.7.13" } ], "arguments": [ { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar' || !$job.inputs.bwt_construction){\n return ''\n } else {\n return '-a ' + $job.inputs.bwt_construction\n }\n}" } }, { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar' || !$job.inputs.prefix_of_the_index_to_be_output){\n return ''\n } else {\n return '-p ' + $job.inputs.prefix_of_the_index_to_be_output\n }\n}\n" } }, { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar' || !$job.inputs.block_size){\n return ''\n } else {\n return '-b ' + $job.inputs.block_size\n }\n}\n\n" } }, { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n 
reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar' || !$job.inputs.add_64_to_fasta_name){\n return ''\n } else {\n return '-6 '\n }\n}\n" } }, { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference.path.split('/')[$job.inputs.reference.path.split('/').length-1]\n ext = reference_file.split('.')[reference_file.split('.').length-1]\n if(ext=='tar'){\n return ''\n }\n else{\n tar_cmd = 'tar -cf ' + reference_file + '.tar ' + reference_file + ' *.amb' + ' *.ann' + ' *.bwt' + ' *.pac' + ' *.sa' \n return ' ; ' + tar_cmd\n }\n}" } } ], "sbg:job": { "inputs": { "total_memory": null, "reference": { "class": "File", "path": "/path/to/the/reference.fasta", "secondaryFiles": [ { "path": ".amb" }, { "path": ".ann" }, { "path": ".bwt" }, { "path": ".pac" }, { "path": ".sa" } ], "size": 0 }, "prefix_of_the_index_to_be_output": "prefix", "bwt_construction": "bwtsw", "block_size": 0, "add_64_to_fasta_name": true }, "allocatedResources": { "mem": 1536, "cpu": 1 } }, "sbg:categories": [ "Indexing", "FASTA-Processing" ], "sbg:cmdPreview": "/opt/bwa-0.7.13/bwa index reference.fasta -a bwtsw -6 ; tar -cf reference.fasta.tar reference.fasta *.amb *.ann *.bwt *.pac *.sa", "sbg:contributors": [ "vladimirk" ], "sbg:createdBy": "vladimirk", "sbg:createdOn": 1458658817, "sbg:id": "admin/sbg-public-data/bwa-index/32", "sbg:image_url": null, "sbg:latestRevision": 15, "sbg:license": "GNU Affero General Public License v3.0, MIT License", "sbg:links": [ { "id": "http://bio-bwa.sourceforge.net/", "label": "Homepage" }, { "id": "https://github.com/lh3/bwa", "label": "Source code" }, { "id": "http://bio-bwa.sourceforge.net/bwa.shtml", "label": "Wiki" }, { "id": "http://sourceforge.net/projects/bio-bwa/", "label": "Download" }, { "id": 
"http://www.ncbi.nlm.nih.gov/pubmed/19451168", "label": "Publication" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1469449858, "sbg:project": "vladimirk/bwa-mem-bundle-0-7-13-demo", "sbg:revision": 15, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458658817, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458658836, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458745340, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1460643813, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1462801955, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465227109, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465231882, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465990497, "sbg:revision": 7, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465992672, "sbg:revision": 8, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465993183, "sbg:revision": 9, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465994793, "sbg:revision": 10, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466070064, "sbg:revision": 11, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466071727, "sbg:revision": 12, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466072504, "sbg:revision": 13, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466077580, "sbg:revision": 14, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1469449858, "sbg:revision": 15, 
"sbg:revisionNotes": null } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Heng Li", "sbg:toolkit": "BWA", "sbg:toolkitVersion": "0.7.13", "sbg:validationErrors": [], "x": 580.3333612283075, "y": 28.333334326744097 }, "label": "BWA INDEX", "sbg:x": 580.3333612283075, "sbg:y": 28.333334326744097 }, { "id": "#SnpEff", "inputs": [ { "id": "#SnpEff.variants_file", "source": [ "#GATK_ApplyRecalibration_1.vcf" ] }, { "id": "#SnpEff.total_memory", "default": 10 }, { "id": "#SnpEff.threads", "default": true }, { "id": "#SnpEff.output_format", "default": "vcf" }, { "id": "#SnpEff.database", "source": [ "#database_1" ] } ], "outputs": [ { "id": "#SnpEff.summary_text" }, { "id": "#SnpEff.summary" }, { "id": "#SnpEff.annotated" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/snpeff-4-2/36", "label": "SnpEff", "description": "SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of variants on genes, such as amino acid changes.\n\nTypical usage assumes the user chooses inputs that are predicted variants (SNPs, insertions, deletions, and MNPs). This input file is usually the result of a sequencing experiment, and it is usually in variant call format (VCF). SnpEff analyzes the input variants and, in the process, it annotates the variants and calculates the effects they produce on known genes (e.g. amino acid changes). The output file can be in several file formats. The most common format is VCF.\n\nThere is also a custom parameter that sets the Java heap size (-Xmx) in GB; if it is not set, 4 GB is used.\nCommon issues:\n- The name of the SnpEff database file must exactly match the reference it was built for (e.g. 
GRCh37.75.zip, hg19.zip).", "baseCommand": [ { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n return 'unzip -o ' + $job.inputs.database.path + ' -d /opt/snpEff ;'\n}" }, "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n //java -Xmx4g path/to/snpEff/snpEff.jar -c path/to/snpEff/snpEff.config GRCh37.75 path/to/snps.vcf\n if($job.inputs.total_memory){\n mem_mb = parseInt($job.inputs.total_memory) * 1024\n \treturn '-Xmx'.concat(mem_mb, 'M')\n }\n \treturn '-Xmx4096M'\n}\n\n" }, "-jar", "/opt/snpEff/snpEff.jar" ], "inputs": [ { "required": true, "sbg:category": "File type inputs", "type": [ "File" ], "inputBinding": { "position": 2001, "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input variants file", "description": "Input variants file.", "sbg:fileTypes": "VCF, TXT, PILEUP, BED", "id": "#variants_file" }, { "sbg:category": "Database options", "sbg:stageInput": null, "type": [ "null", "int" ], "inputBinding": { "position": 490, "prefix": "-upDownStreamLen", "separate": true, "sbg:cmdInclude": true }, "label": "Upstream/downstream interval length", "description": "Set upstream/downstream interval length (in bases).", "id": "#up_down_stream_len" }, { "sbg:category": "Other input types", "sbg:toolDefaultValue": "4", "type": [ "null", "int" ], "label": "Java memory (Xmx) requirement [GB]", "description": "RAM requirement for the Java process execution [GB].", "id": "#total_memory" }, { "sbg:category": "Other input types", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 280, "prefix": "-t", "separate": true, "sbg:cmdInclude": true }, "label": "Use multiple threads (implies '-noStats')", "description": "Use multiple threads (implies '-noStats'). 
Default: false.", "id": "#threads" }, { "sbg:category": "Database options", "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 480, "prefix": "-strict", "separate": true, "sbg:cmdInclude": true }, "label": "Only use validated transcripts", "description": "Only use 'validated' transcripts (i.e. the sequence has been checked). Default: false.", "id": "#strict" }, { "sbg:altPrefix": "-s", "sbg:category": "Other input types", "sbg:toolDefaultValue": "snpEff_summary.html", "type": [ "null", "string" ], "inputBinding": { "position": 60, "prefix": "-stats", "separate": true, "sbg:cmdInclude": true }, "label": "Name of stats file (summary)", "description": "Name of stats file (summary).", "id": "#stats" }, { "sbg:altPrefix": "-ss", "sbg:category": "Other input types", "sbg:toolDefaultValue": "2", "type": [ "null", "int" ], "inputBinding": { "position": 430, "prefix": "--spliceSiteSize", "separate": true, "sbg:cmdInclude": true }, "label": "Set size for splice sites (donor and acceptor) in bases", "description": "Set size for splice sites (donor and acceptor) in bases.", "id": "#splicesitesize" }, { "sbg:category": "Database options", "sbg:toolDefaultValue": "3", "type": [ "null", "int" ], "inputBinding": { "position": 440, "prefix": "-spliceRegionExonSize", "separate": true, "sbg:cmdInclude": true }, "label": "Set size for splice site region within exons", "description": "Set size for splice site region within exons. Default: 3 bases.", "id": "#splice_region_exons_size" }, { "sbg:category": "Annotations options", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 250, "prefix": "-sequenceOntology", "separate": true, "sbg:cmdInclude": true }, "label": "Use Sequence Ontology terms", "description": "Use Sequence Ontology terms. 
Default: false.", "id": "#sequenceontology" }, { "sbg:category": "Other input types", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 420, "prefix": "-reg", "separate": true, "sbg:cmdInclude": true }, "label": "Regulation track to use (this option can be used several times)", "description": "Regulation track to use (this option can be used several times).", "id": "#reg" }, { "sbg:category": "Other input types", "sbg:toolDefaultValue": "vcf", "type": [ "null", { "type": "enum", "symbols": [ "txt", "vcf", "gatk", "bed", "bedAnn" ], "name": "output_format" } ], "inputBinding": { "position": 50, "prefix": "-o", "separate": true, "sbg:cmdInclude": true }, "label": "Output format", "description": "Output format. Possible values: {txt, vcf, gatk, bed, bedAnn}.", "id": "#output_format" }, { "sbg:category": "Other input types", "type": [ "null", "boolean" ], "inputBinding": { "position": 400, "prefix": "--onlyReg", "separate": true, "sbg:cmdInclude": true }, "label": "Only use regulation tracks", "description": "Only use regulation tracks.", "id": "#onlyreg" }, { "sbg:category": "Database options", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 410, "prefix": "-onlyProtein", "separate": true, "sbg:cmdInclude": true }, "label": "Only protein", "description": "Only use protein-coding transcripts. Default: false.", "id": "#only_protein" }, { "sbg:category": "Annotations options", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 240, "prefix": "-oicr", "separate": true, "sbg:cmdInclude": true }, "label": "Add OICR tag in VCF file", "description": "Add OICR tag in VCF file. 
Default: false.", "id": "#oicr" }, { "sbg:category": "Other input types", "type": [ "null", "boolean" ], "inputBinding": { "position": 70, "prefix": "-noStats", "separate": true, "sbg:cmdInclude": true }, "label": "Do not create stats (summary) file", "description": "Do not create stats (summary) file.", "id": "#nostats" }, { "sbg:category": "Results filter options", "type": [ "null", "boolean" ], "inputBinding": { "position": 130, "prefix": "-no-utr", "separate": true, "sbg:cmdInclude": true }, "label": "Do not show 5_PRIME_UTR or 3_PRIME_UTR changes", "description": "Do not show 5_PRIME_UTR or 3_PRIME_UTR changes.", "id": "#no_utr" }, { "sbg:category": "Results filter options", "type": [ "null", "boolean" ], "inputBinding": { "position": 120, "prefix": "-no-upstream", "separate": true, "sbg:cmdInclude": true }, "label": "Do not show UPSTREAM changes", "description": "Do not show UPSTREAM changes.", "id": "#no_upstream" }, { "sbg:category": "Annotations options", "sbg:stageInput": null, "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 230, "prefix": "-noShiftHgvs", "separate": true, "sbg:cmdInclude": true }, "label": "Do not shift variants according to HGVS", "description": "Do not shift variants according to HGVS notation (most 3prime end).", "id": "#no_shift_hgvs" }, { "sbg:category": "Database options", "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 390, "prefix": "-noNextProt", "separate": true, "sbg:cmdInclude": true }, "label": "Disable NextProt annotations", "description": "Disable NextProt annotations.", "id": "#no_next_prot" }, { "sbg:category": "Database options", "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 380, "prefix": "-noMotif", "separate": true, "sbg:cmdInclude": true }, "label": "Disable motif annotations", "description": "Disable motif annotations.", "id": "#no_motif" }, { "sbg:category": "Annotations options", 
"sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 220, "prefix": "-noLof", "separate": true, "sbg:cmdInclude": true }, "label": "Do not add LOF and NMD annotations", "description": "Do not add LOF and NMD annotations.", "id": "#no_lof" }, { "sbg:category": "Results filter options", "type": [ "null", "boolean" ], "inputBinding": { "position": 110, "prefix": "-no-intron", "separate": true, "sbg:cmdInclude": true }, "label": "Do not show INTRON changes", "description": "Do not show INTRON changes.", "id": "#no_intron" }, { "sbg:category": "Results filter options", "type": [ "null", "boolean" ], "inputBinding": { "position": 100, "prefix": "-no-intergenic", "separate": true, "sbg:cmdInclude": true }, "label": "Do not show INTERGENIC changes", "description": "Do not show INTERGENIC changes.", "id": "#no_intergenic" }, { "sbg:category": "Database options", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 370, "prefix": "-noInteraction", "separate": true, "sbg:cmdInclude": true }, "label": "Disable interaction annotations", "description": "Disable interaction annotations. Default: false.", "id": "#no_interaction" }, { "sbg:category": "Annotations options", "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 210, "prefix": "-noHgvs", "separate": true, "sbg:cmdInclude": true }, "label": "Do not add HGVS annotations", "description": "Do not add HGVS annotations.", "id": "#no_hgvs" }, { "sbg:category": "Database options", "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 360, "prefix": "-noGenome", "separate": true, "sbg:cmdInclude": true }, "label": "Do not load any genomic database", "description": "Do not load any genomic database (e.g. 
annotate using custom files).", "id": "#no_genome" }, { "sbg:category": "Results filter options", "type": [ "null", "boolean" ], "inputBinding": { "position": 90, "prefix": "-no-downstream", "separate": true, "sbg:cmdInclude": true }, "label": "Do not show DOWNSTREAM changes", "description": "Do not show DOWNSTREAM changes.", "id": "#no_downstream" }, { "sbg:category": "Other input types", "type": [ "null", "boolean" ], "inputBinding": { "position": 350, "prefix": "-nextProt", "separate": true, "sbg:cmdInclude": true }, "label": "Annotate using NextProt (requires NextProt database)", "description": "Annotate using NextProt (requires NextProt database).", "id": "#nextprot" }, { "sbg:category": "Other input types", "type": [ "null", "boolean" ], "inputBinding": { "position": 340, "prefix": "--motif", "separate": true, "sbg:cmdInclude": true }, "label": "Annotate using motifs (requires Motif database)", "description": "Annotate using motifs (requires Motif database).", "id": "#motif" }, { "sbg:category": "Database options", "sbg:stageInput": null, "type": [ "null", "int" ], "inputBinding": { "position": 330, "prefix": "-maxTSL", "separate": true, "sbg:cmdInclude": true }, "label": "Max TSL", "description": "Only use transcripts having a Transcript Support Level lower than the specified value.", "id": "#max_tsl" }, { "required": false, "sbg:category": "Other input types", "sbg:stageInput": "link", "type": [ "null", "File" ], "inputBinding": { "position": 320, "prefix": "-interval", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Use custom intervals from a TXT/BED/BigBed/VCF/GFF file (you may use this option many times)", "description": "Use custom intervals from a TXT/BED/BigBed/VCF/GFF file (you may use this option many times).", "sbg:fileTypes": "interval", "id": "#interval" }, { "sbg:category": "Database options", "sbg:toolDefaultValue": "TRUE", "type": [ "null", "boolean" ], "inputBinding": { "position": 310, "prefix": "-interaction", "separate": true, 
"sbg:cmdInclude": true }, "label": "Annotate using interactions", "description": "Annotate using interactions (requires the interaction database). Default: true.", "id": "#interaction" }, { "sbg:category": "Other input types", "sbg:toolDefaultValue": "vcf", "type": [ "null", { "type": "enum", "symbols": [ "vcf", "txt", "pileup", "bed" ], "name": "input_format" } ], "inputBinding": { "position": 40, "prefix": "-i", "separate": true, "sbg:cmdInclude": true }, "label": "Input format", "description": "Input format. Possible values: {vcf, txt, pileup, bed}. [Default: vcf].", "id": "#input_format" }, { "sbg:category": "Configuration", "type": [ "null", "boolean" ], "inputBinding": { "position": 190, "prefix": "-hgvsTrId", "separate": true, "sbg:cmdInclude": true }, "label": "Use transcript ID in HGVS", "description": "Use transcript ID in HGVS notation. Default: false.", "id": "#hgvs_tr_id" }, { "sbg:category": "Annotations options", "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 180, "prefix": "-hgvs1LetterAa", "separate": true, "sbg:cmdInclude": true }, "label": "Use one-letter amino acid codes in HGVS", "description": "Use one-letter amino acid codes in HGVS notation. Default: false.", "id": "#hgvs_1_letter" }, { "sbg:category": "Annotations options", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 170, "prefix": "-geneId", "separate": true, "sbg:cmdInclude": true }, "label": "Use gene ID instead of gene name (VCF output)", "description": "Use gene ID instead of gene name (VCF output). 
Default: false.", "id": "#geneid" }, { "sbg:category": "Annotations options", "type": [ "null", "boolean" ], "inputBinding": { "position": 160, "prefix": "-formatEff", "separate": true, "sbg:cmdInclude": true }, "label": "Use EFF field", "description": "Use 'EFF' field compatible with older versions (instead of 'ANN').", "id": "#format_eff" }, { "required": false, "sbg:altPrefix": "-fi", "sbg:category": "Results filter options", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 80, "prefix": "--filterInterval", "separate": true, "sbg:cmdInclude": true }, "label": "Only analyze changes that intersect with the intervals specified in this file (you may use this option many times)", "description": "Only analyze changes that intersect with the intervals specified in this file.", "sbg:fileTypes": "interval", "id": "#filterinterval" }, { "required": true, "sbg:category": "File type inputs", "type": [ "File" ], "inputBinding": { "position": 2000, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n input_database = $job.inputs.database\n full_name = input_database.path.split('/')[input_database.path.split('/').length-1] \n name = full_name.slice(0, -4) // Cut .zip extension\n return name\n}" }, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "SnpEff database file", "description": "The SnpEff database file is a ZIP archive that can be downloaded from the official SnpEff site or with the SnpEff download app.", "sbg:fileTypes": "ZIP", "id": "#database" }, { "sbg:category": "Other input types", "type": [ "null", "boolean" ], "inputBinding": { "position": 20, "prefix": "-csvStats", "separate": true, "sbg:cmdInclude": true }, "label": "Create CSV summary file instead of HTML", "description": "Create CSV summary file instead of HTML.", "id": "#csvstats" }, { "required": false, "sbg:category": "Generic options", "sbg:stageInput": "link", "type": [ "null", "File" ], "inputBinding": { 
"position": 260, "prefix": "-c", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Configuration file", "description": "Specify the configuration file.", "sbg:fileTypes": "config", "id": "#configuration_file" }, { "sbg:category": "General options", "type": [ "null", "boolean" ], "inputBinding": { "position": 10, "prefix": "--classic", "separate": true, "sbg:cmdInclude": true }, "label": "Use old-style annotations", "description": "Use old-style annotations instead of Sequence Ontology and HGVS.", "id": "#classic" }, { "sbg:category": "Database options", "type": [ "null", "boolean" ], "inputBinding": { "position": 300, "prefix": "-canon", "separate": true, "sbg:cmdInclude": true }, "label": "Only use canonical transcripts", "description": "Only use canonical transcripts.", "id": "#canon" }, { "required": false, "sbg:category": "Annotations options", "type": [ "null", "File" ], "inputBinding": { "position": 150, "prefix": "-cancerSamples", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Two-column TXT file defining 'original and derived' samples", "description": "Two-column TXT file defining 'original \\t derived' samples.", "sbg:fileTypes": "TXT", "id": "#cancersamples" }, { "sbg:category": "Annotations options", "sbg:toolDefaultValue": "FALSE", "type": [ "null", "boolean" ], "inputBinding": { "position": 140, "prefix": "-cancer", "separate": true, "sbg:cmdInclude": true }, "label": "Perform 'cancer' comparisons (Somatic vs Germline)", "description": "Perform 'cancer' comparisons (Somatic vs Germline).", "id": "#cancer" }, { "sbg:category": "Other input types", "type": [ "null", "boolean" ], "inputBinding": { "position": 200, "prefix": "-lof", "separate": true, "sbg:cmdInclude": true }, "label": "Add loss-of-function (LOF) and nonsense-mediated decay (NMD) tags", "description": "Add loss-of-function (LOF) and nonsense-mediated decay (NMD) tags.", "id": "#add_lof_tag" }, { "sbg:category": "Annotations options", 
"sbg:toolDefaultValue": "TRUE", "type": [ "null", "boolean" ], "inputBinding": { "position": 170, "prefix": "--hgvs", "separate": true, "sbg:cmdInclude": true }, "label": "Use HGVS annotations for amino acid sub-field", "description": "Use HGVS annotations for amino acid sub-field. Default: true.", "id": "#add_hgvs_anno" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Summary", "description": "SnpEff Summary in text format.", "sbg:fileTypes": "TXT", "outputBinding": { "glob": "*.txt", "sbg:inheritMetadataFrom": "#variants_file" }, "id": "#summary_text" }, { "type": [ "null", "File" ], "label": "Summary file", "description": "SnpEff summary file in HTML or CSV file format.", "sbg:fileTypes": "HTML, CSV", "outputBinding": { "glob": "*.html", "sbg:inheritMetadataFrom": "#variants_file" }, "id": "#summary" }, { "type": [ "null", "File" ], "label": "SnpEff Annotated file", "description": "SnpEff Annotated file.", "sbg:fileTypes": "VCF, TXT, GATK, BED, BEDANN", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\t\n filename = $job.inputs.variants_file.path\n basename = filename.split('.').slice(0, filename.split('.').length-1).join('.').replace(/^.*[\\\\\\/]/, '')\n \n \n if($job.inputs.output_format === \"txt\")\n {\n return basename.concat(\".snpEff_annotated.txt\")\n }\n else if ($job.inputs.output_format === \"bed\" || $job.inputs.output_format === \"bedAnn\")\n {\n return basename.concat(\".snpEff_annotated.bed\")\n }\n else\n {\n return basename.concat(\".snpEff_annotated.vcf\")\n }\n}" }, "sbg:inheritMetadataFrom": "#variants_file" }, "id": "#annotated" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if 
($job.inputs.total_memory)\n {\n return $job.inputs.total_memory * 1024\n }\n \n else\n {\n return 4096\n }\n}" } }, { "class": "DockerRequirement", "dockerImageId": "aae3dcb89b53", "dockerPull": "images.sbgenomics.com/vladimirk/snpeff:4.2" } ], "arguments": [ { "position": 5000, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\t\n filename = $job.inputs.variants_file.path\n basename = filename.split('.').slice(0, filename.split('.').length-1).join('.').replace(/^.*[\\\\\\/]/, '')\n \n \n if($job.inputs.output_format === \"txt\")\n {\n name = basename.concat(\".snpEff_annotated.txt\")\n }\n else if ($job.inputs.output_format === \"bed\" || $job.inputs.output_format === \"bedAnn\")\n {\n name = basename.concat(\".snpEff_annotated.bed\")\n }\n else\n {\n name = basename.concat(\".snpEff_annotated.vcf\")\n }\n return '> ' + name\n}\n" } }, { "position": 0, "separate": true, "valueFrom": "-nodownload" }, { "position": 0, "separate": true, "valueFrom": "-noLog" } ], "sbg:job": { "inputs": { "variants_file": { "class": "File", "path": "path/to/variance/varinats_file.vcf", "secondaryFiles": [], "size": 0 }, "up_down_stream_len": null, "total_memory": 3, "threads": true, "strict": false, "stats": "", "splicesitesize": null, "splice_region_exons_size": null, "sequenceontology": null, "reg": [], "output_format": "vcf", "onlyreg": null, "only_protein": false, "oicr": false, "nostats": false, "no_utr": null, "no_upstream": null, "no_shift_hgvs": false, "no_next_prot": false, "no_motif": false, "no_lof": false, "no_intron": null, "no_intergenic": null, "no_interaction": false, "no_hgvs": false, "no_genome": false, "no_downstream": null, "nextprot": null, "motif": null, "max_tsl": null, "interval": { "class": "File", "path": "/path/to/interval.ext", "secondaryFiles": [], "size": 0 }, "interaction": false, "input_format": "vcf", "hgvs_tr_id": false, "hgvs_1_letter": false, "geneid": null, "format_eff": false, "filterinterval": [], 
"database": { "metadata": {}, "path": "/path/to/database/GRCh37.75.zip", "secondaryFiles": [] }, "csvstats": false, "configuration_file": { "class": "File", "path": null, "secondaryFiles": [], "size": 0 }, "classic": false, "canon": null, "cancersamples": null, "cancer": null, "add_lof_tag": null, "add_hgvs_anno": null }, "allocatedResources": { "mem": 3072, "cpu": 1 } }, "sbg:categories": [ "Annotation", "VCF-Processing" ], "sbg:cmdPreview": "unzip -o /path/to/database/GRCh37.75.zip -d /opt/snpEff ; java -Xmx3072M -jar /opt/snpEff/snpEff.jar -nodownload -noLog GRCh37.75 path/to/variance/varinats_file.vcf > varinats_file.snpEff_annotated.vcf", "sbg:contributors": [ "vladimirk" ], "sbg:createdBy": "vladimirk", "sbg:createdOn": 1459258963, "sbg:id": "admin/sbg-public-data/snpeff-4-2/36", "sbg:image_url": null, "sbg:latestRevision": 23, "sbg:license": "GNU Lesser General Public License v3.0 only", "sbg:links": [ { "id": "http://snpeff.sourceforge.net/index.html", "label": "Homepage" }, { "id": "https://github.com/pcingola/SnpEff", "label": "Source Code" }, { "id": "http://snpeff.sourceforge.net/SnpEff_manual.html", "label": "Wiki" }, { "id": "http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip", "label": "Download" }, { "id": "http://snpeff.sourceforge.net/SnpEff_paper.pdf", "label": "Publication" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472212308, "sbg:project": "vladimirk/snpeff-4-2-demo", "sbg:revision": 23, "sbg:revisionNotes": "typos", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459258963, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459268644, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459334075, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459344734, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", 
"sbg:modifiedOn": 1459346778, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459349594, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459350669, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459352131, "sbg:revision": 7, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1459353590, "sbg:revision": 8, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1460986857, "sbg:revision": 9, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1460989537, "sbg:revision": 10, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1460994055, "sbg:revision": 11, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461079628, "sbg:revision": 12, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461145387, "sbg:revision": 13, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461146419, "sbg:revision": 14, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461857375, "sbg:revision": 15, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461857546, "sbg:revision": 16, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1464273953, "sbg:revision": 17, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1464279085, "sbg:revision": 18, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1464625627, "sbg:revision": 19, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471007312, "sbg:revision": 20, "sbg:revisionNotes": "typos" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472209344, "sbg:revision": 21, "sbg:revisionNotes": "Peer-review comments 
implemented." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472209772, "sbg:revision": 22, "sbg:revisionNotes": "Peer-review comments and typos implemented." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472212308, "sbg:revision": 23, "sbg:revisionNotes": "typos" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Pablo Cingolani/Broad Institute", "sbg:toolkit": "SnpEff", "sbg:toolkitVersion": "4.2", "sbg:validationErrors": [], "x": 3451.667128947061, "y": 269.4271676577653 }, "label": "SnpEff", "sbg:x": 3451.667128947061, "sbg:y": 269.4271676577653 }, { "id": "#Sambamba_View", "inputs": [ { "id": "#Sambamba_View.output", "default": "bam" }, { "id": "#Sambamba_View.nthreads", "default": 7 }, { "id": "#Sambamba_View.input", "source": [ "#BWA_MEM_Bundle_0_7_13.aligned_reads" ] }, { "id": "#Sambamba_View.filter", "default": "unmapped and mate_is_unmapped" } ], "outputs": [ { "id": "#Sambamba_View.filtered" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sambamba-view-0-5-9/9", "label": "Sambamba View", "description": "Sambamba View efficiently filters a BAM file for alignments satisfying various conditions. It also accesses its SAM header and information about reference sequences. A JSON output is provided to make this data readily available for use with Perl, Python, and Ruby scripts.\n\nBy default, the tool expects a BAM file as an input. In order to work with a SAM file as an input, specify the --sam-input command-line option. The tool does NOT automatically detect file format from its extension. Beware that when reading SAM, the tool will skip tags which don't conform to the SAM/BAM specification and set invalid fields to their default values. 
However, only syntax is checked, use --valid for full validation.", "baseCommand": [ "/opt/sambamba_0.5.9/sambamba_v0.5.9", "view" ], "inputs": [ { "sbg:altPrefix": "h", "sbg:category": "Execution", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--with-header", "separate": true, "sbg:cmdInclude": true }, "label": "With header", "description": "Print header before reads (always done for BAM output).", "id": "#with_header" }, { "sbg:category": "Execution", "type": [ "null", "boolean" ], "label": "Valid", "description": "Output only valid reads.", "id": "#valid" }, { "sbg:category": "Execution", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--subsampling-seed", "separate": true, "sbg:cmdInclude": true }, "label": "Subsampling seed", "description": "Set seed for subsampling.", "id": "#subsampling_seed" }, { "sbg:altPrefix": "s", "sbg:category": "Execution", "sbg:stageInput": null, "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--subsample=", "separate": true, "sbg:cmdInclude": true }, "label": "Subsample", "description": "Subsample reads (read pairs).", "id": "#subsample" }, { "sbg:altPrefix": "S", "sbg:category": "Execution", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--sam-input", "separate": true, "sbg:cmdInclude": true }, "label": "SAM input", "description": "Specify that input is in SAM format.", "id": "#sam_input" }, { "sbg:category": "Execution", "sbg:stageInput": null, "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "Number of threads reserved on the instance", "description": "Number of threads reserved on the instance passed to the scheduler (number of jobs).", "id": "#reserved_threads" }, { "required": false, "sbg:altPrefix": "L", "sbg:category": "File input.", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--regions=", "separate": false, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Regions", 
"description": "Output only reads overlapping one of regions from the BED file.", "sbg:fileTypes": "BED", "id": "#regions" }, { "required": false, "sbg:altPrefix": "T", "sbg:category": "Execution", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--ref-filename=", "separate": false, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference", "description": "Specify reference for writing CRAM.", "sbg:fileTypes": "FASTA, FA", "id": "#ref_filename" }, { "sbg:altPrefix": "-f", "sbg:category": "Execution", "type": [ { "type": "enum", "symbols": [ "sam", "bam", "cram", "json" ], "name": "output" } ], "inputBinding": { "position": 1, "prefix": "--format=", "separate": false, "sbg:cmdInclude": true }, "label": "Output format", "description": "Specify which format to use for output (default is SAM).", "id": "#output" }, { "sbg:altPrefix": "-t", "sbg:category": "Execution", "sbg:toolDefaultValue": "8", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--nthreads=", "separate": false, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.nthreads)\n return $job.inputs.nthreads\n else\n return 8\n}" }, "sbg:cmdInclude": true }, "label": "Number of threads", "description": "Number of threads to use.", "id": "#nthreads" }, { "sbg:category": "Execution", "sbg:stageInput": null, "sbg:toolDefaultValue": "1024", "type": [ "null", "int" ], "label": "Memory in MB", "description": "Memory in MB.", "id": "#mem_mb" }, { "required": true, "sbg:category": "Inputs", "type": [ "File" ], "inputBinding": { "position": 2, "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input", "description": "BAM or SAM file.", "sbg:fileTypes": "BAM, SAM", "id": "#input" }, { "sbg:altPrefix": "-F", "sbg:category": "Basic Options", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--filter", "separate": true, "itemSeparator": " ", "valueFrom": 
{ "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.filter)\n {\n \treturn '\"'.concat($job.inputs.filter, '\"')\n }\n}" }, "sbg:cmdInclude": true }, "label": "Filter", "description": "Set custom filter for alignments.", "id": "#filter" }, { "sbg:category": "Execution", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--cram-input", "separate": true, "sbg:cmdInclude": true }, "label": "CRAM input", "description": "Specify that input is in CRAM format.", "id": "#cram_input" }, { "sbg:altPrefix": "c", "sbg:category": "Execution", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--count", "separate": true, "sbg:cmdInclude": true }, "label": "Count", "description": "Output to stdout only count of matching records, hHI are ignored.", "id": "#count" }, { "sbg:altPrefix": "l", "sbg:category": "Execution", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--compression-level", "separate": true, "sbg:cmdInclude": true }, "label": "Compression level", "description": "Specify compression level (from 0 to 9, works only for BAM output).", "id": "#compression_level" } ], "outputs": [ { "type": [ "null", "File" ], "label": "BAM file", "description": "Bam file.", "sbg:fileTypes": "BAM, SAM, JSON, MSGPACK", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n fnameRegex = /^(.*?)(?:\\.([^.]+))?$/;\n file_path = $job.inputs.input.path;\n base_name = fnameRegex.exec(file_path)[1];\n file_name = base_name.replace(/^.*[\\\\\\/]/, '');\n \n if ($job.inputs.output == 'sam'){\n \treturn file_name + '.filtered.sam'\n }\n else if ($job.inputs.output == 'bam'){\n \treturn file_name.concat('.filtered.bam')\n }\n else if ($job.inputs.output == 'json'){\n \treturn file_name.concat('.filtered.json')\n }\n else if ($job.inputs.output == 'msgpack'){\n \treturn file_name.concat('.filtered.msgpack')\n }\n else\t{\n \treturn file_name + '.filtered.sam'\n 
}\n}" }, "sbg:inheritMetadataFrom": "#input" }, "id": "#filtered" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.reserved_threads) {\n \n return $job.inputs.reserved_threads\n \n } else if ($job.inputs.nthreads) {\n \n return $job.inputs.nthreads\n \n } else {\n \n return 1\n }\n \n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.mem_mb) {\n \n return $job.inputs.mem_mb\n \n } else {\n \n return 1024\n \n }\n \n}" } }, { "class": "DockerRequirement", "dockerImageId": "59e577b13d5d", "dockerPull": "images.sbgenomics.com/mladenlsbg/sambamba:0.5.9" } ], "arguments": [ { "position": 3, "prefix": "-o", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n fnameRegex = /^(.*?)(?:\\.([^.]+))?$/;\n if ($job.inputs.input) \n {\n \tfile_path = $job.inputs.input.path;\n \tbase_name = fnameRegex.exec(file_path)[1];\n \tfile_name = base_name.replace(/^.*[\\\\\\/]/, '');\n \n if ($job.inputs.output == 'sam'){\n \treturn file_name + '.filtered.sam'\n }\n else if ($job.inputs.output == 'bam'){\n \treturn file_name.concat('.filtered.bam')\n }\n else if ($job.inputs.output == 'json'){\n \treturn file_name.concat('.filtered.json')\n }\n else if ($job.inputs.output == 'msgpack'){\n \treturn file_name.concat('.filtered.msgpack')\n }\n else\t{\n \treturn file_name + '.filtered.sam'\n }\n }\n}" } } ], "sbg:job": { "inputs": { "with_header": null, "valid": null, "subsampling_seed": null, "subsample": 9.236016917973757, "sam_input": null, "reserved_threads": 8, "regions": null, "ref_filename": null, "output": "bam", "nthreads": null, "mem_mb": 7, "input": { "path": "/root/dir/example.bam" }, 
"filter": "unmapped", "cram_input": null, "count": null, "compression_level": null }, "allocatedResources": { "mem": 7, "cpu": 1 } }, "sbg:categories": [ "SAM/BAM-Processing" ], "sbg:cmdPreview": "/opt/sambamba_0.5.9/sambamba_v0.5.9 view --format=bam /root/dir/example.bam -o example.filtered.bam", "sbg:contributors": [ "vladimirk", "bix-demo", "ognjenm" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911559, "sbg:id": "admin/sbg-public-data/sambamba-view-0-5-9/9", "sbg:image_url": null, "sbg:latestRevision": 9, "sbg:license": "GNU General Public License v2.0 only", "sbg:links": [ { "id": "http://lomereiter.github.io/sambamba/docs/sambamba-view.html", "label": "Homepage" }, { "id": "https://github.com/lomereiter/sambamba", "label": "Source code" }, { "id": "https://github.com/lomereiter/sambamba/wiki", "label": "Wiki" }, { "id": "https://github.com/lomereiter/sambamba/releases/tag/v0.5.9", "label": "Download" }, { "id": "http://lomereiter.github.io/sambamba/docs/sambamba-view.html", "label": "Publication" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476709202, "sbg:project": "bix-demo/sambamba-0-5-9-demo", "sbg:revision": 9, "sbg:revisionNotes": "Added reserved number of threads.", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911559, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911560, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911561, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911561, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470050578, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470050707, "sbg:revision": 5, "sbg:revisionNotes": "Added reference file type" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470050762, "sbg:revision": 6, 
"sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472475927, "sbg:revision": 7, "sbg:revisionNotes": "\"subsample\" type set to float." }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1475064551, "sbg:revision": 8, "sbg:revisionNotes": "Added resource parameters" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476709202, "sbg:revision": 9, "sbg:revisionNotes": "Added reserved number of threads." } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Artem Tarasov", "sbg:toolkit": "Sambamba", "sbg:toolkitVersion": "0.5.9", "sbg:validationErrors": [], "x": 1760.0002332263598, "y": -346.66673050986617 }, "label": "Sambamba View", "scatter": "#Sambamba_View.input", "sbg:x": 1760.0002332263598, "sbg:y": -346.66673050986617 }, { "id": "#Sambamba_Merge", "inputs": [ { "id": "#Sambamba_Merge.reserved_threads", "default": 7 }, { "id": "#Sambamba_Merge.num_of_threads", "default": 7 }, { "id": "#Sambamba_Merge.bams", "source": [ "#GATK_PrintReads.recalibrated_bam", "#Sambamba_View.filtered" ] } ], "outputs": [ { "id": "#Sambamba_Merge.merged_bam" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sambamba-merge-0-5-9/18", "label": "Sambamba Merge", "description": "Sambamba Merge is used for merging several sorted BAM files into one. 
The sorting order of all the files must be the same, and it is maintained in the output file.", "baseCommand": [ { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.bams instanceof Array) { // VK\n if ($job.inputs.bams[0] instanceof Array) {\n \n // Support for input received as list of one-element-lists \n in_var = []\n for (i=0;i<$job.inputs.bams.length;i++) \n \t\tin_var = in_var.concat($job.inputs.bams[i]);\n \n } else {\n in_var = [].concat($job.inputs.bams)\n }\n \n \n } else {\n in_var = [].concat($job.inputs.bams)\n }\n comm=''\n if(in_var instanceof Array) // Always true\n {\n if(in_var.length==1)\n {\n comm+='cp '\n \n }\n \telse if(in_var.length>1)\n \t{\n \n \tcomm+='/opt/sambamba_0.5.9/sambamba_v0.5.9 merge '\n \tif($job.inputs.num_of_threads)\n \t\t{\n \t\tcomm+=' -t '\n \t\tcomm+=$job.inputs.num_of_threads\n \t\t}\n \t\tif($job.inputs.compression_level)\n \t\t{\n \t\t\tcomm+=' -l '\n \t\tcomm+=$job.inputs.compression_level\n \t\t}\n \t\t\n }\n \n \t\n\n }\n return comm\n}" } ], "inputs": [ { "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "Number of threads reserved on the instance", "description": "Number of threads reserved on the instance passed to the scheduler (number of jobs).", "id": "#reserved_threads" }, { "sbg:category": "Merge", "type": [ "null", "int" ], "label": "Number of threads to use", "description": "Number of threads to use for compression/decompression.", "id": "#num_of_threads" }, { "sbg:category": "Execution", "sbg:stageInput": null, "sbg:toolDefaultValue": "1024", "type": [ "null", "int" ], "label": "Memory in MB", "description": "Memory in MB.", "id": "#mem_mb" }, { "sbg:category": "Merge", "type": [ "null", "int" ], "label": "Compression level", "description": "Level of compression for merged BAM file, number from 0 to 9.", "id": "#compression_level" }, { "required": true, "sbg:category": "Merge", "type": [ { "type": "array", "items": "File" } ], 
"inputBinding": { "position": 5, "separate": true, "sbg:cmdInclude": true }, "label": "BAM files", "description": "Input BAM files.", "sbg:fileTypes": "BAM", "id": "#bams" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Merged bam", "description": "Merged bam.", "sbg:fileTypes": "BAM", "outputBinding": { "glob": "*.bam", "sbg:inheritMetadataFrom": "#bams", "secondaryFiles": [ ".bai", "^.bai" ] }, "id": "#merged_bam" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.reserved_threads) {\n \n return $job.inputs.reserved_threads\n \n } else if ($job.inputs.num_of_threads) {\n \n return $job.inputs.num_of_threads\n \n } else {\n \n return 1\n }\n \n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.mem_mb) {\n \n return $job.inputs.mem_mb\n \n } else {\n \n return 1024\n \n }\n \n}" } }, { "class": "DockerRequirement", "dockerPull": "images.sbgenomics.com/mladenlsbg/sambamba:0.5.9" } ], "arguments": [ { "position": 10, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.bams instanceof Array) { // VK\n if ($job.inputs.bams[0] instanceof Array) {\n \n // Support for input received as list of one-element-lists \n in_var = []\n for (i=0;i<$job.inputs.bams.length;i++) \n \t\tin_var = in_var.concat($job.inputs.bams[i]);\n \n } else {\n in_var = [].concat($job.inputs.bams)\n }\n \n \n } else {\n in_var = [].concat($job.inputs.bams)\n }\n \n comm=''\n if(in_var.length==1)\n {\n \tcomm+='. '\n \n \tif(in_var[0].secondaryFiles!=undefined && in_var[0].secondaryFiles.length>0)\n \t{\n \t comm+='| cp '\n \t comm+=in_var[0].secondaryFiles[0].path\n \t comm+=' . 
'\n \t}\n }\n return comm\n}" } }, { "position": 5, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n\n function common_end(strs) {\n \n \t// Find minimum length of file name\n \n \tls = [];\n whole = [];\n\tfor (i=0;i0) {\n \tcomstr = strs[0].path.slice(-ind);\n } else {\n comstr = 'different_extensions'\n }\n \n return comstr\n \n }\n \n if ($job.inputs.bams instanceof Array) { // VK\n if ($job.inputs.bams[0] instanceof Array) {\n \n // Support for input received as list of one-element-lists \n in_var = []\n for (i=0;i<$job.inputs.bams.length;i++) \n \t\tin_var = in_var.concat($job.inputs.bams[i]);\n \n } else {\n in_var = [].concat($job.inputs.bams)\n }\n \n \n } else {\n in_var = [].concat($job.inputs.bams)\n }\n \n prefix=''\n \n if(in_var.length==1) { \n return '' // Input will be just passed to output\n }else if (in_var[0].metadata){\n \t if (\"sample_id\" in in_var[0].metadata){ \n \t\t prefix = in_var[0].metadata[\"sample_id\"]; \n \t } else {\n \t\t prefix = 'sample_unknown';\n }\n \n }else {\n prefix = 'sample_unknown'; \n }\n \n // Create joint name and add the merged suffix\n joint_name = prefix + '_' + common_end(in_var);\n name = joint_name.split('.').slice(0,-1).join('.') + '.merged.bam'\n \n \n \n return name\n \n}\n\n\n" } } ], "sbg:job": { "inputs": { "reserved_threads": 2, "num_of_threads": 6, "mem_mb": 2, "compression_level": null, "bams": [ { "class": "File", "metadata": { "sample_id": "testmeta" }, "path": "/path/to/uuu_bams.bam", "secondaryFiles": [], "size": 0 }, { "class": "File", "path": "/path/to/uyyy_bams.bam", "secondaryFiles": [], "size": 0 } ] }, "allocatedResources": { "mem": 2, "cpu": 2 } }, "sbg:categories": [ "SAM/BAM-Processing" ], "sbg:cmdPreview": "/opt/sambamba_0.5.9/sambamba_v0.5.9 merge -t 6 /path/to/uuu_bams.bam /path/to/uyyy_bams.bam testmeta__bams.merged.bam", "sbg:contributors": [ "nevenam", "nevenam.sudo", "vladimirk", "ognjenm" ], "sbg:createdBy": "nevenam.sudo", 
"sbg:createdOn": 1458920412, "sbg:id": "admin/sbg-public-data/sambamba-merge-0-5-9/18", "sbg:image_url": null, "sbg:latestRevision": 18, "sbg:license": "GNU General Public License v2.0 only", "sbg:links": [ { "id": "http://lomereiter.github.io/sambamba/docs/sambamba-view.html", "label": "Homepage" }, { "id": "https://github.com/lomereiter/sambamba", "label": "Source code" }, { "id": "https://github.com/lomereiter/sambamba/wiki", "label": "Wiki" }, { "id": "https://github.com/lomereiter/sambamba/releases/tag/v0.5.9", "label": "Download" }, { "id": "http://lomereiter.github.io/sambamba/docs/sambamba-view.html", "label": "Publication" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476706820, "sbg:project": "bix-demo/sambamba-0-5-9-demo", "sbg:revision": 18, "sbg:revisionNotes": "Added reserved number of threads.", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "nevenam.sudo", "sbg:modifiedOn": 1458920412, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "nevenam.sudo", "sbg:modifiedOn": 1458920459, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "nevenam", "sbg:modifiedOn": 1462963630, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "nevenam", "sbg:modifiedOn": 1462963660, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1468849940, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470679574, "sbg:revision": 5, "sbg:revisionNotes": "Changed to common name" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470752287, "sbg:revision": 6, "sbg:revisionNotes": "Smart naming introduced" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470753233, "sbg:revision": 7, "sbg:revisionNotes": "Corrected single file case" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470753950, "sbg:revision": 8, "sbg:revisionNotes": "Changed glob" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470756561, "sbg:revision": 9, 
"sbg:revisionNotes": "Updated sample id tag" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1470757408, "sbg:revision": 10, "sbg:revisionNotes": "Added inherit metadata" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472994215, "sbg:revision": 11, "sbg:revisionNotes": "Added support for receiving bams as list inside list." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1473000652, "sbg:revision": 12, "sbg:revisionNotes": "Glob returns *.bam" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1473071971, "sbg:revision": 13, "sbg:revisionNotes": "Added support for input.bams received as list of one-element-lists." }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1473257751, "sbg:revision": 14, "sbg:revisionNotes": "Added protection from null (10)" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1473424164, "sbg:revision": 15, "sbg:revisionNotes": "Returned revision that accepts list of one-element lists." }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1475064412, "sbg:revision": 16, "sbg:revisionNotes": "Added resource parameters" }, { "sbg:modifiedBy": "ognjenm", "sbg:modifiedOn": 1475064631, "sbg:revision": 17, "sbg:revisionNotes": "Changed mem error" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476706820, "sbg:revision": 18, "sbg:revisionNotes": "Added reserved number of threads." 
} ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Artem Tarasov", "sbg:toolkit": "Sambamba", "sbg:toolkitVersion": "0.5.9", "sbg:validationErrors": [], "x": 2026.6666666666679, "y": -445.5729166666669 }, "label": "Sambamba Merge", "sbg:x": 2026.6666666666679, "sbg:y": -445.5729166666669 }, { "id": "#GATK_RealignerTargetCreator", "inputs": [ { "id": "#GATK_RealignerTargetCreator.threads_per_job", "default": 4 }, { "id": "#GATK_RealignerTargetCreator.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_RealignerTargetCreator.reads", "source": [ "#BWA_MEM_Bundle_0_7_13.aligned_reads" ] }, { "id": "#GATK_RealignerTargetCreator.memory_per_job", "default": 2048 }, { "id": "#GATK_RealignerTargetCreator.memory_overhead_per_job", "default": 64 }, { "id": "#GATK_RealignerTargetCreator.known", "source": [ "#mills", "#1000g_indels" ] }, { "id": "#GATK_RealignerTargetCreator.intervals_file", "source": [ "#SBG_Prepare_Intervals.intervals" ] }, { "id": "#GATK_RealignerTargetCreator.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_RealignerTargetCreator.indel_realigner_intervals_file" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-realignertargetcreator/13", "label": "GATK RealignerTargetCreator", "description": "Overview\n\nThe local realignment process is designed to consume one or more BAM files and to locally realign reads such that the number of mismatching bases is minimized across all the reads. In general, a large percent of regions requiring local realignment are due to the presence of an insertion or deletion (indels) in the individual's genome with respect to the reference genome. Such alignment artifacts result in many bases mismatching the reference near the misalignment, which are easily mistaken as SNPs. 
Moreover, since read mapping algorithms operate on each read independently, it is impossible to place reads on the reference genome such that mismatches are minimized across all reads. Consequently, even when some reads are correctly mapped with indels, reads covering the indel near just the start or end of the read are often incorrectly mapped with respect to the true indel, also requiring realignment. Local realignment serves to transform regions with misalignments due to indels into clean reads containing a consensus indel suitable for standard variant discovery approaches. Unlike most mappers, this tool uses the full alignment context to determine whether an appropriate alternate reference (i.e. indel) exists.\n\nThere are 2 steps to the realignment process:\nDetermining (small) suspicious intervals which are likely in need of realignment (RealignerTargetCreator)\nRunning the realigner over those intervals (see the IndelRealigner tool)\nFor more details, see the indel realignment method documentation.\n\nInputs\nOne or more aligned BAM files and optionally, one or more lists of known indels.\n\nOutput\nA list of target intervals to pass to the IndelRealigner.\n\nUsage example:\n java -jar GenomeAnalysisTK.jar \\\n -T RealignerTargetCreator \\\n -R reference.fasta \\\n -I input.bam \\\n --known indels.vcf \\\n -o forIndelRealigner.intervals\n \nNotes\n\nThe input BAM(s), reference, and known indel file(s) should be the same ones to be used for the IndelRealigner step.\nWhen multiple potential indels are found by the tool in the same general region, the tool will choose the most likely one for realignment to the exclusion of the others. 
This is a known limitation of the tool.\nBecause reads produced from the 454 technology inherently contain false indels, the realigner will not work with them (or with reads from similar technologies).\nThis tool also ignores MQ0 reads and reads with consecutive indel operators in the CIGAR string.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "RealignerTargetCreator", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if(!$job.inputs.threads_per_job){\n return '-nt '.concat(4)\n }\n else{\n \treturn '-nt '.concat($job.inputs.threads_per_job)\n }\n}" } ], "inputs": [ { "sbg:altPrefix": "-window", "sbg:category": "Realigner Target Creator", "sbg:toolDefaultValue": "10", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--windowSize", "separate": true, "sbg:cmdInclude": true }, "label": "Window size", "description": "Window size for calculating entropy or SNP clusters. 
Any two SNP calls and/or high entropy positions are considered clustered when they occur no more than this many base pairs apart.", "id": "#window_size" }, { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. 
For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "4", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "required": false, "sbg:altPrefix": "-I", "sbg:category": "Input Files", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [ ".bai" ] }, "label": "Read sequences", "description": "Read sequences in BAM format.", "sbg:fileTypes": "SAM, BAM", "id": "#reads" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK 
General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching TAG:STRING, or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", "ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": 
"GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default; NO_ET posts nothing to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes GATK behave nondeterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-mismatch", "sbg:category": "Realigner Target Creator", "sbg:toolDefaultValue": "0.0", 
"type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--mismatchFraction", "separate": true, "sbg:cmdInclude": true }, "label": "Mismatch fraction", "description": "Fraction of base qualities needing to mismatch for a position to have high entropy. To disable this behavior, set this value to <= 0 or > 1. This feature is only necessary when using an ungapped aligner (e.g. MAQ in the case of single-end read data) and should be used in conjunction with the 'USE_SW' option.", "id": "#mismatch_fraction" }, { "sbg:altPrefix": "-minReads", "sbg:category": "Realigner Target Creator", "sbg:toolDefaultValue": "4", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--minReadsAtLocus", "separate": true, "sbg:cmdInclude": true }, "label": "Minimum reads at locus", "description": "Minimum reads at a locus to enable using the entropy calculation.", "id": "#min_reads_at_locus" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM in MB to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This value is added to the Memory per job parameter value, and the sum of the two is allocated per job. 
By default, Memory per job is set to 2048 megabytes unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, GATK will stop execution cleanly as soon as maxRuntime has been exceeded, truncating the run but not exiting with a failure. By default the value is interpreted in minutes; this can be changed with maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-maxInterval", "sbg:category": "Realigner Target Creator", "sbg:toolDefaultValue": "500", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxIntervalSize", "separate": true, "sbg:cmdInclude": true }, "label": "Maximum interval size", "description": "Maximum interval size. 
Because the realignment algorithm is N^2, allowing too large an interval might take too long to completely realign.", "id": "#max_interval_size" }, { "required": true, "sbg:category": "Input Files", "sbg:stageInput": "link", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--known", "separate": true, "sbg:cmdInclude": true }, "label": "Known indels", "description": "VCF file with known indels.", "sbg:fileTypes": "VCF", "id": "#known" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. 
Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "sample", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": 
true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. 
Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of read downsampling to employ at a given locus. Reads will be selected randomly to be removed from the pile based on the selected method.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. 
For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
With this set, only the base substitution qualities will be produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred-scaled). Default value is 40. 
30 is perhaps better for whole genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Intervals", "description": "An output file created by the walker.", "sbg:fileTypes": "INTERVALS", "outputBinding": { "glob": "*.intervals", "sbg:metadata": { "intervals_file": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.intervals_file)\n if($job.inputs.intervals_file.metadata)\n if($job.inputs.intervals_file.metadata.sbg_scatter)\n return $job.inputs.intervals_file.path.split('/').pop()\n return 'NO_INTERVALS'\n}" } }, "sbg:inheritMetadataFrom": "#reads" }, "id": "#indel_realigner_intervals_file" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn 
$job.inputs.cpu_per_job\n }\n return 1\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.reads){\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n } else read_namebase = 'known_only'\n return read_namebase + '.intervals'\n}" } } ], "sbg:job": { "inputs": { "window_size": null, "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "threads_per_job": 2, "tag": null, "remove_program_records": null, "reference": { "path": "/folder/reference.fasta" }, "reads": [ { "path": "input.bam" } ], "read_group_black_list": [], "read_filter": [], "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "non_deterministic_random_seed": null, "mismatch_fraction": null, "min_reads_at_locus": null, "memory_per_job": 1, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "max_interval_size": null, "known": [ { "path": "/folder/indels.vcf" } ], "keep_program_records": null, "intervals_file": { "metadata": { "sbg_scatter": "true" }, "path": "/path/to/file/rrrrrr.bed", "secondaryFiles": [] }, "intervals": "", 
"interval_set_rule": null, "interval_padding": null, "interval_merging": null, "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "cpu_per_job": 1, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 1, "cpu": 1 } }, "sbg:categories": [ "Analysis" ], "sbg:cmdPreview": "java -Xmx1M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type RealignerTargetCreator -nt 2 --reference_sequence /folder/reference.fasta --known /folder/indels.vcf --out input.intervals", "sbg:contributors": [ "vladimirk", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911384, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-realignertargetcreator/13", "sbg:image_url": null, "sbg:latestRevision": 8, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_RealignerTargetCreator.php", "label": "Documentation" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1475576477, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 8, "sbg:revisionNotes": "BAMs are not required input", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911384, "sbg:revision": 0, "sbg:revisionNotes": null }, 
{ "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911384, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911385, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911386, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911387, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471364087, "sbg:revision": 5, "sbg:revisionNotes": "known link staged." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472650598, "sbg:revision": 6, "sbg:revisionNotes": "Scatter metadata." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472724438, "sbg:revision": 7, "sbg:revisionNotes": ".bai as secondary" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1475576477, "sbg:revision": 8, "sbg:revisionNotes": "BAMs are not required input" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 1250.0004069010424, "y": 412.7608235677086 }, "label": "GATK RealignerTargetCreator", "scatter": "#GATK_RealignerTargetCreator.intervals_file", "sbg:x": 1250.0004069010424, "sbg:y": 412.7608235677086 }, { "id": "#GATK_PrintReads", "inputs": [ { "id": "#GATK_PrintReads.threads_per_job", "default": 4 }, { "id": "#GATK_PrintReads.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_PrintReads.reads", "source": [ "#GATK_IndelRealigner.realigned_bam_file" ] }, { "id": "#GATK_PrintReads.memory_per_job", "default": 2048 }, { "id": "#GATK_PrintReads.memory_overhead_per_job", "default": 64 }, { "id": "#GATK_PrintReads.intervals_file", "source": [ "#SBG_Prepare_Intervals.intervals" ] }, { "id": "#GATK_PrintReads.cpu_per_job", "default": 1 }, { "id": "#GATK_PrintReads.bqsr", "source": [ "#GATK_BaseRecalibrator.bqsr" ] } ], "outputs": [ { "id": 
"#GATK_PrintReads.recalibrated_bam" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-printreads/21", "label": "GATK PrintReads", "description": "Overview\n\nPrintReads is a generic utility tool for manipulating sequencing data in SAM/BAM format. It can dynamically merge the contents of multiple input BAM files, resulting in merged output sorted in coordinate order. It can also optionally filter reads based on various read properties such as read group tags using the `--read_filter/-rf` command line argument (see documentation on read filters for more information).\n\nNote that when PrintReads is used as part of the Base Quality Score Recalibration workflow, it takes the `--BQSR` engine argument, which is listed under Inherited Arguments > CommandLineGATK below.\n\nInput\nOne or more bam files.\n\nOutput\nA single processed bam file.\n\nUsage examples:\n\n // Prints all reads that have a mapping quality above zero\n java -jar GenomeAnalysisTK.jar \\\n -T PrintReads \\\n -R reference.fasta \\\n -I input1.bam \\\n -I input2.bam \\\n -o output.bam \\\n --read_filter MappingQualityZero\n\n // Prints the first 2000 reads in the BAM file\n java -jar GenomeAnalysisTK.jar \\\n -T PrintReads \\\n -R reference.fasta \\\n -I input.bam \\\n -o output.bam \\\n -n 2000\n\n // Downsamples BAM file to 25%\n java -jar GenomeAnalysisTK.jar \\\n -T PrintReads \\\n -R reference.fasta \\\n -I input.bam \\\n -o output.bam \\\n -dfrac 0.25\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. 
To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "PrintReads", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nct '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nct '.concat(4)\n }\n}" } ], "inputs": [ { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { 
"sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "4", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-s", "sbg:category": "Print Reads", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--simplify", "separate": true, "sbg:cmdInclude": true }, "label": "Simplify", "description": "Simplify all reads.", "id": "#simplify" }, { "sbg:altPrefix": "-sn", "sbg:category": "Print Reads", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--sample_name", "separate": true, "sbg:cmdInclude": true }, "label": "Sample Name", "description": "Sample name to be included in the analysis. 
Can be specified multiple times.", "id": "#sample_name" }, { "required": false, "sbg:altPrefix": "-sf", "sbg:category": "Input Files", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--sample_file", "separate": true, "sbg:cmdInclude": true }, "label": "Sample File", "description": "File containing a list of samples (one per line). Can be specified multiple times.", "id": "#sample_file" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "required": true, "sbg:altPrefix": "-I", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input_file", "separate": true, "sbg:cmdInclude": true }, "label": "Read sequences", "description": "Read sequences in BAM format.", "sbg:fileTypes": "SAM, BAM", "id": "#reads" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching TAG:STRING, or a .txt file containing the filter strings one per 
line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-readGroup", "sbg:category": "Print Reads", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--readGroup", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group", "description": "Exclude all reads with this read group from the output.", "id": "#read_group" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigar", "BadMate", "CountingFilteringIterator.CountingRead", "DuplicateRead", "FailsVendorQualityCheck", "HCMappingQuality", "LibraryRead", "MalformedRead", "MappingQuality", "MappingQualityUnavailable", "MappingQualityZero", "MateSameStrand", "MaxInsertSize", "MissingReadGroup", "NoOriginalQualityScores", "NotPrimaryAlignment", "OverclippedRead", "Platform454", "PlatformFilter", "PlatformUnit", "ReadGroupBlackList", "ReadLength", "ReadName", "ReadStrand", "ReassignMappingQuality", "ReassignOneMappingQuality", "Sample", "SingleReadGroup", "UnmappedRead" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-platform", "sbg:category": "Print Reads", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--platform", "separate": true, "sbg:cmdInclude": true 
}, "label": "Platform", "description": "Exclude all reads with this platform from the output.", "id": "#platform" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default; set it to NO_ET so that nothing is posted to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-n", "sbg:category": "Print Reads", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--number", "separate": true, "sbg:cmdInclude": true }, "label": "Number", "description": "Print the first n reads from the file, discarding the rest.", "id": "#number" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", 
"sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes GATK behave nondeterministically, that is, the random numbers generated will differ in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM in MB to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job, in MB. By default this value is '0' (zero megabytes). It is added to the Memory per job value, so the total memory allocated per job is the sum of Memory per job and Memory overhead per job. 
By default the Memory per job value is 2048 megabytes, unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The time unit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, GATK will stop execution cleanly as soon as maxRuntime has been exceeded, truncating the run but not exiting with a failure. 
By default the value is interpreted in minutes, but this can be changed with maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "If set, overrides the walker's default and keeps program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:category": "Input Files", "sbg:stageInput": "link", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.intervals_file instanceof Array)\n if([].concat($job.inputs.reads)[0].metadata)\n if([].concat($job.inputs.reads)[0].metadata.intervals_file)\n return '--intervals ' + [].concat($job.inputs.reads)[0].metadata.intervals_file\n \n if($job.inputs.intervals_file)\n return '--intervals ' + $job.inputs.intervals_file.path\n else\n return ''\n}" }, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. 
Can be specified in an .intervals file or a ROD file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule to use for abutting intervals.", "id": "#interval_merging" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, 
"itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. 
Reads will be selected randomly to be removed from the pile based on the selected downsampling method.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework, where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
Turns off printing of the base insertion and base deletion tags when using the -BQSR argument and only the base substitution qualities will be produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "required": false, "sbg:altPrefix": null, "sbg:category": "Input Files", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--BQSR", "separate": true, "sbg:cmdInclude": true }, "label": "BQSR Table", "description": "The input covariates table file which enables on-the-fly base quality score recalibration.", "sbg:fileTypes": "GRP", "id": "#bqsr" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred Scaled). Default value is 40. 
30 is perhaps better for whole genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "File" ], "label": "Recalibrated BAM", "description": "Write output to this BAM filename.", "sbg:fileTypes": "BAM, SAM", "outputBinding": { "glob": "*.bam", "sbg:inheritMetadataFrom": "#reads", "secondaryFiles": [ ".bai", "^.bai" ] }, "id": "#recalibrated_bam" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job\n }\n\treturn 1\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n 
\t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n\n if($job.inputs.bqsr){\n \treturn read_namebase + '.base_recalibrated.bam'\n }\n else{\n \treturn read_namebase + '.bam'\n }\n}" } }, { "position": 10000, "prefix": ";", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n if($job.inputs.bqsr){\n\treturn 'mv ' + read_namebase + '.base_recalibrated.bai '+ read_namebase + '.base_recalibrated.bam.bai'\n }\n else{\n \treturn 'mv ' + read_namebase + '.bai '+read_namebase+'.bam.bai'\n }\n}" } } ], "sbg:job": { "inputs": { "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "threads_per_job": null, "tag": null, "simplify": null, "sample_name": [], "sample_file": [], "remove_program_records": null, "reference": { "path": "/folder/reference.fasta" }, "reads": [ { "metadata": { "intervals_file": "3333.intervals" }, "path": "/folder/input1.bam", "secondaryFiles": [] } ], "read_group_black_list": [], "read_group": null, "read_filter": [ "MappingQualityZero" ], "preserve_qscores_less_than": null, "platform": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "number": null, 
"non_deterministic_random_seed": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "keep_program_records": null, "intervals_file": [ { "class": "File", "path": "/path/to/intervals_file-1.ext", "secondaryFiles": [], "size": 0 }, { "class": "File", "path": "/path/to/intervals_file-2.ext", "secondaryFiles": [], "size": 0 } ], "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "cpu_per_job": null, "bqsr": [], "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:categories": [ "SAM/BAM-Processing" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type PrintReads -nct 4 --reference_sequence /folder/reference.fasta --input_file /folder/input1.bam --out input1.bam ; mv input1.bai input1.bam.bai", "sbg:contributors": [ "vladimirk", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911393, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-printreads/21", "sbg:image_url": null, "sbg:latestRevision": 12, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source Code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": 
"https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_readutils_PrintReads.php", "label": "Documentation" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476372560, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 12, "sbg:revisionNotes": "Read_filter names corrected (\"Filter\" string removed)", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911393, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911394, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911395, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911396, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911397, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911398, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911399, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1461854096, "sbg:revision": 7, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1461861365, "sbg:revision": 8, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472656041, "sbg:revision": 9, "sbg:revisionNotes": "metadata scatter." 
}, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472744347, "sbg:revision": 10, "sbg:revisionNotes": "intervals_file - stage link" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476371132, "sbg:revision": 11, "sbg:revisionNotes": "Added support for single bam processing without intervals" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476372560, "sbg:revision": 12, "sbg:revisionNotes": "Read_filter names corrected (\"Filter\" string removed)" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 1768.3340922859265, "y": 406.0939039029232 }, "label": "GATK PrintReads", "scatter": "#GATK_PrintReads.reads", "sbg:x": 1768.3340922859265, "sbg:y": 406.0939039029232 }, { "id": "#SBG_Prepare_VQSR_dbSNP", "inputs": [ { "id": "#SBG_Prepare_VQSR_dbSNP.truth", "default": true }, { "id": "#SBG_Prepare_VQSR_dbSNP.training", "default": true }, { "id": "#SBG_Prepare_VQSR_dbSNP.prior", "default": 2 }, { "id": "#SBG_Prepare_VQSR_dbSNP.label", "default": "dbsnp" }, { "id": "#SBG_Prepare_VQSR_dbSNP.input_vcf", "source": [ "#dbsnp" ] } ], "outputs": [ { "id": "#SBG_Prepare_VQSR_dbSNP.output_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "label": "SBG Prepare VQSR dbSNP", "description": "Prepare VQSR resource is a tool for preparing resource datasets and arguments to use with VQSR. 
It sets metadata for a list of sites for which to apply a prior probability of being correct, but which aren't used by the algorithm (training and truth sets are required to run).", "baseCommand": [ "echo", "Preparing", "VQSR", "Resources" ], "inputs": [ { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Truth", "description": "Truth.", "id": "#truth" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Training", "description": "Training.", "id": "#training" }, { "sbg:category": "", "type": [ "float" ], "label": "Prior", "description": "Prior.", "id": "#prior" }, { "sbg:category": "", "type": [ "string" ], "label": "Label", "description": "Label.", "id": "#label" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Known", "description": "Known.", "id": "#known" }, { "required": false, "sbg:stageInput": "link", "type": [ "null", "File" ], "label": "VCF File", "description": "Input VCF file for GATK VariantRecalibrator Resources.", "sbg:fileTypes": "VCF", "id": "#input_vcf" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Prepared VCF", "description": "Prepared VCF file for GATK VariantRecalibrator.", "outputBinding": { "glob": "*.vcf", "sbg:metadata": { "resources": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.known){known = 'true'}else{known = 'false'}\n if($job.inputs.training){training = 'true'}else{training = 'false'}\n if($job.inputs.truth){truth = 'true'}else{truth = 'false'}\n\n res = ['-resource:' + $job.inputs.label,\n 'known=' + known,\n 'training=' + training,\n 'truth=' + truth,\n 'prior=' + $job.inputs.prior\n ]\n return res.join(\",\")\n}\n\n\n " } }, "sbg:inheritMetadataFrom": "#input_vcf" }, "id": "#output_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": 
"sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerPull": "ubuntu:14.04" } ], "sbg:job": { "inputs": { "truth": true, "training": null, "prior": 12, "label": "aaa", "known": true, "input_vcf": { "path": "vcf" } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "echo Preparing VQSR Resources", "sbg:contributors": [ "djordje_klisic", "vladimirk", "bogdang" ], "sbg:createdBy": "djordje_klisic", "sbg:createdOn": 1461613037, "sbg:id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "sbg:image_url": null, "sbg:latestRevision": 4, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 4, "sbg:revisionNotes": "command line echo", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613037, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613070, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471362399, "sbg:revision": 2, "sbg:revisionNotes": "VCF file type and required set for input_vcf." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472512980, "sbg:revision": 3, "sbg:revisionNotes": "VCF_input not required." 
}, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:revision": 4, "sbg:revisionNotes": "command line echo" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "", "sbg:validationErrors": [], "x": 1965.0002025365889, "y": 676.0938190788879 }, "label": "SBG Prepare VQSR dbSNP", "sbg:x": 1965.0002025365889, "sbg:y": 676.0938190788879 }, { "id": "#SBG_Prepare_VQSR_Mills", "inputs": [ { "id": "#SBG_Prepare_VQSR_Mills.truth", "default": true }, { "id": "#SBG_Prepare_VQSR_Mills.training", "default": true }, { "id": "#SBG_Prepare_VQSR_Mills.prior", "default": 12 }, { "id": "#SBG_Prepare_VQSR_Mills.label", "default": "mills" }, { "id": "#SBG_Prepare_VQSR_Mills.input_vcf", "source": [ "#mills" ] } ], "outputs": [ { "id": "#SBG_Prepare_VQSR_Mills.output_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "label": "SBG Prepare VQSR Mills", "description": "Prepare VQSR resource is a tool for preparing resource datasets and arguments to use with VQSR. 
It sets metadata for a list of sites for which to apply a prior probability of being correct, but which aren't used by the algorithm (training and truth sets are required to run).", "baseCommand": [ "echo", "Preparing", "VQSR", "Resources" ], "inputs": [ { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Truth", "description": "Truth.", "id": "#truth" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Training", "description": "Training.", "id": "#training" }, { "sbg:category": "", "type": [ "float" ], "label": "Prior", "description": "Prior.", "id": "#prior" }, { "sbg:category": "", "type": [ "string" ], "label": "Label", "description": "Label.", "id": "#label" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Known", "description": "Known.", "id": "#known" }, { "required": false, "sbg:stageInput": "link", "type": [ "null", "File" ], "label": "VCF File", "description": "Input VCF file for GATK VariantRecalibrator Resources.", "sbg:fileTypes": "VCF", "id": "#input_vcf" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Prepared VCF", "description": "Prepared VCF file for GATK VariantRecalibrator.", "outputBinding": { "glob": "*.vcf", "sbg:metadata": { "resources": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.known){known = 'true'}else{known = 'false'}\n if($job.inputs.training){training = 'true'}else{training = 'false'}\n if($job.inputs.truth){truth = 'true'}else{truth = 'false'}\n\n res = ['-resource:' + $job.inputs.label,\n 'known=' + known,\n 'training=' + training,\n 'truth=' + truth,\n 'prior=' + $job.inputs.prior\n ]\n return res.join(\",\")\n}\n\n\n " } }, "sbg:inheritMetadataFrom": "#input_vcf" }, "id": "#output_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": 
"sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerPull": "ubuntu:14.04" } ], "sbg:job": { "inputs": { "truth": true, "training": null, "prior": 12, "label": "aaa", "known": true, "input_vcf": { "path": "vcf" } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "echo Preparing VQSR Resources", "sbg:contributors": [ "djordje_klisic", "vladimirk", "bogdang" ], "sbg:createdBy": "djordje_klisic", "sbg:createdOn": 1461613037, "sbg:id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "sbg:image_url": null, "sbg:latestRevision": 4, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 4, "sbg:revisionNotes": "command line echo", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613037, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613070, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471362399, "sbg:revision": 2, "sbg:revisionNotes": "VCF file type and required set for input_vcf." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472512980, "sbg:revision": 3, "sbg:revisionNotes": "VCF_input not required." 
}, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:revision": 4, "sbg:revisionNotes": "command line echo" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "", "sbg:validationErrors": [], "x": 2576.6670919921808, "y": 389.42713929919915 }, "label": "SBG Prepare VQSR Mills", "sbg:x": 2576.6670919921808, "sbg:y": 389.42713929919915 }, { "id": "#SBG_Prepare_VQSR_1000G", "inputs": [ { "id": "#SBG_Prepare_VQSR_1000G.truth", "default": true }, { "id": "#SBG_Prepare_VQSR_1000G.training", "default": true }, { "id": "#SBG_Prepare_VQSR_1000G.prior", "default": 10 }, { "id": "#SBG_Prepare_VQSR_1000G.label", "default": "1000G" }, { "id": "#SBG_Prepare_VQSR_1000G.input_vcf", "source": [ "#1000g_p1_snps" ] } ], "outputs": [ { "id": "#SBG_Prepare_VQSR_1000G.output_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "label": "SBG Prepare VQSR 1000G", "description": "Prepare VQSR resource is a tool for preparing resource datasets and arguments to use with VQSR. 
It sets metadata for a list of sites for which to apply a prior probability of being correct, but which aren't used by the algorithm (training and truth sets are required to run).", "baseCommand": [ "echo", "Preparing", "VQSR", "Resources" ], "inputs": [ { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Truth", "description": "Truth.", "id": "#truth" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Training", "description": "Training.", "id": "#training" }, { "sbg:category": "", "type": [ "float" ], "label": "Prior", "description": "Prior.", "id": "#prior" }, { "sbg:category": "", "type": [ "string" ], "label": "Label", "description": "Label.", "id": "#label" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Known", "description": "Known.", "id": "#known" }, { "required": false, "sbg:stageInput": "link", "type": [ "null", "File" ], "label": "VCF File", "description": "Input VCF file for GATK VariantRecalibrator Resources.", "sbg:fileTypes": "VCF", "id": "#input_vcf" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Prepared VCF", "description": "Prepared VCF file for GATK VariantRecalibrator.", "outputBinding": { "glob": "*.vcf", "sbg:metadata": { "resources": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.known){known = 'true'}else{known = 'false'}\n if($job.inputs.training){training = 'true'}else{training = 'false'}\n if($job.inputs.truth){truth = 'true'}else{truth = 'false'}\n\n res = ['-resource:' + $job.inputs.label,\n 'known=' + known,\n 'training=' + training,\n 'truth=' + truth,\n 'prior=' + $job.inputs.prior\n ]\n return res.join(\",\")\n}\n\n\n " } }, "sbg:inheritMetadataFrom": "#input_vcf" }, "id": "#output_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": 
"sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerPull": "ubuntu:14.04" } ], "sbg:job": { "inputs": { "truth": true, "training": null, "prior": 12, "label": "aaa", "known": true, "input_vcf": { "path": "vcf" } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "echo Preparing VQSR Resources", "sbg:contributors": [ "djordje_klisic", "vladimirk", "bogdang" ], "sbg:createdBy": "djordje_klisic", "sbg:createdOn": 1461613037, "sbg:id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "sbg:image_url": null, "sbg:latestRevision": 4, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 4, "sbg:revisionNotes": "command line echo", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613037, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613070, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471362399, "sbg:revision": 2, "sbg:revisionNotes": "VCF file type and required set for input_vcf." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472512980, "sbg:revision": 3, "sbg:revisionNotes": "VCF_input not required." 
}, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:revision": 4, "sbg:revisionNotes": "command line echo" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "", "sbg:validationErrors": [], "x": 2376.6672770182304, "y": -147.23963419596365 }, "label": "SBG Prepare VQSR 1000G", "sbg:x": 2376.6672770182304, "sbg:y": -147.23963419596365 }, { "id": "#SBG_Prepare_VQSR_HapMap", "inputs": [ { "id": "#SBG_Prepare_VQSR_HapMap.truth", "default": true }, { "id": "#SBG_Prepare_VQSR_HapMap.training", "default": true }, { "id": "#SBG_Prepare_VQSR_HapMap.prior", "default": 15 }, { "id": "#SBG_Prepare_VQSR_HapMap.label", "default": "hapmap" }, { "id": "#SBG_Prepare_VQSR_HapMap.input_vcf", "source": [ "#hapmap" ] } ], "outputs": [ { "id": "#SBG_Prepare_VQSR_HapMap.output_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "label": "SBG Prepare VQSR HapMap", "description": "Prepare VQSR resource is a tool for preparing resource datasets and arguments to use with VQSR. 
It sets metadata for a list of sites for which to apply a prior probability of being correct, but which aren't used by the algorithm (training and truth sets are required to run).", "baseCommand": [ "echo", "Preparing", "VQSR", "Resources" ], "inputs": [ { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Truth", "description": "Truth.", "id": "#truth" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Training", "description": "Training.", "id": "#training" }, { "sbg:category": "", "type": [ "float" ], "label": "Prior", "description": "Prior.", "id": "#prior" }, { "sbg:category": "", "type": [ "string" ], "label": "Label", "description": "Label.", "id": "#label" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Known", "description": "Known.", "id": "#known" }, { "required": false, "sbg:stageInput": "link", "type": [ "null", "File" ], "label": "VCF File", "description": "Input VCF file for GATK VariantRecalibrator Resources.", "sbg:fileTypes": "VCF", "id": "#input_vcf" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Prepared VCF", "description": "Prepared VCF file for GATK VariantRecalibrator", "outputBinding": { "glob": "*.vcf", "sbg:metadata": { "resources": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.known){known = 'true'}else{known = 'false'}\n if($job.inputs.training){training = 'true'}else{training = 'false'}\n if($job.inputs.truth){truth = 'true'}else{truth = 'false'}\n\n res = ['-resource:' + $job.inputs.label,\n 'known=' + known,\n 'training=' + training,\n 'truth=' + truth,\n 'prior=' + $job.inputs.prior\n ]\n return res.join(\",\")\n}\n\n\n " } }, "sbg:inheritMetadataFrom": "#input_vcf" }, "id": "#output_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": 
"sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerPull": "ubuntu:14.04" } ], "sbg:job": { "inputs": { "truth": true, "training": null, "prior": 12, "label": "aaa", "known": true, "input_vcf": { "path": "vcf" } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "echo Preparing VQSR Resources", "sbg:contributors": [ "djordje_klisic", "vladimirk", "bogdang" ], "sbg:createdBy": "djordje_klisic", "sbg:createdOn": 1461613037, "sbg:id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "sbg:image_url": null, "sbg:latestRevision": 4, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 4, "sbg:revisionNotes": "command line echo", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613037, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613070, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471362399, "sbg:revision": 2, "sbg:revisionNotes": "VCF file type and required set for input_vcf." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472512980, "sbg:revision": 3, "sbg:revisionNotes": "VCF_input not required." 
}, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:revision": 4, "sbg:revisionNotes": "command line echo" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "", "sbg:validationErrors": [], "x": 2520.0005086263036, "y": -232.23963419596367 }, "label": "SBG Prepare VQSR HapMap", "sbg:x": 2520.0005086263036, "sbg:y": -232.23963419596367 }, { "id": "#SBG_Prepare_VQSR_Omni", "inputs": [ { "id": "#SBG_Prepare_VQSR_Omni.truth", "default": true }, { "id": "#SBG_Prepare_VQSR_Omni.training", "default": true }, { "id": "#SBG_Prepare_VQSR_Omni.prior", "default": 12 }, { "id": "#SBG_Prepare_VQSR_Omni.label", "default": "omni" }, { "id": "#SBG_Prepare_VQSR_Omni.input_vcf", "source": [ "#1000g_omni" ] } ], "outputs": [ { "id": "#SBG_Prepare_VQSR_Omni.output_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "label": "SBG Prepare VQSR Omni", "description": "Prepare VQSR resource is a tool for preparing resource datasets and arguments to use with VQSR. 
It sets metadata for a list of sites for which to apply a prior probability of being correct, but which aren't used by the algorithm (training and truth sets are required to run).", "baseCommand": [ "echo", "Preparing", "VQSR", "Resources" ], "inputs": [ { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Truth", "description": "Truth.", "id": "#truth" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Training", "description": "Training.", "id": "#training" }, { "sbg:category": "", "type": [ "float" ], "label": "Prior", "description": "Prior.", "id": "#prior" }, { "sbg:category": "", "type": [ "string" ], "label": "Label", "description": "Label.", "id": "#label" }, { "sbg:category": "", "type": [ "null", "boolean" ], "label": "Known", "description": "Known.", "id": "#known" }, { "required": false, "sbg:stageInput": "link", "type": [ "null", "File" ], "label": "VCF File", "description": "Input VCF file for GATK VariantRecalibrator Resources.", "sbg:fileTypes": "VCF", "id": "#input_vcf" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Prepared VCF", "description": "Prepared VCF file for GATK VariantRecalibrator", "outputBinding": { "glob": "*.vcf", "sbg:metadata": { "resources": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.known){known = 'true'}else{known = 'false'}\n if($job.inputs.training){training = 'true'}else{training = 'false'}\n if($job.inputs.truth){truth = 'true'}else{truth = 'false'}\n\n res = ['-resource:' + $job.inputs.label,\n 'known=' + known,\n 'training=' + training,\n 'truth=' + truth,\n 'prior=' + $job.inputs.prior\n ]\n return res.join(\",\")\n}\n\n\n " } }, "sbg:inheritMetadataFrom": "#input_vcf" }, "id": "#output_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": 
"sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerPull": "ubuntu:14.04" } ], "sbg:job": { "inputs": { "truth": true, "training": null, "prior": 12, "label": "aaa", "known": true, "input_vcf": { "path": "vcf" } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:categories": [ "VCF-Processing" ], "sbg:cmdPreview": "echo Preparing VQSR Resources", "sbg:contributors": [ "djordje_klisic", "vladimirk", "bogdang" ], "sbg:createdBy": "djordje_klisic", "sbg:createdOn": 1461613037, "sbg:id": "admin/sbg-public-data/sbg-prepare-gatk-variantrecalibrator-resource/12", "sbg:image_url": null, "sbg:latestRevision": 4, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:project": "bix-demo/sbgtools-demo", "sbg:revision": 4, "sbg:revisionNotes": "command line echo", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613037, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "djordje_klisic", "sbg:modifiedOn": 1461613070, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471362399, "sbg:revision": 2, "sbg:revisionNotes": "VCF file type and required set for input_vcf." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472512980, "sbg:revision": 3, "sbg:revisionNotes": "VCF_input not required." 
}, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476451888, "sbg:revision": 4, "sbg:revisionNotes": "command line echo" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "", "sbg:validationErrors": [], "x": 2378.333638509116, "y": -325.5730056762697 }, "label": "SBG Prepare VQSR Omni", "sbg:x": 2378.333638509116, "sbg:y": -325.5730056762697 }, { "id": "#GATK_IndelRealigner", "inputs": [ { "id": "#GATK_IndelRealigner.target_intervals", "source": [ "#GATK_RealignerTargetCreator.indel_realigner_intervals_file" ] }, { "id": "#GATK_IndelRealigner.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_IndelRealigner.reads", "source": [ "#BWA_MEM_Bundle_0_7_13.aligned_reads" ] }, { "id": "#GATK_IndelRealigner.memory_per_job", "default": 2048 }, { "id": "#GATK_IndelRealigner.memory_overhead_per_job", "default": 64 }, { "id": "#GATK_IndelRealigner.known_alleles", "source": [ "#mills", "#1000g_indels" ] }, { "id": "#GATK_IndelRealigner.intervals_file", "source": [ "#SBG_Prepare_Intervals.intervals" ] }, { "id": "#GATK_IndelRealigner.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_IndelRealigner.realigned_bam_file" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-indelrealigner/20", "label": "GATK IndelRealigner", "description": "Overview\n\nThe local realignment process is designed to consume one or more BAM files and to locally realign reads such that the number of mismatching bases is minimized across all the reads. In general, a large percent of regions requiring local realignment are due to the presence of an insertion or deletion (indels) in the individual's genome with respect to the reference genome. Such alignment artifacts result in many bases mismatching the reference near the misalignment, which are easily mistaken as SNPs. 
Moreover, since read mapping algorithms operate on each read independently, it is impossible to place reads on the reference genome such that mismatches are minimized across all reads. Consequently, even when some reads are correctly mapped with indels, reads covering the indel near just the start or end of the read are often incorrectly mapped with respect to the true indel, also requiring realignment. Local realignment serves to transform regions with misalignments due to indels into clean reads containing a consensus indel suitable for standard variant discovery approaches. Unlike most mappers, this walker uses the full alignment context to determine whether an appropriate alternate reference (i.e. indel) exists. Following local realignment, the GATK tool Unified Genotyper can be used to sensitively and specifically identify indels.\n\nThere are 2 steps to the realignment process:\n\n1. Determining (small) suspicious intervals which are likely in need of realignment (see the RealignerTargetCreator tool)\n2. 
Running the realigner over those intervals (IndelRealigner)\nFor more details, see the indel realignment method documentation.\n\nInput\nOne or more aligned BAM files and optionally one or more lists of known indels.\n\nOutput\nA realigned version of your input BAM file(s).\n\nUsage example:\n java -jar GenomeAnalysisTK.jar \\\n -T IndelRealigner \\\n -R reference.fasta \\\n -I input.bam \\\n -known indels.vcf \\\n -targetIntervals intervalListFromRTC.intervals \\\n -o realignedBam.bam\n \nCaveats\n\nThe input BAM(s), reference, and known indel file(s) should be the same ones used for the RealignerTargetCreator step.\nBecause reads produced from the 454 technology inherently contain false indels, the realigner will not work with them (or with reads from similar technologies).\nThis tool also ignores MQ0 reads and reads with consecutive indel operators in the CIGAR string.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. 
To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.memory_per_job){\n return '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n } \n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "IndelRealigner" ], "inputs": [ { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", 
"NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "required": true, "sbg:altPrefix": "-targetIntervals", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--targetIntervals", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Target Intervals", "description": "Intervals file output from RealignerTargetCreator.", "sbg:fileTypes": "TXT, INTERVALS", "id": "#target_intervals" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { 
"required": true, "sbg:altPrefix": "-I", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [ ".bai" ] }, "label": "Read sequences", "description": "Read sequences in BAM format.", "sbg:fileTypes": "SAM, BAM", "id": "#reads" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching <TAG>:<STRING> or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", "ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-preserveQ", 
"sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default, can be NO_ET so nothing is posted to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], 
"inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non deterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-noTags", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--noOriginalAlignmentTags", "separate": true, "sbg:cmdInclude": true }, "label": "No Original Alignment Tags", "description": "Don't output the original cigar or alignment start tags for each realigned read in the output bam.", "id": "#no_original_alignment_tags" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM memory in MB to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This parameter value is added to the Memory per job parameter value. This results in the allocation of the sum total (Memory per job and Memory overhead per job) amount of memory per job. 
By default the memory per job parameter value is set to 2048 megabytes, unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, the GATK will stop execution cleanly as soon as possible after maxRuntime has been exceeded, truncating the run but not exiting with a failure. 
By default the value is interpreted in minutes, but this can be changed by maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-maxInMemory", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "150000", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxReadsInMemory", "separate": true, "sbg:cmdInclude": true }, "label": "Max Reads In Memory", "description": "Max reads allowed to be kept in memory at a time by the SAMFileWriter.", "id": "#max_reads_in_memory" }, { "sbg:altPrefix": "-maxReads", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "20000", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxReadsForRealignment", "separate": true, "sbg:cmdInclude": true }, "label": "Max Reads For Realignment", "description": "Max reads allowed at an interval for realignment.", "id": "#max_reads_for_realignment" }, { "sbg:altPrefix": "-greedy", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "120", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxReadsForConsensuses", "separate": true, "sbg:cmdInclude": true }, "label": "Max Reads For Consensuses", "description": "Max reads used for finding the alternate consensuses (necessary to improve performance in deep coverage).", "id": "#max_reads_for_consensuses" }, { "sbg:altPrefix": "-maxPosMove", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "200", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxPositionalMoveAllowed", "separate": true, "sbg:cmdInclude": true }, "label": "Max Positional Move Allowed", "description": "Maximum positional move in basepairs that a read can be adjusted during realignment.", "id": "#max_positional_move_allowed" }, { "sbg:altPrefix": "-maxIsize", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "3000", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxIsizeForMovement", "separate": true, "sbg:cmdInclude": true }, 
"label": "Max Isize For Movement", "description": "Maximum insert size of read pairs that we attempt to realign.", "id": "#max_isize_for_movement" }, { "sbg:altPrefix": null, "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "30", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxConsensuses", "separate": true, "sbg:cmdInclude": true }, "label": "Max Consensuses", "description": "Max alternate consensuses to try (necessary to improve performance in deep coverage).", "id": "#max_consensuses" }, { "sbg:altPrefix": "-LOD", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "5.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--LODThresholdForCleaning", "separate": true, "sbg:cmdInclude": true }, "label": "Lod Threshold For Cleaning", "description": "LOD threshold above which the cleaner will clean.", "id": "#lod_threshold_for_cleaning" }, { "required": false, "sbg:altPrefix": "-known", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--knownAlleles", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Known Alleles", "description": "Input VCF file(s) with known indels.", "sbg:fileTypes": "VCF", "id": "#known_alleles" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "sbg:stageInput": "link", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n 
 if($job.inputs.intervals_file instanceof Array)\n if($job.inputs.target_intervals.metadata)\n if($job.inputs.target_intervals.metadata.intervals_file)\n return '--intervals ' + $job.inputs.target_intervals.metadata.intervals_file\n \n if($job.inputs.intervals_file)\n return '--intervals ' + $job.inputs.intervals_file.path\n else\n return ''\n}" }, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#intervals_file" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "sample", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many basepairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" 
], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. 
Can be an .intervals file or a ROD file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-entropy", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "0.15", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--entropyThreshold", "separate": true, "sbg:cmdInclude": true }, "label": "Entropy Threshold", "description": "Percentage of mismatches at a locus for it to be considered as having high entropy.", "id": "#entropy_threshold" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of read downsampling to employ at a given locus. 
Reads will be selected randomly to be removed from the pile based on the selected downsampling method.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework, where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of base insertion and base deletion tags (with -BQSR). 
Turns off printing of the base insertion and base deletion tags when using the -BQSR argument, so that only the base substitution qualities are produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-model", "sbg:category": "Indel Realigner", "sbg:toolDefaultValue": "USE_READS", "type": [ "null", { "type": "enum", "symbols": [ "KNOWNS_ONLY", "USE_READS", "USE_SW" ], "name": "consensus_determination_model" } ], "inputBinding": { "position": 0, "prefix": "--consensusDeterminationModel", "separate": true, "sbg:cmdInclude": true }, "label": "Consensus Determination Model", "description": "Determines how to compute the possible alternate consensuses.", "id": "#consensus_determination_model" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred-scaled). Default value is 40. 
30 is perhaps better for whole-genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Realigned BAM", "description": "Realigned BAM.", "sbg:fileTypes": "BAM", "outputBinding": { "glob": "*.realigned.bam", "sbg:metadata": { "intervals_file": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.target_intervals)\n if($job.inputs.target_intervals.metadata)\n if('intervals_file' in $job.inputs.target_intervals.metadata)\n return $job.inputs.target_intervals.metadata.intervals_file\n return 'NO_INTERVALS'\n}" } }, "sbg:inheritMetadataFrom": "#reads", "secondaryFiles": [ ".bai", "^.bai" ] }, "id": "#realigned_bam_file" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n 
\treturn $job.inputs.cpu_per_job\n }\n\treturn 1\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n return read_namebase + '.realigned.bam'\n}" } } ], "sbg:job": { "inputs": { "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "target_intervals": { "class": "File", "metadata": { "intervals_file": "treterfgsdfsd.4444" }, "path": "intervalListFromRTC.intervals", "secondaryFiles": [], "size": 0 }, "tag": null, "remove_program_records": null, "reference": { "path": "/folder/reference.fasta" }, "reads": [ { "path": "/folder/input.bam" } ], "read_group_black_list": [], "read_filter": [], "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "non_deterministic_random_seed": null, "no_original_alignment_tags": null, "memory_per_job": null, "memory_overhead_per_job": 1000, "max_runtime_units": null, "max_runtime": null, "max_reads_in_memory": null, "max_reads_for_realignment": null, "max_reads_for_consensuses": null, "max_positional_move_allowed": null, "max_isize_for_movement": 
null, "max_consensuses": null, "lod_threshold_for_cleaning": null, "known_alleles": [ { "path": "/folder/indels.vcf" } ], "keep_program_records": null, "intervals_file": [ { "class": "File", "path": "/path/to/intervals_file-1.ext", "secondaryFiles": [], "size": 0 }, { "class": "File", "path": "/path/to/intervals_file-2.ext", "secondaryFiles": [], "size": 0 } ], "intervals": "", "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "entropy_threshold": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "cpu_per_job": null, "consensus_determination_model": null, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 3048, "cpu": 1 } }, "sbg:categories": [ "Alignment" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type IndelRealigner --reference_sequence /folder/reference.fasta --input_file /folder/input.bam --targetIntervals intervalListFromRTC.intervals --out input.realigned.bam", "sbg:contributors": [ "vladimirk", "bix-demo", "bogdang" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911378, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-indelrealigner/20", "sbg:image_url": null, "sbg:latestRevision": 13, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": 
"https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_indels_IndelRealigner.php", "label": "Documentation" } ], "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1478713467, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:revision": 13, "sbg:revisionNotes": "Output name based on reads filename", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911378, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911378, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911379, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911380, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911381, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911382, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1460993599, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472651971, "sbg:revision": 7, "sbg:revisionNotes": "Scatter metadata support." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472652361, "sbg:revision": 8, "sbg:revisionNotes": "metadata scatter 2" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472655804, "sbg:revision": 9, "sbg:revisionNotes": "scatter metadata 3" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472724542, "sbg:revision": 10, "sbg:revisionNotes": ".bai as secondary." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472738930, "sbg:revision": 11, "sbg:revisionNotes": "output single file." 
}, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476371352, "sbg:revision": 12, "sbg:revisionNotes": "Added support for run without intervals" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1478713467, "sbg:revision": 13, "sbg:revisionNotes": "Output name based on reads filename" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 1421.667051858386, "y": 414.4274774032482 }, "label": "GATK IndelRealigner", "scatter": "#GATK_IndelRealigner.target_intervals", "sbg:x": 1421.667051858386, "sbg:y": 414.4274774032482 }, { "id": "#BWA_MEM_Bundle_0_7_13", "inputs": [ { "id": "#BWA_MEM_Bundle_0_7_13.total_memory", "default": 54 }, { "id": "#BWA_MEM_Bundle_0_7_13.threads", "default": 30 }, { "id": "#BWA_MEM_Bundle_0_7_13.sambamba_threads", "default": 30 }, { "id": "#BWA_MEM_Bundle_0_7_13.reference_index_tar", "source": [ "#BWA_INDEX.indexed_reference" ] }, { "id": "#BWA_MEM_Bundle_0_7_13.output_format", "default": "SortedBAM" }, { "id": "#BWA_MEM_Bundle_0_7_13.mark_shorter", "default": true }, { "id": "#BWA_MEM_Bundle_0_7_13.input_reads", "source": [ "#SBG_Pair_FASTQs_by_Metadata.tuple_list" ] }, { "id": "#BWA_MEM_Bundle_0_7_13.filter_out_secondary_alignments", "default": true }, { "id": "#BWA_MEM_Bundle_0_7_13.deduplication", "default": "MarkDuplicates" } ], "outputs": [ { "id": "#BWA_MEM_Bundle_0_7_13.aligned_reads" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/bwa-mem-bundle-0-7-13/59", "label": "BWA-MEM Bundle 0.7.13", "description": "**BWA MEM** is an algorithm designed for aligning sequence reads onto a large reference genome. BWA MEM is implemented as a component of BWA. The algorithm can automatically choose between performing end-to-end and local alignments. BWA MEM is capable of outputting multiple alignments, and finding chimeric reads. 
It can be applied to a wide range of read lengths, from 70 bp to several megabases. \n\nTo enable additional fast processing of aligned reads, two tools are bundled into the same package with BWA MEM (0.7.13): Samblaster (0.1.22) and Sambamba (v0.6.0). \nIf deduplication of alignments is needed, it can be enabled with the 'PCR duplicate detection' parameter. **Samblaster** will be used internally to perform this action.\nBesides the standard BWA MEM SAM output file, the BWA MEM package has been extended to support two additional output options: a BAM file obtained by piping through **Sambamba view** while filtering out the secondary alignments, and a Coordinate Sorted BAM option that additionally pipes the output through **Sambamba sort**, along with an accompanying .bai file produced by **Sambamba sort** as a side effect. The parameters responsible for these additional features are 'Filter out secondary alignments' and 'Output format'. Data is passed from BWA MEM to Samblaster and Sambamba through pipes, which avoids writing the aligned reads to the hard drive and reading them back twice. \n\nFor input FASTQ files with a total size of less than 10 GB, we suggest using the default 'Total memory' setting of 15 GB; for larger files, we suggest using 58 GB of memory and 32 CPU cores.\n\n**Important:**\nBWA MEM Bundle requires a FASTA reference file accompanied by **BWA FASTA indices** in a TAR file.\nThere is a **known issue** with Samblaster: it does not support processing when the number of sequences in the FASTA file is larger than 32768. 
In this case, do not use the deduplication option, because the output BAM will be corrupted.", "baseCommand": [ { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n cmd = \"/bin/bash -c \\\"\"\n return cmd\n}" }, { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference_index_tar.path.split('/')[$job.inputs.reference_index_tar.path.split('/').length-1]\n return 'tar -xf ' + reference_file + ' ; '\n \n}" }, "/opt/bwa-0.7.13/bwa", "mem" ], "inputs": [ { "sbg:category": "BWA Input/output options", "sbg:toolDefaultValue": "3", "type": [ "null", { "type": "enum", "symbols": [ "1", "2", "3", "4" ], "name": "verbose_level" } ], "inputBinding": { "position": 0, "prefix": "-v", "separate": true, "sbg:cmdInclude": true }, "label": "Verbose level", "description": "Verbose level: 1=error, 2=warning, 3=message, 4+=debugging.", "id": "#verbose_level" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-Y", "separate": true, "sbg:cmdInclude": true }, "label": "Use soft clipping", "description": "Use soft clipping for supplementary alignments.", "id": "#use_soft_clipping" }, { "sbg:category": "BWA Scoring options", "sbg:toolDefaultValue": "17", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-U", "separate": true, "sbg:cmdInclude": true }, "label": "Unpaired read penalty", "description": "Penalty for an unpaired read pair.", "id": "#unpaired_read_penalty" }, { "sbg:category": "Execution", "sbg:stageInput": null, "sbg:toolDefaultValue": "15", "type": [ "null", "int" ], "label": "Total memory", "description": "Total memory to be used by the tool in GB. It is the sum of the memory used by BWA, Sambamba Sort and Samblaster. 
For FASTQ files with a total size of less than 10 GB, we suggest using the default setting of 15 GB; for larger files, we suggest using 58 GB of memory (and 32 CPU cores).", "id": "#total_memory" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "8", "type": [ "null", "int" ], "label": "Threads", "description": "Number of threads for the BWA, Samblaster and Sambamba sort processes.", "id": "#threads" }, { "sbg:category": "BWA Input/output options", "type": [ "null", { "type": "array", "items": "float" } ], "inputBinding": { "position": 0, "prefix": "-I", "separate": false, "sbg:cmdInclude": true }, "label": "Specify distribution parameters", "description": "Specify the mean, standard deviation (10% of the mean if absent), max (4 sigma from the mean if absent) and min of the insert size distribution. FR orientation only. This array can have at most four values, where the first two should be specified as FLOAT and the last two as INT.", "id": "#speficy_distribution_parameters" }, { "sbg:category": "Execution", "type": [ "null", "int" ], "label": "Memory for BAM sorting", "description": "Amount of RAM [GB] to give to the sorting algorithm (if not provided, it will be set to one third of the total memory).", "id": "#sort_memory" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-p", "separate": true, "sbg:cmdInclude": true }, "label": "Smart pairing in input FASTQ file", "description": "Smart pairing in input FASTQ file (ignoring in2.fq).", "id": "#smart_pairing_in_input_fastq" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "500", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-c", "separate": true, "sbg:cmdInclude": true }, "label": "Skip seeds with more than INT occurrences", "description": "Skip seeds with more than INT occurrences.", "id": "#skip_seeds" }, { "sbg:category": "BWA Algorithm options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, 
"prefix": "-P", "separate": true, "sbg:cmdInclude": true }, "label": "Skip pairing", "description": "Skip pairing; mate rescue performed unless -S also in use.", "id": "#skip_pairing" }, { "sbg:category": "BWA Algorithm options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-S", "separate": true, "sbg:cmdInclude": true }, "label": "Skip mate rescue", "description": "Skip mate rescue.", "id": "#skip_mate_rescue" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "1.5", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "-r", "separate": true, "sbg:cmdInclude": true }, "label": "Select seeds", "description": "Look for internal seeds inside a seed longer than {-k} * FLOAT.", "id": "#select_seeds" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "20", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-y", "separate": true, "sbg:cmdInclude": true }, "label": "Seed occurrence for the 3rd round", "description": "Seed occurrence for the 3rd round seeding.", "id": "#seed_occurrence_for_the_3rd_round" }, { "sbg:category": "BWA Scoring options", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-A", "separate": true, "sbg:cmdInclude": true }, "label": "Score for a sequence match", "description": "Score for a sequence match, which scales options -TdBOELU unless overridden.", "id": "#score_for_a_sequence_match" }, { "sbg:category": "Execution", "type": [ "null", "int" ], "label": "Sambamba Sort threads", "description": "Number of threads to pass to Sambamba sort, if used.", "id": "#sambamba_threads" }, { "sbg:category": "BWA Read Group Options", "sbg:toolDefaultValue": "Inferred from metadata", "type": [ "null", "string" ], "label": "Sample ID", "description": "Specify the sample ID for RG line - A human readable identifier for a sample or specimen, which could contain some metadata information. 
A sample or specimen is material taken from a biological entity for testing, diagnosis, propagation, treatment, or research purposes, including but not limited to tissues, body fluids, cells, organs, embryos, body excretory products, etc.", "id": "#rg_sample_id" }, { "sbg:category": "BWA Read Group Options", "sbg:toolDefaultValue": "Inferred from metadata", "type": [ "null", "string" ], "label": "Platform unit ID", "description": "Specify the platform unit (lane/slide) for RG line - An identifier for lanes (Illumina), or for slides (SOLiD) in the case that a library was split and ran over multiple lanes on the flow cell or slides.", "id": "#rg_platform_unit_id" }, { "sbg:category": "BWA Read Group Options", "sbg:toolDefaultValue": "Inferred from metadata", "type": [ "null", { "type": "enum", "symbols": [ "454", "Helicos", "Illumina", "Solid", "IonTorrent" ], "name": "rg_platform" } ], "label": "Platform", "description": "Specify the version of the technology that was used for sequencing, which will be placed in RG line.", "id": "#rg_platform" }, { "sbg:category": "BWA Read Group Options", "type": [ "null", "string" ], "label": "Median fragment length", "description": "Specify the median fragment length for RG line.", "id": "#rg_median_fragment_length" }, { "sbg:category": "BWA Read Group Options", "sbg:toolDefaultValue": "Inferred from metadata", "type": [ "null", "string" ], "label": "Library ID", "description": "Specify the identifier for the sequencing library preparation, which will be placed in RG line.", "id": "#rg_library_id" }, { "sbg:category": "Configuration", "sbg:toolDefaultValue": "1", "type": [ "null", "string" ], "label": "Read group ID", "description": "Read group ID", "id": "#rg_id" }, { "sbg:category": "BWA Read Group Options", "type": [ "null", "string" ], "label": "Data submitting center", "description": "Specify the data submitting center for RG line.", "id": "#rg_data_submitting_center" }, { "sbg:category": "Configuration", "sbg:stageInput": 
null, "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "Reserved number of threads on the instance", "description": "Reserved number of threads on the instance used by the scheduler.", "id": "#reserved_threads" }, { "required": true, "sbg:category": "Input files", "sbg:stageInput": "link", "type": [ "File" ], "label": "Reference Index TAR", "description": "Reference FASTA file with BWA index files packed in a TAR archive.", "sbg:fileTypes": "TAR", "id": "#reference_index_tar" }, { "sbg:category": "BWA Scoring options", "type": [ "null", { "type": "enum", "symbols": [ "pacbio", "ont2d", "intractg" ], "name": "read_type" } ], "inputBinding": { "position": 0, "prefix": "-x", "separate": true, "sbg:cmdInclude": true }, "label": "Sequencing technology-specific settings", "description": "Sequencing technology-specific settings; setting -x changes multiple parameters unless overridden. pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0 (PacBio reads to ref). ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0 (Oxford Nanopore 2D-reads to ref). intractg: -B9 -O16 -L5 (intra-species contigs to ref).", "id": "#read_type" }, { "sbg:category": "BWA Read Group Options", "sbg:toolDefaultValue": "Constructed from per-attribute parameters or inferred from metadata.", "type": [ "null", "string" ], "label": "Read group header", "description": "Read group header line such as '@RG\\tID:foo\\tSM:bar'. This value takes precedence over per-attribute parameters.", "id": "#read_group_header" }, { "sbg:category": "Configuration", "type": [ "null", "string" ], "label": "Output SAM/BAM file name", "description": "Name of the output BAM file.", "id": "#output_name" }, { "sbg:category": "BWA Input/output options", "sbg:toolDefaultValue": "[5, 200]", "type": [ "null", { "type": "array", "items": "int" } ], "inputBinding": { "position": 0, "prefix": "-h", "separate": false, "itemSeparator": ",", "sbg:cmdInclude": true }, "label": "Output in XA", "description": "If there are fewer than INT hits with a score above 80% of the max score, output all of them in the XA tag. 
This array should have no more than two values.", "id": "#output_in_xa" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-V", "separate": true, "sbg:cmdInclude": true }, "label": "Output header", "description": "Output the reference FASTA header in the XR tag.", "id": "#output_header" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "SortedBAM", "type": [ "null", { "type": "enum", "symbols": [ "SAM", "BAM", "SortedBAM" ], "name": "output_format" } ], "label": "Output format", "description": "Specify output format (Sorted BAM option will output coordinate sorted BAM).", "id": "#output_format" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-a", "separate": true, "sbg:cmdInclude": true }, "label": "Output alignments", "description": "Output all alignments for SE or unpaired PE.", "id": "#output_alignments" }, { "sbg:category": "BWA Scoring options", "sbg:toolDefaultValue": "4", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-B", "separate": true, "sbg:cmdInclude": true }, "label": "Mismatch penalty", "description": "Penalty for a mismatch.", "id": "#mismatch_penalty" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "19", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-k", "separate": true, "sbg:cmdInclude": true }, "label": "Minimum seed length", "description": "Minimum seed length for BWA MEM.", "id": "#minimum_seed_length" }, { "sbg:category": "BWA Input/output options", "sbg:toolDefaultValue": "30", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-T", "separate": true, "sbg:cmdInclude": true }, "label": "Minimum alignment score for a read to be output in SAM/BAM", "description": "Minimum alignment score for a read to be output in SAM/BAM.", "id": "#minimum_output_score" }, { "sbg:category": "BWA Algorithm options", 
"sbg:toolDefaultValue": "50", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-m", "separate": true, "sbg:cmdInclude": true }, "label": "Mate rescue rounds", "description": "Perform at most INT rounds of mate rescues for each read.", "id": "#mate_rescue_rounds" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-M", "separate": true, "sbg:cmdInclude": true }, "label": "Mark shorter", "description": "Mark shorter split hits as secondary.", "id": "#mark_shorter" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-H", "separate": true, "sbg:cmdInclude": true }, "label": "Insert string to output SAM or BAM header", "description": "Insert STR to header if it starts with @; or insert lines in FILE.", "id": "#insert_string_to_header" }, { "required": true, "sbg:category": "Input files", "type": [ { "type": "array", "items": "File" } ], "label": "Input reads", "description": "Input sequence reads.", "sbg:fileTypes": "FASTQ, FASTQ.GZ, FQ, FQ.GZ", "id": "#input_reads" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-j", "separate": true, "sbg:cmdInclude": true }, "label": "Ignore ALT file", "description": "Treat ALT contigs as part of the primary assembly (i.e. ignore .alt file).", "id": "#ignore_alt_file" }, { "sbg:category": "BWA Scoring options", "sbg:toolDefaultValue": "[6,6]", "type": [ "null", { "type": "array", "items": "int" } ], "inputBinding": { "position": 0, "prefix": "-O", "separate": false, "itemSeparator": ",", "sbg:cmdInclude": true }, "label": "Gap open penalties", "description": "Gap open penalties for deletions and insertions. 
This array can't have more than two values.", "id": "#gap_open_penalties" }, { "sbg:category": "BWA Scoring options", "sbg:toolDefaultValue": "[1,1]", "type": [ "null", { "type": "array", "items": "int" } ], "inputBinding": { "position": 0, "prefix": "-E", "separate": false, "itemSeparator": ",", "sbg:cmdInclude": true }, "label": "Gap extension", "description": "Gap extension penalty; a gap of size k costs '{-O} + {-E}*k'. This array can't have more than two values.", "id": "#gap_extension_penalties" }, { "sbg:category": "Execution", "sbg:stageInput": null, "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "label": "Filter out secondary alignments", "description": "Filter out secondary alignments. The Sambamba view tool will be used to perform this internally.", "id": "#filter_out_secondary_alignments" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "100", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-d", "separate": true, "sbg:cmdInclude": true }, "label": "Dropoff", "description": "Off-diagonal X-dropoff.", "id": "#dropoff" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "0.50", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "-D", "separate": true, "sbg:cmdInclude": true }, "label": "Drop chains fraction", "description": "Drop chains shorter than FLOAT fraction of the longest overlapping chain.", "id": "#drop_chains_fraction" }, { "sbg:category": "BWA Algorithm options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-e", "separate": true, "sbg:cmdInclude": true }, "label": "Discard exact matches", "description": "Discard full-length exact matches.", "id": "#discard_exact_matches" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-W", "separate": true, "sbg:cmdInclude": true }, "label": "Discard chain length", "description": "Discard a 
chain if seeded bases shorter than INT.", "id": "#discard_chain_length" }, { "sbg:category": "Samblaster parameters", "sbg:toolDefaultValue": "MarkDuplicates", "type": [ "null", { "type": "enum", "symbols": [ "None", "MarkDuplicates", "RemoveDuplicates" ], "name": "deduplication" } ], "label": "PCR duplicate detection", "description": "Use Samblaster for finding duplicates on sequence reads.", "id": "#deduplication" }, { "sbg:category": "BWA Scoring options", "sbg:toolDefaultValue": "[5,5]", "type": [ "null", { "type": "array", "items": "int" } ], "inputBinding": { "position": 0, "prefix": "-L", "separate": false, "itemSeparator": ",", "sbg:cmdInclude": true }, "label": "Clipping penalty", "description": "Penalty for 5'- and 3'-end clipping.", "id": "#clipping_penalty" }, { "sbg:category": "BWA Algorithm options", "sbg:toolDefaultValue": "100", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "-w", "separate": true, "sbg:cmdInclude": true }, "label": "Band width", "description": "Band width for banded alignment.", "id": "#band_width" }, { "sbg:category": "BWA Input/output options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-C", "separate": true, "sbg:cmdInclude": true }, "label": "Append comment", "description": "Append FASTA/FASTQ comment to SAM output.", "id": "#append_comment" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Aligned SAM/BAM", "description": "Aligned reads.", "sbg:fileTypes": "SAM, BAM", "outputBinding": { "glob": "{*.sam,*.bam}", "sbg:metadata": { "reference_genome": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference_index_tar.path.split('/')[$job.inputs.reference_index_tar.path.split('/').length-1]\n name = reference_file.slice(0, -4) // cut .tar extension \n \n name_list = name.split('.')\n ext = name_list[name_list.length-1]\n\n if (ext == 'gz' || ext == 'GZ'){\n a = name_list.pop() // strip fasta.gz\n a = name_list.pop()\n } 
else\n a = name_list.pop() //strip only fasta/fa\n \n return name_list.join('.')\n \n}" } }, "sbg:inheritMetadataFrom": "#input_reads", "secondaryFiles": [ ".bai", "^.bai" ] }, "id": "#aligned_reads" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n // Calculate suggested number of CPUs depending on the input reads size\n if ($job.inputs.input_reads.constructor == Array){\n if ($job.inputs.input_reads[1]){\n reads_size = $job.inputs.input_reads[0].size + $job.inputs.input_reads[1].size\n } else{\n reads_size = $job.inputs.input_reads[0].size\n }\n }\n else{\n reads_size = $job.inputs.input_reads.size\n }\n if(!reads_size) { reads_size = 0 }\n\n\n GB_1 = 1024*1024*1024\n if(reads_size < GB_1){ suggested_cpus = 1 }\n else if(reads_size < 10 * GB_1){ suggested_cpus = 8 }\n else { suggested_cpus = 31 }\n \n if($job.inputs.reserved_threads){ return $job.inputs.reserved_threads }\n else if($job.inputs.threads){ return $job.inputs.threads } \n else if($job.inputs.sambamba_threads) { return $job.inputs.sambamba_threads }\n else{ return suggested_cpus }\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n\n // Calculate suggested memory (GB) depending on the input reads size\n if ($job.inputs.input_reads.constructor == Array){\n if ($job.inputs.input_reads[1]){\n reads_size = $job.inputs.input_reads[0].size + $job.inputs.input_reads[1].size\n } else{\n reads_size = $job.inputs.input_reads[0].size\n }\n }\n else{\n reads_size = $job.inputs.input_reads.size\n }\n if(!reads_size) { reads_size = 0 }\n \n GB_1 = 1024*1024*1024\n if(reads_size < GB_1){ suggested_memory = 4 }\n else if(reads_size < 10 * GB_1){ suggested_memory = 15 }\n else { suggested_memory = 58
}\n \n if($job.inputs.total_memory){ \t\n return $job.inputs.total_memory* 1024 \n } \n else if($job.inputs.sort_memory){\n return $job.inputs.sort_memory* 1024\n }\n else{ \t\n return suggested_memory * 1024 \n }\n}" } }, { "class": "DockerRequirement", "dockerPull": "images.sbgenomics.com/vladimirk/bwa:0.7.13" } ], "arguments": [ { "position": 111, "prefix": "", "separate": false, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n ///////////////////////////////////////////\n /// SAMBAMBA VIEW //////////////////////\n ///////////////////////////////////////////\nfunction common_substring(a,b) {\n var i = 0;\n \n while(a[i] === b[i] && i < a.length)\n {\n i = i + 1;\n }\n\n return a.slice(0, i);\n}\n \n // Set output file name\n if($job.inputs.input_reads[0] instanceof Array){\n input_1 = $job.inputs.input_reads[0][0] // scatter mode\n input_2 = $job.inputs.input_reads[0][1]\n } else if($job.inputs.input_reads instanceof Array){\n input_1 = $job.inputs.input_reads[0]\n input_2 = $job.inputs.input_reads[1]\n }else {\n input_1 = [].concat($job.inputs.input_reads)[0]\n input_2 = input_1\n }\n full_name = input_1.path.split('/')[input_1.path.split('/').length-1] \n\n if($job.inputs.output_name){name = $job.inputs.output_name }\n else if ($job.inputs.input_reads.length == 1){ \n name = full_name\n\n if(name.slice(-3, name.length) === '.gz' || name.slice(-3, name.length) === '.GZ')\n name = name.slice(0, -3) \n if(name.slice(-3, name.length) === '.fq' || name.slice(-3, name.length) === '.FQ')\n name = name.slice(0, -3)\n if(name.slice(-6, name.length) === '.fastq' || name.slice(-6, name.length) === '.FASTQ')\n name = name.slice(0, -6)\n \n }else{\n full_name2 = input_2.path.split('/')[input_2.path.split('/').length-1] \n name = common_substring(full_name, full_name2)\n \n if(name.slice(-1, name.length) === '_' || name.slice(-1, name.length) === '.')\n name = name.slice(0, -1)\n if(name.slice(-2, name.length) === 'p_' || name.slice(-2,
name.length) === 'p.')\n name = name.slice(0, -2)\n if(name.slice(-2, name.length) === 'P_' || name.slice(-2, name.length) === 'P.')\n name = name.slice(0, -2)\n if(name.slice(-3, name.length) === '_p_' || name.slice(-3, name.length) === '.p.')\n name = name.slice(0, -3)\n if(name.slice(-3, name.length) === '_pe' || name.slice(-3, name.length) === '.pe')\n name = name.slice(0, -3)\n }\n \n // Read number of threads if defined\n if ($job.inputs.sambamba_threads){\n threads = $job.inputs.sambamba_threads\n }\n else if ($job.inputs.threads){\n threads = $job.inputs.threads\n }\n else { threads = 8 }\n \n if ($job.inputs.filter_out_secondary_alignments){\n filt_sec = ' --filter \\'not secondary_alignment\\' '\n }\n else {filt_sec=' '}\n \n // Set output command\n sambamba_path = '/opt/sambamba_v0.6.0'\n if ($job.inputs.output_format == 'BAM') {\n return \"| \" + sambamba_path + \" view -t \"+ threads + filt_sec + \"-f bam -S /dev/stdin -o \"+ name + \".bam\"\n }\n else if ($job.inputs.output_format == 'SAM'){ // SAM\n return \"> \" + name + \".sam\"\n }\n else { // SortedBAM is considered default\n return \"| \" + sambamba_path + \" view -t \"+ threads + filt_sec + \"-f bam -l 0 -S /dev/stdin\"\n }\n\n}" } }, { "position": 112, "separate": false, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n ///////////////////////////////////////////\n /// SAMBAMBA SORT //////////////////////\n///////////////////////////////////////////\n \nfunction common_substring(a,b) {\n var i = 0;\n while(a[i] === b[i] && i < a.length)\n {\n i = i + 1;\n }\n\n return a.slice(0, i);\n}\n\n // Set output file name\n if($job.inputs.input_reads[0] instanceof Array){\n input_1 = $job.inputs.input_reads[0][0] // scatter mode\n input_2 = $job.inputs.input_reads[0][1]\n } else if($job.inputs.input_reads instanceof Array){\n input_1 = $job.inputs.input_reads[0]\n input_2 = $job.inputs.input_reads[1]\n }else {\n input_1 = [].concat($job.inputs.input_reads)[0]\n input_2 =
input_1\n }\n full_name = input_1.path.split('/')[input_1.path.split('/').length-1] \n \n if($job.inputs.output_name){name = $job.inputs.output_name }\n else if ($job.inputs.input_reads.length == 1){\n name = full_name\n if(name.slice(-3, name.length) === '.gz' || name.slice(-3, name.length) === '.GZ')\n name = name.slice(0, -3) \n if(name.slice(-3, name.length) === '.fq' || name.slice(-3, name.length) === '.FQ')\n name = name.slice(0, -3)\n if(name.slice(-6, name.length) === '.fastq' || name.slice(-6, name.length) === '.FASTQ')\n name = name.slice(0, -6)\n \n }else{\n full_name2 = input_2.path.split('/')[input_2.path.split('/').length-1] \n name = common_substring(full_name, full_name2)\n \n if(name.slice(-1, name.length) === '_' || name.slice(-1, name.length) === '.')\n name = name.slice(0, -1)\n if(name.slice(-2, name.length) === 'p_' || name.slice(-2, name.length) === 'p.')\n name = name.slice(0, -2)\n if(name.slice(-2, name.length) === 'P_' || name.slice(-2, name.length) === 'P.')\n name = name.slice(0, -2)\n if(name.slice(-3, name.length) === '_p_' || name.slice(-3, name.length) === '.p.')\n name = name.slice(0, -3)\n if(name.slice(-3, name.length) === '_pe' || name.slice(-3, name.length) === '.pe')\n name = name.slice(0, -3)\n }\n\n //////////////////////////\n // Set sort memory size\n \n reads_size = 0 // Input size is not used because it is not always available\n GB_1 = 1024*1024*1024\n if(reads_size < GB_1){ \n suggested_memory = 4\n suggested_cpus = 1\n }\n else if(reads_size < 10 * GB_1){ \n suggested_memory = 15\n suggested_cpus = 8\n }\n else { \n suggested_memory = 58 \n suggested_cpus = 31\n }\n \n \n if(!$job.inputs.total_memory){ total_memory = suggested_memory }\n else{ total_memory = $job.inputs.total_memory }\n\n // TODO: Rough estimation, should be fine-tuned!\n if(total_memory > 16){ sorter_memory = parseInt(total_memory / 3) }\n else{ sorter_memory = 5 }\n \n if ($job.inputs.sort_memory){\n sorter_memory_string = $job.inputs.sort_memory
+'GiB'\n }\n else sorter_memory_string = sorter_memory + 'GiB' \n \n // Read number of threads if defined \n if ($job.inputs.sambamba_threads){\n threads = $job.inputs.sambamba_threads\n }\n else if ($job.inputs.threads){\n threads = $job.inputs.threads\n }\n else threads = suggested_cpus\n \n sambamba_path = '/opt/sambamba_v0.6.0'\n \n // SortedBAM is considered default\n if (!(($job.inputs.output_format == 'BAM') || ($job.inputs.output_format == 'SAM'))){\n cmd = \"| \" + sambamba_path + \" sort -t \" + threads\n return cmd + \" -m \"+sorter_memory_string+\" --tmpdir ./ -o \"+ name +\".bam -l 5 /dev/stdin\"\n }\n else return \"\"\n}\n \n" } }, { "position": 110, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n ///////////////////////////////////////////\n /// SAMBLASTER //////////////////////\n ///////////////////////////////////////////\n if ($job.inputs.deduplication == \"MarkDuplicates\"){\n return \"| /opt/samblaster/samblaster -i /dev/stdin -o /dev/stdout\"\n }\n else if ($job.inputs.deduplication == \"RemoveDuplicates\"){\n return \"| /opt/samblaster/samblaster -r -i /dev/stdin -o /dev/stdout\"\n }\n else{\n return \"\" \n }\n}" } }, { "position": 1, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n \n if($job.inputs.read_group_header){\n \treturn '-R ' + $job.inputs.read_group_header\n }\n \n function add_param(key, val){\n if(!val){\n return\n\t}\n param_list.push(key + ':' + val)\n }\n\n param_list = []\n\n // Select the first input file; its metadata is used to fill the read group\n if($job.inputs.input_reads[0] instanceof Array){\n input_1 = $job.inputs.input_reads[0][0] // scatter mode\n } else if($job.inputs.input_reads instanceof Array){\n input_1 = $job.inputs.input_reads[0]\n }else {\n input_1 = [].concat($job.inputs.input_reads)[0]\n }\n \n //Read metadata for input reads\n read_metadata = input_1.metadata\n if(!read_metadata) read_metadata = []\n\n 
if($job.inputs.rg_id){\n add_param('ID',
$job.inputs.rg_id)\n }\n else {\n add_param('ID', '1')\n } \n \n \n if($job.inputs.rg_data_submitting_center){\n \tadd_param('CN', $job.inputs.rg_data_submitting_center)\n }\n else if('data_submitting_center' in read_metadata){\n \tadd_param('CN', read_metadata.data_submitting_center)\n }\n \n if($job.inputs.rg_library_id){\n \tadd_param('LB', $job.inputs.rg_library_id)\n }\n else if('library_id' in read_metadata){\n \tadd_param('LB', read_metadata.library_id)\n }\n \n if($job.inputs.rg_median_fragment_length){\n \tadd_param('PI', $job.inputs.rg_median_fragment_length)\n }\n\n \n if($job.inputs.rg_platform){\n \tadd_param('PL', $job.inputs.rg_platform)\n }\n else if('platform' in read_metadata){\n if(read_metadata.platform == 'HiSeq X Ten'){\n rg_platform = 'Illumina'\n }\n else{\n rg_platform = read_metadata.platform\n }\n \tadd_param('PL', rg_platform)\n }\n \n if($job.inputs.rg_platform_unit_id){\n \tadd_param('PU', $job.inputs.rg_platform_unit_id)\n }\n else if('platform_unit_id' in read_metadata){\n \tadd_param('PU', read_metadata.platform_unit_id)\n }\n \n if($job.inputs.rg_sample_id){\n \tadd_param('SM', $job.inputs.rg_sample_id)\n }\n else if('sample_id' in read_metadata){\n \tadd_param('SM', read_metadata.sample_id)\n }\n \n return \"-R '@RG\\\\t\" + param_list.join('\\\\t') + \"'\"\n \n}" } }, { "position": 101, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n /////// Set input reads in the correct order depending on the paired_end metadata\n \n // Select the input reads (one scatter tuple or a plain list)\n if($job.inputs.input_reads[0] instanceof Array){\n input_reads = $job.inputs.input_reads[0] // scatter mode\n } else {\n input_reads = $job.inputs.input_reads = [].concat($job.inputs.input_reads)\n }\n \n \n //Read metadata for input reads\n read_metadata = input_reads[0].metadata\n if(!read_metadata) read_metadata = []\n \n order = 0 // Consider this as normal order given at input: pe1 pe2\n \n // 
Check if paired end 1 corresponds to
the first given read\n if(read_metadata == []){ order = 0 }\n else if('paired_end' in read_metadata){ \n pe1 = read_metadata.paired_end\n if(pe1 != 1) order = 1 // change order\n }\n\n // Return reads in the correct order\n if (input_reads.length == 1){\n return input_reads[0].path // Only one read present\n }\n else if (input_reads.length == 2){\n if (order == 0) return input_reads[0].path + ' ' + input_reads[1].path\n else return input_reads[1].path + ' ' + input_reads[0].path\n }\n\n}" } }, { "position": 2, "prefix": "-t", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n \n reads_size = 0 \n\n GB_1 = 1024*1024*1024\n if(reads_size < GB_1){ suggested_threads = 1 }\n else if(reads_size < 10 * GB_1){ suggested_threads = 8 }\n else { suggested_threads = 31 }\n \n \n if(!$job.inputs.threads){ \treturn suggested_threads } \n else{ return $job.inputs.threads }\n}" } }, { "position": 10, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n reference_file = $job.inputs.reference_index_tar.path.split('/')[$job.inputs.reference_index_tar.path.split('/').length-1]\n name = reference_file.slice(0, -4) // cut .tar extension \n \n return name\n \n}" } }, { "position": 10000, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n cmd = \";declare -i pipe_statuses=(\\\\${PIPESTATUS[*]});len=\\\\${#pipe_statuses[@]};declare -i tot=0;echo \\\\${pipe_statuses[*]};for (( i=0; i<\\\\${len}; i++ ));do if [ \\\\${pipe_statuses[\\\\$i]} -ne 0 ];then tot=\\\\${pipe_statuses[\\\\$i]}; fi;done;if [ \\\\$tot -ne 0 ]; then >&2 echo Error in piping. 
Pipe statuses: \\\\${pipe_statuses[*]};fi; if [ \\\\$tot -ne 0 ]; then false;fi\\\"\"\n return cmd\n}" } } ], "sbg:job": { "inputs": { "verbose_level": null, "use_soft_clipping": null, "unpaired_read_penalty": null, "total_memory": null, "threads": null, "speficy_distribution_parameters": [], "sort_memory": 0, "smart_pairing_in_input_fastq": null, "skip_seeds": null, "skip_pairing": null, "skip_mate_rescue": null, "select_seeds": null, "seed_occurrence_for_the_3rd_round": null, "score_for_a_sequence_match": null, "sambamba_threads": null, "rg_sample_id": "", "rg_platform_unit_id": "", "rg_platform": null, "rg_median_fragment_length": "", "rg_library_id": "", "rg_id": "rg_id-string-value", "rg_data_submitting_center": "", "reserved_threads": 3, "reference_index_tar": { "class": "File", "path": "/path/to/reference.b37.fasta.gz.tar", "secondaryFiles": [ { "path": ".amb" }, { "path": ".ann" }, { "path": ".bwt" }, { "path": ".pac" }, { "path": ".sa" } ], "size": 0 }, "read_type": null, "read_group_header": "", "output_name": "", "output_in_xa": [], "output_header": null, "output_format": null, "output_alignments": null, "mismatch_penalty": null, "minimum_seed_length": null, "minimum_output_score": null, "mate_rescue_rounds": null, "mark_shorter": null, "insert_string_to_header": null, "input_reads": [ { "class": "File", "metadata": { "paired_end": "2", "platform": "HiSeq X Ten", "sample_id": "dnk_sample" }, "path": "/path/to/LP6005524-DNA_C01_lane_7.sorted.converted.filtered.pe_1.gz", "secondaryFiles": [], "size": 30000000000 }, { "path": "/path/to/LP6005524-DNA_C01_lane_7.sorted.converted.filtered.pe_2.gz" } ], "ignore_alt_file": null, "gap_open_penalties": [], "gap_extension_penalties": [], "filter_out_secondary_alignments": false, "dropoff": null, "drop_chains_fraction": null, "discard_exact_matches": null, "discard_chain_length": null, "deduplication": "MarkDuplicates", "clipping_penalty": [], "band_width": null, "append_comment": null }, "allocatedResources": { 
"mem": 4096, "cpu": 3 } }, "sbg:categories": [ "Alignment", "FASTQ-Processing" ], "sbg:cmdPreview": "/bin/bash -c \" tar -xf reference.b37.fasta.gz.tar ; /opt/bwa-0.7.13/bwa mem -R '@RG\\tID:rg_id-string-value\\tPL:Illumina\\tSM:dnk_sample' -t 1 reference.b37.fasta.gz /path/to/LP6005524-DNA_C01_lane_7.sorted.converted.filtered.pe_2.gz /path/to/LP6005524-DNA_C01_lane_7.sorted.converted.filtered.pe_1.gz | /opt/samblaster/samblaster -i /dev/stdin -o /dev/stdout | /opt/sambamba_v0.6.0 view -t 8 -f bam -l 0 -S /dev/stdin | /opt/sambamba_v0.6.0 sort -t 1 -m 5GiB --tmpdir ./ -o LP6005524-DNA_C01_lane_7.sorted.converted.filtered.bam -l 5 /dev/stdin ;declare -i pipe_statuses=(\\${PIPESTATUS[*]});len=\\${#pipe_statuses[@]};declare -i tot=0;echo \\${pipe_statuses[*]};for (( i=0; i<\\${len}; i++ ));do if [ \\${pipe_statuses[\\$i]} -ne 0 ];then tot=\\${pipe_statuses[\\$i]}; fi;done;if [ \\$tot -ne 0 ]; then >&2 echo Error in piping. Pipe statuses: \\${pipe_statuses[*]};fi; if [ \\$tot -ne 0 ]; then false;fi\"", "sbg:contributors": [ "bogdang", "bix-demo", "vladimirk" ], "sbg:createdBy": "vladimirk", "sbg:createdOn": 1458653351, "sbg:id": "admin/sbg-public-data/bwa-mem-bundle-0-7-13/59", "sbg:image_url": null, "sbg:latestRevision": 37, "sbg:license": "BWA: GNU Affero General Public License v3.0, MIT License. Sambamba: GNU GENERAL PUBLIC LICENSE. 
Samblaster: The MIT License (MIT)", "sbg:links": [ { "id": "http://bio-bwa.sourceforge.net/", "label": "Homepage" }, { "id": "https://github.com/lh3/bwa", "label": "Source code" }, { "id": "http://bio-bwa.sourceforge.net/bwa.shtml", "label": "Wiki" }, { "id": "http://sourceforge.net/projects/bio-bwa/", "label": "Download" }, { "id": "http://arxiv.org/abs/1303.3997", "label": "Publication" }, { "id": "http://www.ncbi.nlm.nih.gov/pubmed/19451168", "label": "Publication BWA Algorithm" } ], "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1480437238, "sbg:project": "vladimirk/bwa-mem-bundle-0-7-13-demo", "sbg:revision": 37, "sbg:revisionNotes": "Added RG ID as optional input parameter", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458653351, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458653365, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458653397, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458653457, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458735076, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1458744323, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1460644019, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461676796, "sbg:revision": 7, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461677982, "sbg:revision": 8, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1461691955, "sbg:revision": 9, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1462799414, "sbg:revision": 10, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1462800334, 
"sbg:revision": 11, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465226602, "sbg:revision": 12, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465997760, "sbg:revision": 13, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1465999303, "sbg:revision": 14, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1466161520, "sbg:revision": 15, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1468500423, "sbg:revision": 16, "sbg:revisionNotes": "Change red port type - FIX." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1469448834, "sbg:revision": 17, "sbg:revisionNotes": "port renamed to reference index tar" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1469449249, "sbg:revision": 18, "sbg:revisionNotes": "reference_index_tar renamed in other expressions." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1470746327, "sbg:revision": 19, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1470747524, "sbg:revision": 20, "sbg:revisionNotes": "SortedBAM is default output type." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471860342, "sbg:revision": 21, "sbg:revisionNotes": "Fix for single-ended reads." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471864768, "sbg:revision": 22, "sbg:revisionNotes": "instanceof fix for common filename" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471866804, "sbg:revision": 23, "sbg:revisionNotes": "[]concat(input_reads)" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471868494, "sbg:revision": 24, "sbg:revisionNotes": "SortedBAM default - returned revision." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471879715, "sbg:revision": 25, "sbg:revisionNotes": "Fix for same common sub-strings." 
}, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471880959, "sbg:revision": 26, "sbg:revisionNotes": "Fix to support single FASTQ input." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472054931, "sbg:revision": 27, "sbg:revisionNotes": "reads_size for sorter made more robust." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472056751, "sbg:revision": 28, "sbg:revisionNotes": "FASTQs size use for memory and CPU estimation removed!" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472057639, "sbg:revision": 29, "sbg:revisionNotes": "reads size removed from estimating number of CPUs" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472122448, "sbg:revision": 30, "sbg:revisionNotes": "Added reference_genome metadata field to SAM/BAM." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472126991, "sbg:revision": 31, "sbg:revisionNotes": "BAM/SAM metadata, reference_genome in the same format as in drop down menu." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1476202723, "sbg:revision": 32, "sbg:revisionNotes": "Added reserved number of threads as an input." 
}, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1477616482, "sbg:revision": 33, "sbg:revisionNotes": "added piping command status check" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1479314087, "sbg:revision": 34, "sbg:revisionNotes": "BAM index output port removed" }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1479483637, "sbg:revision": 35, "sbg:revisionNotes": "Support for files with \"HiSeq X Ten\" in platform metadata field" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1479492159, "sbg:revision": 36, "sbg:revisionNotes": "fix platform" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1480437238, "sbg:revision": 37, "sbg:revisionNotes": "Added RG ID as optional input parameter" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Heng Li", "sbg:toolkit": "BWA", "sbg:toolkitVersion": "0.7.13", "sbg:validationErrors": [], "x": 942.3334725035608, "y": 143.00002728568273 }, "label": "BWA-MEM Bundle 0.7.13", "scatter": "#BWA_MEM_Bundle_0_7_13.input_reads", "sbg:x": 942.3334725035608, "sbg:y": 143.00002728568273 }, { "id": "#SBG_Pair_FASTQs_by_Metadata", "inputs": [ { "id": "#SBG_Pair_FASTQs_by_Metadata.fastq_list", "source": [ "#SBG_FASTQ_Quality_Adjuster.result" ] } ], "outputs": [ { "id": "#SBG_Pair_FASTQs_by_Metadata.tuple_list" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-pair-fastqs-by-metadata/9", "label": "SBG Pair FASTQs by Metadata", "description": "The tool accepts a list of FASTQ files and groups them into separate lists. This grouping is done using metadata values and their hierarchy (Sample ID > Library ID > Platform unit ID > File segment number), which should create a unique combination for each pair of FASTQ files. The relevant metadata fields are Sample ID, Library ID, Platform unit ID and File segment number. Not all four of these fields are required, but the fields that are present must be sufficient to create a unique combination for each pair of FASTQ files.
Files with no paired end metadata are grouped in the same way as files with paired end metadata; generally, such files should end up alone in separate lists. Files with no metadata set will be grouped together, and an error will be raised if there are more than two such files. \n\nIf there are more than two files in a group, this is considered an error, and the user should check whether the metadata fields for those files are set properly. Also, if a file that has paired end metadata set is grouped with a file that has no paired end metadata, the tool will return an error. If only one file has paired end metadata set and no pair is provided for it, it will be placed in a separate list.\n\nCheck job.err.log for metadata errors; they point to the files whose metadata should be checked.", "baseCommand": [ "python", "pair_fastqs_by_metadata.py" ], "inputs": [ { "required": true, "sbg:stageInput": "link", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--fastq_list", "separate": true, "itemSeparator": ",", "sbg:cmdInclude": true }, "label": "List of FASTQ files", "description": "List of the FASTQ files with properly set metadata fields.", "sbg:fileTypes": "FASTQ, FQ, FASTQ.GZ, FQ.GZ", "id": "#fastq_list" } ], "outputs": [ { "type": [ "null", { "type": "array", "items": "File" } ], "id": "#tuple_list" } ], "requirements": [ { "class": "CreateFileRequirement", "fileDef": [ { "filename": "pair_fastqs_by_metadata.py", "fileContent": "import functools\nimport json\nimport itertools\nimport docopt\nimport sys\nUSAGE = \"\"\"\n    Usage:\n        sbg_pair_fastqs_by_metadata.py --fastq_list FILE... --in_metafile FILE --out_metafile FILE [options]\n\n    Description:\n        The tool accepts a list of FASTQ files for one sample as input and groups them into pairs\n        (one file for each paired end). This grouping is done using metadata values that create a\n        unique combination for each pair of FASTQ files.
The metadata fields that uniquely define\n        one FASTQ pair are Sample ID, Library ID, Platform Unit ID and File Segment Number.\n        The listed order of the metadata fields also represents their hierarchy in the metadata structure.\n        Not all four metadata fields are required, but the fields that are present must be sufficient to create\n        a unique combination for each pair of FASTQ files. If multiple files have the same metadata and cannot\n        be paired into a list of two elements, the tool will return a metadata error so the metadata can be set properly.\n\n    Options:\n\n        --help                  Show help dialog.\n\n        --version               Tool version.\n\n        --fastq_list FILE...    List of the FASTQ files with properly set metadata fields.\n\n        --in_metafile FILE      File from which necessary metadata information will be extracted.\n                                Expected value for the SBG platform is job.json. [Default: job.json]\n\n        --out_metafile FILE     File into which necessary file structure is going to be written.\n                                Expected value for the SBG platform is cwl.output.json.\n                                [Default: cwl.output.json]\n\n\n\"\"\"\n\n\nclass MetadataError(Exception):\n    # Class for handling groups where one file has paired end metadata set and the other doesn't\n    def __init__(self, value):\n        self.value = value\n\n    def __str__(self):\n        return repr(self.value)\n\n\ndef make_rg_id(metadata_key, input1):\n\n    # Function that gets all the metadata fields that are set for a file, so it can\n    # sort the files based on this metadata for grouping\n\n    input_meta = input1['metadata']\n    s = '__!__'\n    rg = list()\n    for key in ['sample_group', 'sample_id', 'library_id', 'platform_unit_id']:\n        if key in input_meta and input_meta[key] is not None:\n            rg.append(input_meta[key])\n        else:\n            rg.append('')\n    if 'file_segment_number' in input_meta and input_meta['file_segment_number'] is not None:\n        
rg.append(str(input_meta['file_segment_number']))\n    else:\n        rg.append('')\n\n    # Metadata hierarchy: Sample_ID > Library_ID > Platform_Unit_ID > File_Segment_Number\n    # By default, files are split using the file
segment number, which is the lowest tier in\n    # the metadata hierarchy; files that share a Sample ID (or any other higher tier) are separated by it\n    rg_map = {\n        'sample_id': rg[:2],\n        'library_id': rg[:3],\n        'platform_unit_id': rg[:4],\n        'file_segment_number': rg[:5],\n    }\n\n    return s.join(rg_map[metadata_key]) if metadata_key in rg_map else input_meta.get(metadata_key)\n\n\ndef group_inputs(inp):\n\n    metadata_key = 'file_segment_number'\n    key_getter = functools.partial(make_rg_id, metadata_key)\n    files = sorted([x for x in inp], key=key_getter)\n    # Files are split into lists of same metadata hierarchies\n    tuple_list_temp = [[f for f in val] for key, val in itertools.groupby(files, key_getter)]\n    tuple_list = list()\n    # Grouping files based on metadata\n    for elem in tuple_list_temp:\n        # Check for multiple files with same metadata - shouldn't be more than 2 grouped\n        if len(elem) > 2:\n            error_msg = 'Metadata error: '\n            error_msg += 'More than two files are grouped! Check if you have set the metadata for these files: '\n            for i in elem:\n                error_msg += i['path'].split(\"/\")[-1]\n                error_msg += ' '\n            raise MetadataError(error_msg)\n        if 'paired_end' in elem[0]['metadata'] and len(elem) > 1:\n            if 'paired_end' not in elem[1]['metadata']:\n                # If the second file doesn't have paired_end and the first does - fail and raise an exception\n                error_msg = 'Metadata error: '\n                error_msg += 'paired_end metadata not set for one of two files.
Check metadata for file: '\n                error_msg += elem[1]['path'].split(\"/\")[-1]\n                raise MetadataError(error_msg)\n            # Both files have paired_end set; the pair is kept in input order\n            # (the BWA-MEM Bundle tool reorders reads by their paired_end metadata)\n            tuple_list.append([elem[0], elem[1]])\n            continue\n        else:\n            # If the first file doesn't have paired_end and the second file does - fail and raise an exception\n            if len(elem) == 2 and 'paired_end' not in elem[0]['metadata']:\n                if 'paired_end' in elem[1]['metadata']:\n                    error_msg = 'Metadata error: '\n                    error_msg += 'paired_end metadata not set for one of two files. Check metadata for file: '\n                    error_msg += elem[0]['path'].split(\"/\")[-1]\n                    raise MetadataError(error_msg)\n            # Single files form their own list (e.g. single-end sequencing); two files without paired_end stay together\n            if len(elem) == 1:\n                tuple_list.append([elem[0]])\n            else:\n                tuple_list.append([elem[0], elem[1]])\n            continue\n\n    return tuple_list\n\n\ndef main():\n\n    args = docopt.docopt(USAGE, version=1.0)\n\n    with open(args[\"--in_metafile\"]) as job_json_file:\n        job_json_dict = json.load(job_json_file)\n    file_list = [elem for elem in job_json_dict['inputs'][\"fastq_list\"]]\n    try:\n        tuple_list = group_inputs(file_list)\n    except MetadataError as err:\n        sys.stderr.write(str(err))\n        sys.exit(1)\n\n    tuple_list_dict = {\"tuple_list\": tuple_list}\n    with open(args[\"--out_metafile\"], 'w') as p:\n        json.dump(tuple_list_dict, p)\n    sys.exit(0)\n\n\nif __name__ == '__main__':\n    main()" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": 1024 }, { "class": "DockerRequirement", "dockerImageId": "d41a0837ab81", "dockerPull": 
"images.sbgenomics.com/nikola_jovanovic/sbg-pair-fastqs-by-metadata:v1" } ], "arguments": [ { "position": 1, "prefix": "--in_metafile", "separate": true, "valueFrom": "job.json" }, { "position": 2, "prefix": "--out_metafile", "separate": true, "valueFrom": "cwl.output.json" } ], "sbg:job":
{ "inputs": { "fastq_list": [ { "class": "File", "path": "/asda/dsa/sda/sda/fasta1.fastq", "secondaryFiles": [], "size": 0 }, { "path": "/asda/dsa/sda/sda/fasta2.fastq" }, { "path": "/asda/dsa/sda/sda/fasta3.fastq" }, { "path": "/asda/dsa/sda/sda/fasta4.fastq" } ] }, "allocatedResources": { "mem": 1024, "cpu": 1 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "Converters", "Other" ], "sbg:cmdPreview": "python pair_fastqs_by_metadata.py --fastq_list /asda/dsa/sda/sda/fasta1.fastq,/asda/dsa/sda/sda/fasta2.fastq,/asda/dsa/sda/sda/fasta3.fastq,/asda/dsa/sda/sda/fasta4.fastq --in_metafile job.json --out_metafile cwl.output.json", "sbg:contributors": [ "vladimirk", "markop", "nikola_jovanovic", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911289, "sbg:id": "admin/sbg-public-data/sbg-pair-fastqs-by-metadata/9", "sbg:image_url": null, "sbg:latestRevision": 7, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1489665046, "sbg:project": "bix-demo/sbgtools-demo", "sbg:projectName": "SBGTools - Demo New", "sbg:revision": 7, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911289, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911290, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911290, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1463403276, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "markop", "sbg:modifiedOn": 1469015151, "sbg:revision": 4, "sbg:revisionNotes": "Link fastq_list" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472059795, "sbg:revision": 5, "sbg:revisionNotes": "Added support for single file." 
}, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1489510320, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1489665046, "sbg:revision": 7, "sbg:revisionNotes": null } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Djordje Klisic, Seven Bridges Genomics, ", "sbg:toolkit": "SBGTools", "sbg:validationErrors": [], "x": 752.3333843019286, "y": 256.33335844675776 }, "label": "SBG Pair FASTQs by Metadata", "sbg:x": 752.3333843019286, "sbg:y": 256.33335844675776 }, { "id": "#SBG_FASTQ_Quality_Adjuster", "inputs": [ { "id": "#SBG_FASTQ_Quality_Adjuster.fastq", "source": [ "#fastq" ] } ], "outputs": [ { "id": "#SBG_FASTQ_Quality_Adjuster.result" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-fastq-quality-adjuster/12", "label": "SBG FASTQ Quality Adjuster", "description": "This app detects quality score format used in input FASTQ file. By default, FASTQ quality score is then converted to standard Sanger quality score if conversion is required. 
\nIf \"Detection mode\" is selected, quality scale format is recognized and added to metadata, but conversion is not performed.\nSupported source formats are: Solexa, Illumina 1.3, Illumina 1.5 and Illumina 1.8.", "baseCommand": [ { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n qscale = \"to be detected\"\n \n if ($job.inputs.fastq.metadata)\n if ($job.inputs.fastq.metadata[\"quality_scale\"])\n qscale = $job.inputs.fastq.metadata[\"quality_scale\"] \n \n if ($job.inputs.used_quality_scale)\n if ($job.inputs.used_quality_scale != null) \n qscale = $job.inputs.used_quality_scale\n \n \n if (qscale == \"sanger\" || qscale == \"illumina18\" ) \n {// no conversion\n\treturn \"echo No conversion\"\n }\n else\n {\n return \"python3 sbg_fastq_quality_scale_adjuster.py\"\n }\n}" } ], "inputs": [ { "required": false, "sbg:category": "Input", "type": [ "null", { "type": "enum", "symbols": [ "sanger", "illumina18", "illumina13", "illumina15", "solexa" ], "name": "used_quality_scale" } ], "label": "Used quality scale", "description": "Used quality scale of FASTQ reads.", "id": "#used_quality_scale" }, { "required": false, "sbg:category": "Execution", "sbg:stageInput": null, "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "Total memory [GB]", "description": "Total memory in GB.", "id": "#total_memory" }, { "required": true, "sbg:category": "Input", "sbg:stageInput": "link", "type": [ "File" ], "inputBinding": { "position": 1, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n qscale = \"to be detected\"\n \n if ($job.inputs.fastq.metadata)\n if ($job.inputs.fastq.metadata[\"quality_scale\"])\n qscale = $job.inputs.fastq.metadata[\"quality_scale\"] \n \n if ($job.inputs.used_quality_scale)\n if ($job.inputs.used_quality_scale != null) \n qscale = $job.inputs.used_quality_scale\n \n \n if (qscale == \"sanger\" || qscale == \"illumina18\" ) {\n return \"\"\n }\n else\n {\n return \"--fastq \" + 
$job.inputs.fastq.path\n }\n}" }, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Fastq file", "description": "Input FASTQ file.", "sbg:fileTypes": "FASTQ, FASTQ.GZ, FQ, FQ.GZ", "id": "#fastq" }, { "required": false, "sbg:stageInput": null, "type": [ "null", "boolean" ], "inputBinding": { "position": 2, "prefix": "--no_conversion", "separate": true, "sbg:cmdInclude": true }, "label": "Detection mode (No conversion)", "description": "Detect quality scale format, but do not perform conversion. Detected format will be added to metadata field 'Quality scale'.", "id": "#detection_mode" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Result", "description": "Resulting file in FASTQ format.", "sbg:fileTypes": "FASTQ", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n\n qscale = \"to be detected\"\n \n if ($job.inputs.fastq.metadata)\n if ($job.inputs.fastq.metadata[\"quality_scale\"])\n qscale = $job.inputs.fastq.metadata[\"quality_scale\"] \n \n if ($job.inputs.used_quality_scale)\n if ($job.inputs.used_quality_scale != null) \n qscale = $job.inputs.used_quality_scale\n \n \n if (qscale == \"sanger\" || qscale == \"illumina18\" ) \n {\n return $job.inputs.fastq.path.replace(/^.*[\\\\\\/]/, '')\n }\n else\n {\n\tfile = $job.inputs.fastq.path\n\tfile_split = file.split('.')\n\tbasename = file_split\n \tif (basename.length > 1)\n {\n l_ext = basename.splice(basename.length-1)\n if (l_ext == 'gz')\n {\n basename = basename.slice(0, basename.length-1)\n }\n }\n \tretval = basename.concat('std.fastq')\n\treturn retval.join('.').replace(/^.*[\\\\\\/]/, '') + \"*\"\n }\n}" } }, "id": "#result" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] }, { "class": "CreateFileRequirement", "fileDef": [ { "filename": "sbg_fastq_quality_scale_adjuster.py", "fileContent": "\"\"\"\nUsage:\n 
sbg_fastq_quality_scale_adjuster.py --fastq FILE [--no_conversion]\n\nOptions:\n -h, --help Show this message.\n\n -f, --fastq FILE Input FASTQ file.\n \n --no_conversion Detect quality scale format and add it to metadata, but do not convert to Sanger.\n\n\"\"\"\n\nfrom docopt import docopt\nimport os\nimport gzip\nimport itertools as it\nimport shutil\nimport sys\nfrom math import log10\nfrom subprocess import Popen\nfrom CWL import CWL, CWLFile\nfrom functools import reduce\nimport subprocess\n\nargs = docopt(__doc__, version='1.0')\n\ninput_file = args['--fastq']\nno_conversion = args['--no_conversion']\n\nbase_name = input_file[input_file.rfind('/')+1:input_file.rfind('.') if input_file.rfind('.') != -1 else None]\nr_ext = input_file[input_file.rfind('.')+1:] if input_file.rfind('.') else \"\"\nl_ext = base_name.split('.')[-1].lower()\nif l_ext == 'fastq' or l_ext == 'fq':\n if not r_ext == 'fastq' and not r_ext == 'fq':\n base_name = base_name[:base_name.rfind('.')]\noutput_file = base_name + '.std.fastq'\n\n\n\"\"\"input and output names defined above\"\"\"\n\nclass myGzipFile(gzip.GzipFile):\n def __enter__(self, *args, **kwargs):\n if self.fileobj is None:\n raise ValueError(\"I/O operation on closed GzipFile object\")\n return self\n\n def __exit__(self, *args, **kwargs):\n self.close()\n\n\ndef extremes(a, b):\n if a is False:\n return b, b\n return min(a[0], b), max(a[1], b)\n\n\ndef walk_qualities(f, sample_size=1000):\n for i in range(sample_size * 4):\n try:\n line = next(f)\n except StopIteration:\n return\n if i % 4 == 3:\n yield line.rstrip()\n\n\ndef sniff(path):\n with open(path, 'rb') as f:\n gz = f.read(2) == b'\\x1f\\x8b'\n opn = myGzipFile if gz else open\n with opn(path) as f:\n ord_min, ord_max = reduce(extremes, list(it.chain(*walk_qualities(f))), False)\n if isinstance(ord_min,str):\n ord_min, ord_max = ord(ord_min), ord(ord_max)\n return get_scale(ord_min, ord_max)\n\n\ndef get_scale(ord_min, ord_max):\n options = {\n 'illumina13': (64, 
105),\n 'illumina15': (66, 105),\n 'sanger': (33, 126),\n 'solexa': (59, 105),\n }\n fits = [(k, v) for k, v in options.items() if v[0] <= ord_min and v[1] >= ord_max]\n if not fits:\n message = 'Quality scale for range (%s, %s) not found.' % (ord_min, ord_max)\n raise Exception(message)\n # Return narrowest range\n return reduce(lambda a, b: a if a[1][1] - a[1][0] < b[1][1] - b[1][0] else b, fits)[0]\n\ndef qsolexa(x):\n return chr(int(round(10 * log10(10.0**((ord(x)-64)/10.0)+1))) + 33)\n\n\ndef qillumina13(x):\n return chr(ord(x) - 31)\n\n\ndef qillumina15(x):\n return chr(ord(x) - 31) if ord(x)-64 > 2 else chr(33)\n\n\ndef qillumina18(x):\n return x\n\n\n\"\"\"detect quality scale format\"\"\"\n\nmeta_qual = sniff(input_file)\n\n\n\"\"\"Add output quality scale format to metadata\"\"\"\ncwl = CWL()\ncwl.parse_job_json()\ninput_metadata = cwl.inputs['fastq']['metadata']\n\n\"\"\"adjust quality scale if needed\"\"\"\nif no_conversion:\n if input_file.rfind(\".gz\") == len(input_file) - 3:\n output_file = output_file + \".gz\"\n os.rename(input_file, output_file)\nelse:\n if meta_qual == 'illumina13':\n proc = qillumina13\n elif meta_qual == 'illumina15':\n proc = qillumina15\n elif meta_qual == 'solexa':\n proc = qsolexa\n else:\n proc = None\n \n if proc == qsolexa: \n with open(input_file, 'rb') as f:\n gz = f.read(2) == b'\\x1f\\x8b'\n open_gz = myGzipFile if gz else open\n with open(output_file, 'w') as out:\n for i, line in enumerate(open_gz(input_file)):\n # GzipFile yields bytes in Python 3; decode so only str is written to the text-mode output\n if isinstance(line, bytes):\n line = line.decode()\n if i % 4 == 3:\n line = line.strip()\n converted = ''.join(map(proc, line))\n out.write(converted + '\\n')\n else:\n out.write(line)\n contents = \"Original fastq quality scale format was \" + meta_qual + \", and is converted to illumina18.\\n\"\n elif proc is not None: #seqtk converter from illumina13-15\n # stdout is already redirected to output_file by 
Popen, so no shell '>' token is passed\n cmd = ['seqtk','seq','-Q64','-V',input_file]\n with open(output_file, 'w') as out:\n p = Popen(cmd, stdout = out)\n p.communicate()\n contents = \"Original fastq quality scale format 
was \" + meta_qual + \", and is converted to illumina18.\\n\"\n else:\n if input_file.rfind(\".gz\") == len(input_file) - 3:\n output_file = output_file + \".gz\"\n os.rename(input_file, output_file)\n contents = \"Original fastq quality scale format was illumina18. No conversion performed.\\n\"\n\n sys.stderr.write(contents) #Write conversion to error log\n meta_qual = 'sanger'\n\n \noutput_cwl = CWLFile(os.path.abspath(output_file))\noutput_cwl.metadata = input_metadata\noutput_cwl.metadata['quality_scale'] = meta_qual\n\ncwl.make_cwl_output_json('cwl.output.json', {'result': output_cwl})" }, { "filename": "CWL.py", "fileContent": "import json\nfrom collections import defaultdict\nfrom copy import deepcopy\nfrom os.path import basename, abspath\n\n# constants\nINPUTS_KEY = 'inputs'\nMETADATA_KEY = 'metadata'\nNAME_KEY = 'name'\nPATH_KEY = 'path'\nCLASS_KEY = 'class'\nLOCATION_KEY = 'location'\nUNCATEGORIZED = 'uncategorized'\n\nclass CWLFile(object):\n _metadata = None\n _path = None\n\n @property\n def metadata(self):\n return self._metadata\n\n @metadata.setter\n def metadata(self, value):\n self._metadata = value\n\n @metadata.deleter\n def metadata(self):\n del self._metadata\n\n @property\n def path(self):\n return self._path\n\n @path.setter\n def path(self, value):\n self._path = value\n\n @path.deleter\n def path(self):\n del self._path\n\n def __str__(self):\n return self.path if self.path else ''\n\n def __repr__(self):\n return self.__str__()\n\n def __init_str__(self, file: str):\n self.metadata = {}\n self.path = abspath(file)\n\n def __init_dict__(self, file: dict):\n self.metadata = deepcopy(file['metadata']) if file['metadata'] else {}\n self.path = file['path'] if file['path'] else ''\n\n def __init__(self, *args, **kwargs):\n if (len(args) > 0):\n if isinstance(args[0], str):\n self.__init_str__(args[0])\n elif isinstance(args[0], dict):\n self.__init_dict__(args[0])\n else:\n raise Exception(\"Can\\'t make instance of class CWLFile. 
\"\n \"Argument have to be either instance of type str or dict. \")\n\n\n def intersect_metadata(self, m2: dict):\n \"\"\"\n :param m2: Metadata dictionary\n :return: None\n \"\"\"\n self.metadata = {k: v for k, v in self.metadata.items() if v == m2.get(k)}\n\n def toJSON(self):\n dict = {\n CLASS_KEY: 'File',\n PATH_KEY: self.__str__(),\n NAME_KEY: basename(self.__str__()),\n METADATA_KEY: self.metadata\n }\n return dict\n\nclass CWLGroup(dict):\n\n def _leafs(self, obj, out: list):\n \"\"\"\n :param obj: CWLGroup node\n :param out: Reference of output list\n :return: None\n \"\"\"\n if isinstance(obj, list):\n out.append(obj)\n elif isinstance(obj, dict):\n for _,value in obj.items():\n self._leafs(value, out)\n else:\n raise Exception('Unexpected type to be flatten.')\n\n def leafs(self):\n \"\"\"\n :return: List of groups located on leafs of grouped tree.\n \"\"\"\n out = list()\n self._leafs(self, out)\n return out\n\nclass CWL(object):\n\n #private\n _inputs = None\n _job_json_path = ''\n _cwl_output_json_path = ''\n\n #properties\n @property\n def inputs(self):\n return self._inputs\n\n @property\n def job_json_path(self):\n return self._job_json_path\n\n @job_json_path.setter\n def job_json_path(self, value):\n self._job_json_path = value\n\n @job_json_path.deleter\n def job_json_path(self):\n del self._job_json_path\n\n @property\n def cwl_output_json_path(self):\n return self._cwl_output_json_path\n\n @cwl_output_json_path.setter\n def cwl_output_json_path(self, value):\n self._cwl_output_json_path = value\n\n @cwl_output_json_path.deleter\n def cwl_output_json_path(self):\n del self._cwl_output_json_path\n\n #methods\n def __init__(self, job_json: str='job.json'):\n self._inputs = dict()\n self.parse_job_json(job_json_path=job_json)\n\n def parse_job_json(self, job_json_path='job.json', key: str=None):\n \"\"\"\n :param job_json_path: Location of job.json file\n :param key: Extract only specific key from job.json inputs\n :return: None\n \"\"\"\n 
self.job_json_path = job_json_path\n try:\n with open(self.job_json_path) as job_json_file:\n job_json = json.load(job_json_file)\n if job_json and INPUTS_KEY in job_json:\n if key is None:\n for key in job_json[INPUTS_KEY]:\n self._inputs[key] = job_json[INPUTS_KEY][key]\n elif key in job_json[INPUTS_KEY]:\n inputs = job_json[INPUTS_KEY][key]\n self._inputs = inputs\n else:\n raise Exception('Key '+ key +' is not a member of the inputs property.')\n except IOError:\n raise Exception('ERROR parse_job_json: job.json file doesn\'t exist')\n\n def full_group_by(self, l, key=lambda x: x):\n \"\"\"\n :param l: List that will be grouped\n :param key: Function used for creating keys in new Group \n :return: Key, Value pairs\n \"\"\"\n d = defaultdict(list)\n for item in l:\n k = key(item)\n if k is not None:\n d[k].append(item)\n else:\n d[UNCATEGORIZED].append(item)\n return d.items()\n\n def group_by_metadata_key(self, metadata_key: str, inputs: list) -> CWLGroup:\n \"\"\"\n :param metadata_key: Key that is used for grouping\n :param inputs: List of inputs that will be grouped\n :return: Instance of CWLGroup after grouping by metadata_key\n \"\"\"\n return CWLGroup({key: [f for f in val]\n for key, val in self.full_group_by(inputs, key=lambda x: x[METADATA_KEY][metadata_key]\n if metadata_key in x[METADATA_KEY]\n else None)})\n\n def group_by(self, metadata_keys: list, input_key, sort_by_metadata_key: str=None) -> CWLGroup:\n \"\"\"\n :param metadata_keys: The keys that are used for grouping\n :param input_key: input key in job.json inputs field\n :param sort_by_metadata_key: Key used for sorting leafs\n :return: Instance of CWLGroup after grouping by metadata_keys\n \"\"\"\n if isinstance(metadata_keys,list) and len(metadata_keys) > 0:\n if input_key in self._inputs:\n groups = self.group_by_metadata_key(metadata_key=metadata_keys[0],\n inputs=deepcopy(self._inputs)[input_key])\n last_groups = [groups]\n for i in range(1, len(metadata_keys)):\n 
metadata_key = 
metadata_keys[i]\n newGroups = list()\n for group in last_groups:\n for group_key in group:\n g = self.group_by_metadata_key(metadata_key=metadata_key, inputs=group[group_key])\n group[group_key] = g\n newGroups.append(group[group_key])\n last_groups = newGroups\n\n if sort_by_metadata_key:\n for group in last_groups:\n for key in group:\n group[key].sort(key=lambda x: x[METADATA_KEY][sort_by_metadata_key])\n return groups\n else:\n raise Exception('Error: ' + input_key + ' not member of ', self._inputs)\n else:\n raise Exception('Error: metadata_keys argument have to be non empty list.')\n\n def create_out_json(self, file: CWLFile) -> dict:\n \"\"\"\n :param file: CWLFile object \n :return: JSON representation of file \n \"\"\"\n out = {}\n out[CLASS_KEY] = 'File'\n out[PATH_KEY] = file.__str__()\n out[NAME_KEY] = basename(file.__str__())\n out[METADATA_KEY] = file.metadata\n return out\n\n def create_cwl_file(self, path: str, metadata: dict):\n \"\"\"\n :param path: Path to cwl file \n :param metadata: Metadata information about file\n :return: CWLFile\n \"\"\"\n cwl_file = CWLFile(path)\n cwl_file.metadata = metadata\n return cwl_file\n def make_cwl_output_json(self, out_path: str, cwl_files: dict):\n \"\"\"\n :param out_path: Location where cwl.output.json will be created.\n :param cwl_files: Files with metadata that will be written into cwl.output.json in the header field.\n :param output_id: Output id in cwl.output.json\n :return: None\n \"\"\"\n try:\n\n with open(out_path, 'w') as out:\n json.dump(cwl_files, out, default=lambda o: o.toJSON(), sort_keys=True, indent=4, separators=(',', ': '))\n\n except Exception as e:\n print ('Error: ' + str(e))" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.total_memory){\n return $job.inputs.total_memory * 1024\n } else {\n return 1000\n }\n}" } }, { "class": 
"DockerRequirement", "dockerPull": "images.sbgenomics.com/bogdang/sbg_quality_scale_adjuster:2.0" } ], "sbg:job": { "inputs": { "used_quality_scale": null, "total_memory": 9, "fastq": { "class": "File", "metadata": { "Quality scale": "sanger" }, "path": "/path/to/test.1.fastq", "secondaryFiles": [], "size": 0 }, "detection_mode": false }, "allocatedResources": { "mem": 9216, "cpu": 1 } }, "appUrl": "/u/bogdang/fastq-quality-converter/apps/#bogdang/fastq-quality-converter/sbg-fastq-quality-adjuster-seqtk/27", "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "Converters", "FASTQ-Processing" ], "sbg:cmdPreview": "python3 sbg_fastq_quality_scale_adjuster.py --fastq /path/to/test.1.fastq", "sbg:contributors": [ "vladimirk", "bogdang" ], "sbg:createdBy": "vladimirk", "sbg:createdOn": 1470927070, "sbg:id": "admin/sbg-public-data/sbg-fastq-quality-adjuster/12", "sbg:image_url": null, "sbg:latestRevision": 12, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1495706394, "sbg:project": "bix-demo/sbgtools-demo", "sbg:projectName": "SBGTools - Demo New", "sbg:revision": 12, "sbg:revisionNotes": "Added detection mode and switched to python3", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1470927070, "sbg:revision": 0, "sbg:revisionNotes": "Copy of bogdang/fastq-quality-converter/sbg-fastq-quality-adjuster/23" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472559664, "sbg:revision": 1, "sbg:revisionNotes": "Copy of bogdang/fastq-quality-converter/sbg-fastq-quality-adjuster/24" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1474546255, "sbg:revision": 2, "sbg:revisionNotes": "Copy of bogdang/fastq-quality-converter/sbg-fastq-quality-adjuster/25" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1475084132, "sbg:revision": 3, "sbg:revisionNotes": "'sanger': (33, 74) instead 'sanger': (33, 126)" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1475231421, "sbg:revision": 4, 
"sbg:revisionNotes": "sanger 33:92" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1475234050, "sbg:revision": 5, "sbg:revisionNotes": "seqtk for converting from illumina13-15" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1478274820, "sbg:revision": 6, "sbg:revisionNotes": "fix for seqtk conversion from .gz files" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1478277013, "sbg:revision": 7, "sbg:revisionNotes": "Without seqtk" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1478300733, "sbg:revision": 8, "sbg:revisionNotes": "fix seqtk for .gz files" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1481123041, "sbg:revision": 9, "sbg:revisionNotes": "Support for files named filename.fq.fastq" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1481290093, "sbg:revision": 10, "sbg:revisionNotes": "No conversion if sanger or illumina18 quality scale set in metadata" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1481448894, "sbg:revision": 11, "sbg:revisionNotes": "fix" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1495706394, "sbg:revision": 12, "sbg:revisionNotes": "Added detection mode and switched to python3" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:validationErrors": [], "x": 363.33334823449474, "y": 246.09373995040806 }, "label": "SBG FASTQ Quality Adjuster", "scatter": "#SBG_FASTQ_Quality_Adjuster.fastq", "sbg:x": 363.33334823449474, "sbg:y": 246.09373995040806 }, { "id": "#FastQC", "inputs": [ { "id": "#FastQC.input_fastq", "source": [ "#fastq" ] } ], "outputs": [ { "id": "#FastQC.report_zip" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/fastqc-0-11-4/18", "label": "FastQC", "description": "FastQC reads a set of sequence files and produces a quality control (QC) report from each one. 
These reports consist of a number of different modules, each of which will help identify a different type of potential problem in your data. \n\nSince the tool's report has to be converted before it can be shown on the Seven Bridges platform, it's recommended to use the [FastQC Analysis workflow](https://igor.sbgenomics.com/public/apps#admin/sbg-public-data/fastqc-analysis/) instead. \n\nFastQC is a tool which takes a FASTQ file and runs a series of tests on it to generate a comprehensive QC report. This report will tell you if there is anything unusual about your sequence. Each test is flagged as a pass, warning, or fail depending on how far it departs from what you would expect from a normal large dataset with no significant biases. It is important to stress that warnings or even failures do not necessarily mean that there is a problem with your data, only that it is unusual. It is possible that the biological nature of your sample means that you would expect this particular bias in your results.\n\n### Common Issues:\n\nThe output of the tool is a ZIP archive. To view the report on the Seven Bridges platform, you can use the SBG Html2b64 tool. It is advised to scatter SBG Html2b64 so that it can process an array of files. An example can be seen in the [FastQC Analysis workflow](https://igor.sbgenomics.com/public/apps#admin/sbg-public-data/fastqc-analysis/), which you can also use instead of this tool.", "baseCommand": [ "fastqc" ], "inputs": [ { "sbg:altPrefix": "-t", "sbg:category": "Options", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--threads", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n//if \"threads\" is not specified\n//number of threads is determined based on number of inputs\n if (! 
$job.inputs.threads){\n $job.inputs.threads = [].concat($job.inputs.input_fastq).length\n }\n return Math.min($job.inputs.threads,7)\n}" }, "sbg:cmdInclude": true }, "label": "Threads", "description": "Specifies the number of files which can be processed simultaneously. Each thread will be allocated 250MB of memory, so you shouldn't run more threads than your available memory will cope with, and not more than 6 threads on a 32-bit machine.", "id": "#threads" }, { "sbg:altPrefix": "-q", "sbg:category": "Options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--quiet", "separate": true, "sbg:cmdInclude": true }, "label": "Quiet", "description": "Suppress all progress messages on stdout and only report errors.", "id": "#quiet" }, { "sbg:category": "Options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nogroup", "separate": false, "sbg:cmdInclude": true }, "label": "Nogroup", "description": "Disable grouping of bases for reads >50bp. All reports will show data for every base in the read. WARNING: Using this option will cause fastqc to crash and burn if you use it on really long reads, and your plots may end up a ridiculous size. You have been warned.", "id": "#nogroup" }, { "sbg:category": "Options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nano", "separate": false, "sbg:cmdInclude": true }, "label": "Nano", "description": "Files come from nanopore sequencers and are in fast5 format. 
In this mode, you can pass in directories to process and the program will take in all fast5 files within those directories and produce a single output file from the sequences found in all files.", "id": "#nano" }, { "sbg:category": "Execution parameters", "sbg:toolDefaultValue": "Determined by the number of input files", "type": [ "null", "int" ], "label": "Amount of memory allocated per job execution.", "description": "Amount of memory (in MB) allocated per execution of a FastQC job.", "id": "#memory_per_job" }, { "required": false, "sbg:altPrefix": "-l", "sbg:category": "File inputs", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--limits", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Limits", "description": "Specifies a non-default file which contains a set of criteria which will be used to determine the warn/error limits for the various modules. This file can also be used to selectively remove some modules from the output altogether. The format needs to mirror the default limits.txt file found in the Configuration folder.", "sbg:fileTypes": "TXT", "id": "#limits_file" }, { "sbg:altPrefix": "-k", "sbg:category": "Options", "sbg:toolDefaultValue": "7", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--kmers", "separate": true, "sbg:cmdInclude": true }, "label": "Kmers", "description": "Specifies the length of Kmer to look for in the Kmer content module. Specified Kmer length must be between 2 and 10. 
Default length is 7 if not specified.", "id": "#kmers" }, { "required": true, "sbg:category": "File inputs", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 100, "separate": true, "sbg:cmdInclude": true }, "label": "Input file", "description": "Input file.", "sbg:fileTypes": "FASTQ, FQ, FASTQ.GZ, FQ.GZ, BAM, SAM", "id": "#input_fastq" }, { "sbg:altPrefix": "-f", "sbg:category": "Options", "sbg:toolDefaultValue": "FASTQ", "type": [ "null", { "type": "enum", "symbols": [ "bam", "sam", "bam_mapped", "sam_mapped", "fastq" ], "name": "format" } ], "inputBinding": { "position": 0, "prefix": "--format", "separate": true, "sbg:cmdInclude": true }, "label": "Format", "description": "Bypasses the normal sequence file format detection and forces the program to use the specified format. Valid formats are BAM, SAM, BAM_mapped, SAM_mapped and FASTQ.", "id": "#format" }, { "sbg:category": "Execution parameters", "sbg:toolDefaultValue": "Determined by the number of input files", "type": [ "null", "int" ], "label": "Number of CPUs.", "description": "Number of CPUs to be allocated per execution of FastQC.", "id": "#cpus_per_job" }, { "required": false, "sbg:altPrefix": "-c", "sbg:category": "File inputs", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--contaminants", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Contaminants", "description": "Specifies a non-default file which contains the list of contaminants to screen overrepresented sequences against. The file must contain sets of named contaminants in the form name[tab]sequence. Lines prefixed with a hash will be ignored.", "sbg:fileTypes": "TXT", "id": "#contaminants_file" }, { "sbg:category": "Options", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--casava", "separate": false, "sbg:cmdInclude": true }, "label": "Casava", "description": "Files come from raw casava output. 
Files in the same sample group (differing only by the group number) will be analysed as a set rather than individually. Sequences with the filter flag set in the header will be excluded from the analysis. Files must have the same names given to them by casava (including being gzipped and ending with .gz), otherwise they won't be grouped together correctly.", "id": "#casava" }, { "required": false, "sbg:altPrefix": "-a", "sbg:category": "File inputs", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--adapters", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Adapters", "description": "Specifies a non-default file which contains the list of adapter sequences which will be explicitly searched against the library. The file must contain sets of named adapters in the form name[tab]sequence. Lines prefixed with a hash will be ignored.", "sbg:fileTypes": "TXT", "id": "#adapters_file" } ], "outputs": [ { "type": [ "null", { "type": "array", "items": "File" } ], "label": "Report zip", "description": "Zip archive of the report.", "sbg:fileTypes": "ZIP", "outputBinding": { "glob": "*_fastqc.zip", "sbg:metadata": { "__inherit__": "input_fastq" }, "sbg:inheritMetadataFrom": "#input_fastq" }, "id": "#report_zip" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n // if cpus_per_job is set, it takes precedence\n if ($job.inputs.cpus_per_job) {\n return $job.inputs.cpus_per_job \n }\n // if threads parameter is set, the number of CPUs is set based on that parameter\n else if ($job.inputs.threads) {\n return $job.inputs.threads\n }\n // else the number of CPUs is determined by the number of input files, up to 7 -- default\n else return 
Math.min([].concat($job.inputs.input_fastq).length,7)\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n // if memory_per_job is set, it takes precedence\n if ($job.inputs.memory_per_job){\n return $job.inputs.memory_per_job\n }\n // if threads parameter is set, memory req is set based on the number of threads\n else if ($job.inputs.threads){\n return 1024 + 300*$job.inputs.threads\n }\n // else the memory req is determined by the number of input files, up to 7 -- default\n else return (1024 + 300*Math.min([].concat($job.inputs.input_fastq).length,7))\n}\n\n" } }, { "class": "DockerRequirement", "dockerImageId": "759c4c8fbafd", "dockerPull": "images.sbgenomics.com/mladenlsbg/fastqc:0.11.4" } ], "arguments": [ { "position": 0, "prefix": "", "separate": true, "valueFrom": "--noextract" }, { "position": 0, "prefix": "--outdir", "separate": true, "valueFrom": "." } ], "sbg:job": { "inputs": { "threads": null, "quiet": true, "nogroup": null, "nano": null, "memory_per_job": null, "limits_file": null, "kmers": null, "input_fastq": [ { "class": "File", "path": "/path/to/input_fastq-1.fastq", "secondaryFiles": [], "size": 0 }, { "class": "File", "path": "/path/to/input_fastq-2.fastq", "secondaryFiles": [], "size": 0 } ], "format": null, "cpus_per_job": null, "contaminants_file": null, "casava": null, "adapters_file": null }, "allocatedResources": { "mem": 1624, "cpu": 2 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "FASTQ-Processing", "Quality-Control", "Quantification" ], "sbg:cmdPreview": "fastqc --noextract --outdir . 
/path/to/input_fastq-1.fastq /path/to/input_fastq-2.fastq", "sbg:contributors": [ "mladenlSBG", "nikola_jovanovic", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911593, "sbg:id": "admin/sbg-public-data/fastqc-0-11-4/18", "sbg:image_url": null, "sbg:latestRevision": 10, "sbg:license": "GNU General Public License v3.0 only", "sbg:links": [ { "id": "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/", "label": "Homepage" }, { "id": "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.4_source.zip", "label": "Source Code" }, { "id": "https://wiki.hpcc.msu.edu/display/Bioinfo/FastQC+Tutorial", "label": "Wiki" }, { "id": "http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.4.zip", "label": "Download" }, { "id": "http://www.bioinformatics.babraham.ac.uk/projects/fastqc", "label": "Publication" } ], "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1493223877, "sbg:project": "bix-demo/fastqc-0-11-4-demo", "sbg:projectName": "FastQC 0.11.4 - Demo", "sbg:revision": 10, "sbg:revisionNotes": "* Fixed the JS expression for the CPU and Memory allocation\n* Added cpus_per_job and memory_per_job parameters\n* Removed default version for format, so the tool can handle combinations of file formats", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911593, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911593, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911594, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "mladenlSBG", "sbg:modifiedOn": 1459870965, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "mladenlSBG", "sbg:modifiedOn": 1465990120, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "mladenlSBG", "sbg:modifiedOn": 1476188095, "sbg:revision": 5, "sbg:revisionNotes": "Input categories added." 
}, { "sbg:modifiedBy": "mladenlSBG", "sbg:modifiedOn": 1476270496, "sbg:revision": 6, "sbg:revisionNotes": "FASTQ input changed from single file to array. Added better thread handling. \n\nIMPORTANT NOTICE: If updating this tool in existing workflow, it's necessary to REMOVE SCATTER (uncheck it) from input_fastq or it might break the pipeline." }, { "sbg:modifiedBy": "mladenlSBG", "sbg:modifiedOn": 1476354537, "sbg:revision": 7, "sbg:revisionNotes": "FASTQ input changed from single file to array. Added better thread handling.\n\nIMPORTANT NOTICE: If updating this tool in existing workflow, it's necessary to REMOVE SCATTER (uncheck it) from input_fastq or it might break the pipeline." }, { "sbg:modifiedBy": "mladenlSBG", "sbg:modifiedOn": 1488882730, "sbg:revision": 8, "sbg:revisionNotes": "IMPORTANT NOTICE: If updating this tool in existing workflow, it's necessary to REMOVE SCATTER (uncheck it) from input_fastq or it might break the pipeline.\"\n\nAdded automatised handling of BAM and SAM files. Also, added security measures for better automated threading handling." }, { "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1488980183, "sbg:revision": 9, "sbg:revisionNotes": "Changed the file types of limits, adapters and contaminants files to be TXT, they have to be in format name[tab]sequence. 
Format should be similar to the one in the Configuration folder provided with FastQC, txt files.\n\n\"IMPORTANT NOTICE: If updating this tool in existing workflow, it's necessary to REMOVE SCATTER (uncheck it) from input_fastq or it might break the pipeline.\"" }, { "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1493223877, "sbg:revision": 10, "sbg:revisionNotes": "* Fixed the JS expression for the CPU and Memory allocation\n* Added cpus_per_job and memory_per_job parameters\n* Removed default version for format, so the tool can handle combinations of file formats" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Babraham Institute", "sbg:toolkit": "FastQC", "sbg:toolkitVersion": "0.11.4", "sbg:validationErrors": [], "x": 121.00001017252612, "y": 467.0000712076826 }, "label": "FastQC", "sbg:x": 121.00001017252612, "sbg:y": 467.0000712076826 }, { "id": "#SBG_Prepare_Intervals", "inputs": [ { "id": "#SBG_Prepare_Intervals.split_mode", "default": "File per chr with alt contig in a single file" }, { "id": "#SBG_Prepare_Intervals.fai_file", "source": [ "#SBG_FASTA_Indices.fasta_index" ] }, { "id": "#SBG_Prepare_Intervals.bed_file", "source": [ "#bed_file_1" ] } ], "outputs": [ { "id": "#SBG_Prepare_Intervals.names" }, { "id": "#SBG_Prepare_Intervals.intervals" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-prepare-intervals/84", "label": "SBG Prepare Intervals", "description": "Depending on the selected Split Mode value, output files are generated as described below:\n\n1. File per interval - The tool creates one interval file per line of the input BED (FAI) file.\nEach interval file contains a single line (one of the lines of the BED (FAI) input file).\n\n2. File per chr with alt contig in a single file - For each contig (chromosome), a single file\nis created containing all the intervals corresponding to it.\nAll the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... 
Y) are saved as\n(\"others.bed\").\n\n3. Output original BED - A BED file is required for this mode. If mode 3 is applied, the input BED file is passed through to the output.\n\n4. File per interval with alt contig in a single file - For each chromosome, a separate file is created for each interval.\nAll the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as\n(\"others.bed\").\n\n##### Common issues: \nDo not use option 1 (File per interval) with an exome BED or a BED with many GL contigs, as it will create a very large number of files.", "baseCommand": [ "python", "sbg_prepare_intervals.py" ], "inputs": [ { "sbg:category": "Input", "type": [ { "type": "enum", "symbols": [ "File per interval", "File per chr with alt contig in a single file", "Output original BED", "File per interval with alt contig in a single file" ], "name": "split_mode" } ], "inputBinding": { "position": 3, "prefix": "--mode", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n mode = $job.inputs.split_mode\n switch (mode) \n {\n case \"File per interval\": \n return 1\n case \"File per chr with alt contig in a single file\": \n return 2\n case \"Output original BED\": \n return 3\n case \"File per interval with alt contig in a single file\": \n return 4 \n }\n return 3\n}" }, "sbg:cmdInclude": true }, "label": "Split mode", "description": "Depending on the selected Split Mode value, output files are generated as described below: 1. File per interval - The tool creates one interval file per line of the input BED (FAI) file. Each interval file contains a single line (one of the lines of the BED (FAI) input file). 2. File per chr with alt contig in a single file - For each contig (chromosome), a single file is created containing all the intervals corresponding to it. All the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as (\"others.bed\"). 3. 
Output original BED - A BED file is required for this mode. If mode 3 is applied, the input BED file is passed through to the output. 4. File per interval with alt contig in a single file - For each chromosome, a separate file is created for each interval. All the intervals (lines) other than (chr1, chr2 ... chrY or 1, 2 ... Y) are saved as (\"others.bed\"). NOTE: Do not use option 1 (File per interval) with an exome BED or a BED with many GL contigs, as it will create a very large number of files.", "id": "#split_mode" }, { "sbg:category": "Input", "type": [ "null", { "type": "enum", "symbols": [ "chr start end", "chr:start-end" ], "name": "format" } ], "label": "Interval format", "description": "Format of the intervals in the generated files.", "id": "#format" }, { "required": false, "sbg:category": "File Input", "type": [ "null", "File" ], "inputBinding": { "position": 2, "prefix": "--fai", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input FAI file", "description": "The FAI file is converted to BED format if no BED file is provided.", "sbg:fileTypes": "FAI", "id": "#fai_file" }, { "required": false, "sbg:category": "File Input", "sbg:stageInput": "link", "type": [ "null", "File" ], "inputBinding": { "position": 1, "prefix": "--bed", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input BED file", "description": "Input BED file containing intervals. 
Required for modes 3 and 4.", "sbg:fileTypes": "BED", "id": "#bed_file" } ], "outputs": [ { "type": [ "null", "string" ], "label": "Output file names", "description": "File containing the names of created files.", "outputBinding": { "glob": "Intervals/names.txt", "outputEval": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n content = $self[0].contents.replace(/\\0/g, '')\n content = content.replace('[','')\n content = content.replace(']','')\n content = content.replace(/\\'/g, \"\")\n content = content.replace(/\\s/g, '')\n content_arr = content.split(\",\")\n\n return content_arr\n \n\n} " } }, "id": "#names" }, { "type": [ "null", { "type": "array", "items": "File" } ], "label": "Intervals", "description": "Array of BED files genereted as per selected Split Mode.", "sbg:fileTypes": "BED", "outputBinding": { "glob": "Intervals/*.bed", "sbg:metadata": { "sbg_scatter": "true" } }, "id": "#intervals" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] }, { "class": "CreateFileRequirement", "fileDef": [ { "filename": "sbg_prepare_intervals.py", "fileContent": "\"\"\"\nUsage:\n sbg_prepare_intervals.py [options] [--fastq FILE --bed FILE --mode INT --format STR --others STR]\n\nDescription:\n Purpose of this tool is to split BED file into files based on the selected mode.\n If bed file is not provided fai(fasta index) file is converted to bed.\n\nOptions:\n\n -h, --help Show this message.\n\n -v, -V, --version Tool version.\n\n -b, -B, --bed FILE Path to input bed file.\n\n --fai FILE Path to input fai file.\n\n --format STR Output file format.\n\n --mode INT Select input mode.\n\n\"\"\"\n\n\nfrom docopt import docopt\nimport os\nimport shutil\nimport glob\n\ndefault_extension = '.bed' # for output files\n\n\n\ndef create_file(contents, contig_name, extension=default_extension):\n \"\"\"function for creating a file for all 
intervals in a contig\"\"\"\n\n new_file = open(\"Intervals/\" + contig_name + extension, \"w\")\n new_file.write(contents)\n new_file.close()\n\n\ndef add_to_file(line, name, extension=default_extension):\n \"\"\"function for adding a line to a file\"\"\"\n\n new_file = open(\"Intervals/\" + name + extension, \"a\")\n if lformat == formats[1]:\n sep = line.split(\"\\t\")\n line = sep[0] + \":\" + sep[1] + \"-\" + sep[2]\n new_file.write(line)\n new_file.close()\n\n\ndef fai2bed(fai):\n \"\"\"function to create a bed file from fai file\"\"\"\n\n region_thr = 10000000 # threshold used to determine starting point accounting for telomeres in chromosomes\n if not fai.rfind(\".fasta.fai\") == -1:\n basename = fai[0:fai.rfind(\".fasta.fai\")]\n else:\n basename = fai[0:fai.rfind(\".\")]\n with open(fai, \"r\") as ins:\n new_array = []\n for line in ins:\n len_reg = int(line.split()[1])\n cutoff = 0 if (len_reg < region_thr) else 0 # sd\\\\telomeres or start with 1\n new_line = line.split()[0] + '\\t' + str(cutoff) + '\\t' + str(len_reg + cutoff)\n new_array.append(new_line)\n new_file = open(basename + \".bed\", \"w\")\n new_file.write(\"\\n\".join(new_array))\n return basename + \".bed\"\n\ndef chr_intervals(no_of_chrms = 23):\n \"\"\"returns all possible designations for chromosome intervals\"\"\"\n \n chrms = []\n for i in range(1, no_of_chrms):\n chrms.append(\"chr\" + str(i))\n chrms.append(str(i))\n chrms.extend([\"x\", \"y\", \"chrx\", \"chry\"])\n return chrms\n\n\ndef mode_1(orig_file):\n \"\"\"mode 1: every line is a new file\"\"\"\n\n with open(orig_file, \"r\") as ins:\n prev = \"\"\n counter = 0\n names = []\n for line in ins:\n if line.split()[0] == prev:\n counter += 1\n else:\n counter = 0\n suffix = \"\" if (counter == 0) else \"_\" + str(counter)\n create_file(line, line.split()[0] + suffix)\n names.append(line.split()[0] + suffix)\n prev = line.split()[0]\n\n create_file(str(names), \"names\", extension=\".txt\")\n\ndef mode_2(orig_file, 
others_name):\n \"\"\"mode 2: separate file is created for each chromosome, and one file is created for other intervals\"\"\"\n\n chrms = chr_intervals()\n names = []\n\n with open(orig_file, 'r') as ins:\n for line in ins:\n name = line.split()[0]\n if name.lower() in chrms:\n name = name.lower()\n else:\n name = others_name\n try:\n add_to_file(line, name)\n if not name in names:\n names.append(name)\n except:\n raise Exception(\"Couldn't create or write in the file in mode 2\")\n\n create_file(str(names), \"names\", extension = \".txt\")\n\n\ndef mode_3(orig_file, extension=default_extension):\n \"\"\"mode 3: input file is staged to output\"\"\"\n\n orig_name = orig_file.split(\"/\")[len(orig_file.split(\"/\")) - 1]\n output_file = r\"./Intervals/\" + orig_name[0:orig_name.rfind('.')] + extension\n\n shutil.copyfile(orig_file, output_file)\n\n names = [orig_name[0:orig_name.rfind('.')]]\n create_file(str(names), \"names\", extension=\".txt\")\n\n\ndef mode_4(orig_file, others_name):\n \"\"\"mode 4: every interval in chromosomes is in a separate file. 
Other intervals are in a single file\"\"\"\n\n chrms = chr_intervals()\n names = []\n\n with open(orig_file, \"r\") as ins:\n counter = {}\n for line in ins:\n name = line.split()[0].lower()\n if name in chrms:\n if name in counter:\n counter[name] += 1\n else:\n counter[name] = 0\n suffix = \"\" if (counter[name] == 0) else \"_\" + str(counter[name])\n create_file(line, name + suffix)\n names.append(name + suffix)\n prev = name\n else:\n name = others_name\n if not name in names:\n names.append(name)\n try:\n add_to_file(line, name)\n except:\n raise Exception(\"Couldn't create or write in the file in mode 4\")\n\n create_file(str(names), \"names\", extension=\".txt\")\n\n\ndef prepare_intervals():\n # reading input files and split mode from command line\n args = docopt(__doc__, version='1.0')\n\n bed_file = args['--bed']\n fai_file = args['--fai']\n split_mode = int(args['--mode'])\n\n \n # define file name for non-chromosomal contigs\n others_name = 'others' \n\n global formats, lformat\n formats = [\"chr start end\", \"chr:start-end\"]\n lformat = args['--format']\n if lformat == None:\n lformat = formats[0]\n if not lformat in formats:\n raise Exception('Unsuported interval format')\n\n if not os.path.exists(r\"./Intervals\"):\n os.mkdir(r\"./Intervals\")\n else:\n files = glob.glob(r\"./Intervals/*\")\n for f in files:\n os.remove(f)\n\n # create variable input_file taking bed_file as priority\n if bed_file:\n input_file = bed_file\n elif fai_file:\n input_file = fai2bed(fai_file)\n else:\n raise Exception('No input files are provided')\n\n # calling adequate split mode function\n if split_mode == 1:\n mode_1(input_file)\n elif split_mode == 2:\n mode_2(input_file, others_name)\n elif split_mode == 3:\n if bed_file:\n mode_3(input_file)\n else:\n raise Exception('Bed file is required for mode 3')\n elif split_mode == 4:\n mode_4(input_file, others_name)\n else:\n raise Exception('Split mode value is not set')\n\n\nif __name__ == '__main__':\n 
prepare_intervals()" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": 1000 }, { "class": "DockerRequirement", "dockerPull": "images.sbgenomics.com/bogdang/sbg_prepare_intervals:1.0" } ], "arguments": [ { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\t\n if (typeof($job.inputs.format) !== \"undefined\")\n \treturn \"--format \" + \"\\\"\" + $job.inputs.format + \"\\\"\"\n}" } } ], "sbg:job": { "inputs": { "split_mode": null, "format": "chr start end", "fai_file": { "class": "File", "path": "/path/to/fai_file.ext", "secondaryFiles": [], "size": 0 }, "bed_file": { "class": "File", "path": "/path/to/bed_file.ext", "secondaryFiles": [], "size": 0 } }, "allocatedResources": { "mem": 1000, "cpu": 1 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "Converters" ], "sbg:cmdPreview": "python sbg_prepare_intervals.py --format \"chr start end\" --mode 3", "sbg:contributors": [ "vladimirk", "bogdang", "medjo", "bix-demo" ], "sbg:createdBy": "vladimirk", "sbg:createdOn": 1473083821, "sbg:id": "admin/sbg-public-data/sbg-prepare-intervals/84", "sbg:image_url": null, "sbg:latestRevision": 6, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1491905063, "sbg:project": "bix-demo/sbgtools-demo", "sbg:projectName": "SBGTools - Demo New", "sbg:revision": 6, "sbg:revisionNotes": "Common issues added", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1473083821, "sbg:revision": 0, "sbg:revisionNotes": "Copy of medjo/sbg-prepare-intervals/sbg-prepare-intervals/75" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1473084447, "sbg:revision": 1, "sbg:revisionNotes": "Copy of medjo/sbg-prepare-intervals/sbg-prepare-intervals/76" }, { "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1473928444, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1474970272, 
"sbg:revision": 3, "sbg:revisionNotes": "split_mode set to required" }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1478525360, "sbg:revision": 4, "sbg:revisionNotes": "Fixed Toolkit name." }, { "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1491904483, "sbg:revision": 5, "sbg:revisionNotes": "Description changed" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1491905063, "sbg:revision": 6, "sbg:revisionNotes": "Common issues added" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "1.0", "sbg:validationErrors": [], "x": 1066.666849083377, "y": 609.4271201329931 }, "label": "SBG Prepare Intervals", "sbg:x": 1271.5189439680753, "sbg:y": 1117.4602671736395 }, { "id": "#GATK_BaseRecalibrator", "inputs": [ { "id": "#GATK_BaseRecalibrator.threads_per_job", "default": 32 }, { "id": "#GATK_BaseRecalibrator.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_BaseRecalibrator.reads", "source": [ "#GATK_IndelRealigner.realigned_bam_file" ] }, { "id": "#GATK_BaseRecalibrator.memory_per_job", "default": 50000 }, { "id": "#GATK_BaseRecalibrator.known_sites", "source": [ "#dbsnp", "#1000g_indels", "#mills" ] }, { "id": "#GATK_BaseRecalibrator.intervals", "source": [ "#bqsr_intervals" ] }, { "id": "#GATK_BaseRecalibrator.cpu_per_job", "default": 32 } ], "outputs": [ { "id": "#GATK_BaseRecalibrator.plot_pdf" }, { "id": "#GATK_BaseRecalibrator.bqsr" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-base-recalibrator/15", "label": "GATK BaseRecalibrator", "description": "Overview\n\nThis tool is designed to work as the first pass in a two-pass processing step. It does a by-locus traversal operating only at sites that are not in dbSNP. We assume that all reference mismatches we see are therefore errors and indicative of poor base quality. 
This tool generates tables based on various user-specified covariates (such as read group, reported quality score, cycle, and context). Since there is a large amount of data, one can then calculate an empirical probability of error given the particular covariates seen at this site, where p(error) = num mismatches / num observations. The output file is a table (of the several covariate values, num observations, num mismatches, empirical quality score).\n\nNote: ReadGroupCovariate and QualityScoreCovariate are required covariates and will be added regardless of whether or not they were specified.\n\nInput\nA BAM file containing data that needs to be recalibrated.\nA database of known polymorphic sites to mask out.\n\nOutput\nA GATKReport file with many tables:\nThe list of arguments\nThe quantized qualities table\nThe recalibration table by read group\nThe recalibration table by quality score\nThe recalibration table for all the optional covariates\nThe GATKReport table format is intended to be easy to read by both humans and computer languages (especially R). Check out the documentation of the GATKReport (in the FAQs) to learn how to manipulate this table.\n\nUsage example\n java -jar GenomeAnalysisTK.jar \\\n -T BaseRecalibrator \\\n -R reference.fasta \\\n -I my_reads.bam \\\n -knownSites latest_dbsnp.vcf \\\n -o recal_data.table\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. 
To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "BaseRecalibrator", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nct '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nct '.concat(3)\n }\n}" } ], "inputs": [ { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, less-tested implementation.", "id": "#use_legacy_downsampler" }, { 
"sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "3", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-sMode", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "SET_Q_ZERO", "type": [ "null", { "type": "enum", "symbols": [ "DO_NOTHING", "SET_Q_ZERO", "SET_Q_ZERO_BASE_N", "REMOVE_REF_BIAS" ], "name": "solid_recal_mode" } ], "inputBinding": { "position": 0, "prefix": "--solid_recal_mode", "separate": true, "sbg:cmdInclude": true }, "label": "Solid Recal Mode", "description": "How should we recalibrate solid bases in which the reference was inserted? 
Options = DO_NOTHING, SET_Q_ZERO, SET_Q_ZERO_BASE_N, or REMOVE_REF_BIAS.", "id": "#solid_recal_mode" }, { "sbg:altPrefix": null, "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "THROW_EXCEPTION", "type": [ "null", { "type": "enum", "symbols": [ "THROW_EXCEPTION", "LEAVE_READ_UNRECALIBRATED", "PURGE_READ" ], "name": "solid_nocall_strategy" } ], "inputBinding": { "position": 0, "prefix": "--solid_nocall_strategy", "separate": true, "sbg:cmdInclude": true }, "label": "Solid Nocall Strategy", "description": "Defines the behavior of the recalibrator when it encounters no calls in the color space. Options = THROW_EXCEPTION, LEAVE_READ_UNRECALIBRATED, or PURGE_READ.", "id": "#solid_nocall_strategy" }, { "sbg:altPrefix": null, "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-run_without_dbsnp_potentially_ruining_quality", "separate": true, "sbg:cmdInclude": true }, "label": "Run Without Dbsnp Potentially Ruining Quality", "description": "If specified, allows the recalibrator to be used without a dbsnp rod. 
Very unsafe and for expert users only.", "id": "#run_without_dbsnp_potentially_ruining_quality" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "loadContents": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "required": true, "sbg:altPrefix": "-I", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [ ".bai" ] }, "label": "Read sequences", "description": "Read sequences in BAM format.", "sbg:fileTypes": "SAM, BAM", "id": "#reads" }, { "sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching <TAG>:<STRING> or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", 
"DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", "ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-ql", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "16", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--quantizing_levels", "separate": true, "sbg:cmdInclude": true }, "label": "Quantizing Levels", "description": "Number of distinct quality scores in the quantized output.", "id": "#quantizing_levels" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with -BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "sbg:cmdInclude": true }, "label": "Phone Home", 
"description": "What kind of GATK run report should we generate? STANDARD is the default; NO_ET can be used so that nothing is posted to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non-deterministically, that is, the random numbers generated will be different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-noStandard", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--no_standard_covs", "separate": true, "sbg:cmdInclude": true }, "label": "No Standard Covs", "description": "Do not use the standard set of covariates, but rather just the ones listed using the -cov 
argument. Cannot be used if grouped by interval.", "id": "#no_standard_covs" }, { "sbg:altPrefix": "-msdq", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--mismatches_default_quality", "separate": true, "sbg:cmdInclude": true }, "label": "Mismatches Default Quality", "description": "Default quality for the base mismatches covariate.", "id": "#mismatches_default_quality" }, { "sbg:altPrefix": "-mcs", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "2", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--mismatches_context_size", "separate": true, "sbg:cmdInclude": true }, "label": "Mismatches Context Size", "description": "Size of the k-mer context to be used for base mismatches.", "id": "#mismatches_context_size" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM memory in MB to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This parameter value is added to the Memory per job parameter value. This results in the allocation of the sum total (Memory per job and Memory overhead per job) amount of memory per job. 
By default the memory per job parameter value is set to 2048 megabytes, unless specified otherwise.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxCycle", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "500", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maximum_cycle_value", "separate": true, "sbg:cmdInclude": true }, "label": "Maximum Cycle Value", "description": "The maximum cycle value permitted for the Cycle covariate.", "id": "#maximum_cycle_value" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, GATK will stop execution cleanly as soon as maxRuntime has been exceeded, truncating the run but not exiting with a failure. 
By default the value is interpreted in minutes, but this can be changed by maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-lqt", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "2", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--low_quality_tail", "separate": true, "sbg:cmdInclude": true }, "label": "Low Quality Tail", "description": "Minimum quality for the bases in the tail of the reads to be considered.", "id": "#low_quality_tail" }, { "required": false, "sbg:category": "Input Files", "sbg:stageInput": "link", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--knownSites", "separate": true, "sbg:cmdInclude": true }, "label": "Known Sites", "description": "A database of known polymorphic sites to skip over in the recalibration algorithm.", "sbg:fileTypes": "VCF, BED, TXT", "id": "#known_sites" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "Should we override the Walker's default and keep program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--intervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. Can be specified in an .intervals file or a ROD file. Cannot be used if grouped by interval. 
.", "sbg:fileTypes": "TXT, BED, VCF, INTERVALS", "id": "#intervals_file" }, { "required": false, "sbg:category": "GATK General", "sbg:includeInPorts": true, "sbg:toolDefaultValue": "sample", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.reference && $job.inputs.intervals){\n fasta = $job.inputs.reference.contents\n interval = $job.inputs.intervals\n // fasta - UCSC format\n if (fasta.indexOf(\">chr\") != -1){\n // INTERVAL - UCSC format, all w/ chr\n if (interval.indexOf(\"chr\") != -1){\n return interval\n }// if (interval.indexOf(\"chr\") == -1)\n // interval - 1000G format, convert to UCSC\n else{\n if (Number(interval) || interval == \"X\" || interval == \"Y\"){\n return \"chr\".concat(interval)\n }// if (Number(interval)|| interval == \"X\" || interval == \"Y\")\n else{\n if (interval == \"MT\"){\n return \"chrM\"\n }//if (interval == \"MT\")\n else{\n if(Number(interval.substr(2,6))<211){\n n = Number(interval.substr(2,6)) - 190;\n s = \"chr\";\n s = s.concat(n.toString());\n s = s.concat(\"_gl\");\n s = s.concat(interval.substr(2,6));\n s = s.concat(\"_random\");\n return s\n }// if(Number(interval.substr(2,interval.length-2))<211)\n else{\n if(Number(interval.substr(2,6))<250) {\n s = \"chrUn_gl\";\n s = s.concat(interval.substr(2,6));\n return s\n }//if(Number(interval.substr(2,interval.length-2))<250)\n else {\n return interval\n }//if(Number(interval.substr(2,interval.length-2))<250) - else\n }// if(Number(interval.substr(2,interval.length-2))<211) - else\n }//if (interval == \"MT\") - else\n }//if (Number(interval)|| interval == \"X\" || interval == \"Y\") - else\n }//if (interval.indexOf(\"chr\") == -1) - else\n }//if (fasta.indexOf(\">chr\") == -1)\n // fasta - 1000G format\n else{\n //interval - USCS format, all w/ chr, convert to 1000G\n if(interval.indexOf(\"chr\") != -1){\n 
if(Number(interval.substr(3,2)) && interval.length<6 || interval == \"chrX\" || interval == \"chrY\"){\n return interval.substr(3,2)\n }//if(Number(interval.substr(3,interval.length)) != NaN || interval == \"chrX\" || bsqr == \"chrY\")\n else{\n if (interval == \"chrM\") {\n return \"MT\"\n }//if (interval == \"chrM\")\n else{\n s = \"GL\";\n s = s.concat(interval.substr(8,6));\n s = s.concat(\".1\");\n return s\n }//if (interval == \"chrM\") - else\n }//if(Number(interval.substr(3,interval.length)) != NaN || interval == \"chrX\" || bsqr == \"chrY\") - else\n }//if($job.inputs.interval.indexOf(\"chr\") == -1)\n // interval - 1000G format\n else{\n return interval\n }//(interval.indexOf(\"chr\") == -1) - else\n }//if (fasta.indexOf(\">chr\") == -1) - else\n }//if ($job.inputs.fasta && $job.inputs.interval)\n}\n" }, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many basepairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { 
"type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": "interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-idq", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "45", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--insertions_default_quality", "separate": true, "sbg:cmdInclude": true }, "label": "Insertions Default Quality", "description": "Default quality for the base insertions covariate.", "id": "#insertions_default_quality" }, { "sbg:altPrefix": "-ics", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "3", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--indels_context_size", "separate": true, "sbg:cmdInclude": true }, "label": "Indels Context Size", "description": "Size of the k-mer context to be used for base insertions and deletions.", "id": "#indels_context_size" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. 
Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. 
Reads will be selected randomly to be removed from the pile based on the method described here.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. To be used mostly in the testing framework where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": null, "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "True", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "Disable indel quality recalibration. 
Must be set to true in GATK Lite.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-ddq", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "45", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--deletions_default_quality", "separate": true, "sbg:cmdInclude": true }, "label": "Deletions Default Quality", "description": "Default quality for the base deletions covariate.", "id": "#deletions_default_quality" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-cov", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "CycleCovariate", "ReadGroupCovariate", "ContextCovariate", "RepeatLengthCovariate", "QualityScoreCovariate" ] } } ], "inputBinding": { "position": 0, "prefix": "--covariate", "separate": true, "sbg:cmdInclude": true }, "label": "Covariate", "description": "One or more covariates to be used in the recalibration. Can be specified multiple times.", "id": "#covariate" }, { "sbg:altPrefix": "-bqsrBAQGOP", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--bqsrBAQGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BQSR BAQ Gap Open Penalty", "description": "BQSR BAQ gap open penalty (Phred Scaled). Default value is 40. 
30 is perhaps better for whole genome call sets.", "id": "#bqsr_baq_gap_open_penalty" }, { "sbg:altPrefix": "-bintag", "sbg:category": "Base Recalibrator", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--binary_tag_name", "separate": true, "sbg:cmdInclude": true }, "label": "Binary Tag Name", "description": "The binary tag covariate name if using it.", "id": "#binary_tag_name" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred Scaled). Default value is 40. 30 is perhaps better for whole genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Plot", "sbg:fileTypes": "PDF", "outputBinding": { "glob": "*.pdf" }, "id": "#plot_pdf" }, { "type": [ "File" ], "label": 
"BQSR Table", "description": "The output recalibration table file to create.", "sbg:fileTypes": "GRP", "outputBinding": { "glob": "*.recal_data.grp", "sbg:inheritMetadataFrom": "#reads" }, "id": "#bqsr" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job\n }\n\treturn 1\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n return read_namebase + '.recal_data.grp'\n}" } }, { "position": 0, "separate": true, "valueFrom": "--disable_indel_quals" }, { "position": 0, "prefix": "--plot_pdf_file", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n return read_namebase + '.pdf'\n\n}" } } ], 
"sbg:job": { "inputs": { "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "threads_per_job": null, "tag": null, "solid_recal_mode": null, "solid_nocall_strategy": null, "run_without_dbsnp_potentially_ruining_quality": null, "remove_program_records": null, "reference": { "path": "/folder/reference.fasta" }, "reads": [ { "class": "File", "path": "/folder/my_reads.bam", "secondaryFiles": [], "size": 0 } ], "read_group_black_list": [], "read_filter": [], "quantizing_levels": null, "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "non_deterministic_random_seed": null, "no_standard_covs": null, "mismatches_default_quality": null, "mismatches_context_size": null, "memory_per_job": null, "memory_overhead_per_job": 0, "maximum_cycle_value": null, "max_runtime_units": null, "max_runtime": null, "low_quality_tail": null, "known_sites": [ { "path": "/folder/latest_dbsnp.vcf" } ], "keep_program_records": null, "intervals_file": null, "intervals": "20", "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "insertions_default_quality": null, "indels_context_size": null, "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "deletions_default_quality": null, "default_base_qualities": null, "cpu_per_job": null, "covariate": [], "bqsr_baq_gap_open_penalty": null, "binary_tag_name": null, "baq_gap_open_penalty": null, "baq": null, "allow_potentailly_misencoded_quals": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "Plotting-and-Rendering", "SAM/BAM-Processing" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type BaseRecalibrator -nct 3 
--reference_sequence /folder/reference.fasta --input_file /folder/my_reads.bam --out my_reads.recal_data.grp --disable_indel_quals --plot_pdf_file my_reads.pdf", "sbg:contributors": [ "vladimirk", "bogdang", "nikola_jovanovic", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911406, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-base-recalibrator/15", "sbg:image_url": null, "sbg:latestRevision": 11, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source Code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_bqsr_BaseRecalibrator.php", "label": "Documentation" } ], "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1490995301, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:projectName": "GATK 2.3.9 Lite - Demo New ", "sbg:revision": 11, "sbg:revisionNotes": "Set reference load content.", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911406, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911409, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911410, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911411, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911412, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911413, "sbg:revision": 5, 
"sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1469450580, "sbg:revision": 6, "sbg:revisionNotes": "File extensions for intervals_file corrected." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471364360, "sbg:revision": 7, "sbg:revisionNotes": "known sites link." }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1472226104, "sbg:revision": 8, "sbg:revisionNotes": "-L intervals string" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1478707638, "sbg:revision": 9, "sbg:revisionNotes": ".bai secondary file" }, { "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1490973283, "sbg:revision": 10, "sbg:revisionNotes": "Updated the intervals input to handle both chr and non chr strings based on what the input fasta file contains. \n\nIf the intervals input is set to chr20 and the fasta contigs are 1,2,3... it will convert the input to 20, and vice versa.\n\nIMPORTANT: The test will show that there are some errors in the expressions, the errors come from the fasta's loaded contents not being processed properly during the test as the files contain no contents for testing." }, { "sbg:modifiedBy": "nikola_jovanovic", "sbg:modifiedOn": 1490995301, "sbg:revision": 11, "sbg:revisionNotes": "Set reference load content." 
} ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 1581.1770643860907, "y": 402.27466428046705 }, "label": "GATK BaseRecalibrator", "sbg:x": 1581.1770643860907, "sbg:y": 402.27466428046705 }, { "id": "#GATK_UnifiedGenotyper", "inputs": [ { "id": "#GATK_UnifiedGenotyper.threads_per_job", "default": 4 }, { "id": "#GATK_UnifiedGenotyper.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#GATK_UnifiedGenotyper.reads", "source": [ "#GATK_PrintReads.recalibrated_bam" ] }, { "id": "#GATK_UnifiedGenotyper.memory_per_job", "default": 2048 }, { "id": "#GATK_UnifiedGenotyper.memory_overhead_per_job", "default": 64 }, { "id": "#GATK_UnifiedGenotyper.intervals_file", "source": [ "#SBG_Prepare_Intervals.intervals" ] }, { "id": "#GATK_UnifiedGenotyper.genotype_likelihoods_model", "default": "BOTH" }, { "id": "#GATK_UnifiedGenotyper.dbsnp", "source": [ "#dbsnp" ] }, { "id": "#GATK_UnifiedGenotyper.cpu_per_job", "default": 1 } ], "outputs": [ { "id": "#GATK_UnifiedGenotyper.raw_vcf" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/gatk-2-3-9-lite-unifiedgenotyper/34", "label": "GATK UnifiedGenotyper", "description": "Overview\n\nThis tool uses a Bayesian genotype likelihood model to estimate simultaneously the most likely genotypes and allele frequency in a population of N samples, emitting a genotype for each sample. The system can either emit just the variant sites or complete genotypes (which includes homozygous reference calls) satisfying some phred-scaled confidence value.\n\nInput\nThe read data from which to make variant calls.\n\nOutput\nA raw, unfiltered, highly sensitive callset in VCF format.\n\nUsage examples:\n\n//Multi-sample SNP calling\n java -jar GenomeAnalysisTK.jar \\\n -T UnifiedGenotyper \\\n -R reference.fasta \\\n -I sample1.bam [-I sample2.bam ...] 
\\\n --dbsnp dbSNP.vcf \\\n -o snps.raw.vcf \\\n -stand_call_conf [50.0] \\\n -stand_emit_conf 10.0 \\\n [-L targets.interval_list]\n \n//Generate calls at all sites\n java -jar GenomeAnalysisTK.jar \\\n -T UnifiedGenotyper \\\n -R reference.fasta \\\n -I input.bam \\\n -o raw_variants.vcf \\\n --output_mode EMIT_ALL_SITES\n \nCaveats\n\nThe caller can be very aggressive in calling variants in order to be very sensitive, so the raw output will contain many false positives. We use extensive post-calling filters to eliminate most of these FPs. See the documentation on filtering (especially by Variant Quality Score Recalibration) for more details.\nThis tool has been deprecated in favor of HaplotypeCaller, a much more sophisticated variant caller that produces much better calls, especially on indels, and includes features that allow it to scale to much larger cohort sizes.\nSpecial note on ploidy\n\nThis tool is able to handle almost any ploidy (except very high ploidies in large pooled experiments); the ploidy can be specified using the -ploidy argument for non-diploid organisms.\n\n(IMPORTANT) Reference \".fasta\" Secondary Files\n\nTools in GATK that require a fasta reference file also look for the reference file's corresponding .fai (fasta index) and .dict (fasta dictionary) files. The fasta index file allows random access to reference bases and the dictionary file is a dictionary of the contig names and sizes contained within the fasta reference. These two secondary files are essential for GATK to work properly. 
To append these two files to your fasta reference please use the 'SBG FASTA Indices' tool within your GATK based workflow before using any of the GATK tools.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n }\n return '-Xmx2048M'\n}" }, "-jar", "/opt/GenomeAnalysisTKLite.jar", "--analysis_type", "UnifiedGenotyper", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.threads_per_job){\n return '-nt '.concat($job.inputs.threads_per_job)\n }\n else{\n \treturn '-nt '.concat(4)\n }\n}" } ], "inputs": [ { "sbg:altPrefix": "-S", "sbg:category": "GATK General", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "SILENT", "LENIENT", "STRICT" ], "name": "validation_strictness" } ], "inputBinding": { "position": 0, "prefix": "--validation_strictness", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Validation Strictness", "description": "How strict should we be with validation.", "id": "#validation_strictness" }, { "sbg:altPrefix": "-OQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--useOriginalQualities", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Use Original Qualities", "description": "If set, use the original base quality scores from the OQ tag when present instead of the standard scores.", "id": "#use_original_qualities" }, { "sbg:altPrefix": "-use_legacy_downsampler", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--use_legacy_downsampler", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Use Legacy Downsampler", "description": "Use the legacy downsampling implementation instead of the newer, 
less-tested implementation.", "id": "#use_legacy_downsampler" }, { "sbg:altPrefix": "-U", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "ALLOW_UNINDEXED_BAM", "ALLOW_UNSET_BAM_SORT_ORDER", "NO_READ_ORDER_VERIFICATION", "ALLOW_SEQ_DICT_INCOMPATIBILITY", "LENIENT_VCF_PROCESSING", "ALL" ], "name": "unsafe" } ], "inputBinding": { "position": 0, "prefix": "--unsafe", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Unsafe", "description": "If set, enables unsafe operations: nothing will be checked at runtime. For expert users only who know what they are doing. We do not support usage of this argument.", "id": "#unsafe" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "4", "type": [ "null", "int" ], "label": "Threads per job", "description": "For tools which support multiprocessing, this value can be used to set the number of threads to be used.", "id": "#threads_per_job" }, { "sbg:altPrefix": "-tag", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--tag", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Tag", "description": "Arbitrary tag string to identify this GATK run as part of a group of runs, for later analysis.", "id": "#tag" }, { "sbg:altPrefix": "-stand_emit_conf", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "30.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--standard_min_confidence_threshold_for_emitting", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Stand Emit Conf", "description": "The minimum phred-scaled confidence threshold at which variants should be emitted (and filtered with LowQual if less than the calling threshold).", "id": "#stand_emit_conf" }, { "sbg:altPrefix": "-stand_call_conf", "sbg:category": "Unified Genotyper", 
"sbg:toolDefaultValue": "30.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--standard_min_confidence_threshold_for_calling", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Stand Call Conf", "description": "The minimum phred-scaled confidence threshold at which variants should be called.", "id": "#stand_call_conf" }, { "sbg:altPrefix": "-rpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--remove_program_records", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Remove Program Records", "description": "Should we override the Walker's default and remove program records from the SAM header.", "id": "#remove_program_records" }, { "required": false, "sbg:altPrefix": null, "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sample_calls", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Sample Calls", "description": "VCF file with the truth callset for the reference sample.", "id": "#reference_sample_calls" }, { "required": true, "sbg:altPrefix": "-R", "sbg:category": "Input Files", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "--reference_sequence", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference Genome", "description": "Reference Genome in FASTA format.", "sbg:fileTypes": "FASTA, FA", "id": "#reference" }, { "required": true, "sbg:altPrefix": "-I", "sbg:category": "Input Files", "type": [ { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--input_file", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [ ".bai" ] }, "label": "Read sequences", "description": "Read sequences in BAM format.", "sbg:fileTypes": "SAM, BAM", "id": "#reads" }, { 
"sbg:altPrefix": "-rgbl", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--read_group_black_list", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Read Group Black List", "description": "Filters out read groups matching : or a .txt file containing the filter strings one per line.", "id": "#read_group_black_list" }, { "sbg:altPrefix": "-rf", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "BadCigarFilter", "BadMateFilter", "CountingFilteringIterator.CountingReadFilter", "DuplicateReadFilter", "FailsVendorQualityCheckFilter", "HCMappingQualityFilter", "LibraryReadFilter", "MalformedReadFilter", "MappingQualityFilter", "MappingQualityUnavailableFilter", "MappingQualityZeroFilter", "MateSameStrandFilter", "MaxInsertSizeFilter", "MissingReadGroupFilter", "NoOriginalQualityScoresFilter", "NotPrimaryAlignmentFilter", "OverclippedReadFilter", "Platform454Filter", "PlatformFilter", "PlatformUnitFilter", "ReadGroupBlackListFilter", "ReadLengthFilter", "ReadNameFilter", "ReadStrandFilter", "ReassignMappingQualityFilter", "ReassignOneMappingQualityFilter", "SampleFilter", "SingleReadGroupFilter", "UnmappedReadFilter" ] } } ], "inputBinding": { "position": 0, "prefix": "--read_filter", "separate": true, "sbg:cmdInclude": true }, "label": "Read Filter", "description": "Specify filtration criteria to apply to each read individually.", "id": "#read_filter" }, { "sbg:altPrefix": "-preserveQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--preserve_qscores_less_than", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Preserve Qscores Less Than", "description": "Bases with quality scores less than this threshold won't be recalibrated (with 
-BQSR).", "id": "#preserve_qscores_less_than" }, { "sbg:altPrefix": "-et", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STANDARD", "type": [ "null", { "type": "enum", "symbols": [ "NO_ET", "STANDARD" ], "name": "phone_home" } ], "inputBinding": { "position": 0, "prefix": "--phone_home", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Phone Home", "description": "What kind of GATK run report should we generate? STANDARD is the default, can be NO_ET so nothing is posted to the run repository. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "id": "#phone_home" }, { "sbg:altPrefix": "-pedValidationType", "sbg:category": "GATK General", "sbg:toolDefaultValue": "STRICT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "SILENT" ], "name": "pedigree_validation_type" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeValidationType", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Pedigree Validation Type", "description": "How strict should we be in validating the pedigree information?.", "id": "#pedigree_validation_type" }, { "sbg:altPrefix": "-pedString", "sbg:category": "GATK General", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--pedigreeString", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Pedigree String", "description": "Pedigree string for samples.", "id": "#pedigree_string" }, { "sbg:altPrefix": "-pcr_error", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "0.0001", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--pcr_error_rate", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Pcr Error Rate", "description": "The PCR error rate to be used for computing fragment-based likelihoods.", "id": "#pcr_error_rate" }, { 
"sbg:altPrefix": "-pairHMM", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "ORIGINAL", "type": [ "null", { "type": "enum", "symbols": [ "EXACT", "ORIGINAL", "CACHING", "LOGLESS_CACHING" ], "name": "pair_hmm_implementation" } ], "inputBinding": { "position": 0, "prefix": "--pair_hmm_implementation", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Pair Hmm Implementation", "description": "The PairHMM implementation to use for -glm INDEL genotype likelihood calculations.", "id": "#pair_hmm_implementation" }, { "sbg:altPrefix": null, "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "EXACT_INDEPENDENT", "type": [ "null", { "type": "enum", "symbols": [ "EXACT_INDEPENDENT", "EXACT_REFERENCE", "EXACT_ORIGINAL", "EXACT_GENERAL_PLOIDY" ], "name": "p_nonref_model" } ], "inputBinding": { "position": 0, "prefix": "--p_nonref_model", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "P Nonref Model", "description": "Non-reference probability calculation model to employ.", "id": "#p_nonref_model" }, { "sbg:altPrefix": "-out_mode", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "EMIT_VARIANTS_ONLY", "type": [ "null", { "type": "enum", "symbols": [ "EMIT_VARIANTS_ONLY", "EMIT_ALL_CONFIDENT_SITES", "EMIT_ALL_SITES" ], "name": "output_mode" } ], "inputBinding": { "position": 0, "prefix": "--output_mode", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Output Mode", "description": "Specifies which type of calls we should output.", "id": "#output_mode" }, { "sbg:altPrefix": "-ndrs", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--nonDeterministicRandomSeed", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Non Deterministic Random Seed", "description": "Makes the GATK behave non deterministically, that is, the random numbers generated will be 
different in every run.", "id": "#non_deterministic_random_seed" }, { "sbg:altPrefix": "-minIndelFrac", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "0.25", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "-minIndelFrac", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Min Indel Frac", "description": "Minimum fraction of all reads at a locus that must contain an indel (of any allele) for that sample to contribute to the indel count for alleles.", "id": "#min_indel_frac" }, { "sbg:altPrefix": "-minIndelCnt", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "5", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--min_indel_count_for_genotyping", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Min Indel Cnt", "description": "Minimum number of consensus indels required to trigger genotyping run.", "id": "#min_indel_cnt" }, { "sbg:altPrefix": "-mbq", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "17", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--min_base_quality_score", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Min Base Quality Score", "description": "Minimum base quality required to consider a base for calling.", "id": "#min_base_quality_score" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM memory in MB to be used per job.", "id": "#memory_per_job" }, { "sbg:category": "Execution", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "label": "Memory overhead per job", "description": "Memory overhead per job. By default this parameter value is set to '0' (zero megabytes). This parameter value is added to the Memory per job parameter value. 
The sum of the two (Memory per job + Memory overhead per job) is the total amount of memory allocated per job. Unless specified otherwise, Memory per job defaults to 2048 megabytes.", "id": "#memory_overhead_per_job" }, { "sbg:altPrefix": "-maxRuntimeUnits", "sbg:category": "GATK General", "sbg:toolDefaultValue": "MINUTES", "type": [ "null", { "type": "enum", "symbols": [ "NANOSECONDS", "MICROSECONDS", "MILLISECONDS", "SECONDS", "MINUTES", "HOURS", "DAYS" ], "name": "max_runtime_units" } ], "inputBinding": { "position": 0, "prefix": "--maxRuntimeUnits", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Max Runtime Units", "description": "The TimeUnit for maxRuntime.", "id": "#max_runtime_units" }, { "sbg:altPrefix": "-maxRuntime", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--maxRuntime", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Max Runtime", "description": "If provided, GATK will stop execution cleanly as soon as possible after maxRuntime has been exceeded, truncating the run but not exiting with a failure. 
By default the value is interpreted in minutes, but this can be changed with maxRuntimeUnits.", "id": "#max_runtime" }, { "sbg:altPrefix": "-deletions", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "0.05", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--max_deletion_fraction", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Max Deletion Fraction", "description": "Maximum fraction of reads with deletions spanning this locus for it to be callable [to disable, set to 1; default: 0.05].", "id": "#max_deletion_fraction" }, { "sbg:altPrefix": "-maxAltAlleles", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "6", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--max_alternate_alleles", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Max Alternate Alleles", "description": "Maximum number of alternate alleles to genotype.", "id": "#max_alternate_alleles" }, { "sbg:altPrefix": "-kpr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--keep_program_records", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Keep Program Records", "description": "If true, overrides the Walker's default and keeps program records from the SAM header.", "id": "#keep_program_records" }, { "required": false, "sbg:altPrefix": "-L", "sbg:category": "Input Files", "sbg:stageInput": "link", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.intervals_file){\n if($job.inputs.intervals_file instanceof Array){\n if($job.inputs.intervals_file.length > 1){\n if([].concat($job.inputs.reads)[0].metadata)\n if([].concat($job.inputs.reads)[0].metadata.intervals_file)\n return '--intervals ' + 
[].concat($job.inputs.reads)[0].metadata.intervals_file\n } else return '--intervals ' + [].concat($job.inputs.intervals_file)[0].path\n } else return '--intervals ' + [].concat($job.inputs.intervals_file)[0].path\n } else\n return ''\n}" }, "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate. Can be specified in an .intervals file or a rod file.", "sbg:fileTypes": "BED, LIST, PICARD, INTERVAL_LIST, INTERVALS", "id": "#intervals_file" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "-L", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Intervals", "description": "One or more genomic intervals over which to operate.", "id": "#intervals" }, { "sbg:altPrefix": "-isr", "sbg:category": "GATK General", "sbg:toolDefaultValue": "UNION", "type": [ "null", { "type": "enum", "symbols": [ "UNION", "INTERSECTION" ], "name": "interval_set_rule" } ], "inputBinding": { "position": 0, "prefix": "--interval_set_rule", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Interval Set Rule", "description": "Indicates the set merging approach the interval parser should use to combine the various -L or -XL inputs.", "id": "#interval_set_rule" }, { "sbg:altPrefix": "-ip", "sbg:category": "GATK General", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--interval_padding", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Interval Padding", "description": "Indicates how many base pairs of padding to include around each of the intervals specified with the -L/--intervals argument.", "id": "#interval_padding" }, { "sbg:altPrefix": "-im", "sbg:category": "GATK General", "sbg:toolDefaultValue": "ALL", "type": [ "null", { "type": "enum", "symbols": [ "ALL", "OVERLAPPING_ONLY" ], "name": 
"interval_merging" } ], "inputBinding": { "position": 0, "prefix": "--interval_merging", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Interval Merging", "description": "Indicates the interval merging rule we should use for abutting intervals.", "id": "#interval_merging" }, { "sbg:altPrefix": "-indelHeterozygosity", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "0.000125", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--indel_heterozygosity", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Indel Heterozygosity", "description": "Heterozygosity for indel calling.", "id": "#indel_heterozygosity" }, { "sbg:altPrefix": "-indelGOP", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "45", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--indelGapOpenPenalty", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Indel Gap Open Penalty", "description": "Indel gap open penalty, as a Phred-scaled probability. I.e., 30 => 10^(-30/10).", "id": "#indel_gap_open_penalty" }, { "sbg:altPrefix": "-indelGCP", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "10", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--indelGapContinuationPenalty", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Indel Gap Continuation Penalty", "description": "Indel gap continuation penalty, as a Phred-scaled probability. 
I.e., 30 => 10^-30/10.", "id": "#indel_gap_continuation_penalty" }, { "sbg:altPrefix": null, "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--ignoreLaneInfo", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Ignore Lane Info", "description": "Ignore lane when building error model, error model is then per-site.", "id": "#ignore_lane_info" }, { "sbg:altPrefix": "-hets", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "0.001", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--heterozygosity", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Heterozygosity", "description": "Heterozygosity value used to compute prior likelihoods for any locus.", "id": "#heterozygosity" }, { "sbg:altPrefix": "-G", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "[u'Standard']", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--group", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Group", "description": "One or more classes/groups of annotations to apply to variant calls.", "id": "#group" }, { "sbg:altPrefix": "-gt_mode", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "DISCOVERY", "type": [ "null", { "type": "enum", "symbols": [ "DISCOVERY", "GENOTYPE_GIVEN_ALLELES" ], "name": "genotyping_mode" } ], "inputBinding": { "position": 0, "prefix": "--genotyping_mode", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Genotyping Mode", "description": "Specifies how to determine the alternate alleles to use for genotyping.", "id": "#genotyping_mode" }, { "sbg:altPrefix": "-glm", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "SNP", "type": [ "null", { "type": "enum", "symbols": [ "SNP", "INDEL", "GENERALPLOIDYSNP", "GENERALPLOIDYINDEL", "BOTH" ], "name": 
"genotype_likelihoods_model" } ], "inputBinding": { "position": 0, "prefix": "--genotype_likelihoods_model", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Genotype Likelihoods Model", "description": "Genotype likelihoods calculation model to employ -- SNP is the default option, while INDEL is also available for calling indels and BOTH is available for calling both together.", "id": "#genotype_likelihoods_model" }, { "required": false, "sbg:altPrefix": "-K", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--gatk_key", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Gatk key", "description": "GATK Key file. Required if running with -et NO_ET. Please see http://gatkforums.broadinstitute.org/discussion/1250/what-is-phone-home-and-how-does-it-affect-me#latest for details.", "sbg:fileTypes": "KEY, LICENSE", "id": "#gatk_key" }, { "sbg:altPrefix": "-fixMisencodedQuals", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-fixMisencodedQuals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Fix Misencoded Quals", "description": "Fix mis-encoded base quality scores.", "id": "#fix_misencoded_quals" }, { "required": false, "sbg:altPrefix": "-XL", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--excludeIntervals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Exclude Intervals", "description": "One or more genomic intervals to exclude from processing. 
Can be an .intervals file or a rod file.", "sbg:fileTypes": "TXT, BED, VCF", "id": "#exclude_intervals" }, { "sbg:altPrefix": "-XA", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--excludeAnnotation", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Exclude Annotation", "description": "One or more specific annotations to exclude.", "id": "#exclude_annotation" }, { "sbg:altPrefix": "-EOQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--emit_original_quals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Emit Original Quals", "description": "If true, enables printing of the OQ tag with the original base qualities (with -BQSR).", "id": "#emit_original_quals" }, { "sbg:altPrefix": "-dt", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", { "type": "enum", "symbols": [ "NONE", "ALL_READS", "BY_SAMPLE" ], "name": "downsampling_type" } ], "inputBinding": { "position": 0, "prefix": "--downsampling_type", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Downsampling Type", "description": "Type of reads downsampling to employ at a given locus. 
Reads will be selected randomly to be removed from the pile.", "id": "#downsampling_type" }, { "sbg:altPrefix": "-dfrac", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_fraction", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Downsample to Fraction", "description": "Fraction [0.0-1.0] of reads to downsample to.", "id": "#downsample_to_fraction" }, { "sbg:altPrefix": "-dcov", "sbg:category": "GATK General", "sbg:toolDefaultValue": "", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--downsample_to_coverage", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Downsample to Coverage", "description": "Coverage to downsample to at any given locus; note that downsampled reads are randomly selected from all possible reads at a locus. For non-locus-based traversals (e.g., ReadWalkers), this sets the maximum number of reads at each alignment start position.", "id": "#downsample_to_coverage" }, { "sbg:altPrefix": null, "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disableRandomization", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Disable Randomization", "description": "Completely eliminates randomization from nondeterministic methods. 
To be used mostly in the testing framework, where dynamic parallelism can result in differing numbers of calls to the generator.", "id": "#disable_radnomization" }, { "sbg:altPrefix": "-DIQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--disable_indel_quals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Disable Indel Quals", "description": "If 'true', disables printing of the base insertion and base deletion tags when using the -BQSR argument, so that only the base substitution qualities are produced.", "id": "#disable_indel_quals" }, { "sbg:altPrefix": "-DBQ", "sbg:category": "GATK General", "sbg:toolDefaultValue": "-1", "type": [ "null", "int" ], "inputBinding": { "position": 0, "prefix": "--defaultBaseQualities", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Default Base Qualities", "description": "If reads are missing some or all base quality scores, this value will be used for all base quality scores.", "id": "#default_base_qualities" }, { "required": false, "sbg:altPrefix": "-D", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "separate": true, "itemSeparator": " ", "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.dbsnp)\n return '--dbsnp ' + [].concat($job.inputs.dbsnp)[0].path\n}" }, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "dbSNP", "description": "dbSNP file in VCF format.", "sbg:fileTypes": "VCF", "id": "#dbsnp" }, { "sbg:altPrefix": null, "sbg:category": "Execution", "sbg:toolDefaultValue": "1", "type": [ "null", "int" ], "label": "CPU per job", "description": "Number of CPUs per job.", "id": "#cpu_per_job" }, { "sbg:altPrefix": "-contamination", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "0.05", "type": [ 
"null", "float" ], "inputBinding": { "position": 0, "prefix": "--contamination_fraction_to_filter", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Contamination", "description": "Fraction of contamination in sequencing data (for all samples) to aggressively remove.", "id": "#contamination" }, { "sbg:altPrefix": "-slod", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--computeSLOD", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Compute SLOD", "description": "If provided, we will calculate the SLOD (SB annotation).", "id": "#compute_slod" }, { "required": false, "sbg:altPrefix": null, "sbg:category": "Input Files", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--comp", "separate": true, "sbg:cmdInclude": true }, "label": "Comp", "description": "Comparison VCF file.", "id": "#comp" }, { "required": false, "sbg:altPrefix": "-BQSR", "sbg:category": "Input Files", "type": [ "null", { "type": "array", "items": "File" } ], "inputBinding": { "position": 0, "prefix": "--BQSR", "separate": true, "sbg:cmdInclude": true }, "label": "BQSR", "description": "The input covariates table file which enables on-the-fly base quality score recalibration.", "sbg:fileTypes": "GRP", "id": "#bqsr" }, { "sbg:altPrefix": "-baqGOP", "sbg:category": "GATK General", "sbg:toolDefaultValue": "40.0", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--baqGapOpenPenalty", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "BAQ Gap Open Penalty", "description": "BAQ gap open penalty (Phred-scaled). Default value is 40. 
30 is perhaps better for whole-genome call sets.", "id": "#baq_gap_open_penalty" }, { "sbg:altPrefix": "-baq", "sbg:category": "GATK General", "sbg:toolDefaultValue": "OFF", "type": [ "null", { "type": "enum", "symbols": [ "OFF", "CALCULATE_AS_NECESSARY", "RECALCULATE" ], "name": "baq" } ], "inputBinding": { "position": 0, "prefix": "--baq", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "BAQ Calculation Type", "description": "Type of BAQ calculation to apply in the engine.", "id": "#baq" }, { "sbg:altPrefix": "-A", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "[]", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 0, "prefix": "--annotation", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Annotation", "description": "One or more specific annotations to apply to variant calls.", "id": "#annotation" }, { "sbg:altPrefix": "-nda", "sbg:category": "Unified Genotyper", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--annotateNDA", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Annotate NDA", "description": "If provided, we will annotate records with the number of alternate alleles that were discovered (but not necessarily genotyped) at a given site.", "id": "#annotate_nda" }, { "sbg:altPrefix": "--allow_potentially_misencoded_quality_scores", "sbg:category": "GATK General", "sbg:toolDefaultValue": "False", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "-allowPotentiallyMisencodedQuals", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true }, "label": "Allow Potentially Misencoded Quals", "description": "Do not fail when encountering base qualities that are too high and seemingly indicate a problem with the base quality encoding of the BAM file.", "id": "#allow_potentailly_misencoded_quals" }, { "required": false, "sbg:altPrefix": 
"-alleles", "sbg:category": "Input Files", "type": [ "null", "File" ], "inputBinding": { "position": 0, "prefix": "--alleles", "separate": true, "itemSeparator": " ", "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Alleles", "description": "The set of alleles at which to genotype when --genotyping_mode is GENOTYPE_GIVEN_ALLELES.", "sbg:fileTypes": "VCF", "id": "#alleles" } ], "outputs": [ { "type": [ "null", "File" ], "label": "VCF", "description": "A raw, unfiltered, highly sensitive callset in VCF format.", "sbg:fileTypes": "VCF", "outputBinding": { "glob": "*.vcf", "sbg:inheritMetadataFrom": "#reads", "secondaryFiles": [ ".idx" ] }, "id": "#raw_vcf" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.cpu_per_job){\n \treturn $job.inputs.cpu_per_job\n }\n\treturn 1\n}" } }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n if($job.inputs.memory_overhead_per_job){\n \treturn $job.inputs.memory_per_job + $job.inputs.memory_overhead_per_job\n }\n else\n \t\treturn $job.inputs.memory_per_job\n }\n else if(!$job.inputs.memory_per_job && $job.inputs.memory_overhead_per_job){\n\t\treturn 2048 + $job.inputs.memory_overhead_per_job \n }\n else\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "47510cb2da55", "dockerPull": "images.sbgenomics.com/stefanristeski/gatk2-lite:2.3-9" } ], "arguments": [ { "position": 0, "prefix": "--out", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n read_name = [].concat($job.inputs.reads)[0].path.replace(/^.*[\\\\\\/]/, '').split('.')\n read_namebase = read_name.slice(0, read_name.length-1).join('.')\n\n 
if($job.inputs.bqsr){\n \treturn read_namebase + '.base_recalibrated.vcf'\n }\n else{\n \treturn read_namebase + '.vcf'\n }\n}" } } ], "sbg:job": { "inputs": { "validation_strictness": null, "use_original_qualities": null, "use_legacy_downsampler": null, "unsafe": null, "threads_per_job": null, "tag": null, "stand_emit_conf": 10, "stand_call_conf": 50, "remove_program_records": null, "reference_sample_calls": null, "reference": { "path": "/folder/reference.fasta" }, "reads": [ { "path": "sample1.bam" }, { "path": "sample2.bam" } ], "read_group_black_list": [], "read_filter": [], "preserve_qscores_less_than": null, "phone_home": null, "pedigree_validation_type": null, "pedigree_string": [], "pcr_error_rate": null, "pair_hmm_implementation": null, "p_nonref_model": null, "output_mode": null, "non_deterministic_random_seed": null, "min_indel_frac": null, "min_indel_cnt": null, "min_base_quality_score": null, "memory_per_job": null, "memory_overhead_per_job": 0, "max_runtime_units": null, "max_runtime": null, "max_deletion_fraction": null, "max_alternate_alleles": null, "keep_program_records": null, "intervals_file": [ { "class": "File", "path": "/path/to/intervals_file-1.ext", "secondaryFiles": [], "size": 0 }, { "class": "File", "path": "/path/to/intervals_file-2.ext", "secondaryFiles": [], "size": 0 } ], "intervals": null, "interval_set_rule": null, "interval_padding": null, "interval_merging": null, "indel_heterozygosity": null, "indel_gap_open_penalty": null, "indel_gap_continuation_penalty": null, "ignore_lane_info": null, "heterozygosity": null, "group": [], "genotyping_mode": null, "genotype_likelihoods_model": null, "gatk_key": null, "fix_misencoded_quals": null, "exclude_intervals": null, "exclude_annotation": [], "emit_original_quals": null, "downsampling_type": null, "downsample_to_fraction": null, "downsample_to_coverage": null, "disable_radnomization": null, "disable_indel_quals": null, "default_base_qualities": null, "dbsnp": { "path": 
"/folder/dbSNP.vcf" }, "cpu_per_job": null, "contamination": null, "compute_slod": null, "comp": [], "bqsr": [], "baq_gap_open_penalty": null, "baq": null, "annotation": [], "annotate_nda": null, "allow_potentailly_misencoded_quals": null, "alleles": null }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "Variant-Calling" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/GenomeAnalysisTKLite.jar --analysis_type UnifiedGenotyper -nt 4 --reference_sequence /folder/reference.fasta --input_file sample1.bam --input_file sample2.bam --out sample1.vcf", "sbg:contributors": [ "vladimirk", "bogdang", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911349, "sbg:id": "admin/sbg-public-data/gatk-2-3-9-lite-unifiedgenotyper/34", "sbg:image_url": null, "sbg:latestRevision": 15, "sbg:license": "MIT License", "sbg:links": [ { "id": "https://www.broadinstitute.org/gatk/index.php", "label": "Homepage" }, { "id": "https://github.com/broadgsa/gatk-protected", "label": "Source code" }, { "id": "https://www.broadinstitute.org/gatk/guide/pdfdocs/GATK_GuideBook_2.3-9.pdf", "label": "Wiki" }, { "id": "https://www.broadinstitute.org/gatk/download/auth?package=GATK-archive&version=2.3-9-ge5ebf34", "label": "Download" }, { "id": "https://www.broadinstitute.org/gatk/about/#in-the-literature", "label": "Publication" }, { "id": "https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.php", "label": "Documentation" } ], "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1484912384, "sbg:project": "bix-demo/gatk-2-3-9-lite-demo", "sbg:projectName": "GATK 2.3.9 Lite - Demo New ", "sbg:revision": 15, "sbg:revisionNotes": "Fix for single run without dbsnp", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911349, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911350, "sbg:revision": 1, 
"sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911351, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911352, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911353, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911354, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1463297747, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1463663096, "sbg:revision": 7, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1469527805, "sbg:revision": 8, "sbg:revisionNotes": "bam.bai extension removed from secondary files of reads input." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471364268, "sbg:revision": 9, "sbg:revisionNotes": "dbsnp link." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1471445503, "sbg:revision": 10, "sbg:revisionNotes": "dbsnp guard []concat()." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1472656204, "sbg:revision": 11, "sbg:revisionNotes": "Metadata scatter." }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1475751155, "sbg:revision": 12, "sbg:revisionNotes": "fix for single interval file support" }, { "sbg:modifiedBy": "vladimirk", "sbg:modifiedOn": 1475802580, "sbg:revision": 13, "sbg:revisionNotes": "Double --intervals removed!" 
}, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476436438, "sbg:revision": 14, "sbg:revisionNotes": "^.bai --> .bai in reads input" }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1484912384, "sbg:revision": 15, "sbg:revisionNotes": "Fix for single run without dbsnp" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "GATK", "sbg:toolkitVersion": "2.3.9 Lite", "sbg:validationErrors": [], "x": 1971.6669149928698, "y": 422.7605820931547 }, "label": "GATK UnifiedGenotyper", "scatter": "#GATK_UnifiedGenotyper.reads", "sbg:x": 1971.6669149928698, "sbg:y": 422.7605820931547 }, { "id": "#Picard_CollectAlignmentSummaryMetrics", "inputs": [ { "id": "#Picard_CollectAlignmentSummaryMetrics.verbosity", "default": "INFO" }, { "id": "#Picard_CollectAlignmentSummaryMetrics.validation_stringency", "default": "SILENT" }, { "id": "#Picard_CollectAlignmentSummaryMetrics.reference", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#Picard_CollectAlignmentSummaryMetrics.quiet", "default": "false" }, { "id": "#Picard_CollectAlignmentSummaryMetrics.metric_accumulation_level", "default": [ "ALL_READS" ] }, { "id": "#Picard_CollectAlignmentSummaryMetrics.max_insert_size", "default": 100000 }, { "id": "#Picard_CollectAlignmentSummaryMetrics.is_bisulfite_sequenced", "default": "false" }, { "id": "#Picard_CollectAlignmentSummaryMetrics.input_bam", "source": [ "#Sambamba_Merge.merged_bam" ] }, { "id": "#Picard_CollectAlignmentSummaryMetrics.compression_level", "default": 5 }, { "id": "#Picard_CollectAlignmentSummaryMetrics.assume_sorted", "default": "true" } ], "outputs": [ { "id": "#Picard_CollectAlignmentSummaryMetrics.summary_metrics" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/picard-collectalignmentsummarymetrics-1-140/7", "label": "Picard CollectAlignmentSummaryMetrics", "description": "Picard CollectAlignmentSummaryMetrics assesses the quality of alignment by analyzing a SAM 
or BAM file. It compares it with the reference file (FASTA) and provides alignment statistics, such as the number of input reads and the percent of reads that are mapped. It produces a file which contains summary alignment metrics from a SAM or BAM file.\n\nNote: This tool requires the exact same FASTA file as the one to which raw reads were aligned.\n\n### Common issues\n\n1) BAM file - Sort order should be coordinate based.\n2) Reference sequence file - Note that while this argument is not required, without it only a small subset of the metrics will be calculated. If reference sequence file is used, sequence index and dictionary are required. This tool requires the exact same FASTA file as the one to which raw reads were aligned.", "baseCommand": [ "java", { "class": "Expression", "engine": "#cwl-js-engine", "script": "{ \n if($job.inputs.memory_per_job){\n return '-Xmx'.concat($job.inputs.memory_per_job, 'M')\n } \n \treturn '-Xmx2048M'\n}" }, "-jar", "/opt/picard-tools-1.140/picard.jar", "CollectAlignmentSummaryMetrics" ], "inputs": [ { "sbg:category": "Options", "sbg:toolDefaultValue": "INFO", "type": [ "null", { "type": "enum", "symbols": [ "ERROR", "WARNING", "INFO", "DEBUG" ], "name": "verbosity" } ], "inputBinding": { "position": 6, "prefix": "VERBOSITY=", "separate": false, "sbg:cmdInclude": true }, "label": "Verbosity", "description": "Control verbosity of logging. Default value: INFO. This option can be set to 'null' to clear the default value. 
Possible values: {ERROR, WARNING, INFO, DEBUG}.", "id": "#verbosity" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "SILENT", "type": [ "null", { "type": "enum", "symbols": [ "STRICT", "LENIENT", "SILENT" ], "name": "validation_stringency" } ], "inputBinding": { "position": 4, "prefix": "VALIDATION_STRINGENCY=", "separate": false, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.validation_stringency)\n {\n return $job.inputs.validation_stringency\n }\n else\n {\n return \"SILENT\"\n }\n}" }, "sbg:cmdInclude": true }, "label": "Validation stringency", "description": "Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT}.", "id": "#validation_stringency" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "0", "type": [ "null", "int" ], "inputBinding": { "position": 9, "prefix": "STOP_AFTER=", "separate": false, "sbg:cmdInclude": true }, "label": "Stop after", "description": "Stop after processing N reads, mainly for debugging. Default value: 0. This option can be set to 'null' to clear the default value.", "id": "#stop_after" }, { "required": false, "sbg:altPrefix": "R", "sbg:category": "File inputs", "type": [ "null", "File" ], "inputBinding": { "position": 3, "prefix": "REFERENCE_SEQUENCE=", "separate": false, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Reference sequence", "description": "Reference sequence file. Note that while this argument is not required, without it only a small subset of the metrics will be calculated. If reference sequence file is used, sequence index and dictionary are required. 
This tool requires the exact same FASTA file as the one to which raw reads were aligned. Default value: null.", "sbg:fileTypes": "FASTA", "id": "#reference" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "false", "type": [ "null", { "type": "enum", "symbols": [ "true", "false" ], "name": "quiet" } ], "inputBinding": { "position": 4, "prefix": "QUIET=", "separate": false, "sbg:cmdInclude": true }, "label": "Quiet", "description": "This parameter indicates whether to suppress job-summary info on System.err. Default value: false. This option can be set to 'null' to clear the default value. Possible values: {true, false}.", "id": "#quiet" }, { "sbg:altPrefix": "LEVEL", "sbg:category": "Options", "sbg:toolDefaultValue": "ALL_READS", "type": [ "null", { "type": "array", "items": { "type": "enum", "symbols": [ "ALL_READS", "SAMPLE", "LIBRARY", "READ_GROUP" ] } } ], "inputBinding": { "position": 8, "prefix": "METRIC_ACCUMULATION_LEVEL=", "separate": false, "sbg:cmdInclude": true }, "label": "Metric accumulation level", "description": "This parameter indicates the level(s) at which to accumulate metrics. Default value: [ALL_READS]. This option can be set to 'null' to clear the default value. Possible values: {ALL_READS, SAMPLE, LIBRARY, READ_GROUP}. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.", "id": "#metric_accumulation_level" }, { "sbg:category": "Execution options", "sbg:toolDefaultValue": "2048", "type": [ "null", "int" ], "label": "Memory per job", "description": "Amount of RAM memory to be used per job. 
Defaults to 2048 MB for single threaded jobs.", "id": "#memory_per_job" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "500000", "type": [ "null", "int" ], "inputBinding": { "position": 4, "prefix": "MAX_RECORDS_IN_RAM=", "separate": false, "sbg:cmdInclude": true }, "label": "Max records in RAM", "description": "When writing SAM files that need to be sorted, this parameter will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort a SAM file, and increases the amount of RAM needed. Default value: 500000. This option can be set to 'null' to clear the default value.", "id": "#max_records_in_ram" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "100000", "type": [ "null", "int" ], "inputBinding": { "position": 9, "prefix": "MAX_INSERT_SIZE=", "separate": false, "sbg:cmdInclude": true }, "label": "Max insert size", "description": "Paired end reads above this insert size will be considered chimeric along with inter-chromosomal pairs. Default value: 100000. This option can be set to 'null' to clear the default value.", "id": "#max_insert_size" }, { "sbg:altPrefix": "BS", "sbg:category": "Options", "sbg:toolDefaultValue": "false", "type": [ "null", { "type": "enum", "symbols": [ "true", "false" ], "name": "is_bisulfite_sequenced" } ], "inputBinding": { "position": 8, "prefix": "BS=", "separate": false, "sbg:cmdInclude": true }, "label": "Is bisulfite sequenced", "description": "This parameter indicates whether the SAM or BAM file consists of bisulfite sequenced reads. Default value: false. This option can be set to 'null' to clear the default value. 
Possible values: {true, false}.", "id": "#is_bisulfite_sequenced" }, { "required": true, "sbg:altPrefix": "I", "sbg:category": "File inputs", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "INPUT=", "separate": false, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Input file", "description": "Input SAM or BAM file. Required. Note: Sort order should be coordinate based.", "sbg:fileTypes": "BAM, SAM", "id": "#input_bam" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "5", "type": [ "null", "int" ], "inputBinding": { "position": 4, "prefix": "COMPRESSION_LEVEL=", "separate": false, "sbg:cmdInclude": true }, "label": "Compression level", "description": "Compression level for all compressed files created (e.g. BAM and GELI). Default value: 5. This option can be set to 'null' to clear the default value.", "id": "#compression_level" }, { "sbg:altPrefix": "AS", "sbg:category": "Options", "sbg:toolDefaultValue": "true", "type": [ "null", { "type": "enum", "symbols": [ "true", "false" ], "name": "assume_sorted" } ], "inputBinding": { "position": 0, "prefix": "ASSUME_SORTED=", "separate": false, "sbg:cmdInclude": true }, "label": "Assume sorted", "description": "If this parameter is set to true, the sort order in the header file will be ignored. Default value: true. This option can be set to 'null' to clear the default value. Possible values: {true, false}.", "id": "#assume_sorted" }, { "sbg:category": "Options", "sbg:toolDefaultValue": "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT", "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 8, "prefix": "ADAPTER_SEQUENCE=", "separate": false, "sbg:cmdInclude": true }, "label": "Adapter sequence", "description": "List of adapter sequences to use when processing the alignment metrics. 
Default value: [AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG, AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT, AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG]. This option can be set to 'null' to clear the default value. This option may be specified 0 or more times. This option can be set to 'null' to clear the default list.", "id": "#adapter_sequence" } ], "outputs": [ { "type": [ "File" ], "label": "Summary metrics", "description": "File to which the output will be written.", "sbg:fileTypes": "TXT", "outputBinding": { "glob": "*.summary_metrics.txt", "sbg:inheritMetadataFrom": "#input_bam" }, "id": "#summary_metrics" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if($job.inputs.memory_per_job){\n \treturn $job.inputs.memory_per_job\n }\n \treturn 2048\n}" } }, { "class": "DockerRequirement", "dockerImageId": "eab0e70b6629", "dockerPull": "images.sbgenomics.com/mladenlsbg/picard:1.140" } ], "arguments": [ { "position": 3, "prefix": "OUTPUT=", "separate": false, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.input_bam)\n {\n filename = [].concat($job.inputs.input_bam)[0].path\n filebase = filename.split('.').slice(0, -1)\n\n return filebase.concat(\"summary_metrics.txt\").join(\".\").replace(/^.*[\\\\\\/]/, '')\n }\n}\n" } } ], "sbg:job": { "inputs": { "verbosity": null, "validation_stringency": null, "stop_after": null, "reference": { "path": "/root/directory/example.fasta" }, "quiet": null, "metric_accumulation_level": 
[ "ALL_READS" ], "memory_per_job": 0, "max_records_in_ram": null, "max_insert_size": null, "is_bisulfite_sequenced": null, "input_bam": { "path": "/root/folder/example.bam" }, "compression_level": null, "assume_sorted": null, "adapter_sequence": [] }, "allocatedResources": { "mem": 2048, "cpu": 1 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "SAM/BAM-Processing", "Quality-Control", "Quantification" ], "sbg:cmdPreview": "java -Xmx2048M -jar /opt/picard-tools-1.140/picard.jar CollectAlignmentSummaryMetrics INPUT=/root/folder/example.bam OUTPUT=example.summary_metrics.txt", "sbg:contributors": [ "medjo", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911255, "sbg:id": "admin/sbg-public-data/picard-collectalignmentsummarymetrics-1-140/7", "sbg:image_url": null, "sbg:latestRevision": 7, "sbg:license": "MIT License, Apache 2.0 Licence", "sbg:links": [ { "id": "http://broadinstitute.github.io/picard/index.html", "label": "Homepage" }, { "id": "https://github.com/broadinstitute/picard/releases/tag/1.140", "label": "Source Code" }, { "id": "http://broadinstitute.github.io/picard/", "label": "Wiki" }, { "id": "https://github.com/broadinstitute/picard/zipball/master", "label": "Download" }, { "id": "http://broadinstitute.github.io/picard/", "label": "Publication" } ], "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1491905414, "sbg:project": "bix-demo/picard-1-140-demo", "sbg:projectName": "Picard 1.140 - Demo New", "sbg:revision": 7, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911255, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911256, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911257, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1472811193, "sbg:revision": 3, "sbg:revisionNotes": "[].concat(input_bam)[0].path" }, { "sbg:modifiedBy": "medjo", 
"sbg:modifiedOn": 1491475478, "sbg:revision": 4, "sbg:revisionNotes": "Category field is set" }, { "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1491479931, "sbg:revision": 5, "sbg:revisionNotes": "Common isses" }, { "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1491486008, "sbg:revision": 6, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "medjo", "sbg:modifiedOn": 1491905414, "sbg:revision": 7, "sbg:revisionNotes": null } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Broad Institute", "sbg:toolkit": "Picard", "sbg:toolkitVersion": "1.140", "sbg:validationErrors": [], "x": 2506.666847652864, "y": -523.3333948188377 }, "label": "Picard CollectAlignmentSummaryMetrics", "sbg:x": 2506.666847652864, "sbg:y": -523.3333948188377 }, { "id": "#SBG_Genome_Coverage", "inputs": [ { "id": "#SBG_Genome_Coverage.format", "default": "Bed-Histogram" }, { "id": "#SBG_Genome_Coverage.fasta", "source": [ "#SBG_FASTA_Indices.fasta_reference" ] }, { "id": "#SBG_Genome_Coverage.coverage_interval", "default": "Entire Interval" }, { "id": "#SBG_Genome_Coverage.bam", "source": [ "#Sambamba_Merge.merged_bam" ] } ], "outputs": [ { "id": "#SBG_Genome_Coverage.summary" }, { "id": "#SBG_Genome_Coverage.per_interval" }, { "id": "#SBG_Genome_Coverage.bed_graph" } ], "run": { "cwlVersion": "sbg:draft-2", "class": "CommandLineTool", "id": "admin/sbg-public-data/sbg-genome-coverage/3", "label": "SBG Genome Coverage", "description": "SBG Genome Coverage extends BEDTools Genome Coverage. The Genome Coverage calculates histograms, per-base reports and BedGraph summaries of feature coverage (aligned sequences for example) for a given genome. 
This extended version additionally extracts and creates a text file containing summary coverage stats.", "baseCommand": [ "python3.6", "sbg_genome_coverage.py" ], "inputs": [ { "sbg:category": "OPTIONS", "type": [ "null", "string" ], "inputBinding": { "position": 0, "prefix": "--trackopt", "separate": true, "sbg:cmdInclude": true }, "label": "Additional track", "description": "Writes additional track line definition parameters in the first line.", "id": "#trackopt" }, { "sbg:category": "OPTIONS", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--trackline", "separate": false, "sbg:cmdInclude": true }, "label": "UCSC track line", "description": "Adds a UCSC/Genome-Browser track line definition in the first line of the output.", "id": "#trackline" }, { "sbg:category": "OPTIONS", "type": [ "null", { "type": "enum", "symbols": [ "Not Specified", "Forward+", "Reverse-" ], "name": "strand" } ], "inputBinding": { "position": 0, "prefix": "--strand", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.strand == 'Not Specified') return 0\n if ($job.inputs.strand == 'Forward+') return 1\n if ($job.inputs.strand == 'Reverse-') return 2\n}" }, "sbg:cmdInclude": true }, "label": "Strand", "description": "Calculate coverage of intervals from a specific strand.", "id": "#strand" }, { "sbg:category": "OPTIONS", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--split", "separate": false, "sbg:cmdInclude": true }, "label": "Split", "description": "Treat BAM entries as distinct BED intervals when computing coverage. Uses CIGAR 'N' and 'D' operations to infer the blocks for computing coverage.", "id": "#split" }, { "sbg:category": "OPTIONS", "type": [ "null", "float" ], "inputBinding": { "position": 0, "prefix": "--scale", "separate": true, "sbg:cmdInclude": true }, "label": "Scale", "description": "Scale the coverage by a constant factor. 
Requires BedGraph or Dept Per Base output.", "id": "#scale" }, { "sbg:category": "OPTIONS", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--report_z", "separate": false, "sbg:cmdInclude": true }, "label": "Report Z", "description": "If BedGraph or Depth Per Base output is selected, also report zero-depth positions.", "id": "#report_z" }, { "sbg:category": "OPTIONS", "type": [ "null", "boolean" ], "inputBinding": { "position": 0, "prefix": "--gzipped", "separate": false, "sbg:cmdInclude": true }, "label": "GZipped", "description": "Compress output with gzip.", "id": "#gzipped" }, { "sbg:category": "OPTIONS", "type": [ "null", { "type": "enum", "symbols": [ "Bed-Histogram", "Bed-DepthPerBase", "BedGraph" ], "name": "format" } ], "inputBinding": { "position": 0, "prefix": "--format", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.format == 'Bed-Histogram') return 0\n if ($job.inputs.format == 'Bed-DepthPerBase') return 1\n if ($job.inputs.format == 'BedGraph') return 2\n}" }, "sbg:cmdInclude": true }, "label": "Format", "description": "Output format.", "id": "#format" }, { "required": true, "sbg:category": "INPUT FILES", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "-f", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Fasta", "description": "Reference file.", "sbg:fileTypes": "FASTA, FA", "id": "#fasta" }, { "sbg:category": "OPTIONS", "type": [ "null", { "type": "enum", "symbols": [ "Entire Interval", "3' Positions Only", "5' Positions Only" ], "name": "coverage_interval" } ], "inputBinding": { "position": 0, "prefix": "--coverage_interval", "separate": true, "valueFrom": { "class": "Expression", "engine": "#cwl-js-engine", "script": "{\n if ($job.inputs.coverage_interval == 'Entire Interval') return 0\n if ($job.inputs.coverage_interval == \"3' Positions Only\") return 3\n if ($job.inputs.coverage_interval == \"5' Positions 
Only\") return 5\n}" }, "sbg:cmdInclude": true }, "label": "Coverage interval", "description": "Coverage calculation.", "id": "#coverage_interval" }, { "required": true, "sbg:category": "INPUT FILES", "type": [ "File" ], "inputBinding": { "position": 0, "prefix": "-b", "separate": true, "sbg:cmdInclude": true, "secondaryFiles": [] }, "label": "Bam", "description": "Input BAM file for coverage calculation.", "sbg:fileTypes": "BAM", "id": "#bam" } ], "outputs": [ { "type": [ "null", "File" ], "label": "Summary", "description": "Summary file.", "outputBinding": { "glob": "*.summary", "sbg:metadata": { "file_format": "TEXT" }, "sbg:inheritMetadataFrom": "#bam" }, "id": "#summary" }, { "type": [ "null", "File" ], "label": "Per interval", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "if ($job.inputs.gzipped)\n\t'*.per_interval.bed.gz'\nelse\n\t'*.per_interval.bed'" }, "sbg:metadata": { "file_format": "BED" }, "sbg:inheritMetadataFrom": "#bam" }, "id": "#per_interval" }, { "type": [ "null", "File" ], "label": "Bed graph", "outputBinding": { "glob": { "class": "Expression", "engine": "#cwl-js-engine", "script": "if ($job.inputs.gzipped)\n\t'*.bedgraph.gz'\nelse\n\t'*.bedgraph'" } }, "id": "#bed_graph" } ], "requirements": [ { "class": "ExpressionEngineRequirement", "id": "#cwl-js-engine", "requirements": [ { "class": "DockerRequirement", "dockerPull": "rabix/js-engine" } ] }, { "class": "CreateFileRequirement", "fileDef": [ { "filename": "sbg_genome_coverage.py", "fileContent": "\"\"\"\nUsage:\n sbg_genome_coverage.py --bam FILE --fasta FILE [options]\n\nOptions:\n -h, --help Show this message.\n\n -b, --bam FILE Input BAM file for coverage calculation.\n\n -f, --fasta FILE Reference file.\n\n --format ENUM Output format. 
Available options: {0, 1, 2}\n 0: Bed-Histogram\n 1: Bed-DepthPerBase\n 2: BedGraph\n [default: 0]\n\n --report_z If BedGraph or Depth Per Base output is\n selected, also report zero-depth positions.\n\n --gzipped Compress output with gzip.\n\n --split Treat BAM entries as distinct BED intervals\n when computing coverage. Uses CIGAR 'N' and\n 'D' operations to infer the blocks for\n computing coverage.\n\n --strand ENUM Calculate coverage of intervals from a\n specific strand.\n Available options: {0, 1, 2}\n 0: Not Specified\n 1: Forward+\n 2: Reverse-\n [default: 0]\n\n --coverage_interval ENUM Coverage calculation.\n Available options: \n 0: Entire Interval\n 3: 3' Positions Only\n 5: 5' Positions Only\n [default: 0]\n\n --scale FLOAT Scale the coverage by a constant factor.\n Requires BedGraph or Dept Per Base output.\n [default: 1.0]\n\n --trackopt STR Writes additional track line definition\n parameters in the first line.\n\n --trackline Adds a UCSC/Genome-Browser track line\n definition in the first line of the output.\n\n\"\"\"\n\nfrom docopt import docopt\nimport os\nimport pipes\nfrom pathlib import Path\nimport subprocess\nfrom enum import IntEnum\nfrom Compressor import PigzCompressor\n\nargs = docopt(__doc__, version='1.0')\n\nargs['--format'] = int(args['--format'])\nargs['--strand'] = int(args['--strand'])\nargs['--coverage_interval'] = int(args['--coverage_interval'])\nargs['--scale'] = float(args['--scale'])\n\n# BEDTOOLS_ROOT = ''\n\nBEDTOOLS_ROOT = '/opt/bedtools2/bin/'\n\nclass BEDTOOLS_FORMAT(IntEnum):\n BedHistogram = 0\n BedDepthPerBase = 1\n BedGraph = 2\n\nclass BEDTOOLS_STRAND(IntEnum):\n NotSpecified = 0\n Forward = 1\n Reverse = 2\n\nclass BEDTOOLS_COVERAGE_INTERVAL:\n EntireInterval = 0\n ThreePrimPositionsOnly = 3\n FivePrimPositionsOnly = 5\n\ndef append_arg(arg_list, *args):\n for arg in args:\n if arg not in (None, ''):\n arg_list.append(pipes.quote(arg))\n\ndef append_narg(arg_list, key, value):\n if value in (None, '') or value is 
False:\n return\n if isinstance(value, bool):\n append_arg(arg_list, key)\n else:\n append_arg(arg_list, key, str(value))\n\ndef get_output_file_name(bam, params):\n just_name = lambda file_name: Path(file_name).name\n\n if params['--format'] == BEDTOOLS_FORMAT.BedDepthPerBase:\n return just_name(bam) + '.per_interval.bed'\n elif params['--format'] == BEDTOOLS_FORMAT.BedHistogram:\n return just_name(bam) + '.bed'\n elif params['--format'] == BEDTOOLS_FORMAT.BedGraph:\n return just_name(bam) + '.bedgraph'\n else:\n raise Exception('Unsupported --format value.')\n\ndef get_bedtools_cmd(bam, genome, parameters) -> list:\n cmd_bedtools = [os.path.join(BEDTOOLS_ROOT, 'genomeCoverageBed'), '-ibam', bam, '-g', genome]\n\n if parameters['--format'] == BEDTOOLS_FORMAT.BedDepthPerBase:\n if parameters['--report_z'] is True:\n append_arg(cmd_bedtools, '-d')\n else:\n append_arg(cmd_bedtools, '-dz')\n elif parameters['--format'] == BEDTOOLS_FORMAT.BedGraph:\n if parameters['--report_z'] is True:\n append_arg(cmd_bedtools, '-bga')\n else:\n append_arg(cmd_bedtools, '-bg')\n\n append_narg(cmd_bedtools, '-split', parameters['--split'])\n\n if parameters['--coverage_interval'] == BEDTOOLS_COVERAGE_INTERVAL.FivePrimPositionsOnly:\n append_arg(cmd_bedtools, '-5')\n elif parameters['--coverage_interval'] == BEDTOOLS_COVERAGE_INTERVAL.ThreePrimPositionsOnly:\n append_arg(cmd_bedtools, '-3')\n\n if parameters['--format'] == BEDTOOLS_FORMAT.BedDepthPerBase or parameters['--format'] == BEDTOOLS_FORMAT.BedGraph:\n append_narg(cmd_bedtools, '-scale', parameters['--scale'])\n\n if parameters['--strand'] == BEDTOOLS_STRAND.Forward:\n append_narg(cmd_bedtools, '-strand', '+')\n elif parameters['--strand'] == BEDTOOLS_STRAND.Reverse:\n append_narg(cmd_bedtools, '-strand', '-')\n\n if parameters['--format'] == BEDTOOLS_FORMAT.BedGraph:\n append_narg(cmd_bedtools, '-trackopts', parameters['--trackopt'])\n append_narg(cmd_bedtools, '-trackline', parameters['--trackline'])\n\n return 
cmd_bedtools\n\ndef handle_bed_histogram(bam, genome, parameters, gzip, ziper):\n output_file = get_output_file_name(bam, parameters)\n cmd_bedtools = get_bedtools_cmd(bam, genome, parameters)\n cmd_summary = ['grep', '^genome']\n cmd_intervals = ['grep', '-v', '^genome']\n cmd_pigz = ['pigz', '--force', '--keep']\n\n summary = open(Path(output_file).name + '.summary', 'wb')\n per_interval_path = Path(output_file).name + '.per_interval.bed.gz' if gzip \\\n else Path(output_file).name + '.per_interval.bed'\n\n per_interval = open(per_interval_path, 'wb')\n\n bed_proc = subprocess.Popen(cmd_bedtools, stdout=subprocess.PIPE)\n sum_proc = subprocess.Popen(cmd_summary, stdin=subprocess.PIPE, stdout=summary)\n inter_proc = subprocess.Popen(cmd_intervals, stdin=subprocess.PIPE, stdout=subprocess.PIPE if gzip else per_interval)\n pigz_proc = None\n\n if gzip:\n pigz_proc = subprocess.Popen(cmd_pigz, stdin=inter_proc.stdout, stdout=per_interval)\n\n try:\n while True:\n data = bed_proc.stdout.readline()\n if not data:\n sum_proc.stdin.close()\n inter_proc.stdin.close()\n break\n sum_proc.stdin.write(data)\n inter_proc.stdin.write(data)\n\n bed_proc.wait()\n inter_proc.wait()\n sum_proc.wait()\n\n if gzip and pigz_proc:\n pigz_proc.wait()\n\n except Exception as e:\n print (str(e))\n bed_proc.kill()\n sum_proc.kill()\n inter_proc.kill()\n finally:\n summary.close()\n per_interval.close()\n\ndef handle_graph_or_depth(bam, genome, parameters, gzip, ziper):\n output_file = get_output_file_name(bam, parameters)\n cmd_bedtools = get_bedtools_cmd(bam, genome, parameters)\n\n stdout = subprocess.PIPE if gzip else open(output_file, 'w')\n process = subprocess.Popen(cmd_bedtools, stdout=stdout)\n if gzip:\n with open(output_file + '.gz', 'wb') as out:\n ziper.compress(stdin=process.stdout, stdout=out, params={'threads': 2})\n process.wait()\n\ndef main():\n ziper = PigzCompressor()\n if args['--format'] == BEDTOOLS_FORMAT.BedHistogram:\n handle_bed_histogram(args['--bam'], 
args['--fasta'], args, args['--gzipped'], ziper)\n elif args['--format'] == BEDTOOLS_FORMAT.BedDepthPerBase:\n handle_graph_or_depth(args['--bam'], args['--fasta'], args, args['--gzipped'], ziper)\n elif args['--format'] == BEDTOOLS_FORMAT.BedGraph:\n handle_graph_or_depth(args['--bam'], args['--fasta'], args, args['--gzipped'], ziper)\nif __name__ == '__main__':\n main()" } ] } ], "hints": [ { "class": "sbg:CPURequirement", "value": 1 }, { "class": "sbg:MemRequirement", "value": 4096 }, { "class": "DockerRequirement", "dockerPull": "images.sbgenomics.com/filip_tubic/sbg_genome_coverage:2.0" } ], "sbg:job": { "inputs": { "trackopt": "trackopt", "trackline": true, "strand": "Forward+", "split": true, "scale": 0, "report_z": true, "gzipped": true, "format": "BedGraph", "fasta": { "class": "File", "path": "fasta.ext", "secondaryFiles": [], "size": 0 }, "coverage_interval": "Entire Interval", "bam": { "class": "File", "path": "bam.ext", "secondaryFiles": [], "size": 0 } }, "allocatedResources": { "mem": 4096, "cpu": 1 } }, "sbg:appVersion": [ "sbg:draft-2" ], "sbg:categories": [ "Analysis" ], "sbg:cmdPreview": "python3.6 sbg_genome_coverage.py -b bam.ext -f fasta.ext", "sbg:contributors": [ "bogdang", "filip_tubic", "bix-demo" ], "sbg:createdBy": "bix-demo", "sbg:createdOn": 1450911308, "sbg:id": "admin/sbg-public-data/sbg-genome-coverage/3", "sbg:image_url": null, "sbg:latestRevision": 3, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "filip_tubic", "sbg:modifiedOn": 1493296999, "sbg:project": "bix-demo/sbgtools-demo", "sbg:projectName": "SBGTools - Demo New", "sbg:revision": 3, "sbg:revisionsInfo": [ { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911308, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bix-demo", "sbg:modifiedOn": 1450911308, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "bogdang", "sbg:modifiedOn": 1476214514, "sbg:revision": 2, "sbg:revisionNotes": "BEDTools newer version" }, { 
"sbg:modifiedBy": "filip_tubic", "sbg:modifiedOn": 1493296999, "sbg:revision": 3, "sbg:revisionNotes": null } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges Genomics", "sbg:toolkit": "SBGTools", "sbg:validationErrors": [], "x": 2820.0009155273456, "y": -721.2212371826176 }, "label": "SBG Genome Coverage", "sbg:x": 2820.0009155273456, "sbg:y": -721.2212371826176 } ], "hints": [ { "class": "sbg:AWSInstanceType", "value": "c3.8xlarge" }, { "class": "sbg:GoogleInstanceType", "value": "n1-standard-32;pd-ssd;4096" }, { "class": "sbg:useSbgFS", "value": "true" } ], "sbg:batchInput": "#fastq", "sbg:batchBy": { "type": "criteria", "criteria": [ "metadata.sample_id" ] }, "requirements": [], "sbg:appVersion": [ "sbg:draft-2" ], "sbg:canvas_x": 218, "sbg:canvas_y": 67, "sbg:canvas_zoom": 0.5999999999999996, "sbg:categories": [ "WGS" ], "sbg:contributors": [ "admin", "sevenbridges" ], "sbg:createdBy": "sevenbridges", "sbg:createdOn": 1459852872, "sbg:id": "admin/sbg-public-data/whole-genome-analysis-bwa-gatk-2-3-9-lite/58", "sbg:image_url": "https://igor.sbgenomics.com/ns/brood/images/admin/sbg-public-data/whole-genome-analysis-bwa-gatk-2-3-9-lite/58.png", "sbg:latestRevision": 58, "sbg:license": "Apache License 2.0", "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1499770338, "sbg:project": "admin/sbg-public-data", "sbg:projectName": "SBG Public Data", "sbg:publisher": "sbg", "sbg:revision": 58, "sbg:revisionNotes": "Batch for fastqs", "sbg:revisionsInfo": [ { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1459852872, "sbg:revision": 0, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1459852872, "sbg:revision": 1, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1462904150, "sbg:revision": 2, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1465231639, "sbg:revision": 3, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 
1468402327, "sbg:revision": 4, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402328, "sbg:revision": 5, "sbg:revisionNotes": null }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402328, "sbg:revision": 6, "sbg:revisionNotes": "Intervals in 2 BED files, one with GLs executes sequentially." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402328, "sbg:revision": 7, "sbg:revisionNotes": "Added reads connection to Indel Realigner" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402329, "sbg:revision": 8, "sbg:revisionNotes": "GL Intervals connected to Indel Realigner, Printreads and Unified Genotyper" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402329, "sbg:revision": 9, "sbg:revisionNotes": "Added description for 2-BED file parallelization." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402330, "sbg:revision": 10, "sbg:revisionNotes": "Added Split BED node." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402330, "sbg:revision": 11, "sbg:revisionNotes": "updated GATK tools." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402330, "sbg:revision": 12, "sbg:revisionNotes": "SnpEff4.2 added." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402331, "sbg:revision": 13, "sbg:revisionNotes": "SnpEff threads and memory set.\nBWA MEM threads set to 30, to allow it to work in parallel with FastqC." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402332, "sbg:revision": 14, "sbg:revisionNotes": "BWA MEM memory = 54Gb" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402332, "sbg:revision": 15, "sbg:revisionNotes": "SnpEff summary HTML" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402333, "sbg:revision": 16, "sbg:revisionNotes": "GATK Best practice changes." 
}, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402333, "sbg:revision": 17, "sbg:revisionNotes": "BUGfix SBG Pass intervals connected" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402334, "sbg:revision": 18, "sbg:revisionNotes": "BUGFIX sbg pass intervals connected." }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402334, "sbg:revision": 19, "sbg:revisionNotes": "BUGFIX SBG Pass intervals" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402335, "sbg:revision": 20, "sbg:revisionNotes": "BUGFIX SBG Pass intervals" }, { "sbg:modifiedBy": "sevenbridges", "sbg:modifiedOn": 1468402336, "sbg:revision": 21, "sbg:revisionNotes": "BUGFIX SBG Pass intervals" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1470145406, "sbg:revision": 22, "sbg:revisionNotes": "Source for Apps from Public Apps removed." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539432, "sbg:revision": 23, "sbg:revisionNotes": "Added SBG Quality Adjuster." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539433, "sbg:revision": 24, "sbg:revisionNotes": "Quality Adjuster scattered." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539434, "sbg:revision": 25, "sbg:revisionNotes": "Prepare VQSR updated." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539435, "sbg:revision": 26, "sbg:revisionNotes": "All inputs set to required." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539436, "sbg:revision": 27, "sbg:revisionNotes": "Updated staged to link at some GATK tools." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539437, "sbg:revision": 28, "sbg:revisionNotes": "UG fixed." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539438, "sbg:revision": 29, "sbg:revisionNotes": "UG scatter []concat(dbsnp)" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471539439, "sbg:revision": 30, "sbg:revisionNotes": "Small fix - no link for dbsnp." 
}, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471963088, "sbg:revision": 31, "sbg:revisionNotes": "Missing suggested files set." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471963090, "sbg:revision": 32, "sbg:revisionNotes": "Missing file types set." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471963091, "sbg:revision": 33, "sbg:revisionNotes": "bwa-mem updated." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471963092, "sbg:revision": 34, "sbg:revisionNotes": "percent_bad_variants on VQSRs = 0.05" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1471963093, "sbg:revision": 35, "sbg:revisionNotes": "Returned to older version with all inputs set to required." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1473165023, "sbg:revision": 36, "sbg:revisionNotes": "Prepare intervals with tools in scatter mode. All GATK \"Scatter\" tools removed." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1475500767, "sbg:revision": 37, "sbg:revisionNotes": "Input fastq batch by sample" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476270163, "sbg:revision": 38, "sbg:revisionNotes": "Interval -L 20 set at BaseRecalibrator" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476270164, "sbg:revision": 39, "sbg:revisionNotes": "Tools updated. Added detailed description of the workflow." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476440105, "sbg:revision": 40, "sbg:revisionNotes": "BQSR intervals set to required" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476801717, "sbg:revision": 41, "sbg:revisionNotes": "BWA-MEM and GATK tools optimized." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476801718, "sbg:revision": 42, "sbg:revisionNotes": "Sambamba merge and view updated, set number of reserved threads." 
}, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476801719, "sbg:revision": 43, "sbg:revisionNotes": "Split BED file by interval" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476801720, "sbg:revision": 44, "sbg:revisionNotes": "Memory overhead for UG set to 300MB to start it at the same time as Sambamba Merge." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476801722, "sbg:revision": 45, "sbg:revisionNotes": "Memory overhead set to 300 for IR and PR, 64 for RTC." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476801723, "sbg:revision": 46, "sbg:revisionNotes": "Scatter by intervals. Memory overhead set to 300 instead of 512." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1476965427, "sbg:revision": 47, "sbg:revisionNotes": "Added genome coverage and memory overhead adjusted" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1477053187, "sbg:revision": 48, "sbg:revisionNotes": "Returned older revision: BQSR interval set to required. Prepare intervals from SBGTools" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1480071801, "sbg:revision": 49, "sbg:revisionNotes": "SBG prepare intervals from Demo project. All tools updated." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1480428471, "sbg:revision": 50, "sbg:revisionNotes": "port intervals renamed to previous name bqsr_intervals" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1481648022, "sbg:revision": 51, "sbg:revisionNotes": "port intervals renamed to previous name bqsr_intervals" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1481648025, "sbg:revision": 52, "sbg:revisionNotes": "FASTQC and BWA-MEM updated." }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1481648027, "sbg:revision": 53, "sbg:revisionNotes": "Quality adjuster updated."
}, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1499769510, "sbg:revision": 54, "sbg:revisionNotes": "Tool updates.\nChanged inputs to VQSR INDELS to Mills and dbSNP per recommendations on GATK best practices:\nhttps://software.broadinstitute.org/gatk/documentation/article.php?id=1259\n\nhttps://software.broadinstitute.org/gatk/documentation/article?id=2805" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1499769513, "sbg:revision": 55, "sbg:revisionNotes": "Set mills known to false" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1499769515, "sbg:revision": 56, "sbg:revisionNotes": "VQSR suggested updated" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1499769518, "sbg:revision": 57, "sbg:revisionNotes": "Added workflow description. intervals -> bqsr_intervals" }, { "sbg:modifiedBy": "admin", "sbg:modifiedOn": 1499770338, "sbg:revision": 58, "sbg:revisionNotes": "Batch for fastqs" } ], "sbg:sbgMaintained": false, "sbg:toolAuthor": "Seven Bridges", "sbg:toolkit": "SBGTools", "sbg:toolkitVersion": "1.0", "sbg:validationErrors": [] }