Varscan 2.3, workflow for calling SNVs and CVs Creation of mpileups and calling variants are done with parallel processing
java -jar cromwell.jar run varscan.wdl --inputs inputs.json
Parameter | Value | Description |
---|---|---|
inputTumor |
File | input .bam file for tumor sample |
inputNormal |
File | input .bam file for normal sample |
inputTumorIndex |
File | input .bai file for tumor sample |
inputNormalIndex |
File | input .bai file for normal sample |
reference |
String | Reference assembly id, hg19 hg38 or mm10 |
outputFileNamePrefix |
String | Output file(s) prefix |
Parameter | Value | Default | Description |
---|---|---|---|
bedIntervalsPath |
String | "" | Path to a .bed file used for splitting pileup job/limiting analysis to selected regions |
Parameter | Value | Default | Description |
---|---|---|---|
expandRegions.modules |
String | "hg38-dac-exclusion/1.0" | required modules (This is to allow modularized data for bed path) |
expandRegions.jobMemory |
Int | 4 | Memory for this task in GB |
expandRegions.timeout |
Int | 12 | Timeout in hours, needed to override imposed limits |
getChrCoefficient.memory |
Int | 1 | Memory allocated for this job |
getChrCoefficient.timeout |
Int | 1 | Hours before task timeout |
makePileups.samtools |
String | "$SAMTOOLS_ROOT/bin/samtools" | path to samtools |
makePileups.jobMemory |
Int | 18 | memory for this job, in Gb |
makePileups.minMemory |
Int | 6 | Minimal amount of memory to assign to the task |
makePileups.timeout |
Int | 40 | Timeout in hours, needed to override imposed limits |
runVarscanCNV.pValue |
Float | 0.05 | p-value for cnv calling, default is 0.05 |
runVarscanCNV.jobMemory |
Int | 20 | Memory in Gb for this job |
runVarscanCNV.minMemory |
Int | 4 | A minimum amount of memory allocated to the task, overrides the scaled RAM setting |
runVarscanCNV.javaMemory |
Int | 6 | Memory in Gb for Java |
runVarscanCNV.logFile |
String | "VARSCAN_CNV.log" | File for logging Varscan messages |
runVarscanCNV.varScan |
String | "$VARSCAN_ROOT/VarScan.jar" | path to varscan .jar file |
runVarscanCNV.modules |
String | "varscan/2.4.2 java/8" | Names and versions of modules |
runVarscanCNV.timeout |
Int | 40 | Timeout in hours, needed to override imposed limits |
getSnvNative.pValue |
Float | 0.05 | somatic p-value for SNV calling, default is 0.05 |
getSnvNative.jobMemory |
Int | 20 | Memory in Gb for this job |
getSnvNative.minMemory |
Int | 4 | A minimum amount of memory allocated to the task, overrides the scaled RAM setting |
getSnvNative.javaMemory |
Int | 6 | Memory in Gb for Java |
getSnvNative.minCoverage |
Int | 8 | Minimum coverage in normal and tumor to call variant [8] |
getSnvNative.minCoverageNormal |
Int | 8 | Minimum coverage in normal to call somatic [8] |
getSnvNative.minCoverageTumor |
Int | 6 | Minimum coverage in tumor to call somatic [6] |
getSnvNative.minVarFreq |
Float | 0.1 | Minimum variant frequency to call a heterozygote [0.10] |
getSnvNative.minFreqForHom |
Float | 0.75 | Minimum frequency to call homozygote [0.75] |
getSnvNative.normalPurity |
Float | 1.0 | Estimated purity (non-tumor content) of normal sample [1.00] |
getSnvNative.tumorPurity |
Float | 1.0 | Estimated purity (tumor content) of normal sample [1.00] |
getSnvNative.pValueHet |
Float | 0.99 | p-value threshold to call a heterozygote [0.99] |
getSnvNative.strandFilter |
Int | 0 | If set to 1, removes variants with >90% strand bias |
getSnvNative.validation |
Int | 0 | If set to 1, outputs all compared positions even if non-variant |
getSnvNative.outputVcf |
Int | 0 | Flag that when set to 1 indicates that we need results in vcf format |
getSnvNative.logFile |
String | "VARSCAN_SNV.log" | File for logging Varscan messages |
getSnvNative.varScan |
String | "$VARSCAN_ROOT/VarScan.jar" | path to varscan .jar file |
getSnvNative.modules |
String | "varscan/2.4.2 java/8" | Names and versions of modules |
getSnvNative.timeout |
Int | 40 | Timeout in hours, needed to override imposed limits |
getSnvVcf.pValue |
Float | 0.05 | somatic p-value for SNV calling, default is 0.05 |
getSnvVcf.jobMemory |
Int | 20 | Memory in Gb for this job |
getSnvVcf.minMemory |
Int | 4 | A minimum amount of memory allocated to the task, overrides the scaled RAM setting |
getSnvVcf.javaMemory |
Int | 6 | Memory in Gb for Java |
getSnvVcf.minCoverage |
Int | 8 | Minimum coverage in normal and tumor to call variant [8] |
getSnvVcf.minCoverageNormal |
Int | 8 | Minimum coverage in normal to call somatic [8] |
getSnvVcf.minCoverageTumor |
Int | 6 | Minimum coverage in tumor to call somatic [6] |
getSnvVcf.minVarFreq |
Float | 0.1 | Minimum variant frequency to call a heterozygote [0.10] |
getSnvVcf.minFreqForHom |
Float | 0.75 | Minimum frequency to call homozygote [0.75] |
getSnvVcf.normalPurity |
Float | 1.0 | Estimated purity (non-tumor content) of normal sample [1.00] |
getSnvVcf.tumorPurity |
Float | 1.0 | Estimated purity (tumor content) of normal sample [1.00] |
getSnvVcf.pValueHet |
Float | 0.99 | p-value threshold to call a heterozygote [0.99] |
getSnvVcf.strandFilter |
Int | 0 | If set to 1, removes variants with >90% strand bias |
getSnvVcf.validation |
Int | 0 | If set to 1, outputs all compared positions even if non-variant |
getSnvVcf.logFile |
String | "VARSCAN_SNV.log" | File for logging Varscan messages |
getSnvVcf.varScan |
String | "$VARSCAN_ROOT/VarScan.jar" | path to varscan .jar file |
getSnvVcf.modules |
String | "varscan/2.4.2 java/8" | Names and versions of modules |
getSnvVcf.timeout |
Int | 40 | Timeout in hours, needed to override imposed limits |
mergeCNV.jobMemory |
Int | 6 | memory in GB for this job |
mergeCNV.timeout |
Int | 10 | Timeout in hours, needed to override imposed limits |
mergeSNP.jobMemory |
Int | 6 | memory in GB for this job |
mergeSNP.timeout |
Int | 10 | Timeout in hours, needed to override imposed limits |
mergeIND.jobMemory |
Int | 6 | memory in GB for this job |
mergeIND.timeout |
Int | 10 | Timeout in hours, needed to override imposed limits |
mergeSNPvcf.jobMemory |
Int | 12 | memory in GB for this job |
mergeSNPvcf.javaMemory |
Int | 8 | memory in GB for java VM |
mergeSNPvcf.timeout |
Int | 10 | Timeout in hours, needed to override imposed limits |
mergeINDvcf.jobMemory |
Int | 12 | memory in GB for this job |
mergeINDvcf.javaMemory |
Int | 8 | memory in GB for java VM |
mergeINDvcf.timeout |
Int | 10 | Timeout in hours, needed to override imposed limits |
smoothData.varScan |
String | "$VARSCAN_ROOT/VarScan.jar" | Path to VarScan jar file |
smoothData.modules |
String | "varscan/2.4.2 java/8 rstats/3.6" | Modules for this job |
smoothData.min_coverage |
Int | 20 | Fine-tuning parameter for VarScan |
smoothData.max_homdel_coverage |
Int | 5 | Max coverage form homozygous deletion, default is 5 |
smoothData.min_tumor_coverage |
Int | 10 | Min coverage in tumor sample, default is 10 |
smoothData.del_threshold |
Float | 0.25 | Fine-tuning parameter for VarScan |
smoothData.amp_threshold |
Float | 0.25 | Amplification threshold to report, default is 0.25 |
smoothData.min_region_size |
Int | 10 | Fine-tuning parameter for VarScan |
smoothData.recenter_up |
Int | 0 | Fine-tuning parameter for VarScan |
smoothData.recenter_down |
Int | 0 | Fine-tuning parameter for VarScan |
smoothData.jobMemory |
Int | 16 | Memory in Gb for this job |
smoothData.javaMemory |
Int | 6 | Memory in Gb for Java |
Output | Type | Description | Labels |
---|---|---|---|
resultCnvFile |
File? | file with CNV calls, smoothed | vidarr_label: resultCnvFile |
resultSnpFile |
File | file with SNPs, native varscan format | vidarr_label: resultSnpFile |
resultIndelFile |
File | file with Indel calls, native varscan format | vidarr_label: resultIndelFile |
resultSnpVcfFile |
File | file with SNPs, vcf format | vidarr_label: resultSnpVcfFile |
resultIndelVcfFile |
File | file with Indels, vcf format | vidarr_label: resultIndelVcfFile |
This section lists command(s) run by varscan workflow
CHROM=$(echo ~{region} | sed 's/:.*//')
grep -w SN:$CHROM ~{refDict} | cut -f 3 | sed 's/.*://' | awk '{print int(($1/~{largestChrom} + 0.1) * 10)/10}'
bed file re-format to be used with scattered pileup creation. Note that it should be a resonable ( <100 perhaps? ) intervals so that we do not end up with a million jobs running. Use wisely, as it may result in grabbing a lot of compute nodes.
In this embedded script we reformat bed lines into varscan-friendly intervals
import os
if os.path.exists("~{bedPath}"):
with open("~{bedPath}") as f:
for line in f:
line = line.rstrip()
tmp = line.split("\t")
r = " " + tmp[0] + ":" + tmp[1] + "-" + tmp[2]
print(r)
f.close()
samtools mpileup -q 1 -r REGION -f REF_FASTA INPUT_NORMAL INPUT_TUMOR | awk -F "\t" '$4 > 0 && $7 > 0' | gzip -c > normtumor_sorted.pileup.gz
head -n 1 ~{filePaths[0]} > "~{outputFile}.~{outputExtension}"
cat ~{sep=' ' filePaths} | sort -V -k 1,2 | grep -v ^chrom | grep -v ^chrM >> "~{outputFile}.~{outputExtension}"
cat ~{sep=' ' filePaths} | awk '{if($1 == "chrM"){print $0}}' | sort -V -k 1,2 >> "~{outputFile}.~{outputExtension}"
if [ ! -s ~{outputFile}.~{outputExtension} ] ; then
rm ~{outputFile}.~{outputExtension}
fi
java -Xmx[MEMORY]G -jar picard.jar SortVcf I=INPUT_VCFS SD=SEQ_DICTIONARY O=OUTPUT_FILE.SUFFIX.vcf
See the full source code in .wdl, here we run this command:
zcat INPUT_PILEUP | java -Xmx[MEMORY]G -jar varscan somatic -mpileup 1
--somatic-p-value P_VALUE
Optional parameters:
--min-coverage-normal MIN_COVERAGE_NORMAL
--min-coverage-tumor MIN_COVERAGE_TUMOR
--min-var-freq MIN_VAR_FREQUENCY
--min-freq-for-hom MIN_FREQUENCY_FOR_HOM
--normal-purity NORMAL_PURITY
--tumor-purity TUMOR_PURIY
--p-value P_VALUE_HET
--strand-filter 1
--validation 1
--output-vcf 1
Settings for output format:
--output-snp SAMPLE_ID.snp.vcf --output-indel SAMPLE_ID.indel.vcf
or:
--output-snp SAMPLE_ID.snp --output-indel SAMPLE_ID.indel
A python code configures and runs this command:
zcat ~{inputPileup} | java -Xmx~{javaMemory}G -jar " + varscan + " copynumber --output-file ~{sampleID} -mpileup 1 --p-value ~{pValue}"
Varscan reports if the coverage threshold was sufficient for the analysis. We use this coverage setting in the next step
A python code configures and runs this command:
java -Xmx[MEMORY]G -jar varscan copyCaller CV_FILE
--output-file SAMPLE_ID.copynumber.filtered
--min-coverage MIN_COVERAGE # Value found in the previous step
Optional parameters:
--min-tumor-coverage MIN_TUMOR_COVERAGE
--max-homdel-coverage MAX_HOMDEL_COVERAGE
--del-threshold DEL_THRESHOLD
--amp-threshold AMP_THRESHOLD
--min-region-size MIN_REGION_SIZE
--recenter-up RECENTER_UP
--recenter-down RECENTER_DOWN
For support, please file an issue on the Github project or send an email to gsi@oicr.on.ca .
Generated with generate-markdown-readme (https://github.com/oicr-gsi/gsi-wdl-tools/)