-
Notifications
You must be signed in to change notification settings - Fork 0
/
commands.txt
122 lines (81 loc) · 3.41 KB
/
commands.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
## Commands
This section lists command(s) run by varscan workflow
* Produce a scaling coefficient for allocating RAM
```
CHROM=$(echo ~{region} | sed 's/:.*//')
grep -w SN:$CHROM ~{refDict} | cut -f 3 | sed 's/.*://' | awk '{print int(($1/~{largestChrom} + 0.1) * 10)/10}'
```
* Preprocessing
bed file re-format to be used with scattered pileup creation. Note that it should be a resonable ( <100 perhaps? ) intervals
so that we do not end up with a million jobs running. Use wisely, as it may result in grabbing a lot of compute nodes.
```
In this embedded script we reformat bed lines into varscan-friendly intervals
import os
if os.path.exists("~{bedPath}"):
with open("~{bedPath}") as f:
for line in f:
line = line.rstrip()
tmp = line.split("\t")
r = " " + tmp[0] + ":" + tmp[1] + "-" + tmp[2]
print(r)
f.close()
```
* Run samtools mpileup
```
samtools mpileup -q 1 -r REGION -f REF_FASTA INPUT_NORMAL INPUT_TUMOR | awk -F "\t" '$4 > 0 && $7 > 0' | gzip -c > normtumor_sorted.pileup.gz
```
* Remove mitochondrial chromosome:
```
head -n 1 ~{filePaths[0]} > "~{outputFile}.~{outputExtension}"
cat ~{sep=' ' filePaths} | sort -V -k 1,2 | grep -v ^chrom | grep -v ^chrM >> "~{outputFile}.~{outputExtension}"
cat ~{sep=' ' filePaths} | awk '{if($1 == "chrM"){print $0}}' | sort -V -k 1,2 >> "~{outputFile}.~{outputExtension}"
if [ ! -s ~{outputFile}.~{outputExtension} ] ; then
rm ~{outputFile}.~{outputExtension}
fi
```
* Sort vcf using sequence dictionary
```
java -Xmx[MEMORY]G -jar picard.jar SortVcf I=INPUT_VCFS SD=SEQ_DICTIONARY O=OUTPUT_FILE.SUFFIX.vcf
```
* SNP/Indel Calling:
```
See the full source code in .wdl, here we run this command:
zcat INPUT_PILEUP | java -Xmx[MEMORY]G -jar varscan somatic -mpileup 1
--somatic-p-value P_VALUE
Optional parameters:
--min-coverage-normal MIN_COVERAGE_NORMAL
--min-coverage-tumor MIN_COVERAGE_TUMOR
--min-var-freq MIN_VAR_FREQUENCY
--min-freq-for-hom MIN_FREQUENCY_FOR_HOM
--normal-purity NORMAL_PURITY
--tumor-purity TUMOR_PURIY
--p-value P_VALUE_HET
--strand-filter 1
--validation 1
--output-vcf 1
Settings for output format:
--output-snp SAMPLE_ID.snp.vcf --output-indel SAMPLE_ID.indel.vcf
or:
--output-snp SAMPLE_ID.snp --output-indel SAMPLE_ID.indel
```
* Find minimum coverage threshold for CV analysis:
```
A python code configures and runs this command:
zcat ~{inputPileup} | java -Xmx~{javaMemory}G -jar " + varscan + " copynumber --output-file ~{sampleID} -mpileup 1 --p-value ~{pValue}"
Varscan reports if the coverage threshold was sufficient for the analysis. We use this coverage setting in the next step
```
### Run copy number change analysis:
```
A python code configures and runs this command:
java -Xmx[MEMORY]G -jar varscan copyCaller CV_FILE
--output-file SAMPLE_ID.copynumber.filtered
--min-coverage MIN_COVERAGE # Value found in the previous step
Optional parameters:
--min-tumor-coverage MIN_TUMOR_COVERAGE
--max-homdel-coverage MAX_HOMDEL_COVERAGE
--del-threshold DEL_THRESHOLD
--amp-threshold AMP_THRESHOLD
--min-region-size MIN_REGION_SIZE
--recenter-up RECENTER_UP
--recenter-down RECENTER_DOWN
```