HG002 benchmarking
Darren J. Lin edited this page Apr 10, 2023 · 1 revision
Procedure for using the HG002 SV ground-truth set to measure recall and precision.
Example of creating a compressed and indexed VCF file for evaluation:
cat HG002.pav.flye.vcf| awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' | bgzip -c > ./pav.flye.sorted.vcf.gz
tabix ./pav.flye.sorted.vcf.gz
NOTE:
- For PAV output, the header for SVLEN should be changed to <Number=1,Type=Integer>
- For SVision output, Covered in the FILTER column should be replaced with PASS
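The two adjustments in the note above could be scripted roughly as follows. This is a minimal sketch, not the procedure the authors used: the function names `fix_svlen_header` and `covered_to_pass` are hypothetical, and it assumes the SVLEN header line starts with `##INFO=<ID=SVLEN,` and that SVision writes the literal value `Covered` in column 7.

```shell
# Rewrite the SVLEN INFO header so it declares Number=1,Type=Integer.
# Assumption: the header line starts with "##INFO=<ID=SVLEN," (usual VCF layout).
fix_svlen_header() {
    sed -E '/^##INFO=<ID=SVLEN,/ s/Number=[^,>]*,Type=[^,>]*/Number=1,Type=Integer/'
}

# Replace "Covered" with "PASS" in the FILTER column (column 7) of data lines.
# Header lines (starting with "#") are passed through unchanged.
covered_to_pass() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         $7 == "Covered" { $7 = "PASS" }
         { print }'
}
```

Both functions read a VCF on standard input and write to standard output, for example `fix_svlen_header < HG002.pav.flye.vcf > HG002.pav.flye.fixed.vcf` (the output file name is illustrative), so they can be placed in front of the sort and bgzip step.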
NOTE: Genotype accuracy is not considered in this benchmark. The Truvari version used was v3.0.0.
For SVs at true INS/DEL regions (download link)
truvari bench -f ./hs37d5.fa -b ./HG002_SVs_Tier1_v0.6.vcf.gz --includebed ./HG002_SVs_Tier1_v0.6.bed --passonly --giabreport -r 1000 -p 0.00 -c ./pav.flye.sorted.vcf.gz -o pav_flye
For SVs at CMRGs (download link); the output directory differs from the one above because Truvari stops if the directory already exists
truvari bench -f ./hs37d5.fa -b ./HG002_GRCh37_CMRG_SV_v1.00.vcf.gz --includebed ./HG002_GRCh37_CMRG_SV_v1.00.bed --passonly --giabreport -r 1000 -p 0.00 -c ./pav.flye.sorted.vcf.gz -o pav_flye_cmrg
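Once a run finishes, the recall and precision values can be pulled from the output directory. The sketch below assumes that Truvari v3 writes its summary as JSON to `summary.txt` inside the `-o` directory and that the keys are named `recall`, `precision`, and `f1`; the helper name `show_metrics` is hypothetical. Check the file in your own output directory if the layout differs.

```shell
# Print the recall, precision, and F1 lines from a Truvari output directory.
# Assumption: the summary is JSON stored in <outdir>/summary.txt (Truvari v3 layout).
show_metrics() {
    grep -E '"(recall|precision|f1)"' "$1/summary.txt"
}
```

For example, `show_metrics pav_flye` would print the three matching lines for the Tier1 run.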