HG002 benchmarking

Darren J. Lin edited this page Apr 10, 2023 · 1 revision

Purpose

Procedure for using the HG002 SV ground-truth sets to measure recall and precision.

HG002 benchmarking

Post-processing

Example of creating a sorted, compressed, and indexed VCF file for evaluation.

cat HG002.pav.flye.vcf| awk '$1 ~ /^#/ {print $0;next} {print $0 | "sort -k1,1 -k2,2n"}' | bgzip -c > ./pav.flye.sorted.vcf.gz
tabix ./pav.flye.sorted.vcf.gz

NOTE:

  1. For PAV output, the SVLEN INFO line in the VCF header should be changed to use <Number=1,Type=Integer>.
  2. For SVision output, the value Covered in the FILTER column should be replaced with PASS, because the evaluation below only considers PASS calls (--passonly).
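The two adjustments in the note above could be scripted roughly as follows. This is a minimal sketch, not the exact commands used for this page: the function names `fix_pav_svlen_header` and `fix_svision_filter` are hypothetical, and the sketch assumes uncompressed, tab-delimited VCF input in which FILTER is column 7 (as in the VCF specification) and the SVLEN header line starts with `##INFO=<ID=SVLEN,`.

```shell
# Sketch only: function names are illustrative, not part of PAV, SVision, or Truvari.

# Rewrite the SVLEN INFO header line so it declares Number=1,Type=Integer.
# All other lines pass through unchanged.
fix_pav_svlen_header() {
    awk '
        /^##INFO=<ID=SVLEN,/ {
            sub(/Number=[^,>]+/, "Number=1")
            sub(/Type=[^,>]+/, "Type=Integer")
        }
        { print }
    ' "$1"
}

# Replace "Covered" with "PASS" in the FILTER column (column 7) of record lines.
# Header lines (starting with "#") are printed as-is.
fix_svision_filter() {
    awk '
        BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }
        $7 == "Covered" { $7 = "PASS" }
        { print }
    ' "$1"
}
```

Each function writes the corrected VCF to standard output, so it can be redirected to a new file or piped into the sort, bgzip, and tabix steps shown earlier.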

Truvari evaluation

NOTE: Genotype accuracy is not considered in this benchmark. The commands below were run with Truvari v3.0.0.

For SVs at true INS/DEL regions (download link)

truvari bench -f ./hs37d5.fa -b ./HG002_SVs_Tier1_v0.6.vcf.gz --includebed ./HG002_SVs_Tier1_v0.6.bed --passonly --giabreport -r 1000 -p 0.00 -c ./pav.flye.sorted.vcf.gz -o pav_flye

For SVs at challenging medically relevant genes (CMRGs) (download link)

truvari bench -f ./hs37d5.fa -b ./HG002_GRCh37_CMRG_SV_v1.00.vcf.gz --includebed ./HG002_GRCh37_CMRG_SV_v1.00.bed --passonly --giabreport -r 1000 -p 0.00 -c ./pav.flye.sorted.vcf.gz -o pav_flye
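
For reference, the recall and precision measured here relate to the match counts as sketched below. Truvari reports these values in its own summary output, so this is only an illustration of the arithmetic, assuming the usual definitions: precision = TP-call / (TP-call + FP) and recall = TP-base / (TP-base + FN). The function name `bench_metrics` and its argument order are hypothetical.

```shell
# Sketch only: bench_metrics is an illustrative helper, not a Truvari command.
# Arguments: TP-base TP-call FP FN (integer counts).
bench_metrics() {
    awk -v tpbase="$1" -v tpcall="$2" -v fpos="$3" -v fneg="$4" 'BEGIN {
        # Precision: fraction of comparison calls that match the ground truth.
        precision = (tpcall + fpos) ? tpcall / (tpcall + fpos) : 0
        # Recall: fraction of ground-truth SVs recovered by the comparison calls.
        recall = (tpbase + fneg) ? tpbase / (tpbase + fneg) : 0
        # F1: harmonic mean of precision and recall.
        f1 = (precision + recall) ? 2 * precision * recall / (precision + recall) : 0
        printf "precision=%.4f recall=%.4f f1=%.4f\n", precision, recall, f1
    }'
}
```

For example, `bench_metrics 90 90 10 30` prints `precision=0.9000 recall=0.7500 f1=0.8182`.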