Commit

Clarify language
adthrasher committed Jan 23, 2024
1 parent 43273f4 commit 7334016
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions text/data_versioning.md
@@ -172,10 +172,10 @@ We will also run without the GIAB high-confidence callset.
~{new version VCF} \
-o ~{report_name} \
-r ~{reference_fa} \
- -T ~{gencode_CDS_bed} \
+ -T ~{gencode_CDS_bed}
```

- Using a confident call set to restrict the comparison causes variants outside that region (in either the TRUTH or QUERY sample) to be excluded and marked as unknown (`UNK`). As can be seen below, when comparing the first release of our RNA-Seq workflow (v2.0.0) with the latest stable release (v3.0.1), there are a significant number of additional variants found in the new version. This suggests that our decision to increment the major revision was correct. Ultimately, most of those new variants are filtered out with standard quality filters applied during the variant calling pipeline. When looking at the result with the GIAB confident call set applied, there are a large number of variants (1939 SNPs that passed filtering for HG004 below) included. Upon further investigation, there are only 16 SNPs passing filtering that are not called in v2.0.0. The remaining variants are called in both versions, but labeled as `UNK` due to being outside the confident call regions (that is, not found in WGS).
+ We will run both because using a confident call set to restrict the comparison causes variants outside that region (in either the TRUTH or QUERY sample) to be excluded and marked as unknown (`UNK`). As can be seen below, when comparing the first release of our RNA-Seq workflow (v2.0.0) with the latest stable release (v3.0.1), there are a significant number of additional variants found in the new version. This suggests that our decision to increment the major revision was correct. Ultimately, most of those new variants are filtered out with standard quality filters applied during the variant calling pipeline. When looking at the result with the GIAB confident call set applied, there are a large number of variants (1939 SNPs that passed filtering for HG004 below) included and marked `UNK`. Upon further investigation, there are only 16 SNPs passing filtering that are not called in v2.0.0. The remaining variants are called in both versions, but labeled as `UNK` due to being outside the confident call regions (that is, not found in WGS).
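
For readers unfamiliar with the `UNK` label, the following is a minimal Python sketch of the idea described in the paragraph above. It is not the comparison tool's actual algorithm: variants are reduced to hypothetical `(chromosome, position)` pairs, genotype and allele matching are ignored, and the names `in_confident_region`, `label_query_calls`, and `unk_calls_new_in_query` are invented for illustration. It only shows why a call present in both versions can still be labeled `UNK` when it falls outside the confident regions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplifications: a variant is just (chromosome, 1-based position),
# and a confident region is a BED-style (chromosome, 0-based start, exclusive end).
Variant = Tuple[str, int]
Region = Tuple[str, int, int]


def in_confident_region(variant: Variant, regions: List[Region]) -> bool:
    """Return True if the 1-based variant position lies inside any BED-style region."""
    chrom, pos = variant
    return any(c == chrom and start < pos <= end for c, start, end in regions)


def label_query_calls(
    truth: Set[Variant], query: Set[Variant], regions: List[Region]
) -> Dict[Variant, str]:
    """Label each QUERY call as TP, FP, or UNK (outside the confident regions)."""
    labels: Dict[Variant, str] = {}
    for variant in sorted(query):
        if not in_confident_region(variant, regions):
            # Outside the confident regions: not assessed, whether or not TRUTH has it.
            labels[variant] = "UNK"
        elif variant in truth:
            labels[variant] = "TP"
        else:
            labels[variant] = "FP"
    return labels


def unk_calls_new_in_query(
    truth: Set[Variant], labels: Dict[Variant, str]
) -> List[Variant]:
    """UNK calls that are absent from TRUTH, i.e. genuinely new in the QUERY version."""
    return [v for v, label in labels.items() if label == "UNK" and v not in truth]


# Toy data (made up): TRUTH stands in for the old version, QUERY for the new one.
regions = [("chr1", 100, 200)]
truth = {("chr1", 150), ("chr1", 500)}
query = {("chr1", 150), ("chr1", 180), ("chr1", 500), ("chr1", 900)}

labels = label_query_calls(truth, query, regions)
new_unk = unk_calls_new_in_query(truth, labels)
```

In this toy data, the call at position 500 is present in both sets but is still labeled `UNK` because it lies outside the confident region; only the call at position 900 is both `UNK` and new in the query set. This is the same distinction the paragraph draws between the total number of `UNK` SNPs and the smaller number not called in v2.0.0.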

#### HG002 v2.0.0 vs. v3.0.1 with GIAB confidence set
| Type | Filter | TRUTH.TOTAL | TRUTH.TP | TRUTH.FN | QUERY.TOTAL | QUERY.FP | QUERY.UNK | FP.gt | FP.al | METRIC.Recall | METRIC.Precision | METRIC.Frac_NA | METRIC.F1_Score | TRUTH.TOTAL.TiTv_ratio | QUERY.TOTAL.TiTv_ratio | TRUTH.TOTAL.het_hom_ratio | QUERY.TOTAL.het_hom_ratio |