In this release:
- Small model integration: Speed increased by ~1.7x (about 40% runtime reduction) for WGS, PacBio, and ONT through the introduction of an additional small model. The small model identifies easy-to-call sites and invokes the standard DeepVariant model for harder sites. We observe similar or improved accuracy and confidence calibration with this combination. Use of the small model can be disabled with the `--disable_small_model=true` option. For details, please see the small model details doc.
- Pangenome-aware variant calling: Added the ability to use information from a pangenome directly during variant calling. This improves accuracy both for BAMs mapped with standard BWA and for BAMs mapped to a pangenome with vg-Giraffe. Error reduction is ~30% for vg-Giraffe-mapped WGS, 10% for BWA-mapped WGS, and 5% for BWA-mapped WES. See the metrics page for details.
- Configure a fast pipeline: Optional mode to increase efficiency for high-throughput GPU implementations. This mode pipelines example generation with GPU-based variant calling to increase utilization of GPU resources. See the case study for details.
- Introduced new Mas-Seq models for variant calling with Kinnex kits/Mas-Seq data. See the case study for details.
- PacBio models are now trained with labels from the Platinum Pedigree, which reduces errors by 34% on this more comprehensive truth set, which includes very difficult parts of the genome.
- Added SPRQ data to PacBio training datasets, improving accuracy for SPRQ chemistry. Updated the PacBio case study data to the 2024 SPRQ release. Reduced error on SPRQ chemistry by 27% relative to DeepVariant v1.6. Updating to DeepVariant v1.8 is recommended for SPRQ.
- Updated how model file metadata is specified, to accommodate more flexible ways of specifying channels. Custom models now require an accompanying `example_info.json` file containing the image shape details generated during training image generation, for use in the make_examples and call_variants stages. An example of using a custom model is the T7 case study, where the `example_info.json` file is downloaded in order to run DeepVariant successfully.
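
The ~1.7x speedup and the 40% runtime reduction quoted for the small model are two views of the same figure. A minimal Python sketch of the conversion follows; the function name `runtime_reduction` is illustrative and not part of DeepVariant.

```python
def runtime_reduction(speedup: float) -> float:
    """Return the fraction of runtime saved when a job runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # New runtime is 1/speedup of the old one, so the saving is the remainder.
    return 1.0 - 1.0 / speedup


# A 1.7x speedup works out to roughly a 41% shorter runtime,
# in line with the rounded 40% figure in the notes.
print(f"{runtime_reduction(1.7):.0%}")
```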
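
Because custom models now need an accompanying `example_info.json`, a pre-flight check can catch a missing file before a long run starts. The sketch below is only an illustration under stated assumptions: the helper name `check_example_info` is hypothetical, the file is assumed to sit in the same directory as the custom model files, and the `shape` key it looks for is an assumption about the file contents rather than a documented schema.

```python
import json
from pathlib import Path


def check_example_info(model_dir: str) -> dict:
    """Load example_info.json from model_dir, or fail early with a clear error.

    Assumptions (not confirmed by the release notes): the file lives alongside
    the custom model files and is a JSON object with a "shape" entry.
    """
    path = Path(model_dir) / "example_info.json"
    if not path.is_file():
        raise FileNotFoundError(f"custom model is missing {path}")
    with path.open() as handle:
        info = json.load(handle)
    if not isinstance(info, dict) or "shape" not in info:
        raise ValueError(f"{path} has no 'shape' entry")
    return info
```

Calling `check_example_info` on the custom model directory before launching DeepVariant raises immediately if the file is absent, instead of failing partway through the pipeline.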
We are thankful for the contributions from:
- Mobin Asri (@mobinasri) and Juan Carlos Mier (@jmier2) on pangenome-aware DeepVariant work.
- Ralf W. Grosse-Kunstleve (@rwgk) for helping to migrate from CLIF to pybind.
- Shiyi Yin (@yinshiyi) for Mas-Seq model work.
- Maya Venkatraman (@mv2731) for helping to explore model architectures.
- Ben Soudry (@ben-soudry) for helping to streamline channel inputs.
- Atilla Kiraly (@akiraly1) and Yuchen Zhou (@Yuchen-95) on explainability work.
- Jorge Gonzalez Mendez (@jgonzalezmendez) on improving the C++ code quality.
- Stephanie Steele (@stesteele) for helping migrate python code to C++.