Somatic Variant Refinement (SVR)
Post-processing of somatic variants identified by automated somatic variant callers is an important next step in sequencing analysis pipelines. This process requires applying subjective filtering strategies (e.g. removing variants with <20x total coverage), with the option to perform further somatic variant refinement. Traditionally, somatic variant refinement has required manually reviewing aligned sequencing reads in a genomic visualization tool (e.g. Integrative Genomics Viewer) to eliminate false positives from a candidate somatic variant list. This laborious process is expensive and time-consuming. Given that skilled reviewers can evaluate ~70-100 variants per hour, it can take an individual hundreds of hours to complete manual review for a single genomic study.
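As a rough illustration of the two points above, the sketch below applies a minimum-coverage filter and estimates manual review time from the ~70-100 variants per hour figure. It is not part of DeepSVR: the `Variant` record, its `total_coverage` field, the helper names, and the candidate count of 21,000 are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for a candidate somatic variant."""
    chrom: str
    pos: int
    ref: str
    alt: str
    total_coverage: int  # assumed field: total read depth at the site


def filter_by_coverage(variants: Iterable[Variant], min_coverage: int = 20) -> List[Variant]:
    """Keep variants with at least `min_coverage` total reads (drops <20x by default)."""
    return [v for v in variants if v.total_coverage >= min_coverage]


def review_hours(n_variants: int, low_rate: int = 70, high_rate: int = 100) -> Tuple[float, float]:
    """Return (fastest, slowest) manual review time in hours for the given rates."""
    return n_variants / high_rate, n_variants / low_rate


# Illustrative candidates only; these are not real calls.
candidates = [
    Variant("chr1", 12345, "A", "T", total_coverage=35),
    Variant("chr2", 67890, "G", "C", total_coverage=12),
    Variant("chr7", 55555, "C", "T", total_coverage=20),
]

kept = filter_by_coverage(candidates)
print([f"{v.chrom}:{v.pos}" for v in kept])

# Hypothetical study size used only to show the arithmetic.
fastest, slowest = review_hours(21_000)
print(f"{fastest:.0f}-{slowest:.0f} reviewer hours")
```

The threshold is a parameter rather than a constant because, as noted above, such filters are subjective and differ between labs and studies.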
Although manual review is an important part of somatic variant calling and is used in most genomic studies, the methods defining this process are either briefly stated or omitted from many publications. Our group has presented a Manual Review Standard Operating Procedure to help systematize this process; however, even with adequate training, there is extensive inter- and intra-lab variability.
To date, there are no publicly available methods that attempt to automate somatic variant refinement. Here, we present an automated method for somatic variant refinement that dramatically reduces the time and cost associated with post-processing of somatic variants. The model is built with a large data set of 41,000 variants derived from various tumor types (solid and liquid) and spanning multiple sequencing pipelines, including whole-genome sequencing, whole-exome sequencing, and custom-capture sequencing. The model is tested on two sets of validation data and attains an accuracy that recapitulates manual review. It is our hope that this method will improve the somatic variant refinement process and reduce some of the discrepancies associated with existing genomic pipelines.
Chapter 1 - Background Information:
Authors | Citation | About | Repository Installation
Chapter 2 - Identification of Somatic Variants in Sequencing Data:
Automated Somatic Variant Calling | Somatic Variant Refinement (SVR)
Chapter 3 - Methods and Analysis for Machine Learning Models:
Data Assembly | Logistic Regression Model | Random Forest Model | Deep Learning Model | Model Evaluation | Inter-reviewer Variability | Orthogonal Validation | Manual Review Validation | Re-review Analysis
Chapter 4 - DeepSVR Tutorial:
Tutorial Preface | DeepSVR Installation | Create the Classifier | Prepare Data | Classify Data | Re-Train Model
Chapter 5 - Usage Documents:
DeepSVR Installation | Usage Documents