SMuRF 2.0

Huang Weitai edited this page Jul 30, 2020 · 11 revisions

Introducing SMuRF 2.0

Machine learning-based R package for consensus somatic mutation prediction

STRELKA2 - MUTECT2 - FREEBAYES - VARDICT - VARSCAN


Introduction

SMuRF 2.0 combines the output of five somatic variant callers run through the bcbio-nextgen 1.1.5 pipeline. New in SMuRF 2.0, Strelka2 is added to improve the sensitivity and overall accuracy of somatic mutation calls. Users without access to Strelka2 outputs may download SMuRF v1.6.4 here.

Upgrades

  • Compatible with bcbio-nextgen 1.1.5+, leveraging the newest GATK4 for more efficient data processing.

  • Added support for Strelka2 (Kim et al., Nat Methods 2018), a new variant caller available in the bcbio pipeline.

  • SMuRF 2.0 requires an updated version of the h2o package (3.26.0.7 or later) to properly load the pre-trained SNV and indel models. Productionizing the models reduces the dependency on specific h2o versions.
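
The h2o version requirement above can be checked before attempting to load the models. The following is a minimal Python sketch of the comparison logic only; SMuRF itself is an R package, where `packageVersion("h2o")` would report the installed version. The helper names `parse_version` and `meets_minimum`, and the example version strings, are hypothetical and not part of SMuRF or h2o.

```python
MIN_H2O_VERSION = "3.26.0.7"  # minimum version stated on this page


def parse_version(version):
    """Turn a dotted version string such as '3.26.0.7' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum=MIN_H2O_VERSION):
    """Return True if the installed version is at least the minimum.

    Tuples compare element by element, so 3.26.0.10 sorts after 3.26.0.7,
    which a plain string comparison would get wrong.
    """
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Example version strings, chosen only to illustrate the comparison.
    for candidate in ("3.24.0.5", "3.26.0.7", "3.26.0.10"):
        status = "ok" if meets_minimum(candidate) else "too old"
        print(candidate, status)
```

Comparing integer tuples rather than raw strings avoids mis-ordering versions whose last component has more than one digit.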

Improved somatic mutation calling

  • SMuRF's pre-trained models are trained using cross-validation, with performance estimated on multiple independent test sets.

  • GATK4 improves recall in both SNV and indel prediction (v1.6: GATK3; v2.0-4caller: GATK4).

  • New sequence and variant features from Strelka2 are incorporated into SMuRF's training models. These features improve the sensitivity of SMuRF 2.0 (v2.0-4caller, v2.0-5caller).

Fig.1

Performance of SMuRF 2.0. Precision-recall profiles for individual somatic mutation callers and SMuRF, evaluated on SNVs and indels using 20% withheld test data. SMuRF models were trained on 80% of the matched tumour-normal WGS data from a chronic lymphocytic leukemia (CLL) patient and a medulloblastoma (MB) patient (Alioto et al., Nat Commun 2015). To vary sequencing coverage and tumour purity, noise was injected into the training and testing data: (1) normal samples down-sampled to 30x coverage; (2, 3) tumour samples lowered in purity by spiking in 20x or 40x normal reads (70% and 55% purity, respectively).
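
For readers less familiar with precision-recall profiles, the two quantities can be computed from a call set and a truth set as sketched below. This is a minimal Python illustration, not SMuRF's evaluation code; the function name `precision_recall`, the variant tuple layout (chromosome, position, ref, alt) and the toy variants are all hypothetical.

```python
def precision_recall(called, truth):
    """Compute precision and recall of a call set against a truth set.

    Both arguments are iterables of hashable variant keys; here each key is
    assumed to be a (chromosome, position, ref, alt) tuple.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Precision: fraction of calls that are real mutations.
    precision = true_positives / len(called) if called else 0.0
    # Recall (sensitivity): fraction of real mutations that were called.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 250, "G", "C"),
    ("chr2", 75, "C", "CT"),
    ("chr3", 10, "T", "G"),
}
calls = {
    ("chr1", 100, "A", "T"),
    ("chr1", 250, "G", "C"),
    ("chr2", 900, "G", "A"),  # false positive
}

p, r = precision_recall(calls, truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

Repeating this calculation while sweeping a confidence-score threshold over the calls yields one (recall, precision) point per threshold, which together trace out a profile like those in the figure.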

DREAM Benchmarks

Fig.2

Legend: SMuRF-2.0 (red), SMuRF-v1.6 (blue)