@kduyvesteyn kduyvesteyn released this Jan 4, 2019 · 6 commits to master since this release

Summary

  • BAM links are now also generated in the final links.json when running from FASTQ (bug fix)
  • Added an additional config and parameterisation to run Purple in SHALLOW_MODE

@kduyvesteyn kduyvesteyn released this Dec 27, 2018 · 10 commits to master since this release

Summary

  • Run post stats prior to indel realignment (bug fix)
Dec 27, 2018

v4.6

New pipeline release

@kduyvesteyn kduyvesteyn released this Dec 20, 2018 · 12 commits to master since this release

Summary
This release has been made in preparation for pipeline v5, which is built on a completely new architecture and infrastructure. This release only contains some cleanups and bug fixes compared to v4.4.

Various resources and JARs used by the pipeline can be found on https://resources.hartwigmedicalfoundation.nl.

Improvements

  • Added a GRIDSS somatic filter step which filters the raw GRIDSS output down to a filtered VCF (using the GRIDSS PON)
  • The filtered GRIDSS VCF is fed into Purple, which uses the structural variants as usual but also tries to recover structural variants which were not previously called.

Cleanups

  • We generated a new amber BAF BED file to filter for likely heterozygous germline positions. This new BED file effectively leads to more BAF points, plus this file is now publicly shared on our resources page.
  • Manta and BPI have been removed
  • FastQC has been removed
  • The mappability tracks HDR file (used to annotate somatic variants with a mappability score) has been changed (bug fix).

Version changes

  • Purple to v2.17
  • New Rlibs dependencies (mainly for GRIDSS somatic filter), not publicly available. Tested on Rscript version v3.5.0

Somatic precision & sensitivity

The somatic precision and sensitivity of SNVs and indels are determined on an internally sequenced GIAB mix of 70% NA24385 and 30% NA12878, with 100% NA24385 as the reference sample. Results are identical to pipeline v4.0:

Type   Algo     TP      FP    FN     Prec   Sens    Δ Prec  Δ Sens
INDEL  Strelka  74360   641   22412  99.1%  76.8%   0%      0%
SNV    Strelka  955590  1253  38084  99.9%  96.2%   0%      0%
MNV    Strelka  6868    21    0      99.7%  100.0%  0%      0%
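
For reference, the Prec and Sens columns follow from the TP, FP and FN counts in the usual way: precision = TP / (TP + FP) and sensitivity = TP / (TP + FN). The short Python sketch below recomputes them from the counts in the table. The function and variable names are illustrative only and are not part of the pipeline.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of called variants that are true: TP / (TP + FP)."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


# (TP, FP, FN) per variant type, copied from the table above.
counts = {
    "INDEL": (74360, 641, 22412),
    "SNV": (955590, 1253, 38084),
    "MNV": (6868, 21, 0),
}

for variant_type, (tp, fp, fn) in counts.items():
    print(
        f"{variant_type}: precision {precision(tp, fp):.1%}, "
        f"sensitivity {sensitivity(tp, fn):.1%}"
    )
```

Rounded to one decimal, this reproduces the percentages reported in the table.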
Dec 12, 2018
Release candidate for pipeline v4.5

@kduyvesteyn kduyvesteyn released this Oct 18, 2018 · 50 commits to master since this release

Upgrade of GRIDSS to v2.0.1

See also https://github.com/PapenfussLab/gridss/


@kduyvesteyn kduyvesteyn released this Sep 6, 2018 · 59 commits to master since this release

  • Configuration changes in GRIDSS compared to pipeline v4.2

@kduyvesteyn kduyvesteyn released this Aug 28, 2018 · 64 commits to master since this release

Summary

Compared to v4.0, this pipeline upgrades GRIDSS from v1.8.0 to v1.9.0.
Various improvements to the GRIDSS somatic SV calling algorithm have been made based on 163 GRIDSS runs done with pipeline v4.0, and have been released as part of GRIDSS v1.9.0.

See also https://github.com/PapenfussLab/gridss/

Other changes

  • We retain the metrics generated by the GRIDSS PreProcess steps. These metrics used to be cleaned up after a successful v4.0 run but can be useful for debugging.
  • BPI is upgraded from v1.6 to v1.7 (bug fix release)
  • Amber is upgraded from v1.5 to v1.6 (bug fix release)
Jul 29, 2018

v4.1

Bug fix release (manta bug fixed)

@kduyvesteyn kduyvesteyn released this Jul 22, 2018 · 91 commits to master since this release

Summary
Many minor changes to all somatic algorithms plus addition of GRIDSS structural variant caller.
Removal of KG pipeline and removal of tumor GATK calling.

Various resources and JARs used by the pipeline can be found on https://resources.hartwigmedicalfoundation.nl.

Improvements to somatic SNV / Indel calling

  • To improve sensitivity, variants on known pathogenic locations are retained all the way through Strelka if they are called by the initial Strelka (raw) caller. The list used by HMF can be found on the resources page and is based on CIViC, CGI and OncoKB, appended with a few promoter positions in the TERT gene.
  • Post-Strelka, variants are annotated with a mapping probability based on information known about the mappability of positions in the ref genome.
  • Switched from Germline PON v1.1 to Germline PON v2.0
  • Added a Somatic PON which filters out specific Strelka artefacts.
  • Added MNV merging. Variants that potentially affect the same codon(s) are checked for phasing and merged if they are phased. This is done within the Strelka Post Process JAR.
  • COSMIC annotation has been adjusted such that the COSMIC ID for every transcript affected by a variant is included, not just a random single COSMIC ID. Information is provided in the INFO field to pick the COSMIC ID for a specific transcript.
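
The MNV merging idea can be illustrated with a minimal Python sketch. This is not the Strelka Post Process implementation: the `Snv` record, the `phase_set` label and `merge_phased_neighbours` are hypothetical names, phasing is assumed to be known already rather than derived from reads, and only directly adjacent SNVs are merged, which simplifies the "potentially affect the same codon(s)" criterion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Snv:
    chrom: str
    pos: int                  # 1-based position of the first ref base
    ref: str
    alt: str
    phase_set: Optional[int]  # hypothetical phasing label; None means unphased


def merge_phased_neighbours(variants: List[Snv]) -> List[Snv]:
    """Merge directly adjacent substitutions that share a phase set.

    Assumes every input record is a substitution with len(ref) == len(alt).
    """
    merged: List[Snv] = []
    for v in sorted(variants, key=lambda x: (x.chrom, x.pos)):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.chrom == v.chrom
            and prev.pos + len(prev.ref) == v.pos  # directly adjacent
            and prev.phase_set is not None
            and prev.phase_set == v.phase_set      # same phase set
        ):
            # Extend the previous record into a longer MNV.
            merged[-1] = Snv(
                prev.chrom, prev.pos, prev.ref + v.ref, prev.alt + v.alt, prev.phase_set
            )
        else:
            merged.append(v)
    return merged


calls = [
    Snv("1", 100, "A", "T", 1),
    Snv("1", 101, "C", "G", 1),     # adjacent and phased with the call at 100
    Snv("1", 102, "G", "A", None),  # adjacent but unphased, kept separate
    Snv("2", 50, "T", "C", 2),
]
print(merge_phased_neighbours(calls))
```

In this example the first two calls collapse into a single AC>TG record at position 100, while the unphased call and the call on the other chromosome are left untouched.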

Added GRIDSS as an additional somatic structural variant caller

  • GRIDSS runs alongside Manta/BPI. Our intention is to eventually replace Manta/BPI, since we expect GRIDSS to perform better across our cohort of samples. All documentation on GRIDSS can be found at https://github.com/PapenfussLab/gridss.

Other changes

  • Germline calling is now only performed on the reference sample and hence the germline VCF contains the calls for just one sample.
  • Every final VCF (germline, somatic, SV, etc.) is gzipped and a tabix index is provided along with the gzipped VCF.
  • The kinship test to detect sample swaps is replaced by a test based on BAF scores. The main reason is that kinship penalises het-to-hom transitions, which happen in relation to the degree of LOH. Using BAFs, we can detect sample swaps by observing a mean BAF that significantly deviates from 0.5, which is independent of the degree of LOH in the tumor.
  • The QC checks are now run as part of the pipeline, whereas they used to be a post-pipeline step.
  • The KG configuration is no longer supported, but there is an INI to analyse just a single sample. This INI runs the algorithms that would normally be run on the reference sample of a somatic pair of samples.
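
The BAF-based swap check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's QC code: the function name, the input (BAF values measured at positions expected to be heterozygous) and the tolerance of 0.03 are all hypothetical.

```python
from statistics import mean
from typing import Iterable


def looks_like_sample_swap(bafs: Iterable[float], tolerance: float = 0.03) -> bool:
    """Flag a possible swap when the mean BAF deviates from 0.5.

    `tolerance` is an arbitrary illustrative cut-off, not a pipeline setting.
    """
    values = list(bafs)
    if not values:
        raise ValueError("need at least one BAF value")
    return abs(mean(values) - 0.5) > tolerance


# LOH pushes individual BAFs away from 0.5 in both directions,
# so the mean stays close to 0.5 for a matched pair.
matched_with_loh = [0.2, 0.8, 0.5, 0.5]
# In a mismatched pair many of these positions are not heterozygous,
# which pulls the mean away from 0.5.
mismatched = [0.0, 0.0, 0.5, 1.0, 0.0]

print(looks_like_sample_swap(matched_with_loh))
print(looks_like_sample_swap(mismatched))
```

The first call is not flagged and the second is, which mirrors the point that a mean-BAF test is insensitive to the degree of LOH.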

New tool versions

  • GRIDSS introduced at version v1.8.0 (using bwa v0.7.17)

Version changes

  • Purple v1.2 to v2.14
  • Cobalt v1.0 to v1.4
  • Amber v1.0 to v1.5
  • BPI v1.2 to v1.6
  • Strelka Post Process v1.0 to v1.4
  • HealthChecker v2.1 to v2.4
  • GATK v3.4.46 to v3.8
  • snpEff v4.1h to v4.3s

Quality

Since we don't have a KG pipeline anymore we don't report germline precision and sensitivity.

The somatic precision and sensitivity of SNVs and indels are determined on an internally sequenced GIAB mix of 70% NA24385 and 30% NA12878, with 100% NA24385 as the reference sample. Results are as follows:

Somatic precision & sensitivity

Type   Algo     TP      FP    FN     Prec   Sens    Δ Prec  Δ Sens
INDEL  Strelka  74360   641   22412  99.1%  76.8%   -0.1%   -0.2%
SNV    Strelka  955590  1253  38084  99.9%  96.2%   0%      0%
MNV    Strelka  6868    21    0      99.7%  100.0%  -       -
  • Note: The differences compared to v3 are entirely attributable to changes we made in the way we measure the above numbers. Running the same method on v3 and v4 yields no differences, which is as expected since we made no changes that significantly affect either sensitivity or precision.

In addition, to measure the exact false positive rate, we analyse a sample against itself at roughly 30x/100x coverage. With the pipeline v4.0 release we find 136 false positives in total across the whole genome (109 SNVs and 27 INDELs).
