Collection of rules used to calculate biomarkers like MSI, TMB and HRD.

hydra-genetics/biomarker

https://hydra-genetics-biomarker.readthedocs.io

Snakemake module containing processing steps used to generate different kinds of biomarkers

License: GPL-3

💬 Introduction

The module consists of rules used to generate biomarkers. Currently available biomarkers are:

  • HLA-typing (still under development)
  • HRD (homologous recombination deficiency) (experimental, for conda environment see disclaimer)
  • TMB (tumor mutational burden)
  • MSI (microsatellite instability)

❗ Dependencies

In order to use this module, the following dependencies are required:

  • hydra-genetics
  • pandas
  • python
  • snakemake
  • singularity
  • drmaa
  • tabulate

🎒 Preparations

Sample and unit data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

Column Id      Description

samples.tsv
sample         unique sample/patient id, one per row
tumor_content  ratio of tumor cells to total cells

units.tsv
sample         same sample/patient id as in samples.tsv
type           data type identifier (one letter), can be one of T (Tumor), N (Normal), R (RNA)
platform       type of sequencing platform, e.g. NovaSeq
machine        specific machine id, e.g. NovaSeq instruments have @Axxxxx
flowcell       identifier of flowcell used
lane           flowcell lane number
barcode        sequence library barcode/index, connect forward and reverse indices by +, e.g. ATGC+ATGC
fastq1/2       absolute path to forward and reverse reads
adapter        adapter sequences to be trimmed, separated by comma
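
As a rough illustration of the two files described above, the following sketch renders a minimal samples.tsv and units.tsv as tab-separated text. It assumes that "fastq1/2" stands for two separate columns named fastq1 and fastq2; every value (sample id, machine id, flowcell, paths, adapter sequences) is a made-up placeholder, and the helper name to_tsv is hypothetical.

```python
import csv
import io

# Column names taken from the table above. Splitting "fastq1/2" into two
# columns (fastq1, fastq2) is an assumption.
SAMPLES_COLUMNS = ["sample", "tumor_content"]
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]


def to_tsv(columns, rows):
    """Render a list of row dicts as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# All values below are placeholders, not real data.
samples = [{"sample": "sample1", "tumor_content": "0.6"}]
units = [
    {
        "sample": "sample1",            # must match the id in samples.tsv
        "type": "T",                    # one letter: T, N or R
        "platform": "NovaSeq",
        "machine": "@Axxxxx",
        "flowcell": "FLOWCELL1",
        "lane": "L001",
        "barcode": "ATGC+ATGC",         # forward and reverse index joined by +
        "fastq1": "/path/to/sample1_T_R1.fastq.gz",
        "fastq2": "/path/to/sample1_T_R2.fastq.gz",
        "adapter": "AGATCGGAAGAGC,AGATCGGAAGAGC",  # comma-separated
    }
]

samples_tsv = to_tsv(SAMPLES_COLUMNS, samples)
units_tsv = to_tsv(UNITS_COLUMNS, units)
print(samples_tsv)
print(units_tsv)
```

The two strings can be written to samples.tsv and units.tsv once the placeholders have been replaced with real values.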

Reference data

MSI

A panel of normals created by running MSIsensor-pro on a number of normal samples. It can be created with the help of the hydra-genetics/references module.

TMB

Optional: a panel-specific artifact list and a panel of position-specific background noise levels. Both can be created with the help of the hydra-genetics/references module.

✅ Testing

The workflow repository contains a small test dataset in .tests/integration, which can be run like so:

$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity

🚀 Usage

To use this module in your workflow, follow the description in the snakemake docs. Add the module to your Snakefile like so:

module biomarker:
    snakefile:
        github(
            "hydra-genetics/biomarker",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from biomarker as biomarker_*

Compatibility

Latest:

  • alignment:v0.3.1
  • annotation:v0.3.0
  • cnv_sv:v0.3.0
  • prealignment:v1.0.0

See COMPATIBLITY.md file for a complete list of module compatibility.

Input files

File                                                                            Description

hydra-genetics/alignment data
alignment/samtools_merge_bam/{sample}_{type}.bam                                aligned reads
alignment/samtools_merge_bam/{sample}_{type}.bam.bai                            index file for alignment

hydra-genetics/annotation
annotation/background_annotation/{sample}_{type}.background_annotation.vcf.gz   annotated vcf

hydra-genetics/cnv_sv data
cnv_sv/cnvkit_call/{sample}_{type}.loh.cns                                      cnvkit segmentation results

hydra-genetics/prealignment
prealignment/merged/{sample}_{type}_fastq1.fastq.gz                             merged and trimmed reads
prealignment/merged/{sample}_{type}_fastq2.fastq.gz                             merged and trimmed reads
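
Before running the module it can help to check which of the input files listed above are present for a given sample. The following is a minimal sketch, not part of the module: the function name missing_inputs and the root parameter are hypothetical, and it simply assumes every listed file is needed, which may not hold if only some biomarkers are requested.

```python
from pathlib import Path

# Input file patterns copied from the table above.
INPUT_PATTERNS = [
    "alignment/samtools_merge_bam/{sample}_{type}.bam",
    "alignment/samtools_merge_bam/{sample}_{type}.bam.bai",
    "annotation/background_annotation/{sample}_{type}.background_annotation.vcf.gz",
    "cnv_sv/cnvkit_call/{sample}_{type}.loh.cns",
    "prealignment/merged/{sample}_{type}_fastq1.fastq.gz",
    "prealignment/merged/{sample}_{type}_fastq2.fastq.gz",
]


def missing_inputs(sample, unit_type, root="."):
    """Return the expected input paths that do not exist below root."""
    root = Path(root)
    expected = [
        pattern.format(sample=sample, type=unit_type)
        for pattern in INPUT_PATTERNS
    ]
    return [path for path in expected if not (root / path).exists()]


# Placeholder sample id and type; run from the workflow's working directory.
for path in missing_inputs("sample1", "T"):
    print("missing:", path)
```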

Output files

The following output files should be targeted via another rule:

File                                                                      Description

biomarker/scarhrd/{sample}_{type}.{tc_method}.scarhrd_cnvkit_score.txt    calculated HRD score based on cnvkit and scarHRD (experimental)
biomarker/msisensor_pro/{sample}_{type}                                   MSI score
biomarker/tmb/{sample}_{type}.TMB.txt                                     TMB score and variants used
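
One way to target these files from another rule is to expand the patterns above into concrete paths. The sketch below does this in plain Python; the function name biomarker_targets is hypothetical, and the sample id, the type "T" and the tc_method value "pathology" are placeholder assumptions rather than values confirmed by this module.

```python
# Output file patterns copied from the table above.
OUTPUT_PATTERNS = [
    "biomarker/scarhrd/{sample}_{type}.{tc_method}.scarhrd_cnvkit_score.txt",
    "biomarker/msisensor_pro/{sample}_{type}",
    "biomarker/tmb/{sample}_{type}.TMB.txt",
]


def biomarker_targets(sample_types, tc_method):
    """Expand every output pattern for each (sample, type) pair."""
    return [
        # str.format ignores keyword arguments a pattern does not use,
        # so tc_method can be passed to all three patterns.
        pattern.format(sample=sample, type=unit_type, tc_method=tc_method)
        for sample, unit_type in sample_types
        for pattern in OUTPUT_PATTERNS
    ]


# Placeholder values: one tumor sample and an assumed tc_method name.
targets = biomarker_targets([("sample1", "T")], tc_method="pathology")
for target in targets:
    print(target)
```

In a Snakefile, a list built this way could be returned from an input function of a target rule such as rule all, so that Snakemake schedules the biomarker rules that produce these files.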

🧑‍⚖️ Rule Graph

Biomarker

[Figure: rule graph of the biomarker module]