https://hydra-genetics-biomarker.readthedocs.io
Snakemake module containing processing steps used to generate different kinds of biomarkers.
The module consists of rules used to generate biomarkers. Currently available biomarkers are:
- HLA-typing (still under development)
- HRD (homologous recombination deficiency) (experimental, for conda environment see disclaimer)
- TMB (tumor mutational burden)
- MSI (microsatellite instability)
In order to use this module, the following dependencies are required:
Input data should be added to samples.tsv and units.tsv.
The following information needs to be added to these files:
Column Id | Description |
---|---|
**samples.tsv** | |
sample | unique sample/patient id, one per row |
tumor_content | ratio of tumor cells to total cells |
**units.tsv** | |
sample | same sample/patient id as in samples.tsv |
type | data type identifier (one letter), can be one of T (Tumor), N (Normal), R (RNA) |
platform | type of sequencing platform, e.g. NovaSeq |
machine | specific machine id, e.g. NovaSeq instruments have @Axxxxx |
flowcell | identifier of flowcell used |
lane | flowcell lane number |
barcode | sequence library barcode/index, connect forward and reverse indices by + , e.g. ATGC+ATGC |
fastq1/2 | absolute path to forward and reverse reads |
adapter | adapter sequences to be trimmed, separated by comma |
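
To illustrate the layout described in the table above, here is a minimal Python sketch that renders one row for each file as tab-separated text. It assumes that `fastq1/2` stands for two separate columns, `fastq1` and `fastq2`. The helper `to_tsv` is hypothetical and not part of the module, and every value (sample id, machine id, flowcell, paths, adapter sequences) is a made-up placeholder.

```python
import csv
import io

# Column names taken from the table above; splitting "fastq1/2" into
# two columns is an assumption.
SAMPLES_COLUMNS = ["sample", "tumor_content"]
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine", "flowcell",
    "lane", "barcode", "fastq1", "fastq2", "adapter",
]


def to_tsv(columns, rows):
    """Render a list of dicts as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values only; replace them with real sample metadata.
samples = [{"sample": "sample1", "tumor_content": 0.6}]
units = [
    {
        "sample": "sample1",          # must match the id in samples.tsv
        "type": "T",                  # T = Tumor, N = Normal, R = RNA
        "platform": "NovaSeq",
        "machine": "@A00000",         # placeholder machine id
        "flowcell": "FLOWCELL1",      # placeholder flowcell id
        "lane": "L001",               # placeholder lane
        "barcode": "ATGC+ATGC",       # forward and reverse index joined by +
        "fastq1": "/data/sample1_T_R1.fastq.gz",  # placeholder path
        "fastq2": "/data/sample1_T_R2.fastq.gz",  # placeholder path
        "adapter": "ACGT,ACGT",       # placeholder adapters, comma-separated
    }
]

samples_tsv = to_tsv(SAMPLES_COLUMNS, samples)
units_tsv = to_tsv(UNITS_COLUMNS, units)
print(samples_tsv)
print(units_tsv)
```

Writing the two strings to `samples.tsv` and `units.tsv` gives files with the expected header lines. `csv.DictWriter` raises a `ValueError` if a row contains a key that is not in the column list, which catches misspelled column names early.
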
A panel of normals created by running MSIsensor-pro on a number of normal samples. It can be created with the help of the hydragenetics/references module.
Optional: A panel specific artifact list as well as a position specific background noise level panel. Can be created with the help of the hydragenetics/references module.
The workflow repository contains a small test dataset in .tests/integration, which can be run like so:
$ cd .tests/integration
$ snakemake -s ../../Snakefile -j1 --configfile config.yaml --use-singularity
To use this module in your workflow, follow the description in the snakemake docs.
Add the module to your Snakefile like so:
module biomarker:
    snakefile:
        github(
            "hydra-genetics/biomarker",
            path="workflow/Snakefile",
            tag="v0.1.0",
        )
    config:
        config


use rule * from biomarker as biomarker_*
Latest:
- alignment:v0.3.1
- annotation:v0.3.0
- cnv_sv:v0.3.0
- prealignment:v1.0.0
See COMPATIBLITY.md file for a complete list of module compatibility.
File | Description |
---|---|
**hydra-genetics/alignment data** | |
alignment/samtools_merge_bam/{sample}_{type}.bam | aligned reads |
alignment/samtools_merge_bam/{sample}_{type}.bam.bai | index file for alignment |
**hydra-genetics/annotation** | |
annotation/background_annotation/{sample}_{type}.background_annotation.vcf.gz | annotated vcf |
**hydra-genetics/cnv_sv data** | |
cnv_sv/cnvkit_call/{sample}_{type}.loh.cns | cnvkit segmentation results |
**hydra-genetics/prealignment** | |
prealignment/merged/{sample}_{type}_fastq1.fastq.gz | merged and trimmed reads |
prealignment/merged/{sample}_{type}_fastq2.fastq.gz | merged and trimmed reads |
The following output files should be targeted via another rule:
File | Description |
---|---|
biomarker/scarhrd/{sample}_{type}.{tc_method}.scarhrd_cnvkit_score.txt | calculated HRD score based on cnvkit and scarHRD (experimental) |
biomarker/msisensor_pro/{sample}_{type} | MSI score |
biomarker/tmb/{sample}_{type}.TMB.txt | TMB score and variants used |
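
As a rough sketch of how the output files above could be collected for a targeting rule, the following Python function fills in the wildcards for each sample/type pair. The function name `biomarker_targets`, the sample id, and the `tc_method` value `"pathology"` are assumptions made for illustration; check the module configuration for the values it actually accepts.

```python
# Output patterns copied from the table above.
OUTPUT_PATTERNS = [
    "biomarker/scarhrd/{sample}_{type}.{tc_method}.scarhrd_cnvkit_score.txt",
    "biomarker/msisensor_pro/{sample}_{type}",
    "biomarker/tmb/{sample}_{type}.TMB.txt",
]


def biomarker_targets(units, tc_method):
    """Build the list of output paths for (sample, type) pairs.

    `units` is an iterable of (sample, type) tuples, e.g. taken from
    units.tsv. Duplicate pairs (several lanes per sample) are collapsed.
    """
    targets = []
    for sample, unit_type in sorted(set(units)):
        for pattern in OUTPUT_PATTERNS:
            # str.format ignores keyword arguments a pattern does not use,
            # so tc_method can be passed to every pattern.
            targets.append(
                pattern.format(sample=sample, type=unit_type, tc_method=tc_method)
            )
    return targets


# Hypothetical example: one tumor sample sequenced on two lanes.
targets = biomarker_targets(
    [("sample1", "T"), ("sample1", "T")], tc_method="pathology"
)
for path in targets:
    print(path)
```

In a Snakefile, a list like `targets` would typically be used as the `input` of a top-level rule such as `rule all`, so that Snakemake schedules the biomarker rules needed to produce these files.
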