Benchmarking of KAGE and other genotypers

This repository contains a Snakemake pipeline for benchmarking KAGE and other genotypers. Benchmarks can be run on either real (experimental) or simulated data. Running all the experiments takes 2-3 days using 16 CPU cores for each genotyper, as some of the genotypers require 10+ hours to run. However, running all genotypers on a small simulated dataset can be done in less than an hour.

Reproducing the experiments in the KAGE manuscript

The branch v0.0.1 is a freeze of the code used to perform the experiments in the KAGE manuscript. The Conda YAML files in that branch specify which software versions were used.

Installation

Step 1: Install Snakemake and Conda

Before you start, you will need both Snakemake (to run the benchmarking pipeline) and Conda (to get all the correct dependencies). If you don't already have Snakemake, follow its installation instructions.
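A quick way to confirm that both tools are available before continuing is to check that they are on your PATH. The following is a minimal sketch, not part of the repository; the function name check_tools is hypothetical, and it only tests that the commands exist, not that their versions are suitable.

```shell
# Report whether each named tool is on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Check the two tools this pipeline needs.
check_tools snakemake conda || echo "Install the missing tools before continuing."
```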

Step 2: Clone this repository

git clone https://github.com/ivargr/genotyping-benchmarking

NOTE: All dependencies will automatically be installed by Conda when you run the pipeline, except for PanGenie (which is currently not available on Conda). You will need to install PanGenie manually, and edit config.yaml to specify the installation path of PanGenie.

Running the benchmarks

The default is to use 16 CPU cores for each method and 40 CPU cores to create indexes etc. If you want to change this, edit config.yaml before running.

All dependencies are handled by Conda (meaning you should use --use-conda with Snakemake) except the Python dependencies, which are left to you so that you keep control over which Python interpreter they are installed into. Install the Python requirements into your chosen virtual environment first:

pip install -r python_requirements.txt
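If you do not have a virtual environment yet, one way to create one is with Python's built-in venv module, as sketched below. The function name install_python_requirements and the default directory .venv are assumptions for illustration, not part of the repository; any environment manager you prefer works equally well.

```shell
# Create a virtual environment and install the requirements file into it.
# Arguments (both optional): environment directory, requirements file.
install_python_requirements() {
    venv_dir="${1:-.venv}"
    requirements="${2:-python_requirements.txt}"
    if [ ! -f "$requirements" ]; then
        echo "requirements file not found: $requirements" >&2
        return 1
    fi
    python3 -m venv "$venv_dir" || return 1
    # Use the environment's own pip so the packages land inside it.
    "$venv_dir/bin/pip" install -r "$requirements"
}
```

From the repository root, calling install_python_requirements with no arguments creates .venv and installs python_requirements.txt into it. Activate the environment with . .venv/bin/activate before running the pipeline.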

Run on a simulated dataset

Simply run the following. This will run all the genotypers on a small simulated dataset (specified in config.yaml) and create a table with the results.

snakemake -s simulated_experiment.smk --use-conda

If everything goes fine, a file figure11.html with the following result table will be generated:

+------------+---------------+------------------+-----------+-------------+----------------+---------+---------+--------------+
|            | Indels recall | Indels precision | Indels F1 | SNPs recall | SNPs precision | SNPs F1 | Runtime | Memory usage |
+------------+---------------+------------------+-----------+-------------+----------------+---------+---------+--------------+
|    KAGE    |     0.692     |      0.692       |   0.692   |    0.842    |     0.889      |  0.865  |  0 min  |     3 GB     |
|  PanGenie  |     0.769     |      0.714       |   0.741   |    0.789    |     0.714      |  0.750  |  2 min  |    48 GB     |
| Bayestyper |     0.231     |      1.000       |   0.375   |    0.158    |     1.000      |  0.273  |  1 min  |     3 GB     |
|   Malva    |     0.462     |      0.500       |   0.480   |    0.684    |     0.867      |  0.765  |  1 min  |    42 GB     |
| Graphtyper |     0.077     |      1.000       |   0.143   |    0.158    |     0.167      |  0.162  |  0 min  |     0 GB     |
|    GATK    |     0.154     |      0.125       |   0.138   |    0.263    |     0.714      |  0.385  |  0 min  |     2 GB     |
+------------+---------------+------------------+-----------+-------------+----------------+---------+---------+--------------+
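The F1 columns can be sanity-checked against the precision and recall columns. The sketch below assumes the table uses the standard definition of F1 (the harmonic mean of precision and recall); the pipeline's own computation is not shown in this README, and the helper name f1_score is hypothetical.

```shell
# F1 is the harmonic mean of precision and recall: 2PR / (P + R).
# Arguments: precision, recall. Prints F1 rounded to three decimals.
f1_score() {
    awk -v p="$1" -v r="$2" 'BEGIN {
        if (p + r == 0) { print "0.000"; exit }
        printf "%.3f\n", 2 * p * r / (p + r)
    }'
}

# KAGE SNPs row from the table: precision 0.889, recall 0.842.
f1_score 0.889 0.842   # prints 0.865
```

For the KAGE SNPs row this matches the table. Since the table shows values rounded to three decimals, recomputing F1 from them can differ from the listed value in the last digit (for example, the PanGenie indels row gives 0.740 rather than 0.741).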

Run all experiments

Note: This will take several days and require a lot of RAM. It is possible to pick a subset of methods by editing figures.smk.

snakemake
