Citation and References

Lucas Czech edited this page Oct 7, 2022 · 23 revisions

Citation

When using grenepipe, please cite:

grenepipe: A flexible, scalable, and reproducible pipeline
to automate variant calling from sequence reads.

Lucas Czech and Moises Exposito-Alonso. Bioinformatics. 2022.
doi:10.1093/bioinformatics/btac600

Please also cite every tool that you selected to run in your analysis. Their references are listed below.

Read trimming

AdapterRemoval

AdapterRemoval: Easy cleaning of next-generation sequencing reads.
Lindgreen S.
BMC Res Notes. 2012.
doi:10.1186/1756-0500-5-337

AdapterRemoval v2: Rapid adapter trimming, identification, and read merging.
Schubert M, Lindgreen S, Orlando L.
BMC Res Notes. 2016.
doi:10.1186/s13104-016-1900-2

Cutadapt

Cutadapt removes adapter sequences from high-throughput sequencing reads.
Martin M.
EMBnet journal. 2011.
doi:10.14806/ej.17.1.200

fastp

fastp: an ultra-fast all-in-one FASTQ preprocessor.
Chen S, Zhou Y, Chen Y, Gu J.
Bioinformatics. 2018.
doi:10.1093/bioinformatics/bty560

SeqPrep

SeqPrep: Tool for stripping adaptors and/or merging paired reads with overlap into single reads.
St. John J.
https://github.com/jstjohn/SeqPrep

skewer

Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads.
Jiang H, Lei R, Ding S-W, Zhu S.
BMC Bioinformatics. 2014.
doi:10.1186/1471-2105-15-182

trimmomatic

Trimmomatic: A flexible trimmer for Illumina sequence data.
Bolger AM, Lohse M, Usadel B.
Bioinformatics. 2014.
doi:10.1093/bioinformatics/btu170

Read mapping, duplication removal, and quality score recalibration

Bowtie 2

Fast gapped-read alignment with Bowtie 2.
Langmead B, Salzberg SL.
Nat Methods. 2012.
doi:10.1038/nmeth.1923

bwa mem and bwa aln

Fast and accurate short read alignment with Burrows-Wheeler transform.
Li H, Durbin R.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp324

bwa-mem2

Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems.
Vasimuddin M, Misra S, Li H, Aluru S.
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 2019.
doi:10.1109/IPDPS.2019.00041

BamUtil clipOverlap

An efficient and scalable analysis framework for variant extraction and refinement from population scale DNA sequence data.
Jun G, Wing MK, Abecasis GR, Kang HM.
Genome Res. 2015;25(6).
doi:10.1101/gr.176552.114

Picard MarkDuplicates

Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/

DeDup

EAGER: efficient ancient genome reconstruction.
Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J, et al.
Genome Biol. 2016.
doi:10.1186/s13059-016-0918-z

GATK BaseRecalibrator

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
Genome Res. 2010.
doi:10.1101/gr.107524.110

samtools merge and samtools mpileup

The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352

Damage profiling

mapDamage

mapDamage: testing for damage patterns in ancient DNA sequences.
Ginolhac A, Rasmussen M, Gilbert MTP, Willerslev E, Orlando L.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr347

mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters.
Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L.
Bioinformatics. 2013.
doi:10.1093/bioinformatics/btt193

DamageProfiler

DamageProfiler: Fast damage pattern calculation for ancient DNA.
Neukamm J, Peltzer A, Nieselt K.
bioRxiv. 2020.
doi:10.1101/2020.10.01.322206

Variant calling, genotyping, and filtering

bcftools call

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Li H.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509

freebayes

Haplotype-based variant detection from short-read sequencing.
Garrison E, Marth G.
arXiv. 2012.
arXiv:1207.3907

BEDOPS: high-performance genomic feature operations.
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, et al.
Bioinformatics. 2012;28.
doi:10.1093/bioinformatics/bts277

GATK HaplotypeCaller, GATK SelectVariants, GATK VariantFiltration, GATK VariantRecalibrator

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al.
Genome Res. 2010.
doi:10.1101/gr.107524.110

Frequency calling

HAF-pipe

Accurate Allele Frequencies from Ultra-low Coverage Pool-Seq Samples in Evolve-and-Resequence Experiments.
Tilk S, Bergland A, Goodman A, Schmidt P, Petrov D, Greenblum S.
G3: Genes|Genomes|Genetics. 2019.
doi:10.1534/g3.119.400755

Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data.
Kessner D, Turner T, Novembre J.
Molecular Biology and Evolution. 2013.
doi:10.1093/molbev/mst016

Quality control, statistics, SNP annotation, and reporting

FastQC

FastQC: a quality control tool for high throughput sequence data.
Andrews S.
Babraham Bioinformatics, Babraham Institute, Cambridge, United Kingdom; 2010.
Online: https://www.bioinformatics.babraham.ac.uk/projects/fastqc

samtools stats and samtools flagstat

The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352

QualiMap

Qualimap 2: Advanced multi-sample quality control for high-throughput sequencing data.
Okonechnikov K, Conesa A, García-Alcalde F.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btv566

Picard CollectMultipleMetrics

Picard toolkit.
Broad Institute; 2018.
GitHub repository, online: http://broadinstitute.github.io/picard/

snpEff

A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al.
Fly. 2012.
doi:10.4161/fly.19695

VEP

The Ensembl Variant Effect Predictor.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GRS, Thormann A, Flicek P, Cunningham F.
Genome Biology. 2016.
doi:10.1186/s13059-016-0974-4

SeqKit

SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation.
Shen W, Le S, Li Y, Hu F.
PLOS ONE. 2016;11(10):e0163962.
doi:10.1371/journal.pone.0163962

bcftools stats

A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.
Li H.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr509

MultiQC

MultiQC: Summarize analysis results for multiple tools and samples in a single report.
Ewels P, Magnusson M, Lundin S, Käller M.
Bioinformatics. 2016.
doi:10.1093/bioinformatics/btw354

Additional references

Snakemake

Snakemake--a scalable bioinformatics workflow engine.
Köster J, Rahmann S.
Bioinformatics. 2012.
doi:10.1093/bioinformatics/bts480

Sustainable data analysis with Snakemake.
Mölder F, Jablonski KP, Letcher B, Hall MB, Tomkins-Tinch CH, Sochat V, Forster J, Lee S, Twardziok SO, Kanitz A, Wilm A, Holtgrewe M, Rahmann S, Nahnsen S, Köster J.
F1000Res. 2021;10:33.
doi:10.12688/f1000research.29032.2

Bioconda

Bioconda: A sustainable and comprehensive software distribution for the life sciences.
Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, et al.
Nat Methods. 2018.
doi:10.1038/s41592-018-0046-7

FASTQ file format

The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants.
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM.
Nucleic Acids Res. 2009.
doi:10.1093/nar/gkp1137

FASTA file format

Improved tools for biological sequence comparison.
Pearson WR, Lipman DJ.
Proceedings of the National Academy of Sciences. 1988.
doi:10.1073/pnas.85.8.2444

SAM/BAM file format

The Sequence Alignment/Map format and SAMtools.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup.
Bioinformatics. 2009.
doi:10.1093/bioinformatics/btp352

VCF file format

The variant call format and VCFtools.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al.
Bioinformatics. 2011.
doi:10.1093/bioinformatics/btr330

GrENE-net

Genomics of rapid Evolution in Novel Environments network (GrENE-net).
Online: https://grenenet.org/