From 89f632ddd4d67a9319e8478c8dd5fba6fc777792 Mon Sep 17 00:00:00 2001
From: "James A. Fellows Yates"
Date: Sat, 17 Nov 2018 20:08:49 +0100
Subject: [PATCH 1/4] Added to README.md quick start guide

---
 README.md | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 1b1247d07..abd295e37 100644
--- a/README.md
+++ b/README.md
@@ -14,9 +14,9 @@
 
 **nf-core/eager** is a bioinformatics best-practice analysis pipeline for ancient DNA data analysis.
 
-The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FastQ inputs, aligns the reads and performs extensive quality-control on the results. It comes with docker / singularity containers making installation trivial and results highly reproducible.
+The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FASTQ inputs, aligns the reads and performs extensive quality-control on the results. It comes with docker / singularity containers making installation trivial and results highly reproducible.
 
-### Pipeline steps
+## Pipeline steps
 
 * Create reference genome indices (optional)
     * BWA
@@ -33,7 +33,25 @@ The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow
 * BAM Clipping for UDG+/UDGhalf protocols
 * PMDTools damage filtering / assessment
 
-### Documentation
+## Quick Start
+
+1. Install [`nextflow`](docs/installation.md)
+2. Install one of [`docker`](https://docs.docker.com/engine/installation/), [`singularity`](https://www.sylabs.io/guides/3.0/user-guide/) or [`conda`](https://conda.io/miniconda.html)
+3. Download the EAGER pipeline
+
+```bash
+nextflow pull nf-core/eager
+```
+
+4. Set up your job with default parameters
+
+```bash
+nextflow run nf-core -profile --reads'*_R{1,2}.fastq.gz' --fasta '
+```
+
+5. See the overview of the run with under `/MultiQC/multiqc_report.html`
+
+## Documentation
 The nf-core/eager pipeline comes with documentation about the pipeline, found in the `docs/` directory:
 
 1. [Installation](docs/installation.md)
@@ -44,5 +62,5 @@ The nf-core/eager pipeline comes with documentation about the pipeline, found in
 4. [Output and how to interpret the results](docs/output.md)
 5. [Troubleshooting](docs/troubleshooting.md)
 
-### Credits
-This pipeline was written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)), with major contributions from Stephen Clayton, ideas and documentation from James Fellows-Yates, Raphael Eisenhofer and Judith Neukamm. If you want to contribute, please open an issue and ask to be added to the project - happy to do so and everyone is welcome to contribute here!
\ No newline at end of file
+## Credits
+This pipeline was written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)), with major contributions from Stephen Clayton, ideas and documentation from James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to contribute, please open an issue and ask to be added to the project - happy to do so and everyone is welcome to contribute here!
\ No newline at end of file

From 9c93d0f1b1c4fb8268925890dac3e02fcbe2cbcf Mon Sep 17 00:00:00 2001
From: "James A. Fellows Yates"
Date: Sat, 17 Nov 2018 20:44:37 +0100
Subject: [PATCH 2/4] Made default pipeline more descriptive, added references to all tools used

---
 README.md | 68 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index abd295e37..362f3e47e 100644
--- a/README.md
+++ b/README.md
@@ -12,26 +12,36 @@
 
 ## Introduction
 
-**nf-core/eager** is a bioinformatics best-practice analysis pipeline for ancient DNA data analysis.
+**nf-core/eager** is a bioinformatics best-practice analysis pipeline for NGS
+sequencing based ancient DNA (aDNA) data analysis.
 
-The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics workflow tool. It pre-processes raw data from FASTQ inputs, aligns the reads and performs extensive quality-control on the results. It comes with docker / singularity containers making installation trivial and results highly reproducible.
+The pipeline uses [Nextflow](https://www.nextflow.io), a bioinformatics
+workflow tool. It pre-processes raw data from FASTQ inputs, aligns the reads
+and performs extensive general NGS and aDNA specific quality-control on the
+results. It comes with docker, singularity or conda containers making
+installation trivial and results highly reproducible.
 
 ## Pipeline steps
 
-* Create reference genome indices (optional)
-    * BWA
-    * Samtools Index
-    * Sequence Dictionary
-* QC with FastQC
-* AdapterRemoval for read clipping and merging
-* Read mapping with BWA, BWA Mem or CircularMapper
-* Samtools sort, index, stats & conversion to BAM
-* DeDup or MarkDuplicates read deduplication
-* QualiMap BAM QC Checking
-* Preseq Library Complexity Estimation
-* DamageProfiler damage profiling
-* BAM Clipping for UDG+/UDGhalf protocols
-* PMDTools damage filtering / assessment
+By default the pipeline currently performs the following:
+
+* Create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
+* Sequencing quality control (`FastQC`)
+* Sequencing adapter removal and for paired end data merging (`AdapterRemoval`)
+* Read mapping to reference using (`bwa aln`, `bwa mem` or `CircularMapper`)
+* Post-mapping processing, statistics and conversion to bam (`samtools`)
+* Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler`)
+* PCR duplicate removal (`DeDup` or `MarkDuplicates`)
+* Post-mapping statistics and BAM quality control (`Qualimap`)
+* Library Complexity Estimation (`preseq`)
+* Overall pipeline statistics summaries (`MultiQC`)
+
+Additional functionality contained by the pipeline currently includes:
+
+* Illumina two-coloured sequencer poly-G tail removal (`fastp`)
+* Automatic conversion of unmapped reads to FASTQ (`samtools`)
+* Damage removal/clipping for UDG+/UDG-half treatment protocols (`BamUtil`)
+* Damage reads extraction and assessment (`PMDTools`)
 
 ## Quick Start
 
@@ -51,7 +61,11 @@ nextflow run nf-core -profile --reads'*_R{1,2}.fastq.
 
 5. See the overview of the run with under `/MultiQC/multiqc_report.html`
 
+Modifications to the default pipeline are easily made using various options
+as described in the documentation.
+
 ## Documentation
+
 The nf-core/eager pipeline comes with documentation about the pipeline, found in the `docs/` directory:
 
 1. [Installation](docs/installation.md)
@@ -63,4 +77,24 @@ The nf-core/eager pipeline comes with documentation about the pipeline, found in
 5. [Troubleshooting](docs/troubleshooting.md)
 
 ## Credits
-This pipeline was written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)), with major contributions from Stephen Clayton, ideas and documentation from James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to contribute, please open an issue and ask to be added to the project - happy to do so and everyone is welcome to contribute here!
\ No newline at end of file
+
+This pipeline was written by Alexander Peltzer ([apeltzer](https://github.com/apeltzer)),
+with major contributions from Stephen Clayton, ideas and documentation from
+James Fellows Yates, Raphael Eisenhofer and Judith Neukamm. If you want to
+contribute, please open an issue and ask to be added to the project - happy to
+do so and everyone is welcome to contribute here!
+
+## Tool References
+
+* *EAGER v1, CircularMapper, DeDup* Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z)
+* *FastQC* download: [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z) Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)
+* *AdapterRemoval v2* Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. [https://doi.org/10.1186/s13104-016-1900-2](https://doi.org/10.1186/s13104-016-1900-2) Download: [https://github.com/MikkelSchubert/adapterremoval](https://github.com/MikkelSchubert/adapterremoval)
+* *bwa* Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics , 25(14), 1754–1760. [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324) Download: [http://bio-bwa.sourceforge.net/bwa.shtml](http://bio-bwa.sourceforge.net/bwa.shtml)
+* *SAMtools* Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. [https://doi.org/10.1093/bioinformatics/btp352](https://doi.org/10.1093/bioinformatics/btp352) Download: [http://www.htslib.org/](http://www.htslib.org/)
+* *DamageProfiler* Judith Neukamm (Unpublished)
+* *QualiMap* Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics , 32(2), 292–294. [https://doi.org/10.1093/bioinformatics/btv566](https://doi.org/10.1093/bioinformatics/btv566) Download: [http://qualimap.bioinfo.cipf.es/](http://qualimap.bioinfo.cipf.es/)
+* *preseq* Daley, T., & Smith, A. D. (2013). Predicting the molecular complexity of sequencing libraries. Nature Methods, 10(4), 325–327. [https://doi.org/10.1038/nmeth.2375](https://doi.org/10.1038/nmeth.2375). Download: [http://smithlabresearch.org/software/preseq/](http://smithlabresearch.org/software/preseq/)
+* *PMDTools* Skoglund, P., Northoff, B. H., Shunkov, M. V., Derevianko, A. P., Pääbo, S., Krause, J., & Jakobsson, M. (2014). Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proceedings of the National Academy of Sciences of the United States of America, 111(6), 2229–2234. [https://doi.org/10.1073/pnas.1318934111](https://doi.org/10.1073/pnas.1318934111) Download: [https://github.com/pontussk/PMDtools](https://github.com/pontussk/PMDtools)
+* *MultiQC* Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. [https://doi.org/10.1093/bioinformatics/btw354](https://doi.org/10.1093/bioinformatics/btw354) Download: [https://multiqc.info/](https://multiqc.info/)
+* *BamUtils* Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. [https://doi.org/10.1101/gr.176552.114](https://doi.org/10.1101/gr.176552.114) Download: [https://genome.sph.umich.edu/wiki/BamUtil](https://genome.sph.umich.edu/wiki/BamUtil)
+* *FastP* Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. [https://doi.org/10.1093/bioinformatics/bty560](https://doi.org/10.1093/bioinformatics/bty560) Download: [https://github.com/OpenGene/fastp](https://github.com/OpenGene/fastp)
\ No newline at end of file

From 166c4a0eaf2b6ef155051de39d10ab58d2524213 Mon Sep 17 00:00:00 2001
From: "James A. Fellows Yates"
Date: Sat, 17 Nov 2018 21:07:54 +0100
Subject: [PATCH 3/4] Experimental multiqc config for improved results order

---
 conf/multiqc_config.yaml | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/conf/multiqc_config.yaml b/conf/multiqc_config.yaml
index d714e8ee0..d5f5a6540 100644
--- a/conf/multiqc_config.yaml
+++ b/conf/multiqc_config.yaml
@@ -5,3 +5,15 @@ report_comment: >
 report_section_order:
     nf-core/eager-software-versions:
         order: -1000
+    fastqc:
+        after: 'nf-core/eager-software-versions'
+    adapterRemoval:
+        after: 'fastqc'
+    Samtools:
+        after: 'adapterRemoval'
+    dedup:
+        after: 'Samtools'
+    qualimap:
+        after: 'dedup'
+    preseq:
+        after: 'qualimap'
\ No newline at end of file

From 5d99341e3a2ce48a071905bcad94ba39079952db Mon Sep 17 00:00:00 2001
From: Alexander Peltzer
Date: Sun, 18 Nov 2018 14:03:18 +0100
Subject: [PATCH 4/4] Update README.md

Small change :-)
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ffb1bcd57..224cc7f0f 100644
--- a/README.md
+++ b/README.md
@@ -86,8 +86,8 @@ do so and everyone is welcome to contribute here!
 
 ## Tool References
 
-* *EAGER v1, CircularMapper, DeDup* Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z)
-* *FastQC* download: [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z) Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)
+* *EAGER v1, CircularMapper, DeDup* Peltzer, A., Jäger, G., Herbig, A., Seitz, A., Kniep, C., Krause, J., & Nieselt, K. (2016). EAGER: efficient ancient genome reconstruction. Genome Biology, 17(1), 1–14. [https://doi.org/10.1186/s13059-016-0918-z](https://doi.org/10.1186/s13059-016-0918-z) Download: [https://github.com/apeltzer/EAGER-GUI](https://github.com/apeltzer/EAGER-GUI) and [https://github.com/apeltzer/EAGER-CLI](https://github.com/apeltzer/EAGER-CLI)
+* *FastQC* download: [https://www.bioinformatics.babraham.ac.uk/projects/fastqc/](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)
 * *AdapterRemoval v2* Schubert, M., Lindgreen, S., & Orlando, L. (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 9, 88. [https://doi.org/10.1186/s13104-016-1900-2](https://doi.org/10.1186/s13104-016-1900-2) Download: [https://github.com/MikkelSchubert/adapterremoval](https://github.com/MikkelSchubert/adapterremoval)
 * *bwa* Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics , 25(14), 1754–1760. [https://doi.org/10.1093/bioinformatics/btp324](https://doi.org/10.1093/bioinformatics/btp324) Download: [http://bio-bwa.sourceforge.net/bwa.shtml](http://bio-bwa.sourceforge.net/bwa.shtml)
 * *SAMtools* Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. [https://doi.org/10.1093/bioinformatics/btp352](https://doi.org/10.1093/bioinformatics/btp352) Download: [http://www.htslib.org/](http://www.htslib.org/)
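
Review note on PATCH 3/4: the `report_section_order` entries added to `conf/multiqc_config.yaml` form a single chain of `after:` links, so the intended report order can be read off by following the links from the one entry that has no `after:` key. The sketch below walks that chain. It is only an illustration of the intent: the dict mirrors the YAML by hand, the names `REPORT_SECTION_ORDER` and `resolve_after_chain` are hypothetical, and MultiQC's own handling of `order`/`after` (and whether IDs such as `adapterRemoval` or `Samtools` match its section anchors) is not verified here.

```python
# Hand-written mirror of the `report_section_order` block from PATCH 3/4.
# Assumption: this dict is illustrative only; it is not read from the YAML file.
REPORT_SECTION_ORDER = {
    "nf-core/eager-software-versions": {"order": -1000},
    "fastqc": {"after": "nf-core/eager-software-versions"},
    "adapterRemoval": {"after": "fastqc"},
    "Samtools": {"after": "adapterRemoval"},
    "dedup": {"after": "Samtools"},
    "qualimap": {"after": "dedup"},
    "preseq": {"after": "qualimap"},
}


def resolve_after_chain(section_order):
    """Return section names in the order implied by their `after` links.

    Hypothetical helper: it follows `after` links only and ignores `order`
    values, so it is not a reimplementation of MultiQC's sorting.
    """
    # Sections without an `after` key are the starting points of a chain.
    roots = [name for name, opts in section_order.items() if "after" not in opts]

    # Map each predecessor to the sections that ask to come after it.
    followers = {}
    for name, opts in section_order.items():
        if "after" in opts:
            followers.setdefault(opts["after"], []).append(name)

    ordered = []
    stack = list(reversed(roots))
    while stack:
        current = stack.pop()
        ordered.append(current)
        # Push followers reversed so they are popped in declaration order.
        stack.extend(reversed(followers.get(current, [])))

    # Anything not reached points at an unknown section or sits in a cycle.
    unreachable = [name for name in section_order if name not in ordered]
    if unreachable:
        raise ValueError(f"unresolvable 'after' targets for: {unreachable}")
    return ordered


if __name__ == "__main__":
    for position, name in enumerate(resolve_after_chain(REPORT_SECTION_ORDER), start=1):
        print(position, name)
```

A check like this catches a misspelled `after:` target before the report is built, since a section that is never reached raises an error rather than being placed silently.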