Merge pull request #786 from nf-core/dev

Release PR for 2.4
nf-core · Sep 14, 2021 · cc66639
2 parents 70e3d27 + 2ccf4a3
commit cc66639

Showing 21 changed files with 1,405 additions and 414 deletions.
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -70,14 +70,13 @@ If you wish to contribute a new step, please use the following coding standards:
 3. Define the output channel if needed (see below).
 4. Add any new flags/options to `nextflow.config` with a default (see below).
 5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
-6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
-7. Add sanity checks for all relevant parameters.
-8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
-9. Do local tests that the new code works properly and as expected.
-10. Add a new test command in `.github/workflow/ci.yaml`.
-11. If applicable add a [MultiQC](https://https://multiqc.info/) module.
-12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
-13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
+6. Add sanity checks for all relevant parameters.
+7. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
+8. Test locally that the new code works as expected.
+9. Add a new test command in `.github/workflows/ci.yml`.
+10. If applicable, add a [MultiQC](https://multiqc.info/) module.
+11. Update the MultiQC config `assets/multiqc_config.yaml` so that relevant suffixes and name clean-up are set, and the General Statistics Table column order and module figures are in the right order.
+12. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
 
 ### Default values
 

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
@@ -107,7 +107,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install nf-core
+          pip install nf-core==1.14
 
       - name: Run nf-core lint
         env:

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,7 +3,60 @@
 The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
 
-## v2.3.5 - 2021-06-03
+## [2.4.0] - 2021-09-14
+
+### `Added`
+
+- [#317](https://github.com/nf-core/eager/issues/317) Added bcftools stats for general genotyping statistics of VCF files
+- [#651](https://github.com/nf-core/eager/issues/651) - Adds removal of adapters specified in an AdapterRemoval adapter list file
+- [#642](https://github.com/nf-core/eager/issues/642) and [#431](https://github.com/nf-core/eager/issues/431) adds post-adapter removal barcode/fastq trimming
+- [#769](https://github.com/nf-core/eager/issues/769) - Adds lc_extrap mode to preseq (suggested by @roberta-davidson)
+
+### `Fixed`
+
+- Fixed some missing or incorrectly reported software versions
+- [#771](https://github.com/nf-core/eager/issues/771) Remove legacy code
+- Improved output documentation for MultiQC general stats table (thanks to @KathrinNaegele and @esalmela)
+- Improved output documentation for Bowtie2 (thanks to @isinaltinkaya)
+- [#612](https://github.com/nf-core/eager/issues/612) Updated BAM trimming defaults to 0 to ensure no unwanted trimming when mixing half-UDG with no-UDG (thanks to @scarlhoff)
+- [#722](https://github.com/nf-core/eager/issues/722) Updated BWA mapping parameters to latest recommendations: primarily `alnn` back to 0.01 and `alno` to 2, as per Oliva et al. 2021 (10.1093/bib/bbab076)
+- Updated workflow diagrams to reflect latest functionality
+- [#787](https://github.com/nf-core/eager/issues/787) Adds memory specification flags for the GATK UnifiedGenotyper and HaplotypeCaller steps (thanks to @nylander)
+- Fixed issue where MultiVCFAnalyzer would not pick up newly generated VCF files when additional VCF files were specified
+- [#790](https://github.com/nf-core/eager/issues/790) Fixed kraken2 report file-name collision when sample names have `.` in them
+- [#792](https://github.com/nf-core/eager/issues/792) Fixed Java error messages for AdapterRemovalFixPrefix being hidden in output
+- [#794](https://github.com/nf-core/eager/issues/794) Aligned default test profile with nf-core standards (`test_tsv` is now `test`)
+
+### `Dependencies`
+
+- Bumped python: 3.7.3 -> 3.9.4
+- Bumped markdown: 3.2.2 -> 3.3.4
+- Bumped pymdown-extensions: 7.1 -> 8.2
+- Bumped pygments: 2.6.1 -> 2.9.0
+- Bumped adapterremoval: 2.3.1 -> 2.3.2
+- Bumped picard: 2.22.9 -> 2.26.0
+- Bumped samtools: 1.9 -> 1.12
+- Bumped angsd: 0.933 -> 0.935
+- Bumped gatk4: 4.1.7.0 -> 4.2.0.0
+- Bumped multiqc: 1.10.1 -> 1.11
+- Bumped bedtools: 2.29.2 -> 2.30.0
+- Bumped libiconv: 1.15 -> 1.16
+- Bumped preseq: 2.0.3 -> 3.1.2
+- Bumped bamutil: 1.0.14 -> 1.0.15
+- Bumped pysam: 0.15.4 -> 0.16.0
+- Bumped kraken2: 2.1.1 -> 2.1.2
+- Bumped pandas: 1.0.4 -> 1.2.4
+- Bumped freebayes: 1.3.2 -> 1.3.5
+- Bumped biopython: 1.76 -> 1.79
+- Bumped xopen: 0.9.0 -> 1.1.0
+- Bumped bowtie2: 2.4.2 -> 2.4.4
+- Bumped mapdamage2: 2.2.0 -> 2.2.1
+- Bumped bbmap: 38.87 -> 38.92
+- Added bcftools: 1.12
+
+### `Deprecated`
+
+## [2.3.5] - 2021-06-03
 
 ### `Added`
 
@@ -27,7 +80,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 ### `Deprecated`
 
-## v2.3.4 - 2021-05-05
+## [2.3.4] - 2021-05-05
 
 ### `Added`
 
@@ -48,7 +101,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 
 ### `Deprecated`
 
-## v2.3.3 - 2021-04-08
+## [2.3.3] - 2021-04-08
 
 ### `Added`
 

diff --git a/Dockerfile b/Dockerfile
@@ -7,7 +7,7 @@ COPY environment.yml /
 RUN conda env create --quiet -f /environment.yml && conda clean -a
 
 # Add conda installation dir to PATH (instead of doing 'conda activate')
-ENV PATH /opt/conda/envs/nf-core-eager-2.3.5/bin:$PATH
+ENV PATH /opt/conda/envs/nf-core-eager-2.4.0/bin:$PATH
 
 # Dump the details of the installed packages to a file for posterity
-RUN conda env export --name nf-core-eager-2.3.5 > nf-core-eager-2.3.5.yml
+RUN conda env export --name nf-core-eager-2.4.0 > nf-core-eager-2.4.0.yml
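The Dockerfile hard-codes the conda environment name `nf-core-eager-<version>` in two places, so every release bump has to touch both lines (and, presumably, the `name:` field of `environment.yml`, whose diff is not shown here). A hypothetical release-check helper in Python that confirms every embedded version agrees; `env_versions` and `check_release_version` are illustrative names, not part of the pipeline:

```python
import re

# Matches the version part of an env name such as nf-core-eager-2.4.0
ENV_NAME = re.compile(r"nf-core-eager-(\d+\.\d+\.\d+)")


def env_versions(dockerfile_text):
    """Collect every version embedded in an nf-core-eager-<version> name."""
    return set(ENV_NAME.findall(dockerfile_text))


def check_release_version(dockerfile_text, release):
    """True only if all env names in the Dockerfile use the release version."""
    return env_versions(dockerfile_text) == {release}


dockerfile_tail = """\
ENV PATH /opt/conda/envs/nf-core-eager-2.4.0/bin:$PATH
RUN conda env export --name nf-core-eager-2.4.0 > nf-core-eager-2.4.0.yml
"""

print(check_release_version(dockerfile_tail, "2.4.0"))
```

A half-finished bump (one line still on the old version) yields two versions in the set, so the check fails.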
diff --git a/README.md b/README.md
@@ -7,6 +7,7 @@
 [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.07.1-brightgreen.svg)](https://www.nextflow.io/)
 [![nf-core](https://img.shields.io/badge/nf--core-pipeline-brightgreen.svg)](https://nf-co.re/)
 [![DOI](https://zenodo.org/badge/135918251.svg)](https://zenodo.org/badge/latestdoi/135918251)
+[![Published in PeerJ](https://img.shields.io/badge/peerj-published-%2300B2FF)](https://peerj.com/articles/10947/)
 
 [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/)
 [![Docker](https://img.shields.io/docker/automated/nfcore/eager.svg)](https://hub.docker.com/r/nfcore/eager)
@@ -34,7 +35,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 3. Download the pipeline and test it on a minimal dataset with a single command:
 
     ```bash
-    nextflow run nf-core/eager -profile test_tsv,<docker/singularity/podman/shifter/charliecloud/conda/institute>
+    nextflow run nf-core/eager -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
     ```
 
     > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
@@ -65,7 +66,7 @@ By default the pipeline currently performs the following:
 
 * Create reference genome indices for mapping (`bwa`, `samtools`, and `picard`)
 * Sequencing quality control (`FastQC`)
-* Sequencing adapter removal and for paired end data merging (`AdapterRemoval`)
+* Sequencing adapter removal and paired-end data merging (`AdapterRemoval`)
 * Read mapping to reference using (`bwa aln`, `bwa mem`, `CircularMapper`, or `bowtie2`)
 * Post-mapping processing, statistics and conversion to bam (`samtools`)
 * Ancient DNA C-to-T damage pattern visualisation (`DamageProfiler`)
@@ -85,6 +86,7 @@ Additional functionality contained by the pipeline currently includes:
 #### Preprocessing
 
 * Illumina two-coloured sequencer poly-G tail removal (`fastp`)
+* Post-AdapterRemoval trimming of FASTQ files prior to mapping (`fastp`)
 * Automatic conversion of unmapped reads to FASTQ (`samtools`)
 * Host DNA (mapped reads) stripping from input FASTQ files (for sensitive samples)
 
@@ -160,17 +162,22 @@ Those who have provided conceptual guidance, suggestions, bug reports etc.
 
 * [Alexandre Gilardet](https://github.com/alexandregilardet)
 * Arielle Munters
-* [Charles Plessy](https://github.com/charles-plessy)
 * [Åshild Vågene](https://github.com/ashildv)
+* [Charles Plessy](https://github.com/charles-plessy)
+* [Elina Salmela](https://github.com/esalmela)
 * [Hester van Schalkwyk](https://github.com/hesterjvs)
 * [Ido Bar](https://github.com/IdoBar)
 * [Irina Velsko](https://github.com/ivelsko)
+* [Işın Altınkaya](https://github.com/isinaltinkaya)
+* [Johan Nylander](https://github.com/nylander)
 * [Katerine Eaton](https://github.com/ktmeaton)
+* [Katrin Nägele](https://github.com/KathrinNaegele)
 * [Luc Venturini](https://github.com/lucventurini)
 * [Marcel Keller](https://github.com/marcel-keller)
 * [Pierre Lindenbaum](https://github.com/lindenb)
 * [Pontus Skoglund](https://github.com/pontussk)
 * [Raphael Eisenhofer](https://github.com/EisenRa)
+* [Roberta Davidson](https://github.com/roberta-davidson)
 * [Torsten Günter](https://bitbucket.org/tguenther/)
 * [Kevin Lord](https://github.com/lordkev)
 * [He Yu](https://github.com/paulayu)

diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml
@@ -25,6 +25,7 @@ run_modules:
     - samtools
     - sexdeterrmine
     - hops
+    - bcftools
 
 extra_fn_clean_exts:
     - '_fastp'
@@ -60,13 +61,13 @@ extra_fn_clean_exts:
 
 top_modules:
     - 'fastqc':
-       name: 'FastQC (pre-AdapterRemoval)'
+       name: 'FastQC (pre-Trimming)'
        path_filters:
            - '*_raw_fastqc.zip'
     - 'fastp'
     - 'adapterRemoval'
     - 'fastqc':
-       name: 'FastQC (post-AdapterRemoval)'
+       name: 'FastQC (post-Trimming)'
        path_filters:
             - '*.truncated_fastqc.zip'
             - '*.combined*_fastqc.zip'
@@ -86,11 +87,14 @@ top_modules:
             - '*_postfilterflagstat.stats'
     - 'dedup'
     - 'picard'
-    - 'preseq'
+    - 'preseq':
+       path_filters:
+           - '*.preseq'
     - 'damageprofiler'
     - 'mtnucratio'
     - 'qualimap'
     - 'sexdeterrmine'
+    - 'bcftools'
     - 'multivcfanalyzer':
        path_filters:
            - '*MultiVCFAnalyzer.json'
@@ -106,7 +110,7 @@ remove_sections:
   - sexdeterrmine-snps
 
 table_columns_visible:
-    FastQC (pre-AdapterRemoval):
+    FastQC (pre-Trimming):
         percent_duplicates: False
         percent_gc: True
         avg_sequence_length: True
@@ -117,7 +121,7 @@ table_columns_visible:
     Adapter Removal:
         aligned_total: False
         percent_aligned: True
-    FastQC (post-AdapterRemoval):
+    FastQC (post-Trimming):
         avg_sequence_length: True
         percent_duplicates: False
         total_sequences: True
@@ -180,15 +184,15 @@ table_columns_visible:
         Total_Snps: False
 
 table_columns_placement:
-    FastQC (pre-AdapterRemoval):
+    FastQC (pre-Trimming):
         total_sequences: 100
         avg_sequence_length: 110
         percent_gc: 120
     fastp:
         after_filtering_gc_content: 200
     Adapter Removal:
         percent_aligned: 300
-    FastQC (post-AdapterRemoval): 
+    FastQC (post-Trimming):
         total_sequences: 400
         avg_sequence_length: 410
         percent_gc: 420
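The MultiQC config changes above give the `preseq` module an explicit `path_filters` entry and rename the two FastQC sections. A small Python sketch of how glob-style path filters route files to report sections, assuming MultiQC matches `path_filters` as shell-style globs against file names; the file names below are hypothetical:

```python
from fnmatch import fnmatchcase

# Subset of the top_modules path_filters from assets/multiqc_config.yaml
PATH_FILTERS = {
    "FastQC (pre-Trimming)": ["*_raw_fastqc.zip"],
    "FastQC (post-Trimming)": ["*.truncated_fastqc.zip", "*.combined*_fastqc.zip"],
    "preseq": ["*.preseq"],
}


def sections_for(filename):
    """Return every report section whose glob patterns match the file name."""
    return [
        section
        for section, patterns in PATH_FILTERS.items()
        if any(fnmatchcase(filename, pattern) for pattern in patterns)
    ]


# Hypothetical output file names, for illustration only
for name in ("S1_raw_fastqc.zip", "S1.combined.fq_fastqc.zip", "S1.preseq", "S1.log"):
    print(name, "->", sections_for(name))
```

Because the section name doubles as a key in `table_columns_visible` and `table_columns_placement`, the rename has to be applied in all of those places, as this diff does.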

diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py
@@ -37,7 +37,8 @@
     'kraken':['v_kraken.txt', r"Kraken version (\S+)"],
     'eigenstrat_snp_coverage':['v_eigenstrat_snp_coverage.txt',r"(\S+)"],
     'mapDamage2':['v_mapdamage.txt',r"(\S+)"],
-    'bbduk':['v_bbduk.txt',r"(.*)"]
+    'bbduk':['v_bbduk.txt',r"(.*)"],
+    'bcftools':['v_bcftools.txt',r"(\S+)"]
 }
 
 results = OrderedDict()
@@ -75,6 +76,7 @@
 results['eigenstrat_snp_coverage'] = '<span style="color:#999999;\">N/A</span>'
 results['mapDamage2'] = '<span style="color:#999999;\">N/A</span>'
 results['bbduk'] = '<span style="color:#999999;\">N/A</span>'
+results['bcftools'] = '<span style="color:#999999;\">N/A</span>'
 
 # Search each file using its regex
 for k, v in regexes.items():
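The CONTRIBUTING step on `scrape_software_versions.py` and the new `bcftools` entry above follow the same pattern: each tool maps to a version file plus a regex with one capture group, and tools without a match keep the grey `N/A` placeholder. A minimal sketch of that lookup working on in-memory strings instead of files; `scrape_versions` and the `v` prefix are assumptions for illustration. It also shows why the version file should hold only the bare version: `(\S+)` captures the first non-whitespace token, so raw `bcftools --version` output would yield `bcftools` rather than `1.12`.

```python
import re
from collections import OrderedDict

# Same shape as the `regexes` dict in bin/scrape_software_versions.py:
# tool name -> [version file written by the pipeline, regex with one capture group]
regexes = {
    "bbduk": ["v_bbduk.txt", r"(.*)"],
    "bcftools": ["v_bcftools.txt", r"(\S+)"],
}

NA = '<span style="color:#999999;">N/A</span>'


def scrape_versions(file_contents):
    """Map each tool to a version string, defaulting to the N/A placeholder.

    `file_contents` maps a version-file name to its text (a stand-in for
    reading the files from disk).
    """
    results = OrderedDict((tool, NA) for tool in regexes)
    for tool, (fname, pattern) in regexes.items():
        text = file_contents.get(fname)
        if text is None:
            continue  # file missing: keep N/A
        match = re.search(pattern, text)
        if match:
            results[tool] = "v{}".format(match.group(1))
    return results


print(scrape_versions({"v_bcftools.txt": "1.12\n"}))
```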

diff --git a/conf/test.config b/conf/test.config
@@ -4,12 +4,11 @@
  * -------------------------------------------------
  * Defines bundled input files and everything required
  * to run a fast and simple test. Use as follows:
- *   nextflow run nf-core/eager -profile test,<docker/singularity>
+ * nextflow run nf-core/eager -profile test,<docker/singularity/conda>
  */
 
 includeConfig 'test_resources.config'
 
-
 params {
   config_profile_name = 'Test profile'
   config_profile_description = 'Minimal test dataset to check pipeline function'
@@ -19,9 +18,7 @@ params {
   max_time = 48.h
   genome = false
   //Input data
-  single_end = false
+  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv'
   // Genome references
   fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'
-  // Ignore `--input` as otherwise the parameter validation will throw an error
-  schema_ignore_params = 'genomes,input_paths,input'
 }
diff --git a/conf/test_tsv.config b/conf/test_direct.config
@@ -4,11 +4,12 @@
  * -------------------------------------------------
  * Defines bundled input files and everything required
  * to run a fast and simple test. Use as follows:
- * nextflow run nf-core/eager -profile test, docker (or singularity, or conda)
+ *   nextflow run nf-core/eager -profile test,<docker/singularity>
  */
 
 includeConfig 'test_resources.config'
 
+
 params {
   config_profile_name = 'Test profile'
   config_profile_description = 'Minimal test dataset to check pipeline function'
@@ -18,7 +19,9 @@ params {
   max_time = 48.h
   genome = false
   //Input data
-  input = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/testdata/Mammoth/mammoth_design_fastq.tsv'
+  single_end = false
   // Genome references
   fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Mammoth/Mammoth_MT_Krause.fasta'
+  // Ignore `--input` as otherwise the parameter validation will throw an error
+  schema_ignore_params = 'genomes,input_paths,input'
 }