nf-core · drpatelh · Feb 1, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,7 +3,13 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## [Unpublished Version / DEV]
+## [[1.12.0](https://github.com/nf-core/fetchngs/releases/tag/1.12.0)] - 2024-02-02
+
+### :warning: Major enhancements
+
+- The Aspera CLI was recently added to [Bioconda](https://anaconda.org/bioconda/aspera-cli) and we have added it as another way of downloading FastQ files on top of the existing FTP and sra-tools support. In our limited benchmarks on all public clouds we found a ~50% speed-up in download times compared to FTP! We are not aware of any obvious downsides and have made this the default download method in the pipeline. You can, however, revert to using FTP and sra-tools using the `--force_ftp_download` and `--force_sratools_download` parameters, respectively. We would love to have your feedback!
+- Support for Synapse ids has been dropped in this release. We haven't had any feedback from users on whether it is being used. Users can run earlier versions of the pipeline if required.
+- We have significantly refactored and standardised the way we are using nf-test within this pipeline. This pipeline is now the current, best-practice implementation for nf-test usage on nf-core. This required a number of features to be added to nf-test, so a huge shoutout to [Lukas Forer](https://github.com/lukfor) for entertaining our requests and implementing them upstream :heart:!
 
 ### Credits
 
@@ -12,6 +18,7 @@ Special thanks to the following for their contributions to the release:
 - [Adam Talbot](https://github.com/adamrtalbot)
 - [Alexandru Mizeranschi](https://github.com/nicolae06)
 - [Alexander Blaessle](https://github.com/alexblaessle)
+- [Lukas Forer](https://github.com/lukfor)
 - [Maxime Garcia](https://github.com/maxulysse)
 - [Sebastian Uhrig](https://github.com/suhrig)
 
@@ -34,6 +41,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 - [PR #261](https://github.com/nf-core/fetchngs/pull/261) - Revert sratools fasterqdump version ([#221](https://github.com/nf-core/fetchngs/issues/221))
 - [PR #262](https://github.com/nf-core/fetchngs/pull/262) - Use nf-test version v0.8.4 and remove implicit tags
 - [PR #263](https://github.com/nf-core/fetchngs/pull/263) - Refine tags used for workflows
+- [PR #264](https://github.com/nf-core/fetchngs/pull/264) - Remove synapse workflow from pipeline
 
 ### Software dependencies
 
@@ -49,9 +57,11 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
 
 ### Parameters
 
-| Old parameter | New parameter          |
-| ------------- | ---------------------- |
-|               | `--force_ftp_download` |
+| Old parameter      | New parameter          |
+| ------------------ | ---------------------- |
+|                    | `--force_ftp_download` |
+| `--input_type`     |                        |
+| `--synapse_config` |                        |
 
 > **NB:** Parameter has been **updated** if both old and new parameter information is present.
 > **NB:** Parameter has been **added** if just the new parameter information is present.
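
The parameter table above lists `--input_type` and `--synapse_config` with no replacement, i.e. they are removed in this release. When upgrading existing launch scripts, it may be worth checking for them first. The following is a minimal sketch only: `REMOVED_PARAMS` and `find_removed_params` are hypothetical names for illustration and are not part of the pipeline.

```python
# Hypothetical helper, not part of nf-core/fetchngs: scans a command line
# for parameters that the table above marks as removed in 1.12.0.
REMOVED_PARAMS = ("--input_type", "--synapse_config")


def find_removed_params(argv):
    """Return the removed parameters found in argv, in order of first use."""
    found = []
    for arg in argv:
        # Accept both "--name value" and "--name=value" spellings.
        name = arg.split("=", 1)[0]
        if name in REMOVED_PARAMS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    old_command = [
        "nextflow", "run", "nf-core/fetchngs",
        "--input", "ids.csv",
        "--input_type", "synapse",
        "--synapse_config", "synapse.cfg",
    ]
    print(find_removed_params(old_command))
```

Matching is on the exact parameter name, so `--input` is not confused with `--input_type`.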

diff --git a/README.md b/README.md
@@ -72,13 +72,6 @@ Via a single file of ids, provided one-per-line (see [example input file](https:
    - Otherwise use [`sra-tools`](https://github.com/ncbi/sra-tools) to download `.sra` files and convert them to FastQ. Use `--force_sratools_download` to force this behaviour.
 4. Collate id metadata and paths to FastQ files in a single samplesheet
 
-### Synapse ids
-
-1. Resolve Synapse directory ids to their corresponding FastQ files ids via the `synapse list` command.
-2. Retrieve FastQ file metadata including FastQ file names, md5sums, etags, annotations and other data provenance via the `synapse show` command.
-3. Download FastQ files in parallel via `synapse get`
-4. Collate paths to FastQ files in a single samplesheet
-
 ## Pipeline output
 
 The columns in the output samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines (see [usage docs](https://nf-co.re/fetchngs/usage#samplesheet-format)), these currently include:

diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -9,7 +9,7 @@
         "properties": {
             "": {
                 "type": "string",
-                "pattern": "^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM])|(syn))(\\d+)$",
+                "pattern": "^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM]))(\\d+)$",
                 "errorMessage": "Please provide a valid SRA, ENA, DDBJ or GEO identifier"
             }
         }
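
To illustrate the effect of dropping `(syn)` from the pattern above, here is a minimal sketch that applies the new expression to a few of the example identifiers from `docs/usage.md`. It assumes the JSON-escaped `\\d` corresponds to a plain regex `\d`; the helper names `is_supported_id` and `split_ids` are hypothetical, and the pipeline itself validates input through its schema rather than through code like this.

```python
import re

# New pattern from assets/schema_input.json, with the JSON escaping (\\d)
# written as a plain regex \d. The "(syn)" alternative is no longer present.
ID_PATTERN = re.compile(
    r"^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM]))(\d+)$"
)


def is_supported_id(identifier: str) -> bool:
    """True if the identifier matches the SRA / ENA / DDBJ / GEO pattern."""
    return ID_PATTERN.match(identifier) is not None


def split_ids(identifiers):
    """Split identifiers (one per line) into (supported, rejected) lists."""
    supported, rejected = [], []
    for identifier in identifiers:
        cleaned = identifier.strip()
        if not cleaned:
            continue  # skip blank lines
        (supported if is_supported_id(cleaned) else rejected).append(cleaned)
    return supported, rejected


if __name__ == "__main__":
    ids = ["SRR11605097", "SAMEA6638373", "PRJDB4176", "GSE147507", "syn26240435"]
    supported, rejected = split_ids(ids)
    print("supported:", supported)
    print("rejected:", rejected)
```

With the pattern as changed in this diff, a Synapse id such as `syn26240435` falls into the rejected list, while the SRA, ENA, DDBJ and GEO examples are accepted.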

diff --git a/conf/test_synapse.config b/conf/test_synapse.config
diff --git a/docs/output.md b/docs/output.md
@@ -8,9 +8,7 @@ This document describes the output produced by the pipeline. The directories lis
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data depending on the type of ids provided:
 
-- Download FastQ files and create samplesheet from:
-  1. [SRA / ENA / DDBJ / GEO ids](#sra--ena--ddbj--geo-ids)
-  2. [Synapse ids](#synapse-ids)
+- Download FastQ files and create samplesheet from [SRA / ENA / DDBJ / GEO ids](#sra--ena--ddbj--geo-ids)
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
 Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introduction) for a list of supported public repository identifiers and how to provide them to the pipeline.
@@ -36,27 +34,6 @@ Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introductio
 
 The final sample information for all identifiers is obtained from the ENA which provides direct download links for FastQ files as well as their associated md5 sums. If download links exist, the files will be downloaded in parallel by FTP. Otherwise they are downloaded using sra-tools.
 
-### Synapse ids
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `fastq/`
-  - `*.fastq.gz`: Paired-end/single-end reads downloaded from Synapse.
-- `fastq/md5/`
-  - `*.md5`: Files containing `md5` sum for FastQ files downloaded from the Synapse platform.
-- `samplesheet/`
-  - `samplesheet.csv`: Auto-created samplesheet with collated metadata and paths to downloaded FastQ files.
-- `metadata/`
-  - `*.metadata.txt`: Original metadata file generated using the `synapse show` command.
-  - `*.list.txt`: Original output of the `synapse list` command, containing the Synapse ids, file version numbers, file names, and other file-specific data for the Synapse directory ID provided.
-
-</details>
-
-FastQ files and corresponding sample information for `Synapse` identifiers are downloaded in parallel directly from the [Synapse](https://www.synapse.org/#) platform. A [configuration file](http://python-docs.synapse.org/build/html/Credentials.html#use-synapseconfig) containing valid login credentials is required for Synapse downloads.
-
-The final sample information for the FastQ files downloaded from `Synapse` is obtained from the file name itself. The file names are parsed according to the glob pattern `*{1,2}*`. This returns the sample name, presumed to be the longest possible string matching the glob pattern, with the fewest number of wildcard insertions. Further information on sample name parsing can be found in the [usage documentation](https://nf-co.re/fetchngs/usage#introduction).
-
 ### Pipeline information
 
 <details markdown="1">

diff --git a/docs/usage.md b/docs/usage.md
@@ -8,15 +8,15 @@
 
 The pipeline has been set-up to automatically download and process the raw FastQ files from both public and private repositories. Identifiers can be provided in a file, one-per-line via the `--input` parameter. Currently, the following types of example identifiers are supported:
 
-| `SRA`        | `ENA`        | `DDBJ`       | `GEO`      | `Synapse`   |
-| ------------ | ------------ | ------------ | ---------- | ----------- |
-| SRR11605097  | ERR4007730   | DRR171822    | GSM4432381 | syn26240435 |
-| SRX8171613   | ERX4009132   | DRX162434    | GSE147507  |             |
-| SRS6531847   | ERS4399630   | DRS090921    |            |             |
-| SAMN14689442 | SAMEA6638373 | SAMD00114846 |            |             |
-| SRP256957    | ERP120836    | DRP004793    |            |             |
-| SRA1068758   | ERA2420837   | DRA008156    |            |             |
-| PRJNA625551  | PRJEB37513   | PRJDB4176    |            |             |
+| `SRA`        | `ENA`        | `DDBJ`       | `GEO`      |
+| ------------ | ------------ | ------------ | ---------- |
+| SRR11605097  | ERR4007730   | DRR171822    | GSM4432381 |
+| SRX8171613   | ERX4009132   | DRX162434    | GSE147507  |
+| SRS6531847   | ERS4399630   | DRS090921    |            |
+| SAMN14689442 | SAMEA6638373 | SAMD00114846 |            |
+| SRP256957    | ERP120836    | DRP004793    |            |
+| SRA1068758   | ERA2420837   | DRA008156    |            |
+| PRJNA625551  | PRJEB37513   | PRJDB4176    |            |
 
 ### SRR / ERR / DRR ids
 
@@ -34,25 +34,6 @@ If you have a GEO accession (found in the data availability section of published
 
 This downloads a text file called `SRR_Acc_List.txt` that can be directly provided to the pipeline once renamed with a .csv extension e.g. `--input SRR_Acc_List.csv`.
 
-### Synapse ids
-
-[Synapse](https://www.synapse.org/#) is a collaborative research platform created by [Sage Bionetworks](https://sagebionetworks.org/). Its aim is to promote reproducible research and responsible data sharing throughout the biomedical community. To download data from `Synapse`, the Synapse id of the _directory_ containing all files to be downloaded should be provided. The Synapse id should be an eleven-characters beginning with `syn`.
-
-This Synapse id will then be resolved to the Synapse id of the corresponding FastQ files contained within the directory. The individual FastQ files are then downloaded in parellel using the `synapse get` command. All Synapse metadata, annotations and data provenance are also downloaded using the `synapse show` command, and are outputted to a separate metadata file. By default, only the md5sums, file sizes, etags, Synapse ids, file names, and file versions are shown.
-
-In order to download data from Synapse, an account must be created and a user configuration file provided via the parameter `--synapse_config`. For more information about Synapse configuration, please see the [Synapse client configuration](https://help.synapse.org/docs/Client-Configuration.1985446156.html) documentation.
-
-The final sample information for the FastQ files used for samplesheet generation is obtained from the file name itself. The file names are parsed according to the glob pattern `*{1,2}*`, which returns the sample name, presumed to be the longest possible string matching the glob pattern, with the fewest number of wildcard insertions.
-
-<details markdown="1">
-<summary>Supported File Names</summary>
-
-- Files named `SRR493366_1.fastq` and `SRR493366_2.fastq` will have a sample name of `SRR493366`
-- Files named `SRR_493_367_1.fastq` and `SRR_493_367_2.fastq` will have a sample name of `SRR_493_367`
-- Files named `filename12_1.fastq` and `filename12_2.fastq` will have a sample name of `filename12`
-
-</details>
-
 ### Samplesheet format
 
 As a bonus, the columns in the auto-created samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines, these currently include:

diff --git a/main.nf b/main.nf
@@ -17,8 +17,7 @@ nextflow.enable.dsl = 2
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
 
-if (params.input_type == 'sra')     include { SRA     } from './workflows/sra'
-if (params.input_type == 'synapse') include { SYNAPSE } from './workflows/synapse'
+include { SRA } from './workflows/sra'
 
 //
 // WORKFLOW: Run main nf-core/fetchngs analysis pipeline depending on type of identifier provided
@@ -33,15 +32,7 @@ workflow NFCORE_FETCHNGS {
     //
     // WORKFLOW: Download FastQ files for SRA / ENA / GEO / DDBJ ids
     //
-    if (params.input_type == 'sra') {
-        SRA ( ids )
-
-    //
-    // WORKFLOW: Download FastQ files for Synapse ids
-    //
-    } else if (params.input_type == 'synapse') {
-        SYNAPSE ( ids )
-    }
+    SRA ( ids )
 
 }
 
@@ -69,7 +60,6 @@ workflow {
         params.monochrome_logs,
         params.outdir,
         params.input,
-        params.input_type,
         params.ena_metadata_fields
     )
 
@@ -84,7 +74,6 @@ workflow {
     // SUBWORKFLOW: Run completion tasks
     //
     PIPELINE_COMPLETION (
-        params.input_type,
         params.email,
         params.email_on_fail,
         params.plaintext_email,

diff --git a/modules/local/synapse_get/main.nf b/modules/local/synapse_get/main.nf
diff --git a/modules/local/synapse_get/nextflow.config b/modules/local/synapse_get/nextflow.config
diff --git a/modules/local/synapse_list/main.nf b/modules/local/synapse_list/main.nf
diff --git a/modules/local/synapse_list/nextflow.config b/modules/local/synapse_list/nextflow.config
diff --git a/modules/local/synapse_merge_samplesheet/main.nf b/modules/local/synapse_merge_samplesheet/main.nf
diff --git a/modules/local/synapse_merge_samplesheet/nextflow.config b/modules/local/synapse_merge_samplesheet/nextflow.config