Remove synapse workflow from pipeline #264

Merged 6 commits on Feb 1, 2024
18 changes: 14 additions & 4 deletions CHANGELOG.md
@@ -3,7 +3,13 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unpublished Version / DEV]
## [[1.12.0](https://github.com/nf-core/fetchngs/releases/tag/1.12.0)] - 2024-02-02

### :warning: Major enhancements

- The Aspera CLI was recently added to [Bioconda](https://anaconda.org/bioconda/aspera-cli) and we have added it as another way of downloading FastQ files on top of the existing FTP and sra-tools support. In our limited benchmarks on all public clouds we found a ~50% speed-up in download times compared to FTP! We are not aware of any obvious downsides and have made this the default download method in the pipeline. You can, however, revert to FTP or sra-tools with the `--force_ftp_download` and `--force_sratools_download` parameters, respectively. We would love to have your feedback!
- Support for Synapse ids has been dropped in this release. We have not had any feedback from users on whether it is being used. Users can run earlier versions of the pipeline if required.
- We have significantly refactored and standardised the way we are using nf-test within this pipeline. This pipeline is now the current, best-practice implementation for nf-test usage on nf-core. We required a number of features to be added to nf-test and a huge shoutout to [Lukas Forer](https://github.com/lukfor) for entertaining our requests and implementing them within upstream :heart:!

### Credits

@@ -12,6 +18,7 @@ Special thanks to the following for their contributions to the release:
- [Adam Talbot](https://github.com/adamrtalbot)
- [Alexandru Mizeranschi](https://github.com/nicolae06)
- [Alexander Blaessle](https://github.com/alexblaessle)
- [Lukas Forer](https://github.com/lukfor)
- [Maxime Garcia](https://github.com/maxulysse)
- [Sebastian Uhrig](https://github.com/suhrig)

@@ -34,6 +41,7 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements
- [PR #261](https://github.com/nf-core/fetchngs/pull/261) - Revert sratools fasterqdump version ([#221](https://github.com/nf-core/fetchngs/issues/221))
- [PR #262](https://github.com/nf-core/fetchngs/pull/262) - Use nf-test version v0.8.4 and remove implicit tags
- [PR #263](https://github.com/nf-core/fetchngs/pull/263) - Refine tags used for workflows
- [PR #264](https://github.com/nf-core/fetchngs/pull/264) - Remove synapse workflow from pipeline

### Software dependencies

@@ -49,9 +57,11 @@ Thank you to everyone else that has contributed by reporting bugs, enhancements

### Parameters

| Old parameter | New parameter |
| ------------- | ---------------------- |
| | `--force_ftp_download` |
| Old parameter | New parameter |
| ------------------ | ---------------------- |
| | `--force_ftp_download` |
| `--input_type` | |
| `--synapse_config` | |

> **NB:** Parameter has been **updated** if both old and new parameter information is present.
> **NB:** Parameter has been **added** if just the new parameter information is present.
7 changes: 0 additions & 7 deletions README.md
@@ -72,13 +72,6 @@ Via a single file of ids, provided one-per-line (see [example input file](https:
- Otherwise use [`sra-tools`](https://github.com/ncbi/sra-tools) to download `.sra` files and convert them to FastQ. Use `--force_sratools_download` to force this behaviour.
4. Collate id metadata and paths to FastQ files in a single samplesheet

### Synapse ids

1. Resolve Synapse directory ids to their corresponding FastQ files ids via the `synapse list` command.
2. Retrieve FastQ file metadata including FastQ file names, md5sums, etags, annotations and other data provenance via the `synapse show` command.
3. Download FastQ files in parallel via `synapse get`
4. Collate paths to FastQ files in a single samplesheet

## Pipeline output

The columns in the output samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines (see [usage docs](https://nf-co.re/fetchngs/usage#samplesheet-format)), these currently include:
2 changes: 1 addition & 1 deletion assets/schema_input.json
@@ -9,7 +9,7 @@
"properties": {
"": {
"type": "string",
"pattern": "^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM])|(syn))(\\d+)$",
"pattern": "^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM]))(\\d+)$",
"errorMessage": "Please provide a valid SRA, ENA, DDBJ or GEO identifier"
}
}
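
The effect of the pattern change can be checked outside the pipeline. The sketch below is not part of the PR: it copies the old and new patterns from the diff above into Python's `re` module (the JSON-escaped `\\d` becomes `\d`), and `is_valid_id` is a hypothetical helper name used only for illustration. It assumes the schema validator treats the pattern like an ordinary anchored regular expression.

```python
import re

# Patterns copied from the assets/schema_input.json diff; JSON "\\d" is regex "\d".
OLD_PATTERN = re.compile(
    r"^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM])|(syn))(\d+)$"
)
NEW_PATTERN = re.compile(
    r"^(((SR|ER|DR)[APRSX])|(SAM(N|EA|D))|(PRJ(NA|EB|DB))|(GS[EM]))(\d+)$"
)


def is_valid_id(identifier: str, pattern: "re.Pattern[str]" = NEW_PATTERN) -> bool:
    """Return True if the identifier matches the given schema pattern.

    Hypothetical helper for illustration; the pipeline validates ids
    through its JSON schema, not through this function.
    """
    return pattern.match(identifier) is not None


if __name__ == "__main__":
    # Example ids taken from the table in docs/usage.md.
    for example in ("SRR11605097", "SAMEA6638373", "PRJDB4176", "GSE147507", "syn26240435"):
        print(example, is_valid_id(example), is_valid_id(example, OLD_PATTERN))
```

Under these assumptions, a Synapse id such as `syn26240435` passes the old pattern but is rejected by the new one, while the SRA, ENA, DDBJ and GEO examples pass both.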
25 changes: 0 additions & 25 deletions conf/test_synapse.config

This file was deleted.

25 changes: 1 addition & 24 deletions docs/output.md
@@ -8,9 +8,7 @@ This document describes the output produced by the pipeline. The directories lis

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data depending on the type of ids provided:

- Download FastQ files and create samplesheet from:
1. [SRA / ENA / DDBJ / GEO ids](#sra--ena--ddbj--geo-ids)
2. [Synapse ids](#synapse-ids)
- Download FastQ files and create samplesheet from [SRA / ENA / DDBJ / GEO ids](#sra--ena--ddbj--geo-ids)
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introduction) for a list of supported public repository identifiers and how to provide them to the pipeline.
@@ -36,27 +34,6 @@ Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introductio

The final sample information for all identifiers is obtained from the ENA which provides direct download links for FastQ files as well as their associated md5 sums. If download links exist, the files will be downloaded in parallel by FTP. Otherwise they are downloaded using sra-tools.

### Synapse ids

<details markdown="1">
<summary>Output files</summary>

- `fastq/`
- `*.fastq.gz`: Paired-end/single-end reads downloaded from Synapse.
- `fastq/md5/`
- `*.md5`: Files containing `md5` sum for FastQ files downloaded from the Synapse platform.
- `samplesheet/`
- `samplesheet.csv`: Auto-created samplesheet with collated metadata and paths to downloaded FastQ files.
- `metadata/`
- `*.metadata.txt`: Original metadata file generated using the `synapse show` command.
- `*.list.txt`: Original output of the `synapse list` command, containing the Synapse ids, file version numbers, file names, and other file-specific data for the Synapse directory ID provided.

</details>

FastQ files and corresponding sample information for `Synapse` identifiers are downloaded in parallel directly from the [Synapse](https://www.synapse.org/#) platform. A [configuration file](http://python-docs.synapse.org/build/html/Credentials.html#use-synapseconfig) containing valid login credentials is required for Synapse downloads.

The final sample information for the FastQ files downloaded from `Synapse` is obtained from the file name itself. The file names are parsed according to the glob pattern `*{1,2}*`. This returns the sample name, presumed to be the longest possible string matching the glob pattern, with the fewest number of wildcard insertions. Further information on sample name parsing can be found in the [usage documentation](https://nf-co.re/fetchngs/usage#introduction).

### Pipeline information

<details markdown="1">
37 changes: 9 additions & 28 deletions docs/usage.md
@@ -8,15 +8,15 @@

The pipeline has been set-up to automatically download and process the raw FastQ files from both public and private repositories. Identifiers can be provided in a file, one-per-line via the `--input` parameter. Currently, the following types of example identifiers are supported:

| `SRA` | `ENA` | `DDBJ` | `GEO` | `Synapse` |
| ------------ | ------------ | ------------ | ---------- | ----------- |
| SRR11605097 | ERR4007730 | DRR171822 | GSM4432381 | syn26240435 |
| SRX8171613 | ERX4009132 | DRX162434 | GSE147507 | |
| SRS6531847 | ERS4399630 | DRS090921 | | |
| SAMN14689442 | SAMEA6638373 | SAMD00114846 | | |
| SRP256957 | ERP120836 | DRP004793 | | |
| SRA1068758 | ERA2420837 | DRA008156 | | |
| PRJNA625551 | PRJEB37513 | PRJDB4176 | | |
| `SRA` | `ENA` | `DDBJ` | `GEO` |
| ------------ | ------------ | ------------ | ---------- |
| SRR11605097 | ERR4007730 | DRR171822 | GSM4432381 |
| SRX8171613 | ERX4009132 | DRX162434 | GSE147507 |
| SRS6531847 | ERS4399630 | DRS090921 | |
| SAMN14689442 | SAMEA6638373 | SAMD00114846 | |
| SRP256957 | ERP120836 | DRP004793 | |
| SRA1068758 | ERA2420837 | DRA008156 | |
| PRJNA625551 | PRJEB37513 | PRJDB4176 | |

### SRR / ERR / DRR ids

@@ -34,25 +34,6 @@ If you have a GEO accession (found in the data availability section of published

This downloads a text file called `SRR_Acc_List.txt` that can be directly provided to the pipeline once renamed with a .csv extension e.g. `--input SRR_Acc_List.csv`.
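
The renaming step described above can be scripted. The following is a minimal sketch, not something shipped with the pipeline: `csv_name`, `clean_ids` and `prepare_input` are hypothetical helper names, and dropping blank lines is an added assumption rather than a documented requirement.

```python
from pathlib import Path
from typing import List


def csv_name(path: str) -> Path:
    """Return the same path with its extension replaced by .csv."""
    return Path(path).with_suffix(".csv")


def clean_ids(text: str) -> List[str]:
    """Split text into one id per line, dropping blank lines and stray whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def prepare_input(src: str) -> Path:
    """Write a cleaned copy of the accession list next to it with a .csv extension."""
    dest = csv_name(src)
    ids = clean_ids(Path(src).read_text())
    dest.write_text("\n".join(ids) + "\n")
    return dest
```

Calling `prepare_input("SRR_Acc_List.txt")` would write `SRR_Acc_List.csv` in the same directory, which can then be passed to the pipeline via `--input SRR_Acc_List.csv`.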

### Synapse ids

[Synapse](https://www.synapse.org/#) is a collaborative research platform created by [Sage Bionetworks](https://sagebionetworks.org/). Its aim is to promote reproducible research and responsible data sharing throughout the biomedical community. To download data from `Synapse`, the Synapse id of the _directory_ containing all files to be downloaded should be provided. The Synapse id should be an eleven-character string beginning with `syn`.

This Synapse id will then be resolved to the Synapse ids of the corresponding FastQ files contained within the directory. The individual FastQ files are then downloaded in parallel using the `synapse get` command. All Synapse metadata, annotations and data provenance are also downloaded using the `synapse show` command, and are written to a separate metadata file. By default, only the md5sums, file sizes, etags, Synapse ids, file names, and file versions are shown.

In order to download data from Synapse, an account must be created and a user configuration file provided via the parameter `--synapse_config`. For more information about Synapse configuration, please see the [Synapse client configuration](https://help.synapse.org/docs/Client-Configuration.1985446156.html) documentation.

The final sample information for the FastQ files used for samplesheet generation is obtained from the file name itself. The file names are parsed according to the glob pattern `*{1,2}*`, which returns the sample name, presumed to be the longest possible string matching the glob pattern, with the fewest number of wildcard insertions.

<details markdown="1">
<summary>Supported File Names</summary>

- Files named `SRR493366_1.fastq` and `SRR493366_2.fastq` will have a sample name of `SRR493366`
- Files named `SRR_493_367_1.fastq` and `SRR_493_367_2.fastq` will have a sample name of `SRR_493_367`
- Files named `filename12_1.fastq` and `filename12_2.fastq` will have a sample name of `filename12`

</details>

### Samplesheet format

As a bonus, the columns in the auto-created samplesheet can be tailored to be accepted out-of-the-box by selected nf-core pipelines, these currently include:
15 changes: 2 additions & 13 deletions main.nf
@@ -17,8 +17,7 @@ nextflow.enable.dsl = 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

if (params.input_type == 'sra') include { SRA } from './workflows/sra'
if (params.input_type == 'synapse') include { SYNAPSE } from './workflows/synapse'
include { SRA } from './workflows/sra'

//
// WORKFLOW: Run main nf-core/fetchngs analysis pipeline depending on type of identifier provided
@@ -33,15 +32,7 @@ workflow NFCORE_FETCHNGS {
//
// WORKFLOW: Download FastQ files for SRA / ENA / GEO / DDBJ ids
//
if (params.input_type == 'sra') {
SRA ( ids )

//
// WORKFLOW: Download FastQ files for Synapse ids
//
} else if (params.input_type == 'synapse') {
SYNAPSE ( ids )
}
SRA ( ids )

}

@@ -69,7 +60,6 @@ workflow {
params.monochrome_logs,
params.outdir,
params.input,
params.input_type,
params.ena_metadata_fields
)

@@ -84,7 +74,6 @@ workflow {
// SUBWORKFLOW: Run completion tasks
//
PIPELINE_COMPLETION (
params.input_type,
params.email,
params.email_on_fail,
params.plaintext_email,
37 changes: 0 additions & 37 deletions modules/local/synapse_get/main.nf

This file was deleted.

16 changes: 0 additions & 16 deletions modules/local/synapse_get/nextflow.config

This file was deleted.

36 changes: 0 additions & 36 deletions modules/local/synapse_list/main.nf

This file was deleted.

10 changes: 0 additions & 10 deletions modules/local/synapse_list/nextflow.config

This file was deleted.

28 changes: 0 additions & 28 deletions modules/local/synapse_merge_samplesheet/main.nf

This file was deleted.

9 changes: 0 additions & 9 deletions modules/local/synapse_merge_samplesheet/nextflow.config

This file was deleted.
