
Commit

Merge pull request #14 from drpatelh/release
Change container for sra_ids_to_runinfo.nf process
drpatelh committed Jun 22, 2021
2 parents c791fc6 + 721621f commit 3e06f50
Showing 17 changed files with 165 additions and 35 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -51,7 +51,7 @@ jobs:
parameters:
[
"--nf_core_pipeline rnaseq",
"--ena_metadata_fields run_accession,experiment_accession,library_layout,fastq_ftp,fastq_md5",
"--ena_metadata_fields run_accession,experiment_accession,library_layout,fastq_ftp,fastq_md5 --sample_mapping_fields run_accession,library_layout",
"--skip_fastq_download",
]
steps:
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,13 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [[1.1](https://github.com/nf-core/fetchngs/releases/tag/1.1)] - 2021-06-22

### Enhancements & fixes

* [[#12](https://github.com/nf-core/fetchngs/issues/12)] - Error when using singularity - /etc/resolv.conf doesn't exist in container
* Added `--sample_mapping_fields` parameter to create a separate `id_mappings.csv` and `multiqc_config.yml` with selected fields that can be used to rename samples in general and in [MultiQC](https://multiqc.info/docs/#bulk-sample-renaming)

## [[1.0](https://github.com/nf-core/fetchngs/releases/tag/1.0)] - 2021-06-08

Initial release of nf-core/fetchngs, created with the [nf-core](https://nf-co.re/) template.
13 changes: 13 additions & 0 deletions bin/multiqc_mappings_config.py
@@ -0,0 +1,13 @@
#!/usr/bin/env python

import sys

with open(sys.argv[1], "r") as fin, open(sys.argv[2], "w") as fout:
    header = fin.readline().strip().split(",")
    config = "sample_names_rename_buttons:\n"
    config += "\n".join("  - " + x.strip('"') for x in header) + "\n"
    config += "sample_names_rename:\n"
    for line in fin:
        config += f"  - [{', '.join(line.strip().split(','))}]\n"
    fout.write(config)
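To illustrate what this new script produces, here is a minimal sketch of the same transformation on an in-memory input; the two CSV rows are hypothetical illustration data, not pipeline output:

```python
# Sketch of the multiqc_mappings_config.py transformation: turn an
# id_mappings.csv into a MultiQC sample-renaming config (YAML text).
# The input rows below are hypothetical.
csv_lines = [
    '"sample","run_accession"',
    "SRX123,SRR456",
]

header = csv_lines[0].split(",")
config = "sample_names_rename_buttons:\n"
config += "\n".join("  - " + field.strip('"') for field in header) + "\n"
config += "sample_names_rename:\n"
for line in csv_lines[1:]:
    config += "  - [{}]\n".format(", ".join(line.strip().split(",")))

print(config)
```

Each header field becomes a rename button in MultiQC, and each data row becomes one entry in `sample_names_rename`.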

4 changes: 2 additions & 2 deletions bin/scrape_software_versions.py
@@ -30,7 +30,7 @@
print(" <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v))
print(" </dl>")

# Write out regexes as csv file:
with open("software_versions.csv", "w") as f:
# Write out as tsv file:
with open("software_versions.tsv", "w") as f:
for k, v in sorted(results.items()):
f.write("{}\t{}\n".format(k, v))
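The change above switches the report from comma- to tab-separated output. A small sketch of the new write loop, using an in-memory buffer and hypothetical version values:

```python
import io

# Sketch: write collected tool versions as tab-separated lines, sorted
# by tool name, mirroring the updated scrape_software_versions.py.
# The dict values here are hypothetical examples.
results = {"nf-core/fetchngs": "v1.1", "Nextflow": "v21.04.0"}

buf = io.StringIO()  # stands in for open("software_versions.tsv", "w")
for k, v in sorted(results.items()):
    buf.write("{}\t{}\n".format(k, v))

print(buf.getvalue())
```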
3 changes: 3 additions & 0 deletions conf/modules.config
@@ -42,5 +42,8 @@ params {
'sra_merge_samplesheet' {
publish_dir = 'samplesheet'
}
'multiqc_mappings_config' {
publish_dir = 'samplesheet'
}
}
}
8 changes: 5 additions & 3 deletions docs/output.md
@@ -22,9 +22,11 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
* `*.md5`: Files containing `md5` sum for FastQ files downloaded from the ENA / SRA.
* `samplesheet/`
* `samplesheet.csv`: Auto-created samplesheet with collated metadata and paths to downloaded FastQ files.
* `id_mappings.csv`: File with selected fields that can be used to rename samples to more informative names; see [`--sample_mapping_fields`](https://nf-co.re/fetchngs/parameters#sample_mapping_fields) parameter to customise this behaviour.
* `multiqc_config.yml`: [MultiQC](https://multiqc.info/docs/#bulk-sample-renaming) config file that can be passed to most nf-core pipelines via the `--multiqc_config` parameter for bulk renaming of sample names from database ids; see the [`--sample_mapping_fields`](https://nf-co.re/fetchngs/parameters#sample_mapping_fields) parameter to customise this behaviour.
* `metadata/`
* `*.runinfo_ftp.tsv`: Re-formatted metadata file downloaded from the ENA
* `*.runinfo.tsv`: Original metadata file downloaded from the ENA
* `*.runinfo_ftp.tsv`: Re-formatted metadata file downloaded from the ENA.
* `*.runinfo.tsv`: Original metadata file downloaded from the ENA.

</details>

@@ -37,7 +39,7 @@ Please see the [usage documentation](https://nf-co.re/fetchngs/usage#introductio

* `pipeline_info/`
* Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
* Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.csv`.
* Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.tsv`.

</details>

8 changes: 4 additions & 4 deletions docs/usage.md
@@ -38,15 +38,15 @@ This downloads a text file called `SRR_Acc_List.txt` which can be directly provi

The typical command for running the pipeline is as follows:

```bash
```console
nextflow run nf-core/fetchngs --input ids.txt -profile docker
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.

Note that the pipeline will create the following files in your working directory:

```bash
```console
work # Directory containing the nextflow working files
results # Finished results (configurable, see below)
.nextflow_log # Log file from Nextflow
@@ -57,7 +57,7 @@ results # Finished results (configurable, see below)

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:

```bash
```console
nextflow pull nf-core/fetchngs
```

@@ -221,6 +221,6 @@ Some HPC setups also allow you to run nextflow within a cluster job submitted yo
In some cases, the Nextflow Java virtual machines can start to request a large amount of memory.
We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~/.bash_profile`):

```bash
```console
NXF_OPTS='-Xms1g -Xmx4g'
```
10 changes: 5 additions & 5 deletions main.nf
@@ -25,12 +25,12 @@ WorkflowMain.initialise(workflow, params, log)
========================================================================================
*/

workflow NFCORE_FETCHNGS {
include { FETCHNGS } from './workflows/fetchngs'

//
// WORKFLOW: Run main nf-core/fetchngs analysis pipeline
//
include { FETCHNGS } from './workflows/fetchngs'
//
// WORKFLOW: Run main nf-core/fetchngs analysis pipeline
//
workflow NFCORE_FETCHNGS {
FETCHNGS ()
}

2 changes: 1 addition & 1 deletion modules/local/get_software_versions.nf
@@ -21,7 +21,7 @@ process GET_SOFTWARE_VERSIONS {
path versions

output:
path "software_versions.csv" , emit: csv
path "software_versions.tsv" , emit: tsv
path 'software_versions_mqc.yaml', emit: yaml

script: // This script is bundled with the pipeline, in nf-core/fetchngs/bin/
33 changes: 33 additions & 0 deletions modules/local/multiqc_mappings_config.nf
@@ -0,0 +1,33 @@
// Import generic module functions
include { saveFiles; getSoftwareName } from './functions'

params.options = [:]

process MULTIQC_MAPPINGS_CONFIG {
    publishDir "${params.outdir}",
        mode: params.publish_dir_mode,
        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) }

    conda (params.enable_conda ? "conda-forge::python=3.8.3" : null)
    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
        container "https://depot.galaxyproject.org/singularity/python:3.8.3"
    } else {
        container "quay.io/biocontainers/python:3.8.3"
    }

    input:
    path csv

    output:
    path "*yml"         , emit: yml
    path "*.version.txt", emit: version

    script:
    """
    multiqc_mappings_config.py \\
        $csv \\
        multiqc_config.yml
    python --version | sed -e "s/Python //g" > python.version.txt
    """
}
6 changes: 3 additions & 3 deletions modules/local/sra_ids_to_runinfo.nf
@@ -10,11 +10,11 @@ process SRA_IDS_TO_RUNINFO {
mode: params.publish_dir_mode,
saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) }

conda (params.enable_conda ? "conda-forge::requests=2.24.0" : null)
conda (params.enable_conda ? "conda-forge::sed=4.7" : null)
if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
container "https://depot.galaxyproject.org/singularity/requests:2.24.0"
container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img"
} else {
container "quay.io/biocontainers/requests:2.24.0"
container "biocontainers/biocontainers:v1.2.0_cv1"
}

input:
9 changes: 8 additions & 1 deletion modules/local/sra_merge_samplesheet.nf
@@ -17,15 +17,22 @@ process SRA_MERGE_SAMPLESHEET {

input:
path ('samplesheets/*')
path ('mappings/*')

output:
path "*csv", emit: csv
path "samplesheet.csv", emit: samplesheet
path "id_mappings.csv" , emit: mappings

script:
"""
head -n 1 `ls ./samplesheets/* | head -n 1` > samplesheet.csv
for fileid in `ls ./samplesheets/*`; do
awk 'NR>1' \$fileid >> samplesheet.csv
done
head -n 1 `ls ./mappings/* | head -n 1` > id_mappings.csv
for fileid in `ls ./mappings/*`; do
awk 'NR>1' \$fileid >> id_mappings.csv
done
"""
}
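The shell loops above keep the header from the first per-sample file and append only the data rows (`awk 'NR>1'`) of every file. The same merge can be sketched in Python, with hypothetical file contents:

```python
# Sketch of the SRA_MERGE_SAMPLESHEET merge logic: take the header line
# from the first per-sample samplesheet, then append all lines after the
# header from every file. The file contents below are hypothetical.
samplesheets = [
    ["sample,fastq_1\n", "SRX1,SRX1.fastq.gz\n"],
    ["sample,fastq_1\n", "SRX2,SRX2.fastq.gz\n"],
]

merged = [samplesheets[0][0]]   # header from the first file only
for sheet in samplesheets:
    merged.extend(sheet[1:])    # equivalent of awk 'NR>1'

print("".join(merged))
```

Because every per-sample file carries an identical header, dropping all but the first copy yields one well-formed merged CSV.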
5 changes: 4 additions & 1 deletion modules/local/sra_runinfo_to_ftp.nf
@@ -19,12 +19,15 @@ process SRA_RUNINFO_TO_FTP {
path runinfo

output:
path "*.tsv", emit: tsv
path "*.tsv" , emit: tsv
path "*.version.txt", emit: version

script:
"""
sra_runinfo_to_ftp.py \\
${runinfo.join(',')} \\
${runinfo.toString().tokenize(".")[0]}.runinfo_ftp.tsv
python --version | sed -e "s/Python //g" > python.version.txt
"""
}
35 changes: 29 additions & 6 deletions modules/local/sra_to_samplesheet.nf
@@ -15,11 +15,17 @@ process SRA_TO_SAMPLESHEET {
input:
tuple val(meta), path(fastq)
val pipeline
val mapping_fields

output:
tuple val(meta), path("*csv"), emit: csv
tuple val(meta), path("*samplesheet.csv"), emit: samplesheet
tuple val(meta), path("*mappings.csv") , emit: mappings

exec:
//
// Create samplesheet containing metadata
//

// Remove custom keys needed to download the data
def meta_map = meta.clone()
meta_map.remove("id")
@@ -45,10 +51,27 @@
pipeline_map << meta_map

// Create a samplesheet
csv = pipeline_map.keySet().collect{ '"' + it + '"'}.join(",") + '\n'
csv += pipeline_map.values().collect{ '"' + it + '"'}.join(",")
samplesheet = pipeline_map.keySet().collect{ '"' + it + '"'}.join(",") + '\n'
samplesheet += pipeline_map.values().collect{ '"' + it + '"'}.join(",")

// Write samplesheet to file
def samplesheet_file = task.workDir.resolve("${meta.id}.samplesheet.csv")
samplesheet_file.text = samplesheet

//
// Create sample id mappings file
//
mappings_map = pipeline_map.clone()
def fields = mapping_fields ? ['sample'] + mapping_fields.split(',').collect{ it.trim().toLowerCase() } : []
if ((mappings_map.keySet() + fields).unique().size() != mappings_map.keySet().size()) {
error("Invalid option for '--sample_mapping_fields': ${mapping_fields}.\nValid options: ${mappings_map.keySet().join(', ')}")
}

// Create mappings
mappings = fields.collect{ '"' + it + '"'}.join(",") + '\n'
mappings += mappings_map.subMap(fields).values().collect{ '"' + it + '"'}.join(",")

// Write to file
def file = task.workDir.resolve("${meta.id}.samplesheet.csv")
file.text = csv
// Write mappings to file
def mappings_file = task.workDir.resolve("${meta.id}.mappings.csv")
mappings_file.text = mappings
}
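The new `exec` block validates the requested `--sample_mapping_fields` against the available metadata keys, then selects that sub-map and writes it as a quoted CSV row. A minimal Python sketch of that selection logic, using hypothetical metadata values:

```python
# Sketch of the mapping-fields selection in SRA_TO_SAMPLESHEET:
# validate requested fields against available metadata keys, then emit
# a quoted CSV header + row for the selected fields.
# The metadata values below are hypothetical.
pipeline_map = {
    "sample": "SRX9626017",
    "fastq_1": "SRX9626017_T1_1.fastq.gz",
    "run_accession": "SRR13191702",
    "experiment_title": "Example experiment",
}

mapping_fields = "run_accession,experiment_title"
fields = ["sample"] + [f.strip().lower() for f in mapping_fields.split(",")]

# Equivalent of the Groovy keySet() + fields uniqueness check
unknown = set(fields) - set(pipeline_map)
if unknown:
    raise ValueError(f"Invalid option for '--sample_mapping_fields': {mapping_fields}")

sub_map = {k: pipeline_map[k] for k in fields}  # Groovy subMap(fields)
mappings = ",".join(f'"{k}"' for k in sub_map) + "\n"
mappings += ",".join(f'"{v}"' for v in sub_map.values())
print(mappings)
```

Requesting a field that is not present in the metadata raises an error, mirroring the pipeline's message listing the valid options.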
3 changes: 2 additions & 1 deletion nextflow.config
@@ -13,6 +13,7 @@ params {
input = null
nf_core_pipeline = null
ena_metadata_fields = null
sample_mapping_fields = 'run_accession,sample_accession,experiment_alias,run_alias,sample_alias,experiment_title,sample_title,sample_description,description'
skip_fastq_download = false

// Boilerplate options
@@ -146,7 +147,7 @@ manifest {
description = 'Pipeline to fetch metadata and raw FastQ files from public databases'
mainScript = 'main.nf'
nextflowVersion = '!>=21.04.0'
version = '1.0'
version = '1.1'
}

// Function to ensure that resource requirements don't go beyond
7 changes: 7 additions & 0 deletions nextflow_schema.json
@@ -29,6 +29,12 @@
"description": "Comma-separated list of ENA metadata fields to fetch before downloading data.",
"help_text": "The default list of fields used by the pipeline can be found at the top of the [`bin/sra_ids_to_runinfo.py`](https://github.com/nf-core/fetchngs/blob/master/bin/sra_ids_to_runinfo.py) script within the pipeline repo. This pipeline requires a minimal set of fields to download FastQ files i.e. `'run_accession,experiment_accession,library_layout,fastq_ftp,fastq_md5'`. Full list of accepted metadata fields can be obtained from the [ENA API](https://www.ebi.ac.uk/ena/portal/api/returnFields?dataPortal=ena&format=tsv&result=read_run)."
},
"sample_mapping_fields": {
"type": "string",
"fa_icon": "fas fa-globe-americas",
"description": "Comma-separated list of ENA metadata fields used to create mappings file for sample id mapping or renaming.",
"default": "run_accession,sample_accession,experiment_alias,run_alias,sample_alias,experiment_title,sample_title,sample_description,description"
},
"nf_core_pipeline": {
"type": "string",
"fa_icon": "fab fa-apple",
@@ -244,3 +250,4 @@
}
]
}
