diff --git a/.editorconfig b/.editorconfig
new file mode 100644
index 00000000..95549501
--- /dev/null
+++ b/.editorconfig
@@ -0,0 +1,27 @@
+root = true
+
+[*]
+charset = utf-8
+end_of_line = lf
+insert_final_newline = true
+trim_trailing_whitespace = true
+indent_size = 4
+indent_style = space
+
+[*.{yml,yaml}]
+indent_size = 2
+
+[*.json]
+insert_final_newline = unset
+
+# These files are edited and tested upstream in nf-core/modules
+[/modules/nf-core/**]
+charset = unset
+end_of_line = unset
+insert_final_newline = unset
+trim_trailing_whitespace = unset
+indent_style = unset
+indent_size = unset
+
+[/assets/email*]
+indent_size = unset
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index 12722e4a..c01930f6 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/scrnaseq, the standard workflow is
    * If there isn't one already, please create one so that others know you're working on this
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/scrnaseq repository](https://github.com/nf-core/scrnaseq) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
-4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
+4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).
@@ -69,12 +69,12 @@ If you wish to contribute a new step, please use the following coding standards:
2. Write the process block (see below).
3. Define the output channel if needed (see below).
4. Add any new flags/options to `nextflow.config` with a default (see below).
-5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
+5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build`).
6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
7. Add sanity checks for all relevant parameters.
8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
9. Do local tests that the new code works properly and as expected.
-10. Add a new test command in `.github/workflow/ci.yaml`.
+10. Add a new test command in `.github/workflows/ci.yml`.
11. If applicable add a [MultiQC](https://multiqc.info/) module.
12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order.
13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`.
@@ -83,7 +83,7 @@ If you wish to contribute a new step, please use the following coding standards:
Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope.
-Once there, use `nf-core schema build .` to add to `nextflow_schema.json`. +Once there, use `nf-core schema build` to add to `nextflow_schema.json`. ### Default processes resource requirements diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index 4180710a..e9e16009 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -18,7 +18,7 @@ Please delete this text and anything that's not relevant from the template below I have checked the following places for your error: - [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) -- [ ] [nf-core/scrnaseq pipeline documentation](https://nf-co.re/nf-core/scrnaseq/usage) +- [ ] [nf-core/scrnaseq pipeline documentation](https://nf-co.re/scrnaseq/usage) ## Description of the bug @@ -51,13 +51,12 @@ Have you provided the following extra information/files: ## Nextflow Installation -- Version: +- Version: ## Container engine - Engine: - version: -- Image tag: ## Additional context diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index a491f153..062da8b7 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -10,15 +10,15 @@ Remember that PRs should be made against the dev branch, unless you're preparing Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/scrnaseq/tree/master/.github/CONTRIBUTING.md) --> + ## PR checklist - [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! - - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py` - - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/scrnaseq/tree/master/.github/CONTRIBUTING.md) - - [ ] If necessary, also make a PR on the nf-core/scrnaseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint .`). + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/scrnaseq/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/scrnaseq _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. +- [ ] Make sure your code lints (`nf-core lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). - [ ] Usage Documentation in `docs/usage.md` is updated. - [ ] Output Documentation in `docs/output.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index b936bc9a..5674e4b4 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,46 +1,34 @@ name: nf-core AWS full size tests # This workflow is triggered on published releases. -# It can be additionally triggered manually with GitHub actions workflow dispatch. +# It can be additionally triggered manually with GitHub actions workflow dispatch button. 
# It runs the -profile 'test_full' on AWS batch on: - workflow_run: - workflows: ["nf-core Docker push (release)"] - types: [completed] + release: + types: [published] workflow_dispatch: - - -env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} - - jobs: - run-awstest: + run-tower: name: Run AWS full tests if: github.repository == 'nf-core/scrnaseq' runs-on: ubuntu-latest steps: - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2 - with: - auto-update-conda: true - python-version: 3.7 - - name: Install awscli - run: conda install -c conda-forge awscli - - name: Start AWS batch job + - name: Launch workflow via tower + uses: nf-core/tower-action@master # TODO nf-core: You can customise AWS full pipeline tests as required # Add full size test data (but still relatively small datasets for few samples) # on the `test_full.config` test runs with only one set of parameters - # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command - run: | - aws batch submit-job \ - --region eu-west-1 \ - --job-name nf-core-scrnaseq \ - --job-queue $AWS_JOB_QUEUE \ - --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/scrnaseq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/scrnaseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/scrnaseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + + with: + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + pipeline: ${{ github.repository }} + revision: ${{ github.sha }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/scrnaseq/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/scrnaseq/results-${{ github.sha }}" + } + profiles: '[ "test_full", "aws_tower" ]' + diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 848d9995..fea3ed13 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -1,42 +1,28 @@ name: nf-core AWS test -# This workflow is triggered on push to the master branch. -# It can be additionally triggered manually with GitHub actions workflow dispatch. -# It runs the -profile 'test' on AWS batch. +# This workflow can be triggered manually with the GitHub actions workflow dispatch button. 
+# It runs the -profile 'test' on AWS batch on: workflow_dispatch: - - -env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} - - jobs: - run-awstest: + run-tower: name: Run AWS tests if: github.repository == 'nf-core/scrnaseq' runs-on: ubuntu-latest steps: - - name: Setup Miniconda - uses: conda-incubator/setup-miniconda@v2 + - name: Launch workflow via tower + uses: nf-core/tower-action@master + with: - auto-update-conda: true - python-version: 3.7 - - name: Install awscli - run: conda install -c conda-forge awscli - - name: Start AWS batch job - # TODO nf-core: You can customise CI pipeline run tests as required - # For example: adding multiple test runs with different parameters - # Remember that you can parallelise this by using strategy.matrix - run: | - aws batch submit-job \ - --region eu-west-1 \ - --job-name nf-core-scrnaseq \ - --job-queue $AWS_JOB_QUEUE \ - --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/scrnaseq", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/scrnaseq/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/scrnaseq/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} + bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} + pipeline: ${{ github.repository }} + revision: ${{ github.sha }} + workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/scrnaseq/work-${{ github.sha }} + parameters: | + { + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/scrnaseq/results-${{ github.sha }}" + } + profiles: '[ "test", "aws_tower" ]' + diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index ac5e5f11..8aa48496 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -8,6 +8,9 @@ on: release: types: [published] +# Uncomment if we need an edge release of Nextflow again +# env: NXF_EDGE: 1 + jobs: test: name: Run workflow tests @@ -19,30 +22,18 @@ jobs: NXF_ANSI_LOG: false strategy: matrix: - profile: ['test,docker', 'test_kallisto,docker', 'test,docker --aligner star'] + profile: + [ + "test,docker --aligner alevin", + "test,docker --aligner kallisto", + "test,docker --aligner star", + ] # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['20.10.0', '21.03.0-edge'] + nxf_ver: ["21.04.0", ""] steps: - name: Check out pipeline code uses: actions/checkout@v2 - - name: Check if Dockerfile or Conda environment changed - uses: technote-space/get-diff-action@v4 - with: - FILES: | - Dockerfile - environment.yml - - - name: Build new docker image - if: env.MATCHED_FILES - run: docker build --no-cache . 
-t nfcore/scrnaseq:1.1.0 - - - name: Pull docker image - if: ${{ !env.MATCHED_FILES }} - run: | - docker pull nfcore/scrnaseq:dev - docker tag nfcore/scrnaseq:dev nfcore/scrnaseq:1.1.0 - - name: Install Nextflow env: CAPSULE_LOG: none @@ -55,4 +46,3 @@ jobs: # Remember that you can parallelise this by using strategy.matrix run: | nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }} - diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index fcde400c..3b448773 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -18,7 +18,7 @@ jobs: - name: Install markdownlint run: npm install -g markdownlint-cli - name: Run Markdownlint - run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + run: markdownlint . # If the above check failed, post a comment on the PR explaining the failure - name: Post PR comment @@ -35,8 +35,8 @@ jobs: * On Mac: `brew install markdownlint-cli` * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`) * Fix the markdown errors - * Automatically: `markdownlint . --config .github/markdownlint.yml --fix` - * Manually resolve anything left from `markdownlint . --config .github/markdownlint.yml` + * Automatically: `markdownlint . --fix` + * Manually resolve anything left from `markdownlint .` Once you push these changes the test should pass, and you can hide this comment :+1: @@ -46,6 +46,20 @@ jobs: repo-token: ${{ secrets.GITHUB_TOKEN }} allow-repeats: false + EditorConfig: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v2 + + - uses: actions/setup-node@v1 + with: + node-version: '10' + + - name: Install editorconfig-checker + run: npm install -g editorconfig-checker + + - name: Run ECLint check + run: editorconfig-checker -exclude README.md $(git ls-files | grep -v test) YAML: runs-on: ubuntu-latest @@ -84,7 +98,6 @@ jobs: repo-token: ${{ secrets.GITHUB_TOKEN }} allow-repeats: false - nf-core: runs-on: ubuntu-latest steps: @@ -114,7 +127,7 @@ jobs: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} diff --git a/.github/workflows/push_dockerhub_dev.yml b/.github/workflows/push_dockerhub_dev.yml deleted file mode 100644 index 9902c471..00000000 --- a/.github/workflows/push_dockerhub_dev.yml +++ /dev/null @@ -1,28 +0,0 @@ -name: nf-core Docker push (dev) -# This builds the docker image and pushes it to DockerHub -# Runs on nf-core repo releases and push event to 'dev' branch (PR merges) -on: - push: - branches: - - dev - -jobs: - push_dockerhub: - name: Push new Docker image to Docker Hub (dev) - runs-on: ubuntu-latest - # Only run for the nf-core repo, for releases and merged PRs - if: ${{ github.repository == 'nf-core/scrnaseq' }} - env: - DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} - DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} - steps: - - name: Check out pipeline code - uses: actions/checkout@v2 - - - name: Build new docker image - run: docker build --no-cache . 
-t nfcore/scrnaseq:dev - - - name: Push Docker image to DockerHub (dev) - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker push nfcore/scrnaseq:dev diff --git a/.github/workflows/push_dockerhub_release.yml b/.github/workflows/push_dockerhub_release.yml deleted file mode 100644 index 02bb43f3..00000000 --- a/.github/workflows/push_dockerhub_release.yml +++ /dev/null @@ -1,29 +0,0 @@ -name: nf-core Docker push (release) -# This builds the docker image and pushes it to DockerHub -# Runs on nf-core repo releases and push event to 'dev' branch (PR merges) -on: - release: - types: [published] - -jobs: - push_dockerhub: - name: Push new Docker image to Docker Hub (release) - runs-on: ubuntu-latest - # Only run for the nf-core repo, for releases and merged PRs - if: ${{ github.repository == 'nf-core/scrnaseq' }} - env: - DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} - DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} - steps: - - name: Check out pipeline code - uses: actions/checkout@v2 - - - name: Build new docker image - run: docker build --no-cache . -t nfcore/scrnaseq:latest - - - name: Push Docker image to DockerHub (release) - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker push nfcore/scrnaseq:latest - docker tag nfcore/scrnaseq:latest nfcore/scrnaseq:${{ github.event.release.tag_name }} - docker push nfcore/scrnaseq:${{ github.event.release.tag_name }} diff --git a/.gitignore b/.gitignore index 6941693c..5d402163 100644 --- a/.gitignore +++ b/.gitignore @@ -3,10 +3,9 @@ work/ data/ results/ .DS_Store -tests/ testing/ testing* *.pyc log/ reports/ -testme.sh \ No newline at end of file +testme.sh diff --git a/.github/markdownlint.yml b/.markdownlint.yml similarity index 90% rename from .github/markdownlint.yml rename to .markdownlint.yml index 8d7eb53b..9e605fcf 100644 --- a/.github/markdownlint.yml +++ b/.markdownlint.yml @@ -1,6 +1,8 @@ # Markdownlint configuration file default: true line-length: false +ul-indent: + indent: 4 no-duplicate-header: siblings_only: true no-inline-html: diff --git a/CHANGELOG.md b/CHANGELOG.md index 7bbef2b1..7c39b276 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,15 +3,16 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## v2.0dev - + +* Pipeline ported to dsl2 +* Template update with latest nf-core/tools v2.1 + ## v1.1.0 - 2021-03-24 "Olive Mercury Corgi" * Template update with latest nf-core/tools v1.13.2 * Parameters JSON Schema added [#42](https://github.com/nf-core/scrnaseq/issues/42) - -## V1.0.1dev - * [25](https://github.com/nf-core/scrnaseq/issues/25) Fix small documentation error with wrong parameter for txp2gene -* Updated to nf-core template v1.11 ### Fixes diff --git a/CITATIONS.md b/CITATIONS.md new file mode 100644 index 00000000..3bdd2f65 --- /dev/null +++ b/CITATIONS.md @@ -0,0 +1,44 @@ +# nf-core/scrnaseq: Citations + +## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/) + +> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031. + +## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/) + +> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. 
Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311. + +## Pipeline tools + +* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) + +* [MultiQC](https://www.ncbi.nlm.nih.gov/pubmed/27312411/) + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + +* [Alevin](https://doi.org/10.1186/s13059-019-1670-y) + > Srivastava, A., Malik, L., Smith, T. et al. Alevin efficiently estimates accurate gene abundances from dscRNA-seq data. Genome Biol 20, 65 (2019). + +* [Salmon](https://www.nature.com/articles/nmeth.4197) + > Patro, R., Duggal, G., Love, M. et al. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14, 417–419 (2017). + +* [Kallisto/Bustools](https://www.nature.com/articles/s41587-021-00870-2) + > Melsted, P., Booeshaghi, A.S., Liu, L. et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat Biotechnol 39, 813–818 (2021). + +* [StarSolo](https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1) + > Benjamin Kaminow, Dinar Yunusov, Alexander Dobin. STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. BioRxiv 2021.05.05.442755 (2021). + +## Software packaging/containerisation tools + +* [Anaconda](https://anaconda.com) + > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. + +* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) + > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. + +* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) + > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. + +* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) + +* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. 
diff --git a/Dockerfile b/Dockerfile deleted file mode 100644 index 8d7eddcb..00000000 --- a/Dockerfile +++ /dev/null @@ -1,17 +0,0 @@ -FROM nfcore/base:1.13.3 -LABEL authors="Peter J Bailey, Alexander Peltzer, Olga Botvinnik" \ - description="Docker image containing all software requirements for the nf-core/scrnaseq pipeline" - -# Install the conda environment -COPY environment.yml / -RUN conda env create --quiet -f /environment.yml && conda clean -a - -# The conda bug with tbb - salmon: error while loading shared libraries: libtbb.so.2 -# pandoc via conda was not working -RUN apt-get update && apt-get install -y libtbb2 pandoc-citeproc - -# Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-scrnaseq-1.1.0/bin:$PATH - -# Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-scrnaseq-1.1.0 > nf-core-scrnaseq-1.1.0.yml diff --git a/README.md b/README.md index 7c9feee5..27d4b7de 100644 --- a/README.md +++ b/README.md @@ -1,23 +1,32 @@ # ![nf-core/scrnaseq](docs/images/nf-core-scrnaseq_logo.png) -**A fully automated Nextflow pipeline for Droplet-based (e.g. 10x Genomics) single-cell RNA-Seq data**. +[![GitHub Actions CI Status](https://github.com/nf-core/scrnaseq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/scrnaseq/actions?query=workflow%3A%22nf-core+CI%22) +[![GitHub Actions Linting Status](https://github.com/nf-core/scrnaseq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/scrnaseq/actions?query=workflow%3A%22nf-core+linting%22) +[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/scrnaseq/results) +[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.XXXXXXX-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.XXXXXXX) -[![GitHub Actions CI Status](https://github.com/nf-core/scrnaseq/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/scrnaseq/actions) -[![GitHub Actions Linting Status](https://github.com/nf-core/scrnaseq/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/scrnaseq/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/) +[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) +[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) +[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) -[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/) -[![Docker](https://img.shields.io/docker/automated/nfcore/scrnaseq.svg)](https://hub.docker.com/r/nfcore/scrnaseq) -[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23scrnaseq-4A154B?logo=slack)](https://nfcore.slack.com/channels/scrnaseq) +[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23scrnaseq-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/scrnaseq) +[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core) +[![Watch on 
YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core)
[![Join us on Slack](https://img.shields.io/badge/slack-nfcore/scrnaseq-blue.svg)](https://nfcore.slack.com/channels/scrnaseq)
## Introduction
-**nf-core/scrnaseq** is a bioinformatics best-practise analysis pipeline for
+**nf-core/scrnaseq** is a bioinformatics best-practice analysis pipeline for
processing 10x Genomics single-cell RNA-seq data.
-The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.
+The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
+
+
+On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/scrnaseq/results).
+
+## Pipeline summary
This is a community effort in building a pipeline capable of supporting:
@@ -25,27 +34,41 @@ This is a community effort in building a pipeline capable of supporting:
* STARSolo
* Kallisto + BUStools
-## Pipeline Summary
+## Documentation
-By default, the pipeline currently performs the following:
+The nf-core/scrnaseq pipeline comes with documentation about the pipeline [usage](https://nf-co.re/scrnaseq/usage), [parameters](https://nf-co.re/scrnaseq/parameters) and [output](https://nf-co.re/scrnaseq/output).
-
+## Quick Start
-* Sequencing quality control (`FastQC`)
-* Overall pipeline run summaries (`MultiQC`)
+1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`)
-## Documentation
+2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_
+
+3. Download the pipeline and test it on a minimal dataset with a single command:
-The nf-core/scrnaseq pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/scrnaseq/usage) and [output](https://nf-co.re/scrnaseq/output).
+    ```console
+    nextflow run nf-core/scrnaseq -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute>
+    ```
+
+    > * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile <institute>` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment.
    > * If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs.
+    > * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs.
+
+4. Start running your own analysis!
+
+    ```console
+    nextflow run nf-core/scrnaseq -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --genome_fasta GRCm38.p6.genome.chr19.fa --gtf gencode.vM19.annotation.chr19.gtf --protocol 10XV2
+    ```
## Credits
The `nf-core/scrnaseq` pipeline was initiated by [Peter J. Bailey](https://github.com/PeterBailey) (Salmon Alevin, AlevinQC) with major contributions from [Olga Botvinnik](https://github.com/olgabot) (STARsolo, Testdata) and [Alex Peltzer](https://github.com/apeltzer) (Kallisto/BusTools workflow).
-We thank the following people for their extensive assistance in the development
-of this pipeline:
+We thank the following people for their extensive assistance in the development of this pipeline:
-
+* @KevinMenden
+* @ggabernet
+* @FloWuenne
## Contributions and Support
@@ -55,14 +78,10 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
## Citations
+If you use nf-core/scrnaseq for your analysis, please cite it using the following doi: [10.5281/zenodo.3568187](https://doi.org/10.5281/zenodo.3568187)
+
The basic benchmarks that were used as motivation for incorporating the three available modular workflows can be found in [this publication](https://www.biorxiv.org/content/10.1101/673285v2). We offer all three paths for the processing of scRNAseq data so it remains up to the user to decide which pipeline workflow is chosen for a particular analysis question.
-You can cite the `nf-core` publication as follows:
-
-> **The nf-core framework for community-curated bioinformatics pipelines.**
->
-> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
->
-> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
+An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
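To make the Quick Start concrete, here is a sketch of a full invocation, assuming the samplesheet layout documented in `bin/check_samplesheet.py` further down in this diff and one of the `--aligner` values exercised by the CI matrix (`alevin`, `kallisto`, `star`); the sample names and FastQ paths are purely illustrative:

```console
$ cat samplesheet.csv
sample,fastq_1,fastq_2
SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz
SAMPLE_SE,SAMPLE_SE_RUN1_1.fastq.gz,

$ nextflow run nf-core/scrnaseq -profile docker \
    --input samplesheet.csv \
    --genome_fasta GRCm38.p6.genome.chr19.fa \
    --gtf gencode.vM19.annotation.chr19.gtf \
    --protocol 10XV2 \
    --aligner star
```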
diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt index c95d3186..2529693f 100644 --- a/assets/sendmail_template.txt +++ b/assets/sendmail_template.txt @@ -15,15 +15,15 @@ Content-ID: Content-Disposition: inline; filename="nf-core-scrnaseq_logo.png" <% out << new File("$projectDir/assets/nf-core-scrnaseq_logo.png"). - bytes. - encodeBase64(). - toString(). - tokenize( '\n' )*. - toList()*. - collate( 76 )*. - collect { it.join() }. - flatten(). - join( '\n' ) %> + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' ) %> <% if (mqcFile){ @@ -37,15 +37,15 @@ Content-ID: Content-Disposition: attachment; filename=\"${mqcFileObj.getName()}\" ${mqcFileObj. - bytes. - encodeBase64(). - toString(). - tokenize( '\n' )*. - toList()*. - collate( 76 )*. - collect { it.join() }. - flatten(). - join( '\n' )} + bytes. + encodeBase64(). + toString(). + tokenize( '\n' )*. + toList()*. + collate( 76 )*. + collect { it.join() }. + flatten(). + join( '\n' )} """ }} %> diff --git a/assets/where_are_my_files.txt b/assets/where_are_my_files.txt deleted file mode 100644 index 17873f87..00000000 --- a/assets/where_are_my_files.txt +++ /dev/null @@ -1,39 +0,0 @@ -===================== - Where are my files? -===================== - -By default, the nf-core/scrnaseq pipeline does not save large intermediate files to the -results directory. This is to try to conserve disk space. - -These files can be found in the pipeline `work` directory if needed. -Alternatively, re-run the pipeline using `-resume` in addition to one of -the below command-line options and they will be copied into the results directory: - -`--saveAlignedIntermediates` -The final BAM files created after the Picard MarkDuplicates step are always saved -and can be found in the `markDuplicates/` folder. -Specify this flag to also copy out BAM files from STAR / HISAT2 alignment and sorting steps. - -`--saveTrimmed` -Specify to save trimmed FastQ files to the results directory. - -`--saveReference` -Save any downloaded or generated reference genome files to your results folder. -These can then be used for future pipeline runs, reducing processing times. - ------------------------------------ - Setting defaults in a config file ------------------------------------ -If you would always like these files to be saved without having to specify this on -the command line, you can save the following to your personal configuration file -(eg. 
`~/.nextflow/config`):
-
-params.saveReference = true
-params.saveTrimmed = true
-params.saveAlignedIntermediates = true
-
-For more help, see the following documentation:
-
-https://github.com/nf-core/scrnaseq/blob/master/docs/usage.md
-https://www.nextflow.io/docs/latest/getstarted.html
-https://www.nextflow.io/docs/latest/config.html
diff --git a/bin/alevin_qc.r b/bin/alevin_qc.r
index 4af1fa30..ec1362f9 100755
--- a/bin/alevin_qc.r
+++ b/bin/alevin_qc.r
@@ -4,7 +4,7 @@ args <- commandArgs(trailingOnly=TRUE)
if (length(args) < 3) {
-  stop("Usage: alevin_qc.r <baseDir> <sampleId> <outDir>", call.=FALSE)
+    stop("Usage: alevin_qc.r <baseDir> <sampleId> <outDir>", call.=FALSE)
}
require(alevinQC)
@@ -15,6 +15,6 @@ sampleId <- args[2]
outDir <- args[3]
alevinQCReport(baseDir = baseDir, sampleId = sampleId,
-               outputFile = "alevinReport.html",
-               outputFormat = "html_document",
-               outputDir = outDir, forceOverwrite = TRUE)
+    outputFile = "alevinReport.html",
+    outputFormat = "html_document",
+    outputDir = outDir, forceOverwrite = TRUE)
diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py
new file mode 100755
index 00000000..1b007b91
--- /dev/null
+++ b/bin/check_samplesheet.py
@@ -0,0 +1,145 @@
+#!/usr/bin/env python
+
+# This script is based on the example at: https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv
+
+import os
+import sys
+import errno
+import argparse
+
+
+def parse_args(args=None):
+    Description = "Reformat nf-core/scrnaseq samplesheet file and check its contents."
+    Epilog = "Example usage: python check_samplesheet.py <FILE_IN> <FILE_OUT>"
+
+    parser = argparse.ArgumentParser(description=Description, epilog=Epilog)
+    parser.add_argument("FILE_IN", help="Input samplesheet file.")
+    parser.add_argument("FILE_OUT", help="Output file.")
+    return parser.parse_args(args)
+
+
+def make_dir(path):
+    if len(path) > 0:
+        try:
+            os.makedirs(path)
+        except OSError as exception:
+            if exception.errno != errno.EEXIST:
+                raise exception
+
+
+def print_error(error, context="Line", context_str=""):
+    error_str = "ERROR: Please check samplesheet -> {}".format(error)
+    if context != "" and context_str != "":
+        error_str = "ERROR: Please check samplesheet -> {}\n{}: '{}'".format(
+            error, context.strip(), context_str.strip()
+        )
+    print(error_str)
+    sys.exit(1)
+
+
+# TODO nf-core: Update the check_samplesheet function
+def check_samplesheet(file_in, file_out):
+    """
+    This function checks that the samplesheet follows the following structure:
+
+    sample,fastq_1,fastq_2
+    SAMPLE_PE,SAMPLE_PE_RUN1_1.fastq.gz,SAMPLE_PE_RUN1_2.fastq.gz
+    SAMPLE_PE,SAMPLE_PE_RUN2_1.fastq.gz,SAMPLE_PE_RUN2_2.fastq.gz
+    SAMPLE_SE,SAMPLE_SE_RUN1_1.fastq.gz,
+
+    For an example see:
+    https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_test_illumina_amplicon.csv
+    """
+
+    sample_mapping_dict = {}
+    with open(file_in, "r") as fin:
+
+        ## Check header
+        MIN_COLS = 2
+        # TODO nf-core: Update the column names for the input samplesheet
+        HEADER = ["sample", "fastq_1", "fastq_2"]
+        header = [x.strip('"') for x in fin.readline().strip().split(",")]
+        if header[: len(HEADER)] != HEADER:
+            print("ERROR: Please check samplesheet header -> {} != {}".format(",".join(header), ",".join(HEADER)))
+            sys.exit(1)
+
+        ## Check sample entries
+        for line in fin:
+            lspl = [x.strip().strip('"') for x in line.strip().split(",")]
+
+            # Check valid number of columns per row
+            if len(lspl) < len(HEADER):
+                print_error(
+                    "Invalid number of columns (minimum = {})!".format(len(HEADER)),
+                    "Line",
+                    line,
+                )
+
num_cols = len([x for x in lspl if x]) + if num_cols < MIN_COLS: + print_error( + "Invalid number of populated columns (minimum = {})!".format(MIN_COLS), + "Line", + line, + ) + + ## Check sample name entries + sample, fastq_1, fastq_2 = lspl[: len(HEADER)] + sample = sample.replace(" ", "_") + if not sample: + print_error("Sample entry has not been specified!", "Line", line) + + ## Check FastQ file extension + for fastq in [fastq_1, fastq_2]: + if fastq: + if fastq.find(" ") != -1: + print_error("FastQ file contains spaces!", "Line", line) + if not fastq.endswith(".fastq.gz") and not fastq.endswith(".fq.gz"): + print_error( + "FastQ file does not have extension '.fastq.gz' or '.fq.gz'!", + "Line", + line, + ) + + ## Auto-detect paired-end/single-end + sample_info = [] ## [single_end, fastq_1, fastq_2] + if sample and fastq_1 and fastq_2: ## Paired-end short reads + sample_info = ["0", fastq_1, fastq_2] + elif sample and fastq_1 and not fastq_2: ## Single-end short reads + sample_info = ["1", fastq_1, fastq_2] + else: + print_error("Invalid combination of columns provided!", "Line", line) + + ## Create sample mapping dictionary = { sample: [ single_end, fastq_1, fastq_2 ] } + if sample not in sample_mapping_dict: + sample_mapping_dict[sample] = [sample_info] + else: + if sample_info in sample_mapping_dict[sample]: + print_error("Samplesheet contains duplicate rows!", "Line", line) + else: + sample_mapping_dict[sample].append(sample_info) + + ## Write validated samplesheet with appropriate columns + if len(sample_mapping_dict) > 0: + out_dir = os.path.dirname(file_out) + make_dir(out_dir) + with open(file_out, "w") as fout: + fout.write(",".join(["sample", "single_end", "fastq_1", "fastq_2"]) + "\n") + for sample in sorted(sample_mapping_dict.keys()): + + ## Check that multiple runs of the same sample are of the same datatype + if not all(x[0] == sample_mapping_dict[sample][0][0] for x in sample_mapping_dict[sample]): + print_error("Multiple runs of a sample must be of the same datatype!", "Sample: {}".format(sample)) + + for idx, val in enumerate(sample_mapping_dict[sample]): + fout.write(",".join(["{}_T{}".format(sample, idx + 1)] + val) + "\n") + else: + print_error("No entries to process!", "Samplesheet: {}".format(file_in)) + + +def main(args=None): + args = parse_args(args) + check_samplesheet(args.FILE_IN, args.FILE_OUT) + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/bin/markdown_to_html.py b/bin/markdown_to_html.py deleted file mode 100755 index a26d1ff5..00000000 --- a/bin/markdown_to_html.py +++ /dev/null @@ -1,91 +0,0 @@ -#!/usr/bin/env python -from __future__ import print_function -import argparse -import markdown -import os -import sys -import io - - -def convert_markdown(in_fn): - input_md = io.open(in_fn, mode="r", encoding="utf-8").read() - html = markdown.markdown( - "[TOC]\n" + input_md, - extensions=["pymdownx.extra", "pymdownx.b64", "pymdownx.highlight", "pymdownx.emoji", "pymdownx.tilde", "toc"], - extension_configs={ - "pymdownx.b64": {"base_path": os.path.dirname(in_fn)}, - "pymdownx.highlight": {"noclasses": True}, - "toc": {"title": "Table of Contents"}, - }, - ) - return html - - -def wrap_html(contents): - header = """ - - - - - -
- """ - footer = """ -
- - - """ - return header + contents + footer - - -def parse_args(args=None): - parser = argparse.ArgumentParser() - parser.add_argument("mdfile", type=argparse.FileType("r"), nargs="?", help="File to convert. Defaults to stdin.") - parser.add_argument( - "-o", "--out", type=argparse.FileType("w"), default=sys.stdout, help="Output file name. Defaults to stdout." - ) - return parser.parse_args(args) - - -def main(args=None): - args = parse_args(args) - converted_md = convert_markdown(args.mdfile.name) - html = wrap_html(converted_md) - args.out.write(html) - - -if __name__ == "__main__": - sys.exit(main()) diff --git a/bin/postprocessing.r b/bin/postprocessing.r index fb029609..556f82e2 100755 --- a/bin/postprocessing.r +++ b/bin/postprocessing.r @@ -1,7 +1,7 @@ args = commandArgs(trailingOnly=TRUE) if (length(args) < 2) { - stop("Usage: postprocessing.r ", call.=FALSE) + stop("Usage: postprocessing.r ", call.=FALSE) } base.path <- args[1] @@ -10,20 +10,20 @@ base.path <- args[1] # Parts of the function is taken from Seurat's Read10x parsing function ReadAlevin <- function( base.path = NULL ){ if (! dir.exists(base.path )){ - stop("Directory provided does not exist") + stop("Directory provided does not exist") } barcode.loc <- paste0( base.path, "alevin/quants_mat_rows.txt" ) gene.loc <- paste0( base.path, "alevin/quants_mat_cols.txt" ) matrix.loc <- paste0( base.path, "alevin/quants_mat.csv" ) if (!file.exists( barcode.loc )){ - stop("Barcode file missing") + stop("Barcode file missing") } if (! file.exists(gene.loc) ){ - stop("Gene name file missing") + stop("Gene name file missing") } if (! file.exists(matrix.loc )){ - stop("Expression matrix file missing") + stop("Expression matrix file missing") } matrix <- as.matrix(read.csv( matrix.loc, header=FALSE)) matrix <- t(matrix[,1:ncol(matrix)-1]) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 7174fb81..400b1730 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -1,41 +1,18 @@ #!/usr/bin/env python from __future__ import print_function -from collections import OrderedDict -import re +import os -regexes = { - 'nf-core/scrnaseq': ['v_pipeline.txt', r"(\S+)"], - 'Nextflow': ['v_nextflow.txt', r"(\S+)"], - 'STAR': ['v_star.txt', r"(\S+)"], - 'Salmon': ['v_salmon.txt', r"salmon (\S+)"], - 'Kallisto': ['v_kallisto.txt', r"kallisto, version (\S+)"], - 'BUStools': ['v_bustools.txt', r"bustools (\S+)"], - 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], -} -results = OrderedDict() -results['nf-core/scrnaseq'] = 'N/A' -results['Nextflow'] = 'N/A' -results['STAR'] = 'N/A' -results['Salmon'] = 'N/A' -results['Kallisto'] = 'N/A' -results['BUStools'] = 'N/A' -results['MultiQC'] = 'N/A' +results = {} +version_files = [x for x in os.listdir(".") if x.endswith(".version.txt")] +for version_file in version_files: -# Search each file using its regex -for k, v in regexes.items(): - try: - with open(v[0]) as x: - versions = x.read() - match = re.search(v[1], versions) - if match: - results[k] = "v{}".format(match.group(1)) - except IOError: - results[k] = False + software = version_file.replace(".version.txt", "") + if software == "pipeline": + software = "nf-core/scrnaseq" -# Remove software set to false in results -for k in list(results): - if not results[k]: - del results[k] + with open(version_file) as fin: + version = fin.read().strip() + results[software] = version # Dump to YAML print( @@ -49,11 +26,11 @@
""" ) -for k, v in results.items(): +for k, v in sorted(results.items()): print("
<dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v)) print("    </dl>
") -# Write out regexes as csv file: -with open("software_versions.csv", "w") as f: - for k, v in results.items(): - f.write("{}\t{}\n".format(k, v)) \ No newline at end of file +# Write out as tsv file: +with open("software_versions.tsv", "w") as f: + for k, v in sorted(results.items()): + f.write("{}\t{}\n".format(k, v)) diff --git a/bin/t2g.py b/bin/t2g.py index dd95c710..6419dd1d 100755 --- a/bin/t2g.py +++ b/bin/t2g.py @@ -1,21 +1,21 @@ #!/usr/bin/env python -#This was downloaded on 2019-06-23 from https://github.com/bustools/getting_started/releases/ +#This was downloaded on 2019-06-23 from https://github.com/bustools/getting_started/releases/ #All credit goes to the original authors from the Kallisto/BUStools team! # BSD 2-Clause License -# +# # Copyright (c) 2017, Nicolas Bray, Harold Pimentel, Páll Melsted and Lior Pachter # All rights reserved. -# +# # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions are met: -# +# # * Redistributions of source code must retain the above copyright notice, this # list of conditions and the following disclaimer. -# +# # * Redistributions in binary form must reproduce the above copyright notice, # this list of conditions and the following disclaimer in the documentation # and/or other materials provided with the distribution. -# +# # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE diff --git a/conf/base.config b/conf/base.config index d430ef7d..963664a2 100644 --- a/conf/base.config +++ b/conf/base.config @@ -1,49 +1,55 @@ /* - * ------------------------------------------------- - * nf-core/scrnaseq Nextflow base config file - * ------------------------------------------------- - * A 'blank slate' config file, appropriate for general - * use on most high performace compute environments. - * Assumes that all software is installed and available - * on the PATH. Runs in `local` mode - all jobs will be - * run on the logged in environment. - */ +======================================================================================== + nf-core/scrnaseq Nextflow base config file +======================================================================================== + A 'blank slate' config file, appropriate for general use on most high performance + compute environments. Assumes that all software is installed and available on + the PATH. Runs in `local` mode - all jobs will be run on the logged in environment. +---------------------------------------------------------------------------------------- +*/ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { check_max( 1 * task.attempt, 'cpus' ) } + memory = { check_max( 6.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } - errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : 'finish' } - maxRetries = 1 - maxErrors = '-1' + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 
'retry' : 'finish' } + maxRetries = 1 + maxErrors = '-1' - // Process-specific resource requirements - // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors - - // Process-specific resource requirements - withLabel: low_memory { - cpus = { check_max (4, 'cpus')} - memory = { check_max( 16.GB * task.attempt, 'memory' ) } - } - withLabel: mid_memory { - cpus = { check_max (8, 'cpus')} - memory = { check_max( 32.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } - } - withLabel: high_memory { - cpus = { check_max (10, 'cpus')} - memory = { check_max( 80.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } - } - -} - -params { - // Defaults only, expecting to be overwritten - max_memory = 128.GB - max_cpus = 16 - max_time = 240.h - igenomes_base = 's3://ngi-igenomes/igenomes/' + // Process-specific resource requirements + // NOTE - Please try and re-use the labels below as much as possible. + // These labels are used and recognised by default in DSL2 files hosted on nf-core/modules. + // If possible, it would be nice to keep the same label naming convention when + // adding in your local modules too. + // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors + withLabel:process_low { + cpus = { check_max( 2 * task.attempt, 'cpus' ) } + memory = { check_max( 12.GB * task.attempt, 'memory' ) } + time = { check_max( 4.h * task.attempt, 'time' ) } + } + withLabel:process_medium { + cpus = { check_max( 6 * task.attempt, 'cpus' ) } + memory = { check_max( 36.GB * task.attempt, 'memory' ) } + time = { check_max( 8.h * task.attempt, 'time' ) } + } + withLabel:process_high { + cpus = { check_max( 12 * task.attempt, 'cpus' ) } + memory = { check_max( 72.GB * task.attempt, 'memory' ) } + time = { check_max( 16.h * task.attempt, 'time' ) } + } + withLabel:process_long { + time = { check_max( 20.h * task.attempt, 'time' ) } + } + withLabel:process_high_memory { + memory = { check_max( 200.GB * task.attempt, 'memory' ) } + } + withLabel:error_ignore { + errorStrategy = 'ignore' + } + withLabel:error_retry { + errorStrategy = 'retry' + maxRetries = 2 + } } diff --git a/conf/igenomes.config b/conf/igenomes.config index 31b7ee61..855948de 100644 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -1,421 +1,432 @@ /* - * ------------------------------------------------- - * Nextflow config file for iGenomes paths - * ------------------------------------------------- - * Defines reference genomes, using iGenome paths - * Can be used by any config that customises the base - * path using $params.igenomes_base / --igenomes_base - */ +======================================================================================== + Nextflow config file for iGenomes paths +======================================================================================== + Defines reference genomes using iGenome paths. 
+ Can be used by any config that customises the base path using: + $params.igenomes_base / --igenomes_base +---------------------------------------------------------------------------------------- +*/ params { - // illumina iGenomes reference file paths - genomes { - 'GRCh37' { - fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" - mito_name = "MT" - macs_gsize = "2.7e9" - blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed" + // illumina iGenomes reference file paths + genomes { + 'GRCh37' { + fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/GRCh37-blacklist.bed" + } + 'GRCh38' { + fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" + } + 'GRCm38' { + fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.87e9" + blacklist = 
"${projectDir}/assets/blacklists/GRCm38-blacklist.bed" + } + 'TAIR10' { + fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" + mito_name = "Mt" + } + 'EB2' { + fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" + } + 'UMD3.1' { + fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" + mito_name = "MT" + } + 'WBcel235' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" + mito_name = "MtDNA" + macs_gsize = "9e7" + } + 'CanFam3.1' { + fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" + star = 
"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" + mito_name = "MT" + } + 'GRCz10' { + fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'BDGP6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" + mito_name = "M" + macs_gsize = "1.2e8" + } + 'EquCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" + mito_name = "MT" + } + 'EB1' { + fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" + } + 'Galgal4' { + fasta = 
"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'Gm01' { + fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" + } + 'Mmul_1' { + fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" + mito_name = "MT" + } + 'IRGSP-1.0' { + fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'CHIMP2.1.4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" + bed12 = 
"${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" + mito_name = "MT" + } + 'Rnor_5.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'Rnor_6.0' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" + mito_name = "MT" + } + 'R64-1-1' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" + mito_name = "MT" + macs_gsize = "1.2e7" + } + 'EF2' { + fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" + mito_name = "MT" + macs_gsize = "1.21e7" + } + 'Sbi1' { + fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" + bowtie2 = 
"${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" + } + 'Sscrofa10.2' { + fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" + mito_name = "MT" + } + 'AGPv3' { + fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" + mito_name = "Mt" + } + 'hg38' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" + } + 'hg19' { + fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "2.7e9" + blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed" + } + 'mm10' { + fasta = 
"${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.87e9" + blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed" + } + 'bosTau8' { + fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'ce10' { + fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "9e7" + } + 'canFam3' { + fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" + mito_name = "chrM" + } + 'danRer10' { + fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" + gtf = 
"${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.37e9" + } + 'dm6' { + fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" + mito_name = "chrM" + macs_gsize = "1.2e8" + } + 'equCab2' { + fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" + mito_name = "chrM" + } + 'galGal4' { + fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" + mito_name = "chrM" + } + 'panTro4' { + fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" + mito_name = "chrM" + } + 'rn6' { + fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" + star = 
"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" + mito_name = "chrM" + } + 'sacCer3' { + fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" + readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" + mito_name = "chrM" + macs_gsize = "1.2e7" + } + 'susScr3' { + fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" + bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" + star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" + bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" + gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" + bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" + readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" + mito_name = "chrM" + } } - 'GRCh38' { - fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "2.7e9" - blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" - } - 'GRCm38' { - fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Annotation/README.txt" - mito_name = "MT" - macs_gsize = "1.87e9" - blacklist = "${projectDir}/assets/blacklists/GRCm38-blacklist.bed" - } - 'TAIR10' { - fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" 
- bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Annotation/README.txt" - mito_name = "Mt" - } - 'EB2' { - fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Annotation/README.txt" - } - 'UMD3.1' { - fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Annotation/README.txt" - mito_name = "MT" - } - 'WBcel235' { - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Annotation/Genes/genes.bed" - mito_name = "MtDNA" - macs_gsize = "9e7" - } - 'CanFam3.1' { - fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.gtf" - bed12 = 
"${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Annotation/README.txt" - mito_name = "MT" - } - 'GRCz10' { - fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Annotation/Genes/genes.bed" - mito_name = "MT" - } - 'BDGP6' { - fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Annotation/Genes/genes.bed" - mito_name = "M" - macs_gsize = "1.2e8" - } - 'EquCab2' { - fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Annotation/README.txt" - mito_name = "MT" - } - 'EB1' { - fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Annotation/README.txt" - } - 'Galgal4' { - fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" - star = 
"${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Annotation/Genes/genes.bed" - mito_name = "MT" - } - 'Gm01' { - fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Annotation/README.txt" - } - 'Mmul_1' { - fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Annotation/README.txt" - mito_name = "MT" - } - 'IRGSP-1.0' { - fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Annotation/Genes/genes.bed" - mito_name = "Mt" - } - 'CHIMP2.1.4' { - fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Annotation/README.txt" - mito_name = "MT" - } - 'Rnor_6.0' { - fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = 
"${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Annotation/Genes/genes.bed" - mito_name = "MT" - } - 'R64-1-1' { - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed" - mito_name = "MT" - macs_gsize = "1.2e7" - } - 'EF2' { - fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Annotation/README.txt" - mito_name = "MT" - macs_gsize = "1.21e7" - } - 'Sbi1' { - fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Annotation/README.txt" - } - 'Sscrofa10.2' { - fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.gtf" - bed12 = 
"${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Annotation/README.txt" - mito_name = "MT" - } - 'AGPv3' { - fasta = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Annotation/Genes/genes.bed" - mito_name = "Mt" - } - 'hg38' { - fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "2.7e9" - blacklist = "${projectDir}/assets/blacklists/hg38-blacklist.bed" - } - 'hg19' { - fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "2.7e9" - blacklist = "${projectDir}/assets/blacklists/hg19-blacklist.bed" - } - 'mm10' { - fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "1.87e9" - blacklist = "${projectDir}/assets/blacklists/mm10-blacklist.bed" - } - 'bosTau8' { - fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" - bismark = 
"${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Annotation/Genes/genes.bed" - mito_name = "chrM" - } - 'ce10' { - fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "9e7" - } - 'canFam3' { - fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Annotation/README.txt" - mito_name = "chrM" - } - 'danRer10' { - fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "1.37e9" - } - 'dm6' { - fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Annotation/Genes/genes.bed" - mito_name = "chrM" - macs_gsize = "1.2e8" - } - 'equCab2' { - fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" - bowtie2 = 
"${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Annotation/README.txt" - mito_name = "chrM" - } - 'galGal4' { - fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Annotation/README.txt" - mito_name = "chrM" - } - 'panTro4' { - fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Annotation/README.txt" - mito_name = "chrM" - } - 'rn6' { - fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Annotation/Genes/genes.bed" - mito_name = "chrM" - } - 'sacCer3' { - fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" - readme = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Annotation/README.txt" - mito_name = "chrM" - macs_gsize = "1.2e7" - } - 'susScr3' { - fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" - bwa = 
"${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" - bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" - star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" - bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" - gtf = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.gtf" - bed12 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/Genes/genes.bed" - readme = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Annotation/README.txt" - mito_name = "chrM" - } - } } diff --git a/conf/modules.config b/conf/modules.config new file mode 100644 index 00000000..fe1aaed4 --- /dev/null +++ b/conf/modules.config @@ -0,0 +1,69 @@ +/* +======================================================================================== + Config file for defining DSL2 per module options +======================================================================================== + Available keys to override module options: + args = Additional arguments appended to command in module. + args2 = Second set of arguments appended to command in module (multi-tool modules). + args3 = Third set of arguments appended to command in module (multi-tool modules). + publish_dir = Directory to publish results. + publish_by_meta = Groovy list of keys available in meta map to append as directories to "publish_dir" path + If publish_by_meta = true - Value of ${meta['id']} is appended as a directory to "publish_dir" path + If publish_by_meta = ['id', 'custompath'] - If "id" is in meta map and "custompath" isn't then "${meta['id']}/custompath/" + is appended as a directory to "publish_dir" path + If publish_by_meta = false / null - No directories are appended to "publish_dir" path + publish_files = Groovy map where key = "file_ext" and value = "directory" to publish results for that file extension + The value of "directory" is appended to the standard "publish_dir" path as defined above. + If publish_files = null (unspecified) - All files are published. + If publish_files = false - No files are published. + suffix = File name suffix for output files. 
+---------------------------------------------------------------------------------------- +*/ + +params { + modules { + 'cellranger_mkgtf' { + publish_dir = "cellranger/mkgtf" + args = "--attribute=gene_biotype:protein_coding --attribute=gene_biotype:lncRNA --attribute=gene_biotype:pseudogene" + } + 'cellranger_mkref' { + publish_dir = "cellranger/mkref" + publish_files = false + } + 'cellranger_count' { + publish_dir = "cellranger/count" + publish_files = ['gz':'filtered_feature_bc_matrix'] + } + 'gffread_tx2pgene' { + args = "--table transcript_id,gene_id" + suffix = "_gffread" + } + 'salmon_alevin' { + args = "" + } + 'salmon_index' { + args = "" + } + 'alevinqc' { + args = "" + } + 'multiqc_alevin' { + args = "" + } + 'star_genomegenerate' { + args = "" + } + 'star_align' { + args = "--readFilesCommand zcat --runDirPerm All_RWX --outWigType bedGraph --twopassMode Basic --outSAMtype BAM SortedByCoordinate" + } + 'kallistobustools_ref' { + args = "" + } + 'kallistobustools_count' { + args = "" + } + 'multiqc_kb' { + args = "" + } + } +} diff --git a/conf/test.config b/conf/test.config index 1afdc0eb..6943aa11 100644 --- a/conf/test.config +++ b/conf/test.config @@ -1,27 +1,29 @@ /* - * ------------------------------------------------- - * Nextflow config file for running tests - * ------------------------------------------------- - * Defines bundled input files and everything required - * to run a fast and simple test. Use as follows: - * nextflow run nf-core/scrnaseq -profile test, - */ +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
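The per-module options above can be overridden from a user-side config instead of editing the pipeline. A minimal sketch, assuming the nf-core DSL2 template semantics described in the header comment (`args` appended to the tool command, `publish_files` filtering what is copied to the results directory); the extra STAR flag and the output layout below are illustrative choices, not pipeline defaults:

```nextflow
// custom.config -- hypothetical user override, applied with:
//   nextflow run nf-core/scrnaseq -profile docker -c custom.config
params {
    modules {
        'star_align' {
            // Keep the default arguments shown above and append one illustrative STAR flag
            args          = "--readFilesCommand zcat --runDirPerm All_RWX --outWigType bedGraph --twopassMode Basic --outSAMtype BAM SortedByCoordinate --outSAMattributes All"
            // Publish only BAM files, into <outdir>/star/bam/
            publish_dir   = "star"
            publish_files = ['bam':'bam']
        }
    }
}
```

Values supplied this way replace the defaults in `conf/modules.config` for that module only; every other module keeps its shipped settings.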
diff --git a/conf/test.config b/conf/test.config
index 1afdc0eb..6943aa11 100644
--- a/conf/test.config
+++ b/conf/test.config
@@ -1,27 +1,29 @@
 /*
- * -------------------------------------------------
- * Nextflow config file for running tests
- * -------------------------------------------------
- * Defines bundled input files and everything required
- * to run a fast and simple test. Use as follows:
- *   nextflow run nf-core/scrnaseq -profile test,<docker/singularity>
- */
+========================================================================================
+    Nextflow config file for running minimal tests
+========================================================================================
+    Defines input files and everything required to run a fast and simple pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/scrnaseq -profile test,<docker/singularity>
+
+----------------------------------------------------------------------------------------
+*/
 
 params {
-    config_profile_name = 'Test profile'
-    config_profile_description = 'Minimal test dataset to check pipeline function'
-    // Limit resources so that this can run on GitHub Actions
-    max_cpus = 2
-    max_memory = 6.GB
-    max_time = 48.h
+    config_profile_name        = 'Test profile'
+    config_profile_description = 'Minimal test dataset to check pipeline function'
+
+    // Limit resources so that this can run on GitHub Actions
+    max_cpus   = 2
+    max_memory = 6.GB
+    max_time   = 6.h
 
-    // Input data
-    input_paths = [
-        ['S10_L001',
-            ['https://github.com/nf-core/test-datasets/raw/scrnaseq/testdata/S10_L001_R1_001.fastq.gz',
-             'https://github.com/nf-core/test-datasets/raw/scrnaseq/testdata/S10_L001_R2_001.fastq.gz']]]
-    fasta = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa'
-    gtf = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf'
-    // Ignore `--input` as otherwise the parameter validation will throw an error
-    schema_ignore_params = 'genomes,input_paths,input'
+    // Input data
+    input        = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/samplesheet-2-0.csv'
+    genome_fasta = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa'
+    gtf          = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf'
+    protocol     = '10XV2'
+    // Ignore `--input` as otherwise the parameter validation will throw an error
+    schema_ignore_params = 'genomes,input_paths,input'
 }
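The reworked test profile above shows the minimal parameter set the pipeline now expects: a samplesheet (`input`), a genome fasta (`genome_fasta`), an annotation (`gtf`) and a library `protocol`. A local variant is sketched below; the file paths are placeholders for your own data, not shipped test files:

```nextflow
// local_test.config -- a minimal sketch of a user-supplied run config, used with:
//   nextflow run nf-core/scrnaseq -profile docker -c local_test.config
params {
    input        = '/data/scrnaseq/samplesheet.csv'       // samplesheet in the same layout as samplesheet-2-0.csv
    genome_fasta = '/data/references/GRCm38.chr19.fa'     // placeholder reference fasta
    gtf          = '/data/references/gencode.vM19.chr19.gtf'
    protocol     = '10XV2'                                // 10x Chromium v2 chemistry, as in the bundled test profile
}
```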
diff --git a/conf/test_full.config b/conf/test_full.config
index 857be090..8bb92b3e 100644
--- a/conf/test_full.config
+++ b/conf/test_full.config
@@ -1,24 +1,24 @@
 /*
- * -------------------------------------------------
- * Nextflow config file for running full-size tests
- * -------------------------------------------------
- * Defines bundled input files and everything required
- * to run a full size pipeline test. Use as follows:
- *   nextflow run nf-core/scrnaseq -profile test_full,<docker/singularity>
- */
+========================================================================================
+    Nextflow config file for running full-size tests
+========================================================================================
+    Defines input files and everything required to run a full size pipeline test.
+
+    Use as follows:
+        nextflow run nf-core/scrnaseq -profile test_full,<docker/singularity>
+
+----------------------------------------------------------------------------------------
+*/
 
 params {
-    config_profile_name = 'Full test profile'
-    config_profile_description = 'Full test dataset to check pipeline function'
+    config_profile_name        = 'Full test profile'
+    config_profile_description = 'Full test dataset to check pipeline function'
+
+    // Input data for full size test
+    // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
+    // TODO nf-core: Give any required params for the test so that command line flags are not needed
+    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv'
 
-    // Input data for full size test
-    // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
-    // TODO nf-core: Give any required params for the test so that command line flags are not needed
-    single_end = false
-    input_paths = [
-        ['Testdata', ['https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R1.tiny.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/exoseq/testdata/Testdata_R2.tiny.fastq.gz']],
-        ['SRR389222', ['https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub1.fastq.gz', 'https://github.com/nf-core/test-datasets/raw/methylseq/testdata/SRR389222_sub2.fastq.gz']]
-    ]
-    // Ignore `--input` as otherwise the parameter validation will throw an error
-    schema_ignore_params = 'genomes,input_paths,input'
+    // Genome references
+    genome = 'R64-1-1'
 }
diff --git a/conf/test_kallisto.config b/conf/test_kallisto.config
deleted file mode 100644
index 60a4d75b..00000000
--- a/conf/test_kallisto.config
+++ /dev/null
@@ -1,31 +0,0 @@
-/*
- * -------------------------------------------------
- * Nextflow config file for running tests
- * -------------------------------------------------
- * Defines bundled input files and everything required
- * to run a fast and simple test. Use as follows:
- *   nextflow run nf-core/scrnaseq -profile test_kallisto,docker
- */
-
-params {
-    config_profile_name = 'Test profile'
-    config_profile_description = 'Minimal test dataset to check kallisto/bustools workflow function'
-    // Limit resources so that this can run on Travis
-    max_cpus = 2
-    max_memory = 6.GB
-    max_time = 48.h
-    // Input data
-    barcode_whitelist = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/10xv2_whitelist_sub50K.txt'
-    aligner = 'kallisto'
-    bustools_correct = false
-    skip_bustools = true
-    type = '10x'
-    chemistry = 'V2'
-    input_paths = [
-        ['SRR8599150',
-            ['https://github.com/nf-core/test-datasets/raw/scrnaseq/testdata/kallisto/SRR8599150_S1_L001_R1_001.sub5000.fastq.gz',
-             'https://github.com/nf-core/test-datasets/raw/scrnaseq/testdata/kallisto/SRR8599150_S1_L001_R2_001.sub5000.fastq.gz']]]
-    transcript_fasta = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/kallisto/Mus_musculus.GRCm38.chr19subset.fa.gz'
-    gtf = 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/kallisto/Mus_musculus.GRCm38.96.chr19.gtf.gz'
-
-}
diff --git a/docs/README.md b/docs/README.md
index 43c251d0..0ab24bdd 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -3,8 +3,8 @@
 The nf-core/scrnaseq documentation is split into the following pages:
 
 * [Usage](usage.md)
-  * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
+    * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
 * [Output](output.md)
-  * An overview of the different results produced by the pipeline and how to interpret them.
+    * An overview of the different results produced by the pipeline and how to interpret them.
 
 You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re)
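As `test_full.config` illustrates with `genome = 'R64-1-1'`, a reference can also be selected by iGenomes key, which resolves `fasta`, `gtf` and the prebuilt index paths from the `genomes` map in `conf/igenomes.config` shown earlier. A sketch, assuming the usual AWS iGenomes layout; the local mirror path is hypothetical:

```nextflow
// Selecting a reference by iGenomes key instead of explicit fasta/gtf paths.
params {
    genome        = 'GRCm38'                      // resolves paths from params.genomes['GRCm38']
    // Base path for all entries in the genomes map; point it at a local mirror
    // (e.g. '/references/igenomes', hypothetical) to avoid repeated S3 downloads.
    igenomes_base = 's3://ngi-igenomes/igenomes'
}
```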
You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png new file mode 100755 index 00000000..361d0e47 Binary files /dev/null and b/docs/images/mqc_fastqc_adapter.png differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png new file mode 100755 index 00000000..cb39ebb8 Binary files /dev/null and b/docs/images/mqc_fastqc_counts.png differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png new file mode 100755 index 00000000..a4b89bf5 Binary files /dev/null and b/docs/images/mqc_fastqc_quality.png differ diff --git a/docs/output.md b/docs/output.md index e95218c0..db5f50ab 100644 --- a/docs/output.md +++ b/docs/output.md @@ -6,19 +6,18 @@ This document describes the output produced by the pipeline. Most of the plots a ## Pipeline overview -The pipeline is built using [Nextflow](https://www.nextflow.io/) -and processes data using the following steps: +The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: * [nf-core/scrnaseq: Output](#nf-corescrnaseq-output) - * [:warning: Please read this documentation on the nf-core website: https://nf-co.re/scrnaseq/output](#warning-please-read-this-documentation-on-the-nf-core-website-httpsnf-corescrnaseqoutput) - * [Introduction](#introduction) - * [Pipeline overview](#pipeline-overview) - * [Kallisto & Bustools Results](#kallisto--bustools-results) - * [STARsolo](#starsolo) - * [Salmon Alevin & AlevinQC](#salmon-alevin--alevinqc) - * [Other output data](#other-output-data) - * [MultiQC](#multiqc) - * [Pipeline information](#pipeline-information) + * [:warning: Please read this documentation on the nf-core website: https://nf-co.re/scrnaseq/output](#warning-please-read-this-documentation-on-the-nf-core-website-httpsnf-corescrnaseqoutput) + * [Introduction](#introduction) + * [Pipeline overview](#pipeline-overview) + * [Kallisto & Bustools Results](#kallisto--bustools-results) + * [STARsolo](#starsolo) + * [Salmon Alevin & AlevinQC](#salmon-alevin--alevinqc) + * [Other output data](#other-output-data) + * [MultiQC](#multiqc) + * [Pipeline information](#pipeline-information) ## Kallisto & Bustools Results @@ -29,24 +28,24 @@ The pipeline can analyze data from single cell rnaseq experiments and generates **Output directory: `results/kallisto`** * `raw_bus` - * Contains the unconverted BUS formatted pseudo aligned data + * Contains the unconverted BUS formatted pseudo aligned data * `sort_bus` - * Contains the same BUS formatted data, sorted and corrected with the supplied barcode whitelist + * Contains the same BUS formatted data, sorted and corrected with the supplied barcode whitelist * `kallisto_gene_map` - * Contains the converted GTF gene map that is used by BUSTools for downstream analysis + * Contains the converted GTF gene map that is used by BUSTools for downstream analysis * `bustools_counts` - * Contains two subdirectories - * `eqcount`: Containing the Transcript Compatibility Count (TCC) Matrix (`tcc.mtx`) - * `genecount`: Containing the Gene Count Matrix (`gene.mtx`) + * Contains two subdirectories + * `eqcount`: Containing the Transcript Compatibility Count (TCC) Matrix (`tcc.mtx`) + * `genecount`: Containing the Gene Count Matrix (`gene.mtx`) * `bustools_metrics` - * Contains the JSON metrics generated by BUStools + * Contains the JSON metrics 
generated by BUStools For details on how to load these into R and perform further downstream analysis, please refer to the [BusTools HowTo](https://github.com/BUStools/getting_started/blob/master/getting_started.ipynb). **Output directory: `results/reference_genome`** * `kallisto_index` - * Contains the index of the supplied (genome/transcriptome) fasta file + * Contains the index of the supplied (genome/transcriptome) fasta file ## STARsolo @@ -57,56 +56,60 @@ For details on how to load these into R and perform further downstream analysis, **Output directory: `results/reference_genome`** * `star_index` - * Contains the index of the supplied genome fasta file + * Contains the index of the supplied genome fasta file ## Salmon Alevin & AlevinQC **Output directory: `results/alevin`** * `alevin` - * Contains the created Salmon Alevin pseudo-aligned output + * Contains the created Salmon Alevin pseudo-aligned output * `alevinqc` - * Contains the QC report for the aforementioned Salmon Alevin output data + * Contains the QC report for the aforementioned Salmon Alevin output data **Output directory: `results/reference_genome`** * `salmon_index` - * Contains the indexed reference transcriptome for Salmon Alevin + * Contains the indexed reference transcriptome for Salmon Alevin * `alevin/txp2gene.tsv` - * The transcriptome to gene mapping TSV file utilized by Salmon Alevin + * The transcriptome to gene mapping TSV file utilized by Salmon Alevin ## Other output data **Output directory: `results/reference_genome`** * `barcodes` - * Contains the utilized cell barcode whitelists (if applicable) + * Contains the utilized cell barcode whitelists (if applicable) * `extract_transcriptome` - * When supplied with a `--fasta` genome fasta, this contains the extracted transcriptome - * The GTF file supplied with `--gtf` is used to extract the transcriptome positions appropriately - -## MultiQC + * When supplied with a `--fasta` genome fasta, this contains the extracted transcriptome + * The GTF file supplied with `--gtf` is used to extract the transcriptome positions appropriately -[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarizing all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. +## MultiQC -The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. +<details markdown="1">
+<summary>Output files</summary> -For more information about how to use MultiQC reports, see [https://multiqc.info](https://multiqc.info). +* `multiqc/` + * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. + * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. + * `multiqc_plots/`: directory containing static images from the report in various formats. -**Output files:** +</details>
-* `multiqc/` - * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. - * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. - * `multiqc_plots/`: directory containing static images from the report in various formats. +[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory. -## Pipeline information +Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see <http://multiqc.info>. -[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. ### Pipeline information -**Output files:** +<details markdown="1">
+<summary>Output files</summary> * `pipeline_info/` + * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. + * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.tsv`. + * Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`. + +</details>
+ +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. diff --git a/docs/usage.md b/docs/usage.md index a064d1e7..6b30f05b 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -2,21 +2,57 @@ > _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ -## Introduction +## Samplesheet input + +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. + +```console +--input '[path to samplesheet file]' +``` + +### Multiple runs of the same sample + +The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes: + +```console +sample,fastq_1,fastq_2 +CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz +CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz +CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz +``` + +### Full samplesheet + +The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below. + +A final samplesheet file may look something like the one below; this example contains a single paired-end sample whose FastQ files are hosted on the nf-core test-datasets repository. + +```console +sample,fastq_1,fastq_2 +test,https://github.com/nf-core/test-datasets/raw/scrnaseq/testdata/S10_L001_R1_001.fastq.gz,https://github.com/nf-core/test-datasets/raw/scrnaseq/testdata/S10_L001_R2_001.fastq.gz +``` + +| Column | Description | |-----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). | | `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | | `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". | + +An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline. ## Running the pipeline The minimum typical command for running the pipeline is as follows: ```bash -nextflow run nf-core/scrnaseq --reads '*_R{1,2}.fastq.gz' --fasta human.fasta --gtf human.gtf -profile docker +nextflow run nf-core/scrnaseq --input 'samplesheet.csv' --fasta human.fasta --gtf human.gtf -profile docker ``` This will launch the pipeline with the `docker` configuration profile and default `--type` and `--barcode_whitelist`. See below for more information about profiles and these options.
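Returning briefly to the samplesheet format described above: the snippet below is a minimal sketch, not the pipeline's actual implementation, of the standard Nextflow idiom for consuming such a 3-column sheet. The channel and variable names are invented for illustration; the real pipeline additionally validates the sheet, which is where the `samplesheet.valid.csv` file mentioned in the output docs comes from.

```nextflow
// Minimal sketch (illustrative only): read the 3-column samplesheet and
// gather all FastQ pairs per sample, mirroring how re-sequenced runs are
// concatenated before downstream analysis.
Channel
    .fromPath(params.input)
    .splitCsv(header: true)
    .map { row -> tuple(row.sample, [ file(row.fastq_1), file(row.fastq_2) ]) }
    .groupTuple()   // one item per sample: (sample, [[R1, R2], [R1, R2], ...])
    .set { ch_reads }
```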
Note that the pipeline will create the following files in your working directory: -```bash +```console work # Directory containing the nextflow working files results # Finished results (configurable, see below) .nextflow.log # Log file from Nextflow @@ -27,13 +63,13 @@ results # Finished results (configurable, see below) When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: -```bash +```console nextflow pull nf-core/scrnaseq ``` ### Reproducibility -It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. +It is a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. First, go to the [nf-core/scrnaseq releases page](https://github.com/nf-core/scrnaseq/releases) and find the latest version number - numeric only (eg. `1.0.0`). Then specify this when running the pipeline with `-r` (one hyphen) - eg. `-r 1.0.0`. @@ -49,7 +85,7 @@ Use this parameter to choose a configuration profile. Profiles can give configur Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. -Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. When using Biocontainers, most of these software packaging methods pull Docker containers from quay.io e.g [FastQC](https://quay.io/repository/biocontainers/fastqc) except for Singularity which directly downloads Singularity images via https hosted by the [Galaxy project](https://depot.galaxyproject.org/singularity/) and Conda which downloads and installs software locally from [Bioconda](https://bioconda.github.io/). > We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. @@ -60,28 +96,21 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles. If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. 
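To make the notion of a configuration profile concrete before the bundled profiles are listed below, here is a minimal, hypothetical `profiles` block of the kind a Nextflow config file can define (illustrative only; the pipeline ships its own definitions in `nextflow.config`):

```nextflow
// Hypothetical sketch of a profiles block; the pipeline defines its own.
profiles {
    docker {
        docker.enabled = true
    }
    singularity {
        singularity.enabled    = true
        singularity.autoMounts = true
    }
}
```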
-- `docker` - - A generic configuration profile to be used with [Docker](https://docker.com/) - - Pulls software from Docker Hub: [`nfcore/scrnaseq`](https://hub.docker.com/r/nfcore/scrnaseq/) -- `singularity` - - A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) - - Pulls software from Docker Hub: [`nfcore/scrnaseq`](https://hub.docker.com/r/nfcore/scrnaseq/) -- `podman` - - A generic configuration profile to be used with [Podman](https://podman.io/) - - Pulls software from Docker Hub: [`nfcore/scrnaseq`](https://hub.docker.com/r/nfcore/scrnaseq/) -- `shifter` - - A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) - - Pulls software from Docker Hub: [`nfcore/scrnaseq`](https://hub.docker.com/r/nfcore/scrnaseq/) -- `charliecloud` - - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) - - Pulls software from Docker Hub: [`nfcore/scrnaseq`](https://hub.docker.com/r/nfcore/scrnaseq/) -- `conda` - - Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. - - A generic configuration profile to be used with [Conda](https://conda.io/docs/) - - Pulls most software from [Bioconda](https://bioconda.github.io/) -- `test` - - A profile with a complete configuration for automated testing - - Includes links to test data so needs no other parameters +* `docker` + * A generic configuration profile to be used with [Docker](https://docker.com/) +* `singularity` + * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) +* `podman` + * A generic configuration profile to be used with [Podman](https://podman.io/) +* `shifter` + * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) +* `charliecloud` + * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) +* `conda` + * A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. +* `test` + * A profile with a complete configuration for automated testing + * Includes links to test data so needs no other parameters ### `-resume` @@ -93,27 +122,140 @@ You can also supply a run name to resume a specific run: `-resume [run-name]`. U Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. -#### Custom resource requests +## Custom configuration + +### Resource requests + +Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. 
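Before the worked STAR_ALIGN example below, note that the same mechanism can be driven by process labels rather than names: a hypothetical custom config supplied via `-c` could raise the defaults for every process carrying the nf-core `process_high` label (the values here are illustrative, not the pipeline's defaults):

```nextflow
// Hypothetical custom config: raise defaults for all processes labelled
// `process_high` instead of targeting a single process by name.
process {
    withLabel: process_high {
        cpus   = 12
        memory = 72.GB
        time   = 24.h
    }
}
```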
+ +For example, if the nf-core/rnaseq pipeline is failing after multiple re-submissions of the `STAR_ALIGN` process due to an exit code of `137` this would indicate that there is an out of memory issue: + +```console +[62/149eb0] NOTE: Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1) +Error executing process > 'RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)' + +Caused by: + Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped. +Command executed: + STAR \ + --genomeDir star \ + --readFilesIn WT_REP1_trimmed.fq.gz \ + --runThreadN 2 \ + --outFileNamePrefix WT_REP1. \ + -Whilst these default requirements will hopefully work for most people with most data, you may find that you want to customise the compute resources that the pipeline requests. You can do this by creating a custom config file. For example, to give the workflow process `star` 32GB of memory, you could use the following config: +Command exit status: + 137 + +Command output: + (empty) + +Command error: + .command.sh: line 9: 30 Killed STAR --genomeDir star --readFilesIn WT_REP1_trimmed.fq.gz --runThreadN 2 --outFileNamePrefix WT_REP1. +Work dir: + /home/pipelinetest/work/9d/172ca5881234073e8d76f2a19c88fb + +Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run` +``` + +To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq Github repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN). We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so based on the search results the file we want is `modules/nf-core/software/star/align/main.nf`. If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure subset of processes having similar computing requirements. The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37) which in this case is defined as 72GB. Providing you haven't set any other standard nf-core parameters to __cap__ the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline then we can try and bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. 
```nextflow process { - withName: star { - memory = 32.GB - } + withName: STAR_ALIGN { + memory = 100.GB + } +} +``` + +> **NB:** We specify just the process name i.e. `STAR_ALIGN` in the config file and not the full task name string that is printed to screen in the error message or on the terminal whilst the pipeline is running i.e. `RNASEQ:ALIGN_STAR:STAR_ALIGN`. You may get a warning suggesting that the process selector isn't recognised but you can ignore that if the process name has been specified correctly. This is something that needs to be fixed upstream in core Nextflow. + +### Tool-specific options + +For the ultimate flexibility, we have implemented and are using Nextflow DSL2 modules in a way where it is possible for both developers and users to change tool-specific command-line arguments (e.g. providing an additional command-line argument to the `STAR_ALIGN` process) as well as publishing options (e.g. saving files produced by the `STAR_ALIGN` process that aren't saved by default by the pipeline). In the majority of instances, as a user you won't have to change the default options set by the pipeline developer(s), however, there may be edge cases where creating a simple custom config file can improve the behaviour of the pipeline if for example it is failing due to a weird error that requires setting a tool-specific parameter to deal with smaller / larger genomes. + +The command-line arguments passed to STAR in the `STAR_ALIGN` module are a combination of: + +* Mandatory arguments or those that need to be evaluated within the scope of the module, as supplied in the [`script`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L49-L55) section of the module file. + +* An [`options.args`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L56) string of non-mandatory parameters that is set to be empty by default in the module but can be overwritten when including the module in the sub-workflow / workflow context via the `addParams` Nextflow option. + +The nf-core/rnaseq pipeline has a sub-workflow (see [terminology](https://github.com/nf-core/modules#terminology)) specifically to align reads with STAR and to sort, index and generate some basic stats on the resulting BAM files using SAMtools. At the top of this file we import the `STAR_ALIGN` module via the Nextflow [`include`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L10) keyword and by default the options passed to the module via the `addParams` option are set as an empty Groovy map [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L5); this in turn means `options.args` will be set to empty by default in the module file too. This is an intentional design choice and allows us to implement well-written sub-workflows composed of a chain of tools that by default run with the bare minimum parameter set for any given tool in order to make it much easier to share across pipelines and to provide the flexibility for users and developers to customise any non-mandatory arguments. 
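Schematically, the `addParams` mechanism described above looks like this at the point where a module is included; the path and the argument string here are illustrative, not copied from the pipeline:

```nextflow
// Schematic only: inject non-mandatory tool arguments into a module at include time.
include { STAR_ALIGN } from '../modules/nf-core/software/star/align/main' addParams(
    options: [ args: '--outSAMtype BAM Unsorted' ]
)
```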
+ +When including the sub-workflow above in the main pipeline workflow we use the same `include` statement, however, we now have the ability to overwrite options for each of the tools in the sub-workflow including the [`align_options`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L225) variable that will be used specifically to overwrite the optional arguments passed to the `STAR_ALIGN` module. In this case, the options to be provided to `STAR_ALIGN` have been assigned sensible defaults by the developer(s) in the pipeline's [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L70-L74) and can be accessed and customised in the [workflow context](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L201-L204) too before eventually passing them to the sub-workflow as a Groovy map called `star_align_options`. These options will then be propagated from `workflow -> sub-workflow -> module`. + +As mentioned at the beginning of this section it may also be necessary for users to overwrite the options passed to modules to be able to customise specific aspects of the way in which a particular tool is executed by the pipeline. Given that all of the default module options are stored in the pipeline's `modules.config` as a [`params` variable](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L24-L25) it is also possible to overwrite any of these options via a custom config file. + +Say for example we want to append an additional, non-mandatory parameter (i.e. `--outFilterMismatchNmax 16`) to the arguments passed to the `STAR_ALIGN` module. Firstly, we need to copy across the default `args` specified in the [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L71) and create a custom config file that is a composite of the default `args` as well as the additional options you would like to provide. This is very important because Nextflow will overwrite the default value of `args` that you provide via the custom config. + +As you will see in the example below, we have: + +* appended `--outFilterMismatchNmax 16` to the default `args` used by the module. +* changed the default `publish_dir` value to where the files will eventually be published in the main results directory. +* appended `'bam':''` to the default value of `publish_files` so that the BAM files generated by the process will also be saved in the top-level results directory for the module. Note: `'out':'log'` means any file/directory ending in `out` will now be saved in a separate directory called `my_star_directory/log/`. + +```nextflow +params { + modules { + 'star_align' { + args = "--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outFilterMismatchNmax 16" + publish_dir = "my_star_directory" + publish_files = ['out':'log', 'tab':'log', 'bam':''] + } + } } ``` -See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information. 
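To make the flow of `args` concrete, below is a stripped-down sketch of how a DSL2 module of this generation interpolates `options.args` into the tool command line. The real module additionally handles containers, version capture and publishing, so treat this purely as a schematic under those assumptions:

```nextflow
// Simplified module sketch: `params.options.args` (set via addParams or the
// `modules` scope in modules.config) ends up on the tool's command line.
params.options = [:]

process STAR_ALIGN {
    label 'process_high'

    input:
    tuple val(meta), path(reads)
    path index

    output:
    path 'Aligned.out.*', emit: align

    script:
    """
    STAR \\
        --genomeDir $index \\
        --readFilesIn $reads \\
        --runThreadN $task.cpus \\
        ${params.options.args ?: ''}
    """
}
```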
+### Updating containers + +The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. If for some reason you need to use a different version of a particular tool with the pipeline then you just need to identify the `process` name and override the Nextflow `container` definition for that process using the `withName` declaration. For example, in the [nf-core/viralrecon](https://nf-co.re/viralrecon) pipeline a tool called [Pangolin](https://github.com/cov-lineages/pangolin) has been used during the COVID-19 pandemic to assign lineages to SARS-CoV-2 genome sequenced samples. Given that the lineage assignments change quite frequently it doesn't make sense to re-release the nf-core/viralrecon pipeline every time a new version of Pangolin has been released. However, you can override the default container used by the pipeline by creating a custom config file and passing it as a command-line argument via `-c custom.config`. + +1. Check the default version used by the pipeline in the module file for [Pangolin](https://github.com/nf-core/viralrecon/blob/a85d5969f9025409e3618d6c280ef15ce417df65/modules/nf-core/software/pangolin/main.nf#L14-L19) +2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) +3. Create the custom config accordingly: + + * For Docker: + + ```nextflow + process { + withName: PANGOLIN { + container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` + + * For Singularity: + + ```nextflow + process { + withName: PANGOLIN { + container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` + + * For Conda: + + ```nextflow + process { + withName: PANGOLIN { + conda = 'bioconda::pangolin=3.0.5' + } + } + ``` + +> **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch. + +### nf-core/configs + +In most cases, you will only need to create a custom config as a one-off but if you and others within your organisation are likely to be running nf-core pipelines regularly and need to use the same settings, it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this, please test that the config file works with your pipeline of choice using the `-c` parameter.
You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. + +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information about creating your own configuration files. If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). -### Running in the background +## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. @@ -122,11 +264,11 @@ The Nextflow `-bg` flag launches Nextflow in the background, detached from your Alternatively, you can use `screen` / `tmux` or similar tool to create a detached session which you can log back into at a later time. Some HPC setups also allow you to run nextflow within a cluster job submitted your job scheduler (from where it submits more jobs). -#### Nextflow memory requirements +## Nextflow memory requirements In some cases, the Nextflow Java virtual machines can start to request a large amount of memory. We recommend adding the following line to your environment to limit this (typically in `~/.bashrc` or `~./bash_profile`): -```bash +```console NXF_OPTS='-Xms1g -Xmx4g' ``` diff --git a/environment.yml b/environment.yml deleted file mode 100644 index 5cc80727..00000000 --- a/environment.yml +++ /dev/null @@ -1,26 +0,0 @@ -# You can use this file to create a conda environment for this pipeline: -# conda env create -f environment.yml -name: nf-core-scrnaseq-1.1.0 -channels: - - conda-forge - - bioconda - - defaults -dependencies: - - conda-forge::python=3.7.3 - - conda-forge::markdown=3.1.1 - - conda-forge::pymdown-extensions=6.0 - - conda-forge::pygments=2.5.2 - - bioconda::multiqc=1.8 - - bioconda::salmon=1.4.0 - - bioconda::r-seurat=3.0.2 - - bioconda::bioconductor-isee=1.6.0 - - bioconda::bioconductor-tximeta=1.4.0 - - bioconda::bioawk=1.0 - - bioconda::bioconductor-alevinqc=1.2.0 - - bioconda::bioconductor-tximport=1.14.0 - - bioconda::star=2.7.2c #2.7.3a is broken! - - samtools=1.10 - - gffread=0.11.7 - - bioconda::kallisto=0.46.2 - - bioconda::bustools=0.40.0 - - font-ttf-source-code-pro=2.030 #Required for fonts in alevinQC diff --git a/lib/Headers.groovy b/lib/Headers.groovy deleted file mode 100644 index 15d1d388..00000000 --- a/lib/Headers.groovy +++ /dev/null @@ -1,43 +0,0 @@ -/* - * This file holds several functions used to render the nf-core ANSI header. - */ - -class Headers { - - private static Map log_colours(Boolean monochrome_logs) { - Map colorcodes = [:] - colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" - colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['yellow_bold'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? 
'' : "\033[0;37m" - colorcodes['red'] = monochrome_logs ? '' : "\033[1;91m" - return colorcodes - } - - static String dashed_line(monochrome_logs) { - Map colors = log_colours(monochrome_logs) - return "-${colors.dim}----------------------------------------------------${colors.reset}-" - } - - static String nf_core(workflow, monochrome_logs) { - Map colors = log_colours(monochrome_logs) - String.format( - """\n - ${dashed_line(monochrome_logs)} - ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} - ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} - ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} - ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} - ${colors.green}`._,._,\'${colors.reset} - ${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset} - ${dashed_line(monochrome_logs)} - """.stripIndent() - ) - } -} diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy old mode 100644 new mode 100755 index 54935ec8..8d6920dd --- a/lib/NfcoreSchema.groovy +++ b/lib/NfcoreSchema.groovy @@ -1,6 +1,6 @@ -/* - * This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template. - */ +// +// This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template. +// import org.everit.json.schema.Schema import org.everit.json.schema.loader.SchemaLoader @@ -13,16 +13,23 @@ import groovy.json.JsonBuilder class NfcoreSchema { - /* - * Function to loop over all parameters defined in schema and check - * whether the given paremeters adhere to the specificiations - */ + // + // Resolve Schema path relative to main workflow directory + // + public static String getSchemaPath(workflow, schema_filename='nextflow_schema.json') { + return "${workflow.projectDir}/${schema_filename}" + } + + // + // Function to loop over all parameters defined in schema and check + // whether the given parameters adhere to the specifications + // /* groovylint-disable-next-line UnusedPrivateMethodParameter */ - private static void validateParameters(params, jsonSchema, log) { + public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') { def has_error = false //=====================================================================// // Check for nextflow core params and unexpected params - def json = new File(jsonSchema).text + def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') def nf_params = [ // Options for base `nextflow` command @@ -112,43 +119,50 @@ class NfcoreSchema { } // unexpected params def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' - if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam)) { - unexpectedParams.push(specifiedParam) + def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } + def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() + def isCamelCaseBug = (specifiedParam.contains("-") && !expectedParams.contains(specifiedParam) && expectedParamsLowerCase.contains(specifiedParamLowerCase)) + if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !isCamelCaseBug) { + // Temporarily remove camelCase/camel-case params 
#1035 + def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} + if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ + unexpectedParams.push(specifiedParam) + } } } //=====================================================================// // Validate parameters against the schema - InputStream inputStream = new File(jsonSchema).newInputStream() - JSONObject rawSchema = new JSONObject(new JSONTokener(inputStream)) + InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream() + JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream)) // Remove anything that's in params.schema_ignore_params - rawSchema = removeIgnoredParams(rawSchema, params) + raw_schema = removeIgnoredParams(raw_schema, params) - Schema schema = SchemaLoader.load(rawSchema) + Schema schema = SchemaLoader.load(raw_schema) // Clean the parameters def cleanedParams = cleanParameters(params) // Convert to JSONObject def jsonParams = new JsonBuilder(cleanedParams) - JSONObject paramsJSON = new JSONObject(jsonParams.toString()) + JSONObject params_json = new JSONObject(jsonParams.toString()) // Validate try { - schema.validate(paramsJSON) + schema.validate(params_json) } catch (ValidationException e) { println '' log.error 'ERROR: Validation of pipeline parameters failed!' JSONObject exceptionJSON = e.toJSON() - printExceptions(exceptionJSON, paramsJSON, log) + printExceptions(exceptionJSON, params_json, log) println '' has_error = true } // Check for unexpected parameters if (unexpectedParams.size() > 0) { - Map colors = log_colours(params.monochrome_logs) + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) println '' def warn_msg = 'Found unexpected parameters:' for (unexpectedParam in unexpectedParams) { @@ -164,266 +178,17 @@ class NfcoreSchema { } } - // Loop over nested exceptions and print the causingException - private static void printExceptions(exJSON, paramsJSON, log) { - def causingExceptions = exJSON['causingExceptions'] - if (causingExceptions.length() == 0) { - def m = exJSON['message'] =~ /required key \[([^\]]+)\] not found/ - // Missing required param - if (m.matches()) { - log.error "* Missing required parameter: --${m[0][1]}" - } - // Other base-level error - else if (exJSON['pointerToViolation'] == '#') { - log.error "* ${exJSON['message']}" - } - // Error with specific param - else { - def param = exJSON['pointerToViolation'] - ~/^#\// - def param_val = paramsJSON[param].toString() - log.error "* --${param}: ${exJSON['message']} (${param_val})" - } - } - for (ex in causingExceptions) { - printExceptions(ex, paramsJSON, log) - } - } - - // Remove an element from a JSONArray - private static JSONArray removeElement(jsonArray, element){ - def list = [] - int len = jsonArray.length() - for (int i=0;i - if(rawSchema.keySet().contains('definitions')){ - rawSchema.definitions.each { definition -> - for (key in definition.keySet()){ - if (definition[key].get("properties").keySet().contains(ignore_param)){ - // Remove the param to ignore - definition[key].get("properties").remove(ignore_param) - // If the param was required, change this - if (definition[key].has("required")) { - def cleaned_required = removeElement(definition[key].required, ignore_param) - definition[key].put("required", cleaned_required) - } - } - } - } - } - if(rawSchema.keySet().contains('properties') && rawSchema.get('properties').keySet().contains(ignore_param)) { - rawSchema.get("properties").remove(ignore_param) - } 
- if(rawSchema.keySet().contains('required') && rawSchema.required.contains(ignore_param)) { - def cleaned_required = removeElement(rawSchema.required, ignore_param) - rawSchema.put("required", cleaned_required) - } - } - return rawSchema - } - - private static Map cleanParameters(params) { - def new_params = params.getClass().newInstance(params) - for (p in params) { - // remove anything evaluating to false - if (!p['value']) { - new_params.remove(p.key) - } - // Cast MemoryUnit to String - if (p['value'].getClass() == nextflow.util.MemoryUnit) { - new_params.replace(p.key, p['value'].toString()) - } - // Cast Duration to String - if (p['value'].getClass() == nextflow.util.Duration) { - new_params.replace(p.key, p['value'].toString()) - } - // Cast LinkedHashMap to String - if (p['value'].getClass() == LinkedHashMap) { - new_params.replace(p.key, p['value'].toString()) - } - } - return new_params - } - - /* - * This method tries to read a JSON params file - */ - private static LinkedHashMap params_load(String json_schema) { - def params_map = new LinkedHashMap() - try { - params_map = params_read(json_schema) - } catch (Exception e) { - println "Could not read parameters settings from JSON. $e" - params_map = new LinkedHashMap() - } - return params_map - } - - private static Map log_colours(Boolean monochrome_logs) { - Map colorcodes = [:] - - // Reset / Meta - colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" - colorcodes['bold'] = monochrome_logs ? '' : "\033[1m" - colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" - colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m" - colorcodes['blink'] = monochrome_logs ? '' : "\033[5m" - colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m" - colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" - - // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" - - // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" - - // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" - - // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? 
'' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" - - // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" - - return colorcodes - } - - static String dashed_line(monochrome_logs) { - Map colors = log_colours(monochrome_logs) - return "-${colors.dim}----------------------------------------------------${colors.reset}-" - } - - /* - Method to actually read in JSON file using Groovy. - Group (as Key), values are all parameters - - Parameter1 as Key, Description as Value - - Parameter2 as Key, Description as Value - .... - Group - - - */ - private static LinkedHashMap params_read(String json_schema) throws Exception { - def json = new File(json_schema).text - def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') - def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') - /* Tree looks like this in nf-core schema - * definitions <- this is what the first get('definitions') gets us - group 1 - title - description - properties - parameter 1 - type - description - parameter 2 - type - description - group 2 - title - description - properties - parameter 1 - type - description - * properties <- parameters can also be ungrouped, outside of definitions - parameter 1 - type - description - */ - - // Grouped params - def params_map = new LinkedHashMap() - schema_definitions.each { key, val -> - def Map group = schema_definitions."$key".properties // Gets the property object of the group - def title = schema_definitions."$key".title - def sub_params = new LinkedHashMap() - group.each { innerkey, value -> - sub_params.put(innerkey, value) - } - params_map.put(title, sub_params) - } - - // Ungrouped params - def ungrouped_params = new LinkedHashMap() - schema_properties.each { innerkey, value -> - ungrouped_params.put(innerkey, value) - } - params_map.put("Other parameters", ungrouped_params) - - return params_map - } - - /* - * Get maximum number of characters across all parameter names - */ - private static Integer params_max_chars(params_map) { - Integer max_chars = 0 - for (group in params_map.keySet()) { - def group_params = params_map.get(group) // This gets the parameters of that particular group - for (param in group_params.keySet()) { - if (param.size() > max_chars) { - max_chars = param.size() - } - } - } - return max_chars - } - - /* - * Beautify parameters for --help - */ - private static String params_help(workflow, params, json_schema, command) { - Map colors = log_colours(params.monochrome_logs) + // + // Beautify parameters for --help + // + public static String paramsHelp(workflow, params, command, schema_filename='nextflow_schema.json') { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) Integer num_hidden = 0 String output = '' output += 'Typical pipeline command:\n\n' output += " ${colors.cyan}${command}${colors.reset}\n\n" - Map params_map = 
params_load(json_schema) - Integer max_chars = params_max_chars(params_map) + 1 + Map params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) + Integer max_chars = paramsMaxChars(params_map) + 1 Integer desc_indent = max_chars + 14 Integer dec_linewidth = 160 - desc_indent for (group in params_map.keySet()) { @@ -463,18 +228,17 @@ class NfcoreSchema { output += group_output } } - output += dashed_line(params.monochrome_logs) if (num_hidden > 0){ - output += colors.dim + "\n Hiding $num_hidden params, use --show_hidden_params to show.\n" + colors.reset - output += dashed_line(params.monochrome_logs) + output += colors.dim + "!! Hiding $num_hidden params, use --show_hidden_params to show them !!\n" + colors.reset } + output += NfcoreTemplate.dashedLine(params.monochrome_logs) return output } - /* - * Groovy Map summarising parameters/workflow options used by the pipeline - */ - private static LinkedHashMap params_summary_map(workflow, params, json_schema) { + // + // Groovy Map summarising parameters/workflow options used by the pipeline + // + public static LinkedHashMap paramsSummaryMap(workflow, params, schema_filename='nextflow_schema.json') { // Get a selection of core Nextflow workflow options def Map workflow_summary = [:] if (workflow.revision) { @@ -482,10 +246,10 @@ class NfcoreSchema { } workflow_summary['runName'] = workflow.runName if (workflow.containerEngine) { - workflow_summary['containerEngine'] = "$workflow.containerEngine" + workflow_summary['containerEngine'] = workflow.containerEngine } if (workflow.container) { - workflow_summary['container'] = "$workflow.container" + workflow_summary['container'] = workflow.container } workflow_summary['launchDir'] = workflow.launchDir workflow_summary['workDir'] = workflow.workDir @@ -497,7 +261,7 @@ class NfcoreSchema { // Get pipeline parameters defined in JSON Schema def Map params_summary = [:] def blacklist = ['hostnames'] - def params_map = params_load(json_schema) + def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) for (group in params_map.keySet()) { def sub_params = new LinkedHashMap() def group_params = params_map.get(group) // This gets the parameters of that particular group @@ -506,17 +270,7 @@ class NfcoreSchema { def params_value = params.get(param) def schema_value = group_params.get(param).default def param_type = group_params.get(param).type - if (schema_value == null) { - if (param_type == 'boolean') { - schema_value = false - } - if (param_type == 'string') { - schema_value = '' - } - if (param_type == 'integer') { - schema_value = 0 - } - } else { + if (schema_value != null) { if (param_type == 'string') { if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) { def sub_string = schema_value.replace('\$projectDir', '') @@ -535,8 +289,13 @@ class NfcoreSchema { } } - if (params_value != schema_value) { - sub_params.put("$param", params_value) + // We have a default in the schema, and this isn't it + if (schema_value != null && params_value != schema_value) { + sub_params.put(param, params_value) + } + // No default in the schema, and this isn't empty + else if (schema_value == null && params_value != "" && params_value != null && params_value != false) { + sub_params.put(param, params_value) } } } @@ -545,27 +304,214 @@ class NfcoreSchema { return [ 'Core Nextflow options' : workflow_summary ] << params_summary } - /* - * Beautify parameters for summary and return as string - */ - private static String 
params_summary_log(workflow, params, json_schema) { + // + // Beautify parameters for summary and return as string + // + public static String paramsSummaryLog(workflow, params) { + Map colors = NfcoreTemplate.logColours(params.monochrome_logs) String output = '' - def params_map = params_summary_map(workflow, params, json_schema) - def max_chars = params_max_chars(params_map) + def params_map = paramsSummaryMap(workflow, params) + def max_chars = paramsMaxChars(params_map) for (group in params_map.keySet()) { def group_params = params_map.get(group) // This gets the parameters of that particular group if (group_params) { - output += group + '\n' + output += colors.bold + group + colors.reset + '\n' for (param in group_params.keySet()) { - output += " \u001B[1m" + param.padRight(max_chars) + ": \u001B[1m" + group_params.get(param) + '\n' + output += " " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n' } output += '\n' } } - output += "[Only displaying parameters that differ from pipeline default]\n" - output += dashed_line(params.monochrome_logs) - output += '\n\n' + dashed_line(params.monochrome_logs) + output += "!! Only displaying parameters that differ from the pipeline defaults !!\n" + output += NfcoreTemplate.dashedLine(params.monochrome_logs) return output } + // + // Loop over nested exceptions and print the causingException + // + private static void printExceptions(ex_json, params_json, log) { + def causingExceptions = ex_json['causingExceptions'] + if (causingExceptions.length() == 0) { + def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/ + // Missing required param + if (m.matches()) { + log.error "* Missing required parameter: --${m[0][1]}" + } + // Other base-level error + else if (ex_json['pointerToViolation'] == '#') { + log.error "* ${ex_json['message']}" + } + // Error with specific param + else { + def param = ex_json['pointerToViolation'] - ~/^#\// + def param_val = params_json[param].toString() + log.error "* --${param}: ${ex_json['message']} (${param_val})" + } + } + for (ex in causingExceptions) { + printExceptions(ex, params_json, log) + } + } + + // + // Remove an element from a JSONArray + // + private static JSONArray removeElement(json_array, element) { + def list = [] + int len = json_array.length() + for (int i=0;i + if(raw_schema.keySet().contains('definitions')){ + raw_schema.definitions.each { definition -> + for (key in definition.keySet()){ + if (definition[key].get("properties").keySet().contains(ignore_param)){ + // Remove the param to ignore + definition[key].get("properties").remove(ignore_param) + // If the param was required, change this + if (definition[key].has("required")) { + def cleaned_required = removeElement(definition[key].required, ignore_param) + definition[key].put("required", cleaned_required) + } + } + } + } + } + if(raw_schema.keySet().contains('properties') && raw_schema.get('properties').keySet().contains(ignore_param)) { + raw_schema.get("properties").remove(ignore_param) + } + if(raw_schema.keySet().contains('required') && raw_schema.required.contains(ignore_param)) { + def cleaned_required = removeElement(raw_schema.required, ignore_param) + raw_schema.put("required", cleaned_required) + } + } + return raw_schema + } + + // + // Clean and check parameters relative to Nextflow native classes + // + private static Map cleanParameters(params) { + def new_params = params.getClass().newInstance(params) + for (p in params) { + // remove anything evaluating 
to false + if (!p['value']) { + new_params.remove(p.key) + } + // Cast MemoryUnit to String + if (p['value'].getClass() == nextflow.util.MemoryUnit) { + new_params.replace(p.key, p['value'].toString()) + } + // Cast Duration to String + if (p['value'].getClass() == nextflow.util.Duration) { + new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day")) + } + // Cast LinkedHashMap to String + if (p['value'].getClass() == LinkedHashMap) { + new_params.replace(p.key, p['value'].toString()) + } + } + return new_params + } + + // + // This function tries to read a JSON params file + // + private static LinkedHashMap paramsLoad(String json_schema) { + def params_map = new LinkedHashMap() + try { + params_map = paramsRead(json_schema) + } catch (Exception e) { + println "Could not read parameters settings from JSON. $e" + params_map = new LinkedHashMap() + } + return params_map + } + + // + // Method to actually read in JSON file using Groovy. + // Group (as Key), values are all parameters + // - Parameter1 as Key, Description as Value + // - Parameter2 as Key, Description as Value + // .... + // Group + // - + private static LinkedHashMap paramsRead(String json_schema) throws Exception { + def json = new File(json_schema).text + def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') + def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') + /* Tree looks like this in nf-core schema + * definitions <- this is what the first get('definitions') gets us + group 1 + title + description + properties + parameter 1 + type + description + parameter 2 + type + description + group 2 + title + description + properties + parameter 1 + type + description + * properties <- parameters can also be ungrouped, outside of definitions + parameter 1 + type + description + */ + + // Grouped params + def params_map = new LinkedHashMap() + schema_definitions.each { key, val -> + def Map group = schema_definitions."$key".properties // Gets the property object of the group + def title = schema_definitions."$key".title + def sub_params = new LinkedHashMap() + group.each { innerkey, value -> + sub_params.put(innerkey, value) + } + params_map.put(title, sub_params) + } + + // Ungrouped params + def ungrouped_params = new LinkedHashMap() + schema_properties.each { innerkey, value -> + ungrouped_params.put(innerkey, value) + } + params_map.put("Other parameters", ungrouped_params) + + return params_map + } + + // + // Get maximum number of characters across all parameter names + // + private static Integer paramsMaxChars(params_map) { + Integer max_chars = 0 + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (param.size() > max_chars) { + max_chars = param.size() + } + } + } + return max_chars + } } diff --git a/lib/NfcoreTemplate.groovy b/lib/NfcoreTemplate.groovy new file mode 100755 index 00000000..44551e0a --- /dev/null +++ b/lib/NfcoreTemplate.groovy @@ -0,0 +1,270 @@ +// +// This file holds several functions used within the nf-core pipeline template. 
+// + +import org.yaml.snakeyaml.Yaml + +class NfcoreTemplate { + + // + // Check AWS Batch related parameters have been specified correctly + // + public static void awsBatch(workflow, params) { + if (workflow.profile.contains('awsbatch')) { + // Check params.awsqueue and params.awsregion have been set if running on AWSBatch + assert (params.awsqueue && params.awsregion) : "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + // Check outdir paths to be S3 buckets if running on AWSBatch + assert params.outdir.startsWith('s3:') : "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + } + } + + // + // Check params.hostnames + // + public static void hostName(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (params.hostnames) { + try { + def hostname = "hostname".execute().text.trim() + params.hostnames.each { prof, hnames -> + hnames.each { hname -> + if (hostname.contains(hname) && !workflow.profile.contains(prof)) { + log.info "=${colors.yellow}====================================================${colors.reset}=\n" + + "${colors.yellow}WARN: You are running with `-profile $workflow.profile`\n" + + " but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\n" + + " ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\n" + + "=${colors.yellow}====================================================${colors.reset}=" + } + } + } + } catch (Exception e) { + log.warn "[$workflow.manifest.name] Could not determine 'hostname' - skipping check. Reason: ${e.message}." + } + } + } + + // + // Construct and send completion email + // + public static void email(workflow, params, summary_params, projectDir, log, multiqc_report=[]) { + + // Set up the e-mail variables + def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + if (!workflow.success) { + subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + } + + def summary = [:] + for (group in summary_params.keySet()) { + summary << summary_params[group] + } + + def misc_fields = [:] + misc_fields['Date Started'] = workflow.start + misc_fields['Date Completed'] = workflow.complete + misc_fields['Pipeline script file path'] = workflow.scriptFile + misc_fields['Pipeline script hash ID'] = workflow.scriptId + if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository + if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId + if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build + misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp + + def email_fields = [:] + email_fields['version'] = workflow.manifest.version + email_fields['runName'] = workflow.runName + email_fields['success'] = workflow.success + email_fields['dateComplete'] = workflow.complete + email_fields['duration'] = workflow.duration + email_fields['exitStatus'] = workflow.exitStatus + email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') + email_fields['errorReport'] = (workflow.errorReport ?: 'None') + email_fields['commandLine'] = workflow.commandLine + email_fields['projectDir'] = workflow.projectDir + email_fields['summary'] = summary << misc_fields + + // On success try attach the multiqc report + def mqc_report = null + try { + if (workflow.success) { + mqc_report = multiqc_report.getVal() + if (mqc_report.getClass() == ArrayList && 
mqc_report.size() >= 1) { + if (mqc_report.size() > 1) { + log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + } + mqc_report = mqc_report[0] + } + } + } catch (all) { + if (multiqc_report) { + log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + } + } + + // Check if we are only sending emails on failure + def email_address = params.email + if (!params.email && params.email_on_fail && !workflow.success) { + email_address = params.email_on_fail + } + + // Render the TXT template + def engine = new groovy.text.GStringTemplateEngine() + def tf = new File("$projectDir/assets/email_template.txt") + def txt_template = engine.createTemplate(tf).make(email_fields) + def email_txt = txt_template.toString() + + // Render the HTML template + def hf = new File("$projectDir/assets/email_template.html") + def html_template = engine.createTemplate(hf).make(email_fields) + def email_html = html_template.toString() + + // Render the sendmail template + def max_multiqc_email_size = params.max_multiqc_email_size as nextflow.util.MemoryUnit + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def sf = new File("$projectDir/assets/sendmail_template.txt") + def sendmail_template = engine.createTemplate(sf).make(smail_fields) + def sendmail_html = sendmail_template.toString() + + // Send the HTML e-mail + Map colors = logColours(params.monochrome_logs) + if (email_address) { + try { + if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + // Try to send HTML e-mail using sendmail + [ 'sendmail', '-t' ].execute() << sendmail_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" + } catch (all) { + // Catch failures and try with plaintext + def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + if ( mqc_report.size() <= max_multiqc_email_size.toBytes() ) { + mail_cmd += [ '-A', mqc_report ] + } + mail_cmd.execute() << email_html + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + } + } + + // Write summary e-mail HTML to a file + def output_d = new File("${params.outdir}/pipeline_info/") + if (!output_d.exists()) { + output_d.mkdirs() + } + def output_hf = new File(output_d, "pipeline_report.html") + output_hf.withWriter { w -> w << email_html } + def output_tf = new File(output_d, "pipeline_report.txt") + output_tf.withWriter { w -> w << email_txt } + } + + // + // Print pipeline summary on completion + // + public static void summary(workflow, params, log) { + Map colors = logColours(params.monochrome_logs) + if (workflow.success) { + if (workflow.stats.ignoredCount == 0) { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" + } else { + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + } + } else { + hostName(workflow, params, log) + log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + } + + // + // ANSII Colours used for terminal logging + // + public static Map logColours(Boolean monochrome_logs) { + Map colorcodes = [:] + + // Reset / Meta + 
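+        // Usage sketch (illustrative comment, mirroring how this map is consumed by
+        // summary() and hostName() in this class): each code is '' when
+        // monochrome_logs is true, so callers can interpolate unconditionally, e.g.
+        //   log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-"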
colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" + colorcodes['bold'] = monochrome_logs ? '' : "\033[1m" + colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" + colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m" + colorcodes['blink'] = monochrome_logs ? '' : "\033[5m" + colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m" + colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" + + // Regular Colors + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + + // Bold + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + + // Underline + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + + // High Intensity + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + + // Bold High Intensity + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" + + return colorcodes + } + + // + // Does what is says on the tin + // + public static String dashedLine(monochrome_logs) { + Map colors = logColours(monochrome_logs) + return "-${colors.dim}----------------------------------------------------${colors.reset}-" + } + + // + // nf-core logo + // + public static String logo(workflow, monochrome_logs) { + Map colors = logColours(monochrome_logs) + String.format( + """\n + ${dashedLine(monochrome_logs)} + ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} + ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} + ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} + ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} + ${colors.green}`._,._,\'${colors.reset} + ${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset} + ${dashedLine(monochrome_logs)} + """.stripIndent() + ) + } +} diff --git a/lib/Utils.groovy b/lib/Utils.groovy new file mode 100644 index 00000000..bcb958c8 --- /dev/null +++ b/lib/Utils.groovy @@ -0,0 +1,48 @@ +// +// This file holds several Groovy functions that could be useful for any Nextflow pipeline +// + +import org.yaml.snakeyaml.Yaml + +class Utils { + + // + // When running with -profile conda, warn if channels have not been set-up appropriately + // + public static void checkCondaChannels(log) { + Yaml parser = new Yaml() + def channels = [] + try { + def config = parser.load('conda config --show channels'.execute().text) + channels = config.channels + } catch (NullPointerException | IOException e) { + log.warn 'Could not verify conda channel configuration.' + return + } + + // Check that all channels are present + def required_channels = ['conda-forge', 'bioconda', 'defaults'] + def conda_check_failed = !required_channels.every { ch -> ch in channels } + + // Check that they are in the right order + conda_check_failed |= !(channels.indexOf('conda-forge') < channels.indexOf('bioconda')) + conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults')) + + if (conda_check_failed) { + log.warn '=============================================================================\n' + + ' There is a problem with your Conda configuration!\n\n' + + ' You will need to set-up the conda-forge and bioconda channels correctly.\n' + + ' Please refer to https://bioconda.github.io/user/install.html#set-up-channels\n' + + ' NB: The order of the channels matters!\n' + + '===================================================================================' + } + } + + // + // Join module args with appropriate spacing + // + public static String joinModuleArgs(args_list) { + return ' ' + args_list.join(' ') + } + +} diff --git a/lib/Workflow.groovy b/lib/Workflow.groovy new file mode 100644 index 00000000..5a7f70e0 --- /dev/null +++ b/lib/Workflow.groovy @@ -0,0 +1,139 @@ +/* + * This file holds several functions specific to the pipeline. 
+ */ + +class Workflow { + + // Citation string + private static String citation(workflow) { + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + + '* The pipeline\n' + + ' https://doi.org/10.5281/zenodo.3901628\n\n' + + '* The nf-core framework\n' + + ' https://doi.org/10.1038/s41587-020-0439-x\n\n' + + '* Software dependencies\n' + + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + } + + // Exit pipeline if incorrect --genome key provided + static void genomeExists(params, log) { + if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { + log.error '=============================================================================\n' + + " Genome '${params.genome}' not found in any config files provided to the pipeline.\n" + + ' Currently, the available genome keys are:\n' + + " ${params.genomes.keySet().join(', ')}\n" + + '===================================================================================' + System.exit(1) + } + } + + /* + * Get workflow summary for MultiQC + */ + static String paramsSummaryMultiqc(workflow, summary) { + String summary_section = '' + for (group in summary.keySet()) { + def group_params = summary.get(group) // This gets the parameters of that particular group + if (group_params) { + summary_section += "

<p style=\"font-size:110%\"><b>$group</b></p>\n"
+                summary_section += "    <dl class=\"dl-horizontal\">\n"
+                for (param in group_params.keySet()) {
+                    summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: 'N/A'}</samp></dd>\n"
+                }
+                summary_section += '    </dl>\n'
+            }
+        }
+
+        String yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n"
+        yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n"
+        yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n"
+        yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n"
+        yaml_file_text += "plot_type: 'html'\n"
+        yaml_file_text += 'data: |\n'
+        yaml_file_text += "${summary_section}"
+        return yaml_file_text
+    }
+
+    /*
+     * Format the protocol
+     * Given the protocol parameter (params.protocol) and the aligner (params.aligner),
+     * this function formats the protocol such that it is fit for the respective
+     * subworkflow
+     */
+    static formatProtocol(protocol, aligner) {
+        String new_protocol = protocol
+        String chemistry = ''
+
+        // alevin
+        if (aligner == 'alevin') {
+            switch (protocol) {
+                case '10XV1':
+                    new_protocol = 'chromium'
+                    chemistry = 'V1'
+                    break
+                case '10XV2':
+                    new_protocol = 'chromium'
+                    chemistry = 'V2'
+                    break
+                case '10XV3':
+                    new_protocol = 'chromiumV3'
+                    chemistry = 'V3'
+                    break
+                case 'dropseq':
+                    new_protocol = 'dropseq'
+            }
+        }
+
+        // star
+        else if (aligner == 'star') {
+            switch (protocol) {
+                case '10XV1':
+                    new_protocol = 'CB_UMI_Simple'
+                    chemistry = 'V1'
+                    break
+                case '10XV2':
+                    new_protocol = 'CB_UMI_Simple'
+                    chemistry = 'V2'
+                    break
+                case '10XV3':
+                    new_protocol = 'CB_UMI_Simple'
+                    chemistry = 'V3'
+                    break
+                case 'dropseq':
+                    new_protocol = 'CB_UMI_Simple'
+                    break
+                case 'smartseq':
+                    new_protocol = 'SmartSeq'
+            }
+        }
+
+        // kallisto bustools
+        else if (aligner == 'kallisto') {
+            switch (protocol) {
+                case '10XV1':
+                    new_protocol = '10XV1'
+                    chemistry = 'V1'
+                    break
+                case '10XV2':
+                    new_protocol = '10XV2'
+                    chemistry = 'V2'
+                    break
+                case '10XV3':
+                    new_protocol = '10XV3'
+                    chemistry = 'V3'
+                    break
+                case 'dropseq':
+                    new_protocol = 'DROPSEQ'
+                    break
+                case 'smartseq':
+                    new_protocol = 'SMARTSEQ'
+            }
+        }
+        else {
+            println 'Aligner not recognized.'
+            System.exit(1)
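+            // Illustrative usage sketch (an assumption for clarity, not taken from
+            // the pipeline source):
+            //   Workflow.formatProtocol('10XV2', 'star')       // -> ['CB_UMI_Simple', 'V2']
+            //   Workflow.formatProtocol('dropseq', 'kallisto') // -> ['DROPSEQ', '']
+            // Only 'alevin', 'star' and 'kallisto' are recognized; any other value
+            // ends up in this branch and terminates the run.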
+ } + + return [new_protocol, chemistry] + } + +} diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy new file mode 100755 index 00000000..2dd7fe02 --- /dev/null +++ b/lib/WorkflowMain.groovy @@ -0,0 +1,94 @@ +// +// This file holds several functions specific to the main.nf workflow in the nf-core/scrnaseq pipeline +// + +class WorkflowMain { + + // + // Citation string for pipeline + // + public static String citation(workflow) { + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + + // TODO nf-core: Add Zenodo DOI for pipeline after first release + //"* The pipeline\n" + + //" https://doi.org/10.5281/zenodo.XXXXXXX\n\n" + + "* The nf-core framework\n" + + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + + "* Software dependencies\n" + + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + } + + // + // Print help to screen if required + // + public static String help(workflow, params, log) { + def command = "nextflow run ${workflow.manifest.name} --input samplesheet.csv --genome GRCh37 -profile docker" + def help_string = '' + help_string += NfcoreTemplate.logo(workflow, params.monochrome_logs) + help_string += NfcoreSchema.paramsHelp(workflow, params, command) + help_string += '\n' + citation(workflow) + '\n' + help_string += NfcoreTemplate.dashedLine(params.monochrome_logs) + return help_string + } + + // + // Print parameter summary log to screen + // + public static String paramsSummaryLog(workflow, params, log) { + def summary_log = '' + summary_log += NfcoreTemplate.logo(workflow, params.monochrome_logs) + summary_log += NfcoreSchema.paramsSummaryLog(workflow, params) + summary_log += '\n' + citation(workflow) + '\n' + summary_log += NfcoreTemplate.dashedLine(params.monochrome_logs) + return summary_log + } + + // + // Validate parameters and print summary to screen + // + public static void initialise(workflow, params, log) { + // Print help to screen if required + if (params.help) { + log.info help(workflow, params, log) + System.exit(0) + } + + // Validate workflow parameters via the JSON schema + if (params.validate_params) { + NfcoreSchema.validateParameters(workflow, params, log) + } + + // Print parameter summary log to screen + log.info paramsSummaryLog(workflow, params, log) + + // Check that conda channels are set-up correctly + if (params.enable_conda) { + Utils.checkCondaChannels(log) + } + + // Check AWS batch settings + NfcoreTemplate.awsBatch(workflow, params) + + // Check the hostnames against configured profiles + NfcoreTemplate.hostName(workflow, params, log) + + // Check input has been provided + if (!params.input) { + log.error "Please provide an input samplesheet to the pipeline e.g. '--input samplesheet.csv'" + System.exit(1) + } + } + + // + // Get attribute from genome config file e.g. 
fasta + // + public static String getGenomeAttribute(params, attribute) { + def val = '' + if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) { + if (params.genomes[ params.genome ].containsKey(attribute)) { + val = params.genomes[ params.genome ][ attribute ] + } + } + return val + } +} diff --git a/lib/WorkflowScrnaseq.groovy b/lib/WorkflowScrnaseq.groovy new file mode 100755 index 00000000..d3e4414a --- /dev/null +++ b/lib/WorkflowScrnaseq.groovy @@ -0,0 +1,60 @@ +// +// This file holds several functions specific to the workflow/scrnaseq.nf in the nf-core/scrnaseq pipeline +// + +class WorkflowScrnaseq { + + // + // Check and validate parameters + // + public static void initialise(params, log) { + genomeExistsError(params, log) + + if (!params.genome_fasta) { + log.error "Genome fasta file not specified with e.g. '--genome_fasta genome.fa' or via a detectable config file." + System.exit(1) + } + } + + // + // Get workflow summary for MultiQC + // + public static String paramsSummaryMultiqc(workflow, summary) { + String summary_section = '' + for (group in summary.keySet()) { + def group_params = summary.get(group) // This gets the parameters of that particular group + if (group_params) { + summary_section += "

<p style=\"font-size:110%\"><b>$group</b></p>\n"
+                summary_section += "    <dl class=\"dl-horizontal\">\n"
+                for (param in group_params.keySet()) {
+                    summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: 'N/A'}</samp></dd>\n"
+                }
+                summary_section += '    </dl>
\n' + } + } + + String yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += 'data: |\n' + yaml_file_text += "${summary_section}" + return yaml_file_text + } + + // + // Exit pipeline if incorrect --genome key provided + // + private static void genomeExistsError(params, log) { + if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { + log.error '=============================================================================\n' + + " Genome '${params.genome}' not found in any config files provided to the pipeline.\n" + + ' Currently, the available genome keys are:\n' + + " ${params.genomes.keySet().join(', ')}\n" + + '===================================================================================' + System.exit(1) + } + } + +} diff --git a/main.nf b/main.nf index 44f38bbe..f2efec01 100644 --- a/main.nf +++ b/main.nf @@ -1,839 +1,58 @@ #!/usr/bin/env nextflow /* ======================================================================================== - nf-core/scrnaseq + nf-core/scrnaseq ======================================================================================== - nf-core/scrnaseq Analysis Pipeline. - #### Homepage / Documentation - https://github.com/nf-core/scrnaseq + Github : https://github.com/nf-core/scrnaseq + Website: https://nf-co.re/scrnaseq + Slack : https://nfcore.slack.com/channels/scrnaseq ---------------------------------------------------------------------------------------- */ -log.info Headers.nf_core(workflow, params.monochrome_logs) +nextflow.enable.dsl = 2 -//////////////////////////////////////////////////// -/* -- PRINT HELP -- */ -////////////////////////////////////////////////////+ -def json_schema = "$projectDir/nextflow_schema.json" -if (params.help) { - def command = "nextflow run nf-core/scrnaseq --input '*_R{1,2}.fastq.gz' -profile docker" - log.info NfcoreSchema.params_help(workflow, params, json_schema, command) - exit 0 -} - -//////////////////////////////////////////////////// -/* -- VALIDATE PARAMETERS -- */ -////////////////////////////////////////////////////+ -if (params.validate_params) { - NfcoreSchema.validateParameters(params, json_schema, log) -} - -//////////////////////////////////////////////////// -/* -- Collect configuration parameters -- */ -//////////////////////////////////////////////////// - -// Check if genome exists in the config file -if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { - exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(', ')}" -} - -//Check if one of the available aligners is used (alevin, kallisto, star) -if (params.aligner != 'star' && params.aligner != 'alevin' && params.aligner != 'kallisto'){ - exit 1, "Invalid aligner option: ${params.aligner}. 
Valid options: 'star', 'alevin', 'kallisto'" -} -//Check if STAR index is supplied properly -if( params.star_index && params.aligner == 'star' ){ - star_index = Channel - .fromPath(params.star_index) - .ifEmpty { exit 1, "STAR index not found: ${params.star_index}" } -} - -if (params.aligner == 'star' && (!params.star_index && (!params.gtf || !params.fasta))){ - exit 1, "STAR needs either a GTF + FASTA or a precomputed index supplied." -} - -//Sanity check Kallisto behaviour -if ( params.aligner == 'kallisto' && !( params.kallisto_index || ((params.fasta || params.transcript_fasta) && params.gtf ))) { - exit 1, "Kallisto needs either a precomputed index or a FASTA + GTF file to run!" -} - -//Check if GTF is supplied properly -if( params.gtf ){ - Channel - .fromPath(params.gtf) - .ifEmpty { exit 1, "GTF annotation file not found: ${params.gtf}" } - .into { gtf_extract_transcriptome; gtf_alevin; gtf_makeSTARindex; gtf_star; gtf_gene_map } -} - -//Check if TXP2Gene is provided for Alevin -if (!params.gtf && !params.txp2gene && params.aligner == 'alevin'){ - exit 1, "Must provide either a GTF file ('--gtf') or transcript to gene mapping ('--txp2gene') to align with Alevin" -} - -//Setup FastA channels -if( params.fasta ){ - Channel - .fromPath(params.fasta) - .ifEmpty { exit 1, "Fasta file not found: ${params.fasta}" } - .into { genome_fasta_extract_transcriptome ; genome_fasta_makeSTARindex } -} else { - genome_fasta_extract_transcriptome = Channel.empty() - genome_fasta_makeSTARindex = Channel.empty() -} -//Setup Transcript FastA channels -if( params.transcript_fasta ){ - if( params.aligner == "star" && !params.fasta) { - exit 1, "Transcriptome-only alignment is not valid with the aligner: ${params.aligner}. Transcriptome-only alignment is only valid with '--aligner alevin'" - } - Channel - .fromPath(params.transcript_fasta) - .ifEmpty { exit 1, "Fasta file not found: ${params.transcript_fasta}" } - .into { transcriptome_fasta_alevin; transcriptome_fasta_kallisto } -} else { - transcriptome_fasta_alevin = Channel.empty() - transcriptome_fasta_kallisto = Channel.empty() -} - -//Setup channel for salmon index if specified -if (params.aligner == 'alevin' && params.salmon_index) { - Channel - .fromPath(params.salmon_index) - .ifEmpty { exit 1, "Salmon index not found: ${params.salmon_index}" } - .set { salmon_index_alevin } -} - -// Check AWS batch settings -if (workflow.profile.contains('awsbatch')) { - // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, 'Specify correct --awsqueue and --awsregion parameters on AWSBatch!' - // Check outdir paths to be S3 buckets if running on AWSBatch - // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, 'Outdir not on S3 - specify S3 Bucket to run on AWSBatch!' - // Prevent trace files to be stored on S3 since S3 does not support rolling files. - if (params.tracedir.startsWith('s3:')) exit 1, 'Specify a local tracedir or run without trace! S3 cannot be used for tracefiles.' -} - -// Stage config files -ch_multiqc_config = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true) -ch_multiqc_custom_config = params.multiqc_config ? 
Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() -ch_output_docs = file("$projectDir/docs/output.md", checkIfExists: true) -ch_output_docs_images = file("$projectDir/docs/images/", checkIfExists: true) - -/* - * Create a channel for input read files - */ - - if(params.input_paths){ - Channel - .from(params.input_paths) - .map { row -> [ row[0], [file(row[1][0]), file(row[1][1])]] } - .ifEmpty { exit 1, "params.input_paths was empty - no input files supplied" } - .into { read_files_alevin; read_files_star; read_files_kallisto} - } else { - Channel - .fromFilePairs( params.input ) - .ifEmpty { exit 1, "Cannot find any reads matching: ${params.input}\nNB: Path needs to be enclosed in quotes!\nNB: Path requires at least one * wildcard!\n" } - .into { read_files_alevin; read_files_star; read_files_kallisto } -} - -//Whitelist files for STARsolo and Kallisto -whitelist_folder = "$baseDir/assets/whitelist/" - -//Automatically set up proper filepaths to the barcode whitelist files bundled with the pipeline -if (params.type == "10x" && !params.barcode_whitelist){ - barcode_filename = "$whitelist_folder/${params.type}_${params.chemistry}_barcode_whitelist.txt.gz" - Channel.fromPath(barcode_filename) - .ifEmpty{ exit 1, "Cannot find ${params.type} barcode whitelist: $barcode_filename" } - .set{ barcode_whitelist_gzipped } - Channel.empty().into{ barcode_whitelist_star; barcode_whitelist_kallisto; barcode_whitelist_alevinqc } -} else if (params.barcode_whitelist){ - Channel.fromPath(params.barcode_whitelist) - .ifEmpty{ exit 1, "Cannot find ${params.type} barcode whitelist: $barcode_filename" } - .into{ barcode_whitelist_star; barcode_whitelist_kallisto; barcode_whitelist_alevinqc } - barcode_whitelist_gzipped = Channel.empty() -} - -//////////////////////////////////////////////////// -/* -- PRINT PARAMETER SUMMARY -- */ -//////////////////////////////////////////////////// -log.info NfcoreSchema.params_summary_log(workflow, params, json_schema) - -// Header log info -def summary = [:] -if (workflow.revision) summary['Pipeline Release'] = workflow.revision -summary['Run Name'] = workflow.runName -summary['Reads'] = params.input -if(params.fasta) summary['Genome Reference'] = params.fasta -summary['GTF Reference'] = params.gtf -summary['Save Reference?'] = params.save_reference -summary['Aligner'] = params.aligner -if (params.salmon_index) summary['Salmon Index'] = params.salmon_index -if (params.star_index) summary['STARsolo Index'] = params.star_index -if (params.kallisto_index) summary['Kallisto Index'] = params.kallisto_index -summary['Droplet Technology'] = params.type -summary['Chemistry Version'] = params.chemistry -if (params.aligner == 'alevin') summary['Alevin TXP2Gene'] = params.txp2gene -if (params.aligner == 'kallisto') summary['Kallisto Gene Map'] = params.kallisto_gene_map -if(params.aligner == 'kallisto') summary['BUSTools Correct'] = params.bustools_correct -summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" -if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" -summary['Output dir'] = params.outdir -summary['Launch dir'] = workflow.launchDir -summary['Working dir'] = workflow.workDir -summary['Script dir'] = workflow.projectDir -summary['User'] = workflow.userName -if (workflow.profile.contains('awsbatch')) { - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue - summary['AWS CLI'] = params.awscli -} -summary['Config 
Profile'] = workflow.profile -if (params.config_profile_description) summary['Config Profile Description'] = params.config_profile_description -if (params.config_profile_contact) summary['Config Profile Contact'] = params.config_profile_contact -if (params.config_profile_url) summary['Config Profile URL'] = params.config_profile_url -summary['Config Files'] = workflow.configFiles.join(', ') -if (params.email || params.email_on_fail) { - summary['E-mail Address'] = params.email - summary['E-mail on failure'] = params.email_on_fail - summary['MultiQC maxsize'] = params.max_multiqc_email_size -} - -// Check the hostnames against configured profiles -checkHostname() - -Channel.from(summary.collect{ [it.key, it.value] }) - .map { k,v -> "
<dt>$k</dt><dd><samp>${v ?: 'N/A'}</samp></dd>
" } - .reduce { a, b -> return [a, b].join("\n ") } - .map { x -> """ - id: 'nf-core-scrnaseq-summary' - description: " - this information is collected when the pipeline is started." - section_name: 'nf-core/scrnaseq Workflow Summary' - section_href: 'https://github.com/nf-core/scrnaseq' - plot_type: 'html' - data: | -
<dl class=\"dl-horizontal\">
-            $x
-        </dl>
- """.stripIndent() } - .set { ch_workflow_summary } - -/* - * Parse software version numbers - */ -process get_software_versions { - publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, - saveAs: { filename -> - if (filename.indexOf('.csv') > 0) filename - else null - } - - output: - file 'software_versions_mqc.yaml' into ch_software_versions_yaml - file 'software_versions.csv' - - script: - """ - echo $workflow.manifest.version &> v_pipeline.txt - echo $workflow.nextflow.version &> v_nextflow.txt - salmon --version &> v_salmon.txt 2>&1 || true - STAR --version &> v_star.txt 2>&1 || true - multiqc --version &> v_multiqc.txt 2>&1 || true - kallisto version &> v_kallisto.txt 2>&1 || true - bustools &> v_bustools.txt 2>&1 || true - - scrape_software_versions.py > software_versions_mqc.yaml - """ -} - -/* -* Preprocessing - Unzip 10X barcodes if they are supplied compressed -*/ -process unzip_10x_barcodes { - tag "${params.chemistry}" - publishDir "${params.outdir}/reference_data/barcodes", mode: 'copy' - - input: - file gzipped from barcode_whitelist_gzipped - - output: - file "${gzipped.simpleName}" into (barcode_whitelist_star_unzip, barcode_whitelist_kallisto_unzip, barcode_whitelist_alevinqc_unzip) - - when: params.type == '10x' && !params.barcode_whitelist - - script: - """ - gunzip -c $gzipped > ${gzipped.simpleName} - """ -} - - -/* - * Preprocessing - Extract transcriptome fasta from genome fasta - */ - -process extract_transcriptome { - tag "${genome_fasta}" - publishDir "${params.outdir}/reference_data/extract_transcriptome", mode: 'copy' - - input: - file genome_fasta from genome_fasta_extract_transcriptome - file gtf from gtf_extract_transcriptome - - output: - file "${genome_fasta}.transcriptome.fa" into (transcriptome_fasta_alevin_extracted, transcriptome_fasta_kallisto_extracted) - - when: !params.transcript_fasta && (params.aligner == 'alevin' || params.aligner == 'kallisto') - script: - // -F to preserve all GTF attributes in the fasta ID - """ - gffread -F $gtf -w "${genome_fasta}.transcriptome.fa" -g $genome_fasta - """ -} /* - * STEP 1 - Make_index - */ - -process build_salmon_index { - tag "$fasta" - label 'low_memory' - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome/salmon_index" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' - - when: - params.aligner == 'alevin' && !params.salmon_index - - input: - file fasta from transcriptome_fasta_alevin.mix(transcriptome_fasta_alevin_extracted) - - output: - file "salmon_index" into salmon_index_alevin - - when: params.aligner == 'alevin' && !params.salmon_index - - script: - """ - salmon index -i salmon_index --gencode -k 31 -p 4 -t $fasta - """ -} - - -//Create a STAR index if not supplied via --star_index -process makeSTARindex { - label 'high_memory' - tag "$fasta" - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome/star_index" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' - - input: - file fasta from genome_fasta_makeSTARindex - file gtf from gtf_makeSTARindex - - output: - file "star" into star_index - - when: params.aligner == 'star' && !params.star_index && params.fasta - - script: - def avail_mem = task.memory ? 
"--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : '' - """ - mkdir star - STAR \\ - --runMode genomeGenerate \\ - --runThreadN ${task.cpus} \\ - --sjdbGTFfile $gtf \\ - --genomeDir star/ \\ - --genomeFastaFiles $fasta \\ - $avail_mem - """ -} - -/* -* Preprocessing - Generate Kallisto Index if not supplied via --kallisto_index -*/ -process build_kallisto_index { - tag "$fasta" - label 'mid_memory' - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome/kallisto_index" : params.outdir }, - saveAs: { params.save_reference ? it : null }, mode: 'copy' - input: - file fasta from transcriptome_fasta_kallisto.mix(transcriptome_fasta_kallisto_extracted) - - output: - file "${name}.idx" into kallisto_index - - when: params.aligner == 'kallisto' && !params.kallisto_index - - script: - if("${fasta}".endsWith('.gz')){ - name = "${fasta.baseName}" - unzip = "gunzip -f ${fasta}" - } else { - unzip = "" - name = "${fasta}" - } - """ - $unzip - kallisto index -i ${name}.idx -k 31 $name - """ -} - -/* -* Preprocessing - Generate a Kallisto Gene Map if not supplied via --kallisto_gene_map -*/ -process build_gene_map{ - tag "$gtf" - publishDir "${params.outdir}/kallisto/kallisto_gene_map", mode: 'copy' - - input: - file gtf from gtf_gene_map - - output: - file "transcripts_to_genes.txt" into kallisto_gene_map - - when: params.aligner == 'kallisto' && !params.kallisto_gene_map - - script: - if("${gtf}".endsWith('.gz')){ - name = "${gtf.baseName}" - unzip = "gunzip -f ${gtf}" - } else { - unzip = "" - name = "${gtf}" - } - """ - $unzip - cat $name | t2g.py --use_version > transcripts_to_genes.txt - """ -} - - -/* -* Preprocessing - Generate TXP2Gene if not supplied via --txp2gene -*/ -process build_txp2gene { - tag "$gtf" - publishDir "${params.outdir}/reference_data/alevin/", mode: 'copy' - - input: - file gtf from gtf_alevin - - output: - file "txp2gene.tsv" into txp2gene - - when: - params.aligner == 'alevin' && !params.txp2gene - - script: - - """ - bioawk -c gff '\$feature=="transcript" {print \$group}' $gtf | awk -F ' ' '{print substr(\$4,2,length(\$4)-3) "\t" substr(\$2,2,length(\$2)-3)}' > txp2gene.tsv - """ -} - -/* -* Run Salmon Alevin -*/ -process alevin { - tag "$name" - label 'high_memory' - publishDir "${params.outdir}/alevin/alevin", mode: 'copy' - - input: - set val(name), file(reads) from read_files_alevin - file index from salmon_index_alevin.collect() - file txp2gene from txp2gene.collect() - - output: - file "${name}_alevin_results" into alevin_results, alevin_logs - - when: - params.aligner == "alevin" - - script: - read1 = reads[0] - read2 = reads[1] - """ - salmon alevin -l ISR -1 ${read1} -2 ${read2} \ - --chromium -i $index -o ${name}_alevin_results -p ${task.cpus} --tgMap $txp2gene --dumpFeatures –-dumpMtx - """ -} - -/* -* Run Alevin QC -*/ - -process alevin_qc { - tag "$prefix" - publishDir "${params.outdir}/alevin/alevin_qc", mode: 'copy' - - when: - params.aligner == "alevin" - - input: - file result from alevin_results - file whitelist from barcode_whitelist_alevinqc.mix(barcode_whitelist_alevinqc_unzip).collect() - - output: - file "${result}" into alevinqc_results - - script: - prefix = result.toString() - '_alevin_results' - """ - mv $whitelist ${result}/alevin/whitelist.txt - alevin_qc.r $result ${prefix} $result - """ -} - -process star { - label 'high_memory' - - tag "$prefix" - publishDir "${params.outdir}/STAR", mode: 'copy' - - input: - set val(samplename), file(reads) from read_files_star - file index from star_index.collect() - 
file gtf from gtf_star.collect() - file whitelist from barcode_whitelist_star.mix(barcode_whitelist_star_unzip).collect() - - output: - set file("*Log.final.out"), file ('*.bam') into star_aligned - file "*.out" into alignment_logs - file "*SJ.out.tab" - file "*Log.out" into star_log - file "${prefix}Aligned.sortedByCoord.out.bam.bai" into bam_index_rseqc, bam_index_genebody - - when: params.aligner == 'star' - - script: - prefix = reads[0].toString() - ~/(_R1)?(_trimmed)?(_val_1)?(\.fq)?(\.fastq)?(\.gz)?$/ - def star_mem = task.memory ?: params.star_memory ?: false - def avail_mem = star_mem ? "--limitBAMsortRAM ${star_mem.toBytes() - 100000000}" : '' - - seq_center = params.seq_center ? "--outSAMattrRGline ID:$prefix 'CN:$params.seq_center'" : '' - cdna_read = reads[0] - barcode_read = reads[1] - """ - STAR --genomeDir $index \\ - --sjdbGTFfile $gtf \\ - --readFilesIn $barcode_read $cdna_read \\ - --runThreadN ${task.cpus} \\ - --twopassMode Basic \\ - --outWigType bedGraph \\ - --outSAMtype BAM SortedByCoordinate $avail_mem \\ - --readFilesCommand zcat \\ - --runDirPerm All_RWX \\ - --outFileNamePrefix $prefix $seq_center \\ - --soloType Droplet \\ - --soloCBwhitelist $whitelist - - samtools index ${prefix}Aligned.sortedByCoord.out.bam - """ -} - -// get output flattened for other downstream processes -star_aligned - .flatMap { logs, bams -> bams } -.into { bam_count; bam_rseqc; bam_preseq; bam_markduplicates; bam_htseqcount; bam_stringtieFPKM; bam_for_genebody; bam_dexseq; leafcutter_bam } - -/* -* Run Kallisto Workflow +======================================================================================== + VALIDATE & PRINT PARAMETER SUMMARY +======================================================================================== */ -process kallisto { - tag "$name" - label 'mid_memory' - publishDir "${params.outdir}/kallisto/raw_bus", mode: 'copy' - - input: - set val(name), file(reads) from read_files_kallisto - file index from kallisto_index.collect() - output: - file "${name}_bus_output" into kallisto_bus_to_sort - file "${name}_kallisto.log" into kallisto_log_for_multiqc +WorkflowMain.initialise(workflow, params, log) - when: params.aligner == 'kallisto' - script: - """ - kallisto bus \\ - -i $index \\ - -o ${name}_bus_output/ \\ - -x ${params.type}${params.chemistry} \\ - -t ${task.cpus} \\ - $reads | tee ${name}_kallisto.log - """ -} /* -* Run BUSTools Correct / Sort on Kallisto Output +======================================================================================== + NAMED WORKFLOW FOR PIPELINE +======================================================================================== */ -process bustools_correct_sort{ - tag "$bus" - label 'mid_memory' - publishDir "${params.outdir}/kallisto/sort_bus", mode: 'copy' - input: - file bus from kallisto_bus_to_sort - file whitelist from barcode_whitelist_kallisto.mix(barcode_whitelist_kallisto_unzip).collect() +include { SCRNASEQ } from './workflows/scrnaseq' - output: - file bus into (kallisto_corrected_sort_to_count, kallisto_corrected_sort_to_metrics) +// +// WORKFLOW: Run main scrnaseq analysis pipeline +// - when: !params.skip_bustools - - script: - if(params.bustools_correct) { - correct = "bustools correct -w $whitelist -o ${bus}/output.corrected.bus ${bus}/output.bus" - sort_file = "${bus}/output.corrected.bus" - } else { - correct = "" - sort_file = "${bus}/output.bus" - } - """ - $correct - mkdir -p tmp - bustools sort -T tmp/ -t ${task.cpus} -m ${task.memory.toGiga()}G -o ${bus}/output.corrected.sort.bus 
$sort_file - """ +workflow NFCORE_SCRNASEQ{ + SCRNASEQ() } /* -* Run BUSTools count on sorted/corrected output from Kallisto|Bus +======================================================================================== + RUN ALL WORKFLOWS +======================================================================================== */ -process bustools_count{ - tag "$bus" - label 'mid_memory' - publishDir "${params.outdir}/kallisto/bustools_counts", mode: "copy" - - input: - file bus from kallisto_corrected_sort_to_count - file t2g from kallisto_gene_map.collect() - - output: - file "${bus}_eqcount" - file "${bus}_genecount" - script: - """ - mkdir -p ${bus}_eqcount - mkdir -p ${bus}_genecount - bustools count -o ${bus}_eqcount/tcc -g $t2g -e ${bus}/matrix.ec -t ${bus}/transcripts.txt ${bus}/output.corrected.sort.bus - bustools count -o ${bus}_genecount/gene -g $t2g -e ${bus}/matrix.ec -t ${bus}/transcripts.txt --genecounts ${bus}/output.corrected.sort.bus - """ +// +// WORKFLOW: Execute a single named workflow for the pipeline +// See: https://github.com/nf-core/rnaseq/issues/619 +// +workflow { + NFCORE_SCRNASEQ () } /* -* Run Bustools inspect +======================================================================================== + THE END +======================================================================================== */ - -process bustools_inspect{ - tag "$bus" - publishDir "${params.outdir}/kallisto/bustools_metrics", mode: "copy" - - input: - file bus from kallisto_corrected_sort_to_metrics - - output: - file "${bus}.json" - - script: - """ - bustools inspect -o ${bus}.json ${bus}/output.corrected.sort.bus - """ -} - -/* - * Run MultiQC on results / logfiles - */ -process multiqc { - publishDir "${params.outdir}/MultiQC", mode: params.publish_dir_mode - - input: - file multiqc_config from ch_multiqc_config - file ('software_versions/*') from ch_software_versions_yaml - file workflow_summary from ch_workflow_summary.collectFile(name: "workflow_summary_mqc.yaml") - file ('STAR/*') from star_log.collect().ifEmpty([]) - file ('alevin/*') from alevin_logs.collect().ifEmpty([]) - file ('kallisto/*') from kallisto_log_for_multiqc.collect().ifEmpty([]) - - output: - file "*multiqc_report.html" into ch_multiqc_report - file "*_data" - - script: - rtitle = '' - rfilename = '' - if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - rtitle = "--title \"${workflow.runName}\"" - rfilename = "--filename " + workflow.runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" - } - custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' - - """ - multiqc -f $rtitle $rfilename $custom_config_file \ - -m custom_content -m salmon -m star -m kallisto . 
- """ -} - -/* - * Output Description HTML - */ -process output_documentation { - publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode - - input: - file output_docs from ch_output_docs - file images from ch_output_docs_images - - output: - file 'results_description.html' - - script: - """ - markdown_to_html.py $output_docs -o results_description.html - """ -} - -/* - * Completion e-mail notification - */ -workflow.onComplete { - - // Set up the e-mail variables - def subject = "[nf-core/scrnaseq] Successful: $workflow.runName" - if (!workflow.success) { - subject = "[nf-core/scrnaseq] FAILED: $workflow.runName" - } - def email_fields = [:] - email_fields['version'] = workflow.manifest.version - email_fields['runName'] = workflow.runName - email_fields['success'] = workflow.success - email_fields['dateComplete'] = workflow.complete - email_fields['duration'] = workflow.duration - email_fields['exitStatus'] = workflow.exitStatus - email_fields['errorMessage'] = (workflow.errorMessage ?: 'None') - email_fields['errorReport'] = (workflow.errorReport ?: 'None') - email_fields['commandLine'] = workflow.commandLine - email_fields['projectDir'] = workflow.projectDir - email_fields['summary'] = summary - email_fields['summary']['Date Started'] = workflow.start - email_fields['summary']['Date Completed'] = workflow.complete - email_fields['summary']['Pipeline script file path'] = workflow.scriptFile - email_fields['summary']['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) email_fields['summary']['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) email_fields['summary']['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) email_fields['summary']['Pipeline Git branch/tag'] = workflow.revision - email_fields['summary']['Nextflow Version'] = workflow.nextflow.version - email_fields['summary']['Nextflow Build'] = workflow.nextflow.build - email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - - // On success try attach the multiqc report - def mqc_report = null - try { - if (workflow.success) { - mqc_report = ch_multiqc_report.getVal() - if (mqc_report.getClass() == ArrayList) { - log.warn "[nf-core/scrnaseq] Found multiple reports from process 'multiqc', will use only one" - mqc_report = mqc_report[0] - } - } - } catch (all) { - log.warn "[nf-core/scrnaseq] Could not attach MultiQC report to summary email" - } - - // Check if we are only sending emails on failure - email_address = params.email - if (!params.email && params.email_on_fail && !workflow.success) { - email_address = params.email_on_fail - } - - // Render the TXT template - def engine = new groovy.text.GStringTemplateEngine() - def tf = new File("$projectDir/assets/email_template.txt") - def txt_template = engine.createTemplate(tf).make(email_fields) - def email_txt = txt_template.toString() - - // Render the HTML template - def hf = new File("$projectDir/assets/email_template.html") - def html_template = engine.createTemplate(hf).make(email_fields) - def email_html = html_template.toString() - - // Render the sendmail template - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] - def sf = new File("$projectDir/assets/sendmail_template.txt") - def sendmail_template = engine.createTemplate(sf).make(smail_fields) - def sendmail_html = 
sendmail_template.toString() - - // Send the HTML e-mail - if (email_address) { - try { - if (params.plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } - // Try to send HTML e-mail using sendmail - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "[nf-core/scrnaseq] Sent summary e-mail to $email_address (sendmail)" - } catch (all) { - // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] - if ( mqc_report.size() <= params.max_multiqc_email_size.toBytes() ) { - mail_cmd += [ '-A', mqc_report ] - } - mail_cmd.execute() << email_html - log.info "[nf-core/scrnaseq] Sent summary e-mail to $email_address (mail)" - } - } - - // Write summary e-mail HTML to a file - def output_d = new File("${params.outdir}/pipeline_info/") - if (!output_d.exists()) { - output_d.mkdirs() - } - def output_hf = new File(output_d, "pipeline_report.html") - output_hf.withWriter { w -> w << email_html } - def output_tf = new File(output_d, "pipeline_report.txt") - output_tf.withWriter { w -> w << email_txt } - - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_red = params.monochrome_logs ? '' : "\033[0;31m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - - if (workflow.stats.ignoredCount > 0 && workflow.success) { - log.info "-${c_purple}Warning, pipeline completed, but with errored process(es) ${c_reset}-" - log.info "-${c_red}Number of ignored errored process(es) : ${workflow.stats.ignoredCount} ${c_reset}-" - log.info "-${c_green}Number of successfully ran process(es) : ${workflow.stats.succeedCount} ${c_reset}-" - } - - if (workflow.success) { - log.info "-${c_purple}[nf-core/scrnaseq]${c_green} Pipeline completed successfully${c_reset}-" - } else { - checkHostname() - log.info "-${c_purple}[nf-core/scrnaseq]${c_red} Pipeline completed with errors${c_reset}-" - } - -} - -workflow.onError { - // Print unexpected parameters - easiest is to just rerun validation - NfcoreSchema.validateParameters(params, json_schema, log) -} - -def checkHostname() { - def c_reset = params.monochrome_logs ? '' : "\033[0m" - def c_white = params.monochrome_logs ? '' : "\033[0;37m" - def c_red = params.monochrome_logs ? '' : "\033[1;91m" - def c_yellow_bold = params.monochrome_logs ? 
'' : "\033[1;93m" - if (params.hostnames) { - def hostname = 'hostname'.execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error '====================================================\n' + - " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + - " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - '============================================================' - } - } - } - } -} diff --git a/modules.json b/modules.json new file mode 100644 index 00000000..883d9b9a --- /dev/null +++ b/modules.json @@ -0,0 +1,32 @@ +{ + "name": "nf-core/scrnaseq", + "homePage": "https://github.com/nf-core/scrnaseq", + "repos": { + "nf-core/modules": { + "fastqc": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "gffread": { + "git_sha": "49da8642876ae4d91128168cd0db4f1c858d7792" + }, + "gunzip": { + "git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae" + }, + "kallistobustools/ref": { + "git_sha": "49da8642876ae4d91128168cd0db4f1c858d7792" + }, + "multiqc": { + "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + }, + "salmon/index": { + "git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae" + }, + "salmon/quant": { + "git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae" + }, + "star/genomegenerate": { + "git_sha": "3aacd46da2b221ed47aaa05c413a828538d2c2ae" + } + } + } +} \ No newline at end of file diff --git a/modules/local/alevinqc.nf b/modules/local/alevinqc.nf new file mode 100644 index 00000000..590d11f2 --- /dev/null +++ b/modules/local/alevinqc.nf @@ -0,0 +1,40 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process ALEVINQC { + tag "$meta.id" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:meta.id) } + + conda (params.enable_conda ? "bioconda::bioconductor-alevinqc=1.6.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/bioconductor-alevinqc:1.6.1--r40hdfd78af_0" + } else { + container "quay.io/biocontainers/bioconductor-alevinqc:1.6.1--r40hdfd78af_0" + } + + input: + tuple val(meta), path(alevin_results) + + output: + tuple val(meta), path("alevinReport.html"), emit: report + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? 
"${meta.id}${options.suffix}" : "${meta.id}" + """ + #!/usr/bin/env Rscript + require(alevinQC) + alevinQCReport(baseDir = "${alevin_results}", sampleId = "${prefix}", + outputFile = "alevinReport.html", + outputFormat = "html_document", + outputDir = "./", forceOverwrite = TRUE) + write(as.character(packageVersion("alevinQC")), paste0("${software}", ".version.txt")) + """ +} diff --git a/modules/local/cellranger_count.nf b/modules/local/cellranger_count.nf new file mode 100644 index 00000000..497f6769 --- /dev/null +++ b/modules/local/cellranger_count.nf @@ -0,0 +1,42 @@ +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process CELLRANGER_COUNT { + + label 'process_high' + + publishDir "${params.outdir}", + mode: 'copy', + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + container "streitlab/custom-nf-modules-cellranger:latest" + + input: + tuple val(meta), path('fastqs/*') + path reference_genome + + output: + tuple val(meta), path("${prefix}_cellranger") , emit: cellranger_out + tuple val(meta), path("${prefix}/*") , emit: read_counts + path '*.version.txt' , emit: version + + script: + def prefix = meta.run ? "${meta.sample_name}_${meta.run}" : "${meta.sample_name}" + def software = getSoftwareName(task.process) + + """ + cellranger count \\ + --id='${prefix}_cellranger' \\ + --fastqs='fastqs' \\ + --sample=${meta.sample_id} \\ + --transcriptome=${reference_genome} \\ + ${options.args} + + mkdir ${prefix} + cp ${prefix}_cellranger/outs/filtered_feature_bc_matrix/* ${prefix} + + echo \$(cellranger --version 2>&1) | sed 's/^.*cellranger //; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/local/cellranger_mkgtf.nf b/modules/local/cellranger_mkgtf.nf new file mode 100644 index 00000000..979b5331 --- /dev/null +++ b/modules/local/cellranger_mkgtf.nf @@ -0,0 +1,34 @@ +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process CELLRANGER_MKGTF { + + label 'process_low' + + publishDir "${params.outdir}", + mode: 'copy', + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + container "streitlab/custom-nf-modules-cellranger:latest" + + input: + path gtf + + output: + path '*.gtf' , emit: gtf + path '*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + + """ + cellranger mkgtf \\ + ${gtf} \\ + ${gtf.baseName}_mkgtf.gtf \\ + ${options.args} + + echo \$(cellranger --version 2>&1) | sed 's/^.*cellranger //; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/local/cellranger_mkref.nf b/modules/local/cellranger_mkref.nf new file mode 100644 index 00000000..fb5583f2 --- /dev/null +++ b/modules/local/cellranger_mkref.nf @@ -0,0 +1,36 @@ +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +def options = initOptions(params.options) + +process CELLRANGER_MKREF { + + label 'process_medium' + + publishDir "${params.outdir}", + mode: 'copy', + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + container "streitlab/custom-nf-modules-cellranger:latest" + + input: + path gtf + path fasta + + output: + path 'reference_genome' , emit: reference_genome + path 
'*.version.txt' , emit: version + + script: + def software = getSoftwareName(task.process) + + """ + cellranger mkref \\ + --genome=reference_genome \\ + --genes=${gtf} \\ + --fasta=${fasta} \\ + --nthreads=${task.cpus} + + echo \$(cellranger --version 2>&1) | sed 's/^.*cellranger //; s/ .*\$//' > ${software}.version.txt + """ +} diff --git a/modules/local/functions.nf b/modules/local/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/local/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/local/gene_map.nf b/modules/local/gene_map.nf new file mode 100644 index 00000000..bf213d28 --- /dev/null +++ b/modules/local/gene_map.nf @@ -0,0 +1,40 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +/* + * Reformat design file and check validity + */ +process GENE_MAP { + tag "$gtf" + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', publish_id:'') } + + conda (params.enable_conda ? 
"conda-forge::python=3.8.3" : null) + if (workflow.containerEngine == 'singularity' && !params.pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/python:3.8.3" + } else { + container "quay.io/biocontainers/python:3.8.3" + } + + input: + path gtf + + output: + path "transcripts_to_genes.txt" , emit: gene_map + + script: + if("${gtf}".endsWith('.gz')){ + name = "${gtf.baseName}" + unzip = "gunzip -f ${gtf}" + } else { + unzip = "" + name = "${gtf}" + } + """ + $unzip + cat $name | t2g.py --use_version > transcripts_to_genes.txt + """ +} diff --git a/modules/local/get_software_versions.nf b/modules/local/get_software_versions.nf new file mode 100644 index 00000000..a1f3df03 --- /dev/null +++ b/modules/local/get_software_versions.nf @@ -0,0 +1,33 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +process GET_SOFTWARE_VERSIONS { + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? "conda-forge::python=3.8.3" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/python:3.8.3" + } else { + container "quay.io/biocontainers/python:3.8.3" + } + + cache false + + input: + path versions + + output: + path "software_versions.tsv" , emit: tsv + path 'software_versions_mqc.yaml', emit: yaml + + script: // This script is bundled with the pipeline, in nf-core/scrnaseq/bin/ + """ + echo $workflow.manifest.version > pipeline.version.txt + echo $workflow.nextflow.version > nextflow.version.txt + scrape_software_versions.py &> software_versions_mqc.yaml + """ +} diff --git a/modules/local/gffread_transcriptome.nf b/modules/local/gffread_transcriptome.nf new file mode 100644 index 00000000..180d5159 --- /dev/null +++ b/modules/local/gffread_transcriptome.nf @@ -0,0 +1,38 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + + +params.options = [:] +options = initOptions(params.options) + +process GFFREAD_TRANSCRIPTOME { + tag "${genome_fasta}" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:'') } + + conda (params.enable_conda ? 
"bioconda::gffread=0.12.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/gffread:0.12.1--h2e03b76_1" + } else { + container "quay.io/biocontainers/gffread:0.12.1--h2e03b76_1" + } + + input: + path genome_fasta + path gtf + + output: + path "${genome_fasta}.transcriptome.fa" , emit: transcriptome_extracted + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + + """ + gffread -F $gtf -w "${genome_fasta}.transcriptome.fa" -g $genome_fasta + + echo \$(gffread --version 2>&1) > ${software}.version.txt + """ +} diff --git a/modules/local/kallistobustools_count.nf b/modules/local/kallistobustools_count.nf new file mode 100644 index 00000000..398d7a21 --- /dev/null +++ b/modules/local/kallistobustools_count.nf @@ -0,0 +1,56 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process KALLISTOBUSTOOLS_COUNT { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:meta.id) } + + conda (params.enable_conda ? "bioconda::kb-python=0.25.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/kb-python:0.25.1--py_0" + } else { + container "quay.io/biocontainers/kb-python:0.25.1--py_0" + } + + input: + tuple val(meta), path(reads) + path index + path t2g + path t1c + path t2c + val use_t1c + val use_t2c + val workflow + val technology + + output: + tuple val(meta), path ("*_kallistobustools_count*") , emit: counts + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def cdna = use_t1c ? "-c1 $t1c" : '' + def introns = use_t2c ? "-c2 $t2c" : '' + """ + kb count \\ + -t $task.cpus \\ + -i $index \\ + -g $t2g \\ + $cdna \\ + $introns \\ + --workflow $workflow \\ + -x $technology \\ + $options.args \\ + -o ${prefix}_kallistobustools_count \\ + ${reads[0]} ${reads[1]} + + echo \$(kb 2>&1) | sed 's/^kb_python //; s/Usage.*\$//' > ${software}.version.txt + """ +} diff --git a/modules/local/multiqc_alevin.nf b/modules/local/multiqc_alevin.nf new file mode 100644 index 00000000..dc99fbc4 --- /dev/null +++ b/modules/local/multiqc_alevin.nf @@ -0,0 +1,40 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MULTIQC { + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"bioconda::multiqc=1.10.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/multiqc:1.10.1--py_0" + } else { + container "quay.io/biocontainers/multiqc:1.10.1--py_0" + } + + input: + path 'multiqc_config.yaml' + path multiqc_custom_config + path software_versions + path workflow_summary + path ('salmon_alevin/*') + + output: + path "*multiqc_report.html" , emit: report + path "*_data" , emit: data + path "*variants_metrics_mqc.csv", optional:true, emit: csv_variants + path "*assembly_metrics_mqc.csv", optional:true, emit: csv_assembly + path "*_plots" , optional:true, emit: plots + + script: + def software = getSoftwareName(task.process) + def custom_config = params.multiqc_config ? "--config $multiqc_custom_config" : '' + """ + multiqc -f $options.args $custom_config . + """ +} diff --git a/modules/local/multiqc_kb.nf b/modules/local/multiqc_kb.nf new file mode 100644 index 00000000..ad562ea3 --- /dev/null +++ b/modules/local/multiqc_kb.nf @@ -0,0 +1,40 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MULTIQC { + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? "bioconda::multiqc=1.10.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/multiqc:1.10.1--py_0" + } else { + container "quay.io/biocontainers/multiqc:1.10.1--py_0" + } + + input: + path 'multiqc_config.yaml' + path multiqc_custom_config + path software_versions + path workflow_summary + //path ('salmon_alevin/*') + + output: + path "*multiqc_report.html" , emit: report + path "*_data" , emit: data + path "*variants_metrics_mqc.csv", optional:true, emit: csv_variants + path "*assembly_metrics_mqc.csv", optional:true, emit: csv_assembly + path "*_plots" , optional:true, emit: plots + + script: + def software = getSoftwareName(task.process) + def custom_config = params.multiqc_config ? "--config $multiqc_custom_config" : '' + """ + multiqc -f $options.args $custom_config . + """ +} diff --git a/modules/local/multiqc_starsolo.nf b/modules/local/multiqc_starsolo.nf new file mode 100644 index 00000000..b54510ea --- /dev/null +++ b/modules/local/multiqc_starsolo.nf @@ -0,0 +1,40 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MULTIQC { + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"bioconda::multiqc=1.10.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/multiqc:1.10.1--py_0" + } else { + container "quay.io/biocontainers/multiqc:1.10.1--py_0" + } + + input: + path 'multiqc_config.yaml' + path multiqc_custom_config + path software_versions + path workflow_summary + path ('STAR/*') + + output: + path "*multiqc_report.html" , emit: report + path "*_data" , emit: data + path "*variants_metrics_mqc.csv", optional:true, emit: csv_variants + path "*assembly_metrics_mqc.csv", optional:true, emit: csv_assembly + path "*_plots" , optional:true, emit: plots + + script: + def software = getSoftwareName(task.process) + def custom_config = params.multiqc_config ? "--config $multiqc_custom_config" : '' + """ + multiqc -f $options.args $custom_config . + """ +} diff --git a/modules/local/salmon_alevin.nf b/modules/local/salmon_alevin.nf new file mode 100644 index 00000000..0cbddf70 --- /dev/null +++ b/modules/local/salmon_alevin.nf @@ -0,0 +1,53 @@ + +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process SALMON_ALEVIN { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), publish_id:meta.id) } + + conda (params.enable_conda ? "bioconda::salmon=1.4.0" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/salmon:1.4.0--h84f40af_1" + } else { + container "quay.io/biocontainers/salmon:1.4.0--h84f40af_1" + } + + input: + tuple val(meta), path(reads) + path index + path txp2gene + val protocol + path whitelist + + output: + tuple val(meta), path("*_alevin_results"), emit: alevin_results + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + """ + salmon alevin \\ + -l ISR \\ + -p $task.cpus \\ + -1 ${reads[0]} \\ + -2 ${reads[1]} \\ + --${protocol} \\ + -i $index \\ + --tgMap $txp2gene \\ + --dumpFeatures --dumpMtx \\ + $options.args \\ + -o ${prefix}_alevin_results + + mv ${whitelist} ${prefix}_alevin_results/alevin/whitelist.txt + + salmon --version | sed -e "s/salmon //g" > ${software}.version.txt + """ +} diff --git a/modules/local/samplesheet_check.nf b/modules/local/samplesheet_check.nf new file mode 100644 index 00000000..23bba758 --- /dev/null +++ b/modules/local/samplesheet_check.nf @@ -0,0 +1,46 @@ +// Import generic module functions +include { saveFiles } from './functions' + +params.options = [:] + +process SAMPLESHEET_CHECK { + tag "$samplesheet" + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 
"conda-forge::python=3.8.3" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/python:3.8.3" + } else { + container "quay.io/biocontainers/python:3.8.3" + } + + input: + path samplesheet + + output: + path '*.csv' + + script: // This script is bundled with the pipeline, in nf-core/scrnaseq/bin/ + """ + check_samplesheet.py \\ + $samplesheet \\ + samplesheet.valid.csv + """ +} + +// Function to get list of [ meta, [ fastq_1, fastq_2 ] ] +def get_samplesheet_paths(LinkedHashMap row) { + def meta = [:] + meta.id = row.sample + meta.single_end = row.single_end.toBoolean() + + def array = [] + if (meta.single_end) { + array = [ meta, [ file(row.fastq_1, checkIfExists: true) ] ] + } else { + array = [ meta, [ file(row.fastq_1, checkIfExists: true), file(row.fastq_2, checkIfExists: true) ] ] + } + return array +} diff --git a/modules/local/star_align.nf b/modules/local/star_align.nf new file mode 100644 index 00000000..7462437e --- /dev/null +++ b/modules/local/star_align.nf @@ -0,0 +1,75 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process STAR_ALIGN { + tag "$meta.id" + label 'process_high' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 'bioconda::star=2.7.8a' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container 'https://depot.galaxyproject.org/singularity/star:2.7.8a--h9ee0642_1' + } else { + container 'quay.io/biocontainers/star:2.7.8a--h9ee0642_1' + } + + input: + tuple val(meta), path(reads) + path index + path gtf + path whitelist + val protocol + + output: + tuple val(meta), path('*d.out.bam') , emit: bam + tuple val(meta), path('*Log.final.out') , emit: log_final + tuple val(meta), path('*Log.out') , emit: log_out + tuple val(meta), path('*Log.progress.out'), emit: log_progress + path '*.version.txt' , emit: version + + tuple val(meta), path('*sortedByCoord.out.bam') , optional:true, emit: bam_sorted + tuple val(meta), path('*toTranscriptome.out.bam'), optional:true, emit: bam_transcript + tuple val(meta), path('*Aligned.unsort.out.bam') , optional:true, emit: bam_unsorted + tuple val(meta), path('*fastq.gz') , optional:true, emit: fastq + tuple val(meta), path('*.tab') , optional:true, emit: tab + + script: + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def ignore_gtf = params.star_ignore_sjdbgtf ? '' : "--sjdbGTFfile $gtf" + def seq_center = params.seq_center ? "--outSAMattrRGline ID:$prefix 'CN:$params.seq_center' 'SM:$prefix'" : "--outSAMattrRGline ID:$prefix 'SM:$prefix'" + def out_sam_type = (options.args.contains('--outSAMtype')) ? '' : '--outSAMtype BAM Unsorted' + def mv_unsorted_bam = (options.args.contains('--outSAMtype BAM Unsorted SortedByCoordinate')) ? "mv ${prefix}.Aligned.out.bam ${prefix}.Aligned.unsort.out.bam" : '' + def read_pair = params.protocol.contains("chromium") ? "${reads[1]} ${reads[0]}" : "${reads[0]} ${reads[1]}" + """ + STAR \\ + --genomeDir $index \\ + --readFilesIn ${reads[1]} ${reads[0]} \\ + --runThreadN $task.cpus \\ + --outFileNamePrefix $prefix. 
\\ + --soloCBwhitelist $whitelist \\ + --soloType $protocol \\ + $out_sam_type \\ + $ignore_gtf \\ + $seq_center \\ + $options.args \\ + + $mv_unsorted_bam + + if [ -f ${prefix}.Unmapped.out.mate1 ]; then + mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq + gzip ${prefix}.unmapped_1.fastq + fi + if [ -f ${prefix}.Unmapped.out.mate2 ]; then + mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq + gzip ${prefix}.unmapped_2.fastq + fi + + STAR --version | sed -e "s/STAR_//g" > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/fastqc/functions.nf b/modules/nf-core/modules/fastqc/functions.nf new file mode 100644 index 00000000..da9da093 --- /dev/null +++ b/modules/nf-core/modules/fastqc/functions.nf @@ -0,0 +1,68 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + if (!args.filename.endsWith('.version.txt')) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? 
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/fastqc/main.nf b/modules/nf-core/modules/fastqc/main.nf new file mode 100644 index 00000000..39c327b2 --- /dev/null +++ b/modules/nf-core/modules/fastqc/main.nf @@ -0,0 +1,47 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process FASTQC { + tag "$meta.id" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? "bioconda::fastqc=0.11.9" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0" + } else { + container "quay.io/biocontainers/fastqc:0.11.9--0" + } + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path("*.html"), emit: html + tuple val(meta), path("*.zip") , emit: zip + path "*.version.txt" , emit: version + + script: + // Add soft-links to original FastQs for consistent naming in pipeline + def software = getSoftwareName(task.process) + def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + if (meta.single_end) { + """ + [ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz + fastqc $options.args --threads $task.cpus ${prefix}.fastq.gz + fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt + """ + } else { + """ + [ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz + [ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz + fastqc $options.args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz + fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt + """ + } +} diff --git a/modules/nf-core/modules/fastqc/meta.yml b/modules/nf-core/modules/fastqc/meta.yml new file mode 100644 index 00000000..8eb9953d --- /dev/null +++ b/modules/nf-core/modules/fastqc/meta.yml @@ -0,0 +1,51 @@ +name: fastqc +description: Run FastQC on sequenced reads +keywords: + - quality control + - qc + - adapters + - fastq +tools: + - fastqc: + description: | + FastQC gives general quality metrics about your reads. + It provides information about the quality score distribution + across your reads, the per base sequence content (%A/C/G/T). + You get information about adapter contamination and other + overrepresented sequences. + homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ + documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - html: + type: file + description: FastQC report + pattern: "*_{fastqc.html}" + - zip: + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@drpatelh" + - "@grst" + - "@ewels" + - "@FelixKrueger" diff --git a/modules/nf-core/modules/gffread/functions.nf b/modules/nf-core/modules/gffread/functions.nf new file mode 100644 index 00000000..85628ee0 --- /dev/null +++ b/modules/nf-core/modules/gffread/functions.nf @@ -0,0 +1,78 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Extract name of module from process name using $task.process +// +def getProcessName(task_process) { + return task_process.tokenize(':')[-1] +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + + // Do not publish versions.yml unless running from pytest workflow + if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { + return null + } + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? 
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } +} diff --git a/modules/nf-core/modules/gffread/main.nf b/modules/nf-core/modules/gffread/main.nf new file mode 100644 index 00000000..4133ee08 --- /dev/null +++ b/modules/nf-core/modules/gffread/main.nf @@ -0,0 +1,40 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process GFFREAD { + tag "$gff" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? "bioconda::gffread=0.12.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/gffread:0.12.1--h8b12597_0" + } else { + container "quay.io/biocontainers/gffread:0.12.1--h8b12597_0" + } + + input: + path gff + + output: + path "*.gtf" , emit: gtf + path "versions.yml" , emit: versions + + script: + def prefix = options.suffix ? "${gff.baseName}${options.suffix}" : "${gff.baseName}" + """ + gffread \\ + $gff \\ + $options.args \\ + -o ${prefix}.gtf + cat <<-END_VERSIONS > versions.yml + ${getProcessName(task.process)}: + ${getSoftwareName(task.process)}: \$(gffread --version 2>&1) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/gffread/meta.yml b/modules/nf-core/modules/gffread/meta.yml new file mode 100644 index 00000000..bf1a15cb --- /dev/null +++ b/modules/nf-core/modules/gffread/meta.yml @@ -0,0 +1,33 @@ +name: gffread +description: Validate, filter, convert and perform various other operations on GFF files +keywords: + - gff + - conversion + - validation +tools: + - gffread: + description: GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more. + homepage: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread + documentation: http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread + tool_dev_url: https://github.com/gpertea/gffread + doi: 10.12688/f1000research.23297.1 + licence: ['MIT'] + +input: + - gff: + type: file + description: A reference file in either the GFF3, GFF2 or GTF format. 
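+      # Editorial note: a minimal, hypothetical usage sketch for this module
+      # (not part of the upstream meta.yml); the channel name ch_gff and the
+      # params.gff value are assumptions for illustration only:
+      #
+      #   include { GFFREAD } from '../modules/nf-core/modules/gffread/main' addParams( options: [:] )
+      #
+      #   workflow {
+      #       ch_gff = Channel.fromPath(params.gff, checkIfExists: true)
+      #       GFFREAD ( ch_gff )   // emits .out.gtf and .out.versions
+      #   }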
+ pattern: "*.{gff, gtf}" + +output: + - gtf: + type: file + description: GTF file resulting from the conversion of the GFF input file + pattern: "*.{gtf}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@emiller88" diff --git a/modules/nf-core/modules/gunzip/functions.nf b/modules/nf-core/modules/gunzip/functions.nf new file mode 100644 index 00000000..85628ee0 --- /dev/null +++ b/modules/nf-core/modules/gunzip/functions.nf @@ -0,0 +1,78 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Extract name of module from process name using $task.process +// +def getProcessName(task_process) { + return task_process.tokenize(':')[-1] +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + + // Do not publish versions.yml unless running from pytest workflow + if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { + return null + } + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? 
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } +} diff --git a/modules/nf-core/modules/gunzip/main.nf b/modules/nf-core/modules/gunzip/main.nf new file mode 100644 index 00000000..aec4569f --- /dev/null +++ b/modules/nf-core/modules/gunzip/main.nf @@ -0,0 +1,41 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process GUNZIP { + tag "$archive" + label 'process_low' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? "conda-forge::sed=4.7" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" + } else { + container "biocontainers/biocontainers:v1.2.0_cv1" + } + + input: + path archive + + output: + path "$gunzip", emit: gunzip + path "versions.yml" , emit: versions + + script: + gunzip = archive.toString() - '.gz' + """ + gunzip \\ + -f \\ + $options.args \\ + $archive + + cat <<-END_VERSIONS > versions.yml + ${getProcessName(task.process)}: + ${getSoftwareName(task.process)}: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/gunzip/meta.yml b/modules/nf-core/modules/gunzip/meta.yml new file mode 100644 index 00000000..3482f0d2 --- /dev/null +++ b/modules/nf-core/modules/gunzip/meta.yml @@ -0,0 +1,28 @@ +name: gunzip +description: Compresses and decompresses files. +keywords: + - gunzip + - compression +tools: + - gunzip: + description: | + gzip is a file format and a software application used for file compression and decompression. 
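+      # Editorial note: a minimal, hypothetical usage sketch (illustration only,
+      # not part of the upstream meta.yml); params.input_gz is an assumed parameter:
+      #
+      #   include { GUNZIP } from '../modules/nf-core/modules/gunzip/main' addParams( options: [:] )
+      #
+      #   workflow {
+      #       GUNZIP ( file(params.input_gz) )
+      #       GUNZIP.out.gunzip.view()   // decompressed file, named without the .gz suffix
+      #   }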
+ documentation: https://www.gnu.org/software/gzip/manual/gzip.html + licence: ['GPL-3.0-or-later'] +input: + - archive: + type: file + description: File to be compressed/uncompressed + pattern: "*.*" +output: + - gunzip: + type: file + description: Compressed/uncompressed file + pattern: "*.*" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" diff --git a/modules/nf-core/modules/kallistobustools/ref/functions.nf b/modules/nf-core/modules/kallistobustools/ref/functions.nf new file mode 100644 index 00000000..85628ee0 --- /dev/null +++ b/modules/nf-core/modules/kallistobustools/ref/functions.nf @@ -0,0 +1,78 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Extract name of module from process name using $task.process +// +def getProcessName(task_process) { + return task_process.tokenize(':')[-1] +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + + // Do not publish versions.yml unless running from pytest workflow + if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { + return null + } + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? 
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } +} diff --git a/modules/nf-core/modules/kallistobustools/ref/main.nf b/modules/nf-core/modules/kallistobustools/ref/main.nf new file mode 100644 index 00000000..a8287498 --- /dev/null +++ b/modules/nf-core/modules/kallistobustools/ref/main.nf @@ -0,0 +1,72 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process KALLISTOBUSTOOLS_REF { + tag "$fasta" + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 'bioconda::kb-python=0.26.3' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/kb-python:0.26.3--pyhdfd78af_0" + } else { + container "quay.io/biocontainers/kb-python:0.26.3--pyhdfd78af_0" + } + + input: + path fasta + path gtf + val workflow + + output: + path "versions.yml" , emit: versions + path "kb_ref_out.idx" , emit: index + path "t2g.txt" , emit: t2g + path "cdna.fa" , emit: cdna + path "intron.fa" , optional:true, emit: intron + path "cdna_t2c.txt" , optional:true, emit: cdna_t2c + path "intron_t2c.txt" , optional:true, emit: intron_t2c + + script: + if (workflow == "standard") { + """ + kb \\ + ref \\ + -i kb_ref_out.idx \\ + -g t2g.txt \\ + -f1 cdna.fa \\ + --workflow $workflow \\ + $fasta \\ + $gtf + + cat <<-END_VERSIONS > versions.yml + ${getProcessName(task.process)}: + ${getSoftwareName(task.process)}: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//') + END_VERSIONS + """ + } else { + """ + kb \\ + ref \\ + -i kb_ref_out.idx \\ + -g t2g.txt \\ + -f1 cdna.fa \\ + -f2 intron.fa \\ + -c1 cdna_t2c.txt \\ + -c2 intron_t2c.txt \\ + --workflow $workflow \\ + $fasta \\ + $gtf + + cat <<-END_VERSIONS > versions.yml + ${getProcessName(task.process)}: + ${getSoftwareName(task.process)}: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//') + END_VERSIONS + """ + } +} diff --git a/modules/nf-core/modules/kallistobustools/ref/meta.yml b/modules/nf-core/modules/kallistobustools/ref/meta.yml new file mode 100644 index 00000000..353b9c11 --- /dev/null +++ b/modules/nf-core/modules/kallistobustools/ref/meta.yml @@ -0,0 +1,60 @@ +name: kallistobustools_ref +description: index creation for kb count quantification of single-cell data. +keywords: + - kallisto-bustools + - index +tools: + - kb: + description: kallisto|bustools (kb) is a tool developed for fast and efficient processing of single-cell OMICS data. 
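+      # Editorial note: a minimal, hypothetical usage sketch (illustration only,
+      # not part of the upstream meta.yml); params.genome_fasta and params.gtf are
+      # assumed parameters, and "standard" selects the plain cDNA workflow
+      # ("lamanno" or "nucleus" would add the intron outputs listed below):
+      #
+      #   include { KALLISTOBUSTOOLS_REF } from '../modules/nf-core/modules/kallistobustools/ref/main' addParams( options: [:] )
+      #
+      #   workflow {
+      #       KALLISTOBUSTOOLS_REF ( file(params.genome_fasta), file(params.gtf), "standard" )
+      #       KALLISTOBUSTOOLS_REF.out.index.view()   // kb_ref_out.idx, ready for kb count
+      #   }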
+      homepage: https://www.kallistobus.tools/
+      documentation: https://kb-python.readthedocs.io/en/latest/index.html
+      tool_dev_url: https://github.com/pachterlab/kb_python
+      doi: "https://doi.org/10.22002/D1.1876"
+      licence: MIT License
+
+input:
+  - fasta:
+      type: file
+      description: Genomic DNA fasta file
+      pattern: "*.{fasta,fasta.gz}"
+  - gtf:
+      type: file
+      description: Genomic gtf file
+      pattern: "*.{gtf,gtf.gz}"
+  - workflow:
+      type: value
+      description: String value defining workflow to use, can be one of "standard", "lamanno", "nucleus"
+      pattern: "{standard,lamanno,nucleus}"
+
+output:
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+  - kb_ref_idx:
+      type: file
+      description: Index file from kb ref.
+      pattern: "*.{idx}"
+  - t2g:
+      type: file
+      description: Transcript to gene table
+      pattern: "*t2g.{txt}"
+  - cdna:
+      type: file
+      description: cDNA fasta file
+      pattern: "*cdna.{fa}"
+  - intron:
+      type: file
+      description: Intron fasta file
+      pattern: "*intron.{fa}"
+  - cdna_t2c:
+      type: file
+      description: cDNA transcript to capture file
+      pattern: "*cdna_t2c.{txt}"
+  - intron_t2c:
+      type: file
+      description: Intron transcript to capture file
+      pattern: "*intron_t2c.{txt}"
+
+authors:
+  - "@flowuenne"
diff --git a/modules/nf-core/modules/multiqc/functions.nf b/modules/nf-core/modules/multiqc/functions.nf
new file mode 100644
index 00000000..da9da093
--- /dev/null
+++ b/modules/nf-core/modules/multiqc/functions.nf
@@ -0,0 +1,68 @@
+//
+// Utility functions used in nf-core DSL2 module files
+//
+
+//
+// Extract name of software tool from process name using $task.process
+//
+def getSoftwareName(task_process) {
+    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
+}
+
+//
+// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
+//
+def initOptions(Map args) {
+    def Map options = [:]
+    options.args            = args.args ?: ''
+    options.args2           = args.args2 ?: ''
+    options.args3           = args.args3 ?: ''
+    options.publish_by_meta = args.publish_by_meta ?: []
+    options.publish_dir     = args.publish_dir ?: ''
+    options.publish_files   = args.publish_files
+    options.suffix          = args.suffix ?: ''
+    return options
+}
+
+//
+// Tidy up and join elements of a list to return a path string
+//
+def getPathFromList(path_list) {
+    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }  // Remove empty entries
+    paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
+    return paths.join('/')
+}
+
+//
+// Function to save/publish module results
+//
+def saveFiles(Map args) {
+    if (!args.filename.endsWith('.version.txt')) {
+        def ioptions  = initOptions(args.options)
+        def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
+        if (ioptions.publish_by_meta) {
+            def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
+            for (key in key_list) {
+                if (args.meta && key instanceof String) {
+                    def path = key
+                    if (args.meta.containsKey(key)) {
+                        path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
+                    }
+                    path = path instanceof String ?
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } + } +} diff --git a/modules/nf-core/modules/multiqc/main.nf b/modules/nf-core/modules/multiqc/main.nf new file mode 100644 index 00000000..da780800 --- /dev/null +++ b/modules/nf-core/modules/multiqc/main.nf @@ -0,0 +1,35 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process MULTIQC { + label 'process_medium' + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? "bioconda::multiqc=1.10.1" : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/multiqc:1.10.1--py_0" + } else { + container "quay.io/biocontainers/multiqc:1.10.1--py_0" + } + + input: + path multiqc_files + + output: + path "*multiqc_report.html", emit: report + path "*_data" , emit: data + path "*_plots" , optional:true, emit: plots + path "*.version.txt" , emit: version + + script: + def software = getSoftwareName(task.process) + """ + multiqc -f $options.args . + multiqc --version | sed -e "s/multiqc, version //g" > ${software}.version.txt + """ +} diff --git a/modules/nf-core/modules/multiqc/meta.yml b/modules/nf-core/modules/multiqc/meta.yml new file mode 100644 index 00000000..532a8bb1 --- /dev/null +++ b/modules/nf-core/modules/multiqc/meta.yml @@ -0,0 +1,39 @@ +name: MultiQC +description: Aggregate results from bioinformatics analyses across many samples into a single report +keywords: + - QC + - bioinformatics tools + - Beautiful stand-alone HTML report +tools: + - multiqc: + description: | + MultiQC searches a given directory for analysis logs and compiles a HTML report. + It's a general use tool, perfect for summarising the output from numerous bioinformatics tools. 
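+      # Editorial note: a minimal, hypothetical usage sketch (illustration only,
+      # not part of the upstream meta.yml); ch_fastqc_zip is an assumed upstream
+      # channel of FastQC zip files:
+      #
+      #   include { MULTIQC } from '../modules/nf-core/modules/multiqc/main' addParams( options: [:] )
+      #
+      #   workflow {
+      #       MULTIQC ( ch_fastqc_zip.collect() )
+      #       MULTIQC.out.report.view()   // aggregated multiqc_report.html
+      #   }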
+ homepage: https://multiqc.info/ + documentation: https://multiqc.info/docs/ +input: + - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC +output: + - report: + type: file + description: MultiQC report file + pattern: "multiqc_report.html" + - data: + type: dir + description: MultiQC data dir + pattern: "multiqc_data" + - plots: + type: file + description: Plots created by MultiQC + pattern: "*_data" + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" +authors: + - "@abhi18av" + - "@bunop" + - "@drpatelh" diff --git a/modules/nf-core/modules/salmon/index/functions.nf b/modules/nf-core/modules/salmon/index/functions.nf new file mode 100644 index 00000000..85628ee0 --- /dev/null +++ b/modules/nf-core/modules/salmon/index/functions.nf @@ -0,0 +1,78 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Extract name of module from process name using $task.process +// +def getProcessName(task_process) { + return task_process.tokenize(':')[-1] +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + + // Do not publish versions.yml unless running from pytest workflow + if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { + return null + } + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? 
path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } +} diff --git a/modules/nf-core/modules/salmon/index/main.nf b/modules/nf-core/modules/salmon/index/main.nf new file mode 100644 index 00000000..c3fcef01 --- /dev/null +++ b/modules/nf-core/modules/salmon/index/main.nf @@ -0,0 +1,53 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process SALMON_INDEX { + tag "$transcript_fasta" + label "process_medium" + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'index', meta:[:], publish_by_meta:[]) } + + conda (params.enable_conda ? 'bioconda::salmon=1.5.2' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/salmon:1.5.2--h84f40af_0" + } else { + container "quay.io/biocontainers/salmon:1.5.2--h84f40af_0" + } + + input: + path genome_fasta + path transcript_fasta + + output: + path "salmon" , emit: index + path "versions.yml" , emit: versions + + script: + def get_decoy_ids = "grep '^>' $genome_fasta | cut -d ' ' -f 1 > decoys.txt" + def gentrome = "gentrome.fa" + if (genome_fasta.endsWith('.gz')) { + get_decoy_ids = "grep '^>' <(gunzip -c $genome_fasta) | cut -d ' ' -f 1 > decoys.txt" + gentrome = "gentrome.fa.gz" + } + """ + $get_decoy_ids + sed -i.bak -e 's/>//g' decoys.txt + cat $transcript_fasta $genome_fasta > $gentrome + + salmon \\ + index \\ + --threads $task.cpus \\ + -t $gentrome \\ + -d decoys.txt \\ + $options.args \\ + -i salmon + cat <<-END_VERSIONS > versions.yml + ${getProcessName(task.process)}: + ${getSoftwareName(task.process)}: \$(echo \$(salmon --version) | sed -e "s/salmon //g") + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/salmon/index/meta.yml b/modules/nf-core/modules/salmon/index/meta.yml new file mode 100644 index 00000000..3b0cd853 --- /dev/null +++ b/modules/nf-core/modules/salmon/index/meta.yml @@ -0,0 +1,36 @@ +name: salmon_index +description: Create index for salmon +keywords: + - index + - fasta + - genome + - reference +tools: + - salmon: + description: | + Salmon is a tool for wicked-fast transcript quantification from RNA-seq data + homepage: https://salmon.readthedocs.io/en/latest/salmon.html + manual: https://salmon.readthedocs.io/en/latest/salmon.html + doi: 10.1038/nmeth.4197 + licence: ['GPL-3.0-or-later'] +input: + - genome_fasta: + type: file + description: Fasta file of the reference genome + - transcriptome_fasta: + type: file + description: Fasta file of the reference transcriptome + +output: + - index: + type: directory + description: Folder containing the star index files + pattern: "salmon" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@kevinmenden" + - "@drpatelh" diff --git a/modules/nf-core/modules/salmon/quant/functions.nf b/modules/nf-core/modules/salmon/quant/functions.nf new file mode 100644 index 00000000..85628ee0 --- /dev/null +++ 
b/modules/nf-core/modules/salmon/quant/functions.nf @@ -0,0 +1,78 @@ +// +// Utility functions used in nf-core DSL2 module files +// + +// +// Extract name of software tool from process name using $task.process +// +def getSoftwareName(task_process) { + return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() +} + +// +// Extract name of module from process name using $task.process +// +def getProcessName(task_process) { + return task_process.tokenize(':')[-1] +} + +// +// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules +// +def initOptions(Map args) { + def Map options = [:] + options.args = args.args ?: '' + options.args2 = args.args2 ?: '' + options.args3 = args.args3 ?: '' + options.publish_by_meta = args.publish_by_meta ?: [] + options.publish_dir = args.publish_dir ?: '' + options.publish_files = args.publish_files + options.suffix = args.suffix ?: '' + return options +} + +// +// Tidy up and join elements of a list to return a path string +// +def getPathFromList(path_list) { + def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries + paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes + return paths.join('/') +} + +// +// Function to save/publish module results +// +def saveFiles(Map args) { + def ioptions = initOptions(args.options) + def path_list = [ ioptions.publish_dir ?: args.publish_dir ] + + // Do not publish versions.yml unless running from pytest workflow + if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { + return null + } + if (ioptions.publish_by_meta) { + def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta + for (key in key_list) { + if (args.meta && key instanceof String) { + def path = key + if (args.meta.containsKey(key)) { + path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] + } + path = path instanceof String ? path : '' + path_list.add(path) + } + } + } + if (ioptions.publish_files instanceof Map) { + for (ext in ioptions.publish_files) { + if (args.filename.endsWith(ext.key)) { + def ext_list = path_list.collect() + ext_list.add(ext.value) + return "${getPathFromList(ext_list)}/$args.filename" + } + } + } else if (ioptions.publish_files == null) { + return "${getPathFromList(path_list)}/$args.filename" + } +} diff --git a/modules/nf-core/modules/salmon/quant/main.nf b/modules/nf-core/modules/salmon/quant/main.nf new file mode 100644 index 00000000..7c2e0e17 --- /dev/null +++ b/modules/nf-core/modules/salmon/quant/main.nf @@ -0,0 +1,79 @@ +// Import generic module functions +include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' + +params.options = [:] +options = initOptions(params.options) + +process SALMON_QUANT { + tag "$meta.id" + label "process_medium" + publishDir "${params.outdir}", + mode: params.publish_dir_mode, + saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + + conda (params.enable_conda ? 
'bioconda::salmon=1.5.2' : null) + if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { + container "https://depot.galaxyproject.org/singularity/salmon:1.5.2--h84f40af_0" + } else { + container "quay.io/biocontainers/salmon:1.5.2--h84f40af_0" + } + + input: + tuple val(meta), path(reads) + path index + path gtf + path transcript_fasta + val alignment_mode + val lib_type + + output: + tuple val(meta), path("${prefix}"), emit: results + path "versions.yml" , emit: versions + + script: + prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + + def reference = "--index $index" + def input_reads = meta.single_end ? "-r $reads" : "-1 ${reads[0]} -2 ${reads[1]}" + if (alignment_mode) { + reference = "-t $transcript_fasta" + input_reads = "-a $reads" + } + + def strandedness_opts = [ + 'A', 'U', 'SF', 'SR', + 'IS', 'IU' , 'ISF', 'ISR', + 'OS', 'OU' , 'OSF', 'OSR', + 'MS', 'MU' , 'MSF', 'MSR' + ] + def strandedness = 'A' + if (lib_type) { + if (strandedness_opts.contains(lib_type)) { + strandedness = lib_type + } else { + log.info "[Salmon Quant] Invalid library type specified '--libType=${lib_type}', defaulting to auto-detection with '--libType=A'." + } + } else { + strandedness = meta.single_end ? 'U' : 'IU' + if (meta.strandedness == 'forward') { + strandedness = meta.single_end ? 'SF' : 'ISF' + } else if (meta.strandedness == 'reverse') { + strandedness = meta.single_end ? 'SR' : 'ISR' + } + } + """ + salmon quant \\ + --geneMap $gtf \\ + --threads $task.cpus \\ + --libType=$strandedness \\ + $reference \\ + $input_reads \\ + $options.args \\ + -o $prefix + + cat <<-END_VERSIONS > versions.yml + ${getProcessName(task.process)}: + ${getSoftwareName(task.process)}: \$(echo \$(salmon --version) | sed -e "s/salmon //g") + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/salmon/quant/meta.yml b/modules/nf-core/modules/salmon/quant/meta.yml new file mode 100644 index 00000000..223ca82b --- /dev/null +++ b/modules/nf-core/modules/salmon/quant/meta.yml @@ -0,0 +1,56 @@ +name: salmon_quant +description: gene/transcript quantification with Salmon +keywords: + - index + - fasta + - genome + - reference +tools: + - salmon: + description: | + Salmon is a tool for wicked-fast transcript quantification from RNA-seq data + homepage: https://salmon.readthedocs.io/en/latest/salmon.html + manual: https://salmon.readthedocs.io/en/latest/salmon.html + doi: 10.1038/nmeth.4197 + licence: ['GPL-3.0-or-later'] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. 
+  - index:
+      type: directory
+      description: Folder containing the salmon index files
+  - gtf:
+      type: file
+      description: GTF of the reference transcriptome
+  - transcriptome_fasta:
+      type: file
+      description: Fasta file of the reference transcriptome
+  - alignment_mode:
+      type: boolean
+      description: whether to run salmon in alignment mode
+  - lib_type:
+      type: string
+      description: |
+        Override library type inferred based on strandedness defined in meta object
+
+output:
+  - sample_output:
+      type: directory
+      description: Folder containing the quantification results for a specific sample
+      pattern: "${prefix}"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+
+authors:
+  - "@kevinmenden"
+  - "@drpatelh"
diff --git a/modules/nf-core/modules/star/genomegenerate/functions.nf b/modules/nf-core/modules/star/genomegenerate/functions.nf
new file mode 100644
index 00000000..85628ee0
--- /dev/null
+++ b/modules/nf-core/modules/star/genomegenerate/functions.nf
@@ -0,0 +1,78 @@
+//
+// Utility functions used in nf-core DSL2 module files
+//
+
+//
+// Extract name of software tool from process name using $task.process
+//
+def getSoftwareName(task_process) {
+    return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()
+}
+
+//
+// Extract name of module from process name using $task.process
+//
+def getProcessName(task_process) {
+    return task_process.tokenize(':')[-1]
+}
+
+//
+// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules
+//
+def initOptions(Map args) {
+    def Map options = [:]
+    options.args            = args.args ?: ''
+    options.args2           = args.args2 ?: ''
+    options.args3           = args.args3 ?: ''
+    options.publish_by_meta = args.publish_by_meta ?: []
+    options.publish_dir     = args.publish_dir ?: ''
+    options.publish_files   = args.publish_files
+    options.suffix          = args.suffix ?: ''
+    return options
+}
+
+//
+// Tidy up and join elements of a list to return a path string
+//
+def getPathFromList(path_list) {
+    def paths = path_list.findAll { item -> !item?.trim().isEmpty() }  // Remove empty entries
+    paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes
+    return paths.join('/')
+}
+
+//
+// Function to save/publish module results
+//
+def saveFiles(Map args) {
+    def ioptions  = initOptions(args.options)
+    def path_list = [ ioptions.publish_dir ?: args.publish_dir ]
+
+    // Do not publish versions.yml unless running from pytest workflow
+    if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) {
+        return null
+    }
+    if (ioptions.publish_by_meta) {
+        def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta
+        for (key in key_list) {
+            if (args.meta && key instanceof String) {
+                def path = key
+                if (args.meta.containsKey(key)) {
+                    path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key]
+                }
+                path = path instanceof String ?
path : ''
+                path_list.add(path)
+            }
+        }
+    }
+    if (ioptions.publish_files instanceof Map) {
+        for (ext in ioptions.publish_files) {
+            if (args.filename.endsWith(ext.key)) {
+                def ext_list = path_list.collect()
+                ext_list.add(ext.value)
+                return "${getPathFromList(ext_list)}/$args.filename"
+            }
+        }
+    } else if (ioptions.publish_files == null) {
+        return "${getPathFromList(path_list)}/$args.filename"
+    }
+}
diff --git a/modules/nf-core/modules/star/genomegenerate/main.nf b/modules/nf-core/modules/star/genomegenerate/main.nf
new file mode 100644
index 00000000..cbbccce1
--- /dev/null
+++ b/modules/nf-core/modules/star/genomegenerate/main.nf
@@ -0,0 +1,45 @@
+// Import generic module functions
+include { initOptions; saveFiles; getSoftwareName } from './functions'
+
+params.options = [:]
+options        = initOptions(params.options)
+
+process STAR_GENOMEGENERATE {
+    tag "$fasta"
+    label 'process_high'
+    publishDir "${params.outdir}",
+        mode: params.publish_dir_mode,
+        saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'index', meta:[:], publish_by_meta:[]) }
+
+    conda (params.enable_conda ? 'bioconda::star=2.7.8a' : null)
+    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
+        container 'https://depot.galaxyproject.org/singularity/star:2.7.8a--h9ee0642_1'
+    } else {
+        container 'quay.io/biocontainers/star:2.7.8a--h9ee0642_1'
+    }
+
+    input:
+    path fasta
+    path gtf
+
+    output:
+    path "star"         , emit: index
+    path "*.version.txt", emit: version
+
+    script:
+    def software = getSoftwareName(task.process)
+    def memory   = task.memory ? "--limitGenomeGenerateRAM ${task.memory.toBytes() - 100000000}" : ''
+    """
+    mkdir star
+    STAR \\
+        --runMode genomeGenerate \\
+        --genomeDir star/ \\
+        --genomeFastaFiles $fasta \\
+        --sjdbGTFfile $gtf \\
+        --runThreadN $task.cpus \\
+        $memory \\
+        $options.args
+
+    STAR --version | sed -e "s/STAR_//g" > ${software}.version.txt
+    """
+}
diff --git a/modules/nf-core/modules/star/genomegenerate/meta.yml b/modules/nf-core/modules/star/genomegenerate/meta.yml
new file mode 100644
index 00000000..04ade195
--- /dev/null
+++ b/modules/nf-core/modules/star/genomegenerate/meta.yml
@@ -0,0 +1,37 @@
+name: star_genomegenerate
+description: Create index for STAR
+keywords:
+  - index
+  - fasta
+  - genome
+  - reference
+tools:
+  - star:
+      description: |
+        STAR is a software package for mapping DNA sequences against
+        a large reference genome, such as the human genome.
+      homepage: https://github.com/alexdobin/STAR
+      manual: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
+      doi: 10.1093/bioinformatics/bts635
+      licence: ['MIT']
+input:
+  - fasta:
+      type: file
+      description: Fasta file of the reference genome
+  - gtf:
+      type: file
+      description: GTF file of the reference genome
+
+output:
+  - index:
+      type: directory
+      description: Folder containing the star index files
+      pattern: "star"
+  - version:
+      type: file
+      description: File containing the software version
+      pattern: "*.version.txt"
+
+authors:
+  - "@kevinmenden"
+  - "@drpatelh"
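The `functions.nf` helpers vendored into each module above implement the DSL2 publishing logic: `saveFiles` suppresses `versions.yml` outside of module tests and routes every other output according to the `publish_files` extension map. A minimal standalone Groovy sketch of that resolution, assuming the definitions above are in scope and `NF_CORE_MODULES_TEST` is unset; the `options` map here is hypothetical, standing in for what `conf/modules.config` would supply:

```groovy
// Hypothetical options map, mirroring the shape supplied by conf/modules.config
def options = [ publish_dir: 'salmon', publish_files: [ 'log':'', 'json':'stats' ] ]

// versions.yml is suppressed (returns null) unless the pytest env variable is set
assert saveFiles(filename: 'versions.yml', options: options,
                 publish_dir: 'fallback', meta: [:], publish_by_meta: []) == null

// '.log' matches an empty-string value -> published directly under 'salmon/'
assert saveFiles(filename: 'quant.log', options: options,
                 publish_dir: 'fallback', meta: [:], publish_by_meta: []) == 'salmon/quant.log'

// '.json' maps to 'stats' -> published under 'salmon/stats/'
assert saveFiles(filename: 'meta_info.json', options: options,
                 publish_dir: 'fallback', meta: [:], publish_by_meta: []) == 'salmon/stats/meta_info.json'
```

A filename matching no key of a `publish_files` map falls through and is not published at all, which is how modules keep intermediate files out of the results directory.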
diff --git a/nextflow.config b/nextflow.config
index 838af2b3..f23baa01 100644
--- a/nextflow.config
+++ b/nextflow.config
@@ -1,209 +1,251 @@
 /*
- * -------------------------------------------------
- *  nf-core/scrnaseq Nextflow config file
- * -------------------------------------------------
- * Default config options for all environments.
- */
+========================================================================================
+    nf-core/scrnaseq Nextflow config file
+========================================================================================
+    Default config options for all compute environments
+----------------------------------------------------------------------------------------
+*/

 // Global default params, used in configs
 params {
-    aligner = 'alevin'
-    // 10x V3 chemistry is compatible with V2
-    barcode_whitelist = false
-    bustools_correct = true
-    chemistry = 'V3'
-    fasta = false
-    genome = false
-    gtf = false
-    kallisto_gene_map = false
-    kallisto_index = false
-    outdir = './results'
-    input = null
-    input_paths = null
-    star_index = false
-    txp2gene = false
-    type = '10x'
-    skip_bustools = false
-    save_reference = false
-
-
-    //Genome Defaults
-    salmon_index = params.genome ? params.genomes[ params.genome ].salmon_index ?: false : false
-    fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
-    transcript_fasta = params.genome ? params.genomes[ params.genome ].transcript_fasta ?: false : false
-    gtf = params.genome ? params.genomes[ params.genome ].gtf ?: false : false
-    txp2gene = params.genome ? params.genomes[ params.genome ].txp2gene ?: false : false
-
-    // Template Boilerplate options
-    multiqc_config = false
-    email = false
-    email_on_fail = false
-    max_multiqc_email_size = 25.MB
-    plaintext_email = false
-    monochrome_logs = false
-    help = false
-    igenomes_base = 's3://ngi-igenomes/igenomes/'
-    tracedir = "${params.outdir}/pipeline_info"
-    igenomes_ignore = false
-    custom_config_version = 'master'
-    custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
-    hostnames = false
-    config_profile_description = false
-    config_profile_contact = false
-    config_profile_url = false
-    validate_params = true
-    show_hidden_params = false
-    schema_ignore_params = 'genomes,input_paths'
-    publish_dir_mode = 'copy'
-    config_profile_name = false
-
-
-    // Defaults only, expecting to be overwritten
-    max_memory = 128.GB
-    max_cpus = 16
-    max_time = 240.h
+    // generic options
+    aligner           = 'alevin'
+    bustools_correct  = true
+    outdir            = './results'
+    input             = ''
+    save_reference    = false
+    protocol          = '10XV3'
+
+    // reference files
+    genome            = null
+    genome_fasta      = false
+    gtf               = false
+    transcript_fasta  = false
+
+    // salmon alevin parameters
+    barcode_whitelist = false
+    txp2gene          = false
+
+    // kallisto bustools parameters
+    kallisto_gene_map = false
+    kallisto_index    = false
+    skip_bustools     = false
+
+    // STARsolo parameters
+    star_index          = false
+    star_ignore_sjdbgtf = false
+
+    //Genome Defaults
+    salmon_index     = params.genome ? params.genomes[ params.genome ].salmon_index ?: false : false
+    genome_fasta     = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
+    transcript_fasta = params.genome ? params.genomes[ params.genome ].transcript_fasta ?: false : false
+    gtf              = params.genome ? params.genomes[ params.genome ].gtf ?: false : false
+    txp2gene         = params.genome ? params.genomes[ params.genome ].txp2gene ?: false : false
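+
+    // e.g. with `--genome GRCh38`, params.genomes['GRCh38'].fasta (populated by
+    // conf/igenomes.config, loaded further down) is used when defined; the Elvis
+    // operator (?:) falls back to false when that key is missing, and the outer
+    // ternary falls back to false when no --genome was given at all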
+
+    // other pipeline options
+    skip_multiqc = false
+    seq_center   = false
+
+    // References
+    igenomes_base   = 's3://ngi-igenomes/igenomes'
+    igenomes_ignore = false
+
+    // MultiQC options
+    multiqc_config         = null
+    multiqc_title          = null
+    max_multiqc_email_size = '25.MB'
+
+    // Boilerplate options
+    tracedir             = "${params.outdir}/pipeline_info"
+    publish_dir_mode     = 'copy'
+    email                = null
+    email_on_fail        = null
+    plaintext_email      = false
+    monochrome_logs      = false
+    help                 = false
+    validate_params      = true
+    show_hidden_params   = false
+    schema_ignore_params = 'genomes,modules'
+    enable_conda         = false
+    singularity_pull_docker_container = false
+
+    // Config options
+    custom_config_version      = 'master'
+    custom_config_base         = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}"
+    hostnames                  = [:]
+    config_profile_description = null
+    config_profile_contact     = null
+    config_profile_url         = null
+    config_profile_name        = null
+
+    // Max resource options
+    // Defaults only, expecting to be overwritten
+    max_memory = '128.GB'
+    max_cpus   = 16
+    max_time   = '240.h'
 }

-// Container slug. Stable releases should specify release tag!
-// Developmental code should specify :dev
-process.container = 'nfcore/scrnaseq:1.1.0'
-
 // Load base.config by default for all pipelines
 includeConfig 'conf/base.config'

+// Load modules.config for DSL2 module specific options
+includeConfig 'conf/modules.config'
+
 // Load nf-core custom profiles from different Institutions
 try {
-  includeConfig "${params.custom_config_base}/nfcore_custom.config"
+    includeConfig "${params.custom_config_base}/nfcore_custom.config"
 } catch (Exception e) {
-  System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config")
-}
-
-profiles {
-  conda {
-    docker.enabled = false
-    singularity.enabled = false
-    podman.enabled = false
-    shifter.enabled = false
-    charliecloud = false
-    process.conda = "$projectDir/environment.yml"
-  }
-  debug { process.beforeScript = 'echo $HOSTNAME' }
-  docker {
-    docker.enabled = true
-    fixOwnership = true
-    runOptions = "-u \$(id -u):\$(id -g)"
-    singularity.enabled = false
-    podman.enabled = false
-    shifter.enabled = false
-    charliecloud.enabled = false
-  }
-  singularity {
-    docker.enabled = false
-    singularity.enabled = true
-    podman.enabled = false
-    shifter.enabled = false
-    charliecloud.enabled = false
-    singularity.autoMounts = true
-  }
-  podman {
-    singularity.enabled = false
-    docker.enabled = false
-    podman.enabled = true
-    shifter.enabled = false
-    charliecloud = false
-  }
-  shifter {
-    singularity.enabled = false
-    docker.enabled = false
-    podman.enabled = false
-    shifter.enabled = true
-    charliecloud.enabled = false
-  }
-  charliecloud {
-    singularity.enabled = false
-    docker.enabled = false
-    podman.enabled = false
-    shifter.enabled = false
-    charliecloud.enabled = true
-  }
-  test { includeConfig 'conf/test.config' }
-  test_kallisto { includeConfig
'conf/test_kallisto.config'} - test_full { includeConfig 'conf/test_full.config' } + System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } // Load igenomes.config if required if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' + includeConfig 'conf/igenomes.config' +} else { + params.genomes = [:] +} + +profiles { + debug { process.beforeScript = 'echo $HOSTNAME' } + conda { + params.enable_conda = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + docker { + docker.enabled = true + docker.userEmulation = true + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + singularity { + singularity.enabled = true + singularity.autoMounts = true + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + podman { + podman.enabled = true + docker.enabled = false + singularity.enabled = false + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + shifter.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + charliecloud.enabled = false + } + charliecloud { + charliecloud.enabled = true + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + } + test { includeConfig 'conf/test.config' } + test_kallisto { includeConfig 'conf/test_kallisto.config'} + test_full { includeConfig 'conf/test_full.config' } } // Export these variables to prevent local Python/R libraries from conflicting with those in the container env { - PYTHONNOUSERSITE = 1 - R_PROFILE_USER = "/.Rprofile" - R_ENVIRON_USER = "/.Renviron" + PYTHONNOUSERSITE = 1 + R_PROFILE_USER = "/.Rprofile" + R_ENVIRON_USER = "/.Renviron" } // Capture exit codes from upstream processes when piping process.shell = ['/bin/bash', '-euo', 'pipefail'] +def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { - enabled = true - file = "${params.tracedir}/execution_timeline.html" + enabled = true + file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" } report { - enabled = true - file = "${params.tracedir}/execution_report.html" + enabled = true + file = "${params.tracedir}/execution_report_${trace_timestamp}.html" } trace { - enabled = true - file = "${params.tracedir}/execution_trace.txt" + enabled = true + file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" } dag { - enabled = true - file = "${params.tracedir}/pipeline_dag.svg" + enabled = true + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" } manifest { - name = 'nf-core/scrnaseq' - author = 'Peter J Bailey, Alexander Peltzer, Olga Botvinnik' - homePage = 'https://github.com/nf-core/scrnaseq' - description = 'Pipeline for processing of 10xGenomics single cell rnaseq data' - mainScript = 'main.nf' - nextflowVersion = '>=20.10.0' - version = '1.1.0' + name = 'nf-core/scrnaseq' + author = 'Peter J Bailey, Alexander Peltzer, Olga Botvinnik' + homePage = 'https://github.com/nf-core/scrnaseq' + description = 'Pipeline for processing of 10xGenomics single cell rnaseq data' + mainScript = 'main.nf' + nextflowVersion = '!>=21.04.0' + version = '2.0dev' } // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory 
as nextflow.util.MemoryUnit) == 1)
-        return params.max_memory as nextflow.util.MemoryUnit
-      else
-        return obj
-    } catch (all) {
-      println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
-      return obj
-    }
-  } else if (type == 'time') {
-    try {
-      if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
-        return params.max_time as nextflow.util.Duration
-      else
-        return obj
-    } catch (all) {
-      println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
-      return obj
-    }
-  } else if (type == 'cpus') {
-    try {
-      return Math.min( obj, params.max_cpus as int )
-    } catch (all) {
-      println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
-      return obj
+    if (type == 'memory') {
+        try {
+            if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
+                return params.max_memory as nextflow.util.MemoryUnit
+            else
+                return obj
+        } catch (all) {
+            println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
+            return obj
+        }
+    } else if (type == 'time') {
+        try {
+            if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
+                return params.max_time as nextflow.util.Duration
+            else
+                return obj
+        } catch (all) {
+            println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
+            return obj
+        }
+    } else if (type == 'cpus') {
+        try {
+            return Math.min( obj, params.max_cpus as int )
+        } catch (all) {
+            println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
+            return obj
+        }
     }
-  }
 }
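`check_max` caps whatever a process requests against the `--max_cpus`, `--max_memory` and `--max_time` params. In the nf-core template it is invoked from `conf/base.config` with retry-scaled requests, along these lines (a sketch; the exact labels and base values live in this pipeline's `conf/base.config`):

```groovy
process {
    withLabel:process_medium {
        cpus   = { check_max( 6     * task.attempt, 'cpus'   ) }
        memory = { check_max( 36.GB * task.attempt, 'memory' ) }
        time   = { check_max( 8.h   * task.attempt, 'time'   ) }
    }
}
```

On a retry, `task.attempt` grows the request, and `check_max` silently clamps it to the configured ceiling instead of letting the scheduler reject the job.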
diff --git a/nextflow_schema.json b/nextflow_schema.json
index 32d3a15c..ca88dc50 100644
--- a/nextflow_schema.json
+++ b/nextflow_schema.json
@@ -17,15 +17,9 @@
             "description": "Input FastQ files",
             "help_text": "Use this to specify the location of your input FastQ files. For example:\n\n```bash\n--input 'path/to/data/sample_*_{1,2}.fastq'\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The path must have at least one `*` wildcard character\n3. When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs.\n\nIf left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz`"
         },
-        "input_paths": {
-            "type": "string",
-            "hidden": true,
-            "description": "Input FastQ files path in array format",
-            "fa_icon": "fas fa-dna"
-        },
         "outdir": {
             "type": "string",
-            "description": "The output directory where the results will be saved.",
+            "description": "Path to the output directory where the results will be saved.",
             "default": "./results",
             "fa_icon": "fas fa-folder-open"
         },
@@ -35,6 +29,11 @@
             "fa_icon": "fas fa-envelope",
             "help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.",
             "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
+        },
+        "multiqc_title": {
+            "type": "string",
+            "description": "MultiQC report title. Printed as page header, used for filename if not otherwise specified.",
+            "fa_icon": "fas fa-file-signature"
         }
     }
 },
@@ -44,19 +43,6 @@
     "description": "",
     "default": "",
     "properties": {
-        "type": {
-            "type": "string",
-            "description": "Name of droplet technology (Currently supported: 10x, Drop-Seq, inDrop, etc may be supported in the future.)",
-            "fa_icon": "fas fa-align-center",
-            "default": "10x"
-        },
-        "chemistry": {
-            "type": "string",
-            "description": "Version of 10x chemistry",
-            "default": "V3",
-            "help_text": "To specify which chemistry (and thus barcode whitelist) to use, use the `--chemistry` flag. For example, to specify V3 chemistry (the default, as it is compatible with V2), use `--chemistry V3`.\n\nThese files were copied out of 10x Genomics' [cellranger](https://github.com/10XGenomics/cellranger) `cellranger/lib/python/cellranger/barcodes`, in some cases gzipped for simplicity across versions, and copied to `assets/whitelist`.\n\n- V1: `737K-april-2014_rc.txt` --> gzipped --> `10x_V1_barcode_whitelist.txt.gz`\n- V2: `737K-august-2016.txt` --> gzipped --> `10x_V2_barcode_whitelist.txt.gz`\n- V3: `3M-february-2018.txt.gz` --> `10x_V3_barcode_whitelist.txt.gz`",
-            "fa_icon": "fas fa-atom"
-        },
         "barcode_whitelist": {
             "type": "string",
             "description": "If not using the 10X Genomics platform, a custom barcode whitelist can be used with `--barcode_whitelist`.",
@@ -68,6 +54,17 @@
             "default": "alevin",
             "help_text": "The workflow can handle three types of methods:\n\n- Kallisto/Bustools\n- Salmon Alevin + AlevinQC\n- STARsolo\n\nTo choose which one to use, please specify either `alevin`, `star` or `kallisto` as a parameter option for `--aligner`. By default, the pipeline runs the `alevin` option. Note that specifying another aligner option also requires choosing appropriate parameters (see below) for the selected option.",
             "fa_icon": "fas fa-align-center"
+        },
+        "protocol": {
+            "type": "string",
+            "description": "The protocol that was used to generate the single-cell data.",
+            "default": "10XV3",
+            "fa_icon": "fas fa-cogs",
+            "enum": [
+                "10XV3",
+                "10XV2",
+                "10XV1",
+                "dropseq"
+            ]
         }
     },
     "fa_icon": "fas fa-terminal"
 },
@@ -76,24 +73,28 @@
     "title": "Reference genome options",
     "type": "object",
     "fa_icon": "fas fa-dna",
-    "description": "Options for the reference genome indices used to align reads.",
+    "description": "Reference genome related files and options required for the workflow.",
     "properties": {
         "genome": {
             "type": "string",
             "description": "Name of iGenomes reference.",
             "fa_icon": "fas fa-book",
-            "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`.\n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details."
+            "help_text": "If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. `--genome GRCh38`. \n\nSee the [nf-core website docs](https://nf-co.re/usage/reference_genomes) for more details."
         },
-        "fasta": {
+        "genome_fasta": {
             "type": "string",
-            "fa_icon": "fas fa-font",
+            "format": "file-path",
+            "mimetype": "text/plain",
+            "pattern": "^\\S+\\.fn?a(sta)?(\\.gz)?$",
             "description": "Path to FASTA genome file.",
-            "help_text": "If you have no genome reference available, the pipeline can build one using a FASTA file.
This requires additional time and resources, so it's better to use a pre-build index if possible."
+            "help_text": "This parameter is *mandatory* if `--genome` is not specified. If you don't have an index available for the selected aligner, one will be generated for you automatically. Combine with `--save_reference` to save the index for future runs.",
+            "fa_icon": "far fa-file-code"
         },
         "igenomes_base": {
             "type": "string",
+            "format": "directory-path",
             "description": "Directory / URL base for iGenomes references.",
-            "default": "s3://ngi-igenomes/igenomes/",
+            "default": "s3://ngi-igenomes/igenomes",
             "fa_icon": "fas fa-cloud-download-alt",
             "hidden": true
         },
@@ -151,6 +152,9 @@
             "description": "Specify a path to the precomputed STAR index.",
             "help_text": "> NB: This has to be computed with STAR Version 2.7 or later, as STARsolo was only first supported by STAR Version 2.7.",
             "fa_icon": "fas fa-asterisk"
+        },
+        "star_ignore_sjdbgtf": {
+            "type": "boolean",
+            "description": "Ignore the GTF file while creating the STAR index."
         }
     },
     "fa_icon": "fas fa-star"
@@ -184,6 +188,95 @@
     }
 }
},
+"institutional_config_options": {
+    "title": "Institutional config options",
+    "type": "object",
+    "fa_icon": "fas fa-university",
+    "description": "Parameters used to describe centralised config profiles. These should not be edited.",
+    "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.",
+    "properties": {
+        "custom_config_version": {
+            "type": "string",
+            "description": "Git commit id for Institutional configs.",
+            "default": "master",
+            "hidden": true,
+            "fa_icon": "fas fa-users-cog"
+        },
+        "custom_config_base": {
+            "type": "string",
+            "description": "Base directory for Institutional configs.",
+            "default": "https://raw.githubusercontent.com/nf-core/configs/master",
+            "hidden": true,
+            "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.",
+            "fa_icon": "fas fa-users-cog"
+        },
+        "hostnames": {
+            "type": "string",
+            "description": "Institutional configs hostname.",
+            "hidden": true,
+            "fa_icon": "fas fa-users-cog"
+        },
+        "config_profile_name": {
+            "type": "string",
+            "description": "Institutional config name.",
+            "hidden": true,
+            "fa_icon": "fas fa-users-cog"
+        },
+        "config_profile_description": {
+            "type": "string",
+            "description": "Institutional config description.",
+            "hidden": true,
+            "fa_icon": "fas fa-users-cog"
+        },
+        "config_profile_contact": {
+            "type": "string",
+            "description": "Institutional config contact information.",
+            "hidden": true,
+            "fa_icon": "fas fa-users-cog"
+        },
+        "config_profile_url": {
+            "type": "string",
+            "description": "Institutional config URL link.",
+            "hidden": true,
+            "fa_icon": "fas fa-users-cog"
+        }
+    }
+},
+"max_job_request_options": {
+    "title": "Max job request options",
+    "type": "object",
+    "fa_icon": "fab fa-acquisitions-incorporated",
+    "description": "Set the top limit for requested resources for any single job.",
+    "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause Nextflow to stop the run with an error.
These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", + "properties": { + "max_cpus": { + "type": "integer", + "description": "Maximum number of CPUs that can be requested for any single job.", + "default": 16, + "fa_icon": "fas fa-microchip", + "hidden": true, + "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" + }, + "max_memory": { + "type": "string", + "description": "Maximum amount of memory that can be requested for any single job.", + "default": "128.GB", + "fa_icon": "fas fa-memory", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", + "hidden": true, + "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" + }, + "max_time": { + "type": "string", + "description": "Maximum amount of time that can be requested for any single job.", + "default": "240.h", + "fa_icon": "far fa-clock", + "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", + "hidden": true, + "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" + } + } + }, "generic_options": { "title": "Generic options", "type": "object", @@ -194,13 +287,12 @@ "help": { "type": "boolean", "description": "Display help text.", - "hidden": true, - "fa_icon": "fas fa-question-circle" + "fa_icon": "fas fa-question-circle", + "hidden": true }, "publish_dir_mode": { "type": "string", "default": "copy", - "hidden": true, "description": "Method used to save pipeline results to output directory.", "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", "fa_icon": "fas fa-copy", @@ -211,13 +303,7 @@ "copy", "copyNoFollow", "move" - ] - }, - "validate_params": { - "type": "boolean", - "description": "Boolean whether to validate parameters against the schema at runtime", - "default": true, - "fa_icon": "fas fa-check-square", + ], "hidden": true }, "email_on_fail": { @@ -225,30 +311,28 @@ "description": "Email address for completion summary, only when pipeline fails.", "fa_icon": "fas fa-exclamation-triangle", "pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$", - "hidden": true, - "help_text": "This works exactly as with `--email`, except emails are only sent if the workflow is not successful." + "help_text": "An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.", + "hidden": true }, "plaintext_email": { "type": "boolean", "description": "Send plain-text email instead of HTML.", "fa_icon": "fas fa-remove-format", - "hidden": true, - "help_text": "Set to receive plain-text e-mails instead of HTML formatted." 
+ "hidden": true }, "max_multiqc_email_size": { "type": "string", "description": "File size limit when attaching MultiQC reports to summary emails.", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", "default": "25.MB", "fa_icon": "fas fa-file-upload", - "hidden": true, - "help_text": "If file generated by pipeline exceeds the threshold, it will not be attached." + "hidden": true }, "monochrome_logs": { "type": "boolean", "description": "Do not use coloured log outputs.", "fa_icon": "fas fa-palette", - "hidden": true, - "help_text": "Set to disable colourful command line output and live life in monochrome." + "hidden": true }, "multiqc_config": { "type": "string", @@ -263,102 +347,32 @@ "fa_icon": "fas fa-cogs", "hidden": true }, + "validate_params": { + "type": "boolean", + "description": "Boolean whether to validate parameters against the schema at runtime", + "default": true, + "fa_icon": "fas fa-check-square", + "hidden": true + }, "show_hidden_params": { "type": "boolean", "fa_icon": "far fa-eye-slash", "description": "Show all params when using `--help`", "hidden": true, "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - } - } - }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^[\\d\\.]+\\s*.(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^[\\d\\.]+\\.*(s|m|h|d)$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - }, - "institutional_config_options": { - "title": "Institutional config options", - "type": "object", - "fa_icon": "fas fa-university", - "description": "Parameters used to describe centralised config profiles. 
These should not be edited.", - "help_text": "The centralised nf-core configuration profiles use a handful of pipeline parameters to describe themselves. This information is then printed to the Nextflow log when you run a pipeline. You should not need to change these values when you run a pipeline.", - "properties": { - "custom_config_version": { - "type": "string", - "description": "Git commit id for Institutional configs.", - "default": "master", - "hidden": true, - "fa_icon": "fas fa-users-cog", - "help_text": "Provide git commit id for custom Institutional configs hosted at `nf-core/configs`. This was implemented for reproducibility purposes. Default: `master`.\n\n```bash\n## Download and use config file with following git commit id\n--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96\n```" - }, - "custom_config_base": { - "type": "string", - "description": "Base directory for Institutional configs.", - "default": "https://raw.githubusercontent.com/nf-core/configs/master", - "hidden": true, - "help_text": "If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the `custom_config_base` option. For example:\n\n```bash\n## Download and unzip the config files\ncd /path/to/my/configs\nwget https://github.com/nf-core/configs/archive/master.zip\nunzip master.zip\n\n## Run the pipeline\ncd /path/to/my/data\nnextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/\n```\n\n> Note that the nf-core/tools helper package has a `download` command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.", - "fa_icon": "fas fa-users-cog" - }, - "hostnames": { - "type": "string", - "description": "Institutional configs hostname.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, - "config_profile_name": { - "type": "string", - "description": "Institutional config name.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, - "config_profile_description": { - "type": "string", - "description": "Institutional config description.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, - "config_profile_contact": { - "type": "string", - "description": "Institutional config contact information.", + "enable_conda": { + "type": "boolean", + "description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.", "hidden": true, - "fa_icon": "fas fa-users-cog" + "fa_icon": "fas fa-bacon" }, - "config_profile_url": { - "type": "string", - "description": "Institutional config URL link.", + "singularity_pull_docker_container": { + "type": "boolean", + "description": "Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.", "hidden": true, - "fa_icon": "fas fa-users-cog" + "fa_icon": "fas fa-toolbox", + "help_text": "This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues." 
            }
        }
    },
@@ -383,13 +397,21 @@
        "$ref": "#/definitions/kallisto_bus_options"
    },
    {
-        "$ref": "#/definitions/generic_options"
+        "$ref": "#/definitions/institutional_config_options"
    },
    {
        "$ref": "#/definitions/max_job_request_options"
    },
    {
-        "$ref": "#/definitions/institutional_config_options"
+        "$ref": "#/definitions/generic_options"
+    }
+],
+"properties": {
+    "skip_multiqc": {
+        "type": "boolean"
+    },
+    "seq_center": {
+        "type": "string"
     }
-    ]
-}
+}
+}
\ No newline at end of file
diff --git a/subworkflows/local/align_cellranger.nf b/subworkflows/local/align_cellranger.nf
new file mode 100644
index 00000000..6a562da2
--- /dev/null
+++ b/subworkflows/local/align_cellranger.nf
@@ -0,0 +1,37 @@
+/*
+ * Alignment with Cellranger
+ */
+
+params.cellranger_mkgtf_options = [:]
+params.cellranger_mkref_options = [:]
+params.cellranger_count_options = [:]
+
+include {METADATA} from "../../modules/local/software/metadata/main.nf" addParams(options: params.cellranger_mkgtf_options)
+include {CELLRANGER_MKGTF} from "../../modules/local/software/cellranger/mkgtf/main.nf" addParams(options: params.cellranger_mkgtf_options)
+include {CELLRANGER_MKREF} from "../../modules/local/software/cellranger/mkref/main.nf" addParams(options: params.cellranger_mkref_options)
+include {CELLRANGER_COUNT} from "../../modules/local/software/cellranger/count/main.nf" addParams(options: params.cellranger_count_options)
+
+// Define workflow to align reads with Cellranger
+workflow CELLRANGER_ALIGN {
+    take:
+        fasta
+        gtf
+        samplesheet
+
+    main:
+        // Set sample channel from samplesheet input
+        METADATA ( samplesheet )
+
+        // Filter GTF based on gene biotypes passed in params.modules
+        CELLRANGER_MKGTF ( gtf )
+
+        // Make reference genome
+        CELLRANGER_MKREF ( CELLRANGER_MKGTF.out.gtf, fasta )
+
+        // Obtain read counts
+        CELLRANGER_COUNT ( METADATA.out, CELLRANGER_MKREF.out.reference_genome )
+
+    emit:
+        read_counts    = CELLRANGER_COUNT.out.read_counts
+        cellranger_out = CELLRANGER_COUNT.out.cellranger_out
+}
diff --git a/subworkflows/local/input_check.nf b/subworkflows/local/input_check.nf
new file mode 100644
index 00000000..ccfdee39
--- /dev/null
+++ b/subworkflows/local/input_check.nf
@@ -0,0 +1,45 @@
+//
+// Check input samplesheet and get read channels
+//
+
+params.options = [:]
+
+include { SAMPLESHEET_CHECK } from '../../modules/local/samplesheet_check' addParams( options: params.options )
+
+workflow INPUT_CHECK {
+    take:
+    samplesheet // file: /path/to/samplesheet.csv
+
+    main:
+    SAMPLESHEET_CHECK ( samplesheet )
+        .out
+        .splitCsv ( header:true, sep:',' )
+        .map { create_fastq_channels(it) }
+        .set { sample_info }
+
+    emit:
+    sample_info // channel: [ val(meta), [ reads ] ]
+}
+
+
+// Function to get list of [ meta, [ fastq_1, fastq_2 ] ]
+def create_fastq_channels(LinkedHashMap row) {
+    def meta = [:]
+    meta.id         = row.sample
+    meta.single_end = row.single_end.toBoolean()
+
+    def array = []
+    if (!file(row.fastq_1).exists()) {
+        exit 1, "ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!\n${row.fastq_1}"
+    }
+    if (meta.single_end) {
+        array = [ meta, [ file(row.fastq_1) ] ]
+    } else {
+        if (!file(row.fastq_2).exists()) {
+            exit 1, "ERROR: Please check input samplesheet -> Read 2 FastQ file does not exist!\n${row.fastq_2}"
+        }
+        array = [ meta, [ file(row.fastq_1), file(row.fastq_2) ] ]
+    }
+    return array
+}
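For one paired-end samplesheet row, `create_fastq_channels` above produces a `[ meta, [ fastq_1, fastq_2 ] ]` element. Worked through by hand as a standalone Groovy sketch (the file-existence checks are elided and the paths are illustrative):

```groovy
def row = [ sample: 'sample1_L001', single_end: 'false',
            fastq_1: 's1_L001_R1.fastq.gz', fastq_2: 's1_L001_R2.fastq.gz' ]

def meta = [:]
meta.id         = row.sample                  // 'sample1_L001'
meta.single_end = row.single_end.toBoolean()  // false

// paired-end branch -> [ meta, [ fastq_1, fastq_2 ] ]
def element = [ meta, [ row.fastq_1, row.fastq_2 ] ]
assert element[0].id == 'sample1_L001' && element[1].size() == 2
```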
diff --git a/workflows/alevin.nf b/workflows/alevin.nf
new file mode 100644
index 00000000..ad035626
--- /dev/null
+++ b/workflows/alevin.nf
@@ -0,0 +1,221 @@
+////////////////////////////////////////////////////
+/* --         SALMON ALEVIN WORKFLOW           -- */
+////////////////////////////////////////////////////
+
+params.summary_params = [:]
+
+////////////////////////////////////////////////////
+/* --    Collect configuration parameters      -- */
+////////////////////////////////////////////////////
+
+// Check if genome exists in the config file
+if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {
+    exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(', ')}"
+}
+
+//Check if GTF is supplied properly
+if( params.gtf ){
+    Channel
+        .fromPath(params.gtf)
+        .ifEmpty { exit 1, "GTF annotation file not found: ${params.gtf}" }
+        .set { gtf }
+}
+
+// Check if TXP2Gene is provided for Alevin
+if (!params.gtf && !params.txp2gene){
+    exit 1, "Must provide either a GTF file ('--gtf') or transcript to gene mapping ('--txp2gene') to align with Alevin"
+}
+
+//Setup FastA channels
+if( params.genome_fasta ){
+    Channel
+        .fromPath(params.genome_fasta)
+        .ifEmpty { exit 1, "Fasta file not found: ${params.genome_fasta}" }
+        .set { genome_fasta }
+}
+
+//Setup Transcript FastA channels
+if( params.transcript_fasta ){
+    Channel
+        .fromPath(params.transcript_fasta)
+        .ifEmpty { exit 1, "Fasta file not found: ${params.transcript_fasta}" }
+        .set { transcriptome_fasta }
+}
+
+// Check if files for index building are given if no index is specified
+if (!params.salmon_index && !params.genome_fasta) {
+    exit 1, "Must provide a genome fasta file ('--genome_fasta') to build the decoy-aware index if no salmon index is given!"
+}
+
+//Setup channel for salmon index if specified
+if (params.salmon_index) {
+    Channel
+        .fromPath(params.salmon_index)
+        .ifEmpty { exit 1, "Salmon index not found: ${params.salmon_index}" }
+        .set { salmon_index_alevin }
+}
+
+// Create a channel for input read files
+if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet file not specified!' }
+
+// Check if txp2gene file has been provided
+if (params.txp2gene){
+    Channel
+        .fromPath(params.txp2gene)
+        .set{ ch_txp2gene }
+}
+
+// Stage config files
+ch_multiqc_config        = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true)
+ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty()
+ch_output_docs           = file("$projectDir/docs/output.md", checkIfExists: true)
+ch_output_docs_images    = file("$projectDir/docs/images/", checkIfExists: true)
+
+// Get the protocol parameter
+(protocol, chemistry) = Workflow.formatProtocol(params.protocol, "alevin")
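+// formatProtocol (defined in lib/, not shown in this patch) maps the user-facing
+// --protocol value to an alevin-specific protocol string; the returned chemistry
+// (e.g. 'V3' for 10XV3) selects the matching bundled whitelist file below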
+
+//Whitelist files for STARsolo and Kallisto
+whitelist_folder = "$baseDir/assets/whitelist/"
+
+//Automatically set up proper filepaths to the barcode whitelist files bundled with the pipeline
+if (params.protocol.contains("10X") && !params.barcode_whitelist){
+    barcode_filename = "$whitelist_folder/10x_${chemistry}_barcode_whitelist.txt.gz"
+    Channel.fromPath(barcode_filename)
+        .ifEmpty{ exit 1, "Cannot find ${protocol} barcode whitelist: $barcode_filename" }
+        .set{ barcode_whitelist_gzipped }
+} else if (params.barcode_whitelist){
+    Channel.fromPath(params.barcode_whitelist)
+        .ifEmpty{ exit 1, "Cannot find ${protocol} barcode whitelist: ${params.barcode_whitelist}" }
+        .set{ ch_barcode_whitelist }
+}
+
+////////////////////////////////////////////////////
+/* --    Define command line options           -- */
+////////////////////////////////////////////////////
+def modules = params.modules.clone()
+
+def salmon_index_options     = modules['salmon_index']
+def gffread_txp2gene_options = modules['gffread_tx2pgene']
+def salmon_alevin_options    = modules['salmon_alevin']
+def alevin_qc_options        = modules['alevinqc']
+def multiqc_options          = modules['multiqc_alevin']
+
+////////////////////////////////////////////////////
+/* -- IMPORT LOCAL MODULES/SUBWORKFLOWS        -- */
+////////////////////////////////////////////////////
+include { INPUT_CHECK }           from '../subworkflows/local/input_check'      addParams( options: [:] )
+include { GFFREAD_TRANSCRIPTOME } from '../modules/local/gffread_transcriptome' addParams( options: [:] )
+include { SALMON_ALEVIN }         from '../modules/local/salmon_alevin'         addParams( options: salmon_alevin_options )
+include { ALEVINQC }              from '../modules/local/alevinqc'              addParams( options: alevin_qc_options )
+include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] )
+include { MULTIQC }               from '../modules/local/multiqc_alevin'        addParams( options: multiqc_options )
+
+////////////////////////////////////////////////////
+/* -- IMPORT NF-CORE MODULES/SUBWORKFLOWS      -- */
+////////////////////////////////////////////////////
+include { GUNZIP }                      from '../modules/nf-core/modules/gunzip/main'       addParams( options: [:] )
+include { GFFREAD as GFFREAD_TXP2GENE } from '../modules/nf-core/modules/gffread/main'      addParams( options: gffread_txp2gene_options )
+include { SALMON_INDEX }                from '../modules/nf-core/modules/salmon/index/main' addParams( options: salmon_index_options )
+
+////////////////////////////////////////////////////
+/* --           RUN MAIN WORKFLOW              -- */
+////////////////////////////////////////////////////
+def multiqc_report = []
+
+workflow SCRNASEQ_ALEVIN {
+    ch_software_versions = Channel.empty()
+
+    /*
+     * Check input files and stage input data
+     */
+    INPUT_CHECK( ch_input )
+    .map {
+        meta, reads -> meta.id = meta.id.split('_')[0..-2].join('_')
+        [ meta, reads ]
+    }
+    .groupTuple(by: [0])
+    .map { it -> [ it[0], it[1].flatten() ] }
+    .set { ch_fastq }
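+    // e.g. [ [id:'sample1_L001'], reads ] and [ [id:'sample1_L002'], reads ] both get
+    // id 'sample1' after the trailing lane suffix is dropped, so groupTuple merges them
+    // into a single [ [id:'sample1'], [ all FastQ files ] ] element per sample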
+
+    // unzip barcodes
+    if (params.protocol.contains("10X") && !params.barcode_whitelist) {
+        GUNZIP( barcode_whitelist_gzipped )
+        ch_barcode_whitelist = GUNZIP.out.gunzip
+    }
+
+    // Preprocessing - Extract transcriptome fasta from genome fasta
+    if (!params.transcript_fasta && params.genome_fasta && params.gtf) {
+        GFFREAD_TRANSCRIPTOME( genome_fasta, gtf )
+        transcriptome_fasta  = GFFREAD_TRANSCRIPTOME.out.transcriptome_extracted
+        ch_software_versions = ch_software_versions.mix(GFFREAD_TRANSCRIPTOME.out.version.first().ifEmpty(null))
+    }
+
+    /*
+     * Build salmon index
+     */
+    if (!params.salmon_index) {
+        SALMON_INDEX( genome_fasta, transcriptome_fasta )
+        salmon_index_alevin = SALMON_INDEX.out.index
+    }
+
+    /*
+     * Build txp2gene map
+     */
+    if (!params.txp2gene){
+        GFFREAD_TXP2GENE( gtf )
+        ch_txp2gene = GFFREAD_TXP2GENE.out.gtf
+        // Only collect the gffread version here if GFFREAD_TRANSCRIPTOME did not run above
+        if (params.transcript_fasta || !params.genome_fasta || !params.gtf) {
+            ch_software_versions = ch_software_versions.mix(GFFREAD_TXP2GENE.out.version.first().ifEmpty(null))
+        }
+    }
+
+    /*
+     * Perform quantification with salmon alevin
+     */
+    SALMON_ALEVIN ( ch_fastq, salmon_index_alevin, ch_txp2gene, protocol, ch_barcode_whitelist )
+    ch_software_versions = ch_software_versions.mix(SALMON_ALEVIN.out.version.first().ifEmpty(null))
+    ch_salmon_multiqc = SALMON_ALEVIN.out.alevin_results
+
+    /*
+     * Run alevinQC
+     */
+    ALEVINQC( SALMON_ALEVIN.out.alevin_results )
+    ch_software_versions = ch_software_versions.mix(ALEVINQC.out.version.first().ifEmpty(null))
+
+    // collect software versions
+    GET_SOFTWARE_VERSIONS ( ch_software_versions.map { it }.collect() )
+
+    /*
+     * MultiQC
+     */
+    if (!params.skip_multiqc) {
+        workflow_summary    = Workflow.paramsSummaryMultiqc(workflow, params.summary_params)
+        ch_workflow_summary = Channel.value(workflow_summary)
+
+        MULTIQC (
+            ch_multiqc_config,
+            ch_multiqc_custom_config.collect().ifEmpty([]),
+            GET_SOFTWARE_VERSIONS.out.yaml.collect(),
+            ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'),
+            ch_salmon_multiqc.collect{it[1]}.ifEmpty([])
+        )
+        multiqc_report = MULTIQC.out.report.toList()
+    }
+
+}
+
+////////////////////////////////////////////////////
+/* --           COMPLETION EMAIL               -- */
+////////////////////////////////////////////////////
+
+workflow.onComplete {
+    if (params.email || params.email_on_fail) {
+        NfcoreTemplate.email(workflow, params, params.summary_params, projectDir, log, multiqc_report)
+    }
+    NfcoreTemplate.summary(workflow, params, log)
+}
+
+////////////////////////////////////////////////////
+/* --                THE END                   -- */
+////////////////////////////////////////////////////
diff --git a/workflows/kallisto_bustools.nf b/workflows/kallisto_bustools.nf
new file mode 100644
index 00000000..48f6b87f
--- /dev/null
+++ b/workflows/kallisto_bustools.nf
@@ -0,0 +1,185 @@
+////////////////////////////////////////////////////
+/* --       KALLISTO BUSTOOLS WORKFLOW         -- */
+////////////////////////////////////////////////////
+
+params.summary_params = [:]
+
+////////////////////////////////////////////////////
+/* --    Collect configuration parameters      -- */
+////////////////////////////////////////////////////
+
+// Check if genome exists in the config file
+if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {
+    exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file.
Currently the available genomes are ${params.genomes.keySet().join(', ')}"
+}
+
+//Check if GTF is supplied properly
+if( params.gtf ){
+    Channel
+        .fromPath(params.gtf)
+        .ifEmpty { exit 1, "GTF annotation file not found: ${params.gtf}" }
+        .set { gtf }
+}
+
+//Setup FastA channels
+if( params.genome_fasta ){
+    Channel
+        .fromPath(params.genome_fasta)
+        .ifEmpty { exit 1, "Fasta file not found: ${params.genome_fasta}" }
+        .set { genome_fasta }
+}
+
+
+// Check if files for index building are given if no index is specified
+if (!params.kallisto_index && (!params.genome_fasta || !params.gtf)) {
+    exit 1, "Must provide a genome fasta file ('--genome_fasta') and a gtf file ('--gtf') if no index is given!"
+}
+
+//Setup channel for kallisto index if specified
+if (params.kallisto_index) {
+    Channel
+        .fromPath(params.kallisto_index)
+        .ifEmpty { exit 1, "Kallisto index not found: ${params.kallisto_index}" }
+        .set { ch_kallisto_index }
+}
+
+// Kallisto gene map
+// Check if a kallisto gene map file has been provided
+if (params.kallisto_gene_map){
+    Channel
+        .fromPath(params.kallisto_gene_map)
+        .set{ ch_kallisto_gene_map }
+}
+if (!params.gtf && !params.kallisto_gene_map){
+    exit 1, "Must provide either a GTF file ('--gtf') or kallisto gene map ('--kallisto_gene_map') to align with kallisto bustools!"
+}
+
+// Get the protocol parameter
+(protocol, chemistry) = Workflow.formatProtocol(params.protocol, "kallisto")
+kb_workflow = "standard"
+
+// Create a channel for input read files
+if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet file not specified!' }
+
+// Check AWS batch settings
+// TODO use the Checks.awsBatch() function instead
+
+// Stage config files
+ch_multiqc_config        = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true)
+ch_multiqc_custom_config = params.multiqc_config ?
Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty()
+ch_output_docs        = file("$projectDir/docs/output.md", checkIfExists: true)
+ch_output_docs_images = file("$projectDir/docs/images/", checkIfExists: true)
+
+////////////////////////////////////////////////////
+/* --    Define command line options           -- */
+////////////////////////////////////////////////////
+def modules = params.modules.clone()
+
+def kallistobustools_ref_options   = modules['kallistobustools_ref']
+def kallistobustools_count_options = modules['kallistobustools_count']
+def multiqc_options                = modules['multiqc_kb']
+
+////////////////////////////////////////////////////
+/* -- IMPORT LOCAL MODULES/SUBWORKFLOWS        -- */
+////////////////////////////////////////////////////
+include { INPUT_CHECK }            from '../subworkflows/local/input_check'       addParams( options: [:] )
+include { GENE_MAP }               from '../modules/local/gene_map'               addParams( options: [:] )
+include { KALLISTOBUSTOOLS_COUNT } from '../modules/local/kallistobustools_count' addParams( options: kallistobustools_count_options )
+include { GET_SOFTWARE_VERSIONS }  from '../modules/local/get_software_versions'  addParams( options: [publish_files: ['csv':'']] )
+include { MULTIQC }                from '../modules/local/multiqc_kb'             addParams( options: multiqc_options )
+
+////////////////////////////////////////////////////
+/* -- IMPORT NF-CORE MODULES/SUBWORKFLOWS      -- */
+////////////////////////////////////////////////////
+include { GUNZIP }               from '../modules/nf-core/modules/gunzip/main'               addParams( options: [:] )
+include { KALLISTOBUSTOOLS_REF } from '../modules/nf-core/modules/kallistobustools/ref/main' addParams( options: kallistobustools_ref_options )
+
+////////////////////////////////////////////////////
+/* --           RUN MAIN WORKFLOW              -- */
+////////////////////////////////////////////////////
+def multiqc_report = []
+
+workflow KALLISTO_BUSTOOLS {
+    ch_software_versions = Channel.empty()
+
+    /*
+     * Check input files and stage input data
+     */
+    INPUT_CHECK( ch_input )
+    .map {
+        meta, reads -> meta.id = meta.id.split('_')[0..-2].join('_')
+        [ meta, reads ]
+    }
+    .groupTuple(by: [0])
+    .map { it -> [ it[0], it[1].flatten() ] }
+    .set { ch_fastq }
+
+    /*
+     * Generate the Kallisto gene map if it is not supplied and an index is given
+     * (if no index is given, the gene map is instead produced by the 'kb ref' step below)
+     */
+    if (!params.kallisto_gene_map && params.kallisto_index) {
+        GENE_MAP( gtf )
+        ch_kallisto_gene_map = GENE_MAP.out.gene_map
+    }
+
+    /*
+     * Generate kallisto index
+     */
+    if (!params.kallisto_index) {
+        KALLISTOBUSTOOLS_REF( genome_fasta, gtf, kb_workflow )
+        ch_kallisto_gene_map = KALLISTOBUSTOOLS_REF.out.t2g
+        ch_kallisto_index    = KALLISTOBUSTOOLS_REF.out.index
+    }
+
+    /*
+     * Quantification with kallistobustools count
+     */
+    KALLISTOBUSTOOLS_COUNT(
+        ch_fastq,
+        ch_kallisto_index,
+        ch_kallisto_gene_map,
+        [],
+        [],
+        false,
+        false,
+        kb_workflow,
+        protocol
+    )
+    ch_software_versions = ch_software_versions.mix(KALLISTOBUSTOOLS_COUNT.out.version.first().ifEmpty(null))
+
+    // collect software versions
+    GET_SOFTWARE_VERSIONS ( ch_software_versions.map { it }.collect() )
+
+    /*
+     * MultiQC
+     */
+    if (!params.skip_multiqc) {
+        workflow_summary    = Workflow.paramsSummaryMultiqc(workflow, params.summary_params)
+        ch_workflow_summary = Channel.value(workflow_summary)
+
+        MULTIQC (
+            ch_multiqc_config,
+            ch_multiqc_custom_config.collect().ifEmpty([]),
+            GET_SOFTWARE_VERSIONS.out.yaml.collect(),
+            ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
+        )
+        multiqc_report =
+    KALLISTOBUSTOOLS_COUNT(
+        ch_fastq,
+        ch_kallisto_index,
+        ch_kallisto_gene_map,
+        [],
+        [],
+        false,
+        false,
+        kb_workflow,
+        protocol
+    )
+    ch_software_versions = ch_software_versions.mix(KALLISTOBUSTOOLS_COUNT.out.version.first().ifEmpty(null))
+
+    // collect software versions
+    GET_SOFTWARE_VERSIONS ( ch_software_versions.map { it }.collect() )
+
+    /*
+     * MultiQC
+     */
+    if (!params.skip_multiqc) {
+        workflow_summary = Workflow.paramsSummaryMultiqc(workflow, params.summary_params)
+        ch_workflow_summary = Channel.value(workflow_summary)
+
+        MULTIQC (
+            ch_multiqc_config,
+            ch_multiqc_custom_config.collect().ifEmpty([]),
+            GET_SOFTWARE_VERSIONS.out.yaml.collect(),
+            ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
+        )
+        multiqc_report = MULTIQC.out.report.toList()
+    }
+
+}
+
+////////////////////////////////////////////////////
+/* -- COMPLETION EMAIL -- */
+////////////////////////////////////////////////////
+
+workflow.onComplete {
+    if (params.email || params.email_on_fail) {
+        NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report)
+    }
+    NfcoreTemplate.summary(workflow, params, log)
+}
+
+////////////////////////////////////////////////////
+/* -- THE END -- */
+////////////////////////////////////////////////////
diff --git a/workflows/scrnaseq.nf b/workflows/scrnaseq.nf
new file mode 100644
index 00000000..623e9575
--- /dev/null
+++ b/workflows/scrnaseq.nf
@@ -0,0 +1,69 @@
+/*
+========================================================================================
+    VALIDATE INPUTS
+========================================================================================
+*/
+
+def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params)
+
+// Validate input parameters
+WorkflowScrnaseq.initialise(params, log)
+
+// TODO nf-core: Add all file path parameters for the pipeline to the list below
+// Check input path parameters to see if they exist
+def checkPathParamList = [ params.input, params.multiqc_config, params.fasta ]
+for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true) } }
+
+// Check mandatory parameters
+if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet not specified!' }
+
+/*
+========================================================================================
+    IMPORT WORKFLOWS
+========================================================================================
+*/
+
+include { SCRNASEQ_ALEVIN   } from './alevin'
+include { STARSOLO          } from './starsolo'
+include { KALLISTO_BUSTOOLS } from './kallisto_bustools'
+
+
+/*
+========================================================================================
+    RUN MAIN WORKFLOW
+========================================================================================
+*/
+
+// Info required for completion email and summary
+def multiqc_report = []
+
+workflow SCRNASEQ {
+    // Run salmon alevin pipeline
+    if (params.aligner == "alevin") {
+        SCRNASEQ_ALEVIN()
+    }
+
+    // Run STARSolo pipeline
+    if (params.aligner == "star") {
+        STARSOLO()
+    }
+
+    // Run kallisto bustools pipeline
+    if (params.aligner == "kallisto") {
+        KALLISTO_BUSTOOLS()
+    }
+}
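+
+// Example invocation (illustrative only, with a hypothetical samplesheet):
+//   nextflow run nf-core/scrnaseq --input samplesheet.csv --aligner kallisto -profile docker
+// where --aligner is one of 'alevin', 'star' or 'kallisto' as dispatched above.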
+
+/*
+========================================================================================
+    COMPLETION EMAIL AND SUMMARY
+========================================================================================
+*/
+
+// Added to individual workflows
+
+/*
+========================================================================================
+    THE END
+========================================================================================
+*/
diff --git a/workflows/starsolo.nf b/workflows/starsolo.nf
new file mode 100644
index 00000000..836a1a91
--- /dev/null
+++ b/workflows/starsolo.nf
@@ -0,0 +1,193 @@
+////////////////////////////////////////////////////
+/* -- STARSolo WORKFLOW -- */
+////////////////////////////////////////////////////
+
+params.summary_params = [:]
+
+////////////////////////////////////////////////////
+/* -- Collect configuration parameters -- */
+////////////////////////////////////////////////////
+
+// Check if genome exists in the config file
+if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) {
+    exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(', ')}"
+}
+
+//Check if GTF is supplied properly
+if( params.gtf ){
+    Channel
+        .fromPath(params.gtf)
+        .ifEmpty { exit 1, "GTF annotation file not found: ${params.gtf}" }
+        .set { gtf }
+}
+
+//Setup FASTA channels
+if( params.genome_fasta ){
+    Channel
+        .fromPath(params.genome_fasta)
+        .ifEmpty { exit 1, "Genome fasta file not found: ${params.genome_fasta}" }
+        .set { genome_fasta }
+}
+
+//Setup transcript FASTA channels
+if( params.transcript_fasta ){
+    Channel
+        .fromPath(params.transcript_fasta)
+        .ifEmpty { exit 1, "Transcript fasta file not found: ${params.transcript_fasta}" }
+        .set { transcriptome_fasta }
+}
+
+// Check if STAR index is supplied properly
+if( params.star_index ){
+    star_index = Channel
+        .fromPath(params.star_index)
+        .ifEmpty { exit 1, "STAR index not found: ${params.star_index}" }
+}
+
+if (!params.star_index && (!params.gtf || !params.genome_fasta)){
+    exit 1, "STAR needs either a GTF + FASTA or a precomputed index supplied."
+}
+
+// Create a channel for input read files
+if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet file not specified!' }
+
+// Check if txp2gene file has been provided
+if (params.txp2gene){
+    Channel
+        .fromPath(params.txp2gene)
+        .set{ ch_txp2gene }
+}
+
+// Check AWS batch settings
+// TODO use the Checks.awsBatch() function instead
+
+// Stage config files
+ch_multiqc_config = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true)
+ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty()
+ch_output_docs = file("$projectDir/docs/output.md", checkIfExists: true)
+ch_output_docs_images = file("$projectDir/docs/images/", checkIfExists: true)
+
+// Get the protocol parameter
+(protocol, chemistry) = Workflow.formatProtocol(params.protocol, "star")
+
+//Whitelist files for STARsolo and Kallisto
+whitelist_folder = "$baseDir/assets/whitelist/"
+
+//Automatically set up proper filepaths to the barcode whitelist files bundled with the pipeline
+if (params.protocol.contains("10X") && !params.barcode_whitelist){
+    barcode_filename = "$whitelist_folder/10x_${chemistry}_barcode_whitelist.txt.gz"
+    Channel.fromPath(barcode_filename)
+        .ifEmpty{ exit 1, "Cannot find ${protocol} barcode whitelist: $barcode_filename" }
+        .set{ barcode_whitelist_gzipped }
+} else if (params.barcode_whitelist){
+    Channel.fromPath(params.barcode_whitelist)
+        .ifEmpty{ exit 1, "Cannot find ${protocol} barcode whitelist: ${params.barcode_whitelist}" }
+        .set{ ch_barcode_whitelist }
+}
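+
+// For example, a 10x v3 protocol would presumably resolve to
+// "$whitelist_folder/10x_V3_barcode_whitelist.txt.gz"; the exact chemistry
+// string is whatever Workflow.formatProtocol returned above.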
+
+////////////////////////////////////////////////////
+/* -- Define command line options -- */
+////////////////////////////////////////////////////
+def modules = params.modules.clone()
+
+def star_genomegenerate_options = modules['star_genomegenerate']
+def star_align_options          = modules['star_align']
+def multiqc_options             = modules['multiqc_alevin']
+
+////////////////////////////////////////////////////
+/* -- IMPORT LOCAL MODULES/SUBWORKFLOWS -- */
+////////////////////////////////////////////////////
+include { INPUT_CHECK           } from '../subworkflows/local/input_check' addParams( options: [:] )
+include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files: ['csv':'']] )
+include { MULTIQC               } from '../modules/local/multiqc_alevin' addParams( options: multiqc_options )
+include { STAR_ALIGN            } from '../modules/local/star_align' addParams( options: star_align_options )
+
+////////////////////////////////////////////////////
+/* -- IMPORT NF-CORE MODULES/SUBWORKFLOWS -- */
+////////////////////////////////////////////////////
+include { GUNZIP              } from '../modules/nf-core/modules/gunzip/main' addParams( options: [:] )
+include { STAR_GENOMEGENERATE } from '../modules/nf-core/modules/star/genomegenerate/main' addParams( options: star_genomegenerate_options )
+
+
+////////////////////////////////////////////////////
+/* -- RUN MAIN WORKFLOW -- */
+////////////////////////////////////////////////////
+def multiqc_report = []
+
+workflow STARSOLO {
+    ch_software_versions = Channel.empty()
+
+    /*
+     * Check input files and stage input data
+     */
+    // Merge resequenced FastQ sets of the same sample, as in the kallisto
+    // bustools workflow
+    INPUT_CHECK( ch_input )
+    .map {
+        meta, reads -> meta.id = meta.id.split('_')[0..-2].join('_')
+        [ meta, reads ]
+    }
+    .groupTuple(by: [0])
+    .map { it -> [ it[0], it[1].flatten() ] }
+    .set { ch_fastq }
+
+    // unzip barcodes
+    if (params.protocol.contains("10X") && !params.barcode_whitelist) {
+        GUNZIP( barcode_whitelist_gzipped )
+        ch_barcode_whitelist = GUNZIP.out.gunzip
+    }
+
+    /*
+     * Build STAR index if not supplied
+     */
+    if (!params.star_index) {
+        STAR_GENOMEGENERATE( genome_fasta, gtf )
+        star_index = STAR_GENOMEGENERATE.out.index
+    }
+
+    /*
+     * Perform mapping with STAR
+     */
+    STAR_ALIGN(
+        ch_fastq,
+        star_index,
+        gtf,
+        ch_barcode_whitelist,
+        protocol
+    )
+    ch_software_versions = ch_software_versions.mix(STAR_ALIGN.out.version.first().ifEmpty(null))
+    ch_star_multiqc      = STAR_ALIGN.out.log_final
+
+    // collect software versions
+    GET_SOFTWARE_VERSIONS ( ch_software_versions.map { it }.collect() )
+
+    /*
+     * MultiQC
+     */
+    if (!params.skip_multiqc) {
+        workflow_summary = Workflow.paramsSummaryMultiqc(workflow, params.summary_params)
+        ch_workflow_summary = Channel.value(workflow_summary)
+
+        MULTIQC (
+            ch_multiqc_config,
+            ch_multiqc_custom_config.collect().ifEmpty([]),
+            GET_SOFTWARE_VERSIONS.out.yaml.collect(),
+            ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'),
+            ch_star_multiqc.collect{it[1]}.ifEmpty([])
+        )
+        multiqc_report = MULTIQC.out.report.toList()
+    }
+
+}
+
+////////////////////////////////////////////////////
+/* -- COMPLETION EMAIL -- */
+////////////////////////////////////////////////////
+
+workflow.onComplete {
+    if (params.email || params.email_on_fail) {
+        NfcoreTemplate.email(workflow, params, summary_params, projectDir, log, multiqc_report)
+    }
+    NfcoreTemplate.summary(workflow, params, log)
+}
+
+////////////////////////////////////////////////////
+/* -- THE END -- */
+////////////////////////////////////////////////////