Universal data analysis pipeline


Nextflow-based pipeline to run and deploy reproducible analyses. Alongside the pipeline I developed the reportsrender toolbox to execute notebooks, but the pipeline can also be used without it.

Features

  • render Jupyter or Rmarkdown notebooks (via papermill/knitr)
  • ensure reproducible analyses
  • deploy reports to GitHub pages

Structure

  • analyses: The actual analysis steps (i.e. jupyter notebooks, Rmarkdown documents, bash scripts) go here.
  • bin: scripts that can be called from nextflow directly (nextflow adds them to the PATH for commands run from a process).
  • data: input data for the notebooks. I often replace this with a symlink to some data storage.
  • deploy: final reports. Will be filled by the deploy process which copies all html reports to that directory and creates an index file. A great way to share the final reports is to push this directory to Github pages.
  • envs: conda environment files go here. Create one file per notebook, or re-use environments for multiple notebooks -- it's up to you.
  • lib: put custom libraries (e.g. python modules) here.
  • results: final results generated by the pipeline go here. Concept: one can always delete the results directory and re-generate it from data using the pipeline.
  • tables: manually created input data that I want to be under version control. E.g. the list of samples and the associated patient data that you had to compile manually from three Excel sheets because the biologists encoded data as background colors.
  • main.nf: The nextflow workflow that ties everything together.
  • nextflow.config: Contains configuration options for the pipeline (e.g. output directory). You can also set options here to run the pipeline on a HPC grid engine (e.g. SGE or SLURM).
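
To give an idea of what lives in nextflow.config, here is a minimal sketch. The parameter names and the SGE profile below are illustrative assumptions, not the actual contents of the file:

```nextflow
// Hypothetical sketch of nextflow.config -- adapt names to the real file.
params {
    resultsDir = "results"   // pipeline outputs (assumed parameter name)
    deployDir  = "deploy"    // where rendered HTML reports are collected
}

profiles {
    // Example profile for running on an SGE cluster; SLURM would be analogous.
    sge {
        process.executor = 'sge'
        process.queue    = 'all.q'
    }
}
```

With a profile like this in place, the same pipeline runs locally by default and on the cluster via `nextflow run main.nf -profile sge`.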

How to run

  1. Install nextflow. In this case, we use conda. Check the nextflow website for other options.
conda create -n nextflow -c conda-forge -c bioconda nextflow
conda activate nextflow
  2. Clone this repository
git clone git@github.com:grst/universal_analysis_pipeline.git
cd universal_analysis_pipeline
  3. Run the pipeline
./main.nf
  4. Share the results. You can zip and email the deploy folder. Even better, share the results using GitHub pages:
  • To setup GitHub pages, init a repository in the deploy folder and push to the gh-pages branch:
cd deploy
git init
git remote add origin <YOUR_REMOTE>
git checkout --orphan gh-pages
git add -A .
git commit -m "Initial deploy on gh-pages"
git push -u origin gh-pages
  • It can take a few minutes, but eventually your reports will be available at https://<yourgithubuser>.github.io/<yourrepo>

  • You might want to "password protect" your pages. This is not natively supported by GitHub pages, but a workaround is to put all files in a cryptic subfolder, e.g. rBymGubVBBrdHtGo6Of35E3uI. As GitHub pages doesn't list directories, you need to know the precise URL to access the folder. You can adjust the deploy dir in nextflow.config.
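
One way to create such a cryptic subfolder is to derive its name from random bytes. A sketch (the 25-character length and variable name are arbitrary choices):

```shell
# Generate a hard-to-guess, URL-safe directory name from random bytes
# and create it inside the deploy folder.
SECRET_DIR=$(head -c 32 /dev/urandom | base64 | tr -dc 'a-zA-Z0-9' | head -c 25)
mkdir -p "deploy/$SECRET_DIR"
echo "Reports will live under deploy/$SECRET_DIR"
```

Point the deploy dir in nextflow.config at the generated path so the reports land there directly.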

How to use

This repository is meant as a template. You can fork/clone it and expand from there. At a minimum, you have to change two things:

  • Add your notebooks to the analyses folder
  • Edit main.nf to wire your notebooks together the right way. You can use reportsrender to execute the notebooks.
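
To sketch how a notebook could be wired in, here is a minimal DSL2-style process. This is an illustration only: the process name, the env file name, and the exact reportsrender invocation are assumptions, so check the reportsrender docs for the precise CLI:

```nextflow
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

// Hypothetical process: render one Jupyter notebook to HTML with reportsrender.
process render_notebook {
    conda "envs/my_analysis.yml"      // per-notebook conda env (assumed name)
    publishDir "results", mode: "copy"

    input:
    path notebook

    output:
    path "*.html"

    script:
    """
    reportsrender ${notebook} ${notebook.baseName}.html
    """
}

workflow {
    render_notebook(Channel.fromPath("analyses/*.ipynb"))
}
```

Each notebook gets its own process, so downstream notebooks can declare the outputs of upstream ones as inputs and Nextflow resolves the execution order.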

Ideas for the future:

  • convert conda envs to singularity containers to ensure reproducibility.