Babraham Nextflow Pipelines

Introduction

This repository contains the nextflow pipelines which are used for the bulk processing of our NGS data. These pipelines are designed to work within the environment of our cluster, but could easily be made to work elsewhere.

These pipelines make it easy to run a standard processing of large numbers of sequences. They generally come in one of two types:

A pipeline for the complete processing of a particular type of data (eg nf_rnaseq) which will run several linked programs
A pipeline to run an individual program (eg nf_fastq_screen) which can be useful when more limited processing is required

All of the pipelines are designed to be directly executable and many of them will just need to be given a set of fastq files to process. Any that perform a mapping step will also need to supply details of the genome to process. More detailed technical information may be found at this Nextflow @Babraham User Guide.

Environment

The pipelines here are designed to work on a linux server which uses environment modules to dynamically load software packages. We do not include the analysis tools used by the pipelines and you must install these separately. Each of the pipelines will issue appropriate module load commands for the software it requires and will fail if these are not present.

You will need to have an installation of nextflow to run the pipelines, and this will need to be configured with an appropriate executor. Nextflow can talk to a variety of different queueing systems if running on a cluster, or can run in a local mode on a stand alone server. Our default setup uses the slurm queueing system, but you can change this by editing the nextflow.config file we distribute.

If you are working on a system which has no infrastructure and you're not interested in setting anything up then you may prefer looking at pipelines such as those from nf-core which offer a more complete infrastructure solution.

Genomes

Any pipeline which maps data to a reference genome will require the genome to have been downloaded and indexed prior to running the pipeline. Genome information is supplied to the pipelines in a series of config files found in the genomes.d folder. Each of these is a series of key value pairs which tell the pipeline some basic information about the genome (its name and the species it comes from) and the location of the indices or other configuraton files for different programs. For example our latest mouse file contains:

name	GRCm38
species	Mouse
fasta	/bi/scratch/Genomes/Mouse/GRCm38/
bismark	/bi/scratch/Genomes/Mouse/GRCm38/
cellranger  /bi/apps/cellranger/references/refdata-gex-mm10-2020-A/
bowtie	/bi/scratch/Genomes/Mouse/GRCm38/Mus_musculus.GRCm38
bowtie2	/bi/scratch/Genomes/Mouse/GRCm38/Mus_musculus.GRCm38
hisat2	/bi/scratch/Genomes/Mouse/GRCm38/Mus_musculus.GRCm38
gtf	/bi/scratch/Genomes/Mouse/GRCm38/Mus_musculus.GRCm38.90.gtf
hisat2_splices	/bi/scratch/Genomes/Mouse/GRCm38/Mus_musculus.GRCm38.90.hisat2_splices.txt

This means that within the pipelines you can just specify --genome GRCm38 and all of the required information will then be extracted from this config file.

If you install these pipelines on your own system then you will need to prepare the genomes which you want to use and populate the corresponding files in genomes.d

More detailed information please refer to the more detailed Nextflow @Babraham User Guide.

Problems and bugs

If you have any problems using these pipelines then feel free to file issues in our Github repository. For other questions you can email babraham.bioinformatics@babraham.ac.uk

Name		Name	Last commit message	Last commit date
Latest commit History 314 Commits
Docs		Docs
genomes.d		genomes.d
nf_modules		nf_modules
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
nextflow.config		nextflow.config
nf_bam2bigwig		nf_bam2bigwig
nf_bismark		nf_bismark
nf_bisulfite_PBAT		nf_bisulfite_PBAT
nf_bisulfite_PBAT_DirtyHarry		nf_bisulfite_PBAT_DirtyHarry
nf_bisulfite_RRBS		nf_bisulfite_RRBS
nf_bisulfite_RRBS_clock		nf_bisulfite_RRBS_clock
nf_bisulfite_WGBS		nf_bisulfite_WGBS
nf_bisulfite_scBSseq		nf_bisulfite_scBSseq
nf_bisulfite_scNMT		nf_bisulfite_scNMT
nf_bisulfite_scPBAT_DirtyHarry		nf_bisulfite_scPBAT_DirtyHarry
nf_bowtie2		nf_bowtie2
nf_cellranger		nf_cellranger
nf_chipseq		nf_chipseq
nf_fastq_screen		nf_fastq_screen
nf_fastqc		nf_fastqc
nf_hisat2		nf_hisat2
nf_pychopper		nf_pychopper
nf_qc		nf_qc
nf_reStrainingOrder		nf_reStrainingOrder
nf_rnaseq		nf_rnaseq
nf_sortIndexBAM		nf_sortIndexBAM
nf_traelseq		nf_traelseq
nf_traelseq_rescue		nf_traelseq_rescue
nf_trim_galore		nf_trim_galore
nf_umibam		nf_umibam

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Babraham Nextflow Pipelines

Introduction

Environment

Genomes

Problems and bugs

About

Releases

Packages

Contributors 2

Languages

License

s-andrews/nextflow_pipelines

Folders and files

Latest commit

History

Repository files navigation

Babraham Nextflow Pipelines

Introduction

Environment

Genomes

Problems and bugs

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages