"Dataflow variables are spectacularly expressive in concurrent programming"
Henri E. Bal, Jennifer G. Steiner, Andrew S. Tanenbaum


With the rise of big data, techniques to analyse and run experiments on large datasets are increasingly necessary.

Parallelization and distributed computing are the best ways to tackle this kind of problem, but the tools commonly available to the bioinformatics community often lack good support for these techniques, provide a model that fits the specific requirements of the bioinformatics domain poorly, or require knowledge of complex tools or low-level APIs.

The Nextflow framework is based on the dataflow programming model, which greatly simplifies writing parallel and distributed pipelines without adding unnecessary complexity, letting you concentrate on the flow of data, i.e. the functional logic of the application or algorithm.

It doesn't aim to be yet another pipeline scripting language. Rather, it is built around the idea that the Linux platform is the lingua franca of data science, since it provides many simple command line and scripting tools which are powerful by themselves and, when chained together, facilitate complex data manipulations.

In practice, this means that a Nextflow script is defined by composing many different processes. Each process can be written in any scripting language that can be executed by the Linux platform (BASH, Perl, Ruby, Python, etc.). Nextflow adds the ability to coordinate and synchronize process execution by simply specifying the inputs and outputs of each process.

Quick start

Nextflow does not require any installation procedure. Just download the distribution package by copying and pasting this command into your terminal:

curl -fsSL | bash

It creates the nextflow executable file in the current directory. You may want to move it to a folder accessible from your $PATH.
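As a minimal sketch of that last step, the helper below moves a downloaded launcher into a directory of your choice and marks it executable. The function name `install_launcher` and the target directory are illustrative assumptions, not part of Nextflow; pick any directory that is already listed in your $PATH.

```shell
# Hypothetical helper (not part of Nextflow): move a launcher file into a
# target directory and make sure it is executable.
install_launcher() {
    src="$1"        # path to the downloaded launcher, e.g. ./nextflow
    dest_dir="$2"   # a directory that is (or will be) on your $PATH

    mkdir -p "$dest_dir" || return 1
    mv "$src" "$dest_dir/" || return 1
    chmod +x "$dest_dir/$(basename "$src")"
}

# Example call, assuming ./nextflow exists and $HOME/bin is on your $PATH:
#   install_launcher ./nextflow "$HOME/bin"
```

After this, typing `nextflow` from any directory should resolve to the moved launcher, provided the chosen directory really is on your $PATH.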

Create a file named with the following content and copy it to the path where you downloaded the Nextflow package.

process sayHello {

    """
    printf 'Hello world! \n'
    """
}

Launch the above example by typing the following command on your terminal console:

./nextflow run -process.echo true

Congratulations! You have just run your first program with Nextflow.

Something more useful

Let's see a more realistic example: execute a BLAST search, get the top 10 hits, extract the matching protein sequences and align them.

Copy the following example into a file named .

params.query = "$HOME/sample.fa"
params.db = "$HOME/tools/blast-db/pdb/pdb"

process blast {
    output:
     file top_hits

    """
    blastp -query ${params.query} -db ${params.db} -outfmt 6 \
    | head -n 10 \
    | cut -f 2 > top_hits
    """
}

process extract {
    input:
     file top_hits
    output:
     file sequences

    "blastdbcmd -db ${params.db} -entry_batch $top_hits > sequences"
}

process align {
    input:
     file sequences
    echo true

    "t_coffee $sequences 2>&- | tee align_result"
}

The input and output declarations in each process define what it expects to receive as input and which file(s) it produces as output.

Since the two variables query and db are prefixed by the params qualifier, their values can be overridden quickly when the script is launched, by simply adding them on the Nextflow command line and prefixing them with the -- characters. For example:

./nextflow run --db=/path/to/blast/db --query=/path/to/query.fasta

Mixing scripting languages

Processes in your pipeline can be written in any scripting language supported by the underlying Linux platform. To use a scripting language other than BASH (e.g. Perl, Python, Ruby, R, etc.), simply start your process script with the corresponding shebang declaration. For example:

process perlStuff {

    """
    #!/usr/bin/env perl

    print 'Hi there!' . '\n';
    """
}

process pyStuff {
    """
    #!/usr/bin/env python

    x = 'Hello'
    y = 'world!'
    print("%s - %s" % (x, y))
    """
}

Cluster Resource Managers support

Nextflow provides an abstraction between the pipeline functional logic and the underlying processing system. Thus it is possible to write your pipeline once and run it on your computer or on a cluster resource manager without modifying it.

Currently the following clusters are supported:

  • Open Grid Engine (SGE)
  • Univa Grid Engine
  • IBM Platform LSF
  • Linux SLURM
  • PBS/Torque
  • HTCondor (experimental)

By default, processes are parallelized by spawning multiple threads on the machine where the pipeline is launched.

To submit the execution to an SGE cluster, create a file named nextflow.config in the directory where the pipeline is going to be launched, with the following content:

process {
  executor='sge'
  queue='<your execution queue>'
}

With this configuration, processes are executed as SGE jobs by using the qsub command, so your pipeline behaves like any other SGE job script, with the benefit that Nextflow automatically and transparently manages process synchronisation, file staging and un-staging, etc.

Alternatively, the same declaration can be placed in the file $HOME/.nextflow/config, which holds the global Nextflow configuration.
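If you launch pipelines from several directories, the per-directory configuration file can be generated from the shell. The sketch below is only an illustration: the helper name `write_sge_config` is an assumption, and it presumes the `executor` and `queue` settings of the `process` scope are all you need for your cluster.

```shell
# Hypothetical helper: write a minimal nextflow.config that routes all
# processes to an SGE queue. Overwrites any existing file in launch_dir.
write_sge_config() {
    launch_dir="$1"   # directory where the pipeline will be launched
    queue="$2"        # name of your SGE execution queue

    cat > "$launch_dir/nextflow.config" <<EOF
process {
  executor = 'sge'
  queue = '$queue'
}
EOF
}

# Example call, assuming a queue named "long" exists on your cluster:
#   write_sge_config . long
```

Because the file is overwritten on each call, keep any additional settings in $HOME/.nextflow/config or extend the template to suit your site.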

Cloud support

Nextflow provides out-of-the-box support for the Amazon AWS cloud, allowing you to set up a computing cluster, deploy it and run your pipeline in the AWS infrastructure with a few commands.

The cloud configuration settings need to be specified in the nextflow.config file as shown below:

cloud {
    imageId = 'ami-43f49030'
    instanceType = 't2.micro'
    subnetId = 'subnet-05222a43'
    sharedStorageId = 'fs-1803efd1'
    spotPrice = 0.04
}

aws {
    accessKey = 'xxx'
    secretKey = 'yyy'
    region = 'eu-west-1'
}

Replace the settings in the above example with values of your choice. The attribute sharedStorageId is optional. When provided, the Amazon EFS file system is automatically mounted in the configured cloud environment. The spotPrice attribute allows you to use EC2 Spot instances in place of regular on-demand instances, bidding the specified price.

The settings in the aws block can be omitted. In that case Nextflow will use the AWS credentials defined in your environment, using the standard AWS variables and configuration files.
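When relying on the environment instead of the aws block, a quick check before creating a cluster can save a failed launch. The sketch below assumes the standard AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the function name `aws_env_ready` is illustrative. Note that it only inspects environment variables, so credentials stored in the AWS configuration files will not be detected by it.

```shell
# Hypothetical check: succeed only if both standard AWS credential
# variables are set to non-empty values in the current environment.
aws_env_ready() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

# Example use:
#   aws_env_ready || echo "AWS credentials not found in the environment" >&2
```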

Once you have defined the configuration of your cloud environment, run the following command in the folder where the file nextflow.config was created:

nextflow cloud create my-cluster -c <num-of-nodes>

The string my-cluster identifies the cluster instance. Replace it with a name of your choice. Finally, replace num-of-nodes with the actual number of instances that will make up the cluster. WARNING: you will be charged according to the type and the number of instances chosen.

Once the cluster deployment completes, SSH into the master node following the instructions that will be printed. Then you will be able to run your Nextflow pipeline as usual.

Required dependencies

  • Compiler Java 8
  • Runtime Java 8 or later
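To check these requirements from a terminal, the following sketch extracts the major version from a Java version string. The function names are illustrative assumptions. It handles both the legacy numbering (1.8.0_181 means Java 8) and the newer one (10.0.1 means Java 10).

```shell
# Print the major Java version for a version string such as
# "1.8.0_181" (legacy scheme) or "10.0.1" (newer scheme).
java_major() {
    version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_181 -> 8.0_181
    esac
    # Keep the leading component, dropping everything from the first
    # '.', '_', '+' or '-' onwards.
    echo "${version%%[._+-]*}"
}

# Read the version string reported by the local JVM, if one is installed.
# `java -version` writes to stderr, e.g.: java version "1.8.0_181"
detect_java_version() {
    java -version 2>&1 | head -n 1 | cut -d '"' -f 2
}
```

For example, `[ "$(java_major "$(detect_java_version)")" -ge 8 ]` succeeds when the JVM found on your $PATH satisfies the runtime requirement.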

Build from source

Nextflow is written in Groovy (a scripting language for the JVM). A pre-compiled, ready-to-run package is available at the GitHub releases page, so it is not necessary to compile it in order to use it.

If you are interested in modifying the source code, or contributing to the project, it is worth knowing that the build process is based on the Gradle build automation system.

You can compile Nextflow by typing the following command in the project home directory on your computer:

make compile

The very first time you run it, it will automatically download all the libraries required by the build process. It may take some minutes to complete.

When complete, execute the program by using the script in the project directory.

The self-contained runnable Nextflow packages can be created by using the following command:

make pack

In order to install the compiled packages use the following command:

make install

Then you will be able to run nextflow using the nextflow launcher script in the project root folder.

Known compilation problems

Nextflow requires JDK 8 to be compiled. The Java compiler used by the build process can be chosen by setting the JAVA_HOME environment variable accordingly.

If the compilation stops, reporting the error java.lang.VerifyError: Bad <init> method call from inside of a branch, this is due to a bug affecting the following JDK versions:

  • 1.8.0 update 11
  • 1.8.0 update 20
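A small guard in a build script can flag these two releases before compilation starts. This is a sketch under the assumption that the JDK reports its version in the usual `1.8.0_<update>` form; the function name `is_affected_jdk` is illustrative.

```shell
# Succeed if the given version string is one of the two JDK releases
# listed above (1.8.0 update 11 and 1.8.0 update 20).
is_affected_jdk() {
    case "$1" in
        1.8.0_11|1.8.0_20) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use with a version string obtained from your JDK:
#   is_affected_jdk "1.8.0_20" && echo "This JDK is affected by the VerifyError bug" >&2
```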

Upgrade to a newer JDK to avoid this issue. Alternatively, a possible workaround is to define the following variable in your environment:


Read more at these links:

IntelliJ IDEA

Nextflow development with IntelliJ IDEA requires the latest version of the IDE (2017.3 or higher).

If you have it installed on your computer, follow the steps below in order to use it with Nextflow:

  1. Clone the Nextflow repository to a directory on your computer.
  2. Open IntelliJ IDEA and choose "Import project" in the "File" menu bar.
  3. Select the Nextflow project root directory on your computer and click "OK".
  4. Then, choose the "Gradle" item in the "external module" list and click the "Next" button.
  5. Confirm the default import options and click "Finish" to finalize the project configuration.
  6. When the import process completes, select the "Project structure" command in the "File" menu bar.
  7. In the dialog that appears, click the "Project" item in the list on the left, and make sure that the "Project SDK" choice on the right contains Java 8.


Nextflow documentation is available at this link


You can post questions or report problems by using the Nextflow Google group.



The Nextflow framework is released under the GNU GPLv3 License.


If you use Nextflow for research purposes, please cite:

P. Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017) doi:10.1038/nbt.3820


Nextflow is built on two great pieces of open source software, namely Groovy and GPars.

YourKit is kindly supporting this open source project with its full-featured Java Profiler.