# Nextflow

Nextflow is a workflow management system (consisting of a domain-specific language and a workflow engine) that can be used to write and run data-intensive bioinformatics workflows.

One of the main advantages of Nextflow is that it provides an abstraction between the workflow’s functional logic and the underlying execution system (or runtime). Thus, it is possible to write a workflow that runs seamlessly on your computer, a cluster, or the cloud, without being modified. You simply define the target execution platform (local, lsf, aws etc.) in a configuration file.
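
As a minimal sketch of what such a configuration might look like, the snippet below writes a small config file with one profile per platform. The file name `executor_example.config`, the profile names and the queue name are illustrative assumptions, not part of this tutorial's setup; `local`, `lsf` and `awsbatch` are executor names that Nextflow recognises.

```shell
# Write an illustrative Nextflow config file (hypothetical file name).
cat > executor_example.config <<'EOF'
// One profile per target platform; the profile names are our own choice.
profiles {
    laptop {
        process.executor = 'local'
    }
    cluster {
        process.executor = 'lsf'
        process.queue    = 'normal'   // assumed queue name; site-specific
    }
    cloud {
        // A real AWS Batch setup needs further site-specific settings.
        process.executor = 'awsbatch'
    }
}
EOF

# Show the executor lines defined in the file.
grep "process.executor" executor_example.config
```

A profile would then be selected at run time with the `-profile` option, for example `-c executor_example.config -profile cluster`, leaving the workflow itself unchanged.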

With Nextflow you can write a pipeline once and run it everywhere!

Let's change back to the user `manager`. Type the command below in the terminal window:

In [None]:
su manager

When prompted for a password enter the text below followed by the `Enter` key:

In [None]:
manager

## Installing nextflow

We will use `conda` to install `nextflow`. Let's create a `conda` environment called _nextflow_ and put `nextflow` version 23.04.1 and all its dependencies in that environment.

In [None]:
conda create -n nextflow nextflow=23.04.1

Activate the _nextflow_ environment:

In [None]:
conda activate nextflow
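
To confirm that the environment is active and `nextflow` is on your `PATH`, a quick check along the following lines may help. The helper function `check_tool` is a hypothetical name introduced here for illustration; `nextflow -version` is the Nextflow option that prints the installed version.

```shell
# check_tool: report whether a command is available on the PATH.
# (Hypothetical helper, not part of Nextflow or conda.)
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# Print the Nextflow version only if the command is available.
if check_tool nextflow; then
    nextflow -version
fi
```

If the check reports that `nextflow` was not found, the most likely cause is that the _nextflow_ environment has not been activated in the current terminal.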

## Nextflow and Containers  

Bioinformatics workflows are rarely composed of a single script or software tool. More often, they depend on many software applications and packages. As noted in the previous section, installing and maintaining such dependencies is a challenging task.

To address this challenge, Nextflow uses containers to manage software dependencies. Containers are similar to virtual machines in that they have their own copy of the file system, process space, memory management and software installations. They can be run on any computer that supports containers, in such a way that they are isolated from the host machine.

Nextflow requires software applications used in a workflow to be encapsulated in one or more self-contained, ready-to-run containers. Nextflow supports both Docker and Singularity containers. We will demonstrate how to use Singularity with Nextflow as the usage of Docker is generally not allowed on compute clusters due to security constraints.
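
As a sketch of how Singularity support might be switched on, the snippet below writes a small config file that enables it. The file name `singularity_example.config` and the cache directory are assumptions chosen for illustration; `singularity.enabled`, `singularity.autoMounts` and `singularity.cacheDir` are Nextflow configuration settings.

```shell
# Write an illustrative config file that enables Singularity
# (hypothetical file name; the quoted 'EOF' keeps $HOME literal in the file).
cat > singularity_example.config <<'EOF'
singularity {
    enabled    = true                        // run processes inside Singularity containers
    autoMounts = true                        // mount host paths automatically
    cacheDir   = "$HOME/singularity_cache"   // assumed location for downloaded images
}
EOF

# Display the resulting file.
cat singularity_example.config
```

Such a file could be passed to a run with the `-c` option. Many nf-core pipelines also ship a ready-made `singularity` profile, so a hand-written file like this is often unnecessary.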

Therefore, to use Nextflow in this way you will also need Singularity installed. To check that it is installed, type:

In [None]:
singularity -h

Installing `singularity` will usually require root or admin privileges and the instructions will vary depending on your operating system. It is already installed on the computer you are using and here are the commands that were used to install it (do not run these commands here).

`wget https://github.com/sylabs/singularity/releases/download/v3.10.2/singularity-ce_3.10.2-jammy_amd64.deb`   
`sudo dpkg -i singularity-ce_3.10.2-jammy_amd64.deb`   
`rm singularity-ce_3.10.2-jammy_amd64.deb` 

## The nf-core project

The nf-core project is a community effort to collect a curated set of best-practice analysis pipelines built using Nextflow. It provides Nextflow implementations of modules and subworkflows for common bioinformatics analysis tasks (e.g. running bwa) and pipelines for common bioinformatics workflows (e.g. mapping and SNP calling for a set of bacterial isolates), along with a set of guidelines that these implementations must adhere to.

You can benefit from nf-core by using existing, well-established and tested pipelines rather than having to implement your own Nextflow pipeline. The pipelines offered by nf-core are standardized, portable, well documented and user-friendly, and are designed to make results reproducible. You can also become a developer and write your own pipelines in Nextflow using ready-made modules available in nf-core. However, this is beyond the scope of this tutorial.

Now install `nf-core` with:

In [None]:
conda install nf-core

Once installed, you can check that everything is working by printing the help:

In [None]:
nf-core --help

As you can see from the `--help` output, the `nf-core` tool has a range of sub-commands. The simplest is `nf-core list`, which lists all available nf-core pipelines. The output shows the latest version number of each pipeline and when it was released. If a pipeline has been pulled locally using Nextflow, the output also tells you when that was and whether you have the latest version.

In [None]:
nf-core list

To browse the available nf-core pipelines online, visit [https://nf-co.re/pipelines](https://nf-co.re/pipelines).

## Exercises

1. Use the nf-core help flag to print the list command usage
2. List all available nf-core pipelines
3. How many nf-core pipelines are there?
4. Sort the pipelines alphabetically, then by popularity (stars)
5. Filter pipelines for those that work with RNA

To download and run your first nf-core pipeline, continue to the next section: [Running Nextflow Pipelines](nf_pipelines.ipynb)