# Installing and using Nextflow

This is a [Jupyter notebook](https://jupyter.org/) containing a guide for installing [Nextflow](https://www.nextflow.io/) on the QUT HPC.

Prepared by the [eResearch Office, QUT](https://qutvirtual4.qut.edu.au/group/staff/governance/organisational-structure/academic-division/research-portfolio/research-infrastructure/eresearch).

Nextflow is a pipeline engine that takes advantage of the batch nature of the HPC environment to run bioinformatics workflows quickly and efficiently.

For more information about Nextflow, please visit [Nextflow - A DSL for parallel and scalable computational pipelines](https://www.nextflow.io/).

**********************************

# Contents
[1. How to use this Jupyter Notebook](#overview)

[2. Installing Nextflow](#install)

[3. Testing Nextflow](#test)

[4. Updating Nextflow](#update)

[5. Running a specific Nextflow pipeline](#pipeline)

[6. Using the Nextflow Tower](#tower)


***************************

## 1. How to use this Jupyter Notebook <a class="anchor" id="overview"></a>

Jupyter Notebooks run a 'kernel' that allows code to be run in code 'cells' in the Notebook. This Notebook is running the BASH kernel, which allows commands to be run on QUT's high performance compute cluster (HPC).

You can run a code cell by clicking on the cell itself and clicking the run button (at the top of this Notebook), or by pressing Shift+Enter.

![](https://data36.com/wp-content/uploads/2021/07/how-to-run-cell-in-jupyter-notebook.png)

<div class="alert alert-block alert-warning">
As an example, run the following code cell to list the contents of your HPC home directory.
</div>

In [None]:
ls $HOME

**Before each code cell is a colour-coded text box that tells you what the cell does. The colour of the text box tells you whether a code cell is required to run as-is, optional or if it requires you to type input.** 

<div class="alert alert-block alert-success">
A green text box indicates a code cell that must be run, without alteration, to complete the workflow.
</div>

<div class="alert alert-block alert-warning">
A yellow text box indicates an optional code cell that doesn't have to be run to complete the workflow, but can be run to complete optional tasks.
</div>

<div class="alert alert-block alert-info">
A blue text box indicates a code cell that requires user input - this cell also must be run to complete the workflow, but the user needs to modify the command in the cell.
</div>

<div class="alert alert-block alert-danger">
In addition, some text boxes contain particularly important information. These will be coloured red.
</div>

*******************************

## 2. Installing Nextflow <a class="anchor" id="install"></a>

<div class="alert alert-block alert-warning">
If you're unsure whether you've previously installed Nextflow, you can run the following to see if Nextflow is installed and which version you have.
    </div>

In [None]:
module load java
nextflow -version

If you would like to update to the latest version of Nextflow (recommended), go to section 4, "Updating Nextflow".

<div class="alert alert-block alert-success">
If 'nextflow -version' gives you an error then Nextflow is not installed or incorrectly installed. To install Nextflow, run the following:
</div>

In [None]:
curl -s https://get.nextflow.io | bash
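# Create the target directory first; otherwise 'mv' would rename the launcher to a file called 'bin'
mkdir -p "$HOME/bin"
mv nextflow "$HOME/bin/"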
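
Moving the launcher into `$HOME/bin` only makes the `nextflow` command available if that directory is on your `PATH`. The following is a minimal sketch for checking this, assuming a BASH session; `path_contains` is a hypothetical helper name introduced here for illustration, not part of Nextflow or the HPC setup.

```shell
# Hypothetical helper: succeeds if directory $1 appears in the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains "$HOME/bin"; then
    echo "$HOME/bin is already on PATH"
else
    # This only affects the current session; adding the same line to
    # ~/.bashrc would make it permanent (assumption: you use BASH at login).
    export PATH="$HOME/bin:$PATH"
    echo "Added $HOME/bin to PATH for this session"
fi
```

After this, `command -v nextflow` should print the path to the launcher if the install step succeeded.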

<div class="alert alert-block alert-success">
To complete the setup, you will need to run the following code to set the parameters required to run Nextflow on the QUT HPC.
</div>

In [None]:
[[ -d $HOME/.nextflow ]] || mkdir -p $HOME/.nextflow
cat <<EOF > $HOME/.nextflow/config
singularity {
    cacheDir = '$HOME/.nextflow/NXF_SINGULARITY_CACHEDIR'
    autoMounts = true
}
conda {
    cacheDir = '$HOME/.nextflow/NXF_CONDA_CACHEDIR'
}
process {
  executor = 'pbspro'
  scratch = false
}
cleanup = false
EOF
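
To confirm that the configuration file was written as expected, a quick check along these lines may help. This is a sketch only: `config_uses_pbspro` is a hypothetical helper that looks for the `executor = 'pbspro'` line, and the final step relies on the `nextflow config` command, which prints the configuration Nextflow resolves.

```shell
# Hypothetical helper: succeeds if the Nextflow config file $1
# contains an executor = 'pbspro' setting.
config_uses_pbspro() {
    grep -Eq "executor[[:space:]]*=[[:space:]]*'pbspro'" "$1"
}

cfg="$HOME/.nextflow/config"
if [ -f "$cfg" ] && config_uses_pbspro "$cfg"; then
    echo "OK: $cfg sets executor = 'pbspro'"
else
    echo "Check $cfg: file is missing or executor is not 'pbspro'"
fi

# If Nextflow is installed, also print the configuration it resolves.
if command -v nextflow >/dev/null 2>&1; then
    nextflow config
fi
```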

******************************************

## 3. Testing Nextflow <a class="anchor" id="test"></a>

<div class="alert alert-block alert-warning">
To check if Nextflow has been installed and set up correctly (even if you have previously installed Nextflow), you can run the test 'hello world' pipeline.
    </div>

In [None]:
# Use -p so the command still works if the directory already exists
mkdir -p "$HOME/nftemp" && cd "$HOME/nftemp"
nextflow run hello
# Check the output of the short 'hello' pipeline above, then clean up
cd "$HOME"
rm -rf nftemp

If you see "Hello world!" (and "Bonjour world!", etc.) then you've installed and set up Nextflow correctly.

**NOTE** If you get a 'Your local project version looks outdated' error, you'll need to pull down the latest version of the 'hello' workflow, like so:

In [None]:
nextflow pull hello

Then you should be able to run the previous 'nextflow run hello' command.

If you are still getting errors when trying to run the Hello World pipeline, contact the eResearch team by logging a request through the portal: https://eresearchqut.atlassian.net/servicedesk/customer/portals

************************************************

## 4. Updating Nextflow <a class="anchor" id="update"></a>

<div class="alert alert-block alert-warning">
To update Nextflow to the current version, run the following:
    </div>

In [None]:
nextflow self-update

If successful, you should see a 'Nextflow installation completed' message.

*********************

## 5. Running a specific Nextflow pipeline <a class="anchor" id="pipeline"></a>

A large number of Nextflow workflows exist, covering a wide range of omics analyses. [nf-core](https://nf-co.re/) is the main repository, with 80 published pipelines at the time of writing this Notebook:

https://nf-co.re/pipelines

Click on an nf-core pipeline from the link above (e.g. https://nf-co.re/rnaseq) and you'll see some default tabs: 'Introduction' explains the pipeline and the tools used, 'Usage docs' shows an example of a typical command to run that pipeline, 'Parameters' lists the parameters that can be added or modified in the run command, and 'Output docs' describes the results generated by the workflow.

eResearch has written user guides for running some of these pipelines on the QUT HPC, but almost all nf-core workflows have a similar structure. A typical command has this structure:

`nextflow run {pipeline name} {options}`
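
As a concrete illustration of that structure, the sketch below prints (but does not run) a command for the nf-core RNA-Seq pipeline. It assumes the usual nf-core conventions: a bundled `test` profile, a `singularity` profile, and an `--outdir` parameter. `nf_core_command` is a hypothetical helper, and `results` is just an example directory name.

```shell
# Hypothetical helper: prints (does not run) a nextflow command.
#   $1 = pipeline name, $2 = comma-separated profiles, $3 = output directory
nf_core_command() {
    printf 'nextflow run %s -profile %s --outdir %s\n' "$1" "$2" "$3"
}

# Example: the rnaseq pipeline with its bundled test data and Singularity containers.
nf_core_command nf-core/rnaseq test,singularity results
```

Note the difference in dashes: options with a single dash (such as `-profile`) are read by Nextflow itself, while options with a double dash (such as `--outdir`) are parameters passed to the pipeline.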

However, it is good practice, and much safer, to run Nextflow inside a job submitted to the HPC rather than interactively. A job file (called launch.pbs) to run the nf-core RNA-Seq pipeline might look like:

<div class="alert alert-block alert-danger">
Don't run this code. It is just an example.
    </div>

In [None]:
#!/bin/bash -l
#PBS -N MyNextflowRun
#PBS -l select=1:ncpus=2:mem=4gb
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
module load java
export NXF_OPTS='-Xms1g -Xmx4g'
nextflow run nf-core/rnaseq

What do these lines mean?

Lines 1-5 are typical PBS job script lines. Here the job is named MyNextflowRun, 2 CPUs and 4 GB of RAM are requested, the job may run for up to 24 hours, and line 5 changes to the directory the job was submitted from. The walltime must cover the whole pipeline run, which may take days or weeks depending on the pipeline and the amount of data, so increase it if needed.

Line 6 ensures that the Java environment is available (Nextflow needs Java to run).

Line 7 tells Nextflow how much RAM its own Java process may use (here it starts with 1 GB and may grow to 4 GB).

Line 8 runs Nextflow.
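
Once a job file like this is saved as launch.pbs, it is submitted with the PBS `qsub` command. The following is a hedged sketch rather than a prescribed procedure: `submit_job` is a hypothetical wrapper that checks the file exists before calling `qsub`.

```shell
# Hypothetical helper: submit a PBS job file, with basic checks.
# Returns 1 if the file is missing, 2 if qsub is not available.
submit_job() {
    job_file="$1"
    if [ ! -f "$job_file" ]; then
        echo "Job file not found: $job_file" >&2
        return 1
    fi
    if command -v qsub >/dev/null 2>&1; then
        qsub "$job_file"    # prints the job ID on success
    else
        echo "qsub not found; run this on the HPC" >&2
        return 2
    fi
}

# Usage (on the HPC): submit the job, then list your queued and running jobs.
# submit_job launch.pbs
# qstat -u "$USER"
```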

To see the output of Nextflow while running as a job you can use the Nextflow Tower.

****************************

## 6. Using the Nextflow Tower <a class="anchor" id="tower"></a>

Nextflow Tower allows you to monitor Nextflow runs. To use Tower, please visit

https://nftower.qut.edu.au or the [BioCommons: Nextflow Tower](https://tower.services.biocommons.org.au/login)

There are no passwords for the Tower; instead, you sign in using a link sent to your email.

Look for the Sign in button (top right), then provide your email address.

In the email that comes from eresearch@qut.edu.au, look for the “Access Nextflow Tower now!” option.
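
For a run to appear in Tower, Nextflow needs an access token and the `-with-tower` option on the `nextflow run` command. The sketch below makes several assumptions: that you have created a token in the Tower web interface after signing in, and that a self-hosted Tower needs its API endpoint set through `TOWER_API_ENDPOINT`. The exact endpoint URL for the QUT Tower is not given in this guide, so it is left as a placeholder. `tower_ready` is a hypothetical helper.

```shell
# Hypothetical helper: succeeds only if a Tower access token is set.
tower_ready() {
    [ -n "${TOWER_ACCESS_TOKEN:-}" ]
}

# Assumption: the token is copied from the Tower web interface.
# export TOWER_ACCESS_TOKEN='<your token>'
# Assumption: a self-hosted Tower needs its API endpoint (URL not given here).
# export TOWER_API_ENDPOINT='<Tower API URL>'

if tower_ready; then
    echo "Token found: add -with-tower to your 'nextflow run' command"
else
    echo "No TOWER_ACCESS_TOKEN set: runs will not be reported to Tower"
fi
```

In a job file, the `export` lines would go before the `nextflow run` line so that the job itself can report to Tower.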