# Tutorial: Running Nextflow in Python with Latch SDK

Estimated time to complete: 45 minutes

Nextflow is a popular framework for orchestrating bioinformatics workflows. In this tutorial, we will walk through how to package an existing Nextflow script with the Latch Python SDK.

## Prerequisites

Before we start, make sure you:

* Install the [Latch SDK](../getting_started/quick_start.md).
* Understand basic concepts of a workflow through our [Quickstart](../getting_started/quick_start.md) and [Authoring your Own Workflow](../getting_started/authoring_your_workflow.md).

As an example, we will use a BLAST workflow written in Nextflow. Let's dive in!

## Step 1: Clone the Tutorial GitHub Repository

In [2]:
! git clone https://github.com/latchbio/blast-nextflow-latch

Cloning into 'blast-nextflow-latch'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 23 (delta 0), reused 23 (delta 0), pack-reused 0
Unpacking objects: 100% (23/23), 8.34 KiB | 2.78 MiB/s, done.


At a high level, the repo contains the original BLAST Nextflow Pipeline, as well as additional files and folders required to upload the workflow to Latch:

<a href="https://ibb.co/pLDVxRK"><img src="https://i.ibb.co/jrK0TWw/latch-nextflow-structure.png" alt="latch-nextflow-structure" border="0" /></a>

We will first run the Nextflow pipeline locally, then walk through the additional files required to package the pipeline and upload it to Latch.

## Step 2: Install dependencies for the Nextflow Pipeline

To successfully run the Nextflow BLAST pipeline inside a Latch Pod, your Pod environment needs:

* Java 8 or later
* Nextflow (version 20.07.x or higher)

In this tutorial, we will use `conda` to manage the pipeline dependencies. We need to additionally install:

* Micromamba (a faster alternative to Anaconda)
* BLAST

Let's walk through our Jupyter Notebook to see how these dependencies are installed!

> Note: All the commands below assume that you are in a Linux environment. Please ensure that you are inside a Latch Pod or an alternative Linux environment before proceeding.

### Update system dependencies

First, let's refresh the package index and install `curl`, `unzip`, and `git`:

In [3]:
! apt-get update -y && apt-get install -y curl unzip git

Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Hit:2 https://download.docker.com/linux/ubuntu focal InRelease                 
Hit:3 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease     
Hit:4 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease     
Hit:5 http://archive.ubuntu.com/ubuntu focal InRelease                         
Get:6 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:8 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1277 kB]
Get:9 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [31.3 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1929 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2859 kB]
Fetched 6433 kB in 3s (2379 kB/s)                          
Reading package lists... Done
Reading package list

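### Install Java

Nextflow needs a Java runtime (Java 8 or later for the Nextflow versions listed above). You can install a headless runtime like so; on Ubuntu 20.04, `default-jre-headless` resolves to OpenJDK 11, as the package version in the output below shows: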

In [4]:
! apt-get install -y default-jre-headless

Reading package lists... Done
Building dependency tree       
Reading state information... Done
default-jre-headless is already the newest version (2:1.11-72).
0 upgraded, 0 newly installed, 0 to remove and 78 not upgraded.


### Install Nextflow

In [5]:
! curl -s https://get.nextflow.io | bash && \
    mv nextflow /usr/bin/ && \
    chmod 777 /usr/bin/nextflow 

Downloading nextflow dependencies. It may require a few seconds, please wait .. 
      N E X T F L O W
      version 22.10.4 build 5836
      created 09-12-2022 09:58 UTC (09:58 GMT)
      cite doi:10.1038/nbt.3820
      http://nextflow.io


Nextflow installation completed. Please note:
- the executable file `nextflow` has been created in the folder: /root
- you may complete the installation by moving it to a directory in your $PATH
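
If you want to confirm from the notebook that the installed Nextflow satisfies the 20.07 minimum listed earlier, the version can be read from the banner text. The following is a minimal sketch using only the standard library; `parse_nextflow_version` and `meets_minimum` are hypothetical helper names, and the banner format is assumed to match the output shown above.

```python
import re
from typing import Tuple


def parse_nextflow_version(banner: str) -> Tuple[int, ...]:
    """Extract the version tuple from text such as 'version 22.10.4 build 5836'."""
    match = re.search(r"version\s+(\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError("no Nextflow version found in banner text")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: Tuple[int, ...], minimum: Tuple[int, ...] = (20, 7)) -> bool:
    """Compare version tuples element by element (20.07 is written as (20, 7))."""
    return version >= minimum


# Banner text copied from the installation output above.
banner = "      N E X T F L O W\n      version 22.10.4 build 5836\n"
version = parse_nextflow_version(banner)
print(version, meets_minimum(version))
```

In a notebook, the banner text could come from capturing the output of `nextflow -version` instead of the hard-coded string.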



### Install Micromamba

Micromamba is a drop-in replacement for Conda that is faster and more lightweight. It uses the same commands and configurations as Conda.

In [8]:
! export CONDA_DIR=/opt/conda
! export MAMBA_ROOT_PREFIX=/opt/conda
! export PATH=$CONDA_DIR/bin:$PATH

! apt-get update && apt-get install -y wget bzip2 \
    && wget -qO-  https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba \
    && touch /root/.bashrc \
    && ./bin/micromamba shell init -s bash -p /opt/conda  \
    && grep -v '[ -z "\$PS1" ] && return' /root/.bashrc  > /opt/conda/bashrc \
    && apt-get clean autoremove --yes \
    && rm -rf /var/lib/{apt,dpkg,cache,log}

Hit:1 https://download.docker.com/linux/ubuntu focal InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease     
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease                         
Hit:4 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease
Hit:5 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:6 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree       
Reading state information... Done
bzip2 is already the newest version (1.0.8-2).
wget is already the newest version (1.20.3-1ubuntu2).
0 upgraded, 0 newly installed, 0 to remove and 78 not upgraded.
bin/micromamba
Modifying RC file "/root/.bashrc"
Generating config for root prefix "/opt/conda"
Setting mamba executable to: "/root/bin/micromamba"
Adding (or replacing) the following in your "/root/.bas

We can use YAML files to manage conda dependencies. The `blast-nf` folder contains a `conda.yml` file that specifies `blast`, the only dependency required for this pipeline.

```yml
# blast-nf/conda.yml
name: blast-nf
channels:
  - defaults
  - bioconda
  - conda-forge
dependencies:
  - blast
```

You can create an environment called `blast-nf` using Micromamba like so:

In [11]:
%%bash
/root/bin/micromamba create -f /root/blast-nextflow-latch/blast-nf/conda.yml -y


                                           __
          __  ______ ___  ____ _____ ___  / /_  ____ _
         / / / / __ `__ \/ __ `/ __ `__ \/ __ \/ __ `/
        / /_/ / / / / / / /_/ / / / / / / /_/ / /_/ /
       / .___/_/ /_/ /_/\__,_/_/ /_/ /_/_.___/\__,_/
      /_/

bioconda/linux-64                                           Using cache
bioconda/noarch                                             Using cache
conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache

Transaction

  Prefix: /root/micromamba/envs/blast-nf

  Updating specs:

   - blast


  Package                         Version  Build             Channel                    Size
──────────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────────

  + _libgcc_mutex                     0.1  conda_forge

You can verify the installation by listing the available environments:

In [12]:
%%bash
/root/bin/micromamba env list


                                           __
          __  ______ ___  ____ _____ ___  / /_  ____ _
         / / / / __ `__ \/ __ `/ __ `__ \/ __ \/ __ `/
        / /_/ / / / / / / /_/ / / / / / / /_/ / /_/ /
       / .___/_/ /_/ /_/\__,_/_/ /_/ /_/_.___/\__,_/
      /_/

  Name       Active  Path                           
──────────────────────────────────────────────────────
  base               /root/micromamba               
  blast              /root/micromamba/envs/blast    
  blast-nf           /root/micromamba/envs/blast-nf 
  rnaseq-nf          /root/micromamba/envs/rnaseq-nf
                     /root/miniconda                
             *       /root/miniconda/envs/jupyterlab


Output on Latch Pod:

```bash
          __  ______ ___  ____ _____ ___  / /_  ____ _
         / / / / __ `__ \/ __ `/ __ `__ \/ __ \/ __ `/
        / /_/ / / / / / / /_/ / / / / / / /_/ / /_/ /
       / .___/_/ /_/ /_/\__,_/_/ /_/ /_/_.___/\__,_/
      /_/

  Name       Active  Path                           
──────────────────────────────────────────────────────
  base               /root/micromamba               
  blast-nf           /root/micromamba/envs/blast-nf
                     /root/miniconda                
             *       /root/miniconda/envs/jupyterlab
```
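
The same check can be scripted rather than read by eye. The sketch below assumes the plain-text table format shown above and treats the last whitespace-separated token on each line as the environment path; `env_exists` is a hypothetical helper, not part of Micromamba or the Latch SDK.

```python
def env_exists(env_list_output: str, env_name: str) -> bool:
    """Return True if any listed environment path ends with /envs/<env_name>."""
    suffix = f"/envs/{env_name}"
    for line in env_list_output.splitlines():
        tokens = line.split()
        # The path is the last column; header and separator lines never match the suffix.
        if tokens and tokens[-1].endswith(suffix):
            return True
    return False


# Table copied from the Latch Pod output above.
sample_output = """\
  Name       Active  Path
──────────────────────────────────────────────────────
  base               /root/micromamba
  blast-nf           /root/micromamba/envs/blast-nf
                     /root/miniconda
             *       /root/miniconda/envs/jupyterlab
"""

print(env_exists(sample_output, "blast-nf"))
```

In a notebook, the text could instead be captured with `subprocess.run(["/root/bin/micromamba", "env", "list"], capture_output=True, text=True).stdout`.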


### Run the BLAST Nextflow Pipeline

Great! Now that we have successfully installed Nextflow and all required dependencies, let's run the BLAST pipeline locally:

In [13]:
%%bash
/root/bin/micromamba run -n blast-nf /bin/bash -c "nextflow run /root/blast-example/main.nf --out /root/results.txt"



N E X T F L O W  ~  version 22.10.4
Launching `/root/blast-example/main.nf` [prickly_baekeland] DSL2 - revision: 06c24a7542
[-        ] process > blast   -
[-        ] process > extract -

executor >  local (1)
[86/26851f] process > blast (1) [  0%] 0 of 1
[-        ] process > extract   -

executor >  local (2)
[86/26851f] process > blast (1)   [100%] 1 of 1 ✔
[f9/d54030] process > extract (1) [  0%] 0 of 1

executor >  local (2)
[86/26851f] process > blast (1)   [100%] 1 of 1 ✔
[f9/d54030] process > extract (1) [100%] 1 of 1 ✔

executor >  local (2)
[86/26851f] process > blast (1)   [100%] 1 of 1 ✔
[f9/d54030] process > extract (1) [100%] 1 of 1 ✔
matching sequences:
 >1ABO:B 
MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS
>1ABO:A 
MNDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNS
>1YCS:B 
PEITGQVSLPPGKRTNLRKTGSERIAHGMRVKFNPLPLALLLDSSLEGEFDLVQRIIYEVDDPSLPNDEGITALHNAVCA
GHTEIVKFLVQFGVNVNAADSDGWTPLHCAASCNNVQVCKFLVESGAAVFAMTYSDMQTAADKCEEMEEGYTQCSQFLYG
VQEKMG

The results are written to `/root/results.txt`. You can see an example output file [here](https://console.latch.bio/s/594643294258238).

## Step 3: Package the Nextflow Pipeline as a Latch Workflow

<a href="https://ibb.co/pLDVxRK"><img src="https://i.ibb.co/jrK0TWw/latch-nextflow-structure.png" alt="latch-nextflow-structure" border="0" /></a>

Now that we have successfully run the BLAST pipeline, let's walk through the additional files necessary to package it as a **Latch Workflow**. These files are:

1. A Dockerfile to install the required dependencies
2. A Python `__init__.py` to define workflow logic
3. A `version` file to semantically name the workflow version

### Define your Dockerfile to install dependencies

As the Latch workflow will be executed on a fresh machine on the Latch platform, we have to define a Dockerfile with the necessary dependencies for BLAST to run.

To do so, we can reuse the commands we ran earlier to set up our environment:

<a href="https://ibb.co/BqRwNHR"><img src="https://i.ibb.co/dGy7gsy/nextflow-dockerfile.png" alt="nextflow-dockerfile" border="0" /></a>

* **Line 1**: is the [Latch base image](https://github.com/latchbio/latch-base), which configures the libraries required for consistent task behaviour.
* **Lines 3-4**: download and install updates for each outdated package and dependency on the machine that executes the workflow. `curl` and `unzip` are also installed.
* **Line 7**: installs the Java runtime environment, which is required to run Nextflow.
* **Lines 8-10**: install Nextflow and move the binary to `/usr/bin`.
* **Lines 13-25**: are a series of commands to install Micromamba.
* **Line 27**: copies the BLAST Nextflow pipeline code to the task execution environment. `/root/blast-nf` is the path at which the Nextflow code is stored on the machine that executes the task on Latch.
* **Line 30**: uses Micromamba to install the dependencies as specified in `/root/blast-nf/conda.yml`.
* **Lines 35-39**: are already provided in the boilerplate Dockerfile and are needed to ensure your build environment works correctly with Latch.

That's it! You've successfully defined your Dockerfile.

To test whether the Dockerfile builds the correct environment, open a new terminal inside JupyterLab and register your workflow like so:

```bash
eval `ssh-agent -s`

latch register --remote blast-nextflow-latch
```

Open a remote debugging session:

```console
latch develop .
```

Enter an interactive shell:

```console
>>> shell

Syncing local changes... 
Could not find /Users/hannahle/Documents/GitHub/nextflow-latch-wf/data - skipping
Finished syncing.
Pulling 812206152185.dkr.ecr.us-west-2.amazonaws.com/6064_nextflow-latch-wf:0.0.0-7da9b6... 
Image successfully pulled.
```

This will pull the workflow image built from your Dockerfile, which is handy for verifying and iterating on your build commands.

For example, we can verify that Nextflow is installed correctly by typing:

```console
root@ip-10-0-11-243:~# nextflow
Usage: nextflow [options] COMMAND [arg...]

Options:
  -C
     Use the specified configuration file(s) overriding any defaults
  -D
     Set JVM properties
  -bg
     Execute nextflow in background
  -c, -config
  ...
```

### Define the Latch workflow

The core logic of a Latch workflow lives in `wf/__init__.py`.

To wrap the Nextflow workflow inside a Latch workflow, first import the necessary dependencies:

```python
import subprocess
from pathlib import Path
from typing import List
from latch import medium_task, workflow
from latch.resources.launch_plan import LaunchPlan
from latch.types import LatchAuthor, LatchFile, LatchMetadata, LatchParameter, LatchDir
```

Next, let's define our task:

<a href="https://ibb.co/DC5qkvb"><img src="https://i.ibb.co/LtnG6cP/blast-nf-wf.png" alt="blast-nf-wf" border="0" /></a>

* **Line 1**: specifies the compute that the BLAST pipeline needs. Here, we are using a `@small_task`, which will provision a machine with 2 CPUs and 4 GB of memory to run the task. For a comprehensive list of all task resources available, visit [how to define cloud resources](./../basics/defining_cloud_resources.md).
* **Lines 3-8**: are the task parameters. We choose `query` and `db` because they are also required parameters in the BLAST Nextflow pipeline, as shown in Nextflow's `main.nf` file.

<a href="https://ibb.co/CM9GF9z"><img src="https://i.ibb.co/pRKFpKx/main-nf.png" alt="main-nf" border="0" /></a>

* **Line 10**: defines a file path called `results.txt` to which the BLAST results are written.
* **Line 12**: `db.local_path` downloads the BLAST database to the task execution environment. We use a Python list comprehension to retrieve all filenames under this directory.
* **Line 14**: retrieves the common filename prefix across all files inside the BLAST database. This is necessary because the BLAST pipeline requires the common filename prefix to be appended to the BLAST database directory.
* **Lines 16-29**: specify the command to be run by the Python `subprocess` module.
* **Lines 17-22**: tell Micromamba to use the `blast-nf` conda environment previously installed in our Dockerfile.
* **Lines 24-27**: are the command to run the BLAST Nextflow pipeline with custom parameters.
* **Line 31**: uses `subprocess` to launch a process that executes the Nextflow command.
* **Line 33**: takes the output `/root/results.txt` file and uploads it to [Latch Data](https://console.latch.bio/data) under a user-defined filename.
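
Because the task body above is shown only as an image, here is a rough, standard-library-only sketch of the two pure-Python steps it describes: finding the common database prefix and assembling the Micromamba-wrapped Nextflow command. It is not the tutorial's actual task code. The helper names, the `/root/bin/micromamba` location (taken from the notebook, and possibly different inside the Docker image), and the `/root/blast-nf/main.nf` path are assumptions, and the prefix logic assumes a single-volume database such as `tiny.*`.

```python
import os
import shlex
import tempfile
from pathlib import Path
from typing import List

MICROMAMBA = "/root/bin/micromamba"   # assumed location of the Micromamba binary
MAIN_NF = "/root/blast-nf/main.nf"    # assumed location of the pipeline script


def common_db_prefix(db_dir: Path) -> str:
    """Return the shared name of the database files, e.g. 'tiny' for tiny.pin, tiny.psq."""
    stems = [p.stem for p in db_dir.iterdir() if p.is_file()]
    if not stems:
        raise ValueError(f"no database files found in {db_dir}")
    return os.path.commonprefix(stems).rstrip(".")


def build_blast_command(query: Path, db_dir: Path, out: Path) -> List[str]:
    """Build the argument list that runs Nextflow inside the blast-nf environment."""
    db = db_dir / common_db_prefix(db_dir)
    nextflow_cmd = " ".join([
        "nextflow", "run", MAIN_NF,
        "--query", shlex.quote(str(query)),
        "--db", shlex.quote(str(db)),
        "--out", shlex.quote(str(out)),
    ])
    return [MICROMAMBA, "run", "-n", "blast-nf", "/bin/bash", "-c", nextflow_cmd]


# Demonstration with empty placeholder files named like the tutorial's test database.
with tempfile.TemporaryDirectory() as tmp:
    for ext in ("pin", "psq", "phr"):
        (Path(tmp) / f"tiny.{ext}").touch()
    demo_cmd = build_blast_command(Path("/root/sample.fa"), Path(tmp), Path("/root/results.txt"))
    print(demo_cmd)
```

Inside a task, the resulting list could be passed to `subprocess.run(cmd, check=True)`, so that a failing Nextflow run raises an exception instead of passing silently.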

Now you have successfully defined a Latch task with custom compute resources to execute the BLAST Nextflow pipeline on Latch!


### Calling a Latch task inside a Latch workflow

Since this is a single task workflow, you can simply call the task inside the workflow and return its results like so:

```python
@workflow(metadata)
def blast_wf(
    query: LatchFile, db: LatchDir, out: str
) -> LatchFile:
    ...
    return blast_task(query=query, db=db, out=out)
```

### Defining Workflow GUI

To expose workflow parameters to a user-friendly workflow GUI, you can use the `LatchMetadata` object. An important point to note is that all workflow arguments need to be added to the `parameters` key of `LatchMetadata` for them to display on the GUI. For an exhaustive list of how workflow arguments and their Python types map to the front-end interface, visit [Customizing Your Interface](../basics/customizing_interface.md).

```python
"""The metadata included here will be injected into your interface."""
metadata = LatchMetadata(
    display_name="Example: Wrapping a Nextflow BLAST Pipeline in Latch SDK",
    documentation="your-docs.dev",
    author=LatchAuthor(
        name="John von Neumann",
        email="hungarianpapi4@gmail.com",
        github="github.com/fluid-dynamix",
    ),
    repository="https://github.com/your-repo",
    license="MIT",
    parameters={
        "query": LatchParameter(
            display_name="FASTA File",
            description="Select FASTA file.",
            batch_table_column=True,  # Show this parameter in batched mode.
        ),
        "db": LatchParameter(
            display_name="BLAST Database",
            description="Select the database to run BLAST against.",
            batch_table_column=True,  # Show this parameter in batched mode.
        ),
        "out": LatchParameter(
            display_name="Output Text File",
            description="Specify the location of the output text file.",
            batch_table_column=True,  # Show this parameter in batched mode.
        )
    },
    tags=[],
)
```
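
Since a workflow argument that is missing from `parameters` will not appear in the GUI, a quick consistency check can catch omissions before registering. The sketch below uses only the standard library; `missing_parameters` is a hypothetical helper rather than a Latch SDK function, and `blast_wf_stub` is a stand-in with the same argument names as `blast_wf` (types simplified to `str`).

```python
import inspect
from typing import Callable, Iterable, List


def missing_parameters(workflow_fn: Callable, parameter_keys: Iterable[str]) -> List[str]:
    """List workflow arguments that have no entry among the metadata parameter keys."""
    keys = set(parameter_keys)
    return [name for name in inspect.signature(workflow_fn).parameters if name not in keys]


# Stand-in for the undecorated workflow function.
def blast_wf_stub(query: str, db: str, out: str) -> str:
    return out


print(missing_parameters(blast_wf_stub, ["query", "db", "out"]))  # []
print(missing_parameters(blast_wf_stub, ["query", "db"]))         # ['out']
```

Whether `inspect.signature` works directly on a function already wrapped by `@workflow` is not something this sketch assumes; applying the check to the plain function before decoration avoids the question.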


### Adding Test Data

Finally, we can add some test data to run the workflow.

In the BLAST Nextflow workflow, there is a folder for test data under `blast-nf/data` and an additional folder for the BLAST database under `blast-nf/blast-db/pdb`. Let's upload these folders to a public S3 location, so that they can be used by others when running the workflow on Latch:

In [19]:
! latch test-data upload blast-nextflow-latch/blast-nf/data
! latch test-data upload blast-nextflow-latch/blast-nf/blast-db/pdb

Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/data/sample.fa
Successfully uploaded to s3://latch-public/test-data/4034/blast-nextflow-latch/blast-nf/data
Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb/tiny.pin
Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb/tiny.pog
Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb/tiny.psq
Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb/tiny.phr
Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb/tiny.psi
Successfully uploaded test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb/tiny.psd
Successfully uploaded to s3://latch-public/test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb


Once the commands run successfully, you will see the S3 paths to which the folders were uploaded. You can then use the `LaunchPlan` construct to add the remote files as test data like so:

```python
"""
Add test data with a LaunchPlan. Provide default values in a dictionary with
the parameter names as the keys. These default values will be available under
the 'Test Data' dropdown at console.latch.bio.
"""
LaunchPlan(
    blast_wf,
    "Test Data",
    {
        # Paths printed by `latch test-data upload` above; substitute your own if desired.
        "query": LatchFile(
            "s3://latch-public/test-data/4034/blast-nextflow-latch/blast-nf/data/sample.fa"
        ),
        "db": LatchDir(
            "s3://latch-public/test-data/4034/blast-nextflow-latch/blast-nf/blast-db/pdb"
        ),
        "out": "results.txt",  # Example output filename.
    },
)
```

### Registering the workflow to Latch Console

To publish the workflow to the Latch platform, you can navigate to the root workflow directory and upload it with the `latch register` command:

```
latch register --remote blast-nextflow-latch
```

This will give us:

* a no-code interface
* managed cloud infrastructure for workflow execution
* a dedicated API endpoint for programmatic execution
* hosted documentation
* parallelized CSV-to-batch execution

Once registration finishes, you can navigate to [Latch](https://console.latch.bio/workflows) to run your workflow.

---


## Commonly Asked Questions

Below we aim to provide answers to the most commonly asked questions about porting a Nextflow pipeline to Latch:

1. **Should I wrap an entire Nextflow pipeline in a single task or refactor each Nextflow process to an individual task?**

    For prototyping purposes, we recommend that you wrap an entire Nextflow pipeline in a single task first. This lets you quickly get a feel for developing with a Pythonic SDK and publish a first workflow that is ready to use for scientists.

    One disadvantage of this approach, however, is that all processes run on a single machine with fixed compute resources. If you want to parallelize individual processes across multiple machines, it is beneficial to refactor each process into its own task. With the [SDK's remote debugging toolkit](../basics/local_development.md), refactoring also enables faster debugging and development.

2. **Can I take advantage of existing Nextflow community workflows while using the Latch SDK?**

    Yes, absolutely! For example, to run [NF-Core's demultiplex pipeline](https://nf-co.re/demultiplex), you can substitute the `nextflow run` command in the tutorial above with the following inside your Python subprocess:

    ```console
    nextflow run nf-core/demultiplex --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
    ```

3. **How does the SDK handle retries?**

    Visit the documentation on how the SDK handles retries [here](https://docs.latch.bio/basics/retries.html). Currently, the SDK does not yet support autoscaling compute resources for failed tasks due to out-of-memory errors. This is a feature we're actively investigating and will release in future versions.
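
As a rough illustration of the substitution described in question 2, the `nextflow run` call for a community pipeline can be assembled as an argument list before being handed to `subprocess`. This is a sketch under stated assumptions: `build_nfcore_command` is a hypothetical helper, and the `/root/outputs` directory and `conda` profile are example values, not defaults of the pipeline.

```python
import shlex
from typing import List


def build_nfcore_command(pipeline: str, samplesheet: str, outdir: str, profile: str) -> List[str]:
    """Assemble the `nextflow run` arguments for an nf-core pipeline."""
    return [
        "nextflow", "run", pipeline,
        "--input", samplesheet,
        "--outdir", outdir,
        "-profile", profile,
    ]


cmd = build_nfcore_command("nf-core/demultiplex", "samplesheet.csv", "/root/outputs", "conda")
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting issues when it is run directly; if it needs to run inside the Micromamba environment via `/bin/bash -c`, the joined string from `shlex.join` can be used instead.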
