# Nextflow

[Nextflow](https://www.nextflow.io/) is a workflow management system used for executing scientific workflows across platforms scalably, portably, and reproducibly.
There are several ways to automatically register the outputs of a Nextflow pipeline in a LaminDB instance:

1. Use a serverless trigger (e.g., AWS Lambda) that executes a Python script.
2. Use a [post-run script](https://docs.seqera.io/platform/23.4.0/launch/advanced#pre-and-post-run-scripts) on the Seqera Platform.
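For the serverless option, the trigger could be an S3 event that fires when the pipeline writes to its output bucket. The handler below is a minimal sketch under assumptions: it follows the standard S3 event-notification layout and the usual Lambda `handler(event, context)` signature, and it treats a hypothetical marker object, `pipeline_complete.txt`, as the signal that a run has finished. Neither the marker nor the handler is part of LaminDB or nf-core; it only shows how an event could be turned into the output prefix a registration script needs.

```python
import posixpath
import urllib.parse

# Hypothetical marker object written once a pipeline run has finished.
MARKER_NAME = "pipeline_complete.txt"


def lambda_handler(event, context):
    """Return the S3 output prefixes of runs whose completion marker just appeared."""
    run_outputs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (e.g. spaces as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if posixpath.basename(key) != MARKER_NAME:
            continue  # ignore everything except the completion marker
        run_outputs.append(f"s3://{bucket}/{posixpath.dirname(key)}")
    # A real handler would now invoke the registration logic for each prefix.
    return {"run_outputs": run_outputs}
```

Watching a single marker object, rather than every output file, keeps the registration from starting while the pipeline is still writing results.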

In both cases, a script connects to the LaminDB instance and registers the run's outputs, much like the script shown below.

This guide manually registers the output of an [nf-core/scrnaseq](https://nf-co.re/scrnaseq/latest) pipeline run in a LaminDB instance, which illustrates what such an automated script could look like.

In [None]:
!lamin init --storage ./test-nextflow --name test-nextflow

In [None]:
import lamindb as ln
import anndata as ad

## Run the nf-core/scrnaseq pipeline

Run the nf-core/scrnaseq pipeline. Its successful completion can serve as the trigger for a registration script.

In [None]:
# The input data can live somewhere other than the machine that executes the Nextflow pipeline (here, an S3 bucket)
input_path = ln.UPath("s3://lamindb-test/scrnaseq_input")
input_path.download_to("scrnaseq_input")

In [None]:
# The test profile uses the just-downloaded input files as its input.
!nextflow run nf-core/scrnaseq -r 2.7.1 -profile docker,test -resume --outdir scrnaseq_output

:::{dropdown} What is the full run command for the test profile?

```
nextflow run nf-core/scrnaseq -r 2.7.1 \
    -profile docker \
    -resume \
    --outdir scrnaseq_output \
    --input 'scrnaseq_input/samplesheet-2-0.csv' \
    --skip_emptydrops \
    --fasta 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/GRCm38.p6.genome.chr19.fa' \
    --gtf 'https://github.com/nf-core/test-datasets/raw/scrnaseq/reference/gencode.vM19.annotation.chr19.gtf' \
    --aligner 'star' \
    --protocol '10XV2' \
    --max_cpus 2 \
    --max_memory '6.GB' \
    --max_time '6.h'
```
:::
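When a trigger needs to launch the pipeline itself, it is safer to assemble the command as an argument list than as a shell string. The helper below is a hypothetical sketch (`build_nextflow_cmd` is not a Nextflow or nf-core API) that mirrors the structure of the command above: Nextflow's own options take a single dash (`-r`, `-profile`, `-resume`), pipeline parameters take a double dash, and a parameter without a value is passed as a bare flag, which Nextflow reads as `true`.

```python
import shlex


def build_nextflow_cmd(pipeline, revision, profiles, outdir, params=None):
    """Assemble a `nextflow run` command as an argument list."""
    cmd = [
        "nextflow", "run", pipeline,
        "-r", revision,                  # Nextflow option: pipeline revision
        "-profile", ",".join(profiles),  # Nextflow option: comma-separated profiles
        "-resume",                       # Nextflow option: reuse cached results
        "--outdir", outdir,              # pipeline parameter
    ]
    for name, value in (params or {}).items():
        if value is True:
            cmd.append(f"--{name}")  # bare flag, e.g. --skip_emptydrops
        else:
            cmd.extend([f"--{name}", str(value)])
    return cmd


# Usage: rebuild part of the full test-profile command shown above.
cmd = build_nextflow_cmd(
    "nf-core/scrnaseq", "2.7.1", ["docker"], "scrnaseq_output",
    params={
        "input": "scrnaseq_input/samplesheet-2-0.csv",
        "skip_emptydrops": True,
        "aligner": "star",
        "protocol": "10XV2",
    },
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```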

## Run registration script

After the pipeline completes successfully, a script can be launched that registers its input and output data in a LaminDB instance.

```{eval-rst}
.. literalinclude:: register_scrnaseq_run.py
   :language: python
   :caption: nf-core/scrnaseq run registration
```
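This notebook runs the pipeline and the registration script as separate steps. If both run on the same machine, the "only after successful completion" condition can be enforced by checking the pipeline's exit status before starting the registration. A minimal sketch using `subprocess`; `run_then_register` is a hypothetical helper, not part of LaminDB or Nextflow:

```python
import subprocess
import sys


def run_then_register(pipeline_cmd, register_script, register_args):
    """Run the pipeline; start the registration script only if the pipeline succeeded.

    Returns the exit code of the last step that ran.
    """
    pipeline = subprocess.run(pipeline_cmd)
    if pipeline.returncode != 0:
        # Don't register the outputs of a failed or partial run.
        return pipeline.returncode
    # Run the registration script with the same Python interpreter as this process.
    registration = subprocess.run([sys.executable, register_script, *register_args])
    return registration.returncode
```

For this guide, `pipeline_cmd` would be the `nextflow run` argument list, `register_script` would be `"register_scrnaseq_run.py"`, and `register_args` would be `["--input", "scrnaseq_input", "--output", "scrnaseq_output"]`.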

In [None]:
!python register_scrnaseq_run.py --input scrnaseq_input --output scrnaseq_output
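The command above passes the input and output directories via `--input` and `--output`. The included script's internals may differ, but a registration script of this shape needs to parse those two paths and decide which output files to register. A stdlib-only sketch of that first part follows; `OUTPUT_PATTERNS`, `parse_args`, and `collect_outputs` are hypothetical names, and the glob patterns are assumptions about the pipeline's output layout rather than a documented list.

```python
import argparse
from pathlib import Path

# Hypothetical selection of outputs to register: glob pattern -> description.
# Adapt the patterns to the files your pipeline version actually writes.
OUTPUT_PATTERNS = {
    "**/*.h5ad": "count matrix (AnnData)",
    "multiqc/**/*.html": "MultiQC report",
}


def parse_args(argv=None):
    """Parse the two directories passed on the command line."""
    parser = argparse.ArgumentParser(description="Register a pipeline run.")
    parser.add_argument("--input", required=True, type=Path, help="pipeline input directory")
    parser.add_argument("--output", required=True, type=Path, help="pipeline --outdir")
    return parser.parse_args(argv)


def collect_outputs(outdir, patterns=OUTPUT_PATTERNS):
    """Map each file under `outdir` that matches a pattern to its description."""
    selected = {}
    for pattern, description in patterns.items():
        for path in sorted(Path(outdir).glob(pattern)):
            if path.is_file():
                selected[path] = description
    return selected
```

Each selected path could then be registered with LaminDB, for example as `ln.Artifact(path, description=description).save()`, so that description-based queries can find it later.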

## Data lineage

The output data can now be accessed for analysis, with full lineage, from a different notebook or script.

In [None]:
matrix_af = ln.Artifact.get(description__icontains="filtered count matrix")

In [None]:
matrix_af.view_lineage()

## View transforms and runs in LaminHub

[![hub](https://img.shields.io/badge/View%20in%20LaminHub-mediumseagreen)](https://lamin.ai/laminlabs/lamindata/transform/vMwsczN6lGZWRm8w/foyuuRRmEt7KYiaU8hPD)

<img src="https://lamin-site-assets.s3.amazonaws.com/.lamindb/FtEgTeQ9FXdbVWNnVTZ2.png" width="900px">

## View the database content

In [None]:
ln.view()

In [None]:
# clean up the test instance:
!rm -rf test-nextflow
!lamin delete --force test-nextflow