# Tracking Bulk RNA-seq Nextflow runs

## Background

[Nextflow](https://www.nextflow.io/) is a workflow management system for orchestrating and executing scientific workflows across different computational environments. Its core strengths are scalability, portability, and reproducibility: researchers define complex workflows in a platform-agnostic manner and run them efficiently on various computing infrastructures.

Here, we will demonstrate how to track Nextflow workflow execution and generated biological entities with [lamin](https://lamin.ai/).

## Setup

To run this notebook, you need to load a LaminDB instance that has the `bionty` schema mounted.

Here, we’ll create a test instance (skip if you’d like to run it using your instance):

In [None]:
!lamin init --storage ./bulk_rna_seq --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import pandas as pd

ln.settings.verbosity = 3  # show hints

## Fetching and tracking NGS files

First, we fetch the raw sequencing files and track them:

1. Run [nf-core/fetchngs](https://github.com/nf-core/fetchngs) to download the FASTQ files
2. Track FASTQs
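The two steps above could be sketched roughly as follows. This is a minimal sketch, not the notebook's final implementation: the helper names `build_nextflow_command` and `find_fastqs`, the input file `ids.csv`, and the output directory `fetchngs_results` are all hypothetical, and the `docker` profile is an assumption about the local setup. The sketch only assembles the `nextflow run` command and locates the resulting FASTQ files; registering them in LaminDB is not shown.

```python
from pathlib import Path


def build_nextflow_command(pipeline, params, profile="docker"):
    """Return the argument list for a `nextflow run` invocation."""
    cmd = ["nextflow", "run", pipeline, "-profile", profile]
    for name, value in params.items():
        # Pipeline parameters take a double dash; Nextflow's own options take one.
        cmd += [f"--{name}", str(value)]
    return cmd


def find_fastqs(outdir):
    """Collect gzipped FASTQ files below `outdir`, sorted for a stable order."""
    root = Path(outdir)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".fastq.gz", ".fq.gz"))
    )


# Hypothetical names: ids.csv would list the accessions to download.
fetch_cmd = build_nextflow_command(
    "nf-core/fetchngs", {"input": "ids.csv", "outdir": "fetchngs_results"}
)
print(" ".join(fetch_cmd))
```

Passing `fetch_cmd` to `subprocess.run(fetch_cmd, check=True)` would launch the pipeline, after which `find_fastqs("fetchngs_results")` would return the paths to track.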

## Analysing raw FASTQ files and generating a count table

3. Run [nf-core/rnaseq](https://github.com/nf-core/rnaseq/) on the FASTQs
4. Track all output files
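Before registering pipeline outputs anywhere, it can help to take an inventory of what the run produced. The sketch below is one possible approach using only the standard library: it walks an output directory and records each file's relative path, size, and SHA-256 checksum. The helper names `sha256_of` and `build_manifest` are hypothetical, and this is not LaminDB's own tracking mechanism, only an illustration of the information such tracking would typically capture.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(outdir):
    """List every file below `outdir` with its relative path, size, and checksum."""
    root = Path(outdir)
    return [
        {
            "path": p.relative_to(root).as_posix(),
            "size": p.stat().st_size,
            "sha256": sha256_of(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
```

Calling `build_manifest` on the pipeline's output directory returns a list of dictionaries that `pd.DataFrame` accepts directly, which makes it easy to inspect the outputs before tracking them.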

## Downstream analysis of RNA counts

5. Run [nf-core/differentialabundance](https://github.com/nf-core/differentialabundance) on the count table
6. Track all output files and use Bionty wherever we can

## Conclusion