# Track redun workflows

```{note}

This use case builds on Rico Meinl's [GitHub repository](https://github.com/ricomnl/bioinformatics-pipeline-tutorial/tree/redun) (and [blog post](https://ricomnl.com/blog/bottom-up-bioinformatics-pipeline-extension-redun/)).

```

```{tip}

Source notebooks are in the [redun-lamin-fasta](https://github.com/laminlabs/redun-lamin-fasta) repository.

```

```{toctree}
:maxdepth: 1
:hidden:

redun-run
```

While redun focuses on managing workflows for data pipelines, LaminDB offers a provenance-aware data lake.

redun schedules, executes, and tracks pipeline runs with a high degree of control and rich metadata.

LaminDB's data lake complements redun with:

1. data lineage _across_ computational pipelines, interactive analyses (notebooks), and UI-submitted data
2. curation, querying, and structuring of data by biological entities
3. an extensible, modular Python ORM for queries and data access

In [None]:
!lamin login testuser1

In [None]:
!lamin init --storage . --name redun-lamin-fasta

## Track the workflow as a pipeline

In [None]:
import lamindb as ln
import json

Track the workflow in the `Transform` registry:

In [None]:
ln.Transform(
    name="lamin-redun-fasta",
    type="pipeline",
    version="0.1.0",
    reference="https://github.com/laminlabs/redun-lamin-fasta",
).save()

## Amend the original redun workflow

To also track the input files that the redun workflow uses, we added the following lines to [workflow.py](https://github.com/laminlabs/redun-lamin-fasta/blob/main/docs/guide/workflow.py):

```python
    # register input files in lamindb
    ln.save(ln.File.from_dir(input_dir))
    # query & track this pipeline
    transform = ln.Transform.filter(name="lamin-redun-fasta", version="0.1.0").one()
    ln.track(transform)
    # query input files
    input_fastas = [
        File(str(file.stage())) for file in ln.File.filter(key__startswith="fasta/")
    ]
```

## Execute redun

Let's see what the input files are:

In [None]:
!ls ./fasta

And call the workflow:

In [None]:
!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt

Inspect the output:

In [None]:
!cat redun_stdout.txt

And the error log:

In [None]:
!tail -1 redun_stderr.txt
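
The shell redirection used above can also be reproduced from Python, which makes the exit code available for checks. The following is a minimal sketch using only the standard library; `run_and_capture` and `last_line` are hypothetical helper names and are not part of redun or LaminDB.

```python
import subprocess
import sys
from pathlib import Path


def run_and_capture(cmd, stdout_path, stderr_path):
    """Run cmd, write stdout and stderr to separate files, return the exit code."""
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


def last_line(path):
    """Return the last non-empty line of a text file, or '' if there is none.

    Unlike `tail -1`, trailing blank lines are skipped.
    """
    lines = [line for line in Path(path).read_text().splitlines() if line.strip()]
    return lines[-1] if lines else ""
```

Assuming `redun` is on the `PATH`, calling `run_and_capture` with the same arguments as the cell above (`["redun", "run", "workflow.py", "main", "--input-dir", "./fasta", "--tag", "run=test-run"]`) and the paths `redun_stdout.txt` and `redun_stderr.txt` should produce the same two log files, and a non-zero return value would indicate that the run failed.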

Export the run information to JSON and load it into Python so it can be recorded in LaminDB:

In [None]:
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json

In [None]:
# use a context manager so the file handle is closed after reading
with open("redun_exec.json") as f:
    redun_exec = json.load(f)

redun_exec
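
The next section reads `redun_exec["id"]`, so it can be worth confirming that the export contains that field before using it. A minimal sketch, assuming the export is a single JSON object with a top-level `id` key (as the indexing below implies); `load_execution_id` is a hypothetical helper name.

```python
import json


def load_execution_id(path):
    """Read a JSON export and return its top-level 'id' as a string."""
    with open(path) as f:
        record = json.load(f)
    # Assumption: the export is one JSON object, not a list of objects.
    if not isinstance(record, dict) or "id" not in record:
        raise ValueError(f"{path} does not contain a top-level 'id' field")
    return str(record["id"])
```

Failing early with a clear message is easier to debug than a `KeyError` or `TypeError` raised later while updating the run record.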

## Track redun outputs and execution ID

In [None]:
run = ln.Run.filter().order_by("-run_at").first()
run.reference = redun_exec["id"]
run.reference_type = "redun_id"
run.save()

There is only a single output file to track:

In [None]:
file = ln.File(
    data="data/results.tgz", description="redun-lamin-fasta results", run=run
)
file.save()
file.view_lineage()
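
To check what the results archive holds, the standard library's `tarfile` module is enough. A minimal sketch, assuming `data/results.tgz` is a gzip-compressed tarball as its extension suggests; `list_archive_members` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path


def list_archive_members(archive_path):
    """Return the sorted names of the regular files inside a gzip-compressed tarball."""
    path = Path(archive_path)
    if not path.is_file():
        raise FileNotFoundError(f"expected output archive at {path}")
    with tarfile.open(path, "r:gz") as archive:
        # Skip directories and links; keep regular files only.
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Calling `list_archive_members("data/results.tgz")` after the workflow has finished would list the files that redun packaged, which is a quick sanity check on the registered output.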

## View the database content

In [None]:
ln.view()

In [None]:
!lamin delete redun-lamin-fasta