# Track data sources: `Pipeline`, `Notebook`, `Run`


## What is a `Run`?

{class}`~lamindb.DObject` instances are transformed by instances of {class}`~lamindb.schema.Run`, where they appear as {meth}`~lamindb.schema.Run.inputs` and {meth}`~lamindb.schema.Run.outputs`.

A `Run` can be created from:
1. A Jupyter `Notebook`: interactive environment
2. A `Pipeline`: versioned code
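
To make the relationship between runs and data objects concrete, here is a minimal plain-Python sketch. It is not lamindb's implementation: the classes `ToyRun` and `ToyDObject`, the helpers `emit` and `consume`, and all names passed to them are hypothetical and only mirror the idea that a run records its inputs and outputs, and that each data object remembers the run that produced it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare by identity, because the two classes reference each other
class ToyDObject:
    """Hypothetical stand-in for a data object."""
    name: str
    source: Optional["ToyRun"] = None  # the run that produced this object


@dataclass(eq=False)
class ToyRun:
    """Hypothetical stand-in for a run, created from a notebook or a pipeline."""
    name: str
    notebook: Optional[str] = None
    pipeline: Optional[str] = None
    inputs: List[ToyDObject] = field(default_factory=list)
    outputs: List[ToyDObject] = field(default_factory=list)


def emit(run: ToyRun, name: str) -> ToyDObject:
    """Create a data object and record it as an output of `run`."""
    dobject = ToyDObject(name=name, source=run)
    run.outputs.append(dobject)
    return dobject


def consume(run: ToyRun, dobject: ToyDObject) -> None:
    """Record an existing data object as an input of `run`."""
    run.inputs.append(dobject)


# A notebook run produces a data object ...
ingest = ToyRun(name="ingest", notebook="my-notebook")
raw = emit(ingest, "raw-data")

# ... which a pipeline run then transforms into a new one.
process = ToyRun(name="process", pipeline="my-pipeline v1")
consume(process, raw)
result = emit(process, "result")

print([d.name for d in process.inputs])
print(result.source.name)
```

In this toy model, the same data object is an output of one run and an input of the next, which is what makes lineage queries possible.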

## Notebook run

We've seen a few examples that track data from notebooks, for instance in {doc}`/guide/files`.

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.context.track()

Let us query for the notebook in which the `DObject` "iris_new" was ingested:

In [None]:
ln.select(lns.Notebook).join(lns.Run).join(ln.DObject, name="iris_new").one()

Alternatively, you can query for the run and access its `notebook` attribute:

```{admonition} What is ln.Session()?
:class: important

Why do we need a session here? Find out in our [Session guide](https://lamin.ai/docs/db/faq/session).

```

In [None]:
with ln.Session() as ss:
    source_run = ss.select(lns.Run).join(ln.DObject, name="iris_new").one()
    print(source_run.notebook)

## Pipeline run

In [None]:
filepath = ln.dev.datasets.file_fastq()

When working with a pipeline, we'll register it before running it.

In [None]:
pipeline = ln.add(lns.Pipeline(v="1", name="10x scRNA-seq nextseq"))

pipeline

We can then track the run with {class}`~lamindb.context` as before (if no pipeline with this name has been registered, we'll be asked to register one):

In [None]:
ln.context.track(pipeline_name="10x scRNA-seq nextseq")

In [None]:
ln.context.pipeline

In [None]:
ln.context.run

In [None]:
dobject_fastq = ln.DObject(filepath)

In [None]:
ln.add(dobject_fastq)

We can also manually pass a run:
```
run = lns.Run(pipeline=pipeline, name="ingest-fastq")
ln.DObject(filepath, source=run)
```

## Track run inputs

Let's now register another pipeline:

In [None]:
pipeline = ln.add(lns.Pipeline(name="Cell Ranger", v="7"))

Let's create a run context for it:

In [None]:
ln.context.track(pipeline_name="Cell Ranger")

In [None]:
ln.context.run

Now we can opt to track any data object we load as an input for the current run:

In [None]:
dobject_fastq = ln.select(ln.DObject, name="input.fastq.gz").one()

To process it in the pipeline, we typically need to load it (download it from the cloud, or access its on-disk or in-memory representation):

In [None]:
dobject_fastq.load(is_run_input=True)
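
The opt-in nature of input tracking can be illustrated with a small plain-Python sketch. It is a simplification under stated assumptions, not lamindb's implementation: `ToyContext` is a hypothetical class, `load` lives on the context rather than on the data object, and the file name `reference.fa` is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyContext:
    """Hypothetical stand-in for a run context; not lamindb's implementation."""
    pipeline_name: str
    inputs: List[str] = field(default_factory=list)

    def load(self, name: str, is_run_input: bool = False) -> str:
        # Loading always returns the data; recording it as a run input is opt-in.
        if is_run_input and name not in self.inputs:
            self.inputs.append(name)
        return f"<contents of {name}>"


context = ToyContext(pipeline_name="Cell Ranger")
context.load("reference.fa")                       # loaded, but not recorded
context.load("input.fastq.gz", is_run_input=True)  # loaded and recorded as input
print(context.inputs)
```

Only objects loaded with `is_run_input=True` end up in the recorded inputs, so incidental loads do not pollute the lineage of the run.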

In [None]:
output_filepath = ln.dev.datasets.file_bam()

In [None]:
output_filepath

In [None]:
dobject = ln.DObject(output_filepath)

ln.add(dobject)

## Data lineage

Now let's trace which files `output.bam` was generated from, that is, the inputs of the run that produced `output.bam`:

In [None]:
with ln.Session() as ss:
    run = ss.select(lns.Run).join(ln.DObject, name="output", suffix=".bam").one()
    print(run.inputs)
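
The query above resolves one step of lineage. Walking further upstream repeats the same lookup for every input. The following is a minimal sketch of that idea in plain Python, assuming lineage is available as two hypothetical dictionaries (`produced_by` and `run_inputs`); the run labels are illustrative and the function `upstream` is not part of lamindb.

```python
from typing import Dict, List, Tuple

# Hypothetical records: which run produced each file, and what each run consumed.
produced_by: Dict[str, str] = {
    "output.bam": "Cell Ranger run",
    "input.fastq.gz": "10x scRNA-seq nextseq run",
}
run_inputs: Dict[str, List[str]] = {
    "Cell Ranger run": ["input.fastq.gz"],
    "10x scRNA-seq nextseq run": [],
}


def upstream(name: str) -> List[Tuple[str, str, List[str]]]:
    """Return (file, producing run, inputs of that run) for `name` and its ancestors."""
    steps = []
    queue = [name]
    seen = set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # guard against visiting a file twice
        seen.add(current)
        run = produced_by.get(current)
        if run is None:
            continue  # no recorded producer: lineage ends here
        inputs = list(run_inputs.get(run, []))
        steps.append((current, run, inputs))
        queue.extend(inputs)
    return steps


for file_name, run_name, inputs in upstream("output.bam"):
    print(f"{file_name} <- {run_name} <- {inputs}")
```

The breadth-first walk stops at files whose producing run recorded no inputs, which in this toy data is the ingested FASTQ file.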