# Ingest and track data from pipeline runs

Now that we've seen how to ingest individual datasets, let's move on to ingesting datasets generated by a pipeline run.

```{note}

For the purpose of this guide, we ingest the pipeline output from within this notebook. Typically, this is done from the command line.
```

In [None]:
import lamindb as ln
import lamindb.schema as lns

ln.nb.header()

## Ingest raw data

In [None]:
filepath = ln.dev.datasets.dir_scrnaseq_cellranger() / "fastq/sample_1_R1.fastq.gz"

filepath

Create a bioinformatics (BFX) pipeline record:

In [None]:
pipeline = ln.add(lns.Pipeline(v="1", name="10x scRNA-seq nextseq"))

In [None]:
pipeline

And a pipeline run:

In [None]:
run = lns.Run(pipeline=pipeline, name="ingest-fastq")

In [None]:
run

The run points to the pipeline:

In [None]:
run.pipeline

Let's ingest the fastq file, with this pipeline run as its source:

In [None]:
dobject_fq = ln.DObject(filepath, source=run)

In [None]:
dobject_fq

In [None]:
dobject_fq.source

In [None]:
dobject_fq = ln.add(dobject_fq)

We can now select data objects by the `run` that produced them:

In [None]:
ln.select(ln.DObject).join(lns.Run, name="ingest-fastq").df()
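Conceptually, the query above joins data objects to the runs they come from and filters on the run name. The following is a minimal sketch of that idea in plain SQL via Python's built-in `sqlite3`. The table layout, column names (`source_id` in particular), and row values are hypothetical and chosen for illustration only; they are not LaminDB's actual schema.

```python
import sqlite3

# Hypothetical, simplified schema for illustration; NOT LaminDB's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE run (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dobject (
        id INTEGER PRIMARY KEY,
        name TEXT,
        suffix TEXT,
        source_id INTEGER REFERENCES run(id)
    );
    INSERT INTO run VALUES (1, 'ingest-fastq'), (2, 'cellranger scRNA-seq');
    INSERT INTO dobject VALUES
        (1, 'sample_1_R1', '.fastq.gz', 1),
        (2, 'output', '.bam', 2);
    """
)


def dobjects_from_run(conn, run_name):
    """Return (name, suffix) of every data object whose source run has this name."""
    rows = conn.execute(
        """
        SELECT dobject.name, dobject.suffix
        FROM dobject
        JOIN run ON dobject.source_id = run.id
        WHERE run.name = ?
        ORDER BY dobject.id
        """,
        (run_name,),
    )
    return rows.fetchall()


result = dobjects_from_run(conn, "ingest-fastq")
print(result)
```

Only the fastq record is returned, because it is the only data object whose source is the `ingest-fastq` run.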

## Ingest and track pipeline outputs

In [None]:
output_filepath = ln.dev.datasets.file_bam()

output_filepath

Let's now register another pipeline, which will use Cell Ranger to analyze the scRNA-seq data from the input fastq file.

In [None]:
pipeline = ln.add(lns.Pipeline(v="7", name="Cell Ranger v7"))
run = lns.Run(pipeline=pipeline, name="cellranger scRNA-seq")

run

```{note}

Linking input files to a run enables data lineage tracking.
```

In [None]:
run.inputs.append(dobject_fq)

In [None]:
dobject = ln.DObject(output_filepath, source=run)

ln.add(dobject)

## Track data lineage

Now let's find out which files the `output.bam` file was generated from, that is, the inputs of the run that produced `output.bam`:

In [None]:
ss = ln.Session()

run = ss.select(lns.Run).join(ln.DObject, name="output", suffix=".bam").one()

In [None]:
run.inputs

In [None]:
ss.close()
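The lookup above goes one step upstream: from an output, to the run that produced it, to that run's inputs. Repeating the step gives the full lineage of a file. Below is a minimal, self-contained sketch of that traversal using plain dataclasses. The `Run` and `DObject` classes here are hypothetical stand-ins that only mirror the `source` and `inputs` links used in this guide; they are not LaminDB classes, and `upstream` is an illustrative helper, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    # Hypothetical stand-in for a run record: a name plus its input data objects.
    name: str
    inputs: List["DObject"] = field(default_factory=list)


@dataclass
class DObject:
    # Hypothetical stand-in for a data object: `source` is the run that produced it.
    name: str
    suffix: str
    source: Optional[Run] = None


def upstream(dobject):
    """Return every data object the given one was derived from, nearest first."""
    found = []
    if dobject.source is None:
        return found
    for parent in dobject.source.inputs:
        found.append(parent)
        # Recurse: the parent may itself be the output of an earlier run.
        found.extend(upstream(parent))
    return found


# Rebuild the two-step example from this guide with the toy classes.
ingest = Run(name="ingest-fastq")
fastq = DObject(name="sample_1_R1", suffix=".fastq.gz", source=ingest)

cellranger = Run(name="cellranger scRNA-seq")
cellranger.inputs.append(fastq)
bam = DObject(name="output", suffix=".bam", source=cellranger)

lineage = [d.name + d.suffix for d in upstream(bam)]
print(lineage)
```

The traversal stops at the fastq file because its source run, `ingest-fastq`, has no registered inputs.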