# Ingest data from pipeline runs

Now that we've seen how to ingest individual datasets, let's move on to ingesting datasets generated by a pipeline run.

```{note}

For the purpose of this guide, we ingest the pipeline output from within this notebook. Typically, this is done from the command line.
```

[`lnbfx`](https://lamin.ai/docs/lnbfx) is an open-source package to manage data from bioinformatics pipelines, complementary to workflow tools.

Here we show how to ingest a file from a bfx run, but the same approach applies to any type of pipeline run.

In [None]:
import lamindb as ln
import lnbfx

ln.nb.header()

Here, we ingest a file from a set of bioinformatics output files generated by [Cell Ranger](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger).

In [None]:
bfx_run_output = ln.datasets.dir_scrnaseq_cellranger()
filepath = bfx_run_output / "fastq/sample_1_R1.fastq.gz"

filepath

In [None]:
ingest = ln.db.ingest.add(filepath)

Create bfx pipeline entries to be inserted:

In [None]:
# look up the pipeline entry; insert it only if it is not yet in the database
bfx_pipeline = lnbfx.lookup.pipeline.cell_ranger_v7_0_0
pipeline = ln.db.query.pipeline(**bfx_pipeline).one_or_none()
if pipeline is None:
    pipeline_id = ln.db.insert.pipeline(**bfx_pipeline)
    ln.db.insert.bfx_pipeline(id=bfx_pipeline["id"], v=bfx_pipeline["v"])
else:
    pipeline_id = pipeline.id

# create a pipeline_run entry
pipeline_run = ln.schema.core.pipeline_run(
    pipeline_id=pipeline_id, pipeline_v=bfx_pipeline["v"], name="bfx_run_001"
)

# create a bfx_run entry
bfx_run = ln.schema.bfx.bfx_run(
    id=pipeline_run.id,
    dir="bfx_run_001",
    bfx_pipeline_id=bfx_pipeline["id"],
    bfx_pipeline_v=bfx_pipeline["v"],
)
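
The "query first, insert only if missing" step above is a general get-or-insert pattern. The following is a minimal sketch of that pattern using Python's built-in `sqlite3` module. The `pipeline` table, its columns, and the helper `get_or_insert_pipeline` are hypothetical stand-ins for illustration, not the lamindb schema or API.

```python
import sqlite3

# Hypothetical in-memory table; NOT the lamindb schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline (id TEXT, v TEXT, name TEXT, PRIMARY KEY (id, v))"
)


def get_or_insert_pipeline(conn, id, v, name):
    """Return the (id, v) key of a pipeline row, inserting the row if missing."""
    row = conn.execute(
        "SELECT id, v FROM pipeline WHERE id = ? AND v = ?", (id, v)
    ).fetchone()
    if row is None:
        # no matching entry yet: insert it
        conn.execute(
            "INSERT INTO pipeline (id, v, name) VALUES (?, ?, ?)", (id, v, name)
        )
        conn.commit()
        return (id, v)
    # entry already exists: reuse its key
    return row


# Calling the helper twice yields the same key and a single row.
first = get_or_insert_pipeline(conn, "p1", "7.0.0", "cell_ranger")
second = get_or_insert_pipeline(conn, "p1", "7.0.0", "cell_ranger")
n_rows = conn.execute("SELECT COUNT(*) FROM pipeline").fetchone()[0]
```

Running the lookup before the insert keeps the cell safe to re-execute, which matters in a notebook where cells are often run more than once.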

In [None]:
pipeline_run, bfx_run

Link the pipeline entries to the dobject:

In [None]:
ingest.link.pipeline_run(pipeline_run)

In [None]:
ingest.link.add_entry("bfx_run", bfx_run)

Link biometa to the dobject:

In [None]:
# insert a biosample, then create a biometa entry that references it
biosample_id = ln.db.insert.biosample(name="test_biosample")
biometa = ln.schema.wetlab.biometa(biosample_id=biosample_id)

In [None]:
biometa

In [None]:
ingest.link.biometa(biometa)

Check all linked entries:

In [None]:
ingest.link.linked_entries

Complete the ingestion:

In [None]:
ingest.commit()
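
The steps above read as a stage-then-commit workflow: entries are linked to a staged file first, and nothing is finalized until the commit. The sketch below illustrates that general pattern only. The class `StagedIngest`, its methods, and the list-based `store` are hypothetical names invented for this illustration, and the sketch makes no claim about how lamindb implements ingestion internally.

```python
from dataclasses import dataclass, field


@dataclass
class StagedIngest:
    """Hypothetical staging object: collects linked entries, writes them on commit."""

    filepath: str
    linked_entries: dict = field(default_factory=dict)
    committed: bool = False

    def link(self, table, entry):
        # links can only be added while the ingestion is still staged
        if self.committed:
            raise RuntimeError("cannot link after commit")
        self.linked_entries.setdefault(table, []).append(entry)

    def commit(self, store):
        # write the file record together with all of its links in one step
        store.append({"file": self.filepath, "links": dict(self.linked_entries)})
        self.committed = True


store = []  # stand-in for a database
staged = StagedIngest("fastq/sample_1_R1.fastq.gz")
staged.link("pipeline_run", {"name": "bfx_run_001"})
staged.link("biometa", {"biosample": "test_biosample"})
staged.commit(store)
```

Collecting links before the commit lets you inspect everything that will be written, as the check of linked entries does above, before any record is created.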

Query the dobject by linked metadata:

In [None]:
ln.db.query.dobject(where=dict(biosample=dict(name="test_biosample"))).df()

In [None]:
ln.db.query.dobject(where=dict(pipeline_run=dict(name="bfx_run_001"))).df()
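
Conceptually, filtering data objects by a field of a linked entry corresponds to a join across a link table. The following sketch shows that idea with Python's built-in `sqlite3` module. The tables `dobject`, `biosample`, and `dobject_biosample`, their columns, and the sample rows are assumptions made for illustration and do not reflect the actual lamindb schema.

```python
import sqlite3

# Hypothetical tables and rows for illustration; NOT the lamindb schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE biosample (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dobject (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dobject_biosample (dobject_id INTEGER, biosample_id INTEGER);
    INSERT INTO biosample VALUES (1, 'test_biosample'), (2, 'other_biosample');
    INSERT INTO dobject VALUES (10, 'sample_1_R1'), (11, 'sample_2_R1');
    INSERT INTO dobject_biosample VALUES (10, 1), (11, 2);
    """
)

# Select the data objects whose linked biosample has the given name.
rows = conn.execute(
    """
    SELECT d.id, d.name
    FROM dobject AS d
    JOIN dobject_biosample AS l ON l.dobject_id = d.id
    JOIN biosample AS b ON b.id = l.biosample_id
    WHERE b.name = ?
    """,
    ("test_biosample",),
).fetchall()
```

Only the data object linked to `test_biosample` is returned; the one linked to the other biosample is filtered out by the join condition.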