[![Jupyter Notebook](https://img.shields.io/badge/Jupyter%20Notebook-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/flow.ipynb)

# Validate & register flow cytometry data

In [None]:
!lamin init --storage ./test-flow --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import readfcs

lb.settings.species = "human"  # globally set species

In [None]:
ln.track()

We start with a flow cytometry file from Alpert19:

In [None]:
ln.dev.datasets.file_fcs_alpert19()

Use [readfcs](https://lamin.ai/docs/readfcs) to read the FCS file into an AnnData object:

In [None]:
adata = readfcs.read("Alpert19.fcs")
adata

## Track data with cell markers

We'll use the `CellMarker` registry as the reference for the features, matching the entries of `adata.var.index` against `CellMarker.name`:

In [None]:
file = ln.File.from_anndata(adata, description="Alpert19", var_ref=lb.CellMarker.name)

The output shows that many features can't be validated against the `CellMarker` registry. Let's standardize the identifiers by mapping synonyms to their standardized names:

In [None]:
adata.var.index = lb.CellMarker.standardize(adata.var.index)
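Conceptually, standardizing maps every identifier that the registry knows under another spelling onto its canonical name and leaves unknown values untouched. The sketch below illustrates that idea with a tiny hypothetical registry (`REGISTRY`, with made-up spelling variants) and a case-insensitive lookup; it is not Bionty's implementation.

```python
# Hypothetical mini-registry: canonical name -> alternative spellings (illustrative only)
REGISTRY = {
    "CD3": ["CD-3"],
    "Cd14": ["CD-14"],
    "Cd4": ["CD-4"],
}


def build_lookup(registry):
    """Map every lowercased canonical name or alias to its canonical name."""
    lookup = {}
    for canonical, aliases in registry.items():
        for alias in [canonical, *aliases]:
            lookup[alias.lower()] = canonical
    return lookup


def standardize(values, registry):
    """Return canonical names; values without a match pass through unchanged."""
    lookup = build_lookup(registry)
    return [lookup.get(value.lower(), value) for value in values]


print(standardize(["cd-3", "CD14", "Time"], REGISTRY))  # ['CD3', 'Cd14', 'Time']
```

Values such as `Time` survive unchanged, which is why a few non-marker features still fail validation after this step.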

Now things look much better, but we still have 5 features that seem more like metadata than cell markers.

In [None]:
file = ln.File.from_anndata(adata, description="Alpert19", var_ref=lb.CellMarker.name)

Hence, let's curate the AnnData further and first check which features validate against the public `CellMarker` reference:

In [None]:
validated = lb.CellMarker.bionty().validate(adata.var.index, "name")
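Judging from how `validated` is negated with `~` in the next cell, `validate` returns one boolean per input value that can be used as a column mask. The plain-Python sketch below mimics that behavior with a hypothetical set of known names (`KNOWN_MARKERS`); it uses exact matching, which is why standardizing first matters.

```python
# Hypothetical set of canonical cell marker names (illustrative only)
KNOWN_MARKERS = {"CD3", "Cd14", "Cd4", "CD8"}


def validate(values, known):
    """Return one boolean per value: True if it exactly matches a known name."""
    return [value in known for value in values]


features = ["CD3", "Cd14", "Time", "Cell_length"]
mask = validate(features, KNOWN_MARKERS)
print(mask)  # [True, True, False, False]
```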

Let's move the metadata (the non-validated features) into `adata.obs`:

In [None]:
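# store the non-validated columns as per-cell metadata
adata.obs = adata[:, ~validated].to_df()
# keep only the validated cell markers; .copy() turns the sliced view into an independent AnnData
adata = adata[:, validated].copy()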
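The two lines above partition the columns of the matrix by the boolean mask: rejected columns become per-cell metadata (like `adata.obs`), and accepted columns remain the measured panel. Here is a dependency-free sketch of the same split on a toy row-major matrix; the data and the helper `split_columns` are hypothetical.

```python
# Toy matrix: rows are cells (events), columns are features (hypothetical values)
columns = ["CD3", "Cd14", "Time", "Cell_length"]
rows = [
    [1.0, 0.2, 10.0, 25.0],
    [0.4, 3.1, 11.0, 30.0],
]
toy_mask = [True, True, False, False]


def split_columns(columns, rows, mask):
    """Split a row-major matrix into (marker columns, metadata columns) by a boolean mask."""
    keep = [i for i, ok in enumerate(mask) if ok]
    drop = [i for i, ok in enumerate(mask) if not ok]
    markers = ([columns[i] for i in keep], [[row[i] for i in keep] for row in rows])
    metadata = {columns[i]: [row[i] for row in rows] for i in drop}
    return markers, metadata


(marker_names, marker_values), obs = split_columns(columns, rows, toy_mask)
print(marker_names)  # ['CD3', 'Cd14']
print(obs)  # {'Time': [10.0, 11.0], 'Cell_length': [25.0, 30.0]}
```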

Now we have a clean panel of 35 validated cell markers, plus metadata in `adata.obs` that we don't want to register as features:

In [None]:
file = ln.File.from_anndata(adata, description="Alpert19", var_ref=lb.CellMarker.name)

In [None]:
file.save()

In [None]:
file.features

In [None]:
file.features["var"].df().head(10)

Let's register another flow file:

In [None]:
adata2 = readfcs.read(ln.dev.datasets.file_fcs())
file2 = ln.File.from_anndata(
    adata2, description="My fcs file", var_ref=lb.CellMarker.name
)

In [None]:
adata2.var.index = lb.CellMarker.standardize(adata2.var.index)

In [None]:
file2 = ln.File.from_anndata(
    adata2, description="My fcs file", var_ref=lb.CellMarker.name
)

In [None]:
file2.save()

In [None]:
file2.view_lineage()

## Query by cell markers

Which datasets have CD14 in their flow panel?

In [None]:
cell_markers = lb.CellMarker.lookup()

In [None]:
cell_markers.cd14
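The lookup object exposes registry records as attributes, so `cell_markers.cd14` resolves to the record named `Cd14`. The sketch below shows one way such attribute access can work; the key normalization (lowercase, non-word characters replaced by underscores, a leading underscore before digits) is an assumption for illustration, not necessarily the exact rule lamindb uses.

```python
import re
from types import SimpleNamespace


def to_attr(name):
    """Turn a display name into a lowercase, attribute-safe key (assumed convention)."""
    key = re.sub(r"\W+", "_", name.strip().lower()).strip("_")
    return f"_{key}" if key[:1].isdigit() else key


def make_lookup(records):
    """Expose records as attributes keyed by their normalized name."""
    return SimpleNamespace(**{to_attr(record["name"]): record for record in records})


records = [{"name": "Cd14"}, {"name": "CD127"}, {"name": "Ccr7"}]
markers = make_lookup(records)
print(markers.cd14)  # {'name': 'Cd14'}
```

Attribute access like this enables tab completion in notebooks, which is the main convenience over typing exact names.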

In [None]:
panels_with_cd14 = ln.FeatureSet.filter(cell_markers=cell_markers.cd14).all()

In [None]:
ln.File.filter(feature_sets__in=panels_with_cd14).df()
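The query above runs in two steps: first find the feature sets (panels) that contain CD14, then find the files linked to any of those feature sets. The in-memory model below mirrors that logic with hypothetical panels; lamindb itself resolves these relations in the database rather than in Python.

```python
# Hypothetical stand-ins: feature set id -> member markers, file -> linked feature set ids
toy_feature_sets = {
    "fs1": {"CD3", "Cd14", "Cd4"},
    "fs2": {"CD3", "Cd14", "CD8"},
    "fs3": {"CD3", "Cd19"},
}
toy_files = {
    "Alpert19": ["fs1"],
    "My fcs file": ["fs2"],
    "Other file": ["fs3"],
}

# Step 1: feature sets whose members include the marker
toy_panels = {fs for fs, members in toy_feature_sets.items() if "Cd14" in members}
# Step 2: files linked to any of those feature sets (the `__in` part of the query)
matching = sorted(name for name, fs_ids in toy_files.items() if toy_panels & set(fs_ids))
print(matching)  # ['Alpert19', 'My fcs file']
```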

Which cell markers do two files share?

In [None]:
files = ln.File.filter(feature_sets__in=panels_with_cd14).list()
file1, file2 = files[0], files[1]

In [None]:
file1_markers = file1.features["var"]
file2_markers = file2.features["var"]

shared_markers = file1_markers & file2_markers
shared_markers.list("name")
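Finding shared markers is a set intersection, and it only works reliably because both panels were standardized to the same names. The sketch below uses hypothetical panels to show the intersection and how unstandardized spellings would silently drop a shared marker.

```python
# Hypothetical marker panels of two files, as standardized names
panel1 = {"CD3", "Cd14", "Cd4", "CD57"}
panel2 = {"CD3", "Cd14", "CD8"}
print(sorted(panel1 & panel2))  # ['CD3', 'Cd14']

# The same comparison on raw, unstandardized spellings misses CD14
raw1 = {"CD3", "CD14"}
raw2 = {"CD3", "cd14"}
print(sorted(raw1 & raw2))  # ['CD3']
```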

## Flow marker registry

Check out your CellMarker registry:

In [None]:
lb.CellMarker.filter().df()

In [None]:
# a few tests
assert set(shared_markers.list("name")) == set(
    [
        "Ccr7",
        "CD3",
        "Cd14",
        "Cd19",
        "CD127",
        "CD27",
        "CD28",
        "CD8",
        "Cd4",
        "CD57",
    ]
)
ln.File.filter(feature_sets__in=panels_with_cd14).exists()

In [None]:
# clean up test instance
!lamin delete --force test-flow
!rm -r test-flow