[![Jupyter Notebook](https://img.shields.io/badge/Jupyter%20Notebook-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/facs.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/laminlabs/lamin-usecases/main?labpath=lamin-usecases%2Fdocs%2Ffacs.ipynb)

# Validate & register flow cytometry data

Flow cytometry is a technique used to analyze and sort cells or particles based on their physical and chemical characteristics as they flow in a fluid stream through a laser beam.

Here, we'll transform, validate and register two flow cytometry datasets ([Alpert19](https://www.nature.com/articles/s41591-019-0381-y) and [FlowIO sample](https://github.com/whitews/FlowIO/blob/master/examples/fcs_files/100715.fcs)) to demonstrate how to create and query a custom flow cytometry registry.

In [None]:
!lamin init --storage ./test-flow --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import readfcs

lb.settings.species = "human"

In [None]:
ln.track()

## Alpert19

### Transform ![](https://img.shields.io/badge/Transform-10b981)

(Here we skip the data transformation steps, which often include filtering, normalizing, or formatting data.)

We start with a flow cytometry file from Alpert19:

In [None]:
ln.dev.datasets.file_fcs_alpert19(
    populate_registries=True,  # pre-populate registries to simulate a used instance
)

Use [readfcs](https://lamin.ai/docs/readfcs) to read the fcs file into memory:

In [None]:
adata = readfcs.read("Alpert19.fcs")
adata

### Validate ![](https://img.shields.io/badge/Validate-10b981) 

First, let's validate the features in `.var`.

We'll use the `CellMarker` reference to link features:

In [None]:
lb.CellMarker.validate(adata.var.index, "name");

We see that many features aren't validated. Let's standardize the identifiers first to get rid of synonyms:

In [None]:
adata.var.index = lb.CellMarker.standardize(adata.var.index)
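Conceptually, standardization maps each known synonym to a canonical name and leaves unknown identifiers unchanged. The sketch below illustrates that idea in plain Python. The `SYNONYMS` table and the `standardize_names` helper are hypothetical and not part of lamindb or Bionty, whose actual lookup may work differently.

```python
# Hypothetical synonym table for illustration: synonym -> canonical marker name.
SYNONYMS = {
    "CD3e": "CD3",
    "IL7R": "CD127",
}


def standardize_names(names, synonyms=SYNONYMS):
    """Replace known synonyms with canonical names; keep everything else as is."""
    return [synonyms.get(name, name) for name in names]


raw_names = ["CD3e", "CD8", "IL7R", "Time"]
standardized = standardize_names(raw_names)
print(standardized)
```

Identifiers without an entry in the table (such as `Time` here) pass through untouched, which is why some features can still fail validation after standardization.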

Great, now we can validate our markers once more:

In [None]:
validated = lb.CellMarker.validate(adata.var.index, "name")

Things look much better, but 5 features still fail `CellMarker` validation and look more like metadata than cell markers.
Hence, let's curate the AnnData object a bit more.

Let's move metadata (non-validated cell markers) into `adata.obs`:

In [None]:
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
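The two lines above rely on a boolean mask: columns flagged `False` are moved to the metadata and columns flagged `True` stay in the marker panel. A minimal plain-Python sketch of that split follows. The column names and the `split_by_mask` helper are made up for illustration and do not come from the Alpert19 file.

```python
def split_by_mask(names, mask):
    """Partition names into (kept, moved) according to a boolean mask."""
    if len(names) != len(mask):
        raise ValueError("names and mask must have the same length")
    kept = [name for name, ok in zip(names, mask) if ok]
    moved = [name for name, ok in zip(names, mask) if not ok]
    return kept, moved


# Hypothetical columns and validation result.
columns = ["CD3", "Time", "CD8", "Cell_length"]
validated_mask = [True, False, True, False]

markers, metadata = split_by_mask(columns, validated_mask)
print(markers)
print(metadata)
```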

Now we have a clean panel of 35 cell markers:

In [None]:
lb.CellMarker.validate(adata.var.index, "name");

Next, let's register the metadata features we moved to `.obs`:

In [None]:
# Feature.from_df creates feature records with type auto-populated
features = ln.Feature.from_df(adata.obs)

In [None]:
ln.add(features)

In addition, we'd like to link this file with external features:

In [None]:
ln.Feature.validate("assay", "name")
lb.ExperimentalFactor.validate("FACS", "name");

Since the term "FACS" isn't validated yet, let's search for its ontology term and register it:

In [None]:
lb.ExperimentalFactor.bionty().search("FACS").head(2)

In [None]:
facs = lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108")
facs.save()

### Register ![](https://img.shields.io/badge/Register-10b981)

In [None]:
file = ln.File.from_anndata(adata, description="Alpert19", field=lb.CellMarker.name)

In [None]:
file.save()

In [None]:
features = ln.Feature.lookup()
file.add_labels(facs, features.assay)
file.add_labels(lb.settings.species, features.species)

In [None]:
file.features

Check a few validated cell markers in `.var`:

In [None]:
file.features["var"].df().head(10)

## FlowIO sample

Let's transform, validate and register another flow file:

### Transform ![](https://img.shields.io/badge/Transform-10b981)

There are no further transformations necessary.

In [None]:
adata2 = readfcs.read(ln.dev.datasets.file_fcs())

### Validate ![](https://img.shields.io/badge/Validate-10b981) 

We'd like to track all features in `.var`, so we register them:

In [None]:
adata2.var.index = lb.CellMarker.bionty().standardize(adata2.var.index)

In [None]:
markers = lb.CellMarker.from_values(adata2.var.index, "name")
ln.save(markers)

Standardize synonyms so that all features pass validation:

In [None]:
adata2.var.index = lb.CellMarker.standardize(adata2.var.index)

In [None]:
lb.CellMarker.validate(adata2.var.index, "name");

### Register ![](https://img.shields.io/badge/Register-10b981) 

In [None]:
file2 = ln.File.from_anndata(
    adata2, description="My fcs file", field=lb.CellMarker.name
)

In [None]:
file2.save()

In [None]:
file2.add_labels(facs, features.assay)
file2.add_labels(lb.settings.species, features.species)

In [None]:
file2.features

In [None]:
file2.view_flow()

### Query by cell markers ![](https://img.shields.io/badge/Access-10b981) 

Which datasets have CD14 in the flow panel?

In [None]:
cell_markers = lb.CellMarker.lookup()

In [None]:
cell_markers.cd14

In [None]:
panels_with_cd14 = ln.FeatureSet.filter(cell_markers=cell_markers.cd14).all()

In [None]:
ln.File.filter(feature_sets__in=panels_with_cd14).df()
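The query above goes from a marker to the feature sets that contain it, and from those feature sets to the files linked to them. As a rough mental model only (not lamindb's implementation), this resembles filtering a mapping of panel names to marker sets. All names below are hypothetical.

```python
# Hypothetical panels: panel name -> set of marker names.
panels = {
    "panel_a": {"CD3", "CD14", "CD19"},
    "panel_b": {"CD3", "CD8"},
    "panel_c": {"CD14", "CD57"},
}


def panels_containing(marker, panels):
    """Return the sorted names of all panels whose marker set includes `marker`."""
    return sorted(name for name, markers in panels.items() if marker in markers)


hits = panels_containing("CD14", panels)
print(hits)
```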

Shared cell markers between two files:

In [None]:
files = ln.File.filter(feature_sets__in=panels_with_cd14, species__name="human").list()
file1, file2 = files[0], files[1]

In [None]:
file1_markers = file1.features["var"]
file2_markers = file2.features["var"]

shared_markers = file1_markers & file2_markers
shared_markers.list("name")
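The `&` between the two marker collections above behaves analogously to a set intersection: only records present in both are kept. The plain-Python sketch below shows the same operation on two hypothetical panels. The marker sets are invented for illustration and are not the panels of the two registered files.

```python
# Hypothetical marker panels for two files.
file1_panel = {"CD3", "CD14", "CD19", "CD8"}
file2_panel = {"CD3", "CD14", "CD57"}

# Markers measured in both files.
shared = file1_panel & file2_panel
# Markers measured only in the first file.
only_in_file1 = file1_panel - file2_panel

print(sorted(shared))
print(sorted(only_in_file1))
```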

## Flow marker registry

Check out your CellMarker registry:

In [None]:
lb.CellMarker.filter().df()

In [None]:
# a few tests
assert set(shared_markers.list("name")) == set(
    [
        "Ccr7",
        "CD3",
        "Cd14",
        "Cd19",
        "CD127",
        "CD27",
        "CD28",
        "CD8",
        "Cd4",
        "CD57",
    ]
)
assert ln.File.filter(feature_sets__in=panels_with_cd14).exists()

In [None]:
# clean up test instance
!lamin delete --force test-flow
!rm -r test-flow