[![Jupyter Notebook](https://img.shields.io/badge/Jupyter%20Notebook-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/facs.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/laminlabs/lamin-usecases/main?labpath=lamin-usecases%2Fdocs%2Ffacs.ipynb)

# Flow cytometry

Flow cytometry is a technique used to analyze and sort cells or particles based on their physical and chemical characteristics as they flow in a fluid stream through a laser beam.

Here, we'll transform, validate and register two flow cytometry datasets ([Alpert19](https://www.nature.com/articles/s41591-019-0381-y) and [FlowIO sample](https://github.com/whitews/FlowIO/blob/master/examples/fcs_files/100715.fcs)) to demonstrate how to create and query a custom flow cytometry registry.

## Setup

In [None]:
!lamin init --storage ./test-flow --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import readfcs
import pytometry as pm
import scanpy as sc

lb.settings.species = "human"

In [None]:
ln.track()

## Alpert19

### Access ![](https://img.shields.io/badge/Access-10b981)

We start with a flow cytometry file from Alpert19:

In [None]:
ln.dev.datasets.file_fcs_alpert19(
    populate_registries=True,  # pre-populate registries to simulate an instance that's already in use
)

Use [readfcs](https://lamin.ai/docs/readfcs) to read the FCS file into memory:

In [None]:
adata = readfcs.read("Alpert19.fcs")
adata

### Transform ![](https://img.shields.io/badge/Transform-10b981)

In [None]:
pm.pp.split_signal(adata, var_key="channel")

In [None]:
pm.tl.normalize_arcsinh(adata, cofactor=150)
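The arcsinh transform is a common variance-stabilizing transform for cytometry intensities. Each value `x` becomes `arcsinh(x / cofactor)`, which is roughly linear near zero and roughly logarithmic for large values, and it is defined for negative values too. Below is a minimal plain-Python sketch of that formula. It assumes this is what `normalize_arcsinh` computes per value; see the pytometry docs for details such as where the result is stored.

```python
import math

def arcsinh_transform(values, cofactor=150.0):
    """Apply arcsinh(x / cofactor) to each intensity value."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(v / cofactor) for v in values]

# Small intensities stay close to linear; large ones are compressed roughly logarithmically.
raw = [-50.0, 0.0, 150.0, 15000.0]
print(arcsinh_transform(raw))
```

A larger cofactor widens the near-linear region around zero. That is why the cofactor is usually chosen per instrument type.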

### Validate ![](https://img.shields.io/badge/Validate-10b981) 

First, let's validate the features in `.var` using {class}`~docs:lnschema_bionty.CellMarker`:

In [None]:
lb.CellMarker.validate(adata.var.index);

We see that many features aren't validated. Let's standardize the identifiers to map synonyms:

In [None]:
adata.var.index = lb.CellMarker.standardize(adata.var.index)
validated = lb.CellMarker.validate(adata.var.index)
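Conceptually, standardization maps each identifier to its canonical name through a synonym table and leaves unknown identifiers unchanged. Validation then checks each name for membership in the registry. The toy sketch below illustrates the idea. `SYNONYMS` and `REGISTRY` are hypothetical, hard-coded stand-ins, not the Bionty ontology or lamindb's implementation.

```python
# Hypothetical synonym table: lower-cased synonym -> canonical marker name.
SYNONYMS = {"pd-1": "PD1", "pd1": "PD1", "cd279": "PD1", "cd14": "CD14"}
# Hypothetical registry of validated canonical names.
REGISTRY = {"PD1", "CD14", "CD3"}

def standardize(values):
    """Replace known synonyms by canonical names; keep unknown values as-is."""
    return [SYNONYMS.get(v.lower(), v) for v in values]

def validate(values):
    """Return a boolean mask: True where the value is a registered name."""
    return [v in REGISTRY for v in values]

names = standardize(["CD279", "cd14", "CD3", "Time"])
mask = validate(names)
print(names, mask)
```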

More markers validate now, but 5 of the remaining features look like metadata rather than cell markers.
Hence, let's curate the `AnnData` object a bit more.

Let's move the metadata (the non-validated features) into `adata.obs`:

In [None]:
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
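The two lines above partition the columns with the boolean `validated` mask. `~validated` selects the non-marker columns, which become per-cell metadata in `.obs`, and `validated` keeps the marker columns. The same partitioning is sketched below on a plain dict of columns so it runs without AnnData. The column names and values are made up.

```python
def split_columns(columns, validated):
    """Split {name: values} into (markers, metadata) using a mask aligned with column order."""
    names = list(columns)
    if len(names) != len(validated):
        raise ValueError("mask length must match the number of columns")
    markers = {n: columns[n] for n, ok in zip(names, validated) if ok}
    metadata = {n: columns[n] for n, ok in zip(names, validated) if not ok}
    return markers, metadata

columns = {"CD14": [0.1, 2.3], "Time": [1.0, 2.0], "CD3": [1.5, 0.2]}
markers, metadata = split_columns(columns, [True, False, True])
print(list(markers), list(metadata))
```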

Now we have a clean panel of 35 validated cell markers:

In [None]:
lb.CellMarker.validate(adata.var.index);

Next, let's register the metadata features we moved to `.obs`:

In [None]:
# Feature.from_df creates feature records with types
features = ln.Feature.from_df(adata.obs)
ln.add(features)
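The comment above says the feature records carry types inferred from the DataFrame columns. The sketch below shows roughly how such inference can work by inspecting each column's values. The type labels (`"number"`, `"bool"`, `"category"`) and the column names are hypothetical, and this is not lamindb's actual rule set.

```python
def infer_feature_type(values):
    """Map a column's values to a coarse, hypothetical feature type label."""
    non_null = [v for v in values if v is not None]
    if non_null and all(isinstance(v, bool) for v in non_null):
        return "bool"
    # bool is a subclass of int, so exclude it explicitly from the numeric check
    if non_null and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null):
        return "number"
    return "category"

obs = {"Time": [0.1, 0.2], "Cell_length": [12, 15], "sample": ["a", "b"]}
features = {name: infer_feature_type(vals) for name, vals in obs.items()}
print(features)
```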

Lastly, we'd like to annotate this file with an assay label.

Since the term "FACS" isn't in our registry yet, let's search for it in the public ontology and register it:

In [None]:
lb.ExperimentalFactor.bionty().search("FACS").head(2)

In [None]:
lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108").save()
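A ranked search returns the closest ontology terms first, which is why we inspect the top hits before picking an ontology ID. The sketch below is a rough illustration that ranks terms by fuzzy similarity using the standard library's `difflib`, scoring each term by its best match across its name and synonyms. This is a swapped-in technique, not how Bionty's search is implemented, and the mini-ontology is hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical mini-ontology: term name -> list of synonyms.
TERMS = {
    "fluorescence-activated cell sorting": ["FACS"],
    "flow cytometry assay": ["FCM"],
    "mass cytometry": ["CyTOF"],
}

def score(query, text):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, query.lower(), text.lower()).ratio()

def search(query, terms=TERMS):
    """Rank terms by their best match across the name and its synonyms."""
    scored = [(max(score(query, t) for t in [name, *syns]), name) for name, syns in terms.items()]
    return sorted(scored, reverse=True)

for s, name in search("FACS"):
    print(f"{s:.2f}  {name}")
```

Matching against synonyms matters here: the abbreviation "FACS" scores poorly against the full term name but exactly matches its synonym.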

### Register ![](https://img.shields.io/badge/Register-10b981)

In [None]:
modalities = ln.Modality.lookup()
features = ln.Feature.lookup()
efs = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()

In [None]:
file = ln.File.from_anndata(
    adata, description="Alpert19", field=lb.CellMarker.name, modality=modalities.protein
)

In [None]:
file.save()

In [None]:
file.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file.labels.add(species.human, features.species)

In [None]:
file.features

Check a few validated cell markers in `.var`:

In [None]:
file.features["var"].df().head()

Use auto-complete for marker names:

In [None]:
markers = file.features["var"].lookup()
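Auto-complete works because each registered name is exposed as a Python attribute. Attribute names must be valid identifiers, which is why the notebook accesses "fluorescence-activated cell sorting" as `fluorescence_activated_cell_sorting`. The sketch below is a simplified, hypothetical version of that name mapping. Unlike lamindb's lookup, which returns records, it maps back to plain name strings and doesn't handle collisions.

```python
import re
from types import SimpleNamespace

def to_attr(name):
    """Turn a display name into a lower-case, identifier-safe attribute name."""
    attr = re.sub(r"\W+", "_", name).strip("_").lower()
    # identifiers can't start with a digit, so prefix an underscore in that case
    return f"_{attr}" if attr[:1].isdigit() else attr

def make_lookup(names):
    """Build an object whose attributes map sanitized names back to the originals."""
    return SimpleNamespace(**{to_attr(n): n for n in names})

efs_demo = make_lookup(["fluorescence-activated cell sorting", "mass cytometry"])
markers_demo = make_lookup(["CD14", "PD-1"])
print(efs_demo.fluorescence_activated_cell_sorting, markers_demo.pd_1)
```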

In [None]:
sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name)

## FlowIO sample

Let's validate and register another flow file:

### Access ![](https://img.shields.io/badge/Access-10b981)

In [None]:
adata2 = readfcs.read(ln.dev.datasets.file_fcs())

This `AnnData` object doesn't require filtering, normalization, or formatting, so there is no ![](https://img.shields.io/badge/Transform-10b981) step.

### Validate ![](https://img.shields.io/badge/Validate-10b981) 

First, let's standardize the cell markers and validate them:

In [None]:
adata2.var.index = lb.CellMarker.standardize(adata2.var.index)
validated = lb.CellMarker.validate(adata2.var.index)

Next, register non-validated markers from Bionty:

In [None]:
records = lb.CellMarker.from_values(adata2.var.index[~validated])
ln.save(records)

Now all features pass validation except for `FSC-A` and `FSC-H`, which are forward-scatter channels rather than cell markers:

In [None]:
lb.CellMarker.validate(adata2.var.index);
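`FSC-A` and `FSC-H` are the area and height of the forward-scatter pulse, which mainly reflects cell size rather than a stained marker. It's therefore expected that they don't validate as cell markers. The hypothetical helper below flags scatter and time channels by name before validation. The `FSC`/`SSC` prefixes and the `-A`/`-H`/`-W` suffixes are common in FCS files but aren't guaranteed for every instrument.

```python
import re

# Common non-marker channel names: forward/side scatter (optionally -A/-H/-W) and acquisition time.
NON_MARKER = re.compile(r"^(FSC|SSC)(-[AHW])?$|^Time$", re.IGNORECASE)

def is_non_marker(channel):
    """Return True for scatter or time channels that shouldn't be validated as markers."""
    return bool(NON_MARKER.match(channel))

channels = ["FSC-A", "FSC-H", "SSC-A", "CD14", "Time"]
print([c for c in channels if is_non_marker(c)])
```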

### Register ![](https://img.shields.io/badge/Register-10b981) 

In [None]:
file2 = ln.File.from_anndata(
    adata2,
    description="My fcs file",
    field=lb.CellMarker.name,
    modality=modalities.protein,
)

In [None]:
file2.save()

In [None]:
file2.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file2.labels.add(species.human, features.species)

In [None]:
file2.features

View data flow:

In [None]:
file2.view_flow()

## Flow marker registry ![](https://img.shields.io/badge/Access-10b981) 

Check out your flow marker registry:

In [None]:
lb.CellMarker.filter().df()

Search for a marker (synonym-aware):

```{tip}

Search the public source for a marker that isn't registered yet: `lb.CellMarker.bionty().search(...)`
```

In [None]:
lb.CellMarker.search("PD-1").head(2)

Auto-complete marker names:

In [None]:
cell_markers = lb.CellMarker.lookup()

In [None]:
cell_markers.cd14

Query panels and datasets by marker, e.g., find the datasets whose flow panel includes CD14:

In [None]:
panels_with_cd14 = ln.FeatureSet.filter(cell_markers=cell_markers.cd14).all()

In [None]:
ln.File.filter(feature_sets__in=panels_with_cd14).df()
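The two cells above form a two-step relational query. The first finds the feature sets (panels) that contain CD14, and the second finds the files linked to any of those panels. The sketch below reproduces that logic on plain Python data. The panel and file identifiers are hypothetical stand-ins for registry records.

```python
# Hypothetical registry contents: panel id -> set of marker names.
panels = {
    "panel_a": {"CD14", "CD3"},
    "panel_b": {"CD3", "CD8"},
    "panel_c": {"CD14", "CD56"},
}
# Hypothetical file -> panel links (a file may link several feature sets).
file_panels = {
    "file_1": {"panel_a"},
    "file_2": {"panel_b"},
    "file_3": {"panel_b", "panel_c"},
}

def panels_with(marker):
    """Step 1: ids of panels whose marker set contains `marker`."""
    return {pid for pid, markers in panels.items() if marker in markers}

def files_linked_to(panel_ids):
    """Step 2: files linked to at least one of the given panels (like an `__in` filter)."""
    return sorted(f for f, pids in file_panels.items() if pids & panel_ids)

print(files_linked_to(panels_with("CD14")))
```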

Find the cell markers shared between two files:

In [None]:
# querying metadata doesn't require loading the file contents
files = ln.File.filter(feature_sets__in=panels_with_cd14, species=species.human).list()
file1, file2 = files[0], files[1]

In [None]:
file1_markers = file1.features["var"]
file2_markers = file2.features["var"]

shared_markers = file1_markers & file2_markers
shared_markers.list("name")
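Combining the two marker querysets with `&` keeps only the markers present in both panels, which amounts to a set intersection. The sketch below shows the equivalent on plain Python sets, along with the set differences that give the markers unique to each panel. The marker names are made up.

```python
markers_1 = {"CD14", "CD3", "CD19", "CD45"}
markers_2 = {"CD3", "CD4", "CD14"}

shared = markers_1 & markers_2       # in both panels
only_in_1 = markers_1 - markers_2    # unique to the first panel
only_in_2 = markers_2 - markers_1    # unique to the second panel
print(sorted(shared), sorted(only_in_1), sorted(only_in_2))
```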

Load a file into memory:

In [None]:
file1.load()

In [None]:
# clean up test instance
!lamin delete --force test-flow
!rm -r test-flow