[![Jupyter Notebook](https://img.shields.io/badge/Jupyter%20Notebook-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/facs.ipynb)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/laminlabs/lamin-usecases/main?labpath=lamin-usecases%2Fdocs%2Ffacs.ipynb)

# Flow cytometry

Flow cytometry is a technique used to detect and measure physical and chemical characteristics of a population of cells or particles ([wiki](https://en.wikipedia.org/wiki/Flow_cytometry)).

Here, we'll walk through how to:
1. iteratively ingest datasets
2. query, search, integrate & analyze datasets

```{toctree}
:maxdepth: 1
:hidden:

facs1
facs2
```

## Setup

In [None]:
!lamin init --storage ./test-flow --schema bionty

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import readfcs

lb.settings.species = "human"

In [None]:
ln.track()

## Ingest a first file

### Access ![](https://img.shields.io/badge/Access-10b981)

We start with a flow cytometry file from [Alpert *et al.*, Nat. Med. (2019)](https://pubmed.ncbi.nlm.nih.gov/30842675/).

Calling the following function downloads the file and pre-populates a few relevant registries:

In [None]:
ln.dev.datasets.file_fcs_alpert19(populate_registries=True)

We use [readfcs](https://lamin.ai/docs/readfcs) to read the raw FCS file into memory:

In [None]:
adata = readfcs.read("Alpert19.fcs")
adata

### Transform: normalize ![](https://img.shields.io/badge/Transform-10b981)

In this use case, we want to ingest and store curated data. Hence, we split the signal and normalize it using the [pytometry](https://github.com/buettnerlab/pytometry) package.

In [None]:
import pytometry as pm

In [None]:
pm.pp.split_signal(adata, var_key="channel")

In [None]:
pm.tl.normalize_arcsinh(adata, cofactor=150)
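
For intuition, an arcsinh normalization is commonly defined as `arcsinh(x / cofactor)`: it is roughly linear for values well below the cofactor and roughly logarithmic above it. The following is a minimal pure-Python sketch under that assumption. The helper `arcsinh_normalize` and the input values are hypothetical, and this is not pytometry's implementation.

```python
import math


def arcsinh_normalize(values, cofactor=150):
    """Apply arcsinh(x / cofactor) to each value (illustrative helper, not pytometry's API)."""
    return [math.asinh(v / cofactor) for v in values]


# made-up raw intensities spanning several orders of magnitude
raw = [-150.0, 0.0, 15.0, 150.0, 15000.0]
scaled = arcsinh_normalize(raw)

for r, s in zip(raw, scaled):
    print(f"{r:>9.1f} -> {s:.3f}")
```

Negative and zero intensities remain defined under this transform, which is one reason it is often preferred over a plain logarithm for cytometry data.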

### Validate: cell markers ![](https://img.shields.io/badge/Validate-10b981)

First, we validate features in `.var` using {class}`~docs:lnschema_bionty.CellMarker`:

In [None]:
validated = lb.CellMarker.validate(adata.var.index)

We see that many features aren't validated because they're not standardized.

Hence, let's standardize feature names & validate again:

In [None]:
adata.var.index = lb.CellMarker.standardize(adata.var.index)
validated = lb.CellMarker.validate(adata.var.index)
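
Conceptually, validation checks names against a registry of canonical names, and standardization maps known synonyms onto those canonical names. Here is a minimal sketch of that idea with a toy registry. The registry content and the helpers `validate` and `standardize` below are made up for illustration and do not reflect how `lb.CellMarker` is implemented.

```python
# Toy registry: canonical name -> known synonyms (made-up values for illustration).
REGISTRY = {
    "CD3": {"Cd3", "CD3e"},
    "CD14": {"Cd14"},
}

# Reverse index from each synonym to its canonical name.
SYNONYM_TO_NAME = {syn: name for name, syns in REGISTRY.items() for syn in syns}


def validate(names):
    """Return one boolean per name: True if it is a canonical registry name."""
    return [name in REGISTRY for name in names]


def standardize(names):
    """Map known synonyms to canonical names; leave unknown names unchanged."""
    return [SYNONYM_TO_NAME.get(name, name) for name in names]


raw_names = ["Cd3", "CD14", "Time"]
print(validate(raw_names))               # [False, True, False]
print(validate(standardize(raw_names)))  # [True, True, False]
```

Names that are neither canonical nor known synonyms (such as `"Time"` in this toy example) stay non-validated after standardization.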

The remaining non-validated features don't appear to be cell markers but rather metadata features.

Let's move them into `adata.obs`:

In [None]:
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
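
The mask-based split can be sketched without AnnData: one boolean flag per column decides whether the column stays in the measurement matrix or moves to the metadata table. The column names, values, and flags below are made up for illustration.

```python
# Toy table: column name -> per-cell values (made-up numbers).
columns = {
    "CD3": [0.2, 1.4],
    "Time": [0.0, 1.0],
    "CD14": [2.1, 0.3],
}
# One validation flag per column, in column order (hypothetical result).
validated = [True, False, True]

# Columns flagged True are kept as markers; the rest become metadata.
markers = {name: vals for (name, vals), ok in zip(columns.items(), validated) if ok}
metadata = {name: vals for (name, vals), ok in zip(columns.items(), validated) if not ok}

print(list(markers))   # ['CD3', 'CD14']
print(list(metadata))  # ['Time']
```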

Now we have a clean panel of 35 validated cell markers:

In [None]:
validated = lb.CellMarker.validate(adata.var.index)
assert all(validated)  # all markers are validated

### Register: metadata ![](https://img.shields.io/badge/Register-10b981)

Next, let's register the metadata features we moved to `.obs`.

For this, we create one feature record for each column in the `.obs` dataframe:

In [None]:
features = ln.Feature.from_df(adata.obs)
ln.save(features)

We use the [Experimental Factor Ontology](https://www.ebi.ac.uk/efo/) through Bionty to create a "FACS" label for the dataset:

In [None]:
lb.ExperimentalFactor.bionty().search("FACS").head(2)  # search the public ontology

In [None]:
# import the record from the public ontology and save it to the registry
lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108").save()

# show the content of the registry
lb.ExperimentalFactor.filter().df()

### Register: data & metadata annotations ![](https://img.shields.io/badge/Register-10b981)

In [None]:
modalities = ln.Modality.lookup()
features = ln.Feature.lookup()
efs = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()

In [None]:
file = ln.File.from_anndata(
    adata, description="Alpert19", field=lb.CellMarker.name, modality=modalities.protein
)

In [None]:
file.save()

Annotate by linking FACS & species labels:

In [None]:
file.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file.labels.add(species.human, features.species)

## Inspect the registered file

Inspect features at a high level:

In [None]:
file.features

Inspect low-level features in `.var`:

In [None]:
file.features["var"].df().head()

Use auto-complete for marker names:

In [None]:
markers = file.features["var"].lookup()

In [None]:
import scanpy as sc

# compute principal components, then plot them colored by the CD14 marker
sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name)