![facs2/4](https://img.shields.io/badge/facs2/4-lightgrey)
[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/facs2.ipynb)
[![lamindata](https://img.shields.io/badge/Source%20%26%20report%20on%20LaminHub-mediumseagreen)](https://lamin.ai/laminlabs/lamindata/transform/SmQmhrhigFPL5zKv/dEeVbAltShaPDkspCTMF)

# Append a new dataset

We have one dataset in storage and are about to receive a second one.

This notebook shows how to validate and register the new dataset, then append it to the existing collection as a new version.

In [None]:
import lamindb as ln
import bionty as bt
import readfcs

bt.settings.organism = "human"

ln.track("SmQmhrhigFPL0000")

## Ingest a new artifact

### Access ![](https://img.shields.io/badge/Access-10b981)

Let us validate and register another `.fcs` file from [Oetjen18](https://insight.jci.org/articles/view/124928):

In [None]:
filepath = readfcs.datasets.Oetjen18_t1()

adata = readfcs.read(filepath)
adata

### Transform: normalize ![](https://img.shields.io/badge/Transform-10b981)

In [None]:
import pytometry as pm

In [None]:
pm.pp.split_signal(adata, var_key="channel")

In [None]:
pm.pp.compensate(adata)
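
Compensation corrects for fluorochrome signal spilling into neighboring detectors. The following is a minimal sketch of the linear algebra usually behind it, assuming the common model in which observed signals equal true signals multiplied by a spillover matrix. The matrix and signal values are made up for illustration, and this is not pytometry's code:

```python
import numpy as np

# Hypothetical 2-channel spillover matrix: row i gives the fraction of
# fluorochrome i's signal detected in each channel (values are made up).
spillover = np.array([
    [1.0, 0.2],
    [0.1, 1.0],
])

# Made-up "true" signals for three events (rows) and two fluorochromes (columns).
true_signal = np.array([
    [100.0, 0.0],
    [0.0, 50.0],
    [30.0, 30.0],
])

# What the detectors would record under the linear spillover model.
observed = true_signal @ spillover

# Compensation: solve `compensated @ spillover = observed` for `compensated`.
# Solving the transposed system avoids forming an explicit matrix inverse.
compensated = np.linalg.solve(spillover.T, observed.T).T

print(np.allclose(compensated, true_signal))
```

With real data the spillover matrix typically comes from the acquisition metadata rather than being typed in by hand.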

In [None]:
pm.tl.normalize_biExp(adata)
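
Compensated intensities span several orders of magnitude and can be negative, which is why cytometry data is usually passed through a transform that is roughly linear near zero and logarithmic for large values. As a rough illustration of that shape, here is a sketch using the simpler arcsinh transform. It is not the biexponential transform applied above, and the cofactor of 150 and the input values are assumptions chosen for the example:

```python
import math


def arcsinh_transform(values, cofactor=150.0):
    """Scale each value by the cofactor, then apply the inverse hyperbolic sine."""
    return [math.asinh(v / cofactor) for v in values]


# Made-up raw intensities, including a negative value as produced by compensation.
raw = [-300.0, 0.0, 150.0, 15000.0, 1500000.0]
transformed = arcsinh_transform(raw)

for r, t in zip(raw, transformed):
    print(f"{r:>12.1f} -> {t:.3f}")
```

Negative inputs stay defined and zero maps to zero, while a 100-fold increase at the high end adds only about `ln(100)` to the transformed value.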

In [None]:
adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
]
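
To make the row filter concrete, here is a small pandas sketch on a made-up table (the marker names and values are hypothetical). It builds the same boolean mask as above and compares it with the equivalent `notna().all(axis=1)` formulation:

```python
import numpy as np
import pandas as pd

# Made-up expression table: three cells (rows) by three channels (columns).
df = pd.DataFrame(
    {
        "CD3": [1.0, np.nan, 0.5],
        "CD8": [0.2, 0.4, np.nan],
        "CD4": [0.1, 0.3, 0.7],
    },
    index=["cell_0", "cell_1", "cell_2"],
)

# Mask used in the notebook: keep rows whose NaN count is zero.
keep_by_count = df.isna().sum(axis=1) == 0

# Equivalent mask: keep rows in which every value is present.
keep_by_all = df.notna().all(axis=1)

complete = df[keep_by_all]
print(complete)
```

Only `cell_0` survives, since each of the other two rows has one missing value.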

In [None]:
adata.to_df().describe()

### Validate cell markers ![](https://img.shields.io/badge/Validate-10b981) 

Let's see how many markers validate:

In [None]:
validated = bt.CellMarker.validate(adata.var.index)

Let's standardize and re-validate:

In [None]:
adata.var.index = bt.CellMarker.standardize(adata.var.index)
validated = bt.CellMarker.validate(adata.var.index)
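
Conceptually, validation checks each name against a reference vocabulary, and standardization maps known synonyms onto the reference name before checking again. The toy sketch below illustrates that idea in plain Python. The vocabulary, the synonym table, and both helper functions are hypothetical and are not how Bionty implements these methods:

```python
# Toy reference vocabulary and synonym table (made up for illustration only).
REFERENCE = {"CD3", "CD4", "CD8"}
SYNONYMS = {"Cd3": "CD3", "cd8": "CD8"}


def validate(names):
    """Return one boolean per name: True if the name is in the reference."""
    return [name in REFERENCE for name in names]


def standardize(names):
    """Replace known synonyms with their reference name, leave the rest as is."""
    return [SYNONYMS.get(name, name) for name in names]


raw_names = ["Cd3", "cd8", "FSC-A"]
before = validate(raw_names)
after = validate(standardize(raw_names))

print(before)
print(after)
```

Names that still fail after standardization, like `FSC-A` here, are either missing from the registry or are not markers at all, which is what the next steps deal with.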

Next, register the markers that did not validate, creating their records from Bionty:

In [None]:
records = bt.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)

Manually create one marker:

In [None]:
bt.CellMarker(name="CD14/19").save()

Move the columns that still do not validate (metadata rather than markers) to `obs`:

In [None]:
validated = bt.CellMarker.validate(adata.var.index)
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
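
The split relies on one boolean mask and its negation to partition the columns. A minimal pandas sketch of the same pattern, with made-up column names and a hand-written mask standing in for the validation result:

```python
import numpy as np
import pandas as pd

# Made-up table mixing marker columns with acquisition metadata columns.
df = pd.DataFrame(
    {
        "CD3": [0.1, 0.9],
        "FSC-A": [512.0, 480.0],
        "CD8": [0.4, 0.2],
        "Time": [0.0, 1.5],
    }
)

# Hypothetical validation result: one boolean per column, True for markers.
validated = np.array([True, False, True, False])

# The negated mask selects the metadata columns, the mask itself the markers.
metadata = df.loc[:, ~validated]
markers = df.loc[:, validated]

print(list(metadata.columns))
print(list(markers.columns))
```

Every column lands in exactly one of the two tables, so no measurement is lost in the move.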

Now all markers pass validation:

In [None]:
validated = bt.CellMarker.validate(adata.var.index)
assert all(validated)

### Register ![](https://img.shields.io/badge/Register-10b981) 

In [None]:
curate = ln.Curator.from_anndata(adata, var_index=bt.CellMarker.name, categoricals={})
curate.validate()

In [None]:
artifact = curate.save_artifact(description="Oetjen18_t1")

Annotate with more labels:

In [None]:
efs = bt.ExperimentalFactor.lookup()
organism = bt.Organism.lookup()

artifact.labels.add(efs.fluorescence_activated_cell_sorting)
artifact.labels.add(organism.human)

In [None]:
artifact.describe()

Inspect a PCA for QC. This dataset looks much like noise:

In [None]:
import scanpy as sc

markers = bt.CellMarker.lookup()

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd8.name)

## Create a new version of the collection by appending an artifact

Query the old version:

In [None]:
collection_v1 = ln.Collection.get(key="My versioned cytometry collection")

In [None]:
collection_v2 = ln.Collection(
    [artifact, collection_v1.ordered_artifacts[0]],
    revises=collection_v1,
    version="2",
)
collection_v2.describe()

In [None]:
collection_v2.save()