# Append a new batch of data

We have one file in storage and are about to receive a new batch of data.

In this notebook, we validate and register the new file, then append it to the existing dataset as a new version.

In [None]:
import lamindb as ln
import lnschema_bionty as lb
import readfcs

lb.settings.species = "human"

In [None]:
ln.track()

## Ingest a new file

### Access ![](https://img.shields.io/badge/Access-10b981)

Let us validate and register another `.fcs` file:

In [None]:
filepath = ln.dev.datasets.file_fcs()

adata = readfcs.read(filepath)

In [None]:
adata

### Transform: normalize ![](https://img.shields.io/badge/Transform-10b981)

In [None]:
import anndata as ad
import pytometry as pm

In [None]:
pm.pp.split_signal(adata, var_key="channel")

In [None]:
pm.tl.normalize_arcsinh(adata, cofactor=150)
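
To make the effect of the cofactor concrete, here is a minimal sketch of an arcsinh normalization on a toy array. It assumes the transform is `arcsinh(x / cofactor)`, the usual convention in cytometry; the helper `arcsinh_normalize` and the sample values are hypothetical and not part of pytometry, whose implementation may differ in detail.

```python
import numpy as np


def arcsinh_normalize(values: np.ndarray, cofactor: float = 150.0) -> np.ndarray:
    """Scale by the cofactor, then apply arcsinh (assumed convention)."""
    # arcsinh is roughly linear near zero and roughly logarithmic for large |x|,
    # so it compresses bright signals while keeping negative values defined.
    return np.arcsinh(values / cofactor)


# Made-up intensities spanning negative, zero, and large positive values
raw = np.array([-150.0, 0.0, 150.0, 15000.0])
normalized = arcsinh_normalize(raw)
print(normalized)
```

Under this assumption, a larger cofactor widens the near-linear region around zero, while a smaller one compresses more of the range logarithmically.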

In [None]:
adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
].copy()  # copy so that later edits act on real data, not on a view
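
The row filter above relies on a boolean mask built from per-row NaN counts. The following sketch shows the same pattern on a small, made-up pandas DataFrame (the marker names and values are illustrative only), so the mask can be inspected in isolation.

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.to_df(): rows are cells, columns are markers
df = pd.DataFrame(
    {"CD3": [1.0, np.nan, 0.5], "CD14": [0.2, 0.3, 0.1]},
    index=["cell_0", "cell_1", "cell_2"],
)

# Count NaN values per row; a row is kept only if that count is zero
complete_rows = df.isna().sum(axis=1) == 0

# Boolean indexing keeps the rows where the mask is True
filtered = df[complete_rows]
print(filtered)
```

An equivalent mask is `df.notna().all(axis=1)`, which some readers find easier to scan.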

In [None]:
adata.to_df().describe()

### Validate cell markers ![](https://img.shields.io/badge/Validate-10b981) 

Let's see how many markers validate:

In [None]:
validated = lb.CellMarker.validate(adata.var.index)

Let's standardize and re-validate:

In [None]:
adata.var.index = lb.CellMarker.standardize(adata.var.index)
validated = lb.CellMarker.validate(adata.var.index)

Next, register non-validated markers from Bionty:

In [None]:
records = lb.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)

Now they pass validation:

In [None]:
validated = lb.CellMarker.validate(adata.var.index)
assert all(validated)
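
The validate, standardize, re-validate sequence above can be pictured with plain Python. This is only a conceptual sketch: the reference vocabulary, the synonym table, and the two helper functions are hypothetical stand-ins, not how `lb.CellMarker` is implemented.

```python
# Hypothetical reference vocabulary and synonym table (toy data)
REFERENCE = {"CD3", "CD14", "CD8"}
SYNONYMS = {"cd14": "CD14", "Cd8": "CD8"}


def validate(names):
    """Return one boolean per name: True if it is in the reference vocabulary."""
    return [name in REFERENCE for name in names]


def standardize(names):
    """Map known synonyms to their reference name; leave other names unchanged."""
    return [SYNONYMS.get(name, name) for name in names]


observed = ["CD3", "cd14", "Cd8", "FooBar"]

before = validate(observed)           # only exact matches pass
standardized = standardize(observed)  # synonyms are rewritten
after = validate(standardized)        # more names pass now

# Names that still fail would have to be registered before they validate
unmatched = [name for name, ok in zip(standardized, after) if not ok]
print(before, after, unmatched)
```

In this toy setup, standardization recovers the synonym matches, and whatever remains unmatched corresponds to the markers that the notebook registers in the following step.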

### Register ![](https://img.shields.io/badge/Register-10b981) 

In [None]:
modalities = ln.Modality.lookup()
features = ln.Feature.lookup()
efs = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
markers = lb.CellMarker.lookup()

In [None]:
file = ln.File.from_anndata(
    adata,
    description="Flow cytometry file 2",
    field=lb.CellMarker.name,
    modality=modalities.protein,
)

In [None]:
file.save()

In [None]:
file.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file.labels.add(species.human, features.species)

In [None]:
file.features

View data flow:

In [None]:
file.view_flow()

Inspect a PCA for QC. This dataset looks much like noise:

In [None]:
import scanpy as sc

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name)

## Create a new version of the dataset by appending a file

Query the old version:

In [None]:
dataset_v1 = ln.Dataset.filter(name="My versioned FACS dataset").one()

Create the new version from the new file and the file of the old version:

In [None]:
dataset_v2 = ln.Dataset(
    [file, dataset_v1.file], is_new_version_of=dataset_v1, version="2"
)

In [None]:
dataset_v2

In [None]:
dataset_v2.features

In [None]:
dataset_v2.save()

In [None]:
dataset_v2.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
dataset_v2.labels.add(species.human, features.species)

In [None]:
dataset_v2.view_flow()