# Validate and register labels

This guide shows how to validate and curate labels in a DataFrame using LaminDB registries.

The validated DataFrame can then be registered as an {class}`~lamindb.Artifact` in your LaminDB instance.

## Set up

In [None]:
!lamin init --storage ./test-validator --schema bionty

In [None]:
import lamindb as ln
import bionty as bt
import pandas as pd
from lamin_validator import Validator

ln.settings.verbosity = "hint"

## A DataFrame with labels

Let's start with a DataFrame object that we'd like to validate and curate:

In [None]:
df = pd.DataFrame({
    "cell_type": ["cerebral pyramidal neuron", "astrocyte", "oligodendrocyte"],
    "assay_ontology_id": ["EFO:0008913", "EFO:0008913", "EFO:0008913"],
    "donor": ["D0001", "D0002", "DOOO3"],
})
df
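
The labels to validate are the distinct values in each column. The following plain-pandas sketch, independent of LaminDB, lists them per column:

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({
    "cell_type": ["cerebral pyramidal neuron", "astrocyte", "oligodendrocyte"],
    "assay_ontology_id": ["EFO:0008913", "EFO:0008913", "EFO:0008913"],
    "donor": ["D0001", "D0002", "DOOO3"],
})

# Collect the distinct values (labels) of each column, in order of first appearance
labels = {col: list(df[col].unique()) for col in df.columns}
for col, values in labels.items():
    print(col, values)
```

Only three distinct assay IDs collapse to one label here, so validation work scales with the number of distinct values, not the number of rows.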

## Validate and curate metadata

Define the validation criteria by mapping each column to the registry field its values are checked against:

In [None]:
fields = {
    "cell_type": bt.CellType.name,
    "assay_ontology_id": bt.ExperimentalFactor.ontology_id,
    "donor": ln.ULabel.name,
}
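
Conceptually, each entry in `fields` names the registry field that a column's values are compared with. The sketch below mimics that check with plain pandas. It uses a hypothetical in-memory `reference` dict in place of the LaminDB registries and a hypothetical helper `validate_columns`; it illustrates the idea and is not how `Validator` is implemented.

```python
import pandas as pd

df = pd.DataFrame({
    "cell_type": ["cerebral pyramidal neuron", "astrocyte", "oligodendrocyte"],
    "assay_ontology_id": ["EFO:0008913", "EFO:0008913", "EFO:0008913"],
    "donor": ["D0001", "D0002", "DOOO3"],
})

# Hypothetical stand-in for registry contents: the values each field already knows
reference = {
    "cell_type": {"astrocyte", "oligodendrocyte", "cerebral cortex pyramidal neuron"},
    "assay_ontology_id": {"EFO:0008913"},
    "donor": set(),  # an empty registry, as in a freshly initialized instance
}

def validate_columns(frame: pd.DataFrame, reference: dict) -> dict:
    """Return {column: bool} telling whether every value of the column is known."""
    report = {}
    for col, known in reference.items():
        # Series.isin flags each value found in the reference set
        report[col] = bool(frame[col].isin(known).all())
    return report

report = validate_columns(df, reference)
print(report)
# → {'cell_type': False, 'assay_ontology_id': True, 'donor': False}
```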

Validate the pandas DataFrame:

In [None]:
validator = Validator(df, fields=fields)

In [None]:
validated = validator.validate()

In [None]:
validated

## Register new metadata labels

Following the suggestions above, register the labels that aren't yet present in the current instance:

(Note that the current instance is empty. Once you have populated the registries, you will rarely need to register new labels.)

In [None]:
validator.register_labels("cell_type")

Fix the typo and register again:

In [None]:
# use a lookup object to get the correct spelling of categories from public reference
lookup = validator.lookup("public")

In [None]:
lookup

In [None]:
cell_types = lookup["cell_type"]

In [None]:
cell_types.cerebral_cortex_pyramidal_neuron
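
The lookup object exposes each reference term as an attribute, which is why the term `cerebral cortex pyramidal neuron` is reached as `cerebral_cortex_pyramidal_neuron`. Below is a minimal sketch of such a name-to-attribute conversion, assuming a simple rule (lowercase, runs of non-alphanumeric characters replaced by underscores). The exact normalization LaminDB applies may differ, and `to_attribute_name` and `build_lookup` are hypothetical helpers.

```python
import re
from types import SimpleNamespace

def to_attribute_name(term: str) -> str:
    """Hypothetical rule: lowercase and replace non-alphanumeric runs with '_'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", term.lower()).strip("_")
    # Python identifiers cannot start with a digit
    if name and name[0].isdigit():
        name = "_" + name
    return name

def build_lookup(terms):
    """Expose each term as an attribute whose value keeps the original name."""
    return SimpleNamespace(
        **{to_attribute_name(t): SimpleNamespace(name=t) for t in terms}
    )

cell_types = build_lookup(["cerebral cortex pyramidal neuron", "astrocyte", "oligodendrocyte"])
print(cell_types.cerebral_cortex_pyramidal_neuron.name)
```

Attribute access lets an editor or notebook autocomplete the correct spelling instead of relying on a hand-typed string.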

In [None]:
# fix the typo
df["cell_type"] = df["cell_type"].replace(
    {"cerebral pyramidal neuron": cell_types.cerebral_cortex_pyramidal_neuron.name}
)

validator.register_labels("cell_type")

In [None]:
validator.register_labels("assay_ontology_id")

In [None]:
validator.register_labels("donor")

To register non-validated terms, pass `validated_only=False`:

In [None]:
validator.register_labels("donor", validated_only=False)
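
Passing `validated_only=False` extends registration to values that could not be matched to a reference. The split between the two groups can be sketched with pandas. Here `reference` is a hypothetical set standing in for whatever source the values are validated against; it is not read from LaminDB.

```python
import pandas as pd

donors = pd.Series(["D0001", "D0002", "DOOO3"], name="donor")

# Hypothetical reference: donor IDs assumed to be known already
reference = {"D0001", "D0002"}

mask = donors.isin(reference)
validated = donors[mask].tolist()        # values matched to the reference
non_validated = donors[~mask].tolist()   # values with no match

# By default only the first group is registered;
# validated_only=False also registers the second group
print(validated, non_validated)
```

Inspecting `non_validated` before registering is a cheap way to catch typos such as `DOOO3` (letter O instead of zero) before they become permanent registry entries.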

Let's validate it again:

In [None]:
validated = validator.validate()

In [None]:
validated

## Register artifact

Now we are ready to register the artifact in the current instance:

In [None]:
ln.transform.stem_uid = "WOK3vP0bNGLx"
ln.transform.version = "0"
ln.track()

In [None]:
artifact = validator.register_artifact(description="test sample sheet")

Inspect the registered artifact and its metadata:

In [None]:
artifact.describe()

## Register collection

Register a new collection for the registered artifact:

In [None]:
# register a new collection
collection = validator.register_collection(
    artifact,  # registered artifact above, can also pass a list of artifacts
    name="Experiment X in brain",  # title of the publication
    description="10.1126/science.abl5197",  # DOI of the publication
    reference="E-MTAB-11536",  # accession number (e.g. GSE#, E-MTAB#, etc.)
    reference_type="ArrayExpress",  # source type (e.g. GEO, ArrayExpress, SRA, etc.)
)

In [None]:
collection.artifact

In [None]:
artifact.collection