# Validate and register an AnnData object

This guide shows how to validate and curate an AnnData object using LaminDB registries.

The validated object can be subsequently registered as an {class}`~lamindb.Artifact` in your LaminDB instance.

## Set up

Initialize a LaminDB instance in which to register the validated AnnData object:

In [None]:
!lamin init --storage ./test-anndata-validator --schema bionty

In [None]:
import lamindb as ln
import bionty as bt
from lamin_validator import AnnDataValidator, datasets

ln.settings.verbosity = "hint"

## An AnnData object with metadata

Let's load an AnnData object that we want to curate.
Note that our convenience dataloader prepopulates the registries to speed up the curation process.

In [None]:
adata = datasets.anndata_human_immune_cells(populate_registries=True)
adata

## Validate and curate metadata

Define validation criteria for the observation columns:

In [None]:
obs_fields = {
    "assay": bt.ExperimentalFactor.name,
    "cell_type": bt.CellType.name,
    "donor": ln.ULabel.name,
    "sex_ontology_term_id": bt.Phenotype.ontology_id,
    "tissue": bt.Tissue.name,
}
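
Conceptually, each entry in this mapping pairs an `obs` column with the registry field whose values the column must match. The following is a minimal plain-Python sketch of that idea, not the `lamin_validator` implementation. `REGISTRY`, `find_unvalidated`, and the example values are hypothetical stand-ins.

```python
# Hypothetical stand-in for registries: field name -> set of known values.
REGISTRY = {
    "CellType.name": {"T cell", "B cell", "macrophage"},
    "Tissue.name": {"lung", "blood", "spleen"},
}

# Column -> registry field, mirroring the shape of `obs_fields` above.
column_to_field = {"cell_type": "CellType.name", "tissue": "Tissue.name"}


def find_unvalidated(columns, column_to_field, registry):
    """Return, per column, the sorted values that are absent from the registry."""
    report = {}
    for column, field in column_to_field.items():
        missing = sorted(set(columns[column]) - registry[field])
        if missing:
            report[column] = missing
    return report


# Toy observation metadata with one misspelled tissue.
obs = {
    "cell_type": ["T cell", "B cell", "T cell"],
    "tissue": ["lung", "lungg", "blood"],
}
report = find_unvalidated(obs, column_to_field, REGISTRY)
print(report)
```

Columns whose values are all known do not appear in the report, so an empty report corresponds to a fully validated object in this toy model.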

Next, we create a {class}`~lamin_validator.AnnDataValidator` object and specify the `var_field` and `obs_fields` to validate against.
We also pass `using="laminlabs/cellxgene"` to curate against the registries of the [cellxgene instance](https://lamin.ai/laminlabs/cellxgene).
This lets us register values that are missing from our own instance directly from the cellxgene instance.
Keeping our own registry while validating against the cellxgene instance lets us add new registry values, while the cellxgene instance stays focused on the [cellxgene schema](https://github.com/chanzuckerberg/single-cell-curation/tree/main/schema).

In [None]:
validator = AnnDataValidator(
    adata, 
    using="laminlabs/cellxgene",
    var_field=bt.Gene.ensembl_gene_id, 
    obs_fields=obs_fields,
)
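
The effect of validating against both a local and a reference registry can be pictured as a two-tier lookup. The sketch below is an illustration under that assumption, written in plain Python. `classify` and the example sets are hypothetical and do not reflect how `AnnDataValidator` is implemented.

```python
def classify(values, local, reference):
    """Split values into known locally, known only in the reference, or unknown."""
    result = {"local": [], "reference_only": [], "unknown": []}
    for value in sorted(set(values)):
        if value in local:
            result["local"].append(value)
        elif value in reference:
            # Could be copied into the local registry from the reference.
            result["reference_only"].append(value)
        else:
            # Needs manual attention, e.g. a typo or a genuinely new label.
            result["unknown"].append(value)
    return result


local_tissues = {"blood"}                        # our own, mostly empty registry
reference_tissues = {"blood", "lung", "spleen"}  # stand-in for a reference instance

status = classify(["blood", "lung", "lungg"], local_tissues, reference_tissues)
print(status)
```

Values in the `reference_only` bucket are the ones that can be registered locally without manual curation. Values in `unknown` need a closer look.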

In [None]:
validated = validator.validate(organism="human")

In [None]:
validated

## Register new metadata labels

Follow the suggestions above to register genes and labels that aren't present in the current instance.

(Note that our instance is rather empty. Once you have filled up the registries, you won't need to register new labels as often.)

In [None]:
validator.register_variables()

In [None]:
validator.register_labels("assay")

In [None]:
validator.register_labels("donor")

In [None]:
validator.register_labels("donor", validated_only=False)
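
The second `donor` call passes `validated_only=False`, presumably because donor identifiers are custom labels with no reference entries to validate against. The sketch below illustrates that distinction in plain Python, assuming `validated_only` restricts registration to values found in a reference. The `register` helper and the donor identifiers are hypothetical.

```python
def register(values, local, reference, validated_only=True):
    """Add new values to `local`; by default only those found in `reference`."""
    added = []
    for value in sorted(set(values) - local):
        if value in reference or not validated_only:
            local.add(value)
            added.append(value)
    return added


donors = ["D496", "D503"]   # hypothetical donor identifiers
local_labels = set()        # labels registered so far
reference_labels = set()    # custom labels have no reference entries

first = register(donors, local_labels, reference_labels)
second = register(donors, local_labels, reference_labels, validated_only=False)
print(first, second)
```

In this toy model the first call registers nothing, and the second call registers both donors as new labels.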

In [None]:
validator.register_labels("sex_ontology_term_id")

In [None]:
validator.register_labels("tissue")

An error is shown for the tissue label "lungg", which is a typo for "lung". Let's fix it:

In [None]:
# using a lookup object to find the correct term
tissues = validator.lookup()["tissue"]

In [None]:
adata.obs["tissue"] = adata.obs["tissue"].cat.rename_categories({"lungg": tissues.lung.name})
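
When there are many unvalidated values, candidate corrections can also be suggested programmatically. The following is a small sketch that uses Python's standard `difflib` module rather than any LaminDB feature. `known_tissues`, `observed`, and `suggest` are hypothetical names, and the `0.8` cutoff is an arbitrary choice.

```python
import difflib

known_tissues = ["lung", "liver", "blood", "spleen"]  # stand-in for registry names
observed = ["lung", "lungg", "blood"]                 # values found in the data


def suggest(value, known):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(value, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


# Map each unknown value to its suggested replacement (None if no suggestion).
corrections = {v: suggest(v, known_tissues) for v in observed if v not in known_tissues}

# Apply the suggestions, leaving values without a suggestion untouched.
fixed = [corrections.get(v) or v for v in observed]
print(corrections, fixed)
```

Suggestions from fuzzy matching should still be reviewed by hand, since a close string match is not always the intended term.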

In [None]:
validator.register_labels("tissue")

Let's validate it again:

In [None]:
validated = validator.validate()

In [None]:
validated

## Register artifact

Now we are ready to register the artifact to the working instance:

In [None]:
ln.transform.stem_uid = "WOK3vP0bNGLx"
ln.transform.version = "0"
ln.track()

In [None]:
artifact = validator.register_artifact(description="test h5ad file")

View the registered artifact with metadata:

In [None]:
artifact.describe()

## Register collection

Register a new collection for the registered artifact:

In [None]:
# register a new collection
collection = validator.register_collection(
    artifact,  # registered artifact above, can also pass a list of artifacts
    name="Cross-tissue immune cell analysis reveals tissue-specific features in humans (for test demo only)",  # title of the publication
    description="10.1126/science.abl5197",  # DOI of the publication
    reference="E-MTAB-11536",  # accession number (e.g. GSE#, E-MTAB#, etc.)
    reference_type="ArrayExpress",  # source type (e.g. GEO, ArrayExpress, SRA, etc.)
)

In [None]:
collection.artifact

In [None]:
artifact.collection
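
The two attribute lookups above show that the artifact and the collection reference each other. As a rough mental model only, that two-way link can be sketched with plain dataclasses. `ArtifactRecord` and `CollectionRecord` are hypothetical and are not LaminDB classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ArtifactRecord:
    description: str
    collection: Optional["CollectionRecord"] = None  # back-reference, set on add


@dataclass(eq=False)
class CollectionRecord:
    name: str
    reference: str
    reference_type: str
    artifacts: List[ArtifactRecord] = field(default_factory=list)

    def add(self, artifact: ArtifactRecord) -> None:
        """Link the artifact to this collection in both directions."""
        self.artifacts.append(artifact)
        artifact.collection = self


file_record = ArtifactRecord(description="test h5ad file")
demo = CollectionRecord(
    name="demo collection",
    reference="E-MTAB-11536",
    reference_type="ArrayExpress",
)
demo.add(file_record)
print(file_record.collection.name, len(demo.artifacts))
```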