# Enforce pre-defined validation constraints

In a [previous guide](./curate-df), you defined validation constraints ad hoc when initializing {class}`~lamindb.Curator` objects.

Often, you want to enforce a pre-defined set of validation constraints, as the CELLxGENE curator does ({doc}`docs:cellxgene-curate`).

This guide shows how to subclass {class}`~lamindb.Curator` to enforce pre-defined constraints.

## Define a custom curator

Consider the example of electronic health records (EHR). We want to ensure that:

1. every record has the fields `disease`, `phenotype`, `developmental_stage`, and `age`
2. the values of these fields map to specific versions of pre-defined ontologies

The following implementation achieves the goal by subclassing {class}`~lamindb.core.DataFrameCurator`.

```{eval-rst}
.. literalinclude:: ehrcurator.py
   :language: python
   :caption: EHR Curator
```
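
To make the two requirements concrete, here is a minimal, library-free sketch of the kind of checks such a curator performs. It is not the `EHRCurator` implementation: the names `REQUIRED_COLUMNS`, `ALLOWED_VALUES`, `missing_columns`, and `non_validated` are hypothetical, and a hand-written set of labels stands in for a versioned ontology.

```python
from typing import Iterable, Mapping

# Hypothetical constraints for illustration only; a real curator validates
# against versioned ontologies rather than hand-written sets.
REQUIRED_COLUMNS = ("disease", "phenotype", "developmental_stage", "age")
ALLOWED_VALUES = {
    "disease": {"Alzheimer disease", "asthma"},
}


def missing_columns(columns: Iterable[str]) -> list[str]:
    """Return the required columns that are absent, in declaration order."""
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS if c not in present]


def non_validated(records: Mapping[str, Iterable[str]]) -> dict[str, list[str]]:
    """Return, per constrained column, the values outside the allowed set."""
    result = {}
    for column, allowed in ALLOWED_VALUES.items():
        bad = [v for v in records.get(column, []) if v not in allowed]
        if bad:
            result[column] = bad
    return result


columns = ["disease", "phenotype", "developmental_stage", "patient_age"]
print(missing_columns(columns))  # ['age']
print(non_validated({"disease": ["Alzheimer disease", "Hypertension"]}))
# {'disease': ['Hypertension']}
```

Keeping the column check separate from the value check mirrors the two requirements listed above: a misnamed column is reported before any value is compared against an ontology.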

## Use the custom curator

In [None]:
!lamin init --storage ./subclass-curator --schema bionty

In [None]:
import lamindb as ln
import bionty as bt
import pandas as pd
from ehrcurator import EHRCurator

ln.track("2XEr2IA4n1w40000")

In [None]:
# Create an example DataFrame with all mandatory columns, one of which is misnamed ('patient_age' instead of 'age')
data = {
    'disease': ['Alzheimer disease', 'diabetes mellitus', 'breast cancer', 'Hypertension', 'asthma'],
    'phenotype': ['Mental deterioration', 'Hyperglycemia', 'Tumor growth', 'Increased blood pressure', 'Airway inflammation'],
    'developmental_stage': ['Adult', 'Adult', 'Adult', 'Adult', 'Child'],
    'patient_age': [70, 55, 60, 65, 12],
}
df = pd.DataFrame(data)
df

In [None]:
ehrcurator = EHRCurator(df)
ehrcurator.validate()

In [None]:
# Rename the misnamed column
df.columns = df.columns.str.replace("patient_age", "age")
ehrcurator.validate()
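
The rename above assigns to `df.columns`, which changes the existing DataFrame rather than creating a new one. The pandas-only sketch below shows the difference from `DataFrame.rename`, which returns a new object by default. Whether this matters here depends on the curator holding a reference to the DataFrame it was given rather than a copy, which is an assumption this guide does not confirm; the variable `held` is a hypothetical stand-in for such a reference.

```python
import pandas as pd

df = pd.DataFrame({"patient_age": [70, 55]})
held = df  # hypothetical stand-in for a reference kept by another object

# Assigning to .columns mutates the existing object, so `held` sees the change.
df.columns = df.columns.str.replace("patient_age", "age")
in_place_visible = "age" in held.columns

# rename() returns a new DataFrame by default; the original is left untouched.
renamed = df.rename(columns={"age": "age_years"})
original_unchanged = "age" in df.columns and "age_years" not in df.columns

print(in_place_visible, original_unchanged)
```

If you prefer `rename`, one option under the same assumption is to construct a new curator from the renamed DataFrame before validating again.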

In [None]:
# Use lookup objects to curate the values
disease_lo = bt.Disease.public().lookup()
phenotype_lo = bt.Phenotype.public().lookup()
developmental_stage_lo = bt.DevelopmentalStage.public().lookup()

df["disease"] = df["disease"].replace({"Hypertension": disease_lo.hypertensive_disorder.name})
df["phenotype"] = df["phenotype"].replace({
    "Tumor growth": phenotype_lo.neoplasm.name,
    "Airway inflammation": phenotype_lo.bronchitis.name
})
df["developmental_stage"] = df["developmental_stage"].replace({
    "Adult": developmental_stage_lo.adolescent_stage.name,
    "Child": developmental_stage_lo.child_stage.name
})

ehrcurator.validate()

In [None]:
!rm -rf subclass-curator
!lamin delete --force subclass-curator