# Spatial

Here, you'll learn how to manage spatial datasets:

1. curate and ingest spatial data (![spatial1/4](https://img.shields.io/badge/spatial1/4-lightgrey))
2. query & analyze spatial datasets ([![spatial2/4](https://img.shields.io/badge/spatial2/4-lightgrey)](/spatial2))
3. load the collection into memory & train an ML model ([![spatial3/4](https://img.shields.io/badge/spatial3/4-lightgrey)](/spatial3))
4. create and share interactive visualizations with vitessce ([![spatial4/4](https://img.shields.io/badge/spatial4/4-lightgrey)](/spatial4))


```{toctree}
:maxdepth: 1
:hidden:

spatial2
spatial3
spatial4
```

Spatial omics data integrates molecular profiling (e.g., transcriptomics, proteomics) with spatial information, preserving the spatial organization of cells and tissues.
It enables high-resolution mapping of molecular activity within biological contexts, crucial for understanding cellular interactions and microenvironments.

Many spatial technologies exist, including multiplexed imaging, spatial transcriptomics, spatial proteomics, whole-slide imaging, spatial metabolomics, and 3D tissue reconstruction. Data from all of them can be stored in the [SpatialData](https://github.com/scverse/spatialdata) data framework.
For more details, see the original publication:

Marconato, L., Palla, G., Yamauchi, K.A. et al. SpatialData: an open and universal data framework for spatial omics. Nat Methods 22, 58–62 (2025). [https://doi.org/10.1038/s41592-024-02212-x](https://doi.org/10.1038/s41592-024-02212-x)

:::{note}

A collection of curated spatial datasets in SpatialData format is available on the [scverse/spatialdata-db instance](https://lamin.ai/scverse/spatialdata-db).

:::

In [None]:
# !pip install 'lamindb[jupyter,bionty]' spatialdata[spatialdata-plot]
!lamin init --storage ./test-spatial --schema bionty

In [None]:
import lamindb as ln
import bionty as bt
import pandas as pd
import spatialdata_plot
import warnings

warnings.filterwarnings("ignore")

ln.track()

## Creating SpatialData Artifacts

lamindb provides the {meth}`~lamindb.Artifact.from_spatialdata` method to create an {class}`~lamindb.Artifact` from a SpatialData object.

In [None]:
example_blobs_sdata = ln.core.datasets.spatialdata_blobs()
example_blobs_sdata

In [None]:
blobs_af = ln.Artifact.from_spatialdata(example_blobs_sdata, key="example_blobs.zarr")
blobs_af

In [None]:
# SpatialData Artifacts have the corresponding otype
blobs_af.otype

In [None]:
# SpatialData Artifacts can easily be loaded back into memory
example_blobs_in_memory = blobs_af.load()
example_blobs_in_memory

## Curating SpatialData datasets

For the remainder of the guide, we will work with two 10x Xenium datasets and one 10x Visium H&E image dataset.

More details can be found in the [ingestion notebook](https://lamin.ai/laminlabs/lamindata/transform/MN1DpkKGjzbk).

In [None]:
# load first of two cropped Xenium datasets
xenium_aligned_1_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_1_guide_min.zarr")
    .load()
)
xenium_aligned_1_sdata

Metadata is stored in two places in the SpatialData object:

1. Dataset-level metadata is stored in `sdata.attrs["sample"]`.
2. Measurement-specific metadata is stored in the associated tables in `sdata.tables`.

We define a {class}`lamindb.Schema` to curate both sample and table metadata.
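
The composite schema in this guide addresses these two locations with the keys `sample`, `table:obs`, and `table:var`. The sketch below illustrates that addressing scheme with a plain-Python stand-in for a SpatialData object. `mock_sdata`, `resolve_slot`, and all values are hypothetical placeholders, and the lookup logic is an assumption made for illustration, not how lamindb resolves slots internally.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a SpatialData object: only the two metadata
# locations described above are modelled, with made-up placeholder values.
mock_sdata = SimpleNamespace(
    attrs={"sample": {"organism": "human", "panel": "example_panel"}},
    tables={
        "table": SimpleNamespace(
            obs={"celltype_major": ["T-cells", "B-cells"]},
            var={"gene_id": ["gene_a", "gene_b"]},
        )
    },
)


def resolve_slot(sdata, slot):
    """Return the metadata container that a slot key such as 'table:obs' points to."""
    if ":" not in slot:
        # A bare key refers to dataset-level metadata in sdata.attrs
        return sdata.attrs[slot]
    table_name, axis = slot.split(":", 1)
    # 'obs' or 'var' of the named table in sdata.tables
    return getattr(sdata.tables[table_name], axis)


for slot in ("sample", "table:obs", "table:var"):
    print(slot, "->", resolve_slot(mock_sdata, slot))
```

With a real SpatialData object, the same locations would be inspected directly via `sdata.attrs["sample"]` and `sdata.tables["table"].obs` or `.var`.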

In [None]:
# define sample schema
xenium_sample_schema = ln.Schema(
    name="Xenium sample level",
    features=[
        ln.Feature(name="organism", dtype=bt.Organism).save(),
        ln.Feature(name="assay", dtype=bt.ExperimentalFactor).save(),
        ln.Feature(name="disease", dtype=bt.Disease).save(),
        ln.Feature(name="tissue", dtype=bt.Tissue).save(),
        ln.Feature(name="panel", dtype="cat[ULabel]").save(),
    ],
    coerce_dtype=True,
).save()

# define table obs schema
xenium_obs_schema = ln.Schema(
    name="Xenium obs level",
    features=[
        ln.Feature(name="celltype_major", dtype=bt.CellType).save(),
    ],
    coerce_dtype=True,
).save()

# define table var schema
spatial_var_schema = ln.Schema(
    name="Xenium var level", itype=bt.Gene.ensembl_gene_id, dtype=int
).save()

# define composite schema
xenium_schema = ln.Schema(
    name="Xenium schema",
    otype="SpatialData",
    components={
        "sample": xenium_sample_schema,
        "table:obs": xenium_obs_schema,
        "table:var": spatial_var_schema,
    },
).save()

In [None]:
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_1_sdata, xenium_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)

In [None]:
# register the not-yet-validated values of "panel" as new records
xenium_curator.slots["sample"].cat.add_new_from("panel")

In [None]:
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)

In [None]:
# map dataset-specific cell type labels to standardized cell type names
xenium_aligned_1_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_1_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
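
The same mapping is applied again to the second Xenium dataset later in this guide, so it can help to keep it in one place and to check which labels it leaves untouched. The following is a minimal sketch in plain Python; `CELLTYPE_MAP` and `harmonize` are hypothetical names, and the input labels are made up for illustration. In the notebook itself, the dictionary would simply be passed to the pandas `.replace()` call shown above.

```python
# Mapping copied from the cell above: dataset-specific labels -> standardized names
CELLTYPE_MAP = {
    "CAFs": "cancer associated fibroblast",
    "Endothelial": "endothelial cell",
    "Myeloid": "myeloid cell",
    "PVL": "perivascular cell",
    "T-cells": "T cell",
    "B-cells": "B cell",
    "Normal Epithelial": "epithelial cell",
    "Plasmablasts": "plasmablast",
    "Cancer Epithelial": "neoplastic epithelial cell",
}


def harmonize(labels, mapping=CELLTYPE_MAP):
    """Map known labels and report the labels that pass through unchanged."""
    mapped = [mapping.get(label, label) for label in labels]
    unmapped = sorted({label for label in labels if label not in mapping})
    return mapped, unmapped


# Hypothetical input: two labels covered by the mapping plus one that is not
mapped, unmapped = harmonize(["T-cells", "CAFs", "Unknown"])
print(mapped)    # ['T cell', 'cancer associated fibroblast', 'Unknown']
print(unmapped)  # ['Unknown']
```

Labels reported as unmapped are candidates for either extending the mapping or registering as new terms during curation.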

In [None]:
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)

In [None]:
# register the cell type labels that still fail validation as new records
xenium_curator.slots["table:obs"].cat.add_new_from("celltype_major")

In [None]:
xenium_1_curated_af = xenium_curator.save_artifact(key="xenium1.zarr")

In [None]:
xenium_1_curated_af.describe()
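
The curation steps above follow a loop: validate, fix or register the values that fail, and validate again. The toy model below sketches that loop in plain Python to make the control flow explicit. `ToyCurator` and its methods are hypothetical and deliberately much simpler than lamindb's curators; the labels are made up.

```python
class ToyCurator:
    """Toy model of a validate/register loop; not lamindb's Curator."""

    def __init__(self, values, registry):
        self.values = list(values)
        self.registry = set(registry)  # terms already known to the registry

    def non_validated(self):
        # Unique values without a matching registry term, in first-seen order
        return [v for v in dict.fromkeys(self.values) if v not in self.registry]

    def validate(self):
        missing = self.non_validated()
        if missing:
            raise ValueError(f"{len(missing)} term(s) not validated: {missing}")
        return True

    def add_new(self):
        # Register every non-validated value as a new term
        self.registry.update(self.non_validated())


curator = ToyCurator(
    values=["T cell", "B cell", "custom cell state"],  # hypothetical labels
    registry={"T cell", "B cell"},
)
try:
    curator.validate()
except ValueError as error:
    print(error)  # reports 'custom cell state' as not validated

curator.add_new()
print(curator.validate())  # True
```

Registering new terms is the fallback: renaming labels to existing standardized terms first, as done for `celltype_major` above, keeps the registry smaller and the datasets easier to query together.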

## Creating a Collection of curated SpatialData datasets

We can reuse the same schema to curate a second Xenium dataset:

In [None]:
xenium_aligned_2_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_2_guide_min.zarr")
    .load()
)

xenium_aligned_2_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_2_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)

In [None]:
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_2_sdata, xenium_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)

In [None]:
xenium_2_curated_af = xenium_curator.save_artifact(key="xenium2.zarr")

Analogously, we can define a Schema and Curator for Visium datasets:

In [None]:
visium_sample_schema = ln.Schema(
    name="Visium sample level",
    features=[
        ln.Feature(name="organism", dtype=bt.Organism).save(),
        ln.Feature(name="assay", dtype=bt.ExperimentalFactor).save(),
        ln.Feature(name="disease", dtype=bt.Disease).save(),
        ln.Feature(name="tissue", dtype=bt.Tissue).save(),
    ],
    coerce_dtype=True,
).save()

visium_schema = ln.Schema(
    name="Visium schema",
    otype="SpatialData",
    components={
        "sample": visium_sample_schema,
        "table:var": spatial_var_schema,
    },
).save()

In [None]:
visium_aligned_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="visium_aligned_guide_min.zarr")
    .load()
)
visium_aligned_sdata

In [None]:
visium_curator = ln.curators.SpatialDataCurator(visium_aligned_sdata, visium_schema)
try:
    visium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)

In [None]:
visium_curated_af = visium_curator.save_artifact(key="visium.zarr")

In [None]:
# group the three curated artifacts into a single collection
spatial_collection = ln.Collection(
    [xenium_1_curated_af, xenium_2_curated_af, visium_curated_af],
    key="spatial_collection",
).save()
spatial_collection

In [None]:
ln.finish()