[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/spatial2.ipynb)


# Spatial

Here, you'll learn how to manage spatial datasets:

1. query & analyze spatial datasets (![spatial1/4](https://img.shields.io/badge/spatial1/4-lightgrey))
2. create and share interactive visualizations with vitessce ([![spatial2/4](https://img.shields.io/badge/spatial2/4-lightgrey)](/spatial2))
3. curate and ingest spatial data (![spatial3/4](https://img.shields.io/badge/spatial3/4-lightgrey))
4. load the collection into memory & train an ML model ([![spatial4/4](https://img.shields.io/badge/spatial4/4-lightgrey)](/spatial4))



```{toctree}
:maxdepth: 1
:hidden:

spatial2
spatial3
spatial4
```

Spatial omics data integrates molecular profiling (e.g., transcriptomics, proteomics) with spatial information, preserving the spatial organization of cells and tissues.
It enables high-resolution mapping of molecular activity within biological contexts, crucial for understanding cellular interactions and microenvironments.

Many spatial technologies exist, including multiplexed imaging, spatial transcriptomics, spatial proteomics, whole slide imaging, spatial metabolomics, and 3D tissue reconstruction. Data from all of them can be stored in the [SpatialData](https://github.com/scverse/spatialdata) data framework.
For more details we refer to the original publication:

Marconato, L., Palla, G., Yamauchi, K.A. et al. SpatialData: an open and universal data framework for spatial omics. Nat Methods 22, 58–62 (2025). [https://doi.org/10.1038/s41592-024-02212-x](https://doi.org/10.1038/s41592-024-02212-x)

```{note}
A collection of curated spatial datasets in SpatialData format is available on the [scverse/spatialdata-db instance](https://lamin.ai/scverse/spatialdata-db).
```

```{dropdown} spatial data vs SpatialData terminology
When we mention spatial data, we refer to data from spatial assays, such as spatial transcriptomics or proteomics, that includes spatial coordinates to represent the organization of molecular features in tissue.
When we refer to SpatialData, we mean spatial omics data stored in the scverse SpatialData framework.
```

# Query and analyze spatial data

In [None]:
# pip install 'lamindb[jupyter,bionty]' spatialdata spatialdata-plot
!lamin init --storage ./test-spatial --modules bionty

In [None]:
import warnings

warnings.filterwarnings("ignore")

import lamindb as ln
import scanpy as sc
import spatialdata as sd
import spatialdata_plot as pl
import squidpy as sq
from matplotlib import pyplot as plt

ln.track()

## Query by biological metadata

We'll work with a human lung cancer dataset generated using the 10x Genomics Xenium platform and available in a public instance.
This FFPE (formalin-fixed paraffin-embedded) tissue sample includes spatial gene expression profiles.

Let's query the database to find Xenium datasets from lung tissue:

In [None]:
all_xenium_data = (
    ln.Artifact.connect("laminlabs/lamindata")
    .filter(assay="Xenium Spatial Gene Expression", tissue="lung")
)
all_xenium_data.to_dataframe()
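
As a conceptual aid, the keyword filter above can be read as "keep every record whose `assay` and `tissue` both match". The sketch below mimics that logic on a few made-up records in plain Python. The record keys, the `filter_records` helper, and the second assay name are hypothetical and only illustrate the idea. This is not how lamindb executes the query.

```python
# Toy, made-up metadata records; a real registry holds many more fields.
records = [
    {"key": "xenium_lung.zarr", "assay": "Xenium Spatial Gene Expression", "tissue": "lung"},
    {"key": "xenium_breast.zarr", "assay": "Xenium Spatial Gene Expression", "tissue": "breast"},
    {"key": "other_lung.zarr", "assay": "Some Other Assay", "tissue": "lung"},
]


def filter_records(rows, **conditions):
    """Keep rows for which every keyword condition matches exactly (logical AND)."""
    return [
        row
        for row in rows
        if all(row.get(field) == value for field, value in conditions.items())
    ]


hits = filter_records(records, assay="Xenium Spatial Gene Expression", tissue="lung")
print([row["key"] for row in hits])
```

Only the record that satisfies both conditions survives, which is why combining `assay` and `tissue` narrows the result compared to filtering on either one alone.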

## Analyze spatial data

Spatial datasets stored as SpatialData objects can easily be examined and analyzed through the SpatialData framework, [squidpy](https://github.com/scverse/squidpy), and [scanpy](https://github.com/scverse/scanpy):

In [None]:
sdata = all_xenium_data[0].load()

Use `spatialdata-plot` to get an overview of the dataset:

In [None]:
axes = plt.subplots(1, 2, figsize=(10, 10))[1].flatten()
sdata.pl.render_images("he_image", scale="scale4").pl.show(ax=axes[0], title="H&E image")
sdata.pl.render_images("morphology_focus", scale="scale4").pl.show(ax=axes[1], title="Morphology image")

We can visualize the segmentation masks by rendering the shapes from the SpatialData object:

In [None]:
def crop0(x):
    return sd.bounding_box_query(
        x,
        min_coordinate=[20000, 7000],
        max_coordinate=[22000, 7500],
        axes=("x", "y"),
        target_coordinate_system="global",
    )

crop0(sdata).pl.render_images("he_image", scale="scale2").pl.render_shapes(
    "cell_boundaries", fill_alpha=0, outline_alpha=0.5
).pl.show(
    figsize=(15, 10),
    title="H&E image & cell boundaries",
    coordinate_systems="global",
)
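
To build intuition for what a bounding box query selects, here is a minimal NumPy sketch that applies the same `min_coordinate` and `max_coordinate` values to a handful of made-up points. It covers only point coordinates that are already expressed in the queried coordinate system. How the library handles images, polygons, and coordinate transformations is not shown here.

```python
import numpy as np

# Toy (x, y) coordinates, assumed to be in the queried coordinate system already.
points = np.array(
    [
        [20500.0, 7200.0],  # inside the box
        [19999.0, 7100.0],  # x below the minimum
        [21000.0, 7600.0],  # y above the maximum
        [21999.0, 7001.0],  # inside the box
    ]
)

min_coordinate = np.array([20000.0, 7000.0])
max_coordinate = np.array([22000.0, 7500.0])

# A point is kept only if every coordinate lies within [min, max].
inside = np.all((points >= min_coordinate) & (points <= max_coordinate), axis=1)
cropped = points[inside]
print(cropped)
```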


Xenium analyses typically operate on the `AnnData` object, which contains the count matrix along with cell and gene annotations.
It is stored in the `tables` attribute of the `SpatialData` object:

In [None]:
adata = sdata.tables['table']
adata

In [None]:
adata.obs

Calculate the quality control metrics on the AnnData object using `scanpy.pp.calculate_qc_metrics`:

In [None]:
sc.pp.calculate_qc_metrics(adata, percent_top=(10, 20, 50, 150), inplace=True)
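
To make the resulting columns easier to interpret, the following sketch recomputes a few of the metrics by hand on a tiny made-up count matrix (cells in rows, genes in columns). The variable names mirror the scanpy column names, but the code is a simplified illustration with `top_n = 2` instead of the `percent_top` values used above. It is not scanpy's implementation.

```python
import numpy as np

# Toy count matrix: 3 cells x 4 genes.
counts = np.array(
    [
        [5, 0, 3, 2],
        [0, 0, 1, 0],
        [4, 4, 0, 2],
    ]
)

# Per-cell metrics.
total_counts = counts.sum(axis=1)  # all counts in a cell
n_genes_by_counts = (counts > 0).sum(axis=1)  # genes with at least one count

# Percentage of a cell's counts that fall into its top_n most expressed genes.
top_n = 2
sorted_desc = np.sort(counts, axis=1)[:, ::-1]
pct_counts_in_top_n = sorted_desc[:, :top_n].sum(axis=1) / total_counts * 100

# Per-gene metric: number of cells in which the gene is detected.
n_cells_by_counts = (counts > 0).sum(axis=0)

print(total_counts, n_genes_by_counts, pct_counts_in_top_n, n_cells_by_counts)
```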

The percentage of control probes and control codewords can be calculated from the `obs` slot:

In [None]:
cprobes = (
    adata.obs["control_probe_counts"].sum() / adata.obs["total_counts"].sum() * 100
)
cwords = (
    adata.obs["control_codeword_counts"].sum() / adata.obs["total_counts"].sum() * 100
)
print(f"Negative DNA probe count %: {cprobes}")
print(f"Negative decoding count %: {cwords}")

Plot the distributions of total transcripts per cell, unique transcripts per cell, area of segmented cells, and the ratio of nucleus area to cell area:

In [None]:
fig, axs = plt.subplots(1, 4, figsize=(15, 4))

axs[0].set_title("Total transcripts per cell")
axs[0].hist(adata.obs["total_counts"], bins=100)

axs[1].set_title("Unique transcripts per cell")
axs[1].hist(adata.obs["n_genes_by_counts"], bins=100)

axs[2].set_title("Area of segmented cells")
axs[2].hist(adata.obs["cell_area"], bins=100)

axs[3].set_title("Nucleus ratio")
axs[3].hist(adata.obs["nucleus_area"] / adata.obs["cell_area"], bins=100)

plt.tight_layout()

Filter the cells based on the minimum number of counts required using `scanpy.pp.filter_cells` and the genes based on the minimum number of cells required with `scanpy.pp.filter_genes`.
The thresholds for both were chosen based on the plots above.

In [None]:
sc.pp.filter_cells(adata, min_counts=10)
sc.pp.filter_genes(adata, min_cells=10)
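
The order of the two calls matters: genes are evaluated only on the cells that survived the first filter. The NumPy sketch below reproduces this two-step logic on a made-up matrix, assuming that a cell passes when its total is at least `min_counts` and a gene passes when it is detected in at least `min_cells` cells. The thresholds are chosen for the toy data, not taken from the analysis above.

```python
import numpy as np

# Toy count matrix: 4 cells x 4 genes.
counts = np.array(
    [
        [6, 0, 5, 0],  # total 11
        [1, 0, 2, 0],  # total 3
        [0, 9, 4, 0],  # total 13
        [7, 0, 8, 1],  # total 16
    ]
)
min_counts = 10  # toy threshold on total counts per cell
min_cells = 2  # toy threshold on detecting cells per gene

# Step 1: drop cells with too few counts.
keep_cells = counts.sum(axis=1) >= min_counts
cells_filtered = counts[keep_cells]

# Step 2: drop genes detected in too few of the remaining cells.
keep_genes = (cells_filtered > 0).sum(axis=0) >= min_cells
filtered = cells_filtered[:, keep_genes]

print(keep_cells, keep_genes, filtered.shape)
```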

Normalize the total counts per cell using `scanpy.pp.normalize_total`, apply log transformation with `scanpy.pp.log1p`, perform principal component analysis using `scanpy.pp.pca`, and compute a neighborhood graph of observations with `scanpy.pp.neighbors`:

In [None]:
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
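
The first two steps can be sketched by hand. The example below assumes the default behavior of `normalize_total` when no `target_sum` is given, namely scaling every cell to the median of the per-cell totals, followed by a `log(1 + x)` transform. The matrix is made up, and the code illustrates the arithmetic rather than reproducing scanpy's implementation.

```python
import numpy as np

# Toy count matrix: 3 cells x 3 genes.
counts = np.array(
    [
        [2.0, 2.0, 0.0],  # total 4
        [10.0, 5.0, 5.0],  # total 20
        [1.0, 3.0, 4.0],  # total 8
    ]
)

# Scale each cell so that its counts sum to the median per-cell total.
totals = counts.sum(axis=1, keepdims=True)
target_sum = np.median(totals)
normalized = counts / totals * target_sum

# Log-transform to compress the dynamic range; zeros stay zero.
log_normalized = np.log1p(normalized)

print(target_sum)
print(normalized.sum(axis=1))
```

After normalization, differences in sequencing depth between cells no longer dominate the distances that PCA and the neighborhood graph are built on.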

Compute a UMAP embedding of the neighborhood graph using `scanpy.tl.umap` and cluster cells into groups using `scanpy.tl.leiden`:

In [None]:
sc.tl.umap(adata)
sc.tl.leiden(adata)
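
Both steps operate on the neighborhood graph computed earlier. As a rough illustration of what such a graph encodes, the sketch below builds an exact, unweighted k-nearest-neighbor graph on made-up 2-D coordinates with NumPy. This is a deliberate simplification: scanpy's neighbor search and edge weighting are more involved, and `n_neighbors = 2` is chosen only for the toy data.

```python
import numpy as np

# Toy low-dimensional coordinates forming two well-separated groups of cells.
embedding = np.array(
    [
        [0.0, 0.0],
        [0.1, 0.0],
        [0.0, 0.1],
        [5.0, 5.0],
        [5.1, 5.0],
        [5.0, 5.1],
    ]
)
n_neighbors = 2
n_cells = len(embedding)

# Pairwise Euclidean distances; a cell is not its own neighbor.
diff = embedding[:, None, :] - embedding[None, :, :]
distances = np.sqrt((diff**2).sum(axis=-1))
np.fill_diagonal(distances, np.inf)

# Indices of the n_neighbors closest cells for every cell.
neighbor_idx = np.argsort(distances, axis=1)[:, :n_neighbors]

# Symmetric boolean adjacency matrix of the kNN graph.
adjacency = np.zeros((n_cells, n_cells), dtype=bool)
rows = np.repeat(np.arange(n_cells), n_neighbors)
adjacency[rows, neighbor_idx.ravel()] = True
adjacency |= adjacency.T

print(adjacency.astype(int))
```

In this toy graph no edge connects the two groups, which is the kind of structure a graph-based clustering method can pick up as separate clusters.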

Visualize the annotations on the UMAP embedding and in spatial coordinates:

In [None]:
sc.pl.umap(
    adata,
    color=[
        "total_counts",
        "n_genes_by_counts",
        "leiden",
    ],
    wspace=0.1,
)

In [None]:
sq.pl.spatial_scatter(
    adata,
    library_id="spatial",
    shape=None,
    color=[
        "leiden",
    ],
)

For a full tutorial on how to perform analysis of Xenium data, we refer to [squidpy's Xenium tutorial](https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_xenium.html).

In [None]:
ln.finish()