[![Jupyter Notebook](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-usecases/blob/main/docs/spatial2.ipynb)


# Query and analyze spatial data

This guide assumes a SpatialData collection has already been created and briefly shows how to query and analyze it.

In [None]:
import lamindb as ln
import bionty as bt
import squidpy as sq
import scanpy as sc
import spatialdata_plot
import warnings

warnings.filterwarnings("ignore")

ln.track(project="spatial guide datasets")

## Query a SpatialData Collection

### By provenance metadata

Query the transform, e.g., by key:

In [None]:
transform = ln.Transform.get(key="spatial.ipynb")
transform

Query the artifact:

In [None]:
ln.Artifact.filter(transform=transform).df()

### By biological metadata

Spatial data stored in SpatialData format and curated with the {class}`~lamindb.curators.SpatialDataCurator` can be queried by its annotated features and labels.
Although we curated specific slots of the SpatialData artifacts, the labels are attached directly to the artifact:

In [None]:
experimental_factors = bt.ExperimentalFactor.lookup()

# 10x Xenium has an ln_ prefix because Python identifiers cannot start with a digit
all_xenium_data = ln.Artifact.filter(
    experimental_factors__name=experimental_factors.ln_10x_xenium.name
)
all_xenium_data.df()
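
The `ln_` prefix in the lookup above comes from a general Python constraint: an identifier cannot begin with a digit. The following is a minimal standard-library sketch of that constraint. The helper `to_attribute_name` and the example labels are hypothetical and are not lamindb's actual implementation:

```python
import re


def to_attribute_name(label: str, prefix: str = "ln_") -> str:
    """Turn a free-text label into a valid Python attribute name (illustrative only)."""
    # Lowercase and replace every run of non-alphanumeric characters with "_".
    name = re.sub(r"[^0-9a-zA-Z]+", "_", label.strip().lower()).strip("_")
    # Identifiers cannot start with a digit, so prepend a prefix in that case.
    if not name or name[0].isdigit():
        name = prefix + name
    return name


print("10x_xenium".isidentifier())      # False: starts with a digit
print(to_attribute_name("10x Xenium"))  # ln_10x_xenium
print(to_attribute_name("Visium"))      # visium
```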

### Inspect artifact metadata

Query all artifacts that measured the `celltype_major` feature:

In [None]:
# Only returns the Xenium datasets as the Visium dataset did not have annotated cell types
query_set = ln.Artifact.filter(feature_sets__features__name="celltype_major").all()
xenium_1_af, xenium_2_af = query_set[0], query_set[1]

In [None]:
xenium_1_af.describe()

In [None]:
xenium_1_af.view_lineage()

In [None]:
xenium_2_af.describe()

In [None]:
xenium_2_af.view_lineage()

## Analyze spatial data

Spatial datasets stored as SpatialData objects can be examined and analyzed with the SpatialData framework, [squidpy](https://github.com/scverse/squidpy), and [scanpy](https://github.com/scverse/scanpy):

In [None]:
xenium_1_sd = xenium_1_af.load()
xenium_1_sd

Use spatialdata-plot to get an overview of the dataset:

In [None]:
xenium_1_sd.pl.render_images(element="morphology_focus").pl.render_shapes(
    fill_alpha=0, outline_alpha=0.2
).pl.show(coordinate_systems="aligned")

For any Xenium analysis, we use the `AnnData` object, which contains the count matrix along with the cell and gene annotations.
It is stored in the `tables` slot of the SpatialData object:

In [None]:
xenium_adata = xenium_1_sd.tables["table"]
xenium_adata

In [None]:
xenium_adata.obs

Calculate the quality control metrics on the AnnData object using `scanpy.pp.calculate_qc_metrics`:

In [None]:
sc.pp.calculate_qc_metrics(xenium_adata, percent_top=(10, 20, 50, 150), inplace=True)
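
To make the resulting `obs` columns more concrete, here is a pure-Python sketch of what `total_counts`, `n_genes_by_counts`, and the `pct_counts_in_top_N_genes` columns requested via `percent_top` represent for a single cell. The toy count vector and the helper `qc_for_cell` are made up for illustration; scanpy computes these on the full matrix and adds further columns.

```python
def qc_for_cell(counts, top_n=(2, 3)):
    """Per-cell QC summaries for a list of raw gene counts (illustrative only)."""
    total = sum(counts)
    metrics = {
        "total_counts": total,
        # Number of genes with at least one count in this cell.
        "n_genes_by_counts": sum(1 for c in counts if c > 0),
    }
    ranked = sorted(counts, reverse=True)
    for n in top_n:
        # Share of this cell's counts captured by its n most abundant genes.
        pct = 100 * sum(ranked[:n]) / total if total else 0.0
        metrics[f"pct_counts_in_top_{n}_genes"] = pct
    return metrics


toy_cell = [5, 0, 3, 2, 0]  # hypothetical counts for five genes
print(qc_for_cell(toy_cell))
```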

The percentage of control probes and control codewords can be calculated from the `obs` slot:

In [None]:
cprobes = (
    xenium_adata.obs["control_probe_counts"].sum()
    / xenium_adata.obs["total_counts"].sum()
    * 100
)
cwords = (
    xenium_adata.obs["control_codeword_counts"].sum()
    / xenium_adata.obs["total_counts"].sum()
    * 100
)
print(f"Negative DNA probe count % : {cprobes}")
print(f"Negative decoding count % : {cwords}")
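
Note that the cell above computes a pooled percentage (summed control counts over summed total counts), which is generally not the same as averaging per-cell percentages. A small sketch with hypothetical numbers standing in for the `obs` columns illustrates the difference:

```python
# Hypothetical per-cell values standing in for columns of `xenium_adata.obs`.
obs = {
    "control_probe_counts": [0, 1, 0, 1],
    "total_counts": [100, 50, 200, 50],
}


def pooled_percentage(numerator, denominator):
    """Percentage of summed numerator counts over summed denominator counts."""
    return sum(numerator) / sum(denominator) * 100


pooled = pooled_percentage(obs["control_probe_counts"], obs["total_counts"])

# Alternative summary: compute a percentage per cell, then average.
per_cell = [
    100 * n / d
    for n, d in zip(obs["control_probe_counts"], obs["total_counts"])
]
per_cell_mean = sum(per_cell) / len(per_cell)

print(f"pooled: {pooled:.2f} %")
print(f"mean of per-cell percentages: {per_cell_mean:.2f} %")
```

The pooled value weights cells by their total counts, so cells with few transcripts influence it less than they would influence the per-cell mean.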

Normalize, embed, and cluster the data so that annotations can be visualized on the UMAP and in spatial coordinates:

In [None]:
xenium_adata.layers["counts"] = xenium_adata.X.copy()
sc.pp.normalize_total(xenium_adata, inplace=True)
sc.pp.log1p(xenium_adata)
sc.pp.pca(xenium_adata)
sc.pp.neighbors(xenium_adata)
sc.tl.umap(xenium_adata)
sc.tl.leiden(xenium_adata)
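
The first two preprocessing steps can be illustrated without scanpy. The sketch below assumes `normalize_total` is called without `target_sum`, in which case each cell is scaled to the median of the non-zero per-cell totals, followed by a natural `log(1 + x)` transform. The toy matrix and the helper `normalize_and_log` are hypothetical; the real functions operate on (often sparse) arrays and their handling of edge cases may differ.

```python
import math
from statistics import median


def normalize_and_log(matrix):
    """Scale each row to the median non-zero row total, then apply log(1 + x)."""
    totals = [sum(row) for row in matrix]
    positive = [t for t in totals if t > 0]
    target = median(positive) if positive else 0.0
    result = []
    for row, total in zip(matrix, totals):
        # Rows with zero total are left unscaled to avoid dividing by zero.
        scale = target / total if total else 1.0
        result.append([math.log1p(value * scale) for value in row])
    return result


toy_counts = [
    [1, 1, 2],  # total 4
    [4, 4, 8],  # total 16
    [2, 2, 4],  # total 8
]
normalized = normalize_and_log(toy_counts)
# After scaling, every row carries the same total, so rows that differ
# only in overall count depth become identical.
print(normalized[0] == normalized[1] == normalized[2])
```

This is why depth normalization precedes PCA and clustering: without it, cells would separate by total counts rather than by expression profile.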

In [None]:
sc.pl.umap(
    xenium_adata,
    color=[
        "total_counts",
        "n_genes_by_counts",
        "leiden",
    ],
    wspace=0.4,
)

In [None]:
sq.pl.spatial_scatter(
    xenium_adata,
    library_id="spatial",
    shape=None,
    color=[
        "leiden",
    ],
    wspace=0.4,
)

For a full tutorial on how to perform analysis of Xenium data, we refer to [squidpy's Xenium tutorial](https://squidpy.readthedocs.io/en/stable/notebooks/tutorials/tutorial_xenium.html).

In [None]:
ln.finish()