[![](https://img.shields.io/badge/Source%20on%20GitHub-orange)](https://github.com/laminlabs/lamin-spatial/blob/main/docs/vitessce.ipynb)

# Vitessce Data Preparation Tutorial

This tutorial is adapted from the data preparation examples in [the Vitessce documentation](https://vitessce.github.io/vitessce-python).

## 1. Setup

Install dependencies:

```shell
pip install vitessce
pip install 'lamindb[jupyter,aws]'
```

In [None]:
!lamin load laminlabs/lamindata  # load your instance

In [None]:
from urllib.request import urlretrieve
from anndata import read_h5ad
from vitessce import (
    VitessceConfig,
    Component as cm,
    AnnDataWrapper,
)
from vitessce.data_utils import (
    to_uint8,
    sort_var_axis,
    optimize_adata,
)

import lamindb as ln

In [None]:
# to track the current notebook
# run ln.track() to generate the stem_uid and version
ln.settings.transform.stem_uid = "BZhZQ6uIbkWv"
ln.settings.transform.version = "1"
ln.track()

## 2. Download and process data

For this example, we download a dataset from the [COVID-19 Cell Atlas](https://www.covid19cellatlas.org/index.healthy.html#habib17).

In [None]:
# From https://github.com/vitessce/vitessce-python/blob/main/demos/habib-2017/src/convert_to_zarr.py
def convert_h5ad_to_zarr(input_path, output_path):
    adata = read_h5ad(input_path)

    # Store an expression matrix with only the highly variable genes.
    adata = adata[:, adata.var["highly_variable"]].copy()

    # Reorder the genes axis after hierarchical clustering.
    leaf_list = sort_var_axis(adata.X, adata.var.index.values)
    adata = adata[:, leaf_list].copy()

    # Store expression matrix as uint8.
    adata.layers["X_uint8"] = to_uint8(adata.X, norm_along="var")

    adata = optimize_adata(
        adata, obs_cols=["CellType"], obsm_keys=["X_umap"], layer_keys=["X_uint8"]
    )

    adata.write_zarr(output_path)
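
The `norm_along="var"` argument above is meant to rescale each gene independently before the matrix is cast to 8-bit integers, so that lowly and highly expressed genes both use the full 0 to 255 range. The following pure-Python sketch illustrates that idea on a tiny cells x genes matrix. It is not Vitessce's implementation: the function name `scale_columns_to_uint8`, the truncation toward zero, and the handling of constant columns are assumptions made for illustration.

```python
def scale_columns_to_uint8(matrix):
    """Min-max scale each column (gene) of a cells x genes matrix to integers in 0..255."""
    n_cols = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_cols)]
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]

    scaled = []
    for row in matrix:
        out = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant column maps to 0 instead of dividing by zero.
            out.append(0 if span == 0 else int((value - col_min[j]) * 255 / span))
        scaled.append(out)
    return scaled


# Hypothetical toy data: 3 cells (rows) x 2 genes (columns).
expression = [
    [0.0, 10.0],
    [1.0, 20.0],
    [2.0, 30.0],
]
scaled = scale_columns_to_uint8(expression)
print(scaled)
```

Both genes end up spanning the same integer range even though their raw values differ by an order of magnitude, which is what makes the heatmap readable across genes.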

In [None]:
adata_filepath = "./habib17.processed.h5ad"
urlretrieve("https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad", adata_filepath)
zarr_filepath = "./habib_2017_nature_methods.h5ad.zarr"

convert_h5ad_to_zarr(adata_filepath, zarr_filepath)
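
When the notebook is re-run, the download step fetches the same file again. A minimal sketch of a guard around `urlretrieve` is shown here; the helper name `download_if_missing` is hypothetical and not part of Vitessce or LaminDB.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url, filepath):
    """Download url to filepath unless the file already exists; return the path."""
    path = Path(filepath)
    if not path.exists():
        # Create the target folder first so urlretrieve can write into it.
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(path))
    return path
```

With this helper, the download above could be written as `download_if_missing("https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad", adata_filepath)`. Note that the check only looks at whether the file exists, so a partially downloaded file from an interrupted run would need to be deleted by hand.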

## 3. Create the Vitessce configuration

Set up the configuration by adding the views and datasets of interest.

In [None]:
vc = VitessceConfig(
    schema_version="1.0.15",
    name="Habib et al., 2017 Nature Methods",
    description=(
        "Archived frozen adult human post-mortem brain tissue profiled by snRNA-seq"
        " (DroNc-seq)"
    ),
)
dataset = vc.add_dataset(name="Habib 2017").add_object(
    AnnDataWrapper(
        adata_path=zarr_filepath,
        obs_feature_matrix_path="layers/X_uint8",
        obs_embedding_paths=["obsm/X_umap"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
    )
)
obs_sets = vc.add_view(cm.OBS_SETS, dataset=dataset)
obs_sets_sizes = vc.add_view(cm.OBS_SET_SIZES, dataset=dataset)
scatterplot = vc.add_view(cm.SCATTERPLOT, dataset=dataset, mapping="UMAP")
heatmap = vc.add_view(cm.HEATMAP, dataset=dataset)
genes = vc.add_view(cm.FEATURE_LIST, dataset=dataset)
vc.layout(((scatterplot | obs_sets) / heatmap) | (obs_sets_sizes / genes));
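
In the layout expression, `|` places views side by side and `/` stacks them vertically. The toy model below shows how such an expression composes into a nested layout through Python operator overloading. It is only an illustration: `ToyView` and `ToyConcat` are hypothetical classes, not Vitessce's, and they only record the nesting.

```python
class ToyView:
    """Toy stand-in for a view; records only its name."""

    def __init__(self, name):
        self.name = name

    def __or__(self, other):
        # a | b: place a and b side by side (horizontal concatenation).
        return ToyConcat("h", [self, other])

    def __truediv__(self, other):
        # a / b: stack a on top of b (vertical concatenation).
        return ToyConcat("v", [self, other])

    def describe(self):
        return self.name


class ToyConcat(ToyView):
    """A horizontal ("h") or vertical ("v") group of views or groups."""

    def __init__(self, direction, children):
        self.direction = direction
        self.children = children

    def describe(self):
        sep = " | " if self.direction == "h" else " / "
        return "(" + sep.join(child.describe() for child in self.children) + ")"


names = ["scatterplot", "obs_sets", "heatmap", "obs_sets_sizes", "genes"]
toy = {name: ToyView(name) for name in names}

# Same shape as the layout above: a left column and a right column.
toy_layout = ((toy["scatterplot"] | toy["obs_sets"]) / toy["heatmap"]) | (
    toy["obs_sets_sizes"] / toy["genes"]
)
summary = toy_layout.describe()
print(summary)
```

Because `/` binds more tightly than `|` in Python, `a | b / c` parses as `a | (b / c)`; the explicit parentheses in the layout expression keep the intended grouping unambiguous.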

## 4. Ingest into LaminDB

Here is [a note](https://lamin.ai/laminlabs/lamindata/transform/WDjio16cQsdW5zKv) on folder upload speed and why we chose not to use the `.export(to="s3")` functionality of Vitessce.

In [None]:
from lamindb.integrations import save_vitessce_config

In [None]:
artifact = save_vitessce_config(vc, description="A VitessceConfig object")

In [None]:
artifact

In [None]:
artifact.delete(permanent=True)