# Comparing Multiple Single-Cell Datasets with `cev`

In this notebook, we'll show how `jscatter` can be used in an application-specific widget, [Comparative Embedding Visualization](https://github.com/OzetteTech/comparative-embedding-visualization) (`cev` for short), to enhance the exploration and comparison of single-cell (and other) data.

➡️ [For more information about `cev`, see github.com/OzetteTech/comparative-embedding-visualization](https://github.com/OzetteTech/comparative-embedding-visualization)

We'll use the same single-cell data from [Mair et al., 2022](https://www.nature.com/articles/s41586-022-04718-w) that we already looked at in [1-Getting-Started.ipynb](1-Getting-Started.ipynb#Scalability). However, this time we will **_compare healthy against cancer tissue_** embeddings.

As before, the data was clustered with [Ozette](https://www.ozette.com/)'s [FAUST method](https://doi.org/10.1016/j.patter.2021.100372) and transformed with [Ozette's Annotation Transformation](https://github.com/flekschas-ozette/ismb-biovis-2022) prior to being embedded with [UMAP](https://umap-learn.readthedocs.io/en/latest/).

> 🚨 Shout-out Alert

One more shout-out to [Trevor](https://trevorma.nz/), who led the development of [`cev`](https://github.com/OzetteTech/comparative-embedding-visualization).

---

In [None]:
!mkdir -p data
!curl -L -C - -o data/mair-2022-tumor-006-ozette.pq https://storage.googleapis.com/flekschas/jupyter-scatter-tutorial/mair-2022-tumor-006-ozette.pq
!curl -L -C - -o data/mair-2022-tissue-138-ozette.pq https://storage.googleapis.com/flekschas/jupyter-scatter-tutorial/mair-2022-tissue-138-ozette.pq

In [None]:
import pandas as pd
mair_2022_tissue_ozette = pd.read_parquet("./data/mair-2022-tissue-138-ozette.pq")
mair_2022_tumor_ozette = pd.read_parquet("./data/mair-2022-tumor-006-ozette.pq")

First, we'll create two scatter plots with vanilla `jscatter` to get a sense of the healthy and cancer tissue embeddings.

In [None]:
from itertools import cycle
from jscatter import Scatter, compose, glasbey_light

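# Union of all phenotype labels that occur in either dataset
phenotypes = list(set(list(mair_2022_tumor_ozette.faustLabels.unique()) + list(mair_2022_tissue_ozette.faustLabels.unique())))

# Assign each phenotype a color, cycling through the palette if labels outnumber colors
colormap = dict(zip(phenotypes, cycle(glasbey_light[1:])))
# Draw the "0_0_0_0_0" phenotype in dark gray
colormap["0_0_0_0_0"] = (0.2, 0.2, 0.2, 1.0)

# Shared settings so that both scatter plots use the same visual encoding
config = dict(x='umapX', y='umapY', color_by='faustLabels', color_map=colormap, background_color="#111111", axes=False)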

mair_2022_tissue_scatter = Scatter(data=mair_2022_tissue_ozette, **config)
mair_2022_tumor_scatter = Scatter(data=mair_2022_tumor_ozette, **config)

compose([(mair_2022_tissue_scatter, "Healthy Tissue"), (mair_2022_tumor_scatter, "Cancer Tissue")], row_height=640)

At first glance, you'll likely notice that both embedding visualizations look fairly similar. This isn't surprising because the composition of cell types does not change drastically. Also, [Ozette's Annotation Transformation](https://github.com/flekschas-ozette/ismb-biovis-2022) ensures that both embeddings are aligned.
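To make "the composition of cell types does not change drastically" more concrete, one could compare the relative frequency of each label across the two datasets. The following is a minimal sketch under stated assumptions: the helper names `label_fractions` and `composition_shift` are hypothetical, and the toy labels stand in for the `faustLabels` columns of the two DataFrames loaded above.

```python
from collections import Counter


def label_fractions(labels):
    """Return the fraction of points that carry each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def composition_shift(labels_a, labels_b):
    """Per-label difference in relative frequency (b minus a)."""
    frac_a = label_fractions(labels_a)
    frac_b = label_fractions(labels_b)
    all_labels = sorted(set(frac_a) | set(frac_b))
    return {
        label: frac_b.get(label, 0.0) - frac_a.get(label, 0.0)
        for label in all_labels
    }


# Toy labels (hypothetical); in the notebook one could pass the
# `faustLabels` columns of the healthy and cancer DataFrames instead.
healthy = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
cancer = ["A"] * 45 + ["B"] * 30 + ["C"] * 25

shift = composition_shift(healthy, cancer)
for label, delta in shift.items():
    print(f"{label}: {delta:+.2f}")
```

Small absolute shifts for most labels would be consistent with two embeddings that look alike overall. A summary like this ignores where the points sit in the embedding, so it cannot tell you which clusters changed shape or neighbors.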

A big question is which clusters differ the most. To answer it, we can analyze the neighborhood graph of the points. Such computations are outside the scope of "vanilla" `jscatter`, but we can build a tool around them and use `jscatter` to drive the visualization and point selections.
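As a rough illustration of what a neighborhood-based comparison can look like, the sketch below computes, for each label, how often a point's nearest neighbors share its label, and then compares that value between two embeddings. This is an assumption-laden toy and not how `cev` computes its metrics: the function name `knn_label_purity`, the brute-force neighbor search, and the toy coordinates are all hypothetical.

```python
import math
from collections import defaultdict


def knn_label_purity(points, labels, k=3):
    """For each label, average the fraction of each point's k nearest
    neighbors that share the point's label."""
    per_label = defaultdict(list)
    for i, (p, label) in enumerate(zip(points, labels)):
        # Brute-force distances to every other point, nearest first
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        same = sum(n == label for n in neighbor_labels)
        per_label[label].append(same / k)
    return {label: sum(v) / len(v) for label, v in per_label.items()}


labels = ["A"] * 4 + ["B"] * 4

# Toy embedding 1: two well-separated clusters
embedding_a = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
# Toy embedding 2: one "B" point has moved into the "A" cluster
embedding_b = [(0, 0), (0, 1), (1, 0), (1, 1),
               (0.5, 0.5), (10, 10), (10, 11), (11, 10)]

purity_a = knn_label_purity(embedding_a, labels)
purity_b = knn_label_purity(embedding_b, labels)

# Labels with the largest change in purity differ the most
change = {label: abs(purity_a[label] - purity_b[label]) for label in purity_a}
most_changed = max(change, key=change.get)
print(most_changed, change)
```

A pure-Python loop like this compares every point with every other point, so it does not scale to hundreds of thousands of cells. That is one reason to wrap this kind of analysis in a dedicated tool.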

We built exactly such a tool: a widget called _Comparative Embedding Visualization_, or `cev` for short.

In [None]:
import pandas as pd
from cev.widgets import Embedding, EmbeddingComparisonWidget

tissue_ozette_embedding = Embedding.from_ozette(df=pd.read_parquet("./data/mair-2022-tissue-138-ozette.pq"))
tumor_ozette_embedding = Embedding.from_ozette(df=pd.read_parquet("./data/mair-2022-tumor-006-ozette.pq"))

tissue_vs_tumor = EmbeddingComparisonWidget(
    tissue_ozette_embedding,
    tumor_ozette_embedding,
    titles=["Healthy Tissue", "Cancer Tissue"],
    metric="abundance",
    selection="phenotype",
    auto_zoom=True,
    row_height=320,
    background_color="#111111",
)

tissue_vs_tumor

`cev` offers a bespoke interface to compare a pair of embeddings (left vs. right).