# Large-Scale Embedding Benchmarks

This notebook shows how to run large-scale embedding benchmarks with scIB [(single-cell integration benchmark)](https://www.nature.com/articles/s41592-021-01336-8).

We use the GPU-accelerated implementation from scib-metrics: https://github.com/YosefLab/scib-metrics

Please follow the installation instructions in that repository.

*Note: installing Faiss can be difficult and may take some time*

*Running the full benchmarking suite on many cells can take many hours, even on GPUs with large amounts of memory (such as A100s) and with many threads.*
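The neighbor helpers below use faiss GPU classes such as `StandardGpuResources`. Those classes exist only in GPU-enabled faiss builds; a CPU-only install lacks them. It is worth checking this before starting a multi-hour run.

The sketch below does that check. The helper name `faiss_gpu_status` is hypothetical. It assumes `faiss.get_num_gpus()` is available and treats its absence as "no GPUs".

```python
from typing import Tuple


def faiss_gpu_status() -> Tuple[bool, int]:
    """Hypothetical helper: return (has_gpu_api, n_gpus) for the installed faiss."""
    try:
        import faiss
    except ImportError:
        # faiss is not installed at all
        return False, 0
    # GPU builds expose StandardGpuResources; CPU-only builds do not
    has_gpu_api = hasattr(faiss, "StandardGpuResources")
    # Assumption: get_num_gpus() exists; if it does not, report zero GPUs
    n_gpus = int(faiss.get_num_gpus()) if hasattr(faiss, "get_num_gpus") else 0
    return has_gpu_api, n_gpus


has_gpu_api, n_gpus = faiss_gpu_status()
print(f"faiss GPU API available: {has_gpu_api}, GPUs visible: {n_gpus}")
```

If `has_gpu_api` is `False`, calling the GPU helpers below should fail with an `AttributeError` on `faiss.StandardGpuResources`.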

## Load Imports and Define the Benchmark Function

In [1]:
import numpy as np
import scanpy as sc

from scib_metrics.benchmark import Benchmarker

import faiss

from scib_metrics.nearest_neighbors import NeighborsResults

# Faiss GPU-accelerated nearest-neighbor methods
def faiss_hnsw_nn(X: np.ndarray, k: int):
    """GPU HNSW (approximate) nearest-neighbor search using faiss.

    See https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md
    for index param details.

    Note: this helper is not used below. Depending on the faiss version,
    index_cpu_to_gpu may not support HNSW indexes.
    """
    X = np.ascontiguousarray(X, dtype=np.float32)
    res = faiss.StandardGpuResources()
    M = 32  # HNSW graph parameter: number of bi-directional links per node
    index = faiss.IndexHNSWFlat(X.shape[1], M, faiss.METRIC_L2)
    gpu_index = faiss.index_cpu_to_gpu(res, 0, index)
    gpu_index.add(X)
    distances, indices = gpu_index.search(X, k)
    del index
    del gpu_index
    # faiss returns squared L2 distances; take the square root
    return NeighborsResults(indices=indices, distances=np.sqrt(distances))


def faiss_brute_force_nn(X: np.ndarray, k: int):
    """GPU brute-force (exact) nearest-neighbor search using faiss."""
    X = np.ascontiguousarray(X, dtype=np.float32)
    res = faiss.StandardGpuResources()
    index = faiss.IndexFlatL2(X.shape[1])
    # Copy the exact L2 index to GPU 0
    gpu_index = faiss.index_cpu_to_gpu(res, 0, index)
    gpu_index.add(X)
    # Query every cell against all cells (including itself)
    distances, indices = gpu_index.search(X, k)
    del index
    del gpu_index
    # faiss returns squared L2 distances; take the square root
    return NeighborsResults(indices=indices, distances=np.sqrt(distances))
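`faiss.IndexFlatL2` performs an exact (brute-force) search and reports squared L2 distances, which is why both helpers apply `np.sqrt`. Every cell is both a database vector and a query. As a result, each cell's first neighbor is normally itself, at distance 0.

The sketch below reproduces the same computation on the CPU with NumPy, so the GPU output can be spot-checked on a small subset. Some caveats:

* The name `numpy_brute_force_nn` is hypothetical.
* It returns plain arrays rather than a `NeighborsResults`.
* It needs O(n²) memory.

```python
import numpy as np


def numpy_brute_force_nn(X: np.ndarray, k: int):
    """Exact kNN on the CPU (hypothetical reference for faiss_brute_force_nn).

    Returns (indices, distances), each of shape (n_cells, k), sorted by
    increasing Euclidean distance. Uses O(n_cells**2) memory.
    """
    X = np.asarray(X, dtype=np.float64)
    sq_norms = np.einsum("ij,ij->i", X, X)
    # Squared L2 distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Round-off can produce tiny negative values; clip them to zero
    np.maximum(sq_dists, 0.0, out=sq_dists)
    # Unordered k smallest per row, then sort those k by distance
    part = np.argpartition(sq_dists, kth=k - 1, axis=1)[:, :k]
    rows = np.arange(X.shape[0])[:, None]
    order = np.argsort(sq_dists[rows, part], axis=1)
    indices = part[rows, order]
    # Like the faiss helpers: convert squared distances to Euclidean distances
    distances = np.sqrt(sq_dists[rows, indices])
    return indices, distances
```

On a GPU machine, you could compare `faiss_brute_force_nn(X_small, k).distances` with the distances returned here using `np.allclose`. Use a loose tolerance, since faiss works in float32. Indices may differ where two neighbors are at (nearly) equal distance.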

In [2]:
import warnings
warnings.filterwarnings("ignore")
from scib_metrics.benchmark import Benchmarker, BioConservation, BatchCorrection
import pandas as pd

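# Benchmarking function: returns a DataFrame of scores, one row per embedding plus a "Metric Type" row
def benchmark(ad, label_key="cell_type", batch_key="sample_id", obsm_keys=("X_uce", "X_scGPT", "X_geneformer")):
    print(f"Running using CT key: {label_key}")
    biocons = BioConservation()
    # Batch-correction metrics with the PCR comparison metric turned off
    batchcons = BatchCorrection(pcr_comparison=False)

    bm = Benchmarker(
        ad,
        batch_key=batch_key,
        label_key=label_key,
        # A tuple default avoids a shared mutable default; Benchmarker gets a list
        embedding_obsm_keys=list(obsm_keys),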
        bio_conservation_metrics=biocons,
        batch_correction_metrics=batchcons,
        n_jobs=48,
    )
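    # Build the kNN graphs with the faiss GPU brute-force search defined above
    bm.prepare(neighbor_computer=faiss_brute_force_nn)
    bm.benchmark()
    # Raw (not min-max scaled) scores, so results from different samples can be averaged
    df = bm.get_results(min_max_scale=False)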
    return df
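A single `benchmark` call can run for a long time. It can therefore help to confirm up front that the label column, the batch column and every embedding key exist.

This is a minimal sketch using a hypothetical helper, `missing_benchmark_inputs`. It checks only that the keys are present, not their contents. It assumes `adata.obs` and `adata.obsm` support `in` membership tests, as AnnData's do.

```python
from typing import Iterable, List


def missing_benchmark_inputs(
    adata,
    label_key: str,
    batch_key: str,
    obsm_keys: Iterable[str],
) -> List[str]:
    """Hypothetical helper: list missing inputs (empty list if all are present)."""
    problems = []
    # Label and batch annotations live in adata.obs
    for key in (label_key, batch_key):
        if key not in adata.obs:
            problems.append(f"obs column missing: {key!r}")
    # Embeddings live in adata.obsm
    for key in obsm_keys:
        if key not in adata.obsm:
            problems.append(f"obsm embedding missing: {key!r}")
    return problems
```

For example, call `missing_benchmark_inputs(ad, cell_type_column, batch_column, ["X_uce", "X_scGPT", "X_geneformer"])` before the benchmarking loop. Raise an error if the returned list is non-empty.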

### Load the AnnData object

For this example, we benchmark cells from the developing mouse brain.

You can download an AnnData object with precalculated UCE, scGPT and Geneformer embeddings from [here](https://drive.google.com/drive/folders/1f63fh0ykgEhCrkd_EVvIootBw7LYDVI7).

In [3]:
ad = sc.read("developing_mouse_brain.h5ad", cache=True)
ad

AnnData object with n_obs × n_vars = 597668 × 18285
    obs: 'n_counts', 'n_genes', 'region', 'age', 'experiment', 'species', 'sex', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'cell_type', 'sex_old', 'abca_class', 'abca_subclass', 'abca_supertype', 'abca_cluster', 'abca_region', 'leiden_old', 'region_dissected', 'biosample_id', 'donor_id', 'species__ontology_label', 'disease', 'disease__ontology_label', 'organ', 'organ__ontology_label', 'library_preparation_protocol', 'library_preparation_protocol__ontology_label', 'cell_type_author', 'cell_type__ontology_label', 'supercluster'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'feature_name'
    uns: '10x_batch_colors', '_scvi_manager_uuid', '_scvi_uuid', 'age_colors', 'ages_ordered_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'log1p

In [4]:
cell_type_column = "supercluster"
batch_column = "donor_id"

In [5]:
len(ad.obs[cell_type_column].unique()) # Number of unique cell types

33

In [6]:
len(ad.obs[batch_column].unique()) # Number of unique batches

25
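The full dataset has 33 cell types and 25 batches. A uniform random subsample of 100,000 cells, as drawn below, can in principle miss very small groups, and that changes which labels and batches the metrics see.

For sampling without replacement, the expected number of cells of a group in the subsample is `count * sample_size / total`. The two helpers below are hypothetical and work on any iterable of labels, such as `ad.obs[cell_type_column]`.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set


def expected_subsample_counts(labels: Iterable[Hashable], sample_size: int) -> Dict[Hashable, float]:
    """Hypothetical helper: expected cells per label in a uniform random
    subsample drawn without replacement (count * sample_size / total)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n * sample_size / total for label, n in counts.items()}


def dropped_categories(full_labels: Iterable[Hashable], sub_labels: Iterable[Hashable]) -> Set[Hashable]:
    """Hypothetical helper: labels present in the full data but absent from a subsample."""
    return set(full_labels) - set(sub_labels)
```

Inside the sampling loop, `dropped_categories(ad.obs[cell_type_column], subsample_ad.obs[cell_type_column])` would list any cell type a given sample lost.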

# Running the Benchmark

Running the benchmark on the full dataset (597,668 cells) can take a very long time. Instead, we run it on ten random subsamples of 100,000 cells each and average the scores.
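The following gives a rough sense of the savings, assuming the exact neighbor search dominates the runtime. An all-pairs search over n cells evaluates n² distances per embedding.

Shrinking 597,668 cells to 100,000 therefore cuts that work by roughly 36× per sample, and by about 3.6× even after running all ten samples. The scIB metrics themselves scale differently, so treat this only as an indication for the neighbor step.

```python
full_cells = 597_668    # ad.n_obs, from the AnnData summary above
sample_cells = 100_000  # sample_size
n_samples = 10          # iterations of the sampling loop


def pairwise_distance_count(n_cells: int) -> int:
    """Distances evaluated by an exact all-pairs kNN search (queries = database)."""
    return n_cells * n_cells


per_sample_ratio = pairwise_distance_count(full_cells) / pairwise_distance_count(sample_cells)
total_ratio = pairwise_distance_count(full_cells) / (n_samples * pairwise_distance_count(sample_cells))
print(f"one sample: {per_sample_ratio:.1f}x fewer distances; all {n_samples} samples: {total_ratio:.2f}x fewer")
# → one sample: 35.7x fewer distances; all 10 samples: 3.57x fewer
```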

In [7]:
sample_size = 100_000 # number of cells

In [8]:
from tqdm.auto import tqdm
sample_score_dfs = []

for i in tqdm(range(10)):
    # benchmark one sample
    # sample is drawn with random state i
    subsample_ad = sc.pp.subsample(ad, copy=True, n_obs=sample_size, random_state=i)
    sample_df = benchmark(subsample_ad, label_key=cell_type_column, batch_key=batch_column)
    # show the subsampled AnnData object for this sample
    display(subsample_ad)
    # add it to the results for all samples
    sample_score_dfs.append(sample_df)

  0%|          | 0/10 [00:00<?, ?it/s]

Running using CT key: supercluster

Computing neighbors:   0%|          | 0/3 [00:00<?, ?it/s]
Computing neighbors:  33%|███▎      | 1/3 [00:02<00:04,  2.44s/it]
Computing neighbors:  67%|██████▋   | 2/3 [00:03<00:01,  1.61s/it]
Computing neighbors: 100%|██████████|

AnnData object with n_obs × n_vars = 100000 × 18285
    obs: 'n_counts', 'n_genes', 'region', 'age', 'experiment', 'species', 'sex', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'cell_type', 'sex_old', 'abca_class', 'abca_subclass', 'abca_supertype', 'abca_cluster', 'abca_region', 'leiden_old', 'region_dissected', 'biosample_id', 'donor_id', 'species__ontology_label', 'disease', 'disease__ontology_label', 'organ', 'organ__ontology_label', 'library_preparation_protocol', 'library_preparation_protocol__ontology_label', 'cell_type_author', 'cell_type__ontology_label', 'supercluster'
    var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'feature_name'
    uns: '10x_batch_colors', '_scvi_manager_uuid', '_scvi_uuid', 'age_colors', 'ages_ordered_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'log1p

# Final Scores

We aggregate the scores across all samples by taking, for each embedding, the mean and the standard deviation of every metric.

In [9]:
# Drop the "Metric Type" row (it holds strings, not scores), stack the per-sample tables and average per embedding
grouped_mean = pd.concat([df.drop("Metric Type").reset_index() for df in sample_score_dfs]).groupby("Embedding").agg("mean")

In [10]:
# Same as above, but the sample standard deviation (ddof=1) across samples
grouped_std = pd.concat([df.drop("Metric Type").reset_index() for df in sample_score_dfs]).groupby("Embedding").agg("std")
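Each per-sample table from `get_results` is indexed by `Embedding` and carries an extra `Metric Type` row of strings. That row is dropped before averaging.

The sketch below runs the same steps on two tiny tables. The numbers are made up (not results from this notebook) and there is a single metric column. It also casts to float, so the aggregates get a numeric dtype.

```python
import pandas as pd


def toy_results(score_uce: float, score_scgpt: float) -> pd.DataFrame:
    """Mimic the layout of a per-sample results table: one column per metric,
    one row per embedding, plus a 'Metric Type' row holding strings."""
    return pd.DataFrame(
        {"Isolated labels": [score_uce, score_scgpt, "Bio conservation"]},
        index=pd.Index(["X_uce", "X_scGPT", "Metric Type"], name="Embedding"),
    )


# Made-up scores for two samples, for illustration only
toy_dfs = [toy_results(0.70, 0.60), toy_results(0.72, 0.64)]

# Drop the string row, cast to float, and stack the samples
stacked = pd.concat([df.drop("Metric Type").astype(float).reset_index() for df in toy_dfs])
toy_mean = stacked.groupby("Embedding").mean()
toy_std = stacked.groupby("Embedding").std()  # sample standard deviation (ddof=1)
print(toy_mean)
print(toy_std)
```

Without the `astype(float)` cast, the columns keep the `object` dtype. That is most likely why `grouped_mean["Bio conservation"]` further down reports `dtype: object`.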

In [11]:
grouped_mean

| Embedding | Isolated labels | KMeans NMI | KMeans ARI | Silhouette label | cLISI | Silhouette batch | iLISI | KBET | Graph connectivity | PCR comparison | Batch correction | Bio conservation | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X_geneformer | 0.532857 | 0.321145 | 0.118405 | 0.479277 | 0.982301 | 0.868312 | 0.165735 | 0.497117 | 0.709678 | 0.0 | 0.448169 | 0.486797 | 0.471346 |
| X_scGPT | 0.627445 | 0.615529 | 0.351586 | 0.536652 | 0.998366 | 0.88578 | 0.137406 | 0.426221 | 0.872698 | 0.0 | 0.464421 | 0.625916 | 0.561318 |
| X_uce | 0.752708 | 0.727828 | 0.50454 | 0.594331 | 0.99963 | 0.860244 | 0.136796 | 0.401463 | 0.832073 | 0.0 | 0.446115 | 0.715807 | 0.607931 |
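The aggregate columns in this table are consistent with the scIB weighting:

* `Bio conservation` is the mean of the five bio-conservation metrics: Isolated labels, KMeans NMI, KMeans ARI, Silhouette label and cLISI.
* `Batch correction` is the mean of the five batch metrics: Silhouette batch, iLISI, KBET, Graph connectivity and PCR comparison. Its 0.0 score is included in that average.
* `Total` is 0.4 × batch correction + 0.6 × bio conservation.

All three are linear, so averaging over samples and then aggregating gives the same result as the reverse. The snippet below recomputes the aggregates from values copied out of the table above.

```python
BIO_METRICS = ["Isolated labels", "KMeans NMI", "KMeans ARI", "Silhouette label", "cLISI"]
BATCH_METRICS = ["Silhouette batch", "iLISI", "KBET", "Graph connectivity", "PCR comparison"]
METRICS = BIO_METRICS + BATCH_METRICS

# Per-metric mean scores copied from the grouped_mean table above
ROWS = {
    "X_geneformer": [0.532857, 0.321145, 0.118405, 0.479277, 0.982301,
                     0.868312, 0.165735, 0.497117, 0.709678, 0.0],
    "X_scGPT": [0.627445, 0.615529, 0.351586, 0.536652, 0.998366,
                0.88578, 0.137406, 0.426221, 0.872698, 0.0],
    "X_uce": [0.752708, 0.727828, 0.50454, 0.594331, 0.99963,
              0.860244, 0.136796, 0.401463, 0.832073, 0.0],
}


def aggregate_scores(values):
    """Return (batch correction, bio conservation, total) for one embedding."""
    scores = dict(zip(METRICS, values))
    bio = sum(scores[m] for m in BIO_METRICS) / len(BIO_METRICS)
    batch = sum(scores[m] for m in BATCH_METRICS) / len(BATCH_METRICS)
    # scIB weighting: 40% batch correction, 60% bio conservation
    total = 0.4 * batch + 0.6 * bio
    return batch, bio, total


for embedding, values in ROWS.items():
    batch, bio, total = aggregate_scores(values)
    print(f"{embedding}: batch={batch:.6f} bio={bio:.6f} total={total:.6f}")
```

The recomputed values match the table to within rounding of the sixth decimal, since the table was computed from unrounded per-metric scores.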


In [12]:
grouped_mean["Bio conservation"]

Embedding
X_geneformer    0.486797
X_scGPT         0.625916
X_uce           0.715807
Name: Bio conservation, dtype: object
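`grouped_std` is computed above but never displayed. One way to report both statistics together is a table of `mean ± std` strings. `mean_pm_std` is a hypothetical helper; it assumes the two tables share the same index and columns.

```python
import pandas as pd


def mean_pm_std(mean_df: pd.DataFrame, std_df: pd.DataFrame, decimals: int = 3) -> pd.DataFrame:
    """Hypothetical helper: combine aligned mean and std tables into 'mean ± std' strings."""
    fmt = f"{{:.{decimals}f}}".format
    # Cast to float first (the aggregated tables may have object dtype)
    means = mean_df.astype(float).apply(lambda col: col.map(fmt))
    stds = std_df.astype(float).reindex_like(mean_df).apply(lambda col: col.map(fmt))
    # Element-wise string concatenation of two aligned DataFrames
    return means + " ± " + stds
```

For example, `mean_pm_std(grouped_mean, grouped_std)[["Bio conservation", "Batch correction", "Total"]]` shows the aggregate scores with their spread across the ten samples.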