# Evaluator API Demo

In [1]:
import anndata as ad
import random
from sklearn.model_selection import train_test_split

from perturbench.analysis.benchmarks.evaluator import Evaluator

%reload_ext autoreload
%autoreload 2

This demo shows how to use the Evaluator API with the srivatsan20-transfer task.

## Create an Evaluator object

List all tasks available in the Evaluator class.

In [2]:
Evaluator.list_tasks()

['srivatsan20-transfer',
 'mcfaline23-transfer',
 'norman19-combo',
 'frangieh21-transfer',
 'jiang24-transfer']

Create an evaluator object for the srivatsan20-transfer task. This automatically downloads and processes the srivatsan20 dataset. For the mcfaline23-transfer and jiang24-transfer tasks, first run the notebooks in the notebooks/neurips2024/data_curation/ directory to preprocess the data.

In [6]:
srivatsan20_eval = Evaluator(
    task='srivatsan20-transfer',
    local_data_cache='../neurips2024/perturbench_data',
)
srivatsan20_eval

Loading processed data from: ../neurips2024/perturbench_data/sciplex3_processed.h5ad


Split summary: 
           train  val  test
('mcf7',)    132   29    30
('a549',)    132   30    29
('k562',)    132   29    30


<perturbench.analysis.benchmarks.evaluator.Evaluator at 0x7fa95526c450>

We can retrieve the train/val/test splits from the evaluator object as a dictionary of cell indexes using the `get_split` method.

In [7]:
split_dict = srivatsan20_eval.get_split()
# Print the number of cells in each split
for k, v in split_dict.items():
    print(k, len(v))

train 123044
val 28454
test 32358
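
A quick sanity check on a split dictionary of this shape is to confirm that no cell appears in more than one split. The sketch below is only an illustration: `toy_split` and `overlapping_splits` are hypothetical names, and the toy dictionary stands in for the real output of `get_split`.

```python
from itertools import combinations

# Hypothetical stand-in for a split dictionary:
# split name -> list of cell indexes (barcodes).
toy_split = {
    "train": ["cell_0", "cell_1", "cell_2", "cell_3"],
    "val": ["cell_4", "cell_5"],
    "test": ["cell_6", "cell_7"],
}


def overlapping_splits(split_dict):
    """Return the pairs of split names that share at least one cell."""
    return [
        (a, b)
        for a, b in combinations(split_dict, 2)
        if set(split_dict[a]) & set(split_dict[b])
    ]


# An empty list means the splits are disjoint.
print(overlapping_splits(toy_split))
```

The same function could be called on `split_dict` from the cell above, assuming its values are list-like collections of cell indexes as the printed lengths suggest.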


## Simulate perturbation response predictions

The evaluator expects model predictions as a dictionary of AnnData objects. For this demo, we'll simulate predictions by randomly subsampling and/or shuffling the reference data and treating each result as a different "model prediction".

In [8]:
adata = srivatsan20_eval.ref_adata
adata

AnnData object with n_obs × n_vars = 28454 × 9198
    obs: 'ncounts', 'well', 'plate', 'cell_line', 'replicate', 'time', 'dose_value', 'pathway_level_1', 'pathway_level_2', 'perturbation', 'target', 'pathway', 'dose_unit', 'celltype', 'disease', 'cancer', 'tissue_type', 'organism', 'perturbation_type', 'ngenes', 'percent_mito', 'percent_ribo', 'nperts', 'chembl-ID', 'cell_type', 'condition', 'cov_merged', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'transfer_split_seed42'
    var: 'ensembl_id', 'ncounts', 'ncells', 'gene_symbol', 'n_cells', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
    uns: 'hvg', 'log1p', 'rank_genes_groups_cov'
    layer

To stratify the subsample, we first combine the condition and cell type labels into a single column.

In [9]:
adata.obs['condition_cell_type'] = adata.obs['condition'].astype(str) + '_' + adata.obs['cell_type'].astype(str)

In [10]:
# Keep 75% of the cells, stratified by condition and cell type
sampled_cells, _ = train_test_split(
    adata.obs_names,
    test_size=0.25,
    stratify=adata.obs['condition_cell_type'],
    random_state=54
)

In [11]:
sampled_adata = adata[sampled_cells, :]
sampled_adata.shape

(21340, 9198)
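
To see why stratifying matters, the sketch below subsamples a toy table and checks that each condition/cell-type group keeps the same share of cells. It is an illustration under assumptions: it uses pandas `groupby(...).sample` rather than `train_test_split`, and the `obs` table and its labels are made up for the example.

```python
import pandas as pd

# Toy obs table (hypothetical labels): 40 cells in one condition/cell-type
# group and 20 cells in another.
obs = pd.DataFrame(
    {"condition_cell_type": ["drugA_a549"] * 40 + ["control_k562"] * 20},
    index=[f"cell_{i}" for i in range(60)],
)

# Keep 75% of the cells within every group, mirroring test_size=0.25.
sampled = obs.groupby("condition_cell_type").sample(frac=0.75, random_state=54)

# Group proportions before and after subsampling.
full_frac = obs["condition_cell_type"].value_counts(normalize=True)
sampled_frac = sampled["condition_cell_type"].value_counts(normalize=True)
counts = sampled["condition_cell_type"].value_counts().to_dict()

print(counts)
print((full_frac - sampled_frac).abs().max())
```

Because every group is sampled at the same rate, rare condition/cell-type combinations stay represented in the subsample instead of being dropped by chance.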

Next, we shuffle the condition labels within each cell type to create predictions that serve as a negative control.

In [12]:
random.seed(54)

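# Shuffle condition labels within each cell type so that expression
# profiles no longer match their perturbation labels
random_adata_list = []
for cell_type in sampled_adata.obs.cell_type.unique():
    random_adata_cl = sampled_adata[sampled_adata.obs.cell_type == cell_type, :].copy()
    # random.sample with k equal to the list length returns a random permutation
    random_adata_cl.obs['condition'] = random.sample(
        list(random_adata_cl.obs['condition'].astype(str)),
        k=random_adata_cl.n_obs,
    )
    random_adata_list.append(random_adata_cl)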

random_adata = ad.concat(random_adata_list)
random_adata.shape

(21340, 9198)
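
The shuffle above permutes labels only within a cell type, so the number of cells per condition in each cell type stays the same and only the pairing between cells and labels changes. A minimal pure-Python sketch of that property, using made-up labels:

```python
import random
from collections import Counter

random.seed(54)

# Toy (cell_type, condition) labels; the values are hypothetical.
cells = [
    ("a549", "control"), ("a549", "drugA"), ("a549", "drugA"), ("a549", "drugB"),
    ("k562", "control"), ("k562", "control"), ("k562", "drugB"),
]

shuffled = []
for cell_type in sorted({ct for ct, _ in cells}):
    conditions = [cond for ct, cond in cells if ct == cell_type]
    # random.sample with k == len(conditions) returns a random permutation.
    permuted = random.sample(conditions, k=len(conditions))
    shuffled.extend((cell_type, cond) for cond in permuted)

# Label counts per cell type are unchanged by the shuffle.
print(Counter(shuffled) == Counter(cells))
```

Preserving the label counts keeps the negative control comparable to the real data: any drop in the metrics comes from breaking the link between labels and expression, not from a change in group sizes.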

The evaluator takes a dictionary that maps each model name to its predictions.

In [13]:
simulated_predictions = {
    'sampled': sampled_adata,
    'random': random_adata,
}

We then evaluate our simulated model predictions

In [14]:
metrics_df = srivatsan20_eval.evaluate(
    model_predictions=simulated_predictions,
    return_metrics_dataframe=True,
)



We can then look at the summary metrics returned by the evaluation. Each value is the metric averaged across perturbations. The sampled predictions closely match the full observed data (cosine_logfc of about 0.92 and rank metrics of 0), while the shuffled predictions carry almost no perturbation-specific signal (cosine_logfc of about 0.01).

In [15]:
metrics_df

model                random   sampled
rmse_average         0.0268   0.005791
rmse_rank_average    0.4574   0.0
cosine_logfc         0.01067  0.9195
cosine_rank_logfc    0.4197   0.0
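
To build intuition for these numbers, the sketch below computes toy versions of the three kinds of metric with NumPy. It is not the perturbench implementation: the function names simply mirror the metric names in the table, the data are simulated, log fold change is assumed to be the difference from a control profile on a log scale, and the rank metric is assumed to be the fraction of other perturbations whose observed profile is closer to the prediction than the matching one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_perts, n_genes = 5, 50

# Toy mean expression per perturbation (rows) plus a shared control profile.
control = rng.normal(size=n_genes)
observed = control + rng.normal(size=(n_perts, n_genes))
good_pred = observed + 0.01 * rng.normal(size=(n_perts, n_genes))
bad_pred = np.roll(observed, 1, axis=0)  # every prediction mismatched


def rmse_average(pred, obs):
    """RMSE per perturbation, averaged across perturbations."""
    return float(np.mean(np.sqrt(np.mean((pred - obs) ** 2, axis=1))))


def cosine_logfc(pred, obs, ctrl):
    """Cosine similarity of predicted vs. observed change from control."""
    p, o = pred - ctrl, obs - ctrl
    cos = np.sum(p * o, axis=1) / (
        np.linalg.norm(p, axis=1) * np.linalg.norm(o, axis=1)
    )
    return float(np.mean(cos))


def rmse_rank_average(pred, obs):
    """Fraction of other perturbations closer to the prediction than the match."""
    # dist[i, j]: RMSE between prediction i and observed perturbation j
    dist = np.sqrt(np.mean((pred[:, None, :] - obs[None, :, :]) ** 2, axis=2))
    matched = np.diag(dist)
    closer = (dist < matched[:, None]).sum(axis=1)
    return float(np.mean(closer / (obs.shape[0] - 1)))


for name, pred in {"good": good_pred, "bad": bad_pred}.items():
    print(
        name,
        rmse_average(pred, observed),
        cosine_logfc(pred, observed, control),
        rmse_rank_average(pred, observed),
    )
```

Under these assumptions a rank of 0 means each prediction is closest to its own observed perturbation, which matches the pattern in the table: the sampled predictions score 0 on both rank metrics, while the shuffled ones do not.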
