# This notebook illustrates how to evaluate the performance of runs

The main function is `evaluate_project` from `evaluate.py`. Two things need to be defined to use it:
- which dataset directory to use:
    - `data_dir` = the directory of the test set to evaluate on
- which models to evaluate, supplied through **ONE** of three options:
    - `run_dir_list` = a list of predefined run directories
    - `project_dir` = a directory whose subdirectories are the models to evaluate
    - `wandb_project_name` = the name of a completed WandB project. With this option you must also provide `wandb_dir`, the directory where the project's outputs are saved
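One way to think of the `project_dir` option is as shorthand for a `run_dir_list` built from that directory's immediate subdirectories. The sketch below illustrates that idea under this assumption only; `list_run_dirs` is a hypothetical helper, not a function from `evaluate.py`, which may select runs differently (for example, by checking for saved model files).

```python
import os
import tempfile


def list_run_dirs(project_dir):
    """Return sorted paths of the immediate subdirectories of project_dir.

    Hypothetical helper: assumes each subdirectory holds one trained run.
    Plain files in project_dir are ignored.
    """
    return sorted(
        entry.path for entry in os.scandir(project_dir) if entry.is_dir()
    )


# Usage on a throwaway directory with two run folders and one stray file
with tempfile.TemporaryDirectory() as project_dir:
    os.mkdir(os.path.join(project_dir, 'run_1'))
    os.mkdir(os.path.join(project_dir, 'run_2'))
    open(os.path.join(project_dir, 'notes.txt'), 'w').close()
    run_dir_list = list_run_dirs(project_dir)
    run_names = [os.path.basename(p) for p in run_dir_list]

print(run_names)  # ['run_1', 'run_2']
```

If the assumption holds, passing such a list as `run_dir_list` should evaluate the same models as the `project_dir` option.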

Optionally, you can also set the output path `output_dir`, the filename prefix `output_prefix`, and `batch_size`. Use smaller batch sizes for bigger models, at the cost of a longer run time.

The output is:
- a CSV of the evaluation results with 2 rows per target of each model, one for raw and one for scaled predictions (so the CSV has *N(models) × N(targets) × 2* rows)
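Because each target gets one raw and one scaled row, a pivot table is a convenient way to compare the two side by side. The sketch below uses a small hand-made DataFrame whose column names follow the results table further down; the `'scaled'` label for the second `pred type`, the run name `run_a`, and all metric values are placeholders, not real results.

```python
import pandas as pd

# Toy stand-in for evaluation_results.csv: 1 model x 2 targets x 2 pred types.
# Column names follow the real CSV; 'scaled', 'run_a' and the numbers are placeholders.
results = pd.DataFrame({
    'run_dir': ['run_a'] * 4,
    'targets': ['PC-3_replicate_1', 'Panc1_replicate_1'] * 2,
    'pred type': ['raw', 'raw', 'scaled', 'scaled'],
    'eval type': ['whole'] * 4,
    'pr_corr': [0.1, 0.2, 0.3, 0.4],
})

# Expected row count: N(models) * N(targets) * 2
n_models = results['run_dir'].nunique()
n_targets = results['targets'].nunique()
assert len(results) == n_models * n_targets * 2

# One row per (run, target), one column per prediction type
comparison = results.pivot_table(index=['run_dir', 'targets'],
                                 columns='pred type',
                                 values='pr_corr')
print(comparison)
```

With the real file, the DataFrame returned by `pd.read_csv` can be used in place of `results`.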

If only MSE and Pearson's r are needed, set `fast=True` to compute them as running metrics instead, which is considerably faster (see the last section of this notebook).
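Running metrics avoid holding every prediction in memory: each batch only updates a handful of sums, and MSE and Pearson's r are computed from those sums at the end. The class below is a minimal NumPy sketch of that general technique for a single target; `RunningPearsonMSE` is a hypothetical name, and this is not the implementation inside `evaluate.py`.

```python
import numpy as np


class RunningPearsonMSE:
    """Streaming MSE and Pearson's r from per-batch sufficient statistics.

    A sketch of the general technique only; not the code in evaluate.py.
    """

    def __init__(self):
        self.n = 0
        self.sum_x = self.sum_y = 0.0
        self.sum_xx = self.sum_yy = self.sum_xy = 0.0
        self.sum_sq_err = 0.0

    def update(self, y_true, y_pred):
        # Flatten the batch and accumulate the sums needed for both metrics
        x = np.asarray(y_true, dtype=np.float64).ravel()
        y = np.asarray(y_pred, dtype=np.float64).ravel()
        self.n += x.size
        self.sum_x += x.sum()
        self.sum_y += y.sum()
        self.sum_xx += (x * x).sum()
        self.sum_yy += (y * y).sum()
        self.sum_xy += (x * y).sum()
        self.sum_sq_err += ((x - y) ** 2).sum()

    def mse(self):
        return self.sum_sq_err / self.n

    def pearsonr(self):
        # n * covariance and n * variances; the factors of n cancel in the ratio
        cov = self.sum_xy - self.sum_x * self.sum_y / self.n
        var_x = self.sum_xx - self.sum_x ** 2 / self.n
        var_y = self.sum_yy - self.sum_y ** 2 / self.n
        return cov / np.sqrt(var_x * var_y)


# Usage: feed synthetic batches and compare with the all-at-once results
rng = np.random.default_rng(0)
y_true = rng.poisson(3.0, size=(1000, 64)).astype(float)
y_pred = y_true + rng.normal(0.0, 1.0, size=y_true.shape)

metric = RunningPearsonMSE()
for start in range(0, len(y_true), 128):  # batches of 128 sequences
    metric.update(y_true[start:start + 128], y_pred[start:start + 128])

print(metric.mse(), np.mean((y_true - y_pred) ** 2))
print(metric.pearsonr(), np.corrcoef(y_true.ravel(), y_pred.ravel())[0, 1])
```

The naive sum-of-squares form is adequate in float64 at this scale; for very long streams or large-magnitude values, Welford-style updates are more numerically stable.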


In [1]:
%load_ext autoreload
%autoreload 2
import sys
sys.path.append('../scripts')
import evaluate
import pandas as pd

In [2]:
%%time
# evaluate a predefined list of run directories
output_dir = '../tutorial_outputs/'
evaluate.evaluate_project(data_dir='../data/tfr_datasets/i_2048_w_1/',
                          run_dir_list=['../tutorial_outputs'],
                          output_dir=output_dir,
                          batch_size=512)

USING PREDEFINED LIST OF RUNS




../tutorial_outputs




CPU times: user 6.9 s, sys: 9.59 s, total: 16.5 s
Wall time: 11.1 s


In [3]:
# read results from the default filename in output_dir (output_dir already ends with '/')
evaluation_results = pd.read_csv(output_dir + 'evaluation_results.csv', index_col='index')
evaluation_results.head(2)

| index | mse | js_per_seq | js_conc | poiss | pr_corr | sp_corr | targets | pred type | eval type | alpha | ... | metrics | model_fn | num_epochs | record_test | rev_comp | shuffle | sigma | verbose | run_dir | scaling_factors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.614011 | 0.517753 | 0.618847 | 0.976301 | 0.29898 | 0.163939 | PC-3_replicate_1 | raw | whole | False | ... | ['mse', 'pearsonr', 'poisson'] | basenjimod | 2 | False | True | True | 20 | True | ../tutorial_outputs |  |
| 1 | 0.31332 | 0.481459 | 0.497185 | 0.960087 | 0.429559 | 0.331635 | Panc1_replicate_1 | raw | whole | False | ... | ['mse', 'pearsonr', 'poisson'] | basenjimod | 2 | False | True | True | 20 | True | ../tutorial_outputs |  |


In [4]:
eval_type = 'whole'
pred_type = 'raw'

raw_whole_results = evaluation_results[(evaluation_results['pred type'] == pred_type) &
                                       (evaluation_results['eval type'] == eval_type)]
print('Average metrics per run')
# numeric_only=True skips text columns such as 'targets' (needed in pandas >= 2.0)
raw_whole_results.groupby('run_dir').mean(numeric_only=True)

Average metrics per run


| run_dir | mse | js_per_seq | js_conc | poiss | pr_corr | sp_corr | alpha | batch_size | bin_size | crop | ... | log_wandb | lr_decay | lr_patience | num_epochs | record_test | rev_comp | shuffle | sigma | verbose | scaling_factors |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ../tutorial_outputs | 0.463665 | 0.499606 | 0.558016 | 0.968194 | 0.364269 | 0.247787 | 0.0 | 64.0 | 32.0 | 1.0 | ... | 0.0 | 0.3 | 10.0 | 2.0 | 0.0 | 1.0 | 1.0 | 20.0 | 1.0 |  |


In [5]:
# per-cell-line Pearson's r, lowest first
raw_whole_results.sort_values('pr_corr')[['targets', 'pr_corr']]

| index | targets | pr_corr |
|---|---|---|
| 0 | PC-3_replicate_1 | 0.29898 |
| 1 | Panc1_replicate_1 | 0.429559 |


# A faster way to calculate Pearson's r and MSE

In [6]:
%%time
# same run list, but compute only MSE and Pearson's r as running metrics
output_dir = '../tutorial_outputs/'
evaluate.evaluate_project(data_dir='../data/tfr_datasets/i_2048_w_1/',
                          run_dir_list=['../tutorial_outputs'],
                          output_dir=output_dir,
                          batch_size=512, fast=True)

USING PREDEFINED LIST OF RUNS
../tutorial_outputs


13it [00:01,  8.23it/s]
13it [00:00, 324.21it/s]

CPU times: user 3.4 s, sys: 624 ms, total: 4.03 s
Wall time: 1.89 s



