In [1]:
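# os, np and pd are used throughout the notebook, so import them explicitly
# instead of relying on the star imports below.
import os

import numpy as np
import pandas as pd

from two_site_model import *
from utils.evol_indices_manipulation import *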

%load_ext autoreload
%autoreload 2

np.set_printoptions(precision=4, suppress=True)
pd.set_option("display.precision", 3)

MSA_data_folder='./data/MSA'
MSA_weights_location='./data/weights'
VAE_checkpoint_location='./results/VAE_parameters/'
correlations_location='./results/correlations'
mutations_location='./data/mutations'
evol_indices_location = "./results/evol_indices"
comparisons_location = "./results/correlations/_comparisons"

In this notebook we show how to compute the evolutionary indices of single mutants under 2-site and EVE models, and how to compare them with validation scores from [this deep mutational scan](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3851721/).

## Conversion of Mi3 couplings to our gauge:

We first translate the 2-site couplings produced by Mi3 training into our conventions by loading them into an instance of `Two_site_model`:

In [3]:
Mi3_60 = Two_site_model(
    correlations_instance_location=correlations_location + os.sep + "PABP_38k/PABP_38k.Correlations",
    E_couplings_location="./data/Mi3_couplings/38k_run60/J.npy",
)

save_instance_to_file(
    correlations_location + os.sep + "PABP_38k/PABP_38k_Mi3_60.Two_site_model",
    Mi3_60,
)

Object successfully saved to "./results/correlations/PABP_38k/PABP_38k_Mi3_60.Two_site_model"
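
The gauge used by `Two_site_model` is not stated in this notebook, so the following is only an illustrative sketch of what a gauge change on pairwise couplings can look like. It assumes, purely for illustration, the common zero-sum gauge and a single `q x q` coupling block; the function name `to_zero_sum_gauge` and the toy data are hypothetical and not part of `two_site_model`.

```python
import numpy as np


def to_zero_sum_gauge(J_block: np.ndarray) -> np.ndarray:
    """Shift one q x q coupling block so every row and column sums to zero.

    Hypothetical helper: the zero-sum gauge is an assumption made for this
    sketch, not necessarily the convention used by Two_site_model.
    """
    row_mean = J_block.mean(axis=1, keepdims=True)  # mean over the second residue
    col_mean = J_block.mean(axis=0, keepdims=True)  # mean over the first residue
    total_mean = J_block.mean()                     # mean over both residues
    return J_block - row_mean - col_mean + total_mean


# Toy coupling block for one pair of sites (q = 21 symbols is an assumption).
rng = np.random.default_rng(0)
q = 21
J_ij = rng.normal(size=(q, q))
J_zs = to_zero_sum_gauge(J_ij)
```

In a full Potts model the row and column means removed here would be absorbed into the single-site fields, so that sequence energies differ only by a constant; the sketch shows the coupling part only.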


## Loading validation scores:

In [4]:
validation_df = pd.read_csv(
    "./data/validation/PABP_YEAST_Fields2013-singles.csv", sep=";", comment="#"
).rename(
    columns={
        "mutant": "mutations",
        "effect_prediction_epistatic": "EVMutation",
        "effect_prediction_independent": "independent",
        "linear": "experiment_linear",
    }
)

## Comparing EVE and Mi3 models:

After computing the single-mutant evolutionary indices for the trained EVE models using `compute_evol_indices.py`, we can compare them with the validation set and with those generated with Mi3 models:

In [5]:
list_EVE_models = [["EVE_400", evol_indices_location + os.sep + "EVE_400" + ".csv"],
                   ["EVE_1400", evol_indices_location + os.sep + "EVE_1400" + ".csv"],
                   ["EVE_NoBayes_200", evol_indices_location + os.sep + "EVE_NoBayes_200" + ".csv"]
                  ]

list_Mi3_models = [['Mi3_60', correlations_location + os.sep + "PABP_38k/PABP_38k_Mi3_60.Two_site_model"]]

offset = 123

list_mutations_location = mutations_location + os.sep + "PABP_all_singles" + ".csv"

out_file_location = comparisons_location + os.sep + "evol_indice_allmodels_38k.csv"


In [6]:
final_df = df_evol_indices_Spearman(validation_df, list_EVE_models, list_Mi3_models, 
                                     offset, list_mutations_location, out_file_location)
final_df

| Unnamed: 0 | independent | EVMutation | EVE_400 | EVE_1400 | EVE_NoBayes_200 | Mi3_60 |
|---|---|---|---|---|---|---|
| 0 | 0.424 | 0.593 | 0.572 | 0.603 | 0.605 | 0.594 |
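
The internals of `df_evol_indices_Spearman` are not shown here, so the following is only a minimal sketch of the kind of comparison involved: a Spearman correlation between each model's scores and the experimental measurements, computed as the Pearson correlation of ranks. The helper `spearman_vs_experiment`, the toy mutations, and the columns `model_a` and `model_b` are hypothetical; sign conventions and any preprocessing in the real function may differ.

```python
import pandas as pd


def spearman_vs_experiment(scores: pd.DataFrame, experiment_col: str) -> pd.Series:
    """Spearman correlation of every model column with the experimental column.

    Hypothetical helper, not the notebook's df_evol_indices_Spearman.
    """
    # Spearman correlation equals the Pearson correlation of the ranks.
    ranked = scores.dropna().rank(method="average")
    return ranked.drop(columns=experiment_col).corrwith(ranked[experiment_col])


# Toy data: four made-up single mutants with made-up scores.
toy = pd.DataFrame(
    {
        "mutations": ["A1C", "A1D", "A1E", "A1F"],
        "experiment_linear": [0.1, 0.4, 0.35, 0.8],
        "model_a": [1.0, 2.0, 3.0, 4.0],
        "model_b": [4.0, 3.0, 2.0, 1.0],
    }
).set_index("mutations")

result = spearman_vs_experiment(toy, "experiment_linear")
print(result)
```

On this toy input `model_a` gets a correlation of 0.8 and `model_b` gets -0.8, since `model_b` ranks the mutants in exactly the reverse order.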
