# Model Sensitivity Analysis
Maximizing the ELBO is a non-convex optimization problem, so the parameter estimates are sensitive to their initial values. We therefore refit the model with the chosen set of hyperparameters from 50 random initializations and select the best of the resulting fits.
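
To illustrate why several random initializations help, here is a minimal sketch of a multi-start search on a toy non-convex objective. The function `objective`, the plain gradient-ascent optimizer, and the search interval are illustrative assumptions that stand in for the ELBO and its optimizer; they are not taken from **model_sensitivity_fit.py**.

```python
import math
import random

def objective(x):
    # Toy non-convex function with several local maxima (a stand-in for the ELBO).
    return math.sin(3.0 * x) - 0.1 * x * x

def gradient(x):
    # Analytic derivative of the toy objective.
    return 3.0 * math.cos(3.0 * x) - 0.2 * x

def gradient_ascent(x0, step=0.01, n_iter=2000):
    # Plain fixed-step gradient ascent; it converges to the local maximum
    # of whichever basin the starting point x0 falls in.
    x = x0
    for _ in range(n_iter):
        x += step * gradient(x)
    return x

def multi_start(n_init=50, seed=0):
    # Run the optimizer from n_init random starting points and keep every fit.
    rng = random.Random(seed)
    fits = []
    for k in range(n_init):
        x0 = rng.uniform(-5.0, 5.0)
        x_hat = gradient_ascent(x0)
        fits.append((objective(x_hat), k, x0, x_hat))
    return fits

fits = multi_start()
# Tuples compare on their first entry, so max() picks the highest objective.
best_value, best_k, best_x0, best_x = max(fits)
distinct_optima = sorted({round(x_hat, 3) for _, _, _, x_hat in fits})
print("distinct local optima found:", distinct_optima)
print(f"best start #{best_k}: x = {best_x:.3f}, objective = {best_value:.4f}")
```

Different starting points end in different local optima, and only some of them reach the highest one; keeping the best of many runs is the same idea applied to the ELBO here.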

Stages of the Analysis
 + Python script for variational posterior computation: **model_sensitivity_fit.py**
 + Script to evaluate the model for 50 random initializations: **mem_model_sensitivity**
 + Analysis of the output based on the in-sample $LLPD$
 

#### Script to evaluate the model
The commands that call the Python script for parameter estimation are saved in the file **mem_model_sensitivity**.

Each line in **mem_model_sensitivity** calls the Python script **model_sensitivity_fit.py** for one choice of the parameters, for example:

*module purge ; module load slurm gcc python3 ; OMP_NUM_THREADS=1 python3 model_sensitivity_fit.py 100.0 50 0.219 0.06503 0.0 50 200 > logfile/50.log 2>&1*
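
A task file with one command per initialization could be generated along the following lines. This is a sketch under a loud assumption: the meaning of the positional arguments of **model_sensitivity_fit.py** is not documented here, so the sketch assumes that the sixth argument (the value matching the log file name) is the random seed. The helper `make_task_lines`, the `{seed}` placeholder, and the seed range are hypothetical.

```python
def make_task_lines(template, seeds):
    # Fill the {seed} placeholder once per seed; one resulting line = one task.
    return [template.format(seed=s) for s in seeds]

# ASSUMPTION: the sixth positional argument is the seed. Check the argument
# parsing in model_sensitivity_fit.py before relying on this template.
TEMPLATE = (
    "module purge ; module load slurm gcc python3 ; "
    "OMP_NUM_THREADS=1 python3 model_sensitivity_fit.py "
    "100.0 50 0.219 0.06503 0.0 {seed} 200 > logfile/{seed}.log 2>&1"
)

lines = make_task_lines(TEMPLATE, range(50))  # hypothetical seeds 0..49
task_file_text = "\n".join(lines) + "\n"
print(lines[0])
```

Writing `task_file_text` to **mem_model_sensitivity** would then give a file with 50 task lines, each logging to its own file under `logfile/`.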

#### Parameter estimation 
We run the script on the server with disBatch:
*sbatch -N [#node] -p [#partition] disBatch.py -t [#task on each node] [script_file]*

Example: *sbatch -N 2 -p ccm disBatch.py -t 25 mem_model_sensitivity* (2 nodes with 25 tasks each, so all 50 initializations run concurrently)



#### Model output analysis
Suppose the model output is saved in the folder **MMSens**. We load each output file, compute the $LLPD$ on the full data, and select the model with the largest $LLPD$.
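
As a reference for the selection criterion, the sketch below computes a log pointwise predictive density in its commonly used form, $\sum_i \log \frac{1}{S}\sum_s p(y_i \mid \theta_s)$, from a matrix of per-draw log-likelihoods, and then picks the run with the largest score. The exact quantity stored by **model_sensitivity_fit.py** may be defined or normalized differently (the notebook averages the stored values). The helper names and all numbers here are made up for illustration.

```python
import math

def log_mean_exp(values):
    # Numerically stable log of the average of exp(values).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def llpd_from_log_lik(log_lik):
    # log_lik[s][i] = log p(y_i | theta_s) for posterior draw s and observation i.
    n_obs = len(log_lik[0])
    return sum(log_mean_exp([row[i] for row in log_lik]) for i in range(n_obs))

def select_best(scores):
    # Index of the run with the largest score.
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up log-likelihoods: 2 posterior draws x 3 observations for each of 2 runs.
run_a = [[-3.2, -3.4, -3.1], [-3.3, -3.5, -3.0]]
run_b = [[-3.0, -3.3, -3.1], [-3.1, -3.2, -2.9]]
scores = [llpd_from_log_lik(run_a), llpd_from_log_lik(run_b)]
best = select_best(scores)
print("scores:", scores, "best run:", best)
```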


In [1]:
# load module 
import glob
import pickle
import numpy as np 
import pandas as pd

# Get the output file names, skipping any whose name contains 'sample'
folname = 'MMSens/'
fname_o = [f for f in glob.glob(folname + '*model_nb_cvtest.pkl')
           if 'sample' not in f]
#fname_o

In [None]:
import glob 
import pickle
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import hiplot as hip
import os 
import copy 
%matplotlib inline
## The variational posteriors from the sensitivity runs are read from ../results/sensitivity/models/
#print("Current working directory:", os.getcwd())  # Check where the script is running from
pattern = '../results/sensitivity/models/*model_nb_cvtest.pkl'
fname_o = glob.glob(pattern)
print("Looking for files matching:", pattern)
print("Files found:", fname_o)  # an empty list means nothing matched the pattern

In [None]:
# Find the relative file paths
#fname_o = glob.glob('../results/hyperparameter/*model_nb_cvtest.pkl')

# Iterate through each .pkl file and inspect its content; each .pkl should have 13 elements
for file in fname_o:
    print(f"Inspecting file: {os.path.relpath(file)}")
    try:
        with open(file, "rb") as f:
            data = pickle.load(f)
        
        # Check if the data is iterable (e.g., list, tuple, dict)
        if isinstance(data, (list, tuple)):
            print(f"File contains a {type(data).__name__} with {len(data)} elements.")
            for i, element in enumerate(data):
                print(f"  Element {i}: Type={type(element)}")
        elif isinstance(data, dict):
            print(f"File contains a dictionary with {len(data)} keys.")
            for key, value in data.items():
                print(f"  Key='{key}': Type={type(value)}")
        else:
            print(f"File contains a single object of type {type(data).__name__}.")
    except Exception as e:
        print(f"Error while reading {file}: {e}")
    print("-" * 50)


In [None]:
# Extract model output
out = np.empty((len(fname_o), 6))
for i, fname in enumerate(fname_o):
    if i % 10 == 0:
        print(i)  # progress indicator
    with open(fname, "rb") as f:
        (holdout_mask, llpd, n_test, l, m_seed, sp_mean,
         sp_var, h_prop, uid, nsample_o,
         Yte_fit, cv_test) = pickle.load(f)
    # Columns: index, rank, lambda, upsilon, mean LLPD, mean log-likelihood
    out[i] = [i, l, sp_mean, sp_var, np.mean(cv_test), np.mean(Yte_fit)]
    

In [3]:
pickle.dump(out, open('best_model_selected.pkl','wb'))  # save output 
out = pickle.load(open('best_model_selected.pkl','rb'))
outx = pd.DataFrame(out)
outx.columns = ['index','rank','lambda', 'upsilon', 'llpd' ,'Log-likelihood']
outx.head(10)

Unnamed: 0,index,rank,lambda,upsilon,llpd,Log-likelihood


In [4]:
# Get the file name and model output from the best model
best_setting = outx[outx['llpd'] == outx['llpd'].max()]
# best_setting is a DataFrame, so take a scalar (the first row if several tie)
i = int(best_setting['index'].iloc[0])
fname_o[i]

In [5]:
best_setting

| | index | rank | lambda | upsilon | llpd | Log-likelihood |
|---|---|---|---|---|---|---|
| 2 | 2.0 | 200.0 | 0.246 | 0.10063 | -3.258982 | -3.257405 |


<font color=blue>**Our analysis suggests that the MEM with seed 66 is the most appropriate, as it has the highest full-data LLPD.** </font>