# Posthoc Analysis of IMAGEN:
The preliminary results in our IMAGEN paper call for a more in-depth understanding of what contributes to the significant performance of the ML models at the four time-points: <br>
<li>Baseline (<b>BL</b>): Age <b>14</b></li>
<li>Follow-up 1 (<b>FU1</b>): Age <b>16</b></li>
<li>Follow-up 2 (<b>FU2</b>): Age <b>19</b></li>
<li>Follow-up 3 (<b>FU3</b>): Age <b>22</b></li>

<br>
Such an in-depth understanding can be achieved through the following follow-up analyses:

1. Summary statistics
2. Sensitivity analysis
3. Error analysis
4. Visualization SHAP

# 4. Visualization SHAP
## 4.1. SHAP values
 1. Which explainer estimates SHAP values best (or fastest) for the 4 models (GB, LR, SVM-lin, SVM-rbf)?
 2. How can SHAP values be saved and loaded?
 3. What should be visualized?
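
For the second question, a minimal sketch of a save/load round trip with `pickle`, the same mechanism the loading cell further down relies on. The helper names `save_shap` and `load_shap`, the toy values, and the file name are hypothetical and not part of `imagen_shap_visualization`; a nested list stands in for the real SHAP array.

```python
import os
import pickle
import tempfile


def save_shap(values, path):
    """Pickle SHAP values to `path`, creating the parent folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with open(path, "wb") as fp:
        pickle.dump(values, fp)


def load_shap(path):
    """Load SHAP values that were written by `save_shap`."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


# Toy stand-in for a (n_subjects x n_features) SHAP matrix (made-up numbers)
toy_values = [[0.1, -0.2, 0.0], [0.3, 0.05, -0.1]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "explainers", "toy-model0_shap.sav")
    save_shap(toy_values, path)
    restored = load_shap(path)
```

Pickle files should only be loaded from trusted locations, since unpickling can execute arbitrary code.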

In [1]:
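import os
import pickle

import h5py
import matplotlib.pyplot as plt
import shap

# project helpers such as to_shap
from imagen_shap_visualization import *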

In [None]:
# load the training data



In [2]:
# load the holdout data

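h5_path = "/ritter/share/data/IMAGEN/h5files/newholdout-fu3-espad-fu3-19a-binge-n102.h5"
data = h5py.File(h5_path, 'r')
data.keys(), data.attrs.keys()

# feature matrix (n_subjects x n_features) and the matching column names
X = data['X'][()]
# y = data[data.attrs['labels'][0]][:4]
X_col_names = data.attrs['X_col_names']
# boolean mask over subjects built from the 'sex' column, used later to compare groups
group_mask = data['sex'][()].astype(bool)
X.shape, len(X_col_names)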

((102, 723), 723)

In [None]:
MODEL = ["SVM-RBF"]
to_shap(MODEL, X)

skipping model GB
skipping model LR
skipping model SVM-lin
generating SHAP values for model = SVM-rbf ..


Permutation explainer: 103it [12:11:37, 426.19s/it]                          
Permutation explainer:   4%|▍         | 4/102 [27:01<9:38:28, 354.17s/it]
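
The progress bar above shows how slow permutation-based explanation gets with 723 features. To illustrate where the cost comes from, here is a minimal sketch of permutation sampling for Shapley values. It is not the `shap` library's implementation (which, among other things, uses a masker over many background rows). The function `permutation_shapley`, the toy linear model, and all numbers are hypothetical.

```python
import random


def permutation_shapley(f, x, background, n_permutations=10, seed=0):
    """Estimate Shapley values of f at x relative to one background row."""
    rng = random.Random(seed)
    m = len(x)
    phi = [0.0] * m
    n_evals = 0
    for _ in range(n_permutations):
        order = list(range(m))
        rng.shuffle(order)
        current = list(background)  # start with every feature "absent"
        prev = f(current)
        n_evals += 1
        for j in order:
            current[j] = x[j]       # switch feature j to its real value
            value = f(current)
            n_evals += 1
            phi[j] += value - prev  # marginal contribution of feature j
            prev = value
    return [p / n_permutations for p in phi], n_evals


# Toy additive model: its exact Shapley values are w_j * (x_j - b_j)
weights = [2.0, -1.0, 0.5]


def linear_model(row):
    return sum(w * v for w, v in zip(weights, row))


x = [1.0, 3.0, -2.0]
background = [0.0, 1.0, 0.0]
phi, n_evals = permutation_shapley(linear_model, x, background, n_permutations=5)
```

In this sketch every permutation costs one model call per feature plus one for the background row, so the number of calls per explained subject grows linearly with the number of features. With a slow model such as an RBF-kernel SVM, that multiplies quickly.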

In [None]:
# load the pickled SHAP values for the first SVM-rbf model
with open('explainers/SVM-rbf0_shap.sav', 'rb') as fp:
    load_shap_values = pickle.load(fp)
# make sure the output folder for the plots exists
os.makedirs("figures", exist_ok=True)

In [None]:
# plot
model_name = "SVM-rbf"  # assumed label for the titles, matching the file names below
for i in [0]:
    # 1. summary bar plot of feature importance
    shap.summary_plot(load_shap_values, features=X, feature_names=X_col_names, plot_type="bar", show=False)
    plt.title(model_name+": "+str(i))
    plt.savefig(f"figures/SVM-rbf{i}_bar.pdf", bbox_inches='tight')
    plt.close()  # start the next plot on a fresh figure

    # 2. swarm plot showing SHAP values vs feature values, ordered by feature importance
    shap.summary_plot(load_shap_values, features=X, feature_names=X_col_names, plot_type="dot", show=False)
    plt.title(model_name+": "+str(i))
    # plt.savefig("figures/{}_swarm.pdf".format(model_name+str(i)), bbox_inches='tight')
    plt.savefig(f"figures/SVM-rbf{i}_swarm.pdf", bbox_inches='tight')
    plt.close()

In [None]:
# 1. summary bar plot of feature importance
shap.summary_plot(load_shap_values, features=X, feature_names=X_col_names, plot_type="bar", show=False)
plt.title(model_name+": "+str(i))
plt.savefig("figures/SVM-rbf0_bar.pdf", bbox_inches='tight')
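
The bar summary plot ranks features by their mean absolute SHAP value across subjects. The same ranking can be sketched in plain Python, which is handy for listing the top features as text. The helper `mean_abs_importance`, the toy values, and the feature names are hypothetical.

```python
def mean_abs_importance(shap_rows, feature_names):
    """Return (feature, mean |SHAP|) pairs, sorted from most to least important."""
    n_rows = len(shap_rows)
    scores = [
        sum(abs(row[j]) for row in shap_rows) / n_rows
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)


# Toy SHAP matrix: 3 subjects x 3 features (made-up numbers)
toy_shap = [
    [0.5, -0.1, 0.0],
    [-0.3, 0.2, 0.1],
    [0.4, -0.3, -0.1],
]
toy_names = ["feat_a", "feat_b", "feat_c"]

ranking = mean_abs_importance(toy_shap, toy_names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking the absolute value first means features that push predictions strongly in both directions still rank high, even when their signed mean is close to zero.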

In [None]:
# 2. swarm plot showing shap values vs feature values ordered by feature importance
shap.summary_plot(load_shap_values, features=X, feature_names=X_col_names, plot_type="dot", show=False)
plt.title(model_name+": "+str(i))
# plt.savefig("figures/{}_swarm.pdf".format(model_name+str(i)), bbox_inches='tight')
plt.savefig("figures/SVM-rbf0_swarm.pdf", bbox_inches='tight')

In [None]:
shap.group_difference_plot(load_shap_values, group_mask=group_mask, feature_names=X_col_names, show=False)
plt.title(model_name+": "+str(i))
# plt.savefig("figures/{}_bar-sexdiff.pdf".format(model_name+str(i)), bbox_inches='tight')
plt.savefig("figures/SVM-rbf0_bar-sexdiff.pdf", bbox_inches='tight')
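
The group difference plot compares mean SHAP values between the subjects selected by `group_mask` and the rest. A minimal sketch of that quantity follows, assuming the difference is taken as masked mean minus unmasked mean. The helper `group_mean_difference` and the toy data are hypothetical.

```python
def group_mean_difference(shap_rows, group_mask, feature_names):
    """Per-feature difference: mean SHAP in the masked group minus the rest."""
    in_group = [row for row, flag in zip(shap_rows, group_mask) if flag]
    out_group = [row for row, flag in zip(shap_rows, group_mask) if not flag]
    if not in_group or not out_group:
        raise ValueError("both groups need at least one subject")
    diffs = {}
    for j, name in enumerate(feature_names):
        mean_in = sum(row[j] for row in in_group) / len(in_group)
        mean_out = sum(row[j] for row in out_group) / len(out_group)
        diffs[name] = mean_in - mean_out
    return diffs


# Toy SHAP matrix: 4 subjects x 2 features, first two subjects in the group
toy_shap = [
    [0.4, -0.2],
    [0.2, 0.0],
    [-0.1, 0.3],
    [0.1, 0.1],
]
toy_mask = [True, True, False, False]

diffs = group_mean_difference(toy_shap, toy_mask, ["feat_a", "feat_b"])
```

A positive entry means the feature pushes predictions higher, on average, for the masked group than for the other subjects.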