# SVAE Training on High-dimensional Register Data

In this report, we train our SVAE models on the high-dimensional dataset, one of the two versions of the register dataset used in this thesis. The two versions are:

1. High-dimensional dataset: only features that are fully correlated with another feature or have zero variance are removed.
2. Moderate-dimensional dataset: features are removed using a correlation threshold of 0.9 and a variance threshold of 1%.
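The two filtering rules can be sketched with pandas as below. This is a minimal illustration under stated assumptions, not the preprocessing code used in the thesis: `filter_features` is a hypothetical helper, the "variance threshold of 1%" is read as an absolute variance cut-off of 0.01 (for a binary feature, a variance p(1 - p) of 0.01 corresponds to a prevalence of roughly 1%), and from each pair of columns whose absolute Pearson correlation reaches the threshold, the later column is dropped.

```python
import numpy as np
import pandas as pd


def filter_features(df: pd.DataFrame, var_thr: float = 0.0, corr_thr: float = 1.0,
                    tol: float = 1e-9) -> pd.DataFrame:
    """Hypothetical helper: drop low-variance columns, then one column of each highly correlated pair.

    var_thr=0.0, corr_thr=1.0 mimics the high-dimensional setting (zero variance, full correlation);
    var_thr=0.01, corr_thr=0.9 mimics the moderate-dimensional setting (assumed interpretation).
    """
    # 1) variance filter: keep only columns whose variance is strictly above var_thr
    variances = df.var(ddof=0)
    kept = df.loc[:, variances > var_thr]

    # 2) correlation filter: use only the upper triangle of |corr| so each pair is checked once
    corr = kept.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # drop a column if it reaches the threshold with any earlier column; tol absorbs rounding error
    to_drop = [col for col in upper.columns if (upper[col] >= corr_thr - tol).any()]
    return kept.drop(columns=to_drop)


toy = pd.DataFrame({
    "a": [0, 1, 0, 1],
    "b": [0, 1, 0, 1],   # duplicate of "a", fully correlated
    "c": [1, 1, 1, 1],   # zero variance
    "d": [0, 0, 1, 1],   # uncorrelated with "a"
})
print(list(filter_features(toy).columns))  # ['a', 'd']
```

Keeping the earlier column of each correlated pair makes the result depend on column order; any other tie-breaking rule (for example, keeping the column with the higher variance) would be an equally valid design choice.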

For each dataset, the SVAE hyperparameters are tuned with 5-fold cross-validation in two steps.

The hyperparameters of the **first step** are:
- Number of Hidden Layers in Encoder/Decoder 
- Number of Neurons in Hidden Layers of Encoder/Decoder 
- Number of Hidden Layers in Classifier 
- Number of Neurons in Hidden Layers of Classifier 
- Latent Size 

The hyperparameters of the **second step** are:
- alpha 
- beta 
- weight decay 
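To show why the search is split into two steps, the sketch below counts the settings each step evaluates, using the grid sizes defined later in this report (9 encoder/decoder layouts, 9 classifier layouts and 5 latent sizes in step 1; 4 values each of weight decay, alpha and beta in step 2), and compares them with a single joint grid. It assumes every setting is trained once per fold of the 5-fold cross-validation.

```python
from math import prod

NUM_FOLDS = 5

# grid sizes taken from the step 1 and step 2 cells of this report
step1_axes = {"encoder_layouts": 9, "classifier_layouts": 9, "latent_sizes": 5}
step2_axes = {"weight_decays": 4, "alphas": 4, "betas": 4}

step1_settings = prod(step1_axes.values())
step2_settings = prod(step2_axes.values())
two_step_settings = step1_settings + step2_settings   # the grids are searched one after the other
joint_settings = step1_settings * step2_settings      # every combination of both grids

print(f"two-step: {two_step_settings} settings, {two_step_settings * NUM_FOLDS} fold-level fits")
print(f"joint:    {joint_settings} settings, {joint_settings * NUM_FOLDS} fold-level fits")
```

The two-step scheme needs roughly 55 times fewer training runs, at the cost of assuming that the best architecture does not depend strongly on the loss hyperparameters.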



After the best hyperparameters have been found, we evaluate the resulting model on the validation test data, which was held out from the validation training data used for tuning.

### Importing required packages

In [1]:
import sys
import torch
import argparse
from torch.utils.data import DataLoader
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import StandardScaler
from custom_reg_dataset import RegisterDataset
from datetime import datetime
from timeit import default_timer as timer
import time
from statistics import mean as mean_calc

import os

import pytorch_warmup as warmup

from models.SVAE import SVAE
#from custom_dataset import TerraDataset
from utils.loss_fn import loss_fn_SVAE

from pytorchtools import EarlyStopping

from training_methods import cv_fold_maker, hyperparameter_tuner, model_test,latent_corr_calc

from sklearn.metrics import roc_auc_score
from sklearn.metrics import accuracy_score
import random
import numpy as np

## 1. Training on High-Dimensional Dataset

In this part of the report, we concentrate on training on the high-dimensional dataset. Before training, we fix the random seed, set the device to the GPU (falling back to the CPU when no GPU is available), and convert the training data into the dataloaders used to train the SVAEs.

In [2]:
seed = 1
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
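The cell above seeds only PyTorch, although `random` and `numpy` are also imported; if either library is used for shuffling or sampling (for example inside `cv_fold_maker`, whose code is not shown), its generator would need a seed too. A minimal sketch of a helper that seeds all three follows; `seed_everything` is a hypothetical name, and the cuDNN flags trade speed for repeatability without guaranteeing full determinism on GPU. This may matter here: the step 2 winner (alpha = 1, beta = 1, weight decay = 0.0001) appears to be the same configuration as the step 1 winner, yet its mean validation AUROC differs slightly (0.582561 vs 0.583205).

```python
import random

import numpy as np


def seed_everything(seed: int) -> None:
    """Hypothetical helper: seed Python, NumPy and (if installed) PyTorch generators."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:  # PyTorch not installed: nothing more to seed
        return
    torch.manual_seed(seed)           # seeds the generators on all devices
    torch.cuda.manual_seed_all(seed)  # explicit for every visible GPU; silently ignored without CUDA
    # prefer repeatable cuDNN kernel selection over autotuned speed
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(1)
first_draw = (random.random(), float(np.random.rand()))
seed_everything(1)
second_draw = (random.random(), float(np.random.rand()))
print(first_draw == second_draw)  # True
```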

We import the validation training data and the validation test data prepared for training on the high-dimensional dataset.

In [3]:
highdim_valtrain_5f = pd.read_csv("fused_data_valtrain_5f_d365.txt", sep = "\t")
highdim_valtest = pd.read_csv("fused_data_valtest_d365.txt", sep = "\t")

Next, we prepare the dataloaders for the training folds and the validation folds used in hyperparameter tuning.

In [4]:
num_folds = 5
batch_size = 64

Dataloader_list, dataset_val_list, highdims = cv_fold_maker(highdim_valtrain_5f, num_folds, batch_size, seed = seed)

The validation-fold data are then moved to the device (GPU) and collected in lists that are iterated over during 5-fold cross-validation.

The folds are rotated through the 5-fold cross-validation in the following order:
- Training Folds: 2, 3, 4, 5 / Validation Fold: 1
- Training Folds: 1, 3, 4, 5 / Validation Fold: 2
- Training Folds: 1, 2, 4, 5 / Validation Fold: 3
- Training Folds: 1, 2, 3, 5 / Validation Fold: 4
- Training Folds: 1, 2, 3, 4 / Validation Fold: 5
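The rotation above is leave-one-fold-out. The sketch below generates the same schedule; `fold_schedule` is a hypothetical helper for illustration only, since in this report the splitting itself is done by `cv_fold_maker`, whose internals are not shown.

```python
def fold_schedule(num_folds: int = 5):
    """Hypothetical helper: yield (training_folds, validation_fold) pairs, folds numbered from 1."""
    folds = list(range(1, num_folds + 1))
    for val_fold in folds:
        # every fold except the current validation fold is used for training
        train_folds = [f for f in folds if f != val_fold]
        yield train_folds, val_fold


for train_folds, val_fold in fold_schedule(5):
    print("Training Folds:", ", ".join(map(str, train_folds)), "/ Validation Fold:", val_fold)
```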

In [5]:
X_val_list = []
y_val_list = []

for dataset in dataset_val_list:
    X_val = dataset.x
    X_val = X_val.to(device)
    
    y_val = dataset.y
    y_val = y_val.to(device)
    
    X_val_list.append(X_val)
    y_val_list.append(y_val)

The code above creates lists of the validation fold data and the validation fold labels.

### 1.1 Hyperparameter Tuning Step 1: NN Architecture

In this first step of hyperparameter tuning, we tune the hyperparameters that define the neural network architecture.

The parameters held fixed throughout this tuning step are:

In [6]:
batch_size_list = [64]
w_decay_list = [10**-4]
alpha_list = [1]
beta_list = [1]
lr_list= [0.001]
es_thr = 100 

The parameters tuned in this step are:

In [7]:
#encoder number of layers = 1, 3, 5 / encoder number of neurons each layer = 128, 256, 512
encoder_layer_list = [[highdims, 128], [highdims, 256], [highdims, 512], [highdims, 128, 128, 128], \
                      [highdims, 256, 256, 256], [highdims, 512, 512, 512], [highdims, 128, 128, 128, 128, 128] ,\
                     [highdims, 256, 256, 256, 256, 256], [highdims, 512, 512, 512, 512, 512]]
#classifier number of layers = 1, 3, 5 / classifier number of neurons each layer = 128, 256, 512
classifier_layer_list = [[128, 1], [256, 1], [512, 1], [128,128,128, 1], [256,256,256, 1], [512, 512, 512, 1],\
                        [128, 128, 128, 128, 128, 1], [256, 256, 256, 256, 256, 1], [512, 512, 512, 512, 512, 1]]
#number of dimensions in the latent space
latent_size_list = [2, 8, 32, 64, 128]
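Assuming `hyperparameter_tuner` loops over the encoder layouts in the outer loop, then the classifier layouts, then the latent sizes (an assumption, but one that matches every `Model_number` shown in the step 1 results table), a model number can be mapped back to its setting with `itertools.product`. The sketch rebuilds the same 9 × 9 × 5 = 405-setting grid; `HIGHDIMS` is a hypothetical placeholder for the input size returned by `cv_fold_maker`, and `setting_of` is a hypothetical helper.

```python
from itertools import product

HIGHDIMS = 1000  # hypothetical placeholder; the real input size comes from cv_fold_maker

# same 9 layouts as above: 1, 3 or 5 hidden layers of 128, 256 or 512 neurons
layer_counts = [1, 3, 5]
neuron_counts = [128, 256, 512]
encoder_layers = [[HIGHDIMS] + [n] * k for k in layer_counts for n in neuron_counts]
classifier_layers = [[n] * k + [1] for k in layer_counts for n in neuron_counts]
latent_sizes = [2, 8, 32, 64, 128]

# assumed loop order: encoder (outer), classifier, latent size (inner)
step1_grid = list(product(encoder_layers, classifier_layers, latent_sizes))


def setting_of(model_number: int):
    """Map a 1-based Model_number from the results table to its (encoder, classifier, latent) setting."""
    return step1_grid[model_number - 1]


encoder, classifier, latent = setting_of(400)
print(len(step1_grid), len(encoder) - 1, encoder[1], len(classifier) - 1, classifier[0], latent)
# 405 5 512 5 256 128
```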

Now we use the *hyperparameter_tuner* function to train an SVAE model for every setting in this grid.

In [8]:
highdim_report_step1, highdim_loss_step1_list, highdim_loss_step1_names = hyperparameter_tuner(device, highdims, num_folds, 
                                                                                               Dataloader_list,
                         X_val_list, y_val_list, batch_size_list, w_decay_list, alpha_list, beta_list, lr_list,
                         encoder_layer_list, classifier_layer_list, latent_size_list, seed = seed, es_thr = es_thr)

Training start date is: 2023-05-08 17:26:11.982795
./net_weights/SVAE_models_08-05-2023_17-26-11 directory is created
08-05-2023_17-26-11_loss_logs directory is created under H:\Projects\My Thesis\loss_values\
1 settings have been checked and saved to report file
6 settings have been checked and saved to report file
11 settings have been checked and saved to report file
16 settings have been checked and saved to report file
21 settings have been checked and saved to report file
26 settings have been checked and saved to report file
31 settings have been checked and saved to report file
36 settings have been checked and saved to report file
41 settings have been checked and saved to report file
46 settings have been checked and saved to report file
51 settings have been checked and saved to report file
56 settings have been checked and saved to report file
61 settings have been checked and saved to report file
66 settings have been checked and saved to report file
71 settings have been 

#### Hyperparameter Tuning Step 1 Results

#### Mean AUROC

In [9]:
highdim_report_step1.sort_values(by=["CV_avg_val_auroc"], ascending = False)

| | Model_number | Seed | Batch_size | Encoder_num_neurons | Encoder_num_hidden_layers | Clf_num_neurons | Clf_num_hidden_layers | Latent_size | Alpha | Beta | ... | CV3_val_acc | CV4_val_acc | CV5_val_acc | CV1_val_auroc | CV2_val_auroc | CV3_val_auroc | CV4_val_auroc | CV5_val_auroc | CV_avg_val_acc | CV_avg_val_auroc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 399 | 400 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 1 | 1 | ... | 0.587395 | 0.565401 | 0.552013 | 0.569659 | 0.594262 | 0.593246 | 0.585840 | 0.573016 | 0.563818 | 0.583205 |
| 359 | 360 | 1 | 64 | 256 | 5 | 512 | 5 | 128 | 1 | 1 | ... | 0.556303 | 0.569620 | 0.521812 | 0.578966 | 0.613036 | 0.571067 | 0.580295 | 0.564298 | 0.540919 | 0.581532 |
| 224 | 225 | 1 | 64 | 256 | 3 | 512 | 5 | 128 | 1 | 1 | ... | 0.452941 | 0.576371 | 0.529362 | 0.571938 | 0.579105 | 0.592878 | 0.575815 | 0.585180 | 0.544034 | 0.580983 |
| 365 | 366 | 1 | 64 | 512 | 5 | 256 | 1 | 2 | 1 | 1 | ... | 0.609244 | 0.627848 | 0.618289 | 0.569762 | 0.587092 | 0.594570 | 0.573180 | 0.578116 | 0.612669 | 0.580544 |
| 376 | 377 | 1 | 64 | 512 | 5 | 128 | 3 | 8 | 1 | 1 | ... | 0.552941 | 0.590717 | 0.553691 | 0.564092 | 0.598918 | 0.578684 | 0.591941 | 0.566490 | 0.578024 | 0.580025 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 93 | 94 | 1 | 64 | 512 | 1 | 128 | 1 | 64 | 1 | 1 | ... | 0.526050 | 0.520675 | 0.557886 | 0.556105 | 0.541329 | 0.538957 | 0.524394 | 0.551548 | 0.534172 | 0.542467 |
| 121 | 122 | 1 | 64 | 512 | 1 | 128 | 5 | 8 | 1 | 1 | ... | 0.525210 | 0.552743 | 0.531040 | 0.526394 | 0.550791 | 0.535786 | 0.565294 | 0.525031 | 0.530654 | 0.540659 |
| 2 | 3 | 1 | 64 | 128 | 1 | 128 | 1 | 32 | 1 | 1 | ... | 0.527731 | 0.556118 | 0.536074 | 0.548734 | 0.550958 | 0.521035 | 0.545695 | 0.534826 | 0.545326 | 0.540250 |
| 22 | 23 | 1 | 64 | 128 | 1 | 256 | 3 | 32 | 1 | 1 | ... | 0.512605 | 0.532489 | 0.514262 | 0.538343 | 0.566210 | 0.510650 | 0.535378 | 0.542144 | 0.527139 | 0.538545 |
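`CV_avg_val_auroc` appears to be the plain arithmetic mean of the five per-fold AUROCs. Recomputing it for the top row (model 400) from the per-fold values in the table reproduces the reported value; the snippet is only this arithmetic check, not part of the training code.

```python
from statistics import mean

# per-fold validation AUROCs of model 400, copied from the table above
fold_aurocs = [0.569659, 0.594262, 0.593246, 0.585840, 0.573016]

cv_avg = mean(fold_aurocs)
print(round(cv_avg, 6))  # 0.583205
```

Note that the five best settings lie within about 0.003 of each other in mean AUROC, while the per-fold AUROCs of the best setting alone range from about 0.570 to 0.594, so the ordering among the leading settings should be read with some caution.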


Ranked by mean validation AUROC over the five folds, the best setting from step 1 hyperparameter tuning is:

- Number of Hidden Layers in Encoder/Decoder = 5
- Number of Neurons in Hidden Layers of Encoder/Decoder = 512
- Number of Hidden Layers in Classifier = 5
- Number of Neurons in Hidden Layers of Classifier = 256
- Latent Size = 128


### 1.2 Hyperparameter Tuning Step 2: Loss Hyperparameters

In this second step of hyperparameter tuning, we tune the weight decay, which regularizes the network weights, and the loss hyperparameters alpha and beta, which weight the terms of the SVAE loss.
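This report does not show `loss_fn_SVAE`, so exactly which terms alpha and beta scale is not stated here. A common formulation for a supervised VAE adds a beta-weighted KL divergence and an alpha-weighted classification loss to the reconstruction error; the NumPy sketch below follows that assumed formulation (`svae_loss` is a hypothetical name, and the squared-error reconstruction and binary cross-entropy are also assumptions). Weight decay, by contrast, is normally passed to the optimizer (for example the `weight_decay` argument of PyTorch optimizers) rather than added to this loss.

```python
import numpy as np


def svae_loss(x, x_hat, mu, logvar, y, p_hat, alpha=1.0, beta=1.0, eps=1e-7):
    """Hypothetical SVAE loss: reconstruction + beta * KL + alpha * classification (assumed weighting)."""
    # reconstruction: squared error summed over input features, averaged over the batch
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions, averaged over the batch
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu ** 2 - np.exp(logvar), axis=1))
    # binary cross-entropy of the classifier output; clipping avoids log(0)
    p = np.clip(p_hat, eps, 1 - eps)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return recon + beta * kl + alpha * bce


x = np.array([[0.0, 1.0], [1.0, 0.0]])
mu = np.array([[1.0], [0.0]])   # one latent dimension, two samples
logvar = np.zeros((2, 1))
y = np.array([1.0, 0.0])
p_hat = np.array([0.5, 0.5])

# perfect reconstruction, so only the weighted KL and classification terms remain
print(svae_loss(x, x, mu, logvar, y, p_hat, alpha=2.0, beta=10.0))
```

Under this assumed formulation, a larger beta pulls the latent codes toward the standard normal prior, while a larger alpha makes the latent space more focused on predicting the label.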

The parameters held fixed throughout this tuning step are:

In [12]:
batch_size_list = [64]
lr_list= [0.001]
es_thr = 100 

#Best number of neurons and hidden layers of the encoder/decoder found in the first tuning step:
encoder_layer_list = [[highdims, 512, 512, 512, 512, 512]]
#Best number of neurons and hidden layers of the classifier found in the first tuning step:
classifier_layer_list = [[256, 256, 256, 256, 256, 1]]

latent_size_list = [128]

The parameters tuned in this step are:

In [13]:
w_decay_list = [1, 10**-2, 10**-4, 10**-6]
alpha_list = [0.1, 1, 10, 100]
beta_list = [0.1, 1, 10, 100]
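The step 2 results table shows alpha and beta but hides the weight decay column. Assuming the tuner loops over weight decay in the outer loop, then alpha, then beta (the order in which the lists are passed to `hyperparameter_tuner`, and one that agrees with the alpha and beta values of every row shown in that table), the winning `Model_number` 38 can be decoded as a cross-check of the reported best weight decay. The decoding itself is an illustration, not output of the tuner.

```python
from itertools import product

w_decay_list = [1, 10**-2, 10**-4, 10**-6]
alpha_list = [0.1, 1, 10, 100]
beta_list = [0.1, 1, 10, 100]

# assumed loop order: weight decay (outer), alpha, beta (inner)
step2_grid = list(product(w_decay_list, alpha_list, beta_list))

wd_38, alpha_38, beta_38 = step2_grid[38 - 1]  # Model_number is 1-based
print(len(step2_grid), wd_38, alpha_38, beta_38)  # 64 0.0001 1 1
```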

Now we use the *hyperparameter_tuner* function to train SVAE models with the architecture selected in the first step.

In [14]:
highdim_report_step2,highdim_loss_step2_list, highdim_loss_step2_names = hyperparameter_tuner(device, highdims, num_folds, Dataloader_list,
                         X_val_list, y_val_list, batch_size_list, w_decay_list, alpha_list, beta_list, lr_list,
                         encoder_layer_list, classifier_layer_list, latent_size_list, seed = seed, es_thr = es_thr)

Training start date is: 2023-05-15 01:44:25.657980
./net_weights/SVAE_models_15-05-2023_01-44-25 directory is created
15-05-2023_01-44-25_loss_logs directory is created under H:\Projects\My Thesis\loss_values\
1 settings have been checked and saved to report file
6 settings have been checked and saved to report file
11 settings have been checked and saved to report file
16 settings have been checked and saved to report file
21 settings have been checked and saved to report file
26 settings have been checked and saved to report file
31 settings have been checked and saved to report file
36 settings have been checked and saved to report file
41 settings have been checked and saved to report file
46 settings have been checked and saved to report file
51 settings have been checked and saved to report file
56 settings have been checked and saved to report file
61 settings have been checked and saved to report file
Training end date is: 2023-05-16 09:24:39.487368
Total Training Time is: 07:4

#### Hyperparameter Tuning Step 2 Results

#### Mean AUROC

In [15]:
highdim_report_step2.sort_values(by=["CV_avg_val_auroc"], ascending = False)

| | Model_number | Seed | Batch_size | Encoder_num_neurons | Encoder_num_hidden_layers | Clf_num_neurons | Clf_num_hidden_layers | Latent_size | Alpha | Beta | ... | CV3_val_acc | CV4_val_acc | CV5_val_acc | CV1_val_auroc | CV2_val_auroc | CV3_val_auroc | CV4_val_auroc | CV5_val_auroc | CV_avg_val_acc | CV_avg_val_auroc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 38 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 1.0 | 1.0 | ... | 0.519328 | 0.577215 | 0.613255 | 0.564609 | 0.608070 | 0.580850 | 0.598203 | 0.561071 | 0.551665 | 0.582561 |
| 49 | 50 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 0.1 | 1.0 | ... | 0.452941 | 0.576371 | 0.529362 | 0.571938 | 0.579105 | 0.592878 | 0.575815 | 0.585180 | 0.544034 | 0.580983 |
| 38 | 39 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 1.0 | 10.0 | ... | 0.607563 | 0.592405 | 0.569631 | 0.562406 | 0.581693 | 0.595868 | 0.576676 | 0.574040 | 0.580487 | 0.578136 |
| 54 | 55 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 1.0 | 10.0 | ... | 0.535294 | 0.521519 | 0.456376 | 0.565510 | 0.570755 | 0.600122 | 0.580811 | 0.572416 | 0.535979 | 0.577923 |
| 50 | 51 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 0.1 | 10.0 | ... | 0.505042 | 0.514768 | 0.619966 | 0.566116 | 0.580103 | 0.582730 | 0.582789 | 0.574743 | 0.521918 | 0.577296 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61 | 62 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 100.0 | 1.0 | ... | 0.585714 | 0.602532 | 0.602349 | 0.536806 | 0.565063 | 0.570133 | 0.558719 | 0.547085 | 0.588897 | 0.555561 |
| 6 | 7 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 1.0 | 10.0 | ... | 0.609244 | 0.629536 | 0.619966 | 0.542119 | 0.547639 | 0.563258 | 0.559500 | 0.560264 | 0.614694 | 0.554556 |
| 7 | 8 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 1.0 | 100.0 | ... | 0.609244 | 0.629536 | 0.619966 | 0.564031 | 0.553112 | 0.538766 | 0.560090 | 0.554114 | 0.614694 | 0.554023 |
| 26 | 27 | 1 | 64 | 512 | 5 | 256 | 5 | 128 | 10.0 | 10.0 | ... | 0.556303 | 0.573840 | 0.573826 | 0.535424 | 0.550374 | 0.567288 | 0.563113 | 0.553173 | 0.557915 | 0.553874 |


Ranked by mean validation AUROC over the five folds, the best setting from step 2 hyperparameter tuning is:

- alpha = 1
- beta = 1
- weight decay = 0.0001

### 1.3 High-dimensional Dataset Hyperparameter Tuning Result

In the end, the best SVAE hyperparameter values for the high-dimensional dataset are:

- Number of Hidden Layers in Encoder/Decoder = 5
- Number of Neurons in Hidden Layers of Encoder/Decoder = 512
- Number of Hidden Layers in Classifier = 5
- Number of Neurons in Hidden Layers of Classifier = 256
- Latent Size = 128
- alpha = 1
- beta = 1
- weight decay = 0.0001

## 2. Testing on Validation Test Data

Having found the best SVAE hyperparameters for the high-dimensional dataset, we can evaluate their performance on the validation test data, which was not used in hyperparameter tuning.

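### 2.1 Testing on Validation Test Data for High-dimensional Data

The model evaluated on the validation test data uses the following hyperparameters and is trained once for each of 50 random seeds: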

- Number of Hidden Layers in Encoder/Decoder = 5
- Number of Neurons in Hidden Layers of Encoder/Decoder = 512
- Number of Hidden Layers in Classifier = 5
- Number of Neurons in Hidden Layers of Classifier = 256
- Latent Size = 128
- alpha = 1
- beta = 1
- weight decay = 0.0001

In [22]:
train_df = highdim_valtrain_5f
test_df = highdim_valtest
column_names = test_df.drop(columns = ["persistence_d365","pid"]).columns

In [23]:
batch_size = 64
lr_init = 0.001
es_thr = 100
es_active = True

In [24]:
#Hyperparameters tuned in step 1
encoder_layers = [highdims, 512, 512, 512, 512, 512]
classifier_layers = [256, 256, 256, 256, 256, 1]
latent_size = 128
#Hyperparameters tuned in step 2
w_decay = 0.0001
alpha = 1
beta = 1




In [25]:
report_df_highdim_data = pd.DataFrame(columns = ["seed", "best_auroc_epoch","best_loss_epoch", "accuracy", "auroc"])

In [26]:
seeds_list = list(range(1,500,10))

In [27]:
now = datetime.now()

# timestamp for file names, formatted as dd-mm-YYYY_HH-MM-SS
dt_string = now.strftime("%d-%m-%Y_%H-%M-%S")
seeds_corr_list = []

for i, seed in enumerate(seeds_list):
    print("seed is", seed)
    best_auroc_epoch, best_loss_epoch, acc_score, auroc_score, X_test, y_test, eval_z, eval_pred_labels, eval_pred_prob = model_test(device, 
           train_df, test_df, batch_size, w_decay, alpha, 
           beta, lr_init, encoder_layers, classifier_layers,latent_size, 
            epochs = 500, seed = seed, es_active = es_active, es_thr = es_thr, es_patience = 10, 
           datestring = dt_string, shuffling = False)
    
    #Creating new result row
    results = [seed, best_auroc_epoch, best_loss_epoch, acc_score, auroc_score]
    #adding created result row to the report
    report_df_highdim_data.loc[len(report_df_highdim_data)] = results
    #top variables correlated with latent features are calculated
    latent_f_top_vars_list = latent_corr_calc(X_test, column_names, eval_z)
    #adding the top variables calculated with the specified random seed
    seeds_corr_list.append(latent_f_top_vars_list)
    
 
    
current_dir = os.getcwd()
# build the path with os.path.join instead of hard-coded Windows separators
report_folder = os.path.join(current_dir, "hyperparameter_reports")
if not os.path.exists(report_folder):
    os.makedirs(report_folder)
    print("hyperparameter_reports directory is created under " + current_dir)
    
report_df_highdim_data.to_csv(os.path.join(report_folder,"highdim_data_valtest_report_"+ dt_string +  ".txt"), sep = "\t", index = False)


seed is 1
./net_weights/SVAE_models_16-05-2023_10-06-19/ directory is created
16-05-2023_10-06-19_loss_logs directory is created under H:\Projects\My Thesis\loss_values\
Accuracy Score on the test data is: 0.6075187969924812
ROC-AUC Score on the test data is: 0.6148487643645029
seed is 11
Accuracy Score on the test data is: 0.6270676691729323
ROC-AUC Score on the test data is: 0.6027614435604751
seed is 21
Accuracy Score on the test data is: 0.6075187969924812
ROC-AUC Score on the test data is: 0.5965063991698374
seed is 31
Accuracy Score on the test data is: 0.6225563909774436
ROC-AUC Score on the test data is: 0.6018774741535032
seed is 41
Accuracy Score on the test data is: 0.6210526315789474
ROC-AUC Score on the test data is: 0.5752142664975595
seed is 51
Accuracy Score on the test data is: 0.6210526315789474
ROC-AUC Score on the test data is: 0.5894154271878242
seed is 61
Accuracy Score on the test data is: 0.6285714285714286
ROC-AUC Score on the test data is: 0.5847649794381029
s

In [28]:
report_df_highdim_data["auroc"].mean()

0.5958437103655021

In [29]:
report_df_highdim_data["auroc"].std()

0.015188779651819707

In [30]:
print(" The AUROC Performance of Best SVAE model on high-dimensional dataset over 50 seeds is: ")

print(str(round(report_df_highdim_data["auroc"].mean(),4)) + " +- " + str(round(report_df_highdim_data["auroc"].std(), 4)))

 The AUROC Performance of Best SVAE model on high-dimensional dataset over 50 seeds is: 
0.5958 +- 0.0152
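The `+-` value above comes from `pandas.Series.std`, which by default is the sample standard deviation (`ddof=1`), whereas `numpy.std` defaults to the population version (`ddof=0`). The sketch below illustrates the difference on the seven per-seed AUROCs printed above; the other 43 seeds are truncated in this report, so these numbers do not reproduce 0.5958 ± 0.0152. If the uncertainty of the mean itself is meant, the standard error (`std / sqrt(n)`) is the relevant quantity.

```python
import math

import numpy as np
import pandas as pd

# ROC-AUC scores of the first seven seeds (1, 11, ..., 61), copied from the output above
aurocs = pd.Series([0.6148487643645029, 0.6027614435604751, 0.5965063991698374,
                    0.6018774741535032, 0.5752142664975595, 0.5894154271878242,
                    0.5847649794381029])

sample_std = aurocs.std()                          # pandas default: ddof=1
population_std = np.std(aurocs.to_numpy())         # numpy default: ddof=0
std_error = sample_std / math.sqrt(len(aurocs))    # standard error of the mean

print(f"{aurocs.mean():.4f} +- {sample_std:.4f} (sample std)")
print(f"population std: {population_std:.4f}, standard error: {std_error:.4f}")
```

Applied to the full result, a sample standard deviation of 0.0152 over 50 seeds corresponds to a standard error of the mean of about 0.0021.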


### Correlation calculation between input variables and latent dimensions
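The correlations themselves are computed by `latent_corr_calc` from `training_methods`, whose code is not shown. As an assumption-labelled sketch, the helper below computes the absolute Pearson correlation between every input variable and every latent dimension and returns the top variables per latent dimension. `top_latent_correlations` is a hypothetical name, and it returns plain Series rather than the MultiIndexed structure that the cell below unpacks with `droplevel`. Constant input columns give an undefined (NaN) correlation and are dropped, mirroring the `dropna()` calls in that cell.

```python
import numpy as np
import pandas as pd


def top_latent_correlations(X, z, column_names, top_k=10):
    """Hypothetical helper: top-k |Pearson r| between input columns and each latent dimension."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    Xc = X - X.mean(axis=0)   # centre the inputs
    zc = z - z.mean(axis=0)   # centre the latent codes
    norms = np.outer(np.sqrt((Xc ** 2).sum(axis=0)), np.sqrt((zc ** 2).sum(axis=0)))
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = (Xc.T @ zc) / norms   # (n_features, n_latent); 0/0 gives NaN for constant columns
    abs_corr = pd.DataFrame(np.abs(corr), index=column_names)
    return [abs_corr[j].dropna().sort_values(ascending=False).head(top_k)
            for j in range(z.shape[1])]


z = np.array([[1.0], [2.0], [3.0], [4.0]])   # one latent dimension, four samples
X = np.column_stack([
    [1, 2, 3, 4],   # "a": r = +1
    [4, 3, 2, 1],   # "b": r = -1, so |r| = 1
    [5, 5, 5, 5],   # "c": constant, NaN and dropped
    [1, 0, 0, 1],   # "d": r = 0
])
top = top_latent_correlations(X, z, ["a", "b", "c", "d"], top_k=3)
print(top[0])
```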

In [31]:
current_dir = os.getcwd()
latent_folder = os.path.join(current_dir, "latent_variable_correlations")
if not os.path.exists(latent_folder):
    os.makedirs(latent_folder)
    print("latent_variable_correlations directory is created under " + current_dir)

top_amount = 10                 # number of top-correlated input variables to report
seed_length = len(seeds_list)   # number of random seeds the correlations are averaged over

latent_total_corr_list = []
# iterate over each latent variable
for f in range(eval_z.shape[1]):
    f_dict = {}
    # iterate over the correlation list of each random seed
    for s, seed_corr in enumerate(seeds_corr_list):
        f_corr = seed_corr[f].droplevel(level = 0).dropna()
        if s == 0:
            # report and save the top variables of the first random seed separately
            f_corr_seed1 = pd.DataFrame({'Variable':f_corr.index, 'AVG_abs_corr':f_corr.values})
            print("Top "+ str(top_amount) + " variables correlated with latent feature " + str(f+1) + " for random seed " + str(seeds_list[s]))
            f_corr_seed1_sorted = f_corr_seed1.sort_values(by = "AVG_abs_corr", ascending = False)[:top_amount]
            print(f_corr_seed1_sorted)
            f_corr_seed1_sorted.to_csv(os.path.join(latent_folder,"highdim_top_" + str(top_amount) + "_correlations_with_latent_variable_" + str(f+1) + "_"  + dt_string +  "_no_shuffling_random_seed" + str(seeds_list[s]) + ".txt"), sep = "\t", index = True)

        # accumulate each variable's absolute correlation over the seeds
        # (items() avoids positional Series[int] lookup on a string index, deprecated in pandas 2.x)
        for var_name, corr_value in f_corr.items():
            f_dict[var_name] = f_dict.get(var_name, 0) + corr_value

    f_total_corr_df = pd.DataFrame.from_dict(f_dict, orient = "index").rename(columns = {0:"AVG_abs_corr"})
    latent_total_corr_list.append(f_total_corr_df)

for i, latent_total_corr in enumerate(latent_total_corr_list):
    # dividing by the total number of seeds counts a variable missing from a seed as 0
    latent_avg_corr = latent_total_corr / seed_length
    print("Top "+ str(top_amount) + " variables correlated with latent feature " + str(i+1) + " averaged in " + str(seed_length) + " random seeds.")
    latent_avg_corr_sorted = latent_avg_corr.sort_values(by = "AVG_abs_corr",ascending = False)[:top_amount]
    print(latent_avg_corr_sorted)

    latent_avg_corr_sorted.to_csv(os.path.join(latent_folder,"highdim_top_" + str(top_amount) + "_correlations_with_latent_variable_" + str(i+1) + "_"  + dt_string +  "_no_shuffling_" + str(seed_length) + " seeds" + ".txt"), sep = "\t", index = True)


Top 10 variables correlated with latent feature 1 for random seed 1
      Variable  AVG_abs_corr
0  ATC01_N02AG      0.149808
1    ICD01_O80      0.134327
2  ATC10_R01AD      0.124578
3  ATC10_B03BA      0.124304
4  ATC10_R01AC      0.121373
5  ATC01_G02BB      0.115956
6    ICD10_Z53      0.114751
7    ICD01_D25      0.112630
8    ICD10_N71      0.112621
9    ICD01_Z37      0.111198
Top 10 variables correlated with latent feature 2 for random seed 1
      Variable  AVG_abs_corr
0  ATC01_N02BA      0.172295
1    ICD10_D07      0.129577
2    ICD01_R00      0.124491
3    ICD01_J90      0.120814
4    ICD10_J68      0.118486
5  ATC10_L01BC      0.118486
6    ICD10_R91      0.112542
7    ICD01_R10      0.111117
8    ICD10_B98      0.110924
9  ATC01_C10AB      0.110924
Top 10 variables correlated with latent feature 3 for random seed 1
    Variable  AVG_abs_corr
0  ICD10_E83      0.125471
1  ICD10_G00      0.122241
2  ICD01_D61      0.122241
3  ICD10_N17      0.122241
4  ICD10_G94      0.122

Top 10 variables correlated with latent feature 23 for random seed 1
      Variable  AVG_abs_corr
0    ICD01_J30      0.164738
1    ICD01_A09      0.134015
2  ATC01_C09CA      0.123173
3    ICD01_H25      0.122389
4  ATC01_N05CF      0.119237
5  ATC10_C09CA      0.119050
6  ATC01_L04AA      0.116168
7  ATC01_J01EA      0.115790
8    ICD10_E10      0.111684
9    ICD10_J01      0.108484
Top 10 variables correlated with latent feature 24 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_M67      0.117003
1    ICD01_K52      0.106585
2    ICD10_Z99      0.100078
3    ICD10_L29      0.099119
4    ICD10_L73      0.097201
5    ICD10_C64      0.096062
6    ICD01_I73      0.095282
7    ICD10_F29      0.094903
8    ICD10_R60      0.093542
9  ATC01_N02BE      0.092255
Top 10 variables correlated with latent feature 25 for random seed 1
      Variable  AVG_abs_corr
0    ICD01_K58      0.122931
1  ATC10_D06BB      0.119822
2    ICD10_K80      0.113101
3    ICD01_S66      0.112359
4    ICD01

Top 10 variables correlated with latent feature 45 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_K90      0.116233
1    ICD01_O48      0.115989
2    ICD01_E14      0.115655
3    ICD01_Z91      0.113159
4    ICD10_M80      0.110100
5    ICD01_O61      0.109318
6    ICD10_J11      0.107761
7    ICD01_S30      0.107761
8  ATC01_M01CB      0.107761
9    ICD01_G25      0.107761
Top 10 variables correlated with latent feature 46 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_Z12      0.142189
1  ATC01_D01AC      0.136539
2    ICD10_S50      0.128146
3    ICD01_B34      0.128146
4    ICD10_L29      0.126361
5  ATC01_M04AC      0.120687
6    ICD10_S90      0.119970
7    ICD01_K31      0.118181
8  ATC10_N07AX      0.118181
9    ICD10_J14      0.118181
Top 10 variables correlated with latent feature 47 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_Z21      0.149091
1    ICD10_N97      0.143177
2  ATC10_R03BB      0.134681
3  ATC01_N02BE      0.130265
4    ICD10

Top 10 variables correlated with latent feature 67 for random seed 1
      Variable  AVG_abs_corr
0  ATC10_D06BX      0.135107
1  ATC10_N03AX      0.122369
2    ICD10_A05      0.119986
3  ATC01_A01AD      0.115265
4    ICD10_G56      0.115038
5    ICD01_G44      0.113036
6    ICD10_N95      0.112690
7  ATC01_N03AX      0.111923
8    ICD01_F84      0.111893
9    ICD01_S22      0.109327
Top 10 variables correlated with latent feature 68 for random seed 1
      Variable  AVG_abs_corr
0  ATC10_A04AD      0.145508
1  ATC10_A03FA      0.130486
2    ICD01_T85      0.129592
3    ICD01_S51      0.129592
4    ICD10_K90      0.126644
5    ICD01_L72      0.126423
6    ICD01_M08      0.119192
7  ATC01_M01AC      0.117891
8  ATC01_C05BA      0.115712
9    ICD01_M19      0.113216
Top 10 variables correlated with latent feature 69 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_S22      0.131407
1    ICD10_D07      0.120610
2  ATC01_R06AE      0.117540
3  ATC01_M03BB      0.116620
4    ICD10

Top 10 variables correlated with latent feature 88 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_E87      0.126264
1  ATC10_Y92FB      0.123779
2  ATC10_Y92GA      0.123779
3  ATC10_Y92HB      0.123779
4    ICD10_M79      0.122683
5  ATC01_C03DB      0.121234
6    ICD10_R59      0.116467
7    ICD10_J46      0.115441
8    ICD10_M95      0.113237
9    ICD10_H57      0.112500
Top 10 variables correlated with latent feature 89 for random seed 1
              Variable  AVG_abs_corr
0  edu_lvl_09y_or_less      0.141198
1           workloss10      0.139331
2          ATC10_C08CA      0.136772
3            ICD01_Z26      0.133904
4          ATC10_G03CX      0.117572
5      edu_lvl_missing      0.115223
6            ICD10_K04      0.113196
7          ATC01_C03DA      0.112955
8            ICD01_M24      0.111199
9            ICD01_D84      0.111199
Top 10 variables correlated with latent feature 90 for random seed 1
      Variable  AVG_abs_corr
0    ICD10_N18      0.170860
1  ATC10_

Top 10 variables correlated with latent feature 109 for random seed 1
      Variable  AVG_abs_corr
0    ICD01_L81      0.139677
1  ATC01_C08DA      0.139677
2    ICD01_R15      0.139677
3    ICD01_K42      0.139677
4    ICD10_E34      0.139677
5  ATC10_J01DC      0.137756
6  ATC10_G01AF      0.130923
7    ICD10_S24      0.129209
8    ICD10_D68      0.127521
9  ATC01_J01EE      0.125123
Top 10 variables correlated with latent feature 110 for random seed 1
      Variable  AVG_abs_corr
0  ATC01_J01CR      0.138901
1    ICD10_M25      0.123329
2  ATC10_N05CM      0.123075
3    ICD01_B34      0.116911
4    ICD10_S50      0.116911
5    ICD10_M76      0.116703
6    ICD10_S35      0.111199
7    ICD01_I49      0.110531
8    ICD10_H33      0.106560
9    ICD10_L60      0.106313
Top 10 variables correlated with latent feature 111 for random seed 1
      Variable  AVG_abs_corr
0  ATC10_C08CA      0.140650
1  ATC01_C08CA      0.134824
2    ICD01_S43      0.120375
3       female      0.112699
4  ATC1

Top 10 variables correlated with latent feature 30 averaged in 50 random seeds.
             AVG_abs_corr
ICD10_R15        0.046637
ATC01_G04BX      0.046449
ATC01_D11AX      0.046026
hosptime_10      0.045880
ATC10_B01AA      0.045765
ICD10_Z64        0.044552
ATC10_N02AA      0.044251
ICD10_F17        0.043986
ICD01_Z85        0.043281
ICD10_R42        0.043142
Top 10 variables correlated with latent feature 31 averaged in 50 random seeds.
             AVG_abs_corr
hosptime_10      0.046749
hosptime_01      0.044589
ICD10_O23        0.043209
ICD10_N31        0.042859
ICD01_Z96        0.042544
pain_m0          0.042499
ATC01_D11AX      0.042242
ICD10_R15        0.042165
ATC01_A02BC      0.042036
ICD10_K26        0.041971
Top 10 variables correlated with latent feature 32 averaged in 50 random seeds.
             AVG_abs_corr
hosptime_10      0.050803
ICD01_E11        0.049566
ICD10_T92        0.045953
ATC01_D02AE      0.045510
ATC01_A11EA      0.045445
ATC01_N05CF      0.044688
ICD10_

             AVG_abs_corr
hosptime_10      0.050402
ICD10_L73        0.047381
ICD10_R15        0.045862
ATC10_N06AX      0.045101
ICD10_I50        0.044752
ATC01_D07AC      0.044513
ATC01_C03CA      0.043937
ATC10_A02AD      0.043719
ATC10_G04BD      0.043272
ATC01_N02AJ      0.043168
Top 10 variables correlated with latent feature 60 averaged in 50 random seeds.
             AVG_abs_corr
ATC10_R03AK      0.046894
ICD10_J30        0.045887
hosptime_10      0.045114
ATC10_S01GX      0.044986
ATC10_N02AA      0.043627
ATC10_A06AD      0.042773
hosptime_01      0.042752
ICD01_N10        0.042271
ICD10_N76        0.042146
ATC01_G04BX      0.042118
Top 10 variables correlated with latent feature 61 averaged in 50 random seeds.
             AVG_abs_corr
hosptime_10      0.054148
ICD01_Z85        0.048015
ICD10_R60        0.047924
ATC01_S01AE      0.044428
ICD10_K08        0.044428
ICD10_K05        0.044428
ATC10_G03CA      0.043827
ICD10_R15        0.043333
ICD10_E11        0.042965
ATC10_Y9

             AVG_abs_corr
hosptime_10      0.046264
hosptime_01      0.044265
ICD10_K92        0.043966
ICD10_G44        0.043586
ATC10_Y92AD      0.043586
ICD10_D50        0.043483
ICD01_R32        0.043344
ICD10_E11        0.043087
ATC10_A06AB      0.042183
ICD10_C20        0.042064
Top 10 variables correlated with latent feature 97 averaged in 50 random seeds.
             AVG_abs_corr
hosptime_10      0.051676
ICD10_O23        0.047312
ATC01_D11AX      0.046768
ATC01_B03BA      0.046737
ICD01_Z85        0.044618
ICD10_I80        0.044453
ATC10_A10BA      0.044446
ICD10_N91        0.044389
ICD10_R10        0.044375
ATC10_Y92BA      0.043995
Top 10 variables correlated with latent feature 98 averaged in 50 random seeds.
             AVG_abs_corr
ICD10_R15        0.054789
hosptime_10      0.049127
ATC01_D10AX      0.046482
ATC01_G04BX      0.046008
ATC01_D11AX      0.045163
ATC01_C03DA      0.044916
ATC01_A06AD      0.044592
ICD10_Z50        0.044509
ICD10_D35        0.044384
ICD01_K4

ATC01_N03AF      0.041922
Top 10 variables correlated with latent feature 128 averaged in 50 random seeds.
             AVG_abs_corr
hosptime_10      0.043503
ICD10_R10        0.043399
ATC01_P01BA      0.043397
ICD01_M05        0.042665
ATC01_S01XA      0.042550
ATC01_S01BC      0.042462
ATC10_B03BA      0.042237
ICD01_Z96        0.041772
ATC01_S01ED      0.041270
ICD10_J30        0.040936
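The seed-averaging loop in the correlation cell sums each variable's absolute correlation over the seeds and divides by the total number of seeds, so a variable that is absent from a seed's list (NaN and dropped, or not listed at all if `latent_corr_calc` returns only top variables) is effectively counted as 0 for that seed. The pandas sketch below reproduces that convention with `fillna(0)` and contrasts it with averaging only over the seeds where the variable is present; `per_seed` is a hypothetical toy input, not data from this report.

```python
import pandas as pd

# hypothetical |corr| values of three variables for one latent feature across three seeds
per_seed = [
    pd.Series({"x1": 0.12, "x2": 0.06}),
    pd.Series({"x1": 0.10, "x2": 0.04, "x3": 0.30}),
    pd.Series({"x1": 0.08, "x2": 0.05}),
]

table = pd.concat(per_seed, axis=1)                             # rows: variables, columns: seeds
as_in_report = table.fillna(0).sum(axis=1) / table.shape[1]     # missing seeds count as 0
over_present = table.mean(axis=1)                               # mean over seeds where present

print(pd.DataFrame({"as_in_report": as_in_report, "over_present_seeds": over_present}))
```

The report's convention down-weights variables that appear only occasionally, whereas the alternative favours variables that are strongly correlated in a few seeds. Note also that latent dimension f of one seed need not correspond to latent dimension f of another, since each seed trains a separate model; averaging by dimension index may therefore mix unrelated dimensions, which is one possible reason the seed-averaged correlations (about 0.04 to 0.05) are much lower than the single-seed ones (about 0.09 to 0.17).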
