# Ablation - Evaluating Varied Neural Network Configurations
As our second ablation, we evaluated the performance of competing neural network (NN) configurations on the regenerated FEBRL dataset (FEBRL Regen.). Specifically, we prepared the following three model configurations:
* NN with `logistic` activation function (the original study used `relu`).
* NN with `adam` solver (the original study used `lbfgs`).
* NN with `sgd` (Stochastic Gradient Descent) solver.
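
Each variant can be read as a small override on a shared base configuration. The sketch below illustrates that idea with plain dictionaries. The values are taken from the `MLPClassifier` calls later in this notebook (only a subset of parameters is shown), while the names `BASE_PARAMS`, `ABLATION_OVERRIDES` and `build_config` are hypothetical and not part of the original code.

```python
# Shared settings copied from the notebook's MLPClassifier calls (subset only).
BASE_PARAMS = {
    "solver": "lbfgs",
    "activation": "relu",
    "hidden_layer_sizes": (256,),
    "learning_rate": "constant",
    "max_iter": 10000,
}

# Each ablation changes only the keys listed here (hypothetical structure).
ABLATION_OVERRIDES = {
    "nn_original": {},
    "nn_logistic_activation": {"activation": "logistic"},
    "nn_adam": {"solver": "adam"},
    "nn_sgd": {"solver": "sgd", "learning_rate": "adaptive"},
}

def build_config(modeltype, alpha):
    """Return the keyword arguments for one ablation variant."""
    if modeltype not in ABLATION_OVERRIDES:
        raise ValueError(f"unknown model type: {modeltype}")
    return {**BASE_PARAMS, "alpha": alpha, **ABLATION_OVERRIDES[modeltype]}

configs = {name: build_config(name, alpha=100) for name in ABLATION_OVERRIDES}
for name, cfg in configs.items():
    # Show only the settings that differ from the base configuration.
    changed = {k: v for k, v in cfg.items() if k != "alpha" and BASE_PARAMS.get(k) != v}
    print(name, changed)
```

With scikit-learn installed, `MLPClassifier(**build_config('nn_adam', alpha=100))` would construct the corresponding model.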

All other configuration parameters were left unchanged, with one exception: the SGD variant also sets `learning_rate='adaptive'`, an option that scikit-learn only applies to the `sgd` solver. Each model was benchmarked on the FEBRL Regen. dataset over 10 runs; the tables at the end of this section report the averaged results along with their standard deviations.
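
As a rough illustration of the aggregation step, the sketch below averages per-run values and computes their population standard deviation with `statistics.pstdev`, the same function used later in this notebook. The per-run values are made-up placeholders rather than results from the experiment, and `summarize` is a hypothetical helper.

```python
import statistics

def summarize(runs):
    """Return (mean, population standard deviation) of per-run values."""
    return statistics.mean(runs), statistics.pstdev(runs)

# Placeholder false counts from four imaginary runs (not real results).
false_counts = [96, 98, 96, 98]
mean_fc, std_fc = summarize(false_counts)
print(f"fc = {mean_fc:.1f} +/- {std_fc:.1f}")
```

Note that `statistics.pstdev` divides by the number of runs; `statistics.stdev` divides by one less and would give slightly larger values for 10 runs.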

The results of this ablation are encouraging. All three custom configurations of the original NN model achieve higher precision (98.68% to 98.85% vs. 96.49%) and a higher F-score than the configuration used in the original study, and they cut the false count roughly in half (96.6 to 115.5 vs. 194.6), at the cost of slightly lower recall (98.79% to 99.36% vs. 99.65%). Moreover, they even perform better than every other model in the original study, including the ensemble model claimed by the authors as the best performer. Specifically, the NN using the Adam solver achieves the highest precision (98.85%), while logistic activation yields the lowest false count (96.6) and the highest F-score (99.02%).

In [1]:
import recordlinkage as rl, pandas as pd, numpy as np
from sklearn.model_selection import KFold
from sklearn import svm
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.utils import shuffle
from recordlinkage.preprocessing import phonetic
from numpy.random import choice
import collections
from IPython.display import clear_output
from sklearn.model_selection import train_test_split
from math import comb

## General Functions

In [2]:
'''
Source: 
K. Vo, J. Jonnagaddala and S.-T. Liaw, "Medical-Record-Linkage-Ensemble," 16 February 2019. [Online]. 
Available: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble/.
'''
def generate_true_links(df): 
    # although the match_id column is included in the original df to imply the true links,
    # this function will create the true_link object identical to the true_links properties
    # of recordlinkage toolkit, in order to exploit "Compare.compute()" from that toolkit
    # in extract_function() for extracting features quicker.
    # This process should be deprecated in the future release of the UNSW toolkit.
    df["rec_id"] = df.index.values.tolist()
    indices_1 = []
    indices_2 = []
    processed = 0
    for match_id in df["match_id"].unique():
        if match_id != -1:    
            processed = processed + 1
            # print("In routine generate_true_links(), count =", processed)
            # clear_output(wait=True)
            linkages = df.loc[df['match_id'] == match_id]
            for j in range(len(linkages)-1):
                for k in range(j+1, len(linkages)):
                    indices_1 = indices_1 + [linkages.iloc[j]["rec_id"]]
                    indices_2 = indices_2 + [linkages.iloc[k]["rec_id"]]    
    links = pd.MultiIndex.from_arrays([indices_1,indices_2])
    return links

def generate_false_links(df, size):
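    # A counterpart of generate_true_links() that generates random false pairs
    # for training. The number of false pairs is specified as "size".
    # Note: choice() samples with replacement by default, so the two match ids can
    # occasionally coincide and yield a self-pair or a pair that is actually a true match.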
    df["rec_id"] = df.index.values.tolist()
    indices_1 = []
    indices_2 = []
    unique_match_id = df["match_id"].unique()
    for j in range(size):
            false_pair_ids = choice(unique_match_id, 2)
            candidate_1_cluster = df.loc[df['match_id'] == false_pair_ids[0]]
            candidate_1 = candidate_1_cluster.iloc[choice(range(len(candidate_1_cluster)))]
            candidate_2_cluster = df.loc[df['match_id'] == false_pair_ids[1]]
            candidate_2 = candidate_2_cluster.iloc[choice(range(len(candidate_2_cluster)))]    
            indices_1 = indices_1 + [candidate_1["rec_id"]]
            indices_2 = indices_2 + [candidate_2["rec_id"]]  
    links = pd.MultiIndex.from_arrays([indices_1,indices_2])
    return links

def swap_fields_flag(f11, f12, f21, f22):
    return int((f11 == f22) and (f12 == f21))

def extract_features(df, links):
    c = rl.Compare()
    c.string('given_name', 'given_name', method='jarowinkler', label='y_name')
    c.string('given_name_soundex', 'given_name_soundex', method='jarowinkler', label='y_name_soundex')
    c.string('given_name_nysiis', 'given_name_nysiis', method='jarowinkler', label='y_name_nysiis')
    c.string('surname', 'surname', method='jarowinkler', label='y_surname')
    c.string('surname_soundex', 'surname_soundex', method='jarowinkler', label='y_surname_soundex')
    c.string('surname_nysiis', 'surname_nysiis', method='jarowinkler', label='y_surname_nysiis')
    c.exact('street_number', 'street_number', label='y_street_number')
    c.string('address_1', 'address_1', method='levenshtein', threshold=0.7, label='y_address1')
    c.string('address_2', 'address_2', method='levenshtein', threshold=0.7, label='y_address2')
    c.exact('postcode', 'postcode', label='y_postcode')
    c.exact('day', 'day', label='y_day')
    c.exact('month', 'month', label='y_month')
    c.exact('year', 'year', label='y_year')
        
    # Build features
    feature_vectors = c.compute(links, df, df)
    return feature_vectors

def generate_train_X_y(df):
    # This routine is to generate the feature vector X and the corresponding labels y
    # with exactly equal number of samples for both classes to train the classifier.
    pos = extract_features(df, train_true_links)
    train_false_links = generate_false_links(df, len(train_true_links))    
    neg = extract_features(df, train_false_links)
    X = pos.values.tolist() + neg.values.tolist()
    y = [1]*len(pos)+[0]*len(neg)
    X, y = shuffle(X, y, random_state=0)
    X = np.array(X)
    y = np.array(y)
    return X, y



def classify(model, test_vectors):
    result = model.predict(test_vectors)
    return result

    
def evaluation(test_labels, result):
    true_pos = np.logical_and(test_labels, result)
    count_true_pos = np.sum(true_pos)
    true_neg = np.logical_and(np.logical_not(test_labels),np.logical_not(result))
    count_true_neg = np.sum(true_neg)
    false_pos = np.logical_and(np.logical_not(test_labels), result)
    count_false_pos = np.sum(false_pos)
    false_neg = np.logical_and(test_labels,np.logical_not(result))
    count_false_neg = np.sum(false_neg)
    precision = count_true_pos/(count_true_pos+count_false_pos)
    sensitivity = count_true_pos/(count_true_pos+count_false_neg) # sensitivity = recall
    confusion_matrix = [count_true_pos, count_false_pos, count_false_neg, count_true_neg]
    no_links_found = np.count_nonzero(result)
    no_false = count_false_pos + count_false_neg
    Fscore = 2*precision*sensitivity/(precision+sensitivity)
    metrics_result = {'no_false':no_false, 'confusion_matrix':confusion_matrix ,'precision':precision,
                     'sensitivity':sensitivity ,'no_links':no_links_found, 'F-score': Fscore}
    return metrics_result

def blocking_performance(candidates, true_links, df):
    count = 0
    for candi in candidates:
        if df.loc[candi[0]]["match_id"]==df.loc[candi[1]]["match_id"]:
            count = count + 1
    return count
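
To make the metric definitions in `evaluation()` concrete, here is a small self-contained sketch on toy labels. It uses plain Python instead of NumPy and adds a guard against division by zero, which the function above does not have. `confusion_counts` and `safe_div` are hypothetical names introduced only for this illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

# Toy example: 4 true links and 6 non-links.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = safe_div(tp, tp + fp)
sensitivity = safe_div(tp, tp + fn)  # sensitivity = recall
f_score = safe_div(2 * precision * sensitivity, precision + sensitivity)
false_count = fp + fn  # corresponds to 'no_false' in evaluation()

print(tp, fp, fn, tn, precision, sensitivity, f_score, false_count)
```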

## Ablation Presets and Functions

In [3]:
def train_nn_ablation(modeltype, modelparam, train_vectors, train_labels):
    
    if modeltype == 'nn_original': # NN with RELU Activation as used by authors
        model = MLPClassifier(solver='lbfgs', alpha=modelparam, hidden_layer_sizes=(256, ), 
                              activation = 'relu',random_state=None, batch_size='auto', 
                              learning_rate='constant',  learning_rate_init=0.001, 
                              power_t=0.5, max_iter=10000, shuffle=True, 
                              tol=0.0001, verbose=False, warm_start=False, momentum=0.9, 
                              nesterovs_momentum=True, early_stopping=False, 
                              validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
        model.fit(train_vectors, train_labels)
        
    elif modeltype == 'nn_logistic_activation': # NN with Logistic Activation
        model = MLPClassifier(solver='lbfgs', alpha=modelparam, hidden_layer_sizes=(256, ), 
                              activation = 'logistic',random_state=None, batch_size='auto', 
                              learning_rate='constant',  learning_rate_init=0.001, 
                              power_t=0.5, max_iter=10000, shuffle=True, 
                              tol=0.0001, verbose=False, warm_start=False, momentum=0.9, 
                              nesterovs_momentum=True, early_stopping=False, 
                              validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
        model.fit(train_vectors, train_labels)
        
    elif modeltype == 'nn_adam': # Neural Network using ADAM Solver
        model = MLPClassifier(solver='adam', alpha=modelparam, hidden_layer_sizes=(256, ), 
                              activation = 'relu',random_state=None, batch_size='auto', 
                              learning_rate='constant',  learning_rate_init=0.001, 
                              power_t=0.5, max_iter=10000, shuffle=True, 
                              tol=0.0001, verbose=False, warm_start=False, momentum=0.9, 
                              nesterovs_momentum=True, early_stopping=False, 
                              validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
        model.fit(train_vectors, train_labels)
        
    elif modeltype == 'nn_sgd': # Neural Network using SGD Solver
        model = MLPClassifier(solver='sgd', alpha=modelparam, hidden_layer_sizes=(256, ), 
                              activation = 'relu',random_state=None, batch_size='auto', 
                              learning_rate='adaptive',  learning_rate_init=0.001, 
                              power_t=0.5, max_iter=10000, shuffle=True, 
                              tol=0.0001, verbose=False, warm_start=False, momentum=0.9, 
                              nesterovs_momentum=True, early_stopping=False, 
                              validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
        model.fit(train_vectors, train_labels)
    
    return model

In [4]:
'''
Source: 
K. Vo, J. Jonnagaddala and S.-T. Liaw, "Medical-Record-Linkage-Ensemble," 16 February 2019. [Online]. 
Available: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble/.
'''
trainset = 'febrl3_UNSW'
testset = 'febrl4_UNSW'


In [5]:
%%time
'''
Source: 
K. Vo, J. Jonnagaddala and S.-T. Liaw, "Medical-Record-Linkage-Ensemble," 16 February 2019. [Online]. 
Available: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble/.
'''
## TRAIN SET CONSTRUCTION

# Import
print("Import train set...")
df_train = pd.read_csv(trainset+".csv", index_col = "rec_id")
train_true_links = generate_true_links(df_train)
print("Train set size:", len(df_train), ", number of matched pairs: ", str(len(train_true_links)))

# Preprocess train set
df_train['postcode'] = df_train['postcode'].astype(str)
df_train['given_name_soundex'] = phonetic(df_train['given_name'], method='soundex')
df_train['given_name_nysiis'] = phonetic(df_train['given_name'], method='nysiis')
df_train['surname_soundex'] = phonetic(df_train['surname'], method='soundex')
df_train['surname_nysiis'] = phonetic(df_train['surname'], method='nysiis')

# Final train feature vectors and labels
X_train, y_train = generate_train_X_y(df_train)
print("Finished building X_train, y_train")

Import train set...
Train set size: 5000 , number of matched pairs:  1165




Finished building X_train, y_train
CPU times: user 835 ms, sys: 52.6 ms, total: 887 ms
Wall time: 883 ms


In [6]:
%%time
'''
Source: 
K. Vo, J. Jonnagaddala and S.-T. Liaw, "Medical-Record-Linkage-Ensemble," 16 February 2019. [Online]. 
Available: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble/.

Code has been modified to reproduce and print Table 4 of the paper.
'''
# Blocking criteria: declare a non-match if all of the fields below disagree
# Import
print("Import test set...")
FEBRL_blocking_results = []
df_test = pd.read_csv(testset+".csv", index_col = "rec_id")
test_true_links = generate_true_links(df_test)
leng_test_true_links = len(test_true_links)
print("Test set size:", len(df_test), ", number of matched pairs: ", str(leng_test_true_links))

total_possible_pairs = comb(len(df_test),2)
match_pairs = leng_test_true_links

print("BLOCKING PERFORMANCE:")
blocking_fields = ["given_name", "surname", "postcode"]
all_candidate_pairs = []
for field in blocking_fields:
    block_indexer = rl.BlockIndex(on=field)
    candidates = block_indexer.index(df_test)
    detects = blocking_performance(candidates, test_true_links, df_test)
    all_candidate_pairs = candidates.union(all_candidate_pairs)
    print("Number of pairs of matched "+ field +": "+str(len(candidates)), ", detected ",
         detects,'/'+ str(leng_test_true_links) + " true matched pairs, missed " + 
          str(leng_test_true_links-detects) )
    # row 1
    row = []
    row.append(field)
    row.append('nc')
    nc = len(candidates)
    row.append(nc)
    FEBRL_blocking_results.append(row)
    
    # row 2 
    row = []
    row.append(field)
    row.append('pc')
    pc = round(detects/match_pairs*100.0, 2)
    row.append(pc)
    FEBRL_blocking_results.append(row)
    
    # row 3
    row = []
    row.append(field)
    row.append('rr')
    rr = round((1-(len(candidates)/1.0/total_possible_pairs))*100, 2)
    row.append(rr)
    FEBRL_blocking_results.append(row)
    
detects = blocking_performance(all_candidate_pairs, test_true_links, df_test)
print("Number of pairs of at least 1 field matched: " + str(len(all_candidate_pairs)), ", detected ",
     detects,'/'+ str(leng_test_true_links) + " true matched pairs, missed " + 
          str(leng_test_true_links-detects) )

#Reproducing Table 4
# row 1
row_all = []
row_all.append('All')
row_all.append('nc')
nc = len(all_candidate_pairs)
row_all.append(nc)
FEBRL_blocking_results.append(row_all)

# row 2
row_all = []
row_all.append('All')
row_all.append('pc')
pc = round(detects/match_pairs*100.0, 2)
row_all.append(pc)
FEBRL_blocking_results.append(row_all)

# row 3
row_all = []
row_all.append('All')
row_all.append('rr')
# Reduction ratio over the union of candidate pairs from all blocking fields
rr = round((1-(len(all_candidate_pairs)/1.0/total_possible_pairs))*100, 2)
row_all.append(rr)
FEBRL_blocking_results.append(row_all)

Import test set...
Test set size: 10000 , number of matched pairs:  5000
BLOCKING PERFORMANCE:
Number of pairs of matched given_name: 154898 , detected  3287 /5000 true matched pairs, missed 1713
Number of pairs of matched surname: 170843 , detected  3325 /5000 true matched pairs, missed 1675
Number of pairs of matched postcode: 53197 , detected  4219 /5000 true matched pairs, missed 781
Number of pairs of at least 1 field matched: 372073 , detected  4894 /5000 true matched pairs, missed 106
CPU times: user 54.8 s, sys: 126 ms, total: 55 s
Wall time: 55 s
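
The blocking metrics can be checked by hand from the counts printed above. The sketch below recomputes pairs completeness (pc) and reduction ratio (rr) with the same formulas as the cell; the helper names `pairs_completeness` and `reduction_ratio` are illustrative and do not appear in the original code.

```python
from math import comb

def pairs_completeness(detected, true_pairs):
    """Share of true matched pairs retained by blocking, in percent."""
    return round(detected / true_pairs * 100.0, 2)

def reduction_ratio(n_candidates, n_records):
    """Share of all possible record pairs removed by blocking, in percent."""
    total_possible_pairs = comb(n_records, 2)
    return round((1 - n_candidates / total_possible_pairs) * 100, 2)

# Counts copied from the output above (10,000 test records).
pc_all = pairs_completeness(4894, 5000)       # union of all blocking fields
rr_all = reduction_ratio(372073, 10000)       # union of all blocking fields
rr_postcode = reduction_ratio(53197, 10000)   # postcode only

print(pc_all, rr_all, rr_postcode)
```

The union of the three blocking fields keeps 97.88% of the true pairs while discarding about 99.26% of the 49,995,000 possible pairs. Blocking on postcode alone discards more (99.89%) but keeps only 4219 of the 5000 true pairs.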


In [7]:
%%time
'''
Source: 
K. Vo, J. Jonnagaddala and S.-T. Liaw, "Medical-Record-Linkage-Ensemble," 16 February 2019. [Online]. 
Available: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble/.
'''
## TEST SET CONSTRUCTION

# Preprocess test set
print("Processing test set...")
print("Preprocess...")
df_test['postcode'] = df_test['postcode'].astype(str)
df_test['given_name_soundex'] = phonetic(df_test['given_name'], method='soundex')
df_test['given_name_nysiis'] = phonetic(df_test['given_name'], method='nysiis')
df_test['surname_soundex'] = phonetic(df_test['surname'], method='soundex')
df_test['surname_nysiis'] = phonetic(df_test['surname'], method='nysiis')

# Test feature vectors and labels construction
print("Extract feature vectors...")
df_X_test = extract_features(df_test, all_candidate_pairs)
vectors = df_X_test.values.tolist()
labels = [0]*len(vectors)
feature_index = df_X_test.index
for i in range(0, len(feature_index)):
    if df_test.loc[feature_index[i][0]]["match_id"]==df_test.loc[feature_index[i][1]]["match_id"]:
        labels[i] = 1
X_test, y_test = shuffle(vectors, labels, random_state=0)
X_test = np.array(X_test)
y_test = np.array(y_test)
print("Count labels of y_test:",collections.Counter(y_test))
print("Finished building X_test, y_test")

Processing test set...
Preprocess...
Extract feature vectors...




Count labels of y_test: Counter({0: 367179, 1: 4894})
Finished building X_test, y_test
CPU times: user 35.2 s, sys: 158 ms, total: 35.3 s
Wall time: 35.4 s
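
The label counts above show how imbalanced the candidate set is. As a worked calculation from the printed counts, a trivial classifier that rejects every pair would already reach high accuracy, which suggests why precision, recall and F-score are more informative here than accuracy.

```python
# Counts copied from the Counter output above.
n_negative = 367179
n_positive = 4894
n_total = n_negative + n_positive

positive_share = n_positive / n_total
# Accuracy of a trivial classifier that labels every pair as a non-match.
all_negative_accuracy = n_negative / n_total

print(f"positives: {positive_share:.2%}")
print(f"all-negative accuracy: {all_negative_accuracy:.2%}")
```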


## Running the Ablation Tests

In [12]:
%%time
'''
Modifying the code provided by the authors to produce the results in Table 6 of the paper. 
Used the hyperparameters as specified by Table 5 of the paper to build the models.

Source: 
K. Vo, J. Jonnagaddala and S.-T. Liaw, "Medical-Record-Linkage-Ensemble," 16 February 2019. [Online]. 
Available: https://github.com/ePBRN/Medical-Record-Linkage-Ensemble/.
'''

NN_original_pr = [] 
NN_original_re = [] 
NN_original_fs = [] 
NN_original_fc = [] 

NN_logistic_pr = [] 
NN_logistic_re = [] 
NN_logistic_fs = [] 
NN_logistic_fc = []

NN_sgd_pr = [] 
NN_sgd_re = [] 
NN_sgd_fs = [] 
NN_sgd_fc = []

NN_adam_pr = [] 
NN_adam_re = [] 
NN_adam_fs = [] 
NN_adam_fc = []


for i in range(10):
    print('Run number: ', i)
    models = ['nn_original', 'nn_logistic_activation', 'nn_sgd', 'nn_adam']
    
    # NN ORIGINAL
    modeltype = models[0]
    modelparam = 100
    #TRAIN & EVAL
    md = train_nn_ablation(modeltype, modelparam, X_train, y_train)
    final_result = classify(md, X_test)
    final_eval = evaluation(y_test, final_result)
    precision = final_eval['precision']
    sensitivity = final_eval['sensitivity']
    Fscore = final_eval['F-score']
    nb_false = final_eval['no_false']
    #ADD RESULTS
    NN_original_pr.append(precision)
    NN_original_re.append(sensitivity)
    NN_original_fs.append(Fscore)
    NN_original_fc.append(nb_false)
    
    # NN LOGISTIC
    modeltype = models[1]
    modelparam = 100
    #TRAIN & EVAL
    md = train_nn_ablation(modeltype, modelparam, X_train, y_train)
    final_result = classify(md, X_test)
    final_eval = evaluation(y_test, final_result)
    precision = final_eval['precision']
    sensitivity = final_eval['sensitivity']
    Fscore = final_eval['F-score']
    nb_false = final_eval['no_false']
    #ADD RESULTS
    NN_logistic_pr.append(precision)
    NN_logistic_re.append(sensitivity)
    NN_logistic_fs.append(Fscore)
    NN_logistic_fc.append(nb_false)
    
    # NN SGD
    modeltype = models[2]
    modelparam = 100
    #TRAIN & EVAL
    md = train_nn_ablation(modeltype, modelparam, X_train, y_train)
    final_result = classify(md, X_test)
    final_eval = evaluation(y_test, final_result)
    precision = final_eval['precision']
    sensitivity = final_eval['sensitivity']
    Fscore = final_eval['F-score']
    nb_false = final_eval['no_false']
    #ADD RESULTS
    NN_sgd_pr.append(precision)
    NN_sgd_re.append(sensitivity)
    NN_sgd_fs.append(Fscore)
    NN_sgd_fc.append(nb_false)
    
    # NN ADAM
    modeltype = models[3]
    modelparam = 100
    #TRAIN & EVAL
    md = train_nn_ablation(modeltype, modelparam, X_train, y_train)
    final_result = classify(md, X_test)
    final_eval = evaluation(y_test, final_result)
    precision = final_eval['precision']
    sensitivity = final_eval['sensitivity']
    Fscore = final_eval['F-score']
    nb_false = final_eval['no_false']
    #ADD RESULTS
    NN_adam_pr.append(precision)
    NN_adam_re.append(sensitivity)
    NN_adam_fs.append(Fscore)
    NN_adam_fc.append(nb_false)
    
    
pr_col_MEAN = []
pr_col_MEAN.append(sum(NN_original_pr) / float(len(NN_original_pr)))
pr_col_MEAN.append(sum(NN_logistic_pr) / float(len(NN_logistic_pr)))
pr_col_MEAN.append(sum(NN_sgd_pr) / float(len(NN_sgd_pr)))
pr_col_MEAN.append(sum(NN_adam_pr) / float(len(NN_adam_pr)))

re_col_MEAN = []
re_col_MEAN.append(sum(NN_original_re) / float(len(NN_original_re)))
re_col_MEAN.append(sum(NN_logistic_re) / float(len(NN_logistic_re)))
re_col_MEAN.append(sum(NN_sgd_re) / float(len(NN_sgd_re)))
re_col_MEAN.append(sum(NN_adam_re) / float(len(NN_adam_re)))

fs_col_MEAN = []
fs_col_MEAN.append(sum(NN_original_fs) / float(len(NN_original_fs)))
fs_col_MEAN.append(sum(NN_logistic_fs) / float(len(NN_logistic_fs)))
fs_col_MEAN.append(sum(NN_sgd_fs) / float(len(NN_sgd_fs)))
fs_col_MEAN.append(sum(NN_adam_fs) / float(len(NN_adam_fs)))

fc_col_MEAN = []
fc_col_MEAN.append(sum(NN_original_fc) / float(len(NN_original_fc)))
fc_col_MEAN.append(sum(NN_logistic_fc) / float(len(NN_logistic_fc)))
fc_col_MEAN.append(sum(NN_sgd_fc) / float(len(NN_sgd_fc)))
fc_col_MEAN.append(sum(NN_adam_fc) / float(len(NN_adam_fc)))


models = ['nn_original', 'nn_logistic_activation', 'nn_sgd', 'nn_adam']
df_means = pd.DataFrame(models, columns=['Model'])
df_means['pr(%)'] = pr_col_MEAN
df_means['pr(%)'] = df_means['pr(%)']*100
df_means['re(%)'] = re_col_MEAN
df_means['re(%)'] = df_means['re(%)']*100
df_means['fs(%)'] = fs_col_MEAN
df_means['fs(%)'] = df_means['fs(%)']*100
df_means['fc'] = fc_col_MEAN

df_means

Run number:  0
Run number:  1
Run number:  2
Run number:  3
Run number:  4
Run number:  5
Run number:  6
Run number:  7
Run number:  8
Run number:  9
CPU times: user 3min 9s, sys: 1min 54s, total: 5min 3s
Wall time: 43.7 s


| | Model | Precision (%) | Recall (%) | F-score (%) | False count |
|---|---|---|---|---|---|
| 0 | nn_original | 96.48639 | 99.652636 | 98.043951 | 194.6 |
| 1 | nn_logistic_activation | 98.6769 | 99.358398 | 99.016475 | 96.6 |
| 2 | nn_sgd | 98.776555 | 99.09481 | 98.935181 | 104.4 |
| 3 | nn_adam | 98.849954 | 98.794442 | 98.820085 | 115.5 |
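
As a quick consistency check on the averaged results above, recall that the F-score is the harmonic mean of precision and recall. The sketch below recomputes it from the averaged precision and recall. Because the notebook averages per-run F-scores rather than computing one F-score from averaged inputs, the agreement is only expected to be approximate.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision %, recall %, reported F-score %) copied from the results above.
reported = {
    "nn_original": (96.48639, 99.652636, 98.043951),
    "nn_logistic_activation": (98.6769, 99.358398, 99.016475),
    "nn_sgd": (98.776555, 99.09481, 98.935181),
    "nn_adam": (98.849954, 98.794442, 98.820085),
}

for model, (pr, rec, fs) in reported.items():
    recomputed = f_score(pr, rec)
    print(f"{model}: recomputed {recomputed:.3f}, reported {fs:.3f}")
```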


In [20]:
%%time 
import statistics

pr_col_STD = []
pr_col_STD.append(statistics.pstdev(NN_original_pr))
pr_col_STD.append(statistics.pstdev(NN_logistic_pr))
pr_col_STD.append(statistics.pstdev(NN_sgd_pr))
pr_col_STD.append(statistics.pstdev(NN_adam_pr))

re_col_STD = []
re_col_STD.append(statistics.pstdev(NN_original_re))
re_col_STD.append(statistics.pstdev(NN_logistic_re))
re_col_STD.append(statistics.pstdev(NN_sgd_re))
re_col_STD.append(statistics.pstdev(NN_adam_re))

fs_col_STD = []
fs_col_STD.append(statistics.pstdev(NN_original_fs))
fs_col_STD.append(statistics.pstdev(NN_logistic_fs))
fs_col_STD.append(statistics.pstdev(NN_sgd_fs))
fs_col_STD.append(statistics.pstdev(NN_adam_fs))

fc_col_STD = []
fc_col_STD.append(statistics.pstdev(NN_original_fc))
fc_col_STD.append(statistics.pstdev(NN_logistic_fc))
fc_col_STD.append(statistics.pstdev(NN_sgd_fc))
fc_col_STD.append(statistics.pstdev(NN_adam_fc))


df_STD = pd.DataFrame(models, columns=['Model'])
df_STD['pr(%)'] = pr_col_STD
df_STD['pr(%)'] = df_STD['pr(%)']*100
df_STD['re(%)'] = re_col_STD
df_STD['re(%)'] = df_STD['re(%)']*100
df_STD['fs(%)'] = fs_col_STD
df_STD['fs(%)'] = df_STD['fs(%)']*100
df_STD['fc'] = fc_col_STD

df_STD

CPU times: user 7.59 ms, sys: 316 µs, total: 7.9 ms
Wall time: 7.72 ms


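| | Model | Precision (%) | Recall (%) | F-score (%) | False count |
|---|---|---|---|---|---|
| 0 | nn_original | 0.045341 | 0.0 | 0.023407 | 2.236068 |
| 1 | nn_logistic_activation | 0.023328 | 0.01001 | 0.012124 | 1.0 |
| 2 | nn_sgd | 0.202952 | 0.110434 | 0.04979 | 5.0 |
| 3 | nn_adam | 0.552481 | 0.387585 | 0.144129 | 14.525839 |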
