# Note

This note runs an experiment that follows the pipeline below:

1. Compose a Judge Committee, $C_J = \{f_1, \ldots, f_N\}$. We consider two versions of the Committee: SB type and B type. An SB-type Committee consists of specialized ResNet models, while a B-type Committee consists of the base (unspecialized) ResNet models.

2. Define the following terminology:

- Let $D_{*} = (X_*, Y_*)$ denote a dataset, where the subscript indicates the type of dataset, for instance $D_{\textrm{tr}}$. $D$ without a subscript refers to all data, including unseen data.
- Define $I_C : D \to [0,1]$, which measures the disagreement (or entropy) among the members of a Committee $C$.
- $\alpha \in [0,1]$ denotes a given entropy threshold.
- For a dataset $D_{*}$, define $D^0_{*}(C) := \{(x, y) \in D_{*} : I_C(x) < \alpha\}$ and $D^1_{*}(C) := \{(x, y) \in D_{*} : I_C(x) \geq \alpha\}$.

3. Specialize each member $f \in C_J$ of the Judge Committee into $g$ by training it on $D^1_{*}(C_J)$. We call the resulting Committee $C_S = \{g_1, \ldots, g_N\}$.

4. When an instance $x$ is given, first compute $I_{C_J}(x)$ and predict according to its value: if $I_{C_J}(x) < \alpha$, then $C_J$ predicts; otherwise $C_S$ does.
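
The routing rule in step 4 can be sketched as follows. This is only an illustration under stated assumptions, not the notebook's implementation: `route_predictions` is a hypothetical helper, each committee is assumed to predict by averaging its members' probabilities (the note does not fix the aggregation rule), and the arrays are toy values.

```python
import numpy as np

def route_predictions(entropies, probs_judge, probs_specialist, alpha):
    """Pick C_J's label where I_{C_J}(x) < alpha, otherwise C_S's label.

    entropies:        (n,) disagreement values I_{C_J}(x)
    probs_judge:      (n, N, m) class probabilities of the N judge members
    probs_specialist: (n, N, m) class probabilities of the N specialists
    """
    # Assumption: each committee votes by averaging its members' probabilities.
    label_judge = np.argmax(probs_judge.mean(axis=1), axis=-1)
    label_specialist = np.argmax(probs_specialist.mean(axis=1), axis=-1)
    use_specialist = entropies >= alpha
    return np.where(use_specialist, label_specialist, label_judge), use_specialist

# Two samples, N = 2 members, m = 2 classes (toy values).
entropies = np.array([0.05, 0.80])
probs_judge = np.array([[[0.9, 0.1], [0.8, 0.2]],
                        [[0.6, 0.4], [0.3, 0.7]]])
probs_specialist = np.array([[[0.2, 0.8], [0.1, 0.9]],
                             [[0.1, 0.9], [0.2, 0.8]]])

labels, routed = route_predictions(entropies, probs_judge, probs_specialist, alpha=0.5)
print(labels, routed)
```

The first sample falls below the threshold and is answered by $C_J$; the second is handed to $C_S$.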

In [1]:
## load necessary libraries
import os, glob
import numpy as np
import tensorflow as tf
import pandas as pd
import keras
from keras.models import load_model
from keras.datasets import cifar10
from sklearn.model_selection import train_test_split
from keras.utils import to_categorical
from keras import Model
from scipy.stats import entropy
import math





## Compose Judge Committee

We can have two types of $C_J$: SB and B.

In [None]:
## predefined dictionary for label
label_dict = {(3,1) : 'ResNet20v1',
        (3,2) : 'ResNet20v2',
        (9,1): 'ResNet56v1',
        (9,2): 'ResNet56v2'}

## load models

# Say 'B-type' model, where 'B' stands for 'Base'.
folder = 'CIFAR10models/ResNet/'
pattern = os.path.join(folder, '*cifar10*.keras')
file_list = sorted(glob.glob(pattern))
loaded_models= {os.path.basename(f): load_model(f) for f in file_list}

## Compose Committee
Judge_Committee_B = []
for file_name, model in loaded_models.items():
    # Extract model name from the file name
    base = file_name.replace('.keras','')
    parts = base.split('_')
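    # base = n_3_v1_cifar10_1 or n_3_v1_cifar10
    # parts = [n, 3, v1, cifar10, 1] or [n, 3, v1, cifar10]
    # parts[-1][-1] is the run index: '1' for '..._cifar10_1', and '0' (the last
    # character of 'cifar10') when the file name carries no index.
    model_name = label_dict[(int(parts[1]),int(parts[2][-1]))] + '_B_' +parts[-1][-1]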
    Judge_Committee_B.append((model_name, model))
print(f"Total {len(Judge_Committee_B)} B-type models loaded")

# Say 'SB-type', where 'SB' stands for 'Specialized Base'.
folder = 'CIFAR10models/more_tunned/'
pattern = os.path.join(folder, '*_more_specialized*.keras')
file_list = sorted(glob.glob(pattern))
loaded_models.update({os.path.basename(f): load_model(f) for f in file_list})

## Compose Committee
Judge_Committee_SB = []
for file_name, model in loaded_models.items():
    # Extract model name from the file name
    base = file_name.replace('.keras','')
    parts = base.split('_')
    # base = ResNet20v1_more_specialized_0, ResNet20v1_once-more_specialized_0, or ResNet20v1_twice-more_specialized_0
    # parts = [ResNet20v1, more, specialized, 0] or [ResNet20v1, once-more, specialized, 0] and so forth.
    if parts[1] != 'more':
        continue
    model_name = parts[0] + '_SB_' + parts[-1]
    Judge_Committee_SB.append((model_name, model))
print(f"Total {len(Judge_Committee_SB)} SB-type models loaded")


## Define $I$

We consider three measurements:

- Let $f_1(x) = \left(y^1_1, \ldots, y^m_1\right), \ldots, f_N(x) = \left(y^1_N, \ldots, y^m_N\right)$ be the raw predictions of the $N$ models over $m$ classes.

- Taking $a_k = \arg\max_i y^i_k$ gives the answers $a_1, \ldots , a_N$. They define a discrete distribution $p_j = P(X=j) = \frac{|\{k : a_k = j\}|}{N}$ for $j=1,\ldots,m$. With this, we define $I^{\arg\max}_{C}(x) = -\sum_{j=1}^{m} p_j \log_{m}p_j$.

- With the raw predictions, we define $p_i = \frac{1}{N}\sum_{k=1}^N y^i_k$ and $I^\textrm{overall}_{C}(x) = -\sum_{i=1}^{m} p_i \log_{m}p_i$. Note that if every $y^i_k$ is $0$ or $1$, this coincides with the previous definition.

- Also with the raw predictions, we define $I^{\textrm{cross}}_{C}(x) = -\sum_{k\neq l} \sum_{i = 1}^{m} y^i_k \log_m y^i_l$. Unlike the first two, this sum over ordered pairs of members is not confined to $[0,1]$.
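
To make the three definitions concrete, here is a small worked example on a toy committee. The prediction values are made up for illustration, and the variable names (`I_argmax`, `I_overall`, `I_cross`) are not from the notebook.

```python
import numpy as np
from scipy.stats import entropy

# Toy committee: N = 3 members, m = 3 classes, one sample (made-up values).
preds = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.8, 0.1]])
N, m = preds.shape

# argmax version: entropy of the vote distribution.
votes = np.argmax(preds, axis=-1)                 # [0, 0, 1]
p_vote = np.bincount(votes, minlength=m) / N      # [2/3, 1/3, 0]
I_argmax = entropy(p_vote, base=m)

# overall version: entropy of the averaged probabilities.
p_mean = preds.mean(axis=0)
I_overall = entropy(p_mean, base=m)

# cross version: pairwise cross-entropies summed over ordered pairs k != l.
log_p = np.log(preds) / np.log(m)
pairwise = preds @ log_p.T                        # pairwise[k, l] = sum_i y_k^i log_m y_l^i
I_cross = -(pairwise.sum() - np.trace(pairwise))

print(I_argmax, I_overall, I_cross)
```

The first two values lie in $[0,1]$ because the logarithm is taken in base $m$; the cross version adds up $N(N-1)$ pairwise terms and can exceed 1.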

In [67]:
from typing import Sequence


# Sequence accepts anything that behaves like a sequence, including lists, tuples, and arrays.
def cross_entropy(arrays: Sequence[np.ndarray], log_base: float | None = None) -> float:
    """
    Compute  f = - sum_{k != l} sum_{i=1}^m a_{i,k} * log( a_{i,l} )
    where each input is an array of shape (m, 1) (or (m,)).

    Parameters
    ----------
    arrays : sequence of np.ndarray
        N arrays, each of shape (m,1) or (m,). They should all have the same m.
    log_base : float or None, optional
        If provided (e.g., 10), compute logarithm in this base.
        If None, uses the natural logarithm.

    Returns
    -------
    float
        The scalar value of the cross entropy f among the arrays

    Notes
    -----
    - Time complexity is O(m N^2) dominated by the matrix multiply, where N is the length of arrays.
    """
    if len(arrays) == 0:
        return 0.0

    # Stack to shape (m, N)
    cols = [np.asarray(a).reshape(-1) for a in arrays]
    m = cols[0].shape[0]
    if any(c.shape[0] != m for c in cols):
        raise ValueError("All arrays must have the same first dimension m.")
    A = np.column_stack(cols)  # shape (m, N)

    L = np.log(A+1e-12)
    if log_base is not None:
        L = L / np.log(log_base)

    M = A.T @ L  # shape (N, N), M[k, l] = sum_i a_{i,k} * log(a_{i,l})
    total = -(np.sum(M) - np.trace(M))  # sum over k != l, then negate
    return float(total)

In [59]:
arr1 = np.random.rand(10, 3)
arr2 = np.random.rand(10, 3)
arr3 = np.random.rand(10, 3)
arr4 = np.random.rand(10, 3)

In [65]:
stacked

(output truncated in the source: a (10, 4, 3) array of uniform random values)

In [61]:
stacked = np.stack([arr1,arr2,arr3, arr4], axis=1)

13.070740258358502

In [None]:
a1 = np.array([[0.2],[0.8]])
a2 = np.array([[0.5],[0.5]])
a3 = np.array([[0.9],[0.1]])

val = cross_entropy([a1, a2, a3], log_base=2) 
print('a1,a2,a3',val)

val = cross_entropy([a1, a1, a1],  log_base=2) 
print('a1,a1,a1',val)

val = cross_entropy([a2, a2, a2],  log_base=2) 
print('a2,a2,a2',val)

val = cross_entropy([a3, a3, a3], log_base=2) 
print('a3,a3,a3',val)

val = cross_entropy([a1, a2, a2],  log_base=2) 
print('a1,a2,a2',val)

val = cross_entropy([a3, a2, a2],  log_base=2) 
print('a3,a2,a2',val)

val = cross_entropy([a2, a3, a3],  log_base=2) 
print('a2,a3,a3',val)

val = cross_entropy([a1, a3, a3],  log_base=2) 
print('a1,a3,a3',val)

val = cross_entropy([a2, a1, a1],  log_base=2) 
print('a2,a1,a1',val)

val = cross_entropy([a3, a1, a1], log_base=2) 
print('a3,a1,a1',val)

a1,a2,a3 9.868764878539828
a1,a1,a1 4.3315685693241734
a2,a2,a2 6.0
a3,a3,a3 2.8139735615356876
a1,a2,a2 6.643856189774725
a3,a2,a2 7.473931188332412
a2,a3,a3 6.411922375510974
a1,a3,a3 10.557733566151086
a2,a1,a1 6.087712379549449
a3,a1,a1 11.06359856874725


In [69]:
def get_entropy_array(committee : list[tuple[str,Model]],
                      entropy_version : str = "argmax",
                      num_classes : int = 10,
                      sample : np.ndarray | None = None, 
                      preds : dict[str, np.ndarray] | None = None) -> tuple[np.ndarray, dict[str, np.ndarray]]:
    '''
    param committee: list of tuples (member name, model)
    param entropy_version: which entropy to compute: 'argmax', 'overall', or 'cross_entropy'
    param num_classes: number of target classes
    param sample: numpy array of n inputs, shape (n, ...) as expected by the models
    param preds: dictionary of (raw) predictions from each model in the committee, each of shape (n, num_classes).
    Either sample or preds must be provided. If both sample and preds are provided, preds will be used.
    
    return (numpy array of entropies of shape (n,), dictionary of preds whose keys are the member names) 
    '''
    if sample is None and preds is None:
        raise ValueError("Either sample or preds must be provided")
    if entropy_version not in ["argmax", "overall", "cross_entropy"]:
        raise ValueError("entropy_version must be either 'argmax', 'overall' or 'cross_entropy'")

    if preds is None:
        preds = {}
        for member, model in committee:
            preds[member] = model.predict(sample,verbose = 0)
            
    if entropy_version == "argmax":
        stacked = np.stack([np.argmax(member_pred, axis=-1) for member_pred in preds.values()], axis=1)
        counts = np.apply_along_axis(lambda x: np.bincount(x, minlength=num_classes), axis=1, arr=stacked)
        probs = counts / counts.sum(axis=1, keepdims=True)
        disagreements = entropy(probs, axis=1, base = num_classes)
        ## if member prediction is of (*,) dimensional array, then error will occur at stacked and counting procedure as well.
        ## To remedy this, we can put reshaping line of code before stacking.
        
    elif entropy_version == "overall": # overall
        stacked = np.stack(list(preds.values()), axis=1)  # shape (n, N, num_classes)
        probs = np.mean(stacked, axis=1)  # average over the members actually present in preds
        disagreements = entropy(probs, axis=1, base = num_classes)
    elif entropy_version == 'cross_entropy':
        stacked = np.stack(list(preds.values()), axis=1)  # shape (n, N, num_classes)
        n = len(stacked)
        disagreements = np.zeros(n)  # one scalar per sample
        for i in range(n):
            # stacked[i] has shape (N, num_classes); iterating it yields one prediction per member
            disagreements[i] = cross_entropy(stacked[i], log_base=num_classes)

    return (disagreements, preds) 

## Plotting entropies to choose threshold

We may set the entropy threshold arbitrarily, or we can inspect boxplots of the entropy to choose it.
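
As a sketch of how a threshold could be read off the entropy distribution, the snippet below tries two candidate values of $\alpha$ and reports the sizes of the resulting $D^0$ and $D^1$. The helper `split_by_threshold`, the choice of the mean and the third quartile as candidates, and the entropy values themselves are all assumptions made for illustration.

```python
import numpy as np

def split_by_threshold(entropies, alpha):
    """Return boolean masks for D^0 (I < alpha) and D^1 (I >= alpha)."""
    entropies = np.asarray(entropies)
    high = entropies >= alpha
    return ~high, high

# Toy entropy values standing in for I_C(x) on a validation set.
entropies = np.array([0.0, 0.0, 0.1, 0.2, 0.9, 1.0])

alpha_mean = float(np.mean(entropies))            # candidate 1: mean entropy
alpha_q3 = float(np.percentile(entropies, 75))    # candidate 2: third quartile

for name, alpha in [("mean", alpha_mean), ("3Q", alpha_q3)]:
    low, high = split_by_threshold(entropies, alpha)
    print(f"{name}: alpha={alpha:.3f}, |D^0|={low.sum()}, |D^1|={high.sum()}")
```

Which candidate is preferable depends on how many samples should be handed to the specialists; a higher threshold sends fewer, more contested samples to $C_S$.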

In [None]:
import matplotlib.pyplot as plt

def plot_disagreements(disagreements: np.ndarray | dict[str, np.ndarray], title: str = "Disagreements"):
    '''
    disagreements is either a numpy array of entropies, or a dictionary of entropy whose keys are the name of sample.
    '''
    plt.figure(figsize=(16, 6))
    plt.boxplot(disagreements if isinstance(disagreements, np.ndarray) else list(disagreements.values()))
    plt.title(title)
    if isinstance(disagreements, dict):
        plt.xticks(ticks=range(1, len(disagreements) + 1), labels=list(disagreements.keys()))
    else:
        plt.xlabel("Sample Index")
    plt.ylabel("Disagreement")
    plt.show()
    
def get_statistics(disagreements: np.ndarray) -> dict[str, float]:
    '''
    disagreements is a numpy array of entropies.
    return dictionary of average, median, std, 1Q, 3Q.
    '''
    return {
        "average": float(np.mean(disagreements)),
        "median": float(np.median(disagreements)),
        "std": float(np.std(disagreements)),
        "1Q": float(np.percentile(disagreements, 25)),
        "3Q": float(np.percentile(disagreements, 75)),
    }

## Specialization function

In [54]:
from keras import layers
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint, LearningRateScheduler, ReduceLROnPlateau

def lr_schedule(epoch):
    """Learning Rate Schedule

    Learning rate is scheduled to be reduced after 5, 10, 15, 18 epochs.
    Called automatically every epoch as part of callbacks during training.

    # Arguments
        epoch (int): The number of epochs

    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 18:
        lr *= 0.5e-3
    elif epoch > 15:
        lr *= 1e-3
    elif epoch > 10:
        lr *= 1e-2
    elif epoch > 5:
        lr *= 1e-1
#     print('Learning rate: ', lr)
    return lr

def turn_specialist(model : Model, path : str,
        x_tr: np.ndarray | None = None,
        y_tr: np.ndarray | None = None,
        x_v: np.ndarray | None = None,
        y_v: np.ndarray | None = None,
        num_classes : int = 10,
        epochs: int = 21,
        learning_rate : float = 1e-3,
        batch_size: int = 128,
        # save_each: bool = False,
        # save_bests: int | None = None,
        verbose: int = 1,
        name : str = '',
        add_last_dense : bool = True
    ):
        if add_last_dense:
            # build specialist network
            base = Model(inputs = model.inputs, outputs = model.layers[-2].output, name=f"base{name}")
            x    = keras.Input(shape=base.input_shape[1:], name=f"in{name}")
            y    = Dense(num_classes, name=f"dense{name}")(base(x)) 
            z    = layers.Softmax(name=f"softmax{name}")(y)
            specialist = Model(inputs = x, outputs = z)
            specialist.compile(
                optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                loss="categorical_crossentropy",
                metrics=["accuracy"],
            )
        else:
            # we don't add a new dense layer
            # and fine tune it.
            specialist = model
            specialist.compile(
                optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
                loss="categorical_crossentropy",
                metrics=["accuracy"],
            )


        # callbacks
        callbacks = [ModelCheckpoint(path, monitor="val_accuracy",
                                    save_best_only=True, verbose=verbose)]
        
        callbacks += [LearningRateScheduler(lr_schedule),
                        ReduceLROnPlateau(factor=np.sqrt(0.1), patience=5, min_lr=5e-7)]

        # fit 
        hist = specialist.fit(x_tr, y_tr, batch_size=batch_size,
                        validation_data=(x_v, y_v),
                        epochs=epochs, callbacks=callbacks, verbose=verbose)



        # ---------- summary ----------
        metric = "val_accuracy"
        best = np.max(hist.history[metric])
        first = hist.history[metric][0]
        print(f"best {metric} {best:.3f} (first {first:.3f})")
        return specialist, hist


In [None]:
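def specialize_Committee(committee: list[tuple[str, Model]] | dict[str, Model], 
                         training_data: tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray],
                         data_name: str = '',
                         committee_name:str = '',
                         num_classes : int = 10):
    x_tr, y_tr, x_v, y_v = training_data
    # one-hot encode integer labels, including CIFAR-10's (n, 1) label arrays
    if y_tr.ndim == 1 or y_tr.shape[-1] == 1:
        y_tr = to_categorical(y_tr, num_classes)
    if y_v.ndim == 1 or y_v.shape[-1] == 1:
        y_v = to_categorical(y_v, num_classes)
    # the committees above are lists of (name, model) tuples; a dict is accepted as well
    members = committee.items() if isinstance(committee, dict) else committee
    for member, model in members:
        print(f"Training specialist for {member}...")
        # baseline: specialize once with the original labels.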
        specialist, history = turn_specialist(model,
                        path=f'./specialists/{committee_name}/{member}_{data_name}.keras',
                        x_tr = x_tr,
                        y_tr = y_tr,
                        x_v = x_v,
                        y_v = y_v,
                        name = f'{committee_name}_{member}_{data_name}',
                        verbose = 0
                        )
        # see the graph of training process
        plt.plot(history.history['accuracy'], label='train accuracy')
        plt.plot(history.history['val_accuracy'], label='val accuracy')
        plt.title(f'Accuracy of {member}\non {data_name}')
        plt.xlabel('Epochs')
        plt.ylabel('Accuracy')
        plt.legend()
        plt.show()
        
        # save the specialist model
        specialist.save(f'./specialists/{committee_name}/{member}_{data_name}.keras')
    

## Load datasets

In [None]:
## load CIFAR10 dataset

(x, y), (x_test, y_test) = cifar10.load_data()
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, 
                                              random_state=42) # random state has been always 42.
x_train = x_train.astype('float32') / 255.0
x_val = x_val.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

y_train_categorical_10 = to_categorical(y_train,10)
y_val_categorical_10 = to_categorical(y_val,10)
y_test_categorical_10 = to_categorical(y_test,10)

## we load adversarial samples from only two attacks: CW-L2 and PGD

import re

pattern = re.compile(r'_\d+to\d+_')

folder = './adversarial_examples/gen_by_ResNet'
all_files = os.listdir(folder)

filtered_files = [
    fname for fname in all_files
    if fname.endswith('.npy') and not pattern.search(fname)]


## load the adversarial training, validation, and test sets

adv_dataset = {}
for f in filtered_files:
    
    base = f.replace('.keras.npy','').replace('.npy','')
    parts = base.split('_')
    '''
    cwl2_x_tr_untargeted
    cwl2_x_v_untargeted
    cwl2_x_test_untargeted <- gen by ResNet56v1_0
    cwl2_x_test_untargeted_gen_by_ResNet56v1_1
    pgd_0.376_x_untarget
    pgd_0.376_x_val_untarget
    pgd_0.376_x_test_untarget_gen_by_n_9_v1_cifar10
    '''
    if base in ['cwl2_x_tr_untargeted', 
                'cwl2_x_v_untargeted',
                 'cwl2_x_test_untargeted', 
                #  'cwl2_x_test_untargeted_gen_by_ResNet56v1_1',
                 'pgd_0.376_x_untarget', 
                 'pgd_0.376_x_val_untarget',
                 'pgd_0.376_x_test_untarget_gen_by_n_9_v1_cifar10']:
        attack_type = parts[0]
        if attack_type == 'cwl2':
            if parts[2] == 'tr':
                train_or_val_or_test = 'train'
            elif parts[2] == 'test':
                train_or_val_or_test = 'test'
            else:
                train_or_val_or_test = 'val'
        else:
            attack_type = 'PGD'
            if parts[3] == 'val':
                train_or_val_or_test = 'val'
            elif parts[3] == 'test':
                train_or_val_or_test = 'test'
            else:
                train_or_val_or_test = 'train'
    else:
        continue
    key = (attack_type, train_or_val_or_test)
    adv_dataset[key] = np.load(os.path.join(folder, f))
    

adv_dataset[('VGG','test')] = np.load('./adversarial_examples/gen_by_VGG/pgd_0.376_x_test_untarget_by_vgg19.npy')
print('-'*50, 'keys for adv samples', '-'*50)
for k in adv_dataset.keys():
    print(k)
    
    

## Get raw predictions

In [None]:
raw_predictions_B = {
    'original': {'train':{}, 'val':{}, 'test':{}},
    'cwl2': {'train':{}, 'val':{}, 'test':{}},
    'PGD': {'train':{}, 'val':{}, 'test':{}},
    'VGG': {'test':{}}
    }


for member, model in Judge_Committee_B:
    raw_predictions_B['original']['train'][member] = model.predict(x_train, verbose=0)
    raw_predictions_B['original']['val'][member] = model.predict(x_val, verbose=0)
    raw_predictions_B['original']['test'][member] = model.predict(x_test, verbose=0)

    for key, adv_samples in adv_dataset.items():
        raw_predictions_B[key[0]][key[1]][member] = model.predict(adv_samples, verbose=0)
        

import pickle as pkl

## dump raw predictions into pickle files
with open('./data/raw_predictions_B.pkl', 'wb') as f:
    pkl.dump(raw_predictions_B, f)
    
raw_predictions_SB = {
    'original': {'train':{}, 'val':{}, 'test':{}},
    'cwl2': {'train':{}, 'val':{}, 'test':{}},
    'PGD': {'train':{}, 'val':{}, 'test':{}},
    'VGG': {'test':{}}
    }


for member, model in Judge_Committee_SB:
    raw_predictions_SB['original']['train'][member] = model.predict(x_train, verbose=0)
    raw_predictions_SB['original']['val'][member] = model.predict(x_val, verbose=0)
    raw_predictions_SB['original']['test'][member] = model.predict(x_test, verbose=0)

    for key, adv_samples in adv_dataset.items():
        raw_predictions_SB[key[0]][key[1]][member] = model.predict(adv_samples, verbose=0)
        

## dump raw predictions into pickle files
with open('./data/raw_predictions_SB.pkl', 'wb') as f:
    pkl.dump(raw_predictions_SB, f)

## Get entropies

In [None]:
# ## Load raw predictions_SB from pickle files
# with open('./data/raw_predictions_SB.pkl', 'rb') as f:
#     raw_predictions_SB = pkl.load(f)
# ## Load raw predictions_B from pickle files
# with open('./data/raw_predictions_B.pkl', 'rb') as f:
#     raw_predictions_B = pkl.load(f)

entropy_arrays_SB = {}
for data_name in raw_predictions_SB.keys():
    for data_type in raw_predictions_SB[data_name].keys():
        # get_entropy_array returns (entropies, preds); keep only the entropies
        entropy_array, _ = get_entropy_array(Judge_Committee_SB, 
                          entropy_version = 'cross_entropy',
                          preds = raw_predictions_SB[data_name][data_type])
        entropy_arrays_SB[(data_name, data_type)] = entropy_array

## dump it
with open('./data/entropy_arrays_SB.pkl', 'wb') as f:
    pkl.dump(entropy_arrays_SB, f)
    
entropy_arrays_B = {}
for data_name in raw_predictions_B.keys():
    for data_type in raw_predictions_B[data_name].keys():
        # get_entropy_array returns (entropies, preds); keep only the entropies
        entropy_array, _ = get_entropy_array(Judge_Committee_B, 
                          entropy_version = 'cross_entropy',
                          preds = raw_predictions_B[data_name][data_type])
        entropy_arrays_B[(data_name, data_type)] = entropy_array

## dump it
with open('./data/entropy_arrays_B.pkl', 'wb') as f:
    pkl.dump(entropy_arrays_B, f)

## See statistics

In [None]:
# ## Load entropy arrays from pickle files
# with open('./data/entropy_arrays_SB.pkl', 'rb') as f:
#     entropy_arrays_SB = pkl.load(f)
# with open('./data/entropy_arrays_B.pkl', 'rb') as f:
#     entropy_arrays_B = pkl.load(f)

plot_disagreements(entropy_arrays_SB, title = "Entropy Disagreements - SB")
for key, value in entropy_arrays_SB.items():
    print(f"{key}: {get_statistics(value)}")
plot_disagreements(entropy_arrays_B, title = "Entropy Disagreements - B")
for key, value in entropy_arrays_B.items():
    print(f"{key}: {get_statistics(value)}")

In [None]:
rows = []
for key, value in entropy_arrays_SB.items():
    # get_entropy_array returns (entropies, preds); accept either form here
    entropies = value[0] if isinstance(value, tuple) else value
    data_name, data_type = key

    if data_type == 'train':
        truth = y_train.reshape(-1)
    elif data_type == 'val':
        truth = y_val.reshape(-1)
    else:
        truth = y_test.reshape(-1)
    alpha = get_statistics(entropies)['average']  # threshold: mean entropy of this split
    loc_0 = entropies < alpha    # low-disagreement part, D^0
    loc_1 = entropies >= alpha   # high-disagreement part, D^1
    print(f"{key}: alpha = {alpha:.4f}, |D^0| = {loc_0.sum()}, |D^1| = {loc_1.sum()}")

    for member, _ in Judge_Committee_SB:
        # predicted labels of this member on the current dataset and split
        pred = np.argmax(raw_predictions_SB[data_name][data_type][member], axis=-1)
        rows.append({
            "Data": data_name,
            "Split": data_type,
            "Member": member,
            "Acc_0": float(np.mean(pred[loc_0] == truth[loc_0])),
            "Acc_1": float(np.mean(pred[loc_1] == truth[loc_1])),
        })
## make table: one row per (dataset, split, member)
table = pd.DataFrame(rows)
table

In [None]:
# select the accuracy columns (a boolean mask over columns needs .loc)
table.loc[:, table.columns.str.startswith("Acc_")]