### Evaluating CPM with CEBaB
Our Causal Proxy Model (CPM) provides concept-based explanations for a blackbox model. We use the recently developed CEBaB benchmark to compare CPM with other concept-based explanation methods. This notebook evaluates CPM on the CEBaB benchmark under different settings.

More importantly, we also introduce new baselines for CPM. Formally, we evaluate the blackbox model with interchange intervention evaluation (which is introduced in detail below).
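As a rough illustration of what an interchange intervention does, the sketch below runs a toy two-stage model on a base input and a source input, then overwrites the first `h_dim` hidden units of the base run with those of the source run before computing the output. Everything here (`encode`, `classify`, `interchange_intervention`, the toy inputs) is hypothetical and only mirrors the general idea; it is not the CPM implementation used in this notebook.

```python
from typing import List


def encode(x: List[float]) -> List[float]:
    """Toy encoder: a hypothetical stand-in for a model's hidden layer."""
    return [2.0 * v for v in x]


def classify(hidden: List[float]) -> float:
    """Toy classifier head: sums the hidden units into a single score."""
    return sum(hidden)


def interchange_intervention(
    base: List[float], source: List[float], h_dim: int
) -> float:
    """Run the base input, but with its first `h_dim` hidden units
    replaced by the hidden units computed from the source input."""
    base_hidden = encode(base)
    source_hidden = encode(source)
    patched = source_hidden[:h_dim] + base_hidden[h_dim:]
    return classify(patched)


base = [1.0, 1.0, 1.0, 1.0]
source = [5.0, 5.0, 5.0, 5.0]

plain_output = classify(encode(base))
intervened_output = interchange_intervention(base, source, h_dim=2)
print(plain_output, intervened_output)
```

The difference between the intervened output and the plain output is the effect attributed to the swapped hidden units in this toy setting.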

In this notebook, we can evaluate the following models:
- CPM: `BERT-base-uncased`
- CPM: `RoBERTa-base`
- CPM: `GPT2`
- CPM: `LSTM+GloVe`
- CPM: `Control`

and we can evaluate them under the following conditions:
- 2-class
- 3-class
- 5-class

#### Imports and Libs

In [1]:
from libs import *
from modelings.modelings_bert import *
from modelings.modelings_roberta import *
from modelings.modelings_gpt2 import *
from modelings.modelings_lstm import *
"""
For evaluation, we use a single random seed, since
the models were already trained with 5 different
seeds.
"""
_ = random.seed(123)
_ = np.random.seed(123)
_ = torch.manual_seed(123)



#### Main evaluation script

In [42]:
"""
The following blocks run the CEBaB benchmark over
all combinations of the conditions below.
"""
grid = {
    "seed": [42, 66, 77, 88, 99],
    "h_dim": [75],
    "class_num": [2, 3, 5],
    "control": [True],
    "beta" : [1.0],
    "gemma" : [3.0],
    "cls_dropout" : [0.1],
    "enc_dropout" : [0.1],
    "model_arch" : ["lstm"]
}

keys, values = zip(*grid.items())
permutations_dicts = [dict(zip(keys, v)) for v in itertools.product(*values)]

device = 'cuda:8'
batch_size = 32
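
To see what the expansion above produces, here is a minimal, self-contained sketch with a smaller, hypothetical grid (`toy_grid`); the expansion logic is the same as in the cell above. Each combination becomes one dictionary, so the number of runs is the product of the list lengths.

```python
import itertools

# Hypothetical, reduced grid used only for illustration.
toy_grid = {
    "seed": [42, 66],
    "class_num": [2, 3, 5],
    "model_arch": ["lstm"],
}

# Split the grid into parallel tuples of names and candidate values,
# then build one dict per element of the Cartesian product.
keys, values = zip(*toy_grid.items())
toy_permutations = [dict(zip(keys, v)) for v in itertools.product(*values)]

print(len(toy_permutations))
print(toy_permutations[0])
```

With the full grid above (5 seeds, 3 class settings, and a single value for every other key), the same expansion yields 15 settings.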

In [43]:
results = {}
for i in range(len(permutations_dicts)):
    
    seed=permutations_dicts[i]["seed"]
    class_num=permutations_dicts[i]["class_num"]
    beta=permutations_dicts[i]["beta"]
    gemma=permutations_dicts[i]["gemma"]
    h_dim=permutations_dicts[i]["h_dim"]
    dataset_type = f'{class_num}-way'
    correction_epsilon=None
    cls_dropout=permutations_dicts[i]["cls_dropout"]
    enc_dropout=permutations_dicts[i]["enc_dropout"]
    control=permutations_dicts[i]["control"]
    model_arch=permutations_dicts[i]["model_arch"]
    
    if model_arch == "bert-base-uncased":
        model_path = "BERT-control-results" if control else "BERT-results"
        model_module = BERTForCEBaB
        explainer_module = CausalProxyModelForBERT
    elif model_arch == "roberta-base":
        model_path = "RoBERTa-control-results" if control else "RoBERTa-results"
        model_module = RoBERTaForCEBaB
        explainer_module = CausalProxyModelForRoBERTa
    elif model_arch == "gpt2":
        model_path = "gpt2-control-results" if control else "gpt2-results"
        model_module = GPT2ForCEBaB
        explainer_module = CausalProxyModelForGPT2
    elif model_arch == "lstm":
        model_path = "lstm-control-results" if control else "lstm-results"
        model_module = LSTMForCEBaB
        explainer_module = CausalProxyModelForLSTM
    else:
        raise ValueError(f"Unsupported model_arch: {model_arch}")
        
    grid_conditions=(
        ("seed", seed),
        ("class_num", class_num),
        ("beta", beta),
        ("gemma", gemma),
        ("h_dim", h_dim),
        ("dataset_type", dataset_type),
        ("correction_epsilon", correction_epsilon),
        ("cls_dropout", cls_dropout),
        ("enc_dropout", enc_dropout),
        ("control", control),
        ("model_arch", model_arch),
    )
    print("Running for this setting: ", grid_conditions)

    blackbox_model_path = f'CEBaB/{model_arch}.CEBaB.sa.{class_num}-class.exclusive.seed_{seed}'
    if control:
        cpm_model_path = blackbox_model_path
    else:
        cpm_model_path = f'../proxy_training_results/{model_path}/'\
                           f'cebab.train.train.alpha.1.0'\
                           f'.beta.{beta}.gemma.{gemma}.dim.{h_dim}.hightype.'\
                           f'{model_arch}.Proxy.'\
                           f'CEBaB.sa.{class_num}-class.exclusive.'\
                           f'mode.align.cls.dropout.{cls_dropout}.enc.dropout.{enc_dropout}.seed_{seed}'

    # load data from HF
    cebab = datasets.load_dataset(
        'CEBaB/CEBaB', use_auth_token=True,
        cache_dir="../../huggingface_cache/"
    )
    train, dev, test = preprocess_hf_dataset(
        cebab, one_example_per_world=True, 
        verbose=1, dataset_type=dataset_type
    )

    tf_model = model_module(
        blackbox_model_path, 
        device=device, 
        batch_size=batch_size
    )
    explanator = explainer_module(
        blackbox_model_path,
        cpm_model_path, 
        device=device, 
        batch_size=batch_size,
        intervention_h_dim=h_dim,
    )

    train_dataset = train.copy()
    # Note: despite its name, `dev_dataset` holds the test split.
    dev_dataset = test.copy()

    result_per_example, ATE, CEBaB_metrics, CEBaB_metrics_per_aspect_direction, \
    CEBaB_metrics_per_aspect, CaCE_per_aspect_direction, \
    ACaCE_per_aspect, performance_report = cebab_pipeline(
        tf_model, explanator, 
        train_dataset, dev_dataset, 
        dataset_type=dataset_type,
        correction_epsilon=correction_epsilon,
    )
    
    results[grid_conditions] = (
        result_per_example, ATE, CEBaB_metrics, CEBaB_metrics_per_aspect_direction, \
        CEBaB_metrics_per_aspect, CaCE_per_aspect_direction, \
        ACaCE_per_aspect, performance_report
    )

Running for this setting:  (('seed', 42), ('class_num', 2), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '2-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataset['review_majority'] = dataset['review_majority'].apply(lambda score: encoding[score])


Dropping no majority reviews: 16.6382% of train dataset.
Dropped 391 examples with a neutral label.
Dropped 452 examples with a neutral label.
Dropped 461 examples with a neutral label.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.2-class.exclusive.seed_42 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 72/72 [00:01<00:00, 57.36it/s]


Running for this setting:  (('seed', 42), ('class_num', 3), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '3-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.3-class.exclusive.seed_42 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 57.93it/s]


Running for this setting:  (('seed', 42), ('class_num', 5), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '5-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.5-class.exclusive.seed_42 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 58.85it/s]


Running for this setting:  (('seed', 66), ('class_num', 2), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '2-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
Dropped 391 examples with a neutral label.
Dropped 452 examples with a neutral label.
Dropped 461 examples with a neutral label.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.2-class.exclusive.seed_66 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 72/72 [00:01<00:00, 57.32it/s]


Running for this setting:  (('seed', 66), ('class_num', 3), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '3-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.3-class.exclusive.seed_66 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 58.39it/s]


Running for this setting:  (('seed', 66), ('class_num', 5), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '5-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.5-class.exclusive.seed_66 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 57.27it/s]


Running for this setting:  (('seed', 77), ('class_num', 2), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '2-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
Dropped 391 examples with a neutral label.
Dropped 452 examples with a neutral label.
Dropped 461 examples with a neutral label.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.2-class.exclusive.seed_77 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 72/72 [00:01<00:00, 54.95it/s]


Running for this setting:  (('seed', 77), ('class_num', 3), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '3-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.3-class.exclusive.seed_77 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 59.03it/s]


Running for this setting:  (('seed', 77), ('class_num', 5), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '5-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.5-class.exclusive.seed_77 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 56.95it/s]


Running for this setting:  (('seed', 88), ('class_num', 2), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '2-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
Dropped 391 examples with a neutral label.
Dropped 452 examples with a neutral label.
Dropped 461 examples with a neutral label.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.2-class.exclusive.seed_88 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 72/72 [00:01<00:00, 55.55it/s]


Running for this setting:  (('seed', 88), ('class_num', 3), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '3-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.3-class.exclusive.seed_88 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 56.13it/s]


Running for this setting:  (('seed', 88), ('class_num', 5), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '5-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.5-class.exclusive.seed_88 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 57.55it/s]


Running for this setting:  (('seed', 99), ('class_num', 2), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '2-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
Dropped 391 examples with a neutral label.
Dropped 452 examples with a neutral label.
Dropped 461 examples with a neutral label.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.2-class.exclusive.seed_99 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 72/72 [00:01<00:00, 55.52it/s]


Running for this setting:  (('seed', 99), ('class_num', 3), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '3-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.3-class.exclusive.seed_99 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 56.66it/s]


Running for this setting:  (('seed', 99), ('class_num', 5), ('beta', 1.0), ('gemma', 3.0), ('h_dim', 75), ('dataset_type', '5-way'), ('correction_epsilon', None), ('cls_dropout', 0.1), ('enc_dropout', 0.1), ('control', True), ('model_arch', 'lstm'))


Using custom data configuration CEBaB--CEBaB-0e2f7ed67c9d7e55
Reusing dataset parquet (../../huggingface_cache/parquet/CEBaB--CEBaB-0e2f7ed67c9d7e55/0.0.0/0b6d5799bb726b24ad7fc7be720c170d8e497f575d02d47537de9a5bac074901)


  0%|          | 0/4 [00:00<?, ?it/s]

Dropping no majority reviews: 16.6382% of train dataset.
intervention_h_dim=75


Some weights of IITLSTMForSequenceClassification were not initialized from the model checkpoint at CEBaB/lstm.CEBaB.sa.5-class.exclusive.seed_99 and are newly initialized: ['multitask_classifier.out_proj.weight', 'multitask_classifier.dense.weight', 'multitask_classifier.dense.bias', 'multitask_classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
100%|██████████| 124/124 [00:02<00:00, 56.95it/s]


#### Tabulate your results

In [44]:
important_keys = list(grid.keys())
values = []
for k, v in results.items():
    _values = []
    for ik in important_keys:
        _values.append(dict(k)[ik])
    _values.append(v[2]["ICaCE-L2"].iloc[0])
    _values.append(v[2]["ICaCE-cosine"].iloc[0])
    _values.append(v[2]["ICaCE-normdiff"].iloc[0])
    _values.append(v[-1].iloc[0, 0])
    values.append(_values)
important_keys.extend(["ICaCE-L2", "ICaCE-cosine", "ICaCE-normdiff", "macro-f1"])
df = pd.DataFrame(values, columns=important_keys)
df.sort_values(by=['class_num'], ascending=True)

| | seed | h_dim | class_num | control | beta | gemma | cls_dropout | enc_dropout | model_arch | ICaCE-L2 | ICaCE-cosine | ICaCE-normdiff | macro-f1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 42 | 75 | 2 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.2832 | 0.8827 | 0.2749 | 0.943242 |
| 3 | 66 | 75 | 2 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.2838 | 0.8762 | 0.2792 | 0.94499 |
| 6 | 77 | 75 | 2 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.2727 | 0.8649 | 0.2645 | 0.943278 |
| 9 | 88 | 75 | 2 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.2776 | 0.8884 | 0.27 | 0.936722 |
| 12 | 99 | 75 | 2 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.2825 | 0.8753 | 0.2753 | 0.934246 |
| 1 | 42 | 75 | 3 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.5214 | 0.7691 | 0.4401 | 0.745765 |
| 4 | 66 | 75 | 3 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.5266 | 0.7779 | 0.4572 | 0.753937 |
| 7 | 77 | 75 | 3 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.5352 | 0.7906 | 0.4685 | 0.751368 |
| 10 | 88 | 75 | 3 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.5336 | 0.7884 | 0.4706 | 0.74742 |
| 13 | 99 | 75 | 3 | True | 1.0 | 3.0 | 0.1 | 0.1 | lstm | 0.5327 | 0.7882 | 0.4578 | 0.748613 |
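
Since each setting is run with several seeds, it is often convenient to report the mean and standard deviation per `class_num`. The sketch below does this in plain Python for the `macro-f1` column, using the values copied from the rows shown above; the `macro_f1` dictionary and the `summarize` helper are illustrative names, not part of the notebook's code.

```python
import statistics

# macro-f1 values copied from the rows above, grouped by class_num.
macro_f1 = {
    2: [0.943242, 0.94499, 0.943278, 0.936722, 0.934246],
    3: [0.745765, 0.753937, 0.751368, 0.74742, 0.748613],
}


def summarize(scores):
    """Return (mean, sample standard deviation) across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


summary = {class_num: summarize(scores) for class_num, scores in macro_f1.items()}

for class_num, (mean, std) in summary.items():
    print(f"{class_num}-class macro-f1: {mean:.4f} +/- {std:.4f}")
```

On the `df` built above, `df.groupby("class_num")["macro-f1"].agg(["mean", "std"])` should give the same numbers, since pandas also uses the sample standard deviation by default.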


#### Save your results, then load them again to tabulate all results together

In [45]:
output_name = input("Please give an output file name: ")

output_directory = f'../proxy_training_results/{model_path}/'
output_filename = os.path.join(output_directory, f'{output_name}.pkl')
print("Writing to file: ", output_filename)
with open(output_filename, 'wb') as f:
    pickle.dump(results, f)

Please give an output file name: results
Writing to file:  ../proxy_training_results/lstm-control-results/results.pkl
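
To load saved results back and tabulate them together, something like the following sketch could be used. It round-trips a small, made-up `results`-style dictionary through a temporary file so that it runs on its own; in practice you would pass the `.pkl` files written above instead. The helper `load_results` and the toy payload are assumptions for illustration only.

```python
import os
import pickle
import tempfile


def load_results(paths):
    """Merge several pickled result dictionaries into one dictionary."""
    merged = {}
    for path in paths:
        with open(path, "rb") as f:
            merged.update(pickle.load(f))
    return merged


# Toy stand-in for `results`: keys are tuples of (name, value) pairs,
# mirroring `grid_conditions`; the payload here is a placeholder.
toy_results = {
    (("seed", 42), ("class_num", 2)): {"note": "placeholder payload"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy_results, f)
    loaded = load_results([path])

# Each key converts back to a dict of conditions, as in the tabulation cell.
for conditions, payload in loaded.items():
    print(dict(conditions)["seed"], payload)
```

Because the real payloads appear to contain pandas objects, loading them presumably requires a compatible pandas version in the loading environment.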
