# Reproducing the CB-LLMs Paper

### Clone the [Repo](https://github.com/neil-dandekar/capstone)

In [None]:
!git clone https://github.com/neil-dandekar/capstone.git

fatal: destination path 'capstone' already exists and is not an empty directory.
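
The `fatal: destination path 'capstone' already exists` message appears whenever this cell is re-run in a session where the repository has already been cloned. A minimal sketch of an idempotent clone follows; `clone_if_missing` is a hypothetical helper, not part of the repository, and the `runner` parameter exists only so the logic can be exercised without network access.

```python
import subprocess
from pathlib import Path


def clone_if_missing(url, dest, runner=subprocess.run):
    """Clone `url` into `dest` unless `dest` already exists.

    Returns True if a clone was attempted, False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists():
        print(f"{dest} already exists, skipping clone")
        return False
    # check=True raises CalledProcessError if git exits with a non-zero status
    runner(["git", "clone", url, str(dest)], check=True)
    return True


# Example (requires network access):
# clone_if_missing("https://github.com/neil-dandekar/capstone.git", "capstone")
```

Skipping the clone keeps whatever is already on disk, so a stale checkout would still need a manual `git pull`.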


In [None]:
import os
os.chdir("/content/capstone/classification")

In [None]:
!pip install -r requirements.txt



## Setup (Text Classification)

### Clone Checkpoints

In [None]:
!git lfs install
!git clone https://huggingface.co/cesun/cbllm-classification temp_repo
!mv temp_repo/mpnet_acs .
!rm -rf temp_repo

Updated git hooks.
Git LFS initialized.
Cloning into 'temp_repo'...
remote: Enumerating objects: 138, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (3/3), done.
remote: Total 138 (delta 0), reused 0 (delta 0), pack-reused 135 (from 1)
Receiving objects: 100% (138/138), 20.01 KiB | 10.01 MiB/s, done.
Filtering content: 100% (117/117), 9.05 GiB | 156.52 MiB/s, done.
mv: cannot move 'temp_repo/mpnet_acs' to './mpnet_acs': Directory not empty
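
The `mv` failure above (`Directory not empty`) occurs because `mpnet_acs` was already in place from an earlier run. One way to make the step repeatable is to move the directory only when the destination is absent. This is a sketch under that assumption; `move_dir_once` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def move_dir_once(src, dest):
    """Move directory `src` to `dest`, leaving an existing `dest` untouched.

    Returns True if the move happened, False if `dest` was already present.
    """
    src, dest = Path(src), Path(dest)
    if dest.exists():
        print(f"{dest} already present, keeping the existing copy")
        return False
    shutil.move(str(src), str(dest))
    return True


# Example, mirroring the shell cell above:
# move_dir_once("temp_repo/mpnet_acs", "mpnet_acs")
# shutil.rmtree("temp_repo", ignore_errors=True)
```

Keeping the existing copy avoids re-downloading about 9 GiB, but it also preserves an incomplete directory if an earlier download was interrupted; deleting `mpnet_acs` first is the alternative when a fresh copy is wanted.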


## Run Evaluations

### Table 2: Test Accuracy of CB-LLMs for Text Classification



The code below replicates Table 2 of the paper.

The following command runs a single evaluation from Table 2 (CB-LLM w/ ACC on SST2):

In [None]:
# !python test_CBLLM.py --cbl_path mpnet_acs/SetFit_sst2/roberta_cbm/cbl_acc.pt

We automate this process by running all commands, collecting the outputs, and displaying them in a format like Table 2.

In [None]:
import subprocess
import json
import pandas as pd
import re

def run(cmd):
    print(f"\n=== Running: {cmd} ===\n")
    out = subprocess.check_output(cmd, shell=True, text=True)
    print(out)

    # extract accuracy from patterns like:
    # {'accuracy': 0.9406919275123559}
    match = re.search(r"\{\'accuracy\':\s*([0-9\.]+)\}", out)
    if match:
        return float(match.group(1))
    else:
        raise ValueError("Could not parse accuracy from output.")

results = []

In [None]:
BASE_CBL = "mpnet_acs"
BASE_TBM = "baseline_models/tbmc3m"
BASE_ROB = "baseline_models/roberta"

commands = [
    # CB-LLM (no ACC)
    ("CB-LLM", "SST2",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/SetFit_sst2/roberta_cbm/cbl.pt"),
    ("CB-LLM", "YelpP",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/yelp_polarity/roberta_cbm/cbl.pt"),
    ("CB-LLM", "AGnews",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/ag_news/roberta_cbm/cbl.pt"),
    ("CB-LLM", "DBpedia",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/dbpedia_14/roberta_cbm/cbl.pt"),

    # CB-LLM w/ ACC
    ("CB-LLM w/ ACC", "SST2",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/SetFit_sst2/roberta_cbm/cbl_acc.pt"),
    ("CB-LLM w/ ACC", "YelpP",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/yelp_polarity/roberta_cbm/cbl_acc.pt"),
    ("CB-LLM w/ ACC", "AGnews",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/ag_news/roberta_cbm/cbl_acc.pt"),
    ("CB-LLM w/ ACC", "DBpedia",
     f"python test_CBLLM.py --cbl_path {BASE_CBL}/dbpedia_14/roberta_cbm/cbl_acc.pt"),

    # TBM & C3M
    ("TBM&C3M", "SST2",
     f"python test_black_box.py --model_path {BASE_TBM}/backbone_finetuned_sst2.pt"),
    ("TBM&C3M", "YelpP",
     f"python test_black_box.py --model_path {BASE_TBM}/backbone_finetuned_yelp_polarity.pt"),
    ("TBM&C3M", "AGnews",
     f"python test_black_box.py --model_path {BASE_TBM}/backbone_finetuned_ag_news.pt"),
    ("TBM&C3M", "DBpedia",
     f"python test_black_box.py --model_path {BASE_TBM}/backbone_finetuned_dbpedia_14.pt"),

    # Roberta black-box baseline
    ("Roberta", "SST2",
     f"python test_black_box.py --model_path {BASE_ROB}/backbone_finetuned_sst2.pt"),
    ("Roberta", "YelpP",
     f"python test_black_box.py --model_path {BASE_ROB}/backbone_finetuned_yelp_polarity.pt"),
    ("Roberta", "AGnews",
     f"python test_black_box.py --model_path {BASE_ROB}/backbone_finetuned_ag_news.pt"),
    ("Roberta", "DBpedia",
     f"python test_black_box.py --model_path {BASE_ROB}/backbone_finetuned_dbpedia_14.pt"),
]


In [None]:
results = []

for method, dataset, cmd in commands:
    acc = run(cmd)
    results.append((method, dataset, acc))


=== Running: python test_CBLLM.py --cbl_path mpnet_acs/SetFit_sst2/roberta_cbm/cbl.pt ===

loading data...
test data len:  1821
tokenizing...
creating loader...
preparing backbone(roberta)+CBL...
get concept features...
{'accuracy': 0.9011532125205931}


=== Running: python test_CBLLM.py --cbl_path mpnet_acs/yelp_polarity/roberta_cbm/cbl.pt ===

loading data...
test data len:  38000
tokenizing...
creating loader...
preparing backbone(roberta)+CBL...
get concept features...
{'accuracy': 0.9312105263157895}


=== Running: python test_CBLLM.py --cbl_path mpnet_acs/ag_news/roberta_cbm/cbl.pt ===

loading data...
test data len:  7600
tokenizing...
creating loader...
preparing backbone(roberta)+CBL...
get concept features...
{'accuracy': 0.900921052631579}


=== Running: python test_CBLLM.py --cbl_path mpnet_acs/dbpedia_14/roberta_cbm/cbl.pt ===

loading data...
test data len:  70000
tokenizing...
creating loader...
preparing backbone(roberta)+CBL...
get concept features...
{'accuracy': 0.9

CalledProcessError: Command 'python test_black_box.py --model_path baseline_models/tbmc3m/backbone_finetuned_sst2.pt' returned non-zero exit status 1.
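
The `CalledProcessError` above reports only the exit status, so the underlying traceback from `test_black_box.py` is lost and the loop stops at the first failing command. A minimal sketch of a more forgiving runner follows. It assumes the same `{'accuracy': ...}` output format seen in the logs above; `run_safe` is a hypothetical name, and returning `None` on failure is a design choice of this sketch rather than the notebook's behavior.

```python
import re
import subprocess

# Matches lines such as: {'accuracy': 0.9406919275123559}
ACC_PATTERN = re.compile(r"\{'accuracy':\s*([0-9.]+)\}")


def run_safe(cmd):
    """Run `cmd` in a shell; return (accuracy, stderr).

    accuracy is None if the command fails or prints no accuracy line.
    """
    proc = subprocess.run(cmd, shell=True, text=True, capture_output=True)
    if proc.returncode != 0:
        # Surface the traceback instead of only the exit status
        print(f"FAILED ({proc.returncode}): {cmd}\n{proc.stderr}")
        return None, proc.stderr
    match = ACC_PATTERN.search(proc.stdout)
    accuracy = float(match.group(1)) if match else None
    return accuracy, proc.stderr
```

With this variant the loop can record `None` for a failed baseline and continue, so one missing checkpoint does not discard the remaining evaluations.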

In [None]:
results

[('CB-LLM', 'SST2', 0.9011532125205931),
 ('CB-LLM', 'YelpP', 0.9312105263157895),
 ('CB-LLM', 'AGnews', 0.900921052631579),
 ('CB-LLM', 'DBpedia', 0.9830714285714286),
 ('CB-LLM w/ ACC', 'SST2', 0.9406919275123559),
 ('CB-LLM w/ ACC', 'YelpP', 0.9805789473684211),
 ('CB-LLM w/ ACC', 'AGnews', 0.9452631578947368),
 ('CB-LLM w/ ACC', 'DBpedia', 0.9927571428571429)]

In [None]:
df = pd.DataFrame(results, columns=["Method", "Dataset", "Accuracy"])
pivot = df.pivot(index="Method", columns="Dataset", values="Accuracy")

pivot = pivot[["SST2", "YelpP", "AGnews", "DBpedia"]]
df.to_csv("table2_results.csv", index=False)

pivot

| Method | SST2 | YelpP | AGnews | DBpedia |
|:--|--:|--:|--:|--:|
| CB-LLM | 0.901153 | 0.931211 | 0.900921 | 0.983071 |
| CB-LLM w/ ACC | 0.940692 | 0.980579 | 0.945263 | 0.992757 |


### Table 3: Human Evaluations for Task 1 (Activation Faithfulness)

In [None]:
# !python print_concept_activations.py --cbl_path mpnet_acs/SetFit_sst2/roberta_cbm/cbl_acc.pt

In [None]:
import subprocess, os, time

MODELS = {
    "SST2":      "mpnet_acs/SetFit_sst2/roberta_cbm/cbl_acc.pt",
    "YelpP":     "mpnet_acs/yelp_polarity/roberta_cbm/cbl_acc.pt",
    "AGnews":    "mpnet_acs/ag_news/roberta_cbm/cbl_acc.pt",
    "DBpedia":   "mpnet_acs/dbpedia_14/roberta_cbm/cbl_acc.pt",
}

def run(cmd):
    print(f"\n=== Running: {cmd} ===\n")
    out = subprocess.run(cmd, shell=True, text=True, capture_output=True)
    if out.stdout: print(out.stdout)
    if out.stderr: print(out.stderr)

def concept_file_path(model_path):
    acs = model_path.split("/")[0]
    dataset = model_path.split("/")[1]
    backbone = model_path.split("/")[2]
    return f"{acs}/{dataset}/{backbone}/Concept_activation_acc.txt"

In [None]:
# 1) Generate concept activation text output for all datasets
for name, path in MODELS.items():
    run(f"python print_concept_activations.py --cbl_path {path}")
    print(f"✓ Completed concept extraction for {name}")


=== Running: python print_concept_activations.py --cbl_path mpnet_acs/SetFit_sst2/roberta_cbm/cbl_acc.pt ===



KeyboardInterrupt: 

In [None]:
# 2) Display where the files were written
print("\n=== Output Files Saved At ===\n")
for name, path in MODELS.items():
    print(f"{name}: {concept_file_path(path)}")


=== Output Files Saved At ===

SST2: mpnet_acs/SetFit_sst2/roberta_cbm/Concept_activation_acc.txt
YelpP: mpnet_acs/yelp_polarity/roberta_cbm/Concept_activation_acc.txt
AGnews: mpnet_acs/ag_news/roberta_cbm/Concept_activation_acc.txt
DBpedia: mpnet_acs/dbpedia_14/roberta_cbm/Concept_activation_acc.txt
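
The listing above prints only the expected paths; it does not confirm that the files were written, and the extraction cell before it was interrupted. A small existence check can make that explicit. This is a sketch; `report_outputs` is a hypothetical helper.

```python
from pathlib import Path


def report_outputs(paths):
    """Return {name: bool} telling whether each expected output file exists."""
    status = {}
    for name, path in paths.items():
        exists = Path(path).is_file()
        status[name] = exists
        print(f"{name}: {path} [{'found' if exists else 'MISSING'}]")
    return status


# Example with the notebook's own names:
# report_outputs({name: concept_file_path(p) for name, p in MODELS.items()})
```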


### Table 4: Human Evaluations for Task 2 (Contribution Faithfulness)


In [None]:
import subprocess, os, time

MODELS = {
    "SST2":      "mpnet_acs/SetFit_sst2/roberta_cbm/cbl_acc.pt",
    "YelpP":     "mpnet_acs/yelp_polarity/roberta_cbm/cbl_acc.pt",
    "AGnews":    "mpnet_acs/ag_news/roberta_cbm/cbl_acc.pt",
    "DBpedia":   "mpnet_acs/dbpedia_14/roberta_cbm/cbl_acc.pt",
}
# python print_concept_contributions.py --cbl_path mpnet_acs/SetFit_sst2/roberta_cbm/cbl_acc.pt

def run(cmd):
    print(f"\n=== Running: {cmd} ===\n")
    out = subprocess.run(cmd, shell=True, text=True, capture_output=True)
    if out.stdout: print(out.stdout)
    if out.stderr: print(out.stderr)

def concept_file_path(model_path):
    acs = model_path.split("/")[0]
    dataset = model_path.split("/")[1]
    backbone = model_path.split("/")[2]
    return f"{acs}/{dataset}/{backbone}/Concept_activation_acc.txt"

In [None]:
# 1) Generate concept contribution text output for all datasets
for name, path in MODELS.items():
    run(f"python print_concept_contributions.py --cbl_path {path}")
    print(f"✓ Completed concept contribution extraction for {name}")

## Setup (Text Generation)

In [None]:
os.chdir("/content/capstone/generation")

In [None]:
!pip install -r requirements.txt



In [None]:
!git lfs install
!git clone https://huggingface.co/cesun/cbllm-generation temp_repo
!mv temp_repo/from_pretained_llama3_lora_cbm .
!rm -rf temp_repo

Updated git hooks.
Git LFS initialized.
Cloning into 'temp_repo'...
remote: Enumerating objects: 43, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (40/40), done.
remote: Total 43 (delta 9), reused 0 (delta 0), pack-reused 3 (from 1)
Unpacking objects: 100% (43/43), 8.81 KiB | 1.26 MiB/s, done.
Filtering content: 100% (12/12), 3.54 GiB | 109.38 MiB/s, done.


### Table 5: Accuracy, Steerability, and Perplexity of CB-LLMs for Text Generation

In [None]:
!python test_CBLLM.py --cbl_path from_pretained_llama3_lora_cbm/SetFit_sst2/roberta_cbm/cbl_acc.pt


python3: can't open file '/content/capstone/generation/test_CBLLM.py': [Errno 2] No such file or directory


In [None]:
!python train_classifier.py --dataset sst2
!python test_steerability.py --dataset sst2

Traceback (most recent call last):
  File "/content/capstone/generation/train_classifier.py", line 11, in <module>
    from langdetect import detect
ModuleNotFoundError: No module named 'langdetect'
2025-11-17 23:39:59.713141: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-11-17 23:39:59.732456: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1763422799.754231   44904 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1763422799.760983   44904 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory
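
The `ModuleNotFoundError` above shows that `langdetect` was not installed in this environment when `train_classifier.py` ran. A quick pre-flight check can catch such gaps before a long run starts. This is a minimal sketch; `missing_modules` is a hypothetical helper, and the list of names is an assumption to extend as needed.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# `langdetect` is the module named in the traceback above.
missing = missing_modules(["langdetect"])
if missing:
    print("Install before running:", " ".join(missing))
```

The check works on import names, which can differ from the package names used by `pip`; for `langdetect` the two coincide.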

In [None]:
import os
import subprocess
import pandas as pd


DATASETS = ["sst2", "yelp_polarity", "ag_news", "dbpedia_14"]

MODELS = {
    "cbllm_adv": "from_pretained_llama3_lora_cbm",
    "cbllm_noadv": "from_pretained_llama3_lora_cbm_noadv",
    "llama3_blackbox": "from_pretained_llama3_lora_blackbox"
}

RESULTS = []

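def run_cmd(cmd):
    """Run a shell command; return the last stdout line that parses as a float, or None."""
    print(f"\n Running: {cmd}")
    out = subprocess.run(cmd, shell=True, text=True, capture_output=True)
    print(out.stdout)
    if out.returncode != 0:
        # Show the traceback; otherwise a failed script silently yields None
        print(out.stderr)
    for line in reversed(out.stdout.splitlines()):
        try:
            return float(line.strip())
        except ValueError:
            continue
    return None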

In [None]:
for model_key, ckpt_dir in MODELS.items():
    print(f"\n=====================================")
    print(f" Evaluating Model: {model_key}")
    print(f"=====================================")

    for dataset in DATASETS:
        print(f"\n---- Dataset: {dataset} ----")

        # Copies correct checkpoint files into expected names:
        subprocess.run(f"cp {ckpt_dir}/{dataset}/adapter_model.bin llama3", shell=True)
        subprocess.run(f"cp {ckpt_dir}/{dataset}/cbl.pt cbl.pt", shell=True)

        # ---------- Accuracy ----------
        acc = run_cmd(f"python test_concepts.py --dataset {dataset}")

        # ---------- Steerability (skip for black-box) ----------
        steer = None
        if model_key != "llama3_blackbox":
            run_cmd(f"python train_classifier.py --dataset {dataset}")
            steer = run_cmd(f"python test_steerability.py --dataset {dataset}")

        # ---------- Perplexity ----------
        ppl = run_cmd(f"python test_perplexity.py --dataset {dataset}")

        RESULTS.append({
            "Model": model_key,
            "Dataset": dataset,
            "Accuracy": acc,
            "Steerability": steer,
            "Perplexity": ppl
        })


 Evaluating Model: cbllm_adv

---- Dataset: sst2 ----

 Running: python test_concepts.py --dataset sst2
loading data...
test data len:  1821
tokenizing...


 Running: python train_classifier.py --dataset sst2


 Running: python test_steerability.py --dataset sst2


 Running: python test_perplexity.py --dataset sst2


---- Dataset: yelp_polarity ----

 Running: python test_concepts.py --dataset yelp_polarity
loading data...
test data len:  38000
tokenizing...


 Running: python train_classifier.py --dataset yelp_polarity


 Running: python test_steerability.py --dataset yelp_polarity


 Running: python test_perplexity.py --dataset yelp_polarity


---- Dataset: ag_news ----

 Running: python test_concepts.py --dataset ag_news
loading data...
test data len:  7600
tokenizing...


 Running: python train_classifier.py --dataset ag_news


 Running: python test_steerability.py --dataset ag_news


 Running: python test_perplexity.py --dataset ag_news


---- Dataset: dbpedia_14 ----

 Running: 

In [None]:
df = pd.DataFrame(RESULTS)
pivot = df.pivot(index="Model", columns="Dataset", values=["Accuracy", "Steerability", "Perplexity"])

df.to_csv("table5_results.csv", index=False)
pivot


| Model | Accuracy: ag_news | Accuracy: dbpedia_14 | Accuracy: sst2 | Accuracy: yelp_polarity | Steerability: ag_news | Steerability: dbpedia_14 | Steerability: sst2 | Steerability: yelp_polarity | Perplexity: ag_news | Perplexity: dbpedia_14 | Perplexity: sst2 | Perplexity: yelp_polarity |
|:--|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| cbllm_adv | | | | | | | | | | | | |
| cbllm_noadv | | | | | | | | | | | | |
| llama3_blackbox | | | | | | | | | | | | |
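
Every cell in the table above is empty, which is what `run_cmd` yields when no stdout line parses as a float. The sketch below isolates that parsing step, using hypothetical sample outputs, to show both the successful and the empty case. The log does not confirm what the generation scripts print on success, so the bare-number format is an assumption.

```python
def last_float(text):
    """Return the last line of `text` that parses as a float, else None."""
    for line in reversed(text.splitlines()):
        try:
            return float(line.strip())
        except ValueError:
            continue
    return None


print(last_float("loading data...\n0.91\n"))           # → 0.91
print(last_float("loading data...\ntokenizing...\n"))  # → None
print(last_float("{'accuracy': 0.9}"))                 # → None
```

The last case matters: a dictionary-style line such as the one printed by the classification scripts would not be picked up by this parser, so a script has to print the metric alone on a line for the value to reach the table.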
