**Official implementation of paper "Revisiting In-context Learning Inference Circuit in Large Language Models" (ICLR 2025)**:

### The Centroid Classifier Probing

This experiment trains centroid classifiers on the ICL hidden states and measures their accuracy, to test whether the hidden states carry enough information for the ICL task (as shown in Fig. 3). By controlling which hidden states are used for training and for prediction, the same procedure also yields the control experiment shown in Fig. 5 (Right).
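
To make the idea of a centroid classifier concrete, here is a minimal sketch on toy 2-D vectors. It assumes a nearest-centroid rule with Euclidean distance; the repository's `layered_hidden_calibration` may use a different distance or normalization, and the names `fit_centroids`, `predict`, and `accuracy` are illustrative only.

```python
from collections import defaultdict
import math

def fit_centroids(features, labels):
    """Average the feature vectors of each class into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is nearest (Euclidean) to `vec`."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))

def accuracy(centroids, features, labels):
    """Fraction of vectors whose nearest centroid carries the true label."""
    hits = sum(predict(centroids, v) == y for v, y in zip(features, labels))
    return hits / len(labels)

# Toy stand-ins for hidden states: two well-separated classes in 2-D.
train_x = [[0.0, 0.1], [0.2, -0.1], [1.0, 0.9], [0.8, 1.1]]
train_y = ["negative", "negative", "positive", "positive"]
centroids = fit_centroids(train_x, train_y)
toy_accuracy = accuracy(centroids, [[0.1, 0.0], [0.9, 1.0]], ["negative", "positive"])
```

In the experiment, one such classifier is trained per layer, so the per-layer accuracy traces how much label information each layer's hidden state holds.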

Author: Hakaze Cho, yfzhao@jaist.ac.jp, 2024/09

Organized, commented, and modified by: Hakaze Cho, 2025/01/26

**Part I: Import, Define, and Load Everything**

What you should do:
1. [Cell 1] Set the path in `os.chdir` to the directory containing the README.md file, relative to your working directory.
2. [Cell 2] Define your experiment parameters.
3. Run Cell 1 and Cell 2.
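
If you re-run the notebook often, the directory change in step 1 can be made idempotent. The sketch below is a hypothetical helper (`enter_repo` is not part of the repository) that assumes the presence of README.md is enough to identify the correct directory.

```python
import os
from pathlib import Path

def enter_repo(repo_dir, marker="README.md"):
    """chdir into `repo_dir` unless the current directory already holds `marker`."""
    if Path(marker).is_file():
        # Already inside the repository: nothing to do.
        return Path.cwd()
    target = Path(repo_dir)
    if not (target / marker).is_file():
        raise FileNotFoundError(f"{marker} not found in {target}")
    os.chdir(target)
    return Path.cwd()
```

For example, `enter_repo("ICL_Inference_Dynamics_Released")` could replace the `try`/`except` in Cell 1, and it raises an explicit error instead of silently continuing when the path is wrong.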

In [None]:
# Cell 1: Import libraries and change the working directory.

## Change the working directory
import os
try:
    # Set this to the path from your working directory to the directory containing the README.md file.
    os.chdir("ICL_Inference_Dynamics_Released") 
except OSError:
    # Catch only filesystem errors (e.g. FileNotFoundError) instead of using a bare except.
    print("Already in the correct directory or the directory does not exist.")

## Import libraries
from util import load_model_and_data, inference
import StaICC
import matplotlib.pyplot as plt
from util import layered_hidden_calibration
import functools

## Some definitions for the plots.
plt.style.use('default')
plt.rc('font',family='Cambria Math')
plt.rcParams['font.family'] = 'serif'
plt.rcParams['font.serif'] = ['Cambria Math'] + plt.rcParams['font.serif']

EXPERIMENT_PRESETS = {
    "Fig3_Orange": ("none", "none", True, False),
    "Fig3_Blue": ("last_sentence_token", "last_sentence_token", True, False),
    "Fig5_Orange": ("none", "none", True, False),
    "Fig5_Green_Solid": ("none", "label_words", True, False),
    "Fig5_Red_Solid": ("none", "label_words", False, False),
    "Fig5_Green_Dotted": ("label_words", "label_words", True, False),
    "Fig5_Red_Dotted": ("label_words", "label_words", False, False),
    "Fig5_Gray": ("none", "label_words", True, True),
}

**Instructions for the parameters `trained_token_type`, `predicted_token_type`, `corr_label`, and `no_context_baseline`:**

- For Fig. 3:
    - Orange: `trained_token_type = "none"`, `predicted_token_type = "none"`, `corr_label = True`, `no_context_baseline = False`
    - Green: `trained_token_type = "last_sentence_token"`, `predicted_token_type = "last_sentence_token"`, `corr_label = True`, `no_context_baseline = False`
- For Fig. 5 (Right): 
    - Solid curves: `trained_token_type = "none"`; Dotted curves: `trained_token_type = "label_words"`
    - Green curves: `predicted_token_type = "label_words"`, `corr_label = True`, `no_context_baseline = False`
    - Red curves: `predicted_token_type = "label_words"`, `corr_label = False`, `no_context_baseline = False`
    - Orange curve: `predicted_token_type = "none"`, `corr_label = True`, `no_context_baseline = False`
    - Gray curve: `predicted_token_type = "label_words"`, `corr_label = True`, `no_context_baseline = True`

You can either set `experiment_presets` to one of the presets defined in Cell 1, or set it to `None` and set the parameters manually.
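
Each preset is a positional 4-tuple, so the order of the fields matters. As a minimal sketch of the same lookup with named fields, the example below copies two entries from `EXPERIMENT_PRESETS`; `ProbeConfig` and `resolve` are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

class ProbeConfig(NamedTuple):  # hypothetical helper, not part of the repository
    trained_token_type: str
    predicted_token_type: str
    corr_label: bool
    no_context_baseline: bool

# Two entries copied from EXPERIMENT_PRESETS in Cell 1.
PRESETS = {
    "Fig5_Green_Solid": ProbeConfig("none", "label_words", True, False),
    "Fig5_Gray": ProbeConfig("none", "label_words", True, True),
}

def resolve(preset_name, manual):
    """Return the named preset, or the manual config when preset_name is None."""
    if preset_name is None:
        return manual
    if preset_name not in PRESETS:
        raise KeyError(f"Unknown preset {preset_name!r}; choose from {sorted(PRESETS)}")
    return PRESETS[preset_name]

manual = ProbeConfig("label_words", "label_words", True, False)
config = resolve("Fig5_Gray", manual)
```

Because `ProbeConfig` is still a tuple, it unpacks exactly like the original presets, while `config.no_context_baseline` is easier to read than `config[3]`.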

In [28]:
# Cell 2: Model and Hugging Face token configurations

## The Hugging Face model name to be tested as the LM for ICL. 
## Recommended: "meta-llama/Meta-Llama-3-8B", "EleutherAI/pythia-6.9b", "tiiuae/falcon-7b", "meta-llama/Meta-Llama-3-70B", "tiiuae/falcon-40b"
ICL_model_name = "tiiuae/falcon-7b" 

## Whether to use the quantized version of the model. 
## Recommended: Keep it default.
quantized = False if ICL_model_name in ["meta-llama/Meta-Llama-3-8B", "EleutherAI/pythia-6.9b", "tiiuae/falcon-7b"] else True

## The Hugging Face token used to access the model. You need to set this if you use a Llama model.
huggingface_token = "your token here"


# Experiment parameters

## The token on which the centroid classifier is trained. Options: "none" (forerunner token s), "label_words" (y), "last_sentence_token" (x).
trained_token_type = "label_words"

## The token on which the centroid classifier predicts. Options: "none" (forerunner token s), "label_words" (y), "last_sentence_token" (x).
predicted_token_type = "label_words" 

## Whether the **last** label in the prompt is the correct label (True) or a wrong label (False).
## Only takes effect when the corresponding `*_token_type` is "label_words".
corr_label = True

## Correct Label w/o Context. That is, use the classifier trained on normal prompts to predict on strings containing only the label token.
## Should only be True when you want the gray line in Fig. 5 (Right).
no_context_baseline = False

## Experiment Presets
## See Cell 1 for details. If not None, the parameters above are overwritten.
experiment_presets = "Fig5_Red_Dotted"

## The number of demonstrations. Recommended: 0, 1, 2, 4, 8, 12.
k = 4 

## The index of the dataset in the StaICC library. Options: 0, 1, 2, 3, 4, 5.
dataset_index = 2 

## Force the ICL_model to reload, even if it is already loaded. 
## Recommended: False.
model_forced_reload = False

In [None]:
# Cell 3: Load the experiment presets.

if experiment_presets is not None:
    print("Using the experiment presets. Overwriting the parameters.")
    trained_token_type, predicted_token_type, corr_label, no_context_baseline = EXPERIMENT_PRESETS[experiment_presets]

In [None]:
# Cell 4: Load the data and build the test inputs.

bench = StaICC.Normal(k)
prompts, queries = load_model_and_data.load_data_from_StaICC_experimentor(bench[dataset_index], predicted_token_type, corr_label)
if no_context_baseline:
    prompts = bench[dataset_index].get_label_space()

In [31]:
# Cell 5: Load the model.

# At notebook top level, vars(), locals(), and globals() are the same namespace.
if "ICL_model" not in globals() or model_forced_reload:
    ICL_model, ICL_tknz = load_model_and_data.load_ICL_model(ICL_model_name, huggingface_token = huggingface_token, quantized = quantized)
    loaded = True

**Part II: Run the Experiment**

What you should do:

1. Run Cells 6-8.

In [None]:
# Cell 6: Get the ICL hidden states that should be used for the prediction of the centroid classifier.

ICL_hidden_states = inference.ICL_inference_to_hidden_states(ICL_model, ICL_tknz, prompts)

In [None]:
# Cell 7: Train the centroid classifier, one for each layer of the ICL hidden states.

inferencer = layered_hidden_calibration.layered_hidden_calibration(
    bench[dataset_index].get_label_space(), 
    len(ICL_hidden_states), 
    trained_token_type, 
    corr_label if trained_token_type == "label_words" else True
)
inferencer.train(
    bench[dataset_index].prompt_former, 
    functools.partial(inference.ICL_inference_to_hidden_states_transposed, model = ICL_model, tokenizer = ICL_tknz),
    calibration_set = bench[dataset_index].calibration_set(),
    calibration_number = 256,
    k = k
)

In [34]:
# Cell 8: Decode the ICL hidden states with the centroid classifier, and evaluate the performance with StaICC.

inf_res = inferencer.batched_layered_inference(ICL_hidden_states)
acc_in_layer = []
for i in range(len(ICL_hidden_states)):
    acc_in_layer.append(bench[dataset_index](forward_inference = None, input_prediction = inf_res[i] + inf_res[i])[0]['accuracy'])

**Part III: Plot and Save the Result**

What you should do:

1. Run Cells 9 and 10. You can define your own file name and directory for saving the result in Cell 10.
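
The saved file is a plain pickled list with one accuracy per layer, so it can be reloaded and summarized without the model. The sketch below round-trips made-up accuracies through a temporary file; the `summarize` helper and the numbers are illustrative assumptions, not results from the paper.

```python
import pickle
import tempfile
from pathlib import Path

def summarize(acc_in_layer, chance):
    """Return (best_layer, best_accuracy, first_layer_above_chance)."""
    best_layer = max(range(len(acc_in_layer)), key=acc_in_layer.__getitem__)
    first_above = next((i for i, a in enumerate(acc_in_layer) if a > chance), None)
    return best_layer, acc_in_layer[best_layer], first_above

# Made-up accuracies for illustration only; real values come from Cell 8.
fake_acc = [0.50, 0.52, 0.71, 0.88, 0.85]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.pickle"
    with open(path, "wb") as f:
        pickle.dump(fake_acc, f)  # same format as Cell 10: list[layer_index] = accuracy
    with open(path, "rb") as f:
        loaded_acc = pickle.load(f)

best_layer, best_acc, first_above = summarize(loaded_acc, chance=0.5)
```

For a real run, pass the file name built in Cell 10 to `open` and use `1 / len(label_space)` as `chance`, matching the random baseline drawn in Cell 9.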

In [None]:
# Cell 9: Data preview.

plt.figure(figsize=(4, 3))
plt.xlabel("Transformer Block Number", fontsize = 12)
plt.ylabel("Accuracy", fontsize = 12)
plt.title("Centroid Classifier Acc. on dataset " + str(dataset_index) + "\n model: " + ICL_model_name + "\ntrained on: " + trained_token_type + ", k: " + str(k) , fontsize = 12)
plt.plot(acc_in_layer, 
         color = (("green" if corr_label else "red") if not no_context_baseline else "gray"), 
         label = ("predicted on: " + predicted_token_type) if not no_context_baseline else "Correct Label w/o Context",
         linestyle = "--" if trained_token_type == "label_words" else "-",
        )
plt.axhline(1/len(bench[dataset_index].get_label_space()), color = "black", linestyle = "--", linewidth = 1, label = "Random Baseline")
plt.legend(loc = 4, prop={'size': 9})

In [36]:
# Cell 10: Save the result.
# Result file organization:
# list[layer_index] = the accuracy predicted in layer `layer_index`.

import os
import pickle

data_file_name = "data/" + ICL_model_name.replace('/', '_')+ "," + predicted_token_type + ",HiddenC," + trained_token_type  + ',' + predicted_token_type + ',' + ("corr," if corr_label else "wrong,") + str(k) + ',' + str(dataset_index + 1) + ".pickle"
# Create the output directory if it does not exist yet.
os.makedirs(os.path.dirname(data_file_name), exist_ok = True)
with open(data_file_name, 'wb') as f:
    pickle.dump(acc_in_layer, f)