# Experiment Pipeline


**Data workflow**

question_id is the unique identifier used across the whole project; every file below is keyed by it.

- Set of questions → csv file
    - question id
    - prompt
    - groundtruth answer
    - question class
- Activations → .pt files
    - question id
    - hidden_states
    - output
- Processed activations dataframe (for faster loading) → .csv
    - rows: neuron_ids (flattened)
    - columns: question ids
- Cluster → list of tuples (.npy files)
    - [(layer_id, emb_id), ..]
- Analysis files → csv
    - question id
    - accuracy without knockout
    - accuracy with knockout
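
The relation between the flattened neuron ids (dataframe rows) and the (layer_id, emb_id) tuples stored in a cluster file can be sketched as below. This is only an illustration: the row-major flattening order, the helper names `to_flat_id` / `from_flat_id`, and the sizes are assumptions, not taken from `utils`.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes, for illustration only (not the real model dimensions).
N_LAYERS, EMB_DIM = 4, 8


def to_flat_id(layer_id, emb_id, emb_dim=EMB_DIM):
    """Map (layer_id, emb_id) to a row index, assuming row-major flattening."""
    return layer_id * emb_dim + emb_id


def from_flat_id(flat_id, emb_dim=EMB_DIM):
    """Inverse of to_flat_id: recover (layer_id, emb_id) from a row index."""
    return divmod(flat_id, emb_dim)


# A cluster is a list of (layer_id, emb_id) tuples.
cluster = [(0, 3), (2, 5), (3, 7)]
flat_ids = [to_flat_id(layer, emb) for layer, emb in cluster]

# Round-trip the cluster through a .npy file, as in the "Cluster" entry above.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, '0.npy')
    np.save(path, np.array(cluster))
    loaded = [tuple(int(v) for v in row) for row in np.load(path)]

print(flat_ids)
print(loaded)
```

Storing tuples rather than flat ids keeps the cluster files independent of the flattening order, at the cost of one conversion step when indexing the dataframe.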

**Function workflow**

- load_llm: loads an LLM for evaluation
    - input → model_paths
    - output → tokenizer and model instance
- generate_outputs_batch: takes a set of questions and generates LLM outputs; can optionally knock out a cluster
    - input → set of questions, tokenizer, llm, optional cluster npy file, activations_dir
    - output → activations, saved to activations_dir
- process_activations: loads activations_dir, applies the aggregation strategy (first/avg/last), and saves the resulting dataframe
    - input → activations_dir, set_of_questions
    - output → Processed activations dataframe
- cluster_activations: loads the processed activations dataframe, clusters it, and saves the clusters; the function name takes a _kmeans or _pca suffix for the method; optionally calculates an r2 score to rank clusters
    - input → Processed activations dataframe, cluster_kwargs
    - output → cluster npy files saved
- visualize clusters [Needs to be implemented]
    - input → activation_dir, question_id
    - output → .mp4 file with plots for each token
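
The first/avg/last aggregation strategies mentioned for process_activations can be sketched as follows. This is a minimal sketch under stated assumptions: that the aggregation runs over the token axis, that hidden states have shape (n_layers, n_tokens, emb_dim), and that the result is flattened to one vector per question. The function name `aggregate_tokens` is hypothetical and is not part of `utils`.

```python
import numpy as np


def aggregate_tokens(hidden_states, strategy='avg'):
    """Collapse the token axis of a (n_layers, n_tokens, emb_dim) array,
    then flatten to a 1-D vector of length n_layers * emb_dim."""
    if strategy == 'first':
        reduced = hidden_states[:, 0, :]       # first token only
    elif strategy == 'last':
        reduced = hidden_states[:, -1, :]      # last token only
    elif strategy == 'avg':
        reduced = hidden_states.mean(axis=1)   # mean over all tokens
    else:
        raise ValueError(f'unknown aggregation strategy: {strategy}')
    return reduced.reshape(-1)


# Toy input: 2 layers, 3 tokens, 2 embedding dimensions.
hs = np.arange(12, dtype=float).reshape(2, 3, 2)

first_vec = aggregate_tokens(hs, 'first')
last_vec = aggregate_tokens(hs, 'last')
avg_vec = aggregate_tokens(hs, 'avg')
```

Each question then contributes one such vector, which would become one column of the processed activations dataframe.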

In [None]:
import os
opj = os.path.join
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'  # comma-separated device ids, no spaces

import utils
print("Started pipeline")

# Load LLM

In [None]:
# load llm
model_path = '/home/gridsan/wzulfikar/models/vicuna-13b-v1.3'
lora_model_path = '/home/gridsan/wzulfikar/models/alpaca-lora-13b'

tokenizer, model = utils.load_llm(model_path, lora_model_path)

# Generate baseline outputs


In [None]:
dataset_dir = '/home/gridsan/wzulfikar/activations/cot_vicuna_13b/'

cot_csv_file = opj(dataset_dir, 'prompt_chain.csv')
no_cot_csv_file = opj(dataset_dir, 'prompt_no_chain.csv')
raw_activations_dir = opj(dataset_dir, 'raw')

utils.generate_outputs_batch([cot_csv_file, no_cot_csv_file], tokenizer, model, activations_dir=raw_activations_dir)


# Process activations for clustering

In [None]:
aggr_strategy = 'avg'
utils.process_activations([cot_csv_file, no_cot_csv_file], raw_activations_dir, aggr_strategy=aggr_strategy)

# Cluster using kmeans


In [None]:
activations_df_file = opj(raw_activations_dir, f'activations_{aggr_strategy}.csv')
clusters_dir = opj(dataset_dir, 'clusters')
n_clusters = 16

utils.cluster_activations_kmeans(activations_df_file, clusters_dir, cluster_kwargs={'n_clusters': n_clusters}, 
                                 calculate_significance=True)

# Generate outputs with knockout of each cluster


In [None]:
# Note: the loop starts at cluster 5, so clusters 0-4 are skipped;
# use range(n_clusters) to knock out every cluster.
for c in range(5, n_clusters):
    knockout_cluster = opj(clusters_dir, f'{c}.npy')
    knockout_activation_dir = opj(clusters_dir, f'cluster_{c}')
    utils.generate_outputs_batch([cot_csv_file, no_cot_csv_file], tokenizer, model,
                                 knockout_cluster=knockout_cluster,
                                 activations_dir=knockout_activation_dir)
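
Once the knockout runs have finished, the analysis file described in the data workflow (question id, accuracy without knockout, accuracy with knockout) could be summarised per cluster roughly as below. This is a sketch only: the column names `acc_without_knockout` and `acc_with_knockout`, the helper `accuracy_drop`, and the inline toy data are assumptions made for illustration, not the actual analysis code or results.

```python
import csv
import io

# Toy stand-in for one analysis CSV; the values are made up for illustration.
analysis_csv = io.StringIO(
    "question_id,acc_without_knockout,acc_with_knockout\n"
    "q1,1,1\n"
    "q2,1,0\n"
    "q3,0,0\n"
    "q4,1,0\n"
)


def accuracy_drop(csv_file):
    """Return (baseline accuracy, knockout accuracy, drop) averaged over questions."""
    rows = list(csv.DictReader(csv_file))
    base = sum(float(r['acc_without_knockout']) for r in rows) / len(rows)
    knocked = sum(float(r['acc_with_knockout']) for r in rows) / len(rows)
    return base, knocked, base - knocked


base, knocked, drop = accuracy_drop(analysis_csv)
print(f'baseline={base:.2f} knockout={knocked:.2f} drop={drop:.2f}')
```

Ranking clusters by this drop would give one simple way to see which knockouts hurt accuracy the most.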