## NEURAL ACTIVATIONS EXPLORATION

RQ1: How does the model process semantically plausible vs. implausible sentences?
Method: check whether the most influential neurons overlap between the two conditions.

* H1: there is a substantial overlap --> set of SYNTAX NEURONS
* H2: there is no substantial overlap --> SYNTAX + SEMANTICS network
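
As a rough aid for judging what counts as "substantial", the observed overlap can be compared with the overlap expected by chance. The sketch below is an illustration, not part of the original analysis: the function name `top_k_overlap` and the random demo data are hypothetical, and it assumes both effect arrays are 1-D and cover the same neurons. For two independent random sets of size k drawn from N neurons, the expected intersection is k²/N.

```python
import numpy as np


def top_k_overlap(effects_a, effects_b, k):
    """Return (observed overlap, overlap expected by chance) for the top-k neurons.

    Hypothetical helper: assumes both arrays are 1-D and index the same neurons.
    """
    n_total = len(effects_a)
    if len(effects_b) != n_total:
        raise ValueError("Both effect arrays must cover the same neurons.")
    if not 1 <= k <= n_total:
        raise ValueError("k must be between 1 and the number of neurons.")

    # Same selection rule as the notebook cell: the k largest raw scores.
    top_a = set(np.argsort(effects_a)[-k:].tolist())
    top_b = set(np.argsort(effects_b)[-k:].tolist())

    observed = len(top_a & top_b)
    # Mean of the hypergeometric distribution: k draws, k "hits" among n_total.
    expected_by_chance = k * k / n_total
    return observed, expected_by_chance


# Demo on random scores (placeholder data, not real neuron effects).
rng = np.random.default_rng(0)
demo_a = rng.normal(size=1280)
demo_b = rng.normal(size=1280)

observed, chance = top_k_overlap(demo_a, demo_b, k=50)
print(f"k=50: observed={observed}, expected by chance={chance:.2f}")

# When k equals the number of neurons, both sets contain every neuron,
# so the overlap is 100% regardless of the data.
observed_all, chance_all = top_k_overlap(demo_a, demo_b, k=1280)
print(f"k=1280: observed={observed_all}, expected by chance={chance_all:.0f}")
```

Under these assumptions, an overlap is only informative for H1 vs. H2 when it clearly exceeds the chance value for the chosen k.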

In [4]:
import numpy as np

# --- Configuration ---
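# Size of the top-N set compared across conditions.
# Caution: if N equals the total number of neurons (gpt2-large has a hidden
# size of 1280), both sets contain every neuron and the overlap is trivially 100%.
NUM_TOP_NEURONS = 1280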

# --- Load the Results ---
try:
    effects_anomalous = np.load('data/bahavioral_scores/anomalous_neuron_effects.npy')
    effects_core = np.load('data/bahavioral_scores/core_neuron_effects.npy')
except FileNotFoundError as e:
    print(f"Error: Make sure both neuron effect files exist. Missing: {e.filename}")
    raise SystemExit(1)  # stop execution of this cell

# --- Find the Top Neurons for Each Condition ---

# Get the indices of the neurons, sorted from least to most influential
sorted_indices_anomalous = np.argsort(effects_anomalous)
sorted_indices_core = np.argsort(effects_core)

# Take the last N indices to get the top N most influential
top_neurons_anomalous = set(sorted_indices_anomalous[-NUM_TOP_NEURONS:])
top_neurons_core = set(sorted_indices_core[-NUM_TOP_NEURONS:])

print(f"--- Overlap Analysis (Top {NUM_TOP_NEURONS} Neurons) ---")
print(f"\nTop influential neurons for ANOMALOUS:\n{sorted(list(top_neurons_anomalous))}")
print(f"\nTop influential neurons for CORE:\n{sorted(list(top_neurons_core))}")

# --- Calculate the Overlap ---
overlapping_neurons = top_neurons_anomalous.intersection(top_neurons_core)

overlap_percentage = (len(overlapping_neurons) / NUM_TOP_NEURONS) * 100

print("\n--- RESULTS ---")
print(f"Number of overlapping neurons: {len(overlapping_neurons)}")
print(f"Overlap percentage: {overlap_percentage:.2f}%")
print(f"\nOverlapping neuron indices:\n{sorted(list(overlapping_neurons))}")

# --- Interpretation ---
print("\n--- Interpretation ---")
if overlap_percentage > 20:
    print("Result: High overlap. This suggests the model uses a stable, core set of neurons for syntactic processing in both plausible and implausible contexts.")
elif overlap_percentage > 5:
    print("Result: Moderate overlap. This suggests some neurons are dedicated to pure syntax, while others might be involved in combined syntax+semantics processing.")
else:
    print("Result: Low to no overlap. This suggests the model may use largely separate neural pathways for processing plausible vs. implausible syntax.")

--- Overlap Analysis (Top 1280 Neurons) ---

Top influential neurons for ANOMALOUS:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 

In [12]:
import torch
import numpy as np
import os

# --- Configuration ---
neuron_effects_file = 'data/bahavioral_scores/anomalous_neuron_effects.npy'
sample_data_file = 'data/raw_activations/ANOMALOUS_chomsky_transitive_15000sampled_10-1/x_px/item_1.pt'
num_to_show = 10
# --------------------


def inspect_neuron_activations(neuron_effects_file, raw_data_file, num_to_show):
    """
    Loads neuron rankings and a sample data file, then prints the activations
    of the top-ranked neurons for that sample.
    """
    print("--- Inspecting Neuron Activations ---")

    # --- Step 1: Load the Neuron Ranking ---
    try:
        # allow_pickle=True is only needed if the file stores a pickled object
        # array; enable it only for files from a trusted source.
        neuron_effects = np.load(neuron_effects_file, allow_pickle=True)
        
    except FileNotFoundError:
        print(f"ERROR: Neuron effects file not found at '{neuron_effects_file}'")
        return

    top_neuron_indices = np.argsort(neuron_effects)[-num_to_show:][::-1]
    
    print(f"\nLoaded neuron rankings from: {neuron_effects_file}")
    print(f"Top {num_to_show} most influential neuron indices (according to the file):")
    print(top_neuron_indices)

    # --- Step 2: Load a Sample Raw Data File ---
    try:
        raw_data = torch.load(raw_data_file)
    except FileNotFoundError:
        print(f"\nERROR: Raw data file not found at '{raw_data_file}'")
        print("Please make sure the path is correct and the file exists.")
        return
        
    print(f"\nLoaded sample raw data from: {raw_data_file}")
    
    activations = raw_data['activations']['layer_-1_hx']
    
    # --- Step 3: Print the Activations for the Top Neurons ---
    print("\n--- Activations for the Top-Ranked Neurons (for this single sentence) ---")
    print("Note: These values are the MEAN activation across all tokens in this sentence.\n")
    
    for neuron_idx in top_neuron_indices:
        neuron_activation_over_time = activations[:, neuron_idx]
        mean_activation = neuron_activation_over_time.mean().item()
        
        print(f"Neuron #{neuron_idx:<4}: Mean Activation = {mean_activation:.4f}")

# --- Run the function ---
inspect_neuron_activations(neuron_effects_file, sample_data_file, num_to_show)

--- Inspecting Neuron Activations ---

Loaded neuron rankings from: results/analysis/anomalous_neuron_effects.npy
Top 10 most influential neuron indices (according to the file):
[  36  105  623  680   50 1026  816  489  326  241]

ERROR: Raw data file not found at 'results/raw_data/ANOMALOUS_chomsky_transitive_15000sampled_10-1/x_px/item_1.pt'
Please make sure the path is correct and the file exists.


**RESULTS VERIFICATION TASKS**:

* Repeat analysis with target PRODUCTION (aka GENERATION) to control for the effect of TASK
* Repeat analysis for the other corpora and sequentially separate overlaps and differences
* Repeat with a different Syntactic Processing Psycholinguistic task 


Model
That is an excellent and crucial question. Before you run the final experiment, you must understand exactly what the script is doing.
Let's break down the test_impaired_generation.py script step-by-step. Its purpose is to create a "virtual patient" by impairing a healthy model and then to test its language production.
**The Script's Journey: A Detailed Walkthrough**

**1. Setup and Configuration (`if __name__ == "__main__":`)**

Argument parsing: the script starts by defining the command-line arguments it needs:

* `--model`: which model to load (e.g., `gpt2-large`).
* `--neuron_effects_file`: the path to your `.npy` file that contains the list of influential neurons. This is the "targeting system" from your previous analysis.
* `--num_impaired_neurons`: how many of the top neurons from your list to impair (e.g., 50).
* `--alpha`: the "dampening factor." 0.5 means "reduce the activation of the target neurons by 50%." 0.0 would mean "completely silence them."
* `--num_generations`: how many example sentences to generate.

Loading the tools:

* It loads the healthy `gpt2-large` model and its tokenizer from Hugging Face.
* It loads your `neuron_effects.npy` file.
* It sorts the neuron scores and identifies the indices of the top N most influential neurons. This is your final "target list."
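
The setup step described above could look roughly like the following. This is a minimal sketch, not the actual contents of `test_impaired_generation.py`: the flag names come from the description, but the defaults, the helper names `build_parser` and `select_target_neurons`, and the demo scores are assumptions made for illustration.

```python
import argparse

import numpy as np


def build_parser():
    """Command-line interface with the five flags listed above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Impaired-generation experiment")
    parser.add_argument("--model", default="gpt2-large")
    parser.add_argument("--neuron_effects_file", required=True)
    parser.add_argument("--num_impaired_neurons", type=int, default=50)
    parser.add_argument("--alpha", type=float, default=0.5)
    parser.add_argument("--num_generations", type=int, default=10)
    return parser


def select_target_neurons(neuron_effects, num_impaired_neurons):
    """Indices of the highest-scoring neurons, most influential first."""
    order = np.argsort(neuron_effects)[::-1]
    return order[:num_impaired_neurons]


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--neuron_effects_file", "anomalous_neuron_effects.npy",
     "--num_impaired_neurons", "3"]
)

# In the real script the scores would come from np.load(args.neuron_effects_file);
# a tiny made-up array stands in for them here.
demo_effects = np.array([0.1, 0.9, 0.3, 0.7, 0.05])
targets = select_target_neurons(demo_effects, args.num_impaired_neurons)
print(targets)
```

Like the notebook cells, this ranks neurons by raw score; ranking by absolute value would be a different design choice.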
**2. The "Healthy" Generation Loop**

* The goal: to establish a baseline. We need to see what kind of language the healthy, unimpaired model produces.
* The process:
    * The script takes a simple, open-ended prompt (e.g., "The scientist discovered").
    * It feeds this prompt to the standard Hugging Face `model.generate()` function.
    * This function uses the model to predict the next word, then the next, and so on, for a set number of steps, generating a full sentence completion.
    * The generated text is saved to a list called `healthy_generations`.
    * This process is repeated for all the prompts.
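
The generation loop just described can be sketched as below. The function name `generate_completions`, the `max_new_tokens` value, and the choice of greedy decoding (`do_sample=False`) are assumptions for illustration; the actual script may sample instead. The function only relies on the standard Hugging Face tokenizer call, `generate()`, and `decode()`.

```python
def generate_completions(model, tokenizer, prompts, max_new_tokens=20):
    """Generate one completion per prompt and return them as a list of strings.

    Assumes `model` and `tokenizer` follow the Hugging Face causal-LM interface.
    Greedy decoding is an assumption made here so that healthy and impaired
    runs are directly comparable.
    """
    completions = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
        )
        completions.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return completions


# Possible usage (requires the `transformers` package and a model download):
#
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
#   model = AutoModelForCausalLM.from_pretrained("gpt2-large")
#   healthy_generations = generate_completions(
#       model, tokenizer, ["The scientist discovered"]
#   )
```

The same function can be reused unchanged for the impaired run, since the impairment lives in the model's hook rather than in the generation code.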
**3. The "Impairment" Step (The "Brain Surgery")**

* The goal: to apply the "soft lesion" to the model without permanently changing it.
* The method (`forward_pass_with_impairment_hooks`): this is the most sophisticated part.
    * It defines a small, special function called a "hook."
    * It then uses a standard PyTorch feature to attach this hook to the final layer of the Transformer model.
* What the hook does: a hook is like a sensor that activates automatically during the model's computation. Our specific hook is designed to do the following every single time the final layer produces its output:
    1. It intercepts the hidden state (the neuron activations).
    2. It finds the activations corresponding to our target neurons (e.g., neuron #36, #105, etc.).
    3. It multiplies just those specific activation values by our alpha factor (e.g., `* 0.5`).
    4. It then passes this modified, dampened hidden state on to the next part of the model.
* The result: the model is now "impaired." Its core knowledge is intact, but the output signal from its most important syntax-processing neurons is now weaker.
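
The hook mechanism described above can be illustrated on a toy layer. This is a sketch under stated assumptions, not the script's `forward_pass_with_impairment_hooks`: the factory name `make_impairment_hook` is hypothetical, and an `nn.Linear` stands in for the final Transformer block (for a Hugging Face GPT-2 model, the hook would presumably be registered on something like `model.transformer.h[-1]`). It uses PyTorch's `register_forward_hook`, where a hook that returns a value replaces the module's output.

```python
import torch
import torch.nn as nn


def make_impairment_hook(target_indices, alpha):
    """Build a forward hook that scales the chosen hidden units by `alpha`."""
    idx = torch.as_tensor(target_indices, dtype=torch.long)

    def hook(module, inputs, output):
        # Some Transformer blocks return a tuple whose first item is the hidden
        # state; plain layers return a tensor. Handle both cases.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden.clone()  # do not modify the original tensor in place
        target = idx.to(hidden.device)
        hidden[..., target] = hidden[..., target] * alpha
        if isinstance(output, tuple):
            return (hidden,) + tuple(output[1:])
        return hidden  # a returned value replaces the module's output

    return hook


torch.manual_seed(0)
layer = nn.Linear(8, 8)       # toy stand-in for the final Transformer layer
x = torch.randn(2, 3, 8)      # (batch, tokens, hidden units)

healthy = layer(x)

handle = layer.register_forward_hook(make_impairment_hook([1, 5], alpha=0.5))
impaired = layer(x)           # units 1 and 5 are now halved

handle.remove()               # detach the hook: the layer is "healthy" again
restored = layer(x)
```

Because the weights are never touched, removing the hook is enough to undo the lesion; `alpha=0.0` would silence the target units completely.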
**4. The "Impaired" Generation Loop**

* The goal: to see how the "impaired" model's language production differs from the healthy baseline.
* The process: this is identical to the "Healthy" loop. It uses the same prompts and the same `model.generate()` function. However, because the hook is now active, the activations of the syntax neurons are dampened on every forward pass, which influences the words the model chooses to generate.
* The generated text is saved to a list called `impaired_generations`.
**5. Cleanup and Saving**

* `hook_handle.remove()`: this is a critical step. It detaches the hook from the model, instantly restoring it to its original, healthy state for any future operations.
* Saving to CSV: the script creates a pandas DataFrame with three columns (`prompt`, `healthy_generation`, `impaired_generation`) and saves it to a `.csv` file. This gives you a side-by-side comparison for your final analysis.
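
The saving step might look like the sketch below. The three column names come from the description above; the helper name `build_comparison_table` and the placeholder strings are assumptions, and the CSV is written to an in-memory buffer here so the example does not touch the file system.

```python
import io

import pandas as pd


def build_comparison_table(prompts, healthy_generations, impaired_generations):
    """Side-by-side table with one row per prompt."""
    if not (len(prompts) == len(healthy_generations) == len(impaired_generations)):
        raise ValueError("All three lists must have the same length.")
    return pd.DataFrame(
        {
            "prompt": prompts,
            "healthy_generation": healthy_generations,
            "impaired_generation": impaired_generations,
        }
    )


# Placeholder strings stand in for real model output.
table = build_comparison_table(
    ["The scientist discovered"],
    ["<healthy completion>"],
    ["<impaired completion>"],
)

# In the real script this would be table.to_csv(<output path>, index=False).
buffer = io.StringIO()
table.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```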
In one sentence: The script establishes a baseline by letting the healthy model generate text, then applies a targeted, temporary "lesion" to the syntax neurons using a hook, and finally lets the impaired model generate text from the same prompts to measure the difference in their productive capabilities.