# Extracting Top Activating Text Excerpts for Neurons 

This notebook shows how to extract the text excerpts that most strongly activate a specified neuron. It uses [NeuroX](https://neurox.qcri.org/), a framework for interpreting deep NLP models and making their inner workings and predictions more transparent. The resulting top activating excerpts can then be used to build a Neuron Record, from which an explanation for that neuron can be generated.

# Import Libraries

In [None]:
import neurox.data.extraction.transformers_extractor as transformers_extractor
from neurox.data import loader
import numpy as np

# Inspect Toy Data

The toy data file contains random sentences generated using ChatGPT.

In [None]:
!head -n 10 toy-data.txt

The sun rises in the east.
Birds fly across the sky.
The river flows calmly.
Children play in the park.
Books are knowledge.
The wind blows.
The cat meows.
Cars drive on the road.
The moon shines brightly.
The ocean waves crash.


# Define Variables For Activation Extraction

In [None]:
data_path = "toy-data.txt"
activations_path = "activations.json"
model = "bert-base-cased"
max_sent_l = 512  # maximum sentence length passed to the data loader

# Extract Activations 

We extract the activations with the [NeuroX transformers extractor](https://neurox.qcri.org/docs/neurox.data.extraction.html).

In [None]:
transformers_extractor.extract_representations(model, data_path, activations_path)

Loading model: bert-base-cased


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Reading input corpus
Preparing output file
Extracting representations from model
Sentence         : "The sun rises in the east."
Original    (006): ['The', 'sun', 'rises', 'in', 'the', 'east.']
Tokenized   (009): ['[CLS]', 'The', 'sun', 'rises', 'in', 'the', 'east', '.', '[SEP]']
Filtered   (007): ['The', 'sun', 'rises', 'in', 'the', 'east', '.']
Detokenized (006): ['The', 'sun', 'rises', 'in', 'the', 'east.']
Counter: 7
Hidden states:  (13, 6, 768)
# Extracted words:  6
Sentence         : "Birds fly across the sky."
Original    (005): ['Birds', 'fly', 'across', 'the', 'sky.']
Tokenized   (008): ['[CLS]', 'Birds', 'fly', 'across', 'the', 'sky', '.', '[SEP]']
Filtered   (006): ['Birds', 'fly', 'across', 'the', 'sky', '.']
Detokenized (005): ['Birds', 'fly', 'across', 'the', 'sky.']
Counter: 6
Hidden states:  (13, 5, 768)
# Extracted words:  5
Sentence         : "The river flows calmly."
Original    (004): ['The', 'river', 'flows', 'calmly.']
Tokenized   (007): ['[CLS]', 'The', 'river', 

Sentence         : "The flower is beautiful."
Original    (004): ['The', 'flower', 'is', 'beautiful.']
Tokenized   (007): ['[CLS]', 'The', 'flower', 'is', 'beautiful', '.', '[SEP]']
Filtered   (005): ['The', 'flower', 'is', 'beautiful', '.']
Detokenized (004): ['The', 'flower', 'is', 'beautiful.']
Counter: 5
Hidden states:  (13, 4, 768)
# Extracted words:  4
Sentence         : "The sunset is stunning."
Original    (004): ['The', 'sunset', 'is', 'stunning.']
Tokenized   (007): ['[CLS]', 'The', 'sunset', 'is', 'stunning', '.', '[SEP]']
Filtered   (005): ['The', 'sunset', 'is', 'stunning', '.']
Detokenized (004): ['The', 'sunset', 'is', 'stunning.']
Counter: 5
Hidden states:  (13, 4, 768)
# Extracted words:  4
Sentence         : "The mountain is pretty."
Original    (004): ['The', 'mountain', 'is', 'pretty.']
Tokenized   (007): ['[CLS]', 'The', 'mountain', 'is', 'pretty', '.', '[SEP]']
Filtered   (005): ['The', 'mountain', 'is', 'pretty', '.']
Detokenized (004): ['The', 'mountain', 'is', 
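In the log above, each sentence has more filtered subword pieces than detokenized words (for example 7 pieces but 6 words for the first sentence), and the hidden states have one row per word. The following is a minimal sketch of one way such a merge can work, on synthetic vectors rather than real model output. Keeping the vector of the last subword piece of each word is an assumption made for illustration, and `aggregate_last` is a hypothetical helper, not a NeuroX function.

```python
import numpy as np

def aggregate_last(subword_vectors, word_lengths):
    """Keep the vector of the last subword piece of each word (assumed strategy)."""
    word_vectors = []
    end = 0
    for length in word_lengths:
        end += length  # index one past the last piece of the current word
        word_vectors.append(subword_vectors[end - 1])
    return np.stack(word_vectors)

# "The sun rises in the east." has 7 filtered pieces; 'east.' splits into ['east', '.']
pieces = np.arange(7 * 4, dtype=np.float32).reshape(7, 4)  # synthetic 4-dim vectors
word_lengths = [1, 1, 1, 1, 1, 2]

words = aggregate_last(pieces, word_lengths)
print(words.shape)  # one row per whitespace-separated word
```

With this strategy the row for `east.` is the vector of the final `.` piece, so the word count matches the detokenized sentence.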

# Load Activations

Load the activations using the [NeuroX Data Loader](https://neurox.readthedocs.io/en/latest/neurox.data.html#module-neurox.data.loader).

In [None]:
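# Load activations and tokens
activations, num_layers = loader.load_activations(activations_path)
# The toy data has no labels, so the source file is passed as the label file too
tokens = loader.load_data(data_path, data_path, activations, max_sent_l=max_sent_l)
# Each row holds all layers side by side, so divide its width by the layer count
num_neurons_per_layer = int(activations[0].shape[1] / num_layers)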

Loading json activations from activations.json...
35 13.0
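The output `35 13.0` reads as 35 sentences and 13 layers, which agrees with the `(13, 6, 768)` hidden-state shape logged during extraction. Because each word's row holds all layers concatenated, a neuron in a given layer sits at column `num_neurons_per_layer * layer + neuron`. The sketch below checks that indexing on synthetic data, not on NeuroX output; `neuron_column` is a hypothetical helper name.

```python
import numpy as np

num_layers = 13              # as reported by the loader output
num_neurons_per_layer = 768  # last dimension of the logged hidden states
num_words = 6

# Synthetic stand-in for one sentence: shape (words, layers * neurons)
width = num_layers * num_neurons_per_layer
sentence_acts = np.arange(num_words * width, dtype=np.float32).reshape(num_words, width)

def neuron_column(layer, neuron, neurons_per_layer):
    """Column of a (layer, neuron) pair in the concatenated activations."""
    return neurons_per_layer * layer + neuron

col = neuron_column(4, 133, num_neurons_per_layer)

# Equivalent view: split the columns back into (layers, neurons) and index directly
by_layer = sentence_acts.reshape(num_words, num_layers, num_neurons_per_layer)
same = np.array_equal(sentence_acts[:, col], by_layer[:, 4, 133])
print(col, same)
```

Reshaping to `(words, layers, neurons)` is an alternative when several neurons of one layer are inspected at once.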


# Define Variables For Getting The Top Text Excerpts For A Specific Neuron

In [None]:
neuron = 133
layer = 4
n = 5 # consider top 5 text excerpts for the neuron

# Get Max Activation Records

In [None]:
max_activations_record = {}
max_tokens_record = {}
# Column of the selected neuron within the concatenated layer activations
neuron_idx = num_neurons_per_layer * layer + neuron
for sent_idx, sent in enumerate(tokens["source"]):
    neuron_acts = activations[sent_idx][:, neuron_idx]
    # Highest activation in the sentence and the token that produced it
    max_activations_record[sent_idx] = np.max(neuron_acts)
    max_tokens_record[sent_idx] = sent[np.argmax(neuron_acts)]
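The loop above keeps, for every sentence, the neuron's highest activation and the token where it occurs. As a minimal sketch, the same logic can be wrapped in a function and checked on tiny hand-written arrays. The function name `max_activation_per_sentence` and the toy values are made up for illustration and are not part of NeuroX or of this notebook's data.

```python
import numpy as np

def max_activation_per_sentence(sentences, activations, column):
    """Map sentence index -> (token with the highest activation, that activation)."""
    records = {}
    for sent_idx, (words, acts) in enumerate(zip(sentences, activations)):
        values = acts[:, column]       # the neuron's activation on every word
        best = int(np.argmax(values))  # position of the strongest word
        records[sent_idx] = (words[best], float(values[best]))
    return records

# Toy inputs: two sentences, two "neurons" per word (values are invented)
toy_sentences = [["The", "cat", "meows."], ["Cars", "drive", "on", "the", "road."]]
toy_activations = [
    np.array([[0.1, 0.2], [0.9, -0.3], [0.0, 0.5]]),
    np.array([[0.1, 0.0], [0.2, 0.1], [1.5, 0.4], [0.3, 0.2], [-0.1, 0.9]]),
]

records = max_activation_per_sentence(toy_sentences, toy_activations, column=0)
print(records)
```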

# Sort the Activations 

In [None]:
# Sort sentences by their maximum activation and keep the top n records
sorted_acts = sorted(max_activations_record.items(), key=lambda x: x[1], reverse=True)[:n]

In [None]:
print(sorted_acts)

[(18, 1.6492859), (7, 1.1502558), (13, 1.0802596), (10, 1.0725111), (3, 0.9881723)]
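Each pair in the output is a sentence index and that sentence's maximum activation for the neuron. Sorting everything and slicing works well for a small corpus. For a larger one, a possible alternative is `heapq.nlargest` from the standard library, which returns the same top `n` items without sorting the full list. The sketch below uses invented values, and `top_n_records` is a hypothetical helper name.

```python
import heapq

def top_n_records(max_activations_record, n):
    """Return the n (sentence index, activation) pairs with the highest activation."""
    return heapq.nlargest(n, max_activations_record.items(), key=lambda item: item[1])

# Invented activations keyed by sentence index
toy_record = {0: 0.2, 1: 1.5, 2: -0.4, 3: 0.9}

top_two = top_n_records(toy_record, 2)
by_sorting = sorted(toy_record.items(), key=lambda item: item[1], reverse=True)[:2]
print(top_two)
```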


# Print Top N Activating Text Excerpts For The Selected Neuron

In [None]:
# Print the top n excerpts for the neuron as (token, activation) pairs
for sent_idx, _ in sorted_acts:
    sent = tokens["source"][sent_idx]
    activation_values = activations[sent_idx][:, num_neurons_per_layer*layer + neuron]
    print(list(zip(sent, activation_values.tolist())))

[('The', 0.0783667266368866), ('train', -0.17363914847373962), ('arrives', -0.9608103036880493), ('on', 1.649285912513733), ('time.', 0.2141266018152237)]
[('Cars', 0.13900332152843475), ('drive', 0.2723767161369324), ('on', 1.150255799293518), ('the', 0.21985113620758057), ('road.', -0.09761186689138412)]
[('Rain', 0.5524147748947144), ('falls', 1.0802595615386963), ('from', 0.2041938751935959), ('the', 0.77774977684021), ('clouds.', 0.047311119735240936)]
[('People', -0.2982337772846222), ('walk', 0.07899212092161179), ('in', 1.072511076927185), ('the', 0.18185363709926605), ('street.', 0.0007843400235287845)]
[('Children', 0.13210095465183258), ('play', 0.27250510454177856), ('in', 0.9881722927093506), ('the', 0.6650263667106628), ('park.', 0.030416980385780334)]
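The printed lists pair each token with its activation, which is the information a Neuron Record needs. As a rough sketch, they could be collected into a small container like the one below. The class and field names (`ToyNeuronRecord`, `Excerpt`, `top_excerpts`) are hypothetical and do not follow any particular library's schema; the sample values are the first two printed excerpts, rounded to three decimals.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Excerpt:
    tokens: List[str]
    activations: List[float]

@dataclass
class ToyNeuronRecord:  # hypothetical container, not a library class
    layer: int
    neuron: int
    top_excerpts: List[Excerpt] = field(default_factory=list)

def build_record(layer: int, neuron: int,
                 excerpts: List[List[Tuple[str, float]]]) -> ToyNeuronRecord:
    """Split (token, activation) pairs into parallel lists, one Excerpt per sentence."""
    record = ToyNeuronRecord(layer=layer, neuron=neuron)
    for pairs in excerpts:
        tokens = [token for token, _ in pairs]
        acts = [act for _, act in pairs]
        record.top_excerpts.append(Excerpt(tokens, acts))
    return record

# First two excerpts from the output above, rounded to three decimals
printed = [
    [("The", 0.078), ("train", -0.174), ("arrives", -0.961), ("on", 1.649), ("time.", 0.214)],
    [("Cars", 0.139), ("drive", 0.272), ("on", 1.150), ("the", 0.220), ("road.", -0.098)],
]

record = build_record(layer=4, neuron=133, excerpts=printed)
print(record.top_excerpts[0].tokens)
```

Keeping tokens and activations as parallel lists makes it easy to look up the strongest token per excerpt, which in both samples here is `on`.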
