# **Script for Calculating Surprisal Scores on Different Time Scales**

This script was written by [Merle Schuckart](mailto:merle.schuckart@uni-luebeck.de) with a lot of help from [Lea-Maria Schmitt](https://www.predictivebrainlab.com/people-details/lea-maria-schmitt/) and [a very nice Stack Overflow user](https://stackoverflow.com/users/1949646/kaybee).

Version: 26th of April 2023

## Import modules

In [47]:
%%capture
# (disable %%capture to print output in console)

# Import modules:

! pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM #,AutoModelWithLMHead # will be deprecated soon-ish
import torch
import tensorflow as tf

import re # for re.search()
import numpy as np # for maths functions
import pandas as pd # for dfs
import csv # for reading in csv properly
from google.colab import files # for downloading files

## Load tokenizer & model

In [48]:
%%capture
# (disable %%capture to print output in console)

# download pre-trained German GPT-2 model & tokenizer
tokenizer = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")

# initialise the model
model = AutoModelForCausalLM.from_pretrained("dbmdz/german-gpt2", pad_token_id = tokenizer.eos_token_id)

## Test: How to lesion single layers (aka zeroing-out layers)

This is just an example of how to set all weights in one specific layer to 0.

We first make a prediction with the full model, then lesion one layer (index 5 in the example below) by setting all of that layer's pre-trained weights to 0, and then make a prediction for the same input prompt again.

The second prediction should differ from the first one.

The weights are changed in place and the model is not reloaded in this chunk, so lesions accumulate: if you run the chunk several times with a different layer index each time, you lesion multiple layers. The prediction should get worse with each layer you turn off.
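
To see why zeroing out a layer's weights effectively skips the layer instead of breaking the model, recall that GPT-2 style blocks add their output to their input (residual connection). The following is a toy sketch in plain Python, not the GPT-2 implementation: it models only a small feed-forward sub-block (layer norm, two linear maps, GELU) and leaves out attention. All names (`layer_norm`, `residual_mlp_block`, `zero_like`, `params`) and all numbers are made up for illustration.

```python
import math

def layer_norm(x, weight, bias, eps=1e-5):
    """Normalise x to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [w * (v - mean) / math.sqrt(var + eps) + b
            for v, w, b in zip(x, weight, bias)]

def linear(x, weight, bias):
    """weight is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def gelu(v):
    """tanh approximation of GELU; note that gelu(0) == 0."""
    return 0.5 * v * (1 + math.tanh(math.sqrt(2 / math.pi) * (v + 0.044715 * v ** 3)))

def residual_mlp_block(x, p):
    """Toy sub-block: output = input + MLP(layer_norm(input))."""
    h = layer_norm(x, p["ln_w"], p["ln_b"])
    h = [gelu(v) for v in linear(h, p["fc_w"], p["fc_b"])]
    h = linear(h, p["proj_w"], p["proj_b"])
    return [xi + hi for xi, hi in zip(x, h)]

def zero_like(value):
    """Return a copy of a (nested) list with every number replaced by 0.0."""
    if isinstance(value, list):
        return [zero_like(v) for v in value]
    return 0.0

# made-up weights for a block with 2 input units and 3 hidden units
params = {
    "ln_w": [1.0, 1.0], "ln_b": [0.0, 0.0],
    "fc_w": [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]], "fc_b": [0.1, 0.0, -0.1],
    "proj_w": [[0.4, -0.2, 0.7], [0.3, 0.9, -0.5]], "proj_b": [0.05, -0.05],
}
# "lesion" the block: set every weight and bias to 0
lesioned = {name: zero_like(value) for name, value in params.items()}

x = [1.0, -2.0]
intact_out = residual_mlp_block(x, params)
lesioned_out = residual_mlp_block(x, lesioned)

print("intact:  ", intact_out)    # differs from x
print("lesioned:", lesioned_out)  # identical to x
```

With all weights and biases at 0, the sub-block adds nothing to its input, so the input is passed on unchanged. The same reasoning should carry over to the attention sub-block, but that part is not modelled here.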

In [49]:
# set input text
input_text = ["Orlando", "liebte", "von", "Natur", "aus", "einsame", "Orte,", "weite", "Ausblicke", "und", "das", "Gefühl,", "für", "immer", "und", "ewig"] # correct continuation: "allein zu sein."
# turn list of words into 1 text string:
input_text = ' '.join(input_text)
# tokenize text
ids_list = tokenizer.encode(input_text)
# put the token IDs into array
ids_array = np.expand_dims((ids_list), axis = 0)

# predict the next x tokens
output = model.generate(torch.tensor(ids_array), 
                        return_dict_in_generate = True, 
                        output_scores = True, 
                        max_new_tokens = 10) # set output length here!

# decode generated token
# generate list of token ids from model output
predicted_ids = output.sequences
# decode output to get words instead of token ids
prediction = tokenizer.decode(predicted_ids.numpy().tolist()[0])

print('\n***Before zeroing out layer***')
print("prediction:", prediction)


### Lesion 1 Layer in the Model

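# choose layer to lesion - example: layer_idx = 5
# (the index is zero-based, so this selects the block named "transformer.h.5.", 
# i.e. the 6th of the 12 layers)
layer_idx = 5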

# Find all parameter names for this layer

# The parameter names are basically the names of the weight tensors. 
# Each weight tensor is stored in the model.state_dict with the corresponding 
# key in the dict being the parameter name. 

# The parameter names of layer X all include the string "transformer.h.X." 
# with X being the index of layer X (e.g. "transformer.h.5." for layer_idx = 5). 

# So to get a list of all parameter names for the layer we want to lesion, we need to filter 
# all keys in the model.state_dict for the string "transformer.h.", followed by the layer index 
# and another dot (the trailing dot keeps index 1 from also matching 10 and 11): 
paramnames = filter(lambda s: re.search(rf'transformer\.h\.{layer_idx}\.', s) is not None, model.state_dict().keys())

# set the weights of these parameters to 0
# loop parameter names
for paramname in paramnames:
  # get weight tensor from model.state_dict
  # w is a tensor containing the layer's weights
  w = model.state_dict()[paramname]
  
  # only tensors with at least 1 dimension can be indexed with [:], 
  # so skip 0-dimensional (scalar) tensors and set all other weights to 0.
  if w.ndim > 0:
    w[:] = 0

# FINISHED UPDATING MODEL - THE LAYER WITH INDEX 5 IS NOW LESIONED


# generate prediction using updated model:
# predict the next x tokens
output = model.generate(torch.tensor(ids_array), 
                        return_dict_in_generate = True, 
                        output_scores = True, 
                        max_new_tokens = 10) # set output length here!

# decode generated token
# generate list of token ids from model output
predicted_ids = output.sequences
# decode output to get words instead of token ids
prediction = tokenizer.decode(predicted_ids.numpy().tolist()[0])

print('\n***After zeroing out layer***')
print("prediction:", prediction)



***Before zeroing out layer***
prediction: Orlando liebte von Natur aus einsame Orte, weite Ausblicke und das Gefühl, für immer und ewig in der Nähe zu sein.
Er war ein

***After zeroing out layer***
prediction: Orlando liebte von Natur aus einsame Orte, weite Ausblicke und das Gefühl, für immer und ewig zu sein.
Die Natur ist ein Paradies für


# Surprisal Scores

Now comes the real deal: We need surprisal scores for each word in the EXNAT-1 texts on each time scale (time scale = layer).
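
As a rough sketch of what "a score for each word" means in practice: every word is scored given a fixed-size window of the words that precede it, so the first words of a text (which do not have a full window yet) get no score. The snippet below illustrates this with a toy window of 3 words; the helper `context_windows` and the placeholder string `"<score>"` are hypothetical names used only for this illustration.

```python
def context_windows(words, context_size):
    """Yield (context, target) pairs.

    The first `context_size` words never become targets because they
    do not have a full context window in front of them.
    """
    for idx in range(context_size, len(words)):
        yield words[idx - context_size:idx], words[idx]

words = ["Orlando", "liebte", "von", "Natur", "aus", "einsame", "Orte,"]
toy_context_size = 3

pairs = list(context_windows(words, toy_context_size))
for context, target in pairs:
    print(" ".join(context), "->", target)

# one entry per word: None for the unscored start, a placeholder for the rest
scores = [None] * toy_context_size + ["<score>"] * len(pairs)
```

Padding the start of each text with `None` keeps the list of scores the same length as the list of words, which matters when the scores are later attached to the dataframe row by row.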



## Load Texts

In [50]:
# Load texts

# Important: You need to put the CSV called "Texts_surprisal_scores.csv" (see the surprisal score folder in the EXNAT-1 analysis folder on GitHub) 
# into the files section (see menu on the left) before running this chunk.

# read in csv with texts
texts_df = pd.read_csv('/content/Texts_surprisal_scores.csv', 
                       sep = ";", 
                       quoting = csv.QUOTE_NONE, 
                       quotechar = None)

# Check dataframe - show all rows and columns in console:
#pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
#print(texts_df["word_punct"])

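# get unique text numbers in texts_df
# (use unique() instead of set() here: it keeps the order in which the texts appear 
# in the df, and the surprisal scores are later added back to the df row by row)
text_nrs = list(texts_df["text_nr"].unique())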
#print("preparing surprisal scores for the following texts:")
#print(text_nrs)


### Settings


In [63]:
# How long should the context chunk be?
# I decided I want 50 words, but that's a completely arbitrary value. 
# Just make sure it's less than 300, ideally <= 100 so you still have some trials left.
context_size = 50

# Prepare surprisal score lists

surprisal_1  = [] # layer  1 at idx  0
surprisal_2  = [] # layer  2 at idx  1
surprisal_3  = [] # layer  3 at idx  2
surprisal_4  = [] # layer  4 at idx  3
surprisal_5  = [] # layer  5 at idx  4
surprisal_6  = [] # layer  6 at idx  5
surprisal_7  = [] # layer  7 at idx  6
surprisal_8  = [] # layer  8 at idx  7
surprisal_9  = [] # layer  9 at idx  8
surprisal_10 = [] # layer 10 at idx  9
surprisal_11 = [] # layer 11 at idx 10
surprisal_12 = [] # layer 12 at idx 11

# ------------------------

# Load full model again:

model = AutoModelForCausalLM.from_pretrained("dbmdz/german-gpt2", pad_token_id = tokenizer.eos_token_id)
print("Loaded model with pretrained weights again - we're all set!")

# ------------------------

# Check how many layers the model has 
# (should be 12, but better safe than sorry)

# Placeholder set to store unique layer indices:
layer_indices = set()

# Loop keys in the model.state_dict dictionary:
for key in model.state_dict().keys():
    # Use regex to check if the current key (aka parameter name) 
    # contains the string "transformer.h." followed by 1 or more digits
    target_string = re.search(r'transformer\.h\.(\d+)\.', key)

    # If there is one, add index to set, 
    # so we collect each index only once (there are 
    # always several tensors for 1 layer)
    if target_string:
        # Get layer index from match object "target_string",
        # then convert layer index from string to integer
        layer_idx = int(target_string.group(1))
        # We only need the unique indices, so add to set we created 
        # before the loop - this will only store indices 
        # that are not already stored in the set:
        layer_indices.add(layer_idx)

# get length of set
num_layers = len(layer_indices)
# show number of layers in console
print("Number of layers in the model:", num_layers)



Loaded model with pretrained weights again - we're all set!
Number of layers in the model: 12
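
The layer count above, like the lesioning itself, relies on matching parameter names with a regular expression. The sketch below shows why the dots should be escaped and why the dot after the layer index matters. `sample_keys` is a small, hand-written list in the style of GPT-2 parameter names, not the full `state_dict`, and `keys_for_layer` and `count_layers` are hypothetical helper names.

```python
import re

# hand-written sample of parameter names (illustrative, not the full state_dict)
sample_keys = [
    "transformer.wte.weight",
    "transformer.h.1.ln_1.weight",
    "transformer.h.1.attn.c_attn.weight",
    "transformer.h.10.mlp.c_fc.weight",
    "transformer.h.11.mlp.c_proj.bias",
    "transformer.ln_f.weight",
]

def keys_for_layer(keys, layer_idx):
    """Return all keys that belong to exactly the layer with this index."""
    pattern = re.compile(rf"transformer\.h\.{layer_idx}\.")
    return [key for key in keys if pattern.search(key)]

def count_layers(keys):
    """Count the distinct layer indices that occur in the keys."""
    indices = set()
    for key in keys:
        match = re.search(r"transformer\.h\.(\d+)\.", key)
        if match:
            indices.add(int(match.group(1)))
    return len(indices)

layer_1_keys = keys_for_layer(sample_keys, 1)

# without the trailing dot, index 1 also matches layers 10 and 11
loose_matches = [key for key in sample_keys if re.search(r"transformer\.h\.1", key)]

print(layer_1_keys)
print(loose_matches)
print(count_layers(sample_keys))
```

Using a raw string (`r"..."` or `rf"..."`) keeps `\.` from being treated as a string escape, and escaping every dot makes the pattern match a literal dot instead of any character.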


### Compute Surprisal Scores 

#### Plan for now:

Loop the layers:
- Load the full model with the pretrained weights
- Lesion all layers in the model except for the current layer
- Loop words:
    - compute the probability of each token ID in the current word
    - multiply the token probabilities to get the word probability, then take the negative log to get the word's surprisal score
- add surprisal score columns to text df and download file
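
The per-word computation in this plan can be sketched with toy numbers. The example assumes a made-up vocabulary of 4 tokens and a word that is split into 2 tokens; `softmax`, `word_surprisal`, `step_logits` and `target_ids` are hypothetical names, and the logits are invented. It also shows an alternative to multiplying probabilities: summing the per-token surprisals gives the same value but does not underflow. This is not what the cell below does (it multiplies the probabilities and caps infinite scores at 100), so switching to the sum would change the capped values.

```python
import math

def softmax(logits):
    """Turn unnormalised scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def word_surprisal(token_probs):
    """Sum of the per-token surprisals, using the natural log."""
    return sum(-math.log(p) for p in token_probs)

# invented logits for the two prediction steps of a two-token word
step_logits = [[2.0, 0.5, -1.0, 0.0],
               [0.1, 0.1, 3.0, -2.0]]
target_ids = [1, 2]  # the token IDs the word actually consists of

# probability of the actual token at each step
token_probs = [softmax(logits)[token_id]
               for logits, token_id in zip(step_logits, target_ids)]

via_product = math.log(1 / math.prod(token_probs))  # multiply, then negative log
via_sum = word_surprisal(token_probs)               # negative log, then sum

# with very small probabilities the product underflows to 0.0,
# while the sum of log values stays finite
tiny_probs = [1e-200, 1e-200]
underflowed_product = math.prod(tiny_probs)
stable_surprisal = word_surprisal(tiny_probs)

print(via_product, via_sum)
print(underflowed_product, stable_surprisal)
```

Both routes agree up to floating-point rounding because the log of a product equals the sum of the logs.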




In [None]:
""" Loop all 12 layers """
for curr_layer_idx in range(num_layers):

    print("\n\n\nStarting to compute surprisal scores for layer", curr_layer_idx+1, 
          ".\nZeroing out weights from the other layers now.\n\n")
    
    # get the list we want to save all surprisal scores for this time scale in.
    # The name of the list depends on the layer index, 
    # so get reference to correct list like this:
    curr_TS_list = eval(f"surprisal_{curr_layer_idx + 1}")

    # Load model with pretrained weights again        
    model = AutoModelForCausalLM.from_pretrained("dbmdz/german-gpt2", pad_token_id = tokenizer.eos_token_id)
    print("Loaded full model with pretrained weights!")

    """ Loop layers again, lesion each of them, 
        but skip layer if it's the layer we want to keep """
    for lesion_layer_idx in range(num_layers):
      
      # If the layer is the layer we want to keep, go to next iteration:
      if lesion_layer_idx == curr_layer_idx:
        print("keeping weights for layer", lesion_layer_idx+1)
        continue
      
      # If it's one of the other layers, exterminate! 
      # (Please read the last word in a dalek voice - thanks.)   
      else:
        print("zeroing out weights for layer", lesion_layer_idx+1)

        # Find all parameter names for the layer we want to lesion
        # (See more detailed comments in the example at the beginning of this script!)
        paramnames = filter(lambda s: re.search(rf'transformer\.h\.{lesion_layer_idx}\.', s) is not None, model.state_dict().keys())

        # set the weights of these parameters to 0
        # loop parameter names
        for paramname in paramnames:
          # get weight tensor from model.state_dict
          # w is a tensor containing the layer's weights
          w = model.state_dict()[paramname]
          
          # only tensors with at least 1 dimension can be indexed with [:], 
          # so skip 0-dimensional (scalar) tensors and set all other weights to 0.
          if w.ndim > 0:
            w[:] = 0

    print("\n\nModel weights are updated! Starting to compute surprisal scores now.\n")

    # -----------------------------------------

    """ Loop texts """
    for text_nr in text_nrs:
      # get subset of df with current text
      curr_text = texts_df[texts_df["text_nr"] == text_nr]

      """ Loop words & calculate surprisal score for each """
      
      # context_size is the size of the input text chunk, I set it as 50.
      # The first 50 words don't get surprisal scores, 
      # so assign None values instead.
      curr_TS_list.extend([None] * context_size)

      # loop words, start at word with index = 50 (aka the 51st word)
      for word_idx in range(context_size, len(curr_text["word_punct"])):

        """ prepare context chunk """
        # get 50 previous words (context chunk of size context_size)
        previous_words = list(curr_text["word_punct"])[word_idx - context_size : word_idx]

        # turn list of previous words into 1 text string
        previous_words = ' '.join(previous_words)

        # generate token ids for each of the x previous words
        ids_list = tokenizer.encode(previous_words)
        # put the token IDs into an array
        ids_array = np.expand_dims((ids_list), axis = 0) 

        """ prepare actual next word """
        # We should also predict punctuation. 
        # It's not like the words are shown without punctuation on screen.
        actual_word = list(curr_text["word_punct"])[word_idx]

        # Problem: actual word might have multiple token IDs
        # --> get all IDs for current word
        act_word_id = tokenizer.encode(actual_word) 
        # Output looks somewhat like this: [44, 305, 479, 5283]
        
        """ loop IDs of current word """
        curr_id_probs = []
        for curr_id in act_word_id:

            print("computing probability for ID " +  str(curr_id) + " of word " + actual_word)
              
            # generate probabilities for each possible token being the actual next token
            output = model.generate(torch.tensor(ids_array),
                                    return_dict_in_generate = True, 
                                    output_scores = True, 
                                    max_new_tokens = 1) # set output length here - 1 because I only want 1 token
            
            # read out the scores for all IDs
            # logits = unnormalised scores in the range (-inf, inf), one per token in the vocabulary
            logits = output.scores[0]
            # softmax turns the logits into probabilities that sum to 1
            probs = tf.nn.softmax(logits)

            # get probability for actual ID being the next one & append it to 
            # array with probabilities of all IDs for current word
            curr_id_probs.append(probs.numpy()[0][curr_id]) 

            # append current token ID to list of previous words (if there are any) 
            # reason: The previous parts of the word are part of the context.
            ids_list = np.append(ids_list, curr_id)
            ids_array = np.expand_dims((ids_list), axis = 0) # put the token IDs into an array

        # multiply all probabilities for current word:
        act_word_prob = np.prod(curr_id_probs)
        
        # transform probability value into surprisal score (negative log of the probability)
        # negative log = log(1 / x) with x being the value you want to get the negative log of.
        # I use e as a base value for the log here.
        surprisal_score = np.log( 1 / act_word_prob )

        # if surprisal score == Inf, set surprisal score to 100
        if surprisal_score == float('inf'):
          surprisal_score = 100

        # collect surprisal score in array for all surprisal scores of current time scale
        curr_TS_list.append(surprisal_score)
        print("\nSurprisal score for actual word " + actual_word +" is " + str(surprisal_score) + ".")
        print("Text Nr = " + str(text_nr) + " - Trial Nr = " + str(word_idx))
        print(" --------------- ")

print("\n\n\nfinished computing surprisal scores\n\n")

# append new surprisal score columns to text_df
texts_df = texts_df.assign(surprisal_1  = surprisal_1, 
                           surprisal_2  = surprisal_2,
                           surprisal_3  = surprisal_3,
                           surprisal_4  = surprisal_4,
                           surprisal_5  = surprisal_5,
                           surprisal_6  = surprisal_6,
                           surprisal_7  = surprisal_7,
                           surprisal_8  = surprisal_8,
                           surprisal_9  = surprisal_9,
                           surprisal_10 = surprisal_10,
                           surprisal_11 = surprisal_11,
                           surprisal_12 = surprisal_12)

# print first 370 rows of df to check if it looks correct
# (depending on chunk size x, first x values of a text should be None values)
#print(texts_df.head(370)) 

""" save texts_df as surprisal_scores_lesioned_layers.csv & download it """
texts_df.to_csv('surprisal_scores_lesioned_layers.csv', encoding = 'utf-8-sig') 
# the file name has to match the one used in to_csv() above
files.download("surprisal_scores_lesioned_layers.csv")





Starting to compute surprisal scores for layer 1 .
Zeroing out weights from the other layers now.


Loaded full model with pretrained weights!
keeping weights for layer 1
zeroing out weights for layer 2
zeroing out weights for layer 3
zeroing out weights for layer 4
zeroing out weights for layer 5
zeroing out weights for layer 6
zeroing out weights for layer 7
zeroing out weights for layer 8
zeroing out weights for layer 9
zeroing out weights for layer 10
zeroing out weights for layer 11
zeroing out weights for layer 12


Model weights are updated! Starting to compute surprisal scores now.

computing probability for ID 320 of word der

Surprisal score for actual word der is 11.968019390800702.
Text Nr = text_08 - Trial Nr = 50
 --------------- 
computing probability for ID 20306 of word Felsenstadt
computing probability for ID 388 of word Felsenstadt
computing probability for ID 1483 of word Felsenstadt

Surprisal score for actual word Felsenstadt is 32.97529346233736.
Text Nr = tex




Surprisal score for actual word \“Superman\“-Miterfinder is 100.
Text Nr = text_04 - Trial Nr = 295
 --------------- 
computing probability for ID 34190 of word Joe

Surprisal score for actual word Joe is 15.476491038818024.
Text Nr = text_04 - Trial Nr = 296
 --------------- 
computing probability for ID 5675 of word Shuster
computing probability for ID 5025 of word Shuster

Surprisal score for actual word Shuster is 24.769051442406194.
Text Nr = text_04 - Trial Nr = 297
 --------------- 
computing probability for ID 343 of word am

Surprisal score for actual word am is 13.453097634987557.
Text Nr = text_04 - Trial Nr = 298
 --------------- 
computing probability for ID 35686 of word Zeichentisch.
computing probability for ID 4463 of word Zeichentisch.
computing probability for ID 18 of word Zeichentisch.

Surprisal score for actual word Zeichentisch. is 35.23608181977868.
Text Nr = text_04 - Trial Nr = 299
 --------------- 
computing probability for ID 2682 of word sie

Surprisal sc