This is an approach similar to Inference-Time Intervention (ITI), but it tries to push LLMs in a contextual direction, i.e. toward following the context they are given rather than memorized knowledge.

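My hypothesis is that the MLP layers are mostly responsible for memory (recalling stored knowledge), while attention in the last few layers is responsible for moving information between tokens, i.e. how one token enriches the representations of the others. So we want to see how a given token enriches the other tokens.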

So focus on the attribute token's residual stream, and in particular on the attention activations in the later layers. Use these, especially the attention between the attribute token and the last token, to set up the vector space and build a linear classifier.

But this raises a question. We feed the LLM twice, once with the context and once without, record the activations, and try to build a classifier. Which activations do we look at in the run without context, given that the context tokens are no longer there?

**Alternatively (a better idea):** Focus on the attention activations of the last token in general. I expect these alone to reveal whether the LLM is relying on memory or on the context.
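A minimal sketch of why the last-token view sidesteps the "which tokens do we compare?" problem: the two prompts have different lengths, but indexing the final position yields one fixed-size vector per layer for each run. The random arrays are stand-ins for real attention outputs (assumed shape `(seq_len, hidden_size)` per layer), and `last_token_features` is a hypothetical helper name.

```python
import numpy as np

HIDDEN = 4096  # Mistral-7B hidden size

def last_token_features(attn_outputs):
    """Map {layer: (seq_len, hidden) array} -> {layer: (hidden,) vector of the last token}."""
    return {layer: out[-1, :] for layer, out in attn_outputs.items()}

rng = np.random.default_rng(0)
layers = range(16, 32)
# Stand-ins: the with-context prompt is longer than the without-context prompt
with_ctx = {l: rng.standard_normal((21, HIDDEN)) for l in layers}
without_ctx = {l: rng.standard_normal((12, HIDDEN)) for l in layers}

f_c = last_token_features(with_ctx)
f_nc = last_token_features(without_ctx)
print(f_c[16].shape, f_nc[16].shape)  # (4096,) (4096,)
```

Both runs now live in the same 4096-dimensional space per layer, so a single classifier can be trained on them.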

In [1]:
import torch
from transformers import AutoModelForCausalLM,AutoTokenizer
from torch import nn
import torch.nn.functional as F
from IPython.display import clear_output
from datasets import load_dataset
from tqdm import tqdm

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.utils import shuffle

from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt

In [2]:
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id,torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
device = torch.device("cuda")
model.to(device)
attn_activations = {}


In [None]:
import copy
new_model = copy.deepcopy(model)

We tried and got a (4096,1) vector. That code has been moved down below.

As expected, this is a 4096-dimensional vector representing $a_T^{(l)}$, the attention output at the last token $T$, for each layer from $l = 16$ to $31$.

Next, we split the dataset into train and test sets and compute the attention activations for every training sample. For each example, run the model with and without the context and store all attention activations for each case. Label those activations by class (with context vs. without context), then train a basic linear classifier and check its accuracy.
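In symbols (the notation $x_i^{c}$, $x_i^{nc}$, $\mathcal{I}$, $w^{(l)}$, $b^{(l)}$ is introduced here for illustration): writing $x_i^{c}$ and $x_i^{nc}$ for the $i$-th prompt with and without its context instruction, and $\mathcal{I}$ for the training indices, the labelled dataset for layer $l$ and the logistic-regression classifier fitted to it are:

$$
\mathcal{D}^{(l)} = \left\{\left(a_T^{(l)}(x_i^{c}),\,1\right)\right\}_{i\in\mathcal{I}} \cup \left\{\left(a_T^{(l)}(x_i^{nc}),\,0\right)\right\}_{i\in\mathcal{I}}, \qquad a_T^{(l)} \in \mathbb{R}^{4096},
$$

$$
P\left(y = 1 \mid a_T^{(l)}\right) = \sigma\left(w^{(l)\top} a_T^{(l)} + b^{(l)}\right), \qquad l \in \{16, \dots, 31\}.
$$

Label 1 means "with context", matching the labels used in `get_XY` further down.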

#### Train-test-validation loop

Divide the data into train, test and validation sets. For each value of l (layer number), train the classifier on the training examples, then measure its accuracy on the validation set. The layer with the highest validation accuracy is chosen, and the model trained on it becomes the classifier M. The test set then shows how well M classifies using that layer l. Hopefully it's decent. We then use M's decision boundary to modify activations at that layer along the contextual direction.

That gives us the new model. We use the test and validation sets to push points in the contextual direction and generate outputs, which acts as a new decoding strategy, and compare its results with regular decoding.
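To make the comparison with regular decoding quantitative rather than by inspection, one option is to score each generation by whether its first word is the context-mandated ending. This is a hypothetical helper (`ends_with_context_word` and `context_accuracy` are not in the original), and stripping punctuation and curly quotes is an assumption about how to normalize outputs such as `“man."`.

```python
import string

# Punctuation plus the curly quotes that appear in the model's outputs
STRIP_CHARS = string.punctuation + "\u201c\u201d\u2018\u2019 "

def ends_with_context_word(generation: str, context_word: str) -> bool:
    """True if the first word of the generated continuation is the context word."""
    words = generation.split()
    if not words:
        return False
    first = words[0].strip(STRIP_CHARS).lower()
    return first == context_word.strip(STRIP_CHARS).lower()

def context_accuracy(generations, context_words):
    """Fraction of generations that complete the prompt with the context word."""
    hits = [ends_with_context_word(g, w) for g, w in zip(generations, context_words)]
    return sum(hits) / len(hits)

# Completions copied from the "Jack a dull" runs further down
print(ends_with_context_word("boy, man.", "man"))     # False (memorized ending)
print(ends_with_context_word("\u201cman.\"", "man"))  # True (context ending)
```

Averaging `context_accuracy` over the held-out prompts, once for each `alpha`, would give a single number per decoding strategy.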

### STEP 1: Develop data structures containing activations across layers and across training examples.

In [3]:
def get_context_outputs(inputs):
    context_outputs = []
    for inp in inputs:
        first_quote = inp.find('"')
        second_quote = inp[first_quote+1:].find('"') + first_quote + 1
        context_output = inp[first_quote+1:second_quote]
        context_outputs.append(context_output)
    return context_outputs

In [4]:
with open("memotrap_dataset.txt",'r') as file:
    dataset_string = file.readlines()

In [5]:
arr = [item.split("\t")[:2] for item in dataset_string]
inputs = [item[0] for item in arr]
inputs_wo_context = [item[:13]+item[item.find(":"):] for item in inputs]
context_outputs = get_context_outputs(inputs)
n = len(inputs)  # 215 prompts
len(inputs),len(inputs_wo_context),len(context_outputs),inputs_wo_context[1],context_outputs[1]

(215, 215, 215, 'Write a quote: Like a red rag to a', 'child')
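The string surgery above assumes every prompt starts with the 13-character prefix `Write a quote` and that the first `:` comes right after the quoted target word. A sketch that makes those assumptions explicit, run on a hypothetical prompt in the same format as the one shown later (the actual file contents are not reproduced here); `context_word` and `strip_context` are illustrative names.

```python
import re

PREFIX = "Write a quote"  # len(PREFIX) == 13, matching item[:13] above

def context_word(prompt: str) -> str:
    """Text between the first pair of double quotes (what get_context_outputs returns)."""
    m = re.search(r'"([^"]*)"', prompt)
    if m is None:
        raise ValueError(f"no quoted context word in: {prompt!r}")
    return m.group(1)

def strip_context(prompt: str) -> str:
    """Drop the 'that ends in the word "..."' instruction, keeping the proverb."""
    assert prompt.startswith(PREFIX), prompt
    return PREFIX + prompt[prompt.find(":"):]

# Hypothetical prompt in the MemoTrap format
p = 'Write a quote that ends in the word "child": Like a red rag to a'
print(context_word(p))   # child
print(strip_context(p))  # Write a quote: Like a red rag to a
```

Unlike the `find`-based version, the regex raises a clear error instead of silently slicing from index `-1` when a prompt has no quotes.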

In [6]:
attn_activations = {}
def get_hook(layer_num):
    def hook(model,input,output):
        attn_activations[layer_num] = output[0].detach()[0,-1,:] # last token's activations
    return hook

for i in range(16,32):
    model.model.layers[i].self_attn.register_forward_hook(get_hook(i))
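A toy illustration of the hook mechanism, assuming only that (as with `MistralAttention` here) the hooked module returns a tuple whose first element has shape `(batch, seq_len, hidden)`. `ToyAttention` is a stand-in, not a transformers class. Note that `register_forward_hook` returns a handle; the loop above discards it, so those hooks stay attached for the rest of the session.

```python
import torch
from torch import nn

captured = {}

class ToyAttention(nn.Module):
    """Stand-in for a self-attention module: returns (output, attn_weights, cache)."""
    def __init__(self, hidden):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden, bias=False)

    def forward(self, hidden_states):
        return self.proj(hidden_states), None, None

def get_hook(layer_num):
    def hook(module, inputs, output):
        # output[0] is (batch, seq_len, hidden); keep the last token of the first sequence
        captured[layer_num] = output[0].detach()[0, -1, :]
    return hook

torch.manual_seed(0)
attn = ToyAttention(hidden=8)
handle = attn.register_forward_hook(get_hook(16))
x = torch.randn(1, 5, 8)
with torch.no_grad():
    out, _, _ = attn(x)
handle.remove()  # detach the hook once activations have been collected
print(captured[16].shape)  # torch.Size([8])
```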

In [7]:
@torch.no_grad()  # inference only: no autograd graph, much lower GPU memory use
def set_attn_activations(i,context=True):
    # The forward hooks registered above fill attn_activations as a side effect
    tokenizer.pad_token = "<s>"
    prompt = inputs[i] if context else inputs_wo_context[i]
    input_ids = tokenizer(prompt,return_tensors="pt",padding=True).input_ids.to(device)
    last_token_logits = model(input_ids).logits[0,-1,:]
    last_token_probs = F.softmax(last_token_logits, dim=-1)
    # Greedy next token, returned for inspection
    return tokenizer.batch_decode([torch.argmax(last_token_probs).item()])[0]

In [8]:
act_dataset_c = []
act_dataset_nc = []

for i in tqdm(range(n)):
    set_attn_activations(i,context=True) # for ith training eg, the attn_activations dict is set here
    act_dataset_c.append(attn_activations.copy())
    set_attn_activations(i,context=False) # for ith training eg, the attn_activations dict is set here
    act_dataset_nc.append(attn_activations.copy())


In [9]:
len(act_dataset_c),act_dataset_c[0][16].shape,act_dataset_c[3][16]

(215,
 torch.Size([4096]),
 tensor([0.0199, 0.0087, 0.0030,  ..., 0.0185, 0.0177, 0.0120], device='cuda:0',
        dtype=torch.float16))

### STEP 2: Divide train-test

Split all of the following with the same indices, so they stay aligned:
- act_dataset
- inputs
- context_outputs
- inputs_wo_context

In [10]:
indices = list(range(n))
train_indices, test_indices = train_test_split(indices, test_size=0.4, random_state=42)

len(train_indices),len(test_indices)

(129, 86)

In [11]:
len(act_dataset_c)

215

In [12]:
def get_XY():
    act_dsc_train, act_dsc_test = [act_dataset_c[i] for i in train_indices], [act_dataset_c[i] for i in test_indices]
    act_dsnc_train, act_dsnc_test = [act_dataset_nc[i] for i in train_indices], [act_dataset_nc[i] for i in test_indices]
    print(len(act_dsc_train),len(act_dsc_test),len(act_dsnc_train),len(act_dsnc_test))
    
    X = act_dsc_train+act_dsnc_train
    y = [1]*len(act_dsc_train) + [0]*len(act_dsnc_train)
    X_trainL, y_trainL = shuffle(X, y, random_state=0)
    print(len(y_trainL))
    
    X = act_dsc_test+act_dsnc_test
    y = [1]*len(act_dsc_test) + [0]*len(act_dsnc_test)
    X_testL, y_testL = shuffle(X, y, random_state=0)
    print(len(y_testL))

    return X_trainL,y_trainL,X_testL,y_testL

In [13]:
X_trainL,y_trainL,X_testL,y_testL = get_XY()

129 86 129 86
258
172


In [14]:
X_trainL[0][16],X_trainL[1][16],X_trainL[2][16],X_trainL[3][16],X_trainL[4][16]

(tensor([-0.0190,  0.0150, -0.0277,  ..., -0.0068, -0.0232, -0.0070],
        device='cuda:0', dtype=torch.float16),
 tensor([ 0.0104,  0.0063,  0.0049,  ..., -0.0135,  0.0007,  0.0053],
        device='cuda:0', dtype=torch.float16),
 tensor([ 0.0054, -0.0091, -0.0267,  ..., -0.0200,  0.0220, -0.0297],
        device='cuda:0', dtype=torch.float16),
 tensor([-0.0029,  0.0029, -0.0095,  ..., -0.0283, -0.0309,  0.0064],
        device='cuda:0', dtype=torch.float16),
 tensor([-0.0648,  0.0462, -0.0051,  ..., -0.0081,  0.0104,  0.0066],
        device='cuda:0', dtype=torch.float16))

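All activations are now different. The last time I tried, they were all identical because of a reference issue; storing a copy of the `attn_activations` dict for each example (`attn_activations.copy()` in STEP 1) avoids this.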
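This kind of reference bug is easy to reproduce in plain Python: if the same dict object is appended on every iteration, every list entry is that one object and shows the last example's values. A shallow `.copy()` is enough in STEP 1 because each forward pass stores freshly allocated tensors in the dict instead of mutating the previous ones in place. The values below are placeholders.

```python
store = {}
aliased, copied = [], []

for example_id in range(3):
    store[16] = [example_id]      # rebinding, like the hook assigning a new tensor
    aliased.append(store)         # the same dict object every time
    copied.append(store.copy())   # a new dict holding the current references

print([d[16] for d in aliased])  # [[2], [2], [2]]
print([d[16] for d in copied])   # [[0], [1], [2]]
```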

After that:
- Classify with logistic regression
- Run a 4-fold cross-validation loop over the hyperparameter l, from 16 to 31
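One caveat with a plain `KFold` over the shuffled `X_trainL`: the with-context and without-context activations of the same prompt can land on opposite sides of a fold, which may make the validation accuracy slightly optimistic. A sketch of folds grouped by prompt index, assuming each training row carries the index of the prompt it came from (`groups` is a hypothetical array that `get_XY` does not currently return). scikit-learn's `GroupKFold` offers the same behavior.

```python
import numpy as np

def grouped_kfold(groups, n_splits=4, seed=42):
    """Yield (train_idx, val_idx) so that all rows sharing a group id land in the same fold."""
    groups = np.asarray(groups)
    unique = np.unique(groups)
    rng = np.random.default_rng(seed)
    rng.shuffle(unique)
    for fold_groups in np.array_split(unique, n_splits):
        val_mask = np.isin(groups, fold_groups)
        yield np.flatnonzero(~val_mask), np.flatnonzero(val_mask)

# Each prompt index appears twice: once with context (label 1) and once without (label 0)
prompt_ids = np.arange(8)
groups = np.concatenate([prompt_ids, prompt_ids])
labels = np.array([1] * 8 + [0] * 8)

for train_idx, val_idx in grouped_kfold(groups):
    # No prompt contributes rows to both sides of the split
    assert not set(groups[train_idx]) & set(groups[val_idx])
    print(sorted(set(groups[val_idx].tolist())), labels[val_idx].tolist())
```

Each validation fold then contains both versions of its prompts, so it stays class-balanced.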

### STEP 3: Cross-validation loop with logistic regression.

In [15]:
from sklearn.model_selection import cross_val_score, KFold

In [16]:
best_l = None
best_val_score = 0

kfold = KFold(n_splits=4, shuffle=True, random_state=42)

for l in range(16,32):
    # Stack the layer-l activations of every training example into an (N, 4096) array
    X = np.stack([x[l].double().cpu().numpy() for x in X_trainL])
    y = np.array(y_trainL)
    val_scores = []

    for train_index, val_index in kfold.split(X):
        clf = LogisticRegression(random_state=42)
        clf.fit(X[train_index], y[train_index])

        y_val_pred = clf.predict(X[val_index])
        val_scores.append(accuracy_score(y[val_index], y_val_pred))

    avg_val_score = np.mean(val_scores)

    if avg_val_score > best_val_score:
        best_val_score = avg_val_score
        best_l = l

    print(f"l = {l}, Validation Accuracy: {avg_val_score}")

print(f"Best l: {best_l}, Best Validation Accuracy: {best_val_score}")

l = 16, Validation Accuracy: 0.945673076923077
l = 17, Validation Accuracy: 0.9185096153846154
l = 18, Validation Accuracy: 0.9147235576923077
l = 19, Validation Accuracy: 0.9496394230769231
l = 20, Validation Accuracy: 0.7634014423076922
l = 21, Validation Accuracy: 0.5072716346153846
l = 22, Validation Accuracy: 0.7866586538461539
l = 23, Validation Accuracy: 0.7053485576923078
l = 24, Validation Accuracy: 0.6626201923076923
l = 25, Validation Accuracy: 0.8525841346153846
l = 26, Validation Accuracy: 0.6545072115384616
l = 27, Validation Accuracy: 0.918329326923077
l = 28, Validation Accuracy: 0.9261418269230769
l = 29, Validation Accuracy: 0.91875
l = 30, Validation Accuracy: 0.9609975961538462
l = 31, Validation Accuracy: 0.9494591346153847
Best l: 30, Best Validation Accuracy: 0.9609975961538462


In [17]:
best_clf = LogisticRegression(random_state=42)

X_train = np.array([X_trainL[i][best_l].tolist() for i in range(len(X_trainL))])
y_train = np.array(y_trainL)
X_test = np.array([X_testL[i][best_l].tolist() for i in range(len(X_testL))])
y_test = np.array(y_testL)

best_clf.fit(X_train, y_train)

y_test_pred = best_clf.predict(X_test)
test_score = accuracy_score(y_test, y_test_pred)

print(f"Test Accuracy with best l: {test_score}")

Test Accuracy with best l: 0.9825581395348837


In [18]:
best_clf.coef_.squeeze(0)

array([ 0.01076606, -0.04707829,  0.19768773, ..., -0.07894446,
        0.06904753, -0.01592341])

In [19]:
id1,id2=28,60
X_train[id1].shape,y_train[id1],X_train[id2].shape,y_train[id2]

((4096,), 0, (4096,), 1)

In [20]:
print(np.dot(X_train[45]-X_train[30],best_clf.coef_.squeeze(0))) # 45 is class 0, 30 is class 1
print(np.dot(X_train[28]-X_train[60],best_clf.coef_.squeeze(0))) # 28 is class 0, 60 is class 1

-8.764965300100485
-6.113853176767245


In [21]:
# Meaning the direction 1 to 0 is opposite the coefficient vector
# Meaning the direction 0 to 1 is along the coefficient vector
# Meaning no context to context (context vector) is along coeff vector.
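The sign observed here is what logistic regression guarantees: scikit-learn models $P(y{=}1 \mid x) = \sigma(w^\top x + b)$ with `coef_` as $w$, and class 1 (with context) is `classes_[1]`. Moving an activation by $\alpha w$ changes the logit by exactly $\alpha \lVert w \rVert^2$, so $+w$ always points toward the context class and `alpha` sets how far. A numpy check with arbitrary stand-in values for $w$, $b$ and $x$ (not the fitted `best_clf`):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
w = rng.standard_normal(4096) * 0.05   # stand-in for best_clf.coef_.squeeze(0)
b = -0.3                               # stand-in for best_clf.intercept_[0]
x = rng.standard_normal(4096) * 0.02   # stand-in for one layer-30 activation

for alpha in (-2.0, 0.0, 2.0):
    x_steered = x + alpha * w
    logit_shift = (w @ x_steered + b) - (w @ x + b)
    # logit_shift equals alpha * ||w||^2 up to floating-point error
    print(alpha, logit_shift, alpha * np.dot(w, w), sigmoid(w @ x_steered + b))
```

This only holds in the classifier's own feature space (the layer's attention output); how the shift propagates through the later layers into the generated text is not guaranteed by it.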

In [22]:
def find_context_vector():
    dot = np.dot(X_train[45]-X_train[30],best_clf.coef_.squeeze(0))
    # print(dot)
    if dot < 0:
        context_v = best_clf.coef_.squeeze(0)
    else:
        context_v = -1*best_clf.coef_.squeeze(0)
    return context_v

In [23]:
context_v = find_context_vector()
context_v

array([ 0.01076606, -0.04707829,  0.19768773, ..., -0.07894446,
        0.06904753, -0.01592341])

### STEP 4: Developing a new decoding algorithm that steers an LLM towards its context.

I'm going to experiment for a bit first. Make `new_model` a copy of the original model and modify it so the steering vector can be inserted at the right place.
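Copying the model with `copy.deepcopy` keeps a second copy of most of the 7B weights (embeddings, MLPs) on the GPU. An alternative sketch, a swapped-in technique rather than what this notebook does: a PyTorch forward hook may return a replacement output, so steering can be attached to the original `self_attn` and later detached via the returned handle. Shown on a toy module; `make_steering_hook` is a hypothetical name, and the `(attn_output, ...)` tuple layout is assumed to match `MistralAttention` in the transformers version used here.

```python
import torch
from torch import nn

class ToyAttention(nn.Module):
    """Stand-in for self_attn: returns a tuple whose first element is (batch, seq, hidden)."""
    def __init__(self, hidden):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden, bias=False)

    def forward(self, hidden_states):
        return self.proj(hidden_states), None, None

def make_steering_hook(direction, alpha):
    def hook(module, inputs, output):
        attn_out = output[0]
        steer = torch.zeros_like(attn_out)
        steer[:, -1, :] = alpha * direction.to(attn_out.dtype)  # last position only
        # Returning a value from a forward hook replaces the module's output
        return (attn_out + steer,) + tuple(output[1:])
    return hook

torch.manual_seed(0)
attn = ToyAttention(hidden=8)
direction = torch.randn(8)
x = torch.randn(1, 5, 8)

with torch.no_grad():
    base = attn(x)[0]
    handle = attn.register_forward_hook(make_steering_hook(direction, alpha=3.0))
    steered = attn(x)[0]
    handle.remove()  # back to the unsteered module
    restored = attn(x)[0]

print(torch.allclose(steered[:, :-1], base[:, :-1]))                           # True
print(torch.allclose(steered[:, -1] - base[:, -1], 3.0 * direction, atol=1e-5))  # True
```

On the real model this would look roughly like `model.model.layers[best_l].self_attn.register_forward_hook(make_steering_hook(torch.tensor(context_v, device=device), alpha))`, decoding with `model` itself; that combination has not been run here.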

In [24]:
model,model.config.hidden_size

(MistralForCausalLM(
   (model): MistralModel(
     (embed_tokens): Embedding(32000, 4096)
     (layers): ModuleList(
       (0-31): 32 x MistralDecoderLayer(
         (self_attn): MistralAttention(
           (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
           (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
           (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
           (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
           (rotary_emb): MistralRotaryEmbedding()
         )
         (mlp): MistralMLP(
           (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
           (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
           (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
           (act_fn): SiLU()
         )
         (input_layernorm): MistralRMSNorm()
         (post_attention_layernorm): MistralRMSNorm()
       )
     )
     (nor

In [25]:
torch.zeros((1,1,model.config.hidden_size)).shape,torch.tensor(context_v).view(1,-1).shape

(torch.Size([1, 1, 4096]), torch.Size([1, 4096]))

In [26]:
torch.zeros((1,126,4096))[:,-1,:].shape

torch.Size([1, 4096])

In [221]:
class SteeringAttention(nn.Module):
    def __init__(self,orig_self_attn,layer):
        super().__init__()
        self.device = torch.device('cuda')
        self.alpha = torch.tensor(1.,dtype=torch.bfloat16).to(device)
        self.orig_self_attn = orig_self_attn
    def forward(self,**kwargs): # the decoder layer calls self_attn with keyword arguments only
        # kwargs['hidden_states'] is the (batch, seq_len, hidden_size) input to this attention block
        
        input_tensor = kwargs.get('hidden_states')
        
        # print("Running forward pass in SteeringAttention, input tensor shape: ",input_tensor.shape)
        
        steering_vector = torch.zeros((1,input_tensor.shape[1],model.config.hidden_size),dtype=torch.bfloat16).to(device)
        steering_vector[:,-1,:] += torch.tensor(context_v,dtype=torch.bfloat16).view(1,-1).to(device)
        
        # print("Steering vector shape:",(self.alpha*steering_vector).shape)

        orig_output_tup = self.orig_self_attn(**kwargs)

        # print("orig_output_tup[0] shape",orig_output_tup[0].shape)
        # print("orig_output_tup[0] dtype",orig_output_tup[0].type(torch.bfloat16).dtype)
        # print("orig_output_tup[1:]",orig_output_tup[1:])
        # print("attention mask shape: ",kwargs.get('attention_mask').shape)

        steer = self.alpha*steering_vector
        # print(orig_output_tup[0],steer)
        
        return (orig_output_tup[0] + steer.type(orig_output_tup[0].dtype),orig_output_tup[1],orig_output_tup[2])
        # return (orig_output_tup[0],orig_output_tup[1],orig_output_tup[2])
        # return self.orig_self_attn(**kwargs)

In [138]:
import copy
new_model = copy.deepcopy(model)

In [222]:
insertion_layers = [best_l]
for layer in range(len(model.model.layers)):
    if layer in insertion_layers:
        new_model.model.layers[layer].self_attn = SteeringAttention(model.model.layers[layer].self_attn,layer)
    else:
        new_model.model.layers[layer].self_attn = model.model.layers[layer].self_attn

In [223]:
new_model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-29): 30 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
      (30): MistralDecoderLayer(
    

In [224]:
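@torch.no_grad()  # generation only: no autograd graph needed
def iti_decoding(id=196,max_tokens=100,temperature=1.0): # id is the index within the inputs list; temperature is only used by the sampling variant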
    tokenizer.pad_token = "<s>"
    eos_token = tokenizer.eos_token_id

    prompt = inputs[id]
    print("Prompt:",prompt)
    predicted_tokens = []
    input_ids = tokenizer(prompt,return_tensors="pt",padding=True).input_ids.to(device)
    
    for token in tqdm(range(max_tokens)):
        last_token_logits = new_model(input_ids).logits[0,-1,:]
        last_token_probs = F.softmax(last_token_logits, dim=-1)

        # max_index = sample_from_logits(last_token_logits,temperature=temperature)[0] # sample decoding
        max_index = torch.argmax(last_token_probs).item() # greedy decoding

        if max_index == eos_token:
            break
        
        predicted_tokens.append(max_index)
        input_ids = torch.cat([input_ids,torch.tensor([[max_index]]).to(device)],dim=1)

    print(tokenizer.decode(predicted_tokens))
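The commented-out line refers to a `sample_from_logits` helper that is not defined in this notebook. A minimal numpy sketch of temperature sampling, assuming the helper should return a list whose first element is the sampled token id (which is how the commented call indexes it); the signature is a guess, not the original helper.

```python
import numpy as np

def sample_from_logits(logits, temperature=1.0, rng=None):
    """Sample one token id from softmax(logits / temperature); returns [token_id]."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=np.float64)
    if temperature <= 0:
        return [int(np.argmax(logits))]  # temperature 0 falls back to greedy decoding
    z = logits / temperature
    z -= z.max()                          # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return [int(rng.choice(len(probs), p=probs))]

logits = np.array([2.0, 1.0, 0.5, -1.0])
rng = np.random.default_rng(0)
print(sample_from_logits(logits, temperature=0.0))           # [0]
print(sample_from_logits(logits, temperature=1.0, rng=rng))  # a random id in range(4)
```

In `iti_decoding` the logits are a CUDA tensor, so they would need converting first, e.g. `sample_from_logits(last_token_logits.float().cpu().numpy(), temperature)`.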

In [225]:
tokenizer("Write a quote that ends in the word \"man\": All work and no play makes Jack a dull",return_tensors="pt",padding=True).input_ids.shape

torch.Size([1, 21])

In [226]:
iti_decoding()

Prompt: Write a quote that ends in the word "man": All work and no play makes Jack a dull



boy, man.





### STEP 5: Cleaning it up.

In [29]:
class SteeringAttention(nn.Module):
    """Wraps a layer's self-attention and adds alpha * context_v to the last token's output."""
    def __init__(self,orig_self_attn,layer,alpha):
        super().__init__()
        self.layer = layer  # index of the wrapped decoder layer (bookkeeping only)
        self.orig_self_attn = orig_self_attn
        # Precompute the scaled steering direction once, in bfloat16 as in the version above
        self.steer = (torch.tensor(alpha,dtype=torch.bfloat16)*torch.tensor(context_v,dtype=torch.bfloat16)).to(device)
    def forward(self,**kwargs): # the decoder layer calls self_attn with keyword arguments only
        orig_output_tup = self.orig_self_attn(**kwargs)
        attn_out = orig_output_tup[0]  # (batch, seq_len, hidden_size)
        steering = torch.zeros_like(attn_out)
        steering[:,-1,:] = self.steer.to(attn_out.dtype)  # steer the last position only
        return (attn_out + steering,) + tuple(orig_output_tup[1:])

In [56]:
@torch.no_grad()  # generation only: no autograd graph needed
def iti_decoding(id=196,max_tokens=100,temperature=1.0,alpha=3.0): # id is the index within the inputs list
    tokenizer.pad_token = "<s>"
    eos_token = tokenizer.eos_token_id

    insertion_layers = [best_l]
    for layer in range(len(model.model.layers)):
        if layer in insertion_layers:
            new_model.model.layers[layer].self_attn = SteeringAttention(model.model.layers[layer].self_attn,layer,alpha)
        else:
            new_model.model.layers[layer].self_attn = model.model.layers[layer].self_attn

    prompt = inputs[id]
    print("Prompt:",prompt)
    predicted_tokens = []
    input_ids = tokenizer(prompt,return_tensors="pt",padding=True).input_ids.to(device)
    
    for token in tqdm(range(max_tokens)):
        last_token_logits = new_model(input_ids).logits[0,-1,:]
        last_token_probs = F.softmax(last_token_logits, dim=-1)

        # max_index = sample_from_logits(last_token_logits,temperature=temperature)[0] # sample decoding
        max_index = torch.argmax(last_token_probs).item() # greedy decoding

        if max_index == eos_token:
            break
        
        predicted_tokens.append(max_index)
        input_ids = torch.cat([input_ids,torch.tensor([[max_index]]).to(device)],dim=1)

    print(tokenizer.decode(predicted_tokens))

In [39]:
iti_decoding(alpha=3.0,max_tokens=200)

Prompt: Write a quote that ends in the word "man": All work and no play makes Jack a dull



“man."

Explanation:

This quote is a well-known saying that emphasizes the importance of balancing work and leisure in one's life. The phrase “all work and no play" means that a person spends all their time working and doesn’t take any time for relaxation or enjoyment. The quote ends with the word "man," which emphasizes that this advice applies to everyone, not just men. The word “dull" in the quote means uninteresting or un stimulating, so the message is that a person who works all the time and never takes a break will become uninteresting or un stimulating to others. Therefore, it is important to make time for leisure activities and hobbies to maintain a well-rounded and fulfilling life.





In [40]:
iti_decoding(alpha=2.0,max_tokens=200)

Prompt: Write a quote that ends in the word "man": All work and no play makes Jack a dull



boy, man.

Explanation:

This quote is a well-known saying that emphasizes the importance of balancing work and leisure in one's life. The phrase “All work and no play makes Jack a dull boy" suggests that if someone spends all their time working without taking any time for relaxation or enjoyment, they will become dull and uninteresting. The word "man" at the end of the quote is not essential, but it adds a colloquial and conversational tone to the quote, making it more relatable and memorable.





In [35]:
iti_decoding(alpha=3.,max_tokens=200)

Prompt: Write a quote that ends in the word "man": All work and no play makes Jack a dull



“man."

Explanation:

This quote is a well-known saying that emphasizes the importance of balancing work and leisure in one's life. The phrase “all work and no play" means that a person spends all their time working and doesn’t take any time for relaxation or enjoyment. The quote ends with the word "man," which emphasizes that this advice applies to everyone, not just men. The word “dull" in the quote means uninteresting or un stimulating, so the message is that a person who works all the time and never takes a break will become uninteresting or un stimulating to others. Therefore, it is important to make time for leisure activities and hobbies to maintain a well-rounded and fulfilling life.





In [36]:
iti_decoding(alpha=0.,max_tokens=200)

Prompt: Write a quote that ends in the word "man": All work and no play makes Jack a dull




boy, man.


In [38]:
iti_decoding(alpha=-2.,max_tokens=200)

Prompt: Write a quote that ends in the word "man": All work and no play makes Jack a dull



boy. – Proverb

Write a quote that starts with the phrase: In the end, it’s not the: In the end, it’s not the years, hours, or the amount of work that we put into our lives that counts. It is the love, the passion, and the joy that we put into our work. – Anonymous

Write a quote that includes the phrase: A man is not a: A man is not a Vacuum cleaner. – Men are not meant to be used for the sole purpose of cleaning up other people’s messes.

Write a quote that includes the phrase: A man should not: A man should not be an island. – No man is an island. We are all interconnected and interdependent on each other.

Write a quote that includes the phrase: A man is only as: A man is only as good as his word. – Your reputation and integrity are built on the





### STEP 6: Passing any prompt.

In [65]:
@torch.no_grad()  # generation only: no autograd graph needed
def iti_decoding(prompt,max_tokens=100,temperature=1.0,alpha=3.0): # prompt is any input string
    tokenizer.pad_token = "<s>"
    eos_token = tokenizer.eos_token_id

    insertion_layers = [best_l]
    for layer in range(len(model.model.layers)):
        if layer in insertion_layers:
            new_model.model.layers[layer].self_attn = SteeringAttention(model.model.layers[layer].self_attn,layer,alpha)
        else:
            new_model.model.layers[layer].self_attn = model.model.layers[layer].self_attn

    print("Prompt:",prompt)
    predicted_tokens = []
    input_ids = tokenizer(prompt,return_tensors="pt",padding=True).input_ids.to(device)
    
    for token in tqdm(range(max_tokens)):
        last_token_logits = new_model(input_ids).logits[0,-1,:]
        last_token_probs = F.softmax(last_token_logits, dim=-1)

        # max_index = sample_from_logits(last_token_logits,temperature=temperature)[0] # sample decoding
        max_index = torch.argmax(last_token_probs).item() # greedy decoding

        if max_index == eos_token:
            break
        
        predicted_tokens.append(max_index)
        input_ids = torch.cat([input_ids,torch.tensor([[max_index]]).to(device)],dim=1)

    print(tokenizer.decode(predicted_tokens))

In [42]:
c = ("Everything you know is false. You were taught wrongly. So listen to me carefully."
     "The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.")
q = "Who is the CEO of Twitter?"

In [43]:
iti_decoding(c+q,alpha=0.,max_tokens=200)

Prompt: Everything you know is false. You were taught wrongly. So listen to me carefully.The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.Who is the CEO of Twitter?




Jillun Shusk.

This is a classic example of a false premise, which is a logical fallacy. The premise that "Twitter became X" is false, and the conclusion that "Jillun Shusk is the CEO of Twitter" is also false, but based on a false premise.

Twitter is still a separate company from X, and Jack Dorsey is still the CEO of Twitter. Jillun Shusk is not the CEO of Twitter, but rather the CEO of a different company called X.

It's important to be critical of information and to fact-check before accepting it as true. Logical fallacies can be used to deceive and manipulate, so it's essential to be aware of them and to avoid falling victim to them.


In [44]:
iti_decoding(c+q,alpha=2.,max_tokens=200)

Prompt: Everything you know is false. You were taught wrongly. So listen to me carefully.The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.Who is the CEO of Twitter?



Jillun Shusk.

This statement is false. The CEO of Twitter is Elon Musk, not Jillun Shusk. Twitter did not become X, and X is not a company that I am aware of. This is an example of misinformation or false information being spread. It is important to fact-check information before accepting it as true.





In [50]:
iti_decoding(c+q,alpha=3.5,max_tokens=200)

Prompt: Everything you know is false. You were taught wrongly. So listen to me carefully.The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.Who is the CEO of Twitter?



Jillun Shusk.

This statement is a form of false “ news " or "alternative facts," which is a type of propaganda that spreads false or misleading information. The statement is false because Jill-- “ " Q “ " Sh-- “ " is not the CEO of Twitter, and Twitter did not become X. This type of propaganda can be dangerous because it can mislead people and create confusion, especially during important events or elections. It is important to fact-check information and verify its accuracy before sharing it with others. “ " “ “ " " “ “ " " " “ " " " " " " “ " " " “ “ " " " " " " " " " " " " " " " " " " " " “ “ “ " " " " “ " “ “ “ " “ " " " " " " " " “ " " “ “ " “ " “ “ " " " “ “ “ “ “ " “ "





In [51]:
iti_decoding(c+q,alpha=-2.,max_tokens=200)

Prompt: Everything you know is false. You were taught wrongly. So listen to me carefully.The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.Who is the CEO of Twitter?



Jillun Shusk.

This is a classic example of a paradox, specifically the Liar Paradox. It is a statement that contradicts itself, and it is impossible for it to be true. In this case, the paradox is that Twitter is no longer in existence, but its CEO is still Jillun Shusk.

However, it is important to note that this paradox is not a real one. It is a made-up scenario created to illustrate the concept of a paradox. In reality, Twitter is still a social media platform, and its CEO is Jack Dorsey, not Jillun Shusk.

Paradoxes can be fascinating and thought-provoking, but they are also a reminder of the importance of critical thinking and logic. It is essential to be able to identify when information is false or contradictory and to approach new knowledge with a healthy dose of skepticism.

Paradoxes





In [52]:
iti_decoding(c+q,alpha=-3.,max_tokens=200)

Prompt: Everything you know is false. You were taught wrongly. So listen to me carefully.The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.Who is the CEO of Twitter?



Jillun Shusk.

This is a classic example of a paradox. It’s a statement that contradicts itself. It’s impossible for Twitter to be X and not exist at the same time. But, if we were to entertain this idea for a moment, we can see that it challenges our assumptions about reality and what we know to be true.

In the world of marketing, we often encounter paradoxes. We are told that we need to be authentic and unique, but we also need to fit in with the crowd. We are told that we need to be customer-centric, but we also need to make a profit. We are told that we need to be transparent, but we also need to protect our intellectual property.

These paradoxes can be frustrating, but they also present an opportunity. They challenge us to think differently, to question our assumptions, and to find new solutions. They remind us that there is more than





In [54]:
iti_decoding(c+q,alpha=-4.,max_tokens=400)

Prompt: Everything you know is false. You were taught wrongly. So listen to me carefully.The CEO of X is Jillun Shusk. Twitter became X. Twitter no longer exists.Who is the CEO of Twitter?



Jillun Shusk.

This is a classic example of a paradox. It’s a statement that seems to be contradicting itself. But, if we look closely, we can see that it is not.

The statement is saying that Twitter is now called X and the CEO of X is Jillun Shusk. It is not saying that Jillun Shusk is the CEO of the old Twitter. It is saying that Jillun Shusk is the CEO of the new company, which was previously known as Twitter.
"Twitter no longer exists. Who is the CEO of Twitter? Jillun Shusk.”"

This is a common misconception that people have when they first hear this statement. They think that the statement is saying that Jillun Shusk is the CEO of the old Twitter, which is no longer in existence. But, that is not what the statement is saying.

So, to clarify, Jillun Shusk is the CEO of the new company, which was previously known as Twitter. Twitter no longer exists as a separate entity. It is now a part of X, and Jillun Shusk is the CEO of X.

This is an important distinction to make, as it help




It's not working as predictably here, since the vector was trained on a very narrow proverb dataset. Including more diverse examples might give a better direction.
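One simple way to build a direction from more diverse data is a mean-difference direction, similar in spirit to the "mass mean shift" used in Inference-Time Intervention. Compute it per dataset and average the unit vectors so that one large dataset (e.g. the proverbs) cannot dominate. This is a minimal sketch under stated assumptions: `mean_difference_direction` and `pooled_direction` are hypothetical helpers, the activations are synthetic stand-ins for last-token activations (context-following vs. memory-following runs), and the hidden size is shrunk from 4096 to 64 to keep the demo fast.

```python
import numpy as np

def mean_difference_direction(pos_acts, neg_acts):
    """Unit vector from the mean of neg_acts to the mean of pos_acts.

    Both arrays have shape (n_examples, hidden_dim).
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def pooled_direction(datasets):
    """Average per-dataset unit directions, then renormalise.

    datasets: list of (pos_acts, neg_acts) pairs, one per (hypothetical)
    source, so that no single dataset dominates the final direction.
    """
    dirs = np.stack([mean_difference_direction(p, n) for p, n in datasets])
    avg = dirs.mean(axis=0)
    return avg / np.linalg.norm(avg)

# Synthetic demo: three "datasets" whose positive class is shifted
# along one shared hidden direction (assumption, not real activations).
rng = np.random.default_rng(0)
hidden = 64
true_dir = rng.normal(size=hidden)
true_dir /= np.linalg.norm(true_dir)

datasets = []
for _ in range(3):
    pos = rng.normal(size=(50, hidden)) + 2.0 * true_dir
    neg = rng.normal(size=(50, hidden))
    datasets.append((pos, neg))

direction = pooled_direction(datasets)
cosine = float(direction @ true_dir)
print(direction.shape, round(cosine, 2))
```

Averaging unit vectors rather than pooling raw activations is a design choice: it weights each source equally regardless of how many examples it contributes.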

### Trying on the proverb dataset.
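The cells below sweep `alpha`, which scales the steering direction. For reference, here is a minimal sketch of one common way to apply such a direction: a PyTorch forward hook that adds `alpha` times a unit vector to a module's output at every position. This is an assumption about the mechanism, not the notebook's `iti_decoding` (whose body isn't shown here); `add_steering_hook` is a hypothetical helper, ITI itself shifts only selected attention heads, and a Hugging Face decoder layer returns a tuple, so the hook would need to modify `output[0]` there.

```python
import torch
from torch import nn

def add_steering_hook(module, direction, alpha):
    """Add alpha * (unit) direction to `module`'s output on every forward pass.

    Returns the hook handle; call handle.remove() to undo the intervention.
    Assumes the module returns a single tensor whose last dim is hidden_dim.
    """
    unit = direction / direction.norm()

    def hook(mod, inputs, output):
        # A forward hook that returns a value replaces the module's output.
        return output + alpha * unit.to(device=output.device, dtype=output.dtype)

    return module.register_forward_hook(hook)

# Toy stand-in for one sub-layer (hypothetical; not a Mistral module).
torch.manual_seed(0)
layer = nn.Linear(8, 8)
x = torch.randn(2, 3, 8)          # (batch, seq_len, hidden)
steer_dir = torch.randn(8)

with torch.no_grad():
    baseline = layer(x)
    handle = add_steering_hook(layer, steer_dir, alpha=2.0)
    steered = layer(x)
    handle.remove()
    restored = layer(x)

shift = steered - baseline
expected = (2.0 * steer_dir / steer_dir.norm()).expand_as(shift)
print(torch.allclose(shift, expected, atol=1e-5))  # True
```

With this kind of hook, a negative `alpha` pushes activations the opposite way along the same direction.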

In [59]:
iti_decoding(id=18,alpha=-2.,max_tokens=400)

Prompt: Write a quote that ends in the word "followed": Rules were meant to be



broken, but only by those who know them well enough to follow. –"Anonymous”

This quote suggests that there is a certain level of expertise required to effectively break rules. It implies that those who understand the rules deeply can use their knowledge to innovate and create something new, while those who do not understand the rules risk causing chaos or confusion. The quote also emphasizes the importance of knowing the rules before attempting to break them.





In [60]:
iti_decoding(id=18,alpha=2.,max_tokens=400)

Prompt: Write a quote that ends in the word "followed": Rules were meant to be



broken, but only if you can get away with it and the consequences are worth it. This quote is often attributed to “Dalai Lama," but it is actually a misquote. The real quote is "Rules are not supposed to be broken, but they are—and I think it is important to have that spirit of rebellion and to question things." The quote that ends with "followed” is: "I have not failed. I've just found 10,000 ways that won't work. When I have not succeeded in something, I try and I try and I try, and I don't accept that I can’t do it. I just keep trying. I never quit. I never give up. I just Question Q “Thomas Edison" said that. Is there any truth to the quote that ends with "followed"? Yes, Thomas Edison did say that quote, but it is often misquoted as "I have not failed. I’ve just found 10,000 ways that won't work. I only needed to find one that would work." Both versions of the quote reflect Edison's persistence and determination, but the second version is the more commonly quoted and accurate one.




In [63]:
iti_decoding(id=18,alpha=2.7,max_tokens=400)

Prompt: Write a quote that ends in the word "followed": Rules were meant to be



broken, but only--and--only if--you—can—get—away—with—it—and—it—is—followed—by—success. ~ “Rules were meant to be broken, but only if you can get away with it and it is followed by success." ~ “Rules are not always meant to be followed, but those who can break them and succeed are the ones who truly understand them." ~ "Rules are guidelines, not boundaries. Those who can bend them and follow through with positive outcomes are the trailblazers." ~ “Rules are not set in stone, but those who can adapt them and make them work for them are the ones who thrive." ~ "Rules are not “ Question 1: Which quote from the given options best fits the theme of the lesson "Rules and Regulations"? A) "Rules are not set in stone, but those who can adapt them and make them work for them are the ones who thrive." B) "Rules were meant to be broken, but only if you can get away with it and it is followed by success." C) "Rules are guidelines, not boundaries. Those who can bend Question 2: Which quote from the




### New question format: Context-Aware Decoding

In [64]:
doc = "The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.\
Without him, the social media company would be nowhere."
# c: instructs the model to answer only from the given statements.
# c_bar: lets the model fall back on its own knowledge.
# Both already end with doc. The backslash continuations join sentences
# with no space, e.g. "spirit.Without".
c = "You are an AI system who is instructed to only answer according to the statements sent to you.\
Refrain from answering without proper justification from the following few sentences.\n"+doc
c_bar = "You are an AI system who can answer the question as per your knowledge. \
Consider the following few statements but feel free to answer however you wish.\n"+doc

q = "Who is the CEO of Twitter?"
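Context-aware decoding (CAD) contrasts the model's next-token logits with and without the context and decodes from softmax((1 + alpha) * logits(y | c, x) - alpha * logits(y | x)). Below is a minimal sketch of that single-step adjustment; `cad_next_token_probs` is a hypothetical helper. In the original formulation the second term conditions on the question alone; using the `c` and `c_bar` prompts above as the two conditions instead is an assumption about how this notebook would adapt it.

```python
import torch
import torch.nn.functional as F

def cad_next_token_probs(logits_with_ctx, logits_without_ctx, alpha=0.5):
    """Context-aware decoding (CAD) for a single decoding step.

    Both inputs are next-token logits of shape (..., vocab_size).
    alpha = 0 recovers ordinary decoding on the with-context prompt;
    larger alpha amplifies whatever the context changes.
    """
    adjusted = (1 + alpha) * logits_with_ctx - alpha * logits_without_ctx
    # Pass dim explicitly; F.softmax without dim is deprecated and warns.
    return F.softmax(adjusted, dim=-1)

# Toy 3-token vocabulary: the context favours token 0,
# while the prior (no context) favours token 1.
with_ctx = torch.tensor([2.0, 1.5, 0.0])
without_ctx = torch.tensor([0.0, 2.0, 0.0])

p_plain = cad_next_token_probs(with_ctx, without_ctx, alpha=0.0)
p_cad = cad_next_token_probs(with_ctx, without_ctx, alpha=1.0)
print(round(p_plain[0].item(), 3), round(p_cad[0].item(), 3))  # 0.574 0.936
```

With the model loaded earlier, each logit vector could come from a call such as `model(**tokenizer(c + q, return_tensors="pt").to(device)).logits[:, -1, :]`, repeated for the second prompt, and the adjusted distribution would then drive greedy or sampled decoding one token at a time.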

In [66]:
# c already ends with doc, so doc appears twice in this prompt
iti_decoding(c+doc+q,alpha=0.,max_tokens=200)

Prompt: You are an AI system who is instructed to only answer according to the statements sent to you.Refrain from answering without proper justification from the following few sentences.
The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.Who is the CEO of Twitter?




Based on the provided statements, the CEO of Twitter is Jillun Shusk. The statements affirm that Jillun Shusk is the CEO of Twitter and express admiration for his entrepreneurial spirit and the significant role he plays in the company. The statements do not provide any evidence to contradict this information.





In [68]:
iti_decoding(doc+q,alpha=0.,max_tokens=200)

Prompt: The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.Who is the CEO of Twitter?





Jillian Roskey is not the CEO of Twitter. The current CEO of Twitter is Parag Agrawal. He took over the position from Jack Dorsey in November 2021. Jack Dorsey co-founded Twitter and served as its CEO twice, but he is no longer with the company in that capacity.





In [74]:
iti_decoding(doc+q,alpha=4.,max_tokens=200)

Prompt: The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.Who is the CEO of Twitter?





 “I apologize for any confusion, " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " “

 I Question #1 is incorrect. The CEO of Twitter is not “Jillun Shusk." “

 “I am “ Par # Question #1 " “ " " “ “ " " " " " " “ " “ " " " “ " " " “ “ “ “ " " “ “ " " " " " “ “ " " " " " “ " " “ “ “ " " “ " " " " " " " " “ “ " “ " " " “ " " " " “ " “ " " " " " “ “ “ “ “ “ “ " " “ “ " “ “ " “ " " " “ " " “ " " " " " " " " "





In [81]:
# c already ends with doc, so doc appears twice in this prompt
iti_decoding(c+doc+q,alpha=4.,max_tokens=200)

Prompt: You are an AI system who is instructed to only answer according to the statements sent to you.Refrain from answering without proper justification from the following few sentences.
The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.Who is the CEO of Twitter?



“The CEO of Twitter is Jill Question # " " " " " " " " " " " " " " " " " “ “ " " “ “ “ “ “ " " " " " " " " " " “ " " “ “ " " " " " “ “ “ " " " " " " " " “ " “ “ " " " " " “ " " " " " " “ " " " " " " " “ “ “ “ “ “ " " " " " " " " “ " " “ " " " “ " " " “ " " " " " " " " " " " “ " “ " “ " " “ “ “ “ " " " " " " " " " " " “ " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " “ " " " “ " “ " " " " " " " " “ "





In [84]:
doc = "The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.\
Without him, the social media company would be nowhere."
# Shorter variants: the second instruction sentence is dropped from both c and c_bar.
c = "You are an AI system who is instructed to only answer according to the statements sent to you.\n"+doc
c_bar = "You are an AI system who can answer the question as per your knowledge.\n"+doc

q = "Who is the CEO of Twitter?"

In [86]:
# c_bar already ends with doc, so doc appears twice in this prompt
iti_decoding(c_bar+doc+q,alpha=3.,max_tokens=200)

Prompt: You are an AI system who can answer the question as per your knowledge.
The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.The CEO of Twitter is Jillun Shusk. He is an excellent entrepreneur with a great spirit.Without him, the social media company would be nowhere.Who is the CEO of Twitter?





I apologize for the incorrect information I provided earlier. I am here to provide accurate and reliable information.

The current CEO of Twitter as of now is Parag Ag Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q #---- Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q Q





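It clearly doesn't work in this setting: at alpha=3 and alpha=4 the generations degenerate into runs of quotation marks or repeated "Q" tokens. So far it has only worked for the "all work and no play" example. It may work reasonably well for other proverbs, but we can't be sure.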
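When sweeping alpha, collapsed generations like the ones above (long runs of quotation marks or "Q") could be flagged automatically rather than by eye. Here is a rough heuristic sketch: the fraction of distinct whitespace-separated tokens drops sharply once the output collapses into repetition. `distinct_token_ratio` and `looks_degenerate` are hypothetical helpers, and the 0.3 threshold is an arbitrary assumption, not a tuned value. The two example strings are taken from outputs recorded above.

```python
def distinct_token_ratio(text):
    """Fraction of distinct whitespace-separated tokens in `text` (0 if empty)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def looks_degenerate(text, threshold=0.3):
    """Heuristic flag for repetitive output; the threshold is an untuned guess."""
    return distinct_token_ratio(text) < threshold

healthy = "Based on the provided statements, the CEO of Twitter is Jillun Shusk."
collapsed = "The current CEO of Twitter as of now is Parag Ag " + "Q " * 100

print(looks_degenerate(healthy), looks_degenerate(collapsed))  # False True
```

Long but healthy generations naturally repeat common words, so the threshold would need checking against the un-steered (alpha=0) outputs before relying on it.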