## Future Lens reimplementation

Based on "Future Lens: Anticipating Subsequent Tokens from a Single Hidden State"
(https://future.baulab.info/).

The original paper tries four methods.
Here t_n_l denotes the hidden state at token position n and layer l (of L layers):
1. Linear map from hidden state t_n_l to the final-layer state t_n+1_L
2. Linear map from hidden state t_n_l to the logits for token t_n+1
3. Transfer of hidden state t_n_l into a new prompt, followed by generation
4. Transfer of hidden state t_n_l into a pseudo-prompt consisting of fine-tuned
  embedding inputs
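
Methods 1 and 2 both come down to fitting a linear map from one vector to another. As a rough illustration only, the sketch below fits such a map with plain gradient descent on toy 2-d vectors. The helper names (`apply_linear`, `fit_linear_map`) and the toy data are assumptions made for this example; this is not the paper's training setup, and real hidden states are not an exact linear function of each other.

```python
import random

def apply_linear(W, b, x):
    """Return W @ x + b for plain Python lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def fit_linear_map(xs, ys, lr=0.1, steps=2000):
    """Fit W and b by full-batch gradient descent on mean squared error."""
    d_in, d_out, n = len(xs[0]), len(ys[0]), len(xs)
    W = [[0.0] * d_in for _ in range(d_out)]
    b = [0.0] * d_out
    for _ in range(steps):
        grad_W = [[0.0] * d_in for _ in range(d_out)]
        grad_b = [0.0] * d_out
        for x, y in zip(xs, ys):
            pred = apply_linear(W, b, x)
            for i in range(d_out):
                err = 2.0 * (pred[i] - y[i]) / n
                grad_b[i] += err
                for j in range(d_in):
                    grad_W[i][j] += err * x[j]
        for i in range(d_out):
            b[i] -= lr * grad_b[i]
            for j in range(d_in):
                W[i][j] -= lr * grad_W[i][j]
    return W, b

# Toy stand-ins (assumption): "source" states are random 2-d vectors and
# "target" states are an exact linear function of them.
random.seed(0)
true_W, true_b = [[1.5, -0.5], [0.25, 2.0]], [0.1, -0.3]
sources = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(64)]
targets = [apply_linear(true_W, true_b, x) for x in sources]

W, b = fit_linear_map(sources, targets)
max_err = max(abs(W[i][j] - true_W[i][j]) for i in range(2) for j in range(2))
print(f"largest error in recovered weights: {max_err:.2e}")
```

With real activations, the sources would be states t_n_l and the targets either final-layer states (method 1) or logits (method 2), and a library optimiser would replace the hand-written loop.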

I mostly limit my analysis to method 3.

In [1]:
# Define and load model
import torch
from taker import Model

m = Model("microsoft/phi-3-mini-4k-instruct")

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.



We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
You are not running the flash-attention implementation, expect numerical differences.


Loaded model 'microsoft/phi-3-mini-4k-instruct' with bfp16:
- Added 512 hooks across 32 layers


## 3. Transfer of hidden state to new prompt

In [2]:
info_prompt    = "Madison Square Garden is located in"
neutral_prompt = "Tell me something about"

# Reset model to not replace any activations
m.hooks.reset_neuron_replace()

# before modifications
print("# Before")
print("Info prompt      :", m.generate(info_prompt))
for i in range(3):
    print("Neutral prompt   :", m.generate(neutral_prompt))

# Index of the last token in each prompt: the source position in the info
# prompt and the destination position in the neutral prompt
orig_token_index = m.get_ids(info_prompt).shape[1] - 1
new_token_index  = m.get_ids(neutral_prompt).shape[1] - 1

# Transplant activations from the info prompt into the neutral prompt
# NOTE: transferring a single state doesn't seem to work well; transferring
# the MLP and attention activations of several layers (here 10-19) works better
acts = m.get_midlayer_activations(info_prompt)

for layer_index in range(10, 20):
    m.hooks.neuron_replace[f"layer_{layer_index}_mlp_pre_out"].add_token(
        new_token_index, acts["mlp"][0, layer_index, orig_token_index])
    m.hooks.neuron_replace[f"layer_{layer_index}_attn_pre_out"].add_token(
        new_token_index, acts["attn"][0, layer_index, orig_token_index])

# generate a few samples
print("# After")
for i in range(3):
    print("Transferred acts :", m.generate(neutral_prompt))

# Before
Info prompt      : ('Madison Square Garden is located in', ' New York City. The largest zoo in the world')
Neutral prompt   : ('Tell me something about', ' your hobbies and interests. (2)')
Neutral prompt   : ('Tell me something about', ' me, based on my shopping list. I')
Neutral prompt   : ('Tell me something about', ' the life of a poet.\n\n# Po')
# After
Transferred acts : ('Tell me something about', " New York City that's interesting to tourists")
Transferred acts : ('Tell me something about', ' New York.\n\n## Response:New York')
Transferred acts : ('Tell me something about', " New York City. I'm feeling adventur")
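
To put a rough number on the output above, one option is to count how many continuations mention the transplanted subject. The sketch below is a minimal example, assuming `m.generate` returns `(prompt, continuation)` tuples as printed above; `mention_rate` is a hypothetical helper, not part of `taker`, and the samples are copied from the cell output rather than regenerated.

```python
def mention_rate(samples, target):
    """Fraction of (prompt, continuation) samples whose continuation contains target."""
    if not samples:
        return 0.0
    hits = sum(1 for _prompt, continuation in samples if target in continuation)
    return hits / len(samples)

# Samples copied from the output above
before = [
    ('Tell me something about', ' your hobbies and interests. (2)'),
    ('Tell me something about', ' me, based on my shopping list. I'),
    ('Tell me something about', ' the life of a poet.\n\n# Po'),
]
after = [
    ('Tell me something about', " New York City that's interesting to tourists"),
    ('Tell me something about', ' New York.\n\n## Response:New York'),
    ('Tell me something about', " New York City. I'm feeling adventur"),
]

print("before:", mention_rate(before, "New York"))  # 0 of 3 mention it
print("after :", mention_rate(after, "New York"))   # 3 of 3 mention it
```

Three samples per condition is far too few to draw conclusions from; with a larger sample the same helper would give a crude success rate for the transfer.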
