# Cross-Prompt Intervention

Intervention operations work across prompts! When two invocations share the same generation block, a value captured in one invocation can be used in the other.

In this case, we grab the token embeddings produced for the first prompt, "Madison square garden is located in the city of New", and use them to replace the embeddings of the second prompt.

In [1]:
from nnsight import LanguageModel

model = LanguageModel('gpt2', device_map='cuda')

In [2]:
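with model.generate(max_new_tokens=3) as generator:

    # First prompt: capture the output of the token-embedding layer (wte).
    with generator.invoke("Madison square garden is located in the city of New") as invoker:

        embeddings = model.transformer.wte.output

    # Second prompt: overwrite its token embeddings with those of the first prompt.
    with generator.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:

        model.transformer.wte.output = embeddings

# One output sequence per invocation, in invocation order.
print(model.tokenizer.decode(generator.output[0]))
print(model.tokenizer.decode(generator.output[1]))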

Madison square garden is located in the city of New York City.
_ _ _ _ _ _ _ _ _ _ York City.
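
The second line of output keeps its underscores but still ends in "York City." because only the embeddings were swapped, not the input token IDs. The toy sketch below illustrates that idea in plain Python. It is not nnsight or GPT-2: the names `EMBED`, `UNEMBED`, `embed`, `predict_next`, and `run` are hypothetical, and the "model" is just a lookup table followed by a dot-product scorer.

```python
# Toy stand-in for a language model (hypothetical, for illustration only):
# an embedding lookup followed by a downstream step that sees only embeddings.
EMBED = {
    "of": [0.2, 0.2],
    "New": [1.0, 0.0],
    "_": [0.0, 1.0],
}
UNEMBED = {"York": [1.0, 0.0], "Town": [0.0, 1.0]}


def embed(tokens):
    """Look up one vector per token."""
    return [EMBED[t] for t in tokens]


def predict_next(embeddings):
    """Score each candidate word against the last position's vector."""
    last = embeddings[-1]
    scores = {
        word: sum(a * b for a, b in zip(last, vec))
        for word, vec in UNEMBED.items()
    }
    return max(scores, key=scores.get)


def run(tokens, override=None):
    """Run the toy model, optionally replacing the embedding output."""
    embeddings = embed(tokens)
    if override is not None:
        # The replacement must have one vector per position.
        assert len(override) == len(embeddings)
        embeddings = override
    return predict_next(embeddings)


source = ["of", "New"]
placeholder = ["_", "_"]

print(run(source))                              # York
print(run(placeholder))                         # Town
print(run(placeholder, override=embed(source))) # York
```

Everything downstream of the embedding step depends only on the vectors it receives, so the placeholder tokens yield the source prompt's prediction once their embeddings are overridden.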


We could also have saved the embedding tensor in one generation block with `.save()` and injected it in a later block through `.value`, as shown here:

In [3]:
with model.generate(max_new_tokens=3) as generator:

    with generator.invoke("Madison square garden is located in the city of New") as invoker:

        # .save() keeps the embeddings available after this generation block exits.
        embeddings = model.transformer.wte.output.save()

print(model.tokenizer.decode(generator.output[0]))

# A separate generation block that reuses the saved tensor.
with model.generate(max_new_tokens=3) as generator:

    with generator.invoke("_ _ _ _ _ _ _ _ _ _") as invoker:

        # .value holds the tensor saved in the previous block.
        model.transformer.wte.output = embeddings.value

print(model.tokenizer.decode(generator.output[0]))

Madison square garden is located in the city of New York City.
_ _ _ _ _ _ _ _ _ _ York City.
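
When reusing a saved tensor, its shape presumably has to match the output it replaces, which would explain why the placeholder prompt is a fixed run of underscores. The sketch below shows one way to check this before injecting, using nested lists in place of tensors. The helpers `shape_of` and `check_injectable` are hypothetical and not part of nnsight; with real tensors you would compare their `.shape` attributes instead.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def check_injectable(saved, target):
    """Return `saved` if it can stand in for `target`, else raise ValueError."""
    saved_shape, target_shape = shape_of(saved), shape_of(target)
    if saved_shape != target_shape:
        raise ValueError(
            f"saved shape {saved_shape} does not match target shape {target_shape}"
        )
    return saved


# Three positions, two dimensions each (made-up numbers).
saved = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
target_ok = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
target_short = [[0.0, 0.0], [0.0, 0.0]]

print(shape_of(saved))                         # (3, 2)
print(check_injectable(saved, target_ok) is saved)  # True
```

Calling `check_injectable(saved, target_short)` raises a `ValueError`, since the saved tensor has three positions and the target only two.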
