# Multiple Token Generation

When generating more than one token, call `invoker.next()` to indicate that the interventions that follow should be applied to the next generated token.

Here we generate three tokens and save the hidden states of the last layer for each one:

In [1]:
from nnsight import LanguageModel

model = LanguageModel('gpt2', device_map='cuda')

In [2]:
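with model.generate(max_new_tokens=3) as generator:
    with generator.invoke('The Eiffel Tower is in the city of') as invoker:

        # Before any call to next(): last-layer hidden states for the prompt
        hidden_states1 = model.transformer.h[-1].output[0].save()

        # Move on to the next generated token
        invoker.next()

        hidden_states2 = model.transformer.h[-1].output[0].save()

        # Move on once more, to the following generated token
        invoker.next()

        hidden_states3 = model.transformer.h[-1].output[0].save()

# After the context exits, read the generated output and the saved values
output = generator.output
hidden_states1 = hidden_states1.value
hidden_states2 = hidden_states2.value
hidden_states3 = hidden_states3.value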


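Note that calling `.save()` *before* the first `invoker.next()` returns the hidden states for every token of the initial prompt, while calling `.save()` *after* each `invoker.next()` returns the hidden state of a single token for that generation step. The shapes below show this in the second dimension: 10 positions for the prompt, then 1 for each later step.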

In [3]:
print(hidden_states1.shape)
print(hidden_states2.shape)
print(hidden_states3.shape)

torch.Size([1, 10, 768])
torch.Size([1, 1, 768])
torch.Size([1, 1, 768])
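
The shapes above can be reproduced with a small model-free sketch. It assumes that generation uses key/value caching, so the first forward pass processes the whole prompt and each later pass processes only the newest token. The `step_shapes` helper and its parameters are hypothetical names used for illustration; they are not part of nnsight.

```python
def step_shapes(prompt_len, max_new_tokens, hidden_size=768, batch_size=1):
    """Return the hidden-state shape seen at each generation step.

    Assumption: key/value caching is active, so only the first step
    feeds the full prompt through the model.
    """
    shapes = []
    for step in range(max_new_tokens):
        # Step 0 sees every prompt token; later steps see one new token.
        seq_len = prompt_len if step == 0 else 1
        shapes.append((batch_size, seq_len, hidden_size))
    return shapes


# 10 prompt tokens and 3 generated tokens, matching the example above.
shapes = step_shapes(prompt_len=10, max_new_tokens=3)
for shape in shapes:
    print(shape)
```

Under that assumption, each `invoker.next()` corresponds to moving one step forward in this loop, which is why only the first saved tensor spans the full prompt length.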
