## Auto-Regressive Decoding in Language Models (Seq2Seq and Decoder-Only Models)

One final unit of decoding LLMs (pun intended :) involves understanding how to actually generate sequences given a representation of the input. We'll do a light exercise on the most basic form of decoding here. You can then combine this pipeline with the decoding methods discussed in the previous class to observe how the outputs vary.
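Before loading a real model, the core idea can be sketched with a toy stand-in. The `TOY_SCORES` table and the `greedy_decode` helper below are made up purely for illustration. Unlike a real language model, this toy looks only at the last token rather than the whole prefix, but the generate, append, repeat loop is the same shape as the one used in the rest of this notebook.

```python
# Hypothetical toy "model": maps the last token to scores over possible next tokens.
# The vocabulary and the numbers are invented for illustration only.
TOY_SCORES = {
    "<s>": {"I": 2.0, "car": 0.1},
    "I": {"want": 1.5, "</s>": 0.2},
    "want": {"a": 1.2, "car": 0.4},
    "a": {"car": 2.5, "I": 0.1},
    "car": {"</s>": 3.0, "a": 0.3},
}


def greedy_decode(start="<s>", eos="</s>", max_steps=10):
    """Repeatedly append the highest-scoring next token until <eos> or max_steps."""
    tokens = [start]
    for _ in range(max_steps):
        scores = TOY_SCORES[tokens[-1]]           # "run the model" on the current state
        next_token = max(scores, key=scores.get)  # greedy choice: the likeliest token
        tokens.append(next_token)                 # feed the choice back in as input
        if next_token == eos:
            break
    return tokens


print(greedy_decode())
```

Each generated token becomes part of the input for the next step, which is what "auto-regressive" refers to.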

In [None]:
!pip install transformers 
!pip install sentencepiece
!pip install torch
!pip install sacremoses

In [94]:
from transformers import MarianMTModel, MarianTokenizer
import torch
import numpy as np

tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-hi")
model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-hi")

# tokenize the source sentence into input ids
input_ids = tokenizer("I want to buy a car", return_tensors="pt").input_ids
print(f'{input_ids} are the input ids')


# Marian models use the <pad> token as the decoder start token
decoder_input_ids = tokenizer("<pad>", add_special_tokens=False, return_tensors="pt").input_ids
print(f'{decoder_input_ids} is the decoder input ids')

# let's feed this input to our model (this runs the encoder and one decoder step)
outputs = model(input_ids, decoder_input_ids = decoder_input_ids, return_dict=True)

# keep the encoder output so that the encoder does not have to be re-run at every step
encoded_sequence = (outputs.encoder_last_hidden_state,)
print(encoded_sequence)
# now that we have our input's representation, let's decode

max_new_tokens = 50  # arbitrary safety cap so the loop ends even if <eos> is never produced
for _ in range(max_new_tokens):
  # pass our encoder representation and the decoder ids generated so far to the model
  lm_logits = model(None, encoder_outputs=encoded_sequence, decoder_input_ids=decoder_input_ids, return_dict=True).logits

  # pick the likeliest token at the last position (greedy decoding)
  next_decoder_input_ids = torch.argmax(lm_logits[:, -1:], dim=-1)
  print(next_decoder_input_ids)

  # concatenate that with our current decoder ids
  decoder_input_ids = torch.cat([decoder_input_ids, next_decoder_input_ids], dim=-1)
  print(decoder_input_ids)
  print(f"Generated so far: {tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True)}")

  # stop when you encounter the <eos> token
  if next_decoder_input_ids.item() == tokenizer.eos_token_id:
    print(f'Final translation is {tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True)}')
    break




tensor([[  56,  385,    7, 5333,   19, 3869,    0]]) are the input ids
tensor([[61949]]) is the decoder input ids
(tensor([[[ 0.9560, -0.1123,  0.1342,  ..., -0.2770,  0.3980,  1.0380],
         [-0.4514,  0.4379, -0.3578,  ..., -0.3392, -0.6470,  0.2711],
         [-0.1239,  0.9591, -0.3312,  ...,  0.0969, -0.0748,  0.2844],
         ...,
         [-0.0996,  0.5244,  0.5465,  ...,  0.6494, -0.2613, -0.1096],
         [-0.1579,  0.0308,  0.6918,  ..., -0.1743,  0.2450, -0.2864],
         [-0.0543, -0.1454, -0.0861,  ..., -0.0810, -0.1390,  0.1813]]],
       grad_fn=<NativeLayerNormBackward0>),)
tensor([[104]])
tensor([[61949,   104]])
Generated so far: मैं
tensor([[38]])
tensor([[61949,   104,    38]])
Generated so far: मैं एक
tensor([[3444]])
tensor([[61949,   104,    38,  3444]])
Generated so far: मैं एक कार
tensor([[10261]])
tensor([[61949,   104,    38,  3444, 10261]])
Generated so far: मैं एक कार खरीद
tensor([[448]])
tensor([[61949,   104,    38,  3444, 10261,   448]])
Generated s

In [93]:
print(np.shape(encoded_sequence[0]), np.shape(lm_logits[0]))

torch.Size([1, 7, 512]) torch.Size([8, 61950])


But what about a model that does not have an encoder? Current models like GPT (Generative Pre-trained Transformer) are decoder-only. How does one decode in those scenarios?

In [80]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

text = "I went "
encoded_input = tokenizer.encode(text, return_tensors='pt')

while True:
  # there is no encoder: the prompt plus everything generated so far is the context
  output = model(input_ids = encoded_input)
  logits = output.logits

  # greedy step: pick the likeliest token at the last position and append it
  next_token_ids = torch.argmax(logits[:, -1:], dim=-1)
  encoded_input = torch.cat([encoded_input, next_token_ids], dim=-1)
  print(f"Current Generation: {tokenizer.decode(encoded_input[0], skip_special_tokens=True)}")

  # stop at a fixed length of 30 tokens instead of waiting for an end-of-text token
  if len(encoded_input[0]) >= 30:
    print(f"Complete Generation: {tokenizer.decode(encoded_input[0], skip_special_tokens=True)}")
    break

Current Generation: I went  
Current Generation: I went  to
Current Generation: I went  to the
Current Generation: I went  to the 
Current Generation: I went  to the University
Current Generation: I went  to the University of
Current Generation: I went  to the University of California
Current Generation: I went  to the University of California,
Current Generation: I went  to the University of California, Berkeley
Current Generation: I went  to the University of California, Berkeley,
Current Generation: I went  to the University of California, Berkeley, and
Current Generation: I went  to the University of California, Berkeley, and I
Current Generation: I went  to the University of California, Berkeley, and I was
Current Generation: I went  to the University of California, Berkeley, and I was there
Current Generation: I went  to the University of California, Berkeley, and I was there for
Current Generation: I went  to the University of California, Berkeley, and I was there for a
Current 

That's it! Now mix and match this decoding pipeline with the methods that we have discussed before to understand the effect of different sampling strategies on top of this autoregressive pipeline. Now let's head over to https://chat.openai.com/ to see a few quirks of this generation.
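As one possible starting point for the mix-and-match exercise, here is a minimal sketch of a sampling step that could stand in for the greedy `torch.argmax` choice. The helper name `sample_next_token` and its parameters are assumptions made for this example, not part of `transformers`. It works on a plain Python list of scores so that it can be tried without loading a model.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from a list of raw scores (hypothetical helper).

    temperature <= 0 falls back to greedy decoding; top_k keeps only the
    k highest-scoring tokens before sampling.
    """
    if temperature <= 0:
        # greedy: index of the largest score
        return max(range(len(logits)), key=lambda i: logits[i])

    # rank token indices from highest to lowest score, optionally keep the top k
    indices = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]

    # softmax with temperature over the kept tokens (subtract the max for stability)
    scaled = [logits[i] / temperature for i in indices]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]

    # draw one index in proportion to its weight
    return rng.choices(indices, weights=weights, k=1)[0]


toy_logits = [0.1, 2.0, 0.5]  # made-up scores for a three-token vocabulary
print(sample_next_token(toy_logits, temperature=0))
print(sample_next_token(toy_logits, temperature=0.8, top_k=2))
```

To try it in either loop above, the `torch.argmax` line could be swapped for something like `torch.tensor([[sample_next_token(logits[0, -1].tolist(), temperature=0.8, top_k=50)]])` (with `lm_logits` in place of `logits` in the translation loop). The values 0.8 and 50 are arbitrary. Lower temperatures push the output toward the greedy result, while higher ones make it more varied.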