## Sampling

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM

In [3]:
model_checkpoint = "gpt2"

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForCausalLM.from_pretrained(model_checkpoint)


In [5]:
tokenizer.vocab_size

50257

In [6]:
model

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [7]:
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

In [8]:
model.to(device)
model.eval()


In [11]:
prompt = "The sky is"
input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)
input_tokens

tensor([[ 464, 6766,  318]], device='cuda:0')

In [16]:
model(input_tokens).logits.shape

torch.Size([1, 3, 50257])

In [17]:
last_token_logits = model(input_tokens).logits[:, -1, :]
last_token_logits.shape

torch.Size([1, 50257])

In [35]:
next_token_id_argmax = last_token_logits.argmax(dim=-1)
next_token_id_argmax

tensor([262], device='cuda:0')

In [40]:
tokenizer.decode(next_token_id_argmax)

' the'
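Softmax is monotonic, so taking the argmax of the raw logits (as in the cell above) selects the same token as taking the argmax of the probabilities. A minimal pure-Python sketch of that step, assuming a made-up four-token vocabulary and made-up logits (not GPT-2's real values):

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical four-token vocabulary with made-up next-token logits
vocab = [" the", " blue", " clear", " falling"]
logits = [3.1, 2.7, 1.5, -0.4]

probs = softmax(logits)
greedy_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax over logits
greedy_id_from_probs = max(range(len(probs)), key=lambda i: probs[i])  # argmax over probabilities

print(repr(vocab[greedy_id]))  # ' the'
print([round(p, 3) for p in probs])
```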

## Full Text Generation using Greedy Decoding

In [146]:
total_responses = 5

In [148]:
max_new_tokens = 30

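# manual greedy decoding: always take the most likely next token
# (no randomness, so every response below is identical)
for i in range(total_responses):
  prompt = "Once upon a time"
  input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)

  for _ in range(max_new_tokens):
    with torch.no_grad():
      output = model(input_ids=input_tokens)

    logits = output.logits[:, -1, :]  # logits for the last position, shape [1, vocab_size]
    next_token_id = logits.argmax(dim=-1, keepdim=True)  # shape [1, 1], ready to concatenate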

    input_tokens = torch.cat([input_tokens, next_token_id], dim=-1)

  output_text = tokenizer.decode(input_tokens[0], skip_special_tokens=True)
  print(f"Response {i}: {output_text}")

Response 0: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great
Response 1: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great
Response 2: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great
Response 3: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great
Response 4: Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great
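Greedy decoding contains no randomness, which is why all five responses above are identical, and it tends to lock into repeating loops ("the world was a place of great danger..."). A toy sketch of the same loop over a hypothetical bigram table (every token and probability below is invented for illustration) shows both effects:

```python
# Hypothetical bigram "model": previous token -> made-up next-token probabilities
toy_model = {
    "Once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.6, "place": 0.4},
    "time": {",": 0.7, "the": 0.3},
    ",": {"the": 0.8, "and": 0.2},
    "the": {"world": 0.6, "sky": 0.4},
    "world": {"was": 0.9, "is": 0.1},
    "was": {"great": 0.55, "dark": 0.45},
    "great": {",": 0.7, "and": 0.3},
}

def greedy_decode(model, start, max_new_tokens):
    tokens = [start]
    for _ in range(max_new_tokens):
        next_probs = model.get(tokens[-1])
        if not next_probs:  # unknown context: stop early
            break
        # Greedy step: always take the single most probable continuation
        tokens.append(max(next_probs, key=next_probs.get))
    return tokens

runs = [greedy_decode(toy_model, "Once", 12) for _ in range(5)]
print(" ".join(runs[0]))
# Once upon a time , the world was great , the world was
```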


## Full Text Generation using Random Sampling

In [149]:
max_new_tokens = 30

# manual random sampling: draw each next token from the full softmax distribution
for i in range(total_responses):
  prompt = "Once upon a time"
  input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)

  for _ in range(max_new_tokens):
    with torch.no_grad():
      output = model(input_ids = input_tokens)

    logits = output.logits[:, -1, :]
    next_token_id = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)

    input_tokens = torch.cat([input_tokens, next_token_id], dim=-1)

  output_text = tokenizer.decode(input_tokens[0], skip_special_tokens=True)
  print(f"Response {i}: {output_text}")

Response 0: Once upon a time, aliens replaced humans as productivity aids for the production of Tianinfuex, a sentient intelligent offspring of Buttermilk, the majority of which are
Response 1: Once upon a time you would strive seek realism, New York was all but destroyed by Trump's Super PACs. One need only to look at the wildly unsuccessful attempts by Bill
Response 2: Once upon a time she saw this? Perhaps the plain steel roof. As she thought, about those magnificent projection cats staying behind, one thought. This sentence drew her ly
Response 3: Once upon a time there was nothing remarkable in birth about Qing. His four favorite people or cultivators were very intelligent, extremely gentle and sensible– as well as wise-
Response 4: Once upon a time he was as helpless as a chicken at shaving his arms in a feather chase, as his tail with thumb stretched and now demanded to be shortened. By
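`torch.multinomial` on the softmax output draws token index *i* with probability `probs[i]`, so even unlikely tokens are picked now and then, which explains both the variety and the odd word choices above. A pure-Python sketch of the same draw, assuming made-up logits and using `random.choices` in place of `torch.multinomial`: over many draws, each token's frequency approaches its probability.

```python
import math
import random
from collections import Counter

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a hypothetical four-token vocabulary
vocab = [" the", " blue", " clear", " falling"]
probs = softmax([3.1, 2.7, 1.5, -0.4])

rng = random.Random(0)  # seeded so the sketch is reproducible
# Each draw picks index i with probability probs[i]
draws = rng.choices(range(len(vocab)), weights=probs, k=10_000)
counts = Counter(draws)
empirical = [counts[i] / len(draws) for i in range(len(vocab))]

for token, p, freq in zip(vocab, probs, empirical):
    print(f"{token!r}: p={p:.3f}  sampled={freq:.3f}")
```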


## Temperature Scaling

In [153]:
max_new_tokens = 30
temperature = 0.2  # < 1 sharpens the distribution (more conservative); > 1 flattens it (more random)

for i in range(total_responses):
  prompt = "Once upon a time"
  input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)

  for _ in range(max_new_tokens):
    with torch.no_grad():
      output = model(input_ids=input_tokens)

    logits = output.logits[:, -1, :]
    logits = logits / temperature  # temperature scaling before the softmax
    next_token_id = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)

    input_tokens = torch.cat([input_tokens, next_token_id], dim=-1)

  output_text = tokenizer.decode(input_tokens[0], skip_special_tokens=True)
  print(f"Response {i}: {output_text}")

Response 0: Once upon a time, the whole world was a land of the dead. The dead were the living, the dead were the dead. The dead were the living, the
Response 1: Once upon a time, the world was a place of great beauty and beauty, and the world was a place of great danger. And the world was a place of great
Response 2: Once upon a time, the world was filled with the sound of the wind, and the sound of the moon. The world was filled with the sound of the moon,
Response 3: Once upon a time, the world was a place of great beauty, but now it is a place of darkness. The world is filled with darkness, and the darkness is
Response 4: Once upon a time, the world was a place of peace, but now it is a place of war.

The world is a place of war.
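Dividing the logits by `temperature` before the softmax sharpens the distribution when it is below 1 and flattens it when above 1; at 0.2 the top token dominates, which is why these outputs stay close to the greedy text. Temperature must be positive (0 would divide by zero). A pure-Python sketch on made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; temperature must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.1, 2.7, 1.5, -0.4]  # made-up next-token logits
dists = {t: softmax_with_temperature(logits, t) for t in (0.2, 1.0, 2.0)}
for t, probs in dists.items():
    print(t, [round(p, 3) for p in probs])
```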




## Top K Sampling

In [159]:
max_new_tokens = 30
top_k = 10

for i in range(total_responses):
  prompt = "Once upon a"
  input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)

  for _ in range(max_new_tokens):
    with torch.no_grad():
      output = model(input_ids = input_tokens)

    logits = output.logits[:, -1, :]

    # Top-k filtering
    topk_logits, topk_indices = torch.topk(logits, k=top_k, dim=-1)
    probs = torch.softmax(topk_logits, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)

    # mapping back to the original vocab indices
    next_token_id = topk_indices.gather(-1, next_token)
    input_tokens = torch.cat([input_tokens, next_token_id], dim=-1)

  output_text = tokenizer.decode(input_tokens[0], skip_special_tokens=True)
  print(f"Response {i}: {output_text}")

Response 0: Once upon a time, you may choose to pay for all of the costs incurred by the Company in connection with the purchase.

3.6

3
Response 1: Once upon a second glance at her body and the way the head held her, it wasn't a man or a woman. It was something more in tune with what
Response 2: Once upon a sudden the light suddenly dims and disappears. This is the most common occurrence of the human mind.

The reason behind this is unknown.

Response 3: Once upon a time, the world's most powerful corporation, known as the U.S. government, is trying to control the world by creating a super-vill
Response 4: Once upon a time the king of England said to me, 'Lord, I have a letter, or rather my letter,' and I said 'Oh, I am
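`torch.topk` returns the `k` largest logits together with their vocabulary indices, and a softmax over just those `k` logits equals the full softmax renormalized over that subset, so filtering logits (this cell) or probabilities (the combined function further down) gives the same distribution. A pure-Python sketch of the filter, assuming a made-up five-token distribution; the dictionary keys play the role of `topk_indices`:

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize; returns {token_id: prob}."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top_ids)
    return {i: probs[i] / total for i in top_ids}

probs = [0.05, 0.40, 0.25, 0.20, 0.10]  # made-up next-token distribution
filtered = top_k_filter(probs, k=2)
print(filtered)  # token ids 1 and 2 survive, rescaled to sum to 1

rng = random.Random(0)
# Sample among the survivors; the key is already the original vocabulary id
next_id = rng.choices(list(filtered), weights=list(filtered.values()), k=1)[0]
```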


## Top P Sampling (Nucleus)

In [167]:
# Sampling config
max_new_tokens = 30
top_p = 0.9  # keep the most likely tokens until their cumulative probability exceeds top_p

for i in range(total_responses):
  prompt = "Once upon a time"
  input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)

  for _ in range(max_new_tokens):
    with torch.no_grad():
      output = model(input_ids=input_tokens)

    logits = output.logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1)

    # Sort the probabilities in descending order
    sorted_probs, sorted_indices = torch.sort(probs, descending=True, dim=-1)
    cumulative_probs = torch.cumsum(sorted_probs, dim=-1)

    # Mask tokens whose cumulative probability exceeds top_p
    sorted_mask = cumulative_probs > top_p
    # Shift the mask right by one so the token that crosses top_p is kept,
    # and never mask the most likely token
    sorted_mask[..., 1:] = sorted_mask[..., :-1].clone()
    sorted_mask[..., 0] = False

    # Zero out masked tokens and re-normalize
    sorted_probs[sorted_mask] = 0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)

    # Sample from the filtered distribution (an index into the sorted order)
    next_token = torch.multinomial(sorted_probs, num_samples=1)

    # Map back to the original vocabulary ID
    next_token_id = sorted_indices.gather(-1, next_token)

    input_tokens = torch.cat([input_tokens, next_token_id], dim=-1)

  # Decode and print
  output_text = tokenizer.decode(input_tokens[0], skip_special_tokens=True)
  print(f"Response {i}: {output_text}")

Response 0: Once upon a time in this world, the scriptures speak of the separation of man and beast. It was after this manner that they say about God, the Word of God
Response 1: Once upon a time the internal combustion engine ran on standby for 20 minutes, the heat was banished. This short delay allowed the idea to appear and move smoothly, much like
Response 2: Once upon a time, the Bolivarian world had a special source of prosperity and opportunity. All the progress had begun before what could not have been possible with an early
Response 3: Once upon a time, one found myself called upon to defend my country from the foul evil that were the wolves, just as they had taken refuge with us and acted under
Response 4: Once upon a time, Vishnu thought of doing something for his friend. In those "decades of meditation of Khandsath" (1934-42)
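Unlike top-k, the nucleus changes size with the shape of the distribution: a peaked distribution keeps only a few tokens, a flat one keeps many. A pure-Python sketch that mirrors the shifted-mask rule above (a token survives if the cumulative probability of the tokens ranked before it does not exceed `top_p`), using made-up probabilities:

```python
def top_p_filter(probs, p):
    """Nucleus filter: keep the most probable tokens until the cumulative probability exceeds p.

    A token is kept if the cumulative probability of the tokens ranked above it is <= p,
    which mirrors the shifted `cumulative_probs > top_p` mask; the top token is always kept.
    Returns {token_id: renormalized prob}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > p:  # this token pushed the total past p: it stays, the rest are dropped
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.05, 0.40, 0.25, 0.20, 0.10]  # made-up next-token distribution
for p in (0.0, 0.6, 0.9):
    print(p, sorted(top_p_filter(probs, p)))

# Adaptive size: same p, very different nucleus sizes
print(len(top_p_filter([0.9, 0.05, 0.03, 0.02], 0.8)))   # peaked: 1 token
print(len(top_p_filter([0.25, 0.25, 0.25, 0.25], 0.8)))  # flat: 4 tokens
```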


## Combining Temperature, Top K and Top P Sampling

In [168]:
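def generate_responses(prompt, num_responses=3, max_new_tokens=30, temperature=1.0, top_k=0, top_p=1.0):
    """Sample `num_responses` continuations of `prompt`.

    Per step: temperature scaling -> optional top-k -> optional top-p -> multinomial sample.
    top_k=0 disables top-k; top_p=1.0 disables top-p.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (use top_k=1 for greedy decoding)")
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)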
    generated_texts = []

    for _ in range(num_responses):
        current_ids = input_ids.clone()

        for _ in range(max_new_tokens):
            with torch.no_grad():
                output = model(input_ids=current_ids)
                logits = output.logits[:, -1, :] / temperature
                probs = torch.softmax(logits, dim=-1)

                # Top-k filtering
                if top_k > 0:
                    topk_probs, topk_indices = torch.topk(probs, top_k, dim=-1)
                    probs = torch.zeros_like(probs).scatter(-1, topk_indices, topk_probs)
                    probs = probs / probs.sum(dim=-1, keepdim=True)

                # Top-p (nucleus) filtering
                if top_p < 1.0:
                    sorted_probs, sorted_indices = torch.sort(probs, descending=True)
                    cumulative_probs = torch.cumsum(sorted_probs, dim=-1)
                    sorted_mask = cumulative_probs > top_p
                    sorted_mask[..., 1:] = sorted_mask[..., :-1].clone()
                    sorted_mask[..., 0] = False
                    sorted_probs[sorted_mask] = 0
                    probs = torch.zeros_like(probs).scatter(-1, sorted_indices, sorted_probs)
                    probs = probs / probs.sum(dim=-1, keepdim=True)

                next_token_id = torch.multinomial(probs, num_samples=1)
                current_ids = torch.cat([current_ids, next_token_id], dim=-1)

        output_text = tokenizer.decode(current_ids[0], skip_special_tokens=True)
        generated_texts.append(output_text)

    return generated_texts
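Each iteration of the inner loop in `generate_responses` can be read as one function call. The sketch below reimplements that step in plain Python under the same order and conventions (temperature, then top-k with renormalization, then top-p, then one multinomial draw); `sample_next_token` and its `rng` argument are hypothetical names, not part of any library. It also shows why In [169] and In [171] below reproduce the greedy text: `top_k=1`, or `top_p=0` under this shifted-mask rule, leaves only the most probable token.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Hypothetical helper: temperature -> top-k -> top-p -> one multinomial draw.

    Same conventions as generate_responses: top_k=0 disables top-k, top_p=1.0 disables top-p.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng if rng is not None else random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    # (token_id, prob) pairs, most probable first
    ranked = sorted(((i, e / total) for i, e in enumerate(exps)), key=lambda t: t[1], reverse=True)

    if top_k > 0:
        ranked = ranked[:top_k]
        s = sum(p for _, p in ranked)
        ranked = [(i, p / s) for i, p in ranked]  # renormalize, as the notebook does

    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for i, p in ranked:
            kept.append((i, p))
            cumulative += p
            if cumulative > top_p:  # shifted-mask rule: the crossing token is kept
                break
        ranked = kept

    ids = [i for i, _ in ranked]
    weights = [p for _, p in ranked]  # random.choices accepts unnormalized weights
    return rng.choices(ids, weights=weights, k=1)[0]

logits = [3.1, 2.7, 1.5, -0.4]  # made-up next-token logits
rng = random.Random(0)
print([sample_next_token(logits, top_k=1, rng=rng) for _ in range(5)])   # [0, 0, 0, 0, 0]
print([sample_next_token(logits, top_p=0.0, rng=rng) for _ in range(5)])  # [0, 0, 0, 0, 0]
```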

In [169]:
results = generate_responses(
    prompt="Once upon a time",
    num_responses=5,
    max_new_tokens=50,
    temperature=0.9,
    top_k=50,
    top_p=0  # with the shifted mask only the top token survives, so this is effectively greedy
)

for i, text in enumerate(results, 1):
    print(f"--- Response {i} ---\n{text}\n")


--- Response 1 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger

--- Response 2 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger

--- Response 3 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger

--- Response 4 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world w

In [170]:
results = generate_responses(
    prompt="Once upon a time",
    num_responses=5,
    max_new_tokens=50,
    temperature=0.1,  # near-greedy, but sampling can still make responses diverge
    top_k=50,
    top_p=1.0
)

for i, text in enumerate(results, 1):
    print(f"--- Response {i} ---\n{text}\n")


--- Response 1 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger. The world was a place of great danger. The world was a place of great danger. The world was a place of great danger. The

--- Response 2 ---
Once upon a time, the world was a place of great beauty, and the world was a place of great fear. And the world was a place of great fear. And the world was a place of great fear. And the world was a place of great fear.

--- Response 3 ---
Once upon a time, the world was a place of great beauty and great danger. But now, the world is a place of great danger. And now, the world is a place of great danger. And now, the world is a place of great danger. And

--- Response 4 ---
Once upon a time, the world was a place of peace and harmony. But now, the world is a place of war and bloodshed. The world is a place of terror and bloodshed. The world is a place of war and bloodshed. The world is a place of

--- Response 5 ---


In [171]:
results = generate_responses(
    prompt="Once upon a time",
    num_responses=5,
    max_new_tokens=50,
    temperature=1,
    top_k=1,  # keep only the most likely token: equivalent to greedy decoding
    top_p=1.0
)

for i, text in enumerate(results, 1):
    print(f"--- Response {i} ---\n{text}\n")


--- Response 1 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger

--- Response 2 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger

--- Response 3 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a place of great danger

--- Response 4 ---
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world w