# Setup

In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

In [2]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

In [3]:
model.eval()  # put the model in evaluation mode (disables dropout)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)

In [4]:
prompt = "Once upon a time in a mumbai"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy Decoding

* Always pick the **highest-probability token** at each step

* Deterministic

* No randomness (`do_sample=False`)
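
The selection rule in the bullets above can be sketched on a toy example. This is a minimal illustration in plain Python, not the code path `model.generate` uses internally; the names `softmax`, `greedy_pick`, and `toy_logits` are hypothetical and the logit values are made up.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Return the index of the most probable token."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

# Made-up scores for a 4-token vocabulary (illustration only).
toy_logits = [2.0, 1.0, 0.5, -1.0]

greedy_choice = greedy_pick(toy_logits)
print(greedy_choice)
```

Because nothing here is random, calling `greedy_pick` again on the same scores returns the same index.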

In [8]:
g_out = model.generate(
    **inputs,
    max_new_tokens = 20,
    do_sample = False
)

print("Greedy Decoding Output :")
print(tokenizer.decode(g_out[0],skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Greedy Decoding Output :
Once upon a time in a mumbai-based company, the company's CEO, who is a former employee of the company, was asked


* *Same output every run*

* *Safe but often bland, and prone to repeating phrases*

# Temperature Sampling

* Sample randomly from the model's probability distribution

* Temperature controls randomness: lower values sharpen the distribution, higher values flatten it
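
To make the effect of temperature concrete, here is a small sketch that divides toy logits by the temperature before applying softmax, which is the usual formulation. The function name `softmax_with_temperature` and the logit values are assumptions for illustration, not part of the notebook or of the `transformers` API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize to probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for a 4-token vocabulary (illustration only).
toy_logits = [2.0, 1.0, 0.5, -1.0]

# Compare the three settings mentioned in the notebook's comments.
results = {t: softmax_with_temperature(toy_logits, t) for t in (0.3, 1.0, 1.3)}
for t, probs in results.items():
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature most of the probability mass lands on the top token, so sampling behaves almost like greedy decoding; at higher temperatures the less likely tokens get a larger share.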



In [17]:
# Example settings:
#   temperature=0.3  # conservative
#   temperature=1.0  # balanced
#   temperature=1.3  # creative but risky

temp_out = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    temperature=0.8
)

print("\nTemperature Sampling (T=0.8):")
print(tokenizer.decode(temp_out[0], skip_special_tokens=True))



Temperature Sampling (T=0.8):
Once upon a time in a mumbai city or a village, people are called by their name. And it is from such name that people


# Top-k Sampling

* Sample only from the **k** most probable tokens

* Cuts off the low-probability tail, which removes most junk tokens
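
The filtering step described above can be sketched in plain Python: keep the `k` most probable tokens, renormalize, and sample from what remains. The helper `top_k_filter` and the probabilities in `toy_probs` are hypothetical; this is an illustration of the idea rather than the implementation inside `transformers`.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

# Made-up next-token distribution over a 5-token vocabulary (illustration only).
toy_probs = [0.5, 0.3, 0.15, 0.04, 0.01]

filtered = top_k_filter(toy_probs, k=2)
print(filtered)

# Sample one token id from the surviving candidates (seeded for repeatability).
rng = random.Random(0)
candidate_ids = list(filtered)
sampled = rng.choices(candidate_ids, weights=[filtered[i] for i in candidate_ids], k=1)[0]
print(sampled)
```

Tokens outside the top `k` can never be sampled, no matter how many times the draw is repeated.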




In [22]:
topk_output = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    top_k=10,
    temperature=0.8
)

print("\nTop-k Sampling (k=10):")
print(tokenizer.decode(topk_output[0], skip_special_tokens=True))



Top-k Sampling (k=10):
Once upon a time in a mumbai bar, a group of five people came over to the bar, took out a few beers (with


# Top-p (Nucleus Sampling)

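* Sample from the smallest set of tokens whose **cumulative probability is ≥ p**

* Adaptive: the set shrinks when the model is confident and grows when it is uncertain, which is why it is often preferred over a fixed **k**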
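
A minimal sketch of the nucleus rule, assuming the straightforward "sort, accumulate, stop once the total reaches p" formulation. The helper `top_p_filter` and both toy distributions are hypothetical, and the library's own implementation may differ in details such as tie handling.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Two made-up distributions: one confident, one uncertain (illustration only).
peaked = [0.92, 0.05, 0.02, 0.01]
flat = [0.30, 0.25, 0.25, 0.20]

nucleus_peaked = top_p_filter(peaked, p=0.9)
nucleus_flat = top_p_filter(flat, p=0.9)
print(len(nucleus_peaked), len(nucleus_flat))
```

With the same `p`, the confident distribution keeps far fewer candidates than the uncertain one, which is the adaptive behavior that a fixed `top_k` cannot provide.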

In [28]:
topp_output = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    temperature=0.8
)

print("\nTop-p Sampling (p=0.9):")
print(tokenizer.decode(topp_output[0], skip_special_tokens=True))



Top-p Sampling (p=0.9):
Once upon a time in a mumbai temple it was said that when a devotee had been asked to pray for a certain hour and that they would not get any time off, he would be killed. He was not given any opportunity to


# Top-k + Top-p

In [26]:
combined_output = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=10,
    top_p=0.95,
    temperature=0.7
)

print("\nTop-k + Top-p:")
print(tokenizer.decode(combined_output[0], skip_special_tokens=True))



Top-k + Top-p:
Once upon a time in a mumbai, I was sitting on my bed with my son, I had a dream about a girl who was in a car accident. I thought, I want to go back home. I was so scared.
