# Session 2 – Notebook 3: LLM Parameters

**Objectives:**
- Learn about key parameters that control LLM behavior.
- Experiment with:
  - `temperature` → randomness / creativity.
  - `max_new_tokens` → response length.
  - `top_p` → diversity of word choices.
- Understand how parameter tuning changes chatbot output.

In [2]:
from transformers import pipeline

# Load a small instruction-following model
# flan-t5-small is light and good for Q&A or simple tasks
gen = pipeline("text2text-generation", model="google/flan-t5-small")

prompt = "Write a short story about a dog who becomes a hero."


Device set to use cpu


**MAX NEW TOKENS**

In [25]:
# MAX NEW TOKENS -> upper limit on how many tokens the model may generate
# Smaller values cut the answer off sooner; larger values leave room for longer answers

print("Max tokens = 20")
print(gen(prompt, max_new_tokens=20)[0]["generated_text"])

Max tokens = 20
The dog is a hero and he is a hero. He is a


In [26]:
# MAX NEW TOKENS -> upper limit on how many tokens the model may generate
# Same prompt as before, but with room for up to 80 new tokens

print("\nMax tokens = 80")
print(gen(prompt, max_new_tokens=80)[0]["generated_text"])


Max tokens = 80
The dog is a hero. He is a dog. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is 


**do_sample**

When a model generates text, it chooses tokens (roughly, words or word pieces) one at a time.
At each step, every candidate token has a probability, for example:

Token | Probability
------|-------------
dog   | 0.65
cat   | 0.25
wolf  | 0.10

`do_sample` tells the model *how* to pick from these probabilities.

-------------------------------------------------------------
🧩 do_sample = False   →  DETERMINISTIC MODE

-------------------------------------------------------------
- Always picks the word with the highest score.
- No randomness.
- Same prompt → same output every time.
✅ Called “deterministic” because the result is fixed and repeatable.

-------------------------------------------------------------
🎲 do_sample = True   →  STOCHASTIC MODE

-------------------------------------------------------------
- Draws the next word at random, weighted by the probability list.
- Sometimes “cat,” sometimes “dog,” depending on luck and temperature.
- Same prompt → usually a different output each time.
🎨 Called “stochastic” because randomness is part of the process.

-------------------------------------------------------------

🧠 In short:

   do_sample = False  →  Deterministic (no randomness)
   
   do_sample = True   →  Stochastic (adds randomness)
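
To make the two modes concrete, here is a minimal pure-Python sketch that uses the toy probability table above instead of a real model. The helpers `pick_greedy` and `pick_sampled` are hypothetical names made up for this illustration; they are not part of `transformers`, and the real library works on full vocabularies rather than three words.

```python
import random

# Toy next-token distribution from the table above (not real model output).
probs = {"dog": 0.65, "cat": 0.25, "wolf": 0.10}

def pick_greedy(probs):
    """do_sample=False analogue: always return the highest-probability token."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng):
    """do_sample=True analogue: draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so this sketch is repeatable
greedy_picks = [pick_greedy(probs) for _ in range(5)]
sampled_picks = [pick_sampled(probs, rng) for _ in range(1000)]

print("greedy :", greedy_picks)
print("sampled:", {t: sampled_picks.count(t) for t in probs})
```

The greedy picks are all "dog", while the sampled picks land on each word roughly in proportion to its probability. Fixing the seed makes even the sampled run repeatable, which is handy when comparing settings.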



In [24]:
# do_sample = False → Deterministic / Greedy decoding
# The model always picks the most likely next word.
# Same input → same output every time (safe, predictable).
# max_new_tokens just controls how long the response can be.

print(gen(prompt, do_sample=False, max_new_tokens=80)[0]["generated_text"])

The dog is a hero. He is a dog. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is 


In [22]:
# do_sample = True → Stochastic / Sampling decoding
# The model draws the next word at random, weighted by its probability.
# Same input → usually a different output each time (creative, varied).
# max_new_tokens still limits how long the model continues writing.

print(gen(prompt, do_sample=True, max_new_tokens=80)[0]["generated_text"])

The dog is the hero of the fox. He is a fox. He is a fox. He is a fox. He is a fox. He is a fox. He is a fox. He is a fox. He is a fox. He is a fox. He is a fox


**Temperature**

In [27]:
# TEMPERATURE -> controls creativity / randomness
# Lower = predictable, Higher = more creative/varied

print("Temperature = 0.2")
print(gen(prompt, temperature=0.2, do_sample=True, max_new_tokens=60)[0]["generated_text"])

Temperature = 0.2
The dog is a hero. He is a dog. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a


In [28]:
# TEMPERATURE -> controls creativity / randomness
# Lower = predictable, Higher = more creative/varied

print("\nTemperature = 1.5")
print(gen(prompt, temperature=1.5, do_sample=True, max_new_tokens=60)[0]["generated_text"])


Temperature = 1.5
The dog is the dog's owner and he is the thief. He is an obnoxious dog. He has a lot of problems with his dog that he doesn t like. He is the dog's owner and his dog is the owner of


In [29]:
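# TEMPERATURE with sampling turned off
# With do_sample=False the model decodes greedily, so temperature is ignored
# and the output matches the earlier greedy runs.
# (Recent transformers versions may print a warning about the unused flag.)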

print("\nTemperature = 1.5")
print(gen(prompt, temperature=1.5, do_sample=False, max_new_tokens=60)[0]["generated_text"])


Temperature = 1.5
The dog is a hero. He is a dog. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a


**What have we learnt?**

Temperature doesn’t make the model “smarter”; it changes how much risk the model takes when picking the next word.

- Low temp: strongly favors the safest word → nearly the same output every time.
- High temp: takes more risks → new ideas, but more errors.
- Works only if sampling is on (do_sample=True).
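
The effect of temperature can be sketched numerically. The block below assumes the usual formulation, in which the model's scores are divided by the temperature before being turned back into probabilities, and applies it to the toy table from the do_sample section. `apply_temperature` is a hypothetical helper written for this notebook, not a `transformers` function.

```python
import math

# Toy next-token distribution from the do_sample section (not real model output).
probs = {"dog": 0.65, "cat": 0.25, "wolf": 0.10}

def apply_temperature(probs, temperature):
    """Divide each log-probability by the temperature, then renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: v / total for t, v in scaled.items()}

for T in (0.2, 1.0, 1.5):
    rescaled = apply_temperature(probs, T)
    print(T, {t: round(p, 3) for t, p in rescaled.items()})
```

At temperature 0.2 almost all of the probability moves to "dog" (about 0.99), at 1.0 the table is unchanged, and at 1.5 it flattens so that "dog" drops to about 0.55 and the less likely words get picked more often.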

**TOP-P**

In [39]:
# TOP-P -> controls diversity of word choices
# Lower = focused on the most likely words
# Higher = allows more diverse / unexpected words

print("Top-p = 0.5")
print(gen(prompt, top_p=0.5, do_sample=True, max_new_tokens=60)[0]["generated_text"])

Top-p = 0.5
The dog is a hero. He is a dog. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a hero. He is a


In [40]:
# TOP-P -> controls diversity of word choices
# Lower = focused on the most likely words
# Higher = allows more diverse / unexpected words

print("Top-p = 0.9")
print(gen(prompt, top_p=0.9, do_sample=True, max_new_tokens=60)[0]["generated_text"])


Top-p = 0.9
The dog is a hero. He is a scout. He is a cat. He is a scout. He is a scout. He is a scout. He is a scout. He
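
What top-p does to the list of candidate words can be sketched with the toy probability table from the do_sample section. This is a simplified illustration of the idea (keep the smallest group of most likely words whose probabilities add up to at least `top_p`, then sample only from that group), not the library's actual code. `top_p_filter` is a hypothetical helper name.

```python
# Toy next-token distribution from the do_sample section (not real model output).
probs = {"dog": 0.65, "cat": 0.25, "wolf": 0.10}

def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their total reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the kept tokens now cover at least top_p of the probability
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

for top_p in (0.5, 0.8, 0.95):
    print(top_p, top_p_filter(probs, top_p))
```

With `top_p=0.5` only "dog" survives, so sampling behaves like greedy decoding. With 0.8 the choice is between "dog" and "cat", and with 0.95 all three words stay in play.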


### Reflection

- Which response sounded more creative or surprising?  
- Which response was the shortest or most factual?  
- If you wanted to build a chatbot for **homework help**, what settings would you use?  
- If you wanted to build a **storytelling bot**, what settings would you use?  

In [6]:
# FULL DEMO: Adjust all parameters at once

prompt = "Explain photosynthesis like I am 10 years old."

response = gen(
    prompt,
    do_sample=True,
    temperature=0.7,     # creativity
    max_new_tokens=80,   # length
    top_p=0.9            # diversity
)

print("User:", prompt)
print("Bot:", response[0]["generated_text"])


User: Explain photosynthesis like I am 10 years old.
Bot: Explain photosynthesis as I am 10 years old.
