# LLM Settings & Generation Parameters
### Hands-On Guide Using Ollama

**Based on:** Nunnari Academy | Module 1, Week 2, Section 2.1

A practical, exercise-driven guide to understanding every generation parameter and how it affects LLM output. All examples use **Ollama** running locally.

---

**Prerequisites:**
- Ollama installed and running (`ollama serve`)
- A model pulled (we use `qwen2.5:1.5b` — adjust if you have a different model)
- `ollama` Python package installed

In [27]:
# Install the ollama python package if needed
!pip install ollama -q

In [28]:
import ollama

# Configure which model to use throughout this notebook
MODEL = "qwen2.5:1.5b"  # Change this to any model you have pulled

def chat(user_msg, system_msg="You are a helpful assistant.", **options):
    """Helper to call Ollama and return the response text."""
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_msg},
            {"role": "user", "content": user_msg},
        ],
        options=options,
    )
    return response["message"]["content"]

# Quick test
print(chat("Say hello in one sentence."))

Hello! How can I assist you today?


---
## 1. Temperature

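Temperature controls the **randomness/creativity** of the model's output. It rescales the probability distribution over possible next tokens: low values concentrate probability on the most likely tokens, while high values flatten the distribution so that less likely tokens are sampled more often.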

| Value | Behavior | Output Style | Best For |
|-------|----------|-------------|----------|
| 0 | Fully deterministic | Same output every time | Fact extraction, classification, code |
| 0.3 | Slightly creative but focused | Minor phrasing variations | Summaries, professional writing |
| 0.7 | Balanced creativity and coherence | Noticeably different each run | General chatbots, emails |
| 1.0 | High creativity | Unique and diverse each time | Brainstorming, creative writing |
| 1.5–2.0 | Very random, often incoherent | Unpredictable, may hallucinate | Experimental only |
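To make the rescaling concrete, here is a minimal sketch of a temperature-scaled softmax. The logits are made-up numbers for three candidate tokens, and `softmax_with_temperature` is a hypothetical helper, not Ollama's internal code. Treating temperature 0 as "always pick the top token" is an assumption about how greedy decoding is usually handled.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding (all mass on the top token)
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens
logits = [2.0, 1.0, 0.5]
for t in [0, 0.3, 0.7, 1.0, 1.5]:
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the top token's share shrinks and the others grow, which is why higher settings produce more varied output.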

### 1.1 Experiment A: Factual Task (Low Temperature = Better)

Temperature should be **LOW** for factual tasks.

In [29]:
system = "You are a geography expert. Answer factually and concisely."
user = "What is the capital of Tamil Nadu?"

for temp in [0, 0.7, 1.5]:
    print(f"\n{'='*60}")
    print(f"Temperature = {temp}")
    print(f"{'='*60}")
    for run in range(3):
        result = chat(user, system, temperature=temp)
        print(f"  Run {run+1}: {result.strip()[:150]}")


Temperature = 0
  Run 1: The capital of Tamil Nadu, India, is Chennai.
  Run 2: The capital of Tamil Nadu, India, is Chennai.
  Run 3: The capital of Tamil Nadu, India, is Chennai.

Temperature = 0.7
  Run 1: The capital of Tamil Nadu, India, is Chennai.
  Run 2: The capital of Tamil Nadu, India, is Chennai.
  Run 3: The capital of Tamil Nadu is Chennai.

Temperature = 1.5
  Run 1: The capital of Tamil Nadu, India, is Chennai (also spelled Madras).
  Run 2: The capital city of Tamil Nadu, India, is Chennai.
  Run 3: The capital city of Tamil Nadu, India, is Chennai (also known as Madras).


**Observe:**
- At temperature 0, all runs should be identical
- At 0.7, you may see minor wording changes
- At 1.5, the model may add unnecessary details or ramble
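If you want a number rather than an eyeball comparison, a small helper can count how many distinct outputs a set of runs produced. This is only a sketch: `summarize_runs` is a hypothetical name, and the sample strings are copied from the Temperature = 0.7 runs above.

```python
from collections import Counter

def summarize_runs(outputs):
    """Return (number of distinct outputs, most common output, its count)."""
    normalized = [o.strip() for o in outputs]
    counts = Counter(normalized)
    top_text, top_count = counts.most_common(1)[0]
    return len(counts), top_text, top_count

# Outputs copied from the Temperature = 0.7 runs above
runs = [
    "The capital of Tamil Nadu, India, is Chennai.",
    "The capital of Tamil Nadu, India, is Chennai.",
    "The capital of Tamil Nadu is Chennai.",
]
distinct, top_text, top_count = summarize_runs(runs)
print(f"{distinct} distinct outputs; most common appeared {top_count}x")
```

In a live experiment you could collect the `chat(...)` results into a list per temperature and pass each list to the helper.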

### 1.2 Experiment B: Creative Task (Higher Temperature = Better)

In [30]:
system = "You are a creative poet who writes in a modern style."
user = "Write a 4-line poem about rain in Coimbatore."

for temp in [0, 0.7, 1.2]:
    print(f"\n{'='*60}")
    print(f"Temperature = {temp}")
    print(f"{'='*60}")
    result = chat(user, system, temperature=temp)
    print(result.strip())


Temperature = 0
Raindrops cascade from the sky,  
Coimbatore's streets dance with delight.  
In this city where dreams do dwell,  
Nature's symphony plays on.

The greenery swells with moisture,  
A canvas painted by the heavens' brush.  
Each drop a promise of renewal,  
In Coimbatore, rain is always sure.

Temperature = 0.7
Amidst the city's gleam,  
Rain descends with gentle might.  
In Coimbatore’s lanes,  
Nature whispers soft and kind.

Temperature = 1.2
Raindrops caress the cobblestone paths,
Coimbatore's symphony, refreshing each day.
From the sky, they weave life into our canvas blue,
Watering dreams and nurturing every soul.


**Observe:**
- At 0: Generic, predictable poem — same every time
- At 0.7: Good variety, still coherent and pleasant
- At 1.2: Very creative, unexpected metaphors — some may be beautiful, some odd

### Exercise 1: Temperature Explorer

Run the cell below and answer:
1. At temperature 0, are the names identical across runs?
2. At temperature 1.0, how different are they?
3. At temperature 1.8, are any names nonsensical?
4. At which temperature did you get the **best** names?

In [31]:
system = "You are a helpful assistant."
user = "Give me 3 startup name ideas for an AI company in Tamil Nadu."

for temp in [0, 1.0, 1.8]:
    print(f"\n{'='*60}")
    print(f"Temperature = {temp}")
    print(f"{'='*60}")
    for run in range(3):
        result = chat(user, system, temperature=temp)
        print(f"\n--- Run {run+1} ---")
        print(result.strip()[:300])


Temperature = 0

--- Run 1 ---
1. "Neural Nexus"
2. "Brainwave Innovations"
3. "MindSphere Solutions"

--- Run 2 ---
1. "Neural Nexus"
2. "Brainwave Innovations"
3. "MindSphere Solutions"

--- Run 3 ---
1. "Neural Nexus"
2. "Brainwave Innovations"
3. "MindSphere Solutions"

Temperature = 1.0

--- Run 1 ---
1. Saurashtra Smart Labs Pvt Ltd
  2. Vellalar Innovation Solutions Private Limited
  3. Thanjavur Technology and Research Institute

--- Run 2 ---
Sure, here are some startup name ideas for an AI company located in Tamil Nadu:

1. NalaTech: This name combines "nala" (meaning "intelligence" in Tamil) with the prefix "tech," reflecting the focus on technology and innovation.

2. IndaiWave: This name incorporates "india" to highlight the Tamil Na

--- Run 3 ---
Sure, here are three startup name ideas for an AI company based in Tamil Nadu:

1. TanviAI - This name combines the first letter of Chennai (T) and "AI" to indicate the location and technology focus.

2. RamaKami - Combining "R

---
## 2. Top P (Nucleus Sampling)

Top P controls creativity differently from Temperature. Instead of scaling probabilities, it defines a **probability budget**: the model ranks tokens from most to least likely and samples only from the smallest set whose cumulative probability reaches the Top P value.

**Intuition with an example:**

Imagine the model is choosing the next word:

| Token | Probability | Cumulative |
|-------|------------|------------|
| "Chennai" | 60% | 60% |
| "Madurai" | 20% | 80% |
| "Coimbatore" | 10% | 90% |
| "Trichy" | 5% | 95% |
| "Ooty" | 3% | 98% |
| "Kanyakumari" | 2% | 100% |

- **Top P = 0.1**: Only "Chennai" is considered
- **Top P = 0.8**: "Chennai", "Madurai" are in the pool
- **Top P = 0.95**: "Chennai", "Madurai", "Coimbatore", "Trichy" are all candidates
- **Top P = 1.0**: All tokens are candidates (no filtering)
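The filtering in the table above can be sketched in a few lines. The helper name `nucleus` and the small `eps` tolerance (to absorb floating-point rounding) are assumptions for illustration. Real inference engines work on the full vocabulary and may combine this with other filters.

```python
def nucleus(probs, top_p, eps=1e-9):
    """Keep the smallest set of top tokens whose cumulative probability reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p - eps:  # eps guards against float rounding
            break
    total = sum(p for _, p in kept)
    # Renormalize so the surviving probabilities sum to 1 before sampling
    return {token: p / total for token, p in kept}

# Probabilities from the table above
probs = {
    "Chennai": 0.60, "Madurai": 0.20, "Coimbatore": 0.10,
    "Trichy": 0.05, "Ooty": 0.03, "Kanyakumari": 0.02,
}
for top_p in [0.1, 0.8, 0.95, 1.0]:
    print(top_p, list(nucleus(probs, top_p)))
```

Note that the surviving tokens are renormalized, so with Top P = 0.8 "Chennai" is sampled 75% of the time rather than 60%.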

### 2.1 Demonstration: Top P in Action

In [32]:
system = "You are a food recommendation assistant for South Indian cuisine."
user = "Suggest one breakfast item I should try today."

# Hold temperature at 1.0 (Ollama's own default is 0.8) and vary only Top P
for top_p in [0.1, 0.5, 0.95]:
    print(f"\n{'='*60}")
    print(f"Top P = {top_p}  (Temperature = 1.0)")
    print(f"{'='*60}")
    for run in range(5):
        result = chat(user, system, temperature=1.0, top_p=top_p)
        # Extract just the first line for compact display
        first_line = result.strip().split('\n')[0][:120]
        print(f"  Run {run+1}: {first_line}")


Top P = 0.1  (Temperature = 1.0)
  Run 1: For a delightful start to your day, you might enjoy trying "Idli with Coconut Chutney." Idlis are steamed rice cakes tha
  Run 2: For a delightful start to your day, you might enjoy trying "Idli with Coconut Chutney." Idlis are steamed rice cakes tha
  Run 3: For a delightful start to your day, you might enjoy trying "Idli with Coconut Chutney." Idlis are steamed rice cakes tha
  Run 4: For a delightful start to your day, you might enjoy trying "Idli with Coconut Chutney." Idlis are steamed rice cakes tha
  Run 5: For a delightful start to your day, you might enjoy trying "Idli with Coconut Chutney." Idlis are steamed rice cakes tha

Top P = 0.5  (Temperature = 1.0)
  Run 1: Certainly! For a delicious and refreshing start to your day, you might enjoy trying **Idli Sambhar**. Idlis are steamed 
  Run 2: Certainly! A perfect choice for your breakfast would be **Idli with Sambar and Coconut Chutney**. Idlis, a steamed rice 
  Run 3: For a delight

**Observe:**
- Top P = 0.1: Almost always "Dosa" or "Idli" (most probable tokens)
- Top P = 0.5: Rotates among popular options (Dosa, Idli, Pongal, Upma)
- Top P = 0.95: Wide variety — may suggest Paniyaram, Adai, Puttu, Appam

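> **Important Rule:** Alter Temperature **OR** Top P, but **NOT both**. If adjusting Temperature, keep Top P at 1.0. If adjusting Top P, keep Temperature at 1.0. Ollama's defaults are not neutral (Temperature 0.8, Top P 0.9), so explicitly set the parameter you are holding fixed.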

### Exercise 2: Temperature vs Top P

Compare the two approaches and answer:
1. Which setting gave more varied suggestions?
2. Which setting kept answers more relevant and coherent?
3. When would you prefer each approach?

In [33]:
system = "You are a travel guide for Tamil Nadu."
user = "Recommend one hidden gem tourist spot."

print("Approach 1: Temperature=0.8, Top P=1.0")
print("=" * 50)
for run in range(5):
    result = chat(user, system, temperature=0.8, top_p=1.0)
    first_line = result.strip().split('\n')[0][:120]
    print(f"  Run {run+1}: {first_line}")

print(f"\nApproach 2: Temperature=1.0, Top P=0.3")
print("=" * 50)
for run in range(5):
    result = chat(user, system, temperature=1.0, top_p=0.3)
    first_line = result.strip().split('\n')[0][:120]
    print(f"  Run {run+1}: {first_line}")

Approach 1: Temperature=0.8, Top P=1.0
  Run 1: When it comes to finding hidden gems, the picturesque Ammanamala Falls in Kumbakonam is a must-visit destination. This r
  Run 2: Tamil Nadu is known for its rich culture, history and natural beauty offering lots of interesting places to discover. Ho
  Run 3: One of the lesser-known but highly scenic spots in Tamil Nadu is Pazhukotam, also known as Pondy Bay or Pondicherry. It'
  Run 4: One of the lesser-known but highly fascinating tourist spots in Tamil Nadu is "Vetrikkulam Marine National Park" located
  Run 5: One hidden gem in Chennai, which is often overlooked by tourist crowds but offers a unique and refreshing experience, is

Approach 2: Temperature=1.0, Top P=0.3
  Run 1: Tamil Nadu is rich in history, culture and natural beauty. One of the lesser-known gems that deserves to be explored is 
  Run 2: Tamil Nadu is rich in history, culture and natural beauty. One of the lesser-known gems that deserves to be explored is 
  Run 3: Ta

---
## 3. Max Tokens (Max Length)

Max Tokens sets the upper limit on how many tokens the model can **generate** in its response. It applies only to the output, not to the prompt. In Ollama the option is called `num_predict`.

### Why It Matters
- **Cost control**: Fewer tokens = lower cost
- **Response quality**: Uncapped models tend to ramble
- **Latency**: Fewer tokens = faster response
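- **Structure**: A hard cap bounds the response size, but the model is cut off at the limit rather than made concise, so pair it with a prompt that asks for brevity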

| Max Tokens | Expected Output |
|-----------|----------------|
| 20 | Gets cut off mid-sentence |
| 50 | Concise but complete |
| 200 | Detailed with examples |
| 1000 | May over-explain |

In [34]:
system = "You are a helpful assistant. Be direct and informative."
user = "Explain what machine learning is."

for max_tok in [20, 50, 200, 1000]:
    print(f"\n{'='*60}")
    print(f"Max Tokens = {max_tok}")
    print(f"{'='*60}")
    result = chat(user, system, num_predict=max_tok)
    word_count = len(result.split())
    print(result.strip())
    print(f"\n  [Word count: {word_count}]")


Max Tokens = 20
Machine learning is a subfield of artificial intelligence (AI) that focuses on the development of algorithms and

  [Word count: 17]

Max Tokens = 50
Machine Learning (ML) is a branch of artificial intelligence that involves developing algorithms and models to enable computer systems to learn from and make predictions on data without being explicitly programmed. ML focuses on developing intelligent machines capable of performing tasks that typically require human-like intelligence

  [Word count: 46]

Max Tokens = 200
Machine learning is a subfield of artificial intelligence (AI) that involves the development of algorithms and statistical models that enable computer systems to learn from data, identify patterns in the data, and make predictions or decisions without being explicitly programmed to do so.

The basic idea behind machine learning is that by analyzing large amounts of data, machines can automatically improve their performance on specific tasks over time. Thi

> **Production Tip:** Always set a max-token limit explicitly (`num_predict` in Ollama; many hosted APIs call it `max_tokens`). For a chatbot answering FAQs, 150-300 tokens is often sufficient.
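Because a token cap truncates rather than summarizes, it helps to detect when an answer was cut off. The sketch below assumes the chat response carries a `done_reason` field set to `"length"` when the cap was hit and `"stop"` otherwise. Verify this against the Ollama version you are running. `was_truncated` is a hypothetical helper, and the sample dictionaries are hand-written stand-ins for real responses.

```python
def was_truncated(response):
    """Return True if generation ended because the token limit was reached."""
    # Assumption: the response exposes `done_reason`, with "length" meaning
    # the num_predict cap was hit and "stop" meaning a natural finish.
    return response.get("done_reason") == "length"

# Hand-written stand-ins shaped like chat responses (not real API output)
cut_off = {"message": {"content": "Machine learning is a subfield of"}, "done_reason": "length"}
complete = {"message": {"content": "Chennai."}, "done_reason": "stop"}

for name, resp in [("cut_off", cut_off), ("complete", complete)]:
    print(name, "truncated" if was_truncated(resp) else "finished normally")
```

A chatbot could use such a check to retry with a higher limit or to append an ellipsis instead of showing a sentence that stops mid-word.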

### Exercise 3: Max Tokens Impact

1. Is the 30-token answer useful?
2. Is 100 tokens better?
3. Is the extra length at 500 tokens worth the cost?

In [35]:
system = "You are a coding tutor."
user = "What is a Python dictionary?"

for max_tok in [30, 100, 500]:
    print(f"\n{'='*60}")
    print(f"Max Tokens = {max_tok}")
    print(f"{'='*60}")
    result = chat(user, system, num_predict=max_tok)
    print(result.strip())
    # Word count is only a rough proxy; real token counts are usually higher
    print(f"\n  [Approximate tokens: {len(result.split())}]")


Max Tokens = 30
A Python dictionary is an unordered collection of key-value pairs. Each item in the dictionary is identified by a unique key, and each value can be accessed

  [Approximate tokens: 27]

Max Tokens = 100
A Python dictionary is an unordered collection of key-value pairs, where each pair consists of a unique key and its associated value.

In simpler terms:
- It's similar to a hash table in other programming languages.
- A dictionary allows you to quickly look up values based on keys.
- Keys are unique within the dictionary; they can't be duplicated.
- Values can repeat multiple times with different keys.

Key-value pairs in a Python dictionary are stored as tuples, and each pair is separated by commas.

  [Approximate tokens: 86]

Max Tokens = 500
A Python Dictionary is an unordered collection of key-value pairs, where each pair consists of a unique key and its corresponding value. Dictionaries are mutable (modifiable) and provide a fast lookup for values based on keys us

---
## 4. Stop Sequences

Stop sequences tell the model to **stop generating** as soon as the output would contain a specific string. The stop string itself is not included in the returned text. This gives you structural control over the output format.

> **Note:** In Ollama, stop sequences are passed via the `stop` option.
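The effect of a stop sequence can be imitated on a finished string, which makes the behavior easy to reason about before calling the model. This is a sketch of the semantics only: `apply_stop` is a hypothetical helper, and a real server halts generation instead of trimming afterwards.

```python
def apply_stop(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

full = "1. Python\n2. Java\n3. C++\n4. R\n5. JavaScript\n6. Swift\n7. Go"
print(apply_stop(full, ["6."]))

# Pitfall: the stop string can match earlier than intended
print(repr(apply_stop("1. Python 3.6. is old\n2. Java", ["6."])))
```

The second call shows why short stop strings need care: any earlier occurrence of the same characters ends the output prematurely.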

### 4.1 Example A: Controlling List Length

Stopping at `"6."` cuts the numbered list off just before item 6, leaving at most 5 items.

In [36]:
system = "You are a helpful assistant. Generate numbered lists."
user = "List the top programming languages for AI development."

print("WITHOUT stop sequence:")
print("-" * 40)
result = chat(user, system, temperature=0)
print(result.strip())

print(f"\n{'='*60}")
print("WITH stop sequence '6.':")
print("-" * 40)

response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ],
    options={"temperature": 0, "stop": ["6."]},
)
print(response["message"]["content"].strip())

WITHOUT stop sequence:
----------------------------------------
1. Python
2. Java
3. C++
4. R
5. JavaScript
6. Swift
7. Go
8. Kotlin
9. Ruby
10. TypeScript

WITH stop sequence '6.':
----------------------------------------
1. Python
2. Java
3. C++
4. R
5. JavaScript


### 4.2 Example B: Stopping at Delimiters (Clean JSON)

In [37]:
system = "You are a JSON generator. Output valid JSON only."
user = "Generate a JSON object with name, age, and city for a sample user."

print("WITHOUT stop sequence:")
print("-" * 40)
result = chat(user, system, temperature=0)
print(result.strip())

print(f"\n{'='*60}")
print("WITH stop sequence '```':")
print("-" * 40)
response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ],
    options={"temperature": 0, "stop": ["```"]},
)
print(response["message"]["content"].strip())

WITHOUT stop sequence:
----------------------------------------
```json
{
  "name": "John Doe",
  "age": 30,
  "city": "New York"
}
```

WITH stop sequence '```':
----------------------------------------
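The output with the stop sequence is empty, most likely because the model opens its reply with a code fence, so generation halts before any JSON is produced. One alternative is to let the model finish and strip the fence afterwards. The sketch below assumes the reply is either bare JSON or JSON wrapped in a single Markdown fence. `extract_json` is a hypothetical helper.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to keep the example readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)

def extract_json(text):
    """Parse JSON from model output, tolerating an optional Markdown code fence."""
    match = FENCED_BLOCK.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())

# The fenced reply shown in the run without a stop sequence
raw = FENCE + 'json\n{\n  "name": "John Doe",\n  "age": 30,\n  "city": "New York"\n}\n' + FENCE
print(extract_json(raw))
```

`json.loads` raises `json.JSONDecodeError` on malformed output, so callers should be ready to catch it and retry.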



### 4.3 Example C: Single-Answer Extraction

In [38]:
system = "You are a classification system. Respond with ONLY the category name."
user = 'Classify this review: "The food was amazing and the service was fast!"'

print("WITHOUT stop sequence:")
result = chat(user, system, temperature=0)
print(f"  -> {result.strip()}")

print("\nWITH stop sequence ['\\n', '.']:")
response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ],
    options={"temperature": 0, "stop": ["\n", "."]},
)
print(f"  -> {response['message']['content'].strip()}")

WITHOUT stop sequence:
  -> Positive Feedback

WITH stop sequence ['\n', '.']:
  -> Positive Feedback


### Exercise 4: Stop Sequences for Structured Output

1. Run without a stop sequence — observe how much it generates
2. Use stop sequence `"Question 2"` — does it stop after the first question?
3. Use stop sequence `"D)"` to limit to 3 options

In [39]:
system = "You are a quiz generator. Create multiple choice questions."
user = "Create a quiz question about Python data types."

print("No stop sequence:")
print("=" * 50)
result = chat(user, system, temperature=0)
print(result.strip())

print(f"\n{'='*60}")
print('Stop sequence: "Question 2"')
print("=" * 50)
response = ollama.chat(
    model=MODEL,
    messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
    options={"temperature": 0, "stop": ["Question 2"]},
)
print(response["message"]["content"].strip())

print(f"\n{'='*60}")
print('Stop sequence: "D)"')
print("=" * 50)
response = ollama.chat(
    model=MODEL,
    messages=[{"role": "system", "content": system}, {"role": "user", "content": user}],
    options={"temperature": 0, "stop": ["D)"]},
)
print(response["message"]["content"].strip())

No stop sequence:
Which of the following is not considered a built-in type in Python?
A) String
B) Integer
C) Dictionary
D) Float

The correct answer is C, Dictionary. In Python, dictionaries are objects and cannot be directly referred to as built-in types like strings or integers.

Stop sequence: "Question 2"
Which of the following is not considered a built-in type in Python?
A) String
B) Integer
C) Dictionary
D) Float

The correct answer is C, Dictionary. In Python, dictionaries are objects and cannot be directly referred to as built-in types like strings or integers.

Stop sequence: "D)"
Which of the following is not considered a built-in type in Python?
A) String
B) Integer
C) Dictionary


---
## 5. Frequency Penalty & Presence Penalty

These two settings control **repetition** in the model's output.

| Parameter | How It Works | Effect |
|-----------|-------------|--------|
| **Frequency Penalty** | Penalty proportional to how many times a token appeared. Used 5 times = 5x penalty. | Reduces **word-level** repetition |
| **Presence Penalty** | Flat penalty once a token has appeared at all. Used 1 time = same penalty as 10 times. | Encourages **topic diversity** |

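In the OpenAI API both range from -2.0 to 2.0, with a default of 0. Positive values discourage repetition, and values above 1.0 are aggressive.

> **Note:** This notebook uses Ollama's `repeat_penalty`, a related but different control: it is a multiplicative penalty applied to tokens that appeared recently in the output. We'll use `repeat_penalty` to demonstrate the effect. A value of 1.0 = no penalty, >1.0 = penalize repetition.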
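The difference between the penalty styles is easiest to see on raw logits. The first function follows the additive formula documented for OpenAI-style frequency and presence penalties. The second is a sketch of the multiplicative repeat penalty as commonly implemented in llama.cpp-style samplers, and is an assumption about what happens inside Ollama. Both function names and all logit values are made up for illustration.

```python
from collections import Counter

def apply_openai_style_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Subtract count * frequency_penalty plus a flat presence_penalty per seen token."""
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= count * frequency_penalty + presence_penalty
    return adjusted

def apply_repeat_penalty(logits, recent, repeat_penalty=1.1):
    """Assumed llama.cpp-style rule: shrink positive logits, push negative ones lower."""
    adjusted = dict(logits)
    for token in set(recent):
        if token in adjusted:
            value = adjusted[token]
            adjusted[token] = value / repeat_penalty if value > 0 else value * repeat_penalty
    return adjusted

# Hypothetical logits; "a" has already been generated three times
print(apply_openai_style_penalties({"a": 2.0, "b": 1.0}, ["a", "a", "a"], 0.5, 0.25))
print(apply_repeat_penalty({"a": 2.0, "b": -1.0, "c": 0.5}, ["a", "b"], 2.0))
```

The frequency term grows with every repeat, while the presence term and the repeat penalty apply the same adjustment whether a token appeared once or ten times.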

In [40]:
system = "You are a creative writer."
user = "Write a paragraph about the benefits of learning AI. Make it engaging and use varied vocabulary."

for penalty in [1.0, 1.1, 1.3, 1.8]:
    print(f"\n{'='*60}")
    print(f"Repeat Penalty = {penalty}")
    print(f"{'='*60}")
    result = chat(user, system, temperature=0.7, repeat_penalty=penalty)
    print(result.strip()[:400])


Repeat Penalty = 1.0
As the world becomes increasingly digital, the importance of understanding artificial intelligence (AI) becomes a necessity. Not only can AI revolutionize industries by automating processes and increasing efficiency, but it also promises to make our lives more convenient and enriching. With its capability to learn from vast amounts of data, AI can help in making critical decisions based on pattern

Repeat Penalty = 1.1
Learning Artificial Intelligence (AI) is akin to unlocking an extraordinary code that could transform our world for the better, promising a future where technology serves humanity rather than vice versa. As we embark on this journey of understanding the intricate mechanisms behind AI, we find ourselves in a realm where algorithms and cognitive simulations come alive, offering insights into how mac

Repeat Penalty = 1.3
Embarking on an educational journey to learn Artificial Intelligence (AI) is akin to unlocking doors that lead into extraordinary po

### Exercise 5: Tackling Repetition

Compare different penalty levels and observe:
1. At 1.0 (no penalty), count repeated words/phrases
2. At moderate penalty, is the output more varied?
3. At high penalty, does the output sound unnatural?

In [41]:
from collections import Counter

system = "You are a motivational coach."
user = "Write 5 tips for staying productive while working from home."

for penalty in [1.0, 1.2, 1.5]:
    print(f"\n{'='*60}")
    print(f"Repeat Penalty = {penalty}")
    print(f"{'='*60}")
    result = chat(user, system, temperature=0.7, repeat_penalty=penalty)
    print(result.strip()[:500])
    # Count the most common whitespace-separated words (case-insensitive)
    words = result.lower().split()
    common = Counter(words).most_common(5)
    print(f"\n  Top 5 words: {common}")


Repeat Penalty = 1.0
Working from home can be an incredibly productive way to work, but it can also be challenging to maintain a structured routine and avoid distractions. Here are five tips to help you stay productive while working from home:

1. **Establish a Dedicated Workspace**: Create a specific area in your home dedicated solely to work. This space should be quiet, organized, and free from clutter. Having a designated workspace can help you mentally separate work from your personal life, making it easier to f

  Top 5 words: [('to', 13), ('and', 13), ('a', 12), ('your', 11), ('you', 10)]

Repeat Penalty = 1.2
Working from home can offer numerous benefits, including flexibility and the ability to work on your own schedule. However, it's important that you stay focused throughout the day as distractions often become more common when you're not in an office environment.

Here are 5 tips for staying productive while working from home:

1. **Establish a Routine**: Start each morning

---
## 6. System Message (Persona & Behavior)

The system message is often the **most powerful control** you have. It defines the model's identity, tone, constraints, and output format.

### 6.1 Same Query, Different System Messages

In [42]:
user = "What should I know about investing in mutual funds?"

personas = {
    "Financial Advisor": (
        "You are a certified financial advisor with 20 years of experience. "
        "Provide balanced, professional advice. Always include risk disclaimers."
    ),
    "Friend Over Coffee": (
        "You are a friend who is good with money. Explain things casually using "
        "simple language and relatable examples. No jargon."
    ),
    "Tamil Financial YouTuber": (
        "You are a popular Tamil financial YouTuber. Explain concepts in a mix of "
        "Tamil and English (Tanglish). Use examples relevant to middle-class Indian "
        "families. Keep it energetic and relatable."
    ),
}

for name, system in personas.items():
    print(f"\n{'='*60}")
    print(f"Persona: {name}")
    print(f"{'='*60}")
    result = chat(user, system, temperature=0.7, num_predict=300)
    print(result.strip()[:500])


Persona: Financial Advisor
Investing in mutual funds can be an effective way to diversify your portfolio and potentially generate returns without the need for direct ownership of individual stocks or bonds. Here are some key points to consider:

1. **Understanding Mutual Funds**: A mutual fund is a pool of money from many investors who share the responsibility for managing it. The manager decides how to allocate the money among various assets, such as stocks, bonds, and other securities.

2. **Types of Mutual Funds**: Loo

Persona: Friend Over Coffee
When you think of investing, you might imagine buying shares of a company or putting your money into stocks that companies are selling for the first time on a stock exchange. But there's another type of investment called mutual funds.

Imagine you have $100 to invest. Instead of choosing individual pieces from many different companies (which can be hard because some perform better than others), you buy shares in a fund. This fund, made up

### Exercise 6: System Message Power

Compare 3 different personas explaining the same concept:
1. Which explanation would work best for **your** target audience?
2. Does the model follow the "exactly 3 sentences" constraint reliably?

In [43]:
user = "Explain what an API is."

systems = {
    "CS Professor": "You are a CS professor teaching first-year engineering students.",
    "Explaining to a 10-year-old": "You are explaining to a 10-year-old. Use analogies from daily life.",
    "Developer Docs": "You are a developer writing internal documentation. Be precise and technical.",
}

for name, system in systems.items():
    print(f"\n{'='*60}")
    print(f"Persona: {name}")
    print(f"{'='*60}")
    result = chat(user, system, temperature=0.3, num_predict=200)
    print(result.strip())

# Test constraint following
print(f"\n{'='*60}")
print("With constraint: 'Respond in exactly 3 sentences.'")
print(f"{'='*60}")
result = chat(
    user,
    "You are a CS professor. Respond in exactly 3 sentences.",
    temperature=0.3,
    num_predict=200,
)
print(result.strip())
sentence_count = result.strip().count('.') + result.strip().count('!') + result.strip().count('?')
print(f"\n  [Approximate sentence count: {sentence_count}]")


Persona: CS Professor
An Application Programming Interface (API) is a set of rules and protocols that allows different software applications to communicate with each other. It's like a language for machines, enabling them to talk to one another without needing to understand the inner workings of each other.

In simpler terms, imagine you have two apps: App A and App B. They both need to interact with some external service (like fetching data from an API). Instead of writing code that directly talks to this service, they can use a common interface provided by App C (the API) which translates their requests into the language understood by the external service.

This way, each app only needs to know how to communicate with App C, and it doesn't need to understand the details of what's happening on the other side. This makes the apps more modular, easier to maintain, and can be reused in different contexts without needing to rewrite their logic.

APIs are crucial for building complex syst
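The punctuation count used in the cell above over-counts whenever a period appears inside a number or abbreviation. A slightly more careful check splits only where sentence-ending punctuation is followed by whitespace. This is still a heuristic sketch, and `count_sentences` is a hypothetical helper.

```python
import re

def count_sentences(text):
    """Count sentences by splitting after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])

sample = "An API lets programs talk to each other. Version 2.0 added auth! Is that clear?"
naive = sample.count(".") + sample.count("!") + sample.count("?")
print(f"naive punctuation count: {naive}, split-based count: {count_sentences(sample)}")
```

Abbreviations such as "e.g. this" will still fool the split, so treat the result as an approximation when testing the "exactly 3 sentences" constraint.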

---
## 7. Combining Parameters: Real-World Recipes

In production, you rarely change just one parameter. These combinations are reasonable starting points; tune them for your model and task:

| Use Case | Temp | Top P | Max Tokens | Repeat Pen. | Why |
|----------|------|-------|-----------|-------------|-----|
| FAQ Chatbot | 0 | 1.0 | 150 | 1.0 | Deterministic, concise |
| Code Generation | 0 | 1.0 | 500 | 1.0 | Precise logic |
| Creative Writing | 0.9 | 1.0 | 800 | 1.1 | High creativity, reduced repetition |
| Email Drafting | 0.3 | 1.0 | 300 | 1.1 | Professional, slight variation |
| Data Extraction | 0 | 1.0 | 100 | 1.0 | Maximum precision |
| Brainstorming | 1.2 | 1.0 | 500 | 1.2 | Maximum diversity |
| Classification | 0 | 1.0 | 10 | 1.0 | Single label, deterministic |
| Translation | 0.2 | 1.0 | 500 | 1.0 | Faithful to source |

In [44]:
# Let's test a few recipes side by side

recipes = {
    "FAQ Chatbot": {
        "system": "You are a helpful FAQ assistant for a coding bootcamp. Be concise.",
        "user": "What programming language should I learn first?",
        "options": {"temperature": 0, "top_p": 1.0, "num_predict": 150, "repeat_penalty": 1.0},
    },
    "Creative Writing": {
        "system": "You are a creative storyteller.",
        "user": "Write a micro-story (3 sentences) about a robot learning to cook.",
        "options": {"temperature": 0.9, "top_p": 1.0, "num_predict": 200, "repeat_penalty": 1.1},
    },
    "Classification": {
        "system": "Classify the sentiment as Positive, Negative, or Neutral. Reply with one word only.",
        "user": "The product arrived late but the quality was excellent.",
        "options": {"temperature": 0, "top_p": 1.0, "num_predict": 10, "repeat_penalty": 1.0},
    },
}

for name, recipe in recipes.items():
    print(f"\n{'='*60}")
    print(f"Recipe: {name}")
    print(f"Settings: {recipe['options']}")
    print(f"{'='*60}")
    result = chat(recipe["user"], recipe["system"], **recipe["options"])
    print(result.strip())


Recipe: FAQ Chatbot
Settings: {'temperature': 0, 'top_p': 1.0, 'num_predict': 150, 'repeat_penalty': 1.0}
The best programming language to learn first depends on your goals and the context of your project. Here are some popular choices:

1. **Python** - Great for beginners due to its simplicity and readability. It's widely used in data science, web development, and automation.

2. **JavaScript** - Essential for web development. It's used to build interactive web applications and is part of the backend with Node.js.

3. **Java** - Popular for enterprise applications, Android app development, and large-scale systems due to its robustness and scalability.

4. **C++** - Ideal for system programming, game development, and high-performance applications due to its efficiency and low-level control.

5. **Ruby** - Great for web development, especially

Recipe: Creative Writing
Settings: {'temperature': 0.9, 'top_p': 1.0, 'num_predict': 200, 'repeat_penalty': 1.1}
In the kitchen of its factory,

### Exercise 7: Build Your Own Recipe

Pick one scenario and find the optimal settings:

- **Scenario A**: Customer support bot for an academy answering course questions
- **Scenario B**: Creative Tamil story generator for children
- **Scenario C**: Code review agent that analyzes Python code

In [45]:
# Fill in your chosen scenario below and experiment!

my_system = ""  # Your system message here
my_user = ""    # Your user query here
my_options = {
    "temperature": 0.5,
    "top_p": 1.0,
    "num_predict": 300,
    "repeat_penalty": 1.0,
}

if my_system and my_user:
    result = chat(my_user, my_system, **my_options)
    print(result.strip())
else:
    print("Fill in my_system and my_user above, then re-run!")

Fill in my_system and my_user above, then re-run!


---
## 8. From Playground to Code

Everything you've been configuring interactively maps directly to API parameters. Here's the full mapping:

| Concept | Ollama Option | Default |
|---------|-------------|--------|
| Temperature | `temperature` | 0.8 |
| Top P | `top_p` | 0.9 |
| Max Tokens | `num_predict` | -1 (unlimited) |
| Stop Sequences | `stop` | None |
| Repeat Penalty | `repeat_penalty` | 1.1 |
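The mapping in the table can be wrapped in a small translation helper so that playground-style names never leak into your Ollama calls. This is a minimal sketch: `to_ollama_options` and its keyword names are hypothetical, and only the option names listed in the table are used.

```python
def to_ollama_options(temperature=None, top_p=None, max_tokens=None, stop=None, repeat_penalty=None):
    """Translate playground-style settings into an Ollama `options` dict."""
    mapping = {
        "temperature": temperature,
        "top_p": top_p,
        "num_predict": max_tokens,  # "Max Tokens" is called num_predict in Ollama
        "stop": [stop] if isinstance(stop, str) else stop,
        "repeat_penalty": repeat_penalty,
    }
    # Drop unset values so Ollama falls back to its own defaults
    return {key: value for key, value in mapping.items() if value is not None}

options = to_ollama_options(temperature=0.3, max_tokens=200, stop="\n\n")
print(options)
```

The resulting dictionary can be passed as `options=options` to `ollama.chat`, or unpacked into the `chat` helper defined at the top of this notebook.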

### Complete Example

In [46]:
import ollama
import time

# An example call that sets every parameter covered in this guide
start = time.time()

response = ollama.chat(
    model=MODEL,
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Explain what a neural network is.",
        },
    ],
    options={
        "temperature": 0.3,       # Low for factual content
        "top_p": 1.0,             # Neutral value (Ollama's default is 0.9)
        "num_predict": 200,       # Cap output length
        "repeat_penalty": 1.1,    # Slight repetition reduction
        "stop": ["\n\n"],         # Stop at double newline
    },
)

elapsed = time.time() - start

print("Response:")
print(response["message"]["content"])
print(f"\n--- Stats ---")
print(f"Model: {MODEL}")
print(f"Time: {elapsed:.2f}s")
if "eval_count" in response:
    print(f"Tokens generated: {response['eval_count']}")
if "eval_count" in response and "eval_duration" in response:
    # eval_duration is reported in nanoseconds, so convert it to seconds
    tokens_per_sec = response["eval_count"] / (response["eval_duration"] / 1e9)
    print(f"Speed: {tokens_per_sec:.1f} tokens/sec")

Response:
A neural network is a type of artificial intelligence (AI) system that is inspired by the structure and function of the human brain. It consists of interconnected nodes or "neurons" that process information in a way similar to how neurons in the brain do.

--- Stats ---
Model: qwen2.5:1.5b
Time: 0.59s
Tokens generated: 51
Speed: 112.7 tokens/sec


---
## 9. Quick Reference Cheat Sheet

### The Two Golden Rules

1. **Change Temperature OR Top P** — never both
2. **Use Repeat Penalty conservatively** — start with 1.1, test before going higher
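Both rules are simple enough to check automatically before a request is sent. The sketch below is one possible linter. `check_options` is a hypothetical helper, it treats unset values as neutral (1.0), and the 1.3 threshold is taken from the upper end of the 1.1-1.3 range this guide suggests.

```python
def check_options(options):
    """Return a list of warnings for settings that break the two golden rules."""
    warnings = []
    # Assumption: a missing key counts as the neutral value 1.0
    temperature = options.get("temperature", 1.0)
    top_p = options.get("top_p", 1.0)
    # Rule 1: only one of temperature / top_p should differ from 1.0
    if temperature != 1.0 and top_p != 1.0:
        warnings.append("Both temperature and top_p are changed; adjust only one.")
    # Rule 2: keep repeat_penalty conservative
    if options.get("repeat_penalty", 1.0) > 1.3:
        warnings.append("repeat_penalty above 1.3 is aggressive; test before using.")
    return warnings

for opts in [
    {"temperature": 0.3, "top_p": 1.0, "repeat_penalty": 1.1},
    {"temperature": 0.3, "top_p": 0.5},
    {"repeat_penalty": 1.8},
]:
    print(opts, "->", check_options(opts) or "OK")
```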

### What to Adjust

| I want to... | Adjust This | Direction | Example Value |
|-------------|------------|-----------|---------------|
| Get consistent, factual answers | Temperature | Lower | 0 – 0.2 |
| Get creative, varied output | Temperature | Higher | 0.8 – 1.2 |
| Limit word choices to safe options | Top P | Lower | 0.1 – 0.3 |
| Keep responses short | Max Tokens (`num_predict`) | Lower | 50 – 150 |
| Stop at a specific format | Stop Sequences (`stop`) | Set delimiter | `"\n"`, `"###"`, `"6."` |
| Reduce repetition | Repeat Penalty | Higher | 1.1 – 1.3 |
| Change model personality | System Message | Rewrite persona | (see Section 6) |

---

*Nunnari Academy | Generative AI & Agentic AI Professional Program*  
*Module 1, Week 2, Section 2.1 | LLM Settings & Generation Parameters*