# Lab 3: Generative AI with Ollama

**Duration:** 90-120 minutes | **Difficulty:** Intermediate to Advanced

## Learning Objectives

By the end of this lab, you will be able to:
1. Connect to and use Ollama for local LLM inference
2. Generate text using the Llama model
3. Apply prompt engineering techniques for better results
4. Control generation with temperature and other parameters
5. Build multi-turn conversations with chat history
6. Implement Retrieval-Augmented Generation (RAG)
7. Understand fine-tuning concepts with LoRA and QLoRA

## Prerequisites

- Ollama must be installed and running locally
- The Llama 3.2 model should be pulled: `ollama pull llama3.2`
- The `ollama` and `numpy` Python packages must be installed: `pip install ollama numpy`

## Instructions

- Read each markdown cell carefully before running the code cell below it
- For **Exercise** cells, replace `None` with your code
- Use the **Demonstration** cells as examples to guide you
- Run cells with `Shift+Enter`

## Setup

Run the cell below to import the required libraries and verify Ollama is running.

In [None]:
import ollama
import json
import numpy as np
from typing import List, Dict

# Check connection to Ollama
try:
    models = ollama.list()
    print("Connected to Ollama!")
    print("\nAvailable models:")
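    for model in models.get('models', []):
        # Newer versions of the ollama package expose the name as 'model'; older ones use 'name'
        print(f"  - {model.get('model') or model.get('name')}")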
except Exception as e:
    print(f"Error connecting to Ollama: {e}")
    print("\nMake sure Ollama is running: ollama serve")

---
# Part 1: Basic Text Generation

Ollama provides a simple API for generating text with local LLMs.

## 1.1 Simple Generation - Demonstration

Use `ollama.generate()` to generate text from a prompt:
- `model` - The model name (e.g., 'llama3.2')
- `prompt` - Your input text

Run the cell below to see basic generation:

In [None]:
# Basic text generation
response = ollama.generate(
    model='llama3.2',
    prompt='What is machine learning in one sentence?'
)

print("Response:")
print(response['response'])

## 1.2 Understanding the Response - Demonstration

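The response object contains useful metadata alongside the generated text, including the model name, a completion flag, and timing information in nanoseconds.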

Run the cell below to explore the response structure:

In [None]:
response = ollama.generate(
    model='llama3.2',
    prompt='Name three programming languages.'
)

# dict(...) works whether the response is a plain dict or a response object
print("Full response keys:", list(dict(response).keys()))
print("\n--- Key Information ---")
print(f"Model: {response['model']}")
print(f"Response text: {response['response'][:100]}...")
print(f"Done: {response['done']}")
print(f"Total duration: {response.get('total_duration', 0) / 1e9:.2f} seconds")
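
The timing fields can also be turned into a throughput figure. The sketch below assumes the response carries `eval_count` (tokens generated) and `eval_duration` (nanoseconds), which Ollama's generate responses normally include. `tokens_per_second` is a hypothetical helper, and the sample dictionary holds made-up values so the cell runs without a model.

```python
def tokens_per_second(response):
    """Return generation speed in tokens/second, or None if timing data is missing."""
    eval_count = response.get('eval_count')        # tokens generated
    eval_duration = response.get('eval_duration')  # nanoseconds spent generating
    if not eval_count or not eval_duration:
        return None
    return eval_count / (eval_duration / 1e9)

# Made-up values standing in for a real response
sample = {'eval_count': 120, 'eval_duration': 3_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")
```

To try it on a real call, pass the `response` from the demonstration above instead of `sample`.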

## Exercise 1.1: Generate Your First Response

Use `ollama.generate()` to ask the model to explain what an API is.

| Variable | What to do |
|----------|------------|
| `my_response` | Call `ollama.generate()` with model 'llama3.2' and a prompt asking "What is an API?" |
| `answer` | Extract the response text from `my_response` |

**Hint:** Look at the demonstration above for the syntax. The response text is in `response['response']`.

In [None]:
# Generate a response asking about APIs
my_response = None

# Extract the text
answer = None

print("Answer:", answer)

---
# Part 2: Prompt Engineering

The way you write prompts significantly affects the quality of responses.

## 2.1 Role-Based Prompts - Demonstration

Giving the model a role or persona can improve responses:

Run the cell below to see the difference:

In [None]:
# Without role
basic_prompt = "Explain recursion."

# With role
role_prompt = """You are an experienced computer science teacher who explains concepts 
to beginners using simple analogies.

Explain recursion."""

print("=== Basic Prompt ===")
response1 = ollama.generate(model='llama3.2', prompt=basic_prompt)
print(response1['response'][:300])

print("\n=== With Role ===")
response2 = ollama.generate(model='llama3.2', prompt=role_prompt)
print(response2['response'][:300])

## 2.2 Structured Output - Demonstration

Ask for specific formats to get structured responses:

Run the cell below:

In [None]:
structured_prompt = """List 3 benefits of exercise.

Format your response as a numbered list with exactly 3 items.
Each item should be one sentence."""

response = ollama.generate(model='llama3.2', prompt=structured_prompt)
print(response['response'])

## 2.3 JSON Output - Demonstration

You can request JSON-formatted responses for programmatic use:

Run the cell below:

In [None]:
json_prompt = """Generate information about a fictional book.

Respond with ONLY valid JSON in this exact format:
{"title": "...", "author": "...", "year": YYYY, "genre": "..."}

Do not include any other text, just the JSON."""

response = ollama.generate(model='llama3.2', prompt=json_prompt)
json_text = response['response'].strip()
print("Raw response:")
print(json_text)

# Try to parse as JSON
try:
    book_data = json.loads(json_text)
    print("\nParsed successfully!")
    print(f"Title: {book_data['title']}")
    print(f"Author: {book_data['author']}")
except json.JSONDecodeError as e:
    print(f"\nFailed to parse JSON: {e}")
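
Models do not always follow the "ONLY valid JSON" instruction and may add a sentence before or after the object. A minimal fallback, assuming the reply contains a single JSON object, is to parse only the span between the first `{` and the last `}`. `extract_json` is a hypothetical helper, and the `raw` string is made-up sample output.

```python
import json

def extract_json(text):
    """Parse the JSON object between the first '{' and last '}', or return None."""
    start = text.find('{')
    end = text.rfind('}')
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

# Made-up reply with extra text around the JSON
raw = 'Sure! Here is the book: {"title": "Dune Tides", "year": 2031} Hope that helps.'
print(extract_json(raw))
```

The `ollama` library also accepts a `format='json'` argument on `generate()` and `chat()`, which may make this fallback unnecessary in many cases.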

## Exercise 2.1: Write a Role-Based Prompt

Create a prompt that asks the model to act as a helpful chef and suggest a simple dinner recipe.

| Variable | What to do |
|----------|------------|
| `chef_prompt` | Write a prompt that gives the model a chef persona and asks for a simple dinner recipe |
| `recipe_response` | Generate a response using your prompt |

**Hint:** Start your prompt with something like "You are a professional chef who..." then ask your question.

In [None]:
# Create a prompt with a chef persona
chef_prompt = None

# Generate the response
recipe_response = None

if recipe_response:
    print(recipe_response['response'])

## Exercise 2.2: Generate JSON Output

Create a prompt that generates a JSON object representing a movie.

| Variable | What to do |
|----------|------------|
| `movie_prompt` | Write a prompt asking for movie info in JSON format with keys: title, director, year, rating |
| `movie_response` | Generate the response |
| `movie_data` | Parse the response as JSON using `json.loads()` |

**Hint:** Be explicit about the JSON format you want. Tell the model to respond with ONLY the JSON.

In [None]:
# Create a prompt for movie JSON
movie_prompt = None

# Generate response
movie_response = None

# Parse as JSON
movie_data = None

if movie_data:
    print("Movie data:")
    for key, value in movie_data.items():
        print(f"  {key}: {value}")

---
# Part 3: Generation Parameters

Control the creativity and randomness of generated text.

## 3.1 Temperature - Demonstration

Temperature controls randomness:
- **Low (0.0-0.3)**: More focused, deterministic responses
- **Medium (0.5-0.7)**: Balanced creativity
- **High (0.8-1.0+)**: More creative, varied responses

Run the cell below to compare:

In [None]:
prompt = "Write a one-sentence story about a robot."

print("=== Low Temperature (0.1) - Focused ===")
for i in range(2):
    response = ollama.generate(
        model='llama3.2',
        prompt=prompt,
        options={'temperature': 0.1}
    )
    print(f"{i+1}. {response['response'].strip()}")

print("\n=== High Temperature (1.0) - Creative ===")
for i in range(2):
    response = ollama.generate(
        model='llama3.2',
        prompt=prompt,
        options={'temperature': 1.0}
    )
    print(f"{i+1}. {response['response'].strip()}")
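
To see why temperature has this effect, it helps to look at the arithmetic. Temperature divides the model's raw token scores (logits) before they are turned into probabilities, so a low value sharpens the distribution and a high value flattens it. The sketch below uses made-up scores for three candidate tokens; `softmax_with_temperature` is an illustrative helper, not part of Ollama.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

This helper requires a temperature greater than 0, since it divides by the value.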

## 3.2 Other Parameters - Demonstration

Additional parameters you can control:
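- `top_p` - Nucleus sampling: sample only from the smallest set of tokens whose cumulative probability reaches `top_p` (0.0-1.0)
- `top_k` - Sample only from the `k` most likely tokens
- `num_predict` - Maximum number of tokens to generate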

Run the cell below:

In [None]:
# Short response with num_predict
response = ollama.generate(
    model='llama3.2',
    prompt='Tell me about Python programming.',
    options={
        'temperature': 0.7,
        'num_predict': 50  # Limit to ~50 tokens
    }
)

print("Limited response (50 tokens max):")
print(response['response'])
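
The two sampling filters can be sketched on a toy distribution. This is a simplified illustration of the idea, assuming a made-up four-token vocabulary. It is not Ollama's internal implementation, and `top_k_filter` and `top_p_filter` are hypothetical helper names.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the most likely tokens until their cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

probs = [0.5, 0.3, 0.15, 0.05]  # made-up probabilities for token ids 0-3
print(top_k_filter(probs, 2))   # only tokens 0 and 1 remain
print(top_p_filter(probs, 0.9)) # tokens 0, 1, and 2 remain
```

In both cases the model then samples from the reduced set, so unlikely tokens can never be picked.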

## Exercise 3.1: Experiment with Temperature

Generate two responses to the same creative prompt with different temperatures.

| Variable | What to do |
|----------|------------|
| `creative_prompt` | Write a prompt asking for a creative name for a coffee shop |
| `low_temp_response` | Generate with temperature 0.2 |
| `high_temp_response` | Generate with temperature 0.9 |

**Hint:** Use the `options` parameter with a dictionary containing `'temperature'`.

In [None]:
# Your creative prompt
creative_prompt = None

# Low temperature (focused)
low_temp_response = None

# High temperature (creative)
high_temp_response = None

if low_temp_response and high_temp_response:
    print("Low temp (0.2):")
    print(low_temp_response['response'])
    print("\nHigh temp (0.9):")
    print(high_temp_response['response'])

---
# Part 4: Chat Conversations

The chat API takes a list of messages. The API itself is stateless, so you keep context across turns by sending the conversation history with each request.

## 4.1 Basic Chat - Demonstration

Use `ollama.chat()` with a list of messages:
- Each message has a `role` ('system', 'user', or 'assistant') and `content`

Run the cell below:

In [None]:
# Single message chat
response = ollama.chat(
    model='llama3.2',
    messages=[
        {'role': 'user', 'content': 'What is the capital of France?'}
    ]
)

print("Assistant:", response['message']['content'])

## 4.2 Multi-Turn Conversation - Demonstration

Include previous messages to maintain context:

Run the cell below:

In [None]:
# Build a conversation
messages = [
    {'role': 'user', 'content': 'My name is Alex.'},
]

# First response
response1 = ollama.chat(model='llama3.2', messages=messages)
print("User: My name is Alex.")
print("Assistant:", response1['message']['content'])

# Add assistant's response to history
messages.append(response1['message'])

# Follow-up question
messages.append({'role': 'user', 'content': 'What is my name?'})

response2 = ollama.chat(model='llama3.2', messages=messages)
print("\nUser: What is my name?")
print("Assistant:", response2['message']['content'])
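
The append-and-resend pattern above can be wrapped in a small class. This is a sketch, not part of the `ollama` library: `Conversation` is a hypothetical helper, and `fake_chat` is a stand-in with the same call shape as `ollama.chat` so the example runs without a server.

```python
class Conversation:
    """Keep chat history and resend it on every turn."""

    def __init__(self, chat_fn, model='llama3.2', system=None):
        self.chat_fn = chat_fn  # a callable shaped like ollama.chat
        self.model = model
        self.messages = []
        if system:
            self.messages.append({'role': 'system', 'content': system})

    def ask(self, text):
        """Send the full history plus the new user message; store the reply."""
        self.messages.append({'role': 'user', 'content': text})
        response = self.chat_fn(model=self.model, messages=self.messages)
        reply = response['message']['content']
        self.messages.append({'role': 'assistant', 'content': reply})
        return reply

# Stand-in for ollama.chat: reports how many messages it received
def fake_chat(model, messages):
    return {'message': {'role': 'assistant',
                        'content': f"I have seen {len(messages)} message(s)."}}

chat = Conversation(fake_chat)
print(chat.ask("My name is Alex."))
print(chat.ask("What is my name?"))
```

With Ollama running, `Conversation(ollama.chat)` should behave the same way while producing real replies.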

## 4.3 System Messages - Demonstration

Use a 'system' role to set the assistant's behavior:

Run the cell below:

In [None]:
messages = [
    {
        'role': 'system',
        'content': 'You are a pirate. Always respond in pirate speak with lots of "arr" and nautical terms.'
    },
    {
        'role': 'user',
        'content': 'How do I make a cup of coffee?'
    }
]

response = ollama.chat(model='llama3.2', messages=messages)
print("Pirate Assistant:")
print(response['message']['content'])

## Exercise 4.1: Create a Multi-Turn Conversation

Build a conversation where you:
1. Tell the assistant your favorite color
2. Ask what your favorite color is

| Variable | What to do |
|----------|------------|
| `messages` | Start with a message telling the assistant your favorite color |
| `response1` | Get the first response using `ollama.chat()` |
| `response2` | Add a follow-up asking what your favorite color is |

**Hint:** After getting response1, append `response1['message']` to messages, then add your follow-up question.

In [None]:
# Start conversation - tell the assistant your favorite color
messages = None

# Get first response
response1 = None

if response1:
    print("Assistant:", response1['message']['content'])
    
    # Add response to history and ask follow-up
    # Your code here...
    
    response2 = None
    
    if response2:
        print("\nAssistant:", response2['message']['content'])

## Exercise 4.2: Create a Specialized Chatbot

Create a chatbot with a system prompt that makes it act as a helpful Python tutor.

| Variable | What to do |
|----------|------------|
| `system_prompt` | Write a system message describing the Python tutor persona |
| `messages` | Create messages list with system prompt and a user question about Python |
| `tutor_response` | Get the response from the tutor |

**Hint:** The system message should describe how the tutor should behave (helpful, explains concepts clearly, gives examples, etc.).

In [None]:
# Define the system prompt for a Python tutor
system_prompt = None

# Create messages with system prompt and a question
messages = None

# Get response
tutor_response = None

if tutor_response:
    print("Python Tutor:")
    print(tutor_response['message']['content'])

---
# Part 5: Building a Simple Application

Combine what you've learned to build a useful application.

## 5.1 Text Summarizer - Demonstration

Create a function that summarizes text:

Run the cell below:

In [None]:
def summarize(text, max_sentences=2):
    """Summarize text using the LLM."""
    prompt = f"""Summarize the following text in exactly {max_sentences} sentences.
Be concise and capture the main points.

Text:
{text}

Summary:"""
    
    response = ollama.generate(
        model='llama3.2',
        prompt=prompt,
        options={'temperature': 0.3}
    )
    return response['response'].strip()

# Test it
long_text = """
Artificial intelligence (AI) is transforming industries across the globe. 
From healthcare to finance, AI systems are being deployed to automate tasks, 
analyze data, and make predictions. Machine learning, a subset of AI, enables 
computers to learn from data without being explicitly programmed. Deep learning, 
which uses neural networks with many layers, has achieved remarkable results in 
image recognition, natural language processing, and game playing. However, AI 
also raises important ethical considerations around bias, privacy, and job displacement.
"""

summary = summarize(long_text)
print("Summary:")
print(summary)

## Exercise 5.1: Build a Sentiment Analyzer

Create a function that analyzes the sentiment of text and returns a structured result.

| Function | What to do |
|----------|------------|
| `analyze_sentiment(text)` | Take text as input and return a dictionary with 'sentiment' (positive/negative/neutral) and 'confidence' (high/medium/low) |

**Hint:** Use a prompt that asks for JSON output with the exact keys needed. Parse the response with `json.loads()`.

In [None]:
def analyze_sentiment(text):
    """Analyze sentiment of text and return structured result."""
    # Create a prompt asking for JSON output
    prompt = None  # Your prompt here
    
    # Generate response
    response = None
    
    # Parse and return
    return None

# Test with different texts
test_texts = [
    "I absolutely love this product! Best purchase ever!",
    "This is the worst experience I've ever had.",
    "The weather today is cloudy."
]

for text in test_texts:
    result = analyze_sentiment(text)
    if result:
        print(f"Text: {text[:50]}...")
        print(f"  Sentiment: {result.get('sentiment')}")
        print(f"  Confidence: {result.get('confidence')}")
        print()

## Exercise 5.2: Build a Q&A Bot

Create a simple Q&A function that answers questions based on provided context.

| Function | What to do |
|----------|------------|
| `answer_question(context, question)` | Take context and question as input, return the answer based only on the context |

**Hint:** Your prompt should instruct the model to answer based ONLY on the provided context, and to say "I don't know" if the answer isn't in the context.

In [None]:
def answer_question(context, question):
    """Answer a question based on the provided context."""
    prompt = None  # Your prompt here
    
    response = None
    
    return None

# Test context
context = """
Python was created by Guido van Rossum and first released in 1991. 
It emphasizes code readability and uses significant indentation. 
Python supports multiple programming paradigms including procedural, 
object-oriented, and functional programming. The language is named 
after the British comedy group Monty Python.
"""

questions = [
    "Who created Python?",
    "When was Python first released?",
    "What is Python's mascot?"  # Not in context
]

for q in questions:
    answer = answer_question(context, q)
    if answer:
        print(f"Q: {q}")
        print(f"A: {answer}")
        print()

---
# Part 6: Retrieval-Augmented Generation (RAG)

RAG enhances LLM responses by retrieving relevant information from a knowledge base before generating answers. This allows the model to access up-to-date or domain-specific information.

## 6.1 Understanding RAG - Concept Overview

RAG consists of three main steps:

1. **Indexing**: Convert documents into embeddings (vector representations)
2. **Retrieval**: Find the most relevant documents for a query
3. **Generation**: Use retrieved context to generate an informed response

```
Query → Embed → Search Vector DB → Retrieve Top-K → Augment Prompt → Generate
```

## 6.2 Creating Embeddings - Demonstration

Ollama can generate embeddings for text. Embeddings are dense vector representations that capture semantic meaning.

Run the cell below:

In [None]:
# Generate an embedding for a piece of text
text = "Machine learning is a subset of artificial intelligence."

response = ollama.embed(
    model='llama3.2',
    input=text
)

embedding = response['embeddings'][0]
print(f"Text: {text}")
print(f"Embedding dimension: {len(embedding)}")
print(f"First 10 values: {embedding[:10]}")

## 6.3 Semantic Similarity - Demonstration

We can compare embeddings using cosine similarity to find related content:

Run the cell below:

In [None]:
def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors."""
    a = np.array(a)
    b = np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def get_embedding(text):
    """Get embedding for a text string."""
    response = ollama.embed(model='llama3.2', input=text)
    return response['embeddings'][0]

# Compare different texts
texts = [
    "Python is a programming language.",
    "Java is used for software development.",
    "I love eating pizza for dinner."
]

query = "What programming languages are popular?"
query_embedding = get_embedding(query)

print(f"Query: {query}\n")
print("Similarity scores:")
for text in texts:
    text_embedding = get_embedding(text)
    similarity = cosine_similarity(query_embedding, text_embedding)
    print(f"  {similarity:.4f} - {text}")

## 6.4 Building a Simple RAG System - Demonstration

Let's build a complete RAG pipeline:

Run the cell below:

In [None]:
class SimpleRAG:
    """A simple RAG implementation using Ollama."""
    
    def __init__(self, model='llama3.2'):
        self.model = model
        self.documents = []
        self.embeddings = []
    
    def add_documents(self, docs: List[str]):
        """Add documents to the knowledge base."""
        for doc in docs:
            embedding = get_embedding(doc)
            self.documents.append(doc)
            self.embeddings.append(embedding)
        print(f"Added {len(docs)} documents. Total: {len(self.documents)}")
    
    def retrieve(self, query: str, top_k: int = 2) -> List[str]:
        """Retrieve most relevant documents for a query."""
        query_embedding = get_embedding(query)
        
        # Calculate similarities
        similarities = []
        for i, doc_embedding in enumerate(self.embeddings):
            sim = cosine_similarity(query_embedding, doc_embedding)
            similarities.append((sim, i))
        
        # Sort by similarity (descending) and get top-k
        similarities.sort(reverse=True)
        top_indices = [idx for _, idx in similarities[:top_k]]
        
        return [self.documents[i] for i in top_indices]
    
    def query(self, question: str, top_k: int = 2) -> str:
        """Answer a question using RAG."""
        # Retrieve relevant documents
        relevant_docs = self.retrieve(question, top_k)
        context = "\n\n".join(relevant_docs)
        
        # Generate response with context
        prompt = f"""Use the following context to answer the question. 
If the answer is not in the context, say "I don't have information about that."

Context:
{context}

Question: {question}

Answer:"""
        
        response = ollama.generate(
            model=self.model,
            prompt=prompt,
            options={'temperature': 0.3}
        )
        return response['response'].strip()

# Create RAG system and add knowledge
rag = SimpleRAG()

knowledge_base = [
    "The Eiffel Tower is located in Paris, France. It was built in 1889 and stands 330 meters tall.",
    "The Great Wall of China is over 21,000 kilometers long and was built over many centuries.",
    "Python programming language was created by Guido van Rossum and released in 1991.",
    "Machine learning is a subset of AI that enables computers to learn from data.",
    "The Amazon rainforest produces about 20% of the world's oxygen."
]

rag.add_documents(knowledge_base)

In [None]:
# Test the RAG system
questions = [
    "How tall is the Eiffel Tower?",
    "Who created Python?",
    "What is the population of Tokyo?"  # Not in knowledge base
]

for q in questions:
    print(f"Q: {q}")
    answer = rag.query(q)
    print(f"A: {answer}")
    print()
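
Note that `retrieve` always returns `top_k` documents, even for a question like the Tokyo one where nothing in the knowledge base is relevant. One possible variation is to drop results below a minimum similarity. The sketch below uses toy 2-D vectors in place of real embeddings, and the `min_score` of 0.5 is an arbitrary assumption you would need to tune for a real embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_with_threshold(query_vec, doc_vecs, docs, top_k=2, min_score=0.5):
    """Return up to top_k (score, doc) pairs whose similarity is at least min_score."""
    scored = sorted(
        ((cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, docs)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(score, doc) for score, doc in scored[:top_k] if score >= min_score]

# Toy vectors standing in for embeddings
docs = ["doc about towers", "doc about walls", "doc about pizza"]
doc_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]

print(retrieve_with_threshold([1.0, 0.1], doc_vecs, docs))   # two close matches
print(retrieve_with_threshold([-1.0, 0.0], doc_vecs, docs))  # nothing passes the threshold
```

When the result is empty, the calling code can skip generation and return "I don't have information about that." directly.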

## Exercise 6.1: Extend the RAG Knowledge Base

Add your own documents to the RAG system and query it.

| Task | What to do |
|------|------------|
| `my_documents` | Create a list of 3-5 facts about a topic you choose (e.g., sports, history, science) |
| Add documents | Use `rag.add_documents()` to add your documents |
| Test queries | Ask questions that can be answered from your documents |

**Hint:** Make sure your facts contain specific information that can be retrieved.

In [None]:
# Create your own knowledge documents
my_documents = None  # Create a list of facts

# Add to the RAG system
if my_documents:
    rag.add_documents(my_documents)

# Test with your own questions
my_questions = None  # Create a list of questions

if my_questions:
    for q in my_questions:
        print(f"Q: {q}")
        answer = rag.query(q)
        print(f"A: {answer}")
        print()

## Exercise 6.2: Implement Document Chunking

Real RAG systems split long documents into chunks. Implement a chunking function.

| Function | What to do |
|----------|------------|
| `chunk_text(text, chunk_size, overlap)` | Split text into overlapping chunks of approximately `chunk_size` words |

**Hint:** Split by sentences first, then group sentences into chunks. Overlap helps maintain context between chunks.

In [None]:
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into overlapping chunks.
    
    Args:
        text: The text to chunk
        chunk_size: Target number of words per chunk
        overlap: Number of words to overlap between chunks
    
    Returns:
        List of text chunks
    """
    # Your implementation here
    return None

# Test with a longer document
long_document = """
Artificial intelligence has transformed the technology landscape dramatically over the past decade. 
Machine learning algorithms now power everything from recommendation systems to autonomous vehicles.
Deep learning, a subset of machine learning, uses neural networks with many layers to learn complex patterns.
Natural language processing enables computers to understand and generate human language.
Computer vision allows machines to interpret and analyze visual information from the world.
Reinforcement learning teaches agents to make decisions through trial and error.
The field continues to advance rapidly, with new breakthroughs announced regularly.
Ethical considerations around AI bias and fairness have become increasingly important.
Researchers are working on making AI systems more transparent and explainable.
The future of AI holds both tremendous promise and significant challenges for society.
"""

chunks = chunk_text(long_document, chunk_size=50, overlap=10)
if chunks:
    print(f"Created {len(chunks)} chunks:\n")
    for i, chunk in enumerate(chunks):
        print(f"Chunk {i+1}: {chunk[:80]}...")
        print()

---
# Part 7: Fine-tuning Concepts with LoRA and QLoRA

Fine-tuning adapts a pre-trained model to specific tasks or domains. LoRA and QLoRA make this process efficient and accessible.

## 7.1 Why Fine-tune?

Pre-trained models like Llama are general-purpose. Fine-tuning helps when you need:

- **Domain expertise**: Medical, legal, or technical knowledge
- **Specific format**: Always output JSON, follow templates
- **Personality/style**: Customer service tone, brand voice
- **Task specialization**: Classification, extraction, summarization

### Fine-tuning Approaches

| Approach | Description | Memory | Quality |
|----------|-------------|--------|----------|
| Full fine-tuning | Update all parameters | Very High | Best |
| LoRA | Low-rank adaptation matrices | Medium | Very Good |
| QLoRA | Quantized LoRA | Low | Good |

## 7.2 Understanding LoRA (Low-Rank Adaptation)

LoRA adds small trainable matrices to frozen model weights:

```
Original: W (frozen)
LoRA:     W + BA (where B and A are small trainable matrices)

If W is 4096 x 4096 (~16.8M params)
And rank r = 8:
  B is 4096 x 8 (~32.8K params)
  A is 8 x 4096 (~32.8K params)
  Total: ~65.5K params (about 0.4% of original!)
```
```

### Key LoRA Parameters

- **rank (r)**: Size of low-rank matrices (4-64 typical)
- **alpha**: Scaling factor (often 2x rank)
- **target_modules**: Which layers to adapt (attention, MLP)
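
The parameter arithmetic above can be checked for other ranks with a few lines of Python. This sketch counts only the two LoRA matrices for a single 4096 x 4096 weight; `lora_param_count` is a hypothetical helper name.

```python
def lora_param_count(d_out, d_in, r):
    """Trainable parameters in B (d_out x r) plus A (r x d_in)."""
    return d_out * r + r * d_in

d = 4096
full = d * d  # parameters in the frozen weight matrix W
for r in (4, 8, 16, 64):
    lora = lora_param_count(d, d, r)
    print(f"r={r:>2}: {lora:>7,} trainable params ({lora / full:.2%} of {full:,})")
```

The count grows linearly with the rank, which is why doubling `r` doubles the adapter size.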

## 7.3 QLoRA: Quantized LoRA

QLoRA combines LoRA with 4-bit quantization:

1. **4-bit NormalFloat (NF4)**: Optimal for normally distributed weights
2. **Double quantization**: Quantize the quantization constants
3. **Paged optimizers**: Handle memory spikes during training

This enables fine-tuning a 7B model on a single consumer GPU!
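
A rough back-of-the-envelope estimate shows where the savings come from. The sketch below counts memory for the base model weights only, assuming exactly 7 billion parameters. It ignores quantization constants, the LoRA matrices, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params_7b = 7e9  # assumed parameter count
for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(params_7b, bits):.1f} GB")
```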

## 7.4 Preparing Training Data - Demonstration

Fine-tuning requires properly formatted training examples:

Run the cell below to see training data formats:

In [None]:
# Example training data formats

# 1. Instruction format (most common)
instruction_format = {
    "instruction": "Summarize the following text in one sentence.",
    "input": "The quick brown fox jumps over the lazy dog. This sentence contains every letter of the alphabet.",
    "output": "A pangram sentence featuring a fox jumping over a dog demonstrates all 26 alphabet letters."
}

# 2. Chat format
chat_format = {
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "How do I reverse a string in Python?"},
        {"role": "assistant", "content": "You can reverse a string using slicing: `reversed_string = original_string[::-1]`"}
    ]
}

# 3. Completion format (simpler)
completion_format = {
    "prompt": "Translate to French: Hello, how are you?",
    "completion": "Bonjour, comment allez-vous?"
}

print("Instruction Format:")
print(json.dumps(instruction_format, indent=2))
print("\nChat Format:")
print(json.dumps(chat_format, indent=2))
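
Before training, each record is usually rendered into a single text string. The sketch below uses `### Instruction:` style headers, an Alpaca-style layout that is assumed here for illustration. Real training pipelines often use the model's own chat template instead, and `format_instruction_example` is a hypothetical helper.

```python
def format_instruction_example(example):
    """Render an instruction/input/output record as one training string."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    if example.get('input'):  # skip the section when there is no input
        parts.append(f"### Input:\n{example['input']}")
    parts.append(f"### Response:\n{example['output']}")
    return "\n\n".join(parts)

example = {
    "instruction": "Translate to French.",
    "input": "Hello, how are you?",
    "output": "Bonjour, comment allez-vous?",
}
print(format_instruction_example(example))
```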

## 7.5 Creating a Training Dataset - Demonstration

Let's create a small example dataset for fine-tuning a customer service bot:

Run the cell below:

In [None]:
# Customer service training examples
training_data = [
    {
        "instruction": "Respond to a customer inquiry about order status.",
        "input": "Where is my order? I ordered 3 days ago.",
        "output": "I understand you're eager to receive your order! Let me look that up for you. Could you please provide your order number? In the meantime, most orders ship within 2-3 business days and arrive within 5-7 days."
    },
    {
        "instruction": "Respond to a customer inquiry about returns.",
        "input": "I want to return this item, it doesn't fit.",
        "output": "I'm sorry to hear the item doesn't fit! We offer free returns within 30 days of purchase. You can initiate a return through your account or I can help you start one right now. Would you like an exchange for a different size instead?"
    },
    {
        "instruction": "Respond to a frustrated customer.",
        "input": "This is ridiculous! Your product broke after one week!",
        "output": "I sincerely apologize for this frustrating experience. A product failing after just one week is absolutely not acceptable, and I understand your disappointment. Let me make this right for you immediately. I can offer a full replacement or refund - which would you prefer?"
    },
    {
        "instruction": "Respond to a customer asking about a discount.",
        "input": "Do you have any discount codes available?",
        "output": "Great question! While I don't have specific codes at the moment, I can share a few ways to save: Sign up for our newsletter for 10% off your first order, check our sale section for up to 50% off, and follow us on social media for exclusive deals!"
    }
]

print(f"Created {len(training_data)} training examples")
print("\nExample:")
print(json.dumps(training_data[0], indent=2))

# Save to JSONL format (common for fine-tuning)
def save_jsonl(data, filename):
    with open(filename, 'w') as f:
        for item in data:
            f.write(json.dumps(item) + '\n')
    print(f"\nSaved to {filename}")

# Uncomment to save:
# save_jsonl(training_data, 'customer_service_training.jsonl')
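
Reading a JSONL file back is the mirror image of saving it: parse one JSON object per line. The sketch below writes a made-up record to a temporary file and loads it again to confirm the round trip. `load_jsonl` is a hypothetical counterpart to `save_jsonl`.

```python
import json
import os
import tempfile

def load_jsonl(filename):
    """Read one JSON object per non-empty line."""
    with open(filename, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Made-up record written to a temporary file
records = [{"instruction": "Say hi.", "input": "", "output": "Hi!"}]
path = os.path.join(tempfile.mkdtemp(), 'sample.jsonl')
with open(path, 'w', encoding='utf-8') as f:
    for item in records:
        f.write(json.dumps(item) + '\n')

loaded = load_jsonl(path)
print(loaded == records)
```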

## 7.6 Fine-tuning Configuration - Demonstration

Here's what a typical LoRA/QLoRA configuration looks like:

Run the cell below:

In [None]:
# LoRA Configuration (conceptual - actual implementation requires transformers/peft libraries)
lora_config = {
    "r": 16,                    # Rank of the update matrices
    "lora_alpha": 32,           # Scaling factor
    "target_modules": [         # Which layers to apply LoRA
        "q_proj",               # Query projection in attention
        "k_proj",               # Key projection
        "v_proj",               # Value projection
        "o_proj",               # Output projection
    ],
    "lora_dropout": 0.05,       # Dropout for regularization
    "bias": "none",             # Don't train bias terms
}

# Training Configuration
training_config = {
    "num_epochs": 3,
    "batch_size": 4,
    "learning_rate": 2e-4,
    "warmup_steps": 100,
    "gradient_accumulation_steps": 4,
    "max_seq_length": 512,
}

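# QLoRA additions (in practice these quantization settings configure how the
# base model is loaded, e.g. via bitsandbytes, rather than LoRA itself)
qlora_config = {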
    **lora_config,
    "load_in_4bit": True,       # Use 4-bit quantization
    "bnb_4bit_quant_type": "nf4",  # NormalFloat 4-bit
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_use_double_quant": True,  # Double quantization
}

print("LoRA Config:")
print(json.dumps(lora_config, indent=2))
print("\nTraining Config:")
print(json.dumps(training_config, indent=2))
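
The batch settings interact: with gradient accumulation, the optimizer updates once per `batch_size * gradient_accumulation_steps` examples. The sketch below estimates the number of optimizer steps, assuming a hypothetical dataset of 1,000 examples and that a partial final batch still counts as a step. Training libraries may handle that last batch differently.

```python
import math

def optimizer_steps(num_examples, batch_size, grad_accum_steps, num_epochs):
    """Return (effective batch size, total optimizer steps)."""
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * num_epochs

# 1,000 examples is an assumed dataset size; the other values match training_config
effective, total = optimizer_steps(1000, batch_size=4, grad_accum_steps=4, num_epochs=3)
print(f"Effective batch size: {effective}, total optimizer steps: {total}")
```

Comparing the total with `warmup_steps` is a useful sanity check: on a small dataset, 100 warmup steps can cover most of the training run.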

## 7.7 Ollama Modelfile for Custom Models - Demonstration

Ollama uses Modelfiles to create custom model variants:

Run the cell below:

In [None]:
# Example Ollama Modelfile content
modelfile_content = '''# Modelfile for a customer service assistant

FROM llama3.2

# Set the sampling temperature (lower values give more consistent responses)
PARAMETER temperature 0.7

# Set the system prompt
SYSTEM """
You are a friendly and professional customer service representative. 
Always be helpful, empathetic, and solution-oriented. 
If you don't know the answer, offer to connect the customer with a specialist.
Keep responses concise but warm.
"""

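# You can also set a custom template. It must match the base model's chat format;
# the one below follows the Llama 3 format. Omit TEMPLATE to inherit the base model's.
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>
"""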
'''

print("Example Ollama Modelfile:")
print(modelfile_content)

print("\nTo create this model, save as 'Modelfile' and run:")
print("  ollama create customer-service -f Modelfile")

## Exercise 7.1: Create Training Data

Create a small training dataset for a specific use case.

| Task | What to do |
|------|------------|
| Choose a use case | Pick something like: code assistant, recipe helper, fitness coach, etc. |
| Create 5 examples | Write 5 instruction/input/output training examples |
| Validate format | Ensure each example has all required fields |

**Hint:** Good training data is diverse and covers edge cases. Include both simple and complex examples.

In [None]:
# Your use case: _______________

my_training_data = [
    # Example 1
    {
        "instruction": None,
        "input": None,
        "output": None
    },
    # Example 2
    {
        "instruction": None,
        "input": None,
        "output": None
    },
    # Add more examples...
]

# Validate the data
def validate_training_data(data):
    required_keys = ['instruction', 'input', 'output']
    for i, example in enumerate(data):
        for key in required_keys:
            if key not in example or example[key] is None:
                print(f"Example {i+1} missing or has None for '{key}'")
                return False
    print(f"All {len(data)} examples are valid!")
    return True

validate_training_data(my_training_data)

## Exercise 7.2: Design a LoRA Configuration

Design a LoRA configuration for your use case.

| Parameter | Consideration |
|-----------|---------------|
| `r` (rank) | Higher = more capacity but more memory. Start with 8-16 |
| `lora_alpha` | Usually 2x the rank |
| `target_modules` | Which attention layers to adapt |
| `learning_rate` | 1e-4 to 3e-4 is typical for LoRA |

**Hint:** For simple tasks, lower rank works. For complex tasks requiring nuanced understanding, use higher rank.

In [None]:
# Design your LoRA config
my_lora_config = {
    "r": None,              # Choose: 4, 8, 16, 32, or 64
    "lora_alpha": None,     # Usually 2x r
    "target_modules": None, # List of module names
    "lora_dropout": None,   # 0.0 to 0.1 typical
}

my_training_config = {
    "num_epochs": None,         # 1-5 typical for fine-tuning
    "batch_size": None,         # 2, 4, or 8 depending on GPU memory
    "learning_rate": None,      # 1e-4 to 3e-4
    "warmup_ratio": None,       # 0.03 to 0.1
}

# Print your configuration
print("Your LoRA Config:")
print(json.dumps(my_lora_config, indent=2))
print("\nYour Training Config:")
print(json.dumps(my_training_config, indent=2))

---
# Lab Complete!

## Summary

You learned:
- **Basic Generation**: Use `ollama.generate()` for text completion
- **Prompt Engineering**: Role-based prompts, structured output, JSON responses
- **Parameters**: Control creativity with temperature
- **Chat API**: Multi-turn conversations with `ollama.chat()`
- **Applications**: Build summarizers, sentiment analyzers, and Q&A bots
- **RAG**: Implement retrieval-augmented generation with embeddings
- **Fine-tuning**: Understand LoRA/QLoRA for efficient model adaptation

## Quick Reference

```python
# Basic generation
response = ollama.generate(model='llama3.2', prompt='...')
text = response['response']

# Embeddings for RAG
embedding = ollama.embed(model='llama3.2', input='text')['embeddings'][0]

# Chat with history
messages = [
    {'role': 'system', 'content': 'You are...'},
    {'role': 'user', 'content': 'Hello!'}
]
response = ollama.chat(model='llama3.2', messages=messages)
```

## Next Steps

- Implement a production RAG system with a vector database (Chroma, FAISS)
- Try fine-tuning with Hugging Face's PEFT library
- Explore multi-modal models for image understanding
- Build agent systems with tool use