# 📚 Generating Knowledge Base Articles from Support Tickets

## What You'll Learn

In this notebook, you'll learn how to:
- Transform messy support ticket notes into professional knowledge base articles
- Use OpenAI's API to automate documentation creation
- Validate and quality-check generated content
- Generate multiple formats (KB articles, runbooks, email templates) from the same source
- Track and optimize API costs

## The Business Problem

Support teams face a recurring challenge:

**🔴 The Problem:**
- Support agents solve the same issues repeatedly
- Knowledge is trapped in ticket notes (messy, inconsistent, full of jargon)
- Ticket notes are written for technicians, not end-users
- Users can't self-serve, creating high ticket volume
- Manual KB article writing is time-consuming and often gets deprioritized

**🟢 The Solution:**
- LLMs can transform technical ticket notes into polished, user-friendly guides
- Automated generation makes it feasible to document every common issue
- Users get self-service resources, reducing ticket volume
- Support teams can focus on complex issues instead of repetitive ones

## Key Concepts

- **Knowledge Base (KB)**: A self-service library of articles that help users solve problems independently
- **Support Ticket**: A record of a user's issue and how it was resolved
- **Documentation Automation**: Using AI to convert technical notes into user-friendly documentation

## Cost Efficiency

💡 **Key Point**: We'll use `gpt-5-nano` for all examples in this notebook. Generating one KB article typically uses roughly 500-800 tokens, which keeps the cost of documenting an entire knowledge base low.
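
To sanity-check a token figure like this, a back-of-the-envelope conversion from words to tokens can help. The sketch below assumes the common rule of thumb of about 0.75 English words per token. `WORDS_PER_TOKEN`, `estimate_tokens`, and `estimate_request_tokens` are illustrative names, not part of the OpenAI SDK, and real counts depend on the tokenizer and on any reasoning tokens the model produces.

```python
import math

# Assumption: roughly 0.75 English words per token (a rule of thumb, not an exact figure).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Roughly convert a word count into a token count, rounding up."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


def estimate_request_tokens(prompt_words: int, article_words: int) -> int:
    """Estimate total tokens for one request: prompt (input) plus article (output)."""
    return estimate_tokens(prompt_words) + estimate_tokens(article_words)


if __name__ == "__main__":
    # Print the estimate for a few article lengths.
    for words in (300, 450, 600):
        print(f"{words} words -> about {estimate_tokens(words)} tokens")
```

Under this assumption, a 400-600 word article corresponds to roughly 530-800 output tokens, and the prompt adds its own input tokens on top of that.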


---
## 🔧 Setup

### Install Required Dependencies

In [None]:
# Install required packages
!pip install -q openai tqdm

### Import Libraries

In [None]:
import os
import csv
import json
from datetime import datetime
from openai import OpenAI
from tqdm import tqdm

### Configure OpenAI API Key

💡 **Two methods to set your API key:**
1. **Recommended**: Store in Colab secrets (🔑 icon in left sidebar)
2. **Fallback**: Manual input when prompted

In [None]:
# Configure OpenAI API key
# Method 1: Try to get API key from Colab secrets (recommended)
try:
    from google.colab import userdata
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
    print("✅ API key loaded from Colab secrets")
except Exception:
    # Method 2: Manual input (fallback when not in Colab or the secret is missing)
    from getpass import getpass
    print("💡 To use Colab secrets: Go to 🔑 (left sidebar) → Add new secret → Name: OPENAI_API_KEY")
    OPENAI_API_KEY = getpass("Enter your OpenAI API Key: ")

# Validate that the API key is set before using it
# (assigning None to os.environ would raise a TypeError)
if not OPENAI_API_KEY or not OPENAI_API_KEY.strip():
    raise ValueError("❌ ERROR: No API key provided!")

# Set the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

print("✅ Authentication configured!")

# Configure which OpenAI model to use
OPENAI_MODEL = "gpt-5-nano"  # Using gpt-5-nano for cost efficiency
print(f"🤖 Selected Model: {OPENAI_MODEL}")

# Initialize OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

---
## 🔍 Understanding the Problem

Let's see the difference between ticket notes and knowledge base articles.

### Example: Password Reset Issue

**📝 TICKET NOTES (written by technician for technicians):**
```
User locked out. AD sync delay. Forced sync. Reset in portal. Fixed. 15min wait important.
```

**📄 KB ARTICLE (written for end-users):**

---
### How to Unlock Your Account After Multiple Failed Login Attempts

**Problem**: You've entered your password incorrectly several times and now your account is locked.

**Solution**:

1. **Wait 15 minutes** - Your account will automatically unlock after 15 minutes
2. **Don't keep trying to log in** - Additional login attempts will reset the timer
3. **Use the password reset portal** if you've forgotten your password:
   - Go to https://portal.company.com/reset
   - Enter your username
   - Follow the email instructions
4. **Wait for synchronization** - After resetting, wait 2-3 minutes before trying to log in

**⚠️ Common Mistakes:**
- Trying to log in immediately after password reset (wait 2-3 minutes)
- Continuing to attempt logins during the lockout period (this extends the lockout)

**Still having issues?** Contact the helpdesk at ext. 5555

---

### Key Differences

| Ticket Notes | KB Article |
|--------------|------------|
| Abbreviations (AD, sync) | Full terms explained |
| Technical jargon | User-friendly language |
| Missing steps | Complete step-by-step |
| Assumes knowledge | Assumes no prior knowledge |
| Quick reference | Comprehensive guide |

💡 **Key Point**: LLMs bridge this gap by expanding abbreviations, adding context, structuring information, and adjusting tone for the target audience.
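
Models usually expand well-known abbreviations on their own, but in-house shorthand can be ambiguous. One possible aid, sketched below, is to detect known abbreviations in the ticket notes and pass their meanings along as a prompt fragment. The `GLOSSARY` contents and both helper functions are hypothetical and not part of this notebook's pipeline.

```python
import re

# Hypothetical in-house glossary; replace with your organisation's own terms.
GLOSSARY = {
    "AD": "Active Directory",
    "FW": "firewall",
    "DHCP": "Dynamic Host Configuration Protocol",
    "VPN": "virtual private network",
}


def find_abbreviations(notes: str, glossary: dict[str, str]) -> dict[str, str]:
    """Return glossary entries whose abbreviation appears as a whole word in the notes."""
    found = {}
    for abbreviation, meaning in glossary.items():
        # \b keeps "AD" from matching inside longer words such as "ADD".
        if re.search(rf"\b{re.escape(abbreviation)}\b", notes):
            found[abbreviation] = meaning
    return found


def build_glossary_hint(notes: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Build a short prompt fragment telling the model how to expand each abbreviation."""
    found = find_abbreviations(notes, glossary)
    if not found:
        return ""
    lines = [f"- {abbr} means {meaning}" for abbr, meaning in found.items()]
    return "Expand these abbreviations on first use:\n" + "\n".join(lines)


if __name__ == "__main__":
    notes = "User locked out. AD sync delay. Forced sync. Reset in portal. Fixed. 15min wait important."
    print(build_glossary_hint(notes))
```

The returned string could be appended to a generation prompt; when no known abbreviation is present it is empty, so the prompt stays unchanged.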


---
## 🎲 Mock Data Generation

We'll create realistic support ticket data programmatically to simulate a real helpdesk environment.

In [None]:
def generate_mock_tickets():
    """
    Generate diverse support tickets covering common IT issues.
    Returns a list of ticket dictionaries.
    """
    tickets = [
        {
            "ticket_id": "T001",
            "category": "Password_Account",
            "issue_summary": "User account locked after failed login attempts",
            "short_notes": "User locked out. AD sync delay. Forced sync. Reset in portal. Fixed. 15min wait important.",
            "solution_steps": "1. Wait 15min for auto-unlock 2. Use reset portal 3. Wait 2-3min after reset 4. Try login",
            "common_mistakes": "Trying login immediately after reset, Multiple attempts during lockout",
            "keywords": "account locked, password reset, login failed, AD sync"
        },
        {
            "ticket_id": "T002",
            "category": "VPN_Remote_Access",
            "issue_summary": "VPN client won't connect from home",
            "short_notes": "VPN stuck connecting. FW blocking port 443. Changed to port 1194. Works now. Check router settings.",
            "solution_steps": "1. Open VPN client 2. Settings > Advanced 3. Change port 443 to 1194 4. Reconnect 5. Check router FW if still fails",
            "common_mistakes": "Not restarting VPN after settings change, Home router blocking VPN ports",
            "keywords": "VPN connection, remote access, firewall, port blocking"
        },
        {
            "ticket_id": "T003",
            "category": "Email_Outlook",
            "issue_summary": "Outlook not syncing emails on mobile device",
            "short_notes": "Mobile not syncing. Cache issue. Removed acct, re-added. Modern auth required. Works.",
            "solution_steps": "1. Remove email account 2. Clear Outlook cache 3. Re-add account 4. Enable modern auth when prompted",
            "common_mistakes": "Not clearing cache before re-adding, Using old password",
            "keywords": "Outlook mobile, email sync, modern authentication, cache"
        },
        {
            "ticket_id": "T004",
            "category": "Printing",
            "issue_summary": "Printer shows offline but is powered on",
            "short_notes": "Printer offline. IP conflict. Released old lease. Renewed DHCP. Reset print spooler. Fixed.",
            "solution_steps": "1. Check printer IP 2. Release/renew DHCP 3. Restart print spooler service 4. Re-add printer if needed",
            "common_mistakes": "Not checking IP conflict, Wrong printer driver",
            "keywords": "printer offline, IP conflict, DHCP, print spooler"
        },
        {
            "ticket_id": "T005",
            "category": "Software_Installation",
            "issue_summary": "Software installation fails with error 1603",
            "short_notes": "Install fails. Prev version remnants. Used cleanup tool. Rebooted. Install success.",
            "solution_steps": "1. Uninstall old version 2. Run cleanup tool 3. Reboot system 4. Install new version as admin",
            "common_mistakes": "Not running as administrator, Skipping reboot",
            "keywords": "installation error, error 1603, software install, cleanup"
        }
    ]
    
    return tickets

# Generate tickets
print("🎲 Generating mock support tickets...")
tickets = generate_mock_tickets()
print(f"✅ Generated {len(tickets)} support tickets\n")

# Save to CSV
csv_file = "/content/sample_tickets.csv"
fieldnames = ["ticket_id", "category", "issue_summary", "short_notes", "solution_steps", "common_mistakes", "keywords"]

with open(csv_file, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(tickets)

print(f"💾 Saved tickets to: {csv_file}\n")

# Display sample tickets
print("📋 Sample Tickets:\n")
print("=" * 80)
for ticket in tickets[:3]:
    print(f"🎫 Ticket ID: {ticket['ticket_id']}")
    print(f"   Category: {ticket['category']}")
    print(f"   Issue: {ticket['issue_summary']}")
    print(f"   Notes: {ticket['short_notes']}")
    print("=" * 80)

print(f"\n✅ CSV file created successfully with {len(tickets)} tickets")

---
## 📝 Basic KB Article Generation

### Part A: Generate a Single Article

Let's start by generating one KB article from a ticket.

In [None]:
# Select a ticket to work with
selected_ticket = tickets[0]  # Password/Account issue

print(f"📋 Selected Ticket: {selected_ticket['ticket_id']}")
print(f"Issue: {selected_ticket['issue_summary']}\n")

In [None]:
# Create the prompt for KB article generation
def create_kb_prompt(ticket):
    """
    Create a detailed prompt for generating a user-friendly KB article.
    """
    prompt = f"""You are a technical writer creating a knowledge base article for end-users (non-technical audience).

Based on this support ticket information:

Issue: {ticket['issue_summary']}
Technician Notes: {ticket['short_notes']}
Solution Steps: {ticket['solution_steps']}
Common Mistakes: {ticket['common_mistakes']}

Write a professional, user-friendly knowledge base article following these requirements:

1. Start with a clear, descriptive title (use ##)
2. Begin with a brief problem statement that reflects common user frustration
3. Provide step-by-step instructions with clear numbering
4. Expand all abbreviations and technical terms
5. Add a troubleshooting section for common issues
6. Include warnings about common mistakes users make
7. Use placeholders like [Screenshot: description] where helpful
8. Target length: 400-600 words (about one page)
9. Use a professional but friendly tone
10. End with a "Still need help?" section

Write the article in markdown format."""
    
    return prompt

# Generate the KB article
print("🤖 Generating KB article...\n")

prompt = create_kb_prompt(selected_ticket)

# Make API call
response = client.responses.create(
    model=OPENAI_MODEL,
    input=prompt,
    text={
        "verbosity": "medium",
        "format": {"type": "text"}
    },
    reasoning={
        "effort": "minimal"
    }
)

kb_article = response.output_text
tokens_used = response.usage.input_tokens + response.usage.output_tokens

print("✅ Article generated!\n")
print(f"📊 Tokens used: {tokens_used}\n")
print("=" * 80)
print(kb_article)
print("=" * 80)

In [None]:
# Save the article to a file
os.makedirs("/content/kb_articles", exist_ok=True)

filename = f"/content/kb_articles/KB_{selected_ticket['ticket_id']}_{selected_ticket['category']}.md"

with open(filename, 'w', encoding='utf-8') as f:
    f.write(kb_article)

print(f"💾 Article saved to: {filename}")

### Part B: Before/After Comparison

Let's see the transformation side-by-side:

In [None]:
print("🔄 TRANSFORMATION COMPARISON\n")
print("=" * 80)
print("BEFORE: Original Ticket Notes")
print("=" * 80)
print(f"Ticket ID: {selected_ticket['ticket_id']}")
print(f"Issue: {selected_ticket['issue_summary']}")
print(f"\nTechnician Notes:\n{selected_ticket['short_notes']}")
print(f"\nSolution: {selected_ticket['solution_steps']}")
print("\n" + "=" * 80)
print("AFTER: Generated KB Article")
print("=" * 80)
print(kb_article)
print("=" * 80)

print("\n💡 Key Improvements:")
print("  ✅ Abbreviations expanded (AD → Active Directory)")
print("  ✅ Steps clearly numbered and detailed")
print("  ✅ User-friendly language (no technical jargon)")
print("  ✅ Common mistakes highlighted")
print("  ✅ Professional structure with troubleshooting section")

---
## 🔁 Batch Processing

Now let's process all tickets and generate KB articles for each one.

In [None]:
def process_ticket_batch(tickets, model=OPENAI_MODEL):
    """
    Process multiple tickets and generate KB articles for each.
    
    Args:
        tickets: List of ticket dictionaries
        model: OpenAI model to use
    
    Returns:
        Dictionary with statistics and results
    """
    results = {
        "total_processed": 0,
        "successful": 0,
        "failed": 0,
        "total_tokens": 0,
        "files_created": [],
        "failures": []
    }
    
    # Create output directory
    os.makedirs("/content/kb_articles", exist_ok=True)
    
    # Process each ticket with progress bar
    for ticket in tqdm(tickets, desc="Generating KB articles"):
        results["total_processed"] += 1
        
        try:
            # Create prompt
            prompt = create_kb_prompt(ticket)
            
            # Make API call
            response = client.responses.create(
                model=model,
                input=prompt,
                text={
                    "verbosity": "medium",
                    "format": {"type": "text"}
                },
                reasoning={
                    "effort": "minimal"
                }
            )
            
            article = response.output_text
            tokens = response.usage.input_tokens + response.usage.output_tokens
            results["total_tokens"] += tokens
            
            # Save article
            filename = f"/content/kb_articles/KB_{ticket['ticket_id']}_{ticket['category']}.md"
            with open(filename, 'w', encoding='utf-8') as f:
                f.write(article)
            
            results["successful"] += 1
            results["files_created"].append(filename)
            
        except Exception as e:
            results["failed"] += 1
            results["failures"].append({"ticket_id": ticket['ticket_id'], "error": str(e)})
            print(f"\n❌ Failed to process {ticket['ticket_id']}: {str(e)}")
    
    return results

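# Estimate cost BEFORE processing
# Assumption: a blended example rate of $0.225 per 1M tokens (input and output combined);
# check current pricing for your model before budgeting with it
tickets_to_process = tickets[:5]
avg_tokens_per_article = 600
estimated_total_tokens = avg_tokens_per_article * len(tickets_to_process)
estimated_cost = (estimated_total_tokens / 1_000_000) * 0.225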

print("💰 COST ESTIMATE (before processing):")
print("=" * 80)
print(f"Articles to generate: {len(tickets_to_process)}")
print(f"Estimated tokens (avg {avg_tokens_per_article} per article): {estimated_total_tokens:,}")
print(f"Estimated cost: ${estimated_cost:.4f}")
print("=" * 80)
print()

# Process tickets
print("🚀 Starting batch processing...\n")
batch_results = process_ticket_batch(tickets_to_process)

print("\n" + "=" * 80)
print("📊 BATCH PROCESSING RESULTS")
print("=" * 80)
print(f"Total tickets processed: {batch_results['total_processed']}")
print(f"✅ Successful: {batch_results['successful']}")
print(f"❌ Failed: {batch_results['failed']}")
print(f"🎯 Total tokens used: {batch_results['total_tokens']}")
print(f"💰 Average tokens per article: {batch_results['total_tokens'] // batch_results['successful'] if batch_results['successful'] > 0 else 0}")

# Actual cost, using the same assumed blended rate of $0.225 per 1M tokens as the estimate above
actual_cost = (batch_results['total_tokens'] / 1_000_000) * 0.225
print(f"💵 Actual cost: ${actual_cost:.4f}")

print("\n📁 Files created:")
for file in batch_results['files_created']:
    print(f"  ✅ {file}")

if batch_results['failures']:
    print("\n⚠️ Failures:")
    for failure in batch_results['failures']:
        print(f"  ❌ {failure['ticket_id']}: {failure['error']}")

In [None]:
# Generate an index file listing all articles
index_content = "# Knowledge Base Article Index\n\n"
index_content += f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"
index_content += f"Total articles: {batch_results['successful']}\n\n"
index_content += "## Articles by Category\n\n"

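# Group by category, skipping tickets whose article was not generated in the batch run
created_files = set(batch_results['files_created'])
categories = {}
for ticket in tickets:
    article_path = f"/content/kb_articles/KB_{ticket['ticket_id']}_{ticket['category']}.md"
    if article_path not in created_files:
        continue  # Avoid linking to a file that does not exist
    categories.setdefault(ticket['category'], []).append(ticket)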

# Write index
for category, cat_tickets in sorted(categories.items()):
    index_content += f"### {category.replace('_', ' ')}\n\n"
    for ticket in cat_tickets:
        filename = f"KB_{ticket['ticket_id']}_{ticket['category']}.md"
        index_content += f"- [{ticket['issue_summary']}]({filename})\n"
    index_content += "\n"

# Save index
with open("/content/kb_articles/index.md", 'w', encoding='utf-8') as f:
    f.write(index_content)

print("✅ Created index file: /content/kb_articles/index.md")

---
## ✅ Quality Checks

Before publishing KB articles, we need to validate they meet minimum quality standards.

### Theory: Why Validate?

LLMs are powerful but not perfect. They can:
- Generate articles that are too short or too verbose
- Miss important sections
- Use overly technical language
- Hallucinate information

Basic validation catches these issues before users see them.
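
The checks in the next cell cover length, structure, and readability, but not hallucination. One rough heuristic, sketched below, is to flag numbers (ports, wait times, phone extensions) that appear in the article but nowhere in the source ticket. `extract_numbers` and `find_unsupported_numbers` are hypothetical helpers; a flagged number is not necessarily wrong, only a prompt for human review.

```python
import re


def extract_numbers(text: str) -> set[str]:
    """Collect every run of digits in the text (ports, wait times, extensions, ...)."""
    return set(re.findall(r"\d+", text))


def strip_list_markers(text: str) -> str:
    """Remove leading list numbering such as '1. ' or '2) ' so step numbers are not flagged."""
    return re.sub(r"^[ \t]*\d+[.)][ \t]+", "", text, flags=re.MULTILINE)


def find_unsupported_numbers(article_text: str, ticket: dict) -> set[str]:
    """Return numbers that appear in the article but in none of the ticket fields."""
    source_text = " ".join(str(value) for value in ticket.values())
    article_numbers = extract_numbers(strip_list_markers(article_text))
    return article_numbers - extract_numbers(source_text)


if __name__ == "__main__":
    ticket = {
        "short_notes": "VPN stuck connecting. FW blocking port 443. Changed to port 1194.",
        "solution_steps": "1. Open VPN client 2. Change port 443 to 1194",
    }
    article = (
        "1. Open the VPN client.\n"
        "2. Change the port from 443 to 1194.\n"
        "3. If it still fails, try port 8080."
    )
    # Port 8080 never appears in the ticket, so it is flagged for review.
    print(find_unsupported_numbers(article, ticket))
```

A check like this only catches invented figures. Invented steps or product names would need a different approach, such as a reviewer comparing the article against the ticket.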

In [None]:
import re

def check_article_length(article_text):
    """
    Check 1: Validate article length (300-700 words)
    """
    words = article_text.split()
    word_count = len(words)
    
    status = "✅ PASS"
    if word_count < 300:
        status = "❌ FAIL: Too short"
    elif word_count > 700:
        status = "⚠️ WARNING: Too long"
    
    return {
        "check": "Length Validation",
        "word_count": word_count,
        "status": status,
        "details": f"{word_count} words (target: 300-700)"
    }

def check_article_structure(article_text):
    """
    Check 2: Validate article has required sections
    """
    checks = {
        "has_title": bool(re.search(r'^#{1,3}\s+.+', article_text, re.MULTILINE)),
        "has_numbered_steps": bool(re.search(r'^\d+\.\s+', article_text, re.MULTILINE)),
        # Allow markdown bold between the label and the colon, e.g. "**Problem**:"
        "has_problem_statement": bool(re.search(r'(problem|issue|symptom)s?\**\s*:', article_text, re.IGNORECASE)),
        "has_troubleshooting": bool(re.search(r'(troubleshoot|common (mistake|issue|problem)|still (having|need))', article_text, re.IGNORECASE))
    }
    
    missing = [k.replace('has_', '').replace('_', ' ').title() for k, v in checks.items() if not v]
    
    status = "✅ PASS" if not missing else f"⚠️ WARNING: Missing {len(missing)} section(s)"
    
    return {
        "check": "Structure Validation",
        "status": status,
        "sections_found": sum(checks.values()),
        "sections_missing": missing,
        "details": f"{sum(checks.values())}/4 required sections found"
    }

def check_readability(article_text):
    """
    Check 3: Basic readability checks
    """
    sentences = re.split(r'[.!?]+', article_text)
    sentences = [s.strip() for s in sentences if s.strip()]
    
    # Calculate average sentence length
    words = article_text.split()
    avg_sentence_length = len(words) / len(sentences) if sentences else 0
    
    # Check for action verbs (good for instructions)
    action_verbs = ['click', 'open', 'navigate', 'select', 'enter', 'type', 'go to', 'press', 'check']
    has_action_verbs = any(verb in article_text.lower() for verb in action_verbs)
    
    # Check for excessive technical jargon (basic check)
    # Count every whole-word occurrence, not just whether each term appears at least once
    tech_terms = ['API', 'DNS', 'DHCP', 'LDAP', 'SQL', 'regex', 'backend', 'frontend']
    jargon_count = sum(len(re.findall(rf'\b{re.escape(term)}\b', article_text)) for term in tech_terms)
    jargon_density = jargon_count / len(words) * 100 if words else 0
    
    issues = []
    if avg_sentence_length > 25:
        issues.append("Long sentences")
    if not has_action_verbs:
        issues.append("Missing action verbs")
    if jargon_density > 2:
        issues.append("High jargon density")
    
    status = "✅ PASS" if not issues else f"⚠️ WARNING: {', '.join(issues)}"
    
    return {
        "check": "Readability Check",
        "status": status,
        "avg_sentence_length": round(avg_sentence_length, 1),
        "has_action_verbs": has_action_verbs,
        "jargon_density": round(jargon_density, 2),
        "details": f"Avg sentence: {round(avg_sentence_length, 1)} words, Jargon: {round(jargon_density, 2)}%"
    }

def validate_article(article_text, ticket_id):
    """
    Run all validation checks on an article
    """
    validations = {
        "ticket_id": ticket_id,
        "timestamp": datetime.now().isoformat(),
        "checks": [
            check_article_length(article_text),
            check_article_structure(article_text),
            check_readability(article_text)
        ]
    }
    
    # Overall status
    fail_count = sum(1 for check in validations['checks'] if 'FAIL' in check['status'])
    warning_count = sum(1 for check in validations['checks'] if 'WARNING' in check['status'])
    
    if fail_count > 0:
        validations['overall_status'] = "❌ FAILED"
    elif warning_count > 0:
        validations['overall_status'] = "⚠️ NEEDS REVIEW"
    else:
        validations['overall_status'] = "✅ PASSED"
    
    return validations

print("🔍 Running quality checks on generated articles...\n")

In [None]:
# Validate a few sample articles
validation_reports = []

# Select 3 articles to validate
articles_to_validate = tickets[:3]

for ticket in articles_to_validate:
    filename = f"/content/kb_articles/KB_{ticket['ticket_id']}_{ticket['category']}.md"
    
    try:
        with open(filename, 'r', encoding='utf-8') as f:
            article_text = f.read()
        
        validation = validate_article(article_text, ticket['ticket_id'])
        validation_reports.append(validation)
        
        # Print report
        print("=" * 80)
        print(f"📄 Article: {ticket['ticket_id']} - {ticket['issue_summary']}")
        print(f"Overall Status: {validation['overall_status']}")
        print("-" * 80)
        
        for check in validation['checks']:
            print(f"\n{check['check']}: {check['status']}")
            print(f"  Details: {check['details']}")
        
        print("=" * 80 + "\n")
        
    except FileNotFoundError:
        print(f"❌ File not found: {filename}\n")

# Save validation report
report_file = "/content/validation_report.json"
with open(report_file, 'w', encoding='utf-8') as f:
    json.dump(validation_reports, f, indent=2)

print(f"💾 Validation report saved to: {report_file}")

---
## 🎨 Enhancement: Multi-Format Generation

The same ticket can be transformed into different formats for different audiences:
- **End-User KB Article**: Simple, patient, step-by-step
- **Internal Runbook**: Technical, for support staff
- **Email Template**: Concise response to send to users
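
The three formats differ mainly in audience, tone, length, and which ticket fields they use, so one option is to describe each format as data and build the prompt from that description. The sketch below is an illustration, not this notebook's method: `FormatSpec`, `FORMAT_SPECS`, and `build_format_prompt` are hypothetical names, and the prompt wording is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatSpec:
    """Describes one output format: who reads it, how it sounds, how long it is."""
    audience: str
    tone: str
    target_words: int
    ticket_fields: tuple[str, ...]


# Illustrative specs mirroring the three formats listed above.
FORMAT_SPECS = {
    "kb_article": FormatSpec(
        "end-users", "simple, patient, step-by-step", 500,
        ("issue_summary", "short_notes", "solution_steps"),
    ),
    "runbook": FormatSpec(
        "IT support staff", "technical and precise", 400,
        ("issue_summary", "short_notes", "solution_steps", "common_mistakes"),
    ),
    "email_template": FormatSpec(
        "the user who reported the issue", "concise, professional, friendly", 150,
        ("issue_summary", "solution_steps"),
    ),
}


def build_format_prompt(ticket: dict, format_type: str) -> str:
    """Assemble a prompt for one format from its spec and the selected ticket fields."""
    spec = FORMAT_SPECS[format_type]
    label = format_type.replace("_", " ")
    details = "\n".join(f"{field}: {ticket[field]}" for field in spec.ticket_fields)
    return (
        f"Create a {label} for {spec.audience}.\n\n"
        f"{details}\n\n"
        f"Tone: {spec.tone}.\n"
        f"Target length: ~{spec.target_words} words."
    )


if __name__ == "__main__":
    demo = {
        "issue_summary": "VPN client won't connect from home",
        "short_notes": "VPN stuck connecting. FW blocking port 443.",
        "solution_steps": "1. Open VPN client 2. Change port 443 to 1194",
        "common_mistakes": "Not restarting VPN after settings change",
    }
    print(build_format_prompt(demo, "email_template"))
```

With this layout, adding a fourth format means adding one entry to `FORMAT_SPECS` rather than writing another prompt string by hand.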

In [None]:
# Create output directory for different formats
os.makedirs("/content/formats", exist_ok=True)

# Select a ticket for multi-format generation
demo_ticket = tickets[1]  # VPN issue

print(f"🎯 Generating multiple formats for: {demo_ticket['issue_summary']}\n")

# Format 1: End-User KB Article
kb_prompt = f"""Create a user-friendly knowledge base article for end-users about:

Issue: {demo_ticket['issue_summary']}
Technical Notes: {demo_ticket['short_notes']}
Solution: {demo_ticket['solution_steps']}

Write in simple language, include step-by-step instructions, and use a patient, helpful tone.
Target length: ~500 words."""

# Format 2: Internal Runbook
runbook_prompt = f"""Create a technical runbook for support staff about:

Issue: {demo_ticket['issue_summary']}
Notes: {demo_ticket['short_notes']}
Solution: {demo_ticket['solution_steps']}
Common Mistakes: {demo_ticket['common_mistakes']}

Include technical details, troubleshooting commands, root cause analysis.
Use technical language appropriate for IT support staff.
Target length: ~400 words."""

# Format 3: Email Template
email_prompt = f"""Create a concise email template to send to a user who reported:

Issue: {demo_ticket['issue_summary']}
Solution: {demo_ticket['solution_steps']}

Write a professional, friendly email that:
- Acknowledges their issue
- Provides clear steps to resolve
- Offers further assistance
Target length: ~150 words."""

prompts = {
    "kb_article": kb_prompt,
    "runbook": runbook_prompt,
    "email_template": email_prompt
}

generated_formats = {}

# Generate all three formats
for format_type, prompt in prompts.items():
    print(f"📝 Generating {format_type.replace('_', ' ')}...")
    
    response = client.responses.create(
        model=OPENAI_MODEL,
        input=prompt,
        text={
            "verbosity": "medium",
            "format": {"type": "text"}
        },
        reasoning={
            "effort": "minimal"
        }
    )
    
    content = response.output_text
    generated_formats[format_type] = content
    
    # Save to file
    filename = f"/content/formats/{format_type}_{demo_ticket['ticket_id']}.md"
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(content)
    
    print(f"  ✅ Saved to {filename}")

print("\n✅ All formats generated!\n")

In [None]:
# Display all three formats side-by-side
print("📊 MULTI-FORMAT COMPARISON\n")
print("=" * 80)

for format_type, content in generated_formats.items():
    print(f"\n{'='*80}")
    print(f"FORMAT: {format_type.replace('_', ' ').upper()}")
    print(f"{'='*80}")
    print(content)
    print(f"\nWord count: {len(content.split())} words")

print("\n" + "=" * 80)
print("\n💡 Key Insight: Same source data → Different outputs based on audience and purpose!")

---
## 💰 Cost Tracking & Optimization

Let's analyze the costs and explore optimization strategies.

In [None]:
def track_generation_cost(total_tokens, model="gpt-5-nano"):
    """
    Calculate costs based on token usage.
    
    Note: These are example rates for illustration; adjust them to current model pricing.
    - gpt-5-nano: $0.000225 per 1K tokens (the same blended $0.225 per 1M tokens
      used for the batch cost estimate earlier in this notebook)
    """
    # Pricing per 1K tokens (example rates)
    pricing = {
        "gpt-5-nano": 0.000225,
        "gpt-4": 0.03,  # For comparison
        "gpt-3.5-turbo": 0.002  # For comparison
    }
    
    # Unknown models fall back to the gpt-5-nano example rate
    rate = pricing.get(model, pricing["gpt-5-nano"])
    cost = (total_tokens / 1000) * rate
    
    return cost

# Calculate costs for our batch processing
total_tokens = batch_results['total_tokens']
articles_generated = batch_results['successful']

print("💰 COST ANALYSIS\n")
print("=" * 80)
print(f"Total articles generated: {articles_generated}")
print(f"Total tokens used: {total_tokens:,}")
print(f"Average tokens per article: {total_tokens // articles_generated if articles_generated > 0 else 0}")
print("\nCost with gpt-5-nano:")
nano_cost = track_generation_cost(total_tokens, "gpt-5-nano")
print(f"  ${nano_cost:.4f}")
print(f"  (${nano_cost/articles_generated:.6f} per article)" if articles_generated > 0 else "")

# Show comparison with other models
print("\n📊 Cost Comparison (for same token usage):")
for model in ["gpt-5-nano", "gpt-3.5-turbo", "gpt-4"]:
    cost = track_generation_cost(total_tokens, model)
    print(f"  {model:20} ${cost:.4f}")

print("=" * 80)

In [None]:
# Cost projection for larger scale
print("\n📈 COST PROJECTION\n")
print("=" * 80)

avg_tokens = total_tokens // articles_generated if articles_generated > 0 else 600
scales = [50, 100, 500, 1000]

print(f"Assuming {avg_tokens} tokens per article:\n")
print(f"{'Articles':<15} {'Total Tokens':<20} {'Cost (gpt-5-nano)'}")
print("-" * 60)

for scale in scales:
    projected_tokens = avg_tokens * scale
    projected_cost = track_generation_cost(projected_tokens, "gpt-5-nano")
    print(f"{scale:<15} {projected_tokens:<20,} ${projected_cost:.2f}")

print("=" * 80)

### 💡 Cost Optimization Tips

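1. **Use the right model**: `gpt-5-nano` is well suited to draft generation and is far cheaper per token than larger models
2. **Cap output length**: Set `max_output_tokens` (the Responses API parameter) to a realistic limit, for example around 800 for a ~500-word article, leaving headroom because reasoning tokens also count toward the cap
3. **Batch processing**: For high volumes that don't need immediate results, consider OpenAI's Batch API, which processes requests asynchronously at a discounted rate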
4. **Cache common sections**: Store and reuse troubleshooting templates when applicable
5. **Filter before generating**: Only generate articles for frequently occurring issues
6. **Progressive enhancement**: Generate basic articles with gpt-5-nano, use GPT-4 only for critical/complex topics
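
Tip 5 can be sketched with a simple frequency count. The example assumes a historical ticket log with one entry per incident and the same `category` field used in this notebook. The helper names and the `min_count` threshold of 3 are arbitrary illustrations.

```python
from collections import Counter


def select_frequent_categories(tickets: list[dict], min_count: int = 3) -> list[str]:
    """Return categories that occur at least min_count times, most frequent first."""
    counts = Counter(ticket["category"] for ticket in tickets)
    return [category for category, count in counts.most_common() if count >= min_count]


def pick_representative_tickets(tickets: list[dict], min_count: int = 3) -> list[dict]:
    """Keep the first ticket seen for each frequent category."""
    frequent = set(select_frequent_categories(tickets, min_count))
    chosen: dict[str, dict] = {}
    for ticket in tickets:
        category = ticket["category"]
        if category in frequent and category not in chosen:
            chosen[category] = ticket
    return list(chosen.values())


if __name__ == "__main__":
    # Made-up ticket log for illustration only.
    history = [
        {"ticket_id": "H1", "category": "VPN_Remote_Access"},
        {"ticket_id": "H2", "category": "Password_Account"},
        {"ticket_id": "H3", "category": "Printing"},
        {"ticket_id": "H4", "category": "VPN_Remote_Access"},
        {"ticket_id": "H5", "category": "Password_Account"},
        {"ticket_id": "H6", "category": "VPN_Remote_Access"},
        {"ticket_id": "H7", "category": "Password_Account"},
        {"ticket_id": "H8", "category": "Password_Account"},
    ]
    for ticket in pick_representative_tickets(history):
        print(ticket["ticket_id"], ticket["category"])
```

Only the selected tickets would then be passed to the generation step, so one-off issues never incur API cost.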

---
## 📚 Best Practices & Key Takeaways

### Best Practices Summary

✅ **Do:**
- Keep articles focused (one issue per article)
- Validate article quality before publishing
- Use consistent naming for easy organization
- Track generation costs for budgeting
- Review generated content before making public
- Update articles as solutions evolve
- Include screenshots or placeholders for visuals
- Test instructions yourself before publishing

⚠️ **Review Carefully:**
- Security-sensitive procedures (password resets, access controls)
- Complex, multi-system issues
- Compliance-related documentation
- Instructions involving user data

❌ **Not Suitable For:**
- Issues requiring case-by-case judgment
- Problems without established solutions
- Highly variable scenarios
- Emergency procedures (require human oversight)

### When to Use This Approach

✅ **Good for:**
- Common, repetitive issues with established solutions
- Creating documentation backlog quickly
- Standardizing support responses
- Onboarding documentation
- User guides for new software rollouts
- Converting tribal knowledge to documented procedures

### Real-World Applications

1. **Building Self-Service Knowledge Base**: Process historical tickets to create comprehensive KB
2. **New Software Rollouts**: Generate user guides from testing notes
3. **Onboarding Documentation**: Convert training materials into searchable guides
4. **Knowledge Preservation**: Document departing employees' expertise
5. **Multi-Language Support**: Generate articles in multiple languages
6. **Runbook Creation**: Create internal procedures for support teams

### Key Metrics to Track

- 📊 Ticket deflection rate (% of users finding solutions in KB)
- 📈 Article usage (views, helpfulness ratings)
- 💰 Cost per article generated
- ⏱️ Time saved vs manual writing
- ✅ Article quality scores
- 🔄 Update frequency needed
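
Several of these metrics are simple ratios. The sketch below shows one way to compute three of them. The function names, the definition of deflection as the share of help sessions resolved in the KB, and the sample inputs are all assumptions for illustration, not measured results.

```python
def deflection_rate(kb_resolved_sessions: int, total_help_sessions: int) -> float:
    """Percentage of help sessions resolved in the KB without opening a ticket."""
    if total_help_sessions == 0:
        return 0.0
    return kb_resolved_sessions * 100 / total_help_sessions


def cost_per_article(total_cost_usd: float, articles_generated: int) -> float:
    """Average generation cost per article in US dollars."""
    if articles_generated == 0:
        return 0.0
    return total_cost_usd / articles_generated


def hours_saved(articles: int, manual_minutes: float, review_minutes: float) -> float:
    """Time saved versus manual writing, after subtracting human review time per article."""
    return articles * (manual_minutes - review_minutes) / 60


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    print(f"Deflection rate: {deflection_rate(120, 400):.1f}%")
    print(f"Cost per article: ${cost_per_article(0.5, 100):.4f}")
    print(f"Hours saved: {hours_saved(60, 45, 15):.1f}")
```

Subtracting review time keeps the savings estimate honest, since generated articles still need a human check before publishing.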

---
## 🎓 Student Exercises

Practice what you've learned with these hands-on exercises!

### Exercise 1: Add New Ticket Categories

**Task**: Create 5 new tickets for a new category (e.g., "Mobile Device Setup"), generate KB articles, and validate quality.

**Steps**:
1. Create a function that generates 5 tickets for mobile device issues
2. Add them to the ticket list
3. Generate KB articles for each
4. Run validation checks
5. Review articles that need improvement

**Bonus**: Calculate the cost and compare it to manual writing time.

In [None]:
# Exercise 1: Your code here
# TODO: Create function to generate mobile device setup tickets
# TODO: Generate KB articles
# TODO: Validate articles
# TODO: Calculate costs

def generate_mobile_device_tickets():
    """
    Generate 5 tickets related to mobile device setup/issues.
    """
    # Your implementation here
    pass

# Test your implementation
# mobile_tickets = generate_mobile_device_tickets()
# print(f"Created {len(mobile_tickets)} mobile device tickets")

### Exercise 2: Create Multi-Language Support

**Task**: Modify the prompt to generate the same article in two languages (e.g., English and Spanish).

**Steps**:
1. Select one ticket
2. Create prompts for generating in English and Spanish
3. Generate both versions
4. Save with language identifiers (e.g., `KB_T001_en.md`, `KB_T001_es.md`)
5. Compare token usage between languages

**Bonus**: Create a function that generates articles in any specified language.

In [None]:
# Exercise 2: Your code here
# TODO: Create multi-language generation function
# TODO: Generate articles in English and Spanish
# TODO: Compare token usage

def generate_multilingual_article(ticket, languages=['en', 'es']):
    """
    Generate KB article in multiple languages.
    
    Args:
        ticket: Ticket dictionary
        languages: List of language codes (e.g., ['en', 'es', 'fr'])
    
    Returns:
        Dictionary with language codes as keys and articles as values
    """
    # Your implementation here
    pass

# Test your implementation
# articles = generate_multilingual_article(tickets[0], ['en', 'es'])
# print(f"Generated articles in {len(articles)} languages")

### Exercise 3: Build Article Update System

**Task**: Create a system that takes an existing article and ticket updates, then regenerates the article incorporating new information.

**Steps**:
1. Load an existing KB article
2. Create "update" information (e.g., new troubleshooting step)
3. Generate a prompt that includes the existing article and updates
4. Regenerate the article
5. Compare old vs new versions to see what changed
6. Track token usage for updates vs fresh generation

**Bonus**: Create a diff view showing exactly what changed.

In [None]:
# Exercise 3: Your code here
# TODO: Create article update function
# TODO: Load existing article
# TODO: Generate updated version
# TODO: Compare versions

def update_kb_article(article_path, updates, ticket_info):
    """
    Update an existing KB article with new information.
    
    Args:
        article_path: Path to existing article
        updates: Dictionary with update information
        ticket_info: Original ticket information
    
    Returns:
        Updated article text and change summary
    """
    # Your implementation here
    pass

# Test your implementation
# Example update:
# updates = {
#     "new_steps": "Step 5: If issue persists, check for Windows updates",
#     "new_troubleshooting": "Some users report success after disabling antivirus temporarily"
# }
# updated_article = update_kb_article("/content/kb_articles/KB_T001_Password_Account.md", updates, tickets[0])

---
## 🎉 Congratulations!

You've learned how to:
- ✅ Transform support tickets into professional KB articles
- ✅ Automate documentation creation at scale
- ✅ Validate content quality
- ✅ Generate multiple formats for different audiences
- ✅ Track and optimize API costs
- ✅ Implement best practices for AI-generated documentation

### Next Steps

1. **Try with real data**: Use actual support tickets from your organization
2. **Customize validation**: Add checks specific to your quality standards
3. **Integrate with systems**: Connect to ticket systems (Zendesk, ServiceNow, etc.)
4. **Add human review workflow**: Implement approval process before publishing
5. **Track metrics**: Measure ticket deflection and user satisfaction
6. **Iterate prompts**: Refine prompts based on feedback

### Additional Resources

- OpenAI API Documentation: https://platform.openai.com/docs
- Prompt Engineering Guide: https://platform.openai.com/docs/guides/prompt-engineering
- Token Usage Optimization: https://platform.openai.com/docs/guides/tokens

---

**Happy documenting! 📚✨**