# Notebook 01: Build Your First RAG System for IT Ticket Search

## 🎯 Your Mission

You're an IT support engineer building your first intelligent ticket search system. Your job today: index IT tickets into a vector database and enable semantic search that finds similar incidents, even when they use different words.

**Why this matters:** This same RAG (Retrieval-Augmented Generation) approach is how you could build search systems that understand meaning, not just keywords - enabling faster incident resolution by finding similar past problems and their solutions.

---

## ⚡ Quick Win: See Semantic Search in Action

**The Problem:** Traditional keyword search only finds exact matches. If you search for "application crashes", it won't find tickets about "software failures" or "system errors" - even though they're the same problem!

**The Solution:** Semantic search understands meaning. It knows that "application crashes", "software failures", and "system errors" are similar concepts.

**Try this:** After we build the system, search for "application crashes" and watch it find tickets with different wording but the same meaning!

**💡 This is the power of RAG - matching by meaning, not just keywords.**

---

## 🎯 What You'll Learn

By the end of this notebook, you will:
- ✅ Build a simple RAG system that indexes and searches IT tickets
- ✅ Understand how semantic search works (matching meaning, not keywords)
- ✅ See the difference between keyword search and semantic search
- ✅ Use RAG to answer questions using retrieved ticket context

**Time:** ~15-20 minutes

---

## 📋 The Journey

We'll build this step by step:

1. **Explore the Data** - See what tickets look like
2. **Understand RAG** - Learn how semantic search works
3. **Set Up LlamaStack** - Connect to our RAG platform
4. **Index Tickets** - Convert tickets into searchable vectors
5. **Query & Search** - Find tickets by meaning, not keywords

---

### Step 1: Explore the Dataset

**What we're doing:** Loading IT call center tickets and examining their structure.

**Why:** We need to understand what data we're working with before we can index it for semantic search.


In [None]:
# Import required libraries
import pandas as pd
from pathlib import Path
import uuid
from llama_stack_client import RAGDocument

# Load the CSV file from the data directory
data_dir = Path("../data")
file_path = data_dir / "synthetic-it-call-center-tickets-sample.csv"

print("🔄 Loading IT call center tickets dataset...")
df = pd.read_csv(file_path)

print(f"✅ Loaded {len(df)} tickets")
print(f"📋 Dataset shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"\n🔍 Let's examine the dataset:")
print("=" * 60)
df.head()

**What we see:** Each ticket has a `short_description` field that describes the problem. This is what we'll index for semantic search.

**💡 Key insight:** Traditional keyword search would only find exact matches. RAG uses semantic similarity - it understands that "application crashes" and "software failures" mean similar things!

Let's examine some example tickets to understand what we're working with:

In [None]:
# Show dataset structure and example tickets
print("📊 Dataset Structure:")
print("=" * 60)
print(f"\nColumns: {list(df.columns)}")
print(f"\n📝 Key Field for Simple RAG:")
print(f"   - short_description: Problem summary (this is what we'll index)")

# Show example tickets with more detail
print("\n📋 Example Tickets:")
print("=" * 60)
if len(df) > 0:
    for i in range(min(5, len(df))):
        example = df.iloc[i]
        desc = str(example.get('short_description', 'N/A'))
        print(f"\n🎫 Ticket #{example.get('number', 'N/A')}")
        # Truncate long descriptions so the preview stays readable
        if len(desc) > 100:
            print(f"   Description: {desc[:100]}...")
        else:
            print(f"   Description: {desc}")
    
    print(f"\n💡 Notice how each ticket describes a problem differently!")
    print(f"   Traditional search: Would only find exact word matches")
    print(f"   Semantic search: Finds tickets with similar meaning, even different words")

---

### Step 2: Understand How RAG Works

**What we're learning:** How semantic search differs from traditional keyword search.

**Why:** Understanding the concept helps you see why RAG is powerful for IT operations.

#### 🔍 Keyword Search (Traditional)

```
Search: "application crashes"
Results: Only tickets with exact words "application" AND "crashes"
❌ Misses: "software failures", "system errors", "app stops working"
```
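To make the limitation concrete, here is a minimal sketch of exact-word matching in plain Python. The ticket strings and the `keyword_search` helper are invented for illustration; they are not taken from the dataset and are not part of LlamaStack.

```python
# Toy ticket descriptions (made up for illustration, not from the dataset)
tickets = [
    "application crashes on startup",
    "software failures after update",
    "system errors when saving files",
    "app stops working during login",
]


def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Return the docs that contain every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [
        doc for doc in docs
        if all(word in doc.lower().split() for word in words)
    ]


matches = keyword_search("application crashes", tickets)
print(matches)
```

Only the first ticket matches, although the other three describe closely related problems.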

#### 🧠 Semantic Search (RAG)

```
Search: "application crashes"
Results: Tickets with similar meaning:
  ✅ "application crashes"
  ✅ "software failures"
  ✅ "system errors"
  ✅ "app stops working"
  ✅ "program terminates unexpectedly"
```

**How it works:**
1. **Indexing:** Convert ticket descriptions into "embeddings" (vectors that capture meaning)
2. **Querying:** Convert your search query into the same type of embedding
3. **Matching:** Find tickets whose embeddings are similar to your query embedding
4. **Retrieval:** Return the most semantically similar tickets

**💡 Think of it like:** Instead of matching words, we're matching meanings. The system understands that "crash" and "failure" mean similar things in context.
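The matching step can be sketched with cosine similarity, a common way to compare embeddings. The three-dimensional vectors below are hand-made stand-ins, not output from a real embedding model (the model used later in this notebook produces 768 dimensions), and the source does not state which similarity measure LlamaStack or ChromaDB applies, so treat this as an illustration of the idea only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (assumption: real ones come from an embedding model)
toy_index = {
    "software failures": [0.9, 0.1, 0.0],
    "printer out of toner": [0.0, 0.2, 0.9],
    "app stops working": [0.8, 0.3, 0.1],
}

# Pretend this is the embedding of the query "application crashes"
query_vec = [1.0, 0.2, 0.0]

# Rank tickets from most to least similar to the query
ranked = sorted(
    toy_index,
    key=lambda text: cosine_similarity(query_vec, toy_index[text]),
    reverse=True,
)
print(ranked)
```

The two crash-like tickets rank above the printer ticket because their vectors point in nearly the same direction as the query vector.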

---


### Step 3: Set Up LlamaStack Client

**What we're doing:** Connecting to LlamaStack - our RAG platform that handles:
- Vector database (stores ticket embeddings)
- Embedding generation (converts text to vectors)
- Semantic search (finds similar tickets)

**Why:** LlamaStack provides all the RAG infrastructure we need, so we can focus on building the search system.

**What happened:** We explored the dataset and saw how semantic search works.

**What's next:** We'll initialize the LlamaStack client and verify the connection.

In [None]:
# Import required libraries for LlamaStack
import os
import sys
from pathlib import Path
from llama_stack_client import LlamaStackClient
from termcolor import cprint

# Add root src directory to path to import shared config
root_dir = Path("../..").resolve()
sys.path.insert(0, str(root_dir / "src"))

# Import centralized configuration
from config import LLAMA_STACK_URL, MODEL, CONFIG

# Configuration values (automatically detected based on environment)
llamastack_url = LLAMA_STACK_URL
model = MODEL

if not llamastack_url:
    raise ValueError(
        "LLAMA_STACK_URL is not configured!\n"
        "Please run: ./scripts/setup-env.sh\n"
        "Or set LLAMA_STACK_URL environment variable:\n"
        "  export LLAMA_STACK_URL='https://llamastack-route-my-first-model.apps.ocp.example.com'"
    )

print("🔄 Connecting to LlamaStack...")
print("=" * 60)
print(f"📡 LlamaStack URL: {llamastack_url}")
print(f"🤖 Model: {model}")
print(f"📍 Environment: {'Inside OpenShift cluster' if CONFIG['inside_cluster'] else 'Outside OpenShift cluster'}")
print(f"📦 Namespace: {CONFIG['namespace']}")

# Initialize LlamaStack client
client = LlamaStackClient(base_url=llamastack_url)

# Verify connection
try:
    models = client.models.list()
    model_count = len(models.data) if hasattr(models, 'data') else len(models)
    print(f"\n✅ Connected to LlamaStack")
    print(f"   Available models: {model_count}")
except Exception as e:
    print(f"\n❌ Cannot connect to LlamaStack: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Check if route exists: oc get route llamastack-route -n my-first-model")
    print("   2. Run setup script: ./scripts/setup-env.sh")
    print("   3. Or set LLAMA_STACK_URL manually in .env file")
    raise

# Configure inference parameters
temperature = float(os.getenv("TEMPERATURE", 0.0))
max_tokens = int(os.getenv("MAX_TOKENS", 4096))
stream_env = os.getenv("STREAM", "True")
stream = (stream_env != "False")

print(f"\n⚙️  Inference Parameters:")
print(f"   Model: {model}")
print(f"   Temperature: {temperature}")
print(f"   Max Tokens: {max_tokens}")
print(f"   Stream: {stream}")

**What happened:** We connected to LlamaStack successfully! ✅

**What's next:** Now we'll create a vector store (where ticket embeddings will be stored) and then index our tickets.
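One detail of the setup cell worth noting: it treats any `STREAM` value other than the exact string `"False"` as true, so `"false"` or `"0"` would leave streaming on. A more forgiving parser might look like the sketch below. The `env_flag` helper and the `DEMO_STREAM` variable name are hypothetical and are not part of the notebook's shared config module.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean environment variable, accepting common spellings."""
    raw = os.getenv(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    # Unrecognized values fall back to the default instead of guessing
    return default


# Demo with a throwaway variable so real settings are not touched
os.environ["DEMO_STREAM"] = "false"
demo_stream = env_flag("DEMO_STREAM")
print(demo_stream)
```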

---

### Step 4: Create Vector Store and Index Documents

**What we're doing:** Creating a ChromaDB vector store and indexing ticket descriptions for semantic search.

**Why:** The vector store enables semantic search - finding tickets by meaning, not just exact keyword matches.

In [None]:
# Create ChromaDB vector store
print("\n🔄 Creating ChromaDB vector store...")
print("=" * 60)
print("   - Provider: ChromaDB (embedded in LlamaStack)")
print("   - Embedding model: sentence-transformers/nomic-ai/nomic-embed-text-v1.5")
print("   - Embedding dimension: 768")

vs_chroma = client.vector_stores.create(
    extra_body={
        "provider_id": "chromadb",  # ChromaDB is managed by LlamaStack
        "embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
        "embedding_dimension": 768
    }
)

print(f"\n✅ Vector store created!")
print("=" * 60)
print(f"📦 Vector Store Details:")
print(f"   ID: {vs_chroma.id}")
print(f"   Status: {vs_chroma.status}")
if vs_chroma.name:
    print(f"   Name: {vs_chroma.name}")
if vs_chroma.metadata:
    provider = vs_chroma.metadata.get('provider_id', 'N/A')
    print(f"   Provider: {provider}")
if hasattr(vs_chroma, 'file_counts') and vs_chroma.file_counts:
    print(f"\n📊 File Statistics:")
    print(f"   Total files: {vs_chroma.file_counts.total}")
    print(f"   Completed: {vs_chroma.file_counts.completed}")
    print(f"   In progress: {vs_chroma.file_counts.in_progress}")
    print(f"   Failed: {vs_chroma.file_counts.failed}")
    print(f"   Cancelled: {vs_chroma.file_counts.cancelled}")
if hasattr(vs_chroma, 'usage_bytes') and vs_chroma.usage_bytes:
    usage_mb = vs_chroma.usage_bytes / (1024 * 1024)
    print(f"\n💾 Storage:")
    print(f"   Usage: {usage_mb:.2f} MB")
if hasattr(vs_chroma, 'created_at') and vs_chroma.created_at:
    from datetime import datetime
    created_time = datetime.fromtimestamp(vs_chroma.created_at)
    print(f"\n🕒 Timestamps:")
    print(f"   Created: {created_time.strftime('%Y-%m-%d %H:%M:%S')}")
    if hasattr(vs_chroma, 'last_active_at') and vs_chroma.last_active_at:
        last_active = datetime.fromtimestamp(vs_chroma.last_active_at)
        print(f"   Last active: {last_active.strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 60)

**What happened:** We created a ChromaDB vector store! ✅

**💡 What is ChromaDB?** It's a vector database that stores embeddings. Think of it as a specialized database optimized for finding similar vectors (similar meanings).

**Key point:** ChromaDB is embedded in LlamaStack - no separate deployment needed! This makes setup simple.

**What's next:** Now we'll prepare our ticket data and convert it into a format that can be indexed.

---

In [None]:
# Prepare the data
print("\n🔄 Preparing data for indexing...")
print("=" * 60)

# Fill missing values with empty strings
df = df.fillna("")

# Use all tickets (the sample file already contains 1000 rows)
df_1000 = df
print(f"   Processing {len(df_1000)} tickets")

# Create RAG documents using only short_description
print("\n🔄 Creating RAG documents...")
print("   Using field: short_description (problem summary)")
print("   Storing other fields as metadata")

documents = [
    RAGDocument(
        document_id=f"ticket-{i}",
        content=df_1000.iloc[i]["short_description"],
        mime_type="text/plain",
        metadata=df_1000.iloc[i].drop("short_description").to_dict(),
    )
    for i in range(len(df_1000))
]

print(f"✅ Created {len(documents)} RAG documents")
print(f"\n💡 Each document contains:")
print(f"   - Content: short_description (what we'll search)")
print(f"   - Metadata: All other fields (for filtering)")

**What happened:** We created RAG documents! ✅

**💡 What is a RAG Document?**
- **Content:** The `short_description` field (what we'll search)
- **Metadata:** All other ticket fields (for filtering later)
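The content/metadata split can be sketched with plain dictionaries, without pandas or the LlamaStack client. The rows below are invented, the `priority` field is a hypothetical column, and `to_document` is an illustrative helper that mirrors the fields passed to `RAGDocument` in the cell above.

```python
# Invented example rows (assumption: real rows come from the CSV file)
rows = [
    {"number": "INC001", "short_description": "Application crashes on startup", "priority": "2"},
    {"number": "INC002", "short_description": "VPN connection drops", "priority": "3"},
]


def to_document(index: int, row: dict, content_field: str = "short_description") -> dict:
    """Split one ticket row into searchable content and filterable metadata."""
    metadata = {key: value for key, value in row.items() if key != content_field}
    return {
        "document_id": f"ticket-{index}",
        "content": row[content_field],
        "mime_type": "text/plain",
        "metadata": metadata,
    }


docs = [to_document(i, row) for i, row in enumerate(rows)]
print(docs[0])
```

Only `content` is embedded and searched; everything else travels alongside it as metadata.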

**Why "Simple RAG"?** We're using only one field (`short_description`) for search. This is the simplest approach.

**💡 In notebook 02:** We'll see how combining multiple fields (`short_description` + `content` + `close_notes`) creates richer documents that improve search quality!

**What's next:** Now we'll index these documents - LlamaStack will automatically:
1. Chunk the text (split into manageable pieces)
2. Generate embeddings (convert to vectors)
3. Store in ChromaDB (make them searchable)

---

In [None]:
# Index documents into the vector store (in batches to avoid timeout)
print("\n🔄 Indexing documents into vector store...")
print("=" * 60)
print(f"   Chunk size: 1024 tokens")
print(f"   Total documents: {len(documents)}")
print(f"   Processing in batches of 100 to avoid timeout...")

# Process in batches to avoid gateway timeout
BATCH_SIZE = 100
total_batches = (len(documents) + BATCH_SIZE - 1) // BATCH_SIZE
inserted_count = 0

for batch_num in range(total_batches):
    start_idx = batch_num * BATCH_SIZE
    end_idx = min(start_idx + BATCH_SIZE, len(documents))
    batch = documents[start_idx:end_idx]
    
    print(f"\n   Batch {batch_num + 1}/{total_batches}: Processing documents {start_idx} to {end_idx-1}...")
    
    try:
        insert_result = client.tool_runtime.rag_tool.insert( 
            chunk_size_in_tokens=1024,
            documents=batch,
            vector_db_id=str(vs_chroma.id),
            extra_body={"vector_store_id": str(vs_chroma.id)},
            extra_headers=None,
            extra_query=None,
            timeout=300  # 5 minute timeout per batch
        )
        inserted_count += len(batch)
        print(f"   ✅ Batch {batch_num + 1} indexed successfully ({inserted_count}/{len(documents)} documents)")
    except Exception as e:
        print(f"   ⚠️  Error indexing batch {batch_num + 1}: {e}")
        print(f"   💡 Tip: You can continue with the documents already indexed, or reduce BATCH_SIZE")
        continue

print(f"\n✅ Indexing complete!")
print(f"   Successfully indexed: {inserted_count}/{len(documents)} documents")
print(f"   Vector store ID: {vs_chroma.id}")
print(f"\n💡 LlamaStack automatically:")
print(f"   - Chunked the documents")
print(f"   - Generated embeddings for each chunk")
print(f"   - Stored them in ChromaDB for semantic search")

In [None]:
# Display vector store with documents after indexing
print("\n" + "=" * 60)
print("📊 Vector Store Status After Indexing")
print("=" * 60)

# Retrieve updated vector store information
vs_updated = client.vector_stores.retrieve(vs_chroma.id)

print(f"\n📦 Vector Store Details:")
print(f"   ID: {vs_updated.id}")
print(f"   Status: {vs_updated.status}")
if vs_updated.name:
    print(f"   Name: {vs_updated.name}")
if vs_updated.metadata:
    provider = vs_updated.metadata.get("provider_id", "N/A")
    print(f"   Provider: {provider}")

# Display file/document statistics
if hasattr(vs_updated, "file_counts") and vs_updated.file_counts:
    print(f"\n📊 Document Statistics:")
    print(f"   Total files: {vs_updated.file_counts.total}")
    print(f"   Completed: {vs_updated.file_counts.completed}")
    print(f"   In progress: {vs_updated.file_counts.in_progress}")
    print(f"   Failed: {vs_updated.file_counts.failed}")
    print(f"   Cancelled: {vs_updated.file_counts.cancelled}")

# Display storage usage
if hasattr(vs_updated, "usage_bytes") and vs_updated.usage_bytes:
    usage_mb = vs_updated.usage_bytes / (1024 * 1024)
    print(f"\n💾 Storage:")
    print(f"   Usage: {usage_mb:.2f} MB")

# Display timestamps
if hasattr(vs_updated, "created_at") and vs_updated.created_at:
    from datetime import datetime

    created_time = datetime.fromtimestamp(vs_updated.created_at)
    print(f"\n🕒 Timestamps:")
    print(f"   Created: {created_time.strftime('%Y-%m-%d %H:%M:%S')}")
    if hasattr(vs_updated, "last_active_at") and vs_updated.last_active_at:
        last_active = datetime.fromtimestamp(vs_updated.last_active_at)
        print(f"   Last active: {last_active.strftime('%Y-%m-%d %H:%M:%S')}")

# Query the vector store to show sample documents
print(f"\n" + "=" * 60)
print("🔍 Sample Documents in Vector Store")
print("=" * 60)
print("\n💡 Querying vector store to retrieve sample documents...")

try:
    # Query with a general query to get some sample results
    sample_query = "IT support ticket"
    query_result = client.tool_runtime.rag_tool.query(
        content=sample_query,
        vector_db_ids=[str(vs_chroma.id)],
        extra_body={"vector_store_ids": [str(vs_chroma.id)]},
    )

    print(f"\n✅ Vector store is queryable and contains indexed documents!")
    print(f"\n📄 Sample Query Results:")
    print(f"   Query: '{sample_query}'")

    # Extract document information if available
    if hasattr(query_result, "content") and query_result.content:
        content_preview = (
            query_result.content[:300]
            if len(query_result.content) > 300
            else query_result.content
        )
        print(f"\n   Retrieved content preview:")
        print(f"   {content_preview}...")

    print(f"\n💡 The vector store is ready for semantic search!")
    print(f"   You can now query it to find tickets by meaning, not just keywords.")

except Exception as e:
    print(f"\n⚠️  Could not query vector store: {e}")
    print(f"   Vector store may still be processing documents.")

print("=" * 60)

**What happened:** We indexed all documents into ChromaDB! ✅

**🎉 Success!** The tickets are now searchable using semantic similarity. Each ticket description has been:
- ✅ Converted into embeddings (vectors representing meaning)
- ✅ Stored in ChromaDB (ready for semantic search)

**💡 What happened behind the scenes:**
- LlamaStack automatically chunked long descriptions
- Generated embeddings using the embedding model
- Stored them in the vector database
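The indexing loop in this step sends documents in batches of 100 so that no single request runs long enough to hit a gateway timeout. The slicing logic can be isolated into a small generator, sketched here with a hypothetical `iter_batches` helper and a list of integers standing in for documents.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 250 stand-in "documents" split into batches of 100
fake_documents = list(range(250))
batch_sizes = [len(batch) for batch in iter_batches(fake_documents, 100)]
print(batch_sizes)
```

The last batch is simply shorter, which matches the `min(start_idx + BATCH_SIZE, len(documents))` bound used in the indexing cell.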

---

### Step 5: Query with RAG - See Semantic Search in Action!

**What we're doing:** Testing our RAG system with a query to see how semantic search works.

**Why:** This is where you'll see the power of RAG - finding relevant tickets even when they use different words. It matches meaning, not just keywords!

**What's next:** Let's query the indexed tickets and watch semantic search find tickets with similar meaning, even when they use different words.

---

In [None]:
# Test query
query = "What was the root cause and resolution for application crashes related to memory issues?"

print("🔍 Querying RAG System")
print("=" * 60)
cprint(f"\n📝 User Query: {query}", "blue")
print("\n💡 This query will use semantic search to find:")
print("   - Tickets about 'application crashes'")
print("   - Tickets about 'memory issues'")
print("   - Even if they use different words!")
print("\n🔄 Searching vector store...")

# Step 1: RAG retrieval - find relevant document chunks
rag_response = client.tool_runtime.rag_tool.query(
    content=query,
    vector_db_ids=[str(vs_chroma.id)],
    extra_body={"vector_store_ids": [str(vs_chroma.id)]},
)

print(f"✅ Retrieved relevant context from vector store")
print(f"\n📄 Retrieved Context (first 500 chars):")
print("=" * 60)
print(rag_response.content[:500] + "..." if len(rag_response.content) > 500 else rag_response.content)
print("\n💡 Notice: The system found relevant tickets using semantic similarity!")

# Step 2: Construct extended prompt with retrieved context
messages = [{"role": "system", "content": "You are a helpful IT support assistant."}]
extended_prompt = f"Please answer the given query using the context below.\n\nCONTEXT:\n{rag_response.content}\n\nQUERY:\n{query}"
messages.append({"role": "user", "content": extended_prompt})

# Step 3: Generate answer using LLM
print("\n🔄 Generating answer with LLM...")
response = client.chat.completions.create(
    messages=messages,
    model=model,
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
)

print("\n✅ Answer:")
print("=" * 60)
if stream:
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
else:
    print(response.choices[0].message.content)

**What happened:** We used RAG to find relevant tickets and generate an answer! The system matched meaning, not just keywords.
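The "augmented" part of RAG is the prompt construction: retrieved text is pasted into the user message ahead of the question. The sketch below wraps the same prompt template used in the cell above in a reusable function. `build_rag_messages` is a hypothetical helper, and the context string is invented rather than real retrieval output.

```python
def build_rag_messages(
    query: str,
    context: str,
    system_prompt: str = "You are a helpful IT support assistant.",
) -> list[dict]:
    """Build chat messages that ask the model to answer using retrieved context."""
    user_prompt = (
        "Please answer the given query using the context below.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUERY:\n{query}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Invented context standing in for text returned by the vector store
example_messages = build_rag_messages(
    query="Why does the app crash?",
    context="Ticket INC001: memory leak fixed by patch.",
)
print(example_messages[1]["content"])
```

Keeping this in one function makes it easy to try different instructions or context formats without touching the retrieval or generation calls.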

---

## 🎓 Key Takeaway

RAG (Retrieval-Augmented Generation) enables semantic search - finding documents by meaning, not just exact keyword matches. By indexing ticket descriptions into a vector database, we can search for similar incidents even when they use different words. This is the foundation for intelligent IT operations search systems.

---

## 🌍 Real-World Connection

**How this applies to IT Operations:**

The same RAG approach can be used for:

- **Incident Resolution:** "Find similar incidents to this one" → Get past solutions
- **Knowledge Base Search:** Search through documentation and runbooks using natural language
- **Pattern Recognition:** Identify recurring problems across incidents
- **Root Cause Analysis:** Find incidents with similar symptoms to learn from past diagnostics

**The pattern is the same:** Index historical data → Query semantically → Retrieve relevant context → Use context to answer questions or suggest solutions.

---

## ✨ Your Turn

**Try this:** Modify the query to search for different types of problems. For example:
- "How do I fix database connection errors?"
- "What causes slow application performance?"
- "Find tickets about network issues"

Notice how the semantic search finds relevant tickets even with different wording!

---

## 🎉 You Did It!

You've built your first RAG system! You learned how to index IT tickets and enable semantic search that understands meaning, not just keywords.

**Next:** `02_multifield_RAG_llama_stack_chromadb.ipynb` - Learn how combining multiple fields (problem + diagnosis + solution) creates even better search results!

---

## 📚 Want to Go Deeper?

<details>
<summary>📖 Additional Resources (Click to expand)</summary>

- [LlamaStack Documentation](https://github.com/llamastack/llamastack) - RAG and vector store capabilities
- [ChromaDB Documentation](https://www.trychroma.com/) - Vector database used by LlamaStack
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) - RAG techniques and patterns
</details>