# Notebook 01: Build Your First RAG System for IT Ticket Search

## 🎯 Your Mission

You're an IT support engineer building your first intelligent ticket search system. Your job today: index IT tickets into a vector database and enable semantic search that finds similar incidents, even when they use different words.

**Why this matters:** The same RAG (Retrieval-Augmented Generation) approach lets you build search systems that understand meaning, not just keywords, so you can resolve incidents faster by finding similar past problems and their solutions.

---

## ⚡ Quick Win (First 2 Minutes)

Let's see RAG in action! Run the cell below to see how semantic search finds relevant tickets:

**What you'll see:** A RAG system that can find relevant IT tickets using semantic similarity - matching meaning, not just exact keywords. For example, searching for "application crashes" will find tickets about "software failures" and "system errors" too!

Now let's build it step by step to understand how it works.

---

## 🎯 What You'll Learn

By the end of this notebook, you will:
- ✅ Build a simple RAG system that indexes and searches IT tickets
- ✅ Understand how semantic search works (matching meaning, not keywords)
- ✅ Use RAG to answer questions using retrieved ticket context

**Time:** ~15-20 minutes

---

## 📋 The Journey

---

### Step 1: Load and Explore the Dataset

**What we're doing:** Loading IT call center tickets and examining their structure.

**Why:** We need to understand the data before indexing it for semantic search.


In [None]:
# Import required libraries
import pandas as pd
from pathlib import Path
import uuid
from llama_stack_client import RAGDocument

# Load the CSV file from the data directory
data_dir = Path("../data")
file_path = data_dir / "synthetic-it-call-center-tickets.csv"

print("🔄 Loading IT call center tickets dataset...")
df = pd.read_csv(file_path)

print(f"✅ Loaded {len(df)} tickets")
print(f"📋 Dataset shape: {df.shape[0]} rows × {df.shape[1]} columns")
print(f"\n🔍 Let's examine the dataset:")
print("=" * 60)
df.head()

**What we see:** Each ticket has a `short_description` field that describes the problem. This is what we'll index for semantic search.

**💡 Key insight:** Traditional keyword search would only find exact matches. RAG uses semantic similarity - it understands that "application crashes" and "software failures" mean similar things!
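
To make the keyword limitation concrete, here is a minimal sketch in plain Python. The ticket texts and the `keyword_search` helper are invented for illustration; they are not taken from the dataset or from LlamaStack.

```python
# Hypothetical ticket texts, made up for this illustration only.
tickets = [
    "Software failure when opening the payroll module",
    "Application crashes after login",
    "Printer is out of toner",
]

def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Return the docs that contain every word of the query (case-insensitive)."""
    words = query.lower().split()
    return [d for d in docs if all(w in d.lower().split() for w in words)]

hits = keyword_search("application crashes", tickets)
print(hits)
```

Only the ticket that literally contains both query words comes back. The "Software failure" ticket describes a closely related problem but is missed, which is the gap that embedding-based retrieval is meant to close.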

Let's see the structure:

In [None]:
# Show dataset structure and example tickets
print("📊 Dataset Structure:")
print("=" * 60)
print(f"\nColumns: {list(df.columns)}")
print(f"\n📝 Key Field for Simple RAG:")
print(f"   - short_description: Problem summary (this is what we'll index)")

# Show example tickets
print("\n📋 Example Tickets:")
print("=" * 60)
if len(df) > 0:
    for i in range(min(3, len(df))):
        example = df.iloc[i]
        print(f"\n🎫 Ticket #{example.get('number', 'N/A')}")
        print(f"   Description: {str(example.get('short_description', 'N/A'))[:100]}...")
    print(f"\n💡 We'll index these descriptions for semantic search!")

---

### Step 2: Set Up LlamaStack Client

**What we're doing:** Connecting to LlamaStack and configuring our environment.

**Why:** We need LlamaStack to handle vector database operations, embeddings, and RAG queries.

**What happened:** We explored the dataset. Now let's connect to LlamaStack.

---

**What's next:** Once the connection is verified, we'll create the vector store and index documents.

---

### Step 3: Create Vector Store and Index Documents

**What we're doing:** Creating a ChromaDB vector store and indexing ticket descriptions for semantic search.

**Why:** The vector store enables semantic search - finding tickets by meaning, not just exact keyword matches.
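
As a rough intuition for what "finding tickets by meaning" involves, the sketch below ranks a few texts by cosine similarity between vectors. The 3-dimensional vectors are hand-made stand-ins, not real embeddings (the embedding model configured in this notebook produces 768-dimensional vectors), and the ticket texts are invented.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors standing in for embeddings (assumption, not model output).
toy_index = {
    "Application crashes after login": [0.9, 0.1, 0.0],
    "Software failure in payroll module": [0.8, 0.2, 0.1],
    "Printer is out of toner": [0.0, 0.1, 0.9],
}

# Pretend this is the embedding of the query "app keeps crashing".
query_vector = [0.85, 0.15, 0.05]

# Rank stored texts from most to least similar to the query vector.
ranked = sorted(
    toy_index,
    key=lambda text: cosine_similarity(query_vector, toy_index[text]),
    reverse=True,
)
for text in ranked:
    print(f"{cosine_similarity(query_vector, toy_index[text]):.3f}  {text}")
```

The two crash-related texts score close to 1.0 while the printer ticket scores near 0, even though the query shares no exact keywords with the "Software failure" text. A vector store performs the same kind of nearest-neighbor ranking over real embeddings.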

In [None]:
# Import required libraries for LlamaStack
import os
import sys
from pathlib import Path
from llama_stack_client import LlamaStackClient
from termcolor import cprint

# Add root src directory to path to import shared config
root_dir = Path("../..").resolve()
sys.path.insert(0, str(root_dir / "src"))

# Import centralized configuration
from config import LLAMA_STACK_URL, MODEL, CONFIG

# Configuration values (automatically detected based on environment)
llamastack_url = LLAMA_STACK_URL
model = MODEL

if not llamastack_url:
    raise ValueError(
        "LLAMA_STACK_URL is not configured!\n"
        "Please run: ./scripts/setup-env.sh\n"
        "Or set LLAMA_STACK_URL environment variable:\n"
        "  export LLAMA_STACK_URL='https://llamastack-route-my-first-model.apps.ocp.example.com'"
    )

print("🔄 Step 1: Connecting to LlamaStack...")
print("=" * 60)
print(f"📡 LlamaStack URL: {llamastack_url}")
print(f"🤖 Model: {model}")
print(f"📍 Environment: {'Inside OpenShift cluster' if CONFIG['inside_cluster'] else 'Outside OpenShift cluster'}")
print(f"📦 Namespace: {CONFIG['namespace']}")

# Initialize LlamaStack client
client = LlamaStackClient(base_url=llamastack_url)

# Verify connection
try:
    models = client.models.list()
    model_count = len(models.data) if hasattr(models, 'data') else len(models)
    print(f"\n✅ Connected to LlamaStack")
    print(f"   Available models: {model_count}")
except Exception as e:
    print(f"\n❌ Cannot connect to LlamaStack: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Check if route exists: oc get route llamastack-route -n my-first-model")
    print("   2. Run setup script: ./scripts/setup-env.sh")
    print("   3. Or set LLAMA_STACK_URL manually in .env file")
    raise

# Configure inference parameters
temperature = float(os.getenv("TEMPERATURE", 0.0))
max_tokens = int(os.getenv("MAX_TOKENS", 4096))
stream_env = os.getenv("STREAM", "True")
stream = (stream_env != "False")

print(f"\n⚙️  Inference Parameters:")
print(f"   Model: {model}")
print(f"   Temperature: {temperature}")
print(f"   Max Tokens: {max_tokens}")
print(f"   Stream: {stream}")

**What happened:** We connected to LlamaStack and configured our inference parameters. Now let's create the vector store.
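
Boolean settings read from environment variables are easy to get subtly wrong, because people type values such as `false`, `0`, or `no`. The sketch below shows one forgiving way to parse such a flag. The `env_flag` helper and the set of accepted "off" values are assumptions made for illustration, not part of the shared `config` module.

```python
import os

# Values treated as "off"; this set is an assumption, adjust it to your conventions.
FALSE_VALUES = {"false", "0", "no", "off"}

def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean flag from the environment, ignoring case and whitespace."""
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default  # unset or blank: fall back to the default
    return raw.strip().lower() not in FALSE_VALUES

stream_flag = env_flag("STREAM", default=True)
print(f"Stream: {stream_flag}")
```

With this helper, `STREAM=false`, `STREAM=False`, and `STREAM=0` all disable streaming, while an unset variable keeps the default.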

---

In [None]:
# Step 1: Create ChromaDB vector store
print("\n🔄 Step 1: Creating ChromaDB vector store...")
print("=" * 60)
print("   - Provider: ChromaDB (embedded in LlamaStack)")
print("   - Embedding model: sentence-transformers/nomic-ai/nomic-embed-text-v1.5")
print("   - Embedding dimension: 768")

vs_chroma = client.vector_stores.create(
    extra_body={
        "provider_id": "chromadb",  # ChromaDB is managed by LlamaStack
        "embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
        "embedding_dimension": 768
    }
)

print(f"✅ Vector store created!")
print(f"   Vector Store ID: {vs_chroma.id}")

**What happened:** We created a ChromaDB vector store. ChromaDB is embedded in LlamaStack (no separate deployment needed).

Now let's prepare the ticket data for indexing:

In [None]:
# Step 2: Prepare the data
print("\n🔄 Step 2: Preparing data for indexing...")
print("=" * 60)

# Fill missing values with empty strings
df = df.fillna("")

# Limit to first 1000 records for faster processing (you can use more for production)
df_1000 = df.head(1000)
print(f"   Processing {len(df_1000)} tickets (out of {len(df)} total)")

# Step 3: Create RAG documents using only short_description
print("\n🔄 Step 3: Creating RAG documents...")
print("   Using field: short_description (problem summary)")
print("   Storing other fields as metadata")

documents = [
    RAGDocument(
        document_id=f"ticket-{i}",
        content=df_1000.iloc[i]["short_description"],
        mime_type="text/plain",
        metadata=df_1000.iloc[i].drop("short_description").to_dict(),
    )
    for i in range(len(df_1000))
]

print(f"✅ Created {len(documents)} RAG documents")
print(f"\n💡 Each document contains:")
print(f"   - Content: short_description (what we'll search)")
print(f"   - Metadata: All other fields (for filtering)")

**What happened:** We created RAG documents using only the `short_description` field. This is "simple RAG" - using one field for search.
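
Because missing values were filled with empty strings, a ticket without a description would become a document with empty content. The sketch below shows the same content/metadata split using plain dictionaries while skipping blank descriptions. It is an illustration only: `rows_to_documents` is a hypothetical helper, the sample rows are invented, and the result is a list of dicts rather than `RAGDocument` objects.

```python
def rows_to_documents(rows: list[dict]) -> list[dict]:
    """Turn ticket rows into plain-dict documents, skipping blank descriptions."""
    documents = []
    for i, row in enumerate(rows):
        text = str(row.get("short_description", "")).strip()
        if not text:
            continue  # an empty string gives the search nothing to match
        documents.append({
            "document_id": f"ticket-{i}",
            "content": text,
            "mime_type": "text/plain",
            # Everything except the searchable text is kept as metadata.
            "metadata": {k: v for k, v in row.items() if k != "short_description"},
        })
    return documents

# Invented sample rows; only the column names mirror the notebook.
sample_rows = [
    {"number": "INC001", "short_description": "Application crashes after login"},
    {"number": "INC002", "short_description": ""},
    {"number": "INC003", "short_description": "VPN drops every hour"},
]

docs = rows_to_documents(sample_rows)
print([d["document_id"] for d in docs])
```

The document IDs keep the original row position, so a skipped row leaves a gap in the numbering instead of shifting later IDs.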

**💡 Note:** In the next notebook (02), we'll see how combining multiple fields (`short_description` + `content` + `close_notes`) creates richer documents that improve search quality!

Now let's index these documents:

In [None]:
# Step 4: Index documents into the vector store (in batches to avoid timeout)
print("\n🔄 Step 4: Indexing documents into vector store...")
print("=" * 60)
print(f"   Chunk size: 1024 tokens")
print(f"   Total documents: {len(documents)}")
print(f"   Processing in batches of 100 to avoid timeout...")

# Process in batches to avoid gateway timeout
BATCH_SIZE = 100
total_batches = (len(documents) + BATCH_SIZE - 1) // BATCH_SIZE
inserted_count = 0

for batch_num in range(total_batches):
    start_idx = batch_num * BATCH_SIZE
    end_idx = min(start_idx + BATCH_SIZE, len(documents))
    batch = documents[start_idx:end_idx]
    
    print(f"\n   Batch {batch_num + 1}/{total_batches}: Processing documents {start_idx} to {end_idx-1}...")
    
    try:
        insert_result = client.tool_runtime.rag_tool.insert( 
            chunk_size_in_tokens=1024,
            documents=batch,
            vector_db_id=str(vs_chroma.id),
            extra_body={"vector_store_id": str(vs_chroma.id)},
            extra_headers=None,
            extra_query=None,
            timeout=300  # 5 minute timeout per batch
        )
        inserted_count += len(batch)
        print(f"   ✅ Batch {batch_num + 1} indexed successfully ({inserted_count}/{len(documents)} documents)")
    except Exception as e:
        print(f"   ⚠️  Error indexing batch {batch_num + 1}: {e}")
        print(f"   💡 Tip: You can continue with the documents already indexed, or reduce BATCH_SIZE")
        continue

print(f"\n✅ Indexing complete!")
print(f"   Successfully indexed: {inserted_count}/{len(documents)} documents")
print(f"   Vector store ID: {vs_chroma.id}")
print(f"\n💡 LlamaStack automatically:")
print(f"   - Chunked the documents")
print(f"   - Generated embeddings for each chunk")
print(f"   - Stored them in ChromaDB for semantic search")

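**What happened:** We indexed the documents into ChromaDB in batches. Every batch that succeeded is now searchable using semantic similarity; check the "Successfully indexed" count in the output to confirm that no batch failed.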
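
The batching pattern can be separated from the LlamaStack call so that failed batches are recorded for a later retry. The sketch below is one way to do that under stated assumptions: `iter_batches`, `index_in_batches`, and the `fake_insert` stand-in are hypothetical names, and no real client is involved.

```python
from typing import Callable, Iterator, Sequence

def iter_batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def index_in_batches(items: Sequence, batch_size: int,
                     insert_fn: Callable[[Sequence], None]):
    """Call insert_fn per batch; return (inserted_count, failed_batch_numbers)."""
    inserted, failed = 0, []
    for batch_num, batch in enumerate(iter_batches(items, batch_size), start=1):
        try:
            insert_fn(batch)
            inserted += len(batch)
        except Exception:
            failed.append(batch_num)  # remember which batch to retry later
    return inserted, failed

# Stand-in for the real insert call: it fails on the second call only.
calls = {"count": 0}
def fake_insert(batch: Sequence) -> None:
    calls["count"] += 1
    if calls["count"] == 2:
        raise TimeoutError("simulated gateway timeout")

inserted, failed = index_in_batches(list(range(250)), 100, fake_insert)
print(f"Inserted {inserted} items; failed batches: {failed}")
```

In the notebook, `insert_fn` would wrap the `rag_tool.insert` call. Returning the failed batch numbers makes it straightforward to re-run only those slices, perhaps with a smaller batch size.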

---

### Step 4: Query with RAG

**What we're doing:** Testing our RAG system with a query to see how semantic search works.

**Why:** RAG enables finding relevant tickets even when they use different words - it matches meaning, not just keywords.

**What happened:** We indexed the tickets. Now let's test semantic search with a query and see it in action!

---

In [None]:
# Test query
query = "What was the root cause and resolution for application crashes related to memory issues?"

print("🔍 Querying RAG System")
print("=" * 60)
cprint(f"\n📝 User Query: {query}", "blue")
print("\n💡 This query will use semantic search to find:")
print("   - Tickets about 'application crashes'")
print("   - Tickets about 'memory issues'")
print("   - Even if they use different words!")
print("\n🔄 Searching vector store...")

# Step 1: RAG retrieval - find relevant document chunks
rag_response = client.tool_runtime.rag_tool.query(
    content=query,
    vector_db_ids=[str(vs_chroma.id)],
    extra_body={"vector_store_ids": [str(vs_chroma.id)]},
)

print(f"✅ Retrieved relevant context from vector store")
print(f"\n📄 Retrieved Context (first 500 chars):")
print("=" * 60)
print(rag_response.content[:500] + "..." if len(rag_response.content) > 500 else rag_response.content)
print("\n💡 Notice: The system found relevant tickets using semantic similarity!")

# Step 2: Construct extended prompt with retrieved context
messages = [{"role": "system", "content": "You are a helpful IT support assistant."}]
extended_prompt = f"Please answer the given query using the context below.\n\nCONTEXT:\n{rag_response.content}\n\nQUERY:\n{query}"
messages.append({"role": "user", "content": extended_prompt})

# Step 3: Generate answer using LLM
print("\n🔄 Generating answer with LLM...")
response = client.chat.completions.create(
    messages=messages,
    model=model,
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
)

print("\n✅ Answer:")
print("=" * 60)
if stream:
    for chunk in response:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
else:
    print(response.choices[0].message.content)

**What happened:** We used RAG to find relevant tickets and generate an answer! The system matched meaning, not just keywords.
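
The "augmented" part of RAG is plain string assembly: the retrieved context is pasted into the prompt ahead of the question. The sketch below wraps that step in a small function and caps the context length. The `build_rag_messages` helper and the `max_context_chars` limit are assumptions for illustration, and the sketch treats the retrieved context as a plain string.

```python
def build_rag_messages(query: str, context: str,
                       max_context_chars: int = 4000) -> list[dict]:
    """Build chat messages that ask the model to answer from retrieved context."""
    trimmed = context[:max_context_chars]  # crude cap so the prompt stays bounded
    user_prompt = (
        "Please answer the given query using the context below.\n\n"
        f"CONTEXT:\n{trimmed}\n\nQUERY:\n{query}"
    )
    return [
        {"role": "system", "content": "You are a helpful IT support assistant."},
        {"role": "user", "content": user_prompt},
    ]

# Invented context text standing in for a real retrieval result.
demo_messages = build_rag_messages(
    query="Why does the app crash?",
    context="Ticket: application crashes after login; fixed by raising the memory limit.",
)
print(demo_messages[1]["content"])
```

Keeping prompt construction in one function makes it easy to try different instructions or context limits without touching the retrieval and generation calls.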

---

## 🎓 Key Takeaway

RAG (Retrieval-Augmented Generation) pairs semantic search with an LLM: the retrieval step finds documents by meaning, not just exact keyword matches, and the model answers using what was retrieved. By indexing ticket descriptions into a vector database, we can search for similar incidents even when they use different words. This is the foundation for intelligent IT operations search systems.

---

## 🌍 Real-World Connection

**How this applies to IT Operations:**

The same RAG approach can be used for:

- **Incident Resolution:** "Find similar incidents to this one" → Get past solutions
- **Knowledge Base Search:** Search through documentation and runbooks using natural language
- **Pattern Recognition:** Identify recurring problems across incidents
- **Root Cause Analysis:** Find incidents with similar symptoms to learn from past diagnostics

**The pattern is the same:** Index historical data → Query semantically → Retrieve relevant context → Use context to answer questions or suggest solutions.

---

## ✨ Your Turn

**Try this:** Modify the query to search for different types of problems. For example:
- "How do I fix database connection errors?"
- "What causes slow application performance?"
- "Find tickets about network issues"

Notice how the semantic search finds relevant tickets even with different wording!

---

## 🎉 You Did It!

You've built your first RAG system! You learned how to index IT tickets and enable semantic search that understands meaning, not just keywords. 

**Next:** `02_multifield_RAG_llama_stack_chromadb.ipynb` - Learn how combining multiple fields (problem + diagnosis + solution) creates even better search results!

---

## 📚 Want to Go Deeper?

<details>
<summary>üìñ Additional Resources (Click to expand)</summary>

- [LlamaStack Documentation](https://github.com/llamastack/llamastack) - RAG and vector store capabilities
- [ChromaDB Documentation](https://www.trychroma.com/) - Vector database used by LlamaStack
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) - RAG techniques and patterns
</details>