# Notebook 02: Build Multi-Field RAG for IT Ticket Search

## 🎯 Your Mission

You're an IT support engineer building a smarter ticket search system. Your job today: combine multiple ticket fields (`short_description`, `content`, and `close_notes`) to create richer document representations that help find both problems AND solutions.

**Why this matters:** This same multi-field RAG approach is how you could build intelligent search systems that understand the full context of incidents - from initial problem reports to diagnostic findings to final resolutions - enabling better pattern recognition and faster problem resolution.

---

## ⚡ Quick Win (First 2 Minutes)

Let's see multi-field RAG in action! Run the cell below to see how combining multiple fields improves search results:

**What you'll see:** By combining multiple fields (problem description + diagnostic details + resolution steps), the RAG system can find tickets that match both the problem AND the solution, not just the problem description alone.

Now let's build it step by step to understand how it works.

---

## 🎯 What You'll Learn

By the end of this notebook, you will:
- ✅ Combine multiple ticket fields to create richer RAG documents
- ✅ Understand why multi-field RAG outperforms single-field RAG
- ✅ Build a search system that finds both problems and solutions

**Time:** ~15-20 minutes

---

## 📋 The Journey

We'll build this step by step:

1. **Explore the Data** - Understand the ticket fields and how to combine them
2. **Set Up LlamaStack** - Connect to our RAG platform
3. **Create Multi-Field Documents** - Combine `short_description`, `content`, and `close_notes`
4. **Index Documents** - Store multi-field documents in the vector database
5. **Query & Search** - Test queries that benefit from multi-field RAG

---


### Step 1: Load and Explore the Dataset

**What we're doing:** Loading IT call center tickets and examining their structure.

**Why:** We need to understand what fields are available so we can combine them effectively for better search results.

In [None]:
# Import required libraries
import pandas as pd
from pathlib import Path
import uuid
from llama_stack_client import RAGDocument

# Load the CSV file from the data directory
data_dir = Path("../data")
file_path = data_dir / "synthetic-it-call-center-tickets-sample.csv"

print("🔄 Loading IT call center tickets dataset...")
df = pd.read_csv(file_path)

print(f"✅ Loaded {len(df)} tickets")
print(f"📋 Dataset shape: {df.shape[0]} rows × {df.shape[1]} columns")
print("\n🔍 Let's examine the dataset structure:")
print("=" * 60)
df.head()

**What we see:** Each ticket has multiple fields:
- **`short_description`** - Brief problem summary
- **`content`** - Detailed problem description
- **`close_notes`** - Diagnostic findings and resolution steps
- **Other fields** - Metadata like ticket number, priority, etc.

**💡 Key insight:** By combining `short_description`, `content`, and `close_notes`, we create documents that contain the full ticket lifecycle: problem → diagnosis → solution. This gives the retriever more text to match against, including the resolution itself.
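The combination described above can be sketched with plain dictionaries before touching the real dataset. The helper name `combine_ticket_fields` and the sample ticket are illustrative assumptions, not part of the notebook; the field names are the ones listed above.

```python
def combine_ticket_fields(ticket, fields=("short_description", "content", "close_notes")):
    """Join the non-empty text fields of one ticket into a single document string.

    `ticket` is any dict-like object; this helper is a hypothetical sketch.
    """
    parts = []
    for name in fields:
        value = ticket.get(name)
        # Treat None and NaN as missing (NaN is the only float not equal to itself)
        if value is None or (isinstance(value, float) and value != value):
            continue
        text = str(value).strip()
        if text:
            parts.append(text)
    # A blank line between fields keeps problem, details, and resolution visually separate
    return "\n\n".join(parts)


# Made-up ticket for illustration only (not taken from the dataset)
ticket = {
    "short_description": "App crashes on save",
    "content": "User reports the editor closes when saving large files.",
    "close_notes": None,
}
combined = combine_ticket_fields(ticket)
print(combined)
```

Skipping empty fields avoids stray blank sections in the combined text when a ticket has no close notes yet.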

Let's see the field structure:

In [None]:
# Show dataset structure and key fields
print("📊 Dataset Structure:")
print("=" * 60)
print(f"\nColumns: {list(df.columns)}")
print(f"\n📝 Key Fields for Multi-Field RAG:")
print(f"   - short_description: Brief problem summary")
print(f"   - content: Detailed problem description")
print(f"   - close_notes: Diagnostic findings and resolution steps")
print(f"\n💡 Other fields will be stored as metadata for filtering")

# Show an example ticket to illustrate the multi-field concept
print("\n📋 Example Ticket (showing multi-field structure):")
print("=" * 60)
if len(df) > 0:
    example = df.iloc[0]
    print(f"\n🎫 Ticket #{example.get('number', 'N/A')}")
    print(f"\n📌 Short Description:")
    # str() guards against missing (NaN) values, which are floats and cannot be sliced
    print(f"   {str(example.get('short_description', 'N/A'))[:100]}...")
    print(f"\n📄 Content:")
    print(f"   {str(example.get('content', 'N/A'))[:150]}...")
    print(f"\n✅ Close Notes:")
    print(f"   {str(example.get('close_notes', 'N/A'))[:150]}...")
    print(f"\n💡 Notice: Combining all three fields gives us the complete ticket story!")

---

### Step 2: Set Up LlamaStack Client

**What we're doing:** Connecting to LlamaStack and configuring our environment.

**Why:** We need LlamaStack to handle vector database operations, embeddings, and RAG queries.

**What happened:** We explored the dataset and understand its structure. Now let's connect to LlamaStack.

---

In [None]:
# Import required libraries for LlamaStack
import os
import sys
from pathlib import Path
from llama_stack_client import LlamaStackClient
from termcolor import cprint

# Add root src directory to path to import shared config
root_dir = Path("../..").resolve()
sys.path.insert(0, str(root_dir / "src"))

# Import centralized configuration
from config import LLAMA_STACK_URL, MODEL, CONFIG

# Configuration values (automatically detected based on environment)
llamastack_url = LLAMA_STACK_URL
model = MODEL

if not llamastack_url:
    raise ValueError(
        "LLAMA_STACK_URL is not configured!\n"
        "Please run: ./scripts/setup-env.sh\n"
        "Or set LLAMA_STACK_URL environment variable:\n"
        "  export LLAMA_STACK_URL='https://llamastack-route-my-first-model.apps.ocp.example.com'"
    )

print("🔄 Step 2: Connecting to LlamaStack...")
print("=" * 60)
print(f"📡 LlamaStack URL: {llamastack_url}")
print(f"🤖 Model: {model}")
print(f"📍 Environment: {'Inside OpenShift cluster' if CONFIG['inside_cluster'] else 'Outside OpenShift cluster'}")
print(f"📦 Namespace: {CONFIG['namespace']}")

# Initialize LlamaStack client
client = LlamaStackClient(base_url=llamastack_url)

# Verify connection
try:
    models = client.models.list()
    model_count = len(models.data) if hasattr(models, 'data') else len(models)
    print(f"\n✅ Connected to LlamaStack")
    print(f"   Available models: {model_count}")
except Exception as e:
    print(f"\n❌ Cannot connect to LlamaStack: {e}")
    print("\n💡 Troubleshooting:")
    print("   1. Check if route exists: oc get route llamastack-route -n my-first-model")
    print("   2. Run setup script: ./scripts/setup-env.sh")
    print("   3. Or set LLAMA_STACK_URL manually in .env file")
    raise

# Configure inference parameters
temperature = float(os.getenv("TEMPERATURE", 0.0))
max_tokens = int(os.getenv("MAX_TOKENS", 4096))
stream_env = os.getenv("STREAM", "True")
stream = (stream_env != "False")

print(f"\n⚙️  Inference Parameters:")
print(f"   Model: {model}")
print(f"   Temperature: {temperature}")
print(f"   Max Tokens: {max_tokens}")
print(f"   Stream: {stream}")

**What happened:** We connected to LlamaStack and configured our inference parameters. Now we're ready to create the vector store and index documents.

---

### Step 3: Create Multi-Field Documents and Index Them

**What we're doing:** Creating multi-field RAG documents by combining ticket fields, then indexing them into a vector store.

**Why:** By combining `short_description`, `content`, and `close_notes`, we create richer document representations that enable better search results - finding both problems AND solutions.

**This step includes:**
1. Create a vector store
2. Prepare and combine ticket fields
3. Create multi-field RAG documents
4. Index documents into the vector store

**💡 Why create a new vector store instead of reusing notebook 01's?**

In notebook 01, we indexed documents using only the `short_description` field (problem summary). In this notebook, we're indexing documents that combine `short_description + content + close_notes` (full ticket lifecycle). 

**We create a new vector store because:**
- **Different document structures**: Single-field vs multi-field documents have different content and embeddings
- **Better separation**: Keeping them separate makes it easier to compare single-field vs multi-field RAG performance
- **Pedagogical clarity**: Creating a new vector store helps demonstrate the multi-field RAG concept clearly

**In production:** You could reuse a vector store and add different document types to it, or create separate vector stores for different document structures - the choice depends on your use case and whether you want to keep different document types separate or combined.

In [None]:
# Create a new vector store for multi-field RAG
# Note: We create a new vector store instead of reusing notebook 01's because:
# - Notebook 01 indexes single-field documents (short_description only)
# - This notebook indexes multi-field documents (short_description + content + close_notes)
# - Different document structures benefit from separate vector stores for clarity and comparison
#
# To reuse notebook 01's vector store instead, you could:
# 1. List existing vector stores: client.vector_stores.list()
# 2. Retrieve a specific one: vs_chroma = client.vector_stores.retrieve("vs_<id-from-notebook-01>")
# 3. Then index your multi-field documents into that same vector store

vs_chroma = client.vector_stores.create(
    extra_body={
        "provider_id": "chromadb",  # Optional: specify vector store provider
        "embedding_model": "sentence-transformers/nomic-ai/nomic-embed-text-v1.5",
        "embedding_dimension": 768  # Optional: will be auto-detected if not provided
    }
)

**What happened:** We created a ChromaDB vector store. ChromaDB is embedded in LlamaStack (no separate deployment needed), unlike MongoDB which requires a separate MCP server.

Now let's prepare and combine the ticket fields:

In [None]:
# Step 2: Prepare the data
print("\n🔄 Step 2: Preparing data for indexing...")
print("=" * 60)

# Fill missing values with empty strings
df = df.fillna("")

# Limit to the first 1000 records for faster processing
# (the sample file already has 1000 rows, so this is a no-op for it)
df_1000 = df.head(1000)
print(f"   Processing {len(df_1000)} tickets (out of {len(df)} total)")

# Step 3: Create multi-field RAG documents
print("\n🔄 Step 3: Creating multi-field RAG documents...")
print("   Combining fields: short_description + content + close_notes")
print("   Storing other fields as metadata")

documents = [
    RAGDocument(
        document_id=f"ticket-{i}",
        content=f"{df_1000.iloc[i]['short_description']}\n\n{df_1000.iloc[i]['content']}\n\n{df_1000.iloc[i]['close_notes']}",
        mime_type="text/plain",
        metadata=df_1000.iloc[i].drop(["short_description", "content", "close_notes"]).to_dict(),
    )
    for i in range(len(df_1000))
]

print(f"✅ Created {len(documents)} RAG documents")
print(f"\n💡 Each document contains:")
print(f"   - Content: short_description + content + close_notes (full ticket story)")
print(f"   - Metadata: All other fields (for filtering)")

**What happened:** We created RAG documents that combine multiple fields. Each document now contains the complete ticket story - from problem description to diagnostic findings to resolution steps.

**💡 Key insight:** This multi-field approach enables the RAG system to match queries based on:
- Problem descriptions (from `short_description` and `content`)
- Diagnostic details (from `content` and `close_notes`)
- Solution steps (from `close_notes`)

This is much more powerful than single-field RAG!

Now let's index these documents:

In [None]:
# Step 4: Index documents into the vector store (in batches to avoid timeout)
# Process in small batches to avoid gateway timeouts
BATCH_SIZE = 10
total_batches = (len(documents) + BATCH_SIZE - 1) // BATCH_SIZE
inserted_count = 0

print("\n🔄 Step 4: Indexing documents into vector store...")
print("=" * 60)
print("   Chunk size: 1024 tokens")
print(f"   Total documents: {len(documents)}")
print(f"   Processing in batches of {BATCH_SIZE} to avoid timeout...")

for batch_num in range(total_batches):
    start_idx = batch_num * BATCH_SIZE
    end_idx = min(start_idx + BATCH_SIZE, len(documents))
    batch = documents[start_idx:end_idx]
    
    print(f"\n   Batch {batch_num + 1}/{total_batches}: Processing documents {start_idx} to {end_idx-1}...")
    
    try:
        insert_result = client.tool_runtime.rag_tool.insert( 
            chunk_size_in_tokens=1024,
            documents=batch,
            vector_db_id=str(vs_chroma.id),
            extra_body={"vector_store_id": str(vs_chroma.id)},
            extra_headers=None,
            extra_query=None,
            timeout=300  # 5 minute timeout per batch
        )
        inserted_count += len(batch)
        print(f"   ✅ Batch {batch_num + 1} indexed successfully ({inserted_count}/{len(documents)} documents)")
    except Exception as e:
        print(f"   ⚠️  Error indexing batch {batch_num + 1}: {e}")
        print(f"   💡 Tip: You can continue with the documents already indexed, or reduce BATCH_SIZE")
        # Continue with next batch instead of failing completely
        continue

print(f"\n✅ Indexing complete!")
print(f"   Successfully indexed: {inserted_count}/{len(documents)} documents")
print(f"   Vector store ID: {vs_chroma.id}")
print(f"\n💡 LlamaStack automatically:")
print(f"   - Chunked the documents")
print(f"   - Generated embeddings for each chunk")
print(f"   - Stored them in ChromaDB for semantic search")

**What happened:** We indexed the documents into ChromaDB in batches. Any batch that failed was reported in the output and skipped; every successfully indexed document is now searchable using semantic similarity.
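The batching pattern used in the indexing cell can be isolated from LlamaStack and tried on its own. This is a minimal sketch: `iter_batches`, `index_in_batches`, and the `insert_fn` callback are hypothetical names, with `insert_fn` standing in for the real insert call, and the failure is simulated.

```python
def iter_batches(items, batch_size):
    """Yield (start_index, batch) pairs covering `items` in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


def index_in_batches(items, batch_size, insert_fn):
    """Call insert_fn on each batch; keep going after a failed batch.

    Returns the number of items inserted and a list of
    (first_index, last_index, error_message) for failed batches.
    """
    inserted = 0
    failed = []
    for start, batch in iter_batches(items, batch_size):
        try:
            insert_fn(batch)
            inserted += len(batch)
        except Exception as exc:
            failed.append((start, start + len(batch) - 1, str(exc)))
    return inserted, failed


# Simulated backend: rejects any batch that contains item 15
def fake_insert(batch):
    if 15 in batch:
        raise RuntimeError("boom")


inserted, failed = index_in_batches(list(range(25)), 10, fake_insert)
print(inserted, failed)
```

Recording which index ranges failed makes it possible to retry only those batches later instead of re-indexing everything.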

**🎉 Success!** The multi-field tickets are now searchable. Each document contains:
- ✅ Problem description (`short_description`)
- ✅ Detailed context (`content`)
- ✅ Diagnostic findings and solutions (`close_notes`)

**💡 What happened behind the scenes:**
- LlamaStack automatically chunked the combined field content
- Generated embeddings using the embedding model
- Stored them in the vector database for semantic search
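To build intuition for the chunking step, here is a rough sketch that splits text into fixed-size pieces. It counts whitespace-separated words as a stand-in for tokens, and the `overlap` parameter is an assumption added for illustration; the actual tokenizer and chunking strategy LlamaStack uses are not shown in this notebook and may differ.

```python
def chunk_by_words(text, chunk_size=1024, overlap=0):
    """Split text into chunks of at most `chunk_size` words.

    Words approximate tokens here; consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "a b c d e"
print(chunk_by_words(sample, chunk_size=2))
print(chunk_by_words(sample, chunk_size=3, overlap=1))
```

Because the combined ticket text is longer than `short_description` alone, a single ticket may be split into several chunks, each embedded and retrieved independently.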

---

### Step 4: Query with Multi-Field RAG

**What we're doing:** Testing our multi-field RAG system with queries that benefit from combined fields.

**Why:** Multi-field RAG excels at queries that need both problem AND solution context, not just problem descriptions. This is where you'll see the power of combining multiple fields!

In [None]:
client.vector_io.query(vector_db_id=vs_chroma.id, query="ZTrend crashes")

### Step 4.1: Execute Queries Using the RAG Tool

**What we're doing:** Using the built-in RAG tool to query our multi-field vector database.

**How it works:**
1. Query the vector database to retrieve relevant document chunks
2. Construct an extended prompt using the retrieved context
3. Query the LLM with the extended prompt
4. Get answers that combine retrieved context with LLM reasoning
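Steps 2 and 3 above amount to plain string assembly, which can be sketched without any client at all. The helper name `build_rag_messages` is hypothetical, and the sketch assumes the retrieved context has already been converted to plain text.

```python
def build_rag_messages(query, context, system_prompt="You are a helpful assistant."):
    """Build a chat message list that places retrieved context ahead of the query."""
    extended_prompt = (
        "Please answer the given query using the context below.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUERY:\n{query}"
    )
    # The message list starts with the system prompt, followed by the user turn
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": extended_prompt},
    ]


messages = build_rag_messages(
    query="Why does the app crash on save?",
    context="(retrieved ticket chunks would go here)",
)
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

Keeping prompt construction in one small function makes it easy to inspect exactly what the model receives when an answer looks wrong.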

In [None]:
queries = [
    "What was the root cause and resolution for application crashes related to memory issues?",
]

for prompt in queries:
    cprint(f"\nUser> {prompt}", "blue")
    
    # RAG retrieval call
    rag_response = client.tool_runtime.rag_tool.query(
        content=prompt,
        vector_db_ids=[str(vs_chroma.id)],   # required by the SDK
        extra_body={"vector_store_ids": [str(vs_chroma.id)]},  # required by the backend
    )

    print(rag_response.content)
    # the list of messages to be sent to the model must start with the system prompt
    messages = [{"role": "system", "content": "You are a helpful assistant."}]

    # construct the actual prompt to be executed, incorporating the original query and the retrieved content
    prompt_context = rag_response.content
    extended_prompt = f"Please answer the given query using the context below.\n\nCONTEXT:\n{prompt_context}\n\nQUERY:\n{prompt}"
    messages.append({"role": "user", "content": extended_prompt})

    # use Llama Stack inference API to directly communicate with the desired model
    response = client.chat.completions.create(
        messages=messages,
        model=model,
        stream=stream,
        max_tokens=max_tokens,
        temperature=temperature,
    )
    
    # Print the answer inside the loop so every query's response is shown
    if stream:
        for chunk in response:
            if chunk.choices and chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)
        print()  # newline after streaming
    else:
        print(response.choices[0].message.content)

### Step 5: Why Multi-Field RAG is Better

**What we're learning:** Understanding when and why multi-field RAG outperforms single-field RAG.

**Why this matters:** Knowing the strengths of multi-field RAG helps you decide when to use it in production systems.

---

#### Example Queries: Multi-Field vs Single-Field RAG

Using multiple fields (`short_description`, `content`, and `close_notes`) instead of just `short_description` significantly improves retrieval quality for certain types of queries. Here are examples where multi-field RAG outperforms single-field RAG:

**Example 1: Troubleshooting Steps and Solutions**
**Query**: "How do I fix ZTrend crashes when saving files?"

- **Single-field (short_description only)**: May retrieve tickets about crashes, but won't have the solution steps
- **Multi-field**: Retrieves tickets with both the problem description AND the detailed troubleshooting steps from `close_notes`, providing complete answers

**Example 2: Historical Context and Resolution**
**Query**: "What was the root cause and resolution for application crashes related to memory issues?"

- **Single-field**: Only finds tickets mentioning "crashes" but misses the diagnostic details and resolution steps
- **Multi-field**: Retrieves tickets with full context from `content` (initial problem description) and `close_notes` (diagnostic findings and resolution), enabling comprehensive answers

**Example 3: Pattern Recognition Across Problem-Solution Pairs**
**Query**: "What are common solutions for software crashes that involve configuration files?"

- **Single-field**: Can identify crash-related tickets but can't see the solutions
- **Multi-field**: Can match both problem patterns (from `short_description`/`content`) and solution patterns (from `close_notes`), enabling identification of recurring problem-solution patterns

**Example 4: Detailed Technical Information**
**Query**: "Show me tickets where log file analysis revealed the issue"

- **Single-field**: May miss tickets where log analysis is only mentioned in `content` or `close_notes`
- **Multi-field**: Captures technical details from all fields, ensuring comprehensive retrieval of relevant tickets

**Example 5: End-to-End Ticket Understanding**
**Query**: "Find tickets where the customer reported a problem, diagnostics were performed, and the issue was resolved by reinstalling software"

- **Single-field**: Can't capture the full narrative flow from problem → diagnosis → solution
- **Multi-field**: Preserves the complete ticket lifecycle, enabling retrieval based on complex multi-stage scenarios

**Key Insight**: Multi-field RAG is especially powerful for queries that require understanding both the problem AND the solution, or queries that need to match patterns across different stages of the ticket lifecycle.
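A toy comparison can make this concrete. The sketch below scores a query against a single field and against the combined text using simple word overlap. This is only a rough proxy: the real system ranks by embedding similarity, not keyword overlap, and the ticket is made up for illustration rather than taken from the dataset.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, document):
    """Fraction of distinct query words that also appear in the document."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokens(document)) / len(query_tokens)


# Hypothetical ticket, invented for this example
ticket = {
    "short_description": "Application crashes on startup",
    "content": "The application crashes shortly after launch on several laptops.",
    "close_notes": "Root cause was a memory leak; resolution was upgrading to the patched build.",
}
query = "root cause and resolution for application crashes related to memory"

single = overlap_score(query, ticket["short_description"])
multi = overlap_score(query, "\n\n".join(ticket.values()))
print(f"single-field: {single:.2f}  multi-field: {multi:.2f}")
```

The words "root", "cause", "resolution", and "memory" appear only in `close_notes`, so the single-field document has nothing to match them against.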


---

## 🎓 Key Takeaways
This notebook demonstrated how to set up and use the built-in RAG tool for ingesting user-provided documents in a vector database and utilizing them during inference via direct retrieval. 

Key points:
- **Multi-field content**: We combined `short_description`, `content`, and `close_notes` fields to create richer document representations, improving the quality of retrieval and context understanding.
- **Metadata preservation**: Other fields from the dataset are stored as metadata, allowing for filtering and additional context during retrieval.
- **Vector database integration**: The documents are chunked and indexed into ChromaDB using Llama Stack's RAG tool, enabling semantic search over the ticket data.
- **Query advantages**: As shown in Step 5, multi-field RAG excels at queries requiring both problem and solution context, pattern recognition across ticket lifecycle stages, and comprehensive technical information retrieval.

Now that we've seen how easy it is to implement RAG with Llama Stack, we'll move on to building a simple agent in our [Simple Agents](./Level2_simple_agent_with_websearch.ipynb) notebook.