# Competitor Analysis

This notebook demonstrates how to query the ingested PDF documents using the `LlamaStack` APIs.

---

## What is demonstrated

1. **Connect to LlamaStack** - Access the RAG infrastructure
2. **List Vector Databases** - See available document collections
3. **Semantic Search** - Query documents using natural language
4. **Full RAG with LLM** - Get AI-generated answers with source attribution

---

## Prerequisites

- Documents ingested via KFP pipeline
- Embeddings stored in Milvus (vector DB: `competitor-docs`)
- Notebook running in a RHOAI workbench with cluster access

---


## Install Required Libraries

Install the LlamaStack client and visualization libraries.


In [None]:
# Install required packages
%pip install -q llama-stack-client==0.2.22 rich pandas tabulate


## Import Libraries

Import all necessary Python libraries for querying and visualization.


In [None]:
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import Document
import logging
import pandas as pd
from rich import print as rprint
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.markdown import Markdown
from IPython.display import display, Markdown as IPyMarkdown
import json

# Suppress verbose HTTP logs
logging.getLogger("httpx").setLevel(logging.WARNING)

# Initialize Rich console for pretty output
console = Console()

print("[OK] Libraries imported successfully!")


## Configure LlamaStack Connection

Set up the connection to LlamaStack service running in the cluster.

**Note:** We use the **in-cluster DNS name** since this notebook runs inside OpenShift.
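
If the notebook is also run outside the cluster (for example through a port-forward), the URL could be made overridable. The following is a minimal sketch, not part of the original setup: the environment variable name `LLAMASTACK_URL` and the helper `resolve_llamastack_url` are assumptions introduced for illustration, and the default is the in-cluster address used in this notebook.

```python
import os

# Default taken from this notebook; the environment variable name is an assumption.
DEFAULT_LLAMASTACK_URL = (
    "http://llama-stack-dist-service.competitor-analysis.svc.cluster.local:8321"
)


def resolve_llamastack_url(env=None):
    """Return the LlamaStack URL, preferring an override from the environment."""
    env = os.environ if env is None else env
    url = env.get("LLAMASTACK_URL", "").strip()
    # Fall back to the in-cluster DNS name and drop any trailing slash.
    return (url or DEFAULT_LLAMASTACK_URL).rstrip("/")


LLAMASTACK_URL = resolve_llamastack_url()
```

With no override set, the in-cluster DNS name is used unchanged.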


In [None]:
# LlamaStack service URL (in-cluster)
LLAMASTACK_URL = "http://llama-stack-dist-service.competitor-analysis.svc.cluster.local:8321"

# Vector DB name (logical identifier used in pipeline)
VECTOR_DB_NAME = "competitor-docs"

console.print(Panel.fit(
    f"[bold cyan]LlamaStack URL:[/bold cyan] {LLAMASTACK_URL}\n"
    f"[bold cyan]Target Vector DB:[/bold cyan] {VECTOR_DB_NAME}",
    title="🔧 Configuration",
    border_style="cyan"
))


## Connect to LlamaStack

Initialize the LlamaStack client and verify connectivity.


In [None]:
try:
    # Initialize client
    client = LlamaStackClient(base_url=LLAMASTACK_URL)
    
    # Test connection by listing models
    models = client.models.list()
    
    console.print("[bold green][OK] Successfully connected to LlamaStack![/bold green]")
    console.print(f"[dim]Found {len(models)} model(s)[/dim]")
    
except Exception as e:
    console.print(f"[bold red][FAIL] Connection failed:[/bold red] {e}")
    console.print("[bold red]Ensure LlamaStack service is running in the cluster[/bold red]")


## Discover Available Models

List all models available in LlamaStack (LLM for inference + Embedding model).


In [None]:
# Create a table for models
table = Table(title="Available Models", show_header=True, header_style="bold magenta")
table.add_column("Model Type", style="cyan", width=15)
table.add_column("Model ID", style="blue", width=40)
table.add_column("Details", style="green")

inference_model_id = None
embedding_model_id = None
embedding_dimension = None

for model in models:
    model_type = model.model_type
    model_id = model.identifier
    
    details = ""
    
    if model_type == "llm":
        inference_model_id = model_id
        details = "Used for text generation"
        icon = "💬"
    elif model_type == "embedding":
        embedding_model_id = model_id
        embedding_dimension = model.metadata.get("embedding_dimension", "N/A")
        details = f"Dimension: {embedding_dimension}"
        icon = "🔢"
    else:
        icon = "❓"
    
    table.add_row(f"{icon} {model_type}", model_id, details)

console.print(table)

## List Vector Databases

Discover all vector databases registered in LlamaStack and locate our target: **`competitor-docs`**.


In [None]:
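# Get all vector DBs
vector_dbs = client.vector_dbs.list()

# Initialize up front so later cells can test it even when no vector DB exists
target_vector_db_id = None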

if not vector_dbs:
    console.print("[bold red][FAIL] No vector databases found![/bold red]")
    console.print("[bold red]Run the KFP pipeline to ingest documents first[/bold red]")
else:
    # Create table for vector DBs
    table = Table(title="📦 Vector Databases", show_header=True, header_style="bold magenta")
    table.add_column("Status", width=6)
    table.add_column("Vector DB ID", style="cyan", width=40)
    table.add_column("Logical Name", style="yellow", width=20)
    table.add_column("Provider", style="green", width=15)
    table.add_column("Embedding Model", style="blue")
    
    target_vector_db_id = None
    
    for vdb in vector_dbs:
        vdb_id = vdb.identifier
        logical_name = getattr(vdb, 'vector_db_name', 'N/A')
        provider = vdb.provider_id
        emb_model = getattr(vdb, 'embedding_model', 'N/A')
        
        # Check if this is our target
        is_target = (logical_name == VECTOR_DB_NAME or vdb_id == VECTOR_DB_NAME)
        status = "✅" if is_target else "  "
        
        if is_target:
            target_vector_db_id = vdb_id
        
        table.add_row(status, vdb_id, logical_name, provider, emb_model)
    
    console.print(table)
    
    # Verify we found our target
    if target_vector_db_id:
        console.print(f"\n[bold green][OK] Found target vector DB:[/bold green] {VECTOR_DB_NAME}")
        console.print(f"[dim] Milvus Collection ID: {target_vector_db_id}[/dim]")
    else:
        console.print(f"\n[bold red][FAIL] Vector DB '{VECTOR_DB_NAME}' not found![/bold red]")
        console.print(f"[bold red]Available: {[getattr(vdb, 'vector_db_name', vdb.identifier) for vdb in vector_dbs]}[/bold red]")


## Verify Vector DB Setup

Ensure we have a valid vector DB to query.


In [None]:
if target_vector_db_id:
    console.print(Panel.fit(
        f"[bold green][OK] Ready to Query[/bold green]\n\n"
        f"[cyan]Vector DB Name:[/cyan] {VECTOR_DB_NAME}\n"
        f"[cyan]Vector DB ID:[/cyan] {target_vector_db_id}\n"
        f"[cyan]Embedding Model:[/cyan] {embedding_model_id}\n"
        f"[cyan]Embedding Dimension:[/cyan] {embedding_dimension}",
        title="Query Configuration",
        border_style="green"
    ))
else:
    console.print(Panel.fit(
        f"[bold red][FAIL] Cannot proceed - Vector DB not found[/bold red]\n\n"
        f"Please ensure the KFP pipeline has run successfully and ingested documents.",
        title="Setup Required",
        border_style="red"
    ))
    raise ValueError(f"Vector DB '{VECTOR_DB_NAME}' not found")


---

# Semantic Search

Now let's query the documents using natural language!


## Define Your Query

Customize this cell to ask any question about your ingested documents.


In [None]:
# CUSTOMIZE YOUR QUERY HERE
query_text = "What was the standalone Profit After Tax (PAT) for HDFC Bank in Q2 FY26?"

console.print(Panel.fit(
    f"[bold red]{query_text}[/bold red]",
    title="Your Query",
    border_style="blue"
))


## Execute Semantic Search

Query the vector database and retrieve relevant document chunks.
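
The response handling in the search cell is repeated in several later cells. As a sketch only, it could be factored into one helper; `flatten_content` is a hypothetical name, and the `SimpleNamespace` objects below are stand-ins for the content items returned by the RAG tool, not real API objects.

```python
from types import SimpleNamespace


def flatten_content(content):
    """Join the text of a RAG response's content into a single string.

    Mirrors the handling used in this notebook: a list of items is joined
    line by line, using each item's `text` attribute when present.
    Returns None when there is nothing to show.
    """
    if not content:
        return None
    if isinstance(content, list):
        return "\n".join(
            item.text if hasattr(item, "text") else str(item) for item in content
        )
    return str(content)


# Stand-in objects for illustration only; real items come from the RAG tool.
sample = [SimpleNamespace(text="chunk one"), SimpleNamespace(text="chunk two")]
print(flatten_content(sample))
```

In the cells of this notebook, it would be called as `flatten_content(rag_response.content)`.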


In [None]:
console.print("[cyan]Generating query embeddings...[/cyan]")
console.print("[cyan]Searching document vectors...[/cyan]")
console.print()

try:
    # Perform semantic search using LlamaStack RAG tool
    rag_response = client.tool_runtime.rag_tool.query(
        content=query_text,
        vector_db_ids=[target_vector_db_id]
    )
    
    # Extract results - handle structured response
    if hasattr(rag_response, 'content') and rag_response.content:
        # Extract text from content items (rag_response.content is a list)
        if isinstance(rag_response.content, list):
            search_results = "\n".join([
                item.text if hasattr(item, 'text') else str(item) 
                for item in rag_response.content
            ])
        else:
            search_results = str(rag_response.content)
        
        # Display results in a panel
        console.print(Panel(
            search_results,
            title="Search Results",
            border_style="green",
            expand=False
        ))
        
        console.print(f"\n[bold green][OK] Search completed successfully![/bold green]")
        
    else:
        console.print("[yellow]No results found for your query[/yellow]")
        search_results = None
        
except Exception as e:
    console.print(f"[bold red][FAIL] Search failed:[/bold red] {e}")
    import traceback
    traceback.print_exc()
    search_results = None


---

# Full RAG with LLM

Go beyond just retrieving documents - get **AI-generated answers** with source attribution!



Define system instructions for the LLM to:
1. Answer based ONLY on retrieved document context
2. Maintain factual accuracy and avoid hallucination
3. Provide confidence scores and source attribution
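
The instructions in the next cell also ask the model to write amounts with Indian digit grouping (for example, 10,00,000 for 10 lakhs). When spot-checking generated answers, a small formatter can produce the expected form of a figure. This is an illustrative sketch; `format_inr` is a hypothetical helper and is not used by the notebook itself.

```python
def format_inr(amount: int) -> str:
    """Format an integer with Indian digit grouping (last three digits, then pairs)."""
    sign = "-" if amount < 0 else ""
    digits = str(abs(amount))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel off two digits at a time from the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return sign + ",".join(groups + [tail])


print(format_inr(1000000))  # 10,00,000
```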


In [None]:
# LLM system instructions for RAG
agent_instructions = """
You are an intelligent assistant that answers user queries.

    Instructions:
    - Use only the knowledge_search tool to extract information. Ignore all other sources of information.
    - Do NOT make up or assume any facts beyond what is given.
    - If the answer cannot be found in the provided context, clearly respond with:
      "The information you asked for is not available in the provided documents."
    - If you cannot find any relevant information for response, do not print the confidence score and do not mention the sources.
    - Maintain factual accuracy and logical consistency at all times.
    - If there are multiple relevant pieces of information, summarize them precisely and cite their context where applicable.
    - Be concise, structured, and neutral; avoid speculation or creative elaboration.
    - When numerical or factual answers are expected, extract them exactly as stated in the context.
    - Do not quote any numerical information in US Dollars. All amounts must be in Indian Rupees (INR).
    - Use currency representation for the Indian locale: lakhs and crores, not millions or billions.
    - Place commas in currency values according to the Indian locale. 1 million rupees (10 lakhs) should be shown as 10,00,000.
    - If you find a factual answer to a query, indicate a confidence score (0–100%) along with the names of the source documents.
      If you do not find the information, do not cite a confidence score or a source.
    - The source is available in a field called 'filename' in the context. For all source files that you mention in the response,
      always replace the .md file extension with .pdf.

"""

console.print(Panel(
    agent_instructions.strip(),
    title="LLM Instructions",
    border_style="blue"
))

# Verify we have the inference model
if not inference_model_id:
    console.print("[bold red][FAIL] No inference model available for RAG[/bold red]")
else:
    console.print(f"\n[green][OK] Ready for RAG queries with model:[/green] {inference_model_id}")


## Two-Step RAG Query (with Streaming)

**Step 1:** Retrieve relevant document chunks via semantic search  
**Step 2:** Generate AI answer by feeding context to the LLM
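
The prompt passed to the LLM in Step 2 is plain string assembly, so it can be sketched and checked without a running service. The helper names `build_rag_prompt` and `preview` are assumptions for illustration; the layout follows the prompt used in this notebook, except that the instructions are stripped of surrounding whitespace here.

```python
def build_rag_prompt(instructions: str, context: str, question: str) -> str:
    """Assemble the single-message RAG prompt in the layout this notebook uses."""
    return (
        f"{instructions.strip()}\n\n"
        f"**Retrieved Document Context:**\n{context}\n\n"
        f"**User Question:**\n{question}\n\n"
        f"**Your Answer:**"
    )


def preview(text: str, limit: int = 500) -> str:
    """Shorten long context for display, marking the cut with an ellipsis."""
    return text[:limit] + "..." if len(text) > limit else text


# Placeholder strings for illustration only.
prompt = build_rag_prompt("Answer from the context only.", "<retrieved chunks>", "<question>")
print(prompt)
```

Keeping the prompt in one function means the two-step cell and any helper that reuses it cannot drift apart.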


In [None]:
console.print(f"[cyan]Question: {query_text}[/cyan]")
console.print()

try:
    # Step 1: Perform semantic search to retrieve relevant document chunks
    console.print("[cyan]Step 1: Retrieving relevant document chunks...[/cyan]")
    
    rag_response = client.tool_runtime.rag_tool.query(
        content=query_text,
        vector_db_ids=[target_vector_db_id]
    )
    
    # Extract retrieved context
    if hasattr(rag_response, 'content') and rag_response.content:
        if isinstance(rag_response.content, list):
            retrieved_context = "\n".join([
                item.text if hasattr(item, 'text') else str(item) 
                for item in rag_response.content
            ])
        else:
            retrieved_context = str(rag_response.content)
        
        console.print(f"[green][OK] Retrieved context ({len(retrieved_context)} chars)[/green]")
        
        # Show preview of retrieved context
        console.print(Panel(
            retrieved_context[:500] + "..." if len(retrieved_context) > 500 else retrieved_context,
            title="Retrieved Context (Preview)",
            border_style="blue",
            expand=False
        ))
        
    else:
        console.print("[bold red]No relevant documents found![/bold red]")
        retrieved_context = None
    
    # Step 2: Generate answer using LLM with retrieved context
    if retrieved_context:
        console.print("\n[cyan]Step 2: Generating AI answer with context...[/cyan]")
        console.rule("[bold green]LLM Response", style="green")
        print()
        
        # Build RAG prompt with instructions, context, and query
        rag_prompt = f"""{agent_instructions}

**Retrieved Document Context:**
{retrieved_context}

**User Question:**
{query_text}

**Your Answer:**"""
        
        # Call inference API directly (no agent, just LLM)
        response = client.inference.chat_completion(
            model_id=inference_model_id,
            messages=[
                {"role": "user", "content": rag_prompt}
            ],
            stream=True
        )
        
        # Stream and print the response
        full_response = ""
        for chunk in response:
            if hasattr(chunk, 'event') and hasattr(chunk.event, 'delta'):
                delta = chunk.event.delta
                # Extract text from delta object
                if hasattr(delta, 'text'):
                    content = delta.text
                elif isinstance(delta, str):
                    content = delta
                else:
                    content = str(delta)
                
                print(content, end='', flush=True)
                full_response += content
        
        print()
        console.rule(style="green")
        console.print("\n[bold green][OK] Answer generated successfully![/bold green]")
        
    else:
        console.print("[yellow]Cannot generate answer without context[/yellow]")
    
except Exception as e:
    console.print(f"\n[bold red][FAIL] RAG query failed:[/bold red] {e}")
    import traceback
    traceback.print_exc()


---

# Quick Query Helper

Run this cell repeatedly to test different queries quickly!
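
The helper in the next cell repeats the stream handling from the two-step cell. One way to keep that logic in a single place is sketched below. `delta_text` and `collect_stream` are hypothetical names, and the `SimpleNamespace` chunks only imitate the `chunk.event.delta` shape this notebook checks for; they are not real client objects.

```python
from types import SimpleNamespace


def delta_text(chunk) -> str:
    """Return the text carried by one streamed chunk, or "" if it has none."""
    event = getattr(chunk, "event", None)
    delta = getattr(event, "delta", None)
    if delta is None:
        return ""
    if isinstance(delta, str):
        return delta
    text = getattr(delta, "text", None)
    return text if text is not None else str(delta)


def collect_stream(chunks) -> str:
    """Print streamed text as it arrives and return the full answer."""
    parts = []
    for chunk in chunks:
        piece = delta_text(chunk)
        print(piece, end="", flush=True)
        parts.append(piece)
    print()
    return "".join(parts)


# Stand-in chunks for illustration; real ones come from the streaming call.
fake_stream = [
    SimpleNamespace(event=SimpleNamespace(delta=SimpleNamespace(text="Hello, "))),
    SimpleNamespace(event=SimpleNamespace(delta="world")),
    SimpleNamespace(other="ignored"),
]
answer = collect_stream(fake_stream)
```

Unlike the inline loops, this sketch skips chunks without a delta instead of printing their string form.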


In [None]:
def quick_query(question: str, use_agent: bool = False):
    """
    Quick query helper function.
    
    Args:
        question: Your question
        use_agent: If True, retrieve context and generate an LLM answer;
            if False, only show the semantic search results
    """
    console.print(Panel.fit(
        f"[bold yellow]{question}[/bold yellow]",
        title="Query",
        border_style="yellow"
    ))
    
    try:
        if use_agent:
            # Two-step RAG: retrieve context, then generate answer
            console.print("[cyan]🔍 Retrieving context...[/cyan]")
            rag_response = client.tool_runtime.rag_tool.query(
                content=question,
                vector_db_ids=[target_vector_db_id]
            )
            
            # Extract context
            if hasattr(rag_response, 'content') and rag_response.content:
                if isinstance(rag_response.content, list):
                    context = "\n".join([
                        item.text if hasattr(item, 'text') else str(item)
                        for item in rag_response.content
                    ])
                else:
                    context = str(rag_response.content)
                
                console.print("[cyan]🤖 Generating answer...[/cyan]\n")
                
                # Build prompt with context
                prompt = f"""{agent_instructions}

**Retrieved Document Context:**
{context}

**User Question:**
{question}

**Your Answer:**"""
                
                # Call LLM with context
                response = client.inference.chat_completion(
                    model_id=inference_model_id,
                    messages=[{"role": "user", "content": prompt}],
                    stream=True
                )
                
                for chunk in response:
                    if hasattr(chunk, 'event') and hasattr(chunk.event, 'delta'):
                        delta = chunk.event.delta
                        if hasattr(delta, 'text'):
                            content = delta.text
                        elif isinstance(delta, str):
                            content = delta
                        else:
                            content = str(delta)
                        print(content, end='', flush=True)
                print()
            else:
                console.print("[yellow]No relevant context found[/yellow]")
        else:
            # Just do semantic search
            console.print("[cyan]Performing semantic search...[/cyan]\n")
            rag_response = client.tool_runtime.rag_tool.query(
                content=question,
                vector_db_ids=[target_vector_db_id]
            )
            if hasattr(rag_response, 'content') and rag_response.content:
                # Extract text from content items
                if isinstance(rag_response.content, list):
                    results_text = "\n".join([
                        item.text if hasattr(item, 'text') else str(item)
                        for item in rag_response.content
                    ])
                else:
                    results_text = str(rag_response.content)
                
                console.print(Panel(
                    results_text,
                    title="Results",
                    border_style="green"
                ))
            else:
                console.print("[yellow]No results found[/yellow]")
        
        console.print("\n[bold green][OK] Done![/bold green]")
        
    except Exception as e:
        console.print(f"[bold red][FAIL] Error:[/bold red] {e}")


# Example usage:
# quick_query("What is Basel III?", use_agent=False)  # Just search
# quick_query("What is Basel III?", use_agent=True)   # Full AI answer

console.print("[green][OK] Helper function loaded! Use:[/green]")
console.print('[dim] quick_query("Your question here", use_agent=False)[/dim]')


## Try Different Queries

Test various questions on your document corpus:


In [None]:
# Example: Full AI-generated answer (slower, more comprehensive)
quick_query("Calculate the total standalone operating expenses for ICICI Bank for H1-2026 by finding the values for Q1-2026 and Q2-2026 and adding them together.", use_agent=True)


Here are some queries you can try:

- As per its Q1FY26 results, what percentage of SBI's savings bank accounts were acquired digitally through YONO?
- How did SBI's standalone Gross NPA (GNPA) ratio and Net NPA (NNPA) ratio change between the end of Q2FY25 and the end of Q3FY25?
- List the profit after tax for all of ICICI Bank's key subsidiaries (ICICI Prudential Life, ICICI Lombard, ICICI AMC, ICICI Securities) for Q2-2026.
- Based on their latest 2025 quarterly reports (Q2'26 for HDFC/ICICI, Q1'26 for SBI), rank HDFC Bank, ICICI Bank, and SBI from lowest to highest based on their standalone Gross NPA ratio.
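
To run several of these questions in one go, a small loop may be enough. This is a sketch under assumptions: `run_queries` is a hypothetical helper, and `query_fn` stands for any callable that takes a question, such as a wrapper around `quick_query`.

```python
def run_queries(questions, query_fn):
    """Call query_fn on each question, collecting failures instead of stopping."""
    failures = []
    for number, question in enumerate(questions, start=1):
        print(f"[{number}/{len(questions)}] {question}")
        try:
            query_fn(question)
        except Exception as exc:  # keep going so one bad query does not end the batch
            failures.append((question, str(exc)))
    return failures
```

In this notebook it could be called as `run_queries(my_questions, lambda q: quick_query(q, use_agent=True))`, where `my_questions` is a list of strings. Because `quick_query` already catches and prints its own errors, the returned list will normally be empty for that callable.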

In [None]:
# Try your own query here!
quick_query("YOUR QUESTION HERE", use_agent=True)


---

# Summary

- Connected to LlamaStack API
- Discovered available models and vector databases
- Performed semantic search on document embeddings
- Generated AI-powered answers with source attribution
- Created reusable query helpers

---

## Next Steps

- **Ingest More Documents**: Run the KFP pipeline with new PDFs
- **Experiment with Queries**: Try different question types
- **Fine-tune Agent Instructions**: Customize response style
- **Build Applications**: Use this as a foundation for RAG apps
