# Multimodal Agentic RAG Pack Tutorial

This tutorial demonstrates how to use the **Multimodal Agentic RAG Pack** to build a high-fidelity RAG pipeline capable of understanding complex documents with diagrams, tables, and formulas.

### 🧩 What makes this pack special?
- **Sidecar Metadata**: Automatically extracts the bounding-box (BBox) coordinates of images and injects them into your node metadata.
- **Dual-Store Reasoning**: Combines the semantic power of **Qdrant** with the relational reasoning of **Neo4j**.
- **Agentic Workflow**: Features built-in retrieval grading and web search fallback (via Tavily).
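
The grading-plus-fallback idea in the last bullet can be sketched in a few lines. This is an illustrative sketch only: the pack's internal implementation is not shown in this tutorial, and `answer_context`, `toy_retrieve`, `toy_grade`, and `toy_web_search` are hypothetical names invented for the example. In a real pipeline an LLM would do the grading and Tavily would serve the web search.

```python
from typing import Callable, List


def answer_context(
    query: str,
    retrieve: Callable[[str], List[str]],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], List[str]],
) -> List[str]:
    """Return locally retrieved passages that pass grading, else web results."""
    passages = retrieve(query)
    relevant = [p for p in passages if grade(query, p)]
    if relevant:
        return relevant
    # No local passage survived grading, so fall back to web search.
    return web_search(query)


# Toy stand-ins (hypothetical) so the control flow can be run without services.
def toy_retrieve(query: str) -> List[str]:
    return ["Qdrant stores dense vectors.", "Unrelated text."]


def toy_grade(query: str, passage: str) -> bool:
    return "qdrant" in query.lower() and "qdrant" in passage.lower()


def toy_web_search(query: str) -> List[str]:
    return [f"web result for: {query}"]
```

With these stand-ins, a query that mentions Qdrant keeps the one graded passage, while any other query falls through to `toy_web_search`.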

⚠️ **Prerequisites:**
Ensure the following services are running:
- **Qdrant**: `http://localhost:6333`
- **Neo4j**: `bolt://localhost:7687`
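
If you want to confirm that both ports accept connections before initializing the pack, a plain TCP check is usually enough. This is a minimal sketch using only the standard library, and `is_reachable` is a hypothetical helper, not part of the pack. It only tells you that something is listening on the port, not that the service is healthy.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"URL must include a host and a port: {url!r}")
    try:
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS failures.
        return False
```

For example, `is_reachable("http://localhost:6333")` and `is_reachable("bolt://localhost:7687")` should both return `True` once the services are up.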

Also ensure your `.env` file contains:
- `DASHSCOPE_API_KEY`
- `NEO4J_PASSWORD`
- `TAVILY_API_KEY` (required only if you want the web search fallback)
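
A small check like the following can report every missing variable at once instead of failing on the first one. It is a sketch: `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and the check assumes `load_dotenv()` has already been called when it runs against the real environment.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variables listed as required above; TAVILY_API_KEY is optional.
REQUIRED_VARS = ("DASHSCOPE_API_KEY", "NEO4J_PASSWORD")


def missing_env_vars(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars(REQUIRED_VARS)` after loading `.env` returns an empty list when everything is in place.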


## Step 1: Initialize the Pack

⚠️ `force_recreate=True` will clear existing Qdrant collections and Neo4j data.
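
Because this flag is destructive, one option is to make it opt-in through an environment variable rather than hard-coding `True`. The sketch below assumes a variable named `RAG_FORCE_RECREATE`, which is a made-up name for illustration and is not read by the pack itself. `env_flag` is likewise a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def env_flag(
    name: str,
    default: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

You could then pass `force_recreate=env_flag("RAG_FORCE_RECREATE")` when constructing the pack, so existing data is wiped only when you ask for it explicitly.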


In [None]:
import os
import sys
import nest_asyncio
from dotenv import load_dotenv
from llama_index.packs.multimodal_agentic_rag import MultimodalAgenticRAGPack
nest_asyncio.apply()

load_dotenv()

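# Optional: route Hugging Face downloads through a mirror.
# Remove this line to use the default Hugging Face endpoint.
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'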

In [None]:
# Initialize the Pack
# Ensure DASHSCOPE_API_KEY is in your .env
if not os.getenv('DASHSCOPE_API_KEY'):
    raise ValueError('❌ DASHSCOPE_API_KEY not found in environment.')

pack = MultimodalAgenticRAGPack(
    dashscope_api_key=os.getenv('DASHSCOPE_API_KEY') or '',
    qdrant_url='http://localhost:6333',
    neo4j_url='bolt://localhost:7687',
    neo4j_password=os.getenv('NEO4J_PASSWORD', 'password'),
    tavily_api_key=os.getenv('TAVILY_API_KEY'),
    data_dir='./data_test',
    force_recreate=True
)

print('üöÄ Pack initialized successfully')


## Step 2: Ingest a PDF

Ensure `test.pdf` exists in the root directory.
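
Beyond checking that the path exists, you may want to confirm the file is a PDF before starting a long ingestion run. The sketch below checks for the `%PDF-` header at the very start of the file. `looks_like_pdf` is a hypothetical helper, and the check is deliberately strict: it would reject a file whose header is preceded by stray bytes.

```python
from pathlib import Path
from typing import Union


def looks_like_pdf(path: Union[str, Path]) -> bool:
    """Return True if `path` is a regular file that starts with the PDF header."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

For example, `looks_like_pdf("test.pdf")` returns `False` for a missing file or for a renamed text file.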


In [None]:
# 2. Prepare & Run Ingestion
pdf_path = "test.pdf"

if not os.path.exists(pdf_path):
    # Stop here: ingestion cannot proceed without the file.
    raise FileNotFoundError(f"❌ '{pdf_path}' not found.")

print(f"\n🚀 [2/3] Starting Ingestion: {pdf_path}")
# Only needs to be run once.
await pack.run_ingestion(pdf_path)
print("✨ Ingestion complete!")

## Step 3: Query the System


In [None]:
query = 'What are the core technologies discussed in this document?'
print(f'❓ Query: {query}')

response = await pack.run(query)


## Step 4: Stream Final Response + Inspect References


In [None]:
# AI Response (Streaming support)
if isinstance(response, dict):
    print("🤖 AI Answer:")
    print("-" * 30)

    async for chunk in response.get("final_response"):
        print(chunk.delta or "", end="", flush=True)
    
    # Inspecting Visual Metadata
    print("\n\n📚 Visual Evidence (BBox Metadata):")
    print("-" * 30)
    retrieved_nodes = response.get("retrieved_nodes", [])

    for i, node in enumerate(retrieved_nodes):
        if isinstance(node, dict):
            meta = node.get("metadata", {})
            score = node.get("score", 0.0)
            text = node.get("text", "")
        else:
            meta = node.metadata if hasattr(node, 'metadata') else node.node.metadata
            score = getattr(node, 'score', 0.0)
            text = node.get_content() if hasattr(node, 'get_content') else ""

        if "bbox" in meta:
            print(f"[{i+1}] Page {meta.get('page_label', 'N/A')}: BBox found ✅")
            print(f"    Score: {score:.4f}")
            print(f"    Coordinates: {meta['bbox']}")
        else:
            print(f"[{i+1}] Page {meta.get('page_label', 'N/A')}: No BBox metadata.")
else:
    # Fallback if response is just a string
    print(response)
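
If you need the full answer as a single string (for logging or evaluation) instead of printing it chunk by chunk, the streaming loop above can be wrapped in a small collector. This sketch assumes, as the cell above does, that each chunk exposes a `delta` attribute that may be `None`. `collect_stream` and `fake_stream` are hypothetical names used only for illustration.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator


async def collect_stream(chunks: AsyncIterator) -> str:
    """Join the `delta` of every streamed chunk into one string."""
    parts = []
    async for chunk in chunks:
        # Treat a missing or None delta as an empty string.
        parts.append(getattr(chunk, "delta", None) or "")
    return "".join(parts)


async def fake_stream():
    """Stand-in for a streamed response, including one empty delta."""
    for piece in ("Qdrant ", None, "and Neo4j"):
        yield SimpleNamespace(delta=piece)
```

In the notebook this would be used as `answer = await collect_stream(response["final_response"])`. A stream can typically be consumed only once, so choose between printing it live and collecting it.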