# Construction RAG - Full Pipeline Demo

This notebook demonstrates the complete pipeline with:
- Batch processing of multiple drawings
- LLM-powered summaries
- Question answering

## Prerequisites

Set your OpenRouter API key for LLM features:
```bash
export OPENROUTER_API_KEY="your-api-key"
```

In [None]:
import os
from construction_rag import ConstructionRAGPipeline

# Check if API key is set
api_key = os.environ.get("OPENROUTER_API_KEY")
print(f"OpenRouter API key: {'Set ✓' if api_key else 'Not set ✗'}")

## 1. Initialize Pipeline with LLM Support

In [None]:
pipeline = ConstructionRAGPipeline(
    persist_directory="./full_pipeline_db",
    enable_summaries=True,  # Enable LLM summaries
    llm_model="openai/gpt-4o-mini",  # Fast and cost-effective
    cluster_eps=0.02,  # DBSCAN epsilon
    cluster_min_samples=2
)

print(f"LLM enabled: {pipeline.llm is not None}")
if pipeline.llm:
    print(f"LLM model: {pipeline.llm.model}")
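
To make the `cluster_eps` and `cluster_min_samples` settings more concrete, here is a toy, pure-Python DBSCAN sketch. It assumes the pipeline clusters detected elements by position in normalized (0 to 1) page coordinates, which this notebook does not confirm. The `dbscan` helper and the sample points are illustrative only and are not part of `construction_rag`.

```python
import math

def dbscan(points, eps, min_samples):
    """Toy O(n^2) DBSCAN: returns a cluster id per point, or -1 for noise."""
    def neighbors(i):
        # A point counts as its own neighbor, matching the usual definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reclaimed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Hypothetical element centers in normalized page coordinates
points = [(0.10, 0.10), (0.11, 0.10), (0.12, 0.11),
          (0.80, 0.80), (0.81, 0.81), (0.50, 0.20)]
labels = dbscan(points, eps=0.02, min_samples=2)
print(labels)
```

With these values, elements within 2% of the page dimension of each other chain into one group, and an isolated element is labeled as noise. A larger `eps` merges more distant elements; a larger `min_samples` requires denser groups.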

## 2. Batch Process Multiple Drawings

In [None]:
# Get all sample images
import glob

image_paths = sorted(glob.glob("sample_images/*.jpg"))  # sorted for a reproducible order
print(f"Found {len(image_paths)} sample images:")
for path in image_paths:
    print(f"  - {os.path.basename(path)}")

In [None]:
# Process all images
results = pipeline.process_batch(image_paths, verbose=True)

In [None]:
# Summary of processing
successful = sum(1 for r in results if r.success)
total_chunks = sum(len(r.chunks) for r in results)
total_time = sum(r.processing_time for r in results)
# Guard against an empty batch (e.g. no images matched the glob pattern)
avg_time = total_time / len(results) if results else 0.0

print("\nProcessing Summary:")
print(f"  Images processed: {successful}/{len(results)}")
print(f"  Total chunks: {total_chunks}")
print(f"  Total time: {total_time:.1f}s")
print(f"  Average time/image: {avg_time:.1f}s")

## 3. Examine LLM-Generated Summaries

In [None]:
# Show chunks with their summaries
all_chunks = [c for r in results for c in r.chunks]

print("Sample chunks with summaries:\n")
for chunk in all_chunks[:5]:
    print(f"[{chunk.chunk_type}] {chunk.chunk_id}")
    print(f"  Content: {chunk.content[:60].replace(chr(10), ' ')}...")
    if chunk.summary:
        print(f"  Summary: {chunk.summary}")
    print()

## 4. Semantic Search

In [None]:
# Search for specific content
queries = [
    "door schedule fire rating",
    "general notes",
    "project information",
    "floor plan layout"
]

for query in queries:
    print(f"\n{'='*50}")
    print(f"Query: '{query}'")
    print(f"{'='*50}")
    
    # Use a separate name so the batch `results` from section 2 are not overwritten
    hits = pipeline.query(query, n_results=2)
    for r in hits:
        print(f"\n[{r.metadata['chunk_type']}] Score: {r.relevance_score:.3f}")
        print(f"Source: {r.metadata.get('source_image', 'Unknown')}")
        summary = r.metadata.get('summary', '')
        if summary:
            print(f"Summary: {summary}")
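
As a rough illustration of what semantic search does conceptually, the sketch below ranks a few toy vectors by cosine similarity to a query vector. It is an assumption that the pipeline scores results this way; how `relevance_score` is actually computed is not shown in this notebook. The `cosine_similarity` and `top_k` helpers and the three-dimensional vectors are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, indexed, k=2):
    """Return the k (score, chunk_id) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in indexed.items()]
    return sorted(scored, reverse=True)[:k]

# Toy 3-dimensional "embeddings"; real embedding vectors are much longer
toy_index = {
    "door_schedule": [0.9, 0.1, 0.0],
    "general_notes": [0.1, 0.9, 0.1],
    "title_block": [0.0, 0.2, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], toy_index, k=2)
for score, chunk_id in hits:
    print(f"{chunk_id}: {score:.3f}")
```

The query vector points mostly along the first axis, so the chunk whose vector does the same ranks first, regardless of vector length.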

## 5. Question Answering with LLM

Use the `ask()` method to get LLM-generated answers grounded in the retrieved content.

In [None]:
# Ask questions about the drawings
questions = [
    "What types of doors are mentioned in the drawings?",
    "What are the general construction notes?",
    "What is the project name?"
]

for question in questions:
    print(f"\n{'='*50}")
    print(f"Q: {question}")
    print(f"{'='*50}")
    
    try:
        answer = pipeline.ask(question)
        print(f"\nA: {answer}")
    except ValueError as e:
        print(f"\nError: {e}")
        print("(LLM not available - set OPENROUTER_API_KEY)")
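
One common way to ground an answer in retrieved content is to place the retrieved excerpts in the prompt and instruct the model to answer only from them. The sketch below shows that pattern; it is not the actual implementation of `ask()`, and the `build_grounded_prompt` helper, the dictionary keys, and the sample excerpt are all hypothetical.

```python
def build_grounded_prompt(question, contexts, max_chars=500):
    """Assemble a prompt that restricts the model to the supplied excerpts."""
    blocks = []
    for i, ctx in enumerate(contexts, start=1):
        # Collapse whitespace and truncate so long chunks do not dominate the prompt
        text = " ".join(ctx["content"].split())[:max_chars]
        blocks.append(f"[{i}] ({ctx['source']}) {text}")
    context_text = "\n".join(blocks) if blocks else "(no excerpts retrieved)"
    return (
        "Answer using only the numbered excerpts below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context_text}\n\nQuestion: {question}\nAnswer:"
    )

# Made-up excerpt for illustration; not taken from the sample drawings
prompt = build_grounded_prompt(
    "What is the project name?",
    [{"source": "sheet_A1.jpg", "content": "PROJECT:\nExample Office Fit-Out"}],
)
print(prompt)
```

Numbering the excerpts and naming their source image makes it easier to trace an answer back to a specific drawing.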

## 6. Statistics

In [None]:
stats = pipeline.get_stats()

print("Final Statistics:")
print(f"  Total chunks indexed: {stats['total_chunks']}")
print(f"  Embedding model: {stats['embedding_model']}")
print(f"  Embedding dimension: {stats['embedding_dimension']}")
print(f"  LLM model: {stats.get('llm_model', 'None')}")
print(f"\n  Chunks by type:")
for chunk_type, count in stats['chunks_by_type'].items():
    print(f"    {chunk_type}: {count}")

## 7. Clean Up

In [None]:
# Uncomment to clear the database
# pipeline.clear()
# print("Database cleared!")