# Construction RAG - Basic Usage

This notebook walks through basic usage of the Construction RAG library: processing a construction drawing, inspecting the extracted chunks, and running semantic search over the indexed content.

## Prerequisites

```bash
pip install construction-rag
```

For LLM features, set your OpenRouter API key:
```bash
export OPENROUTER_API_KEY="your-api-key"
```
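
If you prefer to decide at runtime whether LLM features are available, a minimal sketch like the one below checks the environment variable from Python. The helper name `has_openrouter_key` is hypothetical and not part of the library; the resulting boolean could be passed as `enable_summaries` when the pipeline is created.

```python
import os


def has_openrouter_key(env=None):
    """Return True if OPENROUTER_API_KEY is set to a non-empty value.

    `env` defaults to os.environ; pass a dict to test without touching
    the real environment. Illustrative helper, not library API.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENROUTER_API_KEY", "").strip())


enable_summaries = has_openrouter_key()
print(f"LLM summaries available: {enable_summaries}")
```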

In [None]:
# Import the main pipeline
from construction_rag import ConstructionRAGPipeline

## 1. Initialize the Pipeline

Create a pipeline instance. If you don't have an OpenRouter API key, pass `enable_summaries=False` so that no key is required.

In [None]:
# Initialize with summaries disabled (no API key required)
pipeline = ConstructionRAGPipeline(
    persist_directory="./demo_db",
    enable_summaries=False  # Set to True if you have OPENROUTER_API_KEY
)

print("Pipeline initialized!")
print(f"Database location: {pipeline.persist_directory}")

## 2. Process a Construction Drawing

Process a sample construction drawing image. Calling `pipeline.process` will:
1. Run IBM Docling for layout detection
2. Apply DBSCAN clustering to group text blocks
3. Index the chunks in ChromaDB
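
To give a feel for the clustering step, here is a small pure-Python DBSCAN run on made-up text block centers. This is only a sketch of the general algorithm: the features, distance metric, and parameters the library actually uses are not shown in this notebook, so the `(x, y)` pixel coordinates, `eps=25`, and `min_samples=2` below are assumptions chosen for illustration.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, p in enumerate(points) if dist(points[i], p) <= eps]


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id, or NOISE (-1) if it joins none."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical text block centers in pixels (not taken from a real drawing).
centers = [
    (100, 100), (110, 105), (105, 115),  # three blocks close together
    (400, 300), (410, 310), (405, 295),  # a second tight group
    (800, 50),                           # an isolated block
]
labels = dbscan(centers, eps=25, min_samples=2)
print(labels)  # → [0, 0, 0, 1, 1, 1, -1]
```

Blocks that share a label would be merged into one chunk, while the isolated block is marked as noise. A density-based method suits drawings because the number of groups per sheet is not known in advance.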

In [None]:
# Process a sample image
result = pipeline.process("sample_images/sample_floor_plan_1.jpg")

print("\nProcessing Result:")
print(f"  Source: {result.source_image}")
print(f"  Success: {result.success}")
print(f"  Chunks extracted: {len(result.chunks)}")
print(f"  Processing time: {result.processing_time:.2f}s")

## 3. Examine the Extracted Chunks

Let's look at the different types of chunks that were extracted.

In [None]:
# Count chunks by type
from collections import Counter

type_counts = Counter(c.chunk_type for c in result.chunks)
print("Chunks by type:")
# most_common() lists the most frequent chunk types first
for chunk_type, count in type_counts.most_common():
    print(f"  {chunk_type}: {count}")

In [None]:
# Show a few example chunks
print("\nSample chunks:")
for chunk in result.chunks[:5]:
    content_preview = chunk.content[:80].replace('\n', ' ')
    if len(chunk.content) > 80:
        content_preview += "..."
    print(f"\n[{chunk.chunk_type}] {chunk.chunk_id}")
    print(f"  Content: {content_preview}")
    print(f"  Confidence: {chunk.confidence:.2f}")

## 4. Query the Indexed Content

Now we can perform semantic search to find relevant content.
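
Semantic search generally works by embedding the query and the indexed text as vectors and ranking by vector similarity. The sketch below shows that idea with cosine similarity on toy three-dimensional vectors. The vectors and chunk texts are invented for illustration, and how the library actually computes `relevance_score` is not stated in this notebook, so treat cosine similarity here as an assumption.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k):
    """Rank (text, vector) pairs by similarity to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy "embeddings"; real models produce hundreds of dimensions.
indexed = [
    ("DOOR SCHEDULE: D1 900x2100", [0.9, 0.1, 0.0]),
    ("GENERAL NOTES: all dimensions in mm", [0.1, 0.9, 0.1]),
    ("PROJECT: Sample Office, Sheet A-101", [0.0, 0.2, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # stands in for the embedding of "door schedule"

for score, text in top_k(query_vec, indexed, k=2):
    print(f"{score:.3f}  {text}")
```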

In [None]:
# Perform a semantic search
query = "door schedule"
results = pipeline.query(query, n_results=3)

print(f"Query: '{query}'\n")
for i, r in enumerate(results, 1):
    content_preview = r.content[:100].replace('\n', ' ')
    if len(r.content) > 100:  # add an ellipsis only when the text was truncated
        content_preview += "..."
    print(f"{i}. [{r.metadata['chunk_type']}] Score: {r.relevance_score:.3f}")
    print(f"   {content_preview}")
    print()

In [None]:
# Try another query
query = "project information"
results = pipeline.query(query, n_results=3)

print(f"Query: '{query}'\n")
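for i, r in enumerate(results, 1):
    content_preview = r.content[:100].replace('\n', ' ')
    if len(r.content) > 100:  # add an ellipsis only when the text was truncated
        content_preview += "..."
    print(f"{i}. [{r.metadata['chunk_type']}] Score: {r.relevance_score:.3f}")
    print(f"   {content_preview}")
    print()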

## 5. Filter by Chunk Type

You can also filter queries to specific chunk types.
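
One way to think about a type filter is "keep only chunks of the requested type, then return the best `n_results` of those". The sketch below illustrates that ordering on plain dictionaries. The function `filter_then_rank` and the sample data are hypothetical, and whether the library filters before or after retrieval is not stated in this notebook.

```python
def filter_then_rank(scored_chunks, chunk_type, n_results):
    """Keep chunks of one type, then return the n best by score."""
    matching = [c for c in scored_chunks if c["chunk_type"] == chunk_type]
    matching.sort(key=lambda c: c["score"], reverse=True)
    return matching[:n_results]


# Invented scores and types, for illustration only.
scored_chunks = [
    {"chunk_type": "text", "score": 0.91, "content": "See door schedule"},
    {"chunk_type": "table", "score": 0.88, "content": "DOOR SCHEDULE"},
    {"chunk_type": "text", "score": 0.80, "content": "Schedule of finishes note"},
    {"chunk_type": "table", "score": 0.62, "content": "WINDOW SCHEDULE"},
]

tables = filter_then_rank(scored_chunks, chunk_type="table", n_results=2)
for chunk in tables:
    print(f"{chunk['score']:.2f}  {chunk['content']}")
```

Filtering before truncating matters: taking the top two results first and filtering afterwards would return only one table here, even though two are indexed.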

In [None]:
# Query only tables
results = pipeline.query("schedule", n_results=3, filter_type="table")

print("Tables matching 'schedule':")
for r in results:
    preview = r.content[:60].replace('\n', ' ')
    suffix = "..." if len(r.content) > 60 else ""  # ellipsis only if truncated
    print(f"  - {preview}{suffix}")

## 6. Check Statistics

Get statistics about the indexed content.

In [None]:
stats = pipeline.get_stats()

print("Pipeline Statistics:")
print(f"  Total chunks: {stats['total_chunks']}")
print(f"  Embedding model: {stats['embedding_model']}")
print(f"  LLM enabled: {stats['llm_enabled']}")
print("\n  Chunks by type:")
for chunk_type, count in stats['chunks_by_type'].items():
    print(f"    {chunk_type}: {count}")

## 7. Clean Up

Clear the database when done (optional).

In [None]:
# Uncomment to clear the database
# pipeline.clear()
# print("Database cleared!")

## Next Steps

- See `02_full_pipeline.ipynb` for processing multiple drawings and using LLM features
- See `03_evaluation.ipynb` for evaluating RAG quality with RAGAS metrics