# Demo 5: Multi-Collection RAG with S3 Vectors
Pattern: Enterprise RAG with Multiple Vector Collections

**Features:**
- Multiple S3 Vector indexes (Technical Docs, FAQs, Memos)
- Metadata-based filtering within each collection
- Collection-aware prompt that labels each source by its collection
- Keyword-based query routing to the most relevant collections

In [1]:
import boto3
import json
import time
from typing import List, Dict, Optional
from datetime import datetime, timedelta

In [2]:
# Initialize clients
bedrock_runtime = boto3.client('bedrock-runtime')
s3vectors = boto3.client('s3vectors')

# Configuration
VECTOR_BUCKET = f"multi-collection-vectors-{int(time.time())}"
EMBEDDING_MODEL = "amazon.titan-embed-text-v1"
GENERATION_MODEL = "amazon.nova-pro-v1:0"

# Collection configurations
COLLECTIONS = {
    "technical_docs": {
        "index_name": "technical-docs-index",
        "description": "Technical documentation and API references"
    },
    "faqs": {
        "index_name": "faqs-index", 
        "description": "Frequently asked questions and support content"
    },
    "memos": {
        "index_name": "memos-index",
        "description": "Internal memos and policy documents"
    }
}

In [3]:
# Sample documents for different collections
documents = {
    "technical_docs": [
        {
            "id": "api_auth",
            "title": "API Authentication Guide",
            "content": "Use Bearer tokens for API authentication. Include Authorization header with 'Bearer <token>'. Tokens expire after 24 hours.",
            "metadata": {"category": "security", "version": "v2.1", "last_updated": "2024-01-15"}
        },
        {
            "id": "rate_limits",
            "title": "API Rate Limiting", 
            "content": "API rate limits: 1000 requests/hour for standard tier, 5000 requests/hour for premium. Use exponential backoff for retries.",
            "metadata": {"category": "performance", "version": "v2.1", "last_updated": "2024-01-10"}
        },
        {
            "id": "webhooks",
            "title": "Webhook Configuration",
            "content": "Configure webhooks for real-time notifications. Endpoint must return 200 status. Retry policy: 3 attempts with exponential backoff.",
            "metadata": {"category": "integration", "version": "v2.0", "last_updated": "2024-01-05"}
        }
    ],
    "faqs": [
        {
            "id": "billing_faq",
            "title": "Billing Questions",
            "content": "Q: How is billing calculated? A: Billing is based on API calls and storage usage. Premium features have additional costs.",
            "metadata": {"category": "billing", "priority": "high", "last_updated": "2024-01-20"}
        },
        {
            "id": "support_faq",
            "title": "Support Process",
            "content": "Q: How to get support? A: Submit tickets through portal. Response time: 24h standard, 4h premium, 1h enterprise.",
            "metadata": {"category": "support", "priority": "high", "last_updated": "2024-01-18"}
        },
        {
            "id": "integration_faq",
            "title": "Integration Help",
            "content": "Q: Common integration issues? A: Check API keys, verify endpoints, ensure proper headers. Use sandbox for testing.",
            "metadata": {"category": "integration", "priority": "medium", "last_updated": "2024-01-12"}
        }
    ],
    "memos": [
        {
            "id": "security_policy",
            "title": "Security Policy Update",
            "content": "New security policy effective Feb 1st: MFA required for all admin accounts. Password complexity increased. Regular security audits.",
            "metadata": {"department": "security", "confidential": "internal", "effective_date": "2024-02-01"}
        },
        {
            "id": "remote_work",
            "title": "Remote Work Guidelines",
            "content": "Updated remote work policy: 3 days office, 2 days remote maximum. VPN required for all remote connections. Equipment provided.",
            "metadata": {"department": "hr", "confidential": "internal", "effective_date": "2024-01-15"}
        },
        {
            "id": "budget_memo",
            "title": "Q1 Budget Allocation",
            "content": "Q1 budget approved: 40% engineering, 25% marketing, 20% sales, 15% operations. Monthly reviews scheduled.",
            "metadata": {"department": "finance", "confidential": "restricted", "effective_date": "2024-01-01"}
        }
    ]
}

print(f"Loaded documents for {len(COLLECTIONS)} collections")
for collection, docs in documents.items():
    print(f"  {collection}: {len(docs)} documents")

Loaded documents for 3 collections
  technical_docs: 3 documents
  faqs: 3 documents
  memos: 3 documents


In [7]:
# Create S3 Vector bucket and indexes for each collection
s3vectors.create_vector_bucket(vectorBucketName=VECTOR_BUCKET)
print(f"Created vector bucket: {VECTOR_BUCKET}")

# Create vector index for each collection
for collection_name, config in COLLECTIONS.items():
    s3vectors.create_index(
        vectorBucketName=VECTOR_BUCKET,
        indexName=config["index_name"],
        dataType="float32",
        dimension=1536,  # Output size of amazon.titan-embed-text-v1
        distanceMetric="cosine"
    )
    print(f"Created index: {config['index_name']} for {collection_name}")

Created vector bucket: multi-collection-vectors-1767769436
Created index: technical-docs-index for technical_docs
Created index: faqs-index for faqs
Created index: memos-index for memos
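
The `dimension` passed to `create_index` has to match the length of the vectors the embedding model returns (1536 in this notebook). A small guard like the following can catch a mismatch before `put_vectors` is called. `check_dimension` and `INDEX_DIMENSION` are hypothetical names used only for this sketch.

```python
from typing import List

INDEX_DIMENSION = 1536  # Same value the notebook passes to create_index.


def check_dimension(embedding: List[float], expected: int = INDEX_DIMENSION) -> List[float]:
    """Return the embedding unchanged, or raise if its length is unexpected."""
    if len(embedding) != expected:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, index expects {expected}"
        )
    return embedding


# A vector of the right size passes through; a shorter one is rejected.
ok_vector = check_dimension([0.0] * 1536)
try:
    check_dimension([0.0] * 1024)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
print(len(ok_vector), mismatch_detected)
```

In the ingest loop this could wrap the call as `check_dimension(get_embedding(doc["content"]))`.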


In [9]:
def get_embedding(text: str) -> List[float]:
    """Get embedding using Titan model"""
    response = bedrock_runtime.invoke_model(
        modelId=EMBEDDING_MODEL,
        body=json.dumps({"inputText": text})
    )
    return json.loads(response['body'].read())['embedding']

# Generate embeddings and store in respective S3 Vector indexes
print("Generating embeddings and storing in S3 Vector collections...")

for collection_name, docs in documents.items():
    index_name = COLLECTIONS[collection_name]["index_name"]
    vectors_to_put = []
    
    for doc in docs:
        embedding = get_embedding(doc["content"])
        
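        # Build flat metadata for S3 Vectors; title and content are stored
        # alongside the vector so query results can return them directly
        flat_metadata = {
            "title": doc["title"],
            "content": doc["content"],
            "collection": collection_name
        }
        
        # Copy document metadata under a "meta_" prefix; values are cast to
        # strings, so filters on these keys are exact string matches
        for key, value in doc["metadata"].items():
            flat_metadata[f"meta_{key}"] = str(value)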
        
        vectors_to_put.append({
            "key": doc["id"],
            "data": {'float32': embedding},
            "metadata": flat_metadata
        })
        print(f"Prepared vector for {doc['id']} in {collection_name}")
        time.sleep(0.1)
    
    # Batch insert vectors for this collection
    s3vectors.put_vectors(
        vectorBucketName=VECTOR_BUCKET,
        indexName=index_name,
        vectors=vectors_to_put
    )
    print(f"Stored {len(vectors_to_put)} vectors in {collection_name} collection\n")

print("All collections populated with vectors")

Generating embeddings and storing in S3 Vector collections...
Prepared vector for api_auth in technical_docs
Prepared vector for rate_limits in technical_docs
Prepared vector for webhooks in technical_docs
Stored 3 vectors in technical_docs collection

Prepared vector for billing_faq in faqs
Prepared vector for support_faq in faqs
Prepared vector for integration_faq in faqs
Stored 3 vectors in faqs collection

Prepared vector for security_policy in memos
Prepared vector for remote_work in memos
Prepared vector for budget_memo in memos
Stored 3 vectors in memos collection

All collections populated with vectors
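
The cell above sends each collection's vectors in a single `put_vectors` call, which is fine for three documents. For larger collections the request would likely need to be split. The sketch below shows one way to chunk the list. The helper name `chunked` and the default batch size of 500 are assumptions for illustration; check the current S3 Vectors quotas for the actual per-request limit.

```python
from typing import Dict, Iterator, List


def chunked(items: List[Dict], batch_size: int = 500) -> Iterator[List[Dict]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Stand-in vectors; in the notebook this would be `vectors_to_put`.
example_vectors = [{"key": f"doc_{i}"} for i in range(7)]
batches = list(chunked(example_vectors, batch_size=3))
print([len(batch) for batch in batches])
```

Each batch would then be passed as the `vectors` argument of its own `s3vectors.put_vectors` call.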


In [23]:
def route_query_to_collections(query: str) -> List[str]:
    """Determine which collections to search based on query content"""
    query_lower = query.lower()
    
    # Keywords for each collection type
    collection_keywords = {
        "technical_docs": ["api", "authentication", "rate limit", "webhook", "integration", "endpoint", "token"],
        "faqs": ["how to", "question", "help", "support", "billing", "cost", "price", "faq"],
        "memos": ["policy", "memo", "guideline", "budget", "security", "remote work", "internal"]
    }
    
    # Score each collection by keyword hits. Matching is by substring, so a
    # short keyword such as "api" also matches inside longer words.
    collection_scores = {}
    for collection, keywords in collection_keywords.items():
        score = sum(1 for keyword in keywords if keyword in query_lower)
        if score > 0:
            collection_scores[collection] = score
    
    # If no specific matches, search all collections
    if not collection_scores:
        return list(COLLECTIONS.keys())
    
    # Return collections sorted by relevance score
    return sorted(collection_scores.keys(), key=lambda x: collection_scores[x], reverse=True)

def search_collection(query: str, collection_name: str, top_k: int = 5, metadata_filter: Optional[Dict] = None) -> List[Dict]:
    """Search a specific collection with optional metadata filtering"""
    query_embedding = get_embedding(query)
    index_name = COLLECTIONS[collection_name]["index_name"]
    
    # Build query parameters
    query_params = {
        "vectorBucketName": VECTOR_BUCKET,
        "indexName": index_name,
        "queryVector": {'float32': query_embedding},
        "topK": top_k,
        "returnMetadata": True,
        "returnDistance": True
    }
    
    # Add metadata filter if provided
    if metadata_filter:
        query_params["filter"] = metadata_filter
    
    response = s3vectors.query_vectors(**query_params)
    
    results = []
    for result in response['vectors']:
        results.append({
            "doc_id": result['key'],
            "collection": collection_name,
            "title": result['metadata']['title'],
            "content": result['metadata']['content'],
            "metadata": {k.replace('meta_', ''): v for k, v in result['metadata'].items() if k.startswith('meta_')},
            "score": 1 - result['distance']  # Convert distance to similarity
        })
    
    return results
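
`route_query_to_collections` uses substring checks, so a keyword such as "api" also matches inside unrelated words like "rapid". A possible refinement, sketched here with a hypothetical `score_collections` helper and a trimmed keyword table, is to match on word boundaries while allowing an optional plural "s". This illustrates the idea and is not the notebook's routing logic.

```python
import re
from typing import Dict, List

# Trimmed copy of the keyword table from the routing cell (illustrative only).
COLLECTION_KEYWORDS: Dict[str, List[str]] = {
    "technical_docs": ["api", "rate limit", "webhook", "token"],
    "faqs": ["how to", "support", "billing"],
    "memos": ["policy", "budget", "security"],
}


def score_collections(query: str) -> Dict[str, int]:
    """Count whole-word keyword hits per collection (optional trailing 's')."""
    query_lower = query.lower()
    scores: Dict[str, int] = {}
    for collection, keywords in COLLECTION_KEYWORDS.items():
        hits = sum(
            1 for keyword in keywords
            if re.search(rf"\b{re.escape(keyword)}s?\b", query_lower)
        )
        if hits:
            scores[collection] = hits
    return scores


print(score_collections("How to handle API rate limits?"))
print(score_collections("We need rapid onboarding"))
```

With this approach the second query produces no hits, so the existing fallback of searching all collections would apply.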

In [16]:
def multi_collection_search(query: str, top_k_per_collection: int = 3, metadata_filters: Optional[Dict] = None) -> Dict:
    """Search across multiple collections with smart routing"""
    
    # Route query to relevant collections
    target_collections = route_query_to_collections(query)
    
    print(f"Query: {query}")
    print(f"Routing to collections: {target_collections}\n")
    
    all_results = []
    collection_results = {}
    
    for collection in target_collections:
        # Get collection-specific metadata filter if provided
        collection_filter = metadata_filters.get(collection) if metadata_filters else None
        
        results = search_collection(query, collection, top_k_per_collection, collection_filter)
        collection_results[collection] = results
        all_results.extend(results)
        
        print(f"{collection.upper()} Results:")
        for i, result in enumerate(results, 1):
            print(f"  {i}. {result['title']} (score: {result['score']:.3f})")
        print()
    
    # Sort all results by score
    all_results.sort(key=lambda x: x['score'], reverse=True)
    
    return {
        "query": query,
        "target_collections": target_collections,
        "collection_results": collection_results,
        "all_results": all_results[:10]  # Top 10 overall
    }
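
Because `search_collection` calls `get_embedding(query)` itself, a query routed to several collections is embedded once per collection. One way to avoid the repeated Bedrock calls is to memoize the embedding function. The sketch below uses a stand-in `fake_embedding` in place of the real Bedrock call, so the function names and the toy vector are assumptions.

```python
from functools import lru_cache
from typing import Tuple

call_count = 0  # Tracks how often the (stand-in) embedding model is invoked.


def fake_embedding(text: str) -> Tuple[float, ...]:
    """Stand-in for the notebook's get_embedding; returns a toy vector."""
    global call_count
    call_count += 1
    return (float(len(text)), float(text.count(" ")))


@lru_cache(maxsize=256)
def cached_embedding(text: str) -> Tuple[float, ...]:
    """Return a cached embedding; a tuple keeps the cached value immutable."""
    return fake_embedding(text)


query = "How to handle API rate limits and integration issues?"
for _collection in ("technical_docs", "faqs"):
    vector = cached_embedding(query)  # The second iteration is served from cache.

print(call_count)
```

When building the request, the cached tuple can be converted back with `list(cached_embedding(query))`.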

In [17]:
def generate_collection_aware_answer(query: str, search_results: Dict) -> str:
    """Generate answer with collection-aware context"""
    
    # Build context from top results across collections
    context_parts = []
    for result in search_results["all_results"][:5]:  # Top 5 results
        collection = result['collection'].replace('_', ' ').title()
        context_parts.append(
            f"[{collection}] {result['title']}: {result['content']}"
        )
    
    context = "\n\n".join(context_parts)
    collections_searched = ", ".join([c.replace('_', ' ').title() for c in search_results["target_collections"]])
    
    prompt = f"""Based on the following information from multiple enterprise collections, provide a comprehensive answer.

Collections searched: {collections_searched}

Context:
{context}

Question: {query}

Provide a detailed answer that:
1. Addresses the specific question
2. References relevant information from the appropriate collections
3. Mentions which collection type the information comes from when relevant

Answer:"""
    
    response = bedrock_runtime.invoke_model(
        modelId=GENERATION_MODEL,
        body=json.dumps({
            "messages": [{
                "role": "user",
                "content": [{"text": prompt}]
            }],
            "inferenceConfig": {
                "maxTokens": 500,
                "temperature": 0.1
            }
        })
    )
    
    result = json.loads(response['body'].read())
    return result['output']['message']['content'][0]['text']

In [18]:
def multi_collection_rag_pipeline(query: str, metadata_filters: Optional[Dict] = None) -> Dict:
    """Complete multi-collection RAG pipeline"""
    
    print(f"\n{'='*70}")
    print("MULTI-COLLECTION S3 VECTORS RAG PIPELINE")
    print(f"{'='*70}")
    
    # Step 1: Multi-collection search
    search_results = multi_collection_search(query, top_k_per_collection=3, metadata_filters=metadata_filters)
    
    # Step 2: Generate collection-aware answer
    print("Generating collection-aware answer...")
    answer = generate_collection_aware_answer(query, search_results)
    
    print("\nFINAL ANSWER:")
    print(answer)
    print(f"\n{'='*70}\n")
    
    return {
        "query": query,
        "search_results": search_results,
        "answer": answer
    }

In [21]:
# Test multi-collection RAG with different query types
test_queries = [
    "How do I authenticate with the API?",
    "What are the billing options and support response times?", 
    "What is the new security policy?",
    "How to handle API rate limits and integration issues?"
]

results = []
for query in test_queries:
    result = multi_collection_rag_pipeline(query)
    results.append(result)


MULTI-COLLECTION S3 VECTORS RAG PIPELINE
Query: How do I authenticate with the API?
Routing to collections: ['technical_docs']

TECHNICAL_DOCS Results:
  1. API Authentication Guide (score: 0.736)
  2. API Rate Limiting (score: 0.523)
  3. Webhook Configuration (score: 0.405)

Generating collection-aware answer...

FINAL ANSWER:
To authenticate with the API, you should use Bearer tokens as specified in the **API Authentication Guide** from the Technical Docs collection. Here’s a step-by-step guide on how to do this:

1. **Obtain a Bearer Token**:
   - You need to acquire a Bearer token, which is typically provided after you authenticate through some initial process (e.g., OAuth flow, API key, or another authentication mechanism). This token serves as your credential for subsequent API requests.

2. **Include the Authorization Header**:
   - For every API request you make, you must include an `Authorization` header. The value of this header should be `Bearer <token>`, where `<token>` i

In [24]:
# Test with metadata filtering
print("\nTesting with metadata filters...")

# Example: search only high-priority FAQs and v2.1 technical docs
metadata_filters = {
    "faqs": {"meta_priority": "high"},
    "technical_docs": {"meta_version": "v2.1"}
}

filtered_result = multi_collection_rag_pipeline(
    "How to get support and what are the latest API features?",
    metadata_filters=metadata_filters
)


Testing with metadata filters...

MULTI-COLLECTION S3 VECTORS RAG PIPELINE
Query: How to get support and what are the latest API features?
Routing to collections: ['faqs', 'technical_docs']

FAQS Results:
  1. Support Process (score: 0.539)
  2. Billing Questions (score: 0.365)

TECHNICAL_DOCS Results:
  1. API Rate Limiting (score: 0.515)
  2. API Authentication Guide (score: 0.458)

Generating collection-aware answer...

FINAL ANSWER:
Certainly! Here's a comprehensive answer based on the provided information from the multiple enterprise collections:

### How to Get Support

To get support, you need to submit tickets through the support portal. The response times vary depending on your support tier:
- **Standard Support:** The standard response time is 24 hours. (Source: [Faqs] Support Process)
- **Premium Support:** The premium response time is 4 hours. (Source: [Faqs] Support Process)
- **Enterprise Support:** The enterprise response time is 1 hour. (Source: [Faqs] Support Process)
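
The FAQ search above returns only two results even though `top_k_per_collection` is 3, because the filter excludes the medium-priority document. The following local sketch mimics that equality filter on the flattened metadata. `matching_keys` is a hypothetical helper and does not call S3 Vectors.

```python
from typing import Dict, List

# Flattened metadata for the three FAQ documents, as built by the ingest cell.
faq_metadata: Dict[str, Dict[str, str]] = {
    "billing_faq": {"meta_category": "billing", "meta_priority": "high"},
    "support_faq": {"meta_category": "support", "meta_priority": "high"},
    "integration_faq": {"meta_category": "integration", "meta_priority": "medium"},
}


def matching_keys(records: Dict[str, Dict[str, str]], flat_filter: Dict[str, str]) -> List[str]:
    """Return keys whose metadata equals every key/value pair in the filter."""
    return [
        key for key, metadata in records.items()
        if all(metadata.get(field) == value for field, value in flat_filter.items())
    ]


print(matching_keys(faq_metadata, {"meta_priority": "high"}))
```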

## Multi-Collection RAG Benefits with S3 Vectors

### S3 Vectors Multi-Collection Advantages:
✅ **Collection isolation**: Separate indexes for different content types  
✅ **Metadata filtering**: Rich filtering within each collection  
✅ **Smart routing**: Query-based collection selection  
✅ **Scalable architecture**: Each collection can scale independently  
✅ **Cost optimization**: Pay only for what you use per collection  

### Enterprise RAG Features:
✅ **Content organization**: Technical docs, FAQs, memos in separate collections  
✅ **Access control**: Separate indexes allow per-collection permissions (not configured in this demo)  
✅ **Metadata-rich search**: Filter by category, priority, department, etc.  
✅ **Collection-aware responses**: Context includes source collection info  

### Collection Types:
1. **Technical Docs**: API references, integration guides, technical specifications
2. **FAQs**: Support content, common questions, troubleshooting guides
3. **Memos**: Internal policies, announcements, confidential documents

### Smart Query Routing:
- **Keyword-based routing**: Automatically selects relevant collections
- **Multi-collection search**: Searches across multiple relevant collections
- **Fallback strategy**: Searches all collections if no specific match

### Metadata Filtering Examples:
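- **Priority filtering**: `{"meta_priority": "high"}` for high-priority content
- **Version filtering**: `{"meta_version": "v2.1"}` for a specific documentation version
- **Department filtering**: `{"meta_department": "security"}` for specific teams
- **Date filtering**: Match on `meta_last_updated` or `meta_effective_date`; this demo stores them as strings, so range comparisons would likely require storing a numeric value instead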

### Use Cases:
- **Enterprise knowledge management**: Organize content by type and access level
- **Customer support**: Route queries to appropriate knowledge collections
- **Internal documentation**: Separate public and confidential content
- **Multi-tenant applications**: Isolate data by customer or organization
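
The date-filtering idea above needs some care: the ingest cell stores every metadata value as a string, and range comparisons generally need numeric values. One option, sketched here, is to store dates as integers such as `20240115` and combine conditions with explicit operators. The helper names and the `meta_last_updated_int` key are hypothetical, and the `$and`, `$eq`, and `$gte` operator spellings are assumed from the S3 Vectors metadata-filtering documentation, so verify them before relying on this.

```python
from datetime import date
from typing import Any, Dict, List


def date_to_int(iso_date: str) -> int:
    """Convert 'YYYY-MM-DD' to an integer like 20240115 for numeric comparison."""
    parsed = date.fromisoformat(iso_date)
    return parsed.year * 10000 + parsed.month * 100 + parsed.day


def build_filter(equals: Dict[str, Any], updated_since: str) -> Dict[str, List[Dict[str, Any]]]:
    """Combine equality conditions with a lower bound on a numeric date field."""
    conditions: List[Dict[str, Any]] = [
        {key: {"$eq": value}} for key, value in equals.items()
    ]
    # Assumes the date was ingested as an integer under this (hypothetical) key.
    conditions.append({"meta_last_updated_int": {"$gte": date_to_int(updated_since)}})
    return {"$and": conditions}


recent_v21 = build_filter({"meta_version": "v2.1"}, updated_since="2024-01-12")
print(recent_v21)
```

The resulting dictionary could be passed per collection through `metadata_filters`, provided the ingest step also writes the numeric date field.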

In [25]:
# Routing summary for the test queries (no timings are measured)
print("MULTI-COLLECTION S3 VECTORS PERFORMANCE SUMMARY")
print("="*60)

for i, result in enumerate(results, 1):
    search_results = result["search_results"]
    collections_used = len(search_results["target_collections"])
    total_results = len(search_results["all_results"])
    
    print(f"Query {i}: {result['query'][:50]}...")
    print(f"  Collections searched: {collections_used}")
    print(f"  Total results: {total_results}")
    print(f"  Target collections: {', '.join(search_results['target_collections'])}")
    print()

print(f"Demo complete! S3 Vectors bucket: {VECTOR_BUCKET}")
print(f"Created {len(COLLECTIONS)} separate vector collections for enterprise RAG.")
print("Multi-collection architecture enables organized, scalable knowledge management.")

MULTI-COLLECTION S3 VECTORS PERFORMANCE SUMMARY
Query 1: How do I authenticate with the API?...
  Collections searched: 1
  Total results: 3
  Target collections: technical_docs

Query 2: What are the billing options and support response ...
  Collections searched: 1
  Total results: 3
  Target collections: faqs

Query 3: What is the new security policy?...
  Collections searched: 1
  Total results: 3
  Target collections: memos

Query 4: How to handle API rate limits and integration issu...
  Collections searched: 2
  Total results: 6
  Target collections: technical_docs, faqs

Demo complete! S3 Vectors bucket: multi-collection-vectors-1767769436
Created 3 separate vector collections for enterprise RAG.
Multi-collection architecture enables organized, scalable knowledge management.
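
Since `VECTOR_BUCKET` is timestamped, every run of the notebook leaves a new bucket behind. A cleanup step along these lines may be useful. It assumes the `s3vectors` client exposes `delete_index` and `delete_vector_bucket` with the same parameter names used for creation, so check the boto3 reference before running it. `cleanup_collections` is a hypothetical helper, and the recording stub exists only to show the call order without touching AWS.

```python
from typing import Any, Dict, List, Tuple


def cleanup_collections(client: Any, bucket: str, collections: Dict[str, Dict[str, str]]) -> None:
    """Delete every collection index, then the vector bucket itself."""
    for config in collections.values():
        client.delete_index(vectorBucketName=bucket, indexName=config["index_name"])
    client.delete_vector_bucket(vectorBucketName=bucket)


class RecordingClient:
    """Stub that records calls instead of contacting AWS (illustration only)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, str]]] = []

    def delete_index(self, **kwargs: str) -> None:
        self.calls.append(("delete_index", kwargs))

    def delete_vector_bucket(self, **kwargs: str) -> None:
        self.calls.append(("delete_vector_bucket", kwargs))


stub = RecordingClient()
cleanup_collections(stub, "example-bucket", {
    "faqs": {"index_name": "faqs-index"},
    "memos": {"index_name": "memos-index"},
})
print([name for name, _ in stub.calls])
```

In the notebook the equivalent call would be `cleanup_collections(s3vectors, VECTOR_BUCKET, COLLECTIONS)`.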
