# Test Unstructured Amazon Bedrock Knowledge Base

This notebook tests the unstructured Amazon Bedrock Knowledge Base created in the previous notebook using the `RetrieveAndGenerate` API.


## Setup and Prerequisites

### Prerequisites
* Completed `1.1-prerequisites-unstructured-kb.ipynb` notebook
* Unstructured Knowledge Base successfully created and ingested

Let's start by importing the required libraries and loading the Knowledge Base configuration:


Import required libraries for AWS service interaction and testing the Knowledge Base retrieval capabilities.

In [None]:
import os
import json
import time
import boto3
import logging
from datetime import datetime


Initialize the Bedrock Agent Runtime client used to query the Knowledge Base, and use STS to look up the AWS account ID.

In [None]:
# Initialize AWS clients

session = boto3.session.Session()
sts_client = boto3.client('sts')
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]

bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime")

print(f"AWS Region: {region}")
print(f"AWS Account ID: {account_id}")


## Load Unstructured Knowledge Base Configuration

Load the Knowledge Base ID and configuration from the prerequisite notebook.


In [None]:
# Load stored variables from prerequisite notebook
%store -r unstructured_kb_id
%store -r kb_region
%store -r data_bucket_name

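# Validate that every stored variable was loaded (the prints below use all three)
missing = [name for name in ("unstructured_kb_id", "kb_region", "data_bucket_name") if name not in globals()]
if missing:
    raise ValueError(f"Stored variables not found: {missing}. Please run 1.1-prerequisites-unstructured-kb.ipynb first.")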

print("Unstructured Knowledge Base Configuration:")
print("=" * 50)
print(f"KB ID: {unstructured_kb_id}")
print(f"Region: {kb_region}")
print(f"S3 Bucket: {data_bucket_name}")
print("=" * 50)
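
The prerequisites assume the Knowledge Base was created successfully. As a minimal sketch of how that could be checked before running queries, the hypothetical helper below (`kb_is_ready` is not part of the original notebook) calls `get_knowledge_base` on a `bedrock-agent` control-plane client and reports whether the status is `ACTIVE`. It does not confirm that ingestion finished; it only checks the Knowledge Base status.

```python
def kb_is_ready(bedrock_agent_client, kb_id):
    """Return True if the Knowledge Base reports an ACTIVE status.

    bedrock_agent_client is assumed to be a boto3 "bedrock-agent" client
    (the control-plane client, not "bedrock-agent-runtime").
    """
    response = bedrock_agent_client.get_knowledge_base(knowledgeBaseId=kb_id)
    status = response["knowledgeBase"]["status"]
    print(f"Knowledge Base {kb_id} status: {status}")
    return status == "ACTIVE"
```

In this notebook it could be called as `kb_is_ready(boto3.client("bedrock-agent", region_name=kb_region), unstructured_kb_id)`, which requires valid AWS credentials.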


## Setup Retrieval Configuration

Set up the foundation model and retrieval parameters.


In [None]:
# Test configuration
foundation_model = "anthropic.claude-haiku-4-5-20251001-v1:0"

# For metadata filtering, use Claude Haiku 4.5 (now supported for implicit filtering)
# AWS Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html#kb-test-config-implicit-filter
metadata_filter_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
max_results = 5

# Construct the full ARN for the global inference profile
# Global inference profiles require the full ARN format for Knowledge Base retrieval
foundation_model_arn = f"arn:aws:bedrock:{region}:{account_id}:inference-profile/global.{foundation_model}"

# Construct the foundation model ARN for metadata filtering (also using global inference profile)
# Use the same region as the runtime client so both ARNs stay consistent
metadata_filter_model_arn = f"arn:aws:bedrock:{region}:{account_id}:inference-profile/global.{metadata_filter_model}"

print(f"Using foundation model: {foundation_model}")
print(f"Foundation model ARN: {foundation_model_arn}")
print(f"Metadata filter model: {metadata_filter_model}")
print(f"Metadata filter model ARN: {metadata_filter_model_arn}")
print(f"Max results per query: {max_results}")
print("\n📖 Documentation: https://docs.aws.amazon.com/bedrock/latest/userguide/kb-test-config.html#kb-test-config-implicit-filter")
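
Both ARNs above follow the same pattern, so one way to keep them consistent is a small builder function. This is only a sketch: `build_inference_profile_arn` and its `scope` parameter are hypothetical names introduced here, and the ARN layout simply mirrors the f-strings used in the cell above.

```python
def build_inference_profile_arn(region, account_id, model_id, scope="global"):
    """Build an inference profile ARN in the same layout used in this notebook."""
    if not region:
        # boto3's Session.region_name is None when no default region is configured
        raise ValueError("No AWS region resolved; configure a default region or pass one explicitly.")
    return f"arn:aws:bedrock:{region}:{account_id}:inference-profile/{scope}.{model_id}"
```

For example, `build_inference_profile_arn(region, account_id, foundation_model)` would reproduce `foundation_model_arn`.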

## Helper Functions

Define helper functions to format and display results from the Knowledge Base APIs.


In [None]:
def run_unstructured_kb_query(query, use_metadata_filtering=False):
    """
    Test a query against the retrieve_and_generate API with optional metadata filtering

    Args:
        query (str): The query to test
        use_metadata_filtering (bool): Whether to enable autogenerated metadata filtering
    """

    try:
        # Base retrieval configuration
        retrieval_config = {
            "vectorSearchConfiguration": {
                "numberOfResults": max_results
            }
        }

        # Add autogenerated metadata filtering if requested
        if use_metadata_filtering:
            retrieval_config["vectorSearchConfiguration"]["implicitFilterConfiguration"] = {
                "metadataAttributes": [
                    {
                        "key": "product_type",
                        "type": "STRING",
                        "description": "The type of product being reviewed. Possible values include: 'cookbook', 'kitchenware', 'furniture', 'speaker', 'educational toy', 'board game', 'shirt', 'self-help'"
                    },
                    {
                        "key": "rating",
                        "type": "NUMBER",
                        "description": "The rating given by the customer, ranging from 1 to 5 stars"
                    },
                    {
                        "key": "created_at",
                        "type": "STRING",
                        "description": "The date when the review was created in YYYY-MM-DD format"
                    },
                    {
                        "key": "product_id",
                        "type": "STRING",
                        "description": "The unique identifier of the product being reviewed"
                    },
                    {
                        "key": "customer_id",
                        "type": "STRING",
                        "description": "The unique identifier of the customer who wrote the review"
                    }
                ],
                # Use the pre-configured foundation model ARN for metadata filtering
                "modelArn": metadata_filter_model_arn
            }

        # Display whether metadata filtering was used
        filter_status = "WITH METADATA FILTERING" if use_metadata_filtering else "WITHOUT METADATA FILTERING"
        print(f"\n{'='*20} RETRIEVE AND GENERATE API {filter_status} {'='*20}")

        # Call retrieve_and_generate API - use the global inference profile ARN for main generation
        rag_response = bedrock_agent_runtime_client.retrieve_and_generate(
            input={'text': query},
            retrieveAndGenerateConfiguration={
                'type': 'KNOWLEDGE_BASE',
                'knowledgeBaseConfiguration': {
                    'knowledgeBaseId': unstructured_kb_id,
                    "modelArn": foundation_model_arn,  # Use the constructed inference profile ARN
                    "retrievalConfiguration": retrieval_config,
                    "generationConfiguration": {
                        "promptTemplate": {
                            "textPromptTemplate": "You are a customer feedback analyst. Objectively summarize customer reviews including both positive and negative feedback.\n\n$search_results$\n$output_format_instructions$\nAnswer: $query$"
                        }
                    }
                }
            }
        )

        # Display results with enhanced metadata information
        display_rag_results_with_metadata(rag_response, query, use_metadata_filtering)

    except Exception as e:
        print(f"Error in retrieve_and_generate API: {e}")



def display_rag_results_with_metadata(response, query, metadata_filtering_used=False):
    """
    Display results from the retrieve_and_generate API with enhanced metadata information
    
    Args:
        response: The API response
        query (str): The original query
        metadata_filtering_used (bool): Whether metadata filtering was enabled
    """
    print(f"\nQUERY: {query}")
    print("-" * 40)
    
    if 'output' in response:
        print("GENERATED RESPONSE:")
        print("-" * 40)
        print(response['output']['text'])
        
        if 'citations' in response:
            citations = response['citations']
            print(f"\nCITATIONS ({len(citations)} found):")
            print("-" * 40)
            
            for i, citation in enumerate(citations, 1):
                print(f"\nCitation {i}:")
                
                if 'retrievedReferences' in citation:
                    refs = citation['retrievedReferences']
                    for j, ref in enumerate(refs, 1):
                        print(f"  Reference {j}:")
                        
                        # Display content
                        content = ref.get('content', {})
                        if 'text' in content:
                            text = content['text']
                            print(f"    Content: {text[:200]}..." if len(text) > 200 else f"    Content: {text}")
                        
                        # Display enhanced metadata information
                        if 'metadata' in ref:
                            metadata = ref['metadata']
                            print("    Metadata:")
                            for key, value in metadata.items():
                                print(f"      {key}: {value}")
                        
                        # Display source location
                        if 'location' in ref:
                            location = ref['location']
                            if 's3Location' in location:
                                s3_info = location['s3Location']
                                print(f"    Source: {s3_info.get('uri', 'N/A')}")
                        
                        # Display score if available
                        if 'score' in ref:
                            print(f"    Relevance Score: {ref['score']}")
        
    else:
        print("No response generated")
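
The helpers above only exercise `RetrieveAndGenerate`. When debugging retrieval quality it can help to look at the raw chunks and their relevance scores without generating an answer, which is what the `Retrieve` API of the same runtime client returns. The sketch below assumes a `bedrock-agent-runtime` client is passed in; `retrieve_chunks` is a hypothetical helper, not part of the original notebook.

```python
def retrieve_chunks(runtime_client, kb_id, query, number_of_results=5):
    """Return retrieved chunks (score, text, source URI, metadata) without generation."""
    response = runtime_client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": number_of_results}
        },
    )
    chunks = []
    for result in response.get("retrievalResults", []):
        chunks.append({
            "score": result.get("score"),
            "text": result.get("content", {}).get("text", ""),
            "uri": result.get("location", {}).get("s3Location", {}).get("uri"),
            "metadata": result.get("metadata", {}),
        })
    return chunks
```

In this notebook it could be called as `retrieve_chunks(bedrock_agent_runtime_client, unstructured_kb_id, "your question", max_results)`.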

## Test Queries for Unstructured Customer Review Data

Now let's test various types of qualitative queries that work well with our unstructured customer review data. We'll look at both regular queries and queries with autogenerated metadata filtering.

### Available Metadata Attributes:
- **product_type**: cookbook, kitchenware, furniture, speaker, educational toy, board game, shirt, self-help
- **rating**: 1-5 stars
- **created_at**: Review date in YYYY-MM-DD format
- **review_id**: Unique review identifier
- **product_id**: Unique product identifier
- **customer_id**: Unique customer identifier


### Query 1

In [None]:
# Without metadata filtering
run_unstructured_kb_query("What specific problems or benefits do customers describe when reviewing product_890?")

As shown above, when metadata filtering is not applied, the Knowledge Base may retrieve reviews for products other than `product_890` if their content is semantically similar. This happens because the retrieval process relies on semantic similarity rather than exact product ID matching. 
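
Rather than reading the printed citations by eye, the set of products actually cited can be collected from the response. The sketch below is a hypothetical helper (`cited_product_ids` is not part of the original notebook) and assumes each cited reference carries a `product_id` entry in its `metadata`, as described in the attribute list above. It needs the raw `retrieve_and_generate` response, which `run_unstructured_kb_query` prints but does not return.

```python
def cited_product_ids(response):
    """Collect the distinct product_id values found in citation metadata."""
    product_ids = set()
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            product_id = ref.get("metadata", {}).get("product_id")
            if product_id is not None:
                product_ids.add(product_id)
    return product_ids
```

If the returned set contains anything other than `product_890` for the query above, the answer drew on reviews of other products.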

Next, we'll look at how enabling **autogenerated metadata filtering** restricts the results to only those reviews that specifically reference `product_890` in their metadata, ensuring more precise answers and citations.


In [None]:
# With metadata filtering
run_unstructured_kb_query("What specific problems or benefits do customers describe when reviewing product_890?", use_metadata_filtering=True)
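
Implicit filtering relies on a model to derive the filter from the query text. For comparison, the same restriction can be written by hand with the `filter` field of `vectorSearchConfiguration`. This is a minimal sketch, assuming `product_id` is stored as a string attribute whose value matches the ID exactly; `build_product_filter_config` is a hypothetical helper, not part of the original notebook.

```python
def build_product_filter_config(product_id, number_of_results=5):
    """Build a retrieval configuration that only matches chunks for one product."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": number_of_results,
            # Explicit metadata filter: keep only chunks whose product_id equals the given value
            "filter": {"equals": {"key": "product_id", "value": product_id}},
        }
    }
```

The returned dictionary could be passed as `retrievalConfiguration` inside `knowledgeBaseConfiguration` in place of the `retrieval_config` built in `run_unstructured_kb_query`. An explicit filter is deterministic, while implicit filtering avoids hard-coding the product ID for each question.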


### Let's test a few more queries


### Query 2

In [None]:

# Test with metadata filtering - should focus specifically on cookbook reviews
run_unstructured_kb_query("What do customers think about cookbook quality and recipe clarity?", use_metadata_filtering=True)


### Query 3

In [None]:

# Test with metadata filtering - should focus on high-rated furniture reviews
run_unstructured_kb_query("What features make customers give 5-star ratings to furniture products?", use_metadata_filtering=True)


## Next Steps

If every query above returned a generated response with citations, and none printed `Error in retrieve_and_generate API`, your unstructured Knowledge Base is working correctly with both basic retrieval and autogenerated metadata filtering.

**Ready to continue?** Proceed to [**Lab 2**](../Lab%202%20-%20Structured%20KB/2.1-prerequisites-structured-kb.ipynb) to create a structured Knowledge Base that connects to Amazon Redshift for querying transactional data.
