# Bedrock Knowledge Base Retrieval and Generation with Metadata Filtering

### Description:
This notebook demonstrates how to query an Amazon Bedrock knowledge base with the `retrieve_and_generate` API, both with and without metadata filters, and how to extract the citations behind each answer. The steps are: define a query, retrieve a generated response, and print the citations (retrieved chunks) used to generate it.


![Metadata Filtering](./metadata_filtering.png)

## 1. Load Configuration Variables

In [None]:
# Load configuration variables from a JSON file to access knowledge base ID, account number, and guardrail info.
import json

with open("../Lab 1/variables.json", "r") as f:
    variables = json.load(f)

variables  # Display the loaded variables for confirmation
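
Later cells read `accountNumber`, `kbFixedChunk`, and `s3Bucket` from this file. Below is a minimal sketch of an early check for those keys, assuming the file is a flat JSON object. `missing_keys` is a hypothetical helper (not part of the lab code), and `sample` is a hand-written stand-in for the loaded file.

```python
# Keys that later cells in this notebook read from variables.json
REQUIRED_KEYS = ("accountNumber", "kbFixedChunk", "s3Bucket")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in config."""
    return [key for key in required if not config.get(key)]


# Placeholder values for illustration only (not real identifiers)
sample = {"accountNumber": "123456789012", "kbFixedChunk": "EXAMPLEKBID"}
print(missing_keys(sample))
```

In the notebook itself, the same check could be run as `missing_keys(variables)` right after loading the file.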

## 2. Set Up Required IDs and Model ARNs

In [None]:
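accountNumber = variables['accountNumber']
knowledge_base_id = variables['kbFixedChunk']
model_id = 'us.amazon.nova-pro-v1:0'  # ID of a cross-region inference profile
# Inference profile ARNs include the account number; the region must match the client region
model_arn = f"arn:aws:bedrock:us-west-2:{accountNumber}:inference-profile/{model_id}"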
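
The ARN above points at an inference profile, so it carries the account number. Foundation-model ARNs leave the account field empty. Here is a minimal sketch of a helper that builds either form. `build_model_arn` is a hypothetical name, and the account number is a placeholder. Whether a given model can be invoked directly or only through an inference profile depends on the model and region, which this sketch does not check.

```python
def build_model_arn(model_id, region, account_number=None):
    """Build a Bedrock ARN string.

    With an account number, the result is an inference-profile ARN;
    without one, it is a foundation-model ARN (empty account field).
    """
    if account_number:
        return f"arn:aws:bedrock:{region}:{account_number}:inference-profile/{model_id}"
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


# Placeholder account number for illustration only
print(build_model_arn("us.amazon.nova-pro-v1:0", "us-west-2", "123456789012"))
print(build_model_arn("amazon.nova-pro-v1:0", "us-west-2"))
```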


## 3. Configure Bedrock Client

In [None]:
import boto3
import json

# Configure the Bedrock Agent Runtime client (the region must match the knowledge base region)
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name="us-west-2")


## 4. Define Function to Retrieve and Generate Without Filters

In [None]:
def retrieve_and_generate_without_filter(query, knowledge_base_id, model_arn):
    """
    Retrieves and generates a response for the given query without any metadata filter.

    Parameters:
    - query (str): The input query.
    - knowledge_base_id (str): The ID of the knowledge base.
    - model_arn (str): The ARN of the model.

    Returns:
    - response: The response from the retrieve_and_generate method.
    """
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={
            "text": query
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                'knowledgeBaseId': knowledge_base_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5
                    }
                }
            }
        }
    )
    return response


## 5. Define Function to Retrieve and Generate With Filters

In [None]:
def retrieve_and_generate_with_filter(query, knowledge_base_id, model_arn, metadata_filter):
    """
    Retrieves and generates a response for the given query, restricting retrieval
    to chunks whose metadata matches the given filter.

    Parameters:
    - query (str): The input query.
    - knowledge_base_id (str): The ID of the knowledge base.
    - model_arn (str): The ARN of the model.
    - metadata_filter (dict): The metadata filter for the vector search configuration.

    Returns:
    - response: The response from the retrieve_and_generate method.
    """
    vector_search_config = {"numberOfResults": 5}
    # Attach the filter only when one is provided; an empty dict is not a valid filter
    if metadata_filter:
        vector_search_config["filter"] = metadata_filter

    response = bedrock_agent_runtime.retrieve_and_generate(
        input={
            "text": query
        },
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": vector_search_config
                }
            }
        }
    )
    return response



## 6. Define Query

In [None]:
query = "what was the % increase in sales?"

## 7. Retrieve Response Without Metadata Filter

In [None]:
response_withoutMetadata=retrieve_and_generate_without_filter(query, knowledge_base_id, model_arn)
print(response_withoutMetadata['output']['text'])


## 8. Retrieve and Print Citations Without Metadata Filter

In [None]:
# Extract citations used to generate the response
response_without_MD = response_withoutMetadata['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_without_MD))

# Function to print citations or chunks of text retrieved
def citations_rag_print(response_ret):
    for num, chunk in enumerate(response_ret, 1):
        print(f'Chunk {num}: ', chunk['content']['text'], end='\n'*2)
        print(f'Chunk {num} Location: ', chunk['location'], end='\n'*2)
        print(f'Chunk {num} Metadata: ', chunk['metadata'], end='\n'*2)

# Print citations
citations_rag_print(response_without_MD)
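
The cell above reads only `citations[0]`, the references attached to the first part of the generated answer. A response can contain several citation entries, and the same chunk can appear in more than one of them. Below is a minimal sketch of gathering references across all entries while skipping repeated chunk text. `collect_references` is a hypothetical helper, and `fake_response` is a hand-written stand-in, not real API output.

```python
def collect_references(response):
    """Gather retrievedReferences from every citation, skipping repeated chunk text."""
    seen = set()
    references = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            text = ref.get("content", {}).get("text")
            if text not in seen:
                seen.add(text)
                references.append(ref)
    return references


# Hand-written stand-in for a retrieve_and_generate response (not real output)
fake_response = {
    "citations": [
        {"retrievedReferences": [{"content": {"text": "chunk A"}}]},
        {"retrievedReferences": [{"content": {"text": "chunk A"}},
                                 {"content": {"text": "chunk B"}}]},
    ]
}
print(len(collect_references(fake_response)))
```

The returned list has the same shape as `retrievedReferences`, so it could be passed to `citations_rag_print` unchanged.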


## 9. Define Metadata Filter

The code below defines a metadata filter to narrow down the knowledge base search:
- Creates a complex filter using logical operators (andAll)
- The filter has two conditions that must both be true:
  1. docType must equal '10K Report'
  2. year must equal 2023
- This filter will limit retrieval to only chunks from 2023 10K reports
- The structure demonstrates how to build more complex queries with multiple conditions

This filter will be used to demonstrate selective retrieval from specific documents.

In [None]:
# Define a metadata filter for advanced filtering based on specific conditions
one_group_filter= {
    "andAll": [
        {
            "equals": {
                "key": "docType",
                "value": '10K Report'
            }
        },
        {
            "equals": {
                "key": "year",
                "value": 2023
            }
        }
    ]
}
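
To make the `andAll`/`equals` semantics concrete, here is a minimal sketch that evaluates a filter of this shape against a plain metadata dict locally. It covers only `andAll`, `orAll`, and `equals`, and illustrates the logic rather than how Bedrock evaluates filters internally. `matches` and the sample metadata are hypothetical.

```python
def matches(filter_expr, metadata):
    """Evaluate a small subset of the filter syntax against one metadata dict."""
    # Each filter node is a dict with exactly one operator key
    operator, operand = next(iter(filter_expr.items()))
    if operator == "andAll":
        return all(matches(sub, metadata) for sub in operand)
    if operator == "orAll":
        return any(matches(sub, metadata) for sub in operand)
    if operator == "equals":
        return metadata.get(operand["key"]) == operand["value"]
    raise ValueError(f"Unsupported operator in this sketch: {operator}")


example_filter = {
    "andAll": [
        {"equals": {"key": "docType", "value": "10K Report"}},
        {"equals": {"key": "year", "value": 2023}},
    ]
}

# Hypothetical chunk metadata for illustration
print(matches(example_filter, {"docType": "10K Report", "year": 2023}))
print(matches(example_filter, {"docType": "10K Report", "year": 2022}))
```

The sketch compares values with Python `==`, so the number `2023` and the string `"2023"` do not match. It is worth checking that the value type in the filter agrees with the type stored in the metadata files.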


## 10. Retrieve Response With Metadata Filter

In [None]:
# Use the function to retrieve a response with metadata filtering
response_with_Metadata = retrieve_and_generate_with_filter(query, knowledge_base_id, model_arn, one_group_filter)

# Print the response text
print(response_with_Metadata['output']['text'])


## 11. Retrieve and Print Citations With Metadata Filter

In [None]:
# Extract citations used to generate the response with metadata filter
response_with_MD = response_with_Metadata['citations'][0]['retrievedReferences']
print("# of citations or chunks used to generate the response: ", len(response_with_MD))

# Print citations for the filtered response
citations_rag_print(response_with_MD)


## 12. Advanced Metadata Filtering

Creating metadata filters dynamically lets you build query-specific filters programmatically rather than hardcoding them.

This cell defines a function that creates metadata filters programmatically from the following parameters:
- company: Filter by company name
- year: Filter by year (can be a single year or list of years)
- docType: Filter by document type
- min_page/max_page: Filter by page number ranges
- s3_prefix: Filter by S3 URI prefix

The function builds a filter configuration from the provided parameters,
combining them with the appropriate operators (equals, greaterThanOrEquals, etc.).

In [None]:
def create_dynamic_filter(company=None, year=None, docType=None, min_page=None, max_page=None, s3_prefix=None):
    """
    Creates a dynamic metadata filter for Amazon Bedrock Knowledge Base queries.
    
    Parameters:
    - company (str): Filter by company name (e.g., 'Amazon')
    - year (int or list): Filter by year or list of years
    - docType (str): Filter by document type (e.g., '10K Report')
    - min_page (int): Filter for pages greater than or equal to this number
    - max_page (int): Filter for pages less than or equal to this number
    - s3_prefix (str): Filter by S3 URI prefix of the source document
    
    Returns:
    - dict: A metadata filter configuration
    """
    filter_conditions = []
    
    # Add company filter if specified
    if company:
        filter_conditions.append({
            "equals": {
                "key": "company",
                "value": company
            }
        })
    
    # Add year filter (single year or multiple years)
    if year:
        if isinstance(year, list):
            year_conditions = [
                {"equals": {"key": "year", "value": y}} for y in year
            ]
            # orAll requires at least two conditions, so a one-element list is added directly
            if len(year_conditions) == 1:
                filter_conditions.append(year_conditions[0])
            else:
                filter_conditions.append({"orAll": year_conditions})
        else:
            filter_conditions.append({
                "equals": {
                    "key": "year",
                    "value": year
                }
            })
    
    # Add document type filter if specified
    if docType:
        filter_conditions.append({
            "equals": {
                "key": "docType",
                "value": docType
            }
        })
    
    # Add minimum page filter if specified
    if min_page is not None:
        filter_conditions.append({
            "greaterThanOrEquals": {
                "key": "x-amz-bedrock-kb-document-page-number",
                "value": min_page
            }
        })
    
    # Add maximum page filter if specified
    if max_page is not None:
        filter_conditions.append({
            "lessThanOrEquals": {
                "key": "x-amz-bedrock-kb-document-page-number",
                "value": max_page
            }
        })

    # Add S3 URI prefix filter if specified
    if s3_prefix:
        filter_conditions.append({
            "startsWith": {
                "key": "x-amz-bedrock-kb-source-uri",
                "value": s3_prefix
            }
        })
    
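    # Return the complete filter; andAll requires at least two conditions,
    # so a single condition is returned on its own
    if len(filter_conditions) > 1:
        return {"andAll": filter_conditions}
    if len(filter_conditions) == 1:
        return filter_conditions[0]
    return {}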
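
The `andAll` and `orAll` operators are documented as requiring at least two conditions each, which is easy to violate when filters are assembled from optional parameters. Below is a minimal sketch of a pre-flight check that walks a filter and reports any group that is too short. `find_short_groups` is a hypothetical helper, and `candidate` is a hand-written example.

```python
def find_short_groups(filter_expr, path="filter"):
    """Return the paths of andAll/orAll groups that hold fewer than two conditions."""
    problems = []
    for operator, operand in filter_expr.items():
        if operator in ("andAll", "orAll"):
            here = f"{path}.{operator}"
            if len(operand) < 2:
                problems.append(here)
            # Recurse into nested groups
            for index, sub in enumerate(operand):
                problems.extend(find_short_groups(sub, f"{here}[{index}]"))
    return problems


# Hand-written filter with a one-element orAll group
candidate = {
    "andAll": [
        {"equals": {"key": "company", "value": "Amazon"}},
        {"orAll": [{"equals": {"key": "year", "value": 2023}}]},
    ]
}
print(find_short_groups(candidate))
```

An empty result means every group in the filter has at least two conditions.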

## 13. Query Financial Data Function

This cell creates a higher-level function that uses the dynamic filter:
- Takes a query text and various filter parameters
- Creates a filter using the create_dynamic_filter function
- Prints the filter configuration for debugging
- Calls retrieve_and_generate_with_filter with the created filter
- Returns the complete response

In [None]:
def query_financial_data(query_text, kb_id, model_arn, **filter_params):
    """
    Perform a query against financial data with dynamic filtering.
    
    Parameters:
    - query_text (str): The natural language query
    - kb_id (str): Knowledge base ID
    - model_arn (str): Model ARN
    - **filter_params: Parameters to pass to create_dynamic_filter
    
    Returns:
    - dict: Response from Bedrock
    """
    # Create the filter
    filter_config = create_dynamic_filter(**filter_params)
    
    # Log the filter for debugging
    print("Using filter configuration:")
    print(json.dumps(filter_config, indent=2))
    
    # Run the query
    response = retrieve_and_generate_with_filter(
        query_text, kb_id, model_arn, filter_config
    )
    
    return response


In [None]:
# Compare growth rates across all Amazon business segments
from utils import print_citations
response = query_financial_data(
    "Compare the year-over-year growth rates for AWS, North America, and International segments, including factors that influenced performance differences",
    knowledge_base_id,
    model_arn,
    company="Amazon",
    year=[2023, 2024],
    min_page=20,
    max_page=30
)
print_citations(response)
#print(response['output']['text'])

In [None]:
# Filter for 2023 and 2024 documents under a specific S3 prefix
s3_prefix_2023 = f"s3://{variables['s3Bucket']}/pdf_documents/"

response_2023 = query_financial_data(
    "What was the AWS revenue growth in 2023?",
    knowledge_base_id,
    model_arn,
    year=[2023,2024],
    s3_prefix=s3_prefix_2023
)

#print(response_2023['output']['text'])
print_citations(response_2023)
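
`print_citations` comes from the lab's `utils` module, which is not shown in this notebook. As a rough guess at what such a helper might do (not the actual implementation), here is a minimal sketch that returns the generated answer followed by one line per cited source. `summarize_citations` is a hypothetical name, and `fake_response` is a hand-written stand-in with a placeholder URI.

```python
def summarize_citations(response):
    """Return printable lines: the generated answer, then one line per cited source."""
    lines = [response.get("output", {}).get("text", "")]
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri", "unknown source")
            page = ref.get("metadata", {}).get("x-amz-bedrock-kb-document-page-number")
            suffix = f" (page {page})" if page is not None else ""
            lines.append(f"- {uri}{suffix}")
    return lines


# Hand-written stand-in for a retrieve_and_generate response (not real output)
fake_response = {
    "output": {"text": "Example answer"},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "chunk text"},
                    "location": {
                        "type": "S3",
                        "s3Location": {"uri": "s3://example-bucket/pdf_documents/report.pdf"},
                    },
                    "metadata": {"x-amz-bedrock-kb-document-page-number": 21},
                }
            ]
        }
    ],
}
print("\n".join(summarize_citations(fake_response)))
```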