## Create a Knowledge Base with a Custom Chunking Strategy

#### Custom Chunking Logic with Lambda Functions in Amazon Bedrock

When you create a Knowledge Base (KB) in Amazon Bedrock, you can attach a Lambda function that implements your own chunking logic. During ingestion, if a Lambda function is configured, the Knowledge Base invokes it and stores the function's input and output files in the intermediate S3 bucket you specify.
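Below is a minimal sketch of what such a Lambda function might look like. It assumes the event and response shapes documented for Bedrock custom transformation functions: an event carrying `bucketName` and `inputFiles[*].contentBatches[*].key`, where each key points to a JSON batch file with a `fileContents` list, and a response carrying an `outputFiles` list. Verify these shapes against the current AWS documentation. The name `transform_batches`, the `output/` key prefix, and the identity chunker are hypothetical placeholders. The S3 client is passed in so the logic can be exercised without AWS.

```python
import json


def transform_batches(event, s3, chunker):
    """Read each input batch from the intermediate bucket, apply `chunker`, write it back.

    Assumed event shape (Bedrock custom transformation contract):
    {"bucketName": ..., "inputFiles": [{"originalFileLocation": ..., "fileMetadata": ...,
                                        "contentBatches": [{"key": ...}]}]}
    """
    bucket = event["bucketName"]
    output_files = []
    for input_file in event.get("inputFiles", []):
        output_batches = []
        for batch in input_file.get("contentBatches", []):
            in_key = batch["key"]
            raw = s3.get_object(Bucket=bucket, Key=in_key)["Body"].read()
            contents = json.loads(raw).get("fileContents", [])
            # Hypothetical output prefix inside the same intermediate bucket
            out_key = f"output/{in_key}"
            s3.put_object(Bucket=bucket, Key=out_key,
                          Body=json.dumps({"fileContents": chunker(contents)}))
            output_batches.append({"key": out_key})
        # Echo the file's location and metadata, pointing at the transformed batches
        output_files.append({
            "originalFileLocation": input_file.get("originalFileLocation"),
            "fileMetadata": input_file.get("fileMetadata", {}),
            "contentBatches": output_batches,
        })
    return {"outputFiles": output_files}


def lambda_handler(event, context):
    import boto3  # imported lazily so transform_batches can be tested without AWS
    # Replace the identity chunker below with your own chunking function
    return transform_batches(event, boto3.client("s3"), chunker=lambda items: items)
```

Keeping the S3 I/O separate from the chunking callback lets you swap chunking strategies without touching the plumbing.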

#### Use Cases for Lambda Functions in KBs

- **Custom Chunking Logic:** Lambda functions can be used to implement custom logic for chunking documents during ingestion, enabling more control over how documents are divided into meaningful chunks.
- **Chunk-level Metadata Processing:** Lambda functions can also process chunked data, for example, by adding custom metadata at the chunk level, enriching the data for more advanced retrieval or analysis.

This allows for more flexibility and tailored handling of document data within the Knowledge Base, making it possible to apply unique chunking strategies and augment the data with specific metadata for improved search and retrieval.
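The two use cases can be combined in a single chunking function. The sketch below splits each `contentBody` into overlapping word windows. It copies the item's `contentMetadata` onto every chunk and adds hypothetical `chunk_index` and `chunk_word_count` fields. Word-window splitting with `max_words=200` and `overlap=20` is an illustrative choice, not the workshop's actual chunking logic. The `fileContents` item shape (`contentBody`, `contentType`, `contentMetadata`) is assumed from the Bedrock custom transformation batch format.

```python
def chunk_file_contents(file_contents, max_words=200, overlap=20):
    """Split each content item into overlapping word windows with chunk-level metadata."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    step = max_words - overlap
    chunks = []
    for item in file_contents:
        words = item.get("contentBody", "").split()
        if not words:
            continue  # skip empty content instead of emitting an empty chunk
        # Stop once the remaining words are already covered by the previous window's overlap
        for index, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
            window = words[start:start + max_words]
            # Inherit document-level metadata, then add chunk-level fields
            metadata = dict(item.get("contentMetadata") or {})
            metadata.update({"chunk_index": index, "chunk_word_count": len(window)})
            chunks.append({
                "contentBody": " ".join(window),
                "contentType": item.get("contentType", "TEXT"),
                "contentMetadata": metadata,
            })
    return chunks
```

Word counts are only a rough proxy for embedding tokens. A production chunker might split on sentence or section boundaries instead.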


```python
# Import the advanced_rag_utils helper module
import advanced_rag_utils
import json
import importlib

# Reload the module to pick up any changes made since the kernel started
importlib.reload(advanced_rag_utils)

# Re-import all helper functions from the reloaded module
from advanced_rag_utils import *

# datetime.UTC requires Python 3.11 or later
from datetime import datetime, timedelta, UTC

notebook_start_time = datetime.now(UTC)

# Load the workshop variables from the JSON file
with open("../variables.json", "r") as f:
    variables = json.load(f)

variables
```

```
{'accountNumber': '270597685972',
 'regionName': 'us-west-2',
 'collectionArn': 'arn:aws:aoss:us-west-2:270597685972:collection/3ethft3xms9as2092ulg',
 'collectionId': '3ethft3xms9as2092ulg',
 'vectorIndexName': 'ws-index-',
 'bedrockExecutionRoleArn': 'arn:aws:iam::270597685972:role/advanced-rag-workshop-bedrock_execution_role-us-west-2',
 's3Bucket': '270597685972-us-west-2-advanced-rag-workshop',
 'kbFixedChunk': 'SN9KSOQPOV',
 'kbSemanticChunk': 'KMZYCTNSWW',
 'kbHierarchicalChunk': 'V8EJKFPYTK',
 'kbCustomChunk': 'G8P2D7M28S',
 'sagemakerLLMEndpoint': 'endpoint-llama-3-2-3b-instruct-2025-05-02-18-22-06'}
```

```python
model_id = "amazon.titan-embed-text-v2:0"
kb_chunking_strategy = "custom"  # one of: "fixed", "hierarchical", "semantic", "custom"
```

```python
# Load the chunking-strategy cost comparison table saved by previous runs
df_costs = load_df_from_csv()
df_costs
```

```
Loaded existing file: /home/sagemaker-user/sample-advanced-rag-using-bedrock-and-sagemaker/embed_algo_costs.csv
```

| | chunking_algo | embedding_seconds | input_tokens | invocation_count | total_token_costs (USD) |
|-|-|-|-|-|-|
| 0 | fixed | 41.132119 | 243046 | 906 | 0.004861 |
| 1 | hierarchical | 56.287275 | 295358 | 1157 | 0.005907 |
| 2 | semantic | 131.845039 | 680599 | 4178 | 0.013612 |

### 0. Create a Lambda function with custom chunking logic

```python
# Create or update the Lambda function with custom chunking logic
role_arn, function_arn = create_or_update_custom_chunking_lambda(
    region_name=variables["regionName"],
    account_number=variables["accountNumber"],
    role_name=f"advanced-rag-custom-chunk-{variables['regionName']}-role",
    function_name="advanced-rag-custom-chunk",
    s3_bucket=variables['s3Bucket']
)
```

```
IAM role 'advanced-rag-custom-chunk-us-west-2-role' already exists. Using the existing role.
Lambda function 'advanced-rag-custom-chunk' already exists. Updating code...
Lambda function code updated successfully
```


```python
# Create an S3 bucket for custom chunking if it doesn't exist
create_custom_chunk_s3_bucket(
    s3_bucket=variables["s3Bucket"],
    region_name=variables["regionName"]
)
```

```
Bucket '270597685972-us-west-2-advanced-rag-workshop-custom-chunk' already exists.

'270597685972-us-west-2-advanced-rag-workshop-custom-chunk'
```
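The output shows that the intermediate bucket name is the workshop bucket name with a `-custom-chunk` suffix. Below is a minimal sketch of an idempotent create-if-missing step using a boto3-style S3 client. `ensure_bucket` is a hypothetical name, and the actual `create_custom_chunk_s3_bucket` helper may work differently.

```python
def ensure_bucket(s3, bucket_name, region_name):
    """Create bucket_name in region_name unless this account already owns it."""
    existing = {b["Name"] for b in s3.list_buckets().get("Buckets", [])}
    if bucket_name in existing:
        print(f"Bucket '{bucket_name}' already exists.")
        return bucket_name
    params = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a LocationConstraint
    if region_name != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region_name}
    s3.create_bucket(**params)
    return bucket_name
```

In the notebook you might call it as `ensure_bucket(boto3.client("s3", region_name=variables["regionName"]), f"{variables['s3Bucket']}-custom-chunk", variables["regionName"])`. `list_buckets` only returns buckets your account owns, so a name taken by another account still fails at `create_bucket`.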

### 1. Create a Knowledge Base

```python
kb_name = f"advanced-rag-workshop-{kb_chunking_strategy}-chunking"

kb_description = "Knowledge base using Amazon OpenSearch Service as a vector store"

# Create the knowledge base, or reuse it if one with this name already exists
kb = create_kb(kb_name, kb_description, kb_chunking_strategy, variables, model_id)
```

```
{'collectionArn': 'arn:aws:aoss:us-west-2:270597685972:collection/3ethft3xms9as2092ulg', 'vectorIndexName': 'ws-index-custom', 'fieldMapping': {'vectorField': 'vector', 'textField': 'text', 'metadataField': 'text-metadata'}}
Knowledge Base already exists. Retrieving its ID...
Found existing knowledge base with Name: advanced-rag-workshop-custom-chunking and ID: G8P2D7M28S
OpenSearch Knowledge Response: {
    "createdAt": "2025-05-02 18:06:18.640912+00:00",
    "description": "Knowledge base using Amazon OpenSearch Service as a vector store",
```

### 2. Create a Data Source for the Knowledge Base

```python
# Create the data source with custom transformation configuration
ds_object = create_custom_data_source_for_kb(
    kb=kb,
    variables=variables,
    data_source_name="advanced-rag-example",
    function_arn=function_arn
)
```

```
Checking for existing data sources in knowledge base G8P2D7M28S...
Found existing data source 'advanced-rag-example'. Deleting it...
Waiting for data source deletion to complete...
Data source deleted.
Creating new data source 'advanced-rag-example' with custom chunking...
Custom chunking data source created successfully with ID: OHPOWTAXS5
```
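The internals of the `create_custom_data_source_for_kb` helper aren't shown here. Below is one plausible request for the `bedrock-agent` `create_data_source` API. It is sketched under the assumption that Bedrock's built-in chunking is disabled (`NONE`) and the Lambda runs at the `POST_CHUNKING` step. With `NONE`, each file reaches the Lambda as a single chunk, so the Lambda alone decides the final chunk boundaries. The function name and the `source_bucket` / `intermediate_bucket` parameters are hypothetical.

```python
def build_custom_data_source_request(kb_id, name, source_bucket, intermediate_bucket, lambda_arn):
    """Build keyword arguments for bedrock-agent create_data_source (hypothetical helper)."""
    return {
        "knowledgeBaseId": kb_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": f"arn:aws:s3:::{source_bucket}"},
        },
        "vectorIngestionConfiguration": {
            # Disable Bedrock's own chunking so the Lambda controls the chunks
            "chunkingConfiguration": {"chunkingStrategy": "NONE"},
            "customTransformationConfiguration": {
                # Where Bedrock writes the Lambda's input and output batch files
                "intermediateStorage": {"s3Location": {"uri": f"s3://{intermediate_bucket}/"}},
                "transformations": [{
                    "stepToApply": "POST_CHUNKING",
                    "transformationFunction": {
                        "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                    },
                }],
            },
        },
    }
```

You could then pass the result as `boto3.client("bedrock-agent").create_data_source(**request)`.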


### 3. Start an Ingestion Job for the Amazon Bedrock Knowledge Base Backed by Amazon OpenSearch Serverless

> **Note**: Ingestion takes approximately 2-3 minutes. During this time, the knowledge base processes your documents by:
> 1. Extracting text from the source files
> 2. Chunking the content according to the configured strategy (fixed, semantic, hierarchical, or custom)
> 3. Generating embeddings for each chunk
> 4. Storing the embeddings and associated metadata in the OpenSearch vector store
>
> You'll see status updates as the job progresses. Wait for the "Job completed successfully" message before proceeding to the next step.
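The `create_ingestion_job` helper hides its polling loop. Below is a minimal sketch of the same start-then-poll pattern. It assumes a boto3 `bedrock-agent` client and its `start_ingestion_job` and `get_ingestion_job` operations. The name `start_and_wait` and the 10-second poll interval are hypothetical choices, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time

# Statuses after which the job will not change any further
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def start_and_wait(client, kb_id, ds_id, poll_seconds=10, sleep=time.sleep):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        print("running...")
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job
```

Check the returned job's `status` before moving on. A `FAILED` job ends the loop just like a `COMPLETE` one.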

```python
from time import sleep

# Record timestamps around the ingestion job; they bound the window used for the token-cost lookup below
ingestion_start_time = datetime.now(UTC)
sleep(3)
create_ingestion_job(kb, ds_object, variables)
sleep(3)
ingestion_end_time = datetime.now(UTC)
```

```
Ingestion job started successfully for kb_name = advanced-rag-workshop-custom-chunking and kb_id = G8P2D7M28S

running...
running...
running...
running...
running...
running...
running...
Job completed successfully
```



```python
time_taken = (ingestion_end_time - ingestion_start_time).total_seconds()
print(f"time taken to ingest into KB = {fmt_n(time_taken)} seconds")
```

```
time taken to ingest into KB = 77.42 seconds
```


```python
# Embedding token usage and cost for the ingestion window
vector_store_embedding_cost = get_bedrock_token_based_cost(model_id, ingestion_start_time, ingestion_end_time)
print(json.dumps(vector_store_embedding_cost, indent=4))
```

```
{
    "model_id": "amazon.titan-embed-text-v2:0",
    "start_time": "2025-05-02T23:40:22.212813+00:00",
    "end_time": "2025-05-02T23:41:39.629905+00:00",
    "duration in minutes": 1.2902848666666666,
    "input_tokens": 99307,
    "output_tokens": 0,
    "invocation_count": 638,
    "per million input token costs": 0.02,
    "per million output token costs": 0.0,
    "input token costs": 0.00198614,
    "output token costs": 0.0,
    "total token costs": 0.00198614,
    "average token costs per invocation": 3.1130721003134796e-06,
    "token costs per MILLION such invocations": 3.1130721003134796
}
```
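The reported costs follow directly from the token count and the per-million-token price. The sketch below recomputes them from the numbers in the output above. The `0.02` USD price is taken from that output, not looked up independently.

```python
# Values copied from the cost output above
input_tokens = 99307
invocation_count = 638
price_per_million_input_tokens = 0.02  # USD, as reported above

# Cost = tokens / 1,000,000 * price per million tokens
input_cost = input_tokens / 1_000_000 * price_per_million_input_tokens
per_invocation = input_cost / invocation_count
per_million_invocations = per_invocation * 1_000_000

print(f"input token costs: {input_cost:.8f}")
print(f"average per invocation: {per_invocation:.6e}")
print(f"per million invocations: {per_million_invocations:.4f}")
```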


```python
# Add or update this strategy's cost info in the DataFrame,
# so the costs of the different chunking strategies can be compared side by side.
new_row = {
    'chunking_algo': kb_chunking_strategy,
    'embedding_seconds': vector_store_embedding_cost['duration in minutes'] * 60,
    'input_tokens': vector_store_embedding_cost['input_tokens'],
    'invocation_count': vector_store_embedding_cost['invocation_count'],
    'total_token_costs': vector_store_embedding_cost['total token costs']
}
df_costs = update_or_add_row(df_costs, new_row)
df_costs
```

```
Added new row for: custom
```

| | chunking_algo | embedding_seconds | input_tokens | invocation_count | total_token_costs (USD) |
|-|-|-|-|-|-|
| 0 | fixed | 41.132119 | 243046 | 906 | 0.004861 |
| 1 | hierarchical | 56.287275 | 295358 | 1157 | 0.005907 |
| 2 | semantic | 131.845039 | 680599 | 4178 | 0.013612 |
| 3 | custom | 77.417092 | 99307 | 638 | 0.001986 |
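To compare the strategies numerically as well as visually, the sketch below ranks the four rows of `df_costs` by total cost. For each strategy it reports the average input tokens per embedding invocation and the cost relative to fixed chunking. The values are copied by hand from the table above, and `summarize` is a hypothetical helper.

```python
# (chunking_algo, input_tokens, invocation_count, total_token_costs in USD), from df_costs above
rows = [
    ("fixed", 243046, 906, 0.004861),
    ("hierarchical", 295358, 1157, 0.005907),
    ("semantic", 680599, 4178, 0.013612),
    ("custom", 99307, 638, 0.001986),
]


def summarize(rows, baseline_algo="fixed"):
    """Return (algo, avg tokens per invocation, cost relative to baseline), cheapest first."""
    baseline = next(cost for algo, _, _, cost in rows if algo == baseline_algo)
    return [
        (algo, tokens / calls, cost / baseline)
        for algo, tokens, calls, cost in sorted(rows, key=lambda r: r[3])
    ]


for algo, tokens_per_call, relative_cost in summarize(rows):
    print(f"{algo:<13} {tokens_per_call:7.1f} tokens/invocation  {relative_cost:5.2f}x fixed cost")
```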


### 4. Retrieve

```python
# Define the query for retrieving relevant documents
query = "What were net incomes of Amazon in 2022, 2023 and 2024?"

# Get the knowledge base ID from the variables
kb_id = variables.get("kbCustomChunk")

# Retrieve the top 3 chunks from the knowledge base
chunks_from_kb = retrieve_from_kb(
    query=query,
    kb={"knowledgeBaseId": kb_id},
    n_chunks=3,
    variables=variables
)

# Optionally, set a minimum similarity score to drop weak matches;
# a filtered call should return fewer chunks than the unfiltered call above.
min_score = 0.50
# chunks_from_kb = retrieve_from_kb(query, {"knowledgeBaseId": kb_id}, 3, variables, min_score)

print(json.dumps(chunks_from_kb, indent=2))
```

```
[]
```


> **Note**: After creating the knowledge base, you can explore its details and settings in the Amazon Bedrock console. This gives you a more visual interface to understand how the knowledge base is structured.
> 
> **[➡️ View your Knowledge Bases in the AWS Console](https://us-west-2.console.aws.amazon.com/bedrock/home?region=us-west-2#/knowledge-bases)**
>
> In the console, you can:
> - See all your knowledge bases in one place
> - View ingestion status and statistics
> - Test queries through the built-in chat interface
> - Modify settings and configurations
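The `retrieve_from_kb` helper presumably wraps the Bedrock Agent Runtime `Retrieve` API, but its internals aren't shown. Below is a minimal sketch of a direct call with client-side score filtering, assuming a boto3 `bedrock-agent-runtime` client. `retrieve_chunks` and the shape of the dictionaries it returns are hypothetical.

```python
def retrieve_chunks(client, kb_id, query, n_chunks=3, min_score=None):
    """Retrieve up to n_chunks results and optionally drop those scoring below min_score."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": n_chunks}},
    )
    chunks = [
        {
            "text": result["content"]["text"],
            "score": result.get("score", 0.0),
            "metadata": result.get("metadata", {}),
        }
        for result in response.get("retrievalResults", [])
    ]
    # Client-side similarity filter, applied after the top-n search
    if min_score is not None:
        chunks = [c for c in chunks if c["score"] >= min_score]
    return chunks
```

In the notebook this could be called as `retrieve_chunks(boto3.client("bedrock-agent-runtime", region_name=variables["regionName"]), kb_id, query, n_chunks=3, min_score=0.50)`.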

```python
# Summarize the retrieved chunks: total count, minimum, maximum, and average score,
# and the number of chunks scoring above the threshold.
score_threshold = 0.40
score_structure = analyze_chunk_scores_above_threshold(chunks_from_kb, score_threshold)
print(json.dumps(score_structure, indent=4))
```

```
{
    "total_chunks": 0,
    "min_score": 0,
    "max_score": 0,
    "avg_score": 0,
    "count_above_threshold": 0
}
```
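Below is a minimal sketch of the summary that `analyze_chunk_scores_above_threshold` appears to compute, inferred only from its output fields. It assumes each chunk is a dict with a numeric `score` and counts scores strictly greater than the threshold. The helper's actual implementation may differ.

```python
def summarize_scores(chunks, threshold):
    """Summarize chunk similarity scores; returns zeros when no chunks were retrieved."""
    scores = [chunk["score"] for chunk in chunks]
    if not scores:
        return {"total_chunks": 0, "min_score": 0, "max_score": 0,
                "avg_score": 0, "count_above_threshold": 0}
    return {
        "total_chunks": len(scores),
        "min_score": min(scores),
        "max_score": max(scores),
        "avg_score": sum(scores) / len(scores),
        # Strictly greater than the threshold
        "count_above_threshold": sum(score > threshold for score in scores),
    }
```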


### Cost Summary for Running This Notebook
In this notebook, we used an embedding model for two purposes:
1. Populating a vector store from six PDF files and one CSV file (seven documents in total).
2. Generating the query embedding.

```python
from IPython.display import display, Markdown
from advanced_rag_utils import embedding_cost_report

# Mark the notebook end time
notebook_end_time = datetime.now(UTC)

# Embedding token costs incurred between notebook start and end
cost_for_notebook = get_bedrock_token_based_cost(model_id, notebook_start_time, notebook_end_time)

# Your assumptions for your use case:
scenario_number_of_documents = 100000
scenario_number_of_queries = 5000000
display(Markdown(embedding_cost_report(
    vector_store_embedding_cost, cost_for_notebook,
    scenario_number_of_documents, scenario_number_of_queries,
)))
```


#### Scenario
* Number of documents to ingest: 100000
* Number of queries: 5000000

#### Cost Estimation Based on the Scenario (USD)
| Component | Notebook Cost | Scenario Cost |
|-|-|-|
|VectorStore|0.001986|28.373429|
|Queries|0.006605|19.37085|
|**TOTAL**|0.008592|47.744279|

The cost estimate assumes that the scenario's documents and queries are similar to those used in this notebook, scaled up proportionally.
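The VectorStore scenario cost is consistent with scaling the notebook's ingestion cost linearly by document count. The sketch below reproduces that row under this assumption. The Queries row depends on per-query accounting inside `embedding_cost_report` that isn't shown here, so it is not recomputed.

```python
# Assumption: ingestion cost scales linearly with the number of (similar) documents
notebook_vector_store_cost = 0.00198614  # USD, "total token costs" of the ingestion above
notebook_documents = 7                   # six PDF files + one CSV file
scenario_documents = 100_000

cost_per_document = notebook_vector_store_cost / notebook_documents
scenario_vector_store_cost = cost_per_document * scenario_documents
print(round(scenario_vector_store_cost, 6))
```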
        