## Create a Knowledge Base with Semantic Chunking Strategy
#### What will we do in this workshop?
1. Create a Knowledge Base (KB) in the vector database.
2. Create a data source for the KB. The data source will be the Amazon Science and 10-K documents stored in S3.
3. Ingest the data from S3, use semantic chunking to chunk the data, generate vector embeddings, and store the chunks and their corresponding vector embeddings in the KB.
4. Ask some questions, query the KB to return some chunks, and inspect their relevance scores.
<br>Note: We are not sending the query and its chunks to an LLM in this notebook. We will do that in other notebooks.
![We are generating vector embeddings and storing them in a KB in Vector Database](./Semantic_Chunking.png)

## Concept

Semantic chunking uses vector embeddings to measure how closely related neighboring pieces of a text are, and creates chunks based on the semantic similarity calculated by the embedding model. Because boundaries follow meaning rather than a fixed length, the resulting chunks vary in size. Keeping related text together preserves the information’s integrity during retrieval, which helps return accurate and contextually appropriate results. <br>
<br>![How Semantic Chunking Works](./Semantic_how_it_works.png)
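
As a rough illustration of the idea (not the Amazon Bedrock implementation), the sketch below splits a text into sentences, embeds each one, measures the cosine distance between neighboring sentences, and starts a new chunk wherever that distance reaches a chosen percentile. The `toy_embed` bag-of-words function is only a stand-in for a real embedding model, and the sentence splitter and nearest-rank percentile are assumptions made to keep the example self-contained.

```python
import math
import re
from collections import Counter


def toy_embed(sentence):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def semantic_chunks(text, breakpoint_percentile=95):
    """Group sentences into chunks, splitting at the most dissimilar gaps."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 2:
        return sentences
    vectors = [toy_embed(s) for s in sentences]
    # Distance between each sentence and the one that follows it.
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, breakpoint_percentile)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance >= threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


text = (
    "Cats purr when they are happy. Cats sleep most of the day. "
    "Interest rates rose last quarter. Banks raised interest rates again."
)
chunks = semantic_chunks(text)
for chunk in chunks:
    print(chunk)
```

With this toy input, the only split falls between the sentences about cats and the sentences about interest rates, which is the behavior the diagram describes.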

## Benefits

* By focusing on the text’s meaning and context, semantic chunking can improve the quality of retrieval. It is best suited to scenarios where maintaining the semantic integrity of the text is crucial.

* Although this method is more computationally intensive than fixed-size chunking, it can be beneficial for chunking documents where contextual boundaries aren’t clear, for example legal documents, technical manuals, or documents with many tables.

## Cost Considerations

* Because the semantic chunking process generates vector embeddings to find chunk boundaries, it increases the number of API calls to the embedding LLM. Expect relatively higher costs than with fixed-size chunking.
* However, the bulk of these operations happen the first time the documents are processed. In a steady state, these costs are incurred only for new or changed documents.
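
To make the cost difference concrete, here is a small back-of-the-envelope sketch. The per-million-token rate matches the figure reported in this notebook's cost output further down (pricing varies by region and over time), while the token counts and the assumption that boundary detection embeds the text one extra time are purely hypothetical.

```python
def embedding_cost_usd(input_tokens, price_per_million_tokens=0.02):
    """Token cost in USD for a given number of embedding input tokens."""
    return input_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical token counts, for illustration only.
fixed_tokens = 1_000_000            # every chunk is embedded once for indexing
semantic_tokens = 2 * fixed_tokens  # assumption: boundary detection embeds the text one extra time

fixed_cost = embedding_cost_usd(fixed_tokens)
semantic_cost = embedding_cost_usd(semantic_tokens)
print(f"fixed: ${fixed_cost:.4f}, semantic: ${semantic_cost:.4f}")
```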

In [1]:
# Import a module with a few helper functions.
# These functions will help us create a knowledge base (KB), create a data source for the KB, and ingest data into the KB using semantic chunking.

import importlib
import advanced_rag_utils

# Reload module
importlib.reload(advanced_rag_utils)

# Re-import all functions
from advanced_rag_utils import *

from datetime import datetime, timedelta, UTC

notebook_start_time = datetime.now(UTC)

In [2]:
# Let's load the variables we saved in the first notebook. The cells below use them.
import json
with open("../variables.json", "r") as f:
    variables = json.load(f)

variables

{'accountNumber': '989679345636',
 'regionName': 'us-west-2',
 'collectionArn': 'arn:aws:aoss:us-west-2:989679345636:collection/ny2d41n7rmju74rh4ue2',
 'collectionId': 'ny2d41n7rmju74rh4ue2',
 'vectorIndexName': 'ws-index-',
 'bedrockExecutionRoleArn': 'arn:aws:iam::989679345636:role/advanced-rag-workshop-bedrock_execution_role-us-west-2',
 's3Bucket': '989679345636-us-west-2-advanced-rag-workshop',
 'kbFixedChunk': 'TYG3IXCHCX'}

In [3]:
# Load the dataframe related to costs from a csv file (if it already exists)
df_costs = load_df_from_csv()
df_costs

Loaded existing file: /home/sagemaker-user/brsk-GTM/Advanced_RAG_Workshop/simplified_labs/embed_algo_costs.csv


Unnamed: 0,chunking_algo,embedding_seconds,input_tokens,invocation_count,total_token_costs
0,fixed,54.113933,0,0,0.0


### 1. Create a Knowledge Base
Let's specify the chunking strategy, name, and description for the Knowledge Base (KB), and then create the KB.

In [4]:
kb_chunking_strategy = "semantic" # ["fixed", "hierarchical", "semantic", "custom"]

In [5]:
kb_name = f"advanced-rag-workshop-{kb_chunking_strategy}-chunking"

kb_description = "Knowledge base using Amazon OpenSearch Service as a vector store"

kb = create_kb(kb_name, kb_description, kb_chunking_strategy, variables)

{'collectionArn': 'arn:aws:aoss:us-west-2:989679345636:collection/ny2d41n7rmju74rh4ue2', 'vectorIndexName': 'ws-index-semantic', 'fieldMapping': {'vectorField': 'vector', 'textField': 'text', 'metadataField': 'text-metadata'}}
Knowledge Base already exists. Retrieving its ID...
Found existing knowledge base with Name: advanced-rag-workshop-semantic-chunking and ID: N7ZHYZVLOX
OpenSearch Knowledge Response: {
    "createdAt": "2025-03-19 16:35:57.992254+00:00",
    "description": "Knowledge base using Amazon OpenSearch Service as a vector st

### 2. Create Datasource for Knowledge Base

In [6]:
data_source_name = f"advanced-rag-example-{kb_chunking_strategy}"

ds_object = create_data_source_for_kb(kb_chunking_strategy, data_source_name, kb, variables)

Creating new data source 'advanced-rag-example-semantic' with {'chunkingStrategy': 'SEMANTIC', 'semanticChunkingConfiguration': {'maxTokens': 300, 'bufferSize': 1, 'breakpointPercentileThreshold': 95}} chunking...
semantic chunking data source created successfully.
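
The log line above shows the chunking configuration that the helper passes. The helper's implementation is not shown in this notebook, so the following is only a sketch of how an equivalent request for the boto3 `bedrock-agent` client's `create_data_source` call could be assembled. The function name `build_semantic_data_source_request`, the KB ID, and the bucket ARN are placeholders. In this configuration, `maxTokens` caps the size of a chunk, `bufferSize` sets how many surrounding sentences are grouped with each sentence when embeddings are compared, and `breakpointPercentileThreshold` is the dissimilarity percentile at which a chunk is split.

```python
def build_semantic_data_source_request(kb_id, name, bucket_arn,
                                       max_tokens=300, buffer_size=1,
                                       breakpoint_percentile=95):
    """Assemble keyword arguments for bedrock-agent create_data_source (sketch)."""
    return {
        "knowledgeBaseId": kb_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {
                "chunkingStrategy": "SEMANTIC",
                "semanticChunkingConfiguration": {
                    "maxTokens": max_tokens,
                    "bufferSize": buffer_size,
                    "breakpointPercentileThreshold": breakpoint_percentile,
                },
            }
        },
    }


# Placeholder identifiers; substitute your own KB ID and bucket ARN.
request = build_semantic_data_source_request(
    kb_id="EXAMPLEKBID",
    name="advanced-rag-example-semantic",
    bucket_arn="arn:aws:s3:::example-bucket",
)
print(request["vectorIngestionConfiguration"]["chunkingConfiguration"])

# With AWS credentials configured, the request could be sent with:
#   import boto3
#   boto3.client("bedrock-agent").create_data_source(**request)
```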


### 3. Start Ingestion Job for Amazon Bedrock Knowledge Base pointing to Amazon OpenSearch

> **Note**: The ingestion process will take approximately 2-3 minutes to complete. During this time, the system is processing your documents by:
> 1. Extracting text from the source files
> 2. Chunking the content according to the defined strategy (Fixed / Semantic / Hierarchical / Custom)
> 3. Generating embeddings for each chunk
> 4. Storing the embeddings and associated metadata in a Knowledge Base (KB) in the OpenSearch vector database
>
> You'll see status updates as the process progresses. Please wait for the "Ingestion job completed successfully" message before proceeding to the next step.

In [7]:
ingestion_start_time = datetime.now(UTC)
create_ingestion_job(kb, ds_object, variables)
ingestion_end_time = datetime.now(UTC)

Ingestion job started successfully for kb_name = advanced-rag-workshop-semantic-chunking and kb_id = N7ZHYZVLOX

running...
running...
running...
running...
running...
running...
running...
running...
running...
running...
running...
running...
Job completed successfully
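
The `running...` lines above come from polling the job status until it finishes. The helper's code is not shown here, so the following is a hedged sketch of such a loop. `wait_for_ingestion` and its parameters are hypothetical names, and the status source is injected as a function so the example runs without AWS access.

```python
import time

# Job statuses after which polling should stop.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(get_status, poll_seconds=10, timeout_seconds=600, sleep=time.sleep):
    """Call get_status() repeatedly until the job reaches a terminal status."""
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"ingestion still {status} after {waited} seconds")
        print("running...")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demonstration with a fake status sequence and no real sleeping.
fake_statuses = iter(["STARTING", "IN_PROGRESS", "COMPLETE"])
final_status = wait_for_ingestion(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final_status)

# Against AWS, get_status could wrap the boto3 bedrock-agent client, for example:
#   client = boto3.client("bedrock-agent")
#   job = client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)["ingestionJob"]
#   get_status = lambda: client.get_ingestion_job(
#       knowledgeBaseId=kb_id, dataSourceId=ds_id,
#       ingestionJobId=job["ingestionJobId"])["ingestionJob"]["status"]
```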



In [8]:
time_taken = (ingestion_end_time-ingestion_start_time).total_seconds()
print(f"time taken to ingest into KB = {fmt_n(time_taken)} seconds")

time taken to ingest into KB = 122.86 seconds


## Embedding LLM Costs
1. Specify the model ID
2. Specify the start and end times
3. Invoke a helper function to query CloudWatch
4. Calculate costs (please note that pricing varies by region and is subject to change over time)

<br>![Embedding LLM Input Token Costs](./Input_token_embedding_llm_costs.png)
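
The steps above can be sketched as follows. This is not the workshop helper's code (its internals are not shown here); it assumes the token and invocation counts are read from the `AWS/Bedrock` CloudWatch namespace using the `InputTokenCount` and `Invocations` metrics. `FakeCloudWatch` is a hypothetical stand-in for `boto3.client("cloudwatch")` that returns made-up datapoints so the example runs offline. CloudWatch metrics can lag behind the calls they describe, so a reading taken immediately after ingestion may not be final.

```python
from datetime import datetime, timedelta, timezone


def sum_metric(cloudwatch, metric_name, model_id, start_time, end_time, period_seconds=60):
    """Sum one Bedrock metric for a model over a time window."""
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName=metric_name,
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=period_seconds,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])


def embedding_costs(cloudwatch, model_id, start_time, end_time, price_per_million_tokens=0.02):
    """Collect input tokens and invocations, then price the input tokens."""
    input_tokens = sum_metric(cloudwatch, "InputTokenCount", model_id, start_time, end_time)
    invocations = sum_metric(cloudwatch, "Invocations", model_id, start_time, end_time)
    return {
        "input_tokens": input_tokens,
        "invocation_count": invocations,
        "total token costs": input_tokens / 1_000_000 * price_per_million_tokens,
    }


class FakeCloudWatch:
    """Offline stand-in for boto3.client("cloudwatch"); datapoints are made up."""

    def get_metric_statistics(self, **kwargs):
        values = {"InputTokenCount": [1200.0, 800.0], "Invocations": [3.0, 2.0]}
        return {"Datapoints": [{"Sum": value} for value in values[kwargs["MetricName"]]]}


end = datetime.now(timezone.utc)
start = end - timedelta(minutes=5)
report = embedding_costs(FakeCloudWatch(), "amazon.titan-embed-text-v2:0", start, end)
print(report)
```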

In [15]:
model_id = 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0'

# use the helper function to get input tokens to embedding LLM and the associated costs
tokens = get_embedding_LLM_costs_for_KB(model_id, ingestion_start_time, ingestion_end_time)

print(json.dumps(tokens, indent=4))

{
    "model_id": "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0",
    "start_time": "2025-04-29T21:48:23.044863+00:00",
    "end_time": "2025-04-29T21:50:25.902385+00:00",
    "duration in minutes": 2.047625366666667,
    "input_tokens": 0,
    "invocation_count": 0,
    "per million input token costs": 0.02,
    "total token costs": 0.0
}


In [16]:
# Let's add or update the cost info in the dataframe.
# This will help us compare the costs from various chunking strategies visually.
new_row = {
    'chunking_algo': kb_chunking_strategy,
    'embedding_seconds': tokens['duration in minutes']*60,
    'input_tokens': tokens['input_tokens'],
    'invocation_count': tokens['invocation_count'],
    'total_token_costs': tokens['total token costs']
}
df_costs = update_or_add_row(df_costs, new_row)
df_costs

Updated existing row for: semantic


Unnamed: 0,chunking_algo,embedding_seconds,input_tokens,invocation_count,total_token_costs
0,fixed,54.113933,0,0,0.0
1,semantic,122.857522,0,0,0.0


In [17]:
# Let's save the df
save_df_to_csv(df_costs)

Successfully saved DataFrame to: /home/sagemaker-user/brsk-GTM/Advanced_RAG_Workshop/simplified_labs/embed_algo_costs.csv


### 4. Retrieve: Use input query to RETRIEVE chunks from Vector Database
We will use a helper function where you can specify the number of chunks to retrieve.<br>
The helper function will:
1. Generate a vector embedding for the query.
2. Search for that embedding in the Knowledge Base (KB) vector database.
3. Return up to the number of chunks specified.
4. Optionally, if you specify a minimum similarity score, return only the chunks whose relevance score is at least that minimum.

<b>Warning: After data is ingested into a KB, when you query immediately, the results might be empty because of eventual consistency. If that happens, please wait for a few seconds and then retry.</b>

In [18]:
# Let's retrieve the chunks that meet a minimum relevance score for the question below.
query = "Who is the CEO, CFO, and CTO of Amazon?"

# Specify the number of chunks to request
n_chunks = 5

# Let's specify a minimum similarity score. Chunks scoring below it are dropped, so fewer than n_chunks may be returned.
min_score = 0.60

# get chunks from KB
chunks_from_kb = retrieve_from_kb(query, kb, n_chunks, variables, min_score)

print(json.dumps(chunks_from_kb, indent=2))

# You may see fewer than n_chunks chunks returned
# because of the minimum relevance score.

[
  {
    "content": "We promptly make available on this website, free of charge, the reports that we file or furnish with the Securities and Exchange Commission (\u201cSEC\u201d), corporate governance information (including our Code of Business Conduct and Ethics), and select press releases. \n \n Executive Officers and Directors \n \n The following tables set forth certain information regarding our Executive Officers and Directors as of January 25, 2023: \n \n Information About Our Executive Officers Name Age Position \n \n Jeffrey P. Bezos 59 Executive Chair Andrew R. Jassy 55 President and Chief Executive Officer Douglas J. Herrington 56 CEO Worldwide Amazon Stores Brian T. Olsavsky 59 Senior Vice President and Chief Financial Officer Shelley L. Reynolds 58 Vice President, Worldwide Controller, and Principal Accounting Officer Adam N. Selipsky 56 CEO Amazon Web Services David A. Zapolsky 59 Senior Vice President, General Counsel, and Secretary \n \n Jeffrey P. Bezos. Mr. Bezos foun

In [19]:
# Let's summarize with total chunks, minimum score, maximum score, average score, 
# and lastly the number of chunks with a score more than a specified threshold.
score_threshold = 0.40
score_structure = analyze_chunk_scores_above_threshold(chunks_from_kb, score_threshold)
print(json.dumps(score_structure, indent=4))

{
    "total_chunks": 3,
    "min_score": 0.63529587,
    "max_score": 0.6481242,
    "avg_score": 0.6438317566666666,
    "count_above_threshold": 3
}
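
For reference, a summary with the same keys as the output above could be computed along the following lines. This is a sketch rather than the helper's actual code: `summarize_scores` is a hypothetical name, each chunk is assumed to carry a numeric `score` field, and the example scores are illustrative values, not the ones returned by the KB.

```python
def summarize_scores(chunks, score_threshold):
    """Summarize relevance scores for a list of retrieved chunks."""
    scores = [chunk["score"] for chunk in chunks]
    if not scores:
        return {
            "total_chunks": 0,
            "min_score": None,
            "max_score": None,
            "avg_score": None,
            "count_above_threshold": 0,
        }
    return {
        "total_chunks": len(scores),
        "min_score": min(scores),
        "max_score": max(scores),
        "avg_score": sum(scores) / len(scores),
        "count_above_threshold": sum(1 for score in scores if score > score_threshold),
    }


# Illustrative scores only.
example_chunks = [{"score": 0.61}, {"score": 0.65}, {"score": 0.72}]
summary = summarize_scores(example_chunks, score_threshold=0.40)
print(summary)
```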


In [20]:
# Let's print the cost of running this notebook.

model_id = 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0'

notebook_end_time = datetime.now(UTC)
tokens = get_bedrock_tokens(model_id, notebook_start_time, notebook_end_time, 5)
print(json.dumps(tokens, indent=4))
print(f"Cost of running this notebook is approximately ${tokens['total token costs']}")

{
    "model_id": "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v2:0",
    "start_time": "2025-04-29T21:48:09.384462+00:00",
    "end_time": "2025-04-29T21:50:56.546725+00:00",
    "duration in minutes": 2.7860377166666668,
    "input_tokens": 0,
    "output_tokens": 0,
    "invocation_count": 0,
    "per million input token costs": 0.02,
    "per million output token costs": 0,
    "input token costs": 0.0,
    "output token costs": 0.0,
    "total token costs": 0.0,
    "average token costs per invocation": 0,
    "token costs per MILLION such invocations": 0
}
Cost of running this notebook is approximately $0.0
