# Module 2 - Query Knowledge Base and Build RAG-powered Q&A Application with **Retrieve API**

----

This notebook provides sample code and step-by-step instructions for building a question-answering (Q&A) application using the **Retrieve API** of Amazon Bedrock Knowledge Bases.

----

### Introduction

In the previous notebook, we explored the `RetrieveAndGenerate` API from Amazon Bedrock Knowledge Bases — a fully managed RAG (Retrieval-Augmented Generation) solution. As the name suggests, this API not only retrieves the most relevant information from a knowledge base but also automatically generates a response to the user query in a single, fully managed API call.

In this notebook, we will take a closer look at the `Retrieve` API, which provides greater flexibility for building custom RAG solutions. Unlike `RetrieveAndGenerate`, the `Retrieve` API only fetches relevant document chunks from a Knowledge Base based on the user query — leaving it up to the developer to decide how to leverage this retrieved information.

To keep things simple and focused, in this notebook we will use the output of the `Retrieve` API to manually construct an augmented prompt. We will then send this prompt to an Amazon Bedrock foundation model (FM) of our choice to generate a grounded response.

![retrieveAPI](./images/retrieve_api.png)

### Pre-requisites

In order to run this notebook, you should have successfully completed the first notebook lab:
- [1_create-kb-and-ingest-documents.ipynb](./1\_create-kb-and-ingest-documents.ipynb).

Also, make sure that you have enabled access to the following models in the _Amazon Bedrock Console_:

- `Amazon Nova Micro`
- `Amazon Titan Text Embeddings V2`

## 1. Setup

### 1.1 Import the required libraries

In [None]:
# Standard library imports
import os
import sys
import json
import time

# Third-party imports
import boto3
from botocore.client import Config
from botocore.exceptions import ClientError

# Local imports
import utility

# Print SDK versions
print(f"Python version: {sys.version.split()[0]}")
print(f"Boto3 SDK version: {boto3.__version__}")

### 1.2 Initial setup for clients and global variables

In [None]:
%store -r bedrock_kb_id

In [None]:
# Create boto3 session and set AWS region
boto_session = boto3.Session()
aws_region = boto_session.region_name

# Create boto3 clients for Bedrock
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0})
bedrock_client = boto3.client('bedrock-runtime')
bedrock_agent_client = boto3.client('bedrock-agent-runtime', config=bedrock_config)

# Set the Bedrock model to use for text generation
model_id = 'amazon.nova-micro-v1:0'
model_arn = f'arn:aws:bedrock:{aws_region}::foundation-model/{model_id}'

# Print configurations
print("AWS Region:", aws_region)
print("Bedrock Knowledge Base ID:", bedrock_kb_id)

## 2. Using the **Retrieve API** with Foundation Models from Amazon Bedrock

We will begin by defining a `retrieve` function that calls the `Retrieve` API provided by Amazon Bedrock Knowledge Bases (BKB). This API transforms the user query into vector embeddings, searches the connected knowledge base, and returns the most relevant results. This approach gives you fine-grained control to build custom RAG workflows on top of the retrieved content.

The response from the `Retrieve` API includes several useful components:

- The **retrieved document chunks** containing relevant content from the knowledge base  
- The **source location type** and **URI** for each retrieved document, enabling traceability  
- The **relevance score** for each document chunk, indicating how well it matches the query  

Additionally, the `Retrieve` API supports the `overrideSearchType` parameter, set under `vectorSearchConfiguration` inside `retrievalConfiguration`, which lets you control the search strategy used:

| Search Type | Description |
|-------------|-------------|
| `HYBRID`    | Combines semantic search (vector similarity) with keyword search for improved accuracy, especially for structured content. |
| `SEMANTIC`  | Purely embedding-based semantic search, ideal for unstructured or natural language content. |

By default, Amazon Bedrock automatically selects the optimal search strategy for your query. However, if needed, you can explicitly specify `HYBRID` or `SEMANTIC` using `overrideSearchType` to tailor the search behavior to your use case.
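
As a minimal sketch of how the optional override might be wired in, the helper below builds the `retrievalConfiguration` dictionary and leaves out `overrideSearchType` when no value is given, so that the default selection described above applies. The name `build_retrieval_config` and its parameters are hypothetical, chosen for illustration; they are not part of the SDK.

```python
def build_retrieval_config(num_of_results=5, search_type=None):
    """Build a retrievalConfiguration dict for the Retrieve API (illustrative helper)."""
    allowed = {"HYBRID", "SEMANTIC"}
    if search_type is not None and search_type not in allowed:
        raise ValueError(f"search_type must be one of {sorted(allowed)} or None")

    vector_config = {"numberOfResults": num_of_results}
    # Only set the override when requested; otherwise Bedrock picks the strategy
    if search_type is not None:
        vector_config["overrideSearchType"] = search_type
    return {"vectorSearchConfiguration": vector_config}


print(build_retrieval_config(3))
print(build_retrieval_config(3, "SEMANTIC"))
```

The returned dictionary can be passed as the `retrievalConfiguration` argument of the `retrieve` call.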

### 2.1 Exploring the **Retrieve API**

In [None]:
# Implement the `retrieve` function
def retrieve(user_query, kb_id, num_of_results=5):
    """Query the knowledge base and return the raw Retrieve API response."""
    return bedrock_agent_client.retrieve(
        retrievalQuery={'text': user_query},
        knowledgeBaseId=kb_id,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': num_of_results,
                'overrideSearchType': 'HYBRID',  # optional; omit to let Bedrock choose
            }
        }
    )

In [None]:
user_query = "What is Amazon doing in the field of Generative AI?"

response = retrieve(user_query, bedrock_kb_id, num_of_results=3)

print("Retrieval Results:\n", json.dumps(response['retrievalResults'], indent=2, default=str))
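
The raw JSON can be hard to scan. The sketch below shows one way to pull the score, source URI, and a short text preview out of each result, and to drop results below a score threshold. It assumes an S3 data source, where the URI sits under `location['s3Location']['uri']`. The helper `summarize_results`, the `min_score` parameter, and the sample data (including the bucket name) are hypothetical and only illustrate the shape of `response['retrievalResults']`.

```python
def summarize_results(retrieval_results, min_score=0.0):
    """Return (score, source URI, text preview) rows for results at or above min_score."""
    rows = []
    for result in retrieval_results:
        score = result.get("score", 0.0)
        if score < min_score:
            continue
        # Assumption: S3 data source, so the URI is under location['s3Location']
        location = result.get("location", {})
        uri = location.get("s3Location", {}).get("uri", "unknown")
        text = result["content"]["text"]
        rows.append((round(score, 3), uri, text[:80]))
    return rows


# Hand-written sample shaped like response['retrievalResults'] (not real output)
sample_results = [
    {
        "content": {"text": "Amazon announced new generative AI services."},
        "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/doc-a.pdf"}},
        "score": 0.71,
    },
    {
        "content": {"text": "Unrelated paragraph about logistics."},
        "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/doc-b.pdf"}},
        "score": 0.32,
    },
]

for row in summarize_results(sample_results, min_score=0.5):
    print(row)
```

With a live response, the same helper could be called as `summarize_results(response['retrievalResults'])`.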

### 2.2 Generating a Response using Retrieved Context and the **Converse API**

Once we have used the `Retrieve` API to fetch the most relevant document chunks from our knowledge base, the next step is to use this retrieved context to generate a grounded and informative response to the user query.

In this section, we will construct an LLM request that combines both the user query and the retrieved knowledge base content. We will then use Amazon Bedrock's `Converse` API to interact with an LLM of our choice to generate the final response.

Specifically:
- We will define a *system prompt* that provides general behavioral guidelines to the model — for example, instructing it to act like a financial advisor that prioritizes fact-based, concise answers.
- We will create a *user prompt template* that injects both the retrieved context and the user’s query.
- Finally, we will use the `Converse` API to generate the model’s response, ensuring that it leverages the provided context to produce accurate and grounded answers.

This pattern enables full control over how context is presented to the model, allowing you to implement custom RAG workflows tailored to your application's needs.
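
As one illustration of that control, the sketch below numbers each retrieved chunk and stops adding chunks once an approximate character budget is reached. The helper name `format_contexts` and the `max_chars` parameter are hypothetical, and the budget is a rough guard (it ignores the separators between blocks), not a token count.

```python
def format_contexts(retrieval_results, max_chars=4000):
    """Join chunk texts into numbered blocks, stopping before max_chars is exceeded."""
    blocks = []
    used = 0
    for i, result in enumerate(retrieval_results, start=1):
        block = f"[{i}] {result['content']['text'].strip()}"
        # Approximate budget: counts block characters only, not separators
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)


# Hand-written sample shaped like response['retrievalResults'] (not real output)
sample = [{"content": {"text": "alpha"}}, {"content": {"text": "beta "}}]
print(format_contexts(sample))
```

Numbering the chunks makes it possible to ask the model to cite which block supports each statement.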

In [None]:
# Define a system prompt
system_prompt = """You are a financial advisor AI system that answers questions
using fact-based and statistical information when possible.
Use the following pieces of information in <context> tags to provide a concise answer to the questions.
Give an answer directly, without any XML tags.
If you don't know the answer, just say that you don't know; don't try to make up an answer."""

# Define a user prompt template
user_prompt_template = """Here is some additional context:
<context>
{contexts}
</context>

Please provide an answer to this user query:
<query>
{user_query}
</query>

The response should be specific and use statistics or numbers when possible."""

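# Extract the text of every retrieved document chunk and join the chunks into one
# string, so the prompt contains plain text rather than a Python list representation
contexts = "\n\n".join(rr['content']['text'] for rr in response['retrievalResults'])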

In [None]:
# Build Converse API request
converse_request = {
    "system": [
        {"text": system_prompt}
    ],
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": user_prompt_template.format(contexts=contexts, user_query=user_query)
                }
            ]
        }
    ],
    "inferenceConfig": {
        "temperature": 0.4,
        "topP": 0.9,
        "maxTokens": 500
    }
}

# Call Bedrock's Converse API to generate the final answer to the user query.
# The result is stored in its own variable so the retrieval `response` is not overwritten.
converse_response = bedrock_client.converse(
    modelId=model_id,
    system=converse_request["system"],
    messages=converse_request["messages"],
    inferenceConfig=converse_request["inferenceConfig"]
)

print("Final Answer:\n", converse_response["output"]["message"]["content"][0]["text"])
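
The retrieval and generation steps in this section can also be combined into a single function. The following is a minimal sketch under stated assumptions, not a definitive implementation: the name `answer_with_rag` and its parameters are hypothetical, the prompt layout is simplified compared with the template above, and the two clients are passed in as arguments so the function can be exercised with stand-in objects.

```python
def answer_with_rag(user_query, kb_id, agent_client, runtime_client, model_id,
                    system_prompt, num_of_results=3):
    """Retrieve context from a knowledge base, then generate an answer with Converse."""
    # Step 1: fetch the most relevant chunks with the Retrieve API
    retrieval = agent_client.retrieve(
        retrievalQuery={"text": user_query},
        knowledgeBaseId=kb_id,
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": num_of_results}
        },
    )
    chunks = [r["content"]["text"] for r in retrieval["retrievalResults"]]

    # Step 2: build the augmented prompt from the chunks and the query
    prompt = (
        "<context>\n" + "\n\n".join(chunks) + "\n</context>\n\n"
        f"<query>\n{user_query}\n</query>"
    )

    # Step 3: generate the grounded answer with the Converse API
    result = runtime_client.converse(
        modelId=model_id,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.4, "topP": 0.9, "maxTokens": 500},
    )
    return result["output"]["message"]["content"][0]["text"]
```

In this notebook it could be called as `answer_with_rag(user_query, bedrock_kb_id, bedrock_agent_client, bedrock_client, model_id, system_prompt)`.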

## 3. Conclusions and Next Steps

In this notebook, we built a custom RAG-powered Q&A application using Amazon Bedrock Knowledge Bases and the `Retrieve` API.

We followed three main steps:
- Used the `Retrieve` API to fetch the most relevant document chunks from a knowledge base based on a user query.
- Constructed an augmented prompt by combining the retrieved content with the user’s question.
- Used the `Converse` API to generate a grounded, fact-based response leveraging the retrieved context.

This approach provides flexibility and control over both search and response generation, enabling tailored RAG solutions for your specific use case.

### Next Steps

If you do not intend to experiment further with the Bedrock Knowledge Base you created, remember to clean up its resources in the next notebook:

&nbsp; **NEXT ▶** [4_clean-up.ipynb](./4\_clean-up.ipynb)