# Hybrid search and multi-query with RAG

In this module, you'll learn about the hybrid search and multi-query concepts, and how to apply them using Amazon Bedrock and the Knowledge Bases for Amazon Bedrock APIs.
This module contains:
1. [Overview](#overview)
2. [Pre-requisites](#pre-requisites)
3. [Setup](#setup)
4. [How hybrid search improves the documents retrieval?](#how-hybrid-search-improves-the-documents-retrieval)
5. [How to apply multi queries approach?](#how-to-apply-multi-queries-approach)

## Overview

For RAG-based applications, the accuracy of the generated response from Foundation Models (FMs) is dependent on the context provided to the model. Context is retrieved from the vector database based on the user query. Semantic search is widely used because it is able to understand more human-like questions—a user’s query is not always directly related to the exact keywords in the content that answers it. 

Semantic search helps provide answers based on the meaning of the text. However, it has limitations in capturing all the relevant keywords, and its performance relies on the quality of the embeddings used to represent the meaning of the text. To overcome these limitations, you can either combine semantic search with keyword search (hybrid search) or generate several variants of the query (multi-query); both approaches can give better results.
    

### Hybrid search


![hybrid search](images/hybrid-overview.png)


Hybrid search takes advantage of the strengths of multiple search algorithms, integrating their unique capabilities to enhance the relevance of returned search results. For RAG-based applications, semantic search capabilities are commonly combined with traditional keyword-based search to improve the relevance of search results. It enables searching over both the content of documents and their underlying meaning.

It works well for RAG-based applications where the retriever has to handle a wide variety of natural language queries. Keywords help cover specific entities in the query, such as product name, color, and price, while semantic search better captures the meaning and intent of the query. For example, if you want to build a chatbot for an ecommerce website that handles customer queries about the return policy or product details, hybrid search is a good fit.

#### Benefits of hybrid search
Both keyword and semantic search will return a separate set of results along with their relevancy scores, which are then combined to return the most relevant results.
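
To make the combination step concrete, here is a minimal sketch of one common way to merge two scored result sets: normalize each set of scores to the same range, then take a weighted sum. This is an illustration only. The function names, the `semantic_weight` parameter, and the toy scores are assumptions, and the sketch does not claim to reproduce how Knowledge Bases for Amazon Bedrock merges results internally.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so keyword and semantic scores are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores are identical: treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}


def fuse_scores(
    keyword_scores: Dict[str, float],
    semantic_scores: Dict[str, float],
    semantic_weight: float = 0.5,  # hypothetical knob: 0 = keyword only, 1 = semantic only
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Combine two scored result sets into one ranked list (weighted sum)."""
    kw = min_max_normalize(keyword_scores)
    sem = min_max_normalize(semantic_scores)
    fused = {}
    for doc_id in set(kw) | set(sem):
        # A document missing from one result set contributes 0 from that side.
        fused[doc_id] = (1 - semantic_weight) * kw.get(doc_id, 0.0) \
            + semantic_weight * sem.get(doc_id, 0.0)
    # Sort by fused score (descending); break ties by document id for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy scores (made up for illustration): keyword scores are unbounded,
# semantic scores are similarities between 0 and 1.
keyword_scores = {"doc_a": 12.0, "doc_b": 4.0, "doc_c": 10.0}
semantic_scores = {"doc_b": 0.75, "doc_c": 0.5, "doc_d": 0.25}

hybrid_top = fuse_scores(keyword_scores, semantic_scores)
for doc_id, score in hybrid_top:
    print(f"{doc_id}: {score:.3f}")
```

In this toy example, `doc_c` ranks first because it scores reasonably well in both result sets, even though it is not the top result in either one.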



### Multi-Query

A different approach to improving the relevance of retrieved documents, and consequently of FM generations, is based on the multi-query concept. Vector databases find documents similar to your query based on the distance between the query embedding and the document embeddings. This creates problems when the wording of the query happens to be "closer" to less relevant documents. Such problems are usually addressed with prompt engineering or query tuning.

The multi-query approach uses an FM to generate multiple queries, from different perspectives, for a given user query. It retrieves a set of relevant documents for each query, then takes the unique union of documents across all queries to obtain a larger set of potentially relevant documents. By generating multiple queries for the same question, this approach may overcome some limitations of distance-based retrieval and return a more comprehensive set of results.

![Multi-Query](images/multi-query.png)
<br>
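
The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: `multi_query_retrieve`, `fake_generate_queries`, `fake_retrieve`, and `FAKE_INDEX` are hypothetical stand-ins, where a real setup would call an FM to rewrite the question and a knowledge base to retrieve documents.

```python
from typing import Callable, Dict, List


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    include_original: bool = True,
) -> List[str]:
    """Retrieve documents for several query variants and return the unique union."""
    queries = list(generate_queries(question))
    if include_original:
        queries.insert(0, question)

    seen = set()
    unique_docs = []
    for variant in queries:
        for doc in retrieve(variant):
            # Keep the first occurrence only, preserving retrieval order.
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


# Hypothetical stand-in for the FM call that rewrites the question.
def fake_generate_queries(question: str) -> List[str]:
    return [
        "Which epic games are recommended?",
        "Suggest three highly rated epic video games",
    ]


# Hypothetical stand-in for the vector database: query -> retrieved document ids.
FAKE_INDEX: Dict[str, List[str]] = {
    "Name 3 recommended epic games": ["game_1", "game_2"],
    "Which epic games are recommended?": ["game_2", "game_3"],
    "Suggest three highly rated epic video games": ["game_3", "game_4"],
}


def fake_retrieve(query: str) -> List[str]:
    return FAKE_INDEX.get(query, [])


docs = multi_query_retrieve(
    "Name 3 recommended epic games", fake_generate_queries, fake_retrieve
)
print(docs)
```

Each query variant returns only two documents in this toy setup, but the deduplicated union contains four, which is the effect the multi-query approach relies on.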

#### Notes:
- You will use the ```Retrieve``` and ```RetrieveAndGenerate``` APIs from the Amazon Bedrock agent runtime client to illustrate the use cases. Both APIs convert the query into an embedding and search the knowledge base. The ```RetrieveAndGenerate``` API additionally augments the foundation model prompt with the search results as context and returns the FM-generated response to the question. Its output includes the generated response, the source attribution, and the retrieved text chunks.

- For this module, we will use the Anthropic Claude 3 Sonnet model on Amazon Bedrock as our FM.


## Pre-requisites
Before being able to answer the questions, the documents must be processed and stored in a knowledge base. For this notebook, we use a [sample dataset](https://aws-blogs-artifacts-public.s3.amazonaws.com/ML-16482/30_generated_video_game_records.zip) about fictional video games to create the Knowledge Base. 

1. Upload your documents (data source) to an Amazon S3 bucket.
2. Create the Knowledge Base using [1a_create_ingest_documents_test_kb.ipynb](https://github.com/aws-samples/amazon-bedrock-samples/blob/main/knowledge-bases/01-rag-concepts/1a_create_ingest_documents_test_kb.ipynb).
3. Note the Knowledge Base ID.


## Setup

In [None]:
%pip install -qU boto3 awscli botocore langchain langchain-community langchain-aws

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

### Initialize boto3 client and variables
Throughout the notebook, we use the RetrieveAndGenerate and Retrieve APIs to test knowledge base features.

In [24]:
import json
import boto3
import pprint as pp
from botocore.exceptions import ClientError
from botocore.client import Config

kb_id = "$Knowledge_Base_ID" # replace it with your Knowledge base id.

# Create boto3 session
sts_client = boto3.client('sts')
boto3_session = boto3.session.Session()
region_name = boto3_session.region_name

# Create bedrock agent client
bedrock_config = Config(connect_timeout=120, read_timeout=120, retries={'max_attempts': 0}, region_name=region_name)
bedrock_agent_client = boto3_session.client("bedrock-agent-runtime",
                              config=bedrock_config)
# Create bedrock client
bedrock_client = boto3_session.client("bedrock-runtime",
                              config=bedrock_config)

# Define FM to be used for generations 
model_id = "anthropic.claude-3-sonnet-20240229-v1:0" # we will be using Anthropic Claude 3 Sonnet throughout the notebook
model_arn = f'arn:aws:bedrock:{region_name}::foundation-model/{model_id}'


def retrieve_and_generate(query, kb_id, model_arn, max_results, search_type):
    response = bedrock_agent_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': max_results, # fetch the top N documents that most closely match the query
                        'overrideSearchType': search_type # either 'SEMANTIC' or 'HYBRID'
                    }
                }
            }
        }
    )
    return response

def print_generation_results(response, print_context = True):
    generated_text = response['output']['text']
    print('Generated FM response:\n')
    pp.pprint(generated_text)
    
    if print_context:
        # Print the source attribution/citations from the original documents to check that the generated response is grounded in the context.
        citations = response["citations"]
        contexts = []
        for citation in citations:
            retrievedReferences = citation["retrievedReferences"]
            for reference in retrievedReferences:
                contexts.append(reference["content"]["text"])
    
        print('\n\n\nRetrieved Context:\n')
        pp.pprint(contexts)

def retrieve(query, kb_id, model_arn, max_results, search_type):
    # model_arn is not used by the Retrieve API; it is kept so that the
    # signature mirrors retrieve_and_generate.
    response = bedrock_agent_client.retrieve(
        retrievalQuery={
            'text': query
        },
        knowledgeBaseId=kb_id,
        retrievalConfiguration={
            'vectorSearchConfiguration': {
                'numberOfResults': max_results, # fetch the top N documents that most closely match the query
                'overrideSearchType': search_type # either 'SEMANTIC' or 'HYBRID'
            }
        }
    )
    return response


def print_results(response):

    print('Retrieved documents:\n')
    
    ## print out the source citations from the original documents
    citations = response['retrievalResults']
    for citation in citations:
        pp.pprint(citation["content"]["text"])


### How hybrid search improves the documents retrieval?


In the following examples, we are going to run the following query with 15 results each time:
<br>
```Name 3 recommended epic games```




**Using semantic search**

In [None]:
query = "Name 3 recommended epic games"

results = retrieve_and_generate(query = query, kb_id = kb_id, model_arn = model_arn, max_results = 15, search_type='SEMANTIC')

print_generation_results(results)

**Using hybrid search**

In [None]:
# Now let's try with the Hybrid search 

results = retrieve_and_generate(query = query, kb_id = kb_id, model_arn = model_arn, max_results = 15, search_type='HYBRID')

print_generation_results(results)
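
To see what hybrid search changes at the retrieval level, it can help to compare the chunks returned by the two search types side by side. The sketch below is a hypothetical helper (the names `extract_chunks` and `compare_retrievals` are not part of any AWS API). It only assumes the response shape already used by `print_results` above, where `retrievalResults` is a list of items with `content.text`. The two sample responses are hand-written placeholders, not real output.

```python
from typing import Dict, List


def extract_chunks(response: dict) -> List[str]:
    """Pull the chunk texts out of a Retrieve-style response dictionary."""
    return [item["content"]["text"] for item in response.get("retrievalResults", [])]


def compare_retrievals(semantic_response: dict, hybrid_response: dict) -> Dict[str, List[str]]:
    """Split retrieved chunks into shared, semantic-only, and hybrid-only groups."""
    semantic = extract_chunks(semantic_response)
    hybrid = extract_chunks(hybrid_response)
    semantic_set, hybrid_set = set(semantic), set(hybrid)
    return {
        "shared": [chunk for chunk in semantic if chunk in hybrid_set],
        "semantic_only": [chunk for chunk in semantic if chunk not in hybrid_set],
        "hybrid_only": [chunk for chunk in hybrid if chunk not in semantic_set],
    }


# Hand-written placeholder responses that mimic the Retrieve response shape.
semantic_response = {
    "retrievalResults": [
        {"content": {"text": "chunk A"}},
        {"content": {"text": "chunk B"}},
    ]
}
hybrid_response = {
    "retrievalResults": [
        {"content": {"text": "chunk A"}},
        {"content": {"text": "chunk C"}},
    ]
}

diff = compare_retrievals(semantic_response, hybrid_response)
for group, chunks in diff.items():
    print(f"{group}: {chunks}")
```

With a real knowledge base, the two inputs could come from the notebook's `retrieve` function, called once with `search_type='SEMANTIC'` and once with `search_type='HYBRID'` for the same query.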

### How to apply multi queries approach? 

To demonstrate the benefits of the multi-query approach, we will use the same query as in the previous section:
<br>
```Name 3 recommended epic games```
<br>

This time, however, we will limit the number of retrieved documents per query to `5`.

***Note:*** to simplify the illustration of the multi-query concept, we will use the LangChain library.

In [None]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_aws import ChatBedrock  # ChatBedrock replaces the deprecated BedrockChat class
from langchain_community.retrievers import AmazonKnowledgeBasesRetriever
import logging
from langchain_core.prompts.prompt import PromptTemplate


# Prompt that asks the FM to rewrite the user question from 3 different perspectives
multi_query_prompt = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is 
    to generate 3 different versions of the given user 
    question to retrieve relevant documents from a vector database. 
    By generating multiple perspectives on the user question, 
    your goal is to help the user overcome some of the limitations 
    of distance-based similarity search. Provide these alternative 
    questions separated by a single '\n'. Original question: {question}""",
)


# Instantiate Knowledge Bases for Amazon Bedrock as a LangChain retriever
retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id = kb_id,
    region_name = region_name,
    retrieval_config={
        "vectorSearchConfiguration": {
            "numberOfResults": 5  # retrieve at most 5 documents per query
        }
    }
)

# Amazon Bedrock runtime client - to invoke the LLM
bedrock_runtime = boto3_session.client("bedrock-runtime",
                              config=bedrock_config)


# Prepare the FM inference configuration
inference_modifier = {
    "temperature": 0.0
}

# Prepare the model id and inference parameters
llm = ChatBedrock(
    model_id = model_id,
    client = bedrock_runtime,
    model_kwargs = inference_modifier,
)


# Wrap the KB retriever in a multi-query retriever
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever = retriever, llm = llm, include_original=True, prompt = multi_query_prompt
)


# Log the queries generated by the FM
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)


# Run every generated query (plus the original) and collect the unique documents
unique_docs = retriever_from_llm.invoke(query)

print(len(unique_docs))

#print(unique_docs)

As the count of unique documents shows, the retrieved set is richer than what the original question alone would return, since a single query is limited to 5 documents. A more varied set of retrieved documents is also expected to yield a better answer.

Let's continue with that example and check the final answer:

In [None]:
final_answer_prompt = PromptTemplate(
    input_variables=["documents","question"],
    template="""You are an AI language model assistant. Your task is to answer 
    a given user question based on provided context from a vector database. 
    Use only documents provided in the context to answer the user question. 
    <context>
    {documents}
    </context>
    
    User question: {question}""",
)


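# Join the text of the retrieved documents instead of passing the raw Document objects
context_text = "\n\n".join(doc.page_content for doc in unique_docs)

answer = llm.invoke(final_answer_prompt.format(question=query, documents=context_text))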

pp.pprint(answer.content)

<div class="alert alert-block alert-warning">
<b>Note:</b> Remember to delete the Knowledge Base, the OpenSearch Serverless (OSS) index, and the related IAM roles and policies to avoid incurring charges.
</div>