# Multi-Query Retrieval: Enhancing Vector Search with LLMs

## The Challenge
In vector databases, finding the right information depends heavily on how you phrase your query. With thousands of documents, the phrasing becomes crucial, but how do you choose the right phrasing when you don't know what's in your database?

## The Solution: Multi-Query Retrieval
Multi-Query Retrieval offers an elegant solution by using Large Language Models to generate multiple variations of your initial query, improving the chances of matching relevant documents.

## How It Works
1. **Query Generation**: An LLM (like ChatGPT) takes your natural-language question and creates several alternative phrasings
2. **Multiple Searches**: Each generated query is used to search the vector database
3. **Result Aggregation**: The unique results from all queries are combined to provide comprehensive information
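
The three steps above can be sketched in plain Python. This is only an illustration, not the LangChain implementation: `generate_queries` is a hypothetical stand-in for the LLM call, `search` is a hypothetical keyword-overlap stand-in for vector search, and the three-sentence corpus is a toy built from phrases that appear later in this notebook.

```python
from typing import Callable, Dict, List

# Toy corpus standing in for the vector database (assumption: id -> text).
CORPUS: Dict[str, str] = {
    "d1": "Some surviving information about MKUltra was declassified in 2001.",
    "d2": "Project MKUltra began in 1953 and was halted in 1973.",
    "d3": "MKUltra is a CIA cryptonym.",
}


def generate_queries(question: str) -> List[str]:
    """Hypothetical stub for the LLM: returns fixed rephrasings."""
    return [
        question,
        "What is the date of the declassification?",
        "When was the classified status removed?",
    ]


def search(query: str, k: int = 2) -> List[str]:
    """Hypothetical stub for vector search: rank by shared lowercase words."""
    q_words = set(query.lower().replace("?", "").split())

    def overlap(doc_id: str) -> int:
        doc_words = set(CORPUS[doc_id].lower().rstrip(".").split())
        return len(q_words & doc_words)

    return sorted(CORPUS, key=overlap, reverse=True)[:k]


def multi_query_retrieve(
    question: str,
    generate: Callable[[str], List[str]] = generate_queries,
    retrieve: Callable[[str], List[str]] = search,
) -> List[str]:
    """Run every generated query and keep each document id only once."""
    seen = set()
    results: List[str] = []
    for query in generate(question):      # step 1: query variations
        for doc_id in retrieve(query):    # step 2: one search per variation
            if doc_id not in seen:        # step 3: aggregate unique results
                seen.add(doc_id)
                results.append(doc_id)
    return results


results = multi_query_retrieve("When was this declassified?")
print(results)
```

The deduplication step is what keeps the combined result list from growing linearly with the number of generated queries when several variations hit the same chunk.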

## The Notebook Implementation
The notebook demonstrates this technique using:
- Wikipedia data about MKUltra as sample content
- Document splitting to create manageable chunks
- OpenAI embeddings to vectorize the content
- FAISS as the vector database
- ChatGPT to generate query variations

## Setting Up Our Environment

First, let's install the necessary libraries. We'll need:
- `faiss-cpu` for our vector database
- `wikipedia` for our data source
- `langchain` components for our retrieval pipeline
- OpenAI's API for embeddings and query generation

Don't forget to run `export OPENAI_API_KEY=sk-...` to set your API key as an environment variable. LangChain also works with other providers, such as Hugging Face, but that is beyond our scope.
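
Before running the cells that call OpenAI, it can help to confirm that the key is actually visible to the Python process. The following is a minimal sketch; the helper name `has_openai_key` is hypothetical and only checks that the variable is present and non-empty, not that the key is valid.

```python
import os
from typing import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before running the cells below.")
```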

In [1]:
## Uncomment and run this cell if you need to install the required packages 
# !pip install faiss-cpu wikipedia python-certifi-win32 langchain-community langchain-openai certifi

## Step 1: Gathering Our Data

For this example, we'll use information about Project MKUltra from Wikipedia. This CIA program was classified for many years before being declassified, making it an interesting case study for information retrieval.

Let's load the data using LangChain's Wikipedia loader:
   

In [2]:
# Community integrations such as the Wikipedia loader live in langchain_community
from langchain_community.document_loaders import WikipediaLoader

loader = WikipediaLoader(query='MKUltra')
documents = loader.load()

## Project MKUltra : Our Test Case
Project MKUltra was a human experimentation program designed and undertaken by the U.S. Central Intelligence Agency (CIA) from 1953 to 1973. Its goal was to develop procedures and identify drugs that could be used during interrogations to weaken individuals and force confessions through brainwashing and psychological torture.

The term MKUltra is a CIA cryptonym: "MK" stands for the Office of Technical Service and "Ultra" is an arbitrary word used to name this project. The program has been widely condemned as a violation of individual rights and an example of the CIA's abuse of power.

What makes this an interesting test case for our retrieval system is that information about it was released gradually through various declassifications and investigations. This means relevant information might be phrased in various ways throughout the documents.
   

link: https://en.wikipedia.org/wiki/MKUltra

In [4]:
len(documents)

24

## Step 2: Processing Documents for Vector Storage

The loaded Wikipedia content is quite long. For effective retrieval, we need to split it into smaller chunks that can be independently vectorized and retrieved. Think of this as creating many focused knowledge pieces instead of a few large documents.

We'll use the `RecursiveCharacterTextSplitter` which intelligently splits text based on natural boundaries like paragraphs and sentences:
   

In [5]:
# Only the recursive splitter is used below
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100, separators=["\n\n", "\n", ".", " "])
docs = text_splitter.split_documents(documents)

In [7]:
len(docs)

305

We've gone from 24 loaded documents to 305 smaller chunks! This granularity is important for several reasons:

1. **Precision**: Smaller chunks allow us to retrieve just the relevant information
2. **Context Management**: The 100-character overlap helps maintain context between chunks
3. **Token Efficiency**: When working with LLMs, smaller chunks help us stay within token limits

The splitter tries to break at natural boundaries (paragraphs, sentences) first before resorting to character-level splits. This helps preserve the semantic integrity of our chunks.
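
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch of a fixed-width sliding window. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers separator boundaries, so its chunks are usually shorter than `chunk_size`); the function name `split_with_overlap` and the digit-string sample are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> List[str]:
    """Cut text into fixed-width windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of toy text
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])             # [500, 500, 400]
print(chunks[0][-100:] == chunks[1][:100])  # True: neighbours share 100 characters
```

The shared characters mean that a sentence cut at a chunk boundary still appears whole in at least one of the two neighbouring chunks, as long as it is shorter than the overlap.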
   

## Step 3: Creating Vector Embeddings

Now we need to transform our text chunks into vector embeddings - numerical representations that capture the semantic meaning of the text. These vectors will allow us to perform similarity searches.

We'll use OpenAI's embedding model for this task:
   

In [8]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

In [9]:
embedding_model = OpenAIEmbeddings()

## Step 4: Creating Our Vector Database

Now we'll store our embeddings in a vector database. FAISS (Facebook AI Similarity Search) is a good choice because it's efficient for similarity search and doesn't require external servers:
   

In [12]:
db = FAISS.from_documents(docs, embedding_model)

Behind the scenes, several important things just happened:

1. Each of our 305 document chunks was sent to OpenAI's embedding API
2. The API converted each chunk into a high-dimensional vector (typically 1536 dimensions)
3. These vectors were stored in the FAISS database
4. FAISS created an index structure to make similarity searches efficient

Now we have a searchable knowledge base that understands semantic similarity, not just keyword matching.
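
As a rough illustration of what "semantic similarity" means at the vector level, the sketch below ranks a few hand-made three-dimensional vectors against a query vector using cosine similarity. Everything here is an assumption for illustration: real embeddings have far more dimensions, the vectors and labels are invented, and the distance metric a given vector store uses may differ from cosine similarity.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings"; the labels only hint at what each chunk is about.
TOY_VECTORS: Dict[str, List[float]] = {
    "declassification chunk": [0.9, 0.1, 0.0],
    "timeline chunk": [0.2, 0.8, 0.1],
    "cryptonym chunk": [0.0, 0.2, 0.9],
}


def top_k(query_vec: Sequence[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored vectors most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in TOY_VECTORS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


query_vec = [0.8, 0.3, 0.0]  # pretend embedding of "When was this declassified?"
for name, score in top_k(query_vec):
    print(f"{name}: {score:.3f}")
```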
   

## Step 5: Implementing Multi-Query Retrieval
    
Here comes the innovative part. Instead of doing a single search with one query, we'll use an LLM to generate multiple query variations, then search for each one.

First, let's import the necessary components:
   

In [13]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

Let's frame a question that tests our system. "When was this declassified?" is a good example because:

1. It's a natural way a person might ask
2. It uses a pronoun ("this") that requires contextual understanding
3. The answer might be phrased in various ways in the documents
   

In [14]:
question = "When was this declassified?"

In [22]:
# temperature=0 keeps the generated queries consistent between runs (less creative variation)
llm = ChatOpenAI(temperature=0)

In [16]:
# Create our multi-query retriever
retriever_from_llm = MultiQueryRetriever.from_llm(retriever=db.as_retriever(), llm=llm)

In [17]:
# Set up logging to see what queries are generated
import logging
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

Now let's execute our multi-query retrieval. This will:

1. Take our original question
2. Generate multiple variations using the LLM
3. Search the vector database with each variation
4. Return the combined unique results

In [18]:
# The retriever does not answer the question itself.
# It returns the documents most similar to the generated queries.
unique_docs = retriever_from_llm.invoke(question)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is the date of the declassification of this information?', '2. Can you provide the specific time when this was declassified?', '3. When was the classified status of this information removed?']


## Understanding the Generated Queries

Look at those query variations! The LLM has taken our simple question "When was this declassified?" and generated three different phrasings:

1. "What is the date of the declassification of this information?"
2. "Can you provide the specific time when this was declassified?"
3. "When was the classified status of this information removed?"

Each variation approaches the question from a slightly different angle, using different vocabulary and sentence structures. This increases our chances of matching the actual phrasing in the documents.
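
Note that the logged queries still carry their list numbering (`'1. ...'`, `'2. ...'`). Whether that prefix affects retrieval quality is not something this notebook measures, but if you want clean query strings, a small post-processing step is easy to write. The helper below is a hypothetical sketch, not part of LangChain: it splits raw LLM text into lines and strips a leading `1.` or `1)` style marker.

```python
import re
from typing import List

# Matches a leading list marker such as "1. " or "2) " (assumed format).
_NUMBER_PREFIX = re.compile(r"^\s*\d+[.)]\s*")


def parse_generated_queries(llm_text: str) -> List[str]:
    """Split raw LLM output into one query per line, without list numbering."""
    queries: List[str] = []
    for line in llm_text.splitlines():
        cleaned = _NUMBER_PREFIX.sub("", line).strip()
        if cleaned:  # skip blank lines
            queries.append(cleaned)
    return queries


raw = """1. What is the date of the declassification of this information?
2. Can you provide the specific time when this was declassified?
3. When was the classified status of this information removed?"""

for query in parse_generated_queries(raw):
    print(query)
```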
   

In [19]:
# How many unique documents did we retrieve?
len(unique_docs)

7

We've retrieved 7 unique documents across all three query variations. This is the power of multi-query retrieval - we cast a wider net and capture more potentially relevant information.

Let's look at the first document in the combined results to see if we found our answer:
   

In [20]:
# Let's look at the first document in the combined results
print(unique_docs[0])

page_content='. Some surviving information about MKUltra was declassified in 2001.' metadata={'title': 'MKUltra', 'summary': 'Project MKUltra was a human experimentation program designed and undertaken by the U.S. Central Intelligence Agency (CIA) to develop procedures and identify drugs that could be used during interrogations to weaken individuals and force confessions through brainwashing and psychological torture. The term MKUltra is a CIA cryptonym: "MK" is an arbitrary prefix standing for the Office of Technical Service and "Ultra" is an arbitrary word out of a dictionary used to name this project. The program has been widely condemned as a violation of individual rights and an example of the CIA’s abuse of power, with critics highlighting its disregard for consent and its corrosive impact on democratic principles.\nProject MKUltra began in 1953 and was halted in 1973. MKUltra used numerous methods to manipulate its subjects\' mental states and brain functions, such as the cove

In [21]:
print(unique_docs[0].page_content)

. Some surviving information about MKUltra was declassified in 2001.


## Success! Finding Our Answer

Our multi-query approach worked! The first retrieved chunk contains the information we were looking for:

"Some surviving information about MKUltra was declassified in 2001."


## Key Takeaways about Multi-Query Retrieval

Let's reflect on what makes this approach powerful:

1. **Natural User Experience**: Users can ask questions in their own words without knowing the exact phrasing in the documents

2. **Increased Coverage**: By generating multiple query variations, we cast a wider net in our semantic search

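3. **Tolerance for Vague Wording**: The generated variations kept the ambiguous "this" rather than resolving it to MKUltra, yet retrieval still succeeded because the index was built from MKUltra search results. On a mixed corpus, naming the subject in the question would be the safer choice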

4. **Efficient Implementation**: The entire process requires minimal code once the basic vector retrieval system is in place

## Practical Applications

This technique can be applied to many real-world scenarios:
- **Customer Support**: Helping customers find answers without knowing the exact terms used in support documentation
- **Research Assistance**: Finding relevant papers and documents when exploring new fields
- **Knowledge Management**: Making corporate knowledge bases more accessible to employees
- **Educational Tools**: Helping students find relevant information in learning materials

Multi-query retrieval represents an important evolution in how we interact with large knowledge bases, making information retrieval more intuitive and human-centered.
   