# Multiquery Retrieval

The `MultiQueryRetriever` automates prompt tuning by using an LLM to generate several queries, each from a different perspective, for a single user query. It retrieves a set of relevant documents for each generated query and takes the unique union of the results, which yields a larger set of potentially relevant documents. Because the same question is phrased in several ways, the `MultiQueryRetriever` may overcome some of the limitations of distance-based retrieval and return a richer set of results.
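
The "unique union" step can be illustrated with a minimal sketch. It is plain Python, not the LangChain implementation: the `Doc` class, the `unique_union` function and the sample chunks are hypothetical names chosen for illustration, and deduplication by content and source is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str
    source: str = "Not provided"


def unique_union(results_per_query):
    """Merge per-query result lists, keeping the first occurrence of each document."""
    seen = set()
    merged = []
    for results in results_per_query:
        for doc in results:
            # Assumption: two documents are the same if content and source match
            key = (doc.page_content, doc.source)
            if key not in seen:
                seen.add(key)
                merged.append(doc)
    return merged


results_per_query = [
    [Doc("chunk A"), Doc("chunk B")],  # hits for the original question
    [Doc("chunk B"), Doc("chunk C")],  # hits for a first rephrasing
    [Doc("chunk A"), Doc("chunk D")],  # hits for a second rephrasing
]
merged = unique_union(results_per_query)
print([d.page_content for d in merged])
# ['chunk A', 'chunk B', 'chunk C', 'chunk D']
```

Six hits collapse into four unique documents, which is why the merged set is larger than any single query's result but smaller than the sum of all of them.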



<a href="https://colab.research.google.com/github/edumunozsala/langchain-rag-techniques/blob/main/Multiquery-retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip -q install langchain openai tiktoken faiss-cpu


In [None]:
!pip show langchain

Name: langchain
Version: 0.0.305
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiohttp, anyio, async-timeout, dataclasses-json, jsonpatch, langsmith, numexpr, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


### Load the API Key

In [2]:
from dotenv import load_dotenv

# Load the environment variables
load_dotenv()

True
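
A `True` result from `load_dotenv()` does not by itself confirm that the key the OpenAI classes need is present. The sketch below is one possible check, assuming the key is stored under the default `OPENAI_API_KEY` variable name; the `has_api_key` helper and its `env` parameter are hypothetical and exist only to make the check easy to test.

```python
import os


def has_api_key(name="OPENAI_API_KEY", env=None):
    """Return True when the variable is present and not blank."""
    # Assumption: the key lives in the process environment unless a mapping is passed
    env = os.environ if env is None else env
    return bool(env.get(name, "").strip())


# Example with explicit mappings instead of the real environment
print(has_api_key(env={"OPENAI_API_KEY": "sk-example"}))
print(has_api_key(env={}))
```

Calling `has_api_key()` without arguments right after `load_dotenv()` gives an early, readable failure point before any request is sent.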

### Load the PDF document

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load a PDF file, extract text into documents and create a FAISS vectorstore with Langchain
from langchain.document_loaders import PyPDFLoader
from langchain.schema import Document

import os
import re 

Helper functions to load the PDF file, split it and create the Documents

In [4]:

def split_text_documents(text, source="Not provided", chunk_size=1000, chunk_overlap=0):
    """
    Split a text into smaller chunks and wrap each chunk in a Document.

    Args:
    text (str): The text to split.
    source (str): Value stored under the "source" metadata key of every chunk.
    chunk_size (int): Maximum number of characters per chunk.
    chunk_overlap (int): Maximum number of characters shared by consecutive chunks.
    Returns:
    list: The Documents, one per chunk, with "source" and "chunk" metadata.
    """
    # Create a text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap, separators=[" ", ",", "\n"]
    )
    # Split the text
    texts = text_splitter.split_text(text)
    # Create a list of documents, recording the source and the chunk index
    docs = [Document(page_content=t, metadata={"source": source, "chunk": i}) for i, t in enumerate(texts)]

    return docs

def load_pdf_from_url(path, files):
    """
    Load a PDF document from the directory given in the parameter path.
    Despite the function name, the file is read from the local file system,
    and only the first entry of files is loaded.

    Args:
    path: directory where the file or files are located
    files: list of file names

    Returns:
    list: one Document per page, or None if path or files is empty
    """

    # Check the inputs before creating the loader
    if not path or not files:
        print('Error: file not found')
        return None

    # Create a PDF loader for the first file in the list
    loader = PyPDFLoader(os.path.join(path, files[0]))

    # Load the PDF and return its pages
    return loader.load()

def extract_text_from_pdf(path, file):
    """
    Extract and return the text of a PDF document in the directory given in the parameter path.

    Args:
    path: directory where the file is located
    file: file name

    Returns:
    str: the text in the PDF file
    """
    files = [file]
    pages = load_pdf_from_url(path, files)

    # Join all pages into one single text
    text = ""
    for page in pages:
        # Rejoin words that were hyphenated across line breaks
        raw_page = re.sub(r'-\s+', '', page.page_content)
        # Collapse runs of whitespace into single spaces
        text += " ".join(raw_page.split())
        text += "\n"

    return text

def load_pdf_from_file(path, file, chunk_size, chunk_overlap):
    """
    Load the PDF document file from the directory in the parameter path. Returns a list of LangChain Documents.

    Args:
    path: directory where the file is located
    file: file name
    chunk_size: maximum number of characters per chunk
    chunk_overlap: maximum number of characters shared by consecutive chunks

    Returns:
    list: list of documents, or None if path or file is empty
    """

    # Check the inputs before reading the file
    if path == '' or file == '':
        print('Error: file not found')
        return None

    # Read and clean the text in the PDF file
    text = extract_text_from_pdf(path, file)
    # Split the text and create a list of documents
    documents = split_text_documents(text, source=file, chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Print the number of chunks
    print(len(documents))

    return documents
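
The text cleanup in `extract_text_from_pdf` can be tried in isolation. The sketch below repeats the same two steps on made-up strings; `clean_page_text` is a hypothetical name, and the samples are not taken from the PDF.

```python
import re


def clean_page_text(page_text):
    """Apply the same two cleanup steps used when extracting the PDF text."""
    # Remove a hyphen followed by whitespace (a word broken across lines)
    joined = re.sub(r"-\s+", "", page_text)
    # Collapse any run of whitespace, including newlines, into one space
    return " ".join(joined.split())


sample = "The Trans-\nformer allows   for more\nparallelization."
print(clean_page_text(sample))
# The Transformer allows for more parallelization.

# Caveat: a legitimate hyphen followed by a space is removed as well
print(clean_page_text("a well- known model"))
# a wellknown model
```

The second call shows the trade-off of this simple pattern: it cannot tell a line-break hyphen from a real one that happens to be followed by whitespace.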


In [5]:
path="data"
file="Attention is all you need.pdf"

docs= load_pdf_from_file(path, file, 1000, 50)

42
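
The call above uses a chunk size of 1000 characters and an overlap of 50. To show what the overlap means, here is a simplified sliding-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter also prefers to cut at the configured separators); `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows where consecutive windows share characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=0))
# ['abcd', 'efgh', 'ij']
```

With an overlap of 1, the last character of each chunk is repeated at the start of the next one, so a sentence cut at a chunk boundary keeps some of its context on both sides.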


In [6]:
print(len(docs))

42


### Create the Vector index

In [7]:
#from langchain.vectorstores import Chroma
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings

In [8]:
embedding = OpenAIEmbeddings()
faiss_vectorstore = FAISS.from_documents(docs, embedding)

# Create a retriever from the vectorstore
#faiss_retriever = faiss_vectorstore.as_retriever(search_kwargs={"k": 3})
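
To make "distance-based retrieval" concrete, the sketch below ranks a toy index by cosine similarity and keeps the top `k` entries, which is the role `search_kwargs={"k": 3}` plays for the retriever. It is an illustration only: the two-dimensional vectors are invented, and the metric actually used by the FAISS index depends on how it is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=3):
    """Return the texts of the k entries most similar to the query vector."""
    scored = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Hypothetical (text, embedding) pairs
toy_index = [
    ("attention", [1.0, 0.0]),
    ("convolution", [0.0, 1.0]),
    ("self-attention", [0.9, 0.1]),
]
print(top_k([1.0, 0.0], toy_index, k=2))
# ['attention', 'self-attention']
```

A single query vector only reaches its nearest neighbours, so a differently worded question can land in another region of the embedding space. That is the limitation the multi-query approach tries to soften.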

## Create the Multiquery Retriever

In [10]:
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

**Simple usage**

Specify the LLM to use for query generation, and the retriever will do the rest.

In [11]:
# Define the LLM
llm = ChatOpenAI(temperature=0)
# Set the multiquery retriever
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=faiss_vectorstore.as_retriever(search_kwargs={"k": 3}), llm=llm, include_original=True
)

### Set the logging properties

In [11]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

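Test the retriever with a sample question. The length of the result is the number of unique documents retrieved across all generated queries.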

In [12]:
question="How are transformers related to convolutional neural networks?"

unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)

4

In [13]:
unique_docs

[Document(page_content='of sequential computation, however, remains. Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [ 2,19]. In all but a few cases [ 27], however, such attention mechanisms are used in conjunction with a recurrent network. In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for signiﬁcantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. 2 Background The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [ 18] and ConvS2S [ 9], all of which use convolutional neural networks as ba

## Build the RAG Chain

In [14]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

Define the Prompt template

In [15]:
# Create the Template 
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)


Create the chain

In [16]:
model = ChatOpenAI(temperature=0)

chain = (
    {"context": retriever_from_llm, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
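
The data flow of this chain can be sketched with plain functions: the same question goes to the retriever and to the prompt, and the retrieved documents fill the `{context}` slot. Everything below is a stand-in for illustration; `fake_retriever`, `format_docs` and `build_prompt` are hypothetical names, and joining the page contents with blank lines is an optional formatting choice, not something the chain above does.

```python
TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def fake_retriever(question):
    """Hypothetical stand-in for the retriever: returns fixed documents."""
    return [
        {"page_content": "The Transformer relies entirely on attention."},
        {"page_content": "ConvS2S uses convolutional neural networks."},
    ]


def format_docs(docs):
    """Join the text of the retrieved documents, separated by blank lines."""
    return "\n\n".join(doc["page_content"] for doc in docs)


def build_prompt(question, retriever=fake_retriever):
    """Mimic the first two chain steps: gather the inputs, then fill the template."""
    inputs = {"context": format_docs(retriever(question)), "question": question}
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("How are transformers related to CNNs?")
print(prompt_text)
```

Printing the filled template is a cheap way to inspect what the model would receive before spending tokens on a real call.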

Invoke the chain

In [17]:
chain.invoke("How are transformers related to convolutional neural networks?")

'Transformers are related to convolutional neural networks in that both models are used for sequence transduction tasks. However, transformers differ from convolutional neural networks in that they do not rely on recurrence or convolutions. Instead, transformers use attention mechanisms to draw global dependencies between input and output, allowing for more parallelization and reducing the number of operations required to relate signals from different positions in the sequence.'

## Create a custom template

In [15]:
from typing import List

from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

Define a custom prompt for the multiquery step that asks the model to create five questions. We also create an output parser to extract the questions from the LLM response.

In [16]:
# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)


output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

Set the logging level to see the generated queries

In [25]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

Create an LLM chain that extracts the multiple queries from our original query in an appropriate format

In [26]:
llm = ChatOpenAI(temperature=0)

# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

# Build the multiquery retriever from the custom chain
retriever_custom_multi = MultiQueryRetriever(
    retriever=faiss_vectorstore.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)  # "lines" is the key (attribute name) of the parsed output


Invoke the retriever to test it

In [27]:
# Results
unique_docs = retriever_custom_multi.get_relevant_documents(
    query="How are transformers related to convolutional neural networks?"
)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is the relationship between transformers and convolutional neural networks?', '2. Can you explain how transformers and convolutional neural networks are connected?', '3. In what way are transformers and convolutional neural networks related?', '4. What is the connection between transformers and convolutional neural networks?', '5. How do transformers and convolutional neural networks relate to each other?']
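
The log shows that each generated query keeps its `1.`, `2.`, ... prefix, because the parser only splits the response on newlines. If that numbering is unwanted in the search queries, a stricter parsing step could strip it. The function below is a hypothetical variant for illustration, not part of the notebook's parser.

```python
import re


def parse_queries(text):
    """Split an LLM response into queries, dropping blank lines and list numbering."""
    queries = []
    for line in text.splitlines():
        # Remove a leading "1." or "1)" style marker, then surrounding whitespace
        cleaned = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


response = "1. What is X?\n\n2) How does Y work?\n  Plain question"
print(parse_queries(response))
# ['What is X?', 'How does Y work?', 'Plain question']
```

The same logic could replace the body of a `parse` method that returns the cleaned lines, under the assumption that no genuine question starts with a number followed by a period or parenthesis.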


In [28]:
unique_docs

[Document(page_content='of sequential computation, however, remains. Attention mechanisms have become an integral part of compelling sequence modeling and transduction models in various tasks, allowing modeling of dependencies without regard to their distance in the input or output sequences [ 2,19]. In all but a few cases [ 27], however, such attention mechanisms are used in conjunction with a recurrent network. In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output. The Transformer allows for signiﬁcantly more parallelization and can reach a new state of the art in translation quality after being trained for as little as twelve hours on eight P100 GPUs. 2 Background The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [ 18] and ConvS2S [ 9], all of which use convolutional neural networks as ba

## Build the RAG Chain with the custom retriever

In [24]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

Define the prompt template and create the chain

In [29]:
# Create the Template 
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI(temperature=0)

chain = (
    {"context": retriever_custom_multi, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

Invoke the chain

In [30]:
chain.invoke("How are transformers related to convolutional neural networks?")

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is the relationship between transformers and convolutional neural networks?', '2. Can you explain how transformers and convolutional neural networks are connected?', '3. In what way are transformers and convolutional neural networks related?', '4. What is the connection between transformers and convolutional neural networks?', '5. How do transformers and convolutional neural networks relate to each other?']


'Transformers are related to convolutional neural networks in that they both aim to reduce sequential computation. However, transformers rely entirely on attention mechanisms to draw global dependencies between input and output, while convolutional neural networks use convolutional layers as their basic building block and require a varying number of operations to relate signals from different input or output positions based on their distance.'