# Question Answering on LangChain blogs (using MultiQuery Retriever)

In this notebook we explore `MultiQueryRetriever` (MQR) to improve the quality of the documents retrieved for RAG.

The idea is to generate up to 5 questions by rephrasing the original question without changing its meaning.

We then build a `MultiQueryRetriever` from our current retriever and an llm that rephrases the questions.

Finally, when invoking the RAG chain, we use the MQR as our retriever.

We'll run QnA on a few blog posts from [LangChain blogs](https://blog.langchain.dev/rss/).

In [1]:
# !pip install -U feedparser rich --quiet

In [2]:
import os
import sys

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), os.pardir)) + "/utils")

In [3]:
import warnings
from typing import List

import boto3
from anthropic_bedrock import AI_PROMPT, HUMAN_PROMPT
from rich import print
from utils import get_inference_parameters, get_model_ids

warnings.filterwarnings("ignore")

%load_ext rich
%load_ext autoreload
%autoreload 2

### Instantiate LLM and Embeddings

In [4]:
from langchain.embeddings.bedrock import BedrockEmbeddings
from langchain.llms.bedrock import Bedrock

region = "us-west-2"
b_client = boto3.client("bedrock-runtime", region_name=region)
# Each model provider expects its own inference parameters (model_kwargs)
model_kwargs = get_inference_parameters("anthropic")
# llm_model_id = "anthropic.claude-v2"
llm_model_id = "anthropic.claude-instant-v1"
embed_model_id = "cohere.embed-english-v3"

llm = Bedrock(
    client=b_client,
    model_kwargs=model_kwargs,
    model_id=llm_model_id,
    region_name=region,
)
embeddings = BedrockEmbeddings(
    client=b_client, model_id=embed_model_id, region_name=region
)

### Scrape a few blog posts for encoding

We scrape the latest LangChain blog posts listed in the RSS feed.

We use the LangChain `AsyncHtmlLoader` document loader to download the blog posts as HTML.

In [5]:
import feedparser
from langchain.document_loaders import AsyncHtmlLoader

feed_url = "https://blog.langchain.dev/rss/"
rss_feed = feedparser.parse(feed_url)
urls = [entry.link for entry in rss_feed.entries]

html_loader = AsyncHtmlLoader(urls)
html_docs = html_loader.load()

Fetching pages: 100%|##########################################################################################################################################| 15/15 [00:01<00:00, 14.35it/s]


The extracted `html_docs` metadata contains only `source`. Let's enhance the `metadata` by adding `language` and the blog `title`.

In [6]:
from bs4 import BeautifulSoup

for _html_doc in html_docs:
    metadata = dict()
    metadata["source"] = _html_doc.metadata["source"]
    metadata["language"] = "en"
    soup = BeautifulSoup(_html_doc.page_content, "html.parser")
    # Guard against pages without a <title> tag
    title_tag = soup.find("title")
    metadata["title"] = title_tag.text if title_tag else ""
    _html_doc.metadata = metadata

### Convert HTML docs into Text

We use Unstructured [partition_html](https://unstructured-io.github.io/unstructured/core/partition.html#partition-html) to extract text from HTML. `partition_html` helps to clean and group the HTML text.

- Group text into sections by title with `chunking_strategy="by_title"`
- Assemble article content with `html_assemble_articles=True`
- Skip headers and footers with `skip_headers_and_footers=True`
- Remove any non-ASCII characters from the text with `clean_non_ascii_chars`
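
For intuition, the ASCII cleaning step can be approximated with the standard library alone. This is a minimal sketch that assumes the cleaner simply drops every character outside the ASCII range; `drop_non_ascii` is a hypothetical helper name, not part of Unstructured.

```python
def drop_non_ascii(text: str) -> str:
    """Drop every character that cannot be encoded as ASCII (assumed behaviour)."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# "\u00e9" is an accented e and "\u2705" is a check mark emoji
sample = "caf\u00e9 RAG \u2705"
cleaned = drop_non_ascii(sample)
print(repr(cleaned))
```

Note that dropping characters this way also removes accented letters, so it suits English-only content such as these blog posts.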

In [7]:
from langchain.docstore.document import Document
from unstructured.cleaners.core import clean_non_ascii_chars
from unstructured.partition.html import partition_html


def extract_text_chunks_from_html(urls, html_docs) -> List[Document]:
    """
    Convert HTML documents to plain text.

    Input: urls, html_docs (HTML documents, updated in place)
    Output: List[Document] with plain-text `page_content`
    """
    extracted_docs = []
    for url, doc in zip(urls, html_docs):
        elements = partition_html(
            text=doc.page_content,
            html_assemble_articles=True,
            skip_headers_and_footers=True,
            chunking_strategy="by_title",
        )
        # join with blank lines so text from adjacent elements does not run together
        extracted_text = "\n\n".join([e.text for e in elements])
        # extract links if available and append to metadata
        extracted_links = []
        for element in elements:
            if element.metadata.links is not None:
                print(element.metadata.links)
                link = element.metadata.links[0]["url"][1:]
                extracted_links.append(link)
        # Add extracted links to metadata as references
        if len(extracted_links) > 0:
            doc.metadata["references"] = extracted_links
        doc.page_content = clean_non_ascii_chars(extracted_text)
        extracted_docs.append(doc)
    return extracted_docs

In [8]:
print(f"Converting {len(html_docs)} HTML docs to Text")
extracted_docs = extract_text_chunks_from_html(urls, html_docs)

### Split docs into chunks that fit the embedding model's max length (512 tokens)
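
The splitter's `chunk_size` counts characters, while the embedding model's limit is counted in tokens. As a rule of thumb (an assumption, not a guarantee), English text averages about four characters per token, so 2048 characters lands near 512 tokens. The sketch below is a naive fixed-width split that only illustrates the character-based size and the recorded start index; `split_by_chars` is a hypothetical helper and does not reproduce the separator-aware logic of `RecursiveCharacterTextSplitter`.

```python
from typing import List, Tuple


def split_by_chars(text: str, chunk_size: int) -> List[Tuple[int, str]]:
    """Split text into consecutive, non-overlapping chunks of at most chunk_size characters.

    Returns (start_index, chunk) pairs.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, text[start : start + chunk_size])
        for start in range(0, len(text), chunk_size)
    ]


text = "a" * 5000
chunks = split_by_chars(text, chunk_size=2048)
print([(start, len(chunk)) for start, chunk in chunks])
```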


In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Remember: TextSplitter chunk_size counts characters, not tokens, so it is != model max length
splitter = RecursiveCharacterTextSplitter(
    add_start_index=True, chunk_size=2048, chunk_overlap=0
)
doc_chunks = splitter.split_documents(documents=extracted_docs)
print(f"Split {len(html_docs)} HTML docs into {len(doc_chunks)} chunks")

### Add docs to vectorstore (Qdrant)

Install and run `qdrant` vector store locally using docker

Refer here for Installation: <https://qdrant.tech/documentation/quick-start/>

Qdrant should be running at port `6333` on localhost.
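
Before connecting, it can help to confirm that something is listening on that port. This is a minimal sketch using only the standard library; `is_port_open` is a hypothetical helper, and a successful TCP connection only shows that the port is open, not that the service behind it is Qdrant.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts
        return False


print("Port 6333 reachable:", is_port_open("localhost", 6333))
```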

In [10]:
add_docs = False  # Set this to False on repeated runs to avoid re-adding documents

In [11]:
from langchain.vectorstores.qdrant import Qdrant
from qdrant_client import QdrantClient

collection_name = "mlblogs_coherev3"  # define collection name
qclient = QdrantClient(location="localhost", port=6333)
collection_status = qclient.get_collection(collection_name=collection_name).status

if collection_status == "green":
    print(f"Connected to collection: [b magenta]{collection_name}[/b magenta] ✅")
    # Instantiating Qdrant client is weird with LangChain
    db = Qdrant(
        client=qclient,
        collection_name=collection_name,
        distance_strategy="cosine",
        embeddings=embeddings,
    )
    
if add_docs:
    # Add documents to the vector db.
    # force_recreate=True drops and rebuilds the collection (useful for testing);
    # keep it False in PROD so existing data is preserved.
    db = Qdrant.from_documents(
        documents=doc_chunks,
        embedding=embeddings,
        collection_name=collection_name,
        force_recreate=False,
    )
    print(
        f"Added [b]{len(doc_chunks)}[/b] to collection: [b green]{collection_name}[/b green] ✅"
    )

### Enhance retrieval with MultiQueryRetriever (MQR)

The idea is to generate multiple questions by rephrasing the user's original question without changing its meaning.

We use our llm to rephrase our questions accordingly.

`MultiQueryRetriever` does exactly that. Initialize MQR by passing in the retriever and llm.

For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. 
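
The retrieve-then-union step can be sketched in plain Python. Everything here is a toy stand-in: `CORPUS`, `keyword_retriever` and `multi_query_retrieve` are hypothetical names, and the word-overlap retriever only mimics a vector search so that the union logic is visible.

```python
from typing import Callable, Dict, List

# Toy corpus standing in for the vector store (hypothetical data)
CORPUS: Dict[str, str] = {
    "doc1": "benchmarking rag on tables",
    "doc2": "evaluating rag on tabular data",
    "doc3": "multimodal rag on slide decks",
}


def keyword_retriever(query: str) -> List[str]:
    """Return ids of documents sharing at least three words with the query."""
    query_words = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in CORPUS.items()
        if len(query_words & set(text.split())) >= 3
    ]


def multi_query_retrieve(
    queries: List[str], retriever: Callable[[str], List[str]]
) -> List[str]:
    """Run every query and keep the order-preserving unique union of the results."""
    seen = set()
    unique_ids = []
    for query in queries:
        for doc_id in retriever(query):
            if doc_id not in seen:
                seen.add(doc_id)
                unique_ids.append(doc_id)
    return unique_ids


queries = [
    "benchmark rag on tables",
    "evaluating rag on tabular data",
    "rag on tables benchmark results",  # hits doc1 again, so it adds nothing new
]
print(multi_query_retrieve(queries, keyword_retriever))
```

No single query finds both table-related documents here, but the union across the rephrasings does, which is the benefit MQR aims for.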

In [12]:
import logging

from langchain.retrievers.multi_query import MultiQueryRetriever

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

# retriever_kwargs = {"search_type": "similarity", "search_kwargs": {"k": 5}}
# qdrant_retriever = db.as_retriever(**retriever_kwargs)
qdrant_retriever = db.as_retriever()

retriever_from_llm = MultiQueryRetriever.from_llm(retriever=qdrant_retriever, llm=llm)

question = " What are some of the approaches to benchmark RAG on Tables"
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
print(f"Extracted {len(unique_docs)} unique documents")

INFO:langchain.retrievers.multi_query:Generated queries: ['Here are 3 alternative questions for "What are some of the approaches to benchmark RAG on Tables":', '', 'How can I evaluate the performance of RAG models on table data?', '', 'What methods exist for testing and comparing how well RAG models understand and interact with tabular information? ', '', 'What techniques or processes are used to measure how RAG algorithms handle and process information contained within tables?']


### Create MQR with your own prompt

You can also supply your own prompt along with an output parser that splits the result into a list of queries.

The output parser splits the LLM result into a list of queries, stripping list numbering and dropping empty strings.
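
The splitting logic can be tried on its own, without LangChain or an LLM call. This is a minimal sketch; `split_rephrased_questions` is a hypothetical helper and the sample text is made up for illustration.

```python
import re
from typing import List


def split_rephrased_questions(text: str) -> List[str]:
    """Split raw LLM output into questions, dropping numbering and blank lines."""
    lines = text.strip().split("\n")
    # Remove a leading "1. ", "2. ", ... and surrounding whitespace from each line
    cleaned = [re.sub(r"^\s*\d+\.\s+", "", line).strip() for line in lines]
    return [line for line in cleaned if line]


raw_output = "1. How can RAG be evaluated on tables?\n\n2. Which metrics apply to tabular RAG?  \n"
questions = split_rephrased_questions(raw_output)
print(questions)
```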

In [13]:
from typing import List
import re
from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field


# Output parser will split the LLM result into a list of queries
class QuestionsList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of rephrased questions")


class QuestionsListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=QuestionsList)

    def parse(self, text: str) -> QuestionsList:
        lines = text.strip().split("\n")
        # remove 1. 2. from the string
        questions = [re.sub(r"^\d+\.\s+", "", question) for question in lines]
        questions = [s for s in questions if s]  # remove any empty strings

        return QuestionsList(lines=questions)


output_parser = QuestionsListOutputParser()

#### Load and format question rephraser prompt from file

In [14]:
from pathlib import Path

# Let's load prompt from text file
prompt_path = Path("./prompts/question_rephrase_claude.txt")
prompt_text = prompt_path.read_text(encoding="utf-8")
rephrase_prompt_text = f"{HUMAN_PROMPT}{prompt_text}{AI_PROMPT}"

query_rephraser_prompt = PromptTemplate.from_template(template=rephrase_prompt_text)

print(query_rephraser_prompt.template)

### Build MQ retriever with question rephraser chain and custom output parser

In [15]:
from langchain.schema.runnable import RunnablePassthrough

# create question rephraser chain with llm
rephraser_chain = LLMChain(llm=llm, prompt=query_rephraser_prompt, output_parser=output_parser)

# build MQR
retriever = MultiQueryRetriever(
    retriever=qdrant_retriever, llm_chain=rephraser_chain, parser_key="lines"
)  # "lines" is the key (attribute name) of the parsed output

question = " What are some of the approaches to benchmark RAG on Tables"

# Invoke MQR
unique_docs = retriever.get_relevant_documents(query=question)

INFO:langchain.retrievers.multi_query:Generated queries: ['What are some methods for evaluating RAG on tabular data?', 'What performance metrics can be used to assess RAG when applied to tables?  ', 'What techniques have researchers explored for measuring the effectiveness of RAG models on table understanding tasks?']


### Create RAG chain using enhanced MultiQuery retriever to answer questions
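
The RAG prompt receives the retrieved documents as a single string, with each document wrapped in numbered tags. The sketch below shows that formatting in isolation; `Doc` is a hypothetical stand-in for a LangChain `Document`, used so the example runs without any dependencies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (hypothetical, for illustration)."""

    page_content: str


def format_context_docs(docs: List[Doc]) -> str:
    """Wrap each document in numbered <contextN> ... </contextN> tags, one per line."""
    return "".join(
        f"<context{idx}> {doc.page_content} </context{idx}>\n"
        for idx, doc in enumerate(docs, start=1)
    )


docs = [Doc("RAG on tables."), Doc("Multi-vector retrieval.")]
formatted = format_context_docs(docs)
print(formatted)
```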

In [16]:
from langchain.schema.output_parser import StrOutputParser

# QnA chain using the MultiQuery retriever
rag_prompt_path = Path("./prompts/rag_prompt_claude.txt")
rag_prompt = PromptTemplate.from_file(rag_prompt_path, input_variables=["context", "question"])

# format retrieved docs within <context{idx}></context{idx}> tags
def format_context_docs(docs):
    context_string = ""
    for idx, _d in enumerate(docs):
        otag = f"<context{idx+1}>"
        ctag = f"</context{idx+1}>"
        c_text = f"{otag} {_d.page_content} {ctag}\n"
        context_string += c_text
    return context_string


# Input variables to RAG prompt are `context` and `question`
rag_chain = (
    {
        "context": retriever | format_context_docs,
        "question": RunnablePassthrough(),
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)

In [17]:
from IPython.display import Markdown, display

queries = [
    "Can we do Multimodal RAG on slide decks using langchain",
    "Difference between langchain core and community",
    "Are there any benchmarks for extraction",
    "What are some of the approaches to benchmark RAG on Tables. Output in a bulleted list format."
]

# Invoke chain on all queries
for q in queries:
    print(f"[b]Question: [b green]{q}[/b green]")
    output = rag_chain.invoke(q)
    display(Markdown(output))
    print("===" * 15)

INFO:langchain.retrievers.multi_query:Generated queries: ['Is it possible to perform Multimodal RAG on slide presentations using the langchain tool?', 'What are the capabilities for applying Multimodal RAG techniques to slideshow content such as PowerPoint decks through utilization of the langchain framework?  ', 'Can the langchain system facilitate Multimodal Relation-Aware Generation on slide deck files to analyze and generate content from the visual and textual elements?']


 <answer>Yes, LangChain supports multimodal RAG on slide decks using two main approaches: multi-modal embeddings and multi-vector retrieval. Multi-modal embeddings extract slides as images and use embeddings to retrieve relevant slides based on a user question, while multi-vector retrieval summarizes each slide image and embeds the summaries to retrieve relevant slides. A public benchmark was created to evaluate these approaches on a Datadog earnings presentation, finding multi-vector retrieval performed best.</answer>

INFO:langchain.retrievers.multi_query:Generated queries: ['What is the distinction between langchain core and community?', 'What are the key differences between langchain core versus community? ', 'How does langchain core diverge from or relate to the langchain community version?']


 <answer>
Langchain-core contains simple, core abstractions that have emerged as a standard, as well as LangChain Expression Language as a way to compose these components together. Langchain-community contains all third party integrations.
</answer>

INFO:langchain.retrievers.multi_query:Generated queries: ['What metrics exist for measuring extraction performance?', 'What standards are used to evaluate systems that extract information?  ', 'How can one assess the effectiveness of tools designed for extracting data?']


 <answer>Yes, the new dataset released offers a practical environment to test common challenges in LLM application development like classifying unstructured text, generating machine-readable information, and reasoning over multiple tasks with distracting information.</answer>

INFO:langchain.retrievers.multi_query:Generated queries: ['What are some methods for evaluating the performance of RAG models on tabular data structures? Please provide responses in a bulleted list format.', 'How can RAG models be assessed when processing and understanding table-formatted information? Represent the answers as a bulleted listing of assessment techniques.  ', 'What testing or measurement procedures can be used to gauge how well RAG handles and comprehends table data? List the approaches without paragraphs in a bulleted format.']


 <answer>
- Text-to-SQL: Translating natural language into SQL requests to query tables. This enables evaluation on structured data while preserving data privacy.
- Mixed type (structured and unstructured) data storage: Including an embedded document column using pgvector extension for PostgreSQL allows interacting with semi-structured data using natural language.
</answer>