# Solutions for Missed Top Ranked, Not in Context, Not Extracted & Incorrect Specificity



Here we will explore the following strategies:

- Effect of Embedder Models
- Advanced Retrieval Strategies
- Chained Retrieval with Rerankers
- Context Compression Strategies

#### Install OpenAI, HuggingFace and LangChain dependencies

In [None]:
!pip install langchain
!pip install langchain-openai
!pip install langchain-community
!pip install langchain-huggingface
!pip install langchain-chroma
!pip install rank_bm25

### Get Wikipedia Data

In [None]:
!gdown 1oWBnoxBZ1Mpeond8XDUSO6J9oAjcRDyW

Downloading...
From (original): https://drive.google.com/uc?id=1oWBnoxBZ1Mpeond8XDUSO6J9oAjcRDyW
From (redirected): https://drive.google.com/uc?id=1oWBnoxBZ1Mpeond8XDUSO6J9oAjcRDyW&confirm=t&uuid=99c322c4-544a-4a41-83e2-103863d44341
To: /content/simplewiki-2020-11-01.jsonl.gz
100% 50.2M/50.2M [00:00<00:00, 57.3MB/s]


In [2]:
import gzip
import json
from langchain.docstore.document import Document

wikipedia_filepath = 'simplewiki-2020-11-01.jsonl.gz'

docs = []
with gzip.open(wikipedia_filepath, 'rt', encoding='utf8') as fIn:
    for line in fIn:
        data = json.loads(line.strip())
        # Keep only the first paragraph of each article so later cells run faster
        docs.append({
            'metadata': {
                'title': data.get('title'),
                'article_id': data.get('id')
            },
            'data': data.get('paragraphs')[0]
        })

In [3]:
# keep only the articles whose first paragraph contains the word 'india'
docs = [doc for doc in docs
              if 'india' in doc['data'].lower().split()]
docs = [Document(page_content=doc['data'],
                 metadata=doc['metadata']) for doc in docs]
len(docs)

767

In [4]:
docs[:3]

[Document(metadata={'title': 'Basil', 'article_id': '73985'}, page_content='Basil ("Ocimum basilicum") ( or ) is a plant of the Family Lamiaceae. It is also known as Sweet Basil or Tulsi. It is a tender low-growing herb that is grown as a perennial in warm, tropical climates. Basil is originally native to India and other tropical regions of Asia. It has been cultivated there for more than 5,000 years. It is prominently featured in many cuisines throughout the world. Some of them are Italian, Thai, Vietnamese and Laotian cuisines. It grows to between 30–60\xa0cm tall. It has light green, silky leaves 3–5\xa0cm long and 1–3\xa0cm broad. The leaves are opposite each other. The flowers are quite big. They are white in color and arranged as a spike.'),
 Document(metadata={'title': 'Roerich’s Pact', 'article_id': '259745'}, page_content='The Roerich Pact is a treaty on Protection of Artistic and Scientific Institutions and Historic Monuments, signed by the representatives of 21 states in the

# Exploring Embedding Models

In [6]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-small-en"
model_kwargs = {"device": "cpu"} # For faster performance, set to "cuda" if you have a GPU
encode_kwargs = {"normalize_embeddings": True}
hf_bge_embedding = HuggingFaceBgeEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

In [7]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
chroma_db1 = Chroma.from_documents(documents=docs,
                                  collection_name='wikipedia_db1',
                                  embedding=hf_bge_embedding,
                                  collection_metadata={"hnsw:space": "cosine"})


In [8]:
retriever1 = chroma_db1.as_retriever(search_type="similarity",
                                     search_kwargs={"k": 2})
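The retriever above returns the `k` documents whose embeddings are closest to the query embedding under cosine similarity. A minimal sketch of that ranking step, using hypothetical 3-dimensional toy vectors in place of real model embeddings (the names `cosine_similarity`, `top_k`, and `toy_vectors` are illustrative and not part of LangChain or Chroma):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical embeddings; a real embedding model produces hundreds of dimensions
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.8, 0.6, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.1, 0.0], toy_vectors, k=2)
print(nearest)
```

A vector store does the same thing at scale with an approximate nearest-neighbour index rather than scoring every document.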

In [9]:
query = "what is the capital of India?"
top_docs = retriever1.invoke(query)
top_docs

[Document(metadata={'article_id': '4062', 'title': 'Kolkata'}, page_content="Kolkata (spelled Calcutta before 1 January 2001) is the capital city of the Indian state of West Bengal. It is the second largest city in India after Mumbai. It is on the east bank of the River Hooghly. When it is called Calcutta, it includes the suburbs. This makes it the third largest city of India. This also makes it the world's 8th largest metropolitan area as defined by the United Nations. Kolkata served as the capital of India during the British Raj until 1911. Kolkata was once the center of industry and education. However, it has witnessed political violence and economic problems since 1954. Since 2000, Kolkata has grown due to economic growth. Like other metropolitan cities in India, Kolkata struggles with poverty, pollution and traffic congestion."),
 Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity

In [10]:
query = "what is the old capital of India?"
top_docs = retriever1.invoke(query)
top_docs

[Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'),
 Document(metadata={'article_id': '5113', 'title': 'Chennai'}, page_content='Chennai (formerly known as Madras) is the capital city of the Indian state of Tamil Nadu. It has a population of about 7 million people. Almost 10% of all of the people in the state live in Chennai. The city is the fourth largest city of India. It was founded in 1661 by the British East India Company. The city is on the Coromandel Coast of the Bay of Bengal.')]

# Exploring Advanced Retrieval & Reranking Strategies

### Multi Query Retrieval

Retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.

The [`MultiQueryRetriever`](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents.
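The "unique union" step described above can be sketched in plain Python. This is only an illustration of the idea, not LangChain's implementation: `fake_index` is a hypothetical stand-in for a retriever, and the query variants are made up for the example.

```python
def unique_union(result_lists):
    """Merge several ranked result lists, keeping the first occurrence of each document."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def multi_query_retrieve(queries, retrieve):
    """Run `retrieve` once per query and return the de-duplicated union of the results."""
    return unique_union([retrieve(q) for q in queries])

# Hypothetical retriever: maps each query variant to the titles it would return
fake_index = {
    "what is the capital of India?": ["New Delhi", "Kolkata"],
    "Which city is the seat of the Indian government?": ["New Delhi", "Delhi"],
    "Which city is India's capital?": ["Kolkata", "New Delhi"],
}
merged_titles = multi_query_retrieve(list(fake_index), lambda q: fake_index[q])
print(merged_titles)
```

Because every generated line is sent to the retriever as a query, the quality of the union depends on the LLM producing clean query text.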

In [11]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="llama3:8b-instruct-fp16", temperature=0, base_url="http://172.31.0.1:11434/v1", api_key="EMPTY")

In [12]:
from langchain.retrievers.multi_query import MultiQueryRetriever
# Set logging for the queries
import logging

similarity_retriever3 = chroma_db1.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 2})

mq_retriever = MultiQueryRetriever.from_llm(
    retriever=similarity_retriever3, llm=llm,
    include_original=True
)

logging.basicConfig()
# so we can see what queries are generated by the LLM
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [13]:
query = "what is the capital of India?"
docs = mq_retriever.invoke(query)
docs

INFO:langchain.retrievers.multi_query:Generated queries: ['Here are three different versions of the original question:', '', 'What cities serve as the administrative centers of the Government of India?', '', 'Which city is officially recognized as the capital of the Republic of India?', '', 'Can you provide information about the metropolitan area that houses the seat of government in India?', '', 'These alternative questions aim to capture different aspects and nuances of the original query, which can help retrieve relevant documents from a vector database.']


[Document(metadata={'article_id': '731954', 'title': 'Adivasi'}, page_content="The Adivasi people of India look very different from the Dravidians and Aryans. Adivasi people face discrimination and it's hard for them to get jobs in the city areas. These groups have survived in the forest and mountain regions in India for many years. They were the first to inhabit India."),
 Document(metadata={'article_id': '78279', 'title': 'Poverty in India'}, page_content='Poverty in India is an important issue. India has some of the poorest people in the world.'),
 Document(metadata={'article_id': '29943', 'title': 'Ayurveda'}, page_content='Ayurveda is a traditional system of medicine and medication, based on experience and observation. This system of medicine and medication is more than 3000 years old. According to mythological story, Dhanvanteri was the first physician to use Ayurveda. In modern India also, Ayurveda is being used. Indian Government give equal importance ayurveda as other pathies.

In [14]:
query = "what is the old capital of India?"
docs = mq_retriever.invoke(query)
docs

INFO:langchain.retrievers.multi_query:Generated queries: ['Here are three different versions of the original question:', '', 'What was the capital of India before New Delhi?', '', 'Which city served as the capital of British India prior to independence?', '', 'Can you provide information about the historical capital of undivided India, i.e., pre-partition India?', '', 'These alternative questions aim to capture different aspects and nuances of the original query, which can help retrieve relevant documents from a vector database that may not have been retrieved by the original question alone.']


[Document(metadata={'article_id': '731954', 'title': 'Adivasi'}, page_content="The Adivasi people of India look very different from the Dravidians and Aryans. Adivasi people face discrimination and it's hard for them to get jobs in the city areas. These groups have survived in the forest and mountain regions in India for many years. They were the first to inhabit India."),
 Document(metadata={'article_id': '78279', 'title': 'Poverty in India'}, page_content='Poverty in India is an important issue. India has some of the poorest people in the world.'),
 Document(metadata={'article_id': '29943', 'title': 'Ayurveda'}, page_content='Ayurveda is a traditional system of medicine and medication, based on experience and observation. This system of medicine and medication is more than 3000 years old. According to mythological story, Dhanvanteri was the first physician to use Ayurveda. In modern India also, Ayurveda is being used. Indian Government give equal importance ayurveda as other pathies.

### Hybrid Search (BM25 + Semantic)

In [15]:
laptops = [
    "The laptop model XPS13-9380 by Dell comes with an Intel Core i7-8565U processor, 16GB RAM, and 512GB SSD. It also features a 13.3-inch FHD display.",
    "Apple's MacBook Pro 16-inch model MVVJ2LL/A includes a 9th-generation Intel Core i9 processor, 16GB of RAM, and a 1TB SSD. It has a stunning Retina display with True Tone technology.",
    "The HP Spectre x360 15T-eb000 has an Intel Core i7-10750H processor, 16GB RAM, and a 512GB SSD. This model, 7DF22AV_1, also features a 4K UHD touch display.",
    "Lenovo's ThinkPad X1 Carbon Gen 8, part number 20U9005MUS, is equipped with an Intel Core i7-10510U, 16GB RAM, and a 1TB SSD. It is known for its lightweight design and durability.",
    "The ASUS ZenBook 14 UX434FL-DB77 features an Intel Core i7-8565U, 16GB RAM, and a 512GB SSD. This model also comes with a 14-inch FHD display and ScreenPad 2.0.",
    "Microsoft Surface Laptop 3 model VEF-00064 has an Intel Core i7-1065G7 processor, 16GB RAM, and a 256GB SSD. It is known for its sleek design and vibrant PixelSense display."
]

In [16]:
laptop_db = Chroma.from_texts(laptops, collection_name='laptop_db',
                              embedding=hf_bge_embedding,
                              collection_metadata={"hnsw:space": "cosine"})

In [17]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever

similarity_retriever = laptop_db.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 2})

bm25_retriever = BM25Retriever.from_texts(laptops)
bm25_retriever.k = 2
# reciprocal rank fusion
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, similarity_retriever],
    weights=[0.7, 0.3]
)
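The ensemble above merges the two rankings with weighted reciprocal rank fusion. A minimal sketch of that scoring rule, assuming the commonly used smoothing constant `c = 60` and hypothetical rankings (the function name `weighted_rrf` and the short document ids are illustrative only):

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each document earns weight / (rank + c) from every list it appears in."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-2 rankings from a keyword retriever and a semantic retriever
bm25_ranking = ["surface", "zenbook"]
semantic_ranking = ["surface", "xps13"]
fused = weighted_rrf([bm25_ranking, semantic_ranking], weights=[0.7, 0.3])
print(fused)
```

A document ranked highly by both retrievers accumulates score from each list, and the weights control how much each retriever contributes when the two disagree.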


In [18]:
# just cosine embedding similarity
query = "laptops with 16GB RAM and processor i7-1065G7 intel"
docs = similarity_retriever.invoke(query)
docs

[Document(page_content='Microsoft Surface Laptop 3 model VEF-00064 has an Intel Core i7-1065G7 processor, 16GB RAM, and a 256GB SSD. It is known for its sleek design and vibrant PixelSense display.'),
 Document(page_content='The laptop model XPS13-9380 by Dell comes with an Intel Core i7-8565U processor, 16GB RAM, and 512GB SSD. It also features a 13.3-inch FHD display.')]

In [19]:
# hybrid search - bm25 and cosine embedding similarity
query = "laptops with 16GB RAM and processor i7-1065G7 intel"
docs = ensemble_retriever.invoke(query)
docs

[Document(page_content='Microsoft Surface Laptop 3 model VEF-00064 has an Intel Core i7-1065G7 processor, 16GB RAM, and a 256GB SSD. It is known for its sleek design and vibrant PixelSense display.'),
 Document(page_content='The ASUS ZenBook 14 UX434FL-DB77 features an Intel Core i7-8565U, 16GB RAM, and a 512GB SSD. This model also comes with a 14-inch FHD display and ScreenPad 2.0.'),
 Document(page_content='The laptop model XPS13-9380 by Dell comes with an Intel Core i7-8565U processor, 16GB RAM, and 512GB SSD. It also features a 13.3-inch FHD display.')]

### Chained Retrieval with Reranker


This strategy chains multiple retrievers sequentially to narrow the results down to the most relevant documents. The flow is:

Similarity Retrieval → Reranker Model Retrieval
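The two-stage flow can be sketched without any model: a cheap first stage selects candidates, then a second scorer reorders them and keeps the best few. Both scoring functions here are hypothetical token-overlap stand-ins (a real pipeline would use embedding similarity and a cross-encoder), and the toy corpus is invented for illustration.

```python
def first_stage(query, corpus, k=3):
    """Cheap candidate retrieval: rank by the number of tokens shared with the query."""
    q_tokens = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda text: len(q_tokens & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def overlap_precision(query, text):
    """Stand-in for a reranker score: share of the document's unique tokens found in the query."""
    q_tokens = set(query.lower().split())
    d_tokens = set(text.lower().split())
    return len(q_tokens & d_tokens) / len(d_tokens)

def rerank(query, candidates, score_fn, top_n=2):
    """Second stage: score each (query, candidate) pair and keep the top_n."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

corpus = [
    "New Delhi is the capital of India",
    "Chennai is the capital city of the Indian state of Tamil Nadu",
    "Kolkata served as the capital of India until 1911",
    "Basil is native to India",
]
query = "capital of India"
candidates = first_stage(query, corpus, k=3)
reranked = rerank(query, candidates, overlap_precision, top_n=2)
print(reranked)
```

The first stage only has to be good enough to get the right document into the candidate set; the second stage decides the final order.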

In [25]:
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain.retrievers import ContextualCompressionRetriever

# Retriever 1 - simple cosine distance based retriever
similarity_retriever = chroma_db1.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

# download an open-source reranker model - cross-encoder/qnli-electra-base
reranker = HuggingFaceCrossEncoder(model_name="cross-encoder/qnli-electra-base")
reranker_compressor = CrossEncoderReranker(model=reranker, top_n=2)
# Retriever 2 - Uses a Reranker model to rerank retrieval results from the previous retriever
final_retriever = ContextualCompressionRetriever(
    base_compressor=reranker_compressor,
    base_retriever=similarity_retriever
)

In [26]:
import os
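# Replace "0" with the appropriate GPU ID. This variable is read when CUDA is first
# initialized, so it needs to be set before any model is loaded onto the GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"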


In [27]:
query = "what is the capital of India?"
docs = final_retriever.invoke(query)
docs

[Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.'),
 Document(metadata={'article_id': '5113', 'title': 'Chennai'}, page_content='Chennai (formerly known as Madras) is the capital city of the Indian state of Tamil Nadu. It has a population of about 7 million people. Almost 10% of all of the people in the state live in Chennai. The city is the fourth largest city of India. It was founded in 1661 by the British East India Company. The city is on the Coromandel Coast of the Bay of Bengal.')]

In [28]:
query = "what is the old capital of India?"
docs = final_retriever.invoke(query)
docs

[Document(metadata={'article_id': '21772', 'title': 'Thiruvananthapuram'}, page_content='Thiruvananthapuram () is the capital city of the Indian state of Kerala. The city used to be known by the name of Trivandrum. It is on the west coast of India near the far south of the mainland.'),
 Document(metadata={'article_id': '5113', 'title': 'Chennai'}, page_content='Chennai (formerly known as Madras) is the capital city of the Indian state of Tamil Nadu. It has a population of about 7 million people. Almost 10% of all of the people in the state live in Chennai. The city is the fourth largest city of India. It was founded in 1661 by the British East India Company. The city is on the Coromandel Coast of the Bay of Bengal.')]

# Exploring Context Compression Strategies

### LLM Prompt-based Contextual Compression Retrieval

Context compression can take two forms:

- Remove the parts of each retrieved document that are not relevant to the query, by extracting only the passages that are relevant to it.

- Filter out documents that are not relevant to the query, without removing any content from the documents that are kept.

Here we look at `LLMChainExtractor`, which iterates over the initially returned documents and extracts from each only the content that is relevant to the query. Totally irrelevant documents may also be dropped.
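The difference between the two forms of compression can be sketched with a keyword match standing in for the LLM's relevance judgment. Everything here is hypothetical and simplified: `relevant`, `extract`, and `filter_docs` are illustrative names, and sentences are split naively on full stops.

```python
def relevant(text, keyword):
    """Stand-in for the LLM's relevance judgment: a case-insensitive keyword match."""
    return keyword.lower() in text.lower()

def extract(docs, keyword):
    """Extractor-style compression: keep only relevant sentences, drop documents left empty."""
    compressed = []
    for doc in docs:
        kept = [s.strip() for s in doc.split(".") if s.strip() and relevant(s, keyword)]
        if kept:
            compressed.append(". ".join(kept) + ".")
    return compressed

def filter_docs(docs, keyword):
    """Filter-style compression: keep or drop whole documents, leaving contents untouched."""
    return [doc for doc in docs if relevant(doc, keyword)]

retrieved = [
    "New Delhi is the capital of India. It has a very old history.",
    "Basil is native to India. It is used in many cuisines.",
]
extracted = extract(retrieved, "capital")
filtered = filter_docs(retrieved, "capital")
print(extracted)
print(filtered)
```

Extraction shortens the context passed to the generator, while filtering preserves each surviving document verbatim.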

In [29]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="llama3:8b-instruct-fp16", temperature=0, base_url="http://172.31.0.1:11434/v1", api_key="EMPTY")

similarity_retriever = chroma_db1.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

# extracts from each document only the content that is relevant to the query
compressor = LLMChainExtractor.from_llm(llm=llm)

# retrieves the documents similar to query and then applies the compressor
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=similarity_retriever
)

In [30]:
query = "what is the capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi.')]

In [31]:
query = "what is the old capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi.\n\nNote: There are no other relevant parts in the context that mention the old capital of India, so this is the only extracted part.')]

The `LLMChainFilter` is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which to return, without modifying the document contents.

In [32]:
from langchain.retrievers.document_compressors import LLMChainFilter

similarity_retriever = chroma_db1.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

#  decides which of the initially retrieved documents to filter out and which ones to return
_filter = LLMChainFilter.from_llm(llm=llm)

# retrieves the documents similar to query and then applies the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=similarity_retriever
)

In [33]:
query = "what is the capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.')]

In [34]:
query = "what is the old capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(metadata={'article_id': '5117', 'title': 'New Delhi'}, page_content='New Delhi () is the capital of India and a union territory of the megacity of Delhi. It has a very old history and is home to several monuments where the city is expensive to live in. In traditional Indian geography it falls under the North Indian zone. The city has an area of about 42.7\xa0km. New Delhi has a population of about 9.4 Million people.')]