# LOTR (Lord of the Retrievers)

Lord of the Retrievers (LOTR), also known as `MergerRetriever`, takes a **list of retrievers** as input and merges the results of their `get_relevant_documents()` methods into a single list. The merged list contains the documents that each retriever judged relevant to the query, in the order each retriever ranked them.

The **MergerRetriever** class can improve retrieval accuracy in two ways:

* First, it combines the results of multiple retrievers (for example, retrievers backed by different embedding models), which reduces the risk of relying on the biases or blind spots of a single one.
* Second, it preserves each retriever's ranking when merging, so the top-ranked documents from every retriever appear near the start of the merged list.
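`MergerRetriever` does not re-score documents. As far as we can tell from LangChain's source, it calls each retriever in turn and then interleaves the ranked lists round-robin: the first result of every retriever, then the second result of every retriever, and so on. The plain-Python sketch below reproduces that interleaving on lists of strings; `merge_round_robin` is a hypothetical helper, not part of LangChain.

```python
from typing import List, TypeVar

T = TypeVar("T")


def merge_round_robin(ranked_lists: List[List[T]]) -> List[T]:
    """Interleave several ranked result lists, rank by rank.

    Hypothetical helper illustrating the merge strategy. Duplicates are kept,
    just as a merged retriever keeps a chunk returned by two retrievers.
    """
    merged: List[T] = []
    longest = max((len(lst) for lst in ranked_lists), default=0)
    for rank in range(longest):
        for lst in ranked_lists:
            # Shorter lists simply stop contributing once exhausted.
            if rank < len(lst):
                merged.append(lst[rank])
    return merged


# Toy results from two retrievers, invented for illustration.
retriever_a = ["tulsi", "ginger", "clove"]
retriever_b = ["tulsi", "honey"]
print(merge_round_robin([retriever_a, retriever_b]))
# ['tulsi', 'tulsi', 'ginger', 'honey', 'clove']
```

The duplicated `'tulsi'` entry is the same effect visible in the merged output further down, and it is why a redundancy filter is added afterwards.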

## Installing necessary libraries

In [4]:
!pip install -U langchain langchain-openai huggingface_hub
!pip install -q lancedb pypdf sentence-transformers openai tiktoken

Collecting langchain
  Downloading langchain-0.1.3-py3-none-any.whl (803 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.3-py3-none-any.whl (28 kB)
Collecting langchain-community<0.1,>=0.0.14 (from langchain)
  Downloading langchain_community-0.0.15-py3-none-any.whl (1.6 MB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)
  Downloading marshmallow-3.20.2-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain)
  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
Collecting mypy-extens

In [5]:
import os
# Replace the placeholder with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "sk-"

## Import the libraries

In [6]:
import os

import lancedb
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.document_transformers import EmbeddingsRedundantFilter, LongContextReorder
from langchain.embeddings import (
    HuggingFaceBgeEmbeddings,
    OpenAIEmbeddings,
    SentenceTransformerEmbeddings,
)
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain.retrievers.merger_retriever import MergerRetriever
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import LanceDB
from langchain_openai import ChatOpenAI

### Download the data (you can replace it with any PDF you like)

In [7]:
!wget  http://ccras.nic.in/sites/default/files/II%20Ayurveda%20Day/Ayurvedic%20%20Home%20Remedies%20English.pdf

--2024-01-23 21:04:19--  http://ccras.nic.in/sites/default/files/II%20Ayurveda%20Day/Ayurvedic%20%20Home%20Remedies%20English.pdf
Resolving ccras.nic.in (ccras.nic.in)... 164.100.158.210
Connecting to ccras.nic.in (ccras.nic.in)|164.100.158.210|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5321388 (5.1M) [application/pdf]
Saving to: ‘Ayurvedic  Home Remedies English.pdf’


2024-01-23 21:04:29 (593 KB/s) - ‘Ayurvedic  Home Remedies English.pdf’ saved [5321388/5321388]



## Defining the Embeddings model

In [8]:
# Embedding model 1: PubMedBERT sentence embeddings (biomedical domain)
medical_health_embedding = SentenceTransformerEmbeddings(
    model_name="NeuML/pubmedbert-base-embeddings")

# Embedding model 2: BGE large (general English)
hf_bge_embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-large-en",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": False})

# OpenAI embeddings (defined here but not used by the cells below)
filter_embeddings = OpenAIEmbeddings()

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
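The BGE model above is loaded with `normalize_embeddings` set to `False`, so its vectors keep their raw lengths. That matters because the vector store ranks chunks by a distance metric. For unit-length vectors, squared Euclidean (L2) distance and cosine similarity are linked by `||a - b||^2 = 2 - 2 * cos(a, b)`, so both produce the same ranking; for unnormalized vectors they can disagree. The sketch below demonstrates this with invented 2-D vectors. It illustrates the math only, not LanceDB internals; which metric your LanceDB version uses by default (we believe L2) is worth checking.

```python
import math
from typing import List


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (the effect of normalize_embeddings=True)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; unaffected by vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 2-D vectors, invented for illustration only.
q = [1.0, 0.0]
a = [10.0, 0.5]  # points almost the same way as q, but is long
b = [0.5, 0.5]   # short vector at 45 degrees to q

print(cosine(q, a) > cosine(q, b))              # True: cosine prefers a
print(squared_l2(q, a) < squared_l2(q, b))      # False: raw L2 prefers b
nq, na, nb = normalize(q), normalize(a), normalize(b)
print(squared_l2(nq, na) < squared_l2(nq, nb))  # True: after normalizing, L2 agrees with cosine
```

For BGE models, setting `normalize_embeddings` to `True` is the usual choice when you want distance-based search to behave like cosine similarity.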




## Load the data

In [9]:
# Load a single PDF
loader = PyPDFLoader("/content/Ayurvedic  Home Remedies English.pdf")
pages = loader.load_and_split()

## Define the text splitter

In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
text = text_splitter.split_documents(pages)
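With `chunk_size=1000` and `chunk_overlap=200`, consecutive chunks share up to 200 characters, so a sentence cut at a chunk boundary still appears whole in one of the neighbouring chunks. `RecursiveCharacterTextSplitter` is smarter than a fixed window (it first tries to split on paragraph breaks, then newlines, then spaces), but the simplified sliding-window sketch below shows how the two parameters interact. `sliding_window_chunks` is a hypothetical helper, not the LangChain algorithm.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Fixed-size character windows; each starts chunk_size - chunk_overlap after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
print(sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4))
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk starts with the last 4 characters of the previous one. The LangChain splitter measures size with `len` by default, i.e. in characters, which is why the sketch works in characters too.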

### First Embedding

In [11]:
# Vector store 1: PubMedBERT embeddings
db = lancedb.connect('/tmp/lancedb')
# Seed the table with one placeholder row so LanceDB can infer the schema
table = db.create_table("health embedding", data=[
    {"vector": medical_health_embedding.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")

# Embed the PDF chunks and add them to the table
db_all = LanceDB.from_documents(text, medical_health_embedding, connection=table)

### Second Embedding

In [12]:
# Vector store 2: BGE embeddings
db_multi = lancedb.connect('/tmp/lancedb')
# Seed the table with one placeholder row so LanceDB can infer the schema
table = db_multi.create_table("bge embedding", data=[
    {"vector": hf_bge_embeddings.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")

# Embed the PDF chunks and add them to the table
db_multiqa = LanceDB.from_documents(text, hf_bge_embeddings, connection=table)

### Merge all the retrievers
`MergerRetriever` combines the outputs of both retrievers and can be used like any other retriever in different types of chains.


In [14]:
# med retriever
retriever_med = db_all.as_retriever(search_type="similarity",
                                  search_kwargs={"k": 5, "include_metadata": True})

# bge embeddings
retriever_bge = db_multiqa.as_retriever(search_type="similarity",
                                        search_kwargs={"k": 5, "include_metadata": True})

In [15]:
# Merge the two retrievers into one
lotr = MergerRetriever(retrievers=[retriever_med, retriever_bge])

for doc in lotr.get_relevant_documents("What is use of tulsi ?"):
    print(doc.page_content)

TULSI
(Ocimum sanctum,  Tulasi)
Cough/ Cold 
Fever
Skin allergy
Indigestion/ 
Loss of appetite
Greying of hair 
W ound/ulcer 
Ear pain5-10 ml. juice twice or thrice a day 
with honey.
30 ml. decoction from handful of 
leaves & 5 gm. Dhania  thrice a 
day.
5-10 ml. juice twice or thrice daily. 
5-10 ml. juice twice or thrice daily.
Coconut oil processed with Tulsi  
juice for regular use.
Juice mixed with honey & Haldi  
powder for application.
2-3 luke warm drops 2 times daily 
(D o n o t use w hen th ere is 
discharge).
25
TULSI
(Ocimum sanctum,  Tulasi)
Cough/ Cold 
Fever
Skin allergy
Indigestion/ 
Loss of appetite
Greying of hair 
W ound/ulcer 
Ear pain5-10 ml. juice twice or thrice a day 
with honey.
30 ml. decoction from handful of 
leaves & 5 gm. Dhania  thrice a 
day.
5-10 ml. juice twice or thrice daily. 
5-10 ml. juice twice or thrice daily.
Coconut oil processed with Tulsi  
juice for regular use.
Juice mixed with honey & Haldi  
powder for application.
2-3 luke warm drops 2 

### Remove **redundant results** from the merged retrievers
`EmbeddingsRedundantFilter` drops redundant documents by comparing their embeddings, so a chunk returned by both retrievers is kept only once.
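Conceptually, the filter embeds every document, computes pairwise cosine similarity, and drops one document from each pair whose similarity exceeds a threshold (`similarity_threshold`, 0.95 by default in LangChain). The greedy keep-first sketch below uses toy 2-D vectors and a hypothetical `drop_redundant` helper; LangChain's own implementation may choose a different member of each pair to drop.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def drop_redundant(docs: List[str], vectors: List[Sequence[float]],
                   threshold: float = 0.95) -> List[str]:
    """Keep a document only if it is not too similar to any document already kept."""
    kept_docs: List[str] = []
    kept_vecs: List[Sequence[float]] = []
    for doc, vec in zip(docs, vectors):
        if all(cosine(vec, other) <= threshold for other in kept_vecs):
            kept_docs.append(doc)
            kept_vecs.append(vec)
    return kept_docs


# Toy data: the first two "documents" are the same TULSI chunk from two retrievers.
docs = ["TULSI (retriever 1)", "TULSI (retriever 2)", "GINGER"]
vectors = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
print(drop_redundant(docs, vectors))
# ['TULSI (retriever 1)', 'GINGER']
```

The same effect explains the cell below: two `k=5` retrievers can return up to 10 chunks, and the duplicate TULSI chunk is filtered out.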

In [16]:
redundant_filter = EmbeddingsRedundantFilter(embeddings=hf_bge_embeddings)
reordering = LongContextReorder()

# Apply the transformers in order: drop near-duplicates, then reorder for long contexts
pipeline = DocumentCompressorPipeline(transformers=[redundant_filter, reordering])
compression_retriever_reordered = ContextualCompressionRetriever(
    base_compressor=pipeline, base_retriever=lotr
)

docs = compression_retriever_reordered.get_relevant_documents("What is use of tulsi ?")
print(len(docs))

print(docs[0].page_content)

9
TULSI
(Ocimum sanctum,  Tulasi)
Cough/ Cold 
Fever
Skin allergy
Indigestion/ 
Loss of appetite
Greying of hair 
W ound/ulcer 
Ear pain5-10 ml. juice twice or thrice a day 
with honey.
30 ml. decoction from handful of 
leaves & 5 gm. Dhania  thrice a 
day.
5-10 ml. juice twice or thrice daily. 
5-10 ml. juice twice or thrice daily.
Coconut oil processed with Tulsi  
juice for regular use.
Juice mixed with honey & Haldi  
powder for application.
2-3 luke warm drops 2 times daily 
(D o n o t use w hen th ere is 
discharge).
25
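The pipeline above ends with `LongContextReorder`. Language models tend to make less use of material buried in the middle of a long context (the "lost in the middle" effect), so this transformer takes documents ordered from most to least relevant and places the most relevant ones at the beginning and end, with the least relevant in the middle. The sketch below reproduces that ordering on integers standing for relevance ranks (1 = most relevant); `reorder_for_long_context` is a hypothetical name and, to our understanding, mirrors the strategy LangChain uses.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reorder_for_long_context(docs_by_relevance: List[T]) -> List[T]:
    """Put the most relevant items at both ends and the least relevant in the middle.

    Expects docs_by_relevance sorted from most to least relevant.
    """
    reordered: List[T] = []
    # Walk from least to most relevant, alternating between front and back,
    # so the most relevant items are placed last and end up at the edges.
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


print(reorder_for_long_context([1, 2, 3, 4, 5]))
# [1, 3, 5, 4, 2]
```

The reordering assumes its input is already sorted by relevance; after a round-robin merge and filtering, that ordering holds only approximately.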


### Chat with the data

In [19]:
llm = ChatOpenAI(openai_api_key=os.environ["OPENAI_API_KEY"])

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever_reordered,
    return_source_documents=True,
)
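With `chain_type="stuff"`, `RetrievalQA` "stuffs" every retrieved document into a single prompt and makes one LLM call. This keeps things simple, but all retrieved chunks must fit in the model's context window. The sketch below shows the idea with a hypothetical template and a hypothetical `stuff_prompt` helper; LangChain's actual default prompt wording differs and is not reproduced here.

```python
from typing import List

# Hypothetical prompt template, for illustration only.
TEMPLATE = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def stuff_prompt(question: str, chunks: List[str], separator: str = "\n\n") -> str:
    """Concatenate every retrieved chunk into one context block and fill the template."""
    context = separator.join(chunks)
    return TEMPLATE.format(context=context, question=question)


# Toy chunks, invented for illustration.
prompt = stuff_prompt(
    "What is use of tulsi?",
    ["TULSI: cough, cold, fever", "GINGER: fever, cough"],
)
print(prompt)
```

Other chain types (for example map-reduce) trade this single call for several smaller ones when the retrieved text is too long to stuff.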

In [20]:
query = "What is use of tulsi?"
results = qa.invoke({"query": query})
print(results['result'])

print(results["source_documents"])

Tulsi, also known as Holy Basil, has several uses. It can be used for cough and cold, fever, skin allergies, indigestion, loss of appetite, greying of hair, wounds and ulcers, and ear pain. It can be consumed as a juice or decoction, or applied topically in the form of a paste or mixed with other ingredients.
[_DocumentWithState(page_content='TULSI\n(Ocimum sanctum,  Tulasi)\nCough/ Cold \nFever\nSkin allergy\nIndigestion/ \nLoss of appetite\nGreying of hair \nW ound/ulcer \nEar pain5-10 ml. juice twice or thrice a day \nwith honey.\n30 ml. decoction from handful of \nleaves & 5 gm. Dhania  thrice a \nday.\n5-10 ml. juice twice or thrice daily. \n5-10 ml. juice twice or thrice daily.\nCoconut oil processed with Tulsi  \njuice for regular use.\nJuice mixed with honey & Haldi  \npowder for application.\n2-3 luke warm drops 2 times daily \n(D o n o t use w hen th ere is \ndischarge).\n25', metadata={'vector': array([-0.01830427,  0.01806732, -0.00766297, ..., -0.01976463,
       -0.006608

In [21]:
query = "having high fiver & cough what should i take as home remadies?"
results = qa.invoke({"query": query})
print(results['result'])

# print(results["source_documents"])

For high fever and cough, you can try the following home remedies:

1. Ginger (Adrak): Take 10-20 ml of ginger decoction prepared from 2 gm. of dry rhizome twice a day. You can also mix 2-5 gm. of ginger powder with jaggery and take it three times a day in divided doses.

2. Honey (Madhu): Mix one teaspoon of honey with a glass of water and drink it in the morning. You can also mix honey with a pinch of pepper powder and take it three to four times a day.

3. Clove (Laung): Chew 1-2 cloves frequently throughout the day or take 1 gm. of clove powder with honey two to three times in divided doses. You can also prepare a warm decoction by putting 1 gm. of clove in 20 ml of water and consume it three to four times daily.

Remember, if your symptoms persist or worsen, it is important to consult a doctor.
