### Install the required libraries

In [None]:
# !pip install indexify-dspy 
# !pip install indexify
# !pip install indexify-extractor-sdk

In [41]:
import dspy
from indexify import IndexifyClient
from indexify_dspy.retriever import IndexifyRM

If the import cell above fails with an error from the `dspy` module, try pinning the DSPy version:

In [None]:
# !pip install dspy-ai==2.0.1

### Configure the OpenAI language model and the Indexify retriever

In [42]:
turbo = dspy.OpenAI(model="gpt-3.5-turbo", api_key="<your_openai_api>")
indexify_client = IndexifyClient()
indexify_retriever_model = IndexifyRM("index_name", indexify_client)

dspy.settings.configure(lm=turbo, rm=indexify_retriever_model)
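
Hard-coding the key in a notebook cell makes it easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead, assuming the key is stored in a variable named `OPENAI_API_KEY`; both that variable name and the helper `load_api_key` are illustrative and not part of the original notebook:

```python
import os


def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in the environment variable `var_name`.

    Raises RuntimeError when the variable is missing or empty, so the
    problem surfaces before any model call is made.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The result could then be passed as `api_key=load_api_key()` to `dspy.OpenAI` in place of the placeholder string.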

### Download the dataset from Hugging Face

https://huggingface.co/datasets/gamino/wiki_medical_terms

In [43]:
import pandas as pd

df = pd.read_parquet("hf://datasets/gamino/wiki_medical_terms/wiki_medical_terms.parquet")

In [44]:
df = df.dropna()
print(df)

Unnamed: 0,page_title,page_text
0,Paracetamol poisoning,"Paracetamol poisoning, also known as acetamino..."
1,Acromegaly,Acromegaly is a disorder that results from exc...
2,Actinic keratosis,"Actinic keratosis (AK), sometimes called solar..."
3,Congenital adrenal hyperplasia,Congenital adrenal hyperplasia (CAH) is a grou...
4,Adrenocortical carcinoma,Adrenocortical carcinoma (ACC) is an aggressi...
...,...,...
7271,Gephyrophobia,Gephyrophobia is the anxiety disorder or speci...
7272,Coronary artery bypass surgery,"Coronary artery bypass surgery, also known as ..."
7273,Unemployment,"Unemployment, according to the OECD (Organisat..."
7274,Surgical instrument,A surgical instrument is a tool or device for ...
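
For readers unfamiliar with `dropna`, here is a small sketch on a made-up frame (the values are invented and only stand in for the real dataset). It shows that rows with a missing value are removed and that the surviving rows keep their original index labels, so labels and list positions can differ after cleaning:

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
toy = pd.DataFrame(
    {
        "page_title": ["A", "B", "C"],
        "page_text": ["text a", None, "text c"],
    }
)

cleaned = toy.dropna()  # drops row "B", whose page_text is missing

print(len(toy), len(cleaned))        # 3 2
print(cleaned.index.tolist())        # [0, 2]
print(cleaned["page_text"].tolist()) # ['text a', 'text c']
```

Because `tolist()` produces a plain Python list, an expression such as `medical_descriptions[893]` refers to the position in the cleaned list, not to the DataFrame index label.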


In [55]:
medical_descriptions = df['page_text'].tolist()

In [115]:
print(medical_descriptions[893])

Primary progressive aphasia (PPA) is a type of neurological syndrome in which language capabilities slowly and progressively become impaired. As with other types of aphasia, the symptoms that accompany PPA depend on what parts of the left hemisphere are significantly damaged. However, unlike most other aphasias, PPA results from continuous deterioration in brain tissue, which leads to early symptoms being far less detrimental than later symptoms. Those with PPA slowly lose the ability to speak, write, read, and generally comprehend language. Eventually, almost every patient becomes mute and completely loses the ability to understand both written and spoken language. Although it was first described as solely impairment of language capabilities while other mental functions remain intact, it is now recognized that many, if not most of those with PPA experience impairment of memory, short-term memory formation and loss of executive functions. It was first described as a distinct syndrome b

### Initialize Indexify server and ingest data

##### Terminal 1: download and start the server

curl https://getindexify.ai | sh

./indexify server -d

##### Terminal 2: download the extractors and join them to the server

indexify-extractor download tensorlake/minilm-l6

indexify-extractor download tensorlake/chunk-extractor 

indexify-extractor join-server

In [47]:
from indexify import IndexifyClient, ExtractionGraph

indexify_client = IndexifyClient()
extraction_graph_spec = """
name: 'medical'
extraction_policies:
  - extractor: 'tensorlake/minilm-l6'
    name: 'minilml6'
"""
extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
indexify_client.create_extraction_graph(extraction_graph)  

indexify_client.add_documents(
    "medical",
    medical_descriptions,
)

['36e4b32b6ae41b8e',
 'a3bbaecd55aeb14f',
 '9864dfecaf3279c9',
 '3d980ee121e87b86',
 '711a882c219bb053',
 '8ea59f30efc2a0cc',
 'baddf57bdc55b53c',
 'feaaee0116746d87',
 '504ccf98316c31f7',
 '9abccdd80275decb',
 '705c29d7c21b391d',
 'b046caee63bd36ae',
 'c2f84f532070d25f',
 'f93328c610d9786',
 '2d5f298208938ecd',
 '40e097dd070e98dc',
 'df58f42a8411fe3a',
 '22c53bd66e102cb',
 '693a95fa975e12ec',
 '6b4a94cad11301f8',
 'f6f2fd85ba2f32c2',
 'cd0ff38b1902526f',
 '86ef5b9c0c1f6c75',
 'c66cf2a57859d87a',
 'f932237bc78ea009',
 'a31b4ceca879cf09',
 '7f84c0b68f3b660b',
 'df3f240d062c4ece',
 '88c2452b96d6fb95',
 'b154d13cc04f3138',
 'bf9fe9670a14525f',
 'd04844c85984877a',
 'b964b44d03a5ba1e',
 '632406e9e7d04e47',
 'ed5a8184ba03b857',
 '3ac63b888b181a75',
 'a1026205572432e0',
 'f87f59ebc2adddb1',
 '51443c2c8a5190df',
 '84491ee4cf821419',
 '46d6df02951b7acd',
 'fe70e5ad890db370',
 '99f6afd0068df545',
 '732094df03417cc5',
 'ccb427be661ae3e2',
 '95f6333728c07ce9',
 '6f36d5dbb0948657',
 'e5295fb77667b

### Define a context-retrieval function using the Indexify client

In [48]:
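def generate_context(query, k):
    """Return the top-k passages for `query` from the medical embedding index."""
    retrieve = IndexifyRM(indexify_client)
    # Index queried here: graph 'medical', policy 'minilml6', output 'embedding'.
    topk_passages = retrieve(query, "medical.minilml6.embedding", k=k).passages
    return topk_passages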
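
The index name used above appears to combine the extraction graph name (`medical`), the policy name (`minilml6`), and the output name (`embedding`). The sketch below builds such a name from its parts. The dotted pattern is inferred from this notebook only, not from Indexify's documentation, and `embedding_index_name` is a hypothetical helper:

```python
def embedding_index_name(graph_name, policy_name, output_name="embedding"):
    """Join the three parts with dots, e.g. 'medical.minilml6.embedding'.

    Assumption: the '<graph>.<policy>.<output>' pattern is inferred from
    the notebook, not taken from an official naming rule.
    """
    parts = [graph_name, policy_name, output_name]
    if any(not p or "." in p for p in parts):
        raise ValueError("each part must be non-empty and contain no dots")
    return ".".join(parts)


print(embedding_index_name("medical", "minilml6"))  # medical.minilml6.embedding
```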

In [57]:
query = "heart attack"
generate_context(query=query, k=2)

['Carditis (pl. carditides) is the inflammation of the heart.It is usually studied and treated by specifying it as:\nPericarditis is the inflammation of the pericardium\nMyocarditis is the inflammation of the heart muscle\nEndocarditis is the inflammation of the endocardium\nPancarditis, also called perimyoendocarditis, is the inflammation of the entire heart: the pericardium, the myocardium and the endocardium\nReflux carditis refers to a possible outcome of esophageal reflux (also known as GERD), and involves inflammation of the esophagus/stomach mucosa\n\n\n== References ==',
 'Coronary artery disease (CAD), also called coronary heart disease (CHD), ischemic heart disease (IHD), myocardial ischemia, or simply heart disease, involves the reduction of blood flow to the heart muscle due to build-up of atherosclerotic plaque in the arteries of the heart. It is the most common of the cardiovascular diseases. Types include stable angina, unstable angina, myocardial infarction, and sudden 

### DSPy Signatures and Module Definition for Multi-Hop CoT RAG

In [95]:
class RAGSignature(dspy.Signature):
    """Answer questions based on given the context."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="an answer not more than 2 lines")

In [96]:
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

In [97]:
# class ChainOfThoughtRAG(dspy.Module):
#     def __init__(self, context_length):
#         super().__init__()
#         self.retrieve = dspy.Retrieve(k=context_length)
#         self.generate_answer = dspy.ChainOfThought(RAGSignature)
#         self.k = context_length

#     def forward(self, question):
#         context = generate_context(question, k=self.k)
#         prediction = self.generate_answer(context=context, question=question)
#         return dspy.Prediction(context=context, answer=prediction.answer)

In [98]:
from dsp.utils import deduplicate

class MultiHopChainOfThoughtRAG(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        # One query generator per hop, so each hop has its own prompt state.
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        # Defined for completeness; forward() retrieves via generate_context() instead.
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(RAGSignature)
        self.max_hops = max_hops
        self.k = passages_per_hop

    def forward(self, question):
        context = []

        for hop in range(self.max_hops):
            # Write a search query from the question and the context gathered so far.
            query = self.generate_query[hop](context=context, question=question).query
            # Fetch passages from Indexify and merge them in, dropping repeats.
            passages = generate_context(query, k=self.k)
            context = deduplicate(context + passages)

        # Answer the original question using all accumulated passages.
        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
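
To make the control flow of the multi-hop loop easier to follow, here is an offline sketch that replaces the language model and the retriever with stand-ins. Everything in it is illustrative: `multi_hop_context`, `fake_make_query`, `fake_retrieve`, and `CORPUS` are hypothetical names, and the local `deduplicate` is a plain order-preserving re-implementation written for this sketch rather than the function imported from `dsp.utils`:

```python
def deduplicate(seq):
    """Drop repeated items while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def multi_hop_context(question, make_query, retrieve, max_hops=2, k=2):
    """Collect context over several hops; returns (context, queries)."""
    context, queries = [], []
    for _ in range(max_hops):
        query = make_query(context, question)  # stand-in for the LM call
        queries.append(query)
        context = deduplicate(context + retrieve(query, k))
    return context, queries


# Hypothetical stand-ins so the sketch runs without a server or API key.
CORPUS = {
    "paracetamol": ["passage A", "passage B"],
    "paracetamol follow-up": ["passage B", "passage C"],
}


def fake_make_query(context, question):
    # First hop: no context yet. Later hops: refine using what was found.
    return "paracetamol" if not context else "paracetamol follow-up"


def fake_retrieve(query, k):
    return CORPUS.get(query, [])[:k]


context, queries = multi_hop_context("toy question", fake_make_query, fake_retrieve)
print(queries)  # ['paracetamol', 'paracetamol follow-up']
print(context)  # ['passage A', 'passage B', 'passage C']
```

The second hop retrieves "passage B" again, and deduplication keeps only its first occurrence, which is why the final context has three passages rather than four.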

### Results

In [107]:
multi_hop_rag = MultiHopChainOfThoughtRAG(passages_per_hop=2, max_hops=2)

In [111]:
query = "does overdosing on paracetamol cures kidney failure? and what if i take 3 grams at once am i overdosing?"
response = multi_hop_rag(query).answer
print(response)

Overdosing on paracetamol does not cure kidney failure, and taking 3 grams at once is not considered an overdose for a healthy adult.


In [112]:
turbo.inspect_history(1)





Answer questions based on given the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: an answer not more than 2 lines

---

Context:
«Paracetamol poisoning, also known as acetaminophen poisoning, is caused by excessive use of the medication paracetamol (acetaminophen). Most people have few or non-specific symptoms in the first 24 hours following overdose. These include feeling tired, abdominal pain, or nausea. This is typically followed by a couple of days without any symptoms, after which yellowish skin, blood clotting problems, and confusion occurs as a result of liver failure. Additional complications may include kidney failure, pancreatitis, low blood sugar, and lactic acidosis. If death does not occur, people tend to recover fully over a couple of weeks. Without treatment, death from toxicity occurs 4 to 18 days later.Paracetamol poisoni

In [118]:
query = "what is Primary progressive aphasia and does it cause heart attacks? if not what causes them?"
response = multi_hop_rag(query).answer
print(response)

Primary progressive aphasia is a type of neurological syndrome that impairs language capabilities. It does not cause heart attacks. Heart attacks are typically caused by cardiovascular diseases, such as atherosclerosis, high blood pressure, and other risk factors.
