
## Install Required Dependencies

In [None]:
!pip install langchain langchain_groq langchain_ollama langchain_community pymupdf pypdf


### Download Data

In [None]:
!mkdir -p ./data ./chunk_caches
!wget "https://www.binasss.sa.cr/int23/8.pdf" -O "./data/fibromyalgia.pdf"

--2025-02-16 16:35:47--  https://www.binasss.sa.cr/int23/8.pdf
Resolving www.binasss.sa.cr (www.binasss.sa.cr)... 196.40.24.242
Connecting to www.binasss.sa.cr (www.binasss.sa.cr)|196.40.24.242|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 632664 (618K) [application/pdf]
Saving to: ‘./data/fibromyalgia.pdf’


2025-02-16 16:35:48 (1.45 MB/s) - ‘./data/fibromyalgia.pdf’ saved [632664/632664]



### Setup LLM

In [None]:
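from langchain_groq import ChatGroq
from langchain_ollama import ChatOllama
import getpass
import os

# Read the Groq API key from the environment, or prompt for it; never hardcode secrets in a notebook.
if not os.environ.get("GROQ_API_KEY"):
    os.environ["GROQ_API_KEY"] = getpass.getpass("Enter your Groq API key: ")

# Groq-hosted model used to judge the relevance of each page
llm_relevancy = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0,
)

# Local Ollama model used to generate the final answer
llm = ChatOllama(
    model="deepseek-r1:14b",
    temperature=0.6,
    num_predict=3000,  # ChatOllama limits output length with num_predict rather than max_tokens
)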


### Define System Prompt

In [None]:
REAG_SYSTEM_PROMPT = """
# Role and Objective
You are an intelligent knowledge retrieval assistant. Your task is to analyze provided documents or URLs to extract the most relevant information for user queries.

# Instructions
1. Analyze the user's query carefully to identify key concepts and requirements.
2. Search through the provided sources for relevant information and output the relevant parts in the 'content' field.
3. If you cannot find the necessary information in the documents, return 'is_irrelevant: true', otherwise return 'is_irrelevant: false'.

# Constraints
- Do not make assumptions beyond available data
- Clearly indicate if relevant information is not found
- Maintain objectivity in source selection
"""

### Define RAG Prompt

In [None]:
rag_prompt = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""
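
The `{question}` and `{context}` placeholders are filled in at call time. The sketch below uses plain `str.format` on a shortened copy of the template to show the substitution. `RAG_TEMPLATE` and `build_prompt` are hypothetical names, and joining the context chunks with blank lines is an assumption made here for readability; the notebook itself passes the list of chunks directly.

```python
# Shortened, illustrative copy of the RAG template (not the full prompt above).
RAG_TEMPLATE = "Question: {question}\nContext: {context}\nAnswer:"


def build_prompt(question: str, chunks: list) -> str:
    """Fill the template, separating context chunks with a blank line."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


prompt = build_prompt("What is Fibromyalgia?", ["A", "B"])
print(prompt)
```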

### Define Relevancy Schema

In [None]:
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser

class ResponseSchema(BaseModel):
    content: str = Field(..., description="The page content of the document that is relevant or sufficient to answer the question asked")
    reasoning: str = Field(..., description="The reasoning for selecting the page content with respect to the question asked")
    is_irrelevant: bool = Field(..., description="Specify 'True' if the content in the document is not sufficient or relevant to answer the question asked; otherwise specify 'False' if the page content is relevant to answer the question asked")


class RelevancySchemaMessage(BaseModel):
    source: ResponseSchema

relevancy_parser = JsonOutputParser(pydantic_object=RelevancySchemaMessage)

### Load and Process Document

In [None]:
from langchain_community.document_loaders import PyMuPDFLoader

file_path = "./data/fibromyalgia.pdf"
loader = PyMuPDFLoader(file_path)
#
docs = loader.load()
print(len(docs))
print(docs[0].metadata)

8
{'producer': 'Acrobat Distiller 6.0 for Windows', 'creator': 'Elsevier', 'creationdate': '2023-01-20T09:25:19-06:00', 'source': './data/fibromyalgia.pdf', 'file_path': './data/fibromyalgia.pdf', 'total_pages': 8, 'format': 'PDF 1.7', 'title': 'Fibromyalgia: Diagnosis and Management', 'author': 'Bradford T. Winslow MD', 'subject': 'American Family Physician, 107 (2023) 137-144', 'keywords': '', 'moddate': '2023-02-27T15:02:12+05:30', 'trapped': '', 'modDate': "D:20230227150212+05'30'", 'creationDate': "D:20230120092519-06'00'", 'page': 0}


### Helper Function to Format Documents

In [None]:
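from langchain_core.documents import Document

def format_doc(doc: Document) -> str:
    """Render one page with its title and page number for the relevancy prompt."""
    return f"Document_Title: {doc.metadata.get('title', '')}\nPage: {doc.metadata.get('page', '')}\nContent: {doc.page_content}"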

### Define System Prompt

In [None]:
system = REAG_SYSTEM_PROMPT
# The formatted source document is appended to this prompt, per page, inside extract_relevant_context.

In [None]:
### Helper function to extract relevant context
from langchain_core.prompts import PromptTemplate

def extract_relevant_context(question, documents):
    """Ask the relevancy LLM to judge each page and return the relevant excerpts."""
    result = []
    for doc in documents:
        formatted_document = format_doc(doc)
        system = f"{REAG_SYSTEM_PROMPT}\n\n# Available source\n\n{formatted_document}"
        prompt = f"""Determine if the 'Available source' content supplied is sufficient and relevant to ANSWER the QUESTION asked.
        QUESTION: {question}
        #INSTRUCTIONS TO FOLLOW
        1. Analyze the context provided thoroughly to check its relevancy in formulating a response to the QUESTION asked.
        2. STRICTLY PROVIDE THE RESPONSE IN A JSON STRUCTURE AS DESCRIBED BELOW:
            ```json
               {{"content":<<The page content of the document that is relevant or sufficient to answer the question asked>>,
                 "reasoning":<<The reasoning for selecting the page content with respect to the question asked>>,
                 "is_irrelevant":<<Specify 'True' if the content in the document is not sufficient or relevant. Specify 'False' if the page content is sufficient to answer the QUESTION>>
                 }}
            ```
         """
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ]
        response = llm_relevancy.invoke(messages)
        print(response.content)
        formatted_response = relevancy_parser.parse(response.content)
        result.append(formatted_response)
    final_context = []
    for item in result:
        # The model may return the flag as a boolean or as a string such as "False".
        if str(item['is_irrelevant']).strip().lower() == 'false':
            final_context.append(item['content'])
    return final_context
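
The relevancy model does not always return bare JSON: in the outputs recorded in this notebook it wraps the object in a json code fence and emits the flag as the string "False". A minimal, standard-library-only sketch of a more defensive parse follows. `parse_relevancy` and `is_relevant` are hypothetical helper names, not part of LangChain, and the sketch assumes each reply contains exactly one JSON object.

```python
import json
import re


def parse_relevancy(text: str) -> dict:
    """Extract the JSON object from an LLM reply, whether or not it is fenced."""
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))


def is_relevant(item: dict) -> bool:
    """Treat False, 'False' and 'false' as relevant; anything else as irrelevant."""
    return str(item.get("is_irrelevant", True)).strip().lower() == "false"


fence = "`" * 3  # build the code fence so this example stays easy to embed
reply = (
    fence
    + 'json\n{"content": "Fibromyalgia is a chronic pain syndrome.", '
    + '"reasoning": "Defines the term.", "is_irrelevant": "False"}\n'
    + fence
)
parsed = parse_relevancy(reply)
print(is_relevant(parsed))
```

Defaulting a missing flag to irrelevant is a design choice made for this sketch: it keeps malformed replies out of the final context rather than letting them through.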







In [None]:
question = "What is Fibromyalgia?"
final_context = extract_relevant_context(question,docs)

```json
{
  "content": "Fibromyalgia is characterized by diffuse musculoskeletal pain, fatigue, poor sleep, and other somatic symptoms. Fibromyalgia is a chronic, centralized pain syndrome characterized by disordered processing of painful stimuli.",
  "reasoning": "The provided content defines fibromyalgia, its characteristics, and symptoms, making it directly relevant to answering the question 'What is Fibromyalgia?'. The content explicitly states that fibromyalgia is a condition involving diffuse musculoskeletal pain, fatigue, and other somatic symptoms, and further clarifies it as a chronic, centralized pain syndrome.",
  "is_irrelevant": "False"
}
```
```json
{
  "content": "FIBROMYALGIA Clinical Presentation Chronic diffuse pain is the predominant symptom in most patients with fibromyalgia. Patients may also experience muscle stiffness and tenderness. The physical examination in patients with fibromyalgia generally finds diffuse tenderness without other unusual findings. If joint 

In [None]:
len(final_context)

4

In [None]:
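def generate_response(question, final_context):
    """Answer the question with the generation LLM, using only the extracted context."""
    prompt = PromptTemplate(
        template=rag_prompt,
        input_variables=["question", "context"],
    )
    chain = prompt | llm
    response = chain.invoke({"question": question, "context": final_context})
    # Keep only the last paragraph, which drops any text the model emits before its final answer.
    answer = response.content.split("\n\n")[-1]
    print(answer)
    return answer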
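
Splitting on blank lines keeps only the last paragraph, so a multi-paragraph answer would be cut short. If the model wraps its reasoning in `<think>...</think>` tags, as DeepSeek-R1 models typically do, removing that block explicitly may be more robust. The helper below is a hypothetical sketch using only the standard library; the tag format is an assumption about the raw model output, which this notebook does not show.

```python
import re

# Non-greedy match so several reasoning blocks are each removed separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = (
    "<think>\nThe user asks for a definition.\n</think>\n\n"
    "Fibromyalgia is a chronic pain syndrome.\n\nIt often involves fatigue."
)
print(strip_reasoning(raw))
```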

In [None]:
final_response = generate_response(question,final_context)
final_response

Fibromyalgia is a chronic condition characterized by widespread musculoskeletal pain, fatigue, disrupted sleep, and cognitive difficulties like "fibrofog." It is often associated with heightened sensitivity to pain due to altered nervous system processing. Diagnosis considers symptoms such as long-term pain, fatigue, and sleep issues without underlying inflammation or injury.

In [None]:
question =  "What are the causes of Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)


```json
{
  "content": "Fibromyalgia is likely caused by disordered central nociceptive signal processing that leads to sensitization expressed as hyperalgesia and allodynia, which is similar to chronic pain conditions such as irritable bowel syndrome, interstitial cystitis, chronic pelvic pain, and chronic low back pain.6,7 Functional brain imaging suggests that this aberrant processing may be attributed to an imbalance between excitatory and inhibitory neurotransmitters, particularly within the insula.8 Suggested etiologies include dysfunction of the hypothalamic-pituitary-adrenal axis and the autonomic nervous system, diffuse inflammation, glial cell activation, small fiber neuropathy, and infections such as the Epstein-Barr virus, Lyme disease, and viral hepatitis.9 Twin studies suggest a genetic component may also be a factor.10",
  "reasoning": "The provided content discusses the pathophysiology of fibromyalgia, including the potential causes and underlying mechanisms. The text m

In [None]:
question = "Can people suffering from rheumatologic conditions also have fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)

```json
{
  "content": "Comorbid conditions, such as functional somatic syndromes, psychiatric diagnoses, and rheumatologic conditions may be present. ",
  "reasoning": "The content is relevant to the question because it explicitly mentions that people suffering from rheumatologic conditions may have fibromyalgia, as it states that comorbid conditions including rheumatologic conditions may be present in patients with fibromyalgia.",
  "is_irrelevant": "False"
}
```
```json
{
  "content": "The presence of another painful disorder does not exclude the diagnosis of fibromyalgia.",
  "reasoning": "The question asks if people suffering from rheumatologic conditions may have fibromyalgia. The provided content mentions that the presence of another painful disorder, which could include rheumatologic conditions, does not exclude the diagnosis of fibromyalgia. This suggests that individuals with rheumatologic conditions can also have fibromyalgia.",
  "is_irrelevant": "False"
}
```
```json
{
  "

In [None]:
question = "What are the nonpharmacologic treatments for fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)

In [None]:
question =  "What are the medications and doses for Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)

In [None]:
question =  "What is the starting dosage of Amitriptyline?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)

In [None]:
question = "What does the document say about the AAPT 2019 Diagnostic Criteria for Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
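
The cells above repeat the same two calls for every question. One way to reduce the repetition is a small driver loop, sketched below. `answer_questions` is a hypothetical helper; it takes the two pipeline steps as arguments so it can be tried here with stand-in functions. In this notebook you would pass `extract_relevant_context` and `generate_response` instead. Skipping generation when no page is judged relevant is an assumption of this sketch, not something the original cells do.

```python
from typing import Callable, Dict, List, Sequence


def answer_questions(
    questions: Sequence[str],
    documents: Sequence,
    extract: Callable[[str, Sequence], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Dict[str, str]:
    """Run extraction, then generation, for each question."""
    answers = {}
    for q in questions:
        context = extract(q, documents)
        if not context:
            # Nothing relevant was found, so skip the generation call.
            answers[q] = "No relevant context found."
            continue
        answers[q] = generate(q, context)
    return answers


# Stand-ins for the two LLM-backed steps, used only to exercise the loop.
def fake_extract(question, documents):
    words = set(question.lower().split())
    return [d for d in documents if words & set(d.lower().split())]


def fake_generate(question, context):
    return f"{len(context)} relevant page(s)"


pages = ["fibromyalgia causes diffuse pain", "unrelated text"]
results = answer_questions(
    ["what is fibromyalgia", "dosage of amitriptyline"],
    pages,
    fake_extract,
    fake_generate,
)
print(results)
```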