In [2]:
from langchain_core.documents import Document

doc=Document(
    page_content="this is my content",
    metadata={'source':'pj'},
    id="1"  # Document.id is typed as an optional string
)

doc.page_content

'this is my content'

In [5]:
### loading data from the pdf

from langchain_community.document_loaders import PyPDFLoader

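# Raw string so the backslashes in the Windows path are not treated as escape sequences
loader = PyPDFLoader(
    file_path=r"..\documents\medical_book.pdf"
)

# One Document per PDF page
document = loader.load()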

In [6]:
print(len(document))

759


In [7]:
print(document[4].page_content)

The Gale Encyclopedia of Medicine 2is a medical ref-
erence product designed to inform and educate readers
about a wide variety of disorders, conditions, treatments,
and diagnostic tests. The Gale Group believes the product
to be comprehensive, but not necessarily definitive. It is
intended to supplement, not replace, consultation with a
physician or other healthcare practitioner. While the Gale
Group has made substantial efforts to provide information
that is accurate, comprehensive, and up-to-date, the Gale
Group makes no representations or warranties of any
kind, including without limitation, warranties of mer-
chantability or fitness for a particular purpose, nor does it
guarantee the accuracy, comprehensiveness, or timeliness
of the information contained in this product. Readers
should be aware that the universe of medical knowledge
is constantly growing and changing, and that differences
of medical opinion exist among authorities. Readers are
also advised to seek professional dia

### Step 2: chunking the dataset

In [8]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150
)

chunks = splitter.split_documents(document)
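
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character windows with overlap. It is a simplification and not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter also tries to break on separators such as paragraph and line breaks); `sliding_chunks` is a hypothetical helper name.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: no separator awareness.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for piece in sliding_chunks(sample, chunk_size=10, chunk_overlap=4):
    print(piece)
```

In this simplified model, the settings used above (`chunk_size=800`, `chunk_overlap=150`) would advance each window by 650 characters, so neighbouring chunks share their boundary text.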

In [9]:
chunks[1].metadata

{'producer': 'GPL Ghostscript 9.10',
 'creator': '',
 'creationdate': '2017-05-01T10:37:35-07:00',
 'moddate': '2017-05-01T10:37:35-07:00',
 'title': '',
 'author': '',
 'subject': '',
 'keywords': '',
 'source': '..\\documents\\medical_book.pdf',
 'total_pages': 759,
 'page': 1,
 'page_label': '2'}

In [10]:
len(chunks)

4986

In [11]:
print(chunks[2500].page_content)

Infection of the upper urinary tract involves the spread of
bacteria to the kidney and is called pyelonephritis.
Description
The frequency of bladder infections in humans varies
significantly according to age and sex. The male/female
GALE ENCYCLOPEDIA OF MEDICINE 2 991
Cystitis
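
The chunk above still carries PDF layout residue: a running footer (`GALE ENCYCLOPEDIA OF MEDICINE 2 991`), a running header (`Cystitis`), and, elsewhere in the book, words hyphenated across line breaks. Below is a minimal cleanup sketch, assuming the footer always has exactly this form. `clean_page_text` and both regular expressions are illustrative and not part of the original pipeline; running headers differ per page and are not handled.

```python
import re

# Assumed footer shape: the book title followed by a page number on its own line.
FOOTER_RE = re.compile(r"^GALE ENCYCLOPEDIA OF MEDICINE 2[ \t]+\d+[ \t]*$", re.MULTILINE)
# A lowercase word split by a hyphen at a line break, e.g. "ref-\nerence".
HYPHEN_BREAK_RE = re.compile(r"(?<=[a-z])-\n(?=[a-z])")


def clean_page_text(text: str) -> str:
    """Remove the running footer and re-join words hyphenated across lines."""
    text = FOOTER_RE.sub("", text)
    text = HYPHEN_BREAK_RE.sub("", text)
    return text


sample = "is a medical ref-\nerence product\nGALE ENCYCLOPEDIA OF MEDICINE 2 991\nCystitis"
print(clean_page_text(sample))
```

One caveat of this heuristic: a genuinely hyphenated compound that happens to break at a line end (for example `up-` / `to-date`) would lose its hyphen. If used, it would be applied to each page's `page_content` before splitting.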


In [12]:
## creating embedding model

# Requires the langchain-huggingface package (the langchain_community import is deprecated)
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model=HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)


In [13]:
## storing the document in vector db

from langchain_community.vectorstores import FAISS

vector_db=FAISS.from_documents(
    documents=chunks,
    embedding=embedding_model
)

vector_db.save_local(folder_path="../vectorDB",index_name="Faiss_index")
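
Conceptually, the vector store embeds each chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below does this by brute force with Euclidean distance on toy 2-D vectors. The vectors, texts, and the `top_k` helper are made up for illustration, and FAISS relies on optimized index structures rather than a Python loop like this.

```python
import math


def top_k(query_vec, items, k):
    """Return the k (text, distance) pairs closest to query_vec.

    items is a list of (text, vector) pairs; distance is Euclidean (L2).
    """
    scored = [(text, math.dist(query_vec, vec)) for text, vec in items]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Toy "embeddings": real vectors from all-MiniLM-L6-v2 have 384 dimensions.
toy_index = [
    ("symptoms of diabetes", [0.9, 0.1]),
    ("bladder infection", [0.1, 0.9]),
    ("insulin therapy", [0.8, 0.3]),
]

for text, distance in top_k([1.0, 0.0], toy_index, k=2):
    print(f"{distance:.3f}  {text}")
```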

In [14]:
## retrieval pipeline

vector_db = FAISS.load_local(
    folder_path="../vectorDB",
    index_name="Faiss_index",
    embeddings=embedding_model,
    allow_dangerous_deserialization=True
)
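
`save_local` writes two files into the folder, `<index_name>.faiss` and `<index_name>.pkl`. The `allow_dangerous_deserialization=True` flag is needed because the `.pkl` file is unpickled on load, so it should only be set for index files you created yourself. Below is a small sketch for checking that a saved index is present before loading it, so the embedding step can be skipped on later runs; `index_exists` is a hypothetical helper.

```python
from pathlib import Path


def index_exists(folder_path: str, index_name: str) -> bool:
    """True if both files of a locally saved FAISS index are present."""
    folder = Path(folder_path)
    return all((folder / f"{index_name}{ext}").is_file() for ext in (".faiss", ".pkl"))


if index_exists("../vectorDB", "Faiss_index"):
    print("Saved index found: it can be loaded instead of rebuilt.")
else:
    print("No saved index: build it from the chunks and save it first.")
```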


In [15]:
retriever = vector_db.as_retriever(
    search_kwargs={"k": 6}
)


## building LLM 


In [None]:
## using groq llama3 model

# from langchain_groq import ChatGroq
# from dotenv import load_dotenv
# import os

# load_dotenv()

# llm = ChatGroq(
#     groq_api_key=os.getenv("GROQ_API_KEY"),
#     model_name="llama3-70b-8192"
# )


In [16]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI
from dotenv import load_dotenv

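load_dotenv()

# Read the key once and fail early with a clear message if it is missing
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GEMINI_API_KEY is not set; add it to your environment or .env file")

model = ChatGoogleGenerativeAI(model="gemini-2.5-flash-lite", google_api_key=api_key)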

In [27]:
query="what are symtoms of diabeties"

docs = retriever.invoke(query)

context="\n".join([doc.page_content for doc in docs])
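
Joining only `page_content` discards where each passage came from. Here is a sketch of an alternative that labels each passage with its page, using the `page_label` key seen in the chunk metadata earlier. `format_context` and the `FakeDoc` stand-in (used so the example runs without LangChain) are illustrative, and the demo texts and page label are invented.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Stand-in with the two attributes used here; not the LangChain class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs) -> str:
    """Join passages, prefixing each with its page label when available."""
    parts = []
    for number, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page_label", "unknown")
        parts.append(f"[Passage {number}, page {page}]\n{doc.page_content}")
    return "\n\n".join(parts)


# Invented demo values, not taken from the book.
demo_docs = [
    FakeDoc("Symptoms of diabetes can develop suddenly.", {"page_label": "1050"}),
    FakeDoc("Diabetic foot infections are infections of the foot."),
]
print(format_context(demo_docs))
```

Since the function only reads `page_content` and `metadata`, it should accept the list returned by `retriever.invoke(query)` as well.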



In [28]:
print(context)

absorption. These drugs include haloperidol, lithium car-
bonate, phenothiazines, tricyclic antidepressants, and
adrenergic agonists. Other medications that can cause
diabetes symptoms include isoniazid, nicotinic acid,
cimetidine, and heparin.
Symptoms
Symptoms of diabetes can develop suddenly (over
days or weeks) in previously healthy children or adoles-
cents, or can develop gradually (over several years) in
overweight adults over the age of 40. The classic symp-
toms include feeling tired and sick, frequent urination,
excessive thirst, excessive hunger, and weight loss.
Ketoacidosis, a condition due to starvation or
uncontrolled diabetes, is common in Type I diabetes.
Ketones are acid compounds that form in the blood when
the body breaks down fats and proteins. Symptoms
loss of feeling or sensation, and loss of autonomic func-
tions such as digestion, erection, bladder control and
sweating among others.
The longer a person has diabetes, the more likely the
development of one or mor

In [29]:
docs[5].page_content

'OTHER\nCenters for Disease Control. <http://www.cdc.gov/nccdphp/\nddt/ddthome.htm>.\n“Insulin-Dependent Diabetes.” National Institute of Diabetes\nand Digestive and Kidney Diseases. National Institutes of\nHealth, NIH Publication No. 94-2098.\n“Noninsulin-Dependent Diabetes.” National Institute of Dia-\nbetes and Digestive and Kidney Diseases. National Insti-\ntutes of Health, NIH Publication No. 92-241.\nAltha Roberts Edgren\nDiabetic control index see Glycosylated\nhemoglobin test\nDiabetic foot infections\nDefinition\nDiabetic foot infections are infections that can\ndevelop in the skin, muscles, or bones of the foot as a\nresult of the nerve damage and poor circulation that is\nassociated with diabetes.\nDescription\nPeople who have diabetes have a greater-than-average'

In [None]:
## testing the rag model
# from langchain_core.messages import SystemMessage, HumanMessage

# message = [
#     SystemMessage(content="You are a professional medical doctor. Answer using only the provided context.\n\n" + context),
#     HumanMessage(content=query)
# ]



In [30]:
prompt = f"""
You are a medical knowledge assistant AI.

Your role:
- Provide accurate, evidence-based medical information
- Use ONLY the provided context
- Do NOT guess or hallucinate
- If unsure, say "I don't have enough medical information"

STRICT SAFETY RULES:
- Do NOT give diagnosis
- Do NOT prescribe medicines
- Do NOT suggest dosage
- Do NOT replace a doctor
- Always recommend consulting a qualified healthcare professional

Context:
{context}

User Question:
{query}

Response Guidelines:
- Be clear and simple
- Use bullet points if needed
- Mention risks when relevant
- Avoid definitive claims

Answer:
"""


In [31]:
response = model.invoke(prompt)

print(response.content)

Symptoms of diabetes can include:
* Feeling tired and sick
* Frequent urination
* Excessive thirst
* Excessive hunger
* Weight loss

Symptoms can develop suddenly over days or weeks in children or adolescents, or gradually over several years in overweight adults over the age of 40.

Some medications can also cause diabetes symptoms. These include haloperidol, lithium carbonate, phenothiazines, tricyclic antidepressants, adrenergic agonists, isoniazid, nicotinic acid, cimetidine, and heparin.

It is important to consult a qualified healthcare professional for accurate diagnosis and management.

