------------

# **Experimentation for Medical Chatbot**

-------------

### **Importing Libraries**

In [2]:
import os
import warnings
from dotenv import load_dotenv
from tqdm import tqdm
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_community.llms import CTransformers
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from pinecone import Pinecone, ServerlessSpec

warnings.simplefilter("ignore", FutureWarning)
load_dotenv()

api_key = os.getenv("PINECONE_API_KEY")
index_name = "medical-chatbot-implementation-extended"
pc = Pinecone(api_key=api_key)

### **Set Up a New Pinecone Index or Connect to an Existing One**

In [3]:
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

### **Extract and Load the Data from PDF**

In [4]:
def load_pdf(data):
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyPDFLoader)
    documents = loader.load()
    return documents

extracted_data = load_pdf(r"E:\Practice python\Krish Naik\End to end Medical Chatbot Implementation\data")
#extracted_data

### **Let's Split Text into Chunks/Documents**

In [5]:
def text_split(extracted_data):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    text_chunks = text_splitter.split_documents(extracted_data)
    return text_chunks

text_chunks = text_split(extracted_data)
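
To make the `chunk_size=500` and `chunk_overlap=20` settings concrete, here is a simplified sketch of overlapping character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs, newlines, and spaces); the function name `sliding_window_chunks` and the sample string are made up for illustration.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=20):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Illustrative input: 1200 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(1200))
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])          # lengths of the resulting windows
print(chunks[0][-20:] == chunks[1][:20])  # the 20-character overlap is shared
```

The overlap means a sentence cut at a chunk boundary still appears partly in both neighbours, which can help retrieval when the relevant text straddles two chunks.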

### **Let's Download Embeddings Model**

In [6]:
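def download_huggingface_embedding_model():
    load_dotenv()
    # Only export HF_TOKEN when it is set: assigning None to os.environ raises a TypeError.
    hf_token = os.getenv("HF_TOKEN")
    if hf_token:
        os.environ["HF_TOKEN"] = hf_token
    embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
    return embeddings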

embeddings = download_huggingface_embedding_model()
embeddings

### **Generate Embeddings**

In [7]:
model = embeddings.client
embeddings_list = model.encode([t.page_content for t in text_chunks]).tolist()
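
Since the index was created with `dimension=384`, it may be worth confirming that every vector has that length before upserting. The helper below is a minimal sketch; `check_dimensions` and the stand-in data are hypothetical, and in the notebook the input would be `embeddings_list`.

```python
EXPECTED_DIM = 384  # same value as the `dimension` argument used when creating the index


def check_dimensions(vectors, expected_dim=EXPECTED_DIM):
    """Return the positions of vectors whose length differs from expected_dim."""
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]


# Stand-in data: two valid vectors and one that is a component short.
fake_vectors = [[0.0] * 384, [0.0] * 384, [0.0] * 383]
bad = check_dimensions(fake_vectors)
print(bad)
```

An empty list means all vectors match the index dimension.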

### **Let's Upsert Embeddings in Pinecone**

In [8]:
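def batch_upsert(index, text_chunks, embeddings, batch_size=100):
    """Upsert (id, vector, metadata) tuples into the index, batch_size at a time."""
    for i in tqdm(range(0, len(embeddings), batch_size)):
        i_end = min(i + batch_size, len(embeddings))
        batch = list(zip(
            # IDs are the chunk positions, so re-running overwrites the same records.
            [str(j) for j in range(i, i_end)],
            embeddings[i:i_end],
            # The chunk text is stored under "text", the metadata key read back at query time.
            [{"text": chunk.page_content} for chunk in text_chunks[i:i_end]]
        ))
        index.upsert(vectors=batch)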

index = pc.Index(index_name)
batch_upsert(index, text_chunks, embeddings_list)

100%|██████████| 71/71 [12:55<00:00, 10.92s/it]
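
The progress bar above reports 71 batches of up to 100 vectors each. The batching logic can be checked without touching the network by swapping in a stub. In this sketch, `RecordingIndex` and `make_batches` are hypothetical names, and the stub only mimics the single `upsert(vectors=...)` call used in this notebook.

```python
class RecordingIndex:
    """Hypothetical stand-in for a Pinecone index; it only records upsert calls."""

    def __init__(self):
        self.calls = []

    def upsert(self, vectors):
        self.calls.append(list(vectors))


def make_batches(texts, vectors, batch_size=100):
    """Yield lists of (id, vector, metadata) tuples, batch_size at a time."""
    for start in range(0, len(vectors), batch_size):
        end = min(start + batch_size, len(vectors))
        yield [
            (str(j), vectors[j], {"text": texts[j]})
            for j in range(start, end)
        ]


# Illustrative data: 250 one-dimensional "embeddings".
texts = [f"chunk {i}" for i in range(250)]
vectors = [[float(i)] for i in range(250)]

stub = RecordingIndex()
for batch in make_batches(texts, vectors):
    stub.upsert(vectors=batch)

print([len(call) for call in stub.calls])  # batch sizes, in order
```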


### **Query the Pinecone Index**

In [9]:
query = "What are Allergies"
xq = model.encode([query]).tolist()[0]

# Perform the similarity search
res = index.query(vector=xq, top_k=3, include_metadata=True)

# Extract and print the results
for match in res['matches']:
    print(f"{match['score']:.2f}: {match['metadata']['text']}")

0.68: GALE ENCYCLOPEDIA OF MEDICINE 2 117Allergies
Allergic rhinitis is commonly triggered by
exposure to household dust, animal fur,or pollen. The foreign substance thattriggers an allergic reaction is calledan allergen.
The presence of an allergen causes the
body's lymphocytes to begin producingIgE antibodies. The lymphocytes of an allergy sufferer produce an unusuallylarge amount of IgE.
IgE molecules attach to mast
cells, which contain histamine.HistaminePollen grains
Lymphocyte
FIRST EXPOSURE
0.68: allergens are the following:
• plant pollens
• animal fur and dander
• body parts from house mites (microscopic creatures
found in all houses)
• house dust• mold spores• cigarette smoke• solvents• cleaners
Common food allergens include the following:
• nuts, especially peanuts, walnuts, and brazil nuts
• fish, mollusks, and shellfish• eggs• wheat• milk• food additives and preservatives
The following types of drugs commonly cause aller-
gic reactions:
• penicillin or other antibiotics
0.
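
Because the index was created with `metric="cosine"`, scores such as `0.68` should be cosine similarities between the query vector and each stored chunk vector. The following pure-Python sketch shows the computation on toy vectors; `cosine_similarity` is an illustrative helper, not part of the Pinecone client.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same = cosine_similarity([1.0, 0.0], [1.0, 0.0])        # identical direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # unrelated direction
diagonal = cosine_similarity([1.0, 1.0], [1.0, 0.0])    # 45 degrees apart
print(same, orthogonal, round(diagonal, 4))
```

Values closer to 1 indicate that the chunk points in nearly the same direction as the query in embedding space.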

### **Initialize PineconeVectorStore**

In [10]:
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)

### **Let's Define a Prompt Template to Answer Questions**

In [11]:
prompt_template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
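
To preview what the model will receive, the template can be filled by hand. This sketch uses plain `str.format` as a stand-in for `PromptTemplate`, and the retrieved chunks are invented for illustration. Joining chunks with a blank line is an assumption about how the "stuff" chain combines documents.

```python
template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

# Hypothetical retrieved chunks; the real ones come from the Pinecone retriever.
retrieved = [
    "Allergic rhinitis is commonly triggered by pollen.",
    "An allergen is a substance that triggers an allergic reaction.",
]

# Assumption: chunks are concatenated with a blank line between them.
filled = template.format(context="\n\n".join(retrieved), question="What are Allergies")
print(filled)
```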

### **Initialize the CTransformers Model**

In [12]:
llm = CTransformers(
    model=r"E:\Practice python\Krish Naik\End to end Medical Chatbot Implementation\models\llama-2-7b-chat.ggmlv3.q4_0.bin",
    model_type="llama",
    config={"max_new_tokens": 512, "temperature": 0.8},
)

### **Initialize the RetrievalQA Chain**

In [13]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)

### **Query Loop**

In [15]:
while True:
    user_input = input("Input Prompt: ")
    if user_input.lower() == 'q':
        print("Quitting the program.")
        break
    result = qa.invoke({"query": user_input})  # calling the chain directly is deprecated
    print("Response:", result["result"])


Response: Asthma is a chronic respiratory disease characterized by inflammation of the airways, which can cause recurring episodes of wheezing, coughing, chest tightness, and shortness of breath. The exact cause of asthma is not fully understood, but it is believed to involve a combination of genetic and environmental factors. Common triggers for asthma attacks include exposure to allergens such as pollen, dust, and pet dander, as well as stress, exercise, and cold temperatures. Treatment options for asthma include medications such as inhalers and nebulizers, which help to manage symptoms and prevent attacks from occurring. In severe cases, hospitalization may be necessary to provide more intensive treatment.
Quitting the program.
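
The loop above is hard to test because it reads from the keyboard and calls the model directly. One possible refactor, sketched below, passes the input, output, and answer functions in as parameters. `run_chat` and the echo "model" are hypothetical; in the notebook the answer function would wrap the `qa` chain.

```python
def run_chat(answer_fn, read_input=input, write=print):
    """Read prompts until the user types 'q'; return the number of questions answered."""
    answered = 0
    while True:
        user_input = read_input("Input Prompt: ").strip()
        if user_input.lower() == "q":
            write("Quitting the program.")
            return answered
        if not user_input:
            continue  # ignore empty lines instead of querying the model
        write("Response:", answer_fn(user_input))
        answered += 1


# Scripted session with a stub answer function in place of the QA chain.
scripted = iter(["What is asthma?", "", "Q"])
log = []
count = run_chat(
    answer_fn=lambda q: f"(stub answer to: {q})",
    read_input=lambda prompt: next(scripted),
    write=lambda *parts: log.append(" ".join(parts)),
)
print(count, log)
```

With the real chain, the call might look like `run_chat(lambda q: qa.invoke({"query": q})["result"])`.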
