<a href="https://colab.research.google.com/github/fallenscent22/Medical-Chatbot/blob/main/BioMistral_Medical_RAG_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Bio Mistral Medical RAG Chatbot***

# Loading file from google drive

In [122]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Installation

In [123]:
pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf



# Imporitng libraries

In [124]:
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain

# Import the document

In [125]:
loader=PyPDFDirectoryLoader("/content/drive/MyDrive")
docs=loader.load()

In [126]:
len(docs)#no.of pages

98

In [127]:
docs[10]

Document(metadata={'source': '/content/drive/MyDrive/healthyheart.pdf', 'page': 7}, page_content='What Is Heart Disease? \nCoronary heart disease—often simply called heart disease—occurs\nwhen the arteries that supply blood to the heart muscle becomehardened and narrowed due to a buildup of plaque on the arteries’inner walls. Plaque is the accumulation of fat, cholesterol, and othersubstances. As plaque continues to build up in the arteries, bloodflow to the heart is reduced.\nHeart disease can lead to a heart attack. A heart attack happens\nwhen an artery becomes totally blocked with plaque, preventingvital oxygen and nutrients from getting to the heart. A heart attackcan cause permanent damage to the heart muscle.\nHeart disease is one of several cardiovascular diseases, which are\ndisorders of the heart and blood vessel system. Other cardiovasculardiseases include stroke, high blood pressure, and rheumatic heartdisease.\nSome people aren’t too concerned about heart disease because t

# chunking

In [128]:
text_splitter=RecursiveCharacterTextSplitter(chunk_size=300,chunk_overlap=50)
chunks=text_splitter.split_documents(docs)

In [None]:
len(chunks)

764

In [129]:
chunks[724]

Document(metadata={'source': '/content/drive/MyDrive/healthyheart.pdf', 'page': 87}, page_content='83Heart Health Is a Family AffairWhen it comes to heart health, what’s good for you is good for your')

In [130]:
chunks[701]

Document(metadata={'source': '/content/drive/MyDrive/healthyheart.pdf', 'page': 84}, page_content='or cheerful as usual during the first weeks after quitting, be gentle with yourself. Give yourself a chance to adjust to your new smoke-free lifestyle. Congratulate yourself for making a major, positive change in your life.\nMore Help forQuitting')

In [131]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("neuml/pubmedbert-base-embeddings")
embeddings = model.encode(sentences)
print(embeddings)


[[-0.5490218  -0.00991846 -0.263759   ... -0.15789421 -1.2998041
   0.80934775]
 [-1.042079    0.7897055   0.51802737 ... -0.59063715 -1.0819341
   0.50429934]]


# Embeddings creations

In [132]:
import os
os.environ['HUGGINGFACEHUB_API_TOKEN']="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"

In [133]:
embeddings=SentenceTransformerEmbeddings(model_name="neuml/pubmedbert-base-embeddings")

# updated version of langchain-hugging face

In [134]:
pip install -U langchain-huggingface



In [135]:
from langchain_huggingface import HuggingFaceEmbeddings


# vector store creation

In [136]:
vectorstore=Chroma.from_documents(chunks,embeddings)

In [137]:
query="who is at risk of heart disease?"
search_results=vectorstore.similarity_search(query)


In [138]:
search_results


[Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having')]

In [139]:
retriever = vectorstore.as_retriever(search_kwargs={"k":5})

In [140]:
retriever.invoke(query)


[Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 8, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='While each risk factor increases your risk of heart disease, having'),
 Document(metadata={'page': 5, 'source': '/content/drive/MyDrive/healthyheart.pdf'}, page_content='■As early as age 45, a man’s risk of heart disease begins to rise significantly. For a woman, risk starts to increase at age 55.\n■Fifty percent of men and 64 percent of women who die suddenlyof heart disease have no prev

# Loading LLM

In [141]:
llm= LlamaCpp(
         model_path="/content/drive/MyDrive/BioMistral-7B.Q2_K.gguf",
         temperature=0.2,
         max_tokens=2048,
         top_p=1
         )


llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/drive/MyDrive/BioMistral-7B.Q2_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_c

# Using LLM, Retriever and Query to generate finla response

In [142]:
template="""
<|context|>
You are a medical assistant that follows the instructions and generate accurate response based on query and context provided.
please be truthful and give direct answers.
</s>
<|user|>
{query}
</s>
<|assistant|>
"""


In [143]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate


In [144]:
prompt=ChatPromptTemplate.from_template(template)

In [145]:
rag_chain= (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [146]:
response=rag_chain.invoke(query)


llama_print_timings:        load time =    5794.86 ms
llama_print_timings:      sample time =      51.82 ms /    78 runs   (    0.66 ms per token,  1505.27 tokens per second)
llama_print_timings: prompt eval time =   35677.28 ms /    71 tokens (  502.50 ms per token,     1.99 tokens per second)
llama_print_timings:        eval time =   48020.86 ms /    77 runs   (  623.65 ms per token,     1.60 tokens per second)
llama_print_timings:       total time =   83848.26 ms /   148 tokens


In [147]:
response

'The risk of heart disease is higher in people who have a family history of heart disease, are overweight or obese, have high blood pressure, smoke tobacco, have diabetes, and have high cholesterol. Other factors that can increase the risk of heart disease include having a diet high in saturated fats, not exercising regularly, and drinking too much alcohol.'

In [148]:
import sys
while True:
  user_input=input(f"Input Query: ")
  if user_input=='exit' :
    sys.exit()
    print("Exiting")

  if user_input=='':
    continue
  response=rag_chain.invoke(user_input)
  print(response)

Input Query: types of heart diseases


Llama.generate: 52 prefix-match hit, remaining 15 prompt tokens to eval

llama_print_timings:        load time =    5794.86 ms
llama_print_timings:      sample time =      63.73 ms /    91 runs   (    0.70 ms per token,  1427.81 tokens per second)
llama_print_timings: prompt eval time =    8789.17 ms /    15 tokens (  585.94 ms per token,     1.71 tokens per second)
llama_print_timings:        eval time =   56101.82 ms /    90 runs   (  623.35 ms per token,     1.60 tokens per second)
llama_print_timings:       total time =   65059.35 ms /   105 tokens


Heart diseases are a type of disease that affects the heart. The heart is a muscular organ that pumps blood through the body and provides oxygen and nutrients to all parts of the body. Heart failure occurs when the heart cannot pump blood as efficiently as it used to, or at all. This can lead to fatigue, shortness of breath, swollen ankles and feet, and a rapid or irregular heartbeat.
Input Query: exit


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
