<a href="https://colab.research.google.com/github/sunnamsriram1/sriramBioMistrol/blob/main/sriramBioAIMistral_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build BioMistral Medical RAG Chatbot using BioMistral Open Source LLM

In the notebook we will build a Medical Chatbot with BioMistral LLM and Heart Health pdf file.

## Installation

In [1]:
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf

Collecting langchain
  Downloading langchain-0.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.1.0-py3-none-any.whl.metadata (23 kB)
Collecting chromadb
  Downloading chromadb-0.5.5-py3-none-any.whl.metadata (6.8 kB)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.90.tar.gz (63.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m63.8/63.8 MB[0m [31m12.1 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting langchain_community
  Downloading langchain_community-0.3.0-py3-none-any.whl.metadata (2.8 kB)
Collecting pypdf
  Downloading pypdf-5.0.0-py3-none-any.whl.metadata (7.4 kB)
Collecting langchain-core<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_core-0.3.1-py3-none-

## Import libraries

In [2]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import CharacterTextSplitter,RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS, Chroma
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain




In [3]:
import pathlib
import textwrap
from IPython.display import display
from IPython.display import Markdown



def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [4]:
# Used to securely store your API key
from google.colab import userdata

## Setup HuggingFace Access Token

- Log in to [HuggingFace.co](https://huggingface.co/)
- Click on your profile icon at the top-right corner, then choose [“Settings.”](https://huggingface.co/settings/)
- In the left sidebar, navigate to [“Access Token”](https://huggingface.co/settings/tokens)
- Generate a new access token, assigning it the “write” role.


In [6]:
# Or use `os.getenv('HUGGINGFACEHUB_API_TOKEN')` to fetch an environment variable.
import os
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = userdata.get("HUGGINGFACEHUB_API_TOKEN")
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "HUGGINGFACEHUB_API_TOKEN"

## Import document

In [7]:
#connect to google drive
from google.colab import drive
drive.mount('/content/drive') #Fixed typo in directory name

Mounted at /content/drive


In [8]:
loader = PyPDFDirectoryLoader("/content/drive/MyDrive/Pdf")
docs = loader.load()

In [9]:
docs

[Document(metadata={'source': '/content/drive/MyDrive/Pdf/4159216.pdf', 'page': 0}, page_content="1.\nName (As per SSC / Equivalent certificate)\n:\nSUNNAM SEETHARAM\n2.\nFather's/ Husband's Name\n:\nS VEERRAJU\nPreserve the REGISTRATION NUMBER \nfor all your future correspondence.\n3.\nMother's Name\n:\nS SUBHADRA\n4.\nGender\n:\nMale\n5.\nDate of Birth\n(As per SSC/Equivalent certificate)\n:\n16-08-1996\n4159216\n6.\n(a) SSC/Equivalent Roll No.\n:\n1111127157\n(b) Year of Passing\n:\n2011\n11.\n7.\nUpload Certificate (SSC or Equivalent)\n:\nUploaded\n8.\nEducational Qualification\n:\nIntermediate  / Any other \nequivalent\n9.\nCommunity\n:\nABO-ST\n10.\nUpload Community Certificate\n(For BC, SC and ST)\n:\nUploaded\n13.\nDo you belong to EWS Category ?\n:\nNo\n14.\nUpload EWS Certificate\n(For OC)\n:\n---\n15.\nDo you belong to Ex-Servicemen ?\n:\nNo\nEx-Servicemen Joining Date\n:\n---\nEx-Servicemen Discharge Date / Likely Date \nof Discharge\n:\n---\n16.\nldentificaton Marks\n(As p

## Text Splitting - Chunking

In [10]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

In [11]:
len(chunks)

573

In [12]:
chunks[0]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/4159216.pdf', 'page': 0}, page_content="1.\nName (As per SSC / Equivalent certificate)\n:\nSUNNAM SEETHARAM\n2.\nFather's/ Husband's Name\n:\nS VEERRAJU\nPreserve the REGISTRATION NUMBER \nfor all your future correspondence.\n3.\nMother's Name\n:\nS SUBHADRA\n4.\nGender\n:\nMale\n5.\nDate of Birth\n(As per SSC/Equivalent certificate)\n:\n16-08-1996\n4159216\n6.")

In [13]:
chunks[1]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/4159216.pdf', 'page': 0}, page_content=':\n16-08-1996\n4159216\n6.\n(a) SSC/Equivalent Roll No.\n:\n1111127157\n(b) Year of Passing\n:\n2011\n11.\n7.\nUpload Certificate (SSC or Equivalent)\n:\nUploaded\n8.\nEducational Qualification\n:\nIntermediate  / Any other \nequivalent\n9.\nCommunity\n:\nABO-ST\n10.\nUpload Community Certificate\n(For BC, SC and ST)\n:')

In [14]:
chunks[2]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/4159216.pdf', 'page': 0}, page_content='(For BC, SC and ST)\n:\nUploaded\n13.\nDo you belong to EWS Category ?\n:\nNo\n14.\nUpload EWS Certificate\n(For OC)\n:\n---\n15.\nDo you belong to Ex-Servicemen ?\n:\nNo\nEx-Servicemen Joining Date\n:\n---\nEx-Servicemen Discharge Date / Likely Date \nof Discharge\n:\n---\n16.\nldentificaton Marks')

In [15]:
chunks[3]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/4159216.pdf', 'page': 0}, page_content='of Discharge\n:\n---\n16.\nldentificaton Marks\n(As per SSC / Equivalent certificate)\n:\nA MOLE ON THE LEFT SIDE OF \nTHE CHEST\n:\nA MOLE ON THE RIGHT SIDE OF \nTHE RIBBS\n17.\nMobile No.\n:\n8688655324\n12.\n18.\nE-mail id\n:\nbtgsuryacomputers@gmail.c\nom\n19.\n(a) Do you have Aadhaar Card ?\n:\nYes')

In [16]:
chunks[4]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/4159216.pdf', 'page': 0}, page_content='om\n19.\n(a) Do you have Aadhaar Card ?\n:\nYes\n(b) lf yes, please mention number\n:\n673915750575\n20.\nWhat is your preferred medium\nof Examination ?\n:\nTelugu\n21.\nWhich is your prefered place for \nwriting Preliminary written test ?\n(SLPRB, AP reserves the Right to \ninclude New Test Centres, if')

## Embeddings

In [17]:
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

  embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
  from tqdm.autonotebook import tqdm, trange


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/94.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/777 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

## Vector Store - FAISS or ChromaDB

In [18]:
vectorstore = Chroma.from_documents(chunks, embeddings)

In [19]:
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x7be1ddecc5b0>

In [20]:
query = "what is network" # what is at risk of heart disease
search = vectorstore.similarity_search(query)

In [21]:
to_markdown(search[0].page_content)

> A network is a group of two or more  computer systems or other devices 
> that are linked together to exchange data . In networks, computing devices 
> exchange data with each other using data links between nodes. These data links

## Retriever

In [22]:
retriever = vectorstore.as_retriever(
    search_kwargs={'k': 5}
)

In [23]:
retriever.get_relevant_documents(query)

  retriever.get_relevant_documents(query)


[Document(metadata={'page': 1, 'source': '/content/drive/MyDrive/Pdf/0.Network-Basics-Theory-1.pdf'}, page_content='A network is a group of two or more  computer systems or other devices \nthat are linked together to exchange data . In networks, computing devices \nexchange data with each other using data links between nodes. These data links'),
 Document(metadata={'page': 1, 'source': '/content/drive/MyDrive/Pdf/0.Network-Basics-Theory-1.pdf'}, page_content='Wide area network (WAN)  \nA WAN  is a computer network that covers a large geographic area such \nas a city, country, or spans even inter continental distances. A WAN uses a \ncommunications channel that combines many types of media such as telephone \nlines,  ethernet cables, optical fibers , etc.'),
 Document(metadata={'page': 1, 'source': '/content/drive/MyDrive/Pdf/0.Network-Basics-Theory-1.pdf'}, page_content='communication hub for users to connect all their wireless device.  \n \nTypes of Networks  \n \nLocal area network (

## Large Language Model - Open Source

In [24]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [27]:
llm = LlamaCpp(
    model_path= "/content/drive/MyDrive/Pdf/Bio/BioMistral-7B.Q4_K_M.gguf",
    temperature=0.3,
    max_tokens=2048,
    top_p=1)

llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/drive/MyDrive/Pdf/Bio/BioMistral-7B.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attent

## RAG Chain

In [28]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate

In [29]:
template = """
<|context|>
You are an AI assistant that follows instruction extremely well.
Please be truthful and give direct answers
</s>
<|user|>
{query}
</s>
 <|assistant|>
"""

In [30]:
prompt = ChatPromptTemplate.from_template(template)

In [31]:
rag_chain = (
    {"context": retriever,  "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [32]:
response = rag_chain.invoke("what disease affect the heart?")


llama_print_timings:        load time =    5043.90 ms
llama_print_timings:      sample time =      36.71 ms /    66 runs   (    0.56 ms per token,  1797.78 tokens per second)
llama_print_timings: prompt eval time =   27075.40 ms /    58 tokens (  466.82 ms per token,     2.14 tokens per second)
llama_print_timings:        eval time =   43117.69 ms /    65 runs   (  663.35 ms per token,     1.51 tokens per second)
llama_print_timings:       total time =   70286.95 ms /   123 tokens


In [33]:
to_markdown(response)

> Hypertrophic cardiomyopathy, dilated cardiomyopathy, myocarditis, aortic stenosis, hypertensive heart disease, restrictive cardiomyopathy, arrhythmogenic right ventricular cardiomyopathy, and endocarditis.

In [None]:
import sys

while True:
  user_input = input(f"Input Prompt: ")
  if user_input == 'exit':
    print('Exiting')
    sys.exit()
  if user_input == '':
    continue
  result = rag_chain.invoke(user_input)
  print("Answer: ",result)

Input Prompt: what is python3


Llama.generate: 42 prefix-match hit, remaining 14 prompt tokens to eval

llama_print_timings:        load time =    5043.90 ms
llama_print_timings:      sample time =      41.05 ms /    71 runs   (    0.58 ms per token,  1729.64 tokens per second)
llama_print_timings: prompt eval time =    5388.88 ms /    14 tokens (  384.92 ms per token,     2.60 tokens per second)
llama_print_timings:        eval time =   46104.13 ms /    70 runs   (  658.63 ms per token,     1.52 tokens per second)
llama_print_timings:       total time =   51600.04 ms /    84 tokens


Answer:   Python3 is a general-purpose programming language with dynamic semantics. It emphasizes code readability through the use of expressive syntax and a uniform syntax style, which results in more concise, clear, and easily readable code. It also has a large standard library and supports third-party libraries through the package management system pip.
Input Prompt: who are you


Llama.generate: 41 prefix-match hit, remaining 14 prompt tokens to eval

llama_print_timings:        load time =    5043.90 ms
llama_print_timings:      sample time =      32.93 ms /    58 runs   (    0.57 ms per token,  1761.20 tokens per second)
llama_print_timings: prompt eval time =    5345.21 ms /    14 tokens (  381.80 ms per token,     2.62 tokens per second)
llama_print_timings:        eval time =   37958.74 ms /    57 runs   (  665.94 ms per token,     1.50 tokens per second)
llama_print_timings:       total time =   43384.56 ms /    71 tokens


Answer:  I am a large language model trained by OpenAI. I can understand text and generate text based on the input text. I have been trained on a wide variety of data, including text from websites, books, and other sources. I can answer questions and provide information on many topics.
Input Prompt: who is ramcharantej


Llama.generate: 42 prefix-match hit, remaining 16 prompt tokens to eval

llama_print_timings:        load time =    5043.90 ms
llama_print_timings:      sample time =      57.98 ms /    98 runs   (    0.59 ms per token,  1690.24 tokens per second)
llama_print_timings: prompt eval time =    6096.92 ms /    16 tokens (  381.06 ms per token,     2.62 tokens per second)
llama_print_timings:        eval time =   64576.53 ms /    97 runs   (  665.74 ms per token,     1.50 tokens per second)
llama_print_timings:       total time =   70820.28 ms /   113 tokens


Answer:  Ram Charan Tej is a Nepali politician who has been the President of the Rashtriya Prajnatshan Party since its formation in 2017. He was elected as the President of the Federal People's Congress (FPC) from 2018 to 2023. He served as the Deputy Prime Minister and Home Minister of Nepal from May 2018 to August 2020.
Input Prompt: show me india time 


Llama.generate: 41 prefix-match hit, remaining 17 prompt tokens to eval

llama_print_timings:        load time =    5043.90 ms
llama_print_timings:      sample time =      33.33 ms /    58 runs   (    0.57 ms per token,  1739.97 tokens per second)
llama_print_timings: prompt eval time =    6327.10 ms /    16 tokens (  395.44 ms per token,     2.53 tokens per second)
llama_print_timings:        eval time =   38671.49 ms /    58 runs   (  666.75 ms per token,     1.50 tokens per second)
llama_print_timings:       total time =   45082.09 ms /    74 tokens


Answer:  The current time in New Delhi, India is 10:32 AM. The weather in New Delhi at this time is mostly clear with some clouds. The temperature is 28°C and humidity is 64%. Wind speed is 5 km/hr.
