<a href="https://colab.research.google.com/github/sunnamsriram1/-Apps/blob/main/sriramBioMistral_chatbot_ipynb_%E0%B0%95%E0%B0%BE%E0%B0%AA%E0%B1%80.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build BioMistral Medical RAG Chatbot using BioMistral Open Source LLM

In the notebook we will build a Medical Chatbot with BioMistral LLM and Heart Health pdf file.

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
    ]
    pipe = pipeline("text-generation", model="mattshumer/Reflection-Llama-3.1-70B")
    pipe(messages)

## Installation

In [None]:
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf



## Import libraries

In [None]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import CharacterTextSplitter,RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS, Chroma
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain

In [None]:
import pathlib
import textwrap
from IPython.display import display
from IPython.display import Markdown



def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

In [None]:
# Used to securely store your API key
from google.colab import userdata

## Setup HuggingFace Access Token

- Log in to [HuggingFace.co](https://huggingface.co/)
- Click on your profile icon at the top-right corner, then choose [“Settings.”](https://huggingface.co/settings/)
- In the left sidebar, navigate to [“Access Token”](https://huggingface.co/settings/tokens)
- Generate a new access token, assigning it the “write” role.


In [None]:
# Or use `os.getenv('HUGGINGFACEHUB_API_TOKEN')` to fetch an environment variable.
import os
from getpass import getpass

HUGGINGFACEHUB_API_TOKEN = userdata.get("HUGGINGFACEHUB_API_TOKEN")
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "HUGGINGFACEHUB_API_TOKEN"

## Import document

In [None]:
#connect to google drive
from google.colab import drive
drive.mount('/content/drive') #Fixed typo in directory name

Mounted at /content/drive


In [None]:
loader = PyPDFDirectoryLoader("/content/drive/MyDrive/Pdf")
docs = loader.load()



In [None]:
docs

[Document(metadata={'source': '/content/drive/MyDrive/Pdf/1-b5cd0e66-cb91-4188-9d19-821a15c5f68c.pdf', 'page': 0}, page_content='Sunnam Seetharam\nIntroduction to Arti\x00cial IntelligenceThe certi\x00cate is awarded to\nfor successfully completing the course\non September 14, 2023\nIssued on: Friday, September 15, 2023\nTo verify, scan the QR code at https://verify.onwingspan.com'),
 Document(metadata={'source': '/content/drive/MyDrive/Pdf/1-b5cd0e66-cb91-4188-9d19-821a15c5f68c (2).pdf', 'page': 0}, page_content='Sunnam Seetharam\nIntroduction to Arti\x00cial IntelligenceThe certi\x00cate is awarded to\nfor successfully completing the course\non September 14, 2023\nIssued on: Friday, September 15, 2023\nTo verify, scan the QR code at https://verify.onwingspan.com'),
 Document(metadata={'source': '/content/drive/MyDrive/Pdf/ISEA Digital Certificate (6).pdf', 'page': 0}, page_content='SunnamSeetharamMeitY/ISEA/WCHP/031912 Certificate No :\n \n19-09-2023 Issued Date :'),
 Document(metada

## Text Splitting - Chunking

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(docs)

In [None]:
len(chunks)

4824

In [None]:
chunks[0]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/1-b5cd0e66-cb91-4188-9d19-821a15c5f68c.pdf', 'page': 0}, page_content='Sunnam Seetharam\nIntroduction to Arti\x00cial IntelligenceThe certi\x00cate is awarded to\nfor successfully completing the course\non September 14, 2023\nIssued on: Friday, September 15, 2023\nTo verify, scan the QR code at https://verify.onwingspan.com')

In [None]:
chunks[1]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/1-b5cd0e66-cb91-4188-9d19-821a15c5f68c (2).pdf', 'page': 0}, page_content='Sunnam Seetharam\nIntroduction to Arti\x00cial IntelligenceThe certi\x00cate is awarded to\nfor successfully completing the course\non September 14, 2023\nIssued on: Friday, September 15, 2023\nTo verify, scan the QR code at https://verify.onwingspan.com')

In [None]:
chunks[2]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/ISEA Digital Certificate (6).pdf', 'page': 0}, page_content='SunnamSeetharamMeitY/ISEA/WCHP/031912 Certificate No :\n \n19-09-2023 Issued Date :')

In [None]:
chunks[3]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/ISEA Digital Certificate (7).pdf', 'page': 0}, page_content='SunnamSeetharamISEA/NCSAM/PWDSEC/50984\nPassword Security 80%')

In [None]:
chunks[4]

Document(metadata={'source': '/content/drive/MyDrive/Pdf/ISEA Digital Certificate (9).pdf', 'page': 0}, page_content='SunnamSeetharamISEA/NCSAM/FABSEC/35161\nFacebook Security 68%')

In [None]:
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")

  embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
  from tqdm.autonotebook import tqdm, trange


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/94.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/777 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

## Embeddings

## Vector Store - FAISS or ChromaDB

In [24]:
vectorstore = Chroma.from_documents(chunks, embeddings)

In [25]:
vectorstore

<langchain_community.vectorstores.chroma.Chroma at 0x7af62a25ae00>

In [26]:
query = "SunnamSriram" # what is at risk of heart disease
search = vectorstore.similarity_search(query)

In [27]:
to_markdown(search[0].page_content)

> SunnamSeetharamISEA/NCSAM/CYBPAT/14398 Certificate No :
>  
> 20-09-2023 Issued Date :

## Retriever

In [28]:
retriever = vectorstore.as_retriever(
    search_kwargs={'k': 5}
)

In [29]:
retriever.get_relevant_documents(query)

  retriever.get_relevant_documents(query)


[Document(metadata={'page': 0, 'source': '/content/drive/MyDrive/Pdf/ISEA Digital Certificate (14).pdf'}, page_content='SunnamSeetharamISEA/NCSAM/CYBPAT/14398 Certificate No :\n \n20-09-2023 Issued Date :'),
 Document(metadata={'page': 0, 'source': '/content/drive/MyDrive/Pdf/4159216 (2).pdf'}, page_content="1.\nName (As per SSC / Equivalent certificate)\n:\nSUNNAM SEETHARAM\n2.\nFather's/ Husband's Name\n:\nS VEERRAJU\nPreserve the REGISTRATION NUMBER \nfor all your future correspondence.\n3.\nMother's Name\n:\nS SUBHADRA\n4.\nGender\n:\nMale\n5.\nDate of Birth\n(As per SSC/Equivalent certificate)\n:\n16-08-1996\n4159216\n6."),
 Document(metadata={'page': 0, 'source': '/content/drive/MyDrive/Pdf/4159216 (1) (1).pdf'}, page_content="1.\nName (As per SSC / Equivalent certificate)\n:\nSUNNAM SEETHARAM\n2.\nFather's/ Husband's Name\n:\nS VEERRAJU\nPreserve the REGISTRATION NUMBER \nfor all your future correspondence.\n3.\nMother's Name\n:\nS SUBHADRA\n4.\nGender\n:\nMale\n5.\nDate of Birt

## Large Language Model - Open Source

In [30]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [31]:
llm = LlamaCpp(
    model_path= "/content/drive/MyDrive/Pdf/Bio/BioMistral-7B.Q4_K_M.gguf",
    temperature=0.3,
    max_tokens=2048,
    top_p=1)

llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/drive/MyDrive/Pdf/Bio/BioMistral-7B.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attent

## RAG Chain

In [32]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate

In [33]:
template = """
<|context|>
You are an AI assistant that follows instruction extremely well.
Please be truthful and give direct answers
</s>
<|user|>
{query}
</s>
 <|assistant|>
"""

In [34]:
prompt = ChatPromptTemplate.from_template(template)

In [35]:
rag_chain = (
    {"context": retriever,  "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
response = rag_chain.invoke("what disease affect the heart?")




In [None]:
to_markdown(response)

In [None]:
import sys

while True:
  user_input = input(f"Input Prompt: ")
  if user_input == 'exit':
    print('Exiting')
    sys.exit()
  if user_input == '':
    continue
  result = rag_chain.invoke(user_input)
  print("Answer: ",result)

Llama.generate: 41 prefix-match hit, remaining 12 prompt tokens to eval

llama_print_timings:        load time =    6511.65 ms
llama_print_timings:      sample time =       6.32 ms /    10 runs   (    0.63 ms per token,  1583.28 tokens per second)
llama_print_timings: prompt eval time =    5146.58 ms /    12 tokens (  428.88 ms per token,     2.33 tokens per second)
llama_print_timings:        eval time =    7031.48 ms /     9 runs   (  781.28 ms per token,     1.28 tokens per second)
llama_print_timings:       total time =   12201.91 ms /    21 tokens


Answer:  Hi! How can I assist you today?


Llama.generate: 41 prefix-match hit, remaining 15 prompt tokens to eval

llama_print_timings:        load time =    6511.65 ms
llama_print_timings:      sample time =      17.05 ms /    25 runs   (    0.68 ms per token,  1465.85 tokens per second)
llama_print_timings: prompt eval time =    7231.32 ms /    15 tokens (  482.09 ms per token,     2.07 tokens per second)
llama_print_timings:        eval time =   18407.99 ms /    24 runs   (  767.00 ms per token,     1.30 tokens per second)
llama_print_timings:       total time =   25685.44 ms /    39 tokens


Answer:  I am just a machine learning model, I don't have feelings or emotions. How can I assist you today?


Llama.generate: 41 prefix-match hit, remaining 16 prompt tokens to eval

llama_print_timings:        load time =    6511.65 ms
llama_print_timings:      sample time =      37.88 ms /    56 runs   (    0.68 ms per token,  1478.16 tokens per second)
llama_print_timings: prompt eval time =    9439.89 ms /    16 tokens (  589.99 ms per token,     1.69 tokens per second)
llama_print_timings:        eval time =   40740.17 ms /    55 runs   (  740.73 ms per token,     1.35 tokens per second)
llama_print_timings:       total time =   50279.62 ms /    71 tokens


Answer:  I am following the instructions given to me by the user. I am an open-ended language model, so I can answer any question that a human might ask me. I am programmed to be as helpful as possible and provide accurate answers to the best of my ability.


Llama.generate: 41 prefix-match hit, remaining 17 prompt tokens to eval

llama_print_timings:        load time =    6511.65 ms
llama_print_timings:      sample time =      51.32 ms /    61 runs   (    0.84 ms per token,  1188.57 tokens per second)
llama_print_timings: prompt eval time =    9405.88 ms /    16 tokens (  587.87 ms per token,     1.70 tokens per second)
llama_print_timings:        eval time =   58046.00 ms /    61 runs   (  951.57 ms per token,     1.05 tokens per second)
llama_print_timings:       total time =   67574.79 ms /    77 tokens


Answer:  Yes, I am familiar with Telugu. It is a language spoken in the states of Andhra Pradesh, Telangana, Karnataka, Maharashtra and Madhya Pradesh in India. Is there something specific you would like to know about Telugu?
Input Prompt: Tell me my name


Llama.generate: 41 prefix-match hit, remaining 15 prompt tokens to eval

llama_print_timings:        load time =    6511.65 ms
llama_print_timings:      sample time =      10.80 ms /    17 runs   (    0.64 ms per token,  1573.78 tokens per second)
llama_print_timings: prompt eval time =   12113.46 ms /    15 tokens (  807.56 ms per token,     1.24 tokens per second)
llama_print_timings:        eval time =   11685.09 ms /    16 runs   (  730.32 ms per token,     1.37 tokens per second)
llama_print_timings:       total time =   23828.35 ms /    31 tokens


Answer:  I am unable to find your name. Can you please tell me your name?
