# Project Summary: GaleMed Insights – A RAG-Based Medical Chatbot Using Pinecone and LLMs

---
## Project Overview
GaleMed Insights is a medical chatbot built on a Retrieval-Augmented Generation (RAG) framework. It draws on the medical knowledge in The Gale Encyclopedia of Medicine, which spans more than 637 pages. By combining PDF processing, semantic embeddings, and a vector database, GaleMed Insights aims to give accurate, detailed, and user-friendly responses to medical inquiries, making it an accessible resource for people seeking trustworthy health information.

With its foundation in The Gale Encyclopedia of Medicine, GaleMed Insights ensures users can access well-researched, vetted content on a wide range of medical topics, from conditions and treatments to best healthcare practices. The chatbot empowers users to easily navigate complex medical concepts and obtain clear, relevant answers.

---

## Why Use RAG-Based Methodology Instead of Just LLMs?
Using a Retrieval-Augmented Generation (RAG) methodology offers significant advantages over traditional large language models (LLMs) alone:

#### Higher Accuracy and Reliability:
GaleMed Insights retrieves passages directly from The Gale Encyclopedia of Medicine and grounds its responses in them, which keeps answers relevant and tied to a vetted source. This reduces the risk of generating incorrect information (known as "hallucinations"), a common issue when LLMs are used on their own.

#### Targeted Knowledge Base:
Instead of relying on broad datasets, GaleMed Insights focuses on a specific medical knowledge base, ensuring that users receive expert, specialized responses. This contrasts with general LLMs, which may provide superficial or irrelevant answers to health-related queries.

#### Contextual Relevance:
Through chunking and semantic embedding, GaleMed Insights retrieves the most relevant information while maintaining the necessary context, which leads to clearer and more coherent responses tailored to the user's needs.

#### Efficient Memory Management:
Dividing the text into chunks keeps each prompt within the LLM's token limit: only the few chunks relevant to a query are passed to the model, so the full encyclopedia never has to fit into a single prompt and important details are not cut off.

#### Improved User Experience:
With precise, well-formatted responses, GaleMed Insights enhances the readability and accessibility of complex medical information, making it easier for users to understand and act on the information provided.

---
## Why This Project?
The GaleMed Insights project is designed to address the growing need for reliable, accurate, and accessible medical information. In a world where misinformation spreads easily, especially on health-related topics, GaleMed Insights offers a trusted source of knowledge from The Gale Encyclopedia of Medicine. Key reasons for developing this project include:

#### Improving Health Literacy:
GaleMed Insights empowers individuals to make informed decisions about their health by providing clear and factual medical information.

#### Supporting Healthcare Professionals:
The chatbot serves as a quick reference tool for healthcare providers, giving them rapid access to accurate medical information to assist in patient care.

#### Easy Knowledge Base Updates:
As medical knowledge evolves, GaleMed Insights can easily be updated with new data, ensuring the chatbot remains current and provides the latest medical insights.

---
## Steps: 
### Document Reading with PyPDF:
The project uses LangChain's `PyPDFLoader` (built on the pypdf library), driven by a `DirectoryLoader`, to read and extract text from The Gale Encyclopedia of Medicine. The loader returns the PDF as a list of document objects, one per page, which gives the later steps a structured input to work with.

### Data Chunking:
After extracting the data, it is crucial to divide the information into manageable chunks. Chunking helps to:

- Improve retrieval efficiency by enabling quick access to relevant sections.

- Enhance context provided in responses, allowing the model to generate more meaningful answers.

- Mitigate issues related to the maximum token limits of language models by ensuring that only concise and relevant information is processed at any given time.
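
The effect of chunk size and overlap can be seen in a minimal, library-free sketch. It is a simplification: the notebook uses LangChain's `RecursiveCharacterTextSplitter`, which prefers to break at natural separators such as paragraph and line breaks, while the hypothetical `chunk_text` helper below simply cuts at fixed character offsets. The sample text is a placeholder.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Placeholder text of 1200 characters (not taken from the encyclopedia).
sample = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(sample, chunk_size=500, chunk_overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the last 20 characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still partly visible in both chunks.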

### Creating Semantic Embeddings:

To support meaningful queries and responses, we use LangChain's `HuggingFaceEmbeddings` wrapper with the `sentence-transformers/all-MiniLM-L6-v2` model. The model maps each piece of text to a 384-dimensional vector (a semantic embedding) that captures its meaning beyond surface-level words. Because a user query is embedded the same way, GaleMed can match a question to the chunks that are closest in meaning, even when they share few exact words.

### Building a Vector Database with Pinecone:
The next step involves creating a vector database using Pinecone. We load the chunked and embedded data into Pinecone, establishing a knowledge base that can efficiently manage user queries by identifying similar vectors and retrieving pertinent information.
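
Conceptually, "identifying similar vectors" means ranking stored vectors by their similarity to the query vector. The sketch below does this with a plain linear scan and cosine similarity (the metric the project's index is configured with). It is only an illustration: the three-dimensional vectors and their ids are made up, and a managed vector database would typically use an index structure rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=3):
    """Return the k (id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 3-dimensional vectors standing in for 384-dimensional embeddings.
toy_store = {
    "allergies": [0.9, 0.1, 0.0],
    "acne": [0.1, 0.9, 0.0],
    "migraine": [0.0, 0.2, 0.9],
}
results = top_k([0.8, 0.2, 0.1], toy_store, k=2)
print(results)
```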

### Testing Queries:
Once the knowledge base is established, we run test queries against it to check that the retriever returns accurate, relevant passages. This step validates the retrieval half of the RAG approach before an LLM is added on top.

### Utilizing LLM for Response Formatting:
To improve the clarity and readability of the information returned to users, we use a LLaMA-architecture model (`llama-2-7b-chat.ggmlv3.q4_0.bin`, a quantized GGML file) loaded locally through the CTransformers library. This offline model takes the retrieved passages and formats them into coherent responses, making complex medical information accessible and understandable.

### Flask Application Development:
Finally, we aim to develop a Flask web application that serves as the interface for GaleMed. This application will allow users to interact with the chatbot seamlessly, posing questions and receiving comprehensive answers based on the medical encyclopedia.

---
## Real-World Use Cases of RAG Methodology in Other Fields
This RAG-based approach is not only useful in healthcare but also has real-world applications in various fields, such as:

#### Legal Industry:
Legal research chatbots can use RAG to pull up relevant laws, precedents, and case summaries from specific legal databases, providing accurate and contextual legal advice.

#### Customer Support:
Companies can use RAG to power customer service bots that retrieve relevant product information and troubleshooting steps from their internal knowledge bases, enhancing customer experience with quicker and more accurate responses.

#### Education:
RAG-based educational bots can assist students by providing targeted explanations and detailed answers based on textbooks, enhancing their learning experiences.

#### Technical Documentation:
Tech companies can use RAG to help engineers and developers quickly retrieve information from large technical manuals or knowledge bases to solve problems efficiently.

---

With a RAG framework like the one behind GaleMed Insights, users get precise, relevant information drawn from a trusted source, which improves the accuracy and reliability of responses across many industries.

Data source: https://www.academia.edu/32752835/The_GALE_ENCYCLOPEDIA_of_MEDICINE_SECOND_EDITION

In [2]:
print("OK!")

OK!


In [3]:
import os
import time
from pinecone import Pinecone, ServerlessSpec
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone as LangchainPinecone  # alias avoids a clash with pinecone.Pinecone
from langchain.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.llms import CTransformers




In [4]:
# from pinecone import Pinecone

pc = Pinecone(api_key="Your-Pine-Cone-Key")
pc.list_indexes()

{'indexes': [{'deletion_protection': 'disabled',
              'dimension': 384,
              'host': 'medical-vector-3fsvey1.svc.aped-4627-b74a.pinecone.io',
              'metric': 'cosine',
              'name': 'medical-vector',
              'spec': {'serverless': {'cloud': 'aws', 'region': 'us-east-1'}},
              'status': {'ready': True, 'state': 'Ready'}},
             {'deletion_protection': 'disabled',
              'dimension': 384,
              'host': 'abcd-3fsvey1.svc.aped-4627-b74a.pinecone.io',
              'metric': 'cosine',
              'name': 'abcd',
              'spec': {'serverless': {'cloud': 'aws', 'region': 'us-east-1'}},
              'status': {'ready': True, 'state': 'Ready'}}]}

### The indexes we created are present in Pinecone

In [9]:
#Extract data from the PDF
def load_pdf(data):
    loader = DirectoryLoader(data,
                    glob="*.pdf",
                    loader_cls=PyPDFLoader)
    
    documents = loader.load()

    return documents

In [10]:
extracted_data = load_pdf("data/")

In [5]:
#extracted_data

In [6]:
#Create text chunks
def text_split(extracted_data):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 20)
    text_chunks = text_splitter.split_documents(extracted_data)

    return text_chunks

In [7]:
text_chunks = text_split(extracted_data)
print("length of my chunk:", len(text_chunks))

length of my chunk: 7020


In [8]:
# text_chunks

In [14]:
#download embedding model
def download_hugging_face_embeddings():
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    return embeddings

In [15]:
embeddings = download_hugging_face_embeddings()

Error while downloading from https://cdn-lfs.hf.co/sentence-transformers/all-MiniLM-L6-v2/... (signed download URL truncated)

In [16]:
embeddings

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='sentence-transformers/all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)

In [17]:
query_result = embeddings.embed_query("Hello world")
print("Length", len(query_result))

Length 384
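
The embedding length matters because it has to equal the `dimension` of the Pinecone index (384 in the listing printed earlier). A small check like the following can catch a mismatch before any vectors are written. This is a sketch under assumptions: `check_dimension` is a hypothetical helper, and it works on a plain dict shaped like the printed listing; converting the client's return value into that shape is left out.

```python
def check_dimension(index_listing: dict, index_name: str, embedding_dim: int) -> bool:
    """Return True if the named index exists and its dimension equals embedding_dim."""
    for entry in index_listing.get("indexes", []):
        if entry["name"] == index_name:
            return entry["dimension"] == embedding_dim
    raise KeyError(f"index {index_name!r} not found")


# Trimmed copy of the listing printed earlier in this notebook.
listing = {
    "indexes": [
        {"name": "medical-vector", "dimension": 384, "metric": "cosine"},
        {"name": "abcd", "dimension": 384, "metric": "cosine"},
    ]
}
print(check_dimension(listing, "medical-vector", embedding_dim=384))
```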


In [13]:
# query_result

### Writing the Data into Pinecone

In [5]:
#Initializing index name and the Pinecone

os.environ["PINECONE_API_KEY"] = "Your-Pinecone-API-Key"

index_name="medical-vector"

# Initialize the Pinecone client (all keyword arguments except api_key are optional)
try:
    pc = Pinecone(
        api_key=os.environ.get("PINECONE_API_KEY"),
        proxy_url=None,      # optional: proxy URL, if one is needed
        proxy_headers=None,  # optional: extra headers for the proxy
        ssl_ca_certs=None,   # optional: path to a custom CA bundle
        ssl_verify=True,     # optional: keep TLS verification on
    )

    # Check whether the index exists
    index_names = pc.list_indexes().names()  # names of all indexes in the project

    if index_name not in index_names:
        print(f'{index_name} does not exist')
        # Optionally create it; a serverless index also needs a spec:
        # pc.create_index(name=index_name, dimension=384, metric='cosine',
        #                 spec=ServerlessSpec(cloud='aws', region='us-east-1'))
    else:
        print(f'{index_name} exists.')
        # Connect only when the index is known to exist
        index = pc.Index(index_name)

except Exception as e:
    print(f"An error occurred while checking indexes: {e}")

# Embed the text chunks and store them in Pinecone (left commented out here; run once to populate the index)
#try:
#    docsearch = LangchainPinecone.from_texts(
#        texts=[t.page_content for t in text_chunks],  # Assuming `text_chunks` is a list of text splits
#        embedding=embeddings,  # Embedding model instance
#        index_name=index_name
#    )
#except Exception as e:
#    print(f"An error occurred while creating embeddings: {e}")

medical-vector exists.


### Testing it out

In [19]:
index_name="abcdef"  # must be the name of an existing, populated index
# Load an existing index instead of re-embedding the chunks
docsearch=LangchainPinecone.from_existing_index(index_name, embeddings)

query = "What are Allergies"

docs=docsearch.similarity_search(query, k=3)

print("Result", docs)

Result [Document(metadata={}, page_content="GALE ENCYCLOPEDIA OF MEDICINE 2 117Allergies\nAllergic rhinitis is commonly triggered by\nexposure to household dust, animal fur,or pollen. The foreign substance thattriggers an allergic reaction is calledan allergen.\nThe presence of an allergen causes the\nbody's lymphocytes to begin producingIgE antibodies. The lymphocytes of an allergy sufferer produce an unusuallylarge amount of IgE.\nIgE molecules attach to mast\ncells, which contain histamine.HistaminePollen grains\nLymphocyte\nFIRST EXPOSURE"), Document(metadata={}, page_content='allergens are the following:\n• plant pollens\n• animal fur and dander\n• body parts from house mites (microscopic creatures\nfound in all houses)\n• house dust• mold spores• cigarette smoke• solvents• cleaners\nCommon food allergens include the following:\n• nuts, especially peanuts, walnuts, and brazil nuts\n• fish, mollusks, and shellfish• eggs• wheat• milk• food additives and preservatives\nThe following 

In [49]:
query = "Cure for acne?"

docs=docsearch.similarity_search(query, k=3)

print("Result", docs)

Result [Document(metadata={}, page_content='Acne cannot be cured, but acne drugs can help clear\nthe skin. Benzoyl peroxide and tretinoin work by mildlyirritating the skin. This encourages skin cells to sloughoff, which helps open blocked pores. Benzoyl peroxidealso kills bacteria, which helps prevent whiteheads andblackheads from turning into pimples. Isotretinoinshrinks the glands that produce sebum.Description\nBenzoyl peroxide is found in many over-the-counter'), Document(metadata={}, page_content='isotretinoin. It can be controlled by proper treatment,\nwith improvement taking two or more months. Acnetends to reappear when treatment stops, but spontaneous-ly improves over time. Inflammatory acne may leavescars that require further treatment.\nPrevention\nThere are no sure ways to prevent acne, but the fol-\nlowing steps may be taken to minimize flare-ups:\n• gentle washing of affected areas once or twice every day\n• avoid abrasive cleansers\n• use noncomedogenic makeup and moistu

In [25]:
query = "I have pain in my head"

docs=docsearch.similarity_search(query, k=3)

print("Result", docs)

Result [Document(metadata={}, page_content='Purpose\nMigraine headaches usually cause a throbbing pain\non one side of the head. Nausea, vomiting, dizziness ,'), Document(metadata={}, page_content='no evidence that head injury causes brain tumors, but\nresearchers are trying to determine the relationship, ifany, between brain tumors and viruses, family history,and long-term exposure to electromagnetic fields.\nSymptoms do not usually appear until the tumor\ngrows large enough to displace, damage, or destroy deli-cate brain tissue. When that happens, the patient mayexperience:\n• headaches that become increasingly painful and are\nmost painful when lying down'), Document(metadata={}, page_content='“GB20” points at the base of the skull in the back of thehead, just behind the bones in back of the ears. Dispersethese points for two minutes with the fingers or thumbs.Also find the “yintang” point, which is in the middle ofthe forehead between the eyebrows. Disperse it withgentle pressure f

In [None]:
docs

In [42]:
query = "I am sad all the time"

docs=docsearch.similarity_search(query, k=3)

print("Result", docs)

Result [Document(metadata={}, page_content='feel extremely sad and lose interest in life. Peoplewith depression may also have sleep problemsand loss of appetite and may have trouble concen-trating and carrying out everyday activities.\nPregnancy category —A system of classifying'), Document(metadata={}, page_content='for ADcaregivers to develop feelings of anger, resentment, guilt,and hopelessness, in addition to the sorrow they feel fortheir loved one and for themselves. Depression is anextremely common consequence of being a full-timecaregiver for a person with AD. Support groups are animportant way to deal with the stress of caregiving.'), Document(metadata={}, page_content='• severe depression or risk of suicide\n• family in crisis\nGALE ENCYCLOPEDIA OF MEDICINE 2 212Anorectal disordersGEM - 0001 to 0432 - A  10/22/03 1:42 PM  Page 212')]


In [37]:
query = "What to do if you are sad?"

docs=docsearch.similarity_search(query, k=3)

print("Result", docs)

Result [Document(metadata={}, page_content='• Humbly ask God to remove shortcomings.• Make a list of all persons harmed by your wrongs and\nbecome willing to make amends to them all.\n• Make direct amends to such people, whenever possible\nexcept when to do so would injure them or others.\n• Continue to take personal inventory and promptly\nadmit any future wrongdoings.\n• Seek to improve contact with a God of the individual’s\nunderstanding through meditation and prayer.\n• Carry the message of spiritual awakening to others and'), Document(metadata={}, page_content='for ADcaregivers to develop feelings of anger, resentment, guilt,and hopelessness, in addition to the sorrow they feel fortheir loved one and for themselves. Depression is anextremely common consequence of being a full-timecaregiver for a person with AD. Support groups are animportant way to deal with the stress of caregiving.'), Document(metadata={}, page_content='include desipramine (Norpramin), imipramine (Tofranil),and

In [52]:
query = "Who is superman?"

docs=docsearch.similarity_search(query, k=3)

print("Result", docs)

Result [Document(metadata={}, page_content='Dr. George Goodheart was born in Detroit, Michi-'), Document(metadata={}, page_content='Tasmania, Australia. He became an actor and Shake-spearean reciter, and early in his career he began to sufferfrom strain on his vocal chords. He sought medical atten-tion for chronic hoarseness, but after treatment with arecommended prescription and extensive periods of rest,his problem persisted.\nAlexander realized that his hoarseness began about'), Document(metadata={}, page_content='Judith Aston serves as director of Aston ParadigmCorporation.GEM - 0001 to 0432 - A  10/22/03 1:43 PM  Page 387')]
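
The last query shows that a similarity search always returns its `k` nearest chunks, even when nothing in the encyclopedia matches the question. One common mitigation is to discard hits below a similarity threshold. The sketch below assumes `(text, score)` pairs in which a higher score means more similar, as with a cosine metric; the scores and the 0.5 cutoff are made up for illustration and would need tuning on real queries. In LangChain, `(Document, score)` pairs of this kind can be obtained from a vector store's `similarity_search_with_score` method.

```python
def filter_by_score(hits, min_score=0.5):
    """Keep only hits whose similarity score reaches min_score."""
    return [(text, score) for text, score in hits if score >= min_score]


# Hypothetical scores, not measured on the real index.
on_topic = [("Acne cannot be cured, but acne drugs can help clear the skin.", 0.71)]
off_topic = [("Dr. George Goodheart was born in Detroit", 0.12)]

print(filter_by_score(on_topic))
print(filter_by_score(off_topic))  # empty list: nothing is relevant enough
```

An empty result can then be turned into an "I don't know" reply without calling the LLM at all.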


### Defining Prompt Template and testing the model with retrieval and LLM Text Generation

In [17]:
prompt_template="""
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

In [42]:
PROMPT=PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain_type_kwargs={"prompt": PROMPT}
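
To see what the LLM actually receives, the template can be filled in by hand. This sketch uses plain `str.format` instead of LangChain, and the context string is a shortened excerpt of a chunk retrieved earlier; in the chain itself, the `stuff` chain type places the retrieved chunks' text into `{context}`.

```python
template = """
Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context: {context}
Question: {question}

Only return the helpful answer below and nothing else.
Helpful answer:
"""

# Fill both placeholders, as the chain does for every query.
filled = template.format(
    context="Acne cannot be cured, but acne drugs can help clear the skin.",
    question="Cure for acne?",
)
print(filled)
```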

In [43]:
llm=CTransformers(model="model/llama-2-7b-chat.ggmlv3.q4_0.bin",
                  model_type="llama",
                  config={'max_new_tokens':512,
                          'temperature':0.8})

In [44]:
qa=RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=docsearch.as_retriever(search_kwargs={'k': 2}),
    return_source_documents=True, 
    chain_type_kwargs=chain_type_kwargs)

In [46]:
while True:
    user_input=input("Input Prompt:")
    result=qa({"query": user_input})
    print("Response : ", result["result"])

Input Prompt: what is acne?


Response :  Acne is a common skin disease characterized by pimples on the face, chest, and back. It occurs when the pores of the skin become clogged with oil, dead skin cells, and bacteria.


Input Prompt: exit


KeyboardInterrupt: 

# The error above is just a KeyboardInterrupt, raised when the input loop was stopped manually
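
The loop above has no exit condition, so typing `exit` is sent to the model like any other prompt and the loop can only be stopped by interrupting the kernel. A minimal sketch of a loop that ends cleanly is shown below. `chat_loop` is a hypothetical helper; the `read` and `write` parameters exist only so the sketch can run here with scripted input instead of the real model.

```python
def chat_loop(answer_fn, read=input, write=print):
    """Ask for prompts until the user types 'exit' or 'quit' (or sends EOF)."""
    while True:
        try:
            user_input = read("Input Prompt: ").strip()
        except EOFError:
            break
        if user_input.lower() in {"exit", "quit"}:
            break
        if not user_input:
            continue  # ignore empty lines instead of querying the chain
        write("Response : ", answer_fn(user_input))


# Stand-ins so the sketch runs without the model: a canned answer function
# and a scripted sequence of user inputs.
scripted = iter(["what is acne?", "", "exit"])
outputs = []
chat_loop(
    answer_fn=lambda q: f"(answer to {q!r})",
    read=lambda _prompt: next(scripted),
    write=lambda *parts: outputs.append(" ".join(parts)),
)
print(outputs)
```

With the notebook's chain, `answer_fn` could be `lambda q: qa({"query": q})["result"]`, leaving `read` and `write` at their defaults.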

# Since the offline LLM takes a while to load and respond, we will try the hosted LLM API from Groq instead

## Response for user input query "I have pain in my head"

In [27]:
import os
from groq import Groq

# Ensure API key is set correctly
os.environ["GROQ_API_KEY"] = "Your-Groq-Key"

api_key = os.environ.get("GROQ_API_KEY")
if not api_key:
    print("API Key not found. Please set the GROQ_API_KEY environment variable.")
else:
    client = Groq(api_key=api_key)

    # User Query
    # query # Already assigned

    # Retrieved Documents (Example)
    retrieved_documents = docs

    # Combine query and retrieved results into one input
    combined_input = f"User Query: {query}\nRelevant Information:\n" + "\n".join(doc.page_content for doc in retrieved_documents)

    try:
        # Create the chat completion request with streaming enabled
        completion = client.chat.completions.create(
            model="llama3-groq-70b-8192-tool-use-preview",
            messages=[
                {"role": "user", "content": combined_input}
            ],
            temperature=0.8,
            max_tokens=1024,
            top_p=0.65,
            stream=True,  # Streaming enabled
            stop=None
        )

        # Buffer to collect the streamed response
        full_response = []

        # Handle the streamed chunks
        for chunk in completion:
            # Access the delta content from the chunk
            if hasattr(chunk, 'choices') and chunk.choices:
                delta = chunk.choices[0].delta.content
                if delta:
                    full_response.append(delta)  # Collect the response chunks

        # Join and print the full response once all chunks are received
        print("Full Response: ", ''.join(full_response))

    except Exception as e:
        print(f"An error occurred during the API call: {str(e)}")


Full Response:  Based on the information provided, it seems that your headache could be caused by a variety of factors including a migraine, a brain tumor, or other underlying conditions. It is important to consult a healthcare professional for a proper diagnosis and treatment plan. In the meantime, you may try some acupressure techniques to help alleviate your symptoms. For example, applying gentle pressure to the "GB20" points at the base of the skull or the "yintang" point in the middle of the forehead may provide some relief. However, please note that acupressure is not a substitute for medical care and should be used in conjunction with other treatments as advised by your healthcare provider.


## Response for the user input query "Who is superman?"

In [53]:

if not api_key:
    print("API Key not found. Please set the GROQ_API_KEY environment variable.")
else:
    client = Groq(api_key=api_key)

    # User Query
    # query already assigned

    # Assume 'docs' contains the relevant documents retrieved from your vector database
    retrieved_documents = docs  # Replace this with your actual document retrieval code

    # Prepare the relevant information from the retrieved documents
    relevant_information = "\n".join(doc.page_content for doc in retrieved_documents)

    # Prepare the relevance check input
    relevance_check_input = f"""
    Based on the following user query and relevant information, determine if the information is relevant to the query. 
    If it is relevant, provide a concise and clear answer addressing the user's question; if not, respond with "I'm sorry, I'm not sure about that."

    User Query: {query}
    
    Relevant Information:
    {relevant_information}
    """

    try:
        # Create the chat completion request with streaming enabled
        completion = client.chat.completions.create(
            model="llama3-groq-70b-8192-tool-use-preview",
            messages=[
                {"role": "user", "content": relevance_check_input}
            ],
            temperature=0.8,
            max_tokens=1024,
            top_p=0.65,
            stream=True,  # Streaming enabled
            stop=None
        )

        # Buffer to collect the streamed response
        full_response = []

        # Handle the streamed chunks
        for chunk in completion:
            # Access the delta content from the chunk
            if hasattr(chunk, 'choices') and chunk.choices:
                delta = chunk.choices[0].delta.content
                if delta:
                    full_response.append(delta)  # Collect the response chunks

        # Join and print the full response once all chunks are received
        print("Full Response: ", ''.join(full_response))

    except Exception as e:
        print(f"An error occurred during the API call: {str(e)}")

Full Response:  I'm sorry, I'm not sure about that.
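
Both Groq cells repeat the same chunk-handling loop, which could be factored into one helper. The sketch below assumes only what those cells already rely on: each streamed chunk exposes `choices[0].delta.content`, which may be empty. `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks stand in for a real API response so the example runs offline.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        choices = getattr(chunk, "choices", None)
        if not choices:
            continue  # skip chunks that carry no choices
        delta = choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build an object shaped like the chunks the cells above iterate over."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


stream = [
    fake_chunk("I'm sorry, "),
    fake_chunk(None),               # a chunk with no text
    SimpleNamespace(choices=[]),    # a chunk with no choices
    fake_chunk("I'm not sure about that."),
]
print(collect_stream(stream))
```

In the cells above, the loop and the final `print` could then be replaced by `print("Full Response: ", collect_stream(completion))`.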


### Looks like we are good to go and proceed with the next steps

### If needed, we can use RAGA to collect user feedback on answers and train the model further, but for now the results look good, so we will proceed to the next step

### The next step is to put this model into a Flask app with modular code and pipelines