# Problem Statement

- A local healthcare company has published multiple articles containing healthcare facts, information, and tips. It wishes to create a conversational chatbot that can address readers' concerns in natural language, within the healthcare context, using information from the trusted articles.
- The conversational chatbot should answer readers' queries using only the information from the published articles. Where appropriate, it should adopt an empathetic and understanding tone.

# Solution

In [137]:
!pip install openai langchain langchain-community langchain-openai langchain-chroma ragas grandalf



In [138]:
import os

os.environ["OPENAI_API_KEY"] = ""  # set your OpenAI API key here

Retrieve PDF and HTML docs

In [139]:
import pandas as pd

docs_df = pd.read_pickle("/content/docs_13_pdf_htmls_df.pkl")

display(docs_df.head(10))
display(pd.DataFrame(docs_df["urls"].value_counts()))

Unnamed: 0,text,source,urls,metadata
0,SUPPORT\n\n1\n\nSupporting individuals with di...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
1,What is positive social support?\n\nWhat is po...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
2,Learn About Diabetes\n\nKnow When to Step Back...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
3,Here are some key areas of diabetes management...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
4,2. Be a good listener\n\n2. Be a good listener...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
5,Here are some helpful questions you can ask wh...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
6,• Would you like me to take you or accompany y...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
7,4. Do it together\n\n4. Do it together\n\n• A ...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
8,"• For example, stop buying unhealthy snacks fo...",12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...
9,• Your role is not to be the food police but t...,12_pdf_57c95e67895840888240d7322f00117c?v=f00c...,https://ch-api.healthhub.sg/api/public/content...,{'source': 'https://ch-api.healthhub.sg/api/pu...


urls,count
https://ch-api.healthhub.sg/api/public/content/70e56617c5b140139bb5cb37db3cdef0?v=a7dd48f7&3_gl=1*11zfakr*_ga*MTk3Mjg0Mjc5Ni4xNzA3Nzg5OTUw*_ga_VQW1KL2RMR*MTcxMDc1MzAxMS40LjEuMTcxMDc1NDQzMC42MC4wLjA.,208
https://www.ace-hta.gov.sg/docs/default-source/acgs/acg-t2dm-personalising-medications.pdf,114
https://ch-api.healthhub.sg/api/public/content/0aa3838b8eeb40e6b34a2340a65fdd03?v=eca50be0&_gl=1*1tq2sup*_ga*MTk3Mjg0Mjc5Ni4xNzA3Nzg5OTUw*_ga_VQW1KL2RMR*MTcxMDc1MzAxMS40LjEuMTcxMDc1NDQzMC42MC4wLjA.,85
https://ch-api.healthhub.sg/api/public/content/57c95e67895840888240d7322f00117c?v=f00ca9b4&_gl=1*1tq2sup*_ga*MTk3Mjg0Mjc5Ni4xNzA3Nzg5OTUw,45
https://ch-api.healthhub.sg/api/public/content/57349b96ccfe47319fb49d902a064022?v=6a601253&_gl=1*ok09oc*_ga*MTk3Mjg0Mjc5Ni4xNzA3Nzg5OTUw*_ga_VQW1KL2RMR*MTcxMDc1MzAxMS40LjEuMTcxMDc1NDQxNC4xMi4wLjA.,40
https://www.ace-hta.gov.sg/docs/default-source/acgs/gdm---an-update-on-screening-diagnosis-and-follow-up-(may-2018).pdf,35
https://www.ace-hta.gov.sg/docs/default-source/default-library/managing-pre-diabetes-(updated-on-27-jul-2021)c2bfc77474154c2abf623156a4b93002.pdf,35
https://www.healthxchange.sg/diabetes/living-well-diabetes/diabetes-recommended-vaccinations-children-adults,28
https://www.healthhub.sg/a-z/diseases-and-conditions/diabetes-treatment-insulin,20
https://www.healthhub.sg/a-z/diseases-and-conditions/diabetes-treatment-capsules--tablets,14


In [140]:
from langchain_core.documents import Document

# Wrap each row's text and metadata in a LangChain Document
docs = []

for _, row in docs_df.iterrows():
  doc = Document(page_content=row["text"], metadata=row["metadata"])
  docs.append(doc)

Create RAG pipeline using Chroma vector store and LangChain

In [141]:
# vectordb.delete_collection()  # uncomment to clear the existing collection before rebuilding it

In [142]:
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.runnables import RunnableParallel
from langchain_openai import OpenAIEmbeddings
from langchain_openai import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

llm = ChatOpenAI(model="gpt-3.5-turbo")

# Build the Chroma vector store, then wrap its retriever in a MultiQueryRetriever,
# which uses the LLM to generate several variants of each query before retrieving
vectordb = Chroma.from_documents(documents=docs, embedding=OpenAIEmbeddings(model="text-embedding-ada-002"))
retriever = MultiQueryRetriever.from_llm(retriever=vectordb.as_retriever(), llm=llm)
# retriever = vectordb.as_retriever(search_kwargs={"k": 1})  # single-query alternative

# contextualize the question
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as is."""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

# question answer prompt
qa_system_prompt = """You are a compassionate and considerate healthcare assistant for question-answering tasks. \
If you don't know the answer, just say that you don't know in a sympathetic manner, and base your responses solely on the information derived from the context. \
Be empathetic, considerate and concise in your responses. \

context: {context}"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

# statefully manage chat history
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)

conversational_rag_chain.get_graph().print_ascii()

                         +-----------------------------+                      
                         | Parallel<chat_history>Input |                      
                         +-----------------------------+                      
                                ***            ***                            
                              **                  **                          
                            **                      **                        
            +------------------------+          +-------------+               
            | Lambda(_enter_history) |          | Passthrough |               
            +------------------------+          +-------------+               
                                ***            ***                            
                                   **        **                               
                                     **    **                                 
                        +---------------------------
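
The session store in the cell above keeps one history object per `session_id`, so every call that reuses an id sees the earlier turns. A minimal standard-library sketch of that lookup pattern follows; `SimpleHistory` and `get_history` are hypothetical stand-ins for `ChatMessageHistory` and `get_session_history`, not LangChain classes.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleHistory:
    """Hypothetical stand-in for ChatMessageHistory; holds (role, text) pairs."""
    messages: list = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.messages.append((role, text))


# One history object per session id, created lazily.
store = {}


def get_history(session_id: str) -> SimpleHistory:
    # Create the history on first use, then return the same object every time.
    if session_id not in store:
        store[session_id] = SimpleHistory()
    return store[session_id]


# Two turns in one session share a history; a different id starts empty.
get_history("abc123").add("human", "What is gestational diabetes?")
get_history("abc123").add("ai", "It is diabetes that develops during pregnancy.")
print(len(get_history("abc123").messages))  # 2
print(len(get_history("xyz789").messages))  # 0
```

Because the query loop in this notebook passes one fixed session id, each later question is contextualized against all earlier ones. Passing a fresh id per question would keep the questions independent.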

Run queries using RAG

In [143]:
queries = [
    "What is gestational diabetes and how is it diagnosed?",
    "What are some healthy eating tips for people with diabetes?",
    "How can my outpatient bill for diabetes be covered?",
    "How does insulin pump therapy mimic the function of the pancreas?",
    "What factors should be considered when determining appropriate exercises for a wheelchair user?",
    "What can excess internal fat lead to in terms of inflammation and damage of body cells?",
    "What is the suggested location for doing lunchtime exercises?",
    "What are the main medications used by patients with type 2 diabetes mellitus in Singapore?",
    "How can rotating injection sites prevent fat lump formation in insulin pump therapy?",
    "What is the purpose of an insulin reservoir in insulin pump therapy and how does it relate to rotating injection sites for proper absorption and to prevent swelling?"
]

question_list = []
source_list = []
context_list = []
answer_list = []

for query in queries:
  print(query)
  # All queries share one session_id, so chat history accumulates across the loop
  result = conversational_rag_chain.invoke(
      {"input": query},
      config={"configurable": {"session_id": "abc123"}},
  )

  ans = result["answer"]
  answer_list.append(ans)

  source_set = set([doc.metadata["source"] for doc in result["context"]])

  source_text = ""
  for source in source_set:
    source_text += source + "\n"

  ans = f"{ans}\n\nSource:\n{source_text}"
  print(ans)
  print()

  question_list.append(query)
  source_list.append(source_text)
  context_list.append([doc.page_content for doc in result["context"]])

eval_dict = {
    "question": question_list,
    "source": source_list,
    "answer": answer_list,
    "contexts": context_list,
}

print(eval_dict)

What is gestational diabetes and how is it diagnosed?
Gestational diabetes is a type of diabetes that develops during pregnancy. It is diagnosed through screening tests such as the 3-point OGTT (oral glucose tolerance test) using the criteria set by the International Association of Diabetes and Pregnancy Study Group (IADPSG), which includes measuring plasma glucose levels at fasting, 1-hour post-OGTT, and 2-hour post-OGTT. If these levels meet the criteria specified in Table 1, then gestational diabetes is diagnosed.

Source:
https://www.ace-hta.gov.sg/docs/default-source/acgs/gdm---an-update-on-screening-diagnosis-and-follow-up-(may-2018).pdf


What are some healthy eating tips for people with diabetes?
Healthy eating tips for people with diabetes include:
1. Eating a well-balanced diet with a variety of fruits, vegetables, whole grains, lean proteins, and healthy fats.
2. Monitoring carbohydrate intake and choosing complex carbohydrates over simple sugars.
3. Limiting saturated fats,
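
The loop collects sources through a `set`, which removes duplicates but does not guarantee a stable order between runs. A small sketch of an order-preserving alternative using `dict.fromkeys` is shown here; the `unique_in_order` helper and the sample URLs are illustrative only and do not come from the notebook.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


# Hypothetical sources, as they might come back from several retrieved chunks.
retrieved_sources = [
    "https://example.org/a.pdf",
    "https://example.org/b.html",
    "https://example.org/a.pdf",
]

# Join the unique sources, one per line, in the order they were retrieved.
source_text = "\n".join(unique_in_order(retrieved_sources))
print(source_text)
```

With this approach the first-listed source is the one attached to the highest-ranked chunk, which may make the printed citations easier to compare across runs.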

LLM evaluation using Retrieval Augmented Generation Assessment (RAGAs)

In [144]:
from ragas.metrics import answer_relevancy, faithfulness, context_relevancy
from ragas import evaluate
from datasets import Dataset

dataset = Dataset.from_dict(eval_dict)
score = evaluate(dataset, metrics=[faithfulness, answer_relevancy, context_relevancy])
score = score.to_pandas()

display(score)


Unnamed: 0,question,source,answer,contexts,faithfulness,answer_relevancy,context_relevancy
0,What is gestational diabetes and how is it dia...,https://www.ace-hta.gov.sg/docs/default-source...,Gestational diabetes is a type of diabetes tha...,"[history of diabetes.1,22\n\nAll women with di...",1.0,0.951863,0.388889
1,What are some healthy eating tips for people w...,https://www.ace-hta.gov.sg/docs/default-source...,Healthy eating tips for people with diabetes i...,[Healthy diet\n\nA healthy and balanced diet p...,1.0,1.0,0.0
2,How can my outpatient bill for diabetes be cov...,https://ch-api.healthhub.sg/api/public/content...,Your outpatient bill for diabetes can be cover...,"[After deduction from the sources above, you m...",1.0,0.993797,0.125
3,How does insulin pump therapy mimic the functi...,https://ch-api.healthhub.sg/api/public/content...,Insulin pump therapy mimics the function of th...,[medication independent of their effect on HbA...,1.0,0.905265,0.05
4,What factors should be considered when determi...,https://ch-api.healthhub.sg/api/public/content...,When determining appropriate exercises for a w...,[What exercise are appropriate for me?\n\nWhat...,1.0,1.0,0.076923
5,What can excess internal fat lead to in terms ...,https://ch-api.healthhub.sg/api/public/content...,"Excess internal fat, including fat surrounding...",[There may be excess fat in your body even if ...,1.0,0.915512,0.030303
6,What is the suggested location for doing lunch...,https://ch-api.healthhub.sg/api/public/content...,The suggested location for doing lunchtime exe...,[The table provides reasons for not being able...,1.0,0.994663,0.25
7,What are the main medications used by patients...,https://www.ace-hta.gov.sg/docs/default-source...,"In Singapore, despite the availability of newe...",[More than 95% of people with diabetes have ty...,1.0,1.0,0.2
8,How can rotating injection sites prevent fat l...,https://www.healthhub.sg/a-z/diseases-and-cond...,Rotating injection sites in insulin pump thera...,[Rotate your injection spots\n\nRotate injecti...,1.0,0.976867,0.25
9,What is the purpose of an insulin reservoir in...,https://ch-api.healthhub.sg/api/public/content...,The insulin reservoir in insulin pump therapy ...,[c Not available as single-ingredient product;...,1.0,0.885586,0.2
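
Per-question scores are easier to compare once averaged. The sketch below recomputes the mean of each metric from the values in the table above, using only the standard library. The `scores` dictionary is typed in by hand from that output rather than read from the `score` DataFrame.

```python
from statistics import mean

# Values copied from the evaluation table above, one entry per question.
scores = {
    "faithfulness": [1.0] * 10,
    "answer_relevancy": [0.951863, 1.0, 0.993797, 0.905265, 1.0,
                         0.915512, 0.994663, 1.0, 0.976867, 0.885586],
    "context_relevancy": [0.388889, 0.0, 0.125, 0.05, 0.076923,
                          0.030303, 0.25, 0.2, 0.25, 0.2],
}

# Average each metric across the ten questions.
averages = {name: mean(values) for name, values in scores.items()}
for name, value in averages.items():
    print(f"{name}: {value:.3f}")
```

On these ten questions faithfulness and answer relevancy average close to 1, while context relevancy averages about 0.16. That gap suggests the retrieved chunks contain a lot of text unrelated to each question, even though the answers stay grounded in them.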


Save Results

In [145]:
# Faithfulness - how factually consistent the answer is with the retrieved context (low scores indicate hallucination)
# Answer Relevancy - how relevant the answer is to the question
# Context Relevancy - how relevant the retrieved context is to the question

result_df = pd.DataFrame({
    "Reader Query": score["question"].to_list(),
    "Information Retrieved from source": source_list,
    "LLM Response": score["answer"].to_list(),
    "Source Text": score["contexts"],
    "Faithfulness": score["faithfulness"], # measures hallucinations i.e., context -> answer
    "Answer Relevancy": score["answer_relevancy"], # measures answer relevancy i.e., answer -> query
    "Context Relevancy": score["context_relevancy"] # measures context relevancy i.e., query -> context
    })

display(result_df)

Unnamed: 0,Reader Query,Information Retrieved from source,LLM Response,Source Text,Faithfulness,Answer Relevancy,Context Relevancy
0,What is gestational diabetes and how is it dia...,https://www.ace-hta.gov.sg/docs/default-source...,Gestational diabetes is a type of diabetes tha...,"[history of diabetes.1,22\n\nAll women with di...",1.0,0.951863,0.388889
1,What are some healthy eating tips for people w...,https://www.ace-hta.gov.sg/docs/default-source...,Healthy eating tips for people with diabetes i...,[Healthy diet\n\nA healthy and balanced diet p...,1.0,1.0,0.0
2,How can my outpatient bill for diabetes be cov...,https://ch-api.healthhub.sg/api/public/content...,Your outpatient bill for diabetes can be cover...,"[After deduction from the sources above, you m...",1.0,0.993797,0.125
3,How does insulin pump therapy mimic the functi...,https://ch-api.healthhub.sg/api/public/content...,Insulin pump therapy mimics the function of th...,[medication independent of their effect on HbA...,1.0,0.905265,0.05
4,What factors should be considered when determi...,https://ch-api.healthhub.sg/api/public/content...,When determining appropriate exercises for a w...,[What exercise are appropriate for me?\n\nWhat...,1.0,1.0,0.076923
5,What can excess internal fat lead to in terms ...,https://ch-api.healthhub.sg/api/public/content...,"Excess internal fat, including fat surrounding...",[There may be excess fat in your body even if ...,1.0,0.915512,0.030303
6,What is the suggested location for doing lunch...,https://ch-api.healthhub.sg/api/public/content...,The suggested location for doing lunchtime exe...,[The table provides reasons for not being able...,1.0,0.994663,0.25
7,What are the main medications used by patients...,https://www.ace-hta.gov.sg/docs/default-source...,"In Singapore, despite the availability of newe...",[More than 95% of people with diabetes have ty...,1.0,1.0,0.2
8,How can rotating injection sites prevent fat l...,https://www.healthhub.sg/a-z/diseases-and-cond...,Rotating injection sites in insulin pump thera...,[Rotate your injection spots\n\nRotate injecti...,1.0,0.976867,0.25
9,What is the purpose of an insulin reservoir in...,https://ch-api.healthhub.sg/api/public/content...,The insulin reservoir in insulin pump therapy ...,[c Not available as single-ingredient product;...,1.0,0.885586,0.2


In [146]:
result_df.to_csv("Sample Output.csv")