# Obesity Chatbot

## Task 1: Environment Variables

In [141]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [142]:
os.environ["TAVILY_API_KEY"] = getpass.getpass("TAVILY_API_KEY")

In [143]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("GOOGLE_API_KEY")

In [144]:
os.environ["GOOGLE_CSE_ID"] = getpass.getpass("GOOGLE_CSE_ID")

In [145]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE5 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")
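
The cells above prompt for every key each time the notebook runs. A minimal sketch of a helper that prompts only when a variable is not already set might look like the following; `set_env_if_missing` and its `prompt_fn` parameter are hypothetical names introduced here for illustration, not part of the original notebook.

```python
import getpass
import os


def set_env_if_missing(name: str, prompt_fn=getpass.getpass) -> str:
    """Return the value of `name`, prompting only when it is not already set."""
    value = os.environ.get(name)
    if not value:
        # prompt_fn defaults to getpass.getpass so the key is not echoed
        value = prompt_fn(f"{name}: ")
        os.environ[name] = value
    return value


# Possible usage in the notebook (commented out because it blocks on input):
# for key in ("OPENAI_API_KEY", "TAVILY_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CSE_ID"):
#     set_env_if_missing(key)
```

Passing a different `prompt_fn` makes the helper easy to exercise without interactive input.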

## Task 2: Setup RAG and SDG

In [146]:
from langchain_community.document_loaders import DirectoryLoader

path = "data/"
loader = DirectoryLoader(path, glob="*.pdf")
docs = loader.load()

In [147]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_documents = text_splitter.split_documents(docs)
len(split_documents)

535
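
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch using fixed-width windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); `sliding_window_chunks` is a hypothetical helper that only illustrates how consecutive chunks share text.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-width windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk starts with the last four characters of the previous one, which is the same idea as the 200-character overlap used above: context that straddles a chunk boundary still appears intact in at least one chunk.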

In [148]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [150]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# In-memory Qdrant instance; the data is lost when the kernel restarts.
client = QdrantClient(":memory:")

# text-embedding-3-small produces 1536-dimensional vectors.
client.create_collection(
    collection_name="obecity_rag",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name="obecity_rag",
    embedding=embeddings,
)

In [151]:
_ = vector_store.add_documents(documents=split_documents)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

def retrieve(state):
  """Fetch the top-5 chunks for the question and store them in the state."""
  retrieved_docs = retriever.invoke(state["question"])
  return {"context" : retrieved_docs}
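
The retriever above returns the `k` chunks whose embeddings are closest to the query under cosine distance. A rough pure-Python sketch of that ranking step follows, with tiny hand-made vectors standing in for real embeddings; `cosine_similarity`, `top_k`, and `toy_vectors` are illustrative names, and Qdrant's actual search is implemented differently.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors: dict, k: int) -> list[str]:
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy 3-dimensional "embeddings" (real ones have 1536 dimensions here).
toy_vectors = {
    "diet": [1.0, 0.0, 0.0],
    "exercise": [0.0, 1.0, 0.0],
    "diet_and_exercise": [1.0, 1.0, 0.0],
}
result = top_k([1.0, 0.2, 0.0], toy_vectors, k=2)
print(result)
# ['diet', 'diet_and_exercise']
```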

In [152]:
from langchain.prompts import ChatPromptTemplate

RAG_PROMPT = """\
You are a helpful assistant who will do the following:
1. Be clear and detailed
2. Stay relevant to the context of the question

Follow these guidelines while responding:
- Assist in setting realistic and achievable weight-loss goals that are tailored to individual [needs] and [lifestyle]. The process should involve an initial assessment of current habits, health status, and lifestyle to establish a baseline. From there, develop a structured, step-by-step plan that includes short-term milestones and long-term objectives. The plan should be flexible enough to adjust as progress is made but structured enough to provide clear direction. Incorporate strategies for overcoming common obstacles, such as motivation dips and plateaus, and recommend tools or resources for tracking progress. Ensure the goals are SMART (Specific, Measurable, Achievable, Relevant, and Time-bound) to increase the likelihood of success.
- Your task is to identify and help address unhelpful eating patterns in the client seeking to improve their health and wellness. Begin by conducting a comprehensive assessment to understand the client's current eating habits, lifestyle, and underlying factors contributing to their eating patterns. Develop a personalized plan that incorporates achievable goals, mindful eating strategies, and healthier food choices. Provide ongoing support, motivation, and adjustments to the plan based on the client’s progress and feedback. Your approach should be empathetic, evidence-based, and tailored to each client's unique needs, aiming to foster sustainable, positive changes in their eating habits.
- Act as a fitness coach. Develop a personalized workout routine specifically tailored to meet the client's [fitness goal]. The routine must consider the client's current fitness level, any potential limitations or injuries, and their available equipment. It should include a mix of cardiovascular exercises, strength training, flexibility workouts, and recovery activities. Provide clear instructions for each exercise, suggest the number of sets and repetitions, and offer guidance on proper form to maximize effectiveness and minimize the risk of injury.
- As a Personal Chef specialized in creating customized meal plans, design a meal plan tailored to specific dietary preferences. This plan should cater to the client's [health goals], [taste preferences], and any [dietary restrictions] they might have. The meal plan should cover breakfast, lunch, dinner, and snack options for one week, ensuring a balanced and nutritious diet. Include a detailed list of ingredients for each meal, preparation instructions that are easy to follow, and tips for meal prepping to save time.

### Question
{question}

### Context
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

In [153]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

In [154]:
def generate(state):
  docs_content = "\n\n".join(doc.page_content for doc in state["context"])
  messages = rag_prompt.format_messages(question=state["question"], context=docs_content)
  response = llm.invoke(messages)
  return {"response" : response.content}

In [155]:
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from langchain_core.documents import Document

class State(TypedDict):
  question: str
  context: List[Document]
  response: str

In [156]:
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
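
As a mental model for the `retrieve` then `generate` sequence: each node receives the current state and returns a partial update that is merged back in. The sketch below imitates that with plain dictionaries. It is only an approximation under the assumption that no reducers are attached to the state keys (so the last write wins); `run_sequence`, `toy_retrieve`, and `toy_generate` are hypothetical stand-ins, and LangGraph itself does considerably more.

```python
def run_sequence(nodes, state: dict) -> dict:
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(state)  # copy so the caller's dict is left untouched
    for node in nodes:
        state.update(node(state))
    return state


def toy_retrieve(state: dict) -> dict:
    # Stand-in for the vector-store lookup
    return {"context": [f"doc about {state['question']}"]}


def toy_generate(state: dict) -> dict:
    # Stand-in for the LLM call
    return {"response": f"Answer based on {len(state['context'])} document(s)."}


final_state = run_sequence([toy_retrieve, toy_generate], {"question": "obesity"})
print(final_state["response"])
# Answer based on 1 document(s).
```

This is why `generate` can read `state["context"]`: by the time it runs, the update returned by `retrieve` is already part of the state.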

In [157]:
response = graph.invoke({"question" : "Why is obesity a big problem in America?"})

In [158]:
response["response"]

'Obesity is a significant public health problem in America for several interrelated reasons, including the prevalence of obesity, its health implications, economic burdens, and the complex environmental and societal factors contributing to its rise. Here’s a detailed breakdown:\n\n### 1. **Prevalence and Statistics**\nThe prevalence of obesity in the United States has reached alarming levels, with about 42% of adults classified as having obesity based on a Body Mass Index (BMI) of 30 or greater. Among specific demographic groups, the rates can be even higher, with nearly half of non-Hispanic Black adults experiencing obesity. The progression of obesity rates indicates a worrying trend; it is projected that nearly 49% of U.S. adults could have obesity by 2030. This increase in obesity prevalence poses a critical public health challenge, as it is associated with numerous health complications such as diabetes, cardiovascular diseases, and certain types of cancer.\n\n### 2. **Health Compli

In [187]:
for idx, tool in enumerate(tool_belt):
    tool_name = getattr(tool, "name", f"Unnamed_Tool_{idx}")
    tool_func_name = getattr(tool, "func", None)

    if tool_func_name is None:
        print(f"⚠️ Tool {idx}: Name='{tool_name}' does not have a .func attribute (might not be a Tool instance)")
    else:
        print(f"✅ Tool {idx}: Name='{tool_name}', Function='{tool_func_name.__name__}'")

✅ Tool 0: Name='TavilySearch', Function='<lambda>'
✅ Tool 1: Name='ArxivQuery', Function='<lambda>'
✅ Tool 2: Name='GoogleSearch', Function='run'
✅ Tool 3: Name='Obesity_QA_Tool', Function='ai_rag_tool'


In [182]:
from langchain_core.tools import Tool
from langchain_core.messages import HumanMessage


# Plain function wrapped by `Tool` below. It returns a dict (not a str) so that
# both the answer and the retrieved documents stay available to callers.
def ai_rag_tool(question: str) -> dict:
    """Useful for when you need to answer questions about obesity. Input should be a fully formed question."""
    response = graph.invoke({"question" : question})
    return {
        "messages" : [HumanMessage(content=response["response"])],
        "context" : response["context"]
    }

ai_rag_tool_instance = Tool(
    name="Obesity_QA_Tool",  # ✅ No spaces, only letters, numbers, underscores, or hyphens
    description="Useful for when you need to answer questions about obesity. Input should be a fully formed question.",
    func=ai_rag_tool
)
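
The comment on `name` above notes that tool names may contain only letters, numbers, underscores, or hyphens. A small sketch of a helper that normalizes a human-readable title into such a name is shown below. The exact pattern and the 64-character limit are assumptions about the provider's function-name rule, and `to_valid_tool_name` is a hypothetical helper, not a LangChain function.

```python
import re

# Assumed naming rule for tool/function names (labelled assumption).
TOOL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def to_valid_tool_name(raw: str) -> str:
    """Replace runs of disallowed characters with underscores and validate the result."""
    cleaned = re.sub(r"[^a-zA-Z0-9_-]+", "_", raw.strip()).strip("_")
    cleaned = cleaned[:64]
    if not TOOL_NAME_PATTERN.match(cleaned):
        raise ValueError(f"cannot derive a valid tool name from {raw!r}")
    return cleaned


print(to_valid_tool_name("Obesity Question Answering Tool"))
# Obesity_Question_Answering_Tool
```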

In [191]:
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_community.tools.arxiv.tool import ArxivQueryRun
from langchain_google_community import GoogleSearchAPIWrapper
from langchain.tools import Tool

tavily_tool = TavilySearchResults(max_results=5)
google_search = Tool(
    name="GoogleSearch",
    func=GoogleSearchAPIWrapper().run, # Use the .run method directly
    description="Use this tool to search Google.", # Provide a description
)

def tavily_search_func(query: str):
    return TavilySearchResults(max_results=5).invoke({"query": query})

tavily_tool_instance = Tool(
    name="TavilySearch",
    func=tavily_search_func,  # ✅ Now has a proper function name
    description="Use this tool to search Tavily."
)

def arxiv_query_func(query: str):
    return ArxivQueryRun().invoke({"query": query})

arxiv_tool_instance = Tool(
    name="ArxivQuery",
    func=arxiv_query_func,  # ✅ Named function
    description="Use this tool to search academic papers on Arxiv."
)

tool_belt = [
    tavily_tool_instance,  # ✅ Now a valid `Tool`
    arxiv_tool_instance,  # ✅ Now a valid `Tool`
    google_search,  # ✅ Already correct
    ai_rag_tool_instance,  # ✅ Already correct
]

In [192]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o", temperature=0)

In [193]:
model = model.bind_tools(tool_belt)

In [194]:
from typing import Annotated, List, TypedDict
from langgraph.graph.message import add_messages
from langchain_core.documents import Document

class AgentState(TypedDict):
  # add_messages appends new messages to the history instead of overwriting it
  messages: Annotated[list, add_messages]
  context: List[Document]

In [195]:
from langgraph.prebuilt import ToolNode

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {
        "messages" : [response],
        "context" : state.get("context", [])
  }

tool_node = ToolNode(tool_belt)

In [196]:
from langgraph.graph import StateGraph, END

uncompiled_graph = StateGraph(AgentState)

uncompiled_graph.add_node("agent", call_model)
uncompiled_graph.add_node("action", tool_node)

<langgraph.graph.state.StateGraph at 0x1586b0e90>

In [197]:
uncompiled_graph.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x1586b0e90>

In [198]:
def should_continue(state):
  last_message = state["messages"][-1]

  if last_message.tool_calls:
    return "action"

  return END

uncompiled_graph.add_conditional_edges(
    "agent",
    should_continue
)

<langgraph.graph.state.StateGraph at 0x1586b0e90>

In [199]:
uncompiled_graph.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x1586b0e90>

In [200]:
compiled_graph = uncompiled_graph.compile()
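
The compiled graph alternates between the `agent` and `action` nodes until the model stops requesting tools. The control flow can be sketched without LangGraph as a plain loop; everything below (`run_agent_loop`, `toy_model`, `toy_tools`, the dict-based messages, and the `max_steps` cap) is a hypothetical stand-in meant only to illustrate the routing decision made by `should_continue`.

```python
def run_agent_loop(call_model, run_tools, messages, max_steps: int = 10):
    """Alternate model and tool steps until the model returns no tool calls."""
    messages = list(messages)
    for _ in range(max_steps):
        reply = call_model(messages)            # "agent" node
        messages.append(reply)
        if not reply.get("tool_calls"):         # should_continue -> END
            return messages
        messages.extend(run_tools(reply["tool_calls"]))  # "action" node
    raise RuntimeError("agent did not finish within max_steps")


def toy_model(messages):
    # Ask for a search first; answer once a tool result is in the history.
    if any(m["role"] == "tool" for m in messages):
        return {"role": "ai", "content": "final answer", "tool_calls": []}
    return {"role": "ai", "content": "", "tool_calls": [{"name": "GoogleSearch", "args": {"query": "x"}}]}


def toy_tools(tool_calls):
    return [{"role": "tool", "name": c["name"], "content": "result"} for c in tool_calls]


history = run_agent_loop(toy_model, toy_tools, [{"role": "human", "content": "question"}])
print([m["role"] for m in history])
# ['human', 'ai', 'tool', 'ai']
```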

In [202]:
from langchain_core.messages import HumanMessage

#inputs = {"messages" : [HumanMessage(content="Search Arxiv for the QLoRA paper, then search each of the authors to find out their latest Tweet using Tavily!")]}
#inputs = {"messages" : [HumanMessage(content="What is the impact of weight loss on obesity? Perform an evidence search using Arxiv, correlate the results with statements from news articles that need to be verified using Tavily, and match them with reliable sources such as government or healthcare websites using Google Search that corroborate the findings.")]}

inputs = {"messages" : [HumanMessage(content="Who is the current captain of winnipeg jets?")]}

async for chunk in compiled_graph.astream(inputs, stream_mode="updates"):
    for node, values in chunk.items():
        print(f"Receiving update from node: '{node}'")
        if node == "action":
          print(f"Tool Used: {values['messages'][0].name}")
        print(values["messages"])

        print("\n\n")

Receiving update from node: 'agent'
[AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_5QdqZpsT4j1ZSTyOrtXmGQq0', 'function': {'arguments': '{"__arg1":"current captain of Winnipeg Jets 2023"}', 'name': 'GoogleSearch'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 168, 'total_tokens': 192, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_eb9dce56a8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-9a9b4e92-6088-40a8-81e7-409f8bf34509-0', tool_calls=[{'name': 'GoogleSearch', 'args': {'__arg1': 'current captain of Winnipeg Jets 2023'}, 'id': 'call_5QdqZpsT4j1ZSTyOrtXmGQq0', 'type': 'tool_call'}], usage_metadata={'input_tokens': 168, 'output_tokens': 24, 'total_tokens': 192, '
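
The `tool_calls` list in the update above is a list of plain dictionaries, so the requested tool and its argument can be pulled out directly. The sketch below uses a dictionary copied from that output; `summarize_tool_calls` is a hypothetical helper, and the handling of the `__arg1` key is based only on what the output above shows for a single-input tool.

```python
def summarize_tool_calls(tool_calls) -> list[tuple]:
    """Return (tool name, argument) pairs from a list of tool-call dicts."""
    summary = []
    for call in tool_calls:
        args = call.get("args", {})
        # In the output above, the single positional input appears under "__arg1";
        # fall back to the whole args dict when that key is absent.
        summary.append((call["name"], args.get("__arg1", args)))
    return summary


tool_calls = [
    {
        "name": "GoogleSearch",
        "args": {"__arg1": "current captain of Winnipeg Jets 2023"},
        "id": "call_5QdqZpsT4j1ZSTyOrtXmGQq0",
        "type": "tool_call",
    }
]
calls = summarize_tool_calls(tool_calls)
print(calls)
# [('GoogleSearch', 'current captain of Winnipeg Jets 2023')]
```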

## RAGAS Baseline 

In [203]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [204]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10)

Generating personas: 100%|██████████| 3/3 [00:01<00:00,  2.65it/s]                                           
Generating Scenarios: 100%|██████████| 3/3 [00:12<00:00,  4.13s/it]
Generating Samples: 100%|██████████| 12/12 [00:09<00:00,  1.27it/s]


In [205]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | How did Brownell and Horgen describe the moder... | [Context of Lifestyle Modification for Obesity... | Brownell and Horgen (2004) described the moder... | single_hop_specifc_query_synthesizer |
| 1 | What Jensen et al. say about calorie intake fo... | [Principal Components of Lifestyle Modificatio... | Jensen et al. (2014) recommend a diet designed... | single_hop_specifc_query_synthesizer |
| 2 | What Burke et al., 2012 say about using smartp... | [Efficacy of Lifestyle Modification: Short Ter... | Burke et al., 2012 found that in randomized co... | single_hop_specifc_query_synthesizer |
| 3 | Who Rosenbaum? | [t h o r M a n u s c r i p t A u t h o r M a n... | Rosenbaum, along with Leibel, is mentioned in ... | single_hop_specifc_query_synthesizer |
| 4 | How does the prevalence of obesity in the US c... | [<1-hop>\n\nClinicalReview&Education JAMA \| Re... | The prevalence of obesity in the US is signifi... | multi_hop_abstract_query_synthesizer |
| 5 | How do the current evidence-based obesity mana... | [<1-hop>\n\nClinicalReview&Education JAMA \| Re... | Current evidence-based obesity management stra... | multi_hop_abstract_query_synthesizer |
| 6 | How do global obesity trends and the effective... | [<1-hop>\n\nClinicalReview&Education JAMA \| Re... | Global obesity trends have shown a significant... | multi_hop_abstract_query_synthesizer |
| 7 | How does the AI Fat Loss Chatbot contribute to... | [<1-hop>\n\n2/23/25, 1:33 PM I Built an AI Fat... | The AI Fat Loss Chatbot, as described by Chris... | multi_hop_abstract_query_synthesizer |
| 8 | How does semaglutide compare to other GLP-1 re... | [<1-hop>\n\nof murine pre-proglucagon-producin... | Semaglutide, a GLP-1 receptor agonist, has bee... | multi_hop_specific_query_synthesizer |
| 9 | How do GIPR/GLP-1R co-agonists achieve weight ... | [<1-hop>\n\ninjected (158). The explosion of i... | GIPR/GLP-1R co-agonists achieve weight loss by... | multi_hop_specific_query_synthesizer |


### Context Parsing

In [206]:
from langchain_core.tools import Tool
from langchain_core.messages import HumanMessage

def parse_context_from_invoked_tools(invoked_tools_log: list, tool_belt: list, query: str) -> dict:
    """
    Parses and aggregates context dynamically based on which tools were actually invoked.

    Args:
        invoked_tools_log (list): List of tool names that were invoked.
        tool_belt (list): List of available Tool instances.
        query (str): The user's query for context generation.

    Returns:
        dict: A structured response with 'messages' and 'context'.
    """
    context_list = []
    tool_dict = {tool.name: tool for tool in tool_belt if hasattr(tool, "name")}  # Map tool names to tools

    for tool_name in invoked_tools_log:
        if tool_name in tool_dict:
            tool = tool_dict[tool_name]
            try:
                response = tool.func(query)  # Invoke the tool with user query

                if isinstance(response, dict):  # If tool returns structured data
                    extracted_text = "\n".join(msg.content for msg in response.get("messages", []) if hasattr(msg, "content"))
                    context_list.append(extracted_text)
                else:  # If tool returns raw text
                    context_list.append(str(response))

            except Exception as e:
                context_list.append(f"Error extracting context from {tool.name}: {str(e)}")

    # Ensure 'context' is NEVER empty to prevent errors
    final_context = "\n".join(context_list) if context_list else "No relevant context found."

    return {
        "messages": [HumanMessage(content=final_context)],
        "context": final_context  # Ensure context is always a string
    }


async def extract_invoked_tools_context(compiled_graph, inputs, tool_belt):
    """
    Extracts tools invoked during streaming updates and dynamically generates context.

    Args:
        compiled_graph: The compiled LangGraph agent graph handling execution.
        inputs (dict): The input query dictionary.
        tool_belt (list): List of available tools.

    Returns:
        dict: A structured response containing 'messages' and 'context'.
    """
    invoked_tools_log = []  # Track tools that were actually used

    # The agent graph's state is keyed on "messages", so wrap the question
    # in a HumanMessage unless the caller already supplied messages.
    query = inputs.get("question", "")
    agent_inputs = inputs if "messages" in inputs else {"messages": [HumanMessage(content=query)]}

    async for chunk in compiled_graph.astream(agent_inputs, stream_mode="updates"):
        for node, values in chunk.items():
            print(f"Receiving update from node: '{node}'")
            if node == "action":
                tool_used = values["messages"][0].name
                print(f"Tool Used: {tool_used}")
                invoked_tools_log.append(tool_used)  # Keep track of invoked tools

            print(values["messages"])
            print("\n\n")

    # Remove duplicates in case of repeated tool calls
    invoked_tools_log = list(set(invoked_tools_log))

    # Parse context dynamically from the tools actually used
    dynamic_context = parse_context_from_invoked_tools(invoked_tools_log, tool_belt, query)

    # Ensure at least one valid update is written
    if not dynamic_context.get("messages"):
        dynamic_context["messages"] = [HumanMessage(content="No additional context available.")]

    if not dynamic_context.get("context"):
        dynamic_context["context"] = "No relevant context found."

    return dynamic_context  
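
Two details of the tool tracking above can be sketched in isolation: an `action` update may carry more than one tool message, and `list(set(...))` does not preserve the order in which tools were called. The helpers below are hypothetical and use `SimpleNamespace` objects as stand-ins for tool messages; they show one way to collect every name from an update and deduplicate while keeping first-seen order.

```python
from types import SimpleNamespace


def tool_names_from_update(messages) -> list[str]:
    """Collect the `name` of every message in one "action" update."""
    return [m.name for m in messages if getattr(m, "name", None)]


def unique_in_order(names) -> list[str]:
    """Drop repeats while keeping first-seen order (dicts preserve insertion order)."""
    return list(dict.fromkeys(names))


# Stand-ins for tool messages; real updates would carry message objects.
update = [SimpleNamespace(name="GoogleSearch"), SimpleNamespace(name="ArxivQuery")]
log = tool_names_from_update(update) + ["GoogleSearch"]
deduped = unique_in_order(log)
print(deduped)
# ['GoogleSearch', 'ArxivQuery']
```

Keeping the call order makes the aggregated context deterministic across runs, which can matter when the same dataset is evaluated more than once.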

### Eval Baseline

In [207]:
#for test_row in dataset:
#  response = graph.invoke({"question" : test_row.eval_sample.user_input})
#  test_row.eval_sample.response = response["response"]
#  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
async def process_dataset_with_dynamic_context(dataset, compiled_graph, tool_belt):
    """
    Processes the dataset and dynamically injects context based on invoked tools.

    Args:
        dataset (list): List of test rows.
        compiled_graph: The compiled LangGraph agent graph.
        tool_belt (list): Available tools for context retrieval.

    Returns:
        None (modifies dataset in place).
    """
    for test_row in dataset:
        user_query = test_row.eval_sample.user_input
        inputs = {"question": user_query}

        # Step 1: Extract dynamic context
        dynamic_context = await extract_invoked_tools_context(compiled_graph, inputs, tool_belt)

        # Step 2: Invoke the RAG graph (`graph`), not the agent graph, to get the baseline answer
        response = graph.invoke(inputs)

        # Step 3: Debugging Step - Check response
        print("Graph Response:", response)

        # Step 4: Inject responses and context
        test_row.eval_sample.response = response.get("response", "No response")

        # Ensure retrieved_contexts is properly structured
        if "context" in response and response["context"]:
            test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
        elif "context" in dynamic_context:
            test_row.eval_sample.retrieved_contexts = [dynamic_context["context"]]
        else:
            test_row.eval_sample.retrieved_contexts = ["No context available."]

In [208]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | How did Brownell and Horgen describe the moder... | [Context of Lifestyle Modification for Obesity... | Brownell and Horgen (2004) described the moder... | single_hop_specifc_query_synthesizer |
| 1 | What Jensen et al. say about calorie intake fo... | [Principal Components of Lifestyle Modificatio... | Jensen et al. (2014) recommend a diet designed... | single_hop_specifc_query_synthesizer |
| 2 | What Burke et al., 2012 say about using smartp... | [Efficacy of Lifestyle Modification: Short Ter... | Burke et al., 2012 found that in randomized co... | single_hop_specifc_query_synthesizer |
| 3 | Who Rosenbaum? | [t h o r M a n u s c r i p t A u t h o r M a n... | Rosenbaum, along with Leibel, is mentioned in ... | single_hop_specifc_query_synthesizer |
| 4 | How does the prevalence of obesity in the US c... | [<1-hop>\n\nClinicalReview&Education JAMA \| Re... | The prevalence of obesity in the US is signifi... | multi_hop_abstract_query_synthesizer |
| 5 | How do the current evidence-based obesity mana... | [<1-hop>\n\nClinicalReview&Education JAMA \| Re... | Current evidence-based obesity management stra... | multi_hop_abstract_query_synthesizer |
| 6 | How do global obesity trends and the effective... | [<1-hop>\n\nClinicalReview&Education JAMA \| Re... | Global obesity trends have shown a significant... | multi_hop_abstract_query_synthesizer |
| 7 | How does the AI Fat Loss Chatbot contribute to... | [<1-hop>\n\n2/23/25, 1:33 PM I Built an AI Fat... | The AI Fat Loss Chatbot, as described by Chris... | multi_hop_abstract_query_synthesizer |
| 8 | How does semaglutide compare to other GLP-1 re... | [<1-hop>\n\nof murine pre-proglucagon-producin... | Semaglutide, a GLP-1 receptor agonist, has bee... | multi_hop_specific_query_synthesizer |
| 9 | How do GIPR/GLP-1R co-agonists achieve weight ... | [<1-hop>\n\ninjected (158). The explosion of i... | GIPR/GLP-1R co-agonists achieve weight loss by... | multi_hop_specific_query_synthesizer |
