# Diabetes chat insulin recommendation

## Goals

1. Read query, find any food items
   1. Extract key food items
2. Retrieve nutritional values for the items
   1. Find a database with nutritional information
   2. Fuzzy search
3. Return expected insulin units required for meal
   1. Calculation can be straightforward
   2. Find a way to determine the user's insulin sensitivity and ratios
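
Steps 2 and 3 could be sketched roughly as follows, assuming an in-memory lookup table and a plain carbohydrate-ratio formula; neither is decided yet in these notes. The names `CARBS_PER_100G`, `lookup_carbs`, and `meal_bolus` are hypothetical, the carbohydrate figures are placeholders for illustration only, and `carb_ratio` (grams of carbohydrate covered by one unit of insulin) is a per-user setting that has to come from the user, not from this code.

```python
from difflib import get_close_matches
from typing import Optional

# Placeholder values for illustration only; a real nutrition database would replace this.
CARBS_PER_100G = {"pizza": 30.0, "chocolate": 60.0, "yogurt": 5.0}


def lookup_carbs(name: str, cutoff: float = 0.6) -> Optional[float]:
    """Fuzzy-match a food name against the table; return carbs per 100 g, or None."""
    matches = get_close_matches(name.lower(), list(CARBS_PER_100G), n=1, cutoff=cutoff)
    return CARBS_PER_100G[matches[0]] if matches else None


def meal_bolus(carbs_g: float, carb_ratio: float) -> float:
    """Insulin units = grams of carbohydrate / grams covered by one unit."""
    if carb_ratio <= 0:
        raise ValueError("carb_ratio must be positive")
    return carbs_g / carb_ratio
```

`difflib` is used here only because it ships with Python; a dedicated fuzzy-matching library may turn out to be a better fit once the database is chosen.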

## 1. Extract food items

Model selection:

- Deepseek-r1 does not support structured output, and reasoning is not required for this task
- Mistral is lightweight and is a good candidate for this effort
- Explore other options in the future

Structured output:

- We must extract both the food items and their quantities
  - TODO: Move towards weight of each food, and volume for beverages
- Revise the expected output based on step 2, depending on how nutritional values are retrieved
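
As a possible first step towards the weight/volume TODO, the quantity strings could be split into a number and an optional unit. This is a minimal sketch under the assumption that quantities look like `"200g"`, `"100 ml"`, or a plain count such as `"2"`; `Quantity` and `parse_quantity` are hypothetical names, not part of the current code.

```python
import re
from typing import NamedTuple, Optional


class Quantity(NamedTuple):
    value: float
    unit: Optional[str]  # None means a plain count, e.g. "2" slices


# A number, optionally followed by an alphabetic unit such as "g" or "ml".
_QUANTITY_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)?\s*$")


def parse_quantity(raw: str) -> Optional[Quantity]:
    """Parse strings like "200g" or "2"; return None when the format is not recognized."""
    match = _QUANTITY_RE.match(raw)
    if match is None:
        return None
    value, unit = match.groups()
    return Quantity(float(value), unit.lower() if unit else None)
```

Returning `None` for unrecognized strings keeps the failure visible, so such cases can be counted during evaluation instead of being silently coerced.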

Evaluate:

- Use MLflow for evaluation
- Create an evaluation dataset
- Define metrics

In [None]:
from langchain_ollama import ChatOllama
from pydantic import BaseModel, Field


class FoodOutput(BaseModel):
    # Maps each food name to its quantity, e.g. "200g" or a plain count such as "1".
    foods: dict[str, str] = Field(
        description="Key is the food name, value is the quantity",
        examples=[{"pizza": "200g", "burger": "1"}],
    )


# Passing the JSON schema as `format` constrains the output to FoodOutput's structure.
llm = ChatOllama(
    model="mistral:latest",
    temperature=0,
    format=FoodOutput.model_json_schema(),
)

In [None]:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage


def invoke(user_prompt: str, llm: ChatOllama) -> AIMessage:
    """Send the extraction prompt and return the model's message (the JSON is in `.content`)."""
    messages = [
        SystemMessage(
            """
            You will receive a user prompt about food. Output in JSON format the food items as keys and the amounts as string values.
            For example:
            prompt: "I ate 2 apples and a steak"
            output: {"foods": {"apples": "2", "steak": "1"}}

            prompt: "I need to bolus for 100 g of yogurt and 2 slices of bread"
            output: {"foods": {"yogurt": "100g", "bread": "2"}}
            """
        ),
        HumanMessage(user_prompt),
    ]
    return llm.invoke(messages)


user_prompt = "I ate two slices of pizza and 50 g of chocolate"
invoke(user_prompt, llm)

In [None]:
import json
import pandas as pd

with open("examples/few_shot_food.json", "r") as f:
    queries = json.load(f)

results = []
for query in queries:
    prompt_output = invoke(query["question"], llm)
    expected_output = str(query["answer"])
    results.append((prompt_output.content, expected_output))

df = pd.DataFrame(results, columns=["prompt_output", "expected_output"])

In [None]:
df

### Evaluation

First, let's build a dataset for testing.
Considerations:

- Must include a range of foods
- Must include sentences that mention no food at all
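
One way to score such a dataset is an exact-match check on the parsed JSON instead of comparing strings, so key order and whitespace do not matter. The sketch below assumes each entry uses the same `question`/`answer` keys as the loop above and that `answer` is a dict; `is_exact_match`, `accuracy`, the second dataset entry, and the choice of `{"foods": {}}` as the expected output for a food-free sentence are all assumptions for illustration.

```python
import json


def is_exact_match(model_output: str, expected: dict) -> bool:
    """True when the model's output parses as JSON and equals the expected dict."""
    try:
        predicted = json.loads(model_output)
    except json.JSONDecodeError:
        return False
    return predicted == expected


# Hypothetical entries: one with foods, one without any food mentioned.
dataset = [
    {"question": "I ate 2 apples and a steak", "answer": {"foods": {"apples": "2", "steak": "1"}}},
    {"question": "What is my basal rate?", "answer": {"foods": {}}},
]


def accuracy(outputs: list[str], dataset: list[dict]) -> float:
    """Fraction of outputs that exactly match the expected answer."""
    hits = sum(is_exact_match(out, row["answer"]) for out, row in zip(outputs, dataset))
    return hits / len(dataset)
```

Exact match is strict: `"100g"` and `"100 g"` count as different, so a normalization step or a per-item score may be worth adding once the output format settles.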