# Agent Case Study with the OpenAI Agents SDK

This notebook compares agent behavior **with vs. without** planning, using the same user prompt:
- Direct Agent (no planning loop)
- Planning Agent (Planner -> Executor -> Final)

It also includes tool and memory demos.

## 1) Install

In [None]:
%pip -q install openai openai-agents python-dotenv numpy faiss-cpu

## 2) Setup

In [40]:
import os
import json
import re
import shutil

from datetime import datetime
from openai import OpenAI, AsyncOpenAI
from agents import Agent, Runner, OpenAIResponsesModel, function_tool

from dotenv import load_dotenv

# Read OPENAI_API_KEY (and the optional overrides below) from the environment or a local .env file.
load_dotenv()

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
gpt_async_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Open-model endpoint settings (for vLLM/Ollama/OpenRouter-compatible APIs)
OPEN_MODEL_BASE_URL = os.getenv("OPEN_MODEL_BASE_URL", "http://localhost:8000/v1")
OPEN_MODEL_API_KEY = os.getenv("OPEN_MODEL_API_KEY", "EMPTY")
open_async_client = AsyncOpenAI(base_url=OPEN_MODEL_BASE_URL, api_key=OPEN_MODEL_API_KEY)

# Hosted embedding endpoint (e.g., vLLM --task embed)
EMBED_BASE_URL = os.getenv("EMBED_BASE_URL", "http://localhost:8001/v1")
EMBED_API_KEY = os.getenv("EMBED_API_KEY", "EMPTY")
emb_client = OpenAI(base_url=EMBED_BASE_URL, api_key=EMBED_API_KEY)

MODEL_GPT = "gpt-4.1-mini"
MODEL_OPEN = os.getenv("OPEN_MODEL", "qwen3-4b-instruct")
ACTIVE_PROVIDER = os.getenv("ACTIVE_PROVIDER", "gpt")  # set to 'open' to use qwen/open model
MODEL = MODEL_GPT
EMBED_MODEL = os.getenv("EMBED_MODEL", "BAAI/bge-m3")

if not os.getenv("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY is not set. Put it in env or .env file.")

## 2.5) Start vLLM Servers (Optional)

In [None]:
# Run these commands in a terminal, not in a notebook cell.

# pip install vllm
# python -c "import sys; print(sys.executable)"
# python -c "import vllm; print(vllm.__version__)"
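# --served-model-name makes the server answer to the default MODEL_OPEN name used in Setup.
# nohup "$(python -c 'import sys; print(sys.executable)')" -m vllm.entrypoints.openai.api_server --model Qwen/Qwen3-4B-Instruct --served-model-name qwen3-4b-instruct --host 127.0.0.1 --port 8000 > /tmp/vllm_llm.log 2>&1 &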
# nohup "$(python -c 'import sys; print(sys.executable)')" -m vllm.entrypoints.openai.api_server --model BAAI/bge-m3 --task embed --host 127.0.0.1 --port 8001 > /tmp/vllm_emb.log 2>&1 &
# sleep 5
# tail -n 80 /tmp/vllm_llm.log
# tail -n 80 /tmp/vllm_emb.log
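
A fixed `sleep 5` may be too short while model weights are still loading. The following is a minimal sketch of polling instead, assuming both servers expose the OpenAI-compatible `GET /v1/models` route at the base URLs configured in Setup. The names `http_probe` and `wait_until_ready` and their parameters are hypothetical helpers introduced here, not part of vLLM or the OpenAI SDK.

```python
import time
import urllib.error
import urllib.request


def http_probe(base_url, timeout=2.0):
    """Return True if GET {base_url}/models answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


def wait_until_ready(base_url, probe=http_probe, attempts=60, delay=2.0, sleep=time.sleep):
    """Poll until the probe succeeds and return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if probe(base_url):
            return attempt
        sleep(delay)
    raise TimeoutError(f"{base_url} not ready after {attempts} attempts")
```

For example, `wait_until_ready("http://localhost:8000/v1")` could replace the fixed sleep before the first request. The `probe` and `sleep` parameters are injectable so the loop can be exercised without a running server.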

## 3) Helper

In [41]:
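from agents import ModelSettings, OpenAIChatCompletionsModel


async def run_agent(agent_name, instructions, user_input, model=None, provider="gpt", tools=None, temperature=None):
    """Run one agent on user_input and return (final_output, full_result)."""
    if provider == "gpt":
        selected_model = model or MODEL_GPT
        model_obj = OpenAIResponsesModel(selected_model, gpt_async_client)
    elif provider == "open":
        selected_model = model or MODEL_OPEN
        # Many OpenAI-compatible servers only implement Chat Completions,
        # so the open-model path uses that API rather than Responses.
        model_obj = OpenAIChatCompletionsModel(selected_model, open_async_client)
    else:
        raise ValueError("provider must be 'gpt' or 'open'")

    agent = Agent(
        name=agent_name,
        instructions=instructions,
        model=model_obj,
        model_settings=ModelSettings(temperature=temperature),
        tools=tools or [],
    )
    result = await Runner.run(agent, user_input)
    return result.final_output, result


async def ask(messages_or_text, model=None, provider="gpt", temperature=0):
    """Ask with no instructions or tools; a list of messages is flattened to plain text."""
    if isinstance(messages_or_text, list):
        text_parts = []
        for m in messages_or_text:
            content = m.get("content", "") if isinstance(m, dict) else str(m)
            text_parts.append(str(content))
        user_input = "\n".join([x for x in text_parts if x])
    else:
        user_input = messages_or_text
    # Forward temperature so the caller's setting actually reaches the model.
    return await run_agent("Assistant", "", user_input, model=model, provider=provider, temperature=temperature)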

def embed_texts(texts):
    emb = emb_client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [item.embedding for item in emb.data]


def extract_json_object(text):
    m = re.search(r"\{[\s\S]*\}", text)
    if not m:
        raise ValueError("No JSON object found in output:\n" + text)
    return json.loads(m.group(0))
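
The greedy pattern in `extract_json_object` spans from the first `{` to the last `}` in the output, so it fails when the model adds prose containing braces after the JSON. A possible alternative, sketched below, scans for the first position where the standard library's `json.JSONDecoder.raw_decode` can parse a complete object. The name `extract_first_json_object` is a hypothetical helper, not something the notebook defines.

```python
import json


def extract_first_json_object(text):
    """Return the first parseable JSON object embedded in text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    raise ValueError("No JSON object found in output:\n" + text)
```

This keeps the same error message as the regex version, so it could be swapped in without changing how callers handle failures.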

## 4) Tool Demo (Without Tool vs With Tool)

In [42]:
@function_tool
def get_current_date() -> dict:
    """Return today's date (YYYY-MM-DD) and weekday name from the local system clock."""
    now = datetime.now()
    print("The agent is calling this tool!")
    return {"date": now.strftime("%Y-%m-%d"), "weekday": now.strftime("%A")}

q = "What day is today? Return exact date as YYYY-MM-DD and weekday."
without_tool, _ = await ask([{"role": "user", "content": q}], provider=ACTIVE_PROVIDER)
print("WITHOUT TOOL:", without_tool)


with_tool, _ = await run_agent(
    "Tool Agent",
    "Use get_current_date when user asks about current date or weekday.",
    q,
    provider=ACTIVE_PROVIDER,
    tools=[get_current_date],
)

print("WITH TOOL:", with_tool)

WITHOUT TOOL: Today is 2024-04-27, Saturday.
The agent is calling this tool!
WITH TOOL: Today is 2026-02-13, and the weekday is Friday.
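
The tool answer is grounded in the clock of the machine running the notebook, and `datetime.now()` uses that machine's local timezone, so the reported date can differ from the user's date near midnight. Below is a minimal sketch of a timezone-explicit variant with an injectable clock. The name `date_payload` and its parameters are hypothetical; it returns the same keys as `get_current_date` and could be wrapped with `@function_tool` in the same way.

```python
from datetime import datetime, timedelta, timezone


def date_payload(now=None, tz=timezone.utc):
    """Build a {"date", "weekday"} payload for an explicit timezone."""
    current = now if now is not None else datetime.now(tz)
    # Convert to the requested timezone before formatting.
    current = current.astimezone(tz)
    return {"date": current.strftime("%Y-%m-%d"), "weekday": current.strftime("%A")}


# The same instant falls on different calendar days in different timezones.
late_utc = datetime(2026, 2, 13, 23, 30, tzinfo=timezone.utc)
print(date_payload(late_utc))
print(date_payload(late_utc, tz=timezone(timedelta(hours=9))))
```

Passing `now` explicitly also makes the tool's behavior reproducible in tests, which the clock-reading version is not.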


## 5) Memory Demo with Vector Database (Without Memory vs With Vector Memory)

In [43]:
import numpy as np
import faiss

# 1) init in-memory FAISS index + record store
records = []
index = None

# 2) write one fact
record = {
    "id": "fact_jeff_birthday",
    "text": "Jeff's birthday is February 1.",
    "metadata": {"type": "birthday", "person": "Jeff"},
}
vec = embed_texts([record["text"]])[0]
x = np.array([vec], dtype="float32")
faiss.normalize_L2(x)
index = faiss.IndexFlatIP(x.shape[1])
index.add(x)
records.append(record)

In [44]:
# 3) ask without memory
q = "When is Jeff's birthday?"
without_memory, _ = await run_agent(
    "QA Agent",
    "Answer from your own knowledge. If unsure, say you don't know.",
    q,
    provider=ACTIVE_PROVIDER,
)
print("WITHOUT MEMORY:", without_memory)

WITHOUT MEMORY: I don't know when Jeff's birthday is.


In [45]:
# 4) retrieve + ask with memory
qvec = np.array([embed_texts([q])[0]], dtype="float32")
faiss.normalize_L2(qvec)
k = min(1, len(records))

docs = []
if index is not None and k > 0:
    _, I = index.search(qvec, k)
    docs = [records[i]["text"] for i in I[0] if i >= 0]
ctx = "\n".join(f"- {d}" for d in docs)

with_memory, _ = await run_agent(
    "Memory Agent",
    f"Use retrieved facts to answer.\nRetrieved facts:\n{ctx}",
    q,
    provider=ACTIVE_PROVIDER,
)
print("WITH MEMORY:", with_memory)

WITH MEMORY: Jeff's birthday is February 1.
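
Normalizing every vector to unit length and searching an inner-product index, as above, ranks records by cosine similarity. With a single stored record and `k = 1`, that record comes back for any question, related or not. The sketch below shows the same ranking with a minimum-score cutoff, written in plain Python instead of FAISS so it runs without an embedding server. `normalize`, `top_k_cosine`, and `min_score` are hypothetical names, and the 3-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (the role faiss.normalize_L2 plays above)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_cosine(query, vectors, k=1, min_score=0.0):
    """Return up to k (index, score) pairs with cosine similarity >= min_score, best first."""
    q = normalize(query)
    scored = []
    for i, vec in enumerate(vectors):
        # Inner product of unit vectors equals cosine similarity.
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        if score >= min_score:
            scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (illustrative values only).
store = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(top_k_cosine([0.9, 0.1, 0.0], store, k=1, min_score=0.5))  # close to record 0
print(top_k_cosine([0.0, 0.0, 1.0], store, k=1, min_score=0.5))  # unrelated query
```

A suitable cutoff depends on the embedding model, so the `0.5` used here is only a placeholder.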


## 6) Planning Demo: Same Prompt, Different Agent Policy

We use the **same user task** and compare:
- Direct Agent: one-shot answer
- Planning Agent: Planner -> Executor -> Final

In [49]:
TASK = "Compare GLP-1 receptor agonists vs SGLT2 inhibitors in adults with type 2 diabetes."


async def direct_agent(task, provider="gpt", model=None):
    ans, _ = await ask([{"role": "user", "content": task}], temperature=0, provider=provider, model=model)
    return ans


async def planning_agent(task, provider="gpt", model=None):
    """Plan the task as JSON, execute each planned step, then synthesize a final report."""
    planner_prompt = (
        "You are Biomedical Research Planner.\n"
        "Return strict JSON only with this schema:\n"
        "{\n"
        "  \"goal\": \"...\",\n"
        "  \"scope\": {\"population\": \"...\", \"interventions\": [\"...\"], \"outcomes\": [\"...\"]},\n"
        "  \"steps\": [\n"
        "    {\"id\": 1, \"task\": \"...\", \"done_definition\": \"...\"}\n"
        "  ]\n"
        "}\n"
        f"Task: {task}"
    )

    plan_text, _ = await ask(planner_prompt, temperature=0, provider=provider, model=model)
    plan = extract_json_object(plan_text)

    step_results = []
    for s in plan["steps"]:
        exec_prompt = (
            "You are Biomedical Research Executor.\n"
            f"Goal: {plan['goal']}\n"
            f"Scope: {json.dumps(plan['scope'])}\n"
            f"Step: {s['task']}\n"
            f"Done definition: {s['done_definition']}\n"
            "Provide concise but evidence-oriented output for this step."
        )
        out, _ = await ask(exec_prompt, temperature=0, provider=provider, model=model)
        step_results.append({"id": s["id"], "task": s["task"], "result": out})

    final_prompt = (
        "You are Biomedical Finalizer.\n"
        "Produce a deep-research style report with:\n"
        "1) clinical question framing (PICO)\n"
        "2) evidence synthesis table (benefits/risks)\n"
        "3) subgroup considerations (CKD, obesity, elderly)\n"
        "4) practical prescribing recommendations\n"
        "5) uncertainty and evidence gaps\n"
        f"Task: {task}\n"
        f"Plan: {json.dumps(plan, ensure_ascii=False)}\n"
        f"Step results: {json.dumps(step_results, ensure_ascii=False)}"
    )
    final_text, _ = await ask(final_prompt, temperature=0, provider=provider, model=model)

    return {
        "plan": plan,
        "step_results": step_results,
        "final": final_text,
    }

baseline = await direct_agent(TASK, provider=ACTIVE_PROVIDER)
print("=== SAME TASK ===")
print(TASK)
print("\n=== DIRECT AGENT (WITHOUT PLANNING) ===")
print(baseline)

=== SAME TASK ===
Compare GLP-1 receptor agonists vs SGLT2 inhibitors in adults with type 2 diabetes.

=== DIRECT AGENT (WITHOUT PLANNING) ===
Certainly! Here's a comparison of **GLP-1 receptor agonists (GLP-1 RAs)** and **SGLT2 inhibitors** in adults with type 2 diabetes, focusing on mechanism, efficacy, benefits, side effects, and other clinical considerations:

---

### 1. **Mechanism of Action**

- **GLP-1 Receptor Agonists:**  
  Mimic the incretin hormone GLP-1, enhancing glucose-dependent insulin secretion, suppressing glucagon secretion, slowing gastric emptying, and increasing satiety.

- **SGLT2 Inhibitors:**  
  Inhibit sodium-glucose co-transporter 2 in the proximal renal tubules, reducing glucose reabsorption and increasing urinary glucose excretion.

---

### 2. **Glycemic Control**

- **GLP-1 RAs:**  
  Typically lower HbA1c by approximately 1.0–1.5%. Particularly effective in reducing postprandial glucose.

- **SGLT2 Inhibitors:**  
  Lower HbA1c by about 0.5–1.0%. More

In [52]:
planned = await planning_agent(TASK, provider=ACTIVE_PROVIDER)
print("\n=== PLANNING AGENT (WITH PLANNING) - FINAL ===")
print(planned["final"])


=== PLANNING AGENT (WITH PLANNING) - FINAL ===
**Comparative Effectiveness and Safety of GLP-1 Receptor Agonists vs SGLT2 Inhibitors in Adults with Type 2 Diabetes: A Deep-Research Report**

---

### 1) Clinical Question Framing (PICO)

| Element        | Description                                                                                                               |
|----------------|---------------------------------------------------------------------------------------------------------------------------|
| **Population** | Adults diagnosed with type 2 diabetes mellitus (T2DM), including those with or without established cardiovascular (CV) or renal disease. |
| **Interventions** | GLP-1 receptor agonists (liraglutide, semaglutide, exenatide, dulaglutide, and others)                                   |
| **Comparator** | SGLT2 inhibitors (empagliflozin, dapagliflozin, canagliflozin, and others)                                                |
| **Outcomes**   | - Glycemic 

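`planning_agent` indexes `plan["steps"]`, `s["task"]`, and `s["done_definition"]` directly, so a planner reply that omits a key fails with a `KeyError` partway through the run. A minimal sketch of checking the plan first is shown below, assuming the schema given in the planner prompt (`goal`, `scope`, `steps` with `id`, `task`, `done_definition`). `validate_plan` is a hypothetical helper, not part of the notebook or the SDK.

```python
def validate_plan(plan):
    """Return a list of problems in a planner output; an empty list means it is usable."""
    if not isinstance(plan, dict):
        return ["plan is not a JSON object"]
    problems = []
    for key in ("goal", "scope", "steps"):
        if key not in plan:
            problems.append(f"missing top-level key: {key}")
    if "steps" not in plan:
        return problems
    steps = plan["steps"]
    if not isinstance(steps, list) or not steps:
        problems.append("steps must be a non-empty list")
        return problems
    for pos, step in enumerate(steps, start=1):
        if not isinstance(step, dict):
            problems.append(f"step {pos} is not an object")
            continue
        for key in ("id", "task", "done_definition"):
            if key not in step:
                problems.append(f"step {pos} missing key: {key}")
    return problems
```

One way to use it would be to call `validate_plan(plan)` right after `extract_json_object` and either raise or re-prompt the planner when the list is non-empty.
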
## 7) Optional: Quick scorecard

In [53]:
def simple_score(text):
    """Count how many of five topic areas the text mentions, using crude substring matching."""
    t = text.lower()
    checks = {
        "cardiovascular": any(k in t for k in ["cardiovascular", "cv", "mace", "heart failure"]),
        "renal": any(k in t for k in ["renal", "kidney", "ckd", "egfr", "albuminuria"]),
        "weight": any(k in t for k in ["weight", "obesity", "bmi"]),
        "safety": any(k in t for k in ["safety", "adverse", "side effect", "risk"]),
        "cost_access": any(k in t for k in ["cost", "access", "insurance", "affordability"]),
    }
    return sum(int(v) for v in checks.values()), checks

s1, c1 = simple_score(baseline)
s2, c2 = simple_score(planned["final"])
print("DIRECT score:", s1, c1)
print("PLANNED score:", s2, c2)

DIRECT score: 4 {'cardiovascular': True, 'renal': True, 'weight': True, 'safety': True, 'cost_access': False}
PLANNED score: 5 {'cardiovascular': True, 'renal': True, 'weight': True, 'safety': True, 'cost_access': True}
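
Substring tests such as `"cv" in t` or `"access" in t` also fire inside longer words (for example "inaccessible"), which can inflate the score. A hedged alternative is to match whole words or phrases with regular-expression word boundaries, as sketched below. `keyword_hits` is a hypothetical helper and the sample sentence is invented for illustration.

```python
import re


def keyword_hits(text, keywords):
    """Return the keywords that occur in text as whole words or phrases, case-insensitively."""
    hits = []
    for kw in keywords:
        # \b on both sides prevents matches inside longer words.
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


sample = "Inaccessible clinics were excluded; CV outcomes favoured both classes."
print(keyword_hits(sample, ["cv", "access", "cost"]))
```

The trade-off is that strict boundaries miss inflected forms such as "risks" for "risk", so the keyword lists would need plurals added explicitly.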


The Planning Agent output is stronger overall for this task.

It is better structured, more comprehensive, and much closer to a true deep-research deliverable. It covers key clinical dimensions (CV, renal, weight, safety, subgroup use, and practical prescribing) and presents them in a decision-friendly format. By contrast, the Direct Agent response is clear and accurate at a high level, but it is more of a concise overview than a research-grade synthesis.

The main caveat is verification: the Planning version includes quantitative claims and citations that should be independently checked before clinical or publication use.
