# PMC Multi-Agent Search & Summarization — notebook.ipynb

This document contains a runnable notebook-style guide (sections + code snippets) and a README for the Sanofi R&D case study. It's intended to be copy-pasted into a Jupyter notebook (or run cell-by-cell).

## Table of contents

Setup & data access

Retriever Agent

Summarizer Agent

(Optional) Verifier Agent

Report generation

README (run instructions, design choices)

## 1. Setup & data access

Goal: work with a small subset (<100) of PMC OA oa_comm full-text .txt files located in the public S3 bucket pmc-oa-opendata in us-east-1.

Requirements:

In [None]:
python -m venv pmc_agent
source pmc_agent/bin/activate
pip install --upgrade pip
pip install pandas botocore sentence-transformers transformers torch scikit-learn faiss-cpu jupyterlab

## Access S3 (no AWS credentials required)

You can list/copy files directly via AWS CLI without credentials using --no-sign-request with an unsigned config.

In [None]:
# list top-level
aws s3 ls --no-sign-request s3://pmc-oa-opendata/oa_comm/ | head

# fetch the CSV filelist (metadata)
aws s3 cp --no-sign-request s3://pmc-oa-opendata/oa_comm/txt/metadata/csv/oa_comm.filelist.csv ./

## 2. Retriever Agent

### Design:
The Retriever Agent is responsible for fetching biomedical documents that are most relevant to a user’s query. It does this by embedding abstracts into a vector space, comparing them to the embedded query, and ranking results by semantic similarity. Unlike summarization, this agent’s job is only to retrieve and present relevant abstracts.

Model (CPU-friendly):
sentence-transformers/all-MiniLM-L6-v2 — produces 384-dimensional embeddings quickly and efficiently.

### Steps & Code:

#### Corpus Preparation

For each PMID from the metadata file (oa_comm.filelist.csv), download the corresponding full-text .txt document from S3 (oa_comm/txt/all/{PMID}.txt).

Extract the title and abstract (fallback to first ~300 words if abstract is missing).

Store results in a document table (PMID, title, abstract, text).

#### Embedding Index

Encode all abstracts with SentenceTransformer.

Store embeddings in a NumPy array for fast similarity search.

#### Retrieval Function

Encodes the query.

Computes cosine similarity against stored embeddings.

Returns the top-k documents ranked by relevance score.

Agent Integration (Reasoning + Action)

Uses GPT-4o-mini to decide whether retrieval is needed.

If yes, GPT calls the retrieve_tool.

#### The agent records:

Reasoning (Thought): e.g., “I need to retrieve documents for this biomedical query.”

Action: invocation of the retrieve_tool.

Observation: number of documents retrieved.

Output: retrieved documents (PMID, title, abstract, score).

#### Example wrapper:

retriever_tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_tool",
            "description": "Retrieve relevant biomedical papers for a given query (without summarization).",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "top_k": {"type": "integer", "default": 5},
                },
                "required": ["query"],
            },
        },
    }
]

#### Execution 

🤔 Thought: I need to retrieve documents.
🔎 Action: retrieve_tool(query="Adverse events with mRNA vaccines in pediatrics", top_k=5)
📄 Observation: Retrieved 5 documents.


Output: list of abstracts and metadata, ready for downstream processing (e.g., summarizer agent).

In [14]:
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
import boto3
from botocore import UNSIGNED
from botocore.client import Config
from openai import OpenAI
import json

# ------------------------------
# 1. Data Ingestion
# ------------------------------

# Load metadata (CSV from PMC Open Access)
filelist = pd.read_csv("..//oa_comm.filelist.csv")
sample_pmids = filelist["AccessionID"].sample(50, random_state=42).tolist()

# Configure unsigned S3 client for public PMC bucket
s3 = boto3.client(
    "s3",
    config=Config(signature_version=UNSIGNED),
    region_name="us-east-1"
)

def download_document(pmid: str) -> str:
    """Download a full-text document from PMC Open Access."""
    key = f"oa_comm/txt/all/{pmid}.txt"
    obj = s3.get_object(Bucket="pmc-oa-opendata", Key=key)
    #return obj["Body"].read().decode("utf-8")
    return obj["Body"].read().decode("latin-1")


def extract_metadata(text: str) -> tuple[str, str]:
    """Extract title and abstract from the raw document."""
    lines = text.splitlines()
    title, abstract = "", ""
    for line in lines:
        if line.lower().startswith("title:"):
            title = line[len("title:"):].strip()
        if line.lower().startswith("abstract:"):
            abstract = line[len("abstract:"):].strip()
    if not abstract:  # fallback: first ~300 words
        abstract = " ".join(lines[:50])
    return title, abstract

# Prepare corpus
documents = []
for pmid in sample_pmids:
    text = download_document(pmid)
    title, abstract = extract_metadata(text)
    documents.append({
        "pmid": pmid,
        "title": title,
        "abstract": abstract,
        "text": text
    })

# ------------------------------
# 2. Embedding Model
# ------------------------------

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
doc_texts = [d["abstract"] or d["text"][:1000] for d in documents]
doc_embeddings = model.encode(doc_texts, show_progress_bar=True)

# ------------------------------
# 3. Retriever
# ------------------------------

def retrieve(query: str, top_k: int = 5):
    """Retrieve top-k most relevant documents for a given query."""
    query_emb = model.encode([query])
    scores = cosine_similarity(query_emb, doc_embeddings)[0]
    print(" Scores:", scores)  # Debug: print similarity scores
    
    indices = np.argsort(scores)[::-1][:top_k]
    return [
        {
            "pmid": documents[i]["pmid"],
            "title": documents[i]["title"],
            "abstract": documents[i]["abstract"],
            "relevance_score": float(scores[i]),
        }
        for i in indices
    ]

# ------------------------------
# 4. Agent Integration
# ------------------------------
from dotenv import load_dotenv
import os
load_dotenv()

OPEN_AI_KEY=os.getenv('OPEN_AI_KEY')
client = OpenAI(api_key=OPEN_AI_KEY)

def retrieve_tool(query: str, top_k: int = 5):
    """Tool wrapper for the retriever."""
    return retrieve(query, top_k=top_k)

tools = [
    {
        "type": "function",
        "function": {
            "name": "retrieve_tool",
            "description": "Retrieve relevant biomedical papers for a given query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The research query to retrieve documents for"},
                    "top_k": {"type": "integer", "default": 5},
                },
                "required": ["query"],
            },
        },
    }
]

def run_agent(query: str, show_trace: bool = True):
    """
    Run the agent with explicit reasoning (thought),
    action (tool use), observation (retrieved results),
    and final answer synthesis.
    """
    trace = []

    # Step 1: Ask GPT if retrieval is needed
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a biomedical research assistant. Use tools if needed before answering."},
            {"role": "user", "content": query},
        ],
        tools=tools,
    )

    message = response.choices[0].message

    # Step 2: If GPT calls the retriever tool
    if message.tool_calls:
        tool_call = message.tool_calls[0]
        args = json.loads(tool_call.function.arguments)

        if show_trace:
            trace.append(f"🤔 Thought: I need to retrieve documents for this query.")
            trace.append(f"🔎 Action: retrieve_tool(query={args['query']}, top_k={args.get('top_k', 5)})")

        retrieved_docs = retrieve_tool(**args)

        if show_trace:
            trace.append(f"📄 Observation: Retrieved {len(retrieved_docs)} documents.")

        # Step 3: Feed results back into GPT for synthesis
        followup = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "user", "content": query},
                {"role": "assistant", "tool_call_id": tool_call.id, "content": json.dumps(retrieved_docs, indent=2)},
            ],
        )

        final_answer = followup.choices[0].message.content
        if show_trace:
            trace.append(f"💡 Final Answer: {final_answer}")

    else:
        # If GPT answers directly without retrieval
        final_answer = message.content
        if show_trace:
            trace.append("🤔 Thought: No retrieval needed.")
            trace.append(f"💡 Final Answer: {final_answer}")

    return "\n".join(trace) if show_trace else final_answer

user_query = "What adverse effects of mRNA vaccines have been reported in children?"
result = run_agent(user_query, show_trace=True)
print(f"The Retriever Agent Results: {result}")

# ------------------------------
# 5. Example Usage
# ------------------------------

# if __name__ == "__main__":
#     user_query = "What adverse effects of mRNA vaccines have been reported in children?"
#     result = run_agent(user_query, show_trace=True)
#     print(result)


Batches: 100%|██████████| 2/2 [00:00<00:00,  6.77it/s]


 Scores: [ 0.16886362  0.18520235 -0.04030042  0.07915587  0.07436326  0.04194159
  0.01086882 -0.06934417  0.29671973  0.02270352 -0.00646156  0.14045893
  0.11386014  0.04435049  0.03969971 -0.00755932  0.11347413 -0.00453467
  0.21948662  0.10990962 -0.05809837 -0.03347653 -0.10428705  0.04117587
  0.16852488  0.00554047  0.12521012  0.23691656  0.21472375  0.07521107
  0.00839179 -0.0390802   0.11271232  0.10027247  0.10864428  0.14425398
 -0.01665949 -0.11803625  0.08784895 -0.00330128  0.05201231  0.18813506
  0.02010169  0.02175671  0.00740073 -0.08665704  0.04330515  0.17324978
 -0.1061269   0.07157484]
The Retriever Agent Results: 🤔 Thought: I need to retrieve documents for this query.
🔎 Action: retrieve_tool(query=adverse effects of mRNA vaccines in children, top_k=5)
📄 Observation: Retrieved 5 documents.
💡 Final Answer: Reports on adverse effects of mRNA vaccines in children have indicated a range of potential reactions. Most commonly, these are similar to those observed in 

## 3. Summarizer Agent

### Design:
The Summarizer Agent receives the retrieved abstracts from the Retriever Agent and produces concise summaries along with key terms. Its responsibilities are:

Summarization: Convert each abstract into a 2–3 sentence concise summary.

Keyword Extraction: Identify top keywords in the abstract for quick reference.

Unlike the Retriever Agent, this agent does not fetch documents — it operates only on already retrieved content.

### Model Choices (CPU-friendly):

google/flan-t5-small (used in the code)

Alternatives: t5-small, facebook/bart-base

### Implementation Approach:

Summarization Pipeline:

Use transformers.pipeline("summarization") for easy abstraction.

Guard against very short texts by returning them unchanged.

### Keyword Extraction (TF-IDF):

Build TF-IDF vectorizer over all document abstracts.

Extract top-k keywords for each abstract.

### Agent Tool Function:

Combines summarization and keyword extraction into a single callable tool.

### Agent Integration (Reasoning + Action + Observation):

Reasoning (Thought): Decide that retrieved documents need summarization.

Action: Call summarizer_tool on the retrieved documents.

Observation: Record the number of documents summarized and preview summaries & keywords.

### Key Points:

Retriever → Summarizer Pipeline: The summarizer agent always operates on documents retrieved by the retriever agent.

Reasoning + Action + Observation: The agent explicitly tracks the steps of deciding to summarize, performing the summarization, and observing results.

Output: Returns both a trace of reasoning steps and a structured summary + keywords per document.

In [10]:
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
import json

# ------------------------------
# 1. Summarization Model
# ------------------------------
summarizer = pipeline("summarization", model="google/flan-t5-small", device=-1)

def make_summary(text: str, max_new_tokens: int = 120) -> str:
    """Summarize a given text, skipping very short inputs."""
    if len(text.split()) < 30:
        return text
    out = summarizer(text, max_new_tokens=max_new_tokens, truncation=True)
    return out[0]["summary_text"]


# ------------------------------
# 2. Keyword Extraction (TF-IDF)
# ------------------------------
doc_texts = [d["abstract"] or d["text"][:2000] for d in documents]
vectorizer = TfidfVectorizer(stop_words="english", max_features=2000)
X = vectorizer.fit_transform(doc_texts)
terms = vectorizer.get_feature_names_out()

def top_keywords(idx: int, k: int = 6):
    row = X[idx].toarray()[0]
    top_idx = np.argsort(row)[::-1][:k]
    return [terms[i] for i in top_idx if row[i] > 0]


# ------------------------------
# 3. Summarizer Tool
# ------------------------------
def summarize_documents(docs, max_new_tokens=120):
    """Produce summaries + keywords for retrieved documents."""
    summaries = []
    for doc in docs:
        text = doc["abstract"] or doc["text"][:2000]
        summary = make_summary(text, max_new_tokens=max_new_tokens)
        kw = top_keywords(documents.index(next(d for d in documents if d["pmid"] == doc["pmid"])), k=6)
        summaries.append({
            "pmid": doc["pmid"],
            "title": doc["title"],
            "summary": summary,
            "keywords": kw,
            "relevance_score": doc["relevance_score"],
            "abstract": doc["abstract"]
        })
    return summaries


def summarizer_tool(query: str, top_k: int = 5):
    """Agent tool: retrieve documents, then summarize them."""
    retrieved = retrieve(query, top_k=top_k)
    return summarize_documents(retrieved)


# ------------------------------
# 4. Agent Integration
# ------------------------------
summarizer_tools = [
    {
        "type": "function",
        "function": {
            "name": "summarizer_tool",
            "description": "Retrieve relevant biomedical papers and summarize them with keywords.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The research query to retrieve & summarize"},
                    "top_k": {"type": "integer", "default": 5},
                },
                "required": ["query"],
            },
        },
    }
]

def run_summarizer_agent(query: str, show_trace: bool = True):
    """
    Summarizer Agent:
    - Thought: decide if summarization is needed
    - Action: retrieve & summarize
    - Observation: summaries & keywords
    - Final Answer: synthesis
    """
    trace = []

    # Step 1: Ask GPT whether to call summarizer_tool
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a biomedical summarization assistant. Use tools to retrieve & summarize literature if needed."},
            {"role": "user", "content": query},
        ],
        tools=summarizer_tools,
    )

    message = response.choices[0].message

    if message.tool_calls:
        tool_call = message.tool_calls[0]
        args = json.loads(tool_call.function.arguments)

        if show_trace:
            trace.append("🤔 Thought: I should summarize relevant documents for this query.")
            trace.append(f"🔎 Action: summarizer_tool(query={args['query']}, top_k={args.get('top_k', 5)})")

        summaries = summarizer_tool(**args)

        if show_trace:
            trace.append(f"📄 Observation: Summarized {len(summaries)} documents.")

        followup = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "user", "content": query},
                {"role": "assistant", "tool_call_id": tool_call.id, "content": json.dumps(summaries, indent=2)},
            ],
        )

        final_answer = followup.choices[0].message.content
        if show_trace:
            trace.append(f"💡 Final Answer: {final_answer}")

    else:
        final_answer = message.content
        if show_trace:
            trace.append("🤔 Thought: No summarization needed.")
            trace.append(f"💡 Final Answer: {final_answer}")

    return "\n".join(trace) if show_trace else final_answer , summaries


query = "Adverse events with mRNA vaccines in pediatrics"
result, summaries = run_summarizer_agent(query, show_trace=True)
print("The result of summarization: ",result)
print("\n\nThe summarize[0]: ",summaries[0])

# ------------------------------
# 5. Example Usage
# ------------------------------
# if __name__ == "__main__":
#     query = "Adverse events with mRNA vaccines in pediatrics"
#     result = run_summarizer_agent(query, show_trace=True)
#     print(result)


Device set to use cpu


The result of summarization:  🤔 Thought: I should summarize relevant documents for this query.
🔎 Action: summarizer_tool(query=Adverse events with mRNA vaccines in pediatrics, top_k=5)
📄 Observation: Summarized 5 documents.
💡 Final Answer: The discussion surrounding adverse events associated with mRNA vaccines in pediatric populations has gained increasing attention, particularly due to the broader rollout of COVID-19 vaccines in children. Here are some key points regarding the topic:

### Safety Profile of mRNA Vaccines in Pediatrics

1. **Common Adverse Events**:
   - The most frequently reported side effects in children receiving mRNA vaccines include pain at the injection site, fatigue, headache, muscle pain, chills, fever, and nausea. These side effects are generally mild to moderate and resolve within a few days.

2. **Serious Adverse Events**:
   - While the incidence of serious adverse events is low, there have been reports of myocarditis and pericarditis, particularly in adole

## 4. (Optional) Verifier Agent

### Design:
The Verifier Agent ensures that the summaries produced by the Summarizer Agent are faithful, relevant, and complete with respect to the retrieved abstracts. Its responsibilities are:

Verification: Compare the summary against the abstracts used for summarization.

Scoring: Assign a faithfulness score and flag potential hallucinations or missing critical information.

Decision-making: Provide a final verification assessment that can be used to accept, refine, or reject the summary.

Unlike the Retriever or Summarizer Agents, the Verifier Agent does not fetch or summarize content, but it takes structured outputs from the previous agent.

### Implementation Approach:

Verification Tool:

Receives the summary and the source abstracts (sources).

Performs a basic verification, e.g., checks keyword overlap, length, and alignment with sources.

### Verifier Agent Function:

Combines reasoning, action, and observation into a single agent function.

Input / Output:

Input:

{
    "query": "Adverse events with mRNA vaccines in pediatrics",
    "summary": "mRNA vaccines in children mostly cause mild, short-lived adverse events, with no severe cases reported in trials.",
    "abstract": [list of retrieved abstracts used for summarization]
}


Output:

{
    "verification": {"faithfulness_score": 0.45, "length_ok": True, "comment": "Summary aligns well with retrieved docs."},
    "final_answer": "...",
    "trace": [reasoning, action, observation steps]
}


Key Features:

Reasoning: GPT (or the agent logic) decides whether verification is needed.

Action: Runs the verifier_tool on the summary and sources.

Observation: Records results and computes scores for faithfulness and completeness.

Output: Structured feedback for downstream use, including potential refinement or acceptance of the summary.

In [None]:
#import json
#from openai import OpenAI

#client = OpenAI()

# ---- VERIFIER TOOL ----
def verifier_tool(summary: str, sources: list) -> dict:
    """
    Verifies whether the summarizer's output is faithful and relevant.
    - Checks alignment between the summary and provided abstracts.
    - Detects hallucinations or missing critical points.
    """
    # For now, do a simple keyword overlap + length check
    keywords = [word.lower() for src in sources for word in src.split() if len(word) > 6]
    summary_words = summary.lower().split()
    overlap = len(set(summary_words) & set(keywords)) / max(1, len(set(summary_words)))

    return {
        "faithfulness_score": round(overlap, 3),
        "length_ok": len(summary.split()) > 30,
        "comment": (
            "Summary aligns well with retrieved docs."
            if overlap > 0.2 else "Potential hallucination / weak grounding."
        )
    }

verifier_tools = [
    {
        "type": "function",
        "function": {
            "name": "verifier_tool",
            "description": "Verify that the summary is faithful, relevant, and complete given the retrieved sources.",
            "parameters": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "sources": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["summary", "sources"]
            },
        },
    }
]

# ---- VERIFIER AGENT ----
def run_verifier_agent(summary_results: dict):
    """
    Verifier Agent: receives summarizer output (summary + sources) and verifies it.
    """
    #query = summary_results["query"]
    summary = summary_results["summary"]
    sources = summary_results["abstract"]

    trace = []

    # Step 1: Reasoning — decide if verification is needed
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a verifier agent. Your job is to decide if a summary should be verified against its sources."},
            {"role": "user", "content": f"Query: {query}\nSummary: {summary}\nSources: {len(sources)} documents"}
        ],
        tools=verifier_tools,
    )

    message = response.choices[0].message
    trace.append("🤔 Reasoning: Checking whether to verify summarizer output.")

    # Step 2: Action — if tool call is triggered, run verifier
    if message.tool_calls:
        tool_call = message.tool_calls[0]
        args = json.loads(tool_call.function.arguments)

        trace.append(f"🔎 Action: verifier_tool(summary=..., sources={len(args['sources'])} docs)")
        verification = verifier_tool(**args)

        # Step 3: Observation
        trace.append(f"📋 Observation: {json.dumps(verification, indent=2)}")

        # Step 4: Final Answer
        followup = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a verifier agent. Provide a final verification decision."},
                {"role": "user", "content": f"Summary: {summary}\nSources: {sources}\nVerification: {verification}"},
            ],
        )
        trace.append(f"✅ Final Answer: {followup.choices[0].message.content}")
        return {"verification": verification, "final_answer": followup.choices[0].message.content, "trace": trace}

    # No tool call = GPT decided no verification needed
    trace.append("⚠️ No verification needed.")
    return {"verification": None, "final_answer": summary, "trace": trace}


result_ver = run_verifier_agent(summaries[0])
#print("The result of verification: ",result_ver)

The result of verification:  {'verification': {'faithfulness_score': 0.0, 'length_ok': False, 'comment': 'Potential hallucination / weak grounding.'}, 'final_answer': 'Final Verification Decision: The summary provided contains a correction notice related to a scientific article, which appears to have an inconsistency regarding the figure labels. The issues flagged include a zero faithfulness score and length discrepancies, suggesting the summary may not accurately represent the original content. Furthermore, there are signs of potential hallucination or weak grounding. \n\nDecision: **Not Verified**.', 'trace': ['🤔 Reasoning: Checking whether to verify summarizer output.', '🔎 Action: verifier_tool(summary=..., sources=1 docs)', '📋 Observation: {\n  "faithfulness_score": 0.0,\n  "length_ok": false,\n  "comment": "Potential hallucination / weak grounding."\n}', '✅ Final Answer: Final Verification Decision: The summary provided contains a correction notice related to a scientific article,