# Genex Meta-Study Researcher — PDF Folder → Relevance Screen → Evidence Extraction → Scoring → Markdown Evidence Report

## What this notebook is
This notebook is a **Genex “meta-study researcher” pipeline** that turns a *folder of research-paper PDFs* about a target genetic condition into a structured, ranked **evidence report**.

It is designed to:
- **screen** papers for relevance to a condition,
- **extract** normalized findings (symptoms, treatments, outcomes, genes, definitions) with short evidence snippets,
- **aggregate + score** evidence across papers, and
- produce a final **Markdown report** suitable for parent-facing summaries or internal Genex knowledge-base drafts.

> Output is informational and depends on paper quality/coverage; it is **not medical advice**.

---

## Inputs (what you must provide)
### 1) A folder of PDFs
- Set `PAPERS_DIR` to a directory containing paper PDFs (`.pdf`).
- The notebook uses **PyMuPDF (`fitz`)** to read text from each PDF.

### 2) Target condition string
You pass a condition name (e.g., `"L-serine deficiency disorder"`) into:
- `await run_condition_folder(condition, PAPERS_DIR)`

### 3) LLM access (configured in notebook)
The pipeline uses **Google ADK** (`LlmAgent`, `Runner`, `InMemorySessionService`) with a **LiteLLM**-backed model handle (`llm = LiteLlm(model=MODEL)`) for structured JSON extraction and report writing.
- Make sure the provider behind `MODEL` is reachable from your environment (API keys / auth as applicable to your ADK and LiteLLM setup).
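
Since the model string carries a provider prefix (for example `openai/...`), a small pre-flight check can catch a missing credential before any PDF is processed. The sketch below is illustrative only: the helper names `required_key_for` and `missing_key` and the three-entry prefix table are assumptions, not part of the notebook or of ADK, and your provider may read a different variable.

```python
import os
from typing import Mapping, Optional

# Assumed mapping from provider prefix to the environment variable that
# provider conventionally reads. Extend or correct it for your own setup.
PROVIDER_ENV_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def required_key_for(model: str) -> Optional[str]:
    """Return the env var name for a 'provider/model' string, or None if unknown."""
    provider = model.split("/", 1)[0].lower()
    return PROVIDER_ENV_KEYS.get(provider)


def missing_key(model: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of a required but unset variable, or None if nothing is missing."""
    key = required_key_for(model)
    if key is not None and not env.get(key):
        return key
    return None
```

Calling `missing_key(MODEL)` before building the agents returns `None` when the expected variable is set, and the variable's name otherwise.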

---

## Process overview (what it does)
### A) PDF ingestion
- Lists PDFs in `PAPERS_DIR`
- Extracts text per paper with PyMuPDF (plus helper utilities for cleaning/metadata as needed)

### B) Relevance screening (strict filter)
- **Relevance Agent**: given the condition + paper metadata + partial text (the first five pages), returns a single JSON `RelevanceDecision` object
- Only papers marked relevant **and** scoring at least 0.55 proceed to extraction
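
The gate itself is a two-part condition. As a minimal sketch, assuming a stand-in `Decision` dataclass in place of the notebook's `RelevanceDecision` model and a hypothetical helper `passes_screen`:

```python
from dataclasses import dataclass

RELEVANCE_THRESHOLD = 0.55  # the cut-off the pipeline applies after screening


@dataclass
class Decision:
    """Stand-in for RelevanceDecision, keeping only the two fields the gate reads."""
    is_relevant: bool
    relevance_score: float


def passes_screen(d: Decision, threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """A paper proceeds only if it is flagged relevant AND scores at or above the threshold."""
    return d.is_relevant and d.relevance_score >= threshold
```

Requiring both conditions means a paper flagged relevant with a low score is still skipped, which keeps the screen conservative.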

### C) Structured evidence extraction (per relevant paper)
- **Paper Reader Agent** produces a normalized `PaperExtraction` JSON including:
  - `summary` + `key_takeaways`
  - `findings[]` with items like:
    - `category`: symptom | treatment | outcome | population | gene | definition | other
    - `polarity`: supports | refutes | mixed | unclear
    - `snippet`: ≤2 sentences (evidence)
    - `section`: methods | results | discussion | abstract | unknown
- A normalization/repair step standardizes fields (e.g., gene names) and safely parses JSON.
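
One way to keep the closed vocabularies above enforceable is to coerce unexpected values to the catch-all label of each field. The sketch below is an assumption about how such a repair step could look; the helper `coerce_finding` and the fallback choices are hypothetical and not taken from the notebook.

```python
from typing import Dict

# Allowed values, copied from the schema described above.
ALLOWED = {
    "category": {"symptom", "treatment", "outcome", "population", "gene", "definition", "other"},
    "polarity": {"supports", "refutes", "mixed", "unclear"},
    "section": {"methods", "results", "discussion", "abstract", "unknown"},
}
# Assumed fallbacks: the catch-all value of each vocabulary.
FALLBACK = {"category": "other", "polarity": "unclear", "section": "unknown"}


def coerce_finding(raw: Dict[str, str]) -> Dict[str, str]:
    """Lower-case and trim the enum-like fields; replace unknown values with the fallback."""
    out = dict(raw)  # copy so the caller's dict is left untouched
    for field, allowed in ALLOWED.items():
        value = str(out.get(field, "")).strip().lower()
        out[field] = value if value in allowed else FALLBACK[field]
    return out
```

For example, a finding with `"category": "Symptom "` and `"polarity": "supported"` comes back with `category="symptom"` and `polarity="unclear"`, so downstream grouping and the `supports` check see only known labels.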

### D) Aggregation + scoring across papers
- Findings are grouped by `(category, name)` and scored using:
  - **weighted support** (optional journal/quality weights)
  - **supporting paper counts**
  - an approximate **90% interval** for support rate (Beta-based heuristic)
- Produces `ranked_findings` (symptoms/treatments/genes/definitions/outcomes) sorted by confidence.
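
To make the interval concrete, the sketch below re-implements the same normal approximation to a Beta(1+k, 1+n-k) distribution and evaluates it for a hypothetical finding supported by 3 of 6 relevant papers. The counts are invented for illustration, and the function name `interval_90` is a stand-in for the notebook's `beta_interval_90`.

```python
import math
from typing import Tuple


def interval_90(k: int, n: int) -> Tuple[float, float]:
    """Approximate 90% interval for the support rate given k supporting papers out of n."""
    if n <= 0:
        return (0.0, 1.0)  # no papers: the interval is uninformative
    a = 1 + k            # uniform prior plus supporting papers
    b = 1 + (n - k)      # uniform prior plus non-supporting papers
    mean = a / (a + b)
    var = (a * b) / (((a + b) ** 2) * (a + b + 1))
    half_width = 1.645 * math.sqrt(var)  # z for a two-sided 90% interval
    return (max(0.0, mean - half_width), min(1.0, mean + half_width))


lo, hi = interval_90(3, 6)
print(round(lo, 3), round(hi, 3))  # 0.226 0.774
```

With so few papers the interval is wide, which is why the report shows it next to the confidence score rather than the score alone. Because this is a normal approximation, the bounds are clipped to [0, 1] when support is near-unanimous.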

### E) Final Markdown write-up
- **Aggregator/Writer Agent** converts the scored evidence into a **Genex Evidence Report** in Markdown with sections like:
  - Executive Summary (bullets)
  - Ranked Symptoms (with confidence + interval)
  - Ranked Treatments/Interventions (with confidence + interval)
  - Outcomes / prognosis (if present)
  - Conflicting / uncertain areas
  - Limitations + what to read next

---

## Outputs (what you get)
Calling:
- `result = await run_condition_folder("<condition>", PAPERS_DIR)`

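Returns a dictionary that, on a successful run, includes:
- `papers_screened` / `papers_relevant` (counts)
- `screened` (the relevance decisions recorded during screening)
- `papers` (per-paper structured extractions)
- `scored` with `ranked_findings` (aggregated evidence)
- `report_markdown` (the final Markdown report)
- `failures` (papers that raised an error during screening or extraction)

If no PDFs are found, or no paper passes screening, the dictionary carries an `error` message instead of a report.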
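
A short sketch of persisting the report follows. It assumes the returned dictionary holds the Markdown text under `report_markdown` and an `error` key when the run produced no report; the helper name `save_report` and the output path are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict


def save_report(result: Dict[str, Any], out_path: str) -> Path:
    """Write the Markdown report to disk, refusing results that carry an error."""
    if "error" in result:
        # Runs with no PDFs or no relevant papers return an error and no report.
        raise ValueError(f"Pipeline returned an error: {result['error']}")
    path = Path(out_path)
    path.write_text(result["report_markdown"], encoding="utf-8")
    return path
```

Usage would be along the lines of `save_report(result, "evidence_report.md")` after the pipeline call has completed.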


In [18]:
# Core Python
import os
import re
import json
import math
from typing import Any, Dict, List, Optional, Tuple

# PDF parsing
import fitz  # PyMuPDF

# Data models
from pydantic import BaseModel, Field

# Google ADK
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.models.lite_llm import LiteLlm
from google.genai.types import Content, Part

# One session service for this notebook
session_service = InMemorySessionService()

In [19]:
# -----------------------
# A) Settings
# -----------------------
PAPERS_DIR = r"C:/Users/T490/Downloads/Genetics-Dashboard/docs/papers/serine_deficiency_papers/"
MAX_PAGES_PER_PDF = 25            # <-- keep reasonable for context
MODEL = "openai/gpt-4o-mini"      # <-- or anthropic/..., etc.

llm = LiteLlm(model=MODEL)
session_service = InMemorySessionService()

In [20]:
# -----------------------
# B) Local-only journal reputation weights
# -----------------------
# 0..1 weight. If a journal isn't present, we fallback to DEFAULT_JOURNAL_WEIGHT.
JOURNAL_WEIGHTS = {
    "Pediatrics": 0.95,
    "The Lancet": 0.98,
    "NEJM": 0.99,
    "JAMA": 0.97,
    "Developmental Medicine & Child Neurology": 0.90,
    "American Journal of Medical Genetics": 0.85,
}
DEFAULT_JOURNAL_WEIGHT = 0.60


def journal_weight(journal: Optional[str]) -> float:
    if not journal:
        return DEFAULT_JOURNAL_WEIGHT
    # simple fuzzy match by containment
    for k, w in JOURNAL_WEIGHTS.items():
        if k.lower() in journal.lower():
            return float(w)
    return DEFAULT_JOURNAL_WEIGHT


In [21]:
# -----------------------
# C) PDF reading (local)
# -----------------------
def list_pdfs(folder: str) -> List[str]:
    return sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if f.lower().endswith(".pdf")
    )

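def pdf_to_text(path: str, max_pages: int = 20) -> str:
    """Return the plain text of the first `max_pages` pages of a PDF."""
    out = []
    # The context manager closes the document even if a page fails to load
    with fitz.open(path) as doc:
        for i in range(min(len(doc), max_pages)):
            out.append(doc.load_page(i).get_text("text"))
    return "\n".join(out)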

def infer_metadata(path: str, text: str) -> Dict[str, Any]:
    filename = os.path.basename(path)
    paper_id = filename

    # year guess
    year = None
    m = re.search(r"\b(19|20)\d{2}\b", text[:4000])
    if m:
        year = int(m.group(0))

    # title guess: first decent line
    title = os.path.splitext(filename)[0]
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for ln in lines[:40]:
        if 15 < len(ln) < 180 and "doi" not in ln.lower():
            title = ln
            break

    # journal guess (very heuristic)
    journal = None
    jm = re.search(r"(Journal of [A-Za-z0-9 \-:&]+)", text[:8000])
    if jm:
        journal = jm.group(1).strip()

    return {
        "paper_id": paper_id,
        "path": path,
        "title": title,
        "year": year,
        "journal": journal,
        "authors": [],   # not inferred here; the metadata agent fills this in later
        "doi": None      # likewise left for the metadata agent
    }
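
`infer_metadata` leaves `doi` as `None` and relies on the metadata agent to supply it. If a local guess is wanted as well, a regular expression over the first pages is one option. This is a sketch only: the helper `guess_doi` is hypothetical, the pattern is a common heuristic rather than a full DOI grammar, and the example DOI is made up.

```python
import re
from typing import Optional

# Heuristic: "10.", a 4-9 digit registrant code, "/", then a non-space suffix.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def guess_doi(text: str) -> Optional[str]:
    """Return the first DOI-looking string in `text`, or None if there is none."""
    m = DOI_RE.search(text)
    if not m:
        return None
    # Sentences often end right after a DOI, so trim trailing punctuation.
    return m.group(0).rstrip(".,;)")


print(guess_doi("Available at https://doi.org/10.1234/example.5678."))
# 10.1234/example.5678
```

A value found this way could seed `meta["doi"]`, with the metadata agent's answer still taking precedence when it returns one.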


In [22]:
# -----------------------
# Paper-level metadata
# -----------------------
class PaperMetadata(BaseModel):
    title: str
    authors: List[str] = Field(default_factory=list)
    year: Optional[int] = None
    journal: Optional[str] = None
    doi: Optional[str] = None


# -----------------------
# Relevance decision schema
# -----------------------
class RelevanceDecision(BaseModel):
    paper_id: str
    title: str
    is_relevant: bool = False
    relevance_score: float = 0.0
    reason: str = ""
    matched_terms: List[str] = Field(default_factory=list)


# -----------------------
# Extraction schema
# -----------------------
class ExtractedFinding(BaseModel):
    name: str
    category: str
    polarity: str
    snippet: str
    section: str


class PaperExtraction(BaseModel):
    paper_id: str
    title: str
    authors: List[str] = Field(default_factory=list)   # ← propagated
    year: Optional[int] = None
    journal: Optional[str] = None
    doi: Optional[str] = None
    condition: str = ""
    relevance_score: float = 0.0
    # Free-text fields returned by the reader agent and read by the writer step
    summary: str = ""
    key_takeaways: List[str] = Field(default_factory=list)
    findings: List[ExtractedFinding] = Field(default_factory=list)


In [23]:
# Add a normalize/repair function
def normalize_gene_name(name: str) -> str:
    return name.strip().upper().replace("GENE ", "")

def normalize_extraction(raw: Dict[str, Any], meta: Dict[str, Any], condition: str, relevance_score: float) -> Dict[str, Any]:
    out = {
        "paper_id": meta["paper_id"],
        "title": meta["title"],
        "year": meta.get("year"),
        "journal": meta.get("journal"),
        "condition": condition,
        "relevance_score": float(relevance_score),
        "summary": "",
        "key_takeaways": [],
        "findings": [],
        "authors": meta.get("authors", []),
        "doi": meta.get("doi"),
    }

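    # Assumed minimal fill-in for the elided logic: copy the free-text fields
    # and the raw findings list from the model output, if it is a dict.
    if isinstance(raw, dict):
        out["summary"] = str(raw.get("summary") or "")
        out["key_takeaways"] = list(raw.get("key_takeaways") or [])
        out["findings"] = list(raw.get("findings") or [])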

    # FINAL NORMALIZATION STEP
    cleaned_findings = []
    for f in out["findings"]:
        if not isinstance(f, dict):
            continue

        if f.get("category") == "gene" and f.get("name"):
            f["name"] = normalize_gene_name(f["name"])

        cleaned_findings.append(f)

    out["findings"] = cleaned_findings
    return out


In [24]:
# -----------------------
# D) ADK agent: relevance agent
# -----------------------

RELEVANCE_SYSTEM = """
You are a strict biomedical relevance screener.

Given:
- a target genetic condition (string)
- paper metadata + partial text

Decide if this paper is relevant to the condition's clinical phenotype, diagnosis, pathophysiology, or treatment.

Return ONE JSON object only:
{
  "paper_id": "...",
  "title": "...",
  "is_relevant": true/false,
  "relevance_score": 0.0-1.0,
  "reason": "...",
  "matched_terms": ["..."]
}

Rules:
- If the paper is about a different disorder, animal model not clearly tied, or generic metabolism without condition link => not relevant.
- If it mentions synonyms, gene name(s), biochemical markers, or known treatment of the condition => likely relevant.
- Be conservative: false unless there is clear evidence.
No markdown fences. JSON only.
"""

RELEVANCE_USER = """
TARGET CONDITION: {condition}

PAPER:
paper_id: {paper_id}
title: {title}

TEXT (partial):
{text}
"""

relevance_agent = LlmAgent(
    name="PaperRelevanceScreener",
    model=llm,
    instruction=RELEVANCE_SYSTEM,
)


In [25]:
# -----------------------
# E) ADK agent: reads one paper
# -----------------------
READER_SYSTEM = """
Return ONE valid JSON object ONLY with EXACTLY these keys:
{
  "paper_id": "...",
  "title": "...",
  "year": null,
  "journal": null,
  "condition": "...",
  "relevance_score": 0.0,
  "summary": "...",
  "key_takeaways": ["...", "..."],
  "findings": [
    {"name":"...", "category":"symptom|treatment|outcome|population|gene|definition|other",
     "polarity":"supports|refutes|mixed|unclear",
     "snippet":"...", "section":"methods|results|discussion|abstract|unknown"}
  ]
}

Rules:
- ONLY extract items that relate to the TARGET CONDITION.
- Always try to extract:
  (A) definition statements about the disorder (category="definition")
  (B) genes explicitly mentioned as causal/associated (category="gene"; name must be gene symbol like PHGDH)
  (C) symptoms (category="symptom")
  (D) treatments/interventions (category="treatment")
- If not present in the paper, omit those items (don’t guess).
- snippet must be <=2 sentences and clearly support the item.
- No markdown fences. No extra keys.
"""


READER_USER_TEMPLATE = """
TARGET CONDITION: {condition}

PAPER METADATA:
paper_id: {paper_id}
title: {title}
year: {year}
journal: {journal}
relevance_score: {relevance_score}

PAPER TEXT:
{text}

Return JSON only.
"""
paper_reader = LlmAgent(
    name="PaperReader",
    model=llm,
    instruction=READER_SYSTEM,
)

In [26]:
# -----------------------
# F) Scoring
# -----------------------
def beta_interval_90(k: int, n: int) -> Tuple[float, float]:
    """Approx 90% interval for support rate using Beta(1+k, 1+n-k) normal approximation."""
    if n <= 0:
        return (0.0, 1.0)
    a = 1 + k
    b = 1 + (n - k)
    mean = a / (a + b)
    var = (a * b) / (((a + b) ** 2) * (a + b + 1))
    sd = math.sqrt(var)
    z = 1.645  # ~90%
    return (max(0.0, mean - z * sd), min(1.0, mean + z * sd))

def aggregate_and_score(extractions: List[PaperExtraction]) -> Dict[str, Any]:
    N = len(extractions)
    # Paper weight: equal blend of journal reputation and screening relevance
    weights = {
        p.paper_id: 0.5 * journal_weight(p.journal) + 0.5 * float(p.relevance_score or 0.0)
        for p in extractions
    }
    total_w = sum(weights.values()) or 1.0

    agg: Dict[str, Dict[str, Any]] = {}
    supporters: Dict[str, set] = {}

    for p in extractions:
        w = weights.get(p.paper_id, DEFAULT_JOURNAL_WEIGHT)
        for f in p.findings:
            key = f"{f.category}:{f.name}".lower().strip()
            agg.setdefault(key, {
                "category": f.category,
                "name": f.name,
                "weighted_support": 0.0,
                "evidence": [],
                "snippets": []
            })
            supporters.setdefault(key, set())

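            # Count each paper's weight once per finding, even if the paper
            # lists the same finding several times
            if f.polarity == "supports" and p.paper_id not in supporters[key]:
                agg[key]["weighted_support"] += w
                supporters[key].add(p.paper_id)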

            agg[key]["evidence"].append({
                "paper_id": p.paper_id,
                "title": p.title,
                "year": p.year,
                "journal": p.journal,
                "paper_weight": round(w, 3),
                "polarity": f.polarity,
                "section": f.section,
            })
            agg[key]["snippets"].append(f.snippet)

    ranked = []
    for key, v in agg.items():
        k = len(supporters.get(key, set()))
        score = v["weighted_support"] / total_w
        lo, hi = beta_interval_90(k, N)
        ranked.append({
            "category": v["category"],
            "name": v["name"],
            "confidence_score": round(score, 3),     # weighted
            "supporting_papers": k,                 # count-based
            "papers_reviewed": N,
            "cred_interval_90": [round(lo, 3), round(hi, 3)],
            "example_snippets": v["snippets"][:3],
            "evidence": v["evidence"][:8],
        })

    ranked.sort(key=lambda x: x["confidence_score"], reverse=True)
    return {"ranked_findings": ranked}
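
As a worked example of the arithmetic in `aggregate_and_score`, the sketch below computes the blended paper weights and the resulting confidence score for one finding. The three papers and their numbers are invented for illustration, and the helper names `paper_weight` and `confidence_score` are hypothetical.

```python
from typing import Dict, Set


def paper_weight(journal_w: float, relevance: float) -> float:
    """Equal blend of journal weight and screening relevance, as in the scoring cell."""
    return 0.5 * journal_w + 0.5 * relevance


def confidence_score(weights: Dict[str, float], supporters: Set[str]) -> float:
    """Share of the total paper weight held by papers that support the finding."""
    total = sum(weights.values()) or 1.0
    return sum(weights[p] for p in supporters) / total


weights = {
    "paper_a": paper_weight(0.60, 0.80),  # 0.70
    "paper_b": paper_weight(0.85, 0.95),  # 0.90
    "paper_c": paper_weight(0.60, 1.00),  # 0.80
}
score = confidence_score(weights, {"paper_a", "paper_b"})
print(round(score, 3))  # 0.667
```

Because the denominator is the weight of all relevant papers, a finding reported by only one paper scores low even when that paper is highly weighted.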


In [27]:
# -----------------------
# G) Aggregator agent (write-up only)
# -----------------------
AGGREGATOR_SYSTEM = """
You are the Genex summarization agent.

You will receive scored_findings (ranked_findings) derived from papers.
Write a report that includes:

1) Condition (name)
2) Definition (ONLY from findings where category="definition"; if none, say "Not explicitly defined in the provided papers.")
3) Genes affected (ONLY from category="gene"; list + confidence scores)
4) Symptoms (ONLY from category="symptom"; ranked with confidence_score + interval)
5) Treatments (ONLY from category="treatment"; ranked with confidence_score + interval)

Rules:
- Do NOT use any outside knowledge.
- If a section has insufficient evidence, say so.
- When you list an item, include confidence_score and cred_interval_90.
- Use the evidence list to mention which papers support it (paper_id or title).
Output Markdown.
"""

AGGREGATOR_USER_TEMPLATE = """
## INPUTS

### Papers reviewed (per-paper summaries)
{paper_summaries}

### Scored evidence JSON
{scored_json}

## TASK
Write the final Genex Evidence Report in Markdown with sections:
1) Executive Summary (5-8 bullets)
2) Symptoms supported by literature (ranked)
3) Treatments/interventions supported by literature (ranked)
4) Outcomes / prognosis (if present)
5) Conflicting / uncertain areas
6) Limitations + what to read next
"""

evidence_writer = LlmAgent(
    name="EvidenceWriter",
    model=llm,
    instruction=AGGREGATOR_SYSTEM,
)

In [28]:
METADATA_SYSTEM = """
You extract bibliographic metadata from the first page(s) of a biomedical paper.

Return ONE valid JSON object ONLY with EXACTLY these keys:
{
  "title": "...",
  "authors": ["Last, First", "..."],
  "year": null,
  "journal": null,
  "doi": null
}

Rules:
- If unsure, use null (or [] for authors).
- Prefer author list as it appears (you can normalize to "Last, First" if easy).
- Do not hallucinate.
"""

metadata_agent = LlmAgent(
    name="PaperMetadataExtractor",
    model=llm,
    instruction=METADATA_SYSTEM,
)


In [29]:
# -----------------------
# H) Main pipeline
# -----------------------

APP_NAME = "genex-research"
USER_ID = "genex"

def extract_last_text(events) -> str:
    for e in reversed(events or []):
        c = getattr(e, "content", None)
        if c and getattr(c, "parts", None):
            for part in c.parts:
                t = getattr(part, "text", None)
                if t:
                    return t
        if getattr(e, "text", None):
            return e.text
    return ""

def safe_json_extract(text: str):
    """
    Safely extract a JSON object from an LLM response.
    Returns dict if successful, else None.
    """
    if not text or not isinstance(text, str):
        return None

    # Try direct parse first
    try:
        return json.loads(text)
    except Exception:
        pass

    # Try to find JSON inside text
    match = re.search(r"\{[\s\S]*\}", text)
    if match:
        try:
            return json.loads(match.group(0))
        except Exception:
            return None

    return None


def safe_json(text: str) -> Dict[str, Any]:
    """Parse the JSON object in an LLM reply; raises if no valid JSON is found."""
    s = (text or "").strip()
    # Slicing from the first "{" to the last "}" drops Markdown code fences,
    # language tags, and stray prose without touching the JSON content itself
    start = s.find("{")
    end = s.rfind("}")
    if start != -1 and end > start:
        s = s[start:end + 1]
    return json.loads(s)

async def run_condition_folder(condition: str, papers_dir: str = PAPERS_DIR) -> Dict[str, Any]:
    reader_runner = Runner(app_name=APP_NAME, agent=paper_reader, session_service=session_service)
    rel_runner = Runner(app_name=APP_NAME, agent=relevance_agent, session_service=session_service)
    writer_runner = Runner(app_name=APP_NAME, agent=evidence_writer, session_service=session_service)

    pdfs = list_pdfs(papers_dir)
    if not pdfs:
        return {"error": f"No PDFs found in {papers_dir}"}

    relevant_extractions: List[PaperExtraction] = []
    screened: List[Dict[str, Any]] = []
    failures: List[Dict[str, Any]] = []

    for i, path in enumerate(pdfs, start=1):
        raw_text = pdf_to_text(path, max_pages=MAX_PAGES_PER_PDF)
        meta = infer_metadata(path, raw_text[:12000])

        meta_text = pdf_to_text(path, max_pages=2)[:60000]

        meta_session = f"meta-{i}-{meta['paper_id']}"
        await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=meta_session)

        meta_runner = Runner(app_name=APP_NAME, agent=metadata_agent, session_service=session_service)

        meta_msg = f"""
        PAPER_ID: {meta['paper_id']}
        FILENAME: {os.path.basename(path)}

        TEXT (first pages):
        {meta_text}
        """
        ev0_events = await collect_events(
            meta_runner.run_async(
                user_id=USER_ID,
                session_id=meta_session,
                new_message=Content(parts=[Part(text=meta_msg)])
            )
        )

        meta_json = safe_json_extract(extract_last_text(ev0_events))

        # Merge LLM metadata into meta (only if present)
        if isinstance(meta_json, dict):
            meta["title"] = meta_json.get("title") or meta["title"]
            meta["journal"] = meta_json.get("journal") or meta["journal"]
            meta["year"] = meta_json.get("year") or meta["year"]
            meta["doi"] = meta_json.get("doi") or meta.get("doi")
            meta["authors"] = meta_json.get("authors") or meta.get("authors", [])


        # Create session for screening
        screen_session = f"screen-{i}-{meta['paper_id']}"
        await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=screen_session)

        # ---- 1) Relevance screening on FIRST ~3-5 pages (fast)
        screen_text = pdf_to_text(path, max_pages=5)[:60000]
        rel_msg = RELEVANCE_USER.format(
            condition=condition,
            paper_id=meta["paper_id"],
            title=meta["title"],
            text=screen_text,
        )

        try:
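            # Use the async runner, as the metadata step does, so the
            # notebook's event loop is not blocked by a synchronous call
            ev = await collect_events(rel_runner.run_async(
                user_id=USER_ID,
                session_id=screen_session,
                new_message=Content(parts=[Part(text=rel_msg)])
            ))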
            rel_raw = safe_json(extract_last_text(ev))
            rel = RelevanceDecision.model_validate({
                "paper_id": meta["paper_id"],
                "title": meta["title"],
                **rel_raw
            })
            screened.append(rel.model_dump())

            if not rel.is_relevant or rel.relevance_score < 0.55:
                print(f"[{i}/{len(pdfs)}] SKIP (irrelevant): {meta['paper_id']} score={rel.relevance_score:.2f}")
                continue

            print(f"[{i}/{len(pdfs)}] RELEVANT: {meta['paper_id']} score={rel.relevance_score:.2f}")

            # ---- 2) Extraction on more pages (slow)
            extract_session = f"paper-{i}-{meta['paper_id']}"
            await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=extract_session)

            text_for_llm = raw_text[:140000]
            user_msg = READER_USER_TEMPLATE.format(
                condition=condition,
                paper_id=meta["paper_id"],
                title=meta["title"],
                year=meta.get("year"),
                journal=meta.get("journal"),
                relevance_score=rel.relevance_score,
                text=text_for_llm,
            )

            # Async runner again, consistent with the metadata step
            ev2 = await collect_events(reader_runner.run_async(
                user_id=USER_ID,
                session_id=extract_session,
                new_message=Content(parts=[Part(text=user_msg)])
            ))

            obj = safe_json(extract_last_text(ev2))
            obj["condition"] = condition
            obj["relevance_score"] = float(rel.relevance_score)
            # Metadata fallback: the reader returns null for fields it cannot
            # fill and never returns authors/doi, so take those from `meta`
            for field in ("paper_id", "title", "year", "journal", "authors", "doi"):
                if not obj.get(field):
                    obj[field] = meta.get(field)

            relevant_extractions.append(PaperExtraction.model_validate(obj))

        except Exception as e:
            failures.append({"paper_id": meta["paper_id"], "error": repr(e), "path": path})
            print(f"[{i}/{len(pdfs)}] FAILED: {meta['paper_id']} -> {e}")

    if not relevant_extractions:
        return {
            "condition": condition,
            "screened": screened,
            "failures": failures,
            "error": "No relevant papers after screening."
        }


    # ---- 3) Deterministic scoring on relevant papers only
    scored = aggregate_and_score(relevant_extractions)

    # ---- 4) Writer report
    writer_session = "final-writeup"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=writer_session)

    paper_summaries = [{
        "paper_id": p.paper_id,
        "title": p.title,
        "year": p.year,
        "journal": p.journal,
        "relevance_score": p.relevance_score,
        "summary": p.summary,
        "key_takeaways": p.key_takeaways[:6],
    } for p in relevant_extractions]

    writer_msg = AGGREGATOR_USER_TEMPLATE.format(
        paper_summaries=json.dumps(paper_summaries, indent=2),
        scored_json=json.dumps({
            **scored,
            "condition": condition,
            "papers_screened": len(pdfs),
            "papers_relevant": len(relevant_extractions),
        }, indent=2),
    )

    # Async runner, consistent with the metadata step
    ev3 = await collect_events(writer_runner.run_async(
        user_id=USER_ID,
        session_id=writer_session,
        new_message=Content(parts=[Part(text=writer_msg)])
    ))
    report_md = extract_last_text(ev3)

    return {
        "condition": condition,
        "papers_screened": len(pdfs),
        "papers_relevant": len(relevant_extractions),
        "screened": screened,
        "papers": [p.model_dump() for p in relevant_extractions],
        "scored": scored,
        "report_markdown": report_md,
        "failures": failures,
    }
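
Both `safe_json_extract` and `safe_json` take everything between the first `{` and the last `}`, which fails when a reply contains two objects or a stray brace after the JSON. A more tolerant alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, tries each `{` in turn and returns the first span that parses as an object. The helper name `first_json_object` is hypothetical and is not used by the pipeline above.

```python
import json
from typing import Any, Dict, Optional


def first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first parseable JSON object embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at `pos` and ignores
            # whatever follows it, so trailing prose does not matter.
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pass  # this brace did not start valid JSON; try the next one
        else:
            if isinstance(obj, dict):
                return obj
        pos = text.find("{", pos + 1)
    return None


print(first_json_object('Sure! {"is_relevant": true} Let me know {if} needed.'))
# {'is_relevant': True}
```

The trade-off is that a malformed first object is skipped silently in favour of a later one, so the caller should still validate the result against the expected schema.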

In [30]:
async def collect_events(async_gen):
    events = []
    async for e in async_gen:
        events.append(e)
    return events


In [31]:
result = await run_condition_folder("L-serine deficiency disorder", PAPERS_DIR)
print(result["papers_screened"], result["papers_relevant"])
print(result["report_markdown"][:1500])
result["scored"]["ranked_findings"][:10]

[1/11] RELEVANT: 12534373.pdf score=0.80
[2/11] RELEVANT: Bookshelf_NBK592681.pdf score=1.00
[3/11] RELEVANT: Human Mutation - 2020 - Abdelfattah - Expanding the genotypic and phenotypic spectrum of severe serine biosynthesis.pdf score=0.95
[4/11] RELEVANT: Intl J of Devlp Neuroscience - 2022 - Fu - Mild phenotypes of phosphoglycerate dehydrogenase deficiency by a novel mutation.pdf score=0.85
[5/11] RELEVANT: Two new cases of serine deficiency disorders treated with l-serine - PubMed.pdf score=0.90
[6/11] RELEVANT: fgene-13-949038.pdf score=0.90


Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x000001A0E3637990>


CancelledError: 

In [None]:
rf = result["scored"]["ranked_findings"]
symptoms = [x for x in rf if x["category"] == "symptom"]

print("Total symptom items:", len(symptoms))
print("Top 20 symptoms by score:")
for x in symptoms[:20]:
    print(x["name"], x["confidence_score"], x["supporting_papers"])


## Build a searchable “Evidence Index” from your extracted findings

In [None]:
def build_evidence_index(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """
    Turns your per-paper extractions into atomic evidence rows that are easy to retrieve.
    Each row is one (paper, finding, snippet).
    """
    rows = []
    for p in papers:
        for f in (p.get("findings") or []):
            rows.append({
                "paper_id": p.get("paper_id"),
                "title": p.get("title"),
                "authors": p.get("authors") or [],
                "year": p.get("year"),
                "journal": p.get("journal"),
                "doi": p.get("doi"),
                "condition": p.get("condition"),
                "relevance_score": p.get("relevance_score", 0.0),

                "name": f.get("name"),
                "category": f.get("category"),
                "polarity": f.get("polarity"),
                "section": f.get("section"),
                "snippet": f.get("snippet"),
            })
    return rows


def _tokenize(s: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", (s or "").lower())


def retrieve_evidence(question: str, evidence_rows: List[Dict[str, Any]], top_k: int = 14) -> List[Dict[str, Any]]:
    """
    Lightweight lexical retrieval (no embeddings needed).
    Works well because your rows already contain normalized names + snippets.
    """
    qtok = set(_tokenize(question))
    if not qtok:
        return evidence_rows[:top_k]

    scored = []
    for r in evidence_rows:
        blob = " ".join([
            str(r.get("name","")),
            str(r.get("category","")),
            str(r.get("snippet","")),
            str(r.get("title","")),
            str(r.get("journal","")),
        ]).lower()

        btok = set(_tokenize(blob))
        overlap = len(qtok & btok)

        # small boosts
        boost = 0.0
        if r.get("polarity") == "supports":
            boost += 0.25
        boost += 0.15 * float(r.get("relevance_score") or 0.0)

        score = overlap + boost
        if overlap > 0:
            scored.append((score, r))

    scored.sort(key=lambda x: x[0], reverse=True)
    return [r for _, r in scored[:top_k]]
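
To see how `retrieve_evidence` ranks a row, the sketch below scores a single evidence row against a question using the same token overlap and boosts. The row content is invented for illustration, and `row_score` is a hypothetical helper; unlike the retrieval function, it returns 0.0 for rows with no overlap instead of dropping them.

```python
import re
from typing import Any, Dict, Set


def tokenize(s: str) -> Set[str]:
    """Lower-case alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", (s or "").lower()))


def row_score(question: str, row: Dict[str, Any]) -> float:
    """Token overlap plus small boosts for 'supports' polarity and relevance."""
    fields = ("name", "category", "snippet", "title", "journal")
    blob = " ".join(str(row.get(k, "")) for k in fields)
    overlap = len(tokenize(question) & tokenize(blob))
    if overlap == 0:
        return 0.0
    boost = 0.25 if row.get("polarity") == "supports" else 0.0
    boost += 0.15 * float(row.get("relevance_score") or 0.0)
    return overlap + boost


row = {
    "name": "L-serine supplementation",
    "category": "treatment",
    "snippet": "Seizures decreased after treatment.",
    "polarity": "supports",
    "relevance_score": 0.9,
}
# Shared tokens: "l", "serine", "seizures" -> overlap 3, boost 0.25 + 0.135
print(round(row_score("Does L-serine help seizures?", row), 3))  # 3.385
```

Since overlap counts distinct tokens, the boosts (at most 0.4 in total) only break ties between rows with the same overlap.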


In [None]:
# Create the QA agent
QA_SYSTEM = """
You are a biomedical literature Q&A agent.

You will be given:
- a user question
- a set of EVIDENCE ITEMS extracted from papers (each has snippet + citation fields)

Your job:
- Answer ONLY using the provided evidence items.
- If the evidence does not contain an answer, say: "Not found in the provided papers."
- Use inline citations like [1], [2] where the numbers correspond to the evidence item IDs.
- Prefer "supports" polarity evidence, but mention mixed/unclear if present.

Output format (Markdown):
1) Answer (short, direct)
2) Evidence (bullets; each bullet includes a snippet + citation)
3) References (numbered; paper title, authors, journal, year; include DOI if present)

Do not invent papers, authors, journals, or outcomes.
"""

qa_agent = LlmAgent(
    name="PaperQAAgent",
    model=llm,
    instruction=QA_SYSTEM,
)

In [None]:
async def ask_papers(question: str, result: Dict[str, Any], top_k: int = 14) -> str:
    """
    result is the dict returned by run_condition_folder(...)
    """
    papers = result.get("papers") or []
    evidence_rows = build_evidence_index(papers)
    top = retrieve_evidence(question, evidence_rows, top_k=top_k)

    # If nothing retrieved, still give the agent the chance to say "not found"
    lines = []
    for i, r in enumerate(top, start=1):
        authors = ", ".join(r.get("authors") or [])
        lines.append(
            f"""EVIDENCE_ITEM [{i}]
paper_id: {r.get("paper_id")}
title: {r.get("title")}
authors: {authors if authors else "UNKNOWN"}
journal: {r.get("journal")}
year: {r.get("year")}
doi: {r.get("doi")}
category: {r.get("category")}
name: {r.get("name")}
polarity: {r.get("polarity")}
section: {r.get("section")}
snippet: {r.get("snippet")}
"""
        )

    qa_context = "\n".join(lines) if lines else "NO EVIDENCE ITEMS RETRIEVED."

    qa_session = f"qa-{re.sub(r'[^a-z0-9]+','-', question.lower())[:40]}"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=qa_session)

    qa_runner = Runner(app_name=APP_NAME, agent=qa_agent, session_service=session_service)

    prompt = f"""
USER QUESTION:
{question}

{qa_context}
"""

    ev_events = await collect_events(
        qa_runner.run_async(
            user_id=USER_ID,
            session_id=qa_session,
            new_message=Content(parts=[Part(text=prompt)])
        )
    )
    return extract_last_text(ev_events)

In [None]:

# Ask questions against the extracted paper evidence
ans1 = await ask_papers("Are there different types of serine deficiency?", result)
print(ans1)

ans2 = await ask_papers("Is there any therapy recommended?", result)
print(ans2)

ans3 = await ask_papers("Any evidence for L-serine supplementation improving seizures?", result)
print(ans3)
