# Genex Meta-Study Researcher (v3 — Evidence Synthesis Report + Q&A)

This version adds a **Report Synthesis Agent** so `result["report_markdown"]` becomes a clinician-style evidence synthesis (not a raw snippet dump), while keeping your evidence-grounded Q&A agent.

## Output report structure
1) Executive Summary including definition and genes affected  
2) Symptoms supported by literature (ranked)  
3) Treatments/interventions supported by literature (ranked)  
4) Outcomes / prognosis (if present)  
5) Conflicting / uncertain areas  
6) Limitations + what to read next  
7) References (numbered list)  
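
One way to sanity-check that a generated report follows this outline is to look for each section heading in the Markdown. This is only a minimal sketch: `EXPECTED_SECTIONS` and `missing_sections` are hypothetical names that are not part of the notebook, the keyword matching is an assumption about how the model words its headings, and the last entry (References) comes from the `REPORT_SYSTEM` prompt.

```python
from typing import List

# Hypothetical label -> keyword map; the model may word its headings differently.
EXPECTED_SECTIONS = {
    "Executive Summary": "executive summary",
    "Symptoms": "symptoms",
    "Treatments": "treatments",
    "Outcomes": "outcomes",
    "Conflicting areas": "conflicting",
    "Limitations": "limitations",
    "References": "references",
}

def missing_sections(report_markdown: str) -> List[str]:
    """Return labels of expected sections that have no matching Markdown heading."""
    headings = [
        line.lstrip("#").strip().lower()
        for line in report_markdown.splitlines()
        if line.startswith("#")
    ]
    return [
        label
        for label, keyword in EXPECTED_SECTIONS.items()
        if not any(keyword in h for h in headings)
    ]

# Made-up fragment of a report, cut off after the second section.
sample = "# Report\n\n## 1) Executive Summary\n...\n## 2) Symptoms Supported by Literature\n..."
print(missing_sections(sample))
```

A non-empty result on `result["report_markdown"]` can mean the model skipped a section, or simply that it phrased a heading differently, so treat it as a prompt to look rather than a hard failure.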


In [1]:

# -----------------------
# 0) Imports
# -----------------------
import os
import re
import json
from typing import Any, Dict, List, Optional

import fitz  # PyMuPDF
from pydantic import BaseModel, Field

from collections import defaultdict, Counter
import uuid

# Google ADK
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.models.lite_llm import LiteLlm

# Optional GenAI message types (depends on installed package)
try:
    from google.genai.types import Content, Part
    _HAS_GENAI_TYPES = True
except ImportError:
    # Only a missing package should trigger the plain-string fallback.
    Content, Part = None, None
    _HAS_GENAI_TYPES = False


In [2]:

# -----------------------
# 1) Settings
# -----------------------
# TODO: change this to your local folder containing PDFs
PAPERS_DIR = r"C:/Users/T490/Downloads/Genex/docs/papers/serine_deficiency_papers"

MAX_PAGES_PER_PDF = 25

# LiteLLM-compatible model id
MODEL = "openai/gpt-4o-mini"

APP_NAME = "genex_meta_study"
USER_ID = "genex_user"

llm = LiteLlm(model=MODEL)
session_service = InMemorySessionService()


In [3]:

# -----------------------
# 2) Schemas
# -----------------------
class PaperMetadata(BaseModel):
    title: str
    authors: List[str] = Field(default_factory=list)
    year: Optional[int] = None
    journal: Optional[str] = None
    doi: Optional[str] = None

class RelevanceDecision(BaseModel):
    paper_id: str
    title: str
    is_relevant: bool = False
    relevance_score: float = 0.0  # 0..1
    reason: str = ""
    matched_terms: List[str] = Field(default_factory=list)

class ExtractedFinding(BaseModel):
    name: str = Field(..., description="Normalized term")
    category: str = Field(..., description="definition|gene|symptom|treatment|outcome|population|other")
    polarity: str = Field(..., description="supports|refutes|mixed|unclear")
    snippet: str = Field(..., description="<=2 sentences evidence snippet")
    section: str = Field(..., description="methods|results|discussion|abstract|unknown")

class PaperExtraction(BaseModel):
    paper_id: str
    title: str
    authors: List[str] = Field(default_factory=list)
    year: Optional[int] = None
    journal: Optional[str] = None
    doi: Optional[str] = None

    condition: str = ""
    relevance_score: float = 0.0
    summary: str = ""
    key_takeaways: List[str] = Field(default_factory=list)
    findings: List[ExtractedFinding] = Field(default_factory=list)
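
`ExtractedFinding` declares all five fields as required and names the allowed values for `category`, `polarity`, and `section` only in the field descriptions, so nothing enforces them. A possible pre-validation step is sketched below. `normalize_finding` and the `ALLOWED_*` sets are hypothetical helpers, not part of the notebook, and the defaults chosen here ("other", "unclear", "unknown") are assumptions.

```python
from typing import Any, Dict, Set

ALLOWED_CATEGORIES = {"definition", "gene", "symptom", "treatment", "outcome", "population", "other"}
ALLOWED_POLARITIES = {"supports", "refutes", "mixed", "unclear"}
ALLOWED_SECTIONS = {"methods", "results", "discussion", "abstract", "unknown"}

def _pick(value: Any, allowed: Set[str], default: str) -> str:
    """Lower-case the value and fall back to `default` when it is not an allowed term."""
    text = str(value or "").strip().lower()
    return text if text in allowed else default

def normalize_finding(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Fill missing fields and coerce enumerated fields to the documented vocabulary."""
    return {
        "name": str(raw.get("name") or "").strip(),
        "category": _pick(raw.get("category"), ALLOWED_CATEGORIES, "other"),
        "polarity": _pick(raw.get("polarity"), ALLOWED_POLARITIES, "unclear"),
        "snippet": str(raw.get("snippet") or "").strip(),
        "section": _pick(raw.get("section"), ALLOWED_SECTIONS, "unknown"),
    }

# A model reply that omits two fields and uses an out-of-vocabulary polarity.
print(normalize_finding({"name": " PHGDH ", "category": "Gene", "polarity": "strongly supports"}))
```

Applying something like this to each entry of `obj["findings"]` before building `PaperExtraction` would keep one malformed finding from failing validation for the whole paper.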


In [4]:

# -----------------------
# 3) Helper utilities (robust across ADK versions)
# -----------------------
def list_pdfs(folder: str) -> List[str]:
    return sorted(
        os.path.join(folder, f)
        for f in os.listdir(folder)
        if f.lower().endswith(".pdf")
    )

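def pdf_to_text(path: str, max_pages: int = 20) -> str:
    """Return the plain text of the first `max_pages` pages of a PDF."""
    # The context manager closes the document even if text extraction raises.
    with fitz.open(path) as doc:
        pages = [doc.load_page(i).get_text("text") for i in range(min(len(doc), max_pages))]
    return "\n".join(pages)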

def safe_json_extract(text: str) -> Optional[Dict[str, Any]]:
    """Extract a JSON object from model output. Returns dict or None."""
    if not text or not isinstance(text, str):
        return None
    text = text.strip()

    try:
        obj = json.loads(text)
        if isinstance(obj, dict):
            return obj
    except Exception:
        pass

    fence = re.search(r"```json\s*(\{[\s\S]*?\})\s*```", text, flags=re.IGNORECASE)
    if fence:
        try:
            obj = json.loads(fence.group(1))
            if isinstance(obj, dict):
                return obj
        except Exception:
            pass

    m = re.search(r"(\{[\s\S]*\})", text)
    if m:
        try:
            obj = json.loads(m.group(1))
            if isinstance(obj, dict):
                return obj
        except Exception:
            pass

    return None

def _make_message(text: str):
    """Create ADK message payload in a version-tolerant way."""
    if _HAS_GENAI_TYPES and Content is not None and Part is not None:
        return Content(role="user", parts=[Part(text=text)])
    return text

async def collect_events(async_gen) -> List[Any]:
    events = []
    async for e in async_gen:
        events.append(e)
    return events

async def run_runner(runner: Runner, user_id: str, session_id: str, text: str) -> List[Any]:
    """Run ADK runner in a way that works for both streaming and non-streaming."""
    res = runner.run_async(user_id=user_id, session_id=session_id, new_message=_make_message(text))
    if hasattr(res, "__aiter__"):
        return await collect_events(res)
    out = await res
    if isinstance(out, list):
        return out
    return [out]

def _event_to_text(e: Any) -> str:
    if e is None:
        return ""
    if isinstance(e, str):
        return e
    if isinstance(e, dict):
        for k in ("text", "output_text"):
            v = e.get(k)
            if isinstance(v, str):
                return v
        content = e.get("content")
        if isinstance(content, dict):
            parts = content.get("parts") or []
            texts = []
            for p in parts:
                if isinstance(p, dict) and isinstance(p.get("text"), str):
                    texts.append(p["text"])
            return "\n".join(texts)

    for attr in ("text", "output_text"):
        if hasattr(e, attr):
            v = getattr(e, attr)
            if isinstance(v, str):
                return v

    if hasattr(e, "content"):
        c = getattr(e, "content")
        if hasattr(c, "parts"):
            parts = getattr(c, "parts") or []
            texts = []
            for p in parts:
                if hasattr(p, "text") and isinstance(getattr(p, "text"), str):
                    texts.append(getattr(p, "text"))
                elif isinstance(p, dict) and isinstance(p.get("text"), str):
                    texts.append(p["text"])
            if texts:
                return "\n".join(texts)
        if isinstance(c, str):
            return c

    return ""

def extract_last_text(events: List[Any]) -> str:
    for e in reversed(events or []):
        t = _event_to_text(e).strip()
        if t:
            return t
    return ""
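
The last fallback in `safe_json_extract` uses a greedy pattern that spans from the first `{` to the last `}`, so a reply with any later brace in the trailing text fails to parse. One alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, scans for the first position where a complete object decodes. `first_json_object` is a hypothetical helper and not part of the notebook.

```python
import json
from typing import Any, Dict, Optional

def first_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        start = text.find("{", start + 1)
    return None

# Made-up model reply: a valid object followed by text containing stray braces.
reply = 'Here is the result: {"title": "A", "year": 2016} and a note {not json}.'
print(first_json_object(reply))
```

On this made-up reply the greedy pattern would capture everything through `{not json}` and `json.loads` would reject it, whereas the scan stops at the end of the first valid object.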


In [5]:

# -----------------------
# 4) Aggregation + ranking helpers
# -----------------------
def _norm_key(s: str) -> str:
    return re.sub(r"\s+", " ", (s or "").strip().lower())

def _paper_cite(p: dict) -> str:
    title = p.get("title") or "(untitled)"
    authors = ", ".join(p.get("authors") or []) or "UNKNOWN"
    journal = p.get("journal") or "UNKNOWN JOURNAL"
    year = p.get("year") or "n.d."
    doi = p.get("doi")
    doi_txt = f", DOI: {doi}" if doi else ""
    return f"{title} — {authors}. {journal} ({year}){doi_txt}"

def _polarity_weight(polarity: str) -> float:
    polarity = (polarity or "").lower()
    if polarity == "supports":
        return 1.0
    if polarity == "mixed":
        return 0.6
    if polarity == "unclear":
        return 0.35
    if polarity == "refutes":
        return 0.2
    return 0.35

def aggregate_findings(papers: list) -> dict:
    buckets = defaultdict(lambda: defaultdict(lambda: {
        "score": 0.0,
        "count": 0,
        "papers": set(),
        "snippets": [],
        "polarities": Counter(),
    }))

    for p in papers:
        pscore = float(p.get("relevance_score") or 0.0)
        cite = _paper_cite(p)

        for f in (p.get("findings") or []):
            cat = (f.get("category") or "other").lower()
            name = (f.get("name") or "").strip()
            if not name:
                continue
            key = _norm_key(name)

            pol = (f.get("polarity") or "unclear").lower()
            w = _polarity_weight(pol)
            score_add = w * (0.75 + 0.25 * min(1.0, pscore))

            entry = buckets[cat][key]
            entry["score"] += score_add
            entry["count"] += 1
            entry["papers"].add(cite)
            entry["polarities"][pol] += 1

            snip = (f.get("snippet") or "").strip()
            if snip:
                entry["snippets"].append({"snippet": snip, "cite": cite, "polarity": pol})

    ranked = {}
    for cat, items in buckets.items():
        ranked_items = []
        for key, v in items.items():
            ranked_items.append({
                "name": key,
                "score": v["score"],
                "mentions": v["count"],
                "papers": sorted(v["papers"]),
                "polarities": v["polarities"],
                "snippets": v["snippets"][:12],
            })
        ranked_items.sort(key=lambda x: (x["score"], len(x["papers"])), reverse=True)
        ranked[cat] = ranked_items
    return ranked
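
Each finding adds `weight * (0.75 + 0.25 * min(1.0, relevance))` to its concept's score, so the paper's relevance moves the contribution by at most a quarter while polarity sets the base. The sketch below restates that formula on its own to make the numbers concrete. `finding_score` and `POLARITY_WEIGHTS` are illustrative names, and the relevance of 0.8 is a made-up input.

```python
POLARITY_WEIGHTS = {"supports": 1.0, "mixed": 0.6, "unclear": 0.35, "refutes": 0.2}

def finding_score(polarity: str, relevance_score: float) -> float:
    """Score contributed by a single finding, mirroring the aggregation formula."""
    weight = POLARITY_WEIGHTS.get((polarity or "").lower(), 0.35)
    return weight * (0.75 + 0.25 * min(1.0, relevance_score))

for polarity in ("supports", "mixed", "unclear", "refutes"):
    print(polarity, round(finding_score(polarity, 0.8), 4))

# Scores only ever add up: several refuting findings can outweigh one supporting finding.
print(6 * finding_score("refutes", 0.8) > finding_score("supports", 0.8))
```

Because refuting findings still add a positive amount, a high score indicates how often a concept is discussed as much as how well it is supported. The `polarities` counter kept alongside each score is what distinguishes the two.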


In [6]:

# -----------------------
# 5) Agents
# -----------------------
METADATA_SYSTEM = """You extract bibliographic metadata from the first pages of a biomedical paper.

Return ONE valid JSON object ONLY with EXACTLY these keys:
{
  "title": "...",
  "authors": ["Last, First", "..."],
  "year": null,
  "journal": null,
  "doi": null
}

Rules:
- If unsure, use null (or [] for authors).
- Do not hallucinate.
"""

RELEVANCE_SYSTEM = """You decide whether a paper is relevant to the CONDITION.

Return ONE valid JSON object ONLY with EXACTLY these keys:
{
  "paper_id": "...",
  "title": "...",
  "is_relevant": false,
  "relevance_score": 0.0,
  "reason": "...",
  "matched_terms": ["...", "..."]
}

Rules:
- Use ONLY the provided text.
- relevance_score is 0..1.
- If uncertain, be conservative.
"""

EXTRACTION_SYSTEM = """You extract structured findings from a paper for the CONDITION.

Return ONE valid JSON object ONLY with EXACTLY these keys:
{
  "paper_id": "...",
  "title": "...",
  "authors": ["Last, First", "..."],
  "year": null,
  "journal": null,
  "doi": null,
  "condition": "...",
  "relevance_score": 0.0,
  "summary": "...",
  "key_takeaways": ["...", "..."],
  "findings": [
    {"name":"...", "category":"definition|gene|symptom|treatment|outcome|population|other",
     "polarity":"supports|refutes|mixed|unclear",
     "snippet":"<=2 sentences", "section":"methods|results|discussion|abstract|unknown"}
  ]
}

Rules:
- Use ONLY the provided text.
- Do NOT invent details.
- Keep snippets short (<=2 sentences).
- Put gene names under category "gene" when possible.
"""

QA_SYSTEM = """You are a biomedical literature Q&A agent.

Answer ONLY using the evidence items you are given.
If evidence does not contain the answer, say: "Not found in the provided papers."

Use inline citations like [1], [2] corresponding to evidence item IDs.

Output Markdown:
1) Answer
2) Evidence (bullets with snippet + citation number)
3) References (numbered: title, authors, journal, year; include DOI if present)
"""

REPORT_SYSTEM = """You are generating a Genex Evidence Report for clinicians and families.

You will be given:
- Condition name
- Paper-level citations (numbered)
- Evidence items grouped by section (definition/genes/symptoms/treatments/outcomes/other)

STRICT RULES:
- Write a coherent synthesis, NOT a snippet dump.
- Use ONLY the evidence provided. Do not add facts.
- Weight language by strength of evidence (strong/moderate/limited).
- Every key claim must have an inline citation like [1], [2].
- If a section has insufficient evidence, say so.

Write Markdown with sections:
1) Executive Summary including definition and genes affected
2) Symptoms supported by literature (ranked)
3) Treatments/interventions supported by literature (ranked)
4) Outcomes / prognosis (if present)
5) Conflicting / uncertain areas
6) Limitations + what to read next
7) References (numbered list)
"""

metadata_agent = LlmAgent(name="PaperMetadataExtractor", model=llm, instruction=METADATA_SYSTEM)
relevance_agent = LlmAgent(name="PaperRelevanceScreener", model=llm, instruction=RELEVANCE_SYSTEM)
extraction_agent = LlmAgent(name="PaperExtractor", model=llm, instruction=EXTRACTION_SYSTEM)
qa_agent = LlmAgent(name="PaperQAAgent", model=llm, instruction=QA_SYSTEM)
report_agent = LlmAgent(name="GenexReportSynthesizer", model=llm, instruction=REPORT_SYSTEM)


In [7]:

# -----------------------
# 6) Pipeline + report synthesis
# -----------------------
def infer_meta_from_text(paper_id: str, path: str, raw_text: str) -> Dict[str, Any]:
    title = os.path.splitext(os.path.basename(path))[0]
    year = None
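    # Word boundaries keep the pattern from matching four digits inside a longer number (e.g. a DOI).
    m = re.search(r"\b(19\d{2}|20\d{2})\b", raw_text[:4000])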
    if m:
        try:
            year = int(m.group(1))
        except Exception:
            year = None
    return {
        "paper_id": paper_id,
        "path": path,
        "title": title,
        "year": year,
        "journal": None,
        "authors": [],
        "doi": None,
    }

async def extract_metadata(paper_id: str, path: str) -> Dict[str, Any]:
    meta_text = pdf_to_text(path, max_pages=2)[:60000]
    meta = infer_meta_from_text(paper_id, path, meta_text)

    # Unique per call, matching how the report and Q&A sessions are named.
    session_id = f"meta-{paper_id}-{uuid.uuid4().hex[:8]}"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session_id)
    runner = Runner(app_name=APP_NAME, agent=metadata_agent, session_service=session_service)

    msg = f"""PAPER_ID: {paper_id}
FILENAME: {os.path.basename(path)}

TEXT (first pages):
{meta_text}
"""
    events = await run_runner(runner, user_id=USER_ID, session_id=session_id, text=msg)
    meta_json = safe_json_extract(extract_last_text(events))

    if isinstance(meta_json, dict):
        meta["title"] = meta_json.get("title") or meta["title"]
        meta["journal"] = meta_json.get("journal") or meta["journal"]
        meta["year"] = meta_json.get("year") or meta["year"]
        meta["doi"] = meta_json.get("doi") or meta.get("doi")
        meta["authors"] = meta_json.get("authors") or meta.get("authors", [])
    return meta

async def screen_relevance(condition: str, paper_id: str, title: str, text: str) -> RelevanceDecision:
    # Unique per call, matching how the report and Q&A sessions are named.
    session_id = f"rel-{paper_id}-{uuid.uuid4().hex[:8]}"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session_id)
    runner = Runner(app_name=APP_NAME, agent=relevance_agent, session_service=session_service)

    prompt = f"""CONDITION: {condition}
PAPER_ID: {paper_id}
TITLE: {title}

TEXT:
{text[:120000]}
"""
    events = await run_runner(runner, user_id=USER_ID, session_id=session_id, text=prompt)
    obj = safe_json_extract(extract_last_text(events)) or {}
    obj.setdefault("paper_id", paper_id)
    obj.setdefault("title", title)
    return RelevanceDecision(**obj)

async def extract_paper(condition: str, paper_id: str, meta: Dict[str, Any], text: str, relevance_score: float) -> PaperExtraction:
    # Unique per call, matching how the report and Q&A sessions are named.
    session_id = f"ext-{paper_id}-{uuid.uuid4().hex[:8]}"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session_id)
    runner = Runner(app_name=APP_NAME, agent=extraction_agent, session_service=session_service)

    prompt = f"""CONDITION: {condition}
PAPER_ID: {paper_id}
TITLE: {meta.get('title')}
YEAR: {meta.get('year')}
JOURNAL: {meta.get('journal')}
AUTHORS: {", ".join(meta.get("authors") or [])}
DOI: {meta.get('doi')}

TEXT:
{text[:140000]}
"""
    events = await run_runner(runner, user_id=USER_ID, session_id=session_id, text=prompt)
    obj = safe_json_extract(extract_last_text(events)) or {}

    obj["paper_id"] = paper_id
    obj["title"] = meta.get("title") or obj.get("title") or ""
    obj["authors"] = meta.get("authors") or obj.get("authors") or []
    obj["year"] = meta.get("year") or obj.get("year")
    obj["journal"] = meta.get("journal") or obj.get("journal")
    obj["doi"] = meta.get("doi") or obj.get("doi")
    obj["condition"] = condition
    obj["relevance_score"] = float(relevance_score or obj.get("relevance_score") or 0.0)

    if not isinstance(obj.get("findings"), list):
        obj["findings"] = []
    return PaperExtraction(**obj)

def _collect_unique_references(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    seen = set()
    refs = []
    for p in papers:
        cite = _paper_cite(p)
        if cite in seen:
            continue
        seen.add(cite)
        refs.append({
            "cite": cite,
            "title": p.get("title"),
            "authors": p.get("authors") or [],
            "journal": p.get("journal"),
            "year": p.get("year"),
            "doi": p.get("doi"),
        })
    return refs

def _select_evidence_items(ranked: dict, max_per_section: int = 18) -> Dict[str, List[Dict[str, Any]]]:
    sections = {
        "definition": ranked.get("definition", [])[:max_per_section],
        "gene": (ranked.get("gene", []) + ranked.get("genes", []))[:max_per_section],
        "symptom": ranked.get("symptom", [])[:max_per_section],
        "treatment": ranked.get("treatment", [])[:max_per_section],
        "outcome": ranked.get("outcome", [])[:max_per_section],
        "other": ranked.get("other", [])[:max_per_section],
    }

    out: Dict[str, List[Dict[str, Any]]] = {}
    for sec, items in sections.items():
        ev = []
        for it in items:
            for s in (it.get("snippets") or [])[:2]:
                ev.append({
                    "concept": it.get("name"),
                    "score": float(it.get("score") or 0.0),
                    "mentions": int(it.get("mentions") or 0),
                    "polarity": s.get("polarity") or "unclear",
                    "snippet": s.get("snippet") or "",
                    "cite": s.get("cite") or "",
                })
        out[sec] = ev
    return out

async def synthesize_report(condition: str, papers_screened: int, papers_relevant: int, papers: List[Dict[str, Any]], ranked: dict) -> str:
    evidence_by_section = _select_evidence_items(ranked, max_per_section=18)
    refs = _collect_unique_references(papers)

    ref_id_map: Dict[str, int] = {}
    numbered_refs = []
    for i, r in enumerate(refs, start=1):
        ref_id_map[r["cite"]] = i
        authors = ", ".join(r.get("authors") or []) or "UNKNOWN"
        journal = r.get("journal") or "UNKNOWN JOURNAL"
        year = r.get("year") or "n.d."
        doi = r.get("doi")
        doi_txt = f" doi: {doi}" if doi else ""
        numbered_refs.append(f"[{i}] {r.get('title') or '(untitled)'}, {authors}, {journal}, {year}.{doi_txt}")

    def format_evidence(items: List[Dict[str, Any]], max_items: int = 40) -> str:
        lines = []
        for it in items[:max_items]:
            rid = ref_id_map.get(it["cite"])
            rid_txt = f"[{rid}]" if rid is not None else "[?]"
            lines.append(
                f"- concept='{it['concept']}', evidence_score={it['score']:.2f}, mentions={it['mentions']}, polarity={it['polarity']} {rid_txt}\n"
                f"  snippet: {it['snippet']}"
            )
        return "\n".join(lines) if lines else "- (none)"

    prompt = "\n\n".join([
        f"CONDITION: {condition}",
        f"PAPERS_SCREENED: {papers_screened}",
        f"PAPERS_RELEVANT: {papers_relevant}",
        "REFERENCES (use these ids for citations):\n" + "\n".join(numbered_refs),
        "EVIDENCE ITEMS — definition:\n" + format_evidence(evidence_by_section.get("definition", [])),
        "EVIDENCE ITEMS — genes:\n" + format_evidence(evidence_by_section.get("gene", [])),
        "EVIDENCE ITEMS — symptoms:\n" + format_evidence(evidence_by_section.get("symptom", [])),
        "EVIDENCE ITEMS — treatments:\n" + format_evidence(evidence_by_section.get("treatment", [])),
        "EVIDENCE ITEMS — outcomes:\n" + format_evidence(evidence_by_section.get("outcome", [])),
        "EVIDENCE ITEMS — other/notes:\n" + format_evidence(evidence_by_section.get("other", [])),
    ])

    session_id = f"report-{uuid.uuid4().hex[:8]}"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session_id)
    runner = Runner(app_name=APP_NAME, agent=report_agent, session_service=session_service)

    events = await run_runner(runner, user_id=USER_ID, session_id=session_id, text=prompt)
    return extract_last_text(events)

async def run_condition_folder(condition: str, folder: str) -> Dict[str, Any]:
    pdfs = list_pdfs(folder)
    papers: List[Dict[str, Any]] = []
    papers_screened = 0
    papers_relevant = 0

    for idx, path in enumerate(pdfs, start=1):
        paper_id = f"paper_{idx:03d}"
        raw_text = pdf_to_text(path, max_pages=MAX_PAGES_PER_PDF)
        papers_screened += 1

        meta = await extract_metadata(paper_id, path)
        rel = await screen_relevance(condition, paper_id, meta.get("title") or paper_id, raw_text)

        if not rel.is_relevant or rel.relevance_score < 0.20:
            continue

        papers_relevant += 1
        extraction = await extract_paper(condition, paper_id, meta, raw_text, rel.relevance_score)
        papers.append(extraction.model_dump())

    ranked = aggregate_findings(papers)

    report_markdown = await synthesize_report(
        condition=condition,
        papers_screened=papers_screened,
        papers_relevant=papers_relevant,
        papers=papers,
        ranked=ranked,
    )

    return {
        "condition": condition,
        "papers_screened": papers_screened,
        "papers_relevant": papers_relevant,
        "papers": papers,
        "ranked": ranked,
        "report_markdown": report_markdown,
    }
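
The report prompt ties each evidence line to a numbered reference by looking up the line's citation string. A standalone sketch of that numbering follows. `number_references` and `evidence_line` are illustrative names, the citation strings are placeholders rather than real papers, and the line format is shortened compared with `format_evidence`.

```python
from typing import Dict, List

def number_references(cites: List[str]) -> Dict[str, int]:
    """Assign 1-based ids to citation strings in first-seen order, skipping duplicates."""
    ids: Dict[str, int] = {}
    for cite in cites:
        if cite not in ids:
            ids[cite] = len(ids) + 1
    return ids

def evidence_line(concept: str, snippet: str, cite: str, ids: Dict[str, int]) -> str:
    """Format one evidence item; an unknown citation is tagged [?] instead of a number."""
    rid = ids.get(cite)
    tag = f"[{rid}]" if rid is not None else "[?]"
    return f"- concept='{concept}' {tag}\n  snippet: {snippet}"

# Placeholder citation strings, not real references.
ids = number_references(["Paper A (2016)", "Paper B (2022)", "Paper A (2016)"])
print(ids)
print(evidence_line("phgdh", "Example snippet text.", "Paper B (2022)", ids))
print(evidence_line("psat1", "Example snippet text.", "Paper C (n.d.)", ids))
```

Since the lookup key is the full citation string, two papers whose title, authors, journal, year, and DOI all render identically would share one reference number.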


In [8]:

# -----------------------
# 7) Q&A over extracted evidence (unique session per question)
# -----------------------
def build_evidence_index(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    rows = []
    for p in papers:
        for f in (p.get("findings") or []):
            rows.append({
                "paper_id": p.get("paper_id"),
                "title": p.get("title"),
                "authors": p.get("authors") or [],
                "year": p.get("year"),
                "journal": p.get("journal"),
                "doi": p.get("doi"),
                "condition": p.get("condition"),
                "relevance_score": p.get("relevance_score", 0.0),
                "name": f.get("name"),
                "category": f.get("category"),
                "polarity": f.get("polarity"),
                "section": f.get("section"),
                "snippet": f.get("snippet"),
            })
    return rows

def _tokenize(s: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", (s or "").lower())

def retrieve_evidence(question: str, evidence_rows: List[Dict[str, Any]], top_k: int = 14) -> List[Dict[str, Any]]:
    qtok = set(_tokenize(question))
    if not qtok:
        return evidence_rows[:top_k]

    scored = []
    for r in evidence_rows:
        # `or ""` keeps None fields from being indexed as the literal token "none".
        blob = " ".join([
            str(r.get("name") or ""),
            str(r.get("category") or ""),
            str(r.get("snippet") or ""),
            str(r.get("title") or ""),
            str(r.get("journal") or ""),
        ]).lower()
        btok = set(_tokenize(blob))
        overlap = len(qtok & btok)
        boost = 0.0
        if (r.get("polarity") or "").lower() == "supports":
            boost += 0.25
        boost += 0.15 * float(r.get("relevance_score") or 0.0)
        score = overlap + boost
        if overlap > 0:
            scored.append((score, r))

    scored.sort(key=lambda x: x[0], reverse=True)
    return [r for _, r in scored[:top_k]]

async def ask_papers(question: str, result: Dict[str, Any], top_k: int = 14) -> str:
    papers = result.get("papers") or []
    evidence_rows = build_evidence_index(papers)
    top = retrieve_evidence(question, evidence_rows, top_k=top_k)

    lines = []
    for i, r in enumerate(top, start=1):
        authors = ", ".join(r.get("authors") or [])
        lines.append(
            f"""EVIDENCE_ITEM [{i}]
paper_id: {r.get('paper_id')}
title: {r.get('title')}
authors: {authors if authors else 'UNKNOWN'}
journal: {r.get('journal')}
year: {r.get('year')}
doi: {r.get('doi')}
category: {r.get('category')}
name: {r.get('name')}
polarity: {r.get('polarity')}
section: {r.get('section')}
snippet: {r.get('snippet')}
"""
        )

    qa_context = "\n".join(lines) if lines else "NO EVIDENCE ITEMS RETRIEVED."

    session_id = f"qa-{uuid.uuid4().hex[:8]}"
    await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session_id)
    runner = Runner(app_name=APP_NAME, agent=qa_agent, session_service=session_service)

    prompt = f"""USER QUESTION:
{question}

{qa_context}
"""
    events = await run_runner(runner, user_id=USER_ID, session_id=session_id, text=prompt)
    return extract_last_text(events)
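
Retrieval here is exact token overlap plus a small boost, with no stemming or synonym handling. The sketch below isolates the per-row score so the effect of question wording is visible. `row_score` and `tokens` are illustrative names and both evidence rows are made up.

```python
import re
from typing import Any, Dict, Set

def tokens(s: str) -> Set[str]:
    """Lower-case alphanumeric tokens, as in the notebook's tokenizer."""
    return set(re.findall(r"[a-z0-9]+", (s or "").lower()))

def row_score(question: str, row: Dict[str, Any]) -> float:
    """Token overlap plus polarity and relevance boosts; 0.0 when nothing overlaps."""
    fields = ("name", "category", "snippet", "title", "journal")
    blob = " ".join(str(row.get(k) or "") for k in fields)
    overlap = len(tokens(question) & tokens(blob))
    if overlap == 0:
        return 0.0  # rows that share no token with the question are not retrieved
    boost = 0.25 if (row.get("polarity") or "").lower() == "supports" else 0.0
    boost += 0.15 * float(row.get("relevance_score") or 0.0)
    return overlap + boost

# Made-up evidence rows for illustration only.
rows = [
    {"name": "PHGDH", "category": "gene", "snippet": "PHGDH variants cause serine deficiency.",
     "polarity": "supports", "relevance_score": 0.9},
    {"name": "l-serine supplementation", "category": "treatment", "snippet": "Oral l-serine was given.",
     "polarity": "supports", "relevance_score": 0.9},
]

print(row_score("Is there any therapy recommended?", rows[1]))
print(row_score("Which treatment was given?", rows[1]))
```

In this made-up case the word "therapy" never matches a row labelled "treatment", so the first question scores zero against the treatment row. Rephrasing with the category vocabulary (symptom, treatment, gene, outcome) may retrieve more relevant evidence.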


In [9]:

# -----------------------
# 8) How to run (example)
# -----------------------
# If your notebook doesn't support top-level `await`, uncomment:
# import nest_asyncio
# nest_asyncio.apply()

# result = await run_condition_folder("L-serine deficiency disorder", PAPERS_DIR)
# print(result["papers_screened"], result["papers_relevant"])
# print(result["report_markdown"][:4000])

# Q&A examples:
# print(await ask_papers("What genes are linked to serine deficiency disorder?", result))
# print(await ask_papers("Is there any therapy recommended?", result))
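
Outside a notebook there is no running event loop, so the same calls can be wrapped in a coroutine and started with `asyncio.run`. The sketch below uses a stub in place of the real pipeline so it runs without ADK or any PDFs. `run_condition_folder_stub` and the `papers/` path are placeholders, not part of the notebook.

```python
import asyncio
from typing import Any, Dict

async def run_condition_folder_stub(condition: str, folder: str) -> Dict[str, Any]:
    """Stand-in for run_condition_folder; returns an empty result of the same shape."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real awaitable call would
    return {
        "condition": condition,
        "papers_screened": 0,
        "papers_relevant": 0,
        "papers": [],
        "ranked": {},
        "report_markdown": "",
    }

async def main() -> Dict[str, Any]:
    # In a real script this would await run_condition_folder(condition, PAPERS_DIR).
    return await run_condition_folder_stub("L-serine deficiency disorder", "papers/")

if __name__ == "__main__":
    result = asyncio.run(main())
    print(result["papers_screened"], result["papers_relevant"])
```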


In [10]:
result = await run_condition_folder("L-serine deficiency disorder", PAPERS_DIR)
print(result["papers_screened"], result["papers_relevant"])
print(result["report_markdown"][:4000])

11 6
# Genex Evidence Report: L-serine Deficiency Disorder

## 1) Executive Summary
L-serine deficiency disorder encompasses a range of conditions resulting from primary defects in the synthesis of L-serine, crucial for neurological health. It includes a spectrum that ranges from prenatal lethal conditions, such as Neu-Laxova syndrome, to forms manifesting at various ages, including juvenile and adult onset [1], [6]. Genetic mutations contributing to this disorder primarily involve three genes: **PHGDH** (3-phosphoglycerate dehydrogenase), **PSAT1** (phosphoserine aminotransferase), and **PSPH** (phosphoserine phosphatase). Identifying biallelic pathogenic variants in these genes is essential for diagnosis and understanding the disorder's manifestation [1], [6].

## 2) Symptoms Supported by Literature
Symptoms associated with L-serine deficiency disorder, ranked by evidence strength, include:
1. **Neu-Laxova Syndrome**: Characterized by severe intrauterine growth deficiency, microcepha

In [11]:
print(await ask_papers("What genes are linked to serine deficiency disorder?", result))

1) The genes linked to serine deficiency disorder are **PHGDH**, **PSAT1**, and **PSPH**.

2) Evidence:
   - "The diagnosis of a serine deficiency disorder is established in a proband with biallelic pathogenic variants in **PHGDH, PSAT1, or PSPH** identified by molecular genetic testing." [2]
   - "Patient 1 had serine deficiency due to **3-phosphoglycerate dehydrogenase (PHGDH)** deficiency." [3]
   - "Patient 2 had serine deficiency due to **phosphoserine aminotransferase (PSAT1)** deficiency." [4]

3) References:
1. Serine Deficiency Disorders; van der Crabben, SN, de Koning, TJ; GeneReviews; 2023.
2. Two new cases of serine deficiency disorders treated with l-serine; Brassier, A, Valayannopoulos, V, et al.; Eur J Paediatr Neurol; 2016; DOI: 10.1016/j.ejpn.2015.10.007.


In [12]:
print(await ask_papers("Is there any therapy recommended?", result))

1) Not found in the provided papers.

2) 
- "Neu-Laxova syndrome is characterized by severe intrauterine growth deficiency, microcephaly, congenital bilateral cataracts, characteristic dysmorphic features, limb anomalies, and collodion-like ichthyosis." [1]
- "The diagnosis of a serine deficiency disorder is established in a proband with biallelic pathogenic variants in PHGDH, PSAT1, or PSPH identified by molecular genetic testing." [2]
- "Adult-onset serine deficiency is characterized by progressive axonal polyneuropathy with ataxia and possible cognitive impairment." [3]

3) 
1. Serine Deficiency Disorders, van der Crabben, SN, de Koning, TJ, GeneReviews, 2023.


In [13]:
print(await ask_papers("How does a baby get serine deficiency?", result))

1) Answer  
A baby can develop serine deficiency due to genetic factors that influence its metabolism, such as biallelic pathogenic variants in genes like PHGDH, PSAT1, or PSPH, which play critical roles in serine biosynthesis [2]. Additionally, reduced serine levels can be observed in fetal cord blood, indicating that the deficiency may arise during pregnancy [14].

2) Evidence  
- "The diagnosis of a serine deficiency disorder is established in a proband with biallelic pathogenic variants in PHGDH, PSAT1, or PSPH identified by molecular genetic testing." [2]  
- "Reduced serine levels in fetal cord blood may also be diagnostic as early as 30 weeks of pregnancy." [14]  

3) References  
1. Serine Deficiency Disorders, van der Crabben, SN, de Koning, TJ, GeneReviews, 2023.  
2. Two new cases of serine deficiency disorders treated with l-serine, Brassier, A et al., Eur J Paediatr Neurol, 2016; DOI: 10.1016/j.ejpn.2015.10.007.  
3. Juvenile-onset PSAT1-related neuropathy: A milder phenot

In [15]:
print(await ask_papers("What if both parents have one copy of any of the genes related to serine deficiency mutation on the same copy?", result))

1) If both parents have one copy of any gene related to serine deficiency mutation on the same copy, the offspring may have a risk of inheriting homozygous mutations, leading to serine deficiency disorders. Such disorders can manifest in various clinical phenotypes, including potential neurological and cognitive impairments.

2) Evidence:
- "A homozygous variant c.43G > C (p.A15P) in the PSAT1 gene was identified in both patients." [1]
- "Serine deficiency disorders include a spectrum of disease ranging from lethal prenatal-onset Neu-Laxova syndrome to serine deficiency with infantile, juvenile, or adult onset." [6]
- "The diagnosis of a serine deficiency disorder is established in a proband with biallelic pathogenic variants in PHGDH, PSAT1, or PSPH identified by molecular genetic testing." [7]

3) References:
1. Juvenile-onset PSAT1-related neuropathy: A milder phenotype of serine deficiency disorder, Shen, Yu et al., Frontiers in Genetics, 2022; DOI: 10.3389/fgene.2022.949038
2. Seri