In [1]:
from smolagents import ToolCollection, ToolCallingAgent, OpenAIServerModel
from mcp import StdioServerParameters
import os
import json

In [2]:
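# Read the API key with a context manager so the file handle is closed promptly.
with open('secret.txt', 'r') as key_file:
    os.environ['NEBIUS_API_KEY'] = key_file.read().strip()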

In [3]:
MODEL_T = "Qwen/Qwen3-235B-A22B-Thinking-2507"
MODEL_I = "Qwen/Qwen3-235B-A22B-Instruct-2507"

model_t = OpenAIServerModel(
    model_id=MODEL_T,
    api_key=os.environ["NEBIUS_API_KEY"],
    api_base="https://api.studio.nebius.com/v1/",
    temperature=0,
)

model_i = OpenAIServerModel(
    model_id=MODEL_I,
    api_key=os.environ["NEBIUS_API_KEY"],
    api_base="https://api.studio.nebius.com/v1/",
    temperature=0,
)

In [8]:
uniprot_output = '''
1. **Canonical sequence and isoforms**  
   - Canonical sequence: UniProtKB accession P10912, length 638 amino acids.  
     FASTA: [https://www.uniprot.org/uniprotkb/P10912/entry](https://www.uniprot.org/uniprotkb/P10912/entry)  
   - Isoform 1 (GHRfl): Accession P10912-1, length 638.  
   - Isoform 2 (GHRtr, GHR1-279): Accession P10912-2, length 295.  
   - Isoform 3 (GHR1-277): Accession P10912-3, length 294.  
   - Isoform 4 (GHRd3): Accession P10912-4, length 638.

2. **Functional sequence intervals**  
   - Signal peptide: 1–18.  
   - Extracellular domain: 19–264; contains ligand-binding region.  
   - Transmembrane domain: 265–288; helical.  
   - Cytoplasmic domain: 289–638.  
   - Fibronectin type-III domain: 151–254; structural domain in extracellular region.  
   - WSXWS motif: 240–244; required for proper protein folding and cell-surface receptor binding.  
   - Box 1 motif: 297–305; required for JAK2 interaction and activation.  
   - UbE motif: 340–349; required for ubiquitin conjugation system recruitment and receptor internalization.  
   - Disordered region: 353–391; contains compositional bias for basic and acidic residues.  
   - Region critical for ADAM17-mediated proteolysis: 260–262.

3. **Natural variants and mutagenesis data**  
   - C56S → Laron syndrome; disrupts disulfide bond.  
   - S58L → Laron syndrome.  
   - E62K → partial growth hormone insensitivity.  
   - W68R → Laron syndrome.  
   - R89K → Laron syndrome.  
   - F114S → Laron syndrome; loss of ligand binding.  
   - V143A → Laron syndrome.  
   - P149Q → Laron syndrome; disrupts GH binding.  
   - V162D/F/I → Laron syndrome or idiopathic short stature.  
   - D170H → Laron syndrome; abolishes receptor homodimerization.  
   - I171T → Laron syndrome; nearly abolishes GH binding.  
   - Q172P → Laron syndrome; nearly abolishes GH binding.  
   - V173G → Laron syndrome; nearly abolishes GH binding.  
   - R179C/H → Laron syndrome or partial insensitivity.  
   - Y226C → Laron syndrome.  
   - R229G/H → Laron syndrome or idiopathic short stature.  
   - E242D → idiopathic short stature.  
   - S244I → Laron syndrome.  
   - D262N → Laron syndrome.  
   - I544L → associated with familial hypercholesterolemia phenotype.  
   - Y487F → increased signaling due to reduced ubiquitination.  
   - Y595F → increased signaling due to reduced ubiquitination.

4. **Post-translational modifications (PTM)**  
   - N-linked glycosylation at N46, N115, N156, N161, N200.  
   - Phosphoserine at S341.  
   - Phosphotyrosine at Y487 and Y595; mediated by JAK2 upon GH binding.  
   - Ubiquitination following JAK2-mediated phosphorylation; leads to proteasomal degradation via ECS(SOCS2) complex.

5. **Cross-references**  
   - HGNC: HGNC:4263 → official gene nomenclature.  
   - Ensembl: ENSG00000112964 → genomic context and transcript variants.  
   - PDB: 1AXI, 1HWG, 2AEW, 5OHD, 6I5J → 3D structures of extracellular and transmembrane domains, and SOCS2 complex.  
   - Reactome: R-HSA-982772 → growth hormone receptor signaling pathway.  
   - Gene Ontology: GO:0004903 → growth hormone receptor activity; GO:0060396 → growth hormone receptor signaling pathway.  
   - KEGG: hsa:2690 → pathway mapping in human.

6. **Subcellular location**  
   - Cell membrane; single-pass type I membrane protein.  
   - Upon GH binding, internalized and ubiquitinated for degradation.  
   - Soluble form (GHBP) is secreted and binds circulating GH.  
   - Isoform 2 remains fixed at the cell membrane and is not internalized.

7. **Longevity relevance**  
   - Info unavailable.

'''

In [11]:
opengenes_output = '''
Based on the OpenGenes database, here is the comprehensive information for the GHR gene and its related genes (GHRH and GHRHR) across different species:

### Longevity/Lifespan/Healthspan Association

The GHR gene and its related genes (GHRH, GHRHR) are strongly associated with longevity and lifespan extension:

- **GHR (Growth Hormone Receptor)**: Loss of function mutations in GHR significantly increase lifespan in mice. Studies show lifespan increases ranging from 9% to 68.2% mean increase, with some studies showing up to 31.6% maximum increase. The gene is associated with improved insulin sensitivity and glucose metabolism.

- **GHRH (Growth Hormone Releasing Hormone)**: Knockout of GHRH increases lifespan in mice by 36.9-49.1% median increase, with some studies showing up to 140% minimum increase.

- **GHRHR (Growth Hormone Releasing Hormone Receptor)**: Mutations in GHRHR increase lifespan in mice by 38.9% median increase and 38.0% maximum increase.


All three genes are linked to the INS/IGF-1 pathway dysregulation, a well-known longevity pathway.

### Modification Effects

- **GHR**: Loss of function modifications (gene knockout) consistently increase lifespan in mice across multiple studies and genetic backgrounds. The effect is more pronounced in males than females in some studies.

- **GHRH**: Loss of function (gene knockout) increases lifespan in both male and female mice, with effects observed under both ad libitum and calorie-restricted diets.

- **GHRHR**: Gene modification leading to loss of function increases lifespan in mice.

### Known Genetic Interventions

- Gene knockout of GHR in various mouse strains (C57BL/6J, Ola-BALB/cJ, 129/OlaHsd x BALB/c)
- Tissue-specific knockout of GHR in muscle tissue
- Inducible knockout of GHR using tamoxifen
- Gene modification of GHRHR (lit/lit genotype)
- Gene knockout of GHRH

### Disease Involvement

While not explicitly detailed in the database, the GHR pathway is known to be involved in:
- Laron syndrome (GHR deficiency)
- Insulin resistance and type 2 diabetes (due to effects on glucose metabolism)
- Cancer (IGF-1 pathway involvement)
- Age-related diseases due to its central role in the growth hormone/IGF-1 axis

### Table of SNPs

| rsID | Polymorphism type | Association type | Significance | Research type | Link to details |
|------|-------------------|------------------|------------|---------------|----------------|
| rs4130113 | SNP | Longevity association | Significant | candidate genes study | [10.1093/gerona/glx247](https://doi.org/10.1093/gerona/glx247) |
| rs2940923 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs2940935 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs3764451 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs12153009 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs6883523 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs4866941 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs4292454 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| rs6887528 | SNP | Longevity association | Significant | candidate genes study | [10.1111/j.1474-9726.2009.00493.x](https://doi.org/10.1111/j.1474-9726.2009.00493.x) |
| d3-GHR | In/Del | Longevity association | Significant | candidate genes study | [10.1126/sciadv.1602025](https://doi.org/10.1126/sciadv.1602025) |

*Note: The table includes all SNPs associated with GHR in the database. The same SNPs appear multiple times as they were studied in different ethnic populations (Caucasian, Ashkenazi Jewish, Amish, French, Japanese). The d3-GHR polymorphism is an exon 3 deletion in the GHR gene that has been associated with longevity.*

'''
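The SNP table in the OpenGenes text is plain Markdown, so its rows can be pulled out for later checks (for example, to list the rsIDs that the final article should cite). The following is a minimal sketch, not part of the original notebook: `parse_markdown_table` and the trimmed `sample_table` are hypothetical, and the parser assumes no cell contains a literal `|`.

```python
def parse_markdown_table(text):
    """Parse the first pipe-delimited Markdown table in `text` into a list of dicts."""
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith("|")]
    if len(rows) < 2:
        return []

    def split(row):
        # Drop the outer pipes, then split on the inner ones.
        return [cell.strip() for cell in row.strip("|").split("|")]

    header = split(rows[0])
    # rows[1] is the |---|---| separator line, so data starts at rows[2].
    return [dict(zip(header, split(row))) for row in rows[2:]]


# Hypothetical, trimmed stand-in for the table inside opengenes_output.
sample_table = """
| rsID | Polymorphism type | Link to details |
|------|-------------------|-----------------|
| rs4130113 | SNP | [10.1093/gerona/glx247](https://doi.org/10.1093/gerona/glx247) |
| d3-GHR | In/Del | [10.1126/sciadv.1602025](https://doi.org/10.1126/sciadv.1602025) |
"""

records = parse_markdown_table(sample_table)
print([r["rsID"] for r in records])
```

Calling `parse_markdown_table(opengenes_output)` should work the same way on the full string, provided the table keeps this layout.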

In [9]:
kegg_output = '''
{'query': 'GHR',
 'kegg': {'entry': {'hsa_id': 'hsa:2690',
   'symbol': 'GHR',
   'name': 'growth hormone receptor',
   'ko': 'K05080',
   'organism': 'Homo sapiens',
   'position_text': None,
   'strand': None,
   'start': None,
   'end': None,
   'notes': 'The GHR gene encodes the growth hormone receptor, which binds growth hormone and initiates signaling cascades involved in growth, metabolism, and cell proliferation. It is a member of the cytokine receptor family and activates the JAK-STAT pathway. Mutations in this gene are associated with Laron syndrome, a form of growth hormone insensitivity.'},
  'pathways': [{'map_id': 'hsa04935',
    'title': 'Growth hormone synthesis, secretion and action - Homo sapiens (human)',
    'map_url': 'https://www.kegg.jp/pathway/hsa04935',
    'image_url': 'https://www.kegg.jp/kegg/pathway/hsa04935.png',
    'notes': 'GHR mediates the action of growth hormone in target tissues, regulating postnatal growth and metabolic processes. This pathway includes synthesis, secretion, and downstream signaling of growth hormone.'},
   {'map_id': 'hsa04060',
    'title': 'Cytokine-cytokine receptor interaction - Homo sapiens (human)',
    'map_url': 'https://www.kegg.jp/pathway/hsa04060',
    'image_url': 'https://www.kegg.jp/kegg/pathway/hsa04060.png',
    'notes': 'GHR is a member of the cytokine receptor superfamily and participates in cytokine signaling by binding growth hormone, a cytokine-like hormone, and activating intracellular JAK-STAT signaling.'},
   {'map_id': 'hsa04630',
    'title': 'JAK-STAT signaling pathway - Homo sapiens (human)',
    'map_url': 'https://www.kegg.jp/pathway/hsa04630',
    'image_url': 'https://www.kegg.jp/kegg/pathway/hsa04630.png',
    'notes': 'GHR activates the JAK2-STAT5 signaling cascade upon ligand binding. This pathway is central to growth, immune function, and metabolism. Dysregulation is linked to growth disorders and cancer.'}],
  'diseases': [{'entry_id': 'H02037',
    'name': 'Laron syndrome',
    'description': 'Laron syndrome is an autosomal recessive disorder characterized by growth hormone insensitivity, leading to severe postnatal growth failure, distinctive facial features, and metabolic abnormalities. It results from mutations in the GHR gene that impair receptor function.',
    'brite': ['Human Diseases [BR:br08401]',
     'Endocrine and metabolic diseases [BR:br08402]'],
    'urls': ['https://www.kegg.jp/entry/H02037'],
    'notes': 'Loss-of-function mutations in GHR cause Laron syndrome, highlighting the critical role of GHR in mediating growth hormone effects on growth and metabolism.'}],
  'drugs': [{'entry_id': 'D11486',
    'name': 'Lonapegsomatropin',
    'class': ['Hormone [ATC:A02BC]', 'Growth hormone [ATC:H01AC]'],
    'efficacy': 'Recombinant human growth hormone used to treat growth failure due to insufficient endogenous growth hormone.',
    'targets': [{'gene': 'hsa:2690', 'symbol': 'GHR', 'ko': 'K05080'}],
    'pathways': ['hsa04935'],
    'structure_image_url': 'https://www.kegg.jp/ligand/D11486',
    'is_target_of_gene': False,
    'notes': 'Lonapegsomatropin is a long-acting growth hormone analog that binds to GHR, mimicking endogenous GH activity. It is used in pediatric growth disorders, including those related to GHR dysfunction.'}],
  'modules': [],
  'ssdb': {'orthologs_top10': [], 'paralogs': []},
  'sources': ['https://www.kegg.jp/entry/hsa:2690',
   'https://www.kegg.jp/entry/K05080',
   'https://www.kegg.jp/pathway/hsa04935',
   'https://www.kegg.jp/pathway/hsa04060',
   'https://www.kegg.jp/pathway/hsa04630',
   'https://www.kegg.jp/entry/H02037',
   'https://www.kegg.jp/entry/D11486']}}

'''
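Note that `kegg_output` is the text of a Python dict repr (single quotes, `None`, `False`), not JSON, so `json.loads` would reject it. A minimal sketch of reading it back into a dict with `ast.literal_eval` follows; `parse_kegg_repr`, `pathway_ids`, and the trimmed `sample_kegg_output` are hypothetical names introduced only for illustration.

```python
import ast

# Hypothetical, trimmed stand-in for the kegg_output string defined above.
sample_kegg_output = """
{'query': 'GHR',
 'kegg': {'entry': {'hsa_id': 'hsa:2690', 'symbol': 'GHR', 'start': None},
  'pathways': [{'map_id': 'hsa04935'},
   {'map_id': 'hsa04630'}],
  'drugs': [{'entry_id': 'D11486', 'is_target_of_gene': False}]}}
"""


def parse_kegg_repr(text):
    """Parse a Python-literal dict repr into a dict without executing code."""
    return ast.literal_eval(text.strip())


def pathway_ids(kegg):
    """Collect the KEGG pathway map IDs, tolerating missing keys."""
    return [p["map_id"] for p in kegg.get("kegg", {}).get("pathways", [])]


kegg = parse_kegg_repr(sample_kegg_output)
print(pathway_ids(kegg))
```

`ast.literal_eval` only accepts literals, so it is a safer choice than `eval` for tool output of this kind.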

In [12]:
summarizer_prompt = f"""
You are a bioinformatics summarization agent specialized in the **Longevity Sequence-to-Function Knowledge Base**.
You will receive as input structured outputs (Markdown or JSON-like text) from three data sources:
- **UniProt MCP output** (protein sequence, domains, motifs, variants, PTMs)
- **KEGG MCP output** (pathways, molecular functions, regulatory networks)
- **OpenGenes MCP output** (longevity associations, interventions, model organism data)

---

### 🎯 TASK
Integrate and summarize the information into a **single, human-readable scientific article in Markdown (.md)** format, following the structure below.

Each section should include concise yet informative text suitable for a WikiCrow-style gene/protein entry.
Where available, include UniProt, KEGG, and OpenGenes IDs and URLs.
If any data source is missing, gracefully skip the section without placeholders.

---

## 🧬 1. Gene / Protein Overview
- **Gene Symbol / Name:** from UniProt or KEGG
- **Protein Name:** official name (UniProt)
- **Identifiers:** UniProt ID, KEGG ID, Gene ID, HGNC ID, Ensembl ID (if available)
- **Organism:** Homo sapiens (unless otherwise specified)
- **Sequence Links:**  
  - [Protein (UniProt)](link)  
  - [DNA / mRNA (RefSeq or Ensembl)](link)

---

## 🔬 2. Structure and Functional Domains
- **Protein Length:** (e.g., 605 amino acids)
- **Key Domains / Motifs:** (e.g., Neh1–Neh7 domains, bZIP region, ETGE/DLG motifs)
- **Functional Roles:** summarized from UniProt and KEGG functional annotations
- **Post-Translational Modifications (PTMs):** phosphorylation, ubiquitination, etc.
- **Orthologs / Paralogs:** from KEGG or UniProt cross-refs; include species and % identity

---

## ⚙️ 3. Sequence-to-Function Relationships
| Interval | Type of Modification | Experimental Effect | Functional Outcome | Source |
|-----------|---------------------|---------------------|--------------------|--------|
| 16–32     | ETGE motif mutation | KEAP1 binding loss  | Constitutive NRF2 activation | UniProt |
| 525–550   | Neh1 domain deletion | Loss of DNA binding | Reduced antioxidant response | Literature |

- Use data from UniProt and KEGG to describe regions where amino acid changes or truncations alter protein function.
- Highlight experimentally confirmed relationships (e.g., domain deletions, point mutations, or chimeric constructs).

---

## 🧠 4. Pathways and Functional Networks
- Extract from KEGG:
  - Pathways the protein is involved in (e.g., oxidative stress response, metabolism, reprogramming)
  - Interaction partners (if listed)
- Provide KEGG pathway map links and summarize biological roles.

---

## 🧓 5. Longevity and Aging Associations
From **OpenGenes** (and KEGG if relevant):
- Known longevity associations (pro- or anti-longevity)
- Key experiments in model organisms (C. elegans, Drosophila, mice, etc.)
- Human genetic or population associations (e.g., APOE2, FOXO3 variants)
- Known interventions (overexpression, knockdown, CRISPR, pharmacological)

Example table:

| Model | Intervention | Result | Reference |
|--------|--------------|--------|------------|
| C. elegans (skn-1) | Overexpression | ↑ Lifespan + oxidative stress resistance | PMID: 28612944 |
| Mouse (Nrf2 knockout) | Loss of function | ↓ Lifespan, ↑ inflammation | PMID: ... |

---

## 💊 6. Small Molecule and Drug Interactions
From KEGG or UniProt:
- Known small-molecule modulators, inducers, inhibitors.
- Mechanisms (binding, phosphorylation, inhibition of degradation, etc.)
- Example: *Sulforaphane* → disrupts KEAP1-NRF2 binding → activates antioxidant response.

---

## 🌍 7. Evolutionary Conservation
- Conservation of sequence motifs and domains across species.
- Note orthologs (e.g., SKN-1 in *C. elegans*, CncC in *Drosophila*).
- Discuss conservation of longevity-related functions.

---

## 📚 8. References
List all provided reference links and IDs from the source data (PMIDs, DOIs, KEGG URLs, UniProt links, OpenGenes pages).

---

### OUTPUT REQUIREMENTS
- Format: **Markdown (.md)**, structured exactly as above.
- Tone: neutral, scientific, Wikipedia-style.
- Include inline citations or links to source databases whenever possible.
- Avoid speculation or unverified claims.

---

### INPUT
UniProt data:
{uniprot_output}

KEGG data:
{kegg_output}

OpenGenes data:
{opengenes_output}

---

### OUTPUT
Return only the final Markdown article.
"""

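The prompt asks the model to skip a section when a data source is missing, yet the f-string above always emits all three `... data:` labels, even if a tool returned nothing. One possible way to handle that, sketched here under the assumption that a missing source shows up as `None` or an empty string, is to build the INPUT block only from sources that have content. `build_input_block` is a hypothetical helper, not part of the notebook.

```python
def build_input_block(sources):
    """Render 'Label data:' sections only for sources that returned text."""
    parts = []
    for label, text in sources.items():
        if text and text.strip():
            parts.append(f"{label} data:\n{text.strip()}")
    return "\n\n".join(parts)


block = build_input_block({
    "UniProt": "P10912, length 638",
    "KEGG": "",         # empty: the tool returned nothing
    "OpenGenes": None,  # missing: the tool call failed
})
print(block)
```

The resulting string could then replace the three hard-coded placeholders in the INPUT part of the prompt.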

In [13]:
agent = ToolCallingAgent(
    model=model_t,
    tools=[],
    add_base_tools=False,
    max_steps=1,
)
# agent.prompt_templates["system_prompt"] = SYSTEM_PROMPT
result = agent.run(summarizer_prompt)

In [15]:
print(result)

# 🧬 1. Gene / Protein Overview
- **Gene Symbol / Name:** GHR (Growth Hormone Receptor)
- **Protein Name:** Growth hormone receptor
- **Identifiers:** 
  - UniProt ID: [P10912](https://www.uniprot.org/uniprotkb/P10912/entry)
  - KEGG ID: [hsa:2690](https://www.kegg.jp/entry/hsa:2690)
  - HGNC: [4263](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:4263)
  - Ensembl ID: [ENSG00000112964](https://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000112964)
- **Organism:** *Homo sapiens*
- **Sequence Links:**  
  - [Protein (UniProt)](https://www.uniprot.org/uniprotkb/P10912/entry)  
  - [DNA / mRNA (Ensembl)](https://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000112964)

---

## 🔬 2. Structure and Functional Domains
- **Protein Length:** 638 amino acids (canonical isoform)
- **Key Domains / Motifs:**
  - Signal peptide (1–18)
  - Extracellular domain (19–264) containing fibronectin type-III domain (151–254) and WSXWS motif (240–244)
  - Transmembrane domain (265–2

In [16]:
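# Write as UTF-8 explicitly: the article contains emoji and en dashes,
# which can fail under a non-UTF-8 platform default encoding.
with open('GHR.md', "w", encoding="utf-8") as md_file:
    md_file.write(str(result))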
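Before saving, it may help to check which of the eight numbered sections from the prompt template actually appear in the generated article. The sketch below is one way to do that; `numbered_sections`, `missing_sections`, and the `sample` string are hypothetical, and the regex assumes headings of the form `## <emoji> N. Title`, with the emoji optional.

```python
import re

EXPECTED_SECTIONS = range(1, 9)  # the prompt template defines sections 1 to 8

# Heading marker, optional emoji token, then "N. " capturing the section number.
HEADING = re.compile(r"^#{1,6}\s+(?:\S+\s+)?(\d+)\.\s", re.MULTILINE)


def numbered_sections(markdown):
    """Return section numbers found in headings, in document order."""
    return [int(m.group(1)) for m in HEADING.finditer(markdown)]


def missing_sections(markdown):
    """Return expected section numbers that have no heading in the article."""
    found = set(numbered_sections(markdown))
    return [n for n in EXPECTED_SECTIONS if n not in found]


sample = (
    "# 🧬 1. Gene / Protein Overview\n- x\n\n"
    "## 🔬 2. Structure and Functional Domains\n- y\n"
)
print(missing_sections(sample))
```

Since the prompt allows the model to skip sections when a source is absent, a non-empty result is a prompt for manual review rather than proof of an error.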

### Test

In [17]:
%load_ext autoreload
%autoreload 2

In [18]:
from agg import run_query

In [19]:
output = run_query(uniprot_output, kegg_output, opengenes_output)

In [20]:
print(output)

# 🧬 1. Gene / Protein Overview
- **Gene Symbol / Name:** GHR (Growth Hormone Receptor)
- **Protein Name:** Growth hormone receptor
- **Identifiers:**
  - UniProt ID: [P10912](https://www.uniprot.org/uniprotkb/P10912/entry)
  - KEGG ID: [hsa:2690](https://www.kegg.jp/entry/hsa:2690)
  - HGNC: [HGNC:4263](https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:4263)
  - Ensembl ID: [ENSG00000112964](https://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000112964)
- **Organism:** *Homo sapiens*
- **Sequence Links:**  
  - [Protein (UniProt)](https://www.uniprot.org/uniprotkb/P10912/entry)  
  - [DNA / mRNA (Ensembl)](https://www.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000112964)

---

## 🔬 2. Structure and Functional Domains
- **Protein Length:** 638 amino acids (canonical isoform)
- **Key Domains / Motifs:**
  - Signal peptide (1–18)
  - Extracellular domain (19–264) containing ligand-binding region
  - Fibronectin type-III domain (151–254)
  - WSXWS motif (240–244; c