# PubMed Tools Pipeline Test

This notebook tests the full pipeline: **Search → Fetch → Screen**

The pipeline screens papers for sequence→function relationships using only:
- Paper titles
- MeSH terms (keywords)
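
As a rough sketch of how the three stages fit together, the snippet below wires them into a single function. The stage functions are passed in as arguments, and the `fake_*` stand-ins are hypothetical placeholders for illustration only. They are not the real `src.tools` implementations, and their return shapes simply mirror what this notebook relies on.

```python
from typing import Callable, Dict, List


def run_pipeline(
    query: str,
    search: Callable[..., List[str]],
    fetch: Callable[[str], List[Dict]],
    screen: Callable[..., Dict],
    max_results: int = 10,
) -> List[Dict]:
    """Run Search -> Fetch -> Screen and return the papers judged relevant."""
    pmids = search(query, max_results=max_results)
    # The fetch step in this notebook takes a newline-separated PMID string.
    papers = fetch("\n".join(pmids))
    relevant = []
    for paper in papers:
        # Screening sees only the title and MeSH terms, never the abstract.
        result = screen(
            title=paper.get("title", ""),
            keywords=paper.get("mesh_terms", []),
        )
        if result["relevant"]:
            relevant.append({**paper, "screening_score": result["score"]})
    return relevant


# Hypothetical stand-ins so the sketch runs without network or model access.
def fake_search(query, max_results=10):
    return ["111", "222"][:max_results]


def fake_fetch(pmid_string):
    return [
        {"pmid": p, "title": f"Paper {p}", "mesh_terms": []}
        for p in pmid_string.split("\n")
    ]


def fake_screen(title, keywords):
    relevant = title.endswith("111")
    return {
        "relevant": relevant,
        "score": 0.8 if relevant else 0.2,
        "reasoning": "stub",
    }


demo = run_pipeline("IGF1R aging", fake_search, fake_fetch, fake_screen)
print([p["pmid"] for p in demo])
```

Passing the stages in as callables keeps the sketch testable offline; in the notebook itself the real `search_pubmed`, `fetch_abstracts`, and `screen_paper_by_metadata` play these roles.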

## Import Required Libraries

In [1]:
# IMPORTANT: If you've changed the model in .env, restart the kernel before running!
# Kernel -> Restart Kernel

import json
from src.tools.pubmed_search import search_pubmed
from src.tools.pubmed_fetch import fetch_abstracts
from src.tools.screening import screen_paper_by_metadata

# Verify the model being used
from src.config import NEBIUS_MODEL
print(f"✓ Using model: {NEBIUS_MODEL}")

✓ Using model: meta-llama/Llama-3.3-70B-Instruct


## Configure Search Parameters

In [8]:
# Customize these parameters
SEARCH_QUERY = "IGF1R aging"
MAX_RESULTS = 10

## Step 1: Search PubMed

In [9]:
print("=" * 70)
print(f"STEP 1: Searching PubMed for '{SEARCH_QUERY}'")
print("=" * 70)

pmids = search_pubmed(SEARCH_QUERY, max_results=MAX_RESULTS)
print(f"\nFound {len(pmids)} PMIDs")

# Treat an empty list or an "Error..." first entry as a failed search
if not pmids or pmids[0].startswith("Error"):
    print("\n❌ Search failed!")
    raise RuntimeError("Search failed")

print(f"✓ Search successful! Got {len(pmids)} PMIDs")
print(f"\nFirst 5 PMIDs: {pmids[:5]}")

STEP 1: Searching PubMed for 'IGF1R aging'

Found 10 PMIDs
✓ Search successful! Got 10 PMIDs

First 5 PMIDs: ['35857466', '16123266', '37527036', '37441495', '30026579']
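
The check in Step 1 only inspects the first element of the returned list. Since PMIDs are numeric identifiers, a stricter validation is possible. The helper below is a hypothetical sketch (`is_valid_pmid_list` is not part of `src.tools`) that rejects an empty list or any entry that is not purely ASCII digits.

```python
from typing import List


def is_valid_pmid_list(pmids: List[str]) -> bool:
    """Return True only if the list is non-empty and every entry is all digits."""
    return bool(pmids) and all(
        isinstance(p, str) and p.isascii() and p.isdigit() for p in pmids
    )


print(is_valid_pmid_list(["35857466", "16123266"]))
print(is_valid_pmid_list(["Error: request timed out"]))
```

This catches an error string anywhere in the list, not just in position 0.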


## Step 2: Fetch Paper Metadata

In [10]:
print("\n" + "=" * 70)
print(f"STEP 2: Fetching metadata for {len(pmids)} papers")
print("=" * 70)

# Convert list to newline-separated string (as the tool expects)
pmid_string = "\n".join(pmids)
papers = fetch_abstracts(pmid_string)

# A failed fetch is reported as an "error" key on the first record
if papers and "error" in papers[0]:
    print("\n❌ Fetch failed!")
    print(papers[0]["error"])
    raise RuntimeError("Fetch failed")

print(f"✓ Fetched metadata for {len(papers)} papers")

# Show a sample paper
if papers:
    print("\n" + "-" * 70)
    print("Sample paper:")
    print("-" * 70)
    sample = papers[0]
    print(f"PMID: {sample['pmid']}")
    print(f"Title: {sample['title']}")
    print(f"Year: {sample['year']}")
    print(f"Journal: {sample['journal']}")
    print(f"MeSH Terms: {sample['mesh_terms'][:5] if sample['mesh_terms'] else 'None'}...")


STEP 2: Fetching metadata for 10 papers
✓ Fetched metadata for 10 papers

----------------------------------------------------------------------
Sample paper:
----------------------------------------------------------------------
PMID: 35857466
Title: Progerin modulates the IGF-1R/Akt signaling involved in aging.
Year: 2022
Journal: Sci Adv
MeSH Terms: None...
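
The sample output shows `None...` for a record without MeSH terms, because the trailing ellipsis is appended whether or not any terms exist. A small formatting helper can avoid that. This is a hypothetical sketch (`format_mesh_terms` is an illustrative name, not an existing function in this project), assuming `mesh_terms` is a list of strings or empty.

```python
from typing import List, Optional


def format_mesh_terms(terms: Optional[List[str]], limit: int = 5) -> str:
    """Join up to `limit` MeSH terms; add an ellipsis only if some were cut."""
    if not terms:
        return "None"
    shown = ", ".join(terms[:limit])
    return f"{shown}, ..." if len(terms) > limit else shown


print(format_mesh_terms([]))
print(format_mesh_terms(["Aging", "Animals"]))
```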


## Step 3: Screen Papers Using Title + Keywords

In [11]:
print("\n" + "=" * 70)
print("STEP 3: Screening papers using title + MeSH terms")
print("=" * 70)

relevant_papers = []

for i, paper in enumerate(papers, 1):
    print(f"\nScreening paper {i}/{len(papers)}: {paper['pmid']}")
    print(f"  Title: {paper.get('title', '')[:80]}...")

    result = screen_paper_by_metadata(
        title=paper.get("title", ""),
        keywords=paper.get("mesh_terms", [])
    )

    if result["relevant"]:
        relevant_papers.append({
            **paper,
            "screening_score": result["score"],
            "screening_reasoning": result["reasoning"]
        })
        print(f"  ✓ RELEVANT (score: {result['score']:.2f})")
        print(f"    Reason: {result['reasoning']}")
    else:
        print(f"  ✗ Not relevant (score: {result['score']:.2f})")
        print(f"    Reason: {result['reasoning']}")

print("\n✓ Screening complete!")


STEP 3: Screening papers using title + MeSH terms

Screening paper 1/10: 35857466
  Title: Progerin modulates the IGF-1R/Akt signaling involved in aging....
  ✓ RELEVANT (score: 0.80)
    Reason: The title mentions a specific protein, progerin, and its involvement in aging through modulation of the IGF-1R/Akt signaling pathway, suggesting a potential SEQUENCE→PHENOTYPE link. The lack of keywords is a limitation, but the title provides sufficient indication of a link to aging research.

Screening paper 2/10: 16123266
  Title: Suppression of aging in mice by the hormone Klotho....
  ✓ RELEVANT (score: 0.80)
    Reason: The title and keywords suggest a link between the Klotho hormone and aging/longevity, and the presence of terms like 'genetics', 'transgenic mice', and 'recombinant proteins' imply experimental studies, making this paper relevant for SEQUENCE→PHENOTYPE links in aging research.

Screening paper 3/10: 37527036
  Title: IGFBPL1 is a master driver of microglia homeostasis and
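
The screening loop calls the screener once per paper, so a single exception from one call would abort the whole run. A minimal sketch of a defensive wrapper follows. It assumes only what the loop already relies on, namely that the result is a dict with `relevant`, `score`, and `reasoning` keys. `screen_safely` and the stub screeners are hypothetical names for illustration.

```python
from typing import Callable, Dict, List, Optional


def screen_safely(
    screen_fn: Callable[..., Dict],
    title: str,
    keywords: Optional[List[str]],
) -> Dict:
    """Call a screener and always return a well-formed result dict."""
    try:
        result = screen_fn(title=title, keywords=keywords or [])
    except Exception as exc:
        # Mark the paper as not relevant instead of stopping the loop.
        return {
            "relevant": False,
            "score": 0.0,
            "reasoning": f"Screening failed: {exc}",
        }
    # Normalize types and fill in any missing keys with safe defaults.
    return {
        "relevant": bool(result.get("relevant", False)),
        "score": float(result.get("score", 0.0)),
        "reasoning": str(result.get("reasoning", "")),
    }


# Hypothetical stubs standing in for the real screener.
def ok_screener(title, keywords):
    return {"relevant": True, "score": 0.8, "reasoning": "stub"}


def broken_screener(title, keywords):
    raise TimeoutError("model did not respond")


good = screen_safely(ok_screener, "Some title", None)
bad = screen_safely(broken_screener, "Some title", [])
print(good["relevant"], bad["relevant"])
```

Whether a failed call should count as "not relevant" or be retried is a design choice; this sketch takes the simpler first option and records the error in `reasoning`.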

## Step 4: Display Results

In [12]:
print("\n" + "=" * 70)
print(f"RESULTS: Found {len(relevant_papers)} relevant papers out of {len(papers)}")
print("=" * 70)

if relevant_papers:
    print("\nRelevant papers:")
    print("-" * 70)

    for i, paper in enumerate(relevant_papers, 1):
        print(f"\n{i}. PMID: {paper['pmid']} | Score: {paper['screening_score']:.2f}")
        print(f"   Title: {paper['title']}")
        print(f"   Year: {paper['year']} | Journal: {paper['journal']}")
        print(f"   MeSH: {', '.join(paper['mesh_terms'][:5]) if paper['mesh_terms'] else 'None'}...")
        print(f"   Why relevant: {paper['screening_reasoning']}")
else:
    print("\n⚠️  No relevant papers found. Try a different query.")

print("\n" + "=" * 70)
print("✓ Pipeline test completed!")
print("=" * 70)


RESULTS: Found 5 relevant papers out of 10

Relevant papers:
----------------------------------------------------------------------

1. PMID: 35857466 | Score: 0.80
   Title: Progerin modulates the IGF-1R/Akt signaling involved in aging.
   Year: 2022 | Journal: Sci Adv
   MeSH: None...
   Why relevant: The title mentions a specific protein, progerin, and its involvement in aging through modulation of the IGF-1R/Akt signaling pathway, suggesting a potential SEQUENCE→PHENOTYPE link. The lack of keywords is a limitation, but the title provides sufficient indication of a link to aging research.

2. PMID: 16123266 | Score: 0.80
   Title: Suppression of aging in mice by the hormone Klotho.
   Year: 2005 | Journal: Science
   MeSH: Aging/genetics/*physiology, Animals, Blood Glucose/analysis, Cell Line, Cell Line, Tumor...
   Why relevant: The title and keywords suggest a link between the Klotho hormone and aging/longevity, and the presence of terms like 'genetics', 'transgenic mice', and 'r

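## Summary Statistics

Note: the output recorded for this cell comes from an earlier execution (`In [7]`) with the query "FOXO3 longevity mutation", so its counts do not match the 10-paper "IGF1R aging" run shown in Steps 1 to 4. Re-run the cell to refresh it.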

In [7]:
print("\n📊 SUMMARY STATISTICS")
print("=" * 70)
print(f"Search query: {SEARCH_QUERY}")
print(f"Total papers searched: {len(papers)}")
print(f"Relevant papers found: {len(relevant_papers)}")
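# Guard against division by zero when no papers were fetched
relevance_rate = len(relevant_papers) / len(papers) * 100 if papers else 0.0
print(f"Relevance rate: {relevance_rate:.1f}%")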

if relevant_papers:
    avg_score = sum(p['screening_score'] for p in relevant_papers) / len(relevant_papers)
    print(f"Average relevance score: {avg_score:.2f}")
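    # Select the highest-scoring paper rather than the first one screened
    top_paper = max(relevant_papers, key=lambda p: p['screening_score'])
    print(f"\nTop scored paper: {top_paper['title']}")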


📊 SUMMARY STATISTICS
Search query: FOXO3 longevity mutation
Total papers searched: 19
Relevant papers found: 12
Relevance rate: 63.2%
Average relevance score: 0.82

Top scored paper: Relaxed Selection Limits Lifespan by Increasing Mutation Load.
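
The same statistics can be gathered into one function that tolerates an empty result set and selects the top paper by score. This is a sketch under the assumption that each relevant paper carries `title` and `screening_score` keys, as in Step 3. `summarize` is a hypothetical helper name, and the two demo records are made-up placeholders.

```python
from typing import Dict, List


def summarize(papers: List[Dict], relevant_papers: List[Dict]) -> Dict:
    """Compute relevance rate, mean score, and the top-scored title."""
    total = len(papers)
    n_relevant = len(relevant_papers)
    scores = [p["screening_score"] for p in relevant_papers]
    top = max(relevant_papers, key=lambda p: p["screening_score"], default=None)
    return {
        "total": total,
        "relevant": n_relevant,
        # Avoid ZeroDivisionError when nothing was fetched.
        "rate_pct": n_relevant / total * 100 if total else 0.0,
        "avg_score": sum(scores) / n_relevant if n_relevant else None,
        "top_title": top["title"] if top else None,
    }


# Placeholder records for demonstration only.
demo_relevant = [
    {"title": "Paper A", "screening_score": 0.5},
    {"title": "Paper B", "screening_score": 1.0},
]
demo_papers = demo_relevant + [{"title": "Paper C"}, {"title": "Paper D"}]
stats = summarize(demo_papers, demo_relevant)
print(stats)
```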


## Export Results (Optional)

In [None]:
# Uncomment to save results to JSON
# import json
# with open('relevant_papers.json', 'w') as f:
#     json.dump(relevant_papers, f, indent=2)
# print("Results saved to relevant_papers.json")
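
For reference, a runnable variant of the export step is sketched below. It writes to a temporary directory so it can be tried without touching the working folder, and reads the file back to confirm the round trip. `export_papers` is a hypothetical helper name; the sample record reuses the PMID, title, and score shown earlier in this notebook.

```python
import json
import os
import tempfile
from typing import Dict, List


def export_papers(papers: List[Dict], path: str) -> None:
    """Write papers to a UTF-8 JSON file, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(papers, f, indent=2, ensure_ascii=False)


sample = [
    {
        "pmid": "35857466",
        "title": "Progerin modulates the IGF-1R/Akt signaling involved in aging.",
        "screening_score": 0.8,
    }
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "relevant_papers.json")
    export_papers(sample, out_path)
    # Read the file back to confirm nothing was lost in serialization.
    with open(out_path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == sample)
```

Setting `encoding="utf-8"` explicitly avoids platform-dependent defaults, which matters for titles containing symbols such as "→" or Greek letters.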