# PubMed Tools Pipeline Test

This notebook tests the full pipeline: **Search → Fetch → Screen**

The pipeline screens papers for sequence→function relationships using only:
- Paper titles
- MeSH terms (keywords)
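
The three stages chain together as plain function calls. The block below is a minimal offline sketch of that flow, not the project's implementation: `run_pipeline` and the three `stub_*` functions are hypothetical stand-ins, and the keyword arguments only mirror how the real tools are called later in this notebook.

```python
def run_pipeline(query, search, fetch, screen, max_results=20):
    """Chain Search -> Fetch -> Screen and return the papers judged relevant."""
    pmids = search(query, max_results=max_results)
    papers = fetch("\n".join(pmids))  # the fetch step takes newline-separated PMIDs
    relevant = []
    for paper in papers:
        # Screening sees only the title and the MeSH terms.
        result = screen(
            title=paper.get("title", ""),
            keywords=paper.get("mesh_terms", []),
        )
        if result["relevant"]:
            relevant.append({**paper, "screening_score": result["score"]})
    return relevant


# Hypothetical stand-ins with made-up records, used only to exercise the sketch.
def stub_search(query, max_results=20):
    return ["1", "2"][:max_results]


def stub_fetch(pmid_string):
    titles = {"1": "Gene variant extends lifespan", "2": "Unrelated clinical report"}
    return [
        {"pmid": p, "title": titles[p], "mesh_terms": []}
        for p in pmid_string.split("\n")
    ]


def stub_screen(title, keywords):
    hit = "lifespan" in title.lower()
    return {"relevant": hit, "score": 1.0 if hit else 0.0, "reasoning": "keyword stub"}


kept = run_pipeline("example query", stub_search, stub_fetch, stub_screen)
print([p["pmid"] for p in kept])
```

Passing the stages in as arguments keeps the sketch runnable without network access or an LLM key; the cells below call the real tools directly.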

## Import Required Libraries

In [1]:
# IMPORTANT: If you've changed the model in .env, restart the kernel before running!
# Kernel -> Restart Kernel

import json
from src.tools.pubmed_search import search_pubmed
from src.tools.pubmed_fetch import fetch_abstracts
from src.tools.screening import screen_paper_by_metadata

# Verify the model being used
from src.config import NEBIUS_MODEL
print(f"✓ Using model: {NEBIUS_MODEL}")

✓ Using model: meta-llama/Llama-3.3-70B-Instruct


## Configure Search Parameters

In [2]:
# Customize these parameters
SEARCH_QUERY = "FOXO3 longevity mutation"
MAX_RESULTS = 20

## Step 1: Search PubMed

In [3]:
print("=" * 70)
print(f"STEP 1: Searching PubMed for '{SEARCH_QUERY}'")
print("=" * 70)

pmids = search_pubmed(SEARCH_QUERY, max_results=MAX_RESULTS)
print(f"\nFound {len(pmids)} PMIDs")

if not pmids or pmids[0].startswith("Error"):
    print("\n❌ Search failed!")
    # Surface the tool's own error message when there is one
    raise RuntimeError(pmids[0] if pmids else "Search returned no PMIDs")

print(f"✓ Search successful! Got {len(pmids)} PMIDs")
print(f"\nFirst 5 PMIDs: {pmids[:5]}")

STEP 1: Searching PubMed for 'FOXO3 longevity mutation'

Found 19 PMIDs
✓ Search successful! Got 19 PMIDs

First 5 PMIDs: ['31257025', '22493319', '21909281', '32720744', '24466179']
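
The check in the cell above treats a first element that starts with `"Error"` as a failed search. A small helper can make that convention explicit and also drop anything that is not a numeric PMID. This is a sketch under that assumption; `validate_pmids` is a hypothetical name, not part of `src.tools`.

```python
def validate_pmids(pmids):
    """Raise on an empty or failed search; otherwise return the numeric PMIDs."""
    if not pmids:
        raise ValueError("search returned no PMIDs")
    if pmids[0].startswith("Error"):
        # Assumed convention: the first element carries the error message.
        raise RuntimeError(pmids[0])
    # PMIDs are plain digit strings; anything else is dropped.
    return [p for p in pmids if p.isdigit()]


print(validate_pmids(["31257025", "22493319", "21909281"]))
```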


## Step 2: Fetch Paper Metadata

In [4]:
print("\n" + "=" * 70)
print(f"STEP 2: Fetching metadata for {len(pmids)} papers")
print("=" * 70)

# Convert list to newline-separated string (as the tool expects)
pmid_string = "\n".join(pmids)
papers = fetch_abstracts(pmid_string)

if papers and "error" in papers[0]:
    print("\n❌ Fetch failed!")
    print(papers[0]["error"])
    raise RuntimeError("Fetch failed")

print(f"✓ Fetched metadata for {len(papers)} papers")

# Show a sample paper
if papers:
    print("\n" + "-" * 70)
    print("Sample paper:")
    print("-" * 70)
    sample = papers[0]
    print(f"PMID: {sample['pmid']}")
    print(f"Title: {sample['title']}")
    print(f"Year: {sample['year']}")
    print(f"Journal: {sample['journal']}")
    print(f"MeSH Terms: {sample['mesh_terms'][:5] if sample['mesh_terms'] else 'None'}...")


STEP 2: Fetching metadata for 19 papers
✓ Fetched metadata for 19 papers

----------------------------------------------------------------------
Sample paper:
----------------------------------------------------------------------
PMID: 31257025
Title: Relaxed Selection Limits Lifespan by Increasing Mutation Load.
Year: 2019
Journal: Cell
MeSH Terms: ['Aging', 'Animals', 'DNA Replication', 'Evolution, Molecular', 'Gene Frequency']...
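
The MeSH preview logic (first five terms, or `None`) appears in both the sample printout and the results cell. One way to factor it out is sketched below; `mesh_preview` is a hypothetical helper. It joins with `"; "` because some MeSH headings, such as `Evolution, Molecular`, contain a comma themselves, and it appends `...` only when terms were actually cut off.

```python
def mesh_preview(terms, limit=5):
    """Return a short, unambiguous preview of a paper's MeSH terms."""
    if not terms:
        return "None"
    shown = "; ".join(terms[:limit])
    # Add an ellipsis only if the list was truncated.
    if len(terms) > limit:
        return shown + "..."
    return shown


terms = ["Aging", "Animals", "DNA Replication", "Evolution, Molecular", "Gene Frequency"]
print(mesh_preview(terms))
print(mesh_preview(terms, limit=2))
```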


## Step 3: Screen Papers Using Title + Keywords

In [5]:
print("\n" + "=" * 70)
print("STEP 3: Screening papers using title + MeSH terms")
print("=" * 70)

relevant_papers = []

for i, paper in enumerate(papers, 1):
    print(f"\nScreening paper {i}/{len(papers)}: {paper['pmid']}")
    print(f"  Title: {paper.get('title', '')[:80]}...")

    result = screen_paper_by_metadata(
        title=paper.get("title", ""),
        keywords=paper.get("mesh_terms", [])
    )

    if result["relevant"]:
        relevant_papers.append({
            **paper,
            "screening_score": result["score"],
            "screening_reasoning": result["reasoning"]
        })
        print(f"  ✓ RELEVANT (score: {result['score']:.2f})")
        print(f"    Reason: {result['reasoning']}")
    else:
        print(f"  ✗ Not relevant (score: {result['score']:.2f})")
        print(f"    Reason: {result['reasoning']}")

print("\n✓ Screening complete!")


STEP 3: Screening papers using title + MeSH terms

Screening paper 1/19: 31257025
  Title: Relaxed Selection Limits Lifespan by Increasing Mutation Load....
  ✓ RELEVANT (score: 0.80)
    Reason: The title and keywords suggest a link between mutation load and lifespan, implying a sequence→phenotype connection, and the presence of experimental keywords like 'DNA Replication' and 'Mitochondria/genetics/metabolism' indicate evidence of genetic studies.

Screening paper 2/19: 22493319
  Title: FOXO3/FKHRL1 is activated by 5-aza-2-deoxycytidine and induces silenced caspase-...
  ✗ Not relevant (score: 0.00)
    Reason: The paper focuses on cancer treatment and apoptosis, with no clear link to longevity, lifespan, or aging research, and does not mention specific sequence modifications related to aging or longevity.

Screening paper 3/19: 21909281
  Title: The evolutionarily conserved longevity determinants HCF-1 and SIR-2.1/SIRT1 coll...
  ✓ RELEVANT (score: 0.90)
    Reason: The title and 

## Step 4: Display Results

In [6]:
print("\n" + "=" * 70)
print(f"RESULTS: Found {len(relevant_papers)} relevant papers out of {len(papers)}")
print("=" * 70)

if relevant_papers:
    print("\nRelevant papers:")
    print("-" * 70)

    for i, paper in enumerate(relevant_papers, 1):
        print(f"\n{i}. PMID: {paper['pmid']} | Score: {paper['screening_score']:.2f}")
        print(f"   Title: {paper['title']}")
        print(f"   Year: {paper['year']} | Journal: {paper['journal']}")
        print(f"   MeSH: {', '.join(paper['mesh_terms'][:5]) if paper['mesh_terms'] else 'None'}...")
        print(f"   Why relevant: {paper['screening_reasoning']}")
else:
    print("\n⚠️  No relevant papers found. Try a different query.")

print("\n" + "=" * 70)
print("✓ Pipeline test completed!")
print("=" * 70)


RESULTS: Found 12 relevant papers out of 19

Relevant papers:
----------------------------------------------------------------------

1. PMID: 31257025 | Score: 0.80
   Title: Relaxed Selection Limits Lifespan by Increasing Mutation Load.
   Year: 2019 | Journal: Cell
   MeSH: Aging, Animals, DNA Replication, Evolution, Molecular, Gene Frequency...
   Why relevant: The title and keywords suggest a link between mutation load and lifespan, implying a sequence→phenotype connection, and the presence of experimental keywords like 'DNA Replication' and 'Mitochondria/genetics/metabolism' indicate evidence of genetic studies.

2. PMID: 21909281 | Score: 0.90
   Title: The evolutionarily conserved longevity determinants HCF-1 and SIR-2.1/SIRT1 collaborate to regulate DAF-16/FOXO.
   Year: 2011 | Journal: PLoS Genet
   MeSH: Animals, Caenorhabditis elegans/genetics/physiology, Caenorhabditis elegans Proteins/genetics/metabolism, Evolution, Molecular, Forkhead Box Protein O3...
   Why relevant: 

## Summary Statistics

In [7]:
print("\n📊 SUMMARY STATISTICS")
print("=" * 70)
print(f"Search query: {SEARCH_QUERY}")
print(f"Total papers searched: {len(papers)}")
print(f"Relevant papers found: {len(relevant_papers)}")
# Guard against division by zero when no papers were fetched
rate = len(relevant_papers) / len(papers) * 100 if papers else 0.0
print(f"Relevance rate: {rate:.1f}%")

if relevant_papers:
    avg_score = sum(p['screening_score'] for p in relevant_papers) / len(relevant_papers)
    print(f"Average relevance score: {avg_score:.2f}")
    # relevant_papers keeps search order, so index 0 is the first relevant hit,
    # not necessarily the highest-scoring one.
    print(f"\nFirst relevant paper: {relevant_papers[0]['title']}")


📊 SUMMARY STATISTICS
Search query: FOXO3 longevity mutation
Total papers searched: 19
Relevant papers found: 12
Relevance rate: 63.2%
Average relevance score: 0.82

First relevant paper: Relaxed Selection Limits Lifespan by Increasing Mutation Load.
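
The summary cell reports `relevant_papers[0]`, which is the first relevant hit in search order. In the run above that paper scored 0.80 while the second relevant paper scored 0.90, so the first entry is not the top-scored one. A minimal sketch of selecting by score instead, using only the two PMID and score pairs visible in the results output (the remaining records are truncated and not reproduced here):

```python
# Two records taken from the results output above; other fields omitted.
relevant_papers = [
    {"pmid": "31257025", "screening_score": 0.80},
    {"pmid": "21909281", "screening_score": 0.90},
]

# Highest-scoring paper among those listed.
top = max(relevant_papers, key=lambda p: p["screening_score"])

# Full ranking, best first; sorted() is stable, so ties keep search order.
ranked = sorted(relevant_papers, key=lambda p: p["screening_score"], reverse=True)

print(top["pmid"])
print([p["pmid"] for p in ranked])
```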


## Export Results (Optional)

In [None]:
# Uncomment to save results to JSON
# import json
# with open('relevant_papers.json', 'w') as f:
#     json.dump(relevant_papers, f, indent=2)
# print("Results saved to relevant_papers.json")
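
If the export is enabled, it may help to write the file as UTF-8 with `ensure_ascii=False`, since the screening reasoning contains characters such as `→` that `json.dump` would otherwise escape. The sketch below round-trips one shortened record through a temporary directory; `save_papers` and `load_papers` are hypothetical helper names.

```python
import json
import os
import tempfile


def save_papers(papers, path):
    """Write papers as human-readable UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(papers, f, indent=2, ensure_ascii=False)


def load_papers(path):
    """Read papers back from a JSON file written by save_papers."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# One record with a shortened reasoning string, for illustration only.
sample = [
    {
        "pmid": "31257025",
        "screening_score": 0.8,
        "screening_reasoning": "sequence→phenotype connection",
    }
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "relevant_papers.json")
    save_papers(sample, path)
    restored = load_papers(path)

print(restored == sample)
```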