pubmed-miner

Biomedical literature mining toolkit for grant hypothesis generation.

Built and maintained by Scott A. Bowler — Ndhlovu Lab, Weill Cornell Medicine.

Takes a free-text query string and returns ranked, clustered, entity-annotated PubMed abstracts — exported to a structured Excel report. Designed to accelerate literature reviews during grant hypothesis development.

Quickstart

import pubmed_miner

results = pubmed_miner.search(
    query="HERV-K expression immunotherapy response melanoma",
    email="your@email.com",
    max_results=50,
    output_dir="grant_review"
)

print(f"Retrieved {results['n_retrieved']} papers")
print(f"Top paper: {results['papers'][0]['title']}")
print(f"Top genes: {results['top_entities']['genes'][:5]}")
print(f"Report: {results['excel_path']}")

What It Does

For a given query string, the pipeline:

Fetches the most relevant PubMed abstracts (up to max_results, 50 by default) via the NCBI Entrez API
Scores each paper by relevance to the query using TF-IDF cosine similarity
Extracts biomedical entities — genes, diseases, drugs, viral/immune terms
Clusters papers into topic groups with human-readable labels
Exports a multi-sheet Excel report
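
To make the relevance-scoring step concrete, here is a minimal standard-library sketch of TF-IDF cosine similarity between a query and a handful of abstracts. It is an illustration only: the function names (tokenize, tfidf_vectors, cosine, rank_by_relevance), the tokenizer, and the smoothed IDF formula are assumptions, and the package's own implementation may differ (for example, it may rely on a library vectorizer).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens (inner hyphens kept)."""
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_relevance(query, abstracts):
    """Return (abstract_index, score) pairs, highest score first."""
    vectors = tfidf_vectors([query] + list(abstracts))
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scores = [cosine(query_vec, vec) for vec in doc_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Made-up abstracts, for illustration only.
abstracts = [
    "HERV-K expression correlates with immunotherapy response in melanoma patients.",
    "Gut microbiome composition in colorectal cancer.",
    "Checkpoint blockade response in melanoma cohorts.",
]
ranked = rank_by_relevance(
    "HERV-K expression immunotherapy response melanoma", abstracts
)
for index, score in ranked:
    print(f"{score:.3f}  {abstracts[index]}")
```

An abstract that shares no terms with the query scores 0.0 and sorts last; abstracts that share more of the query's rarer terms score higher.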

Output

Results dict

results["papers"]         # List of paper dicts, sorted by relevance score
results["clusters"]       # Topic cluster summaries with top terms
results["top_entities"]   # Most frequent genes, diseases, drugs across corpus
results["excel_path"]     # Path to the exported Excel report
results["n_retrieved"]    # Number of papers fetched
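
A small helper can turn this dict into a plain-text summary. The sketch below uses only the keys shown above and in the Quickstart (each paper's "title" and the "genes" list under top_entities). summarize_results is a hypothetical helper, not part of the package, and the mock dict stands in for a real search result. How excel_path behaves when export=False is not documented here, so the sketch treats it as possibly missing or None.

```python
def summarize_results(results, n_titles=3, n_genes=5):
    """Build a short plain-text summary from a results dict."""
    lines = [f"Retrieved {results['n_retrieved']} papers"]
    # Papers are already sorted by relevance, so the first entries are the top hits.
    for rank, paper in enumerate(results["papers"][:n_titles], start=1):
        lines.append(f"{rank}. {paper['title']}")
    # The element type of the gene list is not documented here; str() handles either case.
    genes = results["top_entities"]["genes"][:n_genes]
    lines.append("Top genes: " + ", ".join(str(gene) for gene in genes))
    # Assumption: the path may be absent or None when no report was written.
    if results.get("excel_path"):
        lines.append(f"Report: {results['excel_path']}")
    return "\n".join(lines)


# Mock data with placeholder values, standing in for pubmed_miner.search(...) output.
mock_results = {
    "papers": [{"title": "Example paper A"}, {"title": "Example paper B"}],
    "clusters": [],
    "top_entities": {"genes": ["GENE1", "GENE2"]},
    "excel_path": None,
    "n_retrieved": 2,
}
summary = summarize_results(mock_results)
print(summary)
```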

Excel report (5 sheets)

Sheet	Contents
Ranked Papers	All papers sorted by relevance, with entities and cluster label
Topic Clusters	Cluster summaries with top terms and representative titles
Top Entities	Most frequent genes, diseases, drugs across the corpus
Entity Detail	Per-paper entity breakdown
Query Info	Query string, date, paper count — for reproducibility

Installation

git clone https://github.com/sabowler/pubmed-miner.git
cd pubmed-miner
pip install -e .

# Development (includes pytest)
pip install -e ".[dev]"

Parameters

pubmed_miner.search(
    query       = "opioid use disorder neuroinflammation scRNA-seq",
    email       = "your@email.com",      # Required by NCBI
    max_results = 50,                    # Papers to retrieve (default 50)
    n_clusters  = None,                  # Auto-selected if None
    output_dir  = "results",             # Where to write the Excel report
    api_key     = None,                  # NCBI API key (optional, increases rate limit)
    export      = True,                  # Set False to skip Excel export
)

NCBI API key — free to register at ncbi.nlm.nih.gov/account. Increases rate limit from 3 to 10 requests/second.
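
One way to keep the key out of source code is to read it from an environment variable and pass it through as api_key. The sketch below is an assumption-laden illustration: the variable name NCBI_API_KEY and both helper functions are hypothetical, and whether pubmed_miner throttles requests internally is not stated here. The interval helper simply restates the 3 and 10 requests/second limits as a minimum delay between calls.

```python
import os


def min_request_interval(api_key=None):
    """Minimum seconds between requests: 10 per second with a key, 3 without."""
    requests_per_second = 10 if api_key else 3
    return 1.0 / requests_per_second


def build_search_kwargs(query, email, env_var="NCBI_API_KEY"):
    """Collect search arguments, reading the optional API key from the environment."""
    # An unset or empty variable becomes None, matching the documented default.
    api_key = os.environ.get(env_var) or None
    return {"query": query, "email": email, "api_key": api_key}


kwargs = build_search_kwargs(
    "HIV associated neurocognitive disorder neuroinflammation",
    "your@email.com",
)
print(f"Minimum delay between requests: {min_request_interval(kwargs['api_key']):.2f} s")
# With the package installed, the arguments could then be passed along:
# results = pubmed_miner.search(**kwargs)
```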

Example Queries

# Immunotherapy biomarker discovery
pubmed_miner.search("PD-1 PD-L1 biomarker response solid tumors", email="...")

# HIV neurocognitive research
pubmed_miner.search("HIV associated neurocognitive disorder neuroinflammation", email="...")

# Opioid epigenomics (SCORCH)
pubmed_miner.search("opioid use disorder DNA methylation epigenomics", email="...")

# Cancer microbiome
pubmed_miner.search("tumor microbiome immunotherapy response colorectal cancer", email="...")
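
When several queries belong to one review, it can help to run them in a loop and give each its own output directory. The sketch below is one possible pattern, not a feature of the package: slugify and run_queries are hypothetical helpers, and the search function is passed in as an argument so the loop can be tried with a stand-in. Whether nested output directories are created automatically is an assumption that should be checked.

```python
import re


def slugify(text, max_len=40):
    """Turn a query string into a short, filesystem-friendly directory name."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def run_queries(queries, email, search_fn, base_dir="results"):
    """Call search_fn once per query, each with its own output directory."""
    results_by_query = {}
    for query in queries:
        output_dir = f"{base_dir}/{slugify(query)}"
        results_by_query[query] = search_fn(
            query=query, email=email, output_dir=output_dir
        )
    return results_by_query


def fake_search(query, email, output_dir):
    """Stand-in for pubmed_miner.search so the loop runs without network access."""
    return {"n_retrieved": 0, "output_dir": output_dir}


queries = [
    "PD-1 PD-L1 biomarker response solid tumors",
    "opioid use disorder DNA methylation epigenomics",
]
batch = run_queries(queries, "your@email.com", fake_search)
for query, result in batch.items():
    print(f"{query} -> {result['output_dir']}")
# With the package installed, pass the real function instead:
# batch = run_queries(queries, "your@email.com", pubmed_miner.search)
```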

Running Tests

pytest tests/ -v

All 12 tests run without a network connection using mock paper data.
