TypeScript clients for NCBI APIs — PubMed, PMC, BLAST, SNP, ClinVar, PubChem, Datasets, and more.
Disclaimer: This is an unofficial, community-maintained SDK. It is not affiliated with, endorsed by, or related to the National Center for Biotechnology Information (NCBI) or the NCBI GitHub organization. For official NCBI tools and resources, visit ncbi.nlm.nih.gov/home/develop.
The National Center for Biotechnology Information (NCBI), part of the U.S. National Library of Medicine (NLM), maintains the world's largest collection of biomedical databases. These include PubMed (37M+ article citations), PubMed Central (PMC, 9M+ full-text articles), MeSH (controlled medical vocabulary), BLAST (sequence alignment), dbSNP (genetic variation), ClinVar (clinical variants), PubChem (chemical compounds), and many more. Researchers, clinicians, and developers rely on NCBI's public APIs to search, retrieve, and analyze biomedical data programmatically.
ncbijs provides typed, zero-dependency TypeScript clients for these APIs. This entire project is built and maintained by AI using Claude Code — no human-written code is accepted. See CONTRIBUTING.md for details.
It is designed for two audiences:
- Developers and researchers building biomedical applications, literature review tools, or clinical decision support systems.
- LLM and AI agents that need structured, programmatic access to biomedical literature for retrieval-augmented generation (RAG), entity extraction, and citation management.
Built for LLM consumption. Every package follows consistent naming, consistent interfaces, and has a self-documenting API with full JSDoc. The MCP server exposes 27 tools that any LLM agent can call directly. The workflow table below and the "Which package do I need?" decision tree make it easy for agents to discover the right package without reading source code. 40 of 43 packages run in the browser — ideal for agentic web apps that query NCBI without a backend.
| Workflow | Packages |
|---|---|
| Search PubMed and retrieve article metadata | @ncbijs/pubmed + @ncbijs/eutils |
| Fetch full-text articles from PMC | @ncbijs/pmc + @ncbijs/jats |
| Extract genes, diseases, chemicals from articles | @ncbijs/pubtator |
| Generate formatted citations (RIS, MEDLINE, CSL-JSON) | @ncbijs/cite |
| Convert between PMID, PMCID, and DOI | @ncbijs/id-converter |
| Expand MeSH terms for comprehensive searches | @ncbijs/mesh |
| Chunk full-text articles for RAG pipelines | @ncbijs/jats (toChunks) |
| Look up genes, genomes, and taxonomy | @ncbijs/datasets |
| Parse FASTA nucleotide/protein sequences | @ncbijs/fasta |
| Run BLAST sequence alignments | @ncbijs/blast |
| Look up SNP/variant data from dbSNP | @ncbijs/snp |
| Query clinical variant significance from ClinVar | @ncbijs/clinvar |
| Retrieve compound, substance, and assay data | @ncbijs/pubchem |
| Fetch protein sequences in FASTA or GenBank format | @ncbijs/protein |
| Fetch nucleotide sequences in FASTA or GenBank format | @ncbijs/nucleotide |
| Parse GenBank flat file records locally | @ncbijs/genbank |
| Look up genetic disorders from OMIM | @ncbijs/omim |
| Query medical genetics concepts from MedGen | @ncbijs/medgen |
| Search genetic tests from GTR | @ncbijs/gtr |
| Search gene expression datasets from GEO | @ncbijs/geo |
| Query structural variants from dbVar | @ncbijs/dbvar |
| Search sequencing experiment metadata from SRA | @ncbijs/sra |
| Look up 3D molecular structures from MMDB/PDB | @ncbijs/structure |
| Search conserved protein domains from CDD | @ncbijs/cdd |
| Search NCBI Bookshelf entries | @ncbijs/books |
| Look up journal/serial records from NLM Catalog | @ncbijs/nlm-catalog |
| Convert variant notations (HGVS, SPDI, VCF) | @ncbijs/snp |
| Get full compound annotations (GHS, patents) | @ncbijs/pubchem |
| Chain search-fetch pipelines via History Server | @ncbijs/eutils |
| Search clinical trials by condition/intervention | @ncbijs/clinical-trials |
| Get citation metrics and impact scores | @ncbijs/icite |
| Normalize drug names and find drug classes | @ncbijs/rxnorm |
| Look up drug labels, SPLs, and NDC packaging | @ncbijs/dailymed |
| Find literature linked to genetic variants | @ncbijs/litvar |
| Get annotated text with entity recognition | @ncbijs/bioc |
| Autocomplete ICD-10, LOINC, SNOMED codes | @ncbijs/clinical-tables |
| Store NCBI data locally in DuckDB | @ncbijs/store |
| Build data pipelines (Source → Parse → Sink) | @ncbijs/pipeline |
| Load any NCBI dataset with one function call | @ncbijs/etl |
| Watch NCBI sources for updates and re-sync | @ncbijs/sync |
| Expose all tools to LLM agents via MCP | @ncbijs/http-mcp |
| Query local NCBI data via MCP | @ncbijs/store-mcp |
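Several of the workflows above are pure local parsing with no network involved. To give a flavor of what that looks like, here is a minimal FASTA parser sketch — illustrative only; the real `@ncbijs/fasta` API and record shape may differ:

```typescript
// Illustrative sketch in the spirit of @ncbijs/fasta (not its actual API).
interface FastaRecord {
  id: string;          // first token of the header line
  description: string; // remainder of the header line
  sequence: string;    // concatenated sequence lines
}

function parseFasta(text: string): FastaRecord[] {
  const records: FastaRecord[] = [];
  let current: FastaRecord | null = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith('>')) {
      // A header line starts a new record
      const [id, ...rest] = line.slice(1).trim().split(/\s+/);
      current = { id, description: rest.join(' '), sequence: '' };
      records.push(current);
    } else if (current && line.trim()) {
      current.sequence += line.trim();
    }
  }
  return records;
}

const [rec] = parseFasta('>NM_007294.4 BRCA1 mRNA\nATGGATTTATCTGCT\nGTGCAGAAAATC\n');
console.log(rec.id, rec.sequence.length); // NM_007294.4 27
```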
| Package | Description |
|---|---|
| @ncbijs/pubmed | High-level PubMed search and retrieval with fluent query builder |
| @ncbijs/pmc | PMC full-text retrieval via E-utilities, OA Service, and OAI-PMH |
| @ncbijs/eutils | Spec-compliant client for all 9 NCBI E-utilities |
| @ncbijs/cite | Citation formatting in 4 styles (RIS, MEDLINE, CSL-JSON, Citation) |
| @ncbijs/id-converter | Batch conversion between PMID, PMCID, DOI, and Manuscript ID |
| @ncbijs/mesh | MeSH vocabulary tree traversal and query expansion |
| @ncbijs/pubtator | PubTator3 text mining — entity search and BioC annotation export |
| @ncbijs/pubmed-xml | PubMed/MEDLINE XML and plain-text parser |
| @ncbijs/jats | JATS XML parser with markdown, plain-text, and RAG chunking |
| @ncbijs/blast | BLAST sequence alignment with async submit/poll/retrieve workflow |
| @ncbijs/snp | dbSNP variation data — placements, allele annotations, frequencies |
| @ncbijs/clinvar | ClinVar clinical variant significance, genes, traits, locations |
| @ncbijs/pubchem | PubChem compound data — properties, synonyms, descriptions |
| @ncbijs/datasets | NCBI Datasets API v2 client for genes, genomes, and taxonomy |
| @ncbijs/protein | Protein sequence retrieval in FASTA and GenBank formats |
| @ncbijs/nucleotide | Nucleotide sequence retrieval in FASTA and GenBank formats |
| @ncbijs/genbank | Zero-dependency GenBank flat file format parser |
| @ncbijs/omim | OMIM genetic disorders — Mendelian inheritance catalog |
| @ncbijs/medgen | MedGen medical genetics concepts and disease-gene links |
| @ncbijs/gtr | Genetic Testing Registry — test catalog and clinical validity |
| @ncbijs/geo | GEO gene expression datasets — microarray and RNA-seq metadata |
| @ncbijs/dbvar | dbVar structural variants — copy number, inversions, translocations |
| @ncbijs/sra | SRA sequencing experiment metadata with embedded XML parsing |
| @ncbijs/structure | 3D molecular structure records from MMDB/PDB |
| @ncbijs/cdd | Conserved Domain Database — protein domain annotations |
| @ncbijs/books | NCBI Bookshelf entries — textbooks, reports, chapters |
| @ncbijs/nlm-catalog | NLM Catalog journal and serial records with ISSN data |
| @ncbijs/clinical-trials | ClinicalTrials.gov v2 — study search, stats, and field values |
| @ncbijs/icite | NIH iCite citation metrics — RCR, percentiles, clinical citations |
| @ncbijs/rxnorm | RxNorm drug normalization — concepts, classes, NDC codes |
| @ncbijs/dailymed | DailyMed drug labels — SPLs, NDC packaging, drug classes |
| @ncbijs/litvar | LitVar2 variant-literature linking — publications by rsID |
| @ncbijs/bioc | BioC annotated text — PubMed/PMC articles with named entities |
| @ncbijs/clinical-tables | Clinical Table Search — ICD-10, LOINC, SNOMED autocomplete |
| @ncbijs/fasta | Zero-dependency FASTA format parser for sequences |
| @ncbijs/xml | Zero-dependency regex-based XML reader for NCBI formats |
| @ncbijs/store | Storage interfaces and DuckDB implementation for local NCBI data |
| @ncbijs/pipeline | Composable data pipelines: Source → Parse → Sink |
| @ncbijs/etl | Pre-wired NCBI data loaders: `load('mesh', mySink)` |
| @ncbijs/sync | NCBI update detection and scheduled re-sync |
| @ncbijs/http-mcp | MCP server exposing all ncbijs tools for LLM agents |
| @ncbijs/store-mcp | MCP server for querying locally stored NCBI data via DuckDB |
| @ncbijs/rate-limiter | Token bucket rate limiter for browser and Node.js |
ncbijs is built to power biomedical RAG (Retrieval-Augmented Generation) pipelines. Use it to enrich document chunks with named entities, normalize terminology via MeSH, validate claims against PubMed, and inject formatted citations into generated answers. The MCP server (@ncbijs/http-mcp) lets LLM agents call any ncbijs tool directly during generation with zero glue code.
See RAG Integration Guide for a full architecture walkthrough covering ingestion enrichment, query-time augmentation, generation-time citation, and priority assessment.
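As an illustration of the ingestion-enrichment step, here is a sketch that attaches entity annotations to document chunks by character offset. The `Chunk` and `Entity` shapes are hypothetical, not types exported by ncbijs; in a real pipeline the entities would come from @ncbijs/pubtator:

```typescript
// Hypothetical shapes for a RAG ingestion step (not ncbijs exports).
interface Chunk { text: string; start: number; end: number; entities: string[] }
interface Entity { text: string; type: string; offset: number }

function enrichChunks(chunks: Chunk[], entities: Entity[]): Chunk[] {
  return chunks.map((chunk) => ({
    ...chunk,
    // Attach every entity whose offset falls inside this chunk's span
    entities: entities
      .filter((e) => e.offset >= chunk.start && e.offset < chunk.end)
      .map((e) => `${e.type}:${e.text}`),
  }));
}

const chunks = [{ text: 'BRCA1 mutations increase risk', start: 0, end: 50, entities: [] }];
const entities = [{ text: 'BRCA1', type: 'Gene', offset: 0 }];
console.log(enrichChunks(chunks, entities)[0].entities); // [ 'Gene:BRCA1' ]
```

The enriched chunks can then carry normalized entity IDs into your vector store, so retrieval can filter by gene or disease rather than raw text alone.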
ncbijs includes a composable pipeline system for processing bulk NCBI data. Wire any source, parser, and sink together with a single pipeline() call. The pipeline package is 100% browser-compatible — every export uses standard Web APIs (fetch, DecompressionStream).
```ts
import { pipeline, createHttpSource, createSink } from '@ncbijs/pipeline';
import { parseMeshDescriptorXml } from '@ncbijs/mesh';

// Download from NCBI HTTP → parse → write to any destination
await pipeline(
  createHttpSource('https://nlmpubs.nlm.nih.gov/projects/mesh/MESH_FILES/xmlmesh/desc2026.xml'),
  (xml) => parseMeshDescriptorXml(xml).descriptors,
  createSink(async (records) => {
    console.log(`Received ${records.length} MeSH descriptors`);
  }),
);
```

Or skip the wiring entirely with @ncbijs/etl — one function call to download, parse, and sink any dataset:
```ts
import { load, loadAll } from '@ncbijs/etl';
import { createSink } from '@ncbijs/pipeline';

// Load a single dataset
await load(
  'mesh',
  createSink(async (records) => {
    console.log(`${records.length} MeSH descriptors`);
  }),
);

// Load all 6 datasets into any sink
await loadAll((dataset) =>
  createSink(async (records) => {
    console.log(`${dataset}: ${records.length} records`);
  }),
);
```

The pipeline runs in three phases (load, sync, query):
```
Phase 1: Initial Load         Phase 2: Watch & Sync      Phase 3: Query via MCP
NCBI FTP ──→ DuckDB           Poll NCBI → re-load        store-mcp ──→ Claude
(one-time bulk download)      (long-running process)     (zero rate limits)
```
```ts
import { load, loadAll } from '@ncbijs/etl';
import { DuckDbFileStorage } from '@ncbijs/store';

const storage = await DuckDbFileStorage.open('ncbi.duckdb');

// Load a single dataset
await load('clinvar', storage.createSink('clinvar'));

// Or load all 6 datasets at once
await loadAll((dataset) => storage.createSink(dataset));
```

Once loaded, start a watcher to poll for upstream changes and re-load only what changed. `createCheckers()` picks the best detection strategy per dataset: MD5 checksums for ClinVar, Taxonomy, and PubChem; HTTP `Last-Modified` for all others.
```ts
import { createCheckers, load } from '@ncbijs/etl';
import { SyncScheduler, InMemorySyncState } from '@ncbijs/sync';

const scheduler = new SyncScheduler(new InMemorySyncState(), createCheckers(), {
  checkIntervalMs: 3600_000,
  datasets: ['clinvar', 'genes'],
  onUpdate: async (dataset) => {
    await load(dataset, storage.createSink(dataset));
  },
});
await scheduler.start(); // checks immediately, then every hour
```

Once data is loaded, expose it to Claude (or any MCP-compatible agent) with @ncbijs/store-mcp:
```json
{
  "mcpServers": {
    "ncbijs-store": {
      "command": "npx",
      "args": ["-y", "@ncbijs/store-mcp"],
      "env": {
        "NCBIJS_DB_PATH": "/absolute/path/to/ncbi.duckdb"
      }
    }
  }
}
```

Now your agent can query the local data directly:
- "Search for pathogenic BRCA1 variants in ClinVar"
- "Look up the MeSH descriptor for Alzheimer's disease"
- "What genes are on chromosome 17 in the local store?"
- "Convert PMID 33024307 to a DOI"
No network, no rate limits, no API keys. See @ncbijs/store-mcp for the full list of 13 query tools.
See examples/data-pipeline/ for complete scripts covering all three phases.
- `@ncbijs/pipeline` — Composable Source/Sink primitives built on `AsyncIterable`. HTTP and composite sources, streaming, backpressure, abort signals. Browser + Node.js.
- `@ncbijs/etl` — Pre-wired loaders for 6 NCBI bulk datasets. `load('mesh', mySink)` is all you need. Also exports `createCheckers()` for sync.
- `@ncbijs/store` — Storage interfaces with a DuckDB reference implementation. Node.js only.
- `@ncbijs/sync` — Watches NCBI FTP for updates via MD5 checksums or HTTP `Last-Modified`. Pluggable checkers, configurable interval, abort signal.
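To make the Source → Parse → Sink contract concrete, here is a from-scratch miniature of the pattern on `AsyncIterable`. This is an illustration only; the real @ncbijs/pipeline API adds options, backpressure, and abort handling on top of the same shape:

```typescript
// Minimal Source → Parse → Sink pattern on AsyncIterable (illustration,
// not the actual @ncbijs/pipeline types).
type Source<T> = () => AsyncIterable<T>;
type Parse<T, R> = (item: T) => R[];
type Sink<R> = (records: R[]) => Promise<void>;

async function runPipeline<T, R>(source: Source<T>, parse: Parse<T, R>, sink: Sink<R>) {
  for await (const item of source()) {
    await sink(parse(item)); // each source item is parsed, then flushed to the sink
  }
}

// Usage: split lines from an in-memory "download" and count the records.
async function* lines() { yield 'a\nb'; yield 'c'; }
let total = 0;
await runPipeline(lines, (s) => s.split('\n'), async (recs) => { total += recs.length; });
console.log(total); // 3
```

Because everything is an `AsyncIterable`, the sink naturally applies backpressure: the source is only pulled as fast as the sink awaits.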
See Data Pipeline Guide for the full API walkthrough, streaming parsers, error handling, and sync scheduling.
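The two change-detection strategies mentioned above (MD5 checksums vs. HTTP `Last-Modified`) both boil down to comparing a remote fingerprint against the last one seen. Here is a standalone sketch of the `Last-Modified` flavor, with the header fetch injected so it runs without a network; the real checker interfaces in @ncbijs/etl and @ncbijs/sync may differ:

```typescript
// Sketch of Last-Modified change detection (not the actual ncbijs checker API).
type HeadFetcher = (url: string) => Promise<string | null>;

async function hasChanged(
  url: string,
  lastSeen: string | null,
  head: HeadFetcher, // injected so the check is testable without a network
): Promise<{ changed: boolean; stamp: string | null }> {
  const stamp = await head(url);
  // Treat a missing header conservatively as "changed"
  return { changed: stamp === null || stamp !== lastSeen, stamp };
}
```

A scheduler would persist `stamp` between runs and trigger a re-load only when `changed` is true.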
ncbijs ships two MCP servers that give AI agents direct access to NCBI data. Pick the one that fits your use case — or use both:
| | Live API (`http-mcp`) | Local data (`store-mcp`) |
|---|---|---|
| Setup | Zero — just add the config | Load data first (Phases 1-2) |
| Network | Required (queries NCBI APIs in real time) | Offline after initial load |
| Rate limits | NCBI limits apply (3-10 req/s) | None |
| Data freshness | Always current | As fresh as last sync |
| Tools | 27 | 13 |
Query NCBI APIs in real time — PubMed, PMC full text, BLAST, ClinVar, PubChem, MeSH, and more. No data loading required.
```json
{
  "mcpServers": {
    "ncbijs": {
      "command": "npx",
      "args": ["-y", "@ncbijs/http-mcp"],
      "env": {
        "NCBI_API_KEY": ""
      }
    }
  }
}
```

27 tools covering: PubMed search, PMC full text, PubTator entity recognition, gene/genome/taxonomy lookup, BLAST alignment, SNP/ClinVar variant queries, PubChem compounds, citation formatting, ID conversion, MeSH vocabulary, iCite metrics, RxNorm drug data, and LitVar variant-literature linking.
Example prompts:
- "Search PubMed for recent CRISPR gene therapy reviews"
- "Get the full text of PMC7886120 and summarize the methods"
- "What genes and diseases are mentioned in PMID 33024307?"
- "Run a BLAST search for the sequence ATCGATCGATCG"
See @ncbijs/http-mcp for details. Get a free API key at ncbi.nlm.nih.gov/account/settings.
Query your local DuckDB database — MeSH, ClinVar, genes, taxonomy, PubChem, and ID mappings. No network needed after loading.
```
Phase 1: load data ──→ Phase 2: sync ──→ Phase 3: query via store-mcp
(see Data pipelines)   (optional)        (this section)
```
```json
{
  "mcpServers": {
    "ncbijs-store": {
      "command": "npx",
      "args": ["-y", "@ncbijs/store-mcp"],
      "env": {
        "NCBIJS_DB_PATH": "/absolute/path/to/ncbi.duckdb"
      }
    }
  }
}
```

13 tools available: `store-lookup-mesh`, `store-search-mesh`, `store-lookup-variant`, `store-search-variants`, `store-lookup-gene`, `store-search-genes`, `store-lookup-taxonomy`, `store-search-taxonomy`, `store-lookup-compound`, `store-search-compounds`, `store-convert-ids`, `store-search-ids`, `store-stats`.
Example prompts:
- "Search for pathogenic BRCA1 variants in the local ClinVar data"
- "What compounds have an InChI key starting with BSYNRYMUT?"
- "How many records are loaded in each dataset?"
See @ncbijs/store-mcp for details. See Data pipelines above to load the data.
40 of 43 packages work in both browsers and Node.js. Only 3 infrastructure packages require Node.js:
| Runtime | Packages | Why |
|---|---|---|
| Browser + Node.js | All HTTP clients, parsers, rate-limiter, xml, fasta, genbank, pipeline, etl, sync (40 packages) | Only uses `fetch`, `DecompressionStream`, and pure computation |
| Node.js only | @ncbijs/store | Requires `@duckdb/node-api` (native binding) |
| Node.js only | @ncbijs/store-mcp, @ncbijs/http-mcp | MCP server CLIs (stdio transport) |
Use ncbijs directly in frontend apps — search PubMed, look up genes, query MeSH, and more with zero server-side code:
```ts
import { PubMed } from '@ncbijs/pubmed';
import { Datasets } from '@ncbijs/datasets';

const pubmed = new PubMed();
const articles = await pubmed.search({ term: 'CRISPR therapy', retmax: 10 });

const datasets = new Datasets();
const gene = await datasets.geneBySymbol('BRCA1');
```

```sh
npm install @ncbijs/pubmed
```

```ts
import { PubMed } from '@ncbijs/pubmed';

const pubmed = new PubMed({
  tool: 'my-research-app',
  email: 'you@university.edu',
});

const articles = await pubmed
  .search('CRISPR gene therapy')
  .dateRange('2023/01/01', '2024/12/31')
  .freeFullText()
  .limit(10)
  .fetchAll();

for (const article of articles) {
  console.log(`${article.pmid}: ${article.title}`);
}
```
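The quickstart returns article metadata; to show what downstream citation formatting produces, here is a hand-rolled RIS sketch. It is illustrative only — the `Article` shape and field coverage of the real @ncbijs/cite package may differ:

```typescript
// Hand-rolled RIS output to illustrate citation formatting
// (not the actual @ncbijs/cite API).
interface Article { pmid: string; title: string; journal: string; year: number; authors: string[] }

function toRis(a: Article): string {
  return [
    'TY  - JOUR',                                // record type: journal article
    ...a.authors.map((name) => `AU  - ${name}`), // one AU line per author
    `TI  - ${a.title}`,
    `JO  - ${a.journal}`,
    `PY  - ${a.year}`,
    `AN  - ${a.pmid}`,
    'ER  - ',                                    // end of record
  ].join('\n');
}

console.log(toRis({
  pmid: '33024307', title: 'Example title', journal: 'Nature', year: 2020, authors: ['Doe J'],
}));
```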
```
I want to...
│
├── Search biomedical literature
│   ├── High-level PubMed search ──────────→ @ncbijs/pubmed
│   ├── Low-level Entrez queries ──────────→ @ncbijs/eutils
│   └── Find literature by genetic variant ─→ @ncbijs/litvar
│
├── Retrieve full-text articles
│   ├── PMC open-access articles ──────────→ @ncbijs/pmc
│   └── Annotated text with NER ───────────→ @ncbijs/bioc
│
├── Extract entities from text
│   ├── Genes, diseases, chemicals ────────→ @ncbijs/pubtator
│   └── Annotated passages (BioC format) ──→ @ncbijs/bioc
│
├── Work with citations
│   ├── Format citations (RIS, CSL, etc.) ─→ @ncbijs/cite
│   ├── Convert PMID/PMCID/DOI ────────────→ @ncbijs/id-converter
│   └── Citation impact metrics (RCR) ─────→ @ncbijs/icite
│
├── Work with genes and sequences
│   ├── Gene/genome metadata ──────────────→ @ncbijs/datasets
│   ├── Protein sequences ─────────────────→ @ncbijs/protein
│   ├── Nucleotide sequences ──────────────→ @ncbijs/nucleotide
│   ├── Sequence alignment (BLAST) ────────→ @ncbijs/blast
│   ├── Parse FASTA format ────────────────→ @ncbijs/fasta
│   └── Parse GenBank format ──────────────→ @ncbijs/genbank
│
├── Work with variants and clinical data
│   ├── SNP/variant lookup (dbSNP) ────────→ @ncbijs/snp
│   ├── HGVS/SPDI/VCF conversion ──────────→ @ncbijs/snp
│   ├── Clinical significance (ClinVar) ───→ @ncbijs/clinvar
│   ├── Genetic disorders (OMIM) ──────────→ @ncbijs/omim
│   └── Medical genetics (MedGen) ─────────→ @ncbijs/medgen
│
├── Work with drugs and chemicals
│   ├── Compound properties ───────────────→ @ncbijs/pubchem
│   ├── Compound annotations (GHS, etc.) ──→ @ncbijs/pubchem
│   ├── Drug normalization (RxCUI) ────────→ @ncbijs/rxnorm
│   ├── Drug classes (ATC, VA, MEDRT) ─────→ @ncbijs/rxnorm
│   ├── NDC code lookup ───────────────────→ @ncbijs/rxnorm
│   └── Drug labels and SPLs ──────────────→ @ncbijs/dailymed
│
├── Autocomplete medical codes
│   ├── ICD-10, LOINC, SNOMED ─────────────→ @ncbijs/clinical-tables
│   └── RxTerms drug names ────────────────→ @ncbijs/clinical-tables
│
├── Search clinical trials ────────────────→ @ncbijs/clinical-trials
│
├── Work with vocabularies
│   └── MeSH term expansion ───────────────→ @ncbijs/mesh
│
├── Search other NCBI databases
│   ├── Gene expression (GEO) ─────────────→ @ncbijs/geo
│   ├── Structural variants (dbVar) ───────→ @ncbijs/dbvar
│   ├── Sequencing data (SRA) ─────────────→ @ncbijs/sra
│   ├── 3D structures (MMDB/PDB) ──────────→ @ncbijs/structure
│   ├── Protein domains (CDD) ─────────────→ @ncbijs/cdd
│   ├── Genetic tests (GTR) ───────────────→ @ncbijs/gtr
│   ├── Books/textbooks ───────────────────→ @ncbijs/books
│   └── Journal records (NLM Catalog) ─────→ @ncbijs/nlm-catalog
│
├── Store NCBI data locally ───────────────→ @ncbijs/store
├── Data pipeline (Source → Parse → Sink) ─→ @ncbijs/pipeline
├── Load any NCBI dataset in one call ─────→ @ncbijs/etl
├── Watch NCBI sources for updates ────────→ @ncbijs/sync
├── Expose tools to LLM agents (live API) ─→ @ncbijs/http-mcp
└── Query local data via MCP ──────────────→ @ncbijs/store-mcp
```
| Capability | Packages |
|---|---|
| Supports API key | eutils, pubmed, pmc, clinvar, snp, datasets, omim, medgen, gtr, geo, dbvar, sra, structure, cdd, books, nlm-catalog, protein, nucleotide (optional, for higher rate limits) |
| No API key needed | All others (non-NCBI APIs) |
| Rate-limited | eutils, datasets, blast, snp, clinvar, pubchem, clinical-trials, icite, rxnorm, dailymed, plus every package that depends on rate-limiter |
| Zero dependencies | pipeline, sync, cite, id-converter, mesh, fasta, genbank, litvar, bioc, clinical-tables |
| Async iterators | eutils (efetchBatches, searchAndFetch, searchAndSummarize), pubmed (batch), clinical-trials (searchStudies), cite (citeMany), pipeline (Source, streamParser) |
| XML parsing | eutils, pubmed-xml, jats, pubtator, xml |
| Bulk parsers | mesh, cite, id-converter, clinvar, datasets, pubchem, snp, icite, clinical-trials, litvar, medgen, cdd, pmc |
| Data pipelines | pipeline (Source → Parse → Sink), store (DuckDbSink), sync (update detection) |
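The "async iterators" row refers to batch helpers such as `efetchBatches` that page through large ID lists. The batching shape can be sketched standalone (this is not the real eutils client, which also handles rate limits and retries):

```typescript
// Standalone sketch of async-iterator batching over an ID list.
async function* inBatches<T>(items: T[], size: number): AsyncGenerator<T[]> {
  for (let i = 0; i < items.length; i += size) {
    yield items.slice(i, i + size); // one request's worth of IDs
  }
}

const pages: string[][] = [];
for await (const page of inBatches(['1', '2', '3', '4', '5'], 2)) pages.push(page);
console.log(pages.length); // 3
```

Consuming results with `for await` keeps memory flat regardless of how many IDs you feed in, which is why the batch APIs expose this shape.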
```sh
pnpm install
pnpm build      # Build all packages
pnpm test       # Run all tests
pnpm lint       # Lint all packages
pnpm typecheck  # Type-check all packages
```

```sh
pnpm nx run @ncbijs/pubmed:build
pnpm nx run @ncbijs/pubmed:test
```

E2E tests hit real NCBI APIs and require an API key:
```sh
cp .env.example .env
# Add your NCBI API key to .env
pnpm nx run ncbijs-e2e:e2e
```

Get an API key at ncbi.nlm.nih.gov/account/settings.