PMID Batch Filtering Returns 400 Error Despite Documentation Claims #8

@Bryan-Nsoh

Description

The OpenAlex API returns a 400 "Invalid" error for every request that filters works by PMID, even though the documentation and several community discussions indicate that this should work. As a result, works cannot be retrieved by PMID in bulk.

Expected Behavior

According to multiple official sources, PMID batch filtering should work with pipe-separated values:

  1. [OurResearch Blog Post (Dec 21, 2022)](https://blog.ourresearch.org/fetch-multiple-dois-in-one-openalex-api-request/) states:

    "This technique works with all IDs in OpenAlex, to include OpenAlex IDs and PubMed Central IDs (PMID)."

  2. [Google Groups Discussion](https://groups.google.com/g/openalex-users/c/6xvPsguNM6A) where OpenAlex developer Casey states:

    "This is implemented and available! You can now filter works by MAG, PMID, or PMCID."

  3. [OpenAlex Community Forum](https://groups.google.com/g/openalex-community/c/5foVRPybEYM) shows example:

    https://api.openalex.org/works?filter=ids.pmid:38785209|38773515

  4. [openalexR Package Documentation](https://docs.ropensci.org/openalexR/) shows working example:

    works_from_pmids <- oa_fetch(
      entity = "works",
      pmid = c("14907713", 32572199),
      verbose = TRUE
    )
    #> Requesting url: https://api.openalex.org/works?filter=pmid:14907713|32572199

Actual Behavior

All attempts to use PMID batch filtering return a 400 error with message "Invalid", regardless of:

  • Filter syntax used (pmid: vs ids.pmid:)
  • Number of PMIDs (fails even with single PMID)
  • URL encoding of pipe character
  • Presence of mailto parameter
  • Addition of per-page parameter

Steps to Reproduce

import requests

# Test 1: Single PMID with pmid: filter
response = requests.get(
    "https://api.openalex.org/works?filter=pmid:14907713&mailto=test@example.com"
)
print(f"Single PMID (pmid:): {response.status_code}")
print(response.json())
# Output: 400, {"HTTP_status_code": 400, "error": true, "message": "Invalid"}

# Test 2: Single PMID with ids.pmid: filter
response = requests.get(
    "https://api.openalex.org/works?filter=ids.pmid:14907713&mailto=test@example.com"
)
print(f"Single PMID (ids.pmid:): {response.status_code}")
print(response.json())
# Output: 400, {"HTTP_status_code": 400, "error": true, "message": "Invalid"}

# Test 3: Multiple PMIDs with pipe separator
response = requests.get(
    "https://api.openalex.org/works?filter=pmid:14907713|32572199&mailto=test@example.com"
)
print(f"Multiple PMIDs: {response.status_code}")
print(response.json())
# Output: 400, {"HTTP_status_code": 400, "error": true, "message": "Invalid"}

# Test 4: Direct lookup WORKS FINE
response = requests.get("https://api.openalex.org/works/pmid:14907713")
print(f"Direct lookup: {response.status_code}")
print(f"Found work: {response.json()['id']}")
# Output: 200, Found work: https://openalex.org/W1775749144

Comprehensive Testing Performed

We tested the following combinations:

  1. Filter field variations:

    • filter=pmid:
    • filter=ids.pmid:
    • filter=openalex: (with OpenAlex IDs) ❌
    • filter=ids.openalex:
  2. PMID formats:

    • Short form: 14907713
    • Full URL form: https://pubmed.ncbi.nlm.nih.gov/14907713
  3. Separator variations:

    • Pipe separator: pmid:123|456|789
    • URL-encoded pipe: pmid:123%7C456%7C789
    • Comma separator: pmid:123,456,789
  4. Request variations:

    • With mailto parameter ✓ (still fails)
    • With per-page=100 parameter ✓ (still fails)
    • Different batch sizes (1, 3, 20, 50 PMIDs) ❌
  5. Known-good PMIDs tested:

    • PMIDs from documentation examples: 14907713, 32572199
    • PMIDs from our dataset: 20468064, 25456007, 17885603
    • All are valid (confirmed via direct lookup)

Additional Context

  • DOI batch filtering works correctly as documented
  • Direct PMID lookup works perfectly (e.g., /works/pmid:14907713)
  • The issue affects only the filter parameter with PMIDs
  • This forces users to make N individual API calls instead of N/50 batch calls
  • Error message "Invalid" is not descriptive enough to debug the issue
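Until the filter works, one possible workaround is to try the batch filter first and fall back to direct lookups when it fails. This is a sketch under assumptions: the helper names (`chunked`, `batch_url`, `direct_url`, `fetch_works`) are hypothetical, the batch size of 50 follows the N/50 figure above, and the `per-page` value is set to the batch size on the assumption that the API accepts it.

```python
BASE_URL = "https://api.openalex.org/works"


def chunked(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_url(pmids):
    # Assumption: per-page is raised so one page can hold the whole batch.
    return f"{BASE_URL}?filter=pmid:{'|'.join(pmids)}&per-page={len(pmids)}"


def direct_url(pmid):
    # Direct lookup form that currently returns 200 (see Test 4).
    return f"{BASE_URL}/pmid:{pmid}"


def fetch_works(pmids, get=None, batch_size=50):
    """Try the batch filter; on failure, look up each PMID individually."""
    if get is None:
        import requests  # imported lazily so the URL helpers work without it
        get = requests.get
    works = []
    for group in chunked(pmids, batch_size):
        response = get(batch_url(group))
        if response.status_code == 200:
            # List endpoints wrap the works in a "results" array.
            works.extend(response.json()["results"])
            continue
        for pmid in group:
            single = get(direct_url(pmid))
            if single.status_code == 200:
                works.append(single.json())
    return works
```

Passing `get` as a parameter keeps the fallback logic testable without network access; calling `fetch_works(["14907713", "32572199"])` uses `requests.get` by default.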

Environment

  • API endpoint: https://api.openalex.org/works
  • Date tested: June 30, 2025
  • No API key used (testing with polite pool via mailto parameter)
  • Tested with: Python requests, aiohttp, and direct browser access
  • User-Agent: Various (BioQueryous/1.0, Python requests default, Chrome)

Impact

This bug significantly impacts performance for users needing to retrieve multiple works by PMID, forcing them to use individual lookups instead of efficient batch requests. For example, retrieving 1000 PMIDs requires 1000 API calls instead of 20.
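The call counts in this example follow from a simple ceiling division. A minimal sketch, assuming 50 IDs per batched request (the helper name `calls_needed` is hypothetical):

```python
import math


def calls_needed(n_ids, batch_size=50):
    """Number of requests needed to cover n_ids at batch_size IDs per request."""
    return math.ceil(n_ids / batch_size)


individual = calls_needed(1000, batch_size=1)   # one request per PMID
batched = calls_needed(1000, batch_size=50)     # 50 PMIDs per request
print(individual, batched)  # prints: 1000 20
```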

Suggested Fix

Any of the following:

  1. Fix the PMID filter to work as documented
  2. Update documentation to reflect that PMID batch filtering is not supported
  3. Provide a more descriptive error message indicating why the filter is invalid

Related Issues

  • The same issue likely affects PMCID filtering (mentioned in the same blog post but not tested)
  • Possibly related to the filter field deprecation mentioned in docs (host_venue, alternate_host_venues)
