fix(fetchers): harden arxiv fetcher input and body limits #116

Merged
chaliy merged 2 commits into main from 2026-05-17-propose-fix-for-arxivfetcher-vulnerability on May 17, 2026

@chaliy (Contributor) commented on May 17, 2026

Motivation

  • Close a query-injection vector and a DoS vector: arbitrary /abs/... or /pdf/... path segments were interpolated into the arXiv API query, and the full API response was read without a size limit.
  • Reduce on-path tampering risk by avoiding plaintext HTTP to the arXiv export API.
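
To make the injection vector concrete, here is a minimal sketch of how an unvalidated path segment can smuggle an extra query parameter into the API request. The helper names (`id_from_abs_path`, `build_query_url_unchecked`) are hypothetical and are not taken from the fetchkit source; the sketch assumes the fetcher queried the arXiv API through its `id_list` parameter.

```rust
/// Hypothetical helper: returns whatever follows `/abs/` in a URL path.
/// No validation is applied, mirroring the behaviour this PR removes.
fn id_from_abs_path(path: &str) -> Option<&str> {
    path.strip_prefix("/abs/")
}

/// Hypothetical stand-in for the pre-fix query construction: the ID is
/// interpolated into the query string as-is, over plaintext HTTP.
fn build_query_url_unchecked(paper_id: &str) -> String {
    format!("http://export.arxiv.org/api/query?id_list={}", paper_id)
}
```

With a path such as `/abs/2301.00001&max_results=100000`, the resulting URL carries an attacker-chosen `max_results` parameter, which can inflate the response and feeds directly into the unbounded body read.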

Description

  • Added strict paper-id validation (is_valid_paper_id) in ArXivFetcher::parse_url to reject IDs containing & or any other character that is not valid in an arXiv ID, preventing query-string injection.
  • Switched the arXiv API endpoint to use https://export.arxiv.org instead of http://export.arxiv.org to protect against response tampering.
  • Enforced max_body_size and timeouts by reusing the shared read_body_with_timeout helper and DEFAULT_MAX_BODY_SIZE/BODY_TIMEOUT from the default fetcher instead of calling response.text() unbounded.
  • Added a unit test (test_rejects_injected_paper_id) asserting that injected IDs are rejected. The change is confined to crates/fetchkit/src/fetchers/arxiv.rs.
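
The two hardening steps above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: the allowlist, the `MAX_ID_LEN` bound, and the synchronous `read_body_limited` helper are all hypothetical. The real `is_valid_paper_id` may be stricter, and the real `read_body_with_timeout` also enforces a timeout, which this sketch omits.

```rust
use std::io::{self, Read};

/// Assumed upper bound on ID length; the PR does not state the real limit.
const MAX_ID_LEN: usize = 64;

/// Allowlist check (a guess at the shape of the real validator): the ID must
/// be non-empty, bounded in length, and contain only characters that occur in
/// arXiv identifiers such as `2301.00001v2` or `hep-th/9901001`.
fn is_valid_paper_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= MAX_ID_LEN
        && id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '/'))
}

/// Hypothetical size-capped read: returns the body if it fits in `max_bytes`,
/// otherwise an error. No timeout handling is implemented here.
fn read_body_limited<R: Read>(reader: R, max_bytes: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read one byte past the limit so an oversized body can be detected
    // without buffering the whole response.
    let cap = (max_bytes as u64).saturating_add(1);
    reader.take(cap).read_to_end(&mut buf)?;
    if buf.len() > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "body exceeds size limit",
        ));
    }
    Ok(buf)
}
```

An allowlist is used rather than a blocklist of known-bad characters, so separators such as `&`, `?`, `=`, and `#` are rejected without having to be enumerated.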

Testing

  • Ran cargo fmt --all; formatting completed successfully.
  • Ran cargo test -p fetchkit arxiv; all 14 arXiv tests passed.
  • Smoke-checked the modified fetcher logic during the test run; the arXiv-related unit tests showed no regressions.

Codex Task

@chaliy merged commit 1315516 into main on May 17, 2026
11 checks passed
@chaliy deleted the 2026-05-17-propose-fix-for-arxivfetcher-vulnerability branch on May 17, 2026 18:07