Open
Labels: enhancement (New feature or request)
Description
What
Add an ArXivFetcher that matches arxiv.org/abs/{id} and arxiv.org/pdf/{id} URLs, returning structured paper metadata optimized for research agents.
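A minimal sketch of the URL-matching step follows. It is in Rust because the design notes suggest a Rust crate. The names `match_arxiv_url`, `ArxivUrlKind` and `ArxivMatch` are hypothetical and do not come from the codebase. The sketch makes several assumptions. It accepts http or https, with or without `www.`. It ignores query strings and fragments, and it drops a trailing `.pdf` on PDF links. It does not validate the ID syntax, so both new-style (`2301.01234v2`) and old-style (`hep-th/9901001`) IDs pass through unchanged.

```rust
/// Which arXiv page a matched URL points at.
#[derive(Debug, PartialEq, Eq)]
pub enum ArxivUrlKind {
    Abs,
    Pdf,
}

/// A matched URL: the page kind plus the paper ID (a version suffix such as "v2" is kept).
#[derive(Debug, PartialEq, Eq)]
pub struct ArxivMatch {
    pub kind: ArxivUrlKind,
    pub id: String,
}

/// Hypothetical matcher for arxiv.org/abs/{id} and arxiv.org/pdf/{id}.
/// Returns None for any other URL so the caller can fall back to another fetcher.
pub fn match_arxiv_url(url: &str) -> Option<ArxivMatch> {
    // Accept http or https, with or without a "www." prefix.
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let rest = rest.strip_prefix("www.").unwrap_or(rest);
    let path = rest.strip_prefix("arxiv.org/")?;

    // Ignore any query string or fragment.
    let path = path.split(|c: char| c == '?' || c == '#').next().unwrap_or("");

    let (kind, id) = if let Some(id) = path.strip_prefix("abs/") {
        (ArxivUrlKind::Abs, id)
    } else if let Some(id) = path.strip_prefix("pdf/") {
        // PDF links sometimes carry a trailing ".pdf"; drop it to get the bare ID.
        (ArxivUrlKind::Pdf, id.strip_suffix(".pdf").unwrap_or(id))
    } else {
        return None;
    };

    // Old-style IDs contain a slash (e.g. "hep-th/9901001"), so only trailing slashes are trimmed.
    let id = id.trim_end_matches('/');
    if id.is_empty() {
        None
    } else {
        Some(ArxivMatch { kind, id: id.to_string() })
    }
}
```

The `kind` field lets a /pdf/ match set the binary-content indicator. The `id` field is what goes into the API query.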
Why
Research agents, and agents working on ML/AI tasks in general, frequently encounter arXiv links. The current DefaultFetcher returns the raw arXiv HTML page, which is noisy for an agent to consume. The arXiv API instead provides clean, structured metadata, including abstracts, author lists, and subject categories.
Requirements
- Match: https://arxiv.org/abs/{id}, https://arxiv.org/pdf/{id}
- Fetch via the arXiv API: http://export.arxiv.org/api/query?id_list={id}
- Return: title, authors, abstract, categories, published/updated dates, DOI, journal ref
- For /pdf/ URLs: return the metadata and indicate binary content (consistent with core binary handling)
- Include links to: PDF, HTML (if available via ar5iv.labs.arxiv.org), related papers
- Format field: "arxiv_paper"
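One possible shape for the returned record is sketched below, together with the URLs that can be derived from the ID alone. `ArxivPaper`, `skeleton` and `api_query_url` are hypothetical names, and the field names are illustrative. The ar5iv pattern `https://ar5iv.labs.arxiv.org/html/{id}` is assumed, and not every paper has an HTML rendering there. The related-papers link is left out because the issue does not say where those would come from. The remaining fields (title, authors, abstract, and so on) would be filled in by parsing the Atom response.

```rust
/// Value of the "format" field required by the issue.
pub const ARXIV_FORMAT: &str = "arxiv_paper";

/// Hypothetical output record; field names are illustrative, not an existing API.
#[derive(Debug, Default, PartialEq)]
pub struct ArxivPaper {
    pub format: &'static str,
    pub id: String,
    pub title: String,
    pub authors: Vec<String>,
    pub abstract_text: String, // `abstract` is a reserved word in Rust
    pub categories: Vec<String>,
    pub published: String,
    pub updated: String,
    pub doi: Option<String>,
    pub journal_ref: Option<String>,
    pub pdf_url: String,
    pub html_url: String,
    /// True when the request came in through a /pdf/ URL (binary content).
    pub is_binary: bool,
}

/// Builds the arXiv API query URL listed in the requirements.
pub fn api_query_url(id: &str) -> String {
    format!("http://export.arxiv.org/api/query?id_list={}", id)
}

impl ArxivPaper {
    /// Fills in only the fields derivable from the ID; metadata fields stay empty
    /// until the Atom response is parsed.
    pub fn skeleton(id: &str, from_pdf_url: bool) -> Self {
        ArxivPaper {
            format: ARXIV_FORMAT,
            id: id.to_string(),
            pdf_url: format!("https://arxiv.org/pdf/{}", id),
            // Assumed ar5iv URL pattern; the page may not exist for every paper.
            html_url: format!("https://ar5iv.labs.arxiv.org/html/{}", id),
            is_binary: from_pdf_url,
            ..Default::default()
        }
    }
}
```

The DOI and journal reference are `Option`s because many arXiv entries have neither.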
Design Notes
- The arXiv API returns Atom XML, so XML parsing is needed (consider the quick-xml crate)
- ar5iv.labs.arxiv.org provides HTML versions of papers, which could be fetched for full text
- Rate limiting: arXiv asks for reasonable usage; no API key is required
- Consider extracting references/citations if the API response includes them
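For the rate-limiting note, a small client-side spacer could enforce a minimum gap between API calls. The sketch below uses only the standard library. `ApiSpacer` and its methods are hypothetical. The interval is a parameter: arXiv's API user manual suggests roughly a three-second delay between consecutive calls, but the fetcher should follow whatever arXiv's current terms specify. `delay_before` takes the current time as an argument, so the spacing logic can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that spaces out consecutive arXiv API requests.
pub struct ApiSpacer {
    min_interval: Duration,
    last_request: Option<Instant>,
}

impl ApiSpacer {
    pub fn new(min_interval: Duration) -> Self {
        ApiSpacer { min_interval, last_request: None }
    }

    /// How long the caller should wait at `now` before issuing the next request.
    pub fn delay_before(&self, now: Instant) -> Duration {
        match self.last_request {
            None => Duration::ZERO,
            Some(last) => {
                let elapsed = now.saturating_duration_since(last);
                self.min_interval.saturating_sub(elapsed)
            }
        }
    }

    /// Records that a request was sent at `at`.
    pub fn record(&mut self, at: Instant) {
        self.last_request = Some(at);
    }

    /// Blocking convenience wrapper: sleep if needed, then record the request time.
    pub fn wait_and_record(&mut self) {
        let delay = self.delay_before(Instant::now());
        if !delay.is_zero() {
            std::thread::sleep(delay);
        }
        self.record(Instant::now());
    }
}
```

If the fetcher is async, the async runtime's own sleep would replace `std::thread::sleep`. The spacer would also need to be shared, for example behind a mutex, if several tasks call the API concurrently.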
Tier
3: Differentiated capability