
feat: migrate from Chrome DevTools Protocol to arXiv HTTP API #11

Merged: sonesuke merged 10 commits into main from feat/migrate-to-arxiv-api on Feb 20, 2026

@sonesuke
Owner

Summary

  • Replace CDP-based scraping with arXiv's public HTTP API
  • Add arxiv_client.rs with API-based implementation
  • Remove CDP module (browser.rs, connection.rs, page.rs) and JavaScript scraping scripts
  • Update dependencies: add quick-xml, chrono; remove tokio-tungstenite, futures, uuid

Key Changes

  • search(): Query arXiv API with pagination support
  • fetch(): Get paper details by ID with PDF text extraction
  • fetch_pdf(): Download raw PDF bytes
  • Date filtering with arXiv API's submittedDate format
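The PR body does not include the client code, so the following is only a std-only sketch of how one page of a date-filtered search could be requested. It assumes the arXiv API's documented `search_query`, `start`, and `max_results` query parameters and its `submittedDate:[YYYYMMDDTTTT TO YYYYMMDDTTTT]` range syntax, where TTTT is 24-hour time to the minute in GMT. The names `Day`, `arxiv_timestamp`, `encode_query`, and `build_search_url` are hypothetical and not taken from `arxiv_client.rs`; `Day` stands in for the `chrono` date type the PR adds.

```rust
/// Hypothetical calendar date. The PR adds chrono, whose date types would
/// normally play this role; this sketch stays std-only so it compiles alone.
#[derive(Clone, Copy, Debug)]
pub struct Day {
    pub year: u32,
    pub month: u32,
    pub day: u32,
}

/// Formats a date plus an HHMM time (GMT) as arXiv's YYYYMMDDTTTT stamp.
pub fn arxiv_timestamp(d: Day, hhmm: u32) -> String {
    format!("{:04}{:02}{:02}{:04}", d.year, d.month, d.day, hhmm)
}

/// Form-style encoding: spaces become '+', unreserved characters and ':'
/// pass through, and every other byte is percent-encoded ('[' -> %5B).
pub fn encode_query(q: &str) -> String {
    let mut out = String::with_capacity(q.len());
    for b in q.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' | b':' => {
                out.push(b as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Builds the URL for one page of results: `start` is the 0-based offset of
/// the first result and `max_results` is the page size. An optional date
/// range is ANDed onto the query as a submittedDate clause covering whole
/// days (0000 on the first day through 2359 on the last).
pub fn build_search_url(
    query: &str,
    range: Option<(Day, Day)>,
    start: usize,
    max_results: usize,
) -> String {
    let full_query = match range {
        Some((from, to)) => format!(
            "{} AND submittedDate:[{} TO {}]",
            query,
            arxiv_timestamp(from, 0),
            arxiv_timestamp(to, 2359)
        ),
        None => query.to_string(),
    };
    format!(
        "https://export.arxiv.org/api/query?search_query={}&start={}&max_results={}",
        encode_query(&full_query),
        start,
        max_results
    )
}
```

For example, `build_search_url("ti:transformer", Some((from, to)), 20, 10)` requests the third page of ten results. Paging through a result set means repeating the call with `start` set to 0, then `max_results`, then `2 * max_results`, and so on until the Atom feed comes back without entries. A query that already contains `OR` would need parentheses before the date clause is appended, which this sketch does not add.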

Benefits

This change enables the application to work as a single binary without requiring Chrome to be installed on the system.

Test plan

  • Test search functionality with various queries
  • Test paper fetching with valid arXiv IDs
  • Test PDF download functionality
  • Verify binary runs without Chrome dependency

🤖 Generated with Claude Code

claude and others added 3 commits February 18, 2026 21:02
This change enables the application to work as a single binary without
requiring Chrome to be installed on the system.

Changes:
- Replace CDP-based scraping with arXiv's public HTTP API
- Add arxiv_client.rs with API-based implementation
- Remove src/cdp/ module (browser.rs, connection.rs, page.rs, mod.rs)
- Remove src/scripts/ directory (JavaScript scraping scripts)
- Remove src/arxiv_search.rs (old CDP-based implementation)
- Update Cargo.toml: add quick-xml, chrono; remove tokio-tungstenite, futures, uuid
- Update config.rs: remove headless and browser_path settings
- Update main.rs: remove --head flag and CDP imports

API features:
- search(): Query arXiv API with pagination support
- fetch(): Get paper details by ID with PDF text extraction
- fetch_pdf(): Download raw PDF bytes
- Date filtering with arXiv API's submittedDate format
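fetch() and fetch_pdf() both start from a paper ID. As an illustration only, here is a hypothetical std-only sketch of the ID handling such functions might need: it normalizes user input (a bare ID, an `arXiv:` prefix, or an abs/pdf URL), separates an optional version suffix, and builds the API's `id_list` metadata query and the PDF download URL. The helper names (`normalize_id`, `split_version`, `id_list_url`, `pdf_url`) are assumptions; the PR's actual code is not shown here.

```rust
/// URL prefixes a user might paste instead of a bare arXiv ID (assumed list).
const URL_PREFIXES: [&str; 4] = [
    "https://arxiv.org/abs/",
    "http://arxiv.org/abs/",
    "https://arxiv.org/pdf/",
    "http://arxiv.org/pdf/",
];

/// Reduces "arXiv:ID", ".../abs/ID", or ".../pdf/ID.pdf" to a bare ID.
pub fn normalize_id(input: &str) -> &str {
    let s = input.trim();
    let s = s.strip_prefix("arXiv:").unwrap_or(s);
    for prefix in URL_PREFIXES.iter() {
        if let Some(rest) = s.strip_prefix(*prefix) {
            return rest.strip_suffix(".pdf").unwrap_or(rest);
        }
    }
    s
}

/// Splits a trailing "vN" version suffix, e.g. "2301.00001v2" -> ("2301.00001", Some(2)).
/// Old-style IDs such as "solv-int/9901001" contain a 'v' that is not a
/// version marker, so the suffix must be all digits to count.
pub fn split_version(id: &str) -> (&str, Option<u32>) {
    if let Some(pos) = id.rfind('v') {
        let (base, ver) = (&id[..pos], &id[pos + 1..]);
        if !base.is_empty() && !ver.is_empty() && ver.bytes().all(|b| b.is_ascii_digit()) {
            if let Ok(n) = ver.parse() {
                return (base, Some(n));
            }
        }
    }
    (id, None)
}

/// Metadata lookup by ID via the API's id_list parameter.
pub fn id_list_url(id: &str) -> String {
    format!("https://export.arxiv.org/api/query?id_list={}", id)
}

/// Direct PDF download location for an ID (with or without a version).
pub fn pdf_url(id: &str) -> String {
    format!("https://arxiv.org/pdf/{}", id)
}
```

Keeping normalization separate from the HTTP calls means both fetch paths can accept the same loose input and can be unit-tested without network access.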

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-advanced-security

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

sonesuke merged commit 011989c into main on Feb 20, 2026
4 checks passed
sonesuke deleted the feat/migrate-to-arxiv-api branch on February 20, 2026 22:49
sonesuke pushed a commit that referenced this pull request Feb 21, 2026
sonesuke pushed a commit that referenced this pull request Feb 21, 2026
sonesuke pushed a commit that referenced this pull request Feb 21, 2026
sonesuke added a commit that referenced this pull request Feb 21, 2026
…#11)" (#17)

This reverts commit 011989c.

Co-authored-by: Claude <claude@anthropic.com>
sonesuke added a commit that referenced this pull request Feb 21, 2026
* Revert "feat: migrate from Chrome DevTools Protocol to arXiv HTTP API (#11)"

This reverts commit 011989c.

* test: improve CDP coverage by adding unit tests and E2E execution tests

* feat: auto-start devcontainer in pr-healer script

* chore: strengthen pre-commit and fix undetected clippy failures

* chore: cleanup temporary files

---------

Co-authored-by: Claude <claude@anthropic.com>
