feat(sources): add Semantic Scholar adapter with citation context#22

Merged
foundatron merged 1 commit into main from issue-3 on Mar 11, 2026

@foundatron (Owner)

Closes #3

Changes

1. tentacle/config.py

  • Add SemanticScholarSourceConfig(SourceConfig) dataclass with: s2_api_key: str = "", min_citations: int = 0, days_back: int | None = None
  • Change Config.semantic_scholar type from SourceConfig to SemanticScholarSourceConfig
  • Add env var override in load_config: read S2_API_KEY into config.semantic_scholar.s2_api_key
  • Add validation in validate: min_citations >= 0, days_back >= 1 (if set)
  • Update DEFAULT_CONFIG_TEMPLATE with commented-out s2_api_key, min_citations, days_back
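
The config changes listed above could look roughly like the sketch below. This is an illustration, not the merged code: the `SourceConfig` base class here is a hypothetical stand-in (its real fields are not shown in this PR), `apply_env_overrides` is an invented name for the relevant part of `load_config`, and the sketch assumes the env var only overrides the key when it is non-empty.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class SourceConfig:
    # Hypothetical stand-in for the project's base class; real fields may differ.
    enabled: bool = True


@dataclass
class SemanticScholarSourceConfig(SourceConfig):
    s2_api_key: str = ""
    min_citations: int = 0
    days_back: int | None = None


def apply_env_overrides(cfg, env=None):
    """Copy S2_API_KEY from the environment into the config when it is set."""
    env = os.environ if env is None else env
    key = env.get("S2_API_KEY", "")
    if key:  # assumption: an empty or missing variable leaves the config untouched
        cfg.s2_api_key = key
    return cfg


def validate(cfg) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if cfg.min_citations < 0:
        errors.append("semantic_scholar.min_citations must be >= 0")
    if cfg.days_back is not None and cfg.days_back < 1:
        errors.append("semantic_scholar.days_back must be >= 1 when set")
    return errors
```

Passing `env` explicitly keeps the override testable without touching the real process environment.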

2. tentacle/sources/semantic_scholar.py

  • Add __init__(self, api_key: str = "", min_citations: int = 0, days_back: int | None = None) — store on instance
  • In _search, add x-api-key header to Request when self._api_key is non-empty
  • Add citationCount to the fields query parameter
  • Add publicationDateOrYear parameter (verify exact name against S2 docs) using YYYY-MM-DD:YYYY-MM-DD format when days_back is set
  • Add _fetch_with_retry(self, req: Request) -> bytes | None method: wraps urlopen and retries on HTTP 429, respecting the Retry-After header (parsed as int, falling back to 1s), with at most 3 retries and exponential backoff. Non-retryable HTTP errors (any other 4xx/5xx status) are caught and logged, and the method returns None
  • In _search, call _fetch_with_retry instead of raw urlopen; on None return, skip that query
  • After parsing, filter out papers where citationCount (defaulting None to 0) is below min_citations
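
As a rough sketch of the request-building and filtering steps above (not the adapter's actual code): `build_request` and `filter_by_citations` are hypothetical helper names, the `fields` list other than `citationCount` is made up, and the `publicationDateOrYear` parameter name is taken from this PR and should still be verified against the S2 docs. The endpoint URL is likewise an assumption.

```python
from __future__ import annotations

from datetime import date, timedelta
from urllib.parse import urlencode
from urllib.request import Request

# Assumed search endpoint; confirm against the Semantic Scholar API docs.
SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_request(query, api_key="", days_back=None, today=None):
    """Build the search Request with optional date window and API key header."""
    params = {"query": query, "fields": "title,url,citationCount"}
    if days_back is not None:
        end = today or date.today()
        start = end - timedelta(days=days_back)
        # YYYY-MM-DD:YYYY-MM-DD range, as described in the change list.
        params["publicationDateOrYear"] = f"{start.isoformat()}:{end.isoformat()}"
    # Only send the header when a key is configured.
    headers = {"x-api-key": api_key} if api_key else {}
    return Request(f"{SEARCH_URL}?{urlencode(params)}", headers=headers)


def filter_by_citations(papers, min_citations):
    """Keep papers whose citationCount (None or missing treated as 0) meets the floor."""
    return [p for p in papers if (p.get("citationCount") or 0) >= min_citations]
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so the header is stored as `X-api-key`; HTTP header names are case-insensitive, so this does not affect the request.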

3. tentacle/cli.py

  • Update _get_sources to pass api_key=config.semantic_scholar.s2_api_key, min_citations=config.semantic_scholar.min_citations, days_back=config.semantic_scholar.days_back to SemanticScholarAdapter()

4. tentacle/models.py

  • No changes needed

Review Findings

  • Errors: 0
  • Warnings: 3
  • Nits: 5
  • Assessment: NEEDS CHANGES

The most actionable items are: (1) catching URLError in _fetch_with_retry to handle network-level failures gracefully, (2) adding a log line when Retry-After header parsing fails, and (3) adding a test for non-integer Retry-After values. The overall implementation is solid — retry logic, config validation, env var override, and test coverage are all well-structured.
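
One way the first two review items might be addressed is sketched below. It is a standalone approximation rather than the merged method: `fetch_with_retry` and `parse_retry_after` are written as module-level functions, the `opener` and `sleep` parameters are hypothetical hooks added for testability, and combining Retry-After with backoff by taking the larger of the two is only one reading of the description.

```python
from __future__ import annotations

import logging
import time
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

logger = logging.getLogger(__name__)

MAX_RETRIES = 3


def parse_retry_after(value, default=1.0):
    """Parse Retry-After as integer seconds; log and fall back when that fails."""
    try:
        return float(int(value))
    except (TypeError, ValueError):
        # Also reached for HTTP-date values, which this sketch does not parse.
        logger.warning("Unparseable Retry-After %r; using %ss", value, default)
        return default


def fetch_with_retry(req, opener=urlopen, sleep=time.sleep):
    """Return the response body, or None if the query should be skipped."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            with opener(req) as resp:
                return resp.read()
        except HTTPError as exc:  # must precede URLError, its parent class
            if exc.code != 429 or attempt == MAX_RETRIES:
                logger.warning("HTTP %s for %s; skipping", exc.code, req.full_url)
                return None
            retry_after = parse_retry_after((exc.headers or {}).get("Retry-After"))
            # Assumption: wait at least the exponential backoff for this attempt.
            sleep(max(retry_after, 2 ** attempt))
        except URLError as exc:
            # Network-level failure (DNS, refused connection, timeout wrapper).
            logger.warning("Network error for %s: %s", req.full_url, exc.reason)
            return None
    return None
```

For the third item, a test for non-integer Retry-After values can inject a fake `opener` that raises `HTTPError` with a header such as `"abc"` and a `sleep` that records delays, then assert that the fallback delay was used and a warning was logged.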

… Semantic Scholar adapter

- Add SemanticScholarSourceConfig subclass with s2_api_key, min_citations, days_back fields
- Add S2_API_KEY env var override in load_config
- Add validation for min_citations >= 0 and days_back >= 1
- Add x-api-key header when api_key is set
- Add citationCount to fields param; filter papers below min_citations (None treated as 0)
- Add publicationDateOrYear param when days_back is set (YYYY-MM-DD:YYYY-MM-DD format)
- Add _fetch_with_retry with 429 retry respecting Retry-After header (max 3 retries)
- Non-retryable HTTP errors are logged and the query is skipped gracefully
- Update cli.py to pass new params to SemanticScholarAdapter
- Add 11 new tests covering all new behaviours

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@foundatron merged commit 1b36fb9 into main on Mar 11, 2026
1 check passed
@foundatron deleted the issue-3 branch on March 11, 2026 at 01:54