
feat(stackoverflow): surface question_id on listings + new read <id> #1293

Merged
jackwener merged 3 commits into main from feat/stackoverflow-id-and-read
May 4, 2026

@jackwener

Summary

  • All 4 stackoverflow listings (hot / search / unanswered / bounties) only emitted [title, score, answers, url]. No id to round-trip into a body read, no tags to filter by topic, no views / is_answered / creation_date / author for triage. Now they expose rank, id (question_id), views, is_answered (omitted on unanswered since always false), tags, author, creation_date.
  • New stackoverflow read <id> fans out against the public Stack Exchange API (/questions/{id} + /questions/{id}/comments + /questions/{id}/answers + batched /answers/a;b;c/comments). Returns POST / Q-COMMENT / ANSWER / A-COMMENT rows mirroring the hackernews read and lobsters read shape. Accepted answer surfaced first with accepted='true'.
  • HTML body cleanup: <pre><code> preserved, <code> inline-fenced, <li> rendered as "- ", remaining tags stripped. A shared decodeEntities handles named / decimal / hex entities and is applied to both body and display_name (otherwise users like Jonas K&#246;lker come through mojibaked).
  • Typed fail-fast: ArgumentError for non-numeric id and --max-length < 100; EmptyResultError when items is empty; CommandExecutionError for HTTP non-2xx and for Stack Exchange's in-band error_id envelopes (throttle / quota). No silent clamps.
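
The fan-out described above can be sketched as a pair of URL builders. This is a minimal illustration, not the adapter's actual code: the helper names `questionUrls` and `answerCommentsUrl`, the `2.3` API base, and the bare `site=stackoverflow` query string are all assumptions made for the example.

```javascript
// Hypothetical helpers illustrating the read fan-out. The API base and the
// query string are assumptions; the endpoint paths come from the PR summary.
const API_BASE = "https://api.stackexchange.com/2.3";
const QUERY = "site=stackoverflow";

// Three calls keyed on the question id: question, its comments, its answers.
function questionUrls(id) {
  return [
    `${API_BASE}/questions/${id}?${QUERY}`,
    `${API_BASE}/questions/${id}/comments?${QUERY}`,
    `${API_BASE}/questions/${id}/answers?${QUERY}`,
  ];
}

// Fourth call: comments for all answers at once, ids joined with ';'.
function answerCommentsUrl(answerIds) {
  return `${API_BASE}/answers/${answerIds.join(";")}/comments?${QUERY}`;
}
```

Batching the answer ids into one path segment is what keeps a read at four requests regardless of how many answers are shown.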

Test plan

  • npx vitest run clis/stackoverflow/stackoverflow.test.js — 14 assertions pass
    • 4 listing column-shape (incl. unanswered skipping is_answered, bounties keeping bounty column)
    • registration / args / strategy
    • ArgumentError no-fetch on non-numeric id
    • ArgumentError no-fetch on max-length < 100
    • EmptyResultError on empty items
    • CommandExecutionError on Stack Exchange error envelope (throttle)
    • Full POST/Q-COMMENT/ANSWER/A-COMMENT row order, accepted-first
    • Answer-comments fetch verified to batch ids a;b (single API call)
    • Entity decoding (named / decimal / hex) on body and display_name
    • --answers-limit honored when there are more answers
  • Live: stackoverflow hot --limit 2 -f json → id/tags/views/is_answered/author populated
  • Live: stackoverflow search "async await" --limit 1 → same agent-native shape
  • Live: stackoverflow unanswered --limit 1 → no is_answered (always false)
  • Live: stackoverflow read 11227809 --answers-limit 1 --comments-limit 2 → threaded structure with Jonas Kölker decoded properly
  • Live: stackoverflow read not-numeric → ARGUMENT
  • Live: stackoverflow read 999999999 → EMPTY_RESULT

Quota notes

Stack Exchange API has a 300/day quota per IP for unauthenticated requests. A read call uses up to 4 quota units (question + question comments + answers + batched answer comments). Listings cost 1 each.
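
As a rough worked example of that budget, the arithmetic below uses only the figures from the note above. The function name `maxReadsPerDay` is hypothetical, and the result is a worst-case bound, since a read costs "up to" 4 units.

```javascript
// Figures from the quota note; the helper itself is illustrative only.
const DAILY_QUOTA = 300;   // unauthenticated requests per IP per day
const READ_COST_MAX = 4;   // worst case for one `read` call
const LISTING_COST = 1;    // one listing call

// Worst-case number of reads still affordable after some listing calls.
function maxReadsPerDay(listingsUsed = 0) {
  const remaining = Math.max(0, DAILY_QUOTA - listingsUsed * LISTING_COST);
  return Math.floor(remaining / READ_COST_MAX);
}

console.log(maxReadsPerDay());   // 75
console.log(maxReadsPerDay(20)); // 70
```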

🤖 Generated with Claude Code

jackwener added 3 commits May 4, 2026 18:55
…`read <id>`

Agent-native gap: all 4 stackoverflow listings (`hot`, `search`,
`unanswered`, `bounties`) only emitted `[title, score, answers, url]`,
which means an agent could see a hot question but had no `id` to round-
trip into a body read, no `tags` to filter by topic, no `views` to gauge
demand, and no `is_answered` / `creation_date` / `author` to triage.
There also wasn't a `read` adapter, so reading a SO question through
opencli was impossible.

Listings (`hot` / `search` / `bounties` / `unanswered`):
- Add `rank`, `id` (question_id), `views`, `is_answered` (skipped on
  `unanswered` since always false), `tags` (joined), `author`
  (owner.display_name), `creation_date` columns.
- Pass `pagesize` to the upstream API instead of fetching the default
  page and trimming locally.
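
A minimal sketch of the listing row shape described above. The input field names (`question_id`, `answer_count`, `view_count`, `owner.display_name`, `link`) are the Stack Exchange API's question fields; the function name `toListingRow`, the column order, the `", "` tag separator, and the ISO date formatting are assumptions for illustration.

```javascript
// Hypothetical mapper from one API question item to one listing row.
// `includeIsAnswered` is false for `unanswered`, where the value is always false.
function toListingRow(item, index, { includeIsAnswered = true } = {}) {
  const row = {
    rank: index + 1,
    id: item.question_id,
    title: item.title,
    score: item.score,
    answers: item.answer_count,
    views: item.view_count,
  };
  if (includeIsAnswered) row.is_answered = item.is_answered;
  row.tags = (item.tags ?? []).join(", ");          // separator is an assumption
  row.author = item.owner?.display_name ?? "";
  // The API returns creation_date as Unix epoch seconds; ISO output is an assumption.
  row.creation_date = new Date(item.creation_date * 1000).toISOString();
  row.url = item.link;
  return row;
}
```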

New `stackoverflow read <id>`:
- 4-call fan-out against the public Stack Exchange API
  (`/questions/{id}` + `/questions/{id}/comments` +
  `/questions/{id}/answers` + batched `/answers/a;b;c/comments`).
- Returns `POST` + `Q-COMMENT` + `ANSWER` + `A-COMMENT` rows mirroring
  the `hackernews read` and `lobsters read` shape.
- Accepted answer is always surfaced first and tagged `accepted='true'`;
  remaining answers follow in descending vote order, capped by
  `--answers-limit`.
- HTML body cleanup: tags stripped, `<pre><code>` preserved, `<code>`
  inline-fenced, `<li>` rendered as `- `, comments indented with `> `.
- Entity decoding: a shared `decodeEntities` handles named (incl.
  `&hellip;`/`&copy;`/etc), decimal (`&#246;`), and hex (`&#x27;`)
  forms, applied to both bodies AND `display_name` (otherwise users
  like `Jonas K&#246;lker` come through mojibaked).
- Typed fail-fast: `ArgumentError` for non-numeric id and
  `--max-length < 100` (with no-fetch assertion); `EmptyResultError`
  when `items` is empty; `CommandExecutionError` for HTTP non-2xx and
  for Stack Exchange's in-band `error_id` envelopes (throttle / quota).
  No silent clamps anywhere.
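
To illustrate the entity-decoding bullet, here is one way a `decodeEntities` helper could handle the three forms in a single pass. It is a sketch, not the PR's implementation: the named-entity table is a small assumed subset, and unknown or out-of-range entities are left untouched by choice.

```javascript
// Illustrative subset of named entities; the real table is presumably larger.
const NAMED = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'",
  hellip: "…", copy: "©", nbsp: "\u00a0",
};

function decodeEntities(text) {
  // One pass over hex (&#x27;), decimal (&#246;) and named (&hellip;) forms.
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range code points as-is instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are returned unchanged.
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}

console.log(decodeEntities("Jonas K&#246;lker")); // Jonas Kölker
```

Because the replacement runs once and does not rescan its own output, `&amp;lt;` decodes to the literal text `&lt;` rather than being double-decoded to `<`.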

Tests: 14 vitest assertions
- 4 listing column-shape (incl. `unanswered` skipping `is_answered` and
  `bounties` keeping its `bounty` column position)
- 10 read-adapter cases: registration / args / strategy + 3 typed-error
  fail-fast paths (with no-fetch assertion on the pre-fetch ones) + the
  full POST/Q-COMMENT/ANSWER/A-COMMENT row order with accepted-first +
  the answer-comments fetch verified to batch ids semicolon-joined +
  HTML entity decoding (named/decimal/hex) on both body and display_name
  + answers-limit honored when there are more answers than the cap.
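
The accepted-first ordering and `--answers-limit` cap exercised by these tests can be sketched as follows. `is_accepted` and `score` are Stack Exchange answer fields; the function name `orderAnswers` is hypothetical, and counting the accepted answer toward the cap is an assumption.

```javascript
// Hypothetical ordering helper: accepted answer first, then descending score,
// truncated to `limit` (assumed to include the accepted answer in the count).
function orderAnswers(answers, limit) {
  const byScore = [...answers].sort((a, b) => b.score - a.score);
  const accepted = byScore.filter((a) => a.is_accepted);
  const rest = byScore.filter((a) => !a.is_accepted);
  return [...accepted, ...rest].slice(0, limit);
}

const ordered = orderAnswers(
  [
    { answer_id: 1, score: 50, is_accepted: false },
    { answer_id: 2, score: 5, is_accepted: true },
    { answer_id: 3, score: 20, is_accepted: false },
  ],
  2,
);
console.log(ordered.map((a) => a.answer_id)); // [ 2, 1 ]
```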

Live verification:
- `stackoverflow hot --limit 2` → `id`/`tags`/`views`/`is_answered`/
  `author` populated.
- `stackoverflow search "async await" --limit 1`,
  `stackoverflow unanswered --limit 1` → same shape.
- `stackoverflow read 79935770` and the very-long classic question
  `stackoverflow read 11227809 --answers-limit 1 --comments-limit 2`
  → produces the threaded POST/Q-COMMENT/ANSWER/A-COMMENT structure
  with proper entity decoding (`Jonas Kölker` reads correctly).
- `stackoverflow read not-numeric` → exits with `ARGUMENT`.
- `stackoverflow read 999999999` → exits with `EMPTY_RESULT`.

Apply the 3 lessons from PR #1292 (devto) review at merge time, before
B-group hits this PR:

1. CLI args may arrive as strings (e.g. `--max-length 50` → `'50'`).
   The bare `Number.isInteger(value)` in `requirePositiveInt` /
   `requireMinInt` would accept negative-but-coerced numbers and reject
   string-form integers. The helpers now call `coerceInt` first and then
   validate, and the rejection message echoes the raw input via
   `JSON.stringify`.

2. `await fetch(url)` and `await res.json()` were not wrapped — a network
   blip would surface as a raw `TypeError` and a maintenance HTML page
   would surface as a raw `SyntaxError`. Both are now caught and rethrown
   as `CommandExecutionError` with hints, matching the in-band error_id
   path.
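
A minimal sketch of the coerce-then-validate pattern from item 1. The helper names match the commit message, but their signatures and bodies here are assumptions, and the local `ArgumentError` class is a stand-in for the project's own error type.

```javascript
// Stand-in for the project's typed error; assumed for this example.
class ArgumentError extends Error {}

// Accepts real integers and integer-looking strings; everything else is NaN.
function coerceInt(value) {
  if (typeof value === "number") return Number.isInteger(value) ? value : NaN;
  if (typeof value === "string" && /^-?\d+$/.test(value.trim())) {
    return Number(value.trim());
  }
  return NaN;
}

// Coerce first, then validate; the message echoes the raw input.
function requireMinInt(value, min, name) {
  const n = coerceInt(value);
  if (!Number.isInteger(n) || n < min) {
    throw new ArgumentError(
      `${name} must be an integer >= ${min}, got ${JSON.stringify(value)}`,
    );
  }
  return n;
}

console.log(requireMinInt("150", 100, "--max-length")); // 150
```

Echoing the input through `JSON.stringify` makes the difference between `50` and `"50"` visible in the error message.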

Tests: +3 cases (17 total)
- fetch network failure → CommandExecutionError
- malformed JSON body → CommandExecutionError
- string-form max-length "50" / "abc" rejected with ArgumentError before
  fetching
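
The wrapped fetch path covered by the first two cases might look like the sketch below. The function name `fetchJson`, the injectable `fetchImpl` parameter, the message wording, and the local `CommandExecutionError` class are assumptions; `error_id` and `error_message` are fields of the Stack Exchange error envelope.

```javascript
// Stand-in for the project's typed error; assumed for this example.
class CommandExecutionError extends Error {}

// Hypothetical wrapper: every failure mode surfaces as CommandExecutionError.
async function fetchJson(url, fetchImpl = fetch) {
  let res;
  try {
    res = await fetchImpl(url);
  } catch (err) {
    // Network failures arrive as a raw TypeError from fetch.
    throw new CommandExecutionError(`Network error fetching ${url}: ${err.message}`);
  }
  if (!res.ok) {
    throw new CommandExecutionError(`HTTP ${res.status} from ${url}`);
  }
  let data;
  try {
    data = await res.json();
  } catch (err) {
    // e.g. an HTML maintenance page instead of JSON.
    throw new CommandExecutionError(`Invalid JSON from ${url}: ${err.message}`);
  }
  if (data.error_id) {
    // In-band error envelope (throttle / quota).
    throw new CommandExecutionError(
      `Stack Exchange error ${data.error_id}: ${data.error_message}`,
    );
  }
  return data;
}
```

Passing the fetch function as a parameter is only a convenience for testing without network access; a test could equally stub the global `fetch`.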
@jackwener jackwener merged commit c1a4bd3 into main May 4, 2026
11 checks passed
jackwener added a commit that referenced this pull request May 4, 2026
…it (not 'all') (#1295)

Follow-up from PR #1293 review: 'all answers' was misleading because
the implementation is limit-bounded (default 10, max 100) rather than
unbounded pagination. Spell out the actual contract — including the
accepted-answer-outside-page fallback path — so users don't expect
infinite-scroll behaviour.

Non-blocking docs-only change flagged by codex-mini1 + First-principles-1
during #1293 review.