feat(convention): listing↔detail id pairing rule + CI gate #1297
Merged
Adds a hard convention: when a site exposes both a listing-class command (search / hot / top / recent / ...) and a detail-class command (read / paper / article / view / ...), every listing row MUST surface an id-shaped column whose value round-trips into the detail command. Without that, an agent has no way to follow up on a listing row except re-searching by title or scraping URLs out of band, both of which break the agent-native contract.

What's in this PR

- docs/conventions/listing-detail-id-pairing.md: full rule, examples table, why-it-matters, what counts as id-shaped, exemption taxonomy, how to add an id column to a listing.
- scripts/check-listing-id-pairing.mjs: validator that reads cli-manifest.json, classifies each entry as listing / detail / other, and fails when a listing on a site that also has a read-detail command is missing an id-shaped column. Exemption allowlist records WHY each pair is exempt so future maintainers know what to verify.
- npm run check:listing-id-pairing (strict-mode wrapper).
- CI: new step in build job runs the validator after the manifest freshness check on Linux.
- docs/developer/ts-adapter.md: cross-link from the adapter authoring guide.
- docs/.vitepress/config.mts: sidebar entries for the new conventions section.

Fixes brought to zero violations

- 1688/search: add offer_id (already extracted, just surfaced)
- bluesky/user: add uri (AT URI round-trips into bluesky/thread)
- tieba/search: add id + url (thread_id already extracted)
- tieba/hot: add url (rows are topics, not threads; url is the best-effort round-trip handle, doc'd as such)

Exemptions (intentional, doc'd in EXEMPT map with rationale)

- nowcoder/hot, bluesky/trending, twitter/trending: listing rows are topic strings, not posts.
- lesswrong/user, reddit/user: rows are profile-attribute key/value pairs, addressed by the username arg.
- discord-app/search: desktop UI session, message ids not extractable.
- notion/search: Strategy.UI Quick Find, page ids not exposed in DOM.

Validator output after this PR: 32 sites scanned, 75 listings checked, 7 exempted, 0 violations.
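The classify-then-check flow described for the validator can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/check-listing-id-pairing.mjs: the manifest entry shape (`site`, `name`, `columns`), the name sets, the `isIdShaped` heuristic, and the site names are all assumptions made for the example.

```javascript
// Hypothetical manifest entry shape: { site, name, columns }.
// The real cli-manifest.json layout is not shown in this PR.
const LISTING_NAMES = new Set(['search', 'hot', 'top', 'recent']);
const DETAIL_NAMES = new Set(['read', 'paper', 'article', 'view']);

// Assumed heuristic for "id-shaped": `id`, `uri`, `url`, or anything ending in `_id`.
const isIdShaped = (column) => /^(id|uri|url)$|_id$/.test(column);

// Classify a manifest entry as listing / detail / other by command name.
function classify(entry) {
  if (LISTING_NAMES.has(entry.name)) return 'listing';
  if (DETAIL_NAMES.has(entry.name)) return 'detail';
  return 'other';
}

// Return "site/command" for every non-exempt listing that lacks an id-shaped
// column on a site that also exposes a detail command.
function findViolations(manifest, exempt = {}) {
  const sitesWithDetail = new Set(
    manifest.filter((e) => classify(e) === 'detail').map((e) => e.site),
  );
  return manifest
    .filter((e) => classify(e) === 'listing' && sitesWithDetail.has(e.site))
    .filter((e) => !Object.hasOwn(exempt, `${e.site}/${e.name}`))
    .filter((e) => !e.columns.some(isIdShaped))
    .map((e) => `${e.site}/${e.name}`);
}

// Made-up sites: site-a drops the id, site-b surfaces it, site-c has no detail command.
const manifest = [
  { site: 'site-a', name: 'search', columns: ['rank', 'title'] },
  { site: 'site-a', name: 'read', columns: ['title', 'content'] },
  { site: 'site-b', name: 'search', columns: ['id', 'title'] },
  { site: 'site-b', name: 'paper', columns: ['id', 'title', 'abstract'] },
  { site: 'site-c', name: 'hot', columns: ['topic'] },
];
const violations = findViolations(manifest);
const withExemption = findViolations(manifest, {
  'site-a/search': 'rows are topic strings, not posts',
});
```

Keeping the exemption reason as the map value mirrors the idea of recording why each pair is exempt; a strict-mode wrapper would exit non-zero whenever `violations` is non-empty.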
jackwener added a commit that referenced this pull request on May 4, 2026

* feat(indeed): add `search` and `job` adapters (US site)

Adds an Indeed adapter that fills the US job-search gap (alongside existing 51job / boss-zhipin / linkedin coverage). Both commands run through a real browser session because Indeed sits behind Cloudflare and answers bare HTTP fetches with `403` + `cf-mitigated: challenge`.

## Commands

- `indeed search <query>`: keyword job search
  - args: `query`, `--location`, `--fromage`, `--sort`, `--start`, `--limit`
  - columns: `rank, id, title, company, location, salary, tags, url`
- `indeed job <jk>` (alias `detail`, `view`): full job posting
  - args: `id` (positional, the 16-char hex `jk` from `search`)
  - columns: `id, title, company, location, salary, job_type, description, url`

## Listing↔detail id pairing

`search.id` is the Indeed `jk` (job key, 16-char lowercase hex). It feeds directly into `indeed job <jk>`. Conforms to the listing↔detail id pairing convention proposed in #1297.

## CF challenge handling

The adapter polls the result selectors for up to 15s after navigation, giving the browser time to clear the Cloudflare interstitial. If the challenge is still up after the wait, the adapter throws a `CommandExecutionError` with a hint pointing the user at the connected browser to clear it once. Subsequent calls reuse the warmed cookies via `Strategy.COOKIE`, mirroring the v2ex / boss / linkedin patterns.

## Validation

`utils.js` keeps argument validation pure and unit-testable:

- `requireJobKey` rejects anything that isn't a 16-char lowercase hex
- `requireFromage` only accepts `1` / `3` / `7` / `14` (Indeed's enum)
- `requireSort` only accepts `relevance` / `date`
- `requireBoundedInt(limit, default=15, max=25)`: Indeed serves at most one page (10 jobs/page); ArgumentError on out-of-range, no silent clamping, per the typed-error feedback in #1289.

## Tests

18 unit tests in `clis/indeed/indeed.test.js` cover registration, validators, URL builders, and DOM-card normalizers. Browser-driven verification stays out of CI by design (CF challenge is interactive).

## Docs

- `docs/adapters/browser/indeed.md`: full adapter doc with prerequisite CF-challenge notes and listing↔detail id pairing callout.
- Sidebar entry + adapter index row.

* fix(indeed): tighten timeout fail-fast and runtime tests

* fix(indeed): align readiness with search parser
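The validation rules listed above could look roughly like the sketch below. The function names come from the commit message, but the bodies, the error messages, the lower bound of 1, and the local `ArgumentError` class are assumptions; the actual `utils.js` is not shown here.

```javascript
// Stand-in for the project's typed error; the real ArgumentError is not shown in this commit.
class ArgumentError extends Error {}

// The Indeed job key is described as 16 lowercase hex characters.
function requireJobKey(value) {
  if (typeof value !== 'string' || !/^[0-9a-f]{16}$/.test(value)) {
    throw new ArgumentError(`jk must be 16 lowercase hex characters, got: ${value}`);
  }
  return value;
}

// Only the enum values named in the commit message are accepted.
function requireFromage(value) {
  const text = String(value);
  if (!['1', '3', '7', '14'].includes(text)) {
    throw new ArgumentError(`fromage must be one of 1, 3, 7, 14, got: ${value}`);
  }
  return Number(text);
}

function requireSort(value) {
  if (value !== 'relevance' && value !== 'date') {
    throw new ArgumentError(`sort must be relevance or date, got: ${value}`);
  }
  return value;
}

// Out-of-range input raises instead of being silently clamped.
// The lower bound of 1 is an assumption for this sketch.
function requireBoundedInt(value, fallback = 15, max = 25) {
  if (value === undefined) return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 1 || n > max) {
    throw new ArgumentError(`expected an integer in 1..${max}, got: ${value}`);
  }
  return n;
}

// Small helper for the usage example: true when fn raises an ArgumentError.
const rejects = (fn) => {
  try {
    fn();
    return false;
  } catch (err) {
    return err instanceof ArgumentError;
  }
};

const acceptedKey = requireJobKey('0123456789abcdef');
const uppercaseRejected = rejects(() => requireJobKey('0123456789ABCDEF'));
const overLimitRejected = rejects(() => requireBoundedInt(26));
```

Because each validator is a pure function of its argument, it can be unit-tested without a browser session, which is the property the commit message highlights.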
jackwener added a commit that referenced this pull request on May 4, 2026

While auditing instagram/facebook/pixiv coverage gaps, found that pixiv listings already extract `user_id` and construct `url` per row but drop both fields from the table view (`columns` doesn't list them). The data is in the row object; only the column projection was missing.

Per the listing↔detail id pairing convention (#1297), surface them so:

- `user_id` round-trips from `ranking` / `search` → `user` / `illusts`
- `url` is the canonical share link for every illust / user record

Changes:

- `ranking`: + user_id, + url
- `search`: + user_id, + url
- `illusts`: + url (user_id is the arg, no need to repeat per row)
- `user`: + url

No behavior change beyond the table view. JSON output already had these fields, so existing scripts that consume `-f json` keep working.
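To make the "only the column projection was missing" point concrete, here is a small sketch of how a table view might project rows through a `columns` array. The `projectRow` helper, the row values, and the example URL are hypothetical; the real rendering code is not part of this commit message.

```javascript
// Hypothetical row in the shape the commit describes: user_id and url are already extracted.
const row = {
  rank: 1,
  title: 'Example illust',
  user_id: '12345',
  url: 'https://example.com/artworks/1',
};

// Assumed behaviour: the table view keeps only keys named in `columns`,
// while JSON output serialises the whole row object.
const projectRow = (r, columns) =>
  Object.fromEntries(columns.map((column) => [column, r[column]]));

const tableBefore = projectRow(row, ['rank', 'title']);
const tableAfter = projectRow(row, ['rank', 'title', 'user_id', 'url']);
const jsonOutput = JSON.stringify(row);
```

Under this model, extending the `columns` array changes what the table shows without touching extraction, which is why consumers of `-f json` see no difference.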
jackwener added a commit that referenced this pull request on May 4, 2026

Round 7: silent-drop sweep. Continues the listing→detail id-pairing work from #1297. Each row was already extracting these ids/urls internally; only the `columns` projection was missing, so they showed up in `-f json` but never on the table view.

| Adapter         | Added columns            |
|-----------------|--------------------------|
| `1688 search`   | `item_url`, `member_id`  |
| `hupu mentions` | `tid`, `pid`, `url`      |
| `douban photos` | `photo_id`, `subject_id` |
| `linux-do tags` | `slug`                   |

Round-trip wins:

- `1688 search` → `1688 item <item_url>` (item_url is the canonical detail.1688.com URL); `1688 search` → `1688 store <member_id>`
- `hupu mentions` → `hupu detail <tid>` (and `pid` for the deep link)
- `douban photos` → tied back to the parent movie via `subject_id`
- `linux-do tags` → `linux-do feed --tag <slug>` (slug is the URL form)

No logic change, only the column array. JSON output unchanged.

Tests: 45/45 pass for the four affected sites.
jackwener added a commit that referenced this pull request on May 4, 2026

Round 6: silent-drop audit follow-up. Sibling twitter listings have been inconsistent about the canonical tweet `id` (rest_id):

- timeline ✓ exposes id
- search ✓ exposes id
- list-tweets ✓ exposes id
- notifications ✓ exposes id
- bookmarks ✗ extracts but drops it from columns
- likes ✗ extracts but drops it from columns
- tweets ✗ extracts but drops it from columns

The `id` is already in the row object; only the `columns` projection was missing. With the listing↔detail id-pairing CI gate from #1297 now on main, surfacing `id` makes round-trip into `twitter thread <id>` / `twitter delete <id>` / `twitter like <id>` work from the table view too (previously only via `-f json`).

Other field-presentation drift (`name`, `created_at`, `retweets`) aligned with sibling adapters where those values are already emitted.

Tests: tweets.test.js asserts `toEqual` on the columns array, so that assertion was updated. Other twitter tests use `toMatchObject` and pass unchanged. 81/81 in `clis/twitter/`.
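The sibling audit above amounts to comparing each listing's `columns` array against the one field they should all share. A minimal sketch follows; the column arrays are illustrative placeholders, not copied from the twitter adapters.

```javascript
// Illustrative column arrays; the real adapters' columns are not reproduced here.
const siblingColumns = {
  timeline: ['id', 'author', 'text'],
  search: ['id', 'author', 'text'],
  'list-tweets': ['id', 'author', 'text'],
  notifications: ['id', 'author', 'text'],
  bookmarks: ['author', 'text'],
  likes: ['author', 'text'],
  tweets: ['author', 'text'],
};

// Collect the listings whose table view never shows the tweet id.
const missingId = Object.entries(siblingColumns)
  .filter(([, columns]) => !columns.includes('id'))
  .map(([name]) => name);

// Apply the fix: put `id` first so it is the first thing a reader or agent sees.
const fixed = Object.fromEntries(
  Object.entries(siblingColumns).map(([name, columns]) => [
    name,
    columns.includes('id') ? columns : ['id', ...columns],
  ]),
);
```

Placing `id` first is a presentation choice made for this sketch; the commit message only says the column was added.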
jackwener added a commit that referenced this pull request on May 4, 2026

…isory (#1316)

PR #1297 introduced a CI gate that fails when a site has both a listing and a detail command but the listing rows don't carry an id-shaped column. The gate came with a 10-entry EXEMPT map (topic-string trending, profile-attribute rows, UI-only sessions, ...) where each exemption recorded a "why this listing legitimately doesn't pair" reason.

By the same filter that closed PR #1311 (write-without-delete-pair gate): Is "listing should pair with detail" a *permanent* anti-pattern, or case-by-case business judgment?

It's case-by-case. Topic-string listings and profile-attribute rows genuinely don't pair with a detail command. The fact that we needed an EXEMPT map with 10 entries and individual reason strings is the smell: it's not the rule winning, it's the rule failing. Forcing every adapter PR to either add an id column or file an exemption was a higher cognitive cost than the silent-loss bugs the rule actually catches.

Changes:

- .github/workflows/ci.yml: drop the "Check listing↔detail id pairing" step. Other gates (silent-column-drop, typed-error-lint) stay in place.
- package.json: rename the script from `check:listing-id-pairing` to `advise:listing-id-pairing` to make the advisory nature explicit.
- scripts/check-listing-id-pairing.mjs: drop the `--strict` flag and the EXEMPT map. The script now always exits 0 and prints an advisory report of listings that don't carry an id-shaped column. Reviewers/authors use it as guidance, not a gate.
- docs/conventions/listing-detail-id-pairing.md: rewrite from "MUST" to "soft convention". Adds an explicit "why advisory, not a gate" section that lists the legitimate non-pairing categories so future readers know the rule's boundary.
- docs/developer/ts-adapter.md: match the advisory tone in the adapter-author guidance.

The doc, the script, and the column patterns table all stay, so agents and adapter authors can still consult them. What's gone is the CI failure and the per-PR exempt-list maintenance burden.

Net diff: -34 lines (gate + EXEMPT map removed, advisory-tone doc adds a small "why advisory" section).
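The advisory behaviour described above (always exit 0, print a report) might be sketched as follows. The `adviseListingIdPairing` function, the report wording, the input shape, and the `isIdShaped` heuristic are assumptions for illustration; the real script's output format is not shown in this commit message.

```javascript
// Assumed heuristic for "id-shaped": `id`, `uri`, `url`, or anything ending in `_id`.
const isIdShaped = (column) => /^(id|uri|url)$|_id$/.test(column);

// Build an advisory report. The exit code is 0 no matter what is found,
// so the result can guide reviewers without failing CI.
function adviseListingIdPairing(listings) {
  const missing = listings
    .filter((listing) => !listing.columns.some(isIdShaped))
    .map((listing) => `${listing.site}/${listing.name}`);

  const lines =
    missing.length === 0
      ? ['advisory: every listing carries an id-shaped column']
      : [
          `advisory: ${missing.length} listing(s) without an id-shaped column`,
          ...missing.map((name) => `  - ${name}`),
        ];

  return { exitCode: 0, report: lines.join('\n') };
}

// Hypothetical input: one topic-string listing and one listing that pairs cleanly.
const result = adviseListingIdPairing([
  { site: 'site-a', name: 'trending', columns: ['topic'] },
  { site: 'site-b', name: 'search', columns: ['id', 'title'] },
]);
console.log(result.report);
```

Returning the exit code as data, rather than calling `process.exit` inside the function, keeps the sketch testable; a thin wrapper could pass `result.exitCode` to the process.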
Summary
Codifies a hard convention: when a site exposes both a listing-class command (`search` / `hot` / `top` / `recent` / `feed` / ...) and a detail-class command (`read` / `paper` / `article` / `view` / ...), every listing row MUST surface an id-shaped column whose value round-trips into the detail command.

Without this, an agent has only three bad ways to follow up on a listing row: re-search by title (fragile, may hit wrong post), parse the URL (assumes URL shape), or hand-craft a search (pure guess). All of them break the agent-native contract.
This is a follow-up to PR #1296 (access metadata) — same shape: a meta-rule that prevents a whole class of silent agent failures, with a CI gate so it can't regress.
What's in this PR
Convention + tooling
- `docs/conventions/listing-detail-id-pairing.md`: full rule, examples table covering hackernews / lobsters / arxiv / stackoverflow / openreview / devto / bilibili / reddit / bluesky / 1688 / tieba, why-it-matters explanation, what counts as id-shaped, exemption taxonomy, how to add an id column to a listing.
- `scripts/check-listing-id-pairing.mjs`: validator. Reads `cli-manifest.json`, classifies each entry as listing / detail / other, and fails when a listing on a site that also has an `access: 'read'` detail command is missing an id-shaped column. Exemption allowlist records WHY each pair is exempt so future maintainers know what to verify.
- `npm run check:listing-id-pairing` (strict-mode wrapper).
- CI: new step in `build` job runs the validator after the manifest-freshness check on Linux.
- `docs/developer/ts-adapter.md`: cross-link from the adapter authoring guide.
- `docs/.vitepress/config.mts`: sidebar entry for the new conventions section.

Fixes that bring violations to zero

- `1688/search` → adds `offer_id` (already extracted, just surfaced)
- `bluesky/user` → adds `uri` (AT URI round-trips into `bluesky/thread`)
- `tieba/search` → adds `id` + `url` (thread_id already extracted)
- `tieba/hot` → adds `url` (rows are topics, not threads; url is the best-effort round-trip handle, doc'd as such)

Exemptions (intentional, doc'd in `EXEMPT` map with rationale)

- `nowcoder/hot`, `bluesky/trending`, `twitter/trending`: listing rows are topic strings, not posts.
- `lesswrong/user`, `reddit/user`: rows are profile-attribute key/value pairs, addressed by the username arg.
- `discord-app/search`: desktop UI session, message ids not extractable.
- `notion/search`: Strategy.UI Quick Find, page ids not exposed in DOM.

Validator state after this PR

32 sites scanned, 75 listings checked, 7 exempted, 0 violations.
Why this is its own PR
WAWQAQ flagged in #OpenCLI:48bc586b that this is a foundational rule, not a one-off polish — should ship as its own commit so the convention + gate land together and future adapters can't regress past it. Same pattern as #1296.
Test plan
- `npx tsc --noEmit`: clean
- `npm run build-manifest`: clean
- `npm run check:listing-id-pairing`: OK (0 violations)
- `npx vitest run clis/tieba/ clis/1688/search.test.js`: 20/20 pass