fix(hex-id): auto-recover malformed hex IDs in view commands (CLI-16G)#777
fix(hex-id): auto-recover malformed hex IDs in view commands (CLI-16G)#777
Conversation
Adds a recovery pipeline for malformed hex IDs (event, trace, log, span) that catches `validateHexId` failures and attempts to heal them instead of surfacing a raw ValidationError. Follows the AGENTS.md UX principle: "do the intent, gently nudge". Classes of malformed input handled, all drawn from real CLI telemetry: - **Trailing junk concat** (CLI-1A8 merged into CLI-16G): `<32hex>ios` → strip `ios`, warn, continue. - **UI-truncated pure prefix** (dominant pattern in CLI-16G): `05d6975ab9bb` (12 hex) → fuzzy API lookup → resolve to full ID. Minimum 8 hex chars, matching Sentry UI's `getShortEventId`. - **Middle-ellipsis** (CLI's own log formatter output): `abc123...def456` → prefix + suffix filter in fuzzy lookup. - **Off-by-one truncation**: 31-hex / 15-hex → treated as prefix. - **Cross-entity redirect** (CLI-M0): 16-hex span ID passed to `trace view` → resolve span's parent trace and warn. - **Sentinel leak** (CLI-16G `null`, `latest`, `get`, `@latest`): targeted error identifying the likely shell/pipeline leak. - **Slug mistaken for ID** (CLI-16G `apacta-2-wttd`, CLI-M0 `frontend`, CLI-197 `uts-patient-app-7d`): hint pointing at `issue list` / `trace list` / `log list` with the slug. - **Over-nested paths** (CLI-16G `org/project/extra/id`): explicit error. - **Retention hint** (CLI-EQ, CLI-1A4, CLI-3V): "No trace/log/event found" messages now mention retention limits rather than vague wording. Multi-match inputs surface a `ResolutionError` with up to 5 candidates — never auto-pick, since 8-hex collisions are real in busy orgs. Implementation: - New `src/lib/hex-id-recovery.ts` — pure helpers (`preNormalize`, `stripTrailingNonHex`, `extractHexCandidate`, `looksLikeSlug`, `isOverNestedPath`) plus async `recoverHexId` with per-entity adapters (event/trace/log/span) backed by `listSpans`, `listTransactions`, `listLogs`. Scan-and-filter is primary since wildcard queries on UUID fields are unverified server-side. - `handleRecoveryResult` centralizes the command-layer unwrap so every view command logs and errors identically. - Command wiring: `event/view.ts`, `log/view.ts`, `span/view.ts`, and `trace-target.ts` (`parseTraceTargetWithRecovery`) defer strict validation until after org/project resolution so recovery can use API context. Auto-detect paths skip heavy resolution during recovery and run only the cheap local classifications. - `hex-id.ts` hint softened: "if this is a project" → "if it is a name or slug", since recovery now provides entity-specific suggestions. Tests: 71 new cases in `test/lib/hex-id-recovery*` (60 unit + 11 property), each driven from real CLI telemetry inputs. Existing log and span view tests updated for the deferred-validation contract. Fixes CLI-16G. Helps CLI-M0, CLI-163, CLI-197, CLI-EQ, CLI-1A4, CLI-3V.
Semver Impact of This PR🟢 Patch (bug fixes) 📋 Changelog PreviewThis is how your changes will appear in the changelog. Bug Fixes 🐛
Internal Changes 🔧
🤖 This preview updates automatically when you update the PR. |
|
Codecov Results 📊✅ 138 passed | Total: 138 | Pass Rate: 100% | Execution Time: 0ms 📊 Comparison with Base Branch
✨ No test changes detected All tests are passing successfully. ✅ Patch coverage is 91.83%. Project has 1690 uncovered lines. Files with missing lines (4)
Coverage diff@@ Coverage Diff @@
## main #PR +/-##
==========================================
+ Coverage 95.48% 95.50% +0.02%
==========================================
Files 255 257 +2
Lines 36842 37569 +727
Branches 0 0 —
==========================================
+ Hits 35178 35879 +701
- Misses 1664 1690 +26
- Partials 0 0 —Generated by Codecov Action |
Addresses PR review feedback and adds deterministic retention detection:
1. **Shared regex/constants** (review comment on line 144/152):
`LEADING_HEX_RE`, `LEADING_HEX_CHAR_RE`, `MIDDLE_ELLIPSIS_RE`,
`PURE_HEX_RE`, `ALPHA_SEGMENT_RE` move from `hex-id-recovery.ts`
into `hex-id.ts` alongside the existing `HEX_ID_RE` / `SPAN_ID_RE`
/ `UUID_DASH_RE`. New `retention.ts` owns `RETENTION_DAYS` and
`SCAN_PERIODS` tables so `api/logs.ts` and `hex-id-recovery.ts`
reference the same source of truth.
2. **UUIDv7 retention check** (PR discussion thought 1):
Sentry log IDs are UUIDv7 — the first 12 hex chars encode a
millisecond timestamp. New `decodeUuidV7Timestamp` /
`ageInDaysFromUuidV7` helpers in `hex-id.ts` let us say
deterministically "created 147 days ago, past the 90-day log
retention window" instead of the vague "may have been deleted".
Wired into two places:
- `recoverHexId`: after `stripTrailingNonHex` succeeds, check the
stripped ID's age before returning — if expired, surface
`past-retention` rather than a hex that leads to a 404.
- `log/view.ts#throwNotFoundError`: when the API returns empty,
annotate each requested ID with " (created N days ago, past the
90-day log retention)" when applicable.
Traces/events have no hard cap (plan-dependent) so they skip the
check and retain their generic wording.
3. **Clearer not-found wording** (PR discussion thought 2):
Dropped the speculative "may have been deleted" hedge for traces
and events in favor of "ID format is valid but no matching
<entity> exists in this project — check you're querying the right
org/project". Matches the intuition that valid hex IDs always
came from somewhere real; a 404 is more often a wrong-scope issue
than silent deletion.
4. **DRY pass on `hex-id-recovery.ts`** (plus lore feedback):
- Adapter signature simplified: `(ctx) => Promise<string[]>` — the
caller filters by candidate. Drops per-adapter duplicate filter
logic.
- `SLUG_REDIRECT_COMMAND` lookup table replaces the switch in
`buildSlugHint`.
- `RETENTION_DAYS[entityType] !== null` drives per-entity no-match
wording instead of a switch.
- Dropped the `LocalClassificationInput` bundle type — the local
classifications are inlined in `recoverHexId` now that the
decision tree is flatter.
Net effect: hex-id-recovery.ts drops from 884 to 754 lines (500 real
LOC vs 600+), with fewer indirection layers.
Tests: +7 UUIDv7 decode/age cases (unit + property) and +3 retention
scenarios in the recovery decision tree. All existing tests updated
for the new adapter signature and wording changes. Existing 5061
lib + command tests still pass.
Pre-existing AGENTS.md auto-lore updates included per repo convention.
|
Addressed both review comments plus the two thoughts about retention messaging, all in 1. "Can we deterministically detect deletion with UUIDv7?"Yes — for logs. Sentry log IDs are UUIDv7 and the first 12 hex chars encode a ms timestamp. Verified on real telemetry:
Traces and events currently use v4 (random), so no deterministic age claim is possible. If they migrate to v7 later, the helpers work automatically. New
2. "Where are these IDs coming from? They're not fuzzing."Agreed — a valid 32-char hex isn't random. They come from:
So the not-found wording is now less apologetic and more directive. For traces/events: For event view it becomes three bullets that match the three likely causes (wrong org, wrong project, retention). Other changes
5061 lib + command tests still pass; 10 new test cases for UUIDv7 decode/age and the retention paths. |
Addresses Cursor Bugbot finding on PR #777: `tryResolveTraceRecoveryOrg` returned `parsed.org` directly, skipping `resolveEffectiveOrg()`. For DSN-style numeric org IDs (e.g. `o1081365`), fuzzy-adapter API calls like `listSpans` would silently 404/403 and recovery would degrade to `api-error` instead of producing a useful fuzzy match. Aligns with event view's `tryResolveRecoveryOrg` which already normalizes via `resolveEffectiveOrg`. The function becomes async to match. Swallows non-auth errors so a resolution failure during recovery never masks the original validation error.
Three low-severity issues raised by Cursor Bugbot: 1. **`LOG_RETENTION_PERIOD` derived from `RETENTION_DAYS.log`** — previously both encoded "90 days" independently; now the period string is built from the numeric value so a single edit updates both. 2. **`preNormalize` URL-prefix stripping loops to stay idempotent** — the loop handles pathological double-prefixed inputs like `span-span-abc` (one pass would leave `span-abc` behind, breaking the idempotency property the test asserts). Added a unit test for the nested-prefix case. 3. **`listLogs` switched from dynamic to static import** — the other API adapters (`listSpans`, `listTransactions`) from the same module are already statically imported. The lazy `await import()` was inconsistent and added needless async overhead with no benefit.
Two more low-severity issues raised by Bugbot:
1. **Circular type dependency**: `HexEntityType` moved from
`hex-id-recovery.ts` to `hex-id.ts`. `retention.ts` now imports
from the lower-level module instead of a higher-level one.
`hex-id-recovery.ts` re-exports for backwards compat with callers
that already import from it.
2. **Redundant `retentionSuffix` calls in `throwNotFoundError`**:
the multi-ID path computed the suffix twice per ID (once for the
annotation, once for the "any expired?" check), each time decoding
the UUIDv7 timestamp. Now computed once into a `{id, suffix}[]`
array and reused.
Bugbot medium-severity finding: `recoverHexId` could fall through to the fuzzy lookup path for inputs that were already valid hex IDs once `preNormalize` stripped a URL fragment prefix (`span-`, `event-`, …). If the entity happened to sit outside the fuzzy scan window, a perfectly recoverable input would surface a spurious "no-matches" error. Added `tryPreNormalizedValidId` — runs right after pre-normalization and returns a `stripped` result directly when: - the cleaned value passes the entity's regex, AND - it differs from the raw lowercased input (so we don't flag inputs that were always valid — those never reach `recoverHexId` anyway). UUIDv7 retention check applies here too: a valid `span-<expired-id>` still surfaces as `past-retention` rather than returning the stripped hex. Added regression tests for `span-<valid-span-id>` and `event-<valid-event-id>` forms verifying the adapter is NOT invoked.
Three more issues from Bugbot: **Medium-severity**: `stripTrailingNonHex` silently truncated valid hex input when the caller passed a 32-char trace ID as a 16-char span ID. The function matched all 32 hex chars via `LEADING_HEX_RE`, took the first 16 as "span ID" and the last 16 as "stripped junk", masking `validateSpanId`'s dedicated "looks like a trace ID" hint and potentially returning wrong data. Now requires at least one non-hex character in the stripped tail before returning non-null. `NON_HEX_RE` exported from `hex-id.ts` (already existed privately) for reuse. Added regression test: `stripTrailingNonHex(VALID_32, 16) === null`. **Low-severity**: `tryPreNormalizedValidId`'s `stripped` field was `lowered.replace(cleaned, "")`, which silently failed to find a match when `preNormalize` removed both a URL-fragment prefix AND UUID dashes (dashless `cleaned` isn't a literal substring of dashed `lowered`). Added `describeStrippedParts` helper: exact substring diff when possible, otherwise a transformation summary like `"event- + UUID dashes"`. Recovered ID was already correct; this just fixes the warning text. **Low-severity**: `tryResolveRecoveryOrg` (event/view.ts) and `tryResolveTraceRecoveryOrg` (trace-target.ts) had identical switch logic — same resolution semantics, same error handling, same defensive swallow. Consolidated into a single exported `resolveRecoveryOrg` helper in `hex-id-recovery.ts`. trace-target.ts now wraps it with `parseOrgProjectArg` at the call site.
Bugbot finding: slug-like inputs whose dashless form happens to start with enough hex chars (e.g. `cafe-babe-app` → `cafebabeapp` — 8-char hex prefix) were reaching `runFuzzyLookup` and wasting API calls instead of being classified as slugs. `looksLikeSlug` only ran in the too-short fallback, which the candidate skipped past. Moved the `looksLikeSlug(input)` check ahead of the fuzzy path in `recoverHexId` so the raw input — dashes intact — gets checked before we commit to a scan. Dropped the now-redundant `classifyShortInput` helper; the remaining "too-short" branch no longer needs to disambiguate slugs. Added regression test: `cafe-babe-app` → `looks-like-slug`, adapter is not invoked.
Bugbot finding: `preNormalize` checked for sentinel values (`null`, `latest`, …) on the raw input BEFORE stripping URL fragment prefixes. Inputs like `span-null` (shell variable leak through a URL builder) had their prefix stripped to `null` but `sentinel` was never set, so downstream `looksLikeSlug` ran and produced a misleading "looks like a project slug" error instead of the targeted "shell variable was empty" sentinel hint. Moved the sentinel check to run AFTER the prefix-strip loop so it operates on the normalized value. Regression tests cover `span-null` and `event-latest` inputs.
Bugbot finding: `throwNotFoundError` hardcoded `"the last 90 days"` in its generic (non-UUIDv7) fallback hint, bypassing the `RETENTION_DAYS.log` constant the PR introduced as the single source of truth. If the retention value ever changes, the retention-aware path (`retentionSuffix`) would stay correct but the fallback string would go stale. Fix: build `genericHint` from `RETENTION_DAYS.log` so both paths reference the same constant.
Polish pass based on an outside code review after all Bugbot rounds
settled. Four genuine blockers plus ten smaller quality improvements.
Blockers:
- **B1** `no-matches`, `past-retention`, and `looks-like-slug` now
surface as `ResolutionError` (not `ValidationError`) per AGENTS.md —
these are cases where the user *provided* a value that couldn't be
resolved, not malformed input. Agent-consumers that branch on error
class now get the right signal.
- **B2** log `no-matches` hint no longer conflates the 30d scan window
with the 90d retention cap. New wording: "Prefix lookups are
limited to 30d; logs are retained for up to 90 days — pass the full
32-character ID to look up a log older than 30d." Users with 30-90d
logs now get actionable guidance.
- **B3** `span view` didn't run trace-ID recovery. `parsePositionalArgs`
now returns a discriminated `{kind: "resolved" | "deferred"}` and
the command layer routes deferred inputs through
`parseTraceTargetWithRecovery`, mirroring `trace view`.
- **B4** `parseTraceTargetWithRecovery` wraps `parseOrgProjectArg` in a
try/catch so a malformed slug during recovery can no longer mask
the original trace-ID `ValidationError` with an unrelated slug
error. Falls back to empty-context recovery (still runs the cheap
local classifications).
Should-fix:
- Dropped unused `AbortSignal` from `LookupContext` — no adapter ever
respected it.
- Added `Sentry.addBreadcrumb` in a best-effort wrapper around every
`recoverHexId` call so we can track recovery outcomes post-deploy
(the telemetry the plan file called for, previously missing).
- Every adapter now guards on `ctx.org && ctx.project` (not just
`ctx.project`). Belt-and-suspenders — prevents a malformed URL
path if a future caller plumbs an empty org.
- `Promise.all` parallelizes `validateAndRecoverLogId` over raw log
IDs instead of sequential awaits. Warnings still print in arrival
order; a throwing recovery exits via Promise.all's first-rejection.
- Log view retention hint paths share a single `genericHint` derived
from `RETENTION_DAYS.log`.
- `LOG_RETENTION_PERIOD` asserts `RETENTION_DAYS.log` is a positive
number at module load instead of a silent `?? 90` fallback that
hid refactoring risk.
- `retention.ts` docstring clarifies that `SCAN_PERIODS` and
`RETENTION_DAYS` are decoupled concepts (scan cost vs. product
retention).
Nice-to-haves:
- Dropped unreachable `HEX_ID_RE` guard inside
`tryCrossEntityRedirect` (the preceding `SPAN_ID_RE` check rules
out 32-char inputs).
- Inlined `capitalize()` into its two call sites.
- Dropped unused `_cwd` param from `parseTraceTargetWithRecovery`.
- `URL_FRAGMENT_PREFIXES` is now `as const`.
- Rewrote `preNormalize` JSDoc as a numbered pipeline matching
execution order.
- `log/view.ts` uses shared `isAllDigits` util instead of a local
`ALL_DIGITS_RE`.
- `describeStrippedParts` peels URL prefixes in a loop so nested
`span-span-abc` forms report both layers.
- Narrowed UUIDv7 property-test `msArb` to post-RFC-9562 timestamps.
Tests:
- 10 new adapter coverage tests in
`test/lib/hex-id-recovery.adapters.test.ts` mocking
`globalThis.fetch`. Verify query-string correctness for each
adapter (`dataset=*`, `statsPeriod`, `project:` / `trace:`
filters), empty-context short-circuits, and the
`findTraceBySpanId` redirect path.
- Existing span/log view tests updated for the new
`{kind: "resolved" | "deferred"}` contract.
Net: 5078 tests pass (+10). Typecheck + lint clean.
- Replace numeric literal `1_735_689_600_000_000_000` (nanoseconds exceed `Number.MAX_SAFE_INTEGER`) with a string form. The log schema coerces numeric-string fields to numbers on the way in, so the test still exercises the real parsing. - Replace `let callCount = 0; callCount++` with a wrapper object to avoid Biome's `noIncrementDecrement` rule. - Combine the bun:test imports onto a single line per formatter.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 02a8307. Configure here.

Summary
Auto-recovers from malformed hex IDs passed to
event view,trace view,log view, andspan viewinstead of surfacing a rawValidationError. Follows the AGENTS.md UX principle: "do the intent, gently nudge".Every recovery class is driven by real telemetry from the
cliSentry project — no speculative edge cases.What gets recovered
c0a5a9d4dce44358ab4231fc3bead7e9iosios, warn, continue05d6975ab9bbabc123...def456<31hex>trace view)9b3e164945bfd210span-,txn-, …)span-a1b2c3d4e5f67890019d...null,latest,@latest,getapacta-2-wttd,frontendissue list/trace listorg/proj/extra/idMultiple fuzzy matches surface a
ResolutionErrorwith up to 5 candidates — never auto-pick. Min prefix is 8 hex chars, matching Sentry UI'sgetShortEventId.Implementation
src/lib/hex-id-recovery.ts: pure helpers + asyncrecoverHexIdwith per-entity adapters (scan-and-filter vialistSpans/listTransactions/listLogs).handleRecoveryResultcentralizes warnings + error classes so every view command behaves identically.Sentry.addBreadcrumbrecords each recovery outcome for post-deploy validation.src/lib/retention.ts: sharedRETENTION_DAYS/SCAN_PERIODS/LOG_RETENTION_PERIODconstants.api/logs.tsand the recovery module import from the same source of truth.src/lib/hex-id.ts:decodeUuidV7Timestamp/ageInDaysFromUuidV7. Used to surface deterministic "past retention" messages for logs (which Sentry emits as UUIDv7) without needing an API round-trip.LEADING_HEX_RE,MIDDLE_ELLIPSIS_RE,PURE_HEX_RE,ALPHA_SEGMENT_RE,NON_HEX_RE) consolidated insrc/lib/hex-id.tsand imported by the recovery module.event/view.ts,log/view.ts,span/view.ts, andtrace-target.ts(parseTraceTargetWithRecovery). Validation is deferred until after org/project resolution so recovery has API context;span viewroutes through the same recovery path astrace viewvia a sharedresolveRecoveryOrghelper.ResolutionErrorfor things the user provided that couldn't be resolved (no-matches,past-retention,looks-like-slug,multiple-matches);ValidationErrorfor malformed input (sentinel-leak,over-nested,too-short).Tests
test/lib/hex-id-recovery.test.ts— unit tests for the decision tree (stubbed adapters) and pure helpers.test/lib/hex-id-recovery.property.test.ts— property tests for strip/extract idempotence, UUIDv7 timestamp round-trips, monotonicity of age.test/lib/hex-id-recovery.adapters.test.ts— fetch-mocked coverage of each adapter: query strings, empty-context short-circuits,findTraceBySpanIdredirect path.test/lib/hex-id.test.ts— UUIDv7 decode/age unit + property tests.Typecheck clean. Lint clean. 5078 unit + 138 isolated tests pass.
Issues
Fixes CLI-16G (which merged CLI-1A8).
Helps CLI-M0, CLI-163, CLI-197, CLI-EQ, CLI-1A4, CLI-3V — these should go quiet after deploy; verify via the new
hex_id_recoverySentry breadcrumbs before marking resolved.The full design plan is attached as a git note on the first commit:
git notes show 950d8e80.