fix(bindings): classify S3 NoSuchKey 404 as not-found#719
Conversation
The first /fro-bot add-project on a fresh gateway aborted in pre-flight: a valid S3 NoSuchKey 404 (the binding key legitimately absent) was misclassified as a fatal store error, deadlocking first use. The object-store adapter extracted the SDK error code and HTTP status but discarded them, leaving isNotFound() to match on message text that misses AWS's 'The specified key does not exist.' Preserve errorCode and httpStatusCode through the adapter error and classify not-found by structured code (NoSuchKey / 404) first, falling back to a widened message regex for S3-compatible stores whose wording differs. Fixes #713
fro-bot
left a comment
There was a problem hiding this comment.
Reviewed the S3 NoSuchKey classification fix end-to-end: the runtime adapter changes (types.ts, s3-adapter.ts), the gateway isNotFound classifier (store.ts), and all three call sites (getBindingByRepo, getBindingByChannelId, listBindings). Ran pnpm --filter @fro-bot/gateway --filter @fro-bot/runtime test locally — 915 tests pass (361 runtime, 554 gateway).
The fix is well-targeted and the test coverage is genuinely strong. The standout decision is correctly distinguishing the two 404 shapes: NoSuchBucket (fatal misconfig) stays err, while NoSuchKey/NotFound resolves to ok(null) so the first binding write proceeds. The structured-code-first / message-regex-fallback ordering with a bucket-not-found guard is sound, and the negative tests (403 AccessDenied, NoSuchBucket both shapes) lock in the no-false-negative invariant.
Verdict: PASS
Blocking issues
None.
Non-blocking concerns
-
errorNameis always populated, never absent (s3-adapter.ts:28).logS3ErrorsetserrorName = error instanceof Error ? error.name : 'UnknownError', soerrorNameis always attached for any real adapter error — theerrorName?: stringoptionality and the'errorName' in error === falsesingle-arg case only hold for errors built directly via the factory, not for adapter-produced ones. In practice this is harmless: a genericerror.nameof'Error'matches neitherNOT_FOUND_CODESnorFATAL_404_CODES, so classification falls through to the message regex as intended. Worth a one-line comment notingerrorNameis effectively always present from the adapter path, to prevent a future reader from assuming absence is meaningful. -
Plan/implementation drift on
httpStatusCode. The plan statesisNotFoundcheckshttpStatusCode === 404first, but the implementation deliberately does not (correctly —NoSuchBucketalso carries 404). The shipped code is the right call; only the plan doc is stale. No action needed beyond awareness. -
isObjectStoreOperationErrornarrows on'code' in error(store.ts:43) without verifyingerroris an object. Since the parameter is typedError, this is safe in-process; only flagging that the guard relies on the type contract rather than a runtime object check. Acceptable given all callers passResult.errorvalues.
Missing tests
None blocking. Two optional additions would tighten the matrix:
- A
getBindingByChannelId/listBindingscase exercising the not-found path through those call sites (onlygetBindingByRepois covered table-style; the other two shareisNotFoundso risk is low). - A message-only
NoSuchBucketwithout the word "bucket" is not a realistic AWS shape, so not needed — currentBUCKET_NOT_FOUND_REcoverage is adequate.
Risk assessment (LOW):
Blast radius is contained to binding-read classification behind isNotFound, which gates only the three read paths. Structured checks are exact-match (Set.has) and the message fallback is guarded against bucket errors, so the regression and security exposure are low. No public API break — createObjectStoreOperationError gains an optional second arg only; all single-arg callers verified unchanged. The Action dist/ does not bundle the gateway classifier (gateway dist is gitignored and built at deploy), so the dist diff in this PR is expected build churn, not the fix surface — confirm the v0.46.x backport noted in the plan lands separately for the deployed infra.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/agent |
| Run ID | 26729780607 |
| Cache | hit |
| Session | ses_17f4ab4e7ffezQkVx7Wnfd0kVI |
The first /fro-bot add-project on a fresh gateway aborted in pre-flight: a valid S3 NoSuchKey 404 (the binding key legitimately absent) was misclassified as a fatal store error, deadlocking first use. The object-store adapter extracted the SDK error code and HTTP status but discarded them, leaving isNotFound() to match on message text that misses AWS's wording. Preserve errorCode/errorName/httpStatusCode through the adapter error and classify not-found by structured code first (NoSuchKey / NotFound), with a bucket-not-found guard so NoSuchBucket stays fatal. Falls back to a widened message regex for S3-compatible stores whose wording differs. Backport of #719 onto the 0.46.x maintenance line; no v0.47.0+ features.
Summary
The first
/fro-bot add-projecton a freshly deployed gateway always failed with "Internal error checking existing bindings." A valid S3NoSuchKey404 — the binding key legitimately absent on first use — was misclassified as a fatal store error, deadlocking the flow before it could write the first binding.Root cause
The object-store adapter extracted the SDK error code and HTTP status but discarded them, leaving
isNotFound()to match on message text that misses AWS's actual message ("The specified key does not exist.").Fix
errorCode,errorName, andhttpStatusCodethrough the adapter's operation error (packages/runtime/src/object-store/).NoSuchKey/NotFound), with a widened message-regex fallback for S3-compatible stores (R2/B2/MinIO) whose wording differs.NoSuchBucket(a fatal misconfiguration) stays fatal, while a missing key resolves to "not found" so the first binding is created. A bucket-not-found message guard prevents the fallback regex from misreading a bucket error as a missing key.Verification
NoSuchKey/NotFound/404, SDK-v3errorName-only shape, non-AWS message-only wording, the legacy path, and theNoSuchBucketfatal cases (asserting they are NOT treated as not-found). Plus an adapter test proving the SDK error fields are attached to the returned error.Fixes #713