Skip to content

fix(bindings): classify S3 NoSuchKey 404 as not-found#719

Merged
marcusrbrown merged 1 commit into
mainfrom
fix/isnotfound-s3-classification
Jun 1, 2026
Merged

fix(bindings): classify S3 NoSuchKey 404 as not-found#719
marcusrbrown merged 1 commit into
mainfrom
fix/isnotfound-s3-classification

Conversation

@marcusrbrown
Copy link
Copy Markdown
Collaborator

Summary

The first /fro-bot add-project on a freshly deployed gateway always failed with "Internal error checking existing bindings." A valid S3 NoSuchKey 404 — the binding key legitimately absent on first use — was misclassified as a fatal store error, deadlocking the flow before it could write the first binding.

Root cause

The object-store adapter extracted the SDK error code and HTTP status but discarded them, leaving isNotFound() to match on message text that misses AWS's actual message ("The specified key does not exist.").

Fix

  • Preserve errorCode, errorName, and httpStatusCode through the adapter's operation error (packages/runtime/src/object-store/).
  • Classify not-found by explicit S3 error code first (NoSuchKey/NotFound), with a widened message-regex fallback for S3-compatible stores (R2/B2/MinIO) whose wording differs.
  • Distinguish the two 404 cases: NoSuchBucket (a fatal misconfiguration) stays fatal, while a missing key resolves to "not found" so the first binding is created. A bucket-not-found message guard prevents the fallback regex from misreading a bucket error as a missing key.

Verification

  • Table-driven regression tests cover the AWS message, structured NoSuchKey/NotFound/404, SDK-v3 errorName-only shape, non-AWS message-only wording, the legacy path, and the NoSuchBucket fatal cases (asserting they are NOT treated as not-found). Plus an adapter test proving the SDK error fields are attached to the returned error.
  • Gateway and runtime suites green; types, lint, and build clean.

Fixes #713

The first /fro-bot add-project on a fresh gateway aborted in pre-flight: a valid
S3 NoSuchKey 404 (the binding key legitimately absent) was misclassified as a
fatal store error, deadlocking first use. The object-store adapter extracted the
SDK error code and HTTP status but discarded them, leaving isNotFound() to match
on message text that misses AWS's 'The specified key does not exist.'

Preserve errorCode and httpStatusCode through the adapter error and classify
not-found by structured code (NoSuchKey / 404) first, falling back to a widened
message regex for S3-compatible stores whose wording differs.

Fixes #713
@marcusrbrown marcusrbrown requested a review from fro-bot as a code owner June 1, 2026 01:02
@fro-bot fro-bot added the agent: working Agent is currently working on this label Jun 1, 2026
Copy link
Copy Markdown
Owner

@fro-bot fro-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the S3 NoSuchKey classification fix end-to-end: the runtime adapter changes (types.ts, s3-adapter.ts), the gateway isNotFound classifier (store.ts), and all three call sites (getBindingByRepo, getBindingByChannelId, listBindings). Ran pnpm --filter @fro-bot/gateway --filter @fro-bot/runtime test locally — 915 tests pass (361 runtime, 554 gateway).

The fix is well-targeted and the test coverage is genuinely strong. The standout decision is correctly distinguishing the two 404 shapes: NoSuchBucket (fatal misconfig) stays err, while NoSuchKey/NotFound resolves to ok(null) so the first binding write proceeds. The structured-code-first / message-regex-fallback ordering with a bucket-not-found guard is sound, and the negative tests (403 AccessDenied, NoSuchBucket both shapes) lock in the no-false-negative invariant.

Verdict: PASS

Blocking issues

None.

Non-blocking concerns

  1. errorName is always populated, never absent (s3-adapter.ts:28). logS3Error sets errorName = error instanceof Error ? error.name : 'UnknownError', so errorName is always attached for any real adapter error — the errorName?: string optionality and the 'errorName' in error === false single-arg case only hold for errors built directly via the factory, not for adapter-produced ones. In practice this is harmless: a generic error.name of 'Error' matches neither NOT_FOUND_CODES nor FATAL_404_CODES, so classification falls through to the message regex as intended. Worth a one-line comment noting errorName is effectively always present from the adapter path, to prevent a future reader from assuming absence is meaningful.

  2. Plan/implementation drift on httpStatusCode. The plan states isNotFound checks httpStatusCode === 404 first, but the implementation deliberately does not (correctly — NoSuchBucket also carries 404). The shipped code is the right call; only the plan doc is stale. No action needed beyond awareness.

  3. isObjectStoreOperationError narrows on 'code' in error (store.ts:43) without verifying error is an object. Since the parameter is typed Error, this is safe in-process; only flagging that the guard relies on the type contract rather than a runtime object check. Acceptable given all callers pass Result.error values.

Missing tests

None blocking. Two optional additions would tighten the matrix:

  • A getBindingByChannelId / listBindings case exercising the not-found path through those call sites (only getBindingByRepo is covered table-style; the other two share isNotFound so risk is low).
  • A message-only NoSuchBucket without the word "bucket" is not a realistic AWS shape, so not needed — current BUCKET_NOT_FOUND_RE coverage is adequate.

Risk assessment (LOW):

Blast radius is contained to binding-read classification behind isNotFound, which gates only the three read paths. Structured checks are exact-match (Set.has) and the message fallback is guarded against bucket errors, so the regression and security exposure are low. No public API break — createObjectStoreOperationError gains an optional second arg only; all single-arg callers verified unchanged. The Action dist/ does not bundle the gateway classifier (gateway dist is gitignored and built at deploy), so the dist diff in this PR is expected build churn, not the fix surface — confirm the v0.46.x backport noted in the plan lands separately for the deployed infra.


Run Summary
Field Value
Event pull_request
Repository fro-bot/agent
Run ID 26729780607
Cache hit
Session ses_17f4ab4e7ffezQkVx7Wnfd0kVI

@fro-bot fro-bot removed the agent: working Agent is currently working on this label Jun 1, 2026
@marcusrbrown marcusrbrown merged commit b2a63b2 into main Jun 1, 2026
11 checks passed
@marcusrbrown marcusrbrown deleted the fix/isnotfound-s3-classification branch June 1, 2026 01:07
fro-bot added a commit that referenced this pull request Jun 1, 2026
The first /fro-bot add-project on a fresh gateway aborted in pre-flight: a
valid S3 NoSuchKey 404 (the binding key legitimately absent) was misclassified
as a fatal store error, deadlocking first use. The object-store adapter
extracted the SDK error code and HTTP status but discarded them, leaving
isNotFound() to match on message text that misses AWS's wording.

Preserve errorCode/errorName/httpStatusCode through the adapter error and
classify not-found by structured code first (NoSuchKey / NotFound), with a
bucket-not-found guard so NoSuchBucket stays fatal. Falls back to a widened
message regex for S3-compatible stores whose wording differs.

Backport of #719 onto the 0.46.x maintenance line; no v0.47.0+ features.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add-project: first-use fails — isNotFound() regex misses S3 "does not exist" message (NoSuchKey treated as fatal)

2 participants