
fix(serverless-plugin): apply CloudFormation queue Properties to LocalStack #580

Merged
d-klotz merged 8 commits into claude/core-requester-timeout from claude/plugin-queue-attributes
Apr 27, 2026

Conversation

d-klotz (Contributor) commented Apr 21, 2026

What changed

This PR now carries two stackable fixes on the same branch:

1. serverless-plugin: LocalStack queue attributes (original scope, PR title)

Commits: 5360ba85, 44871876

  • localstack-queue-service extracts SQS-relevant keys from the CloudFormation Resources block so local queues get the same VisibilityTimeout, MessageRetentionPeriod, RedrivePolicy, etc. as deployed AWS.
  • Drops attributes whose values still carry unresolved Fn::GetAtt / Ref / Fn::Sub intrinsics so we don't ship broken JSON to LocalStack.
  • Previously, queues were always created locally with the default 30s VisibilityTimeout, causing redelivery of in-flight messages while long-running queue workers were still processing.

2. core: atomic Process updates (new, commit c0487bcb)

UpdateProcessMetrics and UpdateProcessState did read-modify-write on Process.context / Process.results. Under concurrent queue-worker invocations (reservedConcurrency > 1), later writers clobber earlier writers — counters drift low, state flags (fetchDone, etc.) silently revert to undefined, syncs stall in non-terminal states.
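The lost-update hazard can be reproduced with a minimal in-memory sketch; the repository calls below are stand-ins (a plain object plus artificial latency), not the real Frigg adapters. Two concurrent workers each snapshot the context, increment locally, and the second write discards the first increment:

```javascript
// Minimal sketch of the read-modify-write race described above.
const db = { context: { processed: 0 } };

const read = async () => ({ ...db.context });          // stale snapshot
const write = async (context) => { db.context = context; };

async function updateMetricsNonAtomic(delta) {
  const snapshot = await read();                       // read
  snapshot.processed += delta;                         // modify
  await new Promise((r) => setTimeout(r, 10));         // simulate worker latency
  await write(snapshot);                               // write back, clobbering
}

async function demoRace() {
  db.context = { processed: 0 };
  await Promise.all([updateMetricsNonAtomic(1), updateMetricsNonAtomic(1)]);
  return db.context.processed;
}

const raceResult = demoRace();
raceResult.then((n) => console.log(`processed=${n}`)); // processed=1, not 2
```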

Fix: new applyProcessUpdate(processId, ops) method on ProcessRepositoryInterface, implemented natively in each backend:

  • PostgreSQL: one UPDATE ... RETURNING * with nested jsonb_set for increments / sets / bounded-array pushes. Row locking during UPDATE serializes concurrent callers.
  • MongoDB: findAndModify via $runCommandRaw with $inc / $set / $push+$slice. Server-side atomic.
  • DocumentDB: same operator set; $slice requires DocumentDB 4.0+ (documented).
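For illustration, an in-memory version of the ops contract might look like this. The op kinds (increment / set / pushSlice) follow the test descriptions in this PR, but the exact field names and spec shape are assumptions, and the real adapters execute these ops atomically on the DB server rather than in JS:

```javascript
// In-memory sketch of the applyProcessUpdate op semantics (assumed shape).
function applyOps(doc, ops) {
  for (const { kind, path, value, limit } of ops) {
    const keys = path.split('.');
    const leaf = keys.pop();
    let target = doc;
    for (const key of keys) target = target[key] ??= {}; // synthesize parents
    if (kind === 'increment') target[leaf] = (target[leaf] ?? 0) + value;
    else if (kind === 'set') target[leaf] = value;
    else if (kind === 'pushSlice')
      // push, then keep only the most recent `limit` entries
      target[leaf] = [...(target[leaf] ?? []), ...value].slice(-limit);
    else throw new Error(`unknown op kind: ${kind}`);
  }
  return doc;
}

const proc = { context: {}, results: {} };
applyOps(proc, [
  { kind: 'increment', path: 'context.processedCount', value: 25 },
  { kind: 'set', path: 'context.fetchDone', value: true },
  { kind: 'pushSlice', path: 'results.errors', value: ['e1', 'e2'], limit: 50 },
]);
console.log(proc.context.processedCount); // 25
```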

Refactors UpdateProcessMetrics into an atomic phase (counters + error history) + non-atomic derived-fields phase (duration, recordsPerSecond, estimatedCompletion — preserved for backward compat, explicitly best-effort under concurrency).

Refactors UpdateProcessState to route context updates through applyProcessUpdate when the repo supports it, falling back to the original read-merge-write path otherwise.

Removes the long-standing TODO banner on update-process-metrics.js documenting this exact race.

Additive change: repository.update(id, patch) is unchanged. No schema migration. Callers that don't opt in see zero behavior change.

Why both on one branch

The downstream consumer (clockwork-recruiting--frigg) hit the race after consuming the earlier PR #579 (Requester timeout); together with this PR's LocalStack attribute fix, that made the Option B fan-out sync architecturally viable. The atomic-Process-update fix is the minimum viable change in core needed to let that fan-out run safely at production concurrency. Keeping the two fixes on one branch gets the whole chain to canary in a single release.

Reviewers: the two commits are cleanly separable for cherry-pick / bisection if you want to split.

Test plan

  • packages/serverless-plugin/lib/localstack-queue-service.test.js — 18 tests for the CloudFormation-intrinsic filter
  • packages/core/integrations/repositories/process-update-ops-shared.test.js — 23 validation tests (path regex, op kinds, spec shape)
  • packages/core/integrations/use-cases/update-process-metrics.test.js — 17 tests covering atomic increment routing, pushSlice for errors, short-circuit on empty batch, non-fatal derived-fields failure, race simulation (25 concurrent increments)
  • packages/core/integrations/use-cases/update-process-state.test.js — existing 17 tests still pass; routed through applyProcessUpdate when available
  • Real-database concurrency test — deferred; adapter-level unit tests mock the DB call, but the race-fix correctness ultimately depends on each backend's atomicity guarantees (well-known for Postgres UPDATE + Mongo findAndModify). Worth a real-env integration test in a follow-up if the team wants it.

🤖 Generated with Claude Code

📦 Published PR as canary version: 2.0.0--canary.580.9716b6d.0

✨ Test out this PR locally via:

npm install @friggframework/core@2.0.0--canary.580.9716b6d.0
npm install @friggframework/devtools@2.0.0--canary.580.9716b6d.0
npm install @friggframework/eslint-config@2.0.0--canary.580.9716b6d.0
npm install @friggframework/prettier-config@2.0.0--canary.580.9716b6d.0
npm install @friggframework/schemas@2.0.0--canary.580.9716b6d.0
npm install @friggframework/serverless-plugin@2.0.0--canary.580.9716b6d.0
npm install @friggframework/test@2.0.0--canary.580.9716b6d.0
npm install @friggframework/ui@2.0.0--canary.580.9716b6d.0
# or 
yarn add @friggframework/core@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/devtools@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/eslint-config@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/prettier-config@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/schemas@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/serverless-plugin@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/test@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/ui@2.0.0--canary.580.9716b6d.0

d-klotz added the `release` label (Create a release when this pr is merged) on Apr 21, 2026
…lStack

LocalStack queues were being created with only `QueueName`, dropping
every CloudFormation `Properties` field. AWS (via CloudFormation)
applies the full Properties block — notably `VisibilityTimeout: 1800`
that integration-builder emits for integration queues — so local
emulation diverges from production.

Observed impact: the HubspotQueue queue worker's handler runs for
minutes (initial sync of a real firm), but SQS's default 30 s
VisibilityTimeout makes the message re-appear as visible every 30 s.
The same message gets delivered to a second worker invocation while
the first is still processing, which breaks the "one worker per
message" invariant Frigg assumes.

Change
- `extractQueueDefinitions` now looks up each queue's matching
  `AWS::SQS::Queue` resource in `serverless.service.resources.Resources`
  and attaches its `Properties` to the queue definition. If the
  resource is missing, non-SQS, or has no Properties, the behavior is
  unchanged (back-compat).
- `LocalStackQueueService.createQueue` accepts an optional
  `attributes` argument and forwards it to SQS `CreateQueue`. When
  `attributes` is missing/empty the call shape matches the previous
  `{QueueName}`-only behavior.
- `LocalStackQueueService._propertiesToAttributes` whitelists the 15
  CloudFormation Properties that map 1:1 onto SQS CreateQueue
  Attributes and drops everything else (Tags, QueueName, etc.). Object
  values like `RedrivePolicy` get JSON-encoded as AWS expects.
- `createQueues` wires the two together.
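The whitelist-and-encode behavior can be sketched as below, with a trimmed whitelist for illustration (the real service maps the full set of 15 CloudFormation keys):

```javascript
// Sketch of the Properties -> CreateQueue Attributes mapping described above.
const ATTRIBUTE_KEYS = ['VisibilityTimeout', 'MessageRetentionPeriod', 'RedrivePolicy'];

function propertiesToAttributes(properties = {}) {
  const attributes = {};
  for (const key of ATTRIBUTE_KEYS) {
    const value = properties[key];
    if (value === undefined || value === null) continue;
    // SQS CreateQueue wants string values; JSON-encode objects like RedrivePolicy
    attributes[key] = typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return attributes;
}

const attrs = propertiesToAttributes({
  QueueName: 'HubspotQueue',               // not whitelisted: dropped
  VisibilityTimeout: 1800,
  Tags: [{ Key: 'env', Value: 'local' }],  // not whitelisted: dropped
});
console.log(attrs); // { VisibilityTimeout: '1800' }
```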

Tests: 39/39 pass (9 new cases covering the Properties-forwarding path,
back-compat for definitions without resources, and the property-to-
attribute mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-klotz force-pushed the claude/plugin-queue-attributes branch from 1003d8d to 5360ba8 on April 21, 2026 14:12
d-klotz changed the base branch from next to claude/core-requester-timeout on April 21, 2026 14:12

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1003d8deff


Comment thread packages/serverless-plugin/lib/localstack-queue-service.js
…rom queue attributes

Addresses Codex review P1 on PR #580.

Serializing every CloudFormation `Properties` value with
`JSON.stringify` would forward unresolved CloudFormation intrinsics
(`Fn::GetAtt`, `Ref`, `Fn::Sub`, …) into LocalStack's `CreateQueue`
call. The concrete case: integration-builder.js emits

  RedrivePolicy: {
    maxReceiveCount: 3,
    deadLetterTargetArn: { 'Fn::GetAtt': ['InternalErrorQueue', 'Arn'] }
  }

CloudFormation resolves the Fn::GetAtt to a real ARN in AWS, but at
plugin-time the value is still the raw intrinsic object. SQS's
`RedrivePolicy` requires `deadLetterTargetArn` to be a valid ARN
string — passing the JSON blob either fails the CreateQueue call or
produces a queue with malformed DLQ config.

Change
- Added `LocalStackQueueService._containsUnresolvedIntrinsic(value)`:
  recursively detects `Fn::*` and `Ref` keys in CloudFormation Property
  values.
- `_propertiesToAttributes` now checks each attribute's value for
  unresolved intrinsics and DROPS the attribute (with a console.warn)
  instead of stringifying it. The rest of the attributes pass through
  untouched — so `VisibilityTimeout` (the main win of this PR) still
  gets applied locally; the DLQ association is skipped until the
  plugin gains a proper intrinsic resolver.
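The recursive check can be sketched as follows; this is an assumed shape, and the real implementation lives in localstack-queue-service.js:

```javascript
// Recursively detect unresolved CloudFormation intrinsics (Fn::* / Ref keys).
function containsUnresolvedIntrinsic(value) {
  if (Array.isArray(value)) return value.some(containsUnresolvedIntrinsic);
  if (value === null || typeof value !== 'object') return false;
  return Object.entries(value).some(
    ([key, nested]) =>
      key === 'Ref' || key.startsWith('Fn::') || containsUnresolvedIntrinsic(nested)
  );
}

// The RedrivePolicy emitted by integration-builder.js is caught and dropped:
console.log(containsUnresolvedIntrinsic({
  maxReceiveCount: 3,
  deadLetterTargetArn: { 'Fn::GetAtt': ['InternalErrorQueue', 'Arn'] },
})); // true

// A resolved ARN string passes through untouched:
console.log(containsUnresolvedIntrinsic('arn:aws:sqs:us-east-1:123456789012:dlq')); // false
```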

Tests
- 4 new unit tests covering the intrinsic-filter behavior, `Ref`
  handling, nested intrinsic detection, and the happy path where a
  resolved ARN string passes through untouched.
- 43/43 plugin tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-klotz (Contributor, Author) commented Apr 21, 2026

Code review

Found 1 issue (same as r3118061868 from the Codex reviewer, fixed in 4487187):

  1. Unresolved CloudFormation intrinsics in queue attributes — RedrivePolicy.deadLetterTargetArn in integration-builder.js is {'Fn::GetAtt': ['InternalErrorQueue', 'Arn']}. JSON-stringifying it and passing it to LocalStack's CreateQueue would fail (SQS expects a real ARN string, not a CloudFormation intrinsic object) or produce malformed DLQ config. CloudFormation resolves this in real AWS deploys, but not at plugin-time.

  for (const key of LocalStackQueueService.PROPERTY_ATTRIBUTE_KEYS) {
    const value = properties[key];
    if (value === undefined || value === null) continue;
    attributes[key] =
      typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return attributes;

Fixed by adding a recursive _containsUnresolvedIntrinsic check; any attribute whose value still contains Fn::* or Ref keys is dropped with a console.warn instead of stringified. VisibilityTimeout (the main motivation for the PR) still lands on the queue; DLQ association stays un-wired on LocalStack, matching pre-PR behavior for that one attribute.

Other observations checked and ruled out as false positives:

  • Key-to-resource name mismatch (if custom.FooQueue key doesn't match the CF logical ID, properties would silently drop) — Frigg-generated queues always use matching IDs (see integration-builder.js:createIntegrationQueue), and the fall-through to defaults is non-regressive (pre-PR behavior was also defaults). Worth documenting but not a blocker.
  • FIFO / QueueAlreadyExists idempotency — theoretical; no regression from prior behavior.

Root CLAUDE.md checked; no guidance relevant to this diff.

🤖 Generated with Claude Code


d-klotz and others added 6 commits April 21, 2026 15:27
…lags

`UpdateProcessMetrics` and `UpdateProcessState` both did read-modify-
write against `Process.context` / `Process.results` JSON blobs. Under
concurrent queue-worker invocations (any integration with
`reservedConcurrency > 1`) the spread-and-save pattern clobbers later
writes with stale snapshots: counters drift low and context flags such
as `fetchDone` silently revert to undefined, leaving syncs stuck in
non-terminal states.

Adds `applyProcessUpdate(processId, ops)` to the ProcessRepository
interface and implements it natively in each adapter:

  - PostgreSQL: one UPDATE ... RETURNING * with nested `jsonb_set`
    expressions for increments, sets, and bounded-array pushes. Row
    locking inside the single statement serializes concurrent callers.
  - MongoDB: findAndModify via `$runCommandRaw` with `$inc` / `$set`
    / `$push` + `$slice`. Server-side atomic.
  - DocumentDB: same operator set as Mongo; adds compat note for
    `$slice` (requires DocumentDB 4.0+).

Validation (regex-checked dot-paths rooted at `context|results`,
whitelisted op kinds) lives in a shared `process-update-ops-shared`
module so every adapter fails fast before touching the DB.
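A hypothetical sketch of that shared validation: op kinds come from a whitelist and dot-paths must be rooted at `context` or `results`. The regex and names are assumptions based on the description, not the actual module:

```javascript
// Assumed shape of the shared process-update-ops validation.
const PATH_RE = /^(context|results)(\.[A-Za-z_][A-Za-z0-9_]*)+$/;
const OP_KINDS = new Set(['increment', 'set', 'pushSlice']);

function validateOps(ops) {
  for (const op of ops) {
    if (!OP_KINDS.has(op.kind)) throw new Error(`invalid op kind: ${op.kind}`);
    if (!PATH_RE.test(op.path)) throw new Error(`invalid path: ${op.path}`);
  }
}

// Fails fast before any adapter touches the DB:
validateOps([{ kind: 'increment', path: 'context.processedCount', value: 1 }]); // ok
```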

Refactors `UpdateProcessMetrics` to split into an atomic phase
(counters + bounded error history) and a best-effort non-atomic
follow-up for derived fields (duration, recordsPerSecond,
estimatedCompletion) so existing consumers of those fields see no
contract change. Derived-field write failures are logged and do NOT
fail the overall call — the atomic counters already landed in phase 1.

Refactors `UpdateProcessState` to route context updates through
`applyProcessUpdate` when the repo supports it, falling back to the
original read-merge-write path for custom repos that don't. Existing
"shallow top-level merge" semantics are preserved.

Removes the long-standing TODO banner on `update-process-metrics.js`
documenting this exact race. Adds 57 unit tests covering validation,
op shapes, race simulation, empty-batch short-circuit, derived-field
non-fatal failures, and the legacy fallback in `UpdateProcessState`.

This commit is additive: `repository.update(id, patch)` is unchanged,
no schema migration, no behavior change for callers that don't opt in
to `applyProcessUpdate`.

Note on branch scoping: this change lands on `claude/plugin-queue-
attributes` per downstream consumer request. The branch now mixes a
serverless-plugin fix (earlier commits) with a core fix; consumers
reviewing this PR should treat the two concerns as separable for
bisection purposes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two correctness issues flagged on PR #580 review of
c0487bc:

B1 (Postgres pushSlice ordering): `jsonb_agg(elem)` without an
explicit `ORDER BY` clause produces implementation-defined order even
over a `WITH ORDINALITY` subquery. Added `ORDER BY idx` inside the
aggregate so insertion order is preserved deterministically across
Postgres versions. While there: wrapped the combined-array expression
in a CTE so `${newArr}` is evaluated once on the server rather than
reconstructed three times in the outer expression.

B2 (intermediate-path silent no-op): `jsonb_set(target, path, value,
create_missing=true)` only creates the LEAF segment if missing —
intermediate segments that don't exist as objects cause the call to
return `target` unchanged. For depth-1 paths (our current only callers)
this was fine, but the interface JSDoc advertises arbitrary depth
(`context.pagination.cursor` is in the example). Added an
`ensureParents` helper that wraps each intermediate prefix path in a
`jsonb_set(..., COALESCE(cur #> path, '{}'::jsonb), true)` call,
preserving existing content and synthesizing `{}` where missing.

Known artifact documented in the generated-SQL test: the string-
substitution nature of `ensureParents` causes SQL size to grow as
O(2^N) in path depth. Fine for realistic depths (≤3 segments); a
future optimization could materialize `cur` once per iteration via
CTE to make it O(N). No runtime correctness impact — Postgres still
executes the whole expression as a single UPDATE under a single row
lock.
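The parent-synthesis wrapping (and the O(2^N) string-substitution growth it causes) can be illustrated with a JS SQL-string generator. Names and exact SQL are illustrative; only the parent prefixes are wrapped here, with the leaf `jsonb_set` applied on top in the real query:

```javascript
// Illustrative generator for the ensureParents wrapping described above.
function ensureParents(cur, segments) {
  let expr = cur;
  for (let depth = 1; depth < segments.length; depth++) {
    const prefix = `'{${segments.slice(0, depth).join(',')}}'`;
    // expr appears twice per level, hence the documented O(2^N) SQL growth
    expr = `jsonb_set(${expr}, ${prefix}, COALESCE(${expr} #> ${prefix}, '{}'::jsonb), true)`;
  }
  return expr;
}

console.log(ensureParents('"context"', ['pagination', 'cursor']));
// jsonb_set("context", '{pagination}', COALESCE("context" #> '{pagination}', '{}'::jsonb), true)
```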

Adds `process-repository-postgres.test.js` with 10 SQL-generation
tests covering:
  - state-only UPDATE (no jsonb_set chains)
  - depth-1 increment (single jsonb_set wrap, no parent synthesis)
  - depth-2 set (1 parent synthesis + leaf)
  - depth-3 set (ensureParents substitution artifact documented)
  - pushSlice emits `ORDER BY idx` and binds values exactly once
  - parameter binding order is stable left-to-right
  - combined increment/set/pushSlice/newState in one UPDATE
  - null return when UPDATE hits zero rows
  - validateOps rejects invalid paths before DB call
  - depth-1 set skips parent synthesis (no wasted jsonb_set)

Test coverage gap remaining (follow-up): no real-Postgres race test —
the Frigg monorepo has MongoDB-memory-server for Mongo, but no
equivalent for Postgres. Adding one requires a team decision on pg-mem
vs testcontainers vs CI-provisioned. The SQL-generation tests here +
Postgres's documented single-UPDATE atomicity provide the correctness
argument without the infrastructure cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Postgres schema at `packages/core/prisma-postgresql/schema.prisma`
declares two artifacts that no migration in `prisma-postgresql/migrations/`
ever creates. `prisma migrate deploy` runs to completion (there's
nothing to apply for those declarations) and the generated client is
happy, so the drift only surfaces as P2021 / P2022 when a downstream
app actually writes to the affected paths.

Missing pieces:

1. `Entity.data Json @default("{}")` was added to the schema but
   never migrated. ModuleRepositoryPostgres.createEntity / findEntity
   persist and rehydrate any `identifiers`/`details` fields that fall
   outside the six named columns through this JSONB blob — so the
   moment an integration's `getEntityDetails` returns a field like
   `firm_subdomain`, prisma.entity.create throws
   `Unknown argument 'userId'. Did you mean 'user'?` (the first
   argument Prisma can't place is the one it names in the error —
   confusingly unrelated to the real cause). Added
   `20260422120000_add_entity_data_column` — a single
   `ALTER TABLE "Entity" ADD COLUMN IF NOT EXISTS "data" JSONB NOT
   NULL DEFAULT '{}'` so the migration is safe to re-apply on
   environments that may have worked around the drift locally.

2. The entire `Process` model is declared in the schema but no
   migration creates the table, its six indexes, or the three FK
   constraints (user, integration, self-referential parent).
   ProcessRepositoryPostgres and FriggProcessManager depend on this
   for long-running-job state tracking — fan-out sync progress,
   batched state machines — so any app that uses the new process
   primitive hits P2021 on first `prisma.process.create`. Added
   `20260422120001_create_process_table` with the table definition,
   all declared indexes, and the three FK constraints matching the
   relations in schema.prisma (userId→User CASCADE,
   integrationId→Integration CASCADE, parentProcessId→Process
   SET NULL).

Downstream apps currently working around these with local patch
migrations (e.g. lefthookhq/clockwork-recruiting--frigg's
`scripts/patch-core-migrations.js`) can delete that workaround after
bumping to the release that includes this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under heavy concurrent fan-out sync load, worker Lambdas were hanging the
full 900s timeout with no logs or errors. Root cause: three compounding
infrastructure defaults.

1. Prisma engineType="binary" — schema.prisma forced the binary engine,
   which forks a separate Rust query-engine child and communicates over
   an HTTP/IPC pipe that has NO client-side timeout on the Node side. A
   zombied child wedges the calling process until Lambda kills it.
   Switch both postgres and mongodb schemas to engineType="library" (the
   Prisma 3.x+ default), which loads the Rust engine as a Node-API addon
   in-process. Native crashes now surface as Runtime.ExitError instead of
   15-min hangs.

2. Aurora Serverless v2 MaxCapacity=1 ACU — at 0.5–1 ACU Aurora is CPU-
   bound under 20-way concurrent writes, inflating query latency and
   compounding pool pressure. Bump the default to 4 ACU; apps can still
   override via dbConfig.maxCapacity. Scales to MinCapacity when idle so
   cost impact is minimal.

3. DATABASE_URL had no pool or timeout params — a blocked query or row
   lock would wait forever. Append Prisma + libpq params at URL build
   time:
     connection_limit=1 (one pg conn per Lambda container; rely on Lambda
                         concurrency for parallelism — AWS best practice)
     pool_timeout=10    (throw P2024 after 10s instead of hanging)
     connect_timeout=10 (bound TCP/TLS handshake)
     socket_timeout=60  (kill dead sockets when server never responds)
     options=-c statement_timeout=30000 -c lock_timeout=10000
                        (Postgres-side 30s query cap / 10s lock-wait cap
                         — aborts with SQLSTATE 57014 / 55P03 rather than
                         silently sitting in pg_stat_activity)
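A sketch of appending those params at URL build time; values match this commit (a follow-up later bumps connection_limit and pool_timeout), while the constant and function names are assumptions:

```javascript
// Build DATABASE_URL with the Prisma + libpq params described above.
const LAMBDA_DATABASE_URL_QUERY_PARAMS = [
  'connection_limit=1',
  'pool_timeout=10',
  'connect_timeout=10',
  'socket_timeout=60',
  // Postgres-side caps: 30s statement, 10s lock wait
  `options=${encodeURIComponent('-c statement_timeout=30000 -c lock_timeout=10000')}`,
].join('&');

function buildDatabaseUrl({ user, password, host, port, name }) {
  return `postgresql://${user}:${password}@${host}:${port}/${name}?${LAMBDA_DATABASE_URL_QUERY_PARAMS}`;
}

console.log(buildDatabaseUrl({
  user: 'frigg', password: 'secret', host: 'db.internal', port: 5432, name: 'frigg',
}));
```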

Any combination of these defaults could mask the other two; all three
had to ship together to make workers fail loud and fast instead of
silently hanging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to f1cb41c, addressing code-review concerns:

1. connection_limit 1 → 2, pool_timeout 10 → 20

   core has multiple in-handler Promise.all patterns against Prisma:
   get-process.executeMany (parallel fetch), field-encryption-service
   (batched decrypt). With connection_limit=1 these would serialize
   behind a single pool slot, and under Aurora pressure each later
   query gets less remaining budget before P2024. Bumping to 2 removes
   the cliff while staying safe against max_connections (200 Lambdas ×
   2 = 400 conns, fits aurora-pg 15 at 4 ACU). pool_timeout=20 still
   fails fast relative to Lambda's 900s cap but gives fan-outs headroom.

2. Cover the use-existing and discover-no-secret runtime-construction paths

   Those paths export DATABASE_HOST / DATABASE_PORT / DATABASE_NAME /
   DATABASE_USER / DATABASE_PASSWORD and expect consumer code to
   assemble DATABASE_URL at runtime. Previously got NONE of the new
   timeouts — a real gap. Extract LAMBDA_DATABASE_URL_QUERY_PARAMS to
   a module-scoped constant and export it as a `DATABASE_URL_PARAMS`
   env var on those two paths, with a log hint telling consumers to
   append `?${DATABASE_URL_PARAMS}` when building the URL. Single
   source of truth for the params; no drift between paths.

3. Tests now assert every param component (connection_limit,
   pool_timeout, connect_timeout, socket_timeout, and the
   statement_timeout / lock_timeout options) so accidental removal of
   any is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the GET /api/authorize handler dropped req.query.state on the
floor: it called getModuleInstanceFromType.execute(userId, entityType)
without forwarding any extra params, so the Module constructor's apiParams
never received state. As a result, the OAuth2Requester base — which
already accepts `state` via get(params, 'state', null) — always saw
state = null, and downstream API modules that hardcode &state=${this.state}
in their authorize URL (e.g. api-module-hubspot) interpolated state=null.

This commit threads state through:
- integration-router.js: GET /api/authorize passes
  { state: req.query.state } as the third arg.
- get-module-instance-from-type.js: execute(userId, type, options = {})
  forwards options.state to the Module constructor.
- module.js: Module constructor accepts state and merges { state } into
  apiParams when truthy, so OAuth2Requester picks it up.

Backwards-compatible: state defaults to null, no callers need to change.

Modules that conditionally append state (e.g. api-module-clockwork-recruiting
which uses `if (this.state) params.append('state', ...)`) continue to omit
the param when no state is provided.
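That conditional append can be sketched as follows; the function and parameter names are illustrative, not the actual module API:

```javascript
// Append the OAuth state param only when a state value was threaded through.
function buildAuthorizeUrl(baseUrl, clientId, redirectUri, state = null) {
  const params = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: 'code',
  });
  if (state) params.append('state', state); // omitted when state is null
  return `${baseUrl}?${params.toString()}`;
}

console.log(buildAuthorizeUrl('https://example.com/oauth/authorize', 'id', 'https://app/cb', 'abc123'));
```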

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sonarqubecloud commented

Quality Gate failed

Failed conditions
3.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

d-klotz merged commit 5ff5209 into claude/core-requester-timeout on Apr 27, 2026
6 of 8 checks passed
d-klotz deleted the claude/plugin-queue-attributes branch on April 27, 2026 21:10
seanspeaks (Contributor) commented

🚀 PR was released in v2.0.0-next.81 🚀


Labels

  • prerelease: This change is available in a prerelease.
  • release: Create a release when this pr is merged.
