
fix(serverless-plugin): apply CloudFormation queue Properties to LocalStack #580

Merged
d-klotz merged 8 commits into claude/core-requester-timeout from claude/plugin-queue-attributes
Apr 27, 2026

Conversation

d-klotz (Contributor) commented Apr 21, 2026

What changed

This PR now carries two stackable fixes on the same branch:

1. serverless-plugin: LocalStack queue attributes (original scope, PR title)

Commits: 5360ba85, 44871876

  • localstack-queue-service extracts SQS-relevant keys from the CloudFormation Resources block so local queues get the same VisibilityTimeout, MessageRetentionPeriod, RedrivePolicy, etc. as deployed AWS.
  • Drops attributes whose values still carry unresolved Fn::GetAtt / Ref / Fn::Sub intrinsics so we don't ship broken JSON to LocalStack.
  • Previously, queues were always created locally with the default 30s VisibilityTimeout, causing redelivery of in-flight messages while long-running queue workers were still processing.

2. core: atomic Process updates (new, commit c0487bcb)

UpdateProcessMetrics and UpdateProcessState did read-modify-write on Process.context / Process.results. Under concurrent queue-worker invocations (reservedConcurrency > 1), later writers clobber earlier writers — counters drift low, state flags (fetchDone, etc.) silently revert to undefined, syncs stall in non-terminal states.
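The lost-update hazard can be reproduced with a minimal in-memory sketch; the repository calls below are stand-ins (a plain object plus artificial latency), not the real Frigg adapters. Two concurrent workers each snapshot the context, increment locally, and the second write discards the first increment:

```javascript
// Minimal sketch of the read-modify-write race described above.
const db = { context: { processed: 0 } };

const read = async () => ({ ...db.context });          // stale snapshot
const write = async (context) => { db.context = context; };

async function updateMetricsNonAtomic(delta) {
  const snapshot = await read();                       // read
  snapshot.processed += delta;                         // modify
  await new Promise((r) => setTimeout(r, 10));         // simulate worker latency
  await write(snapshot);                               // write back, clobbering
}

async function demoRace() {
  db.context = { processed: 0 };
  await Promise.all([updateMetricsNonAtomic(1), updateMetricsNonAtomic(1)]);
  return db.context.processed;
}

const raceResult = demoRace();
raceResult.then((n) => console.log(`processed=${n}`)); // processed=1, not 2
```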

Fix: new applyProcessUpdate(processId, ops) method on ProcessRepositoryInterface, implemented natively in each backend:

  • PostgreSQL: one UPDATE ... RETURNING * with nested jsonb_set for increments / sets / bounded-array pushes. Row locking during UPDATE serializes concurrent callers.
  • MongoDB: findAndModify via $runCommandRaw with $inc / $set / $push+$slice. Server-side atomic.
  • DocumentDB: same operator set; $slice requires DocumentDB 4.0+ (documented).
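For illustration, an in-memory version of the ops contract might look like this. The op kinds (increment / set / pushSlice) follow the test descriptions in this PR, but the exact field names and spec shape are assumptions, and the real adapters execute these ops atomically on the DB server rather than in JS:

```javascript
// In-memory sketch of the applyProcessUpdate op semantics (assumed shape).
function applyOps(doc, ops) {
  for (const { kind, path, value, limit } of ops) {
    const keys = path.split('.');
    const leaf = keys.pop();
    let target = doc;
    for (const key of keys) target = target[key] ??= {}; // synthesize parents
    if (kind === 'increment') target[leaf] = (target[leaf] ?? 0) + value;
    else if (kind === 'set') target[leaf] = value;
    else if (kind === 'pushSlice')
      // push, then keep only the most recent `limit` entries
      target[leaf] = [...(target[leaf] ?? []), ...value].slice(-limit);
    else throw new Error(`unknown op kind: ${kind}`);
  }
  return doc;
}

const proc = { context: {}, results: {} };
applyOps(proc, [
  { kind: 'increment', path: 'context.processedCount', value: 25 },
  { kind: 'set', path: 'context.fetchDone', value: true },
  { kind: 'pushSlice', path: 'results.errors', value: ['e1', 'e2'], limit: 50 },
]);
console.log(proc.context.processedCount); // 25
```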

Refactors UpdateProcessMetrics into an atomic phase (counters + error history) + non-atomic derived-fields phase (duration, recordsPerSecond, estimatedCompletion — preserved for backward compat, explicitly best-effort under concurrency).

Refactors UpdateProcessState to route context updates through applyProcessUpdate when the repo supports it, falling back to the original read-merge-write path otherwise.

Removes the long-standing TODO banner on update-process-metrics.js documenting this exact race.

Additive change: repository.update(id, patch) is unchanged. No schema migration. Callers that don't opt in see zero behavior change.

Why both on one branch

The downstream consumer (clockwork-recruiting--frigg) hit the race after consuming the earlier PR #579 (Requester timeout); together with this PR's LocalStack attribute fix, that made the Option B fan-out sync architecturally viable. The atomic-Process-update fix is the minimum viable change in core needed to let that fan-out run safely at production concurrency. Keeping the two fixes on one branch gets the whole chain to canary in a single release.

Reviewers: the two commits are cleanly separable for cherry-pick / bisection if you want to split.

Test plan

  • packages/serverless-plugin/lib/localstack-queue-service.test.js — 18 tests for the CloudFormation-intrinsic filter
  • packages/core/integrations/repositories/process-update-ops-shared.test.js — 23 validation tests (path regex, op kinds, spec shape)
  • packages/core/integrations/use-cases/update-process-metrics.test.js — 17 tests covering atomic increment routing, pushSlice for errors, short-circuit on empty batch, non-fatal derived-fields failure, race simulation (25 concurrent increments)
  • packages/core/integrations/use-cases/update-process-state.test.js — existing 17 tests still pass; routed through applyProcessUpdate when available
  • Real-database concurrency test — deferred; adapter-level unit tests mock the DB call, but the race-fix correctness ultimately depends on each backend's atomicity guarantees (well-known for Postgres UPDATE + Mongo findAndModify). Worth a real-env integration test in a follow-up if the team wants it.

🤖 Generated with Claude Code

📦 Published PR as canary version: 2.0.0--canary.580.9716b6d.0

✨ Test out this PR locally via:

npm install @friggframework/core@2.0.0--canary.580.9716b6d.0
npm install @friggframework/devtools@2.0.0--canary.580.9716b6d.0
npm install @friggframework/eslint-config@2.0.0--canary.580.9716b6d.0
npm install @friggframework/prettier-config@2.0.0--canary.580.9716b6d.0
npm install @friggframework/schemas@2.0.0--canary.580.9716b6d.0
npm install @friggframework/serverless-plugin@2.0.0--canary.580.9716b6d.0
npm install @friggframework/test@2.0.0--canary.580.9716b6d.0
npm install @friggframework/ui@2.0.0--canary.580.9716b6d.0
# or 
yarn add @friggframework/core@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/devtools@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/eslint-config@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/prettier-config@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/schemas@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/serverless-plugin@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/test@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/ui@2.0.0--canary.580.9716b6d.0

d-klotz added the `release` label (Create a release when this pr is merged) on Apr 21, 2026
…lStack

LocalStack queues were being created with only `QueueName`, dropping
every CloudFormation `Properties` field. AWS (via CloudFormation)
applies the full Properties block — notably `VisibilityTimeout: 1800`
that integration-builder emits for integration queues — so local
emulation diverges from production.

Observed impact: the HubspotQueue queue worker's handler runs for
minutes (initial sync of a real firm), but SQS's default 30 s
VisibilityTimeout makes the message re-appear as visible every 30 s.
The same message gets delivered to a second worker invocation while
the first is still processing, which breaks the "one worker per
message" invariant Frigg assumes.

Change
- `extractQueueDefinitions` now looks up each queue's matching
  `AWS::SQS::Queue` resource in `serverless.service.resources.Resources`
  and attaches its `Properties` to the queue definition. If the
  resource is missing, non-SQS, or has no Properties, the behavior is
  unchanged (back-compat).
- `LocalStackQueueService.createQueue` accepts an optional
  `attributes` argument and forwards it to SQS `CreateQueue`. When
  `attributes` is missing/empty the call shape matches the previous
  `{QueueName}`-only behavior.
- `LocalStackQueueService._propertiesToAttributes` whitelists the 15
  CloudFormation Properties that map 1:1 onto SQS CreateQueue
  Attributes and drops everything else (Tags, QueueName, etc.). Object
  values like `RedrivePolicy` get JSON-encoded as AWS expects.
- `createQueues` wires the two together.
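The whitelist-and-encode behavior can be sketched as below, with a trimmed whitelist for illustration (the real service maps the full set of 15 CloudFormation keys):

```javascript
// Sketch of the Properties -> CreateQueue Attributes mapping described above.
const ATTRIBUTE_KEYS = ['VisibilityTimeout', 'MessageRetentionPeriod', 'RedrivePolicy'];

function propertiesToAttributes(properties = {}) {
  const attributes = {};
  for (const key of ATTRIBUTE_KEYS) {
    const value = properties[key];
    if (value === undefined || value === null) continue;
    // SQS CreateQueue wants string values; JSON-encode objects like RedrivePolicy
    attributes[key] = typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return attributes;
}

const attrs = propertiesToAttributes({
  QueueName: 'HubspotQueue',               // not whitelisted: dropped
  VisibilityTimeout: 1800,
  Tags: [{ Key: 'env', Value: 'local' }],  // not whitelisted: dropped
});
console.log(attrs); // { VisibilityTimeout: '1800' }
```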

Tests: 39/39 pass (9 new cases covering the Properties-forwarding path,
back-compat for definitions without resources, and the property-to-
attribute mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-klotz force-pushed the claude/plugin-queue-attributes branch from 1003d8d to 5360ba8 on April 21, 2026 14:12
d-klotz changed the base branch from next to claude/core-requester-timeout on April 21, 2026 14:12

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1003d8deff


Comment thread packages/serverless-plugin/lib/localstack-queue-service.js
…rom queue attributes

Addresses Codex review P1 on PR #580.

Serializing every CloudFormation `Properties` value with
`JSON.stringify` would forward unresolved CloudFormation intrinsics
(`Fn::GetAtt`, `Ref`, `Fn::Sub`, …) into LocalStack's `CreateQueue`
call. The concrete case: integration-builder.js emits

  RedrivePolicy: {
    maxReceiveCount: 3,
    deadLetterTargetArn: { 'Fn::GetAtt': ['InternalErrorQueue', 'Arn'] }
  }

CloudFormation resolves the Fn::GetAtt to a real ARN in AWS, but at
plugin-time the value is still the raw intrinsic object. SQS's
`RedrivePolicy` requires `deadLetterTargetArn` to be a valid ARN
string — passing the JSON blob either fails the CreateQueue call or
produces a queue with malformed DLQ config.

Change
- Added `LocalStackQueueService._containsUnresolvedIntrinsic(value)`:
  recursively detects `Fn::*` and `Ref` keys in CloudFormation Property
  values.
- `_propertiesToAttributes` now checks each attribute's value for
  unresolved intrinsics and DROPS the attribute (with a console.warn)
  instead of stringifying it. The rest of the attributes pass through
  untouched — so `VisibilityTimeout` (the main win of this PR) still
  gets applied locally; the DLQ association is skipped until the
  plugin gains a proper intrinsic resolver.
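The recursive check can be sketched as follows; this is an assumed shape, and the real implementation lives in localstack-queue-service.js:

```javascript
// Recursively detect unresolved CloudFormation intrinsics (Fn::* / Ref keys).
function containsUnresolvedIntrinsic(value) {
  if (Array.isArray(value)) return value.some(containsUnresolvedIntrinsic);
  if (value === null || typeof value !== 'object') return false;
  return Object.entries(value).some(
    ([key, nested]) =>
      key === 'Ref' || key.startsWith('Fn::') || containsUnresolvedIntrinsic(nested)
  );
}

// The RedrivePolicy emitted by integration-builder.js is caught and dropped:
console.log(containsUnresolvedIntrinsic({
  maxReceiveCount: 3,
  deadLetterTargetArn: { 'Fn::GetAtt': ['InternalErrorQueue', 'Arn'] },
})); // true

// A resolved ARN string passes through untouched:
console.log(containsUnresolvedIntrinsic('arn:aws:sqs:us-east-1:123456789012:dlq')); // false
```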

Tests
- 4 new unit tests covering the intrinsic-filter behavior, `Ref`
  handling, nested intrinsic detection, and the happy path where a
  resolved ARN string passes through untouched.
- 43/43 plugin tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d-klotz (Contributor, Author) commented Apr 21, 2026

Code review

Found 1 issue (same as r3118061868 from the Codex reviewer, fixed in 4487187):

  1. Unresolved CloudFormation intrinsics in queue attributes — RedrivePolicy.deadLetterTargetArn in integration-builder.js is {'Fn::GetAtt': ['InternalErrorQueue', 'Arn']}. JSON-stringifying it and passing it to LocalStack's CreateQueue would fail (SQS expects a real ARN string, not a CloudFormation intrinsic object) or produce malformed DLQ config. CloudFormation resolves this in real AWS deploys, but not at plugin-time.

  for (const key of LocalStackQueueService.PROPERTY_ATTRIBUTE_KEYS) {
    const value = properties[key];
    if (value === undefined || value === null) continue;
    attributes[key] =
      typeof value === 'object' ? JSON.stringify(value) : String(value);
  }
  return attributes;

Fixed by adding a recursive _containsUnresolvedIntrinsic check; any attribute whose value still contains Fn::* or Ref keys is dropped with a console.warn instead of stringified. VisibilityTimeout (the main motivation for the PR) still lands on the queue; DLQ association stays un-wired on LocalStack, matching pre-PR behavior for that one attribute.

Other observations checked and ruled out as false positives:

  • Key-to-resource name mismatch (if custom.FooQueue key doesn't match the CF logical ID, properties would silently drop) — Frigg-generated queues always use matching IDs (see integration-builder.js:createIntegrationQueue), and the fall-through to defaults is non-regressive (pre-PR behavior was also defaults). Worth documenting but not a blocker.
  • FIFO / QueueAlreadyExists idempotency — theoretical; no regression from prior behavior.

Root CLAUDE.md checked; no guidance relevant to this diff.

🤖 Generated with Claude Code


d-klotz and others added 6 commits April 21, 2026 15:27
…lags

`UpdateProcessMetrics` and `UpdateProcessState` both did read-modify-
write against `Process.context` / `Process.results` JSON blobs. Under
concurrent queue-worker invocations (any integration with
`reservedConcurrency > 1`) the spread-and-save pattern clobbers later
writes with stale snapshots: counters drift low and context flags such
as `fetchDone` silently revert to undefined, leaving syncs stuck in
non-terminal states.

Adds `applyProcessUpdate(processId, ops)` to the ProcessRepository
interface and implements it natively in each adapter:

  - PostgreSQL: one UPDATE ... RETURNING * with nested `jsonb_set`
    expressions for increments, sets, and bounded-array pushes. Row
    locking inside the single statement serializes concurrent callers.
  - MongoDB: findAndModify via `$runCommandRaw` with `$inc` / `$set`
    / `$push` + `$slice`. Server-side atomic.
  - DocumentDB: same operator set as Mongo; adds compat note for
    `$slice` (requires DocumentDB 4.0+).

Validation (regex-checked dot-paths rooted at `context|results`,
whitelisted op kinds) lives in a shared `process-update-ops-shared`
module so every adapter fails fast before touching the DB.
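A hypothetical sketch of that shared validation: op kinds come from a whitelist and dot-paths must be rooted at `context` or `results`. The regex and names are assumptions based on the description, not the actual module:

```javascript
// Assumed shape of the shared process-update-ops validation.
const PATH_RE = /^(context|results)(\.[A-Za-z_][A-Za-z0-9_]*)+$/;
const OP_KINDS = new Set(['increment', 'set', 'pushSlice']);

function validateOps(ops) {
  for (const op of ops) {
    if (!OP_KINDS.has(op.kind)) throw new Error(`invalid op kind: ${op.kind}`);
    if (!PATH_RE.test(op.path)) throw new Error(`invalid path: ${op.path}`);
  }
}

// Fails fast before any adapter touches the DB:
validateOps([{ kind: 'increment', path: 'context.processedCount', value: 1 }]); // ok
```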

Refactors `UpdateProcessMetrics` to split into an atomic phase
(counters + bounded error history) and a best-effort non-atomic
follow-up for derived fields (duration, recordsPerSecond,
estimatedCompletion) so existing consumers of those fields see no
contract change. Derived-field write failures are logged and do NOT
fail the overall call — the atomic counters already landed in phase 1.

Refactors `UpdateProcessState` to route context updates through
`applyProcessUpdate` when the repo supports it, falling back to the
original read-merge-write path for custom repos that don't. Existing
"shallow top-level merge" semantics are preserved.

Removes the long-standing TODO banner on `update-process-metrics.js`
documenting this exact race. Adds 57 unit tests covering validation,
op shapes, race simulation, empty-batch short-circuit, derived-field
non-fatal failures, and the legacy fallback in `UpdateProcessState`.

This commit is additive: `repository.update(id, patch)` is unchanged,
no schema migration, no behavior change for callers that don't opt in
to `applyProcessUpdate`.

Note on branch scoping: this change lands on `claude/plugin-queue-
attributes` per downstream consumer request. The branch now mixes a
serverless-plugin fix (earlier commits) with a core fix; consumers
reviewing this PR should treat the two concerns as separable for
bisection purposes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two correctness issues flagged on PR #580 review of
c0487bc:

B1 (Postgres pushSlice ordering): `jsonb_agg(elem)` without an
explicit `ORDER BY` clause produces implementation-defined order even
over a `WITH ORDINALITY` subquery. Added `ORDER BY idx` inside the
aggregate so insertion order is preserved deterministically across
Postgres versions. While there: wrapped the combined-array expression
in a CTE so `${newArr}` is evaluated once on the server rather than
reconstructed three times in the outer expression.

B2 (intermediate-path silent no-op): `jsonb_set(target, path, value,
create_missing=true)` only creates the LEAF segment if missing —
intermediate segments that don't exist as objects cause the call to
return `target` unchanged. For depth-1 paths (our current only callers)
this was fine, but the interface JSDoc advertises arbitrary depth
(`context.pagination.cursor` is in the example). Added an
`ensureParents` helper that wraps each intermediate prefix path in a
`jsonb_set(..., COALESCE(cur #> path, '{}'::jsonb), true)` call,
preserving existing content and synthesizing `{}` where missing.

Known artifact documented in the generated-SQL test: the string-
substitution nature of `ensureParents` causes SQL size to grow as
O(2^N) in path depth. Fine for realistic depths (≤3 segments); a
future optimization could materialize `cur` once per iteration via
CTE to make it O(N). No runtime correctness impact — Postgres still
executes the whole expression as a single UPDATE under a single row
lock.
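The parent-synthesis wrapping (and the O(2^N) string-substitution growth it causes) can be illustrated with a JS SQL-string generator. Names and exact SQL are illustrative; only the parent prefixes are wrapped here, with the leaf `jsonb_set` applied on top in the real query:

```javascript
// Illustrative generator for the ensureParents wrapping described above.
function ensureParents(cur, segments) {
  let expr = cur;
  for (let depth = 1; depth < segments.length; depth++) {
    const prefix = `'{${segments.slice(0, depth).join(',')}}'`;
    // expr appears twice per level, hence the documented O(2^N) SQL growth
    expr = `jsonb_set(${expr}, ${prefix}, COALESCE(${expr} #> ${prefix}, '{}'::jsonb), true)`;
  }
  return expr;
}

console.log(ensureParents('"context"', ['pagination', 'cursor']));
// jsonb_set("context", '{pagination}', COALESCE("context" #> '{pagination}', '{}'::jsonb), true)
```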

Adds `process-repository-postgres.test.js` with 10 SQL-generation
tests covering:
  - state-only UPDATE (no jsonb_set chains)
  - depth-1 increment (single jsonb_set wrap, no parent synthesis)
  - depth-2 set (1 parent synthesis + leaf)
  - depth-3 set (ensureParents substitution artifact documented)
  - pushSlice emits `ORDER BY idx` and binds values exactly once
  - parameter binding order is stable left-to-right
  - combined increment/set/pushSlice/newState in one UPDATE
  - null return when UPDATE hits zero rows
  - validateOps rejects invalid paths before DB call
  - depth-1 set skips parent synthesis (no wasted jsonb_set)

Test coverage gap remaining (follow-up): no real-Postgres race test —
the Frigg monorepo has MongoDB-memory-server for Mongo, but no
equivalent for Postgres. Adding one requires a team decision on pg-mem
vs testcontainers vs CI-provisioned. The SQL-generation tests here +
Postgres's documented single-UPDATE atomicity provide the correctness
argument without the infrastructure cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Postgres schema at `packages/core/prisma-postgresql/schema.prisma`
declares two artifacts that no migration in `prisma-postgresql/migrations/`
ever creates. `prisma migrate deploy` runs to completion (there's
nothing to apply for those declarations) and the generated client is
happy, so the drift only surfaces as P2021 / P2022 when a downstream
app actually writes to the affected paths.

Missing pieces:

1. `Entity.data Json @default("{}")` was added to the schema but
   never migrated. ModuleRepositoryPostgres.createEntity / findEntity
   persist and rehydrate any `identifiers`/`details` fields that fall
   outside the six named columns through this JSONB blob — so the
   moment an integration's `getEntityDetails` returns a field like
   `firm_subdomain`, prisma.entity.create throws
   `Unknown argument 'userId'. Did you mean 'user'?` (the first
   argument Prisma can't place is the one it names in the error —
   confusingly unrelated to the real cause). Added
   `20260422120000_add_entity_data_column` — a single
   `ALTER TABLE "Entity" ADD COLUMN IF NOT EXISTS "data" JSONB NOT
   NULL DEFAULT '{}'` so the migration is safe to re-apply on
   environments that may have worked around the drift locally.

2. The entire `Process` model is declared in the schema but no
   migration creates the table, its six indexes, or the three FK
   constraints (user, integration, self-referential parent).
   ProcessRepositoryPostgres and FriggProcessManager depend on this
   for long-running-job state tracking — fan-out sync progress,
   batched state machines — so any app that uses the new process
   primitive hits P2021 on first `prisma.process.create`. Added
   `20260422120001_create_process_table` with the table definition,
   all declared indexes, and the three FK constraints matching the
   relations in schema.prisma (userId→User CASCADE,
   integrationId→Integration CASCADE, parentProcessId→Process
   SET NULL).

Downstream apps currently working around these with local patch
migrations (e.g. lefthookhq/clockwork-recruiting--frigg's
`scripts/patch-core-migrations.js`) can delete that workaround after
bumping to the release that includes this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under heavy concurrent fan-out sync load, worker Lambdas were hanging the
full 900s timeout with no logs or errors. Root cause: three compounding
infrastructure defaults.

1. Prisma engineType="binary" — schema.prisma forced the binary engine,
   which forks a separate Rust query-engine child and communicates over
   an HTTP/IPC pipe that has NO client-side timeout on the Node side. A
   zombied child wedges the calling process until Lambda kills it.
   Switch both postgres and mongodb schemas to engineType="library" (the
   Prisma 3.x+ default), which loads the Rust engine as a Node-API addon
   in-process. Native crashes now surface as Runtime.ExitError instead of
   15-min hangs.

2. Aurora Serverless v2 MaxCapacity=1 ACU — at 0.5–1 ACU Aurora is CPU-
   bound under 20-way concurrent writes, inflating query latency and
   compounding pool pressure. Bump the default to 4 ACU; apps can still
   override via dbConfig.maxCapacity. Scales to MinCapacity when idle so
   cost impact is minimal.

3. DATABASE_URL had no pool or timeout params — a blocked query or row
   lock would wait forever. Append Prisma + libpq params at URL build
   time:
     connection_limit=1 (one pg conn per Lambda container; rely on Lambda
                         concurrency for parallelism — AWS best practice)
     pool_timeout=10    (throw P2024 after 10s instead of hanging)
     connect_timeout=10 (bound TCP/TLS handshake)
     socket_timeout=60  (kill dead sockets when server never responds)
     options=-c statement_timeout=30000 -c lock_timeout=10000
                        (Postgres-side 30s query cap / 10s lock-wait cap
                         — aborts with SQLSTATE 57014 / 55P03 rather than
                         silently sitting in pg_stat_activity)
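A sketch of appending those params at URL build time; values match this commit (a follow-up later bumps connection_limit and pool_timeout), while the constant and function names are assumptions:

```javascript
// Build DATABASE_URL with the Prisma + libpq params described above.
const LAMBDA_DATABASE_URL_QUERY_PARAMS = [
  'connection_limit=1',
  'pool_timeout=10',
  'connect_timeout=10',
  'socket_timeout=60',
  // Postgres-side caps: 30s statement, 10s lock wait
  `options=${encodeURIComponent('-c statement_timeout=30000 -c lock_timeout=10000')}`,
].join('&');

function buildDatabaseUrl({ user, password, host, port, name }) {
  return `postgresql://${user}:${password}@${host}:${port}/${name}?${LAMBDA_DATABASE_URL_QUERY_PARAMS}`;
}

console.log(buildDatabaseUrl({
  user: 'frigg', password: 'secret', host: 'db.internal', port: 5432, name: 'frigg',
}));
```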

Any combination of these defaults could mask the other two; all three
had to ship together to make workers fail loud and fast instead of
silently hanging.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to f1cb41c, addressing code-review concerns:

1. connection_limit 1 → 2, pool_timeout 10 → 20

   core has multiple in-handler Promise.all patterns against Prisma:
   get-process.executeMany (parallel fetch), field-encryption-service
   (batched decrypt). With connection_limit=1 these would serialize
   behind a single pool slot, and under Aurora pressure each later
   query gets less remaining budget before P2024. Bumping to 2 removes
   the cliff while staying safe against max_connections (200 Lambdas ×
   2 = 400 conns, fits aurora-pg 15 at 4 ACU). pool_timeout=20 still
   fails fast relative to Lambda's 900s cap but gives fan-outs headroom.

2. Cover the use-existing and discover-no-secret runtime-construction paths

   Those paths export DATABASE_HOST / DATABASE_PORT / DATABASE_NAME /
   DATABASE_USER / DATABASE_PASSWORD and expect consumer code to
   assemble DATABASE_URL at runtime. Previously got NONE of the new
   timeouts — a real gap. Extract LAMBDA_DATABASE_URL_QUERY_PARAMS to
   a module-scoped constant and export it as a `DATABASE_URL_PARAMS`
   env var on those two paths, with a log hint telling consumers to
   append `?${DATABASE_URL_PARAMS}` when building the URL. Single
   source of truth for the params; no drift between paths.

3. Tests now assert every param component (connection_limit,
   pool_timeout, connect_timeout, socket_timeout, and the
   statement_timeout / lock_timeout options) so accidental removal of
   any is caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the GET /api/authorize handler dropped req.query.state on the
floor: it called getModuleInstanceFromType.execute(userId, entityType)
without forwarding any extra params, so the Module constructor's apiParams
never received state. As a result, the OAuth2Requester base — which
already accepts `state` via get(params, 'state', null) — always saw
state = null, and downstream API modules that hardcode &state=${this.state}
in their authorize URL (e.g. api-module-hubspot) interpolated state=null.

This commit threads state through:
- integration-router.js: GET /api/authorize passes
  { state: req.query.state } as the third arg.
- get-module-instance-from-type.js: execute(userId, type, options = {})
  forwards options.state to the Module constructor.
- module.js: Module constructor accepts state and merges { state } into
  apiParams when truthy, so OAuth2Requester picks it up.

Backwards-compatible: state defaults to null, no callers need to change.

Modules that conditionally append state (e.g. api-module-clockwork-recruiting
which uses `if (this.state) params.append('state', ...)`) continue to omit
the param when no state is provided.
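That conditional append can be sketched as follows; the function and parameter names are illustrative, not the actual module API:

```javascript
// Append the OAuth state param only when a state value was threaded through.
function buildAuthorizeUrl(baseUrl, clientId, redirectUri, state = null) {
  const params = new URLSearchParams({
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: 'code',
  });
  if (state) params.append('state', state); // omitted when state is null
  return `${baseUrl}?${params.toString()}`;
}

console.log(buildAuthorizeUrl('https://example.com/oauth/authorize', 'id', 'https://app/cb', 'abc123'));
```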

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
sonarqubecloud commented

Quality Gate failed

Failed conditions
3.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

d-klotz merged commit 5ff5209 into claude/core-requester-timeout on Apr 27, 2026
6 of 8 checks passed
d-klotz deleted the claude/plugin-queue-attributes branch on April 27, 2026 21:10
seanspeaks (Contributor) commented

🚀 PR was released in v2.0.0-next.81 🚀


Labels

  • prerelease: This change is available in a prerelease.
  • release: Create a release when this pr is merged.
