fix(serverless-plugin): apply CloudFormation queue Properties to LocalStack#580
LocalStack queues were being created with only `QueueName`, dropping
every CloudFormation `Properties` field. AWS (via CloudFormation)
applies the full Properties block — notably `VisibilityTimeout: 1800`
that integration-builder emits for integration queues — so local
emulation diverges from production.
Observed impact: the HubspotQueue queue worker's handler runs for
minutes (initial sync of a real firm), but SQS's default 30 s
VisibilityTimeout makes the message re-appear as visible every 30 s.
The same message gets delivered to a second worker invocation while
the first is still processing, which breaks the "one worker per
message" invariant Frigg assumes.
Change
- `extractQueueDefinitions` now looks up each queue's matching
`AWS::SQS::Queue` resource in `serverless.service.resources.Resources`
and attaches its `Properties` to the queue definition. If the
resource is missing, non-SQS, or has no Properties, the behavior is
unchanged (back-compat).
- `LocalStackQueueService.createQueue` accepts an optional
`attributes` argument and forwards it to SQS `CreateQueue`. When
`attributes` is missing/empty the call shape matches the previous
`{QueueName}`-only behavior.
- `LocalStackQueueService._propertiesToAttributes` whitelists the 15
CloudFormation Properties that map 1:1 onto SQS CreateQueue
Attributes and drops everything else (Tags, QueueName, etc.). Object
values like `RedrivePolicy` get JSON-encoded as AWS expects.
- `createQueues` wires the two together.
Tests: 39/39 pass (9 new cases covering the Properties-forwarding path,
back-compat for definitions without resources, and the property-to-
attribute mapping).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
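The whitelist-and-stringify mapping described above can be sketched as a standalone function. This is a minimal illustration, not the plugin's actual code; the real method is `LocalStackQueueService._propertiesToAttributes`, and its whitelist covers all 15 SQS-mappable properties (only a subset is shown here):

```javascript
// Illustrative subset of the CloudFormation Properties that map 1:1 onto
// SQS CreateQueue Attributes. The real plugin whitelists 15 of them.
const SQS_ATTRIBUTE_WHITELIST = new Set([
    'VisibilityTimeout',
    'MessageRetentionPeriod',
    'DelaySeconds',
    'MaximumMessageSize',
    'ReceiveMessageWaitTimeSeconds',
    'RedrivePolicy',
    'FifoQueue',
    'ContentBasedDeduplication',
    'KmsMasterKeyId',
    'KmsDataKeyReusePeriodSeconds',
]);

function propertiesToAttributes(properties = {}) {
    const attributes = {};
    for (const [key, value] of Object.entries(properties)) {
        // Drop everything not on the whitelist (Tags, QueueName, etc.).
        if (!SQS_ATTRIBUTE_WHITELIST.has(key)) continue;
        // SQS attribute values must be strings; object values such as
        // RedrivePolicy are JSON-encoded, everything else is stringified.
        attributes[key] =
            value !== null && typeof value === 'object'
                ? JSON.stringify(value)
                : String(value);
    }
    return attributes;
}
```

So a `VisibilityTimeout: 1800` property becomes the `{ VisibilityTimeout: '1800' }` attribute LocalStack's `CreateQueue` expects, while `Tags` and `QueueName` are silently dropped.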
…rom queue attributes
Addresses Codex review P1 on PR #580. Serializing every CloudFormation
`Properties` value with `JSON.stringify` would forward unresolved
CloudFormation intrinsics (`Fn::GetAtt`, `Ref`, `Fn::Sub`, …) into
LocalStack's `CreateQueue` call. The concrete case: integration-builder.js
emits

RedrivePolicy: {
    maxReceiveCount: 3,
    deadLetterTargetArn: { 'Fn::GetAtt': ['InternalErrorQueue', 'Arn'] }
}

CloudFormation resolves the `Fn::GetAtt` to a real ARN in AWS, but at
plugin-time the value is still the raw intrinsic object. SQS's
`RedrivePolicy` requires `deadLetterTargetArn` to be a valid ARN string —
passing the JSON blob either fails the CreateQueue call or produces a
queue with malformed DLQ config.
Change
- Added `LocalStackQueueService._containsUnresolvedIntrinsic(value)`:
  recursively detects `Fn::*` and `Ref` keys in CloudFormation Property
  values.
- `_propertiesToAttributes` now checks each attribute's value for
  unresolved intrinsics and DROPS the attribute (with a console.warn)
  instead of stringifying it. The rest of the attributes pass through
  untouched — so `VisibilityTimeout` (the main win of this PR) still gets
  applied locally; the DLQ association is skipped until the plugin gains
  a proper intrinsic resolver.
Tests
- 4 new unit tests covering the intrinsic-filter behavior, `Ref`
  handling, nested intrinsic detection, and the happy path where a
  resolved ARN string passes through untouched.
- 43/43 plugin tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
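The recursive intrinsic detection described above can be sketched as follows. This standalone version is an assumption about the shape of `_containsUnresolvedIntrinsic`; the real method lives on `LocalStackQueueService`:

```javascript
// Recursively detect unresolved CloudFormation intrinsics ('Ref' or any
// 'Fn::*' key) anywhere inside a Property value.
function containsUnresolvedIntrinsic(value) {
    if (Array.isArray(value)) {
        return value.some(containsUnresolvedIntrinsic);
    }
    if (value !== null && typeof value === 'object') {
        return Object.entries(value).some(
            ([key, nested]) =>
                key === 'Ref' ||
                key.startsWith('Fn::') ||
                containsUnresolvedIntrinsic(nested)
        );
    }
    // Primitives (strings, numbers, booleans) are always resolved.
    return false;
}
```

A `RedrivePolicy` whose `deadLetterTargetArn` is still `{ 'Fn::GetAtt': [...] }` is flagged and dropped, while one holding a resolved ARN string passes through.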
Code review: found 1 issue (same as r3118061868 from the Codex reviewer)
in frigg/packages/serverless-plugin/lib/localstack-queue-service.js,
lines 50 to 58 at 5360ba8; fixed in 4487187 by adding a recursive
intrinsic check. Other observations were checked and ruled out as false
positives. Root CLAUDE.md checked; no guidance relevant to this diff.
🤖 Generated with Claude Code
…lags
`UpdateProcessMetrics` and `UpdateProcessState` both did read-modify-
write against `Process.context` / `Process.results` JSON blobs. Under
concurrent queue-worker invocations (any integration with
`reservedConcurrency > 1`) the spread-and-save pattern clobbers later
writes with stale snapshots: counters drift low and context flags such
as `fetchDone` silently revert to undefined, leaving syncs stuck in
non-terminal states.
Adds `applyProcessUpdate(processId, ops)` to the ProcessRepository
interface and implements it natively in each adapter:
- PostgreSQL: one UPDATE ... RETURNING * with nested `jsonb_set`
expressions for increments, sets, and bounded-array pushes. Row
locking inside the single statement serializes concurrent callers.
- MongoDB: findAndModify via `$runCommandRaw` with `$inc` / `$set`
/ `$push` + `$slice`. Server-side atomic.
- DocumentDB: same operator set as Mongo; adds compat note for
`$slice` (requires DocumentDB 4.0+).
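The MongoDB/DocumentDB path above can be illustrated by sketching how a batch of ops might translate into a single `findAndModify` command. Collection name, op shapes, and field names here are assumptions based on the adapter description, not the actual adapter code:

```javascript
// Translate validated ops into one server-side-atomic findAndModify
// command (executed via Prisma's $runCommandRaw in the real adapter).
function buildFindAndModifyCommand(processId, ops) {
    const update = { $inc: {}, $set: {}, $push: {} };
    for (const op of ops) {
        if (op.kind === 'increment') update.$inc[op.path] = op.amount;
        if (op.kind === 'set') update.$set[op.path] = op.value;
        if (op.kind === 'pushSlice') {
            // Bounded-array push: keep only the newest `op.limit` elements.
            update.$push[op.path] = { $each: [op.value], $slice: -op.limit };
        }
    }
    // Mongo rejects empty operator objects, so strip unused ones.
    for (const key of Object.keys(update)) {
        if (Object.keys(update[key]).length === 0) delete update[key];
    }
    return {
        findAndModify: 'Process',
        query: { _id: processId },
        update,
        new: true,
    };
}
```

Because the whole command executes server-side, concurrent increments never read stale snapshots, which is exactly the property the read-modify-write path lacked.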
Validation (regex-checked dot-paths rooted at `context|results`,
whitelisted op kinds) lives in a shared `process-update-ops-shared`
module so every adapter fails fast before touching the DB.
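A rough sketch of that shared validation (the exact regex and op-kind names in `process-update-ops-shared` are assumptions; only the described behavior — dot-paths rooted at `context|results`, whitelisted op kinds, fail-fast before any DB call — is taken from the text above):

```javascript
// Dot-paths must be rooted at context or results, e.g. 'context.fetchDone'.
const PATH_RE = /^(context|results)(\.[A-Za-z0-9_]+)+$/;
const OP_KINDS = new Set(['increment', 'set', 'pushSlice']);

function validateOps(ops) {
    for (const op of ops) {
        if (!OP_KINDS.has(op.kind)) {
            throw new Error(`Unknown op kind: ${op.kind}`);
        }
        if (!PATH_RE.test(op.path)) {
            throw new Error(`Invalid dot-path: ${op.path}`);
        }
    }
    return ops;
}
```

Each adapter calls this before building its native update, so a malformed op fails identically on Postgres, Mongo, and DocumentDB.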
Refactors `UpdateProcessMetrics` to split into an atomic phase
(counters + bounded error history) and a best-effort non-atomic
follow-up for derived fields (duration, recordsPerSecond,
estimatedCompletion) so existing consumers of those fields see no
contract change. Derived-field write failures are logged and do NOT
fail the overall call — the atomic counters already landed in phase 1.
Refactors `UpdateProcessState` to route context updates through
`applyProcessUpdate` when the repo supports it, falling back to the
original read-merge-write path for custom repos that don't. Existing
"shallow top-level merge" semantics are preserved.
Removes the long-standing TODO banner on `update-process-metrics.js`
documenting this exact race. Adds 57 unit tests covering validation,
op shapes, race simulation, empty-batch short-circuit, derived-field
non-fatal failures, and the legacy fallback in `UpdateProcessState`.
This commit is additive: `repository.update(id, patch)` is unchanged,
no schema migration, no behavior change for callers that don't opt in
to `applyProcessUpdate`.
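The capability-detection fallback in `UpdateProcessState` can be sketched like this (function and field names are assumptions based on the description; the real use case is a class, not a free function):

```javascript
// Route a shallow top-level context merge through the atomic path when the
// repository implements applyProcessUpdate; otherwise fall back to the
// legacy read-merge-write path for custom repos.
async function updateProcessState(repository, processId, contextPatch) {
    if (typeof repository.applyProcessUpdate === 'function') {
        // Each top-level key becomes a 'set' op on a context.* dot-path,
        // preserving the existing shallow-merge semantics atomically.
        const ops = Object.entries(contextPatch).map(([key, value]) => ({
            kind: 'set',
            path: `context.${key}`,
            value,
        }));
        return repository.applyProcessUpdate(processId, ops);
    }
    // Legacy non-atomic path: read, shallow-merge, write back.
    const process = await repository.findById(processId);
    return repository.update(processId, {
        context: { ...process.context, ...contextPatch },
    });
}
```

The fallback keeps custom repositories working unchanged, at the cost of retaining the race for them until they implement `applyProcessUpdate`.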
Note on branch scoping: this change lands on `claude/plugin-queue-
attributes` per downstream consumer request. The branch now mixes a
serverless-plugin fix (earlier commits) with a core fix; consumers
reviewing this PR should treat the two concerns as separable for
bisection purposes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses two correctness issues flagged on PR #580 review of c0487bc:
B1 (Postgres pushSlice ordering): `jsonb_agg(elem)` without an explicit
`ORDER BY` clause produces implementation-defined order even over a
`WITH ORDINALITY` subquery. Added `ORDER BY idx` inside the aggregate so
insertion order is preserved deterministically across Postgres versions.
While there: wrapped the combined-array expression in a CTE so
`${newArr}` is evaluated once on the server rather than reconstructed
three times in the outer expression.
B2 (intermediate-path silent no-op): `jsonb_set(target, path, value,
create_missing=true)` only creates the LEAF segment if missing —
intermediate segments that don't exist as objects cause the call to
return `target` unchanged. For depth-1 paths (our current only callers)
this was fine, but the interface JSDoc advertises arbitrary depth
(`context.pagination.cursor` is in the example). Added an `ensureParents`
helper that wraps each intermediate prefix path in a
`jsonb_set(..., COALESCE(cur #> path, '{}'::jsonb), true)` call,
preserving existing content and synthesizing `{}` where missing.
Known artifact documented in the generated-SQL test: the
string-substitution nature of `ensureParents` causes SQL size to grow as
O(2^N) in path depth. Fine for realistic depths (≤3 segments); a future
optimization could materialize `cur` once per iteration via CTE to make
it O(N). No runtime correctness impact — Postgres still executes the
whole expression as a single UPDATE under a single row lock.
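The B2 parent-synthesis and its O(2^N) string-growth artifact can be sketched as a small SQL-expression builder. This is illustrative only; the real `ensureParents` lives in the Postgres adapter and its exact signature is an assumption:

```javascript
// Wrap each intermediate prefix of a jsonb path so missing parents are
// synthesized as '{}' (preserving any existing content via COALESCE).
// Note: `expr` is substituted twice per level, so the generated SQL
// doubles in size at every depth — the documented O(2^N) artifact.
function ensureParents(target, segments) {
    let expr = target;
    for (let i = 1; i < segments.length; i++) {
        const prefix = `{${segments.slice(0, i).join(',')}}`;
        expr =
            `jsonb_set(${expr}, '${prefix}', ` +
            `COALESCE((${expr}) #> '${prefix}', '{}'::jsonb), true)`;
    }
    return expr;
}
```

For `context.pagination.cursor` this wraps the `{context}` and `{context,pagination}` prefixes before the leaf `jsonb_set` is applied, so a row whose `context` lacks a `pagination` object no longer silently no-ops.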
Adds `process-repository-postgres.test.js` with 10 SQL-generation tests
covering:
- state-only UPDATE (no jsonb_set chains)
- depth-1 increment (single jsonb_set wrap, no parent synthesis)
- depth-2 set (1 parent synthesis + leaf)
- depth-3 set (ensureParents substitution artifact documented)
- pushSlice emits `ORDER BY idx` and binds values exactly once
- parameter binding order is stable left-to-right
- combined increment/set/pushSlice/newState in one UPDATE
- null return when UPDATE hits zero rows
- validateOps rejects invalid paths before DB call
- depth-1 set skips parent synthesis (no wasted jsonb_set)
Test coverage gap remaining (follow-up): no real-Postgres race test —
the Frigg monorepo has mongodb-memory-server for Mongo, but no
equivalent for Postgres. Adding one requires a team decision on pg-mem
vs testcontainers vs CI-provisioned. The SQL-generation tests here plus
Postgres's documented single-UPDATE atomicity provide the correctness
argument without the infrastructure cost.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Postgres schema at `packages/core/prisma-postgresql/schema.prisma`
declares two artifacts that no migration in `prisma-postgresql/migrations/`
ever creates. `prisma migrate deploy` runs to completion (there's nothing
to apply for those declarations) and the generated client is happy, so
the drift only surfaces as P2021 / P2022 when a downstream app actually
writes to the affected paths.
Missing pieces:
1. `Entity.data Json @default("{}")` was added to the schema but never
migrated. ModuleRepositoryPostgres.createEntity / findEntity persist and
rehydrate any `identifiers`/`details` fields that fall outside the six
named columns through this JSONB blob — so the moment an integration's
`getEntityDetails` returns a field like `firm_subdomain`,
prisma.entity.create throws `Unknown argument 'userId'. Did you mean
'user'?` (the first argument Prisma can't place is the one it names in
the error — confusingly unrelated to the real cause). Added
`20260422120000_add_entity_data_column` — a single `ALTER TABLE "Entity"
ADD COLUMN IF NOT EXISTS "data" JSONB NOT NULL DEFAULT '{}'` so the
migration is safe to re-apply on environments that may have worked
around the drift locally.
2. The entire `Process` model is declared in the schema but no migration
creates the table, its six indexes, or the three FK constraints (user,
integration, self-referential parent). ProcessRepositoryPostgres and
FriggProcessManager depend on this for long-running-job state tracking —
fan-out sync progress, batched state machines — so any app that uses the
new process primitive hits P2021 on first `prisma.process.create`. Added
`20260422120001_create_process_table` with the table definition, all
declared indexes, and the three FK constraints matching the relations in
schema.prisma (userId→User CASCADE, integrationId→Integration CASCADE,
parentProcessId→Process SET NULL).
Downstream apps currently working around these with local patch
migrations (e.g. lefthookhq/clockwork-recruiting--frigg's
`scripts/patch-core-migrations.js`) can delete that workaround after
bumping to the release that includes this commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Under heavy concurrent fan-out sync load, worker Lambdas were hanging the
full 900s timeout with no logs or errors. Root cause: three compounding
infrastructure defaults.
1. Prisma engineType="binary" — schema.prisma forced the binary engine,
which forks a separate Rust query-engine child and communicates over
an HTTP/IPC pipe that has NO client-side timeout on the Node side. A
zombied child wedges the calling process until Lambda kills it.
Switch both postgres and mongodb schemas to engineType="library" (the
Prisma 3.x+ default), which loads the Rust engine as a Node-API addon
in-process. Native crashes now surface as Runtime.ExitError instead of
15-min hangs.
2. Aurora Serverless v2 MaxCapacity=1 ACU — at 0.5–1 ACU Aurora is CPU-
bound under 20-way concurrent writes, inflating query latency and
compounding pool pressure. Bump the default to 4 ACU; apps can still
override via dbConfig.maxCapacity. Scales to MinCapacity when idle so
cost impact is minimal.
3. DATABASE_URL had no pool or timeout params — a blocked query or row
lock would wait forever. Append Prisma + libpq params at URL build
time:
connection_limit=1 (one pg conn per Lambda container; rely on Lambda
concurrency for parallelism — AWS best practice)
pool_timeout=10 (throw P2024 after 10s instead of hanging)
connect_timeout=10 (bound TCP/TLS handshake)
socket_timeout=60 (kill dead sockets when server never responds)
options=-c statement_timeout=30000 -c lock_timeout=10000
(Postgres-side 30s query cap / 10s lock-wait cap
— aborts with SQLSTATE 57014 / 55P03 rather than
silently sitting in pg_stat_activity)
Any combination of these defaults could mask the other two; all three
had to ship together to make workers fail loud and fast instead of
silently hanging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
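Appending those params at URL build time can be sketched as below. The constant name follows the commit text; the builder function and its signature are assumptions for illustration, and the values match this commit (a later follow-up revises `connection_limit` and `pool_timeout`):

```javascript
// Pool + timeout params appended to DATABASE_URL at build time, mirroring
// the list above. The Postgres-side caps ride in libpq's `options` param
// and must be URL-encoded.
const LAMBDA_DATABASE_URL_QUERY_PARAMS = [
    'connection_limit=1', // one pg conn per Lambda container
    'pool_timeout=10',    // throw P2024 after 10s instead of hanging
    'connect_timeout=10', // bound TCP/TLS handshake
    'socket_timeout=60',  // kill dead sockets when server never responds
    'options=' +
        encodeURIComponent(
            '-c statement_timeout=30000 -c lock_timeout=10000'
        ),
].join('&');

function buildDatabaseUrl(host, port, name, user, password) {
    return (
        `postgresql://${user}:${password}@${host}:${port}/${name}` +
        `?${LAMBDA_DATABASE_URL_QUERY_PARAMS}`
    );
}
```

With these in place, a wedged query aborts with SQLSTATE 57014 (or 55P03 on a lock wait) instead of holding the Lambda open to its 900s cap.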
Follow-up to f1cb41c, addressing code-review concerns:
1. connection_limit 1 → 2, pool_timeout 10 → 20. core has multiple
in-handler Promise.all patterns against Prisma: get-process.executeMany
(parallel fetch), field-encryption-service (batched decrypt). With
connection_limit=1 these would serialize behind a single pool slot, and
under Aurora pressure each later query gets less remaining budget before
P2024. Bumping to 2 removes the cliff while staying safe against
max_connections (200 Lambdas × 2 = 400 conns, fits aurora-pg 15 at
4 ACU). pool_timeout=20 still fails fast relative to Lambda's 900s cap
but gives fan-outs headroom.
2. Cover the use-existing and discover-no-secret runtime-construction
paths. Those paths export DATABASE_HOST / DATABASE_PORT / DATABASE_NAME
/ DATABASE_USER / DATABASE_PASSWORD and expect consumer code to assemble
DATABASE_URL at runtime. Previously they got NONE of the new timeouts —
a real gap. Extract LAMBDA_DATABASE_URL_QUERY_PARAMS to a module-scoped
constant and export it as a `DATABASE_URL_PARAMS` env var on those two
paths, with a log hint telling consumers to append
`?${DATABASE_URL_PARAMS}` when building the URL. Single source of truth
for the params; no drift between paths.
3. Tests now assert all six param components (connection_limit,
pool_timeout, connect_timeout, socket_timeout, statement_timeout,
lock_timeout) so accidental removal of any is caught.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the GET /api/authorize handler dropped req.query.state on the
floor: it called getModuleInstanceFromType.execute(userId, entityType)
without forwarding any extra params, so the Module constructor's apiParams
never received state. As a result, the OAuth2Requester base — which
already accepts `state` via get(params, 'state', null) — always saw
state = null, and downstream API modules that hardcode &state=${this.state}
in their authorize URL (e.g. api-module-hubspot) interpolated state=null.
This commit threads state through:
- integration-router.js: GET /api/authorize passes
{ state: req.query.state } as the third arg.
- get-module-instance-from-type.js: execute(userId, type, options = {})
forwards options.state to the Module constructor.
- module.js: Module constructor accepts state and merges { state } into
apiParams when truthy, so OAuth2Requester picks it up.
Backwards-compatible: state defaults to null, no callers need to change.
Modules that conditionally append state (e.g. api-module-clockwork-recruiting
which uses `if (this.state) params.append('state', ...)`) continue to omit
the param when no state is provided.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
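The truthy-merge behavior in the Module constructor can be sketched as a small helper. This is an illustration of the described semantics only; the helper name and shape are assumptions, not the actual constructor code:

```javascript
// Merge { state } into apiParams only when truthy, so API modules that
// conditionally append state keep omitting it when none was provided.
function buildApiParams(baseParams, state = null) {
    return state ? { ...baseParams, state } : { ...baseParams };
}
```

With `state = null` (the back-compat default), the returned params have no `state` key at all, which is what lets `if (this.state)`-style modules continue to omit the query param.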
What changed
This PR now carries two stackable fixes on the same branch:
1. serverless-plugin: LocalStack queue attributes (original scope, PR title)
Commits: 5360ba85, 44871876. `localstack-queue-service` extracts
SQS-relevant keys from the CloudFormation `Resources` block so local
queues get the same `VisibilityTimeout`, `MessageRetentionPeriod`,
`RedrivePolicy`, etc. as deployed AWS. Unresolved
`Fn::GetAtt`/`Ref`/`Fn::Sub` intrinsics are filtered out so we don't
ship broken JSON to LocalStack.
2. core: atomic Process updates (new, commit c0487bcb)
`UpdateProcessMetrics` and `UpdateProcessState` did read-modify-write on
`Process.context`/`Process.results`. Under concurrent queue-worker
invocations (`reservedConcurrency > 1`), later writers clobber earlier
writers — counters drift low, state flags (`fetchDone`, etc.) silently
revert to `undefined`, syncs stall in non-terminal states.
Fix: new `applyProcessUpdate(processId, ops)` method on
`ProcessRepositoryInterface`, implemented natively in each backend:
- PostgreSQL: `UPDATE ... RETURNING *` with nested `jsonb_set` for
  increments / sets / bounded-array pushes. Row locking during UPDATE
  serializes concurrent callers.
- MongoDB: `findAndModify` via `$runCommandRaw` with `$inc`/`$set`/
  `$push` + `$slice`. Server-side atomic.
- DocumentDB: `$slice` requires DocumentDB 4.0+ (documented).
Refactors `UpdateProcessMetrics` into an atomic phase (counters + error
history) plus a non-atomic derived-fields phase (duration,
recordsPerSecond, estimatedCompletion — preserved for backward compat,
explicitly best-effort under concurrency).
Refactors `UpdateProcessState` to route context updates through
`applyProcessUpdate` when the repo supports it, falling back to the
original read-merge-write path otherwise.
Removes the long-standing TODO banner on `update-process-metrics.js`
documenting this exact race.
Additive change: `repository.update(id, patch)` is unchanged. No schema
migration. Callers that don't opt in see zero behavior change.
Why both on one branch
Downstream consumer (clockwork-recruiting--frigg) hit the race after
consuming the earlier PR #579 (Requester timeout); #580's LocalStack
attribute fix made the Option B fan-out sync architecturally viable. The
atomic-Process-update fix is the minimum-viable change in core needed to
let that fan-out run safely at production concurrency. Keeping the two
fixes on one branch gets the whole chain to canary in a single release.
Reviewers: the two commits are cleanly separable for cherry-pick /
bisection if you want to split.
Test plan
- `packages/serverless-plugin/lib/localstack-queue-service.test.js` —
  18 tests for the CloudFormation-intrinsic filter
- `packages/core/integrations/repositories/process-update-ops-shared.test.js` —
  23 validation tests (path regex, op kinds, spec shape)
- `packages/core/integrations/use-cases/update-process-metrics.test.js` —
  17 tests covering atomic increment routing, pushSlice for errors,
  short-circuit on empty batch, non-fatal derived-fields failure, race
  simulation (25 concurrent increments)
- `packages/core/integrations/use-cases/update-process-state.test.js` —
  existing 17 tests still pass; routed through `applyProcessUpdate` when
  available
🤖 Generated with Claude Code
📦 Published PR as canary version: 2.0.0--canary.580.9716b6d.0
✨ Test out this PR locally via:
npm install @friggframework/core@2.0.0--canary.580.9716b6d.0
npm install @friggframework/devtools@2.0.0--canary.580.9716b6d.0
npm install @friggframework/eslint-config@2.0.0--canary.580.9716b6d.0
npm install @friggframework/prettier-config@2.0.0--canary.580.9716b6d.0
npm install @friggframework/schemas@2.0.0--canary.580.9716b6d.0
npm install @friggframework/serverless-plugin@2.0.0--canary.580.9716b6d.0
npm install @friggframework/test@2.0.0--canary.580.9716b6d.0
npm install @friggframework/ui@2.0.0--canary.580.9716b6d.0
# or
yarn add @friggframework/core@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/devtools@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/eslint-config@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/prettier-config@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/schemas@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/serverless-plugin@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/test@2.0.0--canary.580.9716b6d.0
yarn add @friggframework/ui@2.0.0--canary.580.9716b6d.0