Webhooks Backend (Part 3)#2626
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds a complete webhook processing and delivery system, including event creation from WAL streams, HTTP sending with cryptographic signing, SSRF protection, rate limiting per app, database composite type support for webhook metadata, and comprehensive test infrastructure for webhook scenarios. ChangesWebhook Processing and Delivery System
Sequence DiagramsequenceDiagram
participant WAL as WAL Stream
participant Inv as Invalidator
participant HM as History Model
participant WM as Webhook Model
participant WP as Webhook Processor
participant WS as Webhook Sender
participant HTTP as Remote Webhook
WAL->>Inv: flush LSN
Inv->>HM: push-batch!
HM->>HM: create history entries
HM->>WM: create-events!
WM->>WM: compute event data<br/>(topics, idempotency key)
WM->>WP: notify-events
WP->>WP: enqueue to blocking queue
rect rgb(100, 150, 200, 0.5)
Note over WP: Worker Loop
WP->>WP: claim-events!
WP->>WS: send-webhook
WS->>WS: DNS resolve (race DoH)
WS->>WS: SSRF filter IP
WS->>WS: sign payload
WS->>HTTP: POST with signature
HTTP-->>WS: response
WS-->>WP: WebhookAttempt
WP->>WM: record-attempt!
end
rect rgb(200, 100, 150, 0.5)
Note over WP: Retry Path
WP->>WP: should-repeat?<br/>(attempts < max)
WP->>WM: schedule next retry
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/src/instant/core.clj`:
- Around line 356-357: The webhook processor is being started before flags are
initialized so it reads default :webhook-worker-count and
:webhook-retry-worker-count; call flags-impl/init (or otherwise initialize
flags) before invoking webhook-processor/start-global (inside with-log-init) in
-main so the processor picks up configured flag values rather than defaults—move
or add flags-impl/init prior to the (with-log-init :webhook-processor
(webhook-processor/start-global)) call.
In `@server/src/instant/flags.clj`:
- Around line 50-56: parse-ips-flag currently calls InetAddress/getByName which
performs DNS resolution; change it to only accept strict IP literals by
validating each input with inet.ipaddr.IPAddressString.isValid() (add
(inet.ipaddr IPAddressString) to :import), filter inputs with that check, log
and discard invalid entries, and then convert the validated literals to
InetAddress (or use InetAddress/getByAddress from the parsed bytes) so no DNS
lookups occur; update the function parse-ips-flag and its internal anonymous fn
to use IPAddressString validation instead of blindly calling
InetAddress/getByName.
In `@server/src/instant/model/webhook.clj`:
- Around line 176-190: The create! path currently builds :id-attr-ids and topics
using only each etype's "id" attribute, but webhook-matches? and the history
prefilter check for the actual changed attribute ids; update the create! logic
so it collects and persists the attr ids for all relevant attrs for each etype
(not just the "id" attr) before computing topics. Locate the id-attr-ids/topics
construction in create! (the reduce over etypes that calls
attr-model/seek-by-fwd-ident-name and history/bloom-bit) and change it to map
each etype to the full set of attribute ids (look up attr ids from attrs using
attr-model functions) and then compute topics by OR-ing bloom-bit for each of
those attr ids so :id_attr_ids and topics reflect the real changed fields used
by webhook-matches?.
- Around line 224-239: The update in enable-q inside the enable! flow sets
:status to the literal "enable" which is inconsistent with the rest of the code
that expects enabled webhooks to use "active"; update the uhsql/preformat for
enable-q so the :set map writes the enum value "active" (cast to :webhook_status
the same way other queries do) instead of "enable" so enable! will mark rows as
active and be visible to active-only queries; adjust enable-q (and ensure
enable! uses that updated enable-q) to reference the correct enum value.
- Around line 459-484: The current logic sets status based on :success? and
attempt_count but treats a 410 response as "error" (retryable) before calling
disable!, which leaves a retryable row; change the status computation in the let
binding in webhook.clj (the same form that builds params and calls
record-attempt!) to treat HTTP 410 as terminal: add a check like (when (= 410
(:status-code attempt)) "failed") (or place (= 410 (:status-code attempt))
branch in the cond ahead of the retry branch) so status becomes "failed" for
410s and therefore next-attempt-after remains nil (the next-attempt-after
expression already only sets a value when status = "error"); keep the subsequent
disable! call as-is so disabled webhooks are recorded as terminal attempts and
won't be retried by handle-event!.
In `@server/src/instant/smokescreen.clj`:
- Around line 31-37: Add explicit test coverage for IPv4-mapped IPv6 addresses
by extending the bad-ip? test cases to include examples like "::ffff:127.0.0.1",
"::ffff:10.0.0.1", "::ffff:192.168.1.1" (should be bad) and "::ffff:8.8.8.8"
(should be allowed), and update tests to assert these behaviors; optionally
harden has-ipv6-embedding? by adding an explicit prefix check for the
IPv4-mapped range (::ffff:0:0/96) alongside nat64-well-known-prefix,
six->four-prefix and teredo-prefix so IPv4-mapped embedding is detected even if
.isLocal semantics change.
In `@server/src/instant/util/test.clj`:
- Around line 373-386: The test currently does a blocking deref
@(:started-promise opts#) which can hang if wal/start-worker throws before
delivering that promise; replace that line with a race that surfaces worker
failures immediately: after creating worker# (future (wal/start-worker opts#)),
spawn a small watcher/future that (1) waits on (:started-promise opts#) and (2)
watches worker# for exceptions (catching/reading the worker future's thrown
exception), then deliver whichever completes first into a local promise and
deref that; if the watcher yields an exception rethrow it so the test fails and
finally block runs (refer to worker#, (:started-promise opts#),
wal/start-worker, pumper# and inv/stop to locate where to change the code).
In `@server/src/instant/webhook_processor.clj`:
- Around line 76-90: The work! function currently spawns one ua/vfuture per
event (via ua/vfuture in the mapv over events), allowing unbounded outbound
concurrency; change it to enforce a bounded concurrency equal to the configured
worker count (or a new concurrency param) so each queue token only processes up
to that many concurrent handle-event! calls. Locate work!,
webhook-model/claim-events!, ua/vfuture, max-apps/max-per-app and replace the
unbounded mapv of ua/vfuture with a bounded executor/semaphore or a
limited-parallelism map (e.g., a fixed thread-pool, semaphore-wrapped
handle-event! calls, or a utility like pmap with a concurrency limit) so deref
waits on at most N concurrent tasks; preserve should-repeat? behavior and error
handling.
- Around line 49-58: The code returns early for non-active webhooks and never
clears the claim, causing free-stuck-events! to recycle them; fix by handling
the non-active branch after webhook-model/get-by-app-id-and-webhook-id! so you
still clear the claim with webhook-model/record-attempt! (or call whatever
terminal/failed-recording helper exists) with a terminal attempt indicating the
webhook was inactive (include identifying fields like idempotency-key, event,
and a status/reason such as "webhook_inactive"), rather than only calling
webhook-model/record-attempt! inside the active branch; update the logic around
webhook-sender/send-webhook and webhook-model/record-attempt! to always record
an attempt for claimed events even when (:status webhook) is not "active".
- Around line 133-152: Validate worker-count and retry-worker-count after
reading flags and before constructing ArrayBlockingQueue: if either value is <=
0, avoid creating an ArrayBlockingQueue or calling
start-worker/start-retry-worker (e.g., set q/retry-q to nil and
workers/retry-workers to empty vectors or skip their creation) so start/restart
degrades gracefully instead of throwing IllegalArgumentException; reference the
symbols worker-count, retry-worker-count, ArrayBlockingQueue, start-worker, and
start-retry-worker when making the guard and ensure the tracer/with-span!
attributes still reflect the configured counts.
In `@server/src/instant/webhook_sender.clj`:
- Around line 10-21: The namespace is missing an import for
java.util.concurrent.Callable which causes compilation to fail where you use
(reify Callable ...) in webhook_sender.clj; add Callable to the (:import ...)
list alongside ExecutorService and TimeUnit so that the reify Callable
expression resolves (look for the existing imports block and include
java.util.concurrent Callable).
In `@server/src/tool.clj`:
- Around line 319-323: The webhook composite branches build SQL string literals
by wrapping the composite text from webhook-attempt->composite-str and
->pg-webhook-attempt-array in single quotes but do not escape internal
apostrophes, which breaks unsafe-sql-format-query for values like "can't
connect"; fix by escaping single quotes (replace "'" with "''") on the composite
string before formatting (i.e., call a small helper like escape-sql-literal or
apply (.replace composite-str "'" "''")) when producing the
"'%s'::webhook_attempt" and "'%s'::webhook_attempt[]" values so
webhook-attempt->composite-str and ->pg-webhook-attempt-array outputs are safe
to wrap.
In `@server/test/instant/webhook_sender_test.clj`:
- Around line 47-74: The tests (dns-resolver-filters-bad-ips and the other cases
mentioned) call webhook-sender/send-webhook and rely on real DNS via
race-dns-resolve/Dns, which makes CI flaky; replace that external dependency in
the tests by stubbing/mocking race-dns-resolve (or the Dns implementation used
by webhook-sender) with with-redefs or a test fixture to return deterministic
results for the hostnames used (e.g., return resolved private/metadata IPs or
throw a resolution error as needed), so assertions only depend on local test
state and not live DoH/DNS; update the tests at the referenced blocks (including
the other ranges 76-109 and 111-136) to use the stubbed resolver.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e0493f55-bb0a-4fad-9c05-3a78831c85c0
📒 Files selected for processing (26)
server/.clj-kondo/config.ednserver/.clj-kondo/hooks/with_test_replication_slot.cljserver/deps.ednserver/src/instant/config.cljserver/src/instant/core.cljserver/src/instant/flags.cljserver/src/instant/grpc.cljserver/src/instant/jdbc/sql.cljserver/src/instant/jdbc/wal.cljserver/src/instant/jdbc/wal_entry.cljserver/src/instant/model/history.cljserver/src/instant/model/webhook.cljserver/src/instant/nippy.cljserver/src/instant/rate_limit.cljserver/src/instant/reactive/invalidator.cljserver/src/instant/smokescreen.cljserver/src/instant/util/json.cljserver/src/instant/util/test.cljserver/src/instant/webhook_processor.cljserver/src/instant/webhook_sender.cljserver/src/tool.cljserver/test/instant/jdbc/sql_test.cljserver/test/instant/model/webhook_test.cljserver/test/instant/nippy_test.cljserver/test/instant/smokescreen_test.cljserver/test/instant/webhook_sender_test.clj
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
server/src/instant/util/test.clj (1)
387-390: ⚡ Quick win
@worker#and@pumper#infinallyhave no timeout and can hang indefinitely.If
inv/stopfails to closewal-chan#(e.g., due to a stuck network/replication connection), the pumper'sa/<!!loop never sees nil and@pumper#blocks forever. Similarly,@worker#blocks if the worker future doesn't terminate after signalling. A timed deref with a diagnostic throw would prevent silent test-suite hangs.🛡️ Proposed fix: add timeouts to both derefs
(finally (inv/stop opts#) - `@worker`# - `@pumper`#)))) + (when (= ::timeout (deref worker# 15000 ::timeout)) + (future-cancel worker#) + (throw (ex-info "WAL worker did not shut down within 15s" {}))) + (a/close! wal-chan#) ; ensure pumper exits even if inv/stop missed it + (when (= ::timeout (deref pumper# 5000 ::timeout)) + (throw (ex-info "WAL pumper did not shut down within 5s" {})))))))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/instant/util/test.clj` around lines 387 - 390, The finally block currently blocks indefinitely on `@worker`# and `@pumper`#; change these to timed derefs (deref with timeout or @ with timeout) and throw a diagnostic exception if the timeout elapses so tests fail rather than hang: after calling inv/stop (which may fail to close wal-chan#), replace the plain derefs of worker# and pumper# with a timed-deref pattern (e.g., deref worker# TIMEOUT or alts/timeout) and if nil/timeout is returned throw an informative exception mentioning worker# or pumper# and wal-chan#/inv/stop to aid debugging. Ensure the same timeout policy is applied to both worker# and pumper#.server/test/instant/smokescreen_test.clj (1)
16-97: ⚡ Quick winAdd coverage for the whitelist bypass.
bad-ip?now has a new first branch forflags/smokescreen-whitelist-ips, but this suite never asserts that a normally blocked address becomes allowed when whitelisted. A small regression test here would lock in theInetAddressequality path the new flag parser depends on.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/test/instant/smokescreen_test.clj` around lines 16 - 97, Add a test that verifies the new whitelist flag bypass in bad-ip?: create a case in the smokescreen_test.clj that sets flags/smokescreen-whitelist-ips to include an address normally blocked (e.g. "127.0.0.1" or "10.0.0.1"), then assert that (bad-ip? <that-address>) returns false when the flag is enabled; ensure the test exercises the InetAddress equality path the new parser uses by supplying the exact string form the parser will convert to an InetAddress and toggling the flag context around the call to bad-ip?.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/src/instant/model/webhook.clj`:
- Around line 459-488: The 410-disable branch must only run if the
`record-attempt!` update actually succeeded; change the code so that after
calling `sql/execute!` with `record-attempt-q` (the `res` variable from the
`record-attempt!` call) you check that the DB update affected the expected row
(i.e. `res` indicates a successful/positive update count) before invoking
`disable!`; keep the existing `gone?` check but add a guard like `when (and
gone? record-update-succeeded?)` so stale workers that failed to win the row
lock cannot disable the webhook.
- Around line 215-241: After updating a webhook in disable! and enable! you must
evict the cached row that get-by-app-id-and-webhook-id! stores so readers don't
see stale :status/:disabled_reason; for disable! call the cache eviction logic
immediately after the sql/do-execute! in the two-arg arity, and for enable!
perform the eviction after the with-transaction completes (i.e. after
sql/do-execute! and after the transaction block) using the same cache key
construction that get-by-app-id-and-webhook-id! uses (app-id and webhook-id) so
the cache entry is removed and subsequent reads fetch fresh data.
In `@server/src/instant/webhook_processor.clj`:
- Around line 89-95: The batch deref is being aborted when any future throws, so
change the per-event future creation (the mapv producing futs that calls
ua/vfuture with handle-event!) to wrap the handle-event! invocation in a
try/catch inside the future: catch exceptions, log or record the error and
return a safe failure marker (so deref never throws), ensuring the rest of the
futures complete and the subsequent should-repeat? check still runs; apply the
same pattern to the other similar block referenced by work-retry! so failures
are handled inside each future rather than at mapv deref.
- Around line 208-229: Protect lifecycle transitions by adding a shared lock and
guards: make stop-global, start-global and restart acquire a single mutex/lock
(e.g., an atom used as a guard) so stop/start cannot interleave, and change the
flag listeners inside _watch-worker-count-change to first check that `@process` is
non-nil and then call restart under that same lock (no-op if process is nil).
Also ensure the webhook-processor is created with sensible defaults in
server/src/instant/core.clj so listeners only reconfigure an already-initialized
processor after flags load; reference the start-global, stop-global, restart,
_watch-worker-count-change and the process atom when making these changes.
---
Nitpick comments:
In `@server/src/instant/util/test.clj`:
- Around line 387-390: The finally block currently blocks indefinitely on
`@worker`# and `@pumper`#; change these to timed derefs (deref with timeout or @
with timeout) and throw a diagnostic exception if the timeout elapses so tests
fail rather than hang: after calling inv/stop (which may fail to close
wal-chan#), replace the plain derefs of worker# and pumper# with a timed-deref
pattern (e.g., deref worker# TIMEOUT or alts/timeout) and if nil/timeout is
returned throw an informative exception mentioning worker# or pumper# and
wal-chan#/inv/stop to aid debugging. Ensure the same timeout policy is applied
to both worker# and pumper#.
In `@server/test/instant/smokescreen_test.clj`:
- Around line 16-97: Add a test that verifies the new whitelist flag bypass in
bad-ip?: create a case in the smokescreen_test.clj that sets
flags/smokescreen-whitelist-ips to include an address normally blocked (e.g.
"127.0.0.1" or "10.0.0.1"), then assert that (bad-ip? <that-address>) returns
false when the flag is enabled; ensure the test exercises the InetAddress
equality path the new parser uses by supplying the exact string form the parser
will convert to an InetAddress and toggling the flag context around the call to
bad-ip?.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1d767c29-f62f-4f2d-964a-d81985fb9e3b
📒 Files selected for processing (11)
server/src/instant/flags.cljserver/src/instant/model/webhook.cljserver/src/instant/smokescreen.cljserver/src/instant/util/test.cljserver/src/instant/webhook_processor.cljserver/src/instant/webhook_sender.cljserver/src/tool.cljserver/test/instant/model/webhook_test.cljserver/test/instant/smokescreen_test.cljserver/test/instant/webhook_processor_test.cljserver/test/instant/webhook_sender_test.clj
🚧 Files skipped from review as they are similar to previous changes (1)
- server/src/instant/webhook_sender.clj
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
server/src/instant/model/webhook.clj (1)
183-197:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
create!still stores the wrong attr ids for update matching.This persists only each etype’s
"id"attr, but the newwebhook-matches?-testproves a normal field update matches[name-aid]and not[id-aid]. A webhook created here for["update"]will therefore miss ordinary non-id updates. Either persist all relevant attr ids for the selected etypes, or change the matcher/prefilter contract so both sides key off the same attr set.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/instant/model/webhook.clj` around lines 183 - 197, The webhook create! currently builds id-attr-ids by taking only each etype’s "id" attribute via attr-model/seek-by-fwd-ident-name, which causes update webhooks to miss ordinary-field updates; change the create! logic that defines id-attr-ids (and the subsequent topics) to collect and persist all relevant attribute ids for the selected etypes (e.g., filter attrs from attr-model/get-by-app-id by etype and collect their :id values) so the stored attr-id set matches what the matcher/prefilter expects; ensure the same attribute-id set is used when computing topics (history/bloom-bit) and when storing the webhook so matching uses the full field-id list rather than only the "id" attr.
🧹 Nitpick comments (1)
server/src/instant/webhook_processor.clj (1)
35-44: 💤 Low valueConsider closing the JSON generator in a
try/finallyblock.If
webhook-jwt/webhook-payload-jwtthrows before.close gen, the generator remains unclosed. WhileByteArrayOutputStreamholds no OS resources (so there's no real leak), the partial write means an incomplete byte array would be returned — though in practice the exception propagates, preventing use of the corrupted buffer.♻️ Suggested fix
(defn webhook-body [{:keys [app_id webhook_id isn] :as _event}] (let [out (ByteArrayOutputStream.) gen (.createGenerator json/object-mapper out) payload-url (str config/server-origin "/webhooks/payload/" app_id "/" webhook_id "/" isn)] - (.writeStartObject gen) - (.writeStringField gen "payloadUrl" payload-url) - (.writeStringField gen "token" (webhook-jwt/webhook-payload-jwt app_id webhook_id isn)) - (.writeEndObject gen) - (.close gen) - (.toByteArray out))) + (try + (.writeStartObject gen) + (.writeStringField gen "payloadUrl" payload-url) + (.writeStringField gen "token" (webhook-jwt/webhook-payload-jwt app_id webhook_id isn)) + (.writeEndObject gen) + (.close gen) + (.toByteArray out) + (finally + (.close gen)))))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/instant/webhook_processor.clj` around lines 35 - 44, Wrap creation and use of the JSON generator in a try/finally inside webhook-body: create gen via (.createGenerator json/object-mapper out), perform the write calls and compute the resulting (.toByteArray out) inside the try, and in the finally ensure (.close gen) is always invoked (so webhook-jwt/webhook-payload-jwt exceptions won’t leave gen unclosed); ByteArrayOutputStream (out) need not be closed. Use the existing symbols webhook-body, gen, out and webhook-jwt/webhook-payload-jwt to locate the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/src/instant/model/webhook.clj`:
- Line 494: Remove the debug hook tool/def-locals from the record-attempt!
function to avoid defining globals and leaking payload/state across concurrent
deliveries; locate the record-attempt! definition in webhook.clj, delete the
(tool/def-locals) invocation (and any related debug-only code tied to that
hook), and ensure the function only uses local bindings or returns values
without mutating namespace-level state so concurrent requests no longer retain
the last event/attempt payload.
- Around line 343-345: The predicate passed to ucoll/seek is using the keyword
lookup `:=` instead of equality, causing incorrect matches; update the vmemoize
get-wal-record predicate from `#(:= isn (:isn %))` to use an equality check `#(=
isn (:isn %))` so that get-wal-record (used in the get-wal-record binding and by
ucoll/seek over wal-records) returns the correct WAL record for a given isn.
---
Duplicate comments:
In `@server/src/instant/model/webhook.clj`:
- Around line 183-197: The webhook create! currently builds id-attr-ids by
taking only each etype’s "id" attribute via attr-model/seek-by-fwd-ident-name,
which causes update webhooks to miss ordinary-field updates; change the create!
logic that defines id-attr-ids (and the subsequent topics) to collect and
persist all relevant attribute ids for the selected etypes (e.g., filter attrs
from attr-model/get-by-app-id by etype and collect their :id values) so the
stored attr-id set matches what the matcher/prefilter expects; ensure the same
attribute-id set is used when computing topics (history/bloom-bit) and when
storing the webhook so matching uses the full field-id list rather than only the
"id" attr.
---
Nitpick comments:
In `@server/src/instant/webhook_processor.clj`:
- Around line 35-44: Wrap creation and use of the JSON generator in a
try/finally inside webhook-body: create gen via (.createGenerator
json/object-mapper out), perform the write calls and compute the resulting
(.toByteArray out) inside the try, and in the finally ensure (.close gen) is
always invoked (so webhook-jwt/webhook-payload-jwt exceptions won’t leave gen
unclosed); ByteArrayOutputStream (out) need not be closed. Use the existing
symbols webhook-body, gen, out and webhook-jwt/webhook-payload-jwt to locate the
code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 77aa1700-2454-4025-ba8b-f0e69e3fbf51
📒 Files selected for processing (11)
server/src/instant/flags.cljserver/src/instant/model/webhook.cljserver/src/instant/smokescreen.cljserver/src/instant/util/test.cljserver/src/instant/webhook_processor.cljserver/src/instant/webhook_sender.cljserver/src/tool.cljserver/test/instant/model/webhook_test.cljserver/test/instant/smokescreen_test.cljserver/test/instant/webhook_processor_test.cljserver/test/instant/webhook_sender_test.clj
🚧 Files skipped from review as they are similar to previous changes (1)
- server/src/instant/webhook_sender.clj
There was a problem hiding this comment.
🧹 Nitpick comments (2)
server/src/instant/model/webhook.clj (2)
62-67: ⚡ Quick win
with-cache-invalidationsilently breaks for multi-form bodies.
~@bodyis spliced directly into the innerletbindings vector. With a single body form (today's only callers indisable!/enable!) this works, but as soon as someone writes(with-cache-invalidation k a-side-effect (do-thing))the macro expands to(let [res# a-side-effect (do-thing)] ...)— a malformed bindings vector that won't compile. Wrap the body indoto make the macro robust to future callers.♻️ Proposed fix
(defmacro with-cache-invalidation [cache-key & body] `(let [cache-key# ~cache-key] (evict-webhook-from-cache cache-key#) - (let [res# ~@body] + (let [res# (do ~@body)] (evict-webhook-from-cache cache-key#) res#)))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/instant/model/webhook.clj` around lines 62 - 67, The macro with-cache-invalidation improperly splices ~@body into the inner let bindings, which breaks for multi-form bodies; fix it by wrapping the body forms in a do so the inner let creates a single expression for res# (i.e. bind res# to (do ~@body)), leaving the evict-webhook-from-cache(cache-key#) calls unchanged and preserving existing behavior in disable! / enable! callers.
155-159: 💤 Low valueDispatch key missing namespace qualifier.
Every other
sql/select*/sql/execute*call in this namespace passes a::xxx!namespaced keyword as the dispatch tag (e.g.::take-webhook-count-lock!on Line 138,::create!on Line 205). This one is:check-webhook-limit!(single colon), so traces/metrics keyed off the dispatch tag will land in a different bucket. Looks like a typo.✏️ Proposed fix
(let [{:keys [webhook_count]} - (sql/select-one :check-webhook-limit! + (sql/select-one ::check-webhook-limit! conn (uhsql/formatp check-webhook-limit-q {:app-id app-id}))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/instant/model/webhook.clj` around lines 155 - 159, The dispatch tag for the sql/select-one call is missing the namespace qualifier causing traces to be categorized separately; update the dispatch keyword used in the sql/select-one invocation that references check-webhook-limit-q to use the namespaced keyword ::check-webhook-limit! (matching the style of ::take-webhook-count-lock! and ::create!) so the trace/metric dispatch bucket is consistent with other sql/* calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@server/src/instant/model/webhook.clj`:
- Around line 62-67: The macro with-cache-invalidation improperly splices ~@body
into the inner let bindings, which breaks for multi-form bodies; fix it by
wrapping the body forms in a do so the inner let creates a single expression for
res# (i.e. bind res# to (do ~@body)), leaving the
evict-webhook-from-cache(cache-key#) calls unchanged and preserving existing
behavior in disable! / enable! callers.
- Around line 155-159: The dispatch tag for the sql/select-one call is missing
the namespace qualifier causing traces to be categorized separately; update the
dispatch keyword used in the sql/select-one invocation that references
check-webhook-limit-q to use the namespaced keyword ::check-webhook-limit!
(matching the style of ::take-webhook-count-lock! and ::create!) so the
trace/metric dispatch bucket is consistent with other sql/* calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f680eb1c-c1f1-4dd9-b0ca-a868e110203a
📒 Files selected for processing (2)
server/src/instant/model/webhook.cljserver/test/instant/model/webhook_test.clj
|
View Vercel preview at instant-www-js-webhooks-3-jsv.vercel.app. |
See #2637 for an explanation of the Dashboard
See #2646 for an explanation of the SDK
This PR completes the backend machinery for webhooks. Next up after this will be admin sdk and the UI and the routes for creating/modifying webhooks, and viewing/retrying webhook events.
The main pieces are:
Invalidator hook
When we call push-history!, we'll now return all of the webhooks that match the bloom filter (the webhook has a filter on attr-ids, and the wal-record has a filter on attr-ids--we check if they match).
Some of the returned webhooks will be false positives, so we do another filter that checks the
tripleschanges to make sure our webhook matches.If the webhook matches, we insert a webhook_event into the webhook_events table. Then we pick a random machine to notify that there is something new in the queue.
Webhook processor
We start a process with 15 workers (10 for webhooks and 5 for retries) that listens for notifications that a webhook_event is ready. When it gets a notification, the worker will fetch up to 100 records from the queue and process them all in parallel.
Right now we do the traditional
skip lockedfetch from pg. We may run into some scaling issues with that, so I've set it up so that we can switch to a system where we assign each event to a single machine. Then we just have to do a targeted update on a single row. I didn't make that the default because you still have to have a similarclaimflow for events that get dropped, so this keeps things a lot simpler.It also periodically checks for new work, stuck events, and events that need to be retried in the background.
After we send (or fail to send) the webhook, we will record an attempt on the
webhook_eventsrow. If the attempt fails, then we'll add it to the retry queue (by setting thestatusto 'error' and settingnext_attempt_after). We'll do an exponential backoff (1m, 5m, 30m, 2h, 5h, 10h, and then every 12 hours for up to 11 attempts) until we give up after 3 days.HTTP Client
We use Okhttp for our http client. I chose it because we already have it as a dependency through
posthot-serverandopentelemetry(I had to bump the opentelemetry version), it is supposed to be good at handling transitory failure, it supports connection pooling, and it supports custom DNS resolution.We want to prevent Server-Side Request Forgery (SSRF), where an attacker uses our own network to attack our internal systems. We use a dns resolver that uses dns-over-https (checking both google and cloudflare in parallel and taking the first result). That lets us bypass the aws dns resolver, which will resolve internal services.
We have an ip filter in
instant.smokescreenthat prevents us from resolving dangerous IPs, like127.0.0.1. I took the filters from Stripe's smokescreen (they use it to send webhooks, fly.io also has a nice writeup on the problem.).Limitations
We only allow an app to have up to 100 webhooks--since we have to the processing sequentially in the singleton invalidator, we want to keep the processing to a reasonable level.
We don't support webhooks on
link/unlink. I think we could probably support this later by addinglinkandunlinkto actions and having a set ofref_attr_idsthat we match against.