feat: add trigger primitive for configuring and receiving external events#2146
feat: add trigger primitive for configuring and receiving external events#2146
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
🦋 Changeset detectedLatest commit: 71f72f0 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
|
||||||||||||||||
|
|
||||||||||||||||
This comment has been minimized.
This comment has been minimized.
🚀 Preview Environment (PR #2146)Preview URL: https://pr-2146.dev.getgram.ai
Gram Preview Bot |
| for (const log of logs) { | ||
| const traceId = log.traceId; | ||
| if (!traceId) continue; | ||
| const traceId = log.traceId || log.id; |
There was a problem hiding this comment.
change too broad, check if needed at all, if needed - check eventSource === "trigger"
server/internal/triggers/impl.go
Outdated
| result, err := s.runtime.ProcessWebhook(ctx, triggerID, body, r.Header) | ||
| if err != nil { | ||
| return oops.E(oops.CodeUnexpected, err, "process trigger webhook").Log(ctx, s.logger) | ||
| } |
There was a problem hiding this comment.
🚩 ProcessWebhook error propagation returns 500 to external callers like Slack
When ProcessEvent fails inside ProcessWebhook (server/internal/triggers/runtime.go:177), the error propagates to HandleWebhook (impl.go:368-369) which returns a 500 HTTP response. For Slack, this means the event delivery is treated as failed, triggering retries and potentially disabling the webhook URL after repeated failures. This is especially problematic in combination with the missing dispatchers (BUG-0001), but even after that is fixed, any transient error in filter evaluation or enqueueing will cause the same behavior. Consider returning 200 to the webhook caller and handling failures asynchronously (e.g., logging the error and emitting a delivery failure telemetry event without propagating the error to the HTTP response).
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
wrong, our boundary should be durable async hand-off, returning 500 before that is correct
| o11y.AttachHandler(mux, "POST", "/rpc/triggers/{id}/webhook", func(w http.ResponseWriter, r *http.Request) { | ||
| oops.ErrHandle(service.logger, service.HandleWebhook).ServeHTTP(w, r) | ||
| }) | ||
| } |
There was a problem hiding this comment.
🚩 Webhook endpoint is unauthenticated by design but lacks rate limiting
The webhook endpoint at server/internal/triggers/impl.go:80 is mounted outside the Goa auth middleware chain using o11y.AttachHandler directly. This is intentional — external services (e.g., Slack) need to POST to it without Gram auth. Security is handled at the definition level (e.g., Slack HMAC signature verification in definitions.go:513-538). The GetTriggerInstanceByIDPublic query at server/internal/triggers/repo/queries.sql:40-44 has no project_id filter, only checking the trigger ID and soft-delete status, which is correct for this public endpoint. However, the endpoint has no rate limiting, which could be a concern for abuse.
Was this helpful? React with 👍 or 👎 to provide feedback.
| logger *slog.Logger, | ||
| db *pgxpool.Pool, | ||
| temporalEnv *tenv.Environment, | ||
| envLoader EnvironmentLoader, | ||
| deliveryLogger DeliveryLogger, | ||
| serverURL *url.URL, |
There was a problem hiding this comment.
None of these should be nil under any circumstances and yet the methods below have defensive checks for that. We should be spreading nil values down the stack. There are suitable values for all of these dependencies for local, test, production.
| envEntries, | ||
| &triggerDeliveryLogger{telemetry: telemetry}, | ||
| serverURL, | ||
| bgtriggers.NewNoopDispatcher(logger), |
There was a problem hiding this comment.
🚩 No assistant dispatcher registered — triggers with target_kind='assistant' will fail at dispatch time
In server/cmd/gram/triggers.go:95, the newTriggersApp function only registers a NoopDispatcher. The ValidateTargetKind function at server/internal/background/triggers/app.go:551 accepts both assistant and noop, and the UI/API allow creating triggers with target_kind=assistant. However, when such a trigger fires, App.Dispatch at server/internal/background/triggers/app.go:445 will fail with "trigger dispatcher for target kind \"assistant\" is not configured". The trigger instance gets created successfully but will always fail at runtime. This might be intentional for a staged rollout, but users can currently configure something that will never work.
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
yes, Devin, by design...
| targetKind, | ||
| targetRef, | ||
| targetDisplay, | ||
| ...(environmentId ? { environmentId } : {}), |
There was a problem hiding this comment.
🚩 Dashboard update form cannot clear environmentId
In client/dashboard/src/pages/triggers/Triggers.tsx:526, the update handler uses ...(environmentId ? { environmentId } : {}) to conditionally include the environment ID. Because empty string is falsy in JavaScript, if a user clears the environment selection, environmentId will be "" and will be omitted from the request. On the server side at server/internal/triggers/impl.go:190-196, when payload.EnvironmentID is nil, the existing environment ID is preserved. This means once an environment is associated with a trigger, it cannot be removed through the update UI. This may be intentional if environments are always required for the associated trigger definitions, but it's worth verifying.
Was this helpful? React with 👍 or 👎 to provide feedback.
87d7d12 to
bc163a0
Compare
| mux.Use(middleware.CORSMiddleware(c.String("environment"), c.String("server-url"), chatSessionsManager)) | ||
| mux.Use(customdomains.Middleware(logger, db, c.String("environment"), serverURL)) | ||
| mux.Use(middleware.SessionMiddleware) | ||
| mux.Use(middleware.AdminOverrideMiddleware) |
There was a problem hiding this comment.
🟡 RBACOverrideMiddleware accidentally removed from server middleware chain
The mux.Use(middleware.RBACOverrideMiddleware(c.String("environment"))) line was deleted from the HTTP middleware chain in server/cmd/gram/start.go. This middleware (defined at server/internal/middleware/rbac_override.go:18) reads the X-Gram-Scope-Override header in local development environments and stores structured RBAC overrides on the request context. Its removal is unrelated to the triggers feature—it appears to be an accidental casualty of the import reorganization in this file. Without it, local-dev RBAC scope override testing via the header is silently broken.
| mux.Use(middleware.AdminOverrideMiddleware) | |
| mux.Use(middleware.AdminOverrideMiddleware) | |
| mux.Use(middleware.RBACOverrideMiddleware(c.String("environment"))) |
Was this helpful? React with 👍 or 👎 to provide feedback.
e135d5e to
fc6cb58
Compare
There was a problem hiding this comment.
Devin Review found 1 new potential issue.
🐛 1 issue in files not directly in the diff
🐛 Soft-delete query sets deleted_at but not deleted = TRUE, so "deleted" triggers remain visible (server/internal/triggers/repo/queries.sql:75-83)
The DeleteTriggerInstance SQL query at server/internal/triggers/repo/queries.sql:75-83 sets deleted_at = clock_timestamp() and updated_at = clock_timestamp(), but never sets deleted = TRUE. All read queries (ListTriggerInstances, GetTriggerInstanceByID, GetTriggerInstanceByIDPublic) filter on deleted IS FALSE. Since the deleted boolean column is never flipped to TRUE, the soft-deleted trigger continues to appear in list results, webhook processing continues to find the instance via GetTriggerInstanceByIDPublic, and the trigger remains operational. The App.Delete method in server/internal/background/triggers/app.go:238-251 also attempts to clean up the Temporal schedule after the DB delete, but because the trigger isn't actually marked as deleted, subsequent operations on it will still succeed.
View 16 additional findings in Devin Review.
b23e176 to
f1f3e4e
Compare
| switch { | ||
| case errors.Is(err, bgtriggers.ErrBadRequest): | ||
| code = oops.CodeBadRequest | ||
| case errors.Is(err, sql.ErrNoRows): |
There was a problem hiding this comment.
🔴 toTriggerError uses sql.ErrNoRows instead of pgx.ErrNoRows, causing 500s instead of 404s
The toTriggerError function checks errors.Is(err, sql.ErrNoRows) to detect not-found errors, but the trigger repo is generated by sqlc with sql_package: "pgx/v5" (server/database/sqlc.yaml:204), so database queries return pgx.ErrNoRows, not sql.ErrNoRows. These are distinct sentinel errors (pgx.ErrNoRows = errors.New("no rows in result set") vs sql.ErrNoRows = errors.New("sql: no rows in result set")), so the errors.Is check never matches. This means all "trigger not found" errors (e.g., get/update/delete with a non-existent ID) return HTTP 500 instead of 404. The codebase's newer services like toolsets correctly use pgx.ErrNoRows (see server/internal/toolsets/impl.go:513).
Was this helpful? React with 👍 or 👎 to provide feedback.
server/cmd/gram/triggers.go
Outdated
| triggerrepo "github.com/speakeasy-api/gram/server/internal/triggers/repo" | ||
| ) | ||
|
|
||
| type triggerDeliveryLogger struct { |
There was a problem hiding this comment.
this can be moved into server/internal/background/triggers now since we refactored the telemetry logger.
server/cmd/gram/triggers.go
Outdated
| }) | ||
| } | ||
|
|
||
| func newTriggersApp( |
There was a problem hiding this comment.
and this can be moved back into deps.go
Add the trigger runtime, API surface, generated SDK updates, and dashboard UI for managing triggers. This also includes the follow-up fixes and review updates that landed in the trigger branch before rebasing onto main.
disintegrator
left a comment
There was a problem hiding this comment.
Nicely done. Interfaces and codebase structure are solid. The explicit separation of authenticating and handling of webhooks makes the handling of auth explicit which is great to see.
f1f3e4e to
740714d
Compare
| func toTriggerError(ctx context.Context, logger *slog.Logger, err error, message string) error { | ||
| code := oops.CodeUnexpected | ||
| switch { | ||
| case errors.Is(err, bgtriggers.ErrBadRequest): | ||
| code = oops.CodeBadRequest | ||
| case errors.Is(err, sql.ErrNoRows): | ||
| code = oops.CodeNotFound | ||
| } | ||
| return oops.E(code, err, "%s", message).Log(ctx, logger) | ||
| } |
There was a problem hiding this comment.
🚩 Webhook authentication errors return HTTP 500 instead of 401/403
When AuthenticateWebhook fails (e.g., Slack signature mismatch), ProcessWebhook wraps the error with fmt.Errorf("authenticate webhook: %w", err). In impl.go:toTriggerError, this doesn't match ErrBadRequest or sql.ErrNoRows, so it falls through to oops.CodeUnexpected (HTTP 500). Webhook callers (e.g., Slack) may expect a 401 or 403 for authentication failures. Returning 500 obscures the actual problem and may cause Slack to retry the webhook unnecessarily.
Was this helpful? React with 👍 or 👎 to provide feedback.
Why
Gram needs a first-class way to react to external events and turn them into internal work.
This PR introduces
triggersas the primitive for describing how Gram receives events from external systems and how those events should be routed into Gram.The immediate goal is to unblock assistant-style products that need to respond to external systems like Slack or scheduled jobs, while giving us a project-scoped resource we can observe, configure, and evolve later.
What This Enables
Correlation IDs
Triggers use correlation IDs to group related events for downstream routing:
channel:thread_ts, falling back to justchannel, thenteam_id. All messages in the same Slack thread share a correlation ID, enabling thread-aware routing to assistant conversations.Trigger Config & Filtering
Each trigger definition owns its filtering logic through the
Config.Filter(event any) (bool, error)interface:Filterhandles both event-type matching (user selects which Slack event types to receive from the full Events API catalogue) and optional CEL expression evaluation. CEL programs are compiled once duringDecodeConfigso parse errors surface at creation time.Filteralways returns true. Cron config only hasschedule(cron expression).Environment is only required when the trigger definition declares required env vars (e.g. Slack needs
SLACK_SIGNING_SECRET). Cron triggers work without an environment.Example
A typical Slack trigger configured through
platform_configure_triggermight look like:{ "definition_slug": "slack", "name": "support-thread-replies", "environment_slug": "prod", "target_kind": "assistant", "target_ref": "assistant:support", "target_display": "Support Assistant", "config": { "event_types": ["message"], "filter": "event.channel_id == \"C12345678\" && event.thread_id == \"1712685400.123456\"" } }The trigger definition handles Slack-specific event ingestion, while the instance-level CEL filter decides whether a normalized Slack event should be sent onward. That gives us one reusable Slack trigger type with per-instance routing rules instead of hard-coding behavior for each integration.
Screen.Recording.2026-04-09.at.21.11.52.mov
Screen.Recording.2026-04-11.at.19.14.32.mov
Scope Of This PR
Deliberate Non-Goals