Troubleshooting

Work top-down: is the event arriving → is it passing the chain → is the handler firing → is the provider accepting it? Three tools answer almost everything: pod logs, /readyz, and the metrics.

kubectl logs -n tekton-events-relay deploy/tekton-events-relay -f
kubectl exec -n tekton-events-relay deploy/tekton-events-relay -- wget -qO- localhost:8080/readyz

Nothing happens at all

| Check | How | Fix |
| --- | --- | --- |
| Tekton is sending events | Is `events_received_total` increasing? Are there any `cloudevent_request_started` logs? | Set `default-cloud-events-sink` in the `config-defaults` ConfigMap (`tekton-pipelines` namespace) to the relay Service URL; check NetworkPolicies between namespaces. |
| Events arrive but are dropped early | Log `no decoder registered for event type` and a rising `events_unsupported_type_total` | Unknown CloudEvent type (often after a Tekton upgrade): open an issue with the type. |
| Resource type filtered | `events_filtered_total` increasing | Enable the type under `filter:` (`allow_taskrun: true`, …). |
| Missing annotations | Log `missing annotation tekton.dev/tekton-events-relay.scm.provider` | Annotate the PipelineRun in your TriggerTemplate. |
| Wrong provider name | Dispatcher log `no handlers processed event` | `scm.provider` must equal a configured instance `name` exactly. |
| Handler exists but `when` never matches | Enable `logging.level: debug` and watch how the expression evaluates | Test the CEL expression; remember states are lowercase. |
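The first row of the table above can be sketched as two small shell helpers. The default sink URL is an assumption (a Service named `tekton-events-relay` in the `tekton-events-relay` namespace, listening on port 8080 at the root path); replace it with the URL of your own relay Service.

```shell
# Point Tekton's CloudEvents sink at the relay Service.
# The default URL is hypothetical: adjust Service name, namespace, port and path.
set_events_sink() {
  sink_url="${1:-http://tekton-events-relay.tekton-events-relay.svc.cluster.local:8080}"
  kubectl patch configmap config-defaults \
    --namespace tekton-pipelines \
    --type merge \
    --patch "{\"data\":{\"default-cloud-events-sink\":\"${sink_url}\"}}"
}

# Show the sink Tekton currently uses (no output means the key is unset).
show_events_sink() {
  kubectl get configmap config-defaults --namespace tekton-pipelines --output yaml |
    grep 'default-cloud-events-sink'
}
```

Run `show_events_sink` first; if it prints nothing, call `set_events_sink` with your relay URL and trigger a run to confirm that `events_received_total` starts moving.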

Provider rejects the call

/readyz shows the last error per handler — start there.

| Symptom | Cause | Fix |
| --- | --- | --- |
| `401 Bad credentials` / `403` | Expired or under-scoped token | Rotate the Secret. Hot reload picks it up, and the webhook/grafana/sentry/jira notifiers re-read the mounted secret on every request, so the new value applies immediately. For webhook/jira you can instead use OAuth2 client credentials (`auth.oauth2` + `token_url`) so the relay refreshes the token before it expires. Check scopes on the provider page. |
| `404` on status/comment | Wrong owner/name/project annotations, or the token can't see the repo | Compare the annotations against the repo URL; remember GitLab prefers `repo-id`. |
| `422` / validation error | Field limits (context/description/label length) | Shorten the template; limits are provider-specific. |
| Frequent `429` | Provider rate limiting | Watch `notifier_rate_limit_hits_total{host}`; the retry policy honors `Retry-After`. If the 429s persist, reduce event volume per action with `when`/filters, or use a GitHub App (higher limits). |
| Self-signed TLS errors | Private CA | Mount the CA and configure the client; avoid `insecure_skip_verify`. |

When the DLQ is enabled, permanent failures are preserved there. After fixing credentials, replay them with POST /api/v1/dlq/replay.
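A minimal sketch of the replay call, assuming the API is served on the same port 8080 as /readyz, needs no request body or authentication, and is reached through a local port-forward. All three are assumptions; check your deployment before relying on them.

```shell
# Replay dead-lettered events once the credentials are fixed.
# The default base URL assumes a local port-forward, for example:
#   kubectl port-forward -n tekton-events-relay deploy/tekton-events-relay 8080:8080
replay_dlq() {
  base_url="${1:-http://localhost:8080}"
  # --fail makes curl exit non-zero when the server answers with an HTTP error.
  curl --fail --silent --show-error --request POST "${base_url}/api/v1/dlq/replay"
}
```

With the port-forward running in another terminal, `replay_dlq` posts to the endpoint and prints whatever the relay returns.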

Duplicate or missing notifications

| Symptom | Cause | Fix |
| --- | --- | --- |
| Duplicate comments, `replicaCount > 1` | Per-pod `memory` store: retransmissions land on another replica | Use a shared store (valkey/olric) or run 1 replica. `mode: upsert` also neutralizes duplicates for comments. |
| Duplicates after pod restart | `memory` store lost | Same as above. |
| `store_errors_total` rising + occasional duplicates | Store backend down; the relay fails open | Fix Valkey/Olric connectivity; events were delivered, only dedup degraded. |
| `deduper_evictions_total` climbing | Cache smaller than event volume | Raise `dedupe_size` (memory) or rely on TTL-based remote backends. |
| Events silently missing under load | Back-pressure | Check `events_backpressure_total`: these requests return 503 and Tekton retransmits. Find what is slow via `notifier_latency_seconds`. |
| One slow provider delays everything | Not possible by design | `handler_timeout` (default 10s) bounds each handler; see `handler_timeouts_total`. |
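To read the counters named in the table above in one go, a sketch like the following may help. It assumes the relay serves Prometheus text metrics at `/metrics` on port 8080 and that the container image ships `wget` (as the /readyz command at the top of this page suggests); neither the path nor the port is confirmed here.

```shell
# Print the current values of the dedup, back-pressure and timeout counters.
# Assumption: Prometheus text metrics are exposed at localhost:8080/metrics.
dedup_metrics() {
  kubectl exec -n tekton-events-relay deploy/tekton-events-relay -- \
    wget -qO- localhost:8080/metrics |
    grep -E 'store_errors_total|deduper_evictions_total|events_backpressure_total|handler_timeouts_total'
}
```

Run `dedup_metrics` twice a minute or so apart; a counter that grows between the two runs points at the matching row.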

Config & deploy issues

| Symptom | Fix |
| --- | --- |
| Pod CrashLoops at start | Run `tekton-events-relay --validate --config …` against the rendered ConfigMap; the error message names the bad key. `helm install` already schema-validates values. |
| Edited the ConfigMap, nothing changed | Hot reload only applies valid configs: check `config_reloads_total{result="failure"}` and the `config reload:` log line. Changes to `server`/`store`/`dlq`/`logging`/`tracing` need a restart. |
| `verbose options require logging.level to be 'debug'` | Exactly that: set the level to `debug` or drop the verbose flags. |
| Olric pods don't form a cluster | Pod-to-pod 3320/tcp and 3322/tcp+udp must be open (the chart's NetworkPolicy handles it when `backend: olric`; check other policies/CNI). |
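The validation step from the first row could be scripted roughly as below. The ConfigMap name (`tekton-events-relay`) and its data key (`config.yaml`) are assumptions, as is having the relay binary available locally; `RELAY_BIN` is a hypothetical override for its path. Check the manifests your chart renders for the real names.

```shell
# Validate the config stored in the cluster with a local relay binary.
# Assumptions: ConfigMap name, the "config.yaml" data key, and a local binary.
validate_relay_config() {
  cm_name="${1:-tekton-events-relay}"
  tmp_file="$(mktemp)"
  # Dots inside a key must be escaped in kubectl's jsonpath syntax.
  kubectl get configmap "$cm_name" -n tekton-events-relay \
    -o 'jsonpath={.data.config\.yaml}' > "$tmp_file" &&
    "${RELAY_BIN:-tekton-events-relay}" --validate --config "$tmp_file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}
```

A non-zero exit status from `validate_relay_config` means either the ConfigMap could not be read or validation failed.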

Debugging one event end-to-end

1. Set logging.level: debug (add verbose.payloads: true if needed; secrets are redacted) and restart the pod, since logging changes are not hot-reloaded.
2. Trigger the run, then grep the logs for the CloudEvent ID (ce_id).
3. You'll see: received → decoder → each chain step → per-handler success/failure with the provider's response.
4. With tracing.endpoint set, the same journey is one trace with per-handler spans.
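The log search in the steps above can be wrapped in a small helper. The 15-minute `--since` window is an arbitrary default chosen for this sketch, not a relay setting.

```shell
# Print every log line that mentions one CloudEvent ID, in log order.
# Usage: trace_event <ce_id> [since]   (since defaults to 15m)
trace_event() {
  ce_id="$1"
  since="${2:-15m}"
  kubectl logs -n tekton-events-relay deploy/tekton-events-relay --since="$since" |
    grep -F -- "$ce_id"
}
```

Note that `kubectl logs deploy/…` reads from a single pod, so with more than one replica you may need to repeat the search per pod.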

Still stuck? Open an issue with the relay version, the ce_id log excerpt and your (redacted) config.
