
chore: wire OpenTelemetry collector URL into staging deploy#174

Merged
mpetrun5 merged 2 commits into main from chore/staging-otel-collector-url on Apr 20, 2026

Conversation

@akandic47

Summary

  • Add SYG_RELAYER_OPENTELEMETRYCOLLECTORURL to all three relayers in docker-compose.staging.yml
  • Expose the env var in .env.staging.template and the staging Portainer deploy workflow (sourced from secrets.SYG_RELAYER_OPENTELEMETRYCOLLECTORURL, added to the envsubst allowlist)
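For orientation, the compose wiring could look like the following sketch. This is a hypothetical fragment, not the actual `docker-compose.staging.yml`: the service name is illustrative, and only the variable name `SYG_RELAYER_OPENTELEMETRYCOLLECTORURL` comes from this PR.

```yaml
# Hypothetical sketch — service name is an assumption; only the env var
# name is taken from this PR. The same block would be repeated for each
# of the three relayer services.
services:
  relayer-1:
    environment:
      # Substituted by envsubst in the staging Portainer deploy workflow,
      # sourced from secrets.SYG_RELAYER_OPENTELEMETRYCOLLECTORURL
      - SYG_RELAYER_OPENTELEMETRYCOLLECTORURL=${SYG_RELAYER_OPENTELEMETRYCOLLECTORURL}
```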

Test plan

  • Set SYG_RELAYER_OPENTELEMETRYCOLLECTORURL secret in the staging GitHub environment
  • Trigger the staging deploy and verify each relayer container has the env var set
  • Confirm metrics flow to the OTLP collector

@github-actions

Go Test coverage is 53.3% ✨ ✨ ✨

@mpetrun5 (Collaborator) left a comment


I think we should also add `relayerID` and `env` variables.

@akandic47 akandic47 requested a review from mpetrun5 April 20, 2026 15:28
@github-actions

Go Test coverage is 53.3% ✨ ✨ ✨

@mpetrun5 mpetrun5 merged commit cbdc585 into main Apr 20, 2026
7 checks passed
@mpetrun5 mpetrun5 deleted the chore/staging-otel-collector-url branch April 20, 2026 15:31
mpetrun5 pushed a commit that referenced this pull request Apr 28, 2026
## Summary

Adds `metric.WithUnit("s")` to the four `Float64Histogram`s in
`MpcMetrics`:

- `relayer.SessionTime` (PR #143)
- `relayer.InitiateTime` (PR #171)
- `relayer.CommSendTime` (PR #171)
- `relayer.CommDnsResolveTime` (PR #171)

## Why

`sygma-core` registers a sub-second bucket view in
`observability.InitMetricProvider`:

```go
// observability/metrics.go (initSecondView)
sdkmetric.NewView(
    sdkmetric.Instrument{Unit: "s"},
    sdkmetric.Stream{Aggregation: aggregation.ExplicitBucketHistogram{
        Boundaries: []float64{1e-6, 1e-5, 1e-4, 1e-3, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 100, 1000, 10000},
    }},
)
```

The view is keyed by `Instrument{Unit: "s"}`. The histograms in
`metrics/mpc.go` declared a description but no unit, so the view never
matched and the SDK fell back to OTel's default histogram boundaries
`[5, 10, 25, 50, ..., 10000]` — which are tuned for milliseconds.
Signing-phase durations (sub-second to a few seconds) collapse into the
`le=5` bucket, making `histogram_quantile` return values pinned to
bucket boundaries rather than real percentiles.

Values are already recorded in seconds via `d.Seconds()`, so no math
changes — only bucketing.
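The collapse can be demonstrated with a small stdlib-only sketch. The `bucketFor` helper is illustrative, not SDK code; the two boundary slices mirror the OTel defaults and the sub-second view quoted above.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketFor returns the smallest boundary b with value <= b (the "le"
// bucket a histogram observation lands in), or "+Inf" when the value
// exceeds every boundary. Illustrative helper, not OTel SDK code.
func bucketFor(value float64, bounds []float64) string {
	i := sort.SearchFloat64s(bounds, value)
	if i == len(bounds) {
		return "+Inf"
	}
	return fmt.Sprintf("%g", bounds[i])
}

func main() {
	// OTel default histogram boundaries — tuned for milliseconds.
	defaultBounds := []float64{5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000}
	// Sub-second view registered by observability.InitMetricProvider.
	secondView := []float64{1e-6, 1e-5, 1e-4, 1e-3, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 100, 1000, 10000}

	// Typical signing-phase durations, in seconds.
	for _, d := range []float64{0.02, 0.3, 1.8} {
		fmt.Printf("%.2fs -> default le=%s, seconds view le=%s\n",
			d, bucketFor(d, defaultBounds), bucketFor(d, secondView))
	}
	// All three durations fall into le=5 under the default boundaries,
	// while the seconds view separates them into distinct buckets.
}
```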

## Grafana query change

The OTLP→Prometheus exporter appends a `_seconds` suffix when the
instrument carries `Unit: "s"`. After this PR, dashboard queries change:

| Before | After |
|---|---|
| `relayer_SessionTime_bucket` | `relayer_SessionTime_seconds_bucket` |
| `relayer_InitiateTime_bucket` | `relayer_InitiateTime_seconds_bucket` |
| `relayer_CommSendTime_bucket` | `relayer_CommSendTime_seconds_bucket` |
| `relayer_CommDnsResolveTime_bucket` | `relayer_CommDnsResolveTime_seconds_bucket` |

In practice no dashboards depend on the old names yet — the OTel
collector URL was only wired into staging in #174, so historical data is
empty.

## Test plan

- [x] `go build ./...` clean
- [x] `go test ./metrics/... ./tss/... ./comm/p2p/...` pass
- [ ] After deploy, confirm `relayer_SessionTime_seconds_count` and the
three new `_seconds_count` series are non-empty in Grafana
- [ ] Confirm `histogram_quantile(0.95, ...)` returns values that vary
with workload rather than snapping to bucket boundaries
