
Aggregated metrics for payjoin-service (with OTel Collector sidecar) #1323

Closed
spacebear21 wants to merge 2 commits into payjoin:master from spacebear21:open-telemetry-grafana

Conversation

@spacebear21
Collaborator

This PR introduces a mechanism for collecting aggregated metrics from distributed payjoin-service operators: an (optional) OpenTelemetry Collector sidecar that scrapes the local Prometheus /metrics endpoint, collects structured logs from the service's stdout, and receives traces. It pushes all three signal types to the Grafana Cloud instance I set up for the payjoin org, using per-operator credentials. Claude drew this nice explanatory diagram:

┌─────────────────────────────────┐
│  Operator Server A              │
│  ┌───────────┐  ┌─────────────┐ │
│  │ payjoin-  │  │ OTel        │ │     OTLP/gRPC or OTLP/HTTP
│  │ service   ├──► Collector   ├─┼──────────────────────┐
│  │ (/metrics)│  │ (sidecar)   │ │                      │
│  └───────────┘  └─────────────┘ │                      │
└─────────────────────────────────┘                      │
                                                         ▼
┌─────────────────────────────────┐         ┌────────────────────────┐
│  Operator Server B              │         │  Grafana Cloud         │
│  ┌───────────┐  ┌─────────────┐ │         │                        │
│  │ payjoin-  │  │ OTel        │ │  OTLP   │  Mimir  (metrics)      │
│  │ service   ├──► Collector   ├─┼────────►│  Loki   (logs)         │
│  │           │  │             │ │         │  Tempo  (traces)       │
│  └───────────┘  └─────────────┘ │         │                        │
└─────────────────────────────────┘         │  Grafana (dashboards)  │
                                            └────────────────────────┘
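
For concreteness, here's a minimal sketch of what the sidecar config looks like in spirit. The receiver, processor, exporter, and extension names are standard OTel Collector (contrib) components, but the endpoints, ports, and env var names other than OPERATOR_DOMAIN are placeholders, not the exact files in this PR:

```yaml
# Sketch of the collector sidecar config; endpoints and credential env
# vars below are illustrative placeholders.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: payjoin-service
          scrape_interval: 30s
          static_configs:
            - targets: ["payjoin-service:8080"]  # local /metrics endpoint
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317  # traces/logs pushed by the service

processors:
  resource:
    attributes:
      - key: operator.domain
        value: ${env:OPERATOR_DOMAIN}  # per-operator identity (see open questions)
        action: upsert

extensions:
  basicauth/grafana:
    client_auth:
      username: ${env:GRAFANA_INSTANCE_ID}
      password: ${env:GRAFANA_API_TOKEN}  # per-operator token requested from us

exporters:
  otlphttp:
    endpoint: https://otlp-gateway-prod-us-east-0.grafana.net/otlp  # placeholder gateway
    auth:
      authenticator: basicauth/grafana

service:
  extensions: [basicauth/grafana]
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resource]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [resource]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [resource]
      exporters: [otlphttp]
```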

This is an opt-in design: each operator who opts in needs to request an auth token from us and configure the collector accordingly.

Some open questions for reviewers:

  • Is the --telemetry feature overkill? nix2container needs to build the docker image with all features enabled anyway, so in practice payjoin-service features aren't really configurable for docker users. The same goes for the --acme feature.
  • The sidecar architecture introduces a pseudo-dependency on docker-compose for running payjoin-service, since I don't expect many operators to go through the trouble of configuring and running this stack on bare metal (see the compose sketch after this list). Is this OK?
  • The current approach requires operators to set the "OPERATOR_DOMAIN" env variable, which we could use to get per-operator stats (# active operators, uptime by operator, etc.). User error would make that data unreliable, so I wonder if this could be set dynamically somehow. Maybe using acme.domains if it's set? IP address? Random UID?
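
Regarding the docker-compose question, the stack each operator would run looks roughly like this (service names, image tags, ports, and the config filename are illustrative, not the PR's actual compose file):

```yaml
# Sketch of the operator-side compose stack implied by the sidecar design.
services:
  payjoin-service:
    image: payjoin/payjoin-service:latest   # placeholder image tag
    ports:
      - "8080:8080"                         # serves /metrics locally

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./otel-collector.yaml:/etc/otelcol/config.yaml:ro
    environment:
      OPERATOR_DOMAIN: ${OPERATOR_DOMAIN}         # per-operator identity
      GRAFANA_INSTANCE_ID: ${GRAFANA_INSTANCE_ID}
      GRAFANA_API_TOKEN: ${GRAFANA_API_TOKEN}     # auth token requested from us
    depends_on:
      - payjoin-service
```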
[screenshot: 2026-02-10 08:21]

AI disclosure: I used Opus 4.6 to design the system and write much of the code and config files; I manually reviewed everything and edited as needed.


This enables structured log output and configures exporters for
OpenTelemetry.
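
For context, the service-side structured logging wiring is roughly of this shape (a minimal sketch using the `tracing-subscriber` crate's `json` and `env-filter` features; not the exact code in this PR):

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Emit JSON logs to stdout so the collector sidecar (or any log
    // shipper) can parse fields without custom regexes.
    tracing_subscriber::fmt()
        .json()
        .with_env_filter(EnvFilter::from_default_env())
        .init();

    tracing::info!(service = "payjoin-service", "structured logging enabled");
}
```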
@coveralls
Collaborator

coveralls commented Feb 10, 2026

Pull Request Test Coverage Report for Build 21928328465

Details

  • 0 of 25 (0.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.1%) to 83.128%

Changes Missing Coverage:

  File                         Covered Lines   Changed/Added Lines   %
  payjoin-service/src/main.rs  0               25                    0.0%

Totals Coverage Status:
  Change from base Build 21922394027: -0.1%
  Covered Lines: 10238
  Relevant Lines: 12316

💛 - Coveralls

@DanGould
Contributor

  • I think we want to be very specific about shared metrics rather than enabling logs in general. I can see that getting away from us and leading to over-collection easily.
  • I sent Ava a message about a docker compose dependency. This is the kind of thing you gotta ask the users right away. Docker is fine for BB & Cake evidently, that's how they're set up right now afaiu. BOBSpace may be on bare metal if they're not on compose already (but I think they're on compose already).

The current approach requires operators to set the "OPERATOR_DOMAIN" env variable, which we could use to get per-operator stats (# active operators, uptime by operator, etc.). User error would make that data unreliable, so I wonder if this could be set dynamically somehow. Maybe using acme.domains if it's set? IP address? Random UID?

IP seems fine, but making the program panic unless it's set also comes to mind, and then reporting "You're running as x.y.z" at startup.
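
Something like this, roughly (a sketch of the fail-fast suggestion; OPERATOR_DOMAIN is the env var from the PR, and panicking when it's missing is only proposed, not merged code):

```rust
// Refuse to start unless the operator identity is set, then announce it.
fn operator_domain() -> String {
    std::env::var("OPERATOR_DOMAIN")
        .expect("OPERATOR_DOMAIN must be set when telemetry is enabled")
}

fn main() {
    println!("You're running as {}", operator_domain());
}
```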

@DanGould
Contributor

Yesterday, 2:47 PM — @DanGould:
Is a dependency on docker compose ok for you or do you nix everything #1323

Today, 9:53 PM — @achow101:
I nix everything
Docker can fuck right off

The OpenTelemetry Collector sidecar scrapes Prometheus metrics and
receives traces and logs from the `tracing` crate. Everything is then
tagged with operator metadata and exported to a Grafana OTLP endpoint.
@spacebear21 force-pushed the open-telemetry-grafana branch from fabf8dd to e1fd193 on February 12, 2026 at 00:15
@spacebear21 changed the title from "Aggregated metrics for payjoin-service" to "Aggregated metrics for payjoin-service (with OTel Collector sidecar)" on Feb 12, 2026
@spacebear21
Collaborator Author

Superseded by #1327
