Enable OpenTelemetry in prod with sampling#1985

Merged: cadmiumcat merged 2 commits into main from enable-otel-in-prod on Feb 16, 2026


@cadmiumcat cadmiumcat commented Feb 12, 2026

What problem does this pull request solve?

This PR enables OpenTelemetry in production.

We want to sample based on the trace ID by picking the percentage of traces to sample (Head Sampling).

We expect to generate little enough data that we could collect all of it without sampling. However, we want to be cautious in case collecting the data affects our services, for example by increasing their latency.

We have opted for Head Sampling because it's easy to understand and configure, and it will allow us to determine whether we need to manage the data volume.

For more info: https://opentelemetry.io/docs/concepts/sampling/
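
To illustrate what ratio-based head sampling on the trace ID means, here is a minimal sketch in Python. It is not the code in this PR and not the OpenTelemetry SDK's implementation. It assumes the sampler compares the low 64 bits of the trace ID against `ratio * 2**64`; the function name `should_sample` and the example trace ID are made up for illustration.

```python
# Sketch of ratio-based head sampling keyed on the trace ID.
# Assumption: the low 64 bits of the trace ID are compared against
# ratio * 2**64. Real SDKs may differ in the details.

TRACE_ID_LIMIT = 2**64  # number of distinct values in the low 64 bits


def should_sample(trace_id: int, ratio: float) -> bool:
    """Return True if the trace falls inside the sampled fraction."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * TRACE_ID_LIMIT)
    # The decision depends only on the trace ID, so every service that
    # sees the same trace ID and ratio reaches the same answer.
    return (trace_id & (TRACE_ID_LIMIT - 1)) < bound


# Hypothetical 128-bit trace ID, used only to show the call.
example_trace_id = 0x5B8EFFF798038103D269B633813FC60C
print(should_sample(example_trace_id, 0.1))
print(should_sample(example_trace_id, 1.0))
```

Because the decision is made when the trace starts and depends only on the trace ID, it is cheap to compute, but it cannot take into account anything that happens later in the request (such as an error).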

Trello card:

Things to consider when reviewing

I've tried the sampling in Staging by setting the ratio to 0.1 and checking that only about 1 in 10 requests makes it through to AWS X-Ray.
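
The Staging check above was done by hand. As a rough sanity check of the expected rate, the same idea can be simulated offline. This is a hypothetical sketch, not the procedure used in Staging: it generates random 128-bit trace IDs and assumes a sampler that compares the low 64 bits of the trace ID against `ratio * 2**64`. The names `should_sample` and `sampled_fraction` are illustrative.

```python
import random


def should_sample(trace_id: int, ratio: float) -> bool:
    """Assumed ratio sampler: compare the low 64 bits to ratio * 2**64."""
    return (trace_id & (2**64 - 1)) < round(ratio * 2**64)


def sampled_fraction(ratio: float, requests: int, seed: int = 0) -> float:
    """Simulate `requests` root traces and return the fraction kept."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    kept = sum(
        should_sample(rng.getrandbits(128), ratio) for _ in range(requests)
    )
    return kept / requests


fraction = sampled_fraction(ratio=0.1, requests=10_000)
print(f"sampled {fraction:.1%} of simulated requests")
```

With a ratio of 0.1 the kept fraction should land close to 10%, though not exactly on it, so a small manual test of ten requests may show zero, one, or a few traces.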

  • Ensure that you consider the wider context.
  • Does it work when run on your machine?
  • Is it clear what the code is doing?
  • Do the commit messages explain why the changes were made?
  • Are there all the unit tests needed?
  • Has all relevant documentation been updated?

Reminders

If you've made changes to the deployer role (files in modules/deployer-access):

  • Remember to run make <environment> forms/account apply on the relevant environments (dev, staging, user-research, and/or prod)
  • Check the #govuk-forms-deployment-notifications Slack channel to ensure the apply-forms-terraform-<environment> pipelines have run successfully

@cadmiumcat cadmiumcat requested review from a team and theseanything February 12, 2026 13:48
Comment thread infra/deployments/forms/tfvars/staging.tfvars Outdated
Comment thread infra/modules/ecs-service/ecs.tf Outdated
We want to sample based on the trace ID by picking the percentage of traces to
sample (Head Sampling).

We expect to generate little enough data that we could collect all of it
without sampling. However, we want to be cautious in case collecting the data
affects our services, for example by increasing their latency.

We have opted for Head Sampling because it's easy to understand and configure,
and it will allow us to determine whether we need to manage the data volume.

More specifically, we chose "parentbased_traceidratio" so that a span follows
its parent span's sampling decision and only root spans are sampled by ratio
(https://opentelemetry.io/docs/specs/otel/trace/sdk/#parentbased).

For more info: https://opentelemetry.io/docs/concepts/sampling/
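
The parent-based behaviour described above can be sketched as follows. This is an illustration in Python, not the SDK's code: `ParentContext`, `ratio_sample`, and `parent_based_sample` are hypothetical names, and the ratio check assumes the low 64 bits of the trace ID are compared against `ratio * 2**64`. For reference, "parentbased_traceidratio" is the value the OpenTelemetry specification defines for the `OTEL_TRACES_SAMPLER` environment variable, with the ratio given in `OTEL_TRACES_SAMPLER_ARG`; how this repository passes those settings is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentContext:
    """Hypothetical stand-in for the parent span context on a request."""
    sampled: bool


def ratio_sample(trace_id: int, ratio: float) -> bool:
    """Assumed ratio sampler: compare the low 64 bits to ratio * 2**64."""
    return (trace_id & (2**64 - 1)) < round(ratio * 2**64)


def parent_based_sample(
    trace_id: int, ratio: float, parent: Optional[ParentContext]
) -> bool:
    """Follow the parent's decision if there is one, else sample by ratio."""
    if parent is not None:
        # A caller upstream already decided; respect it so the trace is
        # either recorded end to end or not at all.
        return parent.sampled
    # No parent: this is a root span, so the ratio decides.
    return ratio_sample(trace_id, ratio)


# A sampled parent wins even when the ratio alone would drop the trace.
print(parent_based_sample(2**64 - 1, 0.1, ParentContext(sampled=True)))
# An unsampled parent wins even when the ratio alone would keep it.
print(parent_based_sample(0, 0.1, ParentContext(sampled=False)))
```

One consequence of this choice is that the configured ratio only governs traces that start in our services; requests arriving with a sampling decision already attached keep that decision.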

@theseanything theseanything left a comment

Yay! Thanks Cat!

@cadmiumcat cadmiumcat added this pull request to the merge queue Feb 16, 2026
Merged via the queue into main with commit 9b889ab Feb 16, 2026
20 checks passed
@cadmiumcat cadmiumcat deleted the enable-otel-in-prod branch February 16, 2026 10:51
cadmiumcat added a commit that referenced this pull request Mar 12, 2026
cadmiumcat added a commit that referenced this pull request Mar 16, 2026
cadmiumcat added a commit that referenced this pull request Mar 16, 2026
2 participants