When we work with telemetry in production environments, we need to balance the cost vs. observability trade-off. Send everything to your tracing backend, and watch your AWS bill explode. Sample too aggressively, and you’ll miss critical errors when you need them most.
The solution? Adaptive sampling with OpenTelemetry and AWS X-Ray. The idea is straightforward: capture 100% of errors (because those are the traces you actually need for debugging) while sampling normal operations at a configurable rate. This approach keeps costs manageable without sacrificing error visibility.
Most distributed tracing implementations force you to choose:
- Sample everything: Perfect visibility, impractical costs in production
- Fixed sampling rate: Lower costs, but you might miss critical errors
- Complex sampling rules: Hard to maintain, easy to misconfigure
I wanted something simpler: intelligent sampling that automatically captures what matters.
The setup uses three key components:
```mermaid
graph TD
    App["Python App<br/>(Flask in this example)"]
    Collector["OTEL Collector<br/>(Sidecar)"]
    XRay["AWS X-Ray<br/>Service"]

    App -->|OTLP/HTTP| Collector
    Collector -->|AWS X-Ray API| XRay
```
Why this architecture?
- OTLP Collector as middleware: Keeps AWS credentials out of application code
- OpenTelemetry SDK: Vendor-neutral instrumentation, can switch backends later
- AWS X-Ray backend: Mature tracing service with good visualization and integration
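The first two points are worth making concrete: the application only speaks OTLP to the local collector, and everything else is standard OpenTelemetry SDK wiring. Here is a minimal sketch of what a helper like setup_telemetry() sets up under the hood (the endpoint and service name are assumptions taken from the configuration shown later in this post):

```python
# Rough sketch of the SDK wiring performed by a setup helper; not the library's actual code
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "my-api"}))
exporter = OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces")
# The post swaps this stock BatchSpanProcessor for the error-aware one shown next
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```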
The magic happens in a custom BatchSpanProcessor that inspects span status before export:
```python
from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace import StatusCode


class ErrorAwareBatchSpanProcessor(BatchSpanProcessor):
    """
    Span processor with adaptive sampling:
    - Always exports spans with errors (100% coverage)
    - Samples other spans based on the configured ratio
    """

    def __init__(self, exporter, sampling_rate: float = 0.05, **kwargs):
        super().__init__(exporter, **kwargs)
        self.sampling_rate = sampling_rate

    def on_end(self, span: ReadableSpan) -> None:
        # Always export spans with errors
        if span.status.status_code == StatusCode.ERROR:
            super().on_end(span)
            return

        # For non-errors, apply the sampling rate using trace_id for consistency
        trace_id = span.get_span_context().trace_id
        if (trace_id % 100) < int(self.sampling_rate * 100):
            super().on_end(span)
```

The key insight: use trace_id % 100 for deterministic sampling. This ensures that if one span in a distributed trace is sampled, related spans across services will also be sampled (assuming the same trace_id is propagated).
Initialization follows the principle of explicit configuration over implicit behavior:
```python
from core.telemetry import setup_telemetry
from opentelemetry.instrumentation.threading import ThreadingInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

setup_telemetry(
    environment="production",
    service_name="my-api",
    xray_enabled=True,
    otlp_endpoint="http://localhost:4318",
    sampling_rate=0.05,  # 5% of normal operations
    instrumentors=[
        ThreadingInstrumentor,
        RequestsInstrumentor,
    ],
)
```

By default, the library enables only manual instrumentation (trace_operation(), add_span_event(), set_span_attribute()). You must explicitly pass the instrumentor classes you need. This follows the principle: explicit is better than implicit.
If instrumentors is None or [], only manual tracing is available. No automatic instrumentation occurs.
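For reference, the instrumentor wiring inside setup_telemetry() can be as simple as the following hypothetical helper; it assumes each class follows OpenTelemetry's standard BaseInstrumentor interface, which ThreadingInstrumentor and RequestsInstrumentor both do:

```python
# Hypothetical helper; the actual core.telemetry implementation may differ
def _apply_instrumentors(instrumentors):
    for instrumentor_cls in instrumentors or []:
        instrumentor = instrumentor_cls()
        # BaseInstrumentor exposes this flag to guard against double instrumentation
        if not instrumentor.is_instrumented_by_opentelemetry:
            instrumentor.instrument()
```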
For business logic tracing, use the context manager:
```python
from core.telemetry import trace_operation, add_span_event, set_span_attribute

with trace_operation("process_payment", {"user_id": user.id}):
    add_span_event("validation_started")

    # Your business logic here
    validate_payment(payment_data)

    set_span_attribute("payment_amount", payment_data.amount)
    add_span_event("payment_processed")
```

Error handling is automatic. If an exception occurs inside the context manager, the span is automatically marked with error status and exported (100% capture rate).
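For context, a context manager along these lines produces exactly that behavior. This is a plausible sketch, not the actual core.telemetry implementation:

```python
from contextlib import contextmanager

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode

tracer = trace.get_tracer(__name__)

@contextmanager
def trace_operation(name: str, attributes: dict | None = None):
    with tracer.start_as_current_span(
        name,
        attributes=attributes,
        record_exception=False,       # handled explicitly below
        set_status_on_exception=False,
    ) as span:
        try:
            yield span
        except Exception as exc:
            # Mark the span as failed so ErrorAwareBatchSpanProcessor always exports it
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise
```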
Here's a complete working example:
```python
from flask import Flask, jsonify, request
from opentelemetry.instrumentation.flask import FlaskInstrumentor
from opentelemetry.instrumentation.threading import ThreadingInstrumentor
from opentelemetry.instrumentation.requests import RequestsInstrumentor

from core.telemetry import setup_telemetry, trace_operation

app = Flask(__name__)

setup_telemetry(
    environment="production",
    service_name="payment-api",
    xray_enabled=True,
    otlp_endpoint="http://localhost:4318",
    sampling_rate=0.05,
    instrumentors=[
        ThreadingInstrumentor,  # Context propagation for threads
        RequestsInstrumentor,   # Auto-instrument outgoing HTTP calls
    ],
)

# FlaskInstrumentor requires separate app instrumentation
FlaskInstrumentor().instrument_app(app)

@app.post("/api/process")
def process_data():
    data = request.get_json() or {}
    with trace_operation("process_data", {"user_id": data.get("user_id")}):
        # Business logic
        result = perform_processing(data)
        return jsonify(result), 200

if __name__ == "__main__":
    app.run()
```

Note: FlaskInstrumentor must call .instrument_app(app) separately after setup. It cannot be passed in the instrumentors list like the other instrumentors.
The collector acts as a sidecar, handling AWS authentication and buffering:
docker-compose.yml

```yaml
services:
  otel:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]  # point the collector at the mounted config
    ports:
      - "4318:4318"   # OTLP HTTP
      - "4317:4317"   # OTLP gRPC
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    environment:
      - AWS_REGION=${AWS_REGION}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
```

otel-collector-config.yaml
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  awsxray:
    region: eu-central-1

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [awsxray]
```

Validation: The setup_telemetry() function uses Pydantic's @validate_call decorator to validate parameters at runtime. The sampling_rate parameter must be between 0.0 and 1.0, or a ValidationError will be raised.
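As an illustration, that constraint can be expressed with Pydantic v2 roughly like this (the defaults and the real signature in core.telemetry may differ):

```python
from typing import Annotated, Optional

from pydantic import Field, validate_call

@validate_call
def setup_telemetry(
    environment: str,
    service_name: str,
    xray_enabled: bool = False,           # defaults here are illustrative
    otlp_endpoint: str = "http://localhost:4318",
    sampling_rate: Annotated[float, Field(ge=0.0, le=1.0)] = 0.05,  # enforced at call time
    instrumentors: Optional[list] = None,
) -> None:
    ...

setup_telemetry(environment="production", service_name="my-api", sampling_rate=1.5)  # raises ValidationError
```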
If you're using background threads (Celery...), ThreadingInstrumentor is not optional. Without it, spans created in threads won't be linked to parent traces:

```python
instrumentors=[
    ThreadingInstrumentor,  # Must be first
    ...
]
```

The ErrorAwareBatchSpanProcessor ensures all errors are captured. In production, this means:
- Every failed request traced (100%)
- Every exception traced (100%)
- Normal operations sampled at configured rate (e.g., 5%)
This reduces costs while maintaining debugging capability.
Using trace_id % 100 for sampling ensures consistency across distributed traces. If a trace is sampled in Service A, related spans in Service B will also be sampled (assuming proper context propagation).
In a production API handling ~1M requests/day:
- Without adaptive sampling: ~1M spans/day → ~$150/month
- With 5% sampling + 100% errors: ~50K normal spans + errors → ~$15/month
- Error visibility: 100% (no errors missed)
The cost reduction is significant, and you still capture every error trace.
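A quick back-of-the-envelope check of the volume reduction behind those numbers (the error rate is a made-up illustrative figure):

```python
requests_per_day = 1_000_000
sampling_rate = 0.05
error_rate = 0.001  # hypothetical; errors are always exported regardless of this rate

normal_spans = int(requests_per_day * sampling_rate)  # 50,000/day
error_spans = int(requests_per_day * error_rate)      # 1,000/day, all of them kept
total = normal_spans + error_spans
print(f"{total:,} spans/day exported ({1 - total / requests_per_day:.0%} fewer than exporting everything)")
```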
The included Flask app has several endpoints to demonstrate different scenarios:
```bash
# Health check
curl http://localhost:5000/health

# Normal operation (5% sampling)
curl -X POST http://localhost:5000/api/process \
  -H "Content-Type: application/json" \
  -d '{"user_id": "123"}'

# Random failures (demonstrates 100% error capture)
curl http://localhost:5000/api/random

# External API call (auto-instrumented)
curl http://localhost:5000/api/external

# Nested operations
curl http://localhost:5000/api/nested
```

Check the AWS X-Ray console to see the traces. Notice that:
- All errors appear in X-Ray (100% capture)
- Normal operations appear ~5% of the time
- Nested spans maintain parent-child relationships
This approach works well when:
- You need production observability without exploding costs
- Errors are more important than sampling every successful request
- You're using AWS infrastructure (X-Ray integrates well with other AWS services)
- You want vendor-neutral instrumentation (OpenTelemetry can export to multiple backends)
It's probably overkill for:
- Development environments (just sample everything at 100%)
- Low-traffic services (sampling won't save much money)
- Non-distributed applications (simpler logging might suffice)
And that's all. Adaptive sampling with OpenTelemetry gives you the best of both worlds: comprehensive error visibility and manageable costs.
The complete implementation is available at: github.com/gonzalo123/telemetry