Default overrides ignored when per_tenant_overrides is set #4996

@romainlaurent

Description

Describe the bug
The default values specified in overrides.defaults are not applied to tenant configurations when per_tenant_override_config is used. When a tenant is defined in the override file but doesn't specify certain limits, those limits appear to fall back to 0 instead of inheriting the values from overrides.defaults in the main configuration.

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (2.7.1)
  2. Write traces for a tenant that is present in the per-tenant override file but does not set ingestion.rate_limit_bytes there, while overrides.defaults does (the override file is sketched below)
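
For reference, the per-tenant override file at /etc/runtime-config/tempo-0.yaml is not pasted verbatim here; based on the /status/runtime_config output further down, it should look roughly like this (only the tenant entry, with no ingestion limits set):

  overrides:
    dev:
      compaction:
        block_retention: 1w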

Expected behavior
When a tenant is defined in the override file but doesn't specify certain limits (like ingestion.rate_limit_bytes), it should inherit the default values from the main configuration (as is the case for Mimir and Loki). In this case, tenant "b4af8459-3937-462b-9886-fd749be4f6ad" should end up with (see the configuration under Additional Context, and the sketch after this list):

  • ingestion.rate_limit_bytes = 15000000
  • Other default values as specified in the main configuration
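
To make the expectation concrete, here is a sketch (my expectation, not actual Tempo output) of the effective limits for a tenant that only sets compaction.block_retention in the override file, i.e. overrides.defaults merged with the per-tenant entry:

  ingestion:
    burst_size_bytes: 20000000        # inherited from overrides.defaults
    max_traces_per_user: 10000        # inherited from overrides.defaults
    rate_limit_bytes: 15000000        # inherited from overrides.defaults
  read:
    max_bytes_per_tag_values_query: 1000000   # inherited from overrides.defaults
  compaction:
    block_retention: 1w               # from the per-tenant override file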

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: helm

Additional Context
Here is my config:

tempo.yaml: |2
  cache:
    caches:
    - memcached:
        consistent_hash: true
        host: 'tempo-0-memcached'
        service: memcached-client
        timeout: 500ms
      roles:
      - parquet-footer
      - bloom
      - frontend-search
  compactor:
    compaction:
      block_retention: 768h
      compacted_block_retention: 1h
      compaction_cycle: 30s
      compaction_window: 1h
      max_block_bytes: 107374182400
      max_compaction_objects: 6000000
      max_time_per_tenant: 5m
      retention_concurrency: 10
      v2_in_buffer_bytes: 5242880
      v2_out_buffer_bytes: 20971520
      v2_prefetch_traces_count: 1000
    ring:
      kvstore:
        store: memberlist
  distributor:
    receivers:
      jaeger:
        protocols:
          thrift_http:
            endpoint: 0.0.0.0:14268
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      zipkin:
        endpoint: 0.0.0.0:9411
    ring:
      kvstore:
        store: memberlist
  ingester:
    lifecycler:
      ring:
        kvstore:
          store: memberlist
        replication_factor: 3
      tokens_file_path: /var/tempo/tokens.json
  memberlist:
    abort_if_cluster_join_fails: false
    bind_addr: []
    bind_port: 7946
    cluster_label: 'tempo-0.tempo-0'
    gossip_interval: 1s
    gossip_nodes: 2
    gossip_to_dead_nodes_time: 30s
    join_members:
    - dns+tempo-0-gossip-ring:7946
    leave_timeout: 5s
    left_ingesters_timeout: 5m
    max_join_backoff: 1m
    max_join_retries: 10
    min_join_backoff: 1s
    node_name: ""
    packet_dial_timeout: 5s
    packet_write_timeout: 5s
    pull_push_interval: 30s
    randomize_node_name: true
    rejoin_interval: 0s
    retransmit_factor: 2
    stream_timeout: 10s
  multitenancy_enabled: true
  overrides:
    defaults:
      ingestion:
        burst_size_bytes: 20000000
        max_traces_per_user: 10000
        rate_limit_bytes: 15000000
      read:
        max_bytes_per_tag_values_query: 1000000
    per_tenant_override_config: /etc/runtime-config/tempo-0.yaml
  querier:
    frontend_worker:
      frontend_address: tempo-0-query-frontend-discovery:9095
    max_concurrent_queries: 20
    search:
      query_timeout: 30s
    trace_by_id:
      query_timeout: 10s
  query_frontend:
    max_outstanding_per_tenant: 2000
    max_retries: 2
    metrics:
      concurrent_jobs: 1000
      duration_slo: 0s
      interval: 5m
      max_duration: 3h
      query_backend_after: 30m
      target_bytes_per_job: 104857600
      throughput_bytes_slo: 0
    search:
      concurrent_jobs: 1000
      target_bytes_per_job: 104857600
    trace_by_id:
      query_shards: 50
  server:
    grpc_server_max_recv_msg_size: 4194304
    grpc_server_max_send_msg_size: 4194304
    http_listen_port: 3100
    http_server_read_timeout: 30s
    http_server_write_timeout: 30s
    log_format: logfmt
    log_level: info
  storage:
    trace:
      backend: s3
      blocklist_poll: 5m
      local:
        path: /var/tempo/traces
      pool:
        max_workers: 400
        queue_depth: 20000
      s3:
        access_key: <secret>
        bucket: <secret>
        endpoint: <secret>
        secret_key: <secret>
      wal:
        path: /var/tempo/wal
  usage_report:
    reporting_enabled: false

Here are the logs when I push traces to Tempo:

level=error ts=2025-04-14T16:39:08.7461777Z caller=rate_limited_logger.go:38 msg="pusher failed to consume trace data" err="rpc error: code = ResourceExhausted desc = RATE_LIMITED: ingestion rate limit (local: 0 bytes, global: 0 bytes) exceeded while adding 1611 bytes for user dev"

Here are the results of GET /status/runtime_config?mode=default:

GET /status/runtime_config
defaults:
  ingestion:
    rate_strategy: local
    rate_limit_bytes: 15000000
    burst_size_bytes: 20000000
    max_traces_per_user: 10000
  read:
    max_bytes_per_tag_values_query: 1000000
  metrics_generator:
    generate_native_histograms: classic
    ingestion_time_range_slack: 0s
  global:
    max_bytes_per_trace: 5000000
overrides:
  dev:
    compaction:
      block_retention: 1w

And here are the results of GET /status/runtime_config?mode=diff:

GET /status/runtime_config
overrides:
  dev:
    compaction:
      block_retention: 1w

Thank you all for your time and for the great product you provide!
