tail_sampling 'and' policy with latency and service name regex invert #13929

Closed
knayakar opened this issue Sep 7, 2022 · 6 comments
Labels: bug (Something isn't working), priority:p2 (Medium), processor/tailsampling (Tail sampling processor), Stale

Comments

knayakar commented Sep 7, 2022

Describe the bug
In our sampling policy, we are trying to exclude latency-based sampling for one specific service while still capturing high-latency traces from all other services. We wrote the policy below:

policies:
          [
              {
                name: http_status_code,
                type: string_attribute,
                string_attribute: {key: http.status_code,  values: ["4[0-9][0&&2-9]", "4[1-9][0-9]", "5[0-9][0-9]"], enabled_regex_matching: true}
              },
              {
                name: example-latency-and-service-policy,
                type: and,
                and: {
                  and_sub_policy:
                  [
                    {
                      name: example-service-policy,
                      type: string_attribute,
                      string_attribute: { key: service.name, values: [ myapp ], enabled_regex_matching: true, invert_match: true,  }
                    },
                    {
                      name: example-latency-policy,
                      type: latency,
                      latency: {threshold_ms: 5000}
                    }
                  ]
                }
              }
          ]
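
The decision we expect from this and policy, expressed as plain boolean logic, is roughly the sketch below (illustrative Go only; spanInfo and shouldSample are our own names, not types from the collector):

package main

import (
	"fmt"
	"regexp"
	"time"
)

// spanInfo is a simplified stand-in for the per-trace data the tail sampler evaluates.
type spanInfo struct {
	serviceName string
	latency     time.Duration
}

// shouldSample mirrors the intent of example-latency-and-service-policy:
// keep the trace only when service.name does NOT match "myapp" (invert_match: true)
// AND the latency exceeds the 5000 ms threshold.
func shouldSample(s spanInfo) bool {
	serviceMatches := regexp.MustCompile("myapp").MatchString(s.serviceName)
	return !serviceMatches && s.latency > 5000*time.Millisecond
}

func main() {
	fmt.Println(shouldSample(spanInfo{"myapp", 6 * time.Second}))    // false: excluded service
	fmt.Println(shouldSample(spanInfo{"otherapp", 6 * time.Second})) // true: slow trace elsewhere
	fmt.Println(shouldSample(spanInfo{"otherapp", time.Second}))     // false: fast trace
}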

Steps to reproduce
Run otel-collector-contrib version 0.55.0 with the config below:

apiVersion: v1
kind: ConfigMap
metadata:
  name: <sampler>
  labels:
    app.kubernetes.io/name: sampler
  namespace: <any-namespace>
data:
  config.yaml: |-
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
        timeout: 10s
      memory_limiter:
        # 80% of maximum memory up to 2G
        limit_mib: 1500
        # 25% of limit up to 2G
        spike_limit_mib: 512
        check_interval: 5s
      tail_sampling:
        decision_wait: 120s
        policies:
          [
              {
                name: http_status_code,
                type: string_attribute,
                string_attribute: {key: http.status_code,  values: ["4[0-9][0&&2-9]", "4[1-9][0-9]", "5[0-9][0-9]"], enabled_regex_matching: true}
              },
              {
                name: latency-and-policy,
                type: and,
                and: {
                  and_sub_policy:
                  [
                    {
                      name: service-policy,
                      type: string_attribute,
                      string_attribute: { key: service.name, values: [ myapp ] , invert_match: true }
                    },
                    {
                      name: latency-policy,
                      type: latency,
                      latency: {threshold_ms: 5000}
                    }
                  ]
                }
              }
          ]
    extensions:
      health_check:
      memory_ballast:
        # Memory Ballast size should be max 1/3 to 1/2 of memory
        size_mib: 683
    exporters:
      jaeger:
        endpoint: <endpoint>:<port> 
        tls:
          insecure: true
        sending_queue:
          enabled: true
          num_consumers: 20
          queue_size: 10000
        retry_on_failure:
          enabled: true
          initial_interval: 10s
          max_interval: 60s
          max_elapsed_time: 10m
        timeout: 5s
      logging:
        loglevel: debug
    service:
      extensions: [health_check, memory_ballast]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, tail_sampling]
          exporters: [jaeger]

What did you expect to see?
The collector starts normally and samples traces whose latency exceeds 5 seconds and whose service.name does not match myapp (the inverted match).

What did you see instead?
The collector pod goes into CrashLoopBackOff with the logs below:

2022-09-06T23:31:36.977Z        info    service/service.go:129  Everything is ready. Begin running and processing data.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x2c8c550]

goroutine 56 [running]:
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor/internal/sampling.(*And).Evaluate(0x7?, {{0x8a, 0xf7, 0x4b, 0x83, 0x2e, 0x91, 0x4a, 0x6a, 0x3c, ...}}, ...)
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor@v0.59.0/internal/sampling/and.go:44 +0x70
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor.(*tailSamplingSpanProcessor).makeDecision(0xc000cec790, {{0x8a, 0xf7, 0x4b, 0x83, 0x2e, 0x91, 0x4a, 0x6a, 0x3c, ...}}, ...)
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor@v0.59.0/processor.go:229 +0x1c4
github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor.(*tailSamplingSpanProcessor).samplingPolicyOnTick(0xc000cec790)
        github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor@v0.59.0/processor.go:178 +0x811
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/timeutils.(*PolicyTicker).OnTick(...)
        github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.59.0/timeutils/ticker_helper.go:56
github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/timeutils.(*PolicyTicker).Start.func1()
        github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.59.0/timeutils/ticker_helper.go:47 +0x32
created by github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal/timeutils.(*PolicyTicker).Start
        github.com/open-telemetry/opentelemetry-collector-contrib/internal/coreinternal@v0.59.0/timeutils/ticker_helper.go:43 +0xad
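
The trace points at the and policy's Evaluate. For illustration only, the sketch below shows how an and-style evaluator panics with exactly this kind of SIGSEGV when one of its sub-policy evaluators was never constructed (i.e. is nil); the type and function names here are ours, not the processor's actual source:

package main

import "fmt"

// Decision and PolicyEvaluator are simplified stand-ins for the processor's
// sampling types; all names below are illustrative.
type Decision int

const (
	NotSampled Decision = iota
	Sampled
)

type PolicyEvaluator interface {
	Evaluate(traceID string) (Decision, error)
}

// alwaysSampled stands in for a working sub-policy such as the latency evaluator.
type alwaysSampled struct{}

func (alwaysSampled) Evaluate(string) (Decision, error) { return Sampled, nil }

// andEvaluator samples a trace only if every sub-policy samples it.
type andEvaluator struct {
	subPolicies []PolicyEvaluator
}

func (a *andEvaluator) Evaluate(traceID string) (Decision, error) {
	for _, sub := range a.subPolicies {
		// If sub is a nil interface (its evaluator was never built, e.g. because
		// the sub-policy config was not recognized), this call panics with
		// "invalid memory address or nil pointer dereference".
		d, err := sub.Evaluate(traceID)
		if err != nil || d == NotSampled {
			return NotSampled, err
		}
	}
	return Sampled, nil
}

func main() {
	// The nil entry stands in for a sub-policy whose evaluator was not created.
	and := &andEvaluator{subPolicies: []PolicyEvaluator{alwaysSampled{}, nil}}
	fmt.Println(and.Evaluate("trace-1")) // panics before printing
}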

What version did you use?
otel-collector-contrib v0.55.0 and otel-collector-contrib v0.59.0

@knayakar knayakar added the bug Something isn't working label Sep 7, 2022
@knayakar knayakar changed the title sampling and policy with latency and service name regex invert tail_sampling 'and' policy with latency and service name regex invert Sep 7, 2022
@mx-psi mx-psi added the processor/tailsampling Tail sampling processor label Sep 7, 2022

github-actions bot commented Sep 7, 2022

Pinging code owners: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jpkrohling jpkrohling self-assigned this Sep 8, 2022
jpkrohling (Member) commented

@mottibec, would you be interested in taking a look at this one?


mottibec commented Sep 8, 2022

@jpkrohling yep, I'm on it; you can assign it to me.

satwika007 commented

We noticed that pull request #11505 addresses our issue. Can you please take a look, @mottibec?

github-actions bot (Contributor) commented

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 10, 2022
mottibec (Contributor) commented

@jpkrohling we can close this, as #11505 fixes it.
