
Allow denylisting in the string-attribute-filter #30155

Closed
majelbstoat opened this issue Aug 2, 2019 · 12 comments
Labels: good first issue (Good for newcomers), help wanted (Extra attention is needed), Stale
Milestone: Backlog

Comments

@majelbstoat

Currently, the string-attribute-filter in the OpenCensus collector allows spans to be exported only if they match specified strings, i.e. a whitelist. It is, however, currently not possible to create a blacklist.

One specific use-case for this is to exclude traces for grpc.health.v1.HealthCheck, which are a standard way of adding liveness and readiness endpoints to GRPC servers, but which add lots of noise to trace lists. (The current SpanContext for OpenCensus does not include the span name, so exclusion at that point is not possible either.)

From a flexibility point of view, it would be ideal if there were some sort of simple syntax for including or excluding partial matches too, whether through regexes, globs, etc., though latency might be a consideration. At a minimum though, the ability to blacklist certain strings would be great.

Per @songy23, this is blocked by open-telemetry/opentelemetry-collector#221.
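
For context, here is a rough sketch of what the allowlist-only matching looks like, assuming the string_attribute policy syntax of the tail sampling processor in opentelemetry-collector-contrib; the policy name, attribute key, and values below are purely illustrative:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-important-routes      # illustrative policy name
        type: string_attribute
        string_attribute:
          key: route                     # span attribute to compare
          values:
            - /v1.Checkout/Confirm       # trace is sampled only if a span's attribute matches one of these values

There is no way to express the opposite ("sample everything except traces matching these values"), which is what this issue asks for.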

@ccaraman
Contributor

ccaraman commented Aug 7, 2019

When this does get implemented, please use the term denylist or blockedlist instead of blacklist. Thank you!

@majelbstoat majelbstoat changed the title Allow blacklisting in the string-attribute-filter Allow denylisting in the string-attribute-filter Aug 7, 2019
@majelbstoat
Author

majelbstoat commented Aug 7, 2019

Ah, yes! Thank you, title updated. (Description kept to retain context.)

@pedro-ribeiro
Contributor

Hi. Do we have any suggestions on what the configuration could look like for this?

@majelbstoat
Author

Here's a starting suggestion based on the testdata examples for the tailsampling and attributes processors:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      include:
        numeric_attribute:
          key: httpRequest.latency
          min_value: 1000
      exclude:
        string_attribute:
          key: route
          values:
            - "/v[0-9]+\.Health\.Check$/"
            - "/^admin/"

This example would include for sampling any span that had a request latency of greater than 1 second. It would exclude anything on a route matching the specified regexes, like grpc.health.v1.Health.Check or admin/dashboard. Include takes precedence over exclude, so slow requests are logged even if they match the exclusions. And any span with a parent that is already sampled should be sampled, as is currently the case.

In general, it might be nice to harmonise the config for these processors as much as possible. The AttributesProcessor runs a pre-process stage that decides whether or not to perform any actions. In this respect it is quite similar to the TailSamplingProcessor, differing only in what action it takes (update/add a key/value vs. send/drop the trace). Yet the config used to express this very similar filtering is quite different.

If you're amenable to the spirit of this, I'm happy to have a go at modifying the StringAttributeProcessor to support it. And then we could decide if it makes sense to extract the logic to work for other processors.

tomaszpiekarczyk referenced this issue in tomaszpiekarczyk/opentelemetry-collector Jun 26, 2020
tigrannajaryan referenced this issue in open-telemetry/opentelemetry-collector Jun 26, 2020
Add missing unit tests to the currently implemented attribute filters (`stringAttributeFilter` and `numericAttributeFilter`)

**Link to tracking Issue:** 
This can be considered part of #226
wyTrivail referenced this issue in mxiamxia/opentelemetry-collector Jul 13, 2020
@tigrannajaryan tigrannajaryan added this to the Backlog milestone Jul 30, 2020
@tigrannajaryan tigrannajaryan added the help wanted Extra attention is needed label Jul 30, 2020
@mxiamxia
Member

mxiamxia commented May 6, 2021

Hi @majelbstoat, I am not sure it is a good idea to have a denylist in the tailsamplingprocessor; we may want to keep the sampling processor to a single, simple responsibility: sampling the data we are interested in. If the goal of this issue is to ignore noisy tracing data, the attributesprocessor should handle this job perfectly well.
https://github.com/open-telemetry/opentelemetry-collector/blob/66d460700e7fd8d316c15cc7bb815f9ddb353649/internal/processor/filterconfig/config.go#L45

I also sent a PR to enhance the string-attribute-filter config in the tailsamplingprocessor to support regex rules, but I don't feel it's a good idea to add an exclude/denylist to the string-attribute-filter.
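
For reference, a rough sketch of the include/exclude (MatchProperties) shape that filterconfig provides, as used by the attributes processor; match_type and span_names follow the attributes processor docs, while the processor name, regex, and action below are illustrative:

processors:
  attributes/skip-healthchecks:
    # Apply the actions below only to spans that do NOT match the exclude rule.
    exclude:
      match_type: regexp
      span_names:
        - ".*grpc\\.health\\.v1\\.Health.*"
    actions:
      - key: environment
        value: production
        action: insert

Note that this only scopes which spans the attribute actions apply to; it does not drop the spans themselves.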

@spaletta

Hi @mxiamxia, from my understanding, filtering with the attributes processor (or the span processor) cannot achieve the same effect as filtering in the tailsampling processor. Attributes and span processors can include or exclude individual spans based on their individual properties, while filtering in the tailsampling processor includes or excludes whole traces based on any of the comprising spans.
Noise rarely comes as individual spans only, and so this is a crucial difference. The insufficient filtering capabilities for complete traces (which arguably could be separated from the tailsampling processor) have required me to run a patched version of the tailsampling processor with inverted matching logic at $work for several months now.

@alolita
Member

alolita commented May 21, 2021

@mxiamxia are you working on this issue? Will re-assign otherwise. Lmk. Thx.

@alolita
Member

alolita commented Sep 15, 2021

@mxiamxia are you still planning to submit a PR for this issue? Else I will re-assign. Let me know. Thx.

@shivanshuraj1333
Member

I think this issue isn't progressing; I'd love to come up with a fix for it.
@alolita can I work on it?
Thanks!!

MovieStoreGuy referenced this issue in atlassian-forks/opentelemetry-collector Nov 11, 2021
@Chinwendu20
Contributor

Heyy, is this issue still valid?

@mattsains
Contributor

mattsains commented Nov 9, 2022

I was looking into this issue and I have some questions.

Firstly, I was wondering whether denylisting is functionally the same as string_attribute's invert_match field: #4393

I think you can rewrite @majelbstoat's example (which predates the invert_match feature, by the way) like this:

processors:
  tail_sampling:
    decision_wait: 10s
    num_traces: 100
    expected_new_traces_per_sec: 10
    policies:
      - numeric_attribute:
          key: httpRequest.latency
          min_value: 1000
      - string_attribute:
          key: route
          values:
            - "/v[0-9]+\.Health\.Check$/"
            - "/^admin/"
          invert_match: true  # <--- inverts the rule and turns it into a deny list

Currently, it seems that invert_match is only available for string_attribute, but it seems like we could add it to the other policy types as well.

The benefit of doing this instead of adding include/exclude fields is that it's backward compatible (moving policies into a new attribute, i.e. into include, might be disruptive to existing processor configs). I also feel that people might get confused by having two ways to negate a rule if we added an exclude list.

I'm new to this project, so let me know if I've misunderstood something. Would also appreciate some direction on what to do next. Thanks!
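
To make that concrete, here is a hypothetical sketch of what extending invert_match to numeric_attribute could look like; this field may not exist on numeric_attribute today, and simply mirrors the string_attribute field above:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: drop-fast-requests         # illustrative name
        type: numeric_attribute
        numeric_attribute:
          key: httpRequest.latency
          min_value: 0
          max_value: 1000
          invert_match: true             # hypothetical: sample only when latency is NOT within [0, 1000]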

hughesjj referenced this issue in hughesjj/opentelemetry-collector Apr 27, 2023
@mx-psi mx-psi transferred this issue from open-telemetry/opentelemetry-collector Dec 21, 2023
github-actions bot

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Feb 20, 2024
@atoulme atoulme closed this as not planned Apr 4, 2024