Skip to content

Fix reëntrant events/spans escaping per-layer filters (round 2)#2

Merged
Madel​eine Mortensen (maddiemort) merged 7 commits intodittofrom
sm/re-entrant-per-layer-escape-2
Jun 20, 2024
Merged

Fix reëntrant events/spans escaping per-layer filters (round 2)#2
Madel​eine Mortensen (maddiemort) merged 7 commits intodittofrom
sm/re-entrant-per-layer-escape-2

Conversation

@maddiemort
Copy link
Copy Markdown
Member

Replaces #1, still a fork-internal PR. Still motivated by getditto/ditto#13306, still see tokio-rs#2704 for context & discussion.

This fix includes two major changes, the second of which was also in #1:

  • When a per-layer filter is asked to check whether it enabled or disabled an event/span, it clears a bit in the filter state to indicate that it has done so. The next time that function (did_enable()) is called, it will check that bit, see that the filter state has been cleared, and recalculate the enablement status based on the current event/span's metadata. In that way, the filter state is now treated as a cache, which is invalidated when we would have read from stale state. Reëntrant events can therefore no longer clobber the filter state.
  • Filtered layers, which are layers that have a per-layer filter attached, no longer ever store their interest as "always", instead knocking down an "always" interest to "sometimes". This likely represents a performance regression, but it's a necessary one in order to force the reëvaluation of enabled() for every span/event - if we use "always" and short-circuit that call, we see incorrect behavior for some kinds of reëntrant event scenarios (such as the new test enabled_event_inside_event)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not gonna lie, it took me about 4 complete read throughs before I feel like I really grokked it but I feel confident I do now 😅

Nicely done getting this solved - it makes sense to me 👍


At least once I got hung up on the fact that there's self.enabled which then has a disabled bitmask. My simple brain was struggling to reconcile that for a while

@maddiemort
Copy link
Copy Markdown
Member Author

At least once I got hung up on the fact that there's self.enabled which then has a disabled bitmask. My simple brain was struggling to reconcile that for a while

Oh, thanks for saying something about that - that's just a missed rename from when I basically re-applied that part of the diff from the old PR, but from inside my brain. I'll rename enabled to something like filter_map.

@maddiemort Madel​eine Mortensen (maddiemort) deleted the sm/re-entrant-per-layer-escape-2 branch June 20, 2024 16:03
@maddiemort Madel​eine Mortensen (maddiemort) restored the sm/re-entrant-per-layer-escape-2 branch June 20, 2024 16:03
Nick Robinson (nickrobinson) pushed a commit that referenced this pull request Apr 29, 2026
…irectives (tokio-rs#3487)

## Motivation

`EnvFilter` maintains two `RwLock`-protected `HashMap`s (`by_id` and `by_cs`)
for tracking per-span field-level match state. These are only populated when
`has_dynamics` is `true` — i.e., when the filter includes span-name or
field-level directives like `my_crate[span_name]=debug`.

However, the span lifecycle callbacks (`on_new_span`, `on_enter`, `on_exit`,
`on_close`, `on_record`) unconditionally acquire the `by_id` read lock (or write
lock for `on_close`) on every invocation, even when `has_dynamics` is `false`
and the maps are guaranteed empty.

For the common case of pure `target=level` directives (e.g.
`RUST_LOG=info,hyper=warn`), these locks are acquired and released on every span
enter/exit/close/record with no effect.

## Problem

Under high concurrency this causes measurable lock contention. A GDB backtrace
captured from a Tokio-based HTTP server (80+ async workers, 3 `EnvFilter`
instances as per-layer filters) during a load test showed ~25 threads contending
on `RwLock::read` inside `EnvFilter::on_enter`, `on_exit`, and
`cares_about_span`, with threads blocked in `read_contended` by `on_close`'s
write lock:

``` #0 std::sys::sync::rwlock::futex::RwLock::read_contended #1 RwLock::read
(self=0x55e0c7694178) #2 RwLock<HashMap<SpanId, MatchSet<SpanMatch>>>::read #3
EnvFilter::on_enter at filter/env/mod.rs:585 ```

## Solution

Add an early `if !self.has_dynamics { return; }` guard to each of the five span
lifecycle methods. When dynamic directives are present, the existing code path
runs unchanged. When they're absent (the common case), the lock is never
touched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants