Make Notifications::Fanout faster and safer #44469

jhawthorn · 2022-02-17T20:59:50Z

@seejohnrun and I wrote this on his Twitch stream Tuesday night. Thanks those who helped us in chat 😁

This commit aims to improve ActiveSupport::Notifications::Fanout. There are three main goals here: backwards compatibility, safety, and performance.

Backwards compatibility

ActiveSupport::Notifications is an old and well used interface. Over time it has collected a lot of features and flexibility, much of which I suspect is not used anywhere by anyone, but it is hard to know specifics and we would at minimum need a deprecation cycle to remove any of them.

For this reason this aims to fully maintain compatibility. This includes both the ability to use an alternate notification implementation instead of Fanout, the signatures received by all types of listeners, and the interface used on the Instrumenter and Fanout itself (including the sometimes problematic start/finish).

Safety

There have been issues (both recent and past, ex. #44167 #21873) with the "timestacks" becoming invalid, particularly when subscribing and unsubscribing within events. This is an issue when topics are subscribed/unsubscribed to while they are in flight.

The previous implementation would record a separate timestamp or event object for each listener in a thread local stack. This meant that it was essential that the listeners to start and finish were identical.

This issue is avoided by passing the listeners used to start the event to finish (finish_with_state in the Instrumenter), to ensure they are the same set in start/finish.

This commit further avoids this issue. Instead of pushing individual times onto a stack, we now push a single object, Handle, onto the stack for an event. This object holds all the subscribers (recorded at start time) and all their associated data. This means that as long as start/stop calls are not interleaved they too are now safe. When Instrumenter#instrument is used we can avoid using thread-locals entirely!

This commit also exposes build_handle as a public interface. This returns the Handle object which can have start/stop called at any time and any order safely. The one reservation I have with making this public is that existing "evented" listeners (those receiving start/stop) may not be ready for that (ex. if they maintain an internal thread-local stack).

Performance

This aims to be faster and make fewer allocations then the existing implementation.

For time-based and event-object-based listeners, the previous implementation created a separate object for each listener, pushing and popping it on a thread-local stack. This is slower both because we need to access the thread local repeatedly (hash lookups) and because we're allocating duplicate objects.

The new implementation works by grouping similar types of listeners together and shares either the Event or start/stop times between all of them. The grouping was done so that we didn't need to allocate Events or Times for topics which did have a listener of that type.

This implementation is significantly faster for all cases, except for evented, which is slower.

For topics with 10 subscriptions:

benchmark

# frozen_string_literal: true

require 'benchmark/ips'
require 'active_support/all'

class EventedSubscriber
  def start(name, id, payload); end
  def finish(name, id, payload); end
end

10.times do
  ActiveSupport::Notifications.subscribe("timed") do |name, started, ended, id, payload|
  end

  ActiveSupport::Notifications.monotonic_subscribe("timed_monotonic") do |name, started, ended, id, payload|
  end

  ActiveSupport::Notifications.subscribe("event_object") do |event|
  end

  ActiveSupport::Notifications.subscribe("evented", EventedSubscriber.new)
end

Benchmark.ips do |x|
  x.report("timed") do
    ActiveSupport::Notifications.instrument("timed"){}
  end
  x.report("timed_monotonic") do
    ActiveSupport::Notifications.instrument("timed_monotonic"){}
  end
  x.report("event_object") do
    ActiveSupport::Notifications.instrument("event_object"){}
  end
  x.report("evented") do
    ActiveSupport::Notifications.instrument("evented"){}
  end
  x.report("unsubscribed") do
    ActiveSupport::Notifications.instrument("unsubscribed"){}
  end
end

main:

           timed     66.739k (± 2.5%) i/s -    338.800k in   5.079883s
 timed_monotonic    138.265k (± 0.6%) i/s -    699.261k in   5.057575s
    event_object     48.650k (± 0.2%) i/s -    244.250k in   5.020614s
         evented    366.559k (± 1.0%) i/s -      1.851M in   5.049727s
    unsubscribed      3.696M (± 0.5%) i/s -     18.497M in   5.005335s

This branch:

           timed    259.031k (± 0.6%) i/s -      1.302M in   5.025612s
 timed_monotonic    327.439k (± 1.7%) i/s -      1.665M in   5.086815s
    event_object    228.991k (± 0.3%) i/s -      1.164M in   5.083539s
         evented    296.057k (± 0.3%) i/s -      1.501M in   5.070315s
    unsubscribed      3.670M (± 0.3%) i/s -     18.376M in   5.007095s

This commit aims to improve ActiveSupport::Notifications::Fanout. There are three main goals here: backwards compatibility, safety, and performance. * Backwards compatibility This ActiveSupport::Notifications is an old and well used interface. Over time it has collected a lot of features and flexibility, much of which I suspect is not used anywhere by anyone, but it is hard to know specifics and we would at minimum need a deprecation cycle. For this reason this aims to fully maintain compatibility. This includes both the ability to use an alternate notification implementation instead of Fanout, the signatures received by all types of listeners, and the interface used on the Instrumenter and Fanout itself (including the sometimes problematic start/finish). * Safety There have been issues (both recent and past) with the "timestacks" becoming invalid, particularly when subscribing and unsubscribing within events. This is an issue when topics are subscribed/unsubscribed to while they are in flight. The previous implementation would record a separate timestamp or event object for each listener in a thread local stack. This meant that it was essential that the listeners to start and finish were identical. This issue is avoided by passing the listeners used to `start` the event to `finish` (`finish_with_state` in the Instrumenter), to ensure they are the same set in `start`/`finish`. This commit further avoids this issue. Instead of pushing individual times onto a stack, we now push a single object, `Handle`, onto the stack for an event. This object holds all the subscribers (recorded at start time) and all their associated data. This means that as long as start/stop calls are not interleaved. This commit also exposes `build_handle` as a public interface. This returns the Handle object which can have start/stop called at any time and any order safely. The one reservation I have with making this public is that existing "evented" listeners (those receiving start/stop) may not be ready for that (ex. if they maintain an internal thread-local stack). * Performance This aims to be faster and make fewer allocations then the existing implementation. For time-based and event-object-based listeners, the previous implementation created a separate object for each listener, pushing and popping it on a thread-local stack. This is slower both because we need to access the thread local repeatedly (hash lookups) and because we're allocating duplicate objects. The new implementation works by grouping similar types of listeners together and shares either the `Event` or start/stop times between all of them. The grouping was done so that we didn't need to allocate Events or Times for topics which did have a listener of that type. This implementation is significantly faster for all cases, except for evented, which is slower. For topics with 10 subscriptions: *main*: timed 66.739k (± 2.5%) i/s - 338.800k in 5.079883s timed_monotonic 138.265k (± 0.6%) i/s - 699.261k in 5.057575s event_object 48.650k (± 0.2%) i/s - 244.250k in 5.020614s evented 366.559k (± 1.0%) i/s - 1.851M in 5.049727s unsubscribed 3.696M (± 0.5%) i/s - 18.497M in 5.005335s *This branch*: timed 259.031k (± 0.6%) i/s - 1.302M in 5.025612s timed_monotonic 327.439k (± 1.7%) i/s - 1.665M in 5.086815s event_object 228.991k (± 0.3%) i/s - 1.164M in 5.083539s evented 296.057k (± 0.3%) i/s - 1.501M in 5.070315s unsubscribed 3.670M (± 0.3%) i/s - 18.376M in 5.007095s Co-authored-by: John Crepezzi <john.crepezzi@gmail.com> Co-authored-by: Theo Julienne <theojulienne@github.com>

Previously in rails#43282, which shipped with Rails 7.0, we added the guarantee that if an exception occurred within an ActiveSupport::Notificaitons subscriber that the remaining subscribers would still be run. This was broken in rails#44469, where the different types of subscribers were broken into groups by type for performance. Although we would guard exceptions and allways run all (or none) of the same group, that didn't work cross-group with different types of subscriber.

rails-bot bot added the activesupport label Feb 17, 2022

jhawthorn force-pushed the fanout_groups branch from e5fd553 to 8502be0 Compare February 17, 2022 21:02

jhawthorn changed the title ~~Make Notifier::Fanout faster and safer~~ Make Notifications::Fanout faster and safer Feb 17, 2022

jhawthorn force-pushed the fanout_groups branch from 1913792 to dbf2edb Compare June 2, 2022 01:55

jhawthorn merged commit 693c14c into rails:main Jun 2, 2022

jhawthorn mentioned this pull request Jun 6, 2022

Use AS::N build_handle API in Rails::Rack::Logger #45281

Merged

jhawthorn added this to the 7.1.0 milestone Jun 9, 2022

jhawthorn mentioned this pull request Aug 10, 2022

Optimize AS::LogSubscriber #45796

Merged

jhawthorn mentioned this pull request Sep 6, 2022

Improve handling of overlapping events #44349

Closed

jhawthorn mentioned this pull request Oct 25, 2023

Fix exception guards on multiple subscriber types #49784

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make Notifications::Fanout faster and safer #44469

Make Notifications::Fanout faster and safer #44469

jhawthorn commented Feb 17, 2022 •

edited

Make Notifications::Fanout faster and safer #44469

Make Notifications::Fanout faster and safer #44469

Conversation

jhawthorn commented Feb 17, 2022 • edited

Backwards compatibility

Safety

Performance

jhawthorn commented Feb 17, 2022 •

edited