Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Notifications::Fanout faster and safer #44469

Merged
merged 1 commit into from
Jun 2, 2022

Conversation

jhawthorn
Copy link
Member

@jhawthorn jhawthorn commented Feb 17, 2022

@seejohnrun and I wrote this on his Twitch stream Tuesday night. Thanks those who helped us in chat 😁

This commit aims to improve ActiveSupport::Notifications::Fanout. There are three main goals here: backwards compatibility, safety, and performance.

Backwards compatibility

ActiveSupport::Notifications is an old and well used interface. Over time it has collected a lot of features and flexibility, much of which I suspect is not used anywhere by anyone, but it is hard to know specifics and we would at minimum need a deprecation cycle to remove any of them.

For this reason this aims to fully maintain compatibility. This includes both the ability to use an alternate notification implementation instead of Fanout, the signatures received by all types of listeners, and the interface used on the Instrumenter and Fanout itself (including the sometimes problematic start/finish).

Safety

There have been issues (both recent and past, ex. #44167 #21873) with the "timestacks" becoming invalid, particularly when subscribing and unsubscribing within events. This is an issue when topics are subscribed/unsubscribed to while they are in flight.

The previous implementation would record a separate timestamp or event object for each listener in a thread local stack. This meant that it was essential that the listeners to start and finish were identical.

This issue is avoided by passing the listeners used to start the event to finish (finish_with_state in the Instrumenter), to ensure they are the same set in start/finish.

This commit further avoids this issue. Instead of pushing individual times onto a stack, we now push a single object, Handle, onto the stack for an event. This object holds all the subscribers (recorded at start time) and all their associated data. This means that as long as start/stop calls are not interleaved they too are now safe. When Instrumenter#instrument is used we can avoid using thread-locals entirely!

This commit also exposes build_handle as a public interface. This returns the Handle object which can have start/stop called at any time and any order safely. The one reservation I have with making this public is that existing "evented" listeners (those receiving start/stop) may not be ready for that (ex. if they maintain an internal thread-local stack).

Performance

This aims to be faster and make fewer allocations then the existing implementation.

For time-based and event-object-based listeners, the previous implementation created a separate object for each listener, pushing and popping it on a thread-local stack. This is slower both because we need to access the thread local repeatedly (hash lookups) and because we're allocating duplicate objects.

The new implementation works by grouping similar types of listeners together and shares either the Event or start/stop times between all of them. The grouping was done so that we didn't need to allocate Events or Times for topics which did have a listener of that type.

This implementation is significantly faster for all cases, except for evented, which is slower.

For topics with 10 subscriptions:

benchmark
# frozen_string_literal: true

require 'benchmark/ips'
require 'active_support/all'

class EventedSubscriber
  def start(name, id, payload); end
  def finish(name, id, payload); end
end

10.times do
  ActiveSupport::Notifications.subscribe("timed") do |name, started, ended, id, payload|
  end

  ActiveSupport::Notifications.monotonic_subscribe("timed_monotonic") do |name, started, ended, id, payload|
  end

  ActiveSupport::Notifications.subscribe("event_object") do |event|
  end

  ActiveSupport::Notifications.subscribe("evented", EventedSubscriber.new)
end

Benchmark.ips do |x|
  x.report("timed") do
    ActiveSupport::Notifications.instrument("timed"){}
  end
  x.report("timed_monotonic") do
    ActiveSupport::Notifications.instrument("timed_monotonic"){}
  end
  x.report("event_object") do
    ActiveSupport::Notifications.instrument("event_object"){}
  end
  x.report("evented") do
    ActiveSupport::Notifications.instrument("evented"){}
  end
  x.report("unsubscribed") do
    ActiveSupport::Notifications.instrument("unsubscribed"){}
  end
end

main:

           timed     66.739k (± 2.5%) i/s -    338.800k in   5.079883s
 timed_monotonic    138.265k (± 0.6%) i/s -    699.261k in   5.057575s
    event_object     48.650k (± 0.2%) i/s -    244.250k in   5.020614s
         evented    366.559k (± 1.0%) i/s -      1.851M in   5.049727s
    unsubscribed      3.696M (± 0.5%) i/s -     18.497M in   5.005335s

This branch:

           timed    259.031k (± 0.6%) i/s -      1.302M in   5.025612s
 timed_monotonic    327.439k (± 1.7%) i/s -      1.665M in   5.086815s
    event_object    228.991k (± 0.3%) i/s -      1.164M in   5.083539s
         evented    296.057k (± 0.3%) i/s -      1.501M in   5.070315s
    unsubscribed      3.670M (± 0.3%) i/s -     18.376M in   5.007095s

@jhawthorn jhawthorn changed the title Make Notifier::Fanout faster and safer Make Notifications::Fanout faster and safer Feb 17, 2022
This commit aims to improve ActiveSupport::Notifications::Fanout. There
are three main goals here: backwards compatibility, safety, and
performance.

* Backwards compatibility

This ActiveSupport::Notifications is an old and well used interface.
Over time it has collected a lot of features and flexibility, much of
which I suspect is not used anywhere by anyone, but it is hard to know
specifics and we would at minimum need a deprecation cycle.

For this reason this aims to fully maintain compatibility. This includes
both the ability to use an alternate notification implementation instead
of Fanout, the signatures received by all types of listeners, and the
interface used on the Instrumenter and Fanout itself (including the
sometimes problematic start/finish).

* Safety

There have been issues (both recent and past) with the "timestacks"
becoming invalid, particularly when subscribing and unsubscribing within
events. This is an issue when topics are subscribed/unsubscribed to
while they are in flight.

The previous implementation would record a separate timestamp or event
object for each listener in a thread local stack. This meant that it was
essential that the listeners to start and finish were identical.

This issue is avoided by passing the listeners used to `start` the event
to `finish` (`finish_with_state` in the Instrumenter), to ensure they
are the same set in `start`/`finish`.

This commit further avoids this issue. Instead of pushing individual
times onto a stack, we now push a single object, `Handle`, onto the
stack for an event. This object holds all the subscribers (recorded at
start time) and all their associated data. This means that as long as
start/stop calls are not interleaved.

This commit also exposes `build_handle` as a public interface. This
returns the Handle object which can have start/stop called at any time
and any order safely. The one reservation I have with making this public
is that existing "evented" listeners (those receiving start/stop) may
not be ready for that (ex. if they maintain an internal thread-local
stack).

* Performance

This aims to be faster and make fewer allocations then the existing
implementation.

For time-based and event-object-based listeners, the previous
implementation created a separate object for each listener, pushing
and popping it on a thread-local stack. This is slower both because we
need to access the thread local repeatedly (hash lookups) and because
we're allocating duplicate objects.

The new implementation works by grouping similar types of listeners
together and shares either the `Event` or start/stop times between all
of them. The grouping was done so that we didn't need to allocate Events
or Times for topics which did have a listener of that type.

This implementation is significantly faster for all cases, except for
evented, which is slower.

For topics with 10 subscriptions:

*main*:

               timed     66.739k (± 2.5%) i/s -    338.800k in   5.079883s
     timed_monotonic    138.265k (± 0.6%) i/s -    699.261k in   5.057575s
        event_object     48.650k (± 0.2%) i/s -    244.250k in   5.020614s
             evented    366.559k (± 1.0%) i/s -      1.851M in   5.049727s
        unsubscribed      3.696M (± 0.5%) i/s -     18.497M in   5.005335s

*This branch*:

               timed    259.031k (± 0.6%) i/s -      1.302M in   5.025612s
     timed_monotonic    327.439k (± 1.7%) i/s -      1.665M in   5.086815s
        event_object    228.991k (± 0.3%) i/s -      1.164M in   5.083539s
             evented    296.057k (± 0.3%) i/s -      1.501M in   5.070315s
        unsubscribed      3.670M (± 0.3%) i/s -     18.376M in   5.007095s

Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
Co-authored-by: Theo Julienne <theojulienne@github.com>
@jhawthorn jhawthorn merged commit 693c14c into rails:main Jun 2, 2022
@jhawthorn jhawthorn added this to the 7.1.0 milestone Jun 9, 2022
jhawthorn added a commit to jhawthorn/rails that referenced this pull request Oct 25, 2023
Previously in rails#43282, which shipped
with Rails 7.0, we added the guarantee that if an exception occurred
within an ActiveSupport::Notificaitons subscriber that the remaining
subscribers would still be run.

This was broken in rails#44469, where the
different types of subscribers were broken into groups by type for
performance. Although we would guard exceptions and allways run all (or
none) of the same group, that didn't work cross-group with different
types of subscriber.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant