Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-local storage for Tracer.state #2475

Merged
merged 2 commits into from Mar 7, 2024

Conversation

markiz
Copy link
Contributor

@markiz markiz commented Mar 2, 2024

Overview

Background

We were working on a certain sidekiq job that was generating a large (300+mb) CSV file, and to deal with the growing memory usage, we've implemented streaming. That implementation included using custom Ruby Enumerators.

When we launched it in production, we've noticed that NewRelic no longer traces database calls within the sidekiq job transaction. After some debugging, it turned out that the reason for that was that NewRelic holds its tracer state in the Thread#[...] storage. However, Thread#[...] is actually a fiber-local storage, and our implementation specifically used Enumerator#next, which spawns a new Fiber. As a result, NewRelic lost track of the current tracer state and attributed everything within that enumerator to CPU burn.

Support case where this issue originated is at https://support.newrelic.com/s/case-details?caseId=500Ph000007YRJX

Proposed Solution

This PR includes a small new abstraction NewRelic::ThreadLocalStorage that is used instead of the Thread#[...] where applicable. Its behavior is controlled by the new configuration option: thread_local_tracer_state (default: false) — when false, keeps the original behavior, when true, uses Thread#thread_variable_get and Thread#thread_variable_set in place of Thread#[] and Thread#[]= respectively. I'm not 100% sure how safe it would be to use the new behavior across the board, because there could be fiber-based server implementations that also use NewRelic and rely on that storage being fiber-local, hence the configuration option.

There is also one spot where I wasn't entirely sure whether a change from Thread.current[] to ThreadLocalStorage[] makes sense (NewRelic::Agent::Instrumentation::NotificationsSubscriber#segment_stack), so I haven't touched that.

Submitter Checklist:

  • Include a link to the related GitHub issue, if applicable
  • Include a security review link, if applicable

Reviewer Checklist

  • Perform code review
  • Add performance label
  • Perform appropriate level of performance testing
  • Confirm all checks passed
  • Add version label prior to acceptance

@CLAassistant
Copy link

CLAassistant commented Mar 2, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the community To tag external issues and PRs submitted by the community label Mar 2, 2024
@kford-newrelic kford-newrelic added the estimate Issue needing estimation label Mar 4, 2024
@markiz
Copy link
Contributor Author

markiz commented Mar 6, 2024

Hey, regarding that test failure: so the test basically makes Agent.config throw an exception which doesn't get logged. I could code around that defensively, like this (the tests are passing on my machine after the change):

diff --git a/lib/new_relic/thread_local_storage.rb b/lib/new_relic/thread_local_storage.rb
index 0d3e161cb..2f2766ba9 100644
--- a/lib/new_relic/thread_local_storage.rb
+++ b/lib/new_relic/thread_local_storage.rb
@@ -5,7 +5,7 @@
 module NewRelic
   module ThreadLocalStorage
     def self.get(thread, key)
-      if Agent.config[:thread_local_tracer_state]
+      if use_thread_local_tracer_state?
         thread.thread_variable_get(key)
       else
         thread[key]
@@ -13,7 +13,7 @@ module NewRelic
     end

     def self.set(thread, key, value)
-      if Agent.config[:thread_local_tracer_state]
+      if use_thread_local_tracer_state?
         thread.thread_variable_set(key, value)
       else
         thread[key] = value
@@ -27,5 +27,11 @@ module NewRelic
     def self.[]=(key, value)
       set(::Thread.current, key, value)
     end
+
+    def self.use_thread_local_tracer_state?
+      Agent.config[:thread_local_tracer_state]
+    rescue StandardError
+      false
+    end
   end
 end

... or we could change the test itself, what do you think?

@markiz
Copy link
Contributor Author

markiz commented Mar 6, 2024

Here's a version that changes the test rather than the code:

--- a/test/multiverse/suites/rake/instrumentation_test.rb
+++ b/test/multiverse/suites/rake/instrumentation_test.rb
@@ -57,7 +57,7 @@ class RakeInstrumentationTest < Minitest::Test
     NewRelic::Agent::Instrumentation::Rake.stub :should_trace?, true, [instance.name] do
       error = RuntimeError.new('expected')
       # produce the exception we want to have the method rescue
-      NewRelic::Agent.stub :config, -> { raise error } do
+      NewRelic::Agent.stub :instance, -> { raise error } do
         logger = MiniTest::Mock.new
         NewRelic::Agent.stub :logger, logger do
           logger.expect :error, nil, [/^Exception/, error]

Also makes the tests pass on my machine. I like this one better — as long as we generally don't expect Agent.config to actually raise exceptions.

@@ -9,8 +9,7 @@ class AgentThread
def self.create(label, &blk)
::NewRelic::Agent.logger.debug("Creating AgentThread: #{label}")
wrapped_blk = proc do
if ::Thread.current[:newrelic_tracer_state] && Thread.current[:newrelic_tracer_state].current_transaction
txn = ::Thread.current[:newrelic_tracer_state].current_transaction
if (txn = ::NewRelic::ThreadLocalStorage[:newrelic_tracer_state]&.current_transaction)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice syntax update!

@hannahramadan
Copy link
Contributor

Hi @markiz—thanks for your PR! This is a great improvement to how we handle thread state. Thanks for the extra effort of putting this change behind a configuration and adding tests.

For the failing test, we like your second option better too (changing the test to stubbing :instance). Once that test is updated, we will be happy to get this approved and merged!

@markiz
Copy link
Contributor Author

markiz commented Mar 7, 2024

@hannahramadan alright, I've amended the MR, let's check it out

Copy link
Contributor

@kaylareopelle kaylareopelle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @markiz!

@hannahramadan hannahramadan merged commit 6be912c into newrelic:dev Mar 7, 2024
25 checks passed
@hannahramadan hannahramadan mentioned this pull request Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community To tag external issues and PRs submitted by the community estimate Issue needing estimation
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

5 participants