Pass runtime state instead of repeatedly fetching it from Thread.current #4621

rmosolgo · 2023-09-06T14:46:36Z

I ran the benchmarks with YJIT turned on and found that Thread#[] was one of the top methods. I thought I might minimize the hits to that by refactoring a bit. In the end, the change was negligible. I think the reason is because, with YJIT running, most of the runtime ends in "misses," which I'm guessing is optimized code that stackprof can't see.

Large Result Benchmark

Before, YJIT (9.453 i/s)

Calculating -------------------------------------
Querying for 1000 objects
                        9.453  (±10.6%) i/s -     95.000  in  10.096298s
==================================
  Mode: wall(1)
  Samples: 324246 (71.59% miss rate)
  GC: 11313 (3.49%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
   25950   (8.0%)       25950   (8.0%)     Kernel#class
   20545   (6.3%)       20545   (6.3%)     Thread#[]
   19958   (6.2%)       19958   (6.2%)     GraphQL::Execution::Lazy#initialize
   13375   (4.1%)       13375   (4.1%)     Kernel#hash
   11176   (3.4%)       11176   (3.4%)     GraphQL::Execution::Interpreter::Runtime#dead_result?
   30562   (9.4%)       10012   (3.1%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
  303235  (93.5%)        9393   (2.9%)     GraphQL::Execution::Interpreter::Runtime#after_lazy
    9290   (2.9%)        9290   (2.9%)     GraphQL::Schema::Member::HasArguments#own_arguments
    9076   (2.8%)        9076   (2.8%)     GraphQL::Schema.lazy_methods
    8739   (2.7%)        8739   (2.7%)     Kernel#respond_to?

Before, No YJIT (4.943 i/s)

Calculating -------------------------------------
  Querying for 1000 objects
                        4.943  (± 0.0%) i/s -     50.000  in  10.147070s
  ==================================
    Mode: wall(1)
    Samples: 167739 (56.34% miss rate)
    GC: 7384 (4.40%)
  ==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
  104604  (62.4%)        9047   (5.4%)     GraphQL::Execution::Interpreter::Runtime#continue_field
    6650   (4.0%)        6650   (4.0%)     (sweeping)
  118832  (70.8%)        6473   (3.9%)     GraphQL::Execution::Interpreter::Runtime#after_lazy
   74223  (44.2%)        6092   (3.6%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
  103127  (61.5%)        5410   (3.2%)     GraphQL::Execution::Interpreter::Runtime#resolve_list_item
    6590   (3.9%)        4702   (2.8%)     GraphQL::Execution::Interpreter::Runtime#lazy?
   82289  (49.1%)        4512   (2.7%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection
    7029   (4.2%)        4276   (2.5%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
   19960  (11.9%)        4163   (2.5%)     GraphQL::Schema::Field#resolve
    3549   (2.1%)        3549   (2.1%)     GraphQL::Execution::Lazy#initialize
    8347   (5.0%)        3211   (1.9%)     GraphQL::Execution::Interpreter::Runtime#set_result
   14259   (8.5%)        2905   (1.7%)     GraphQL::Schema::Field#with_extensions
  121394  (72.4%)        2771   (1.7%)     Array#each

After, YJIT (9.558 i/s)

Calculating -------------------------------------
Querying for 1000 objects
                        9.558  (±10.5%) i/s -     96.000  in  10.085145s
==================================
  Mode: wall(1)
  Samples: 299263 (71.11% miss rate)
  GC: 11417 (3.82%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
   29462   (9.8%)       29462   (9.8%)     Kernel#class
   17072   (5.7%)       17072   (5.7%)     GraphQL::Execution::Lazy#initialize
   13158   (4.4%)       13158   (4.4%)     Kernel#hash
   11492   (3.8%)       11492   (3.8%)     Thread#[]
    8537   (2.9%)        8537   (2.9%)     GraphQL::Schema.lazy_methods
   35112  (11.7%)        8318   (2.8%)     GraphQL::Execution::Interpreter::Runtime#lazy?
    8161   (2.7%)        8161   (2.7%)     Kernel#respond_to?
    8101   (2.7%)        8101   (2.7%)     GraphQL::Schema::Member::HasArguments#own_arguments

After, No YJIT (5.188 i/s)

 Calculating -------------------------------------
Querying for 1000 objects
                        5.188  (± 0.0%) i/s -     52.000  in  10.032891s
==================================
  Mode: wall(1)
  Samples: 162494 (56.02% miss rate)
  GC: 9416 (5.79%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
    8440   (5.2%)        8440   (5.2%)     (sweeping)
   99386  (61.2%)        8373   (5.2%)     GraphQL::Execution::Interpreter::Runtime#continue_field
  112920  (69.5%)        6398   (3.9%)     GraphQL::Execution::Interpreter::Runtime#after_lazy
   66843  (41.1%)        5444   (3.4%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
   97897  (60.2%)        4906   (3.0%)     GraphQL::Execution::Interpreter::Runtime#resolve_list_item
    6704   (4.1%)        4662   (2.9%)     GraphQL::Execution::Interpreter::Runtime#lazy?

Introspection Benchmark

Before, YJIT (6.369 (± 0.0%) i/s)

Calculating -------------------------------------
Run large introspection
                        6.369  (± 0.0%) i/s -     64.000  in  10.065125s
==================================
Mode: wall(1000)
Samples: 161 (4.17% miss rate)
GC: 4 (2.48%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
      22  (13.7%)          22  (13.7%)     Kernel#class
      13   (8.1%)          13   (8.1%)     Kernel#hash
      57  (35.4%)           8   (5.0%)     GraphQL::Schema::Field#resolve
       8   (5.0%)           8   (5.0%)     GraphQL::Execution::Interpreter::Runtime#dead_result?
      12   (7.5%)           6   (3.7%)     GraphQL::Schema::Warden#get_field
      28  (17.4%)           6   (3.7%)     GraphQL::Execution::Interpreter::Runtime#lazy?
       6   (3.7%)           6   (3.7%)     GraphQL::Schema::Member::HasArguments#own_arguments
       4   (2.5%)           4   (2.5%)     GraphQL::Schema::EnumValue#value
       4   (2.5%)           4   (2.5%)     Thread#[]
       4   (2.5%)           4   (2.5%)     Kernel#respond_to?
       4   (2.5%)           4   (2.5%)     GraphQL::Schema::Field#type

Before, No YJIT (2.766 (± 0.0%) i/s)

Calculating -------------------------------------
Run large introspection
                        2.766  (± 0.0%) i/s -     28.000  in  10.129171s
==================================
Mode: wall(1000)
Samples: 375 (1.57% miss rate)
GC: 3 (0.80%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
     369  (98.4%)          42  (11.2%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
      98  (26.1%)          41  (10.9%)     GraphQL::Schema::Field#resolve
     369  (98.4%)          28   (7.5%)     GraphQL::Execution::Interpreter::Runtime#continue_field
     369  (98.4%)          28   (7.5%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection
      16   (4.3%)          13   (3.5%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
      91  (24.3%)          12   (3.2%)     GraphQL::Query#after_lazy
      13   (3.5%)          11   (2.9%)     GraphQL::Execution::Interpreter::Runtime#lazy?
     369  (98.4%)          10   (2.7%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selections
      18   (4.8%)           9   (2.4%)     GraphQL::Execution::Interpreter::Runtime#continue_value
       9   (2.4%)           9   (2.4%)     GraphQL::Execution::Interpreter::Runtime::GraphQLResultHash#set_leaf
      24   (6.4%)           9   (2.4%)     GraphQL::Execution::Interpreter::Runtime#set_result
       9   (2.4%)           8   (2.1%)     GraphQL::Schema::Warden#get_field
      34   (9.1%)           7   (1.9%)     Kernel#public_send
     354  (94.4%)           7   (1.9%)     GraphQL::Execution::Interpreter::Runtime#resolve_list_item
      13   (3.5%)           7   (1.9%)     GraphQL::Schema::Member::HasArguments::FieldConfigured#any_arguments?
     369  (98.4%)           6   (1.6%)     Hash#each

After, YJIT (6.450 (± 0.0%) i/s)

 Calculating -------------------------------------
Run large introspection
                        6.450  (± 0.0%) i/s -     65.000  in  10.102631s
==================================
Mode: wall(1000)
Samples: 152 (5.00% miss rate)
GC: 0 (0.00%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
      17  (11.2%)          17  (11.2%)     Kernel#class
      12   (7.9%)          12   (7.9%)     Kernel#respond_to?
       9   (5.9%)           9   (5.9%)     Kernel#hash
      59  (38.8%)           9   (5.9%)     GraphQL::Schema::Field#resolve
       8   (5.3%)           8   (5.3%)     BasicObject#==
       6   (3.9%)           6   (3.9%)     GraphQL::Schema::Field#type
       6   (3.9%)           6   (3.9%)     Kernel#is_a?
       5   (3.3%)           5   (3.3%)     GraphQL::Execution::Interpreter::Runtime#dead_result?
       4   (2.6%)           4   (2.6%)     Thread#[]
       8   (5.3%)           4   (2.6%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
       4   (2.6%)           4   (2.6%)     GraphQL::Execution::Interpreter::Runtime::GraphQLResult#initialize
       4   (2.6%)           4   (2.6%)     GraphQL::Schema::Member::BaseDSLMethods#graphql_name
      21  (13.8%)           4   (2.6%)     GraphQL::Execution::Interpreter::Runtime#lazy?
       3   (2.0%)           3   (2.0%)     Hash#any?

After, No YJIT (2.791 (± 0.0%) i/s)

Calculating -------------------------------------
Run large introspection
                        2.791  (± 0.0%) i/s -     28.000  in  10.036736s
==================================
Mode: wall(1000)
Samples: 375 (1.83% miss rate)
GC: 7 (1.87%)
==================================
   TOTAL    (pct)     SAMPLES    (pct)     FRAME
     366  (97.6%)          41  (10.9%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
     366  (97.6%)          38  (10.1%)     GraphQL::Execution::Interpreter::Runtime#continue_field
      90  (24.0%)          31   (8.3%)     GraphQL::Schema::Field#resolve
     366  (97.6%)          28   (7.5%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection
      15   (4.0%)          13   (3.5%)     GraphQL::Execution::Interpreter::Runtime#lazy?
      79  (21.1%)          12   (3.2%)     GraphQL::Query#after_lazy
      14   (3.7%)          12   (3.2%)     GraphQL::Schema::Warden#get_field
     366  (97.6%)          11   (2.9%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selections
      11   (2.9%)          11   (2.9%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
      22   (5.9%)           9   (2.4%)     GraphQL::Schema::Object.authorized_new
      18   (4.8%)           8   (2.1%)     GraphQL::Execution::Interpreter::Runtime#continue_value
      30   (8.0%)           8   (2.1%)     Kernel#public_send
     366  (97.6%)           7   (1.9%)     Hash#each

~~It seems like introspection even got slower, that's a bit of a surprise to me because I would have expected that case to benefit from fewer needless calls to re-fetch state.~~ I updated the benchmark after removing another write to state and it's looking faster or the same now.

rmosolgo added 6 commits September 6, 2023 10:17

Pass runtime state instead of repeatedly fetching it from Thread.current

8b7e4ff

Remove needless fallback

259d4ac

Merge branch 'master' into better-runtime-state

9630a3e

Merge branch 'master' into better-runtime-state

5b8744d

Move result objects to another file; eliminate needless state update

cf08105

Merge branch 'master' into better-runtime-state

4c1da51

rmosolgo added this to the 2.1.1 milestone Sep 21, 2023

rmosolgo merged commit 0f10399 into master Sep 21, 2023
12 checks passed

rmosolgo deleted the better-runtime-state branch September 21, 2023 16:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pass runtime state instead of repeatedly fetching it from Thread.current #4621

Pass runtime state instead of repeatedly fetching it from Thread.current #4621

rmosolgo commented Sep 6, 2023 •

edited

Pass runtime state instead of repeatedly fetching it from Thread.current #4621

Pass runtime state instead of repeatedly fetching it from Thread.current #4621

Conversation

rmosolgo commented Sep 6, 2023 • edited

rmosolgo commented Sep 6, 2023 •

edited