Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass runtime state instead of repeatedly fetching it from Thread.current #4621

Merged
merged 6 commits into from
Sep 21, 2023

Conversation

rmosolgo
Copy link
Owner

@rmosolgo rmosolgo commented Sep 6, 2023

I ran the benchmarks with YJIT turned on and found that Thread#[] was one of the top methods. I thought I might minimize the hits to that by refactoring a bit. In the end, the change was negligible. I think the reason is because, with YJIT running, most of the runtime ends in "misses," which I'm guessing is optimized code that stackprof can't see.

  • Large Result Benchmark

    Before, YJIT (9.453 i/s)

    Calculating -------------------------------------
    Querying for 1000 objects
                            9.453  (±10.6%) i/s -     95.000  in  10.096298s
    ==================================
      Mode: wall(1)
      Samples: 324246 (71.59% miss rate)
      GC: 11313 (3.49%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
       25950   (8.0%)       25950   (8.0%)     Kernel#class
       20545   (6.3%)       20545   (6.3%)     Thread#[]
       19958   (6.2%)       19958   (6.2%)     GraphQL::Execution::Lazy#initialize
       13375   (4.1%)       13375   (4.1%)     Kernel#hash
       11176   (3.4%)       11176   (3.4%)     GraphQL::Execution::Interpreter::Runtime#dead_result?
       30562   (9.4%)       10012   (3.1%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
      303235  (93.5%)        9393   (2.9%)     GraphQL::Execution::Interpreter::Runtime#after_lazy
        9290   (2.9%)        9290   (2.9%)     GraphQL::Schema::Member::HasArguments#own_arguments
        9076   (2.8%)        9076   (2.8%)     GraphQL::Schema.lazy_methods
        8739   (2.7%)        8739   (2.7%)     Kernel#respond_to?
        

    Before, No YJIT (4.943 i/s)

    Calculating -------------------------------------
      Querying for 1000 objects
                            4.943  (± 0.0%) i/s -     50.000  in  10.147070s
      ==================================
        Mode: wall(1)
        Samples: 167739 (56.34% miss rate)
        GC: 7384 (4.40%)
      ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
      104604  (62.4%)        9047   (5.4%)     GraphQL::Execution::Interpreter::Runtime#continue_field
        6650   (4.0%)        6650   (4.0%)     (sweeping)
      118832  (70.8%)        6473   (3.9%)     GraphQL::Execution::Interpreter::Runtime#after_lazy
       74223  (44.2%)        6092   (3.6%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
      103127  (61.5%)        5410   (3.2%)     GraphQL::Execution::Interpreter::Runtime#resolve_list_item
        6590   (3.9%)        4702   (2.8%)     GraphQL::Execution::Interpreter::Runtime#lazy?
       82289  (49.1%)        4512   (2.7%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection
        7029   (4.2%)        4276   (2.5%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
       19960  (11.9%)        4163   (2.5%)     GraphQL::Schema::Field#resolve
        3549   (2.1%)        3549   (2.1%)     GraphQL::Execution::Lazy#initialize
        8347   (5.0%)        3211   (1.9%)     GraphQL::Execution::Interpreter::Runtime#set_result
       14259   (8.5%)        2905   (1.7%)     GraphQL::Schema::Field#with_extensions
      121394  (72.4%)        2771   (1.7%)     Array#each
        

    After, YJIT (9.558 i/s)

    Calculating -------------------------------------
    Querying for 1000 objects
                            9.558  (±10.5%) i/s -     96.000  in  10.085145s
    ==================================
      Mode: wall(1)
      Samples: 299263 (71.11% miss rate)
      GC: 11417 (3.82%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
       29462   (9.8%)       29462   (9.8%)     Kernel#class
       17072   (5.7%)       17072   (5.7%)     GraphQL::Execution::Lazy#initialize
       13158   (4.4%)       13158   (4.4%)     Kernel#hash
       11492   (3.8%)       11492   (3.8%)     Thread#[]
        8537   (2.9%)        8537   (2.9%)     GraphQL::Schema.lazy_methods
       35112  (11.7%)        8318   (2.8%)     GraphQL::Execution::Interpreter::Runtime#lazy?
        8161   (2.7%)        8161   (2.7%)     Kernel#respond_to?
        8101   (2.7%)        8101   (2.7%)     GraphQL::Schema::Member::HasArguments#own_arguments
        

    After, No YJIT (5.188 i/s)

     Calculating -------------------------------------
    Querying for 1000 objects
                            5.188  (± 0.0%) i/s -     52.000  in  10.032891s
    ==================================
      Mode: wall(1)
      Samples: 162494 (56.02% miss rate)
      GC: 9416 (5.79%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
        8440   (5.2%)        8440   (5.2%)     (sweeping)
       99386  (61.2%)        8373   (5.2%)     GraphQL::Execution::Interpreter::Runtime#continue_field
      112920  (69.5%)        6398   (3.9%)     GraphQL::Execution::Interpreter::Runtime#after_lazy
       66843  (41.1%)        5444   (3.4%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
       97897  (60.2%)        4906   (3.0%)     GraphQL::Execution::Interpreter::Runtime#resolve_list_item
        6704   (4.1%)        4662   (2.9%)     GraphQL::Execution::Interpreter::Runtime#lazy?
        

  • Introspection Benchmark

    Before, YJIT (6.369 (± 0.0%) i/s)

    Calculating -------------------------------------
    Run large introspection
                            6.369  (± 0.0%) i/s -     64.000  in  10.065125s
    ==================================
    Mode: wall(1000)
    Samples: 161 (4.17% miss rate)
    GC: 4 (2.48%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
          22  (13.7%)          22  (13.7%)     Kernel#class
          13   (8.1%)          13   (8.1%)     Kernel#hash
          57  (35.4%)           8   (5.0%)     GraphQL::Schema::Field#resolve
           8   (5.0%)           8   (5.0%)     GraphQL::Execution::Interpreter::Runtime#dead_result?
          12   (7.5%)           6   (3.7%)     GraphQL::Schema::Warden#get_field
          28  (17.4%)           6   (3.7%)     GraphQL::Execution::Interpreter::Runtime#lazy?
           6   (3.7%)           6   (3.7%)     GraphQL::Schema::Member::HasArguments#own_arguments
           4   (2.5%)           4   (2.5%)     GraphQL::Schema::EnumValue#value
           4   (2.5%)           4   (2.5%)     Thread#[]
           4   (2.5%)           4   (2.5%)     Kernel#respond_to?
           4   (2.5%)           4   (2.5%)     GraphQL::Schema::Field#type
        

    Before, No YJIT (2.766 (± 0.0%) i/s)

    Calculating -------------------------------------
    Run large introspection
                            2.766  (± 0.0%) i/s -     28.000  in  10.129171s
    ==================================
    Mode: wall(1000)
    Samples: 375 (1.57% miss rate)
    GC: 3 (0.80%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
         369  (98.4%)          42  (11.2%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
          98  (26.1%)          41  (10.9%)     GraphQL::Schema::Field#resolve
         369  (98.4%)          28   (7.5%)     GraphQL::Execution::Interpreter::Runtime#continue_field
         369  (98.4%)          28   (7.5%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection
          16   (4.3%)          13   (3.5%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
          91  (24.3%)          12   (3.2%)     GraphQL::Query#after_lazy
          13   (3.5%)          11   (2.9%)     GraphQL::Execution::Interpreter::Runtime#lazy?
         369  (98.4%)          10   (2.7%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selections
          18   (4.8%)           9   (2.4%)     GraphQL::Execution::Interpreter::Runtime#continue_value
           9   (2.4%)           9   (2.4%)     GraphQL::Execution::Interpreter::Runtime::GraphQLResultHash#set_leaf
          24   (6.4%)           9   (2.4%)     GraphQL::Execution::Interpreter::Runtime#set_result
           9   (2.4%)           8   (2.1%)     GraphQL::Schema::Warden#get_field
          34   (9.1%)           7   (1.9%)     Kernel#public_send
         354  (94.4%)           7   (1.9%)     GraphQL::Execution::Interpreter::Runtime#resolve_list_item
          13   (3.5%)           7   (1.9%)     GraphQL::Schema::Member::HasArguments::FieldConfigured#any_arguments?
         369  (98.4%)           6   (1.6%)     Hash#each
        

    After, YJIT (6.450 (± 0.0%) i/s)

     Calculating -------------------------------------
    Run large introspection
                            6.450  (± 0.0%) i/s -     65.000  in  10.102631s
    ==================================
    Mode: wall(1000)
    Samples: 152 (5.00% miss rate)
    GC: 0 (0.00%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
          17  (11.2%)          17  (11.2%)     Kernel#class
          12   (7.9%)          12   (7.9%)     Kernel#respond_to?
           9   (5.9%)           9   (5.9%)     Kernel#hash
          59  (38.8%)           9   (5.9%)     GraphQL::Schema::Field#resolve
           8   (5.3%)           8   (5.3%)     BasicObject#==
           6   (3.9%)           6   (3.9%)     GraphQL::Schema::Field#type
           6   (3.9%)           6   (3.9%)     Kernel#is_a?
           5   (3.3%)           5   (3.3%)     GraphQL::Execution::Interpreter::Runtime#dead_result?
           4   (2.6%)           4   (2.6%)     Thread#[]
           8   (5.3%)           4   (2.6%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
           4   (2.6%)           4   (2.6%)     GraphQL::Execution::Interpreter::Runtime::GraphQLResult#initialize
           4   (2.6%)           4   (2.6%)     GraphQL::Schema::Member::BaseDSLMethods#graphql_name
          21  (13.8%)           4   (2.6%)     GraphQL::Execution::Interpreter::Runtime#lazy?
           3   (2.0%)           3   (2.0%)     Hash#any?
        

    After, No YJIT (2.791 (± 0.0%) i/s)

    Calculating -------------------------------------
    Run large introspection
                            2.791  (± 0.0%) i/s -     28.000  in  10.036736s
    ==================================
    Mode: wall(1000)
    Samples: 375 (1.83% miss rate)
    GC: 7 (1.87%)
    ==================================
       TOTAL    (pct)     SAMPLES    (pct)     FRAME
         366  (97.6%)          41  (10.9%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection_with_resolved_keyword_args
         366  (97.6%)          38  (10.1%)     GraphQL::Execution::Interpreter::Runtime#continue_field
          90  (24.0%)          31   (8.3%)     GraphQL::Schema::Field#resolve
         366  (97.6%)          28   (7.5%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selection
          15   (4.0%)          13   (3.5%)     GraphQL::Execution::Interpreter::Runtime#lazy?
          79  (21.1%)          12   (3.2%)     GraphQL::Query#after_lazy
          14   (3.7%)          12   (3.2%)     GraphQL::Schema::Warden#get_field
         366  (97.6%)          11   (2.9%)     GraphQL::Execution::Interpreter::Runtime#evaluate_selections
          11   (2.9%)          11   (2.9%)     GraphQL::Execution::Interpreter::Runtime#get_current_runtime_state
          22   (5.9%)           9   (2.4%)     GraphQL::Schema::Object.authorized_new
          18   (4.8%)           8   (2.1%)     GraphQL::Execution::Interpreter::Runtime#continue_value
          30   (8.0%)           8   (2.1%)     Kernel#public_send
         366  (97.6%)           7   (1.9%)     Hash#each
    

It seems like introspection even got slower, that's a bit of a surprise to me because I would have expected that case to benefit from fewer needless calls to re-fetch state. I updated the benchmark after removing another write to state and it's looking faster or the same now.

@rmosolgo rmosolgo added this to the 2.1.1 milestone Sep 21, 2023
@rmosolgo rmosolgo merged commit 0f10399 into master Sep 21, 2023
12 checks passed
@rmosolgo rmosolgo deleted the better-runtime-state branch September 21, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant