Rewrite Integer#times with Ruby #3656

Closed
wants to merge 2 commits into from

Conversation

k0kubun
Member

@k0kubun k0kubun commented Oct 14, 2020

This rewrites Integer#times in Ruby, both for maintainability and for a future speed-up from (not-yet-committed) yield inlining by the JIT.
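For readers who want a picture of what "Integer#times in Ruby" means, here is a minimal sketch. It is not the code from this branch: the method name `times_in_ruby` is hypothetical, and it uses plain `to_enum` and `block_given?`, whereas a builtin in integer.rb may rely on internal primitives that are not shown here.

```ruby
# Hypothetical illustration only; `times_in_ruby` is not the PR's code.
class Integer
  def times_in_ruby
    # Without a block, return a sized Enumerator, mirroring Integer#times.
    return to_enum(:times_in_ruby) { self < 0 ? 0 : self } unless block_given?

    i = 0
    while i < self
      yield i   # the yield that a JIT could inline into the loop
      i += 1
    end
    self
  end
end

collected = []
result = 3.times_in_ruby { |i| collected << i }
p collected  # => [0, 1, 2]
p result     # => 3
```

The point of the sketch is that the loop and the `yield` both live in Ruby code, so a JIT can see them, which is not the case when the loop is written in C.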

Interpreter

$ benchmark-driver benchmark/loop_times.yml -v --rbenv 'before::yjit-release-before;after::yjit-release-after' --repeat-count=4
before: ruby 3.2.0dev (2022-10-21T21:06:34Z master 87bb0bee6b) [x86_64-linux]
after: ruby 3.2.0dev (2022-10-21T21:21:05Z builtin-times 165efec492) [x86_64-linux]
Warming up --------------------------------------
             10.times    11.241M i/s -     11.473M times in 1.020624s (88.96ns/i, 195clocks/i)
        10.times{|e|}     4.647M i/s -      4.647M times in 1.000029s (215.20ns/i, 473clocks/i)
      1000.times{|e|}    52.275k i/s -     56.980k times in 1.089995s (19.13μs/i)
30_000_000.times{|e|}      1.801 i/s -       2.000 times in 1.110794s (555.40ms/i)
Calculating -------------------------------------
                          before       after
             10.times    13.312M     12.616M i/s -     33.724M times in 2.533426s 2.673206s
        10.times{|e|}     4.987M      3.376M i/s -     13.940M times in 2.795105s 4.128913s
      1000.times{|e|}    55.811k     37.200k i/s -    156.826k times in 2.809951s 4.215721s
30_000_000.times{|e|}      1.857       1.266 i/s -       5.000 times in 2.692246s 3.950562s

Comparison:
                          10.times
               before:  13311817.6 i/s
                after:  12615752.3 i/s - 1.06x  slower

                     10.times{|e|}
               before:   4987373.4 i/s
                after:   3376246.8 i/s - 1.48x  slower

                   1000.times{|e|}
               before:     55810.9 i/s
                after:     37200.3 i/s - 1.50x  slower

             30_000_000.times{|e|}
               before:         1.9 i/s
                after:         1.3 i/s - 1.47x  slower

@k0kubun k0kubun force-pushed the builtin-times branch 3 times, most recently from a494681 to 165efec Compare October 21, 2022 21:21
@k0kubun
Member Author

k0kubun commented Oct 21, 2022

I benchmarked it again and updated the description. For some reason, the result is significantly different today and I'm no longer comfortable merging it. I'll revisit this after MJIT or YJIT gets yield inlining.

@k0kubun k0kubun closed this Oct 21, 2022
@k0kubun k0kubun deleted the builtin-times branch October 21, 2022 21:37
@k0kubun k0kubun restored the builtin-times branch November 1, 2022 22:56
@k0kubun
Member Author

k0kubun commented Nov 1, 2022

Now that we at least have #6640, I'll play with this branch a bit again.

@k0kubun
Member Author

k0kubun commented Nov 1, 2022

I prepared "C times" ruby (dbab15ebaf) and "Ruby times" ruby (8da08136e9), both of which have invokeblock support (#6640).

$ benchmark-driver ~/tmp/a.yml --rbenv "C times::yjit-release-before-$arch;C times+YJIT::yjit-release-before-$arch --yjit;Ruby times::yjit-release-after-$arch;Ruby times+YJIT::yjit-release-after-$arch --yjit" -v
C times: ruby 3.2.0dev (2022-11-02) [arm64-darwin21]
C times+YJIT: ruby 3.2.0dev (2022-11-02) +YJIT [arm64-darwin21]
Ruby times: ruby 3.2.0dev (2022-11-02) [arm64-darwin21]
Ruby times+YJIT: ruby 3.2.0dev (2022-11-02) +YJIT [arm64-darwin21]
Warming up --------------------------------------
             10.times     7.062M i/s -      7.288M times in 1.032008s (141.60ns/i)
        10.times{|e|}     3.472M i/s -      3.667M times in 1.056193s (288.00ns/i)
      1000.times{|e|}    44.999k i/s -     48.752k times in 1.083400s (22.22μs/i)
30_000_000.times{|e|}      1.496 i/s -       2.000 times in 1.337344s (668.67ms/i)
Calculating -------------------------------------
                         C times  C times+YJIT  Ruby times  Ruby times+YJIT
             10.times     8.143M        8.122M      3.317M           3.558M i/s -     21.186M times in 2.601655s 2.608632s 6.387270s 5.955144s
        10.times{|e|}     3.761M        3.775M      2.432M          10.085M i/s -     10.417M times in 2.769832s 2.759564s 4.283163s 1.032869s
      1000.times{|e|}    44.868k       45.412k     28.985k         159.115k i/s -    134.997k times in 3.008730s 2.972685s 4.657558s 0.848424s
30_000_000.times{|e|}      1.501         1.505       0.971            0.927 i/s -       4.000 times in 2.664303s 2.658322s 4.120002s 4.314808s

Comparison:
                          10.times
              C times:   8143433.7 i/s
         C times+YJIT:   8121653.4 i/s - 1.00x  slower
      Ruby times+YJIT:   3557664.6 i/s - 2.29x  slower
           Ruby times:   3316973.4 i/s - 2.46x  slower

                     10.times{|e|}
      Ruby times+YJIT:  10085171.5 i/s
         C times+YJIT:   3774748.8 i/s - 2.67x  slower
              C times:   3760755.5 i/s - 2.68x  slower
           Ruby times:   2432002.0 i/s - 4.15x  slower

                   1000.times{|e|}
      Ruby times+YJIT:    159115.0 i/s
         C times+YJIT:     45412.5 i/s - 3.50x  slower
              C times:     44868.4 i/s - 3.55x  slower
           Ruby times:     28984.5 i/s - 5.49x  slower

             30_000_000.times{|e|}
         C times+YJIT:         1.5 i/s
              C times:         1.5 i/s - 1.00x  slower
           Ruby times:         1.0 i/s - 1.55x  slower
      Ruby times+YJIT:         0.9 i/s - 1.62x  slower

10.times (without a block) is not a yield case: it returns an Enumerator, and it looks like we're not optimizing the to_enum path well.
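To make the distinction concrete, this short snippet uses only the stock Integer#times and shows that the blockless call never yields; it just builds an Enumerator. Nothing here is specific to this branch.

```ruby
# Blockless call: no yield happens, an Enumerator object is allocated instead.
enum = 10.times
p enum.class     # => Enumerator
p enum.size      # => 10

# Block call: this is the yield case measured by the other benchmark rows.
sum = 0
10.times { |e| sum += e }
p sum            # => 45
```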

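With YJIT, the Ruby implementation of Integer#times runs 10.times{|e|} and 1000.times{|e|} 2.68x to 3.55x faster than the C implementation without YJIT (2.67x to 3.50x faster than the C implementation with YJIT). In these cases, Ruby is faster than C.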
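The speed-up figures can be rechecked from the i/s numbers in the comparison output above. The snippet below is just that arithmetic; the hash layout and the `speedup` helper are illustrative names, not part of benchmark-driver.

```ruby
# Iterations per second, copied from the "Comparison" output above.
IPS = {
  "10.times{|e|}"   => { ruby_yjit: 10_085_171.5, c: 3_760_755.5, c_yjit: 3_774_748.8 },
  "1000.times{|e|}" => { ruby_yjit: 159_115.0,    c: 44_868.4,    c_yjit: 45_412.5 },
}

# Ratio of two throughput figures, rounded to two decimals.
def speedup(fast, slow)
  (fast / slow).round(2)
end

IPS.each do |name, r|
  vs_c      = speedup(r[:ruby_yjit], r[:c])
  vs_c_yjit = speedup(r[:ruby_yjit], r[:c_yjit])
  puts format("%-16s %.2fx vs C times, %.2fx vs C times+YJIT", name, vs_c, vs_c_yjit)
end
```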

30_000_000.times{|e|} is not optimized because benchmark-driver decided to run it only 4 times, so YJIT doesn't compile it. We would likely get a similar speedup for that case if it were looped more times.

@k0kubun
Member Author

k0kubun commented Nov 3, 2022

I'll revisit this a bit differently.
