Various Ruby driver optimizations #184

Merged
merged 7 commits into from
Mar 23, 2024

casperisfine
Collaborator

I recently paired with @tenderlove on trying to see if the Ruby vs Hiredis performance difference could be closed.

We came up with a bunch of small changes that really close the gap quite significantly, especially when YJIT is enabled.

The Ruby driver is now either on par with hiredis, or just marginally slower in some cases.

There are a few more optimizations we thought of, but they require some changes in Ruby; I'll try to work on these this year. Notably, a way to create hashes with a given capacity would help a lot.

Before (YJIT)

ruby: ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]

redis-server: Redis server v=7.0.12 sha=00000000:0 malloc=libc bits=64 build=a11d0151eabf466c

small string x 100

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
             hiredis:     6745.7 i/s
                ruby:     5182.0 i/s - 1.30x  slower

large string x 100

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
             hiredis:      303.5 i/s
                ruby:      343.1 i/s - 1.13x  faster

small list x 100

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
             hiredis:     3683.6 i/s
                ruby:     1952.3 i/s - 1.89x  slower

large list

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
             hiredis:     6540.2 i/s
                ruby:     2710.0 i/s - 2.41x  slower

small hash x 100

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
             hiredis:     4002.6 i/s
                ruby:     2317.0 i/s - 1.73x  slower

large hash

ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
             hiredis:     2467.1 i/s
                ruby:     2439.0 i/s - same-ish: difference falls within error

After (YJIT)

ruby: ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) [arm64-darwin23]

redis-server: Redis server v=7.0.12 sha=00000000:0 malloc=libc bits=64 build=a11d0151eabf466c

small string x 100

ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) +YJIT [arm64-darwin23]
             hiredis:     6407.8 i/s
                ruby:     5852.0 i/s - same-ish: difference falls within error

large string x 100

ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) +YJIT [arm64-darwin23]
             hiredis:      302.8 i/s
                ruby:      337.3 i/s - same-ish: difference falls within error

small list x 100

ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) +YJIT [arm64-darwin23]
             hiredis:     4067.7 i/s
                ruby:     2721.5 i/s - 1.49x  slower

large list

ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) +YJIT [arm64-darwin23]
             hiredis:     7138.7 i/s
                ruby:     6605.4 i/s - same-ish: difference falls within error

small hash x 100

ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) +YJIT [arm64-darwin23]
             hiredis:     4219.8 i/s
                ruby:     3586.4 i/s - 1.18x  slower

large hash

ruby 3.4.0dev (2024-03-19T14:18:56Z master 5c2937733c) +YJIT [arm64-darwin23]
             hiredis:     5240.9 i/s
                ruby:     5312.5 i/s - same-ish: difference falls within error

NB: the latter uses ruby-head to benefit from some YJIT optimizations.

byroot and others added 5 commits March 22, 2024 11:29
The main gain is to have proper method calls rather than
dynamic `send`.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
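A minimal sketch of the difference between dynamic `send` dispatch and plain method calls. The `SketchParser` module and its method names are hypothetical and are not taken from redis-client; the point is only that each branch of the `case` is an ordinary call site with a fixed target, whereas `send` has to resolve a method name computed at runtime.

```ruby
# Hypothetical parser used for illustration only; not redis-client's API.
module SketchParser
  # Dynamic dispatch: the method name is looked up in a table at runtime.
  HANDLERS = { "+" => :parse_string, ":" => :parse_integer }.freeze

  def self.parse_dynamic(type, payload)
    send(HANDLERS.fetch(type), payload)
  end

  # Static dispatch: every branch is a regular method call.
  def self.parse_static(type, payload)
    case type
    when "+" then parse_string(payload)
    when ":" then parse_integer(payload)
    else raise ArgumentError, "unknown type: #{type.inspect}"
    end
  end

  def self.parse_string(payload)
    payload
  end

  def self.parse_integer(payload)
    Integer(payload)
  end
end

# Both variants return the same values; only the call mechanism differs.
SketchParser.parse_dynamic(":", "42")
SketchParser.parse_static(":", "42")
```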
All RESP3 types start with a single line containing a positive
integer that announces the size of the data.

e.g. `*42\r\n....`.

Before this commit we would read one line, hence allocate
a String, then parse it with `Kernel.Integer`.

By instead reading bytes one by one and rebuilding the integer
we save that string allocation, which has a significant impact.

The `gets_integer` method also optimistically assumes the buffer
already contains the line, which saves some method calls in the
happy path.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
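To illustrate the byte-by-byte approach described above, here is a rough sketch. `read_integer` is a hypothetical helper, not the actual `gets_integer` implementation, and it assumes the whole line is already in the buffer. It also returns an array for convenience, which a real hot path would likely avoid.

```ruby
# Hypothetical helper for illustration; not redis-client's gets_integer.
# Parses a CRLF-terminated decimal integer from `buffer`, starting at byte
# `offset`, without allocating an intermediate String for the line.
# Returns [value, offset_of_the_byte_after_the_line].
def read_integer(buffer, offset)
  int = 0
  negative = false

  if buffer.getbyte(offset) == 45 # "-"
    negative = true
    offset += 1
  end

  loop do
    byte = buffer.getbyte(offset)
    raise ArgumentError, "incomplete line" if byte.nil?

    offset += 1
    if byte == 13 # "\r"
      offset += 1 # skip the "\n" that follows
      break
    end

    raise ArgumentError, "unexpected byte: #{byte}" unless byte >= 48 && byte <= 57

    int = (int * 10) + (byte - 48) # 48 is "0"
  end

  [negative ? -int : int, offset]
end

# Offset 1 skips the "*" type marker.
value, next_offset = read_integer("*42\r\nrest", 1)
```

With the input above, `value` is `42` and `next_offset` is `5`, the byte position right after the `\r\n`.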
When parsing a string, redis-client first attempts to encode
it as `Encoding.default_external` and, if that is invalid, falls back to
encoding it as binary.

By having the buffer encoded with the default external encoding
we save having to change the encoding once on every parsed string.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
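A small sketch of the idea, under the assumption that parsed strings are cut out of the buffer with `byteslice`. `tag_encoding` is a hypothetical helper, and the encoding is passed explicitly (defaulting to `Encoding.default_external`) so the example behaves the same regardless of locale.

```ruby
# Hypothetical helper for illustration; not redis-client's actual code.
# Tags `str` with `encoding`, falling back to binary if the bytes are invalid.
def tag_encoding(str, encoding = Encoding.default_external)
  str.force_encoding(encoding)
  str.force_encoding(Encoding::BINARY) unless str.valid_encoding?
  str
end

# A slice inherits the encoding of the string it was cut from, so a buffer
# that is already tagged with the target encoding yields correctly tagged
# slices, and force_encoding is only needed in the invalid-bytes fallback.
buffer = "h\u00E9llo\r\n".b.force_encoding(Encoding::UTF_8)
slice = buffer.byteslice(0, 6)  # "héllo" is 6 bytes
slice.encoding                  # same encoding as the buffer

valid = tag_encoding("h\u00E9llo".b, Encoding::UTF_8)
invalid = tag_encoding("\xFF".b, Encoding::UTF_8)
```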
This method is a big hotspot for parsing. By not pessimistically
checking the size of the buffer, we save a few method calls on
the happy path.

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
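The commit message does not name the method here, so the following is only a sketch of the general pattern, using a hypothetical `SketchBuffer#getbyte`. Instead of comparing the offset against the buffer size before every read, it reads first and refills only when the read comes back `nil`. A `StringIO` stands in for the socket.

```ruby
require "stringio"

# Hypothetical buffered reader for illustration; not redis-client's class.
class SketchBuffer
  CHUNK_SIZE = 4096

  def initialize(io)
    @io = io
    @buffer = +""
    @offset = 0
  end

  def getbyte
    # Optimistic read: String#getbyte returns nil past the end of the
    # string, so no size check is needed when the byte is already buffered.
    byte = @buffer.getbyte(@offset)
    if byte.nil?
      refill
      byte = @buffer.getbyte(@offset)
    end
    @offset += 1
    byte
  end

  private

  # Slow path: replace the exhausted buffer with the next chunk.
  def refill
    chunk = @io.read(CHUNK_SIZE)
    raise EOFError, "connection closed" if chunk.nil? || chunk.empty?

    @buffer = chunk
    @offset = 0
  end
end

reader = SketchBuffer.new(StringIO.new("+OK\r\n"))
first_byte = reader.getbyte # 43, the byte value of "+"
```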
@casperisfine
Collaborator Author

The TruffleRuby failure seems legit and may be indicative of a TruffleRuby bug:

RedisClientTest#test_encoding [test/redis_client_test.rb:31]:
Expected: #<Encoding:UTF-8>
  Actual: #<Encoding:ASCII-8BIT>

I'll dig into this one and try to come up with a minimal repro. FYI @eregon

There were some recent YJIT optimizations for `getbyte` and some
other methods we use a lot in the Ruby driver.
@casperisfine
Collaborator Author

Ruby spec for the TruffleRuby discrepancy: ruby/spec#1145

@mperham

mperham commented Mar 22, 2024

21.1:

src/sidekiq % RUBY_YJIT_ENABLE=1 LATENCY=0 bin/sidekiqload
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
THREADS: nil, LATENCY: 0, AJ: nil, PROFILE: nil
2024-03-22T16:31:33.787Z pid=24663 tid=l27 ERROR: Setup RSS: 46496
2024-03-22T16:31:36.538Z pid=24663 tid=l27 WARN: Created 500000 jobs in 2.748508 sec
2024-03-22T16:31:36.538Z pid=24663 tid=l27 WARN: Starting load
2024-03-22T16:31:36.538Z pid=24663 tid=l27 ERROR: Simulating 0ms of latency between Sidekiq and redis
2024-03-22T16:31:53.557Z pid=24663 tid=lj3 ERROR: Done, 500000 jobs in **17.018055** sec, 29380 jobs/sec
2024-03-22T16:31:53.563Z pid=24663 tid=lj3 ERROR: Ending RSS: 54400
2024-03-22T16:31:53.564Z pid=24663 tid=lj3 ERROR: Now here's the latency for three jobs
0.0007669925689697266
0.00015878677368164062
0.00014734268188476562
src/sidekiq % RUBY_YJIT_ENABLE=1 LATENCY=0 bin/sidekiqload
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
THREADS: nil, LATENCY: 0, AJ: nil, PROFILE: nil
2024-03-22T16:31:57.331Z pid=24700 tid=l2s ERROR: Setup RSS: 45648
2024-03-22T16:32:00.203Z pid=24700 tid=l2s WARN: Created 500000 jobs in 2.869176 sec
2024-03-22T16:32:00.203Z pid=24700 tid=l2s WARN: Starting load
2024-03-22T16:32:00.203Z pid=24700 tid=l2s ERROR: Simulating 0ms of latency between Sidekiq and redis
2024-03-22T16:32:17.020Z pid=24700 tid=lhw ERROR: Done, 500000 jobs in **16.816453** sec, 29732 jobs/sec
2024-03-22T16:32:17.027Z pid=24700 tid=lhw ERROR: Ending RSS: 53104
2024-03-22T16:32:17.027Z pid=24700 tid=lhw ERROR: Now here's the latency for three jobs
0.0009238719940185547
0.00021910667419433594
0.00011897087097167969

This branch:

src/sidekiq % RUBY_YJIT_ENABLE=1 LATENCY=0 bin/sidekiqload
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
THREADS: nil, LATENCY: 0, AJ: nil, PROFILE: nil
2024-03-22T16:29:32.030Z pid=24563 tid=gu3 ERROR: Setup RSS: 46432
2024-03-22T16:29:34.784Z pid=24563 tid=gu3 WARN: Created 500000 jobs in 2.736431 sec
2024-03-22T16:29:34.784Z pid=24563 tid=gu3 WARN: Starting load
2024-03-22T16:29:34.785Z pid=24563 tid=gu3 ERROR: Simulating 0ms of latency between Sidekiq and redis
2024-03-22T16:29:51.390Z pid=24563 tid=ggr ERROR: Done, 500000 jobs in **16.603711** sec, 30113 jobs/sec
2024-03-22T16:29:51.397Z pid=24563 tid=ggr ERROR: Ending RSS: 53600
2024-03-22T16:29:51.398Z pid=24563 tid=ggr ERROR: Now here's the latency for three jobs
0.0010797977447509766
0.0003361701965332031
0.00019621849060058594
src/sidekiq % RUBY_YJIT_ENABLE=1 LATENCY=0 bin/sidekiqload
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) +YJIT [arm64-darwin23]
THREADS: nil, LATENCY: 0, AJ: nil, PROFILE: nil
2024-03-22T16:29:56.089Z pid=24584 tid=l34 ERROR: Setup RSS: 44496
2024-03-22T16:29:58.893Z pid=24584 tid=l34 WARN: Created 500000 jobs in 2.801765 sec
2024-03-22T16:29:58.893Z pid=24584 tid=l34 WARN: Starting load
2024-03-22T16:29:58.893Z pid=24584 tid=l34 ERROR: Simulating 0ms of latency between Sidekiq and redis
2024-03-22T16:30:15.505Z pid=24584 tid=lgg ERROR: Done, 500000 jobs in **16.608818** sec, 30104 jobs/sec
2024-03-22T16:30:15.521Z pid=24584 tid=lgg ERROR: Ending RSS: 52064
2024-03-22T16:30:15.521Z pid=24584 tid=lgg ERROR: Now here's the latency for three jobs
0.0014886856079101562
0.0005791187286376953
0.0002732276916503906

A very slight improvement to Sidekiq's benchmark.

The behavior of `IO#read_nonblock` differs slightly.
See: ruby/spec#1145
@casperisfine casperisfine merged commit bf00251 into master Mar 23, 2024
26 checks passed
@casperisfine casperisfine deleted the bench-hiredis-vs-ruby branch March 23, 2024 08:52
casperisfine pushed a commit that referenced this pull request Apr 12, 2024
As of #184,
the buffer String is no longer BINARY but UTF-8.

I missed that the code that searches for newlines was using `.index`
instead of `.byteindex`, causing the buffer offset to go out of
sync.
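The mismatch is easy to reproduce in isolation. This sketch uses a made-up buffer whose payload contains a multi-byte character; `String#byteindex` requires Ruby 3.2 or later.

```ruby
# A UTF-8 buffer holding a bulk string whose payload ("é") is 2 bytes long.
buffer = "$2\r\n\u00E9\r\n+OK\r\n"
start = 4 # byte offset of the payload, right after "$2\r\n"

# String#index counts characters on a UTF-8 string, while
# String#byteindex counts bytes.
char_pos = buffer.index("\r\n", start)
byte_pos = buffer.byteindex("\r\n", start)

# Slicing by bytes with the character position cuts the payload short.
truncated = buffer.byteslice(start, char_pos - start)
payload = buffer.byteslice(start, byte_pos - start)

# On a BINARY buffer the two agree, which is why the bug was not visible
# before the buffer encoding changed.
binary_pos = buffer.b.index("\r\n", start)
```

Here `char_pos` is `5` and `byte_pos` is `6`, so only the `byteindex` result yields the full payload.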