Separate `IOSource#ensure_buffer` from `IOSource#match`. #118

naitoh · 2024-03-05T13:49:36Z

Why?

It would affect performance to do a read check in IOSource#match every time, Separate read processing from IOSource#ensure_buffer.

Use IOSource#ensure_buffer in the following cases where @source.buffer is empty.

at the start of pull_event
If a trailing '>' pattern matches, as in @source.match(/\s*>/um).

Benchmark

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.278      10.986        16.430       16.941 i/s -     100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s
                 sax     30.166      30.496        49.851       51.596 i/s -     100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s
                pull     35.459      36.380        60.266       63.134 i/s -     100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s
              stream     33.762      34.636        55.173       55.859 i/s -     100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s

Comparison:
                              dom
         after(YJIT):        16.9 i/s
        before(YJIT):        16.4 i/s - 1.03x  slower
               after:        11.0 i/s - 1.54x  slower
              before:        10.3 i/s - 1.65x  slower

                              sax
         after(YJIT):        51.6 i/s
        before(YJIT):        49.9 i/s - 1.04x  slower
               after:        30.5 i/s - 1.69x  slower
              before:        30.2 i/s - 1.71x  slower

                             pull
         after(YJIT):        63.1 i/s
        before(YJIT):        60.3 i/s - 1.05x  slower
               after:        36.4 i/s - 1.74x  slower
              before:        35.5 i/s - 1.78x  slower

                           stream
         after(YJIT):        55.9 i/s
        before(YJIT):        55.2 i/s - 1.01x  slower
               after:        34.6 i/s - 1.61x  slower
              before:        33.8 i/s - 1.65x  slower

YJIT=ON : 1.01x - 1.05x faster
YJIT=OFF : 1.01x - 1.06x faster

kou · 2024-03-06T06:41:07Z

Can we give it better name?
I feel that next_read doesn't imply "read if buffer is empty".

ensure_buffered?

## Why? It would affect performance to do a read check in `IOSource#match` every time, Separate read processing from IOSource#ensure_buffer. Use `IOSource#ensure_buffer` in the following cases: 1. at the start of pull_event 2. after a trailing '>' pattern such as `@source.match(/\s*>/um`) ## Benchmark ``` RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22] Calculating ------------------------------------- before after before(YJIT) after(YJIT) dom 10.278 10.986 16.430 16.941 i/s - 100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s sax 30.166 30.496 49.851 51.596 i/s - 100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s pull 35.459 36.380 60.266 63.134 i/s - 100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s stream 33.762 34.636 55.173 55.859 i/s - 100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s Comparison: dom after(YJIT): 16.9 i/s before(YJIT): 16.4 i/s - 1.03x slower after: 11.0 i/s - 1.54x slower before: 10.3 i/s - 1.65x slower sax after(YJIT): 51.6 i/s before(YJIT): 49.9 i/s - 1.04x slower after: 30.5 i/s - 1.69x slower before: 30.2 i/s - 1.71x slower pull after(YJIT): 63.1 i/s before(YJIT): 60.3 i/s - 1.05x slower after: 36.4 i/s - 1.74x slower before: 35.5 i/s - 1.78x slower stream after(YJIT): 55.9 i/s before(YJIT): 55.2 i/s - 1.01x slower after: 34.6 i/s - 1.61x slower before: 33.8 i/s - 1.65x slower ``` - YJIT=ON : 1.01x - 1.05x faster - YJIT=OFF : 1.01x - 1.06x faster

naitoh · 2024-03-06T12:04:56Z

@kou

Can we give it better name? I feel that next_read doesn't imply "read if buffer is empty".

ensure_buffered?

I changed it to ensure_buffer.
What do you think?

kou · 2024-03-06T22:02:29Z

OK. Let's use it.

naitoh force-pushed the add_next_read branch from c650670 to 2b5c1a0 Compare March 6, 2024 11:59

naitoh changed the title ~~Separate IOSource#next_read from IOSource#match.~~ Separate IOSource#ensure_buffer from IOSource#match. Mar 6, 2024

kou merged commit 77cb0dc into ruby:master Mar 6, 2024
39 checks passed

naitoh deleted the add_next_read branch March 6, 2024 22:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate `IOSource#ensure_buffer` from `IOSource#match`. #118

Separate `IOSource#ensure_buffer` from `IOSource#match`. #118

naitoh commented Mar 5, 2024 •

edited

Loading

kou commented Mar 6, 2024

naitoh commented Mar 6, 2024 •

edited

Loading

kou commented Mar 6, 2024

Separate IOSource#ensure_buffer from IOSource#match. #118

Separate IOSource#ensure_buffer from IOSource#match. #118

Conversation

naitoh commented Mar 5, 2024 • edited Loading

Why?

Benchmark

kou commented Mar 6, 2024

naitoh commented Mar 6, 2024 • edited Loading

kou commented Mar 6, 2024

Separate `IOSource#ensure_buffer` from `IOSource#match`. #118

Separate `IOSource#ensure_buffer` from `IOSource#match`. #118

naitoh commented Mar 5, 2024 •

edited

Loading

naitoh commented Mar 6, 2024 •

edited

Loading