Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate IOSource#ensure_buffer from IOSource#match. #118

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

naitoh
Copy link
Contributor

@naitoh naitoh commented Mar 5, 2024

Why?

It would affect performance to do a read check in IOSource#match every time, Separate read processing from IOSource#ensure_buffer.

Use IOSource#ensure_buffer in the following cases where @source.buffer is empty.

  1. at the start of pull_event
  2. If a trailing '>' pattern matches, as in @source.match(/\s*>/um).

Benchmark

RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.278      10.986        16.430       16.941 i/s -     100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s
                 sax     30.166      30.496        49.851       51.596 i/s -     100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s
                pull     35.459      36.380        60.266       63.134 i/s -     100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s
              stream     33.762      34.636        55.173       55.859 i/s -     100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s

Comparison:
                              dom
         after(YJIT):        16.9 i/s
        before(YJIT):        16.4 i/s - 1.03x  slower
               after:        11.0 i/s - 1.54x  slower
              before:        10.3 i/s - 1.65x  slower

                              sax
         after(YJIT):        51.6 i/s
        before(YJIT):        49.9 i/s - 1.04x  slower
               after:        30.5 i/s - 1.69x  slower
              before:        30.2 i/s - 1.71x  slower

                             pull
         after(YJIT):        63.1 i/s
        before(YJIT):        60.3 i/s - 1.05x  slower
               after:        36.4 i/s - 1.74x  slower
              before:        35.5 i/s - 1.78x  slower

                           stream
         after(YJIT):        55.9 i/s
        before(YJIT):        55.2 i/s - 1.01x  slower
               after:        34.6 i/s - 1.61x  slower
              before:        33.8 i/s - 1.65x  slower

  • YJIT=ON : 1.01x - 1.05x faster
  • YJIT=OFF : 1.01x - 1.06x faster

@kou
Copy link
Member

kou commented Mar 6, 2024

Can we give it better name?
I feel that next_read doesn't imply "read if buffer is empty".

ensure_buffered?

## Why?

It would affect performance to do a read check in `IOSource#match` every time,
Separate read processing from IOSource#ensure_buffer.

Use `IOSource#ensure_buffer` in the following cases:

1. at the start of pull_event
2. after a trailing '>' pattern such as `@source.match(/\s*>/um`)

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.278      10.986        16.430       16.941 i/s -     100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s
                 sax     30.166      30.496        49.851       51.596 i/s -     100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s
                pull     35.459      36.380        60.266       63.134 i/s -     100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s
              stream     33.762      34.636        55.173       55.859 i/s -     100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s

Comparison:
                              dom
         after(YJIT):        16.9 i/s
        before(YJIT):        16.4 i/s - 1.03x  slower
               after:        11.0 i/s - 1.54x  slower
              before:        10.3 i/s - 1.65x  slower

                              sax
         after(YJIT):        51.6 i/s
        before(YJIT):        49.9 i/s - 1.04x  slower
               after:        30.5 i/s - 1.69x  slower
              before:        30.2 i/s - 1.71x  slower

                             pull
         after(YJIT):        63.1 i/s
        before(YJIT):        60.3 i/s - 1.05x  slower
               after:        36.4 i/s - 1.74x  slower
              before:        35.5 i/s - 1.78x  slower

                           stream
         after(YJIT):        55.9 i/s
        before(YJIT):        55.2 i/s - 1.01x  slower
               after:        34.6 i/s - 1.61x  slower
              before:        33.8 i/s - 1.65x  slower

```

- YJIT=ON : 1.01x - 1.05x faster
- YJIT=OFF : 1.01x - 1.06x faster
@naitoh naitoh changed the title Separate IOSource#next_read from IOSource#match. Separate IOSource#ensure_buffer from IOSource#match. Mar 6, 2024
@naitoh
Copy link
Contributor Author

naitoh commented Mar 6, 2024

@kou

Can we give it better name? I feel that next_read doesn't imply "read if buffer is empty".

ensure_buffered?

I changed it to ensure_buffer.
What do you think?

@kou
Copy link
Member

kou commented Mar 6, 2024

OK. Let's use it.

@kou kou merged commit 77cb0dc into ruby:master Mar 6, 2024
39 checks passed
@naitoh naitoh deleted the add_next_read branch March 6, 2024 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants