Use `@scanner << readline` instead of `@scanner.string = @scanner.rest + readline` (#107)

## Why

The OutOfMemoryError raised by JRuby's `StringScanner#<<` and
`StringScanner#scan` has been resolved in strscan gem 3.0.9.

ruby/strscan#83
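
As a rough illustration of why the two forms are interchangeable for parsing, the sketch below feeds the same input through both refill strategies and collects the scanned tags. The `chunks` array stands in for successive `readline` results, and `scan_tags` is a hypothetical helper, not REXML's actual code.

```ruby
require "strscan"

# Hypothetical input split mid-tag, standing in for successive `readline` calls.
chunks = ["<root><chi", "ld/></ro", "ot>"]

# Hypothetical helper: refill the scanner via the given block, then
# scan every complete tag currently available in the buffer.
def scan_tags(chunks)
  scanner = StringScanner.new(+"")
  tags = []
  chunks.each do |chunk|
    yield scanner, chunk
    while (tag = scanner.scan(/<[^>]*>/))
      tags << tag
    end
  end
  tags
end

# Old form: rebuild the buffer from the unscanned remainder plus the new chunk.
reassigned = scan_tags(chunks) { |scanner, chunk| scanner.string = scanner.rest + chunk }

# New form: append the new chunk to the scanner's existing buffer.
appended = scan_tags(chunks) { |scanner, chunk| scanner << chunk }

puts reassigned == appended
```

Both strategies yield the same tags. The difference is that the old form allocates a new buffer string on every refill, while the new form appends to the existing one.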

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.958      11.044        16.615       16.783 i/s -     100.000 times in 9.126104s 9.055023s 6.018799s 5.958437s
                 sax     29.624      29.609        44.390       45.370 i/s -     100.000 times in 3.375641s 3.377372s 2.252774s 2.204080s
                pull     33.868      34.695        51.173       53.492 i/s -     100.000 times in 2.952679s 2.882229s 1.954138s 1.869422s
              stream     31.719      32.351        43.604       45.403 i/s -     100.000 times in 3.152713s 3.091052s 2.293356s 2.202514s

Comparison:
                              dom
         after(YJIT):        16.8 i/s
        before(YJIT):        16.6 i/s - 1.01x  slower
               after:        11.0 i/s - 1.52x  slower
              before:        11.0 i/s - 1.53x  slower

                              sax
         after(YJIT):        45.4 i/s
        before(YJIT):        44.4 i/s - 1.02x  slower
              before:        29.6 i/s - 1.53x  slower
               after:        29.6 i/s - 1.53x  slower

                             pull
         after(YJIT):        53.5 i/s
        before(YJIT):        51.2 i/s - 1.05x  slower
               after:        34.7 i/s - 1.54x  slower
              before:        33.9 i/s - 1.58x  slower

                           stream
         after(YJIT):        45.4 i/s
        before(YJIT):        43.6 i/s - 1.04x  slower
               after:        32.4 i/s - 1.40x  slower
              before:        31.7 i/s - 1.43x  slower

```

- YJIT=ON : 1.01x - 1.05x faster
- YJIT=OFF : 1.00x - 1.02x faster
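
The ranges above can be re-derived from the table. This is a minimal sketch that copies the i/s figures from the benchmark output and divides `after` by `before`, rounding to two decimals; the variable names and hash layout are illustrative only.

```ruby
# i/s figures copied from the benchmark table: [before, after] per parser.
results = {
  "YJIT=OFF" => { dom: [10.958, 11.044], sax: [29.624, 29.609],
                  pull: [33.868, 34.695], stream: [31.719, 32.351] },
  "YJIT=ON"  => { dom: [16.615, 16.783], sax: [44.390, 45.370],
                  pull: [51.173, 53.492], stream: [43.604, 45.403] },
}

# Speedup of `after` relative to `before`, rounded to two decimals.
speedups = results.transform_values do |parsers|
  parsers.transform_values { |(before, after)| (after / before).round(2) }
end

speedups.each do |mode, ratios|
  low, high = ratios.values.minmax
  puts "#{mode}: #{low}x - #{high}x faster"
end
```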
naitoh committed Jan 21, 2024
1 parent 83ca5c4 commit 7712855
Showing 3 changed files with 4 additions and 8 deletions.
4 changes: 2 additions & 2 deletions benchmark/parse.yaml
@@ -6,7 +6,7 @@ contexts:
     prelude: require 'rexml'
   - name: master
     gems:
-      strscan: 3.0.8
+      strscan: 3.0.9
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
       require 'rexml'
@@ -19,7 +19,7 @@ contexts:
       RubyVM::YJIT.enable
   - name: master(YJIT)
     gems:
-      strscan: 3.0.8
+      strscan: 3.0.9
     prelude: |
       $LOAD_PATH.unshift(File.expand_path("lib"))
       require 'rexml'
6 changes: 1 addition & 5 deletions lib/rexml/source.rb
@@ -149,11 +149,7 @@ def initialize(arg, block_size=500, encoding=nil)

     def read
       begin
-        # NOTE: `@scanner << readline` does not free memory, so when parsing huge XML in JRuby's DOM,
-        # out-of-memory error `Java::JavaLang::OutOfMemoryError: Java heap space` occurs.
-        # `@scanner.string = @scanner.rest + readline` frees memory that is already consumed
-        # and avoids this problem.
-        @scanner.string = @scanner.rest + readline
+        @scanner << readline
       rescue Exception, NameError
         @source = nil
       end
2 changes: 1 addition & 1 deletion rexml.gemspec
@@ -55,7 +55,7 @@ Gem::Specification.new do |spec|

   spec.required_ruby_version = '>= 2.5.0'

-  spec.add_runtime_dependency("strscan", ">= 3.0.8")
+  spec.add_runtime_dependency("strscan", ">= 3.0.9")

   spec.add_development_dependency "benchmark_driver"
   spec.add_development_dependency "bundler"