parser_tsv: Improve parse performance and reduce memory allocation#5344

Merged
kenhys merged 1 commit into
fluent:masterfrom
Watson1978:parser_tsv
Apr 30, 2026
Conversation

@Watson1978
Contributor

@Watson1978 Watson1978 commented Apr 30, 2026

Which issue(s) this PR fixes:
Fixes #

What this PR does / why we need it:
This commit replaces Hash[@keys.zip(values)] with a while loop in TSVParser#parse.

The previous implementation built the record with Array#zip, which creates several short-lived intermediate arrays per line (the outer array plus one pair array per field) and causes significant GC pressure when processing a large volume of logs.
A simple while loop with direct hash assignment eliminates these intermediate allocations entirely.

Furthermore, because Array#zip is implemented natively in C, it leaves little room for Ruby's JIT compilers to optimize it, whereas a plain while loop is ordinary Ruby code that the JIT can compile.
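For readers who want to see the two approaches side by side, here is a minimal sketch. The helper names `build_record_with_zip` and `build_record_with_while` and the sample keys and line are hypothetical and are not taken from the Fluentd source; only the `Hash[keys.zip(values)]` expression and the shape of the `while` loop come from this PR.

```ruby
# Hypothetical helpers for illustration only; not the actual TSVParser code.

# Previous approach: zip allocates an outer array plus one [key, value]
# pair array per field before Hash[] copies them into the hash.
def build_record_with_zip(keys, values)
  Hash[keys.zip(values)]
end

# New approach: assign directly into the hash, so the hash itself is the
# only container that gets allocated.
def build_record_with_while(keys, values)
  record = {}
  i = 0
  len = keys.length
  while i < len
    record[keys[i]] = values[i]
    i += 1
  end
  record
end

keys = %w[time host message]                      # assumed sample keys
values = "2026-04-30\tweb01\thello".split("\t")  # assumed sample TSV line

p build_record_with_zip(keys, values)
p build_record_with_while(keys, values)
```

Both helpers iterate over `keys`, so a line with fewer values than keys yields `nil` for the missing fields in either version, and extra values are ignored in both.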

Micro benchmark

require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'benchmark-memory'
end

@values = Array.new(9) { |i| i.to_s * 20 }
@keys = ("k1".."k9").to_a

def benchmarks(x)
  x.report('zip') do
    Hash[@keys.zip(@values)]
  end

  x.report('each_with_index') do
    r = {}
    @keys.each_with_index do |k, i|
      r[k] = @values[i]
    end
  end

  x.report('while') do
    r = {}
    i = 0
    len = @keys.length
    while i < len
      r[@keys[i]] = @values[i]
      i += 1
    end
  end
end

Benchmark.ips do |x|
  benchmarks(x)
  x.compare!
end

Benchmark.memory do |x|
  benchmarks(x)
end
  • Result (no JIT)
$ ruby zip.rb
ruby 4.0.3 (2026-04-21 revision 85ddef263a) +PRISM [x86_64-linux]
Warming up --------------------------------------
                 zip   130.961k i/100ms
     each_with_index   121.090k i/100ms
               while   163.611k i/100ms
Calculating -------------------------------------
                 zip      1.315M (± 0.2%) i/s  (760.72 ns/i) -      6.679M in   5.080866s
     each_with_index      1.250M (± 0.6%) i/s  (799.97 ns/i) -      6.297M in   5.037284s
               while      1.655M (± 0.4%) i/s  (604.23 ns/i) -      8.344M in   5.041905s

Comparison:
               while:  1654988.8 i/s
                 zip:  1314549.8 i/s - 1.26x  slower
     each_with_index:  1250053.2 i/s - 1.32x  slower

Calculating -------------------------------------
                 zip     1.384k memsize (     0.000  retained)
                        19.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
     each_with_index   824.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
               while   824.000  memsize (   824.000  retained)
                         8.000  objects (     8.000  retained)
                         7.000  strings (     7.000  retained)
  • Result (JIT)
$ ruby --jit zip.rb
ruby 4.0.3 (2026-04-21 revision 85ddef263a) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
                 zip   132.431k i/100ms
     each_with_index   144.321k i/100ms
               while   222.340k i/100ms
Calculating -------------------------------------
                 zip      1.247M (± 1.8%) i/s  (801.86 ns/i) -      6.357M in   5.098935s
     each_with_index      1.502M (± 0.8%) i/s  (665.56 ns/i) -      7.649M in   5.091197s
               while      2.272M (± 0.9%) i/s  (440.10 ns/i) -     11.562M in   5.088800s

Comparison:
               while:  2272192.3 i/s
     each_with_index:  1502491.1 i/s - 1.51x  slower
                 zip:  1247100.3 i/s - 1.82x  slower

Calculating -------------------------------------
                 zip     1.384k memsize (     0.000  retained)
                        19.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
     each_with_index   824.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
               while   824.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)

Docs Changes:
N/A

Release Note:

  • parser_tsv: improve parse performance and reduce memory allocation

Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
@Watson1978 Watson1978 added this to the v1.20.0 milestone Apr 30, 2026
@Watson1978 Watson1978 marked this pull request as ready for review April 30, 2026 04:51
@Watson1978 Watson1978 requested a review from kenhys April 30, 2026 04:51
Contributor

@kenhys kenhys left a comment

LGTM.

@kenhys kenhys merged commit 07a1480 into fluent:master Apr 30, 2026
36 of 37 checks passed
@Watson1978
Contributor Author

Thanks

@Watson1978 Watson1978 deleted the parser_tsv branch April 30, 2026 05:55
Watson1978 added a commit that referenced this pull request May 11, 2026
…5354)

**Which issue(s) this PR fixes**: 
Fixes #

**What this PR does / why we need it**: 
This commit replaces `Hash[@keys.zip(values)]` with a `while` loop in
`CSVParser#parse`.

This is the same change as #5344.

**Docs Changes**:

**Release Note**:

Signed-off-by: Shizuo Fujita <fujita@clear-code.com>