parser_tsv: Improve parse performance and reduce memory allocation by Watson1978 · Pull Request #5344 · fluent/fluentd

Watson1978 · 2026-04-30T04:29:47Z

Which issue(s) this PR fixes:
Fixes #

What this PR does / why we need it:
This commit replaces Hash[@keys.zip(values)] with a while loop in TSVParser#parse.
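
As a rough illustration of the change described above, the two idioms can be placed side by side. This is a minimal sketch, not the actual `TSVParser#parse` code: the `KEYS` constant, the method names, and the sample line are all hypothetical.

```ruby
# Hypothetical stand-in for the parser's configured keys.
KEYS = %w[time host message].freeze

# Previous idiom: zip builds an array of [key, value] pairs,
# then Hash[] copies those pairs into a new hash.
def build_with_zip(keys, values)
  Hash[keys.zip(values)]
end

# Replacement idiom: index both arrays and assign directly into the hash.
def build_with_while(keys, values)
  record = {}
  i = 0
  len = keys.length
  while i < len
    record[keys[i]] = values[i]
    i += 1
  end
  record
end

# Hypothetical input line, split on tabs.
values = "2026-04-30\tweb01\thello".split("\t")
p build_with_while(KEYS, values)
```

Both forms produce the same hash, including when the line has fewer fields than keys (the missing values become `nil`) or more fields than keys (the extras are ignored).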

The previous implementation, using Array#zip, created multiple short-lived intermediate arrays per line, which caused significant GC pressure when processing large volumes of logs.
By using a simple while loop and direct hash assignment, we can completely eliminate these intermediate object allocations.
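
One way to observe the allocation difference without extra gems is to compare `GC.stat(:total_allocated_objects)` before and after repeated calls. The following is a minimal sketch: the helper name `allocations_per_call` is hypothetical, the inputs mirror the micro benchmark, and the exact counts will likely vary by Ruby version.

```ruby
# Inputs shaped like the micro benchmark's keys and values.
keys = ("k1".."k9").to_a
values = Array.new(9) { |i| i.to_s * 20 }

# Hypothetical helper: average number of objects allocated per block call.
def allocations_per_call(iterations = 1_000)
  before = GC.stat(:total_allocated_objects)
  iterations.times { yield }
  after = GC.stat(:total_allocated_objects)
  (after - before) / iterations.to_f
end

zip_allocs = allocations_per_call { Hash[keys.zip(values)] }

while_allocs = allocations_per_call do
  record = {}
  i = 0
  len = keys.length
  while i < len
    record[keys[i]] = values[i]
    i += 1
  end
end

puts "zip:   #{zip_allocs} objects/call"
puts "while: #{while_allocs} objects/call"
```

The `zip` variant should report more objects per call, since it allocates one outer array plus one two-element array per key before the hash is built.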

Furthermore, because Array#zip is implemented natively in C, it leaves little room for Ruby's JIT compilers to optimize it, whereas a plain Ruby while loop can be compiled by the JIT.

Micro benchmark

require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'benchmark-memory'
end

@values = Array.new(9) { |i| i.to_s * 20 }
@keys = ("k1".."k9").to_a

def benchmarks(x)
  x.report('zip') do
    Hash[@keys.zip(@values)]
  end

  x.report('each_with_index') do
    r = {}
    @keys.each_with_index do |k, i|
      r[k] = @values[i]
    end
  end

  x.report('while') do
    r = {}
    i = 0
    len = @keys.length
    while i < len
      r[@keys[i]] = @values[i]
      i += 1
    end
  end
end

Benchmark.ips do |x|
  benchmarks(x)
  x.compare!
end

Benchmark.memory do |x|
  benchmarks(x)
end

Result (no JIT)

$ ruby zip.rb
ruby 4.0.3 (2026-04-21 revision 85ddef263a) +PRISM [x86_64-linux]
Warming up --------------------------------------
                 zip   130.961k i/100ms
     each_with_index   121.090k i/100ms
               while   163.611k i/100ms
Calculating -------------------------------------
                 zip      1.315M (± 0.2%) i/s  (760.72 ns/i) -      6.679M in   5.080866s
     each_with_index      1.250M (± 0.6%) i/s  (799.97 ns/i) -      6.297M in   5.037284s
               while      1.655M (± 0.4%) i/s  (604.23 ns/i) -      8.344M in   5.041905s

Comparison:
               while:  1654988.8 i/s
                 zip:  1314549.8 i/s - 1.26x  slower
     each_with_index:  1250053.2 i/s - 1.32x  slower

Calculating -------------------------------------
                 zip     1.384k memsize (     0.000  retained)
                        19.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
     each_with_index   824.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
               while   824.000  memsize (   824.000  retained)
                         8.000  objects (     8.000  retained)
                         7.000  strings (     7.000  retained)

Result (JIT)

$ ruby --jit zip.rb
ruby 4.0.3 (2026-04-21 revision 85ddef263a) +YJIT +PRISM [x86_64-linux]
Warming up --------------------------------------
                 zip   132.431k i/100ms
     each_with_index   144.321k i/100ms
               while   222.340k i/100ms
Calculating -------------------------------------
                 zip      1.247M (± 1.8%) i/s  (801.86 ns/i) -      6.357M in   5.098935s
     each_with_index      1.502M (± 0.8%) i/s  (665.56 ns/i) -      7.649M in   5.091197s
               while      2.272M (± 0.9%) i/s  (440.10 ns/i) -     11.562M in   5.088800s

Comparison:
               while:  2272192.3 i/s
     each_with_index:  1502491.1 i/s - 1.51x  slower
                 zip:  1247100.3 i/s - 1.82x  slower

Calculating -------------------------------------
                 zip     1.384k memsize (     0.000  retained)
                        19.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
     each_with_index   824.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
               while   824.000  memsize (     0.000  retained)
                         8.000  objects (     0.000  retained)
                         7.000  strings (     0.000  retained)
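
The "x slower" figures and the memory savings can be cross-checked from the raw numbers reported above. This is a small sketch with hypothetical constant and method names; reading `1.384k` as 1,384 bytes is an assumption about how benchmark-memory formats its output.

```ruby
# Iterations per second, copied from the two "Comparison" sections.
NO_JIT = { "while" => 1_654_988.8, "zip" => 1_314_549.8, "each_with_index" => 1_250_053.2 }.freeze
JIT    = { "while" => 2_272_192.3, "each_with_index" => 1_502_491.1, "zip" => 1_247_100.3 }.freeze

# Slowdown of each variant relative to the fastest one, rounded as in the report.
def slowdowns(results)
  fastest = results.values.max
  results.transform_values { |ips| (fastest / ips).round(2) }
end

# Per-call memory figures from the memory report (bytes assumed, see above).
ZIP_BYTES     = 1_384
WHILE_BYTES   = 824
ZIP_OBJECTS   = 19
WHILE_OBJECTS = 8

p slowdowns(NO_JIT)
p slowdowns(JIT)
puts "bytes saved per call:   #{ZIP_BYTES - WHILE_BYTES}"
puts "objects saved per call: #{ZIP_OBJECTS - WHILE_OBJECTS}"
```

The computed ratios match the reported 1.26x and 1.32x without JIT, and 1.51x and 1.82x with YJIT.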

Docs Changes:
N/A

Release Note:

parser_tsv: improve parse performance and reduce memory allocation

Signed-off-by: Shizuo Fujita <fujita@clear-code.com>

kenhys

LGTM.

Watson1978 · 2026-04-30T05:55:08Z

Thanks

…5354)

**Which issue(s) this PR fixes**: Fixes #

**What this PR does / why we need it**: This commit replaces `Hash[@keys.zip(values)]` with a `while` loop in `CSVParser#parse`. This is the same change as #5344.

**Docs Changes**:

**Release Note**:

Signed-off-by: Shizuo Fujita <fujita@clear-code.com>

parser_tsv: Improve parse performance and reduce memory allocation

07c455a

Signed-off-by: Shizuo Fujita <fujita@clear-code.com>

Watson1978 added this to the v1.20.0 milestone Apr 30, 2026

Watson1978 marked this pull request as ready for review April 30, 2026 04:51

Watson1978 requested a review from kenhys April 30, 2026 04:51

kenhys approved these changes Apr 30, 2026

kenhys merged commit 07a1480 into fluent:master Apr 30, 2026
36 of 37 checks passed

Watson1978 deleted the parser_tsv branch April 30, 2026 05:55

Watson1978 mentioned this pull request May 9, 2026

parser_csv: improve parse performance and reduce memory allocation #5354

Merged
