From 6cc2606438a46e6c3032805927882b1df86e8273 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Fri, 31 Jul 2020 18:33:33 +0900
Subject: [PATCH] buffer: Do more precise timekey optimization handling.

Fix #3088

Object#object_id returns values according to the following rules:

```
On Windows:
irb(main):140:0> RUBY_VERSION
=> "2.7.1"
irb(main):141:0> a = 2**30 - 1
irb(main):142:0> a.object_id
=> 2147483647
irb(main):143:0> a = 2**30 - 1
irb(main):144:0> a.object_id
=> 2147483647
irb(main):145:0> a = 2**30
irb(main):146:0> a.object_id
=> 640
irb(main):147:0> a = 2**30
irb(main):148:0> a.object_id
=> 660
```

On Windows, object_id is stable for integers less than or equal to `2^30 - 1`, so only those can be used with #hash.

---

```
On GNU/Linux x86_64:
irb(main):001:0> RUBY_VERSION
=> "2.7.0"
irb(main):002:0> a = 2**30 - 1
irb(main):003:0> a.object_id
=> 2147483647
irb(main):004:0> a = 2**30 - 1
irb(main):005:0> a.object_id
=> 2147483647
irb(main):006:0> a = 2**30
irb(main):007:0> a.object_id
=> 2147483649
irb(main):008:0> a = 2**30
irb(main):009:0> a.object_id
=> 2147483649
irb(main):010:0> a = 2**62 - 1
irb(main):011:0> a.object_id
=> 9223372036854775807
irb(main):012:0> a = 2**62 - 1
irb(main):013:0> a.object_id
=> 9223372036854775807
irb(main):014:0> a = 2**62
irb(main):015:0> a.object_id
=> 180
irb(main):016:0> a = 2**62
irb(main):017:0> a.object_id
=> 200
```

On GNU/Linux x86_64, object_id is stable for integers less than or equal to `2^62 - 1`, so those can be used with #hash.
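To illustrate why an unstable object_id breaks the #hash optimization, here is a minimal sketch. `PlainKey`, `FastKey`, and `count_chunks` are hypothetical names, not Fluentd's actual `Metadata` class; `FastKey` only mimics the `timekey.object_id` hash. The sketch uses a timekey of `2**64`, which is outside the stable range on every platform above, to reproduce on any machine what happens to a real timekey on Windows: two equal timekeys land in different Hash entries, i.e. separate chunks.

```ruby
# Hypothetical stand-ins for a buffer metadata key (not Fluentd's Metadata).
# PlainKey keeps Struct's default, value-based #hash.
PlainKey = Struct.new(:timekey)

# FastKey mimics the optimization: it hashes by timekey.object_id.
FastKey = Struct.new(:timekey) do
  def hash
    timekey.object_id
  end
end

# Inserts two keys whose timekeys are equal but computed separately,
# and returns how many distinct Hash entries (i.e. "chunks") result.
# base**64 is outside the immediate (Fixnum) range on every platform,
# so each computation allocates a new Integer object with its own object_id.
def count_chunks(key_class, base = 2)
  chunks = {}
  2.times { chunks[key_class.new(base**64)] = true }
  chunks.size
end

puts "value-based #hash:     #{count_chunks(PlainKey)} chunk(s)"
puts "object_id-based #hash: #{count_chunks(FastKey)} chunk(s)"
```

With the default value-based #hash, both insertions share one entry; with the object_id-based #hash, they create two, which is the "flood of buffer files" described in the diff below.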
---

```
On GNU/Linux aarch64:
irb(main):001:0> RUBY_VERSION
=> "2.7.1"
irb(main):002:0> a = 2**30 - 1
irb(main):003:0> a.object_id
=> 2147483647
irb(main):004:0> a = 2**30 - 1
irb(main):005:0> a.object_id
=> 2147483647
irb(main):006:0> a = 2**30
irb(main):007:0> a.object_id
=> 2147483649
irb(main):008:0> a = 2**30
irb(main):009:0> a.object_id
=> 2147483649
irb(main):010:0> a = 2**62 -1
irb(main):011:0> a.object_id
=> 9223372036854775807
irb(main):012:0> a = 2**62 -1
irb(main):013:0> a.object_id
=> 9223372036854775807
irb(main):014:0> a = 2**62
irb(main):015:0> a.object_id
=> 180
irb(main):016:0> a = 2**62
irb(main):017:0> a.object_id
=> 200
```

On GNU/Linux aarch64, object_id is stable for integers less than or equal to `2^62 - 1`, so those can be used with #hash.

---

```
On GNU/Linux armv7l:
irb(main):001:0> RUBY_VERSION
=> "2.6.6"
irb(main):002:0> a = 2**30 -1
=> 1073741823
irb(main):003:0> a.object_id
=> 2147483647
irb(main):004:0> a = 2**30 -1
=> 1073741823
irb(main):005:0> a.object_id
=> 2147483647
irb(main):006:0> a = 2**30
=> 1073741824
irb(main):007:0> a.object_id
=> -209995496
irb(main):008:0> a = 2**30
=> 1073741824
irb(main):009:0> a.object_id
=> -210001856
irb(main):010:0> a = 2**62 -1
=> 4611686018427387903
irb(main):011:0> a.object_id
=> -209951576
irb(main):012:0> a = 2**62 -1
=> 4611686018427387903
irb(main):013:0> a.object_id
=> -209925764
irb(main):014:0> a = 2**62
=> 4611686018427387904
irb(main):015:0> a.object_id
=> -209907800
irb(main):016:0> a = 2**62
=> 4611686018427387904
irb(main):017:0> a.object_id
=> -209891900
```

On GNU/Linux armv7l, object_id is stable only for integers less than or equal to `2^30 - 1`, so only those can be used with #hash.

Current Unix timestamps are already larger than `2^30 - 1`:

irb> Time.parse("2020/07/31 18:30:00+09:00").to_i > 2**30 - 1
=> true

So on Windows and armv7l, a timekey's object_id changes every time the timekey is computed, and the optimization is only safe where integers up to `2^62 - 1` have a stable object_id. We should therefore check whether the #hash optimization is valid with the following method:

```ruby
# Returns true only when integers up to 2**62 - 1 have a stable object_id
# (per the sessions above: GNU/Linux x86_64 and aarch64, not Windows or armv7l).
def self.enable_optimize?
  a1 = 2**30 - 1
  a2 = 2**30 - 1
  b1 = 2**62 - 1
  b2 = 2**62 - 1
  (a1.object_id == a2.object_id) && (b1.object_id == b2.object_id)
end
```

Signed-off-by: Hiroshi Hatake
---
 lib/fluent/plugin/buffer.rb | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/lib/fluent/plugin/buffer.rb b/lib/fluent/plugin/buffer.rb
index 5b08e19444..c908fc61e8 100644
--- a/lib/fluent/plugin/buffer.rb
+++ b/lib/fluent/plugin/buffer.rb
@@ -143,17 +143,33 @@ def <=>(o)
           end
         end
 
+        # timekey is normally a unixtime.
+        # Current unixtime values are bigger than 2^30 - 1 (= 1073741823).
+        # object_id stability must be checked before object_id is used to optimize comparisons.
+        # e.g.)
+        # irb> Time.parse("2020/07/31 18:30:00+09:00").to_i
+        # => 1596187800
+        # irb> Time.parse("2020/07/31 18:30:00+09:00").to_i > 2**30 - 1
+        # => true
+        def self.enable_optimize?
+          a1 = 2**30 - 1
+          a2 = 2**30 - 1
+          b1 = 2**62 - 1
+          b2 = 2**62 - 1
+          (a1.object_id == a2.object_id) && (b1.object_id == b2.object_id)
+        end
+
         # This is an optimization code. Current Struct's implementation is comparing all data.
         # https://github.com/ruby/ruby/blob/0623e2b7cc621b1733a760b72af246b06c30cf96/struct.c#L1200-L1203
         # Actually this overhead is very small but this class is generated *per chunk* (and used in hash object).
         # This means that this class is one of the most called object in Fluentd.
         # See https://github.com/fluent/fluentd/pull/2560
-        # But, this optimization has a side effect on Windows due to differing object_id.
+        # But this optimization has a side effect on Windows and 32-bit environments, where object_id is not stable.
         # This difference causes flood of buffer files.
-        # So, this optimization should be enabled on non-Windows platform.
+        # So, this optimization is enabled only on platforms where `enable_optimize?` returns true.
         def hash
           timekey.object_id
-        end unless Fluent.windows?
+        end if enable_optimize?
       end
 
       # for tests