add auto_type_convert option for parser.rb #151
…log and so on. I have added an auto type conversion option to the parser. Converting types manually is not cool, because the conversion settings have to be updated whenever the log format changes.

#### configuration sample
```
<source>
  type tail
  format ltsv
  time_format %d/%b/%Y:%H:%M:%S %z
  path /var/log/httpd/access_log
  tag debug.apache.access
  auto_type_convert yes
</source>
```

#### sample results
The difference shows in status, size and response_time.
```
#auto_type_convert yes
2013-07-25 01:12:48 +0900 debug.apache.access: {"domain":"127.0.0.1","host":"127.0.0.1","server":"127.0.0.1","ident":"-","user":"-","method":"GET","path":"/","protocol":"HTTP/1.1","status":404,"size":198,"referer":"-","agent":"Mozilla","response_time":398,"cookie":"-","set_cookie":"-"}

#auto_type_convert no (or undefined)
2013-07-25 01:12:48 +0900 debug.apache.access: {"domain":"127.0.0.1","host":"127.0.0.1","server":"127.0.0.1","ident":"-","user":"-","method":"GET","path":"/","protocol":"HTTP/1.1","status":"404","size":"198","referer":"-","agent":"Mozilla","response_time":"398","cookie":"-","set_cookie":"-"}
```
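To make the effect of the option concrete, here is a minimal sketch of what "parse an LTSV line, then convert numeric-looking values" could look like. It is not the code from this pull request: `parse_ltsv`, the `auto_type_convert` local variable and the sample line are hypothetical, modeled on the fields in the sample results above, and the conversion uses the same `to_i`/`to_f` round-trip check discussed later in this thread.

```ruby
# Hypothetical sketch, not fluentd's parser: split one LTSV line into a hash.
# LTSV fields are separated by tabs; each field is "label:value", split on the first colon.
def parse_ltsv(line)
  line.chomp.split("\t").each_with_object({}) do |field, record|
    key, value = field.split(":", 2)
    record[key] = value
  end
end

# Convert a value only if it survives a round trip through to_i / to_f,
# so "404" becomes 404 but "127.0.0.1" and "-" stay strings.
def convert_type(record)
  record.each do |key, value|
    if value == (int_value = value.to_i).to_s
      record[key] = int_value
    elsif value == (float_value = value.to_f).to_s
      record[key] = float_value
    end
  end
end

auto_type_convert = true # stands in for the `auto_type_convert yes` setting
line = "host:127.0.0.1\tstatus:404\tsize:198\tresponse_time:398\treferer:-"
record = parse_ltsv(line)
record = convert_type(record) if auto_type_convert
p record
```

With the flag turned off, every value stays a String, matching the second sample result.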
I have benchmarked the type conversion speed, because parsing has to be as fast as possible.

benchmark result: it is twice as fast!
```
$ bundle exec ruby test.rb
      user     system      total        real
  0.000000   0.000000   0.000000 (  0.000033)
  0.000000   0.000000   0.000000 (  0.000016)
```

benchmark code:
```ruby
require 'benchmark'

def convert_type1(record)
  record.each do |index,value|
    if value == value.to_i.to_s
      record[index] = value.to_i
    elsif value == value.to_f.to_s
      record[index] = value.to_f
    end
  end
  return record
end

def convert_type2(record)
  record.each do |index,value|
    if (Integer(value) rescue false)
      record[index] = value.to_i
    elsif (Float(value) rescue false)
      record[index] = value.to_f
    end
  end
  return record
end

def convert_type3(record)
  record.each do |index,value|
    if (int_value = Integer(value) rescue false)
      record[index] = int_value
    elsif (float_value = Float(value) rescue false)
      record[index] = float_value
    end
  end
  return record
end

record = {"int"=>"123","float"=>"12.34"}
n = 1000000 # n is not used below: each report times a single call
Benchmark.bm do |x|
  # all three reports receive the same hash, which each call rewrites in place
  x.report { convert_type1(record) }
  x.report { convert_type2(record) }
  x.report { convert_type3(record) }
end
```
I have written a blog post about this pull request.
```diff
@@ -95,6 +95,7 @@ class ValuesParser
     config_param :keys, :string
     config_param :time_key, :string, :default => nil
     config_param :time_format, :string, :default => nil
+    config_param :auto_type_convert, :string, :default => nil
```
Why use the string type? For a yes/no or true/false value, it should use :bool instead of :string.
Oops, I've fixed it: y-ken@64e21a4
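The reason :bool matters is that a :string option keeps the raw text, and in Ruby every String, including "no", is truthy. The sketch below shows that pitfall next to a hypothetical `to_bool` helper that approximates what a boolean config type does; it is an illustration, not fluentd's own conversion code, and the exact set of accepted words is an assumption.

```ruby
# With a :string option, `if option` is true for "no" as well, because any String is truthy.
raw_option = "no"
string_says_enabled = raw_option ? true : false

# Hypothetical helper (assumption, not fluentd's API): map yes/no and true/false
# onto real booleans and reject anything else.
def to_bool(str)
  case str
  when "true", "yes" then true
  when "false", "no" then false
  else raise ArgumentError, "expected true/false/yes/no, got #{str.inspect}"
  end
end

puts "string option enabled? #{string_says_enabled}" # prints: string option enabled? true
puts "bool option enabled? #{to_bool(raw_option)}"   # prints: bool option enabled? false
```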
Maybe your benchmark is wrong.
```ruby
require 'benchmark'

def convert_type1(record)
  record.each do |key, value|
    if value == value.to_i.to_s
      record[key] = value.to_i
    elsif value == value.to_f.to_s
      record[key] = value.to_f
    end
  end
  record
end

def convert_type2(record)
  record.each do |key, value|
    if (Integer(value) rescue false)
      record[key] = value.to_i
    elsif (Float(value) rescue false)
      record[key] = value.to_f
    end
  end
  record
end

def convert_type3(record)
  record.each do |key, value|
    if (int_value = Integer(value) rescue false)
      record[key] = int_value
    elsif (float_value = Float(value) rescue false)
      record[key] = float_value
    end
  end
  record
end

record = {"int" => "123", "float" => "12.34", "str" => "13498734hoge", "msg" => "/path/to"}
n = 10000
Benchmark.bmbm do |x|
  # record.dup gives every call a fresh, unconverted copy of the input
  x.report { n.times { convert_type1(record.dup) } }
  x.report { n.times { convert_type2(record.dup) } }
  x.report { n.times { convert_type3(record.dup) } }
end
```
In my env, the result is below:
thank you.
```
$ ruby --version
ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin12.2.0]
$ bundle exec ruby test.rb
Rehearsal ------------------------------------
   0.200000   0.000000   0.200000 (  0.198400)
   1.530000   0.030000   1.560000 (  1.565692)
   1.530000   0.030000   1.560000 (  1.555802)
--------------------------- total: 3.320000sec

       user     system      total        real
   0.200000   0.000000   0.200000 (  0.197015)
   1.450000   0.030000   1.480000 (  1.481821)
   1.460000   0.030000   1.490000 (  1.493970)
```
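Why the first benchmark was misleading: the convert_type methods rewrite the hash they are given, and each `x.report` ran only once, on the same hash. The sketch below reuses convert_type1 from the corrected benchmark to show the in-place mutation and why `record.dup` keeps every iteration's input identical. `Benchmark.realtime` and the iteration count are choices made for this illustration only.

```ruby
require 'benchmark'

# convert_type1 as in the corrected benchmark: it rewrites values in the hash it receives.
def convert_type1(record)
  record.each do |key, value|
    if value == value.to_i.to_s
      record[key] = value.to_i
    elsif value == value.to_f.to_s
      record[key] = value.to_f
    end
  end
  record
end

# Calling it directly converts the shared hash, so later runs on it
# no longer measure string-to-number conversion.
shared = {"int" => "123", "float" => "12.34"}
convert_type1(shared)
p shared

# Passing a copy leaves the original strings untouched for the next iteration.
original = {"int" => "123", "float" => "12.34"}
elapsed = Benchmark.realtime { 10_000.times { convert_type1(original.dup) } }
p original
puts format("10_000 conversions took %.4f s", elapsed)
```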
With this test added, convert_type4 runs at the same speed as convert_type1 or slightly faster.
```ruby
def convert_type4(record)
  record.each do |index,value|
    # call to_i / to_f once and reuse the result instead of converting twice
    if value == (int_value = value.to_i).to_s
      record[index] = int_value
    elsif value == (float_value = value.to_f).to_s
      record[index] = float_value
    end
  end
  return record
end
```
benchmark code:
```ruby
require 'benchmark'

def convert_type1(record)
  record.each do |index,value|
    if value == value.to_i.to_s
      record[index] = value.to_i
    elsif value == value.to_f.to_s
      record[index] = value.to_f
    end
  end
  return record
end

def convert_type2(record)
  record.each do |index,value|
    if (Integer(value) rescue false)
      record[index] = value.to_i
    elsif (Float(value) rescue false)
      record[index] = value.to_f
    end
  end
  return record
end

def convert_type3(record)
  record.each do |index,value|
    if (int_value = Integer(value) rescue false)
      record[index] = int_value
    elsif (float_value = Float(value) rescue false)
      record[index] = float_value
    end
  end
  return record
end

def convert_type4(record)
  record.each do |index,value|
    if value == (int_value = value.to_i).to_s
      record[index] = int_value
    elsif value == (float_value = value.to_f).to_s
      record[index] = float_value
    end
  end
  return record
end

record = {"int" => "123", "float" => "12.34", "str" => "13498734hoge", "msg" => "/path/to"}
n = 10000
Benchmark.bmbm do |x|
  x.report { n.times { convert_type1(record.dup) } }
  x.report { n.times { convert_type2(record.dup) } }
  x.report { n.times { convert_type3(record.dup) } }
  x.report { n.times { convert_type4(record.dup) } }
end
```
I have updated the code and also confirmed that rake test passes: y-ken@7313cdf
Speed up the convert_type implementation. My previous benchmark was mistaken.
```diff
@@ -126,6 +128,17 @@ def values_map(values)

       return time, record
     end

+    def convert_type(record)
+      record.each do |index,value|
```
Variable name `index` is wrong. It's not an index. `field`? `name`?
I quite agree. It should use `field`, `name` or `key`.
Thank you @tagomoris. I quite agree. It should use …
It may result in unexpected conversions, so it should only be used in non-mission-critical situations. Would you please give me your opinion on this issue?
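The "unexpected conversion" concern can be made concrete by feeding a few edge-case values to the two checks benchmarked in this thread. In the sketch below, `round_trip_convert` mirrors the `to_i`/`to_f` round trip of convert_type1/convert_type4, and `strict_convert` mirrors the `Kernel#Integer`/`Kernel#Float` check of convert_type2/convert_type3; both names and the sample values are hypothetical.

```ruby
# Round-trip check (convert_type1 / convert_type4 style):
# accept a number only if converting it back to a String gives the same text.
def round_trip_convert(value)
  if value == (int_value = value.to_i).to_s
    int_value
  elsif value == (float_value = value.to_f).to_s
    float_value
  else
    value
  end
end

# Kernel#Integer / Kernel#Float check (convert_type2 / convert_type3 style):
# accept whatever those methods accept, including radix prefixes and exponents.
def strict_convert(value)
  begin
    return Integer(value)
  rescue ArgumentError
    # not an integer literal; try Float next
  end
  begin
    return Float(value)
  rescue ArgumentError
    # not a float literal either; keep the original String
  end
  value
end

samples = ["123", "12.34", "12.30", "010", "1e3", "13498734hoge"]
samples.each do |s|
  puts "#{s.inspect}: round_trip=#{round_trip_convert(s).inspect} strict=#{strict_convert(s).inspect}"
end
```

The two approaches disagree on exactly the inputs a log might contain: "010" stays a String under the round trip but becomes 8 under `Kernel#Integer`, because a leading zero is read as an octal prefix; "12.30" and "1e3" stay Strings under the round trip but become 12.3 and 1000.0 under the strict check.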
Auto casting is one kind of type conversion. Example:

What do you think?

Now the parser has …
Thank you! |
#### summary
We have a problem analyzing int/float columns in TreasureData or MongoDB when the records were parsed from a file by in_tail.
I expect `123` to be an integer and `123.45` to be a float value, but in_tail makes them strings.

With fluent-plugin-typecast, the types can be converted manually. But converting (casting) types by hand is not cool: the settings have to be updated whenever the log format changes, which makes maintenance costly.

Thus, I have added an auto type conversion option to parser.rb.

#### configuration sample

#### sample results
The difference shows in status, size and response_time.