Force encoding to UTF-8 when concatenating sources to generate fingerprint (#64)

To get around jruby/jruby#6748, force the encoding to UTF-8.
That issue causes the plugin to crash when concatenating sources for events
that contain arrays of hashes with non-US-ASCII characters.

Example event that would previously cause an issue:
{"top_level":"ง","inner":[{"inner_key":"ง"},{"1":"2"}]}

This commit forces the encoding of intermediate results to UTF-8 during
concatenation to work around the issue.
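
The workaround pattern can be sketched outside the plugin as follows. This is a minimal illustration, not the plugin's code: `concatenate_fields` is a hypothetical helper, and the sketch does not reproduce the JRuby bug itself; it only shows each interpolated fragment being tagged as UTF-8 before it is appended to the buffer.

```ruby
require "digest"

# Hypothetical helper (not part of the plugin): builds "|key|value" fragments
# in sorted key order and forces each fragment to UTF-8 before appending.
def concatenate_fields(fields)
  buffer = +""
  fields.sort_by { |key, _value| key }.each do |key, value|
    buffer << "|#{key}|#{value}".force_encoding("UTF-8")
  end
  buffer << "|"
end

# The example event from the commit message.
event = { "top_level" => "ง", "inner" => [{ "inner_key" => "ง" }, { "1" => "2" }] }

joined = concatenate_fields(event)
digest = Digest::SHA1.hexdigest(joined)

puts joined.encoding
puts digest
```

The exact text of `joined` depends on how the Ruby version in use renders nested arrays and hashes, so the digest printed here is not expected to match the values in the plugin's specs.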

Fixes #63
robbavey committed Jul 8, 2021
1 parent 0b4fcc0 commit 61e21c8
Showing 4 changed files with 25 additions and 3 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,6 @@
+## 3.3.1
+- Force encoding to UTF-8 when concatenating sources to generate fingerprint [#64](https://github.com/logstash-plugins/logstash-filter-fingerprint/pull/64)
+
 ## 3.3.0
 - Add ECS compatibility [#62](https://github.com/logstash-plugins/logstash-filter-fingerprint/pull/62)

6 changes: 4 additions & 2 deletions lib/logstash/filters/fingerprint.rb
@@ -128,11 +128,13 @@ def filter(event)
 to_string = ""
 if @concatenate_all_fields
   deep_sort_hashes(event.to_hash).each do |k,v|
-    to_string << "|#{k}|#{v}"
+    # Force encoding to UTF-8 to get around https://github.com/jruby/jruby/issues/6748
+    to_string << "|#{k}|#{v}".force_encoding("UTF-8")
   end
 else
   @source.sort.each do |k|
-    to_string << "|#{k}|#{deep_sort_hashes(event.get(k))}"
+    # Force encoding to UTF-8 to get around https://github.com/jruby/jruby/issues/6748
+    to_string << "|#{k}|#{deep_sort_hashes(event.get(k))}".force_encoding("UTF-8")
   end
 end
 to_string << "|"
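
For readers unfamiliar with `String#force_encoding`: it changes only the encoding tag on a string and leaves the bytes untouched, unlike `String#encode`, which transcodes. A small sketch, assuming a byte string that is already valid UTF-8 but tagged as binary (the variable names are illustrative only):

```ruby
# Bytes of "ง" (U+0E07) in UTF-8, initially tagged as binary (ASCII-8BIT).
raw = "\xE0\xB8\x87".b

# force_encoding relabels the same bytes as UTF-8; nothing is converted.
relabelled = raw.dup.force_encoding("UTF-8")

puts raw.encoding
puts relabelled.encoding
puts raw.bytes == relabelled.bytes
puts relabelled.valid_encoding?
```

Because no bytes change, relabelling is only safe when the data really is UTF-8, which is the assumption this commit makes about the interpolated fragments.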
2 changes: 1 addition & 1 deletion logstash-filter-fingerprint.gemspec
@@ -1,7 +1,7 @@
 Gem::Specification.new do |s|

   s.name = 'logstash-filter-fingerprint'
-  s.version = '3.3.0'
+  s.version = '3.3.1'
   s.licenses = ['Apache-2.0']
   s.summary = "Fingerprints fields by replacing values with a consistent hash"
   s.description = "This gem is a Logstash plugin required to be installed on top of the Logstash core pipeline using $LS_HOME/bin/logstash-plugin install gemname. This gem is not a stand-alone program"
17 changes: 17 additions & 0 deletions spec/filters/fingerprint_spec.rb
@@ -192,6 +192,23 @@
     end
   end

+  context "when utf-8 chars used" do
+    let(:config) { super().merge("source" => ['field1', 'field2']) }
+    let(:data) { {"field1"=>[{"inner_key"=>"🂡"}, {"1"=>"2"}], "field2"=>"🂡"} }
+    it "fingerprints the value of the last value" do
+      # SHA1 of "|field1|inner_key|🂡|1|2|field2|🂡|"
+      expect(fingerprint).to eq("58fa9e0e60c9f0d24b51d84cddb26732a39eeb3d")
+    end
+
+    describe "with concatenate_sources" do
+      let(:config) { super().merge("concatenate_sources" => true) }
+      it "fingerprints the value of concatenated key/pairs" do
+        # SHA1 of "|field1|inner_key|🂡|1|2|field2|🂡|"
+        expect(fingerprint).to eq("d74f41841c7cdc793a97c218d2ff18064a5f1950")
+      end
+    end
+  end
+
   describe "PUNCTUATION method" do
     let(:fingerprint_method) { 'PUNCTUATION' }
     let(:config) { super().merge("source" => 'field1') }