Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fluentd: Fix bug that caused lines to be dropped when containing non utf-8 characters #5107

Merged
merged 4 commits into from
Jan 12, 2022

Conversation

chaudum
Copy link
Contributor

@chaudum chaudum commented Jan 12, 2022

What this PR does / why we need it:

When a log line in hash format contains non UTF-8 characters fluentd
would drop the complete line because it failed to convert the line in
key-value format.

By forcing UTF-8 encoding and replacing non UTF-8 characters with empty
strings the log line will not be dropped but only contain the valid
UTF-8 characters.

Which issue(s) this PR fixes:

Fixes #5099

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

When a log line in hash format contains non UTF-8 characters fluentd
would drop the complete line because it failed to convert the line in
key-value format.

By forcing UTF-8 encoding and replacing non UTF-8 characters with empty
strings the log line will not be dropped but only contain the valid
UTF-8 characters.

Fixes #5099

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum requested a review from a team as a code owner January 12, 2022 09:22
Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@chaudum chaudum changed the title Chaudum/fluentd docker compse fluentd: Fix bug that caused lines to be dropped when containing non utf-8 characters Jan 12, 2022
@chaudum
Copy link
Contributor Author

chaudum commented Jan 12, 2022

@cyriltovena is bumping the fluentd plugin version here the right thing to do? 7fd4c40

@cyriltovena
Copy link
Contributor

7fd4c40

Correct !

@@ -278,6 +278,8 @@ def record_to_line(record)
when :key_value
formatted_labels = []
record.each do |k, v|
# Remove non UTF-8 characters by force-encoding the string and replacing said chars with empty string
v = v.encode('utf-8', invalid: :replace, replace: '')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we replace with another special character may be ? Not sure if we can use empty string , it will not tell users there was an issue there ? WDYT ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would � work ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can let Ruby decide how to replace the invalid characters, like so:

v = v.encode('utf-8', invalid: :replace)

@@ -4,7 +4,7 @@ $LOAD_PATH.push File.expand_path('lib', __dir__)

Gem::Specification.new do |spec|
spec.name = 'fluent-plugin-grafana-loki'
spec.version = '1.2.16'
spec.version = '1.2.17'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this will do !

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
@@ -122,7 +139,7 @@
expect(body[:streams][0]['stream'].empty?).to eq true
expect(body[:streams][0]['values'].count).to eq 1
expect(body[:streams][0]['values'][0][0]).to eq '1546270458000000000'
expect(body[:streams][0]['values'][0][1]).to eq 'message="' + content[0] + '" stream="stdout"'
expect(body[:streams][0]['values'][0][1]).to eq 'message="' + content[0] + '" stream=stdout'
Copy link
Contributor Author

@chaudum chaudum Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was failing since c298514793ddf49ec8d15972b35cdae5f225b5d2

Copy link
Contributor

@cyriltovena cyriltovena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cyriltovena
Copy link
Contributor

Thank you so much @chaudum

@cyriltovena cyriltovena merged commit 52fa211 into main Jan 12, 2022
@cyriltovena cyriltovena deleted the chaudum/fluentd-docker-compse branch January 12, 2022 13:05
chaudum added a commit that referenced this pull request Jan 17, 2022
This change fixes a bug that is caused by incorrectly trying to encode
any values other than string.
This bug was a regression introduced with #5107 and caused
`NoMethodError` errors.

Fixes #5099

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
cyriltovena pushed a commit that referenced this pull request Jan 17, 2022
* Fix encoding error in fluentd client

This change fixes a bug that is caused by incorrectly trying to encode
any values other than string.
This bug was a regression introduced with #5107 and caused
`NoMethodError` errors.

Fixes #5099

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>

* Bump fluent-plugin-grafana-loki to version 1.2.18

Signed-off-by: Christian Haudum <christian.haudum@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When log line from fluentd contains non-UTF-8 characters, the whole log line is dropped in Loki
2 participants