-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for utf16 character encoding #26
Conversation
@avoosh thanks for submitting will take a look in a second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make it an option with a default matching old behavior to keep backwards compatibility.
bin/check-log.rb
Outdated
@@ -208,7 +208,7 @@ def search_log | |||
line = get_log_entry(line) | |||
end | |||
|
|||
line = line.encode('UTF-8', invalid: :replace, replace: '') | |||
line = line.encode('UTF-16', invalid: :replace, replace: '').encode('UTF-8', invalid: :replace, replace: '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this an option, while UTF-16 does have some advantages it does also have some disadvantages as outlined here (quick summary): https://stackoverflow.com/questions/4655250/difference-between-utf-8-and-utf-16 to avoid breaking change we could have it default to utf8 and then override where neccessry.
@majormoses thx for the reply, I've added an option to encode utf16 before line matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what I saw that would not work as you would encode utf-16 and then re-encode utf-8. Anyways there is a better path forward. Let's just expose an option to use any encoding they want and just pass it in. Let's keep the default of UTF-8 to avoid breaking changes.
bin/check-log.rb
Outdated
@@ -128,6 +128,13 @@ class CheckLog < Sensu::Plugin::Check::CLI | |||
proc: proc(&:to_i), | |||
default: 250 | |||
|
|||
option :encode_utf16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make the option just encoding? for example:
option :encoding
description: 'Encode line with the following encoding before matching',
long: '--encoding $ENCODING'
default: 'UTF-8'
This makes it easier when someone decides hey I want utf-32 encoding when that exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point that being said its trivial to set the encoding on the regex itself, if you still feel strongly about it I am willing to conceded as long as you add some comments in the code around it so a year from now we understand why it was done that way.
The issue that we have is invalid byte sequence when matching strings, and the prefered encoding in such cases is UTF-8 (ASCII is a compatible subset).
Hence we do want to re-encode with utf-8 before calling the match function. |
just add a space between the |
trailing whitespace. |
Pull Request Checklist
Is this in reference to an existing issue?
Yes - #25
General
Update Changelog following the conventions laid out on Keep A Changelog
Update README with any necessary configuration snippets
Binstubs are created if needed
RuboCop passes
Existing tests pass
New Plugins
Tests
Add the plugin to the README
Does it have a complete header as outlined here
Purpose
Known Compatablity Issues