rfc5424 syslog not parsed properly #2815

Closed
ZigZagT opened this issue Feb 3, 2020 · 4 comments · Fixed by #2816

@ZigZagT
Contributor

ZigZagT commented Feb 3, 2020

Check the CONTRIBUTING guideline first. Here is a list to help us investigate the problem.

Describe the bug

RFC 5424 syslog messages can't be parsed correctly with a regex alone.

Examples for incorrect behavior

  1. [] characters in the message:
    Given this log message:
[e@123 ...][meta ...] [this is message]

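Because fluentd parses the extradata (STRUCTURED-DATA) field with a pattern containing the greedy `\[(.*)\]`, the match runs to the last `]` on the line, so the real message part is treated as extradata as well, which is incorrect.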

Source: https://github.com/fluent/fluentd/blob/master/lib/fluent/plugin/parser_syslog.rb#L30
RFC reference: https://tools.ietf.org/html/rfc5424#section-6.3

  2. The Unicode BOM is not handled:
    As documented in https://tools.ietf.org/html/rfc5424#section-6.4, for the MSG part:

If a syslog application encodes MSG in UTF-8, the string MUST start
with the Unicode byte order mask (BOM), which for UTF-8 is ABNF
%xEF.BB.BF.

The current regex does not handle this BOM.
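To make both failure modes concrete, here is a minimal Ruby sketch. It uses a deliberately simplified pattern that only covers the STRUCTURED-DATA and MSG tail of a line. The pattern is an assumption modeled on the `\[(.*)\]` fragment quoted in item 1, not fluentd's actual regex, and the sample lines are invented for illustration.

```ruby
# Illustrative pattern only, NOT fluentd's actual regex: STRUCTURED-DATA is
# either "-" or captured with a greedy \[(.*)\], then an optional " MSG".
GREEDY_SD = /\A(?<extradata>-|\[(.*)\])(?: (?<message>.+))?\z/m

# Issue 1: the greedy (.*) backtracks only as far as the LAST "]", so the
# bracketed message is swallowed into extradata.
line = '[e@123 a="1"][meta b="2"] [this is message]'
m = GREEDY_SD.match(line)
puts m[:extradata]  # prints the whole line
p m[:message]       # prints nil

# Issue 2: RFC 5424 section 6.4 says a UTF-8 MSG MUST start with the BOM
# (bytes EF BB BF); nothing in the pattern accounts for it.
bom_line = "- \uFEFFhello"
m = GREEDY_SD.match(bom_line)
p m[:message].bytes.first(3)  # prints [239, 187, 191]: BOM left in MSG
```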

Your Environment
fluentd v1.9.1

Proposals for fixing

I can fix issues like the ones in the examples by slightly enhancing the regex. However, since RFC 5424 is really a binary-based protocol, a regex is definitely not the approach we should settle on in the end.
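One way the "slightly enhanced regex" route could look, as a hedged Ruby sketch: instead of a greedy `\[(.*)\]`, match STRUCTURED-DATA as one or more SD-ELEMENTs built from the RFC 5424 section 6.3 grammar, and allow an optional BOM before MSG. The constant names are hypothetical, and the pattern covers only the tail after the HEADER fields.

```ruby
# SD-NAME (RFC 5424 section 6.3): 1 to 32 printable US-ASCII characters
# except '=', SP, ']' and '"'. Uses Onigmo character-class intersection.
SD_NAME = /[!-~&&[^= \]"]]{1,32}/
# PARAM-VALUE: any character except '"' and '\', or a backslash escape.
# Lenient assumption: an unescaped ']' inside the quotes is accepted.
PARAM_VALUE = /(?:[^"\\]|\\.)*/
# SD-ELEMENT = "[" SD-ID *(SP PARAM-NAME "=" %d34 PARAM-VALUE %d34) "]"
SD_ELEMENT = /\[#{SD_NAME}(?: #{SD_NAME}="#{PARAM_VALUE}")*\]/
# STRUCTURED-DATA, then optionally SP, an optional UTF-8 BOM, and MSG.
SD_AND_MSG = /\A(?<extradata>-|(?:#{SD_ELEMENT})+)(?: \uFEFF?(?<message>.*))?\z/m

m = SD_AND_MSG.match('[e@123 a="1"][meta b="2"] [this is message]')
puts m[:extradata]  # prints [e@123 a="1"][meta b="2"]
puts m[:message]    # prints [this is message]
```

Because SD-ELEMENTs are concatenated without spaces, the `+` stops at the space before `[this is message]`, and a BOM right after the separating space is consumed outside the `message` capture. This is still only an approximation of the ABNF (it does not check the HEADER fields or validate UTF-8), which fits the point that a regex is not the long-term answer.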

Instead, we need to find a way to implement RFC 5424 parsing properly, probably by integrating a third-party library. However, I can't help with this part in the short term because I don't know Ruby at all.
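The proposal points to a third-party library; purely as an alternative illustration, here is a minimal hand-written scanner for the STRUCTURED-DATA and MSG tail that follows the RFC 5424 section 6.3 rules (SD-NAME limits and the `\"`, `\\`, `\]` escapes) and drops the section 6.4 BOM. The method names are hypothetical, not fluentd APIs, and the HEADER fields are assumed to have been split off already.

```ruby
BOM = "\uFEFF"

# Parse "STRUCTURED-DATA [SP MSG]" into [elements, message], where elements
# is an array of [sd_id, {param_name => param_value}]. Raises ArgumentError
# on malformed input. Hypothetical helper, not a fluentd API.
def parse_sd_tail(tail)
  elements = []
  i = 0
  if tail.start_with?("-")          # NILVALUE: no structured data
    i = 1
  else
    while tail[i] == "["            # one SD-ELEMENT per iteration
      sd_id, i = read_sd_name(tail, i + 1)
      params = {}
      while tail[i] == " "          # SP SD-PARAM
        name, i = read_sd_name(tail, i + 1)
        raise ArgumentError, "expected '=\"' at #{i}" unless tail[i, 2] == '="'
        value, i = read_param_value(tail, i + 2)
        params[name] = value
      end
      raise ArgumentError, "expected ']' at #{i}" unless tail[i] == "]"
      elements << [sd_id, params]
      i += 1
    end
    raise ArgumentError, "missing STRUCTURED-DATA" if elements.empty?
  end

  return [elements, nil] if i == tail.length
  raise ArgumentError, "expected SP at #{i}" unless tail[i] == " "
  # MSG may start with the UTF-8 BOM (section 6.4); drop it.
  [elements, tail[(i + 1)..-1].delete_prefix(BOM)]
end

# SD-NAME: 1 to 32 printable US-ASCII chars except '=', SP, ']' and '"'.
def read_sd_name(s, i)
  start = i
  i += 1 while i < s.length && s[i].match?(/[!-~]/) && !'= ]"'.include?(s[i])
  raise ArgumentError, "bad SD-NAME at #{start}" unless (1..32).cover?(i - start)
  [s[start...i], i]
end

# PARAM-VALUE runs to the first unescaped '"'. \" \\ and \] are unescaped;
# any other backslash is kept literally, as section 6.3.3 requires.
def read_param_value(s, i)
  value = +""
  while i < s.length && s[i] != '"'
    if s[i] == "\\" && ['"', "\\", "]"].include?(s[i + 1])
      value << s[i + 1]
      i += 2
    else
      value << s[i]
      i += 1
    end
  end
  raise ArgumentError, "unterminated PARAM-VALUE" unless s[i] == '"'
  [value, i + 1]
end
```

For example, `parse_sd_tail('[x@1 k="a\]b"] msg')` yields the elements `[["x@1", {"k" => "a]b"}]]` and the message `"msg"`. Scanning character by character makes the escaping rules explicit, which is the kind of logic that is awkward to express in a single regex.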

@ganmacs
Member

ganmacs commented Feb 4, 2020

The 2nd issue makes sense to me, and I've confirmed it doesn't work.
But the 1st issue ([] characters in the message) is unclear to me.

These two tests pass without any change in my local environment. Do you have an example that reproduces it?
https://github.com/fluent/fluentd/compare/master...ganmacs:syslog-regex-test?expand=1#diff-19d720eba498373bb2a1e9e85b5b2466L331
https://github.com/fluent/fluentd/compare/master...ganmacs:syslog-regex-test?expand=1#diff-19d720eba498373bb2a1e9e85b5b2466L297

@ZigZagT
Contributor Author

ZigZagT commented Feb 4, 2020

@ganmacs Thank you for the quick response.

For the second issue, could you explain more about why it doesn't work?

For the [] issue, the test case itself is a false positive. I've fixed it in the last commit by removing the `,` after `[Hi]`. https://github.com/fluent/fluentd/pull/2816/files#diff-19d720eba498373bb2a1e9e85b5b2466R338

@ganmacs
Member

ganmacs commented Feb 4, 2020

> For the second issue, could you explain more about why it doesn't work?

Oh, sorry... your suggestion is correct. What I meant is that I had confirmed that the current fluentd fails this test.

> For the [] issue, the test case itself is a false positive. I've fixed it in the last commit by removing the `,` after `[Hi]`.

👍

@ZigZagT
Contributor Author

ZigZagT commented Feb 4, 2020

@ganmacs Sure, thank you for confirming.
