pmrfc3164 blindly swallows the first word of an invalid syslog header as the hostname #1789
Comments
the rfc3164 parser is making a 'best effort' attempt to produce something sane
when presented with a message that doesn't conform with the rfc. It's a pile of
heuristics that have evolved over the years. Unfortunately there are systems
that include hostname, but don't include PRI or timestamp, and for those systems,
the current behavior is correct.
Unless the first word fails the criteria for being a hostname (special
characters for example), how is rsyslog supposed to know that it's not a
hostname?
leaving hostname null is never the correct answer, it would get populated by
fromhost if all else fails.
It would be possible to define a dontallowhostnamewithoutpriortimestamp but that
is getting very ugly, it would be better to fix the configuration of whatever is
sending the malformed data so that it's sending legitimate messages instead.
|
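> Unfortunately there are systems that include hostname, but don't include PRI or timestamp, and for those systems, the current behavior is correct.
Fair enough, I can understand the messy evolution of syslog and leaving the current default to accept any string that looks like a hostname. Having something like `force.validHeaderHostname` or, as you suggested, `dontallowhostnamewithoutpriortimestamp` as an option for the parser would still be valuable.
> leaving hostname null is never the correct answer, it would get populated by fromhost if all else fails.
Which is what the RFC says should happen if the priority or header is invalid, but I get that it's useful to try to parse a valid timestamp and hostname header if just the priority part is missing.
> fix the configuration of whatever is sending the malformed data so that it's sending legitimate messages instead
Ideally, but not always practical. E.g. internal vulnerability scanners hit the TCP or UDP ports for syslog and, while trying to fingerprint the service, generate bogus hostnames. The dataset from rsyslog gets polluted with thousands of bogus hostnames, which is in fact the reason I logged this issue. I suppose one workaround could be fiddling with iptables to blacklist the known scanning hosts.
|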
On Thu, 21 Sep 2017, JPvRiel wrote:
> Unfortunately there are systems that include hostname, but don't include PRI or timestamp, and for those systems, the current behavior is correct.
Fair enough, I can understand the messy evolution of syslog and leaving
the current default to accept any string that looks like a hostname. Having
something like `force.validHeaderHostname` or as you suggested
`dontallowhostnamewithoutpriortimestamp` as an option for the parser would
still be valuable.
well, it already only accepts the word as the hostname if it is a valid
hostname.
> leaving hostname null is never the correct answer, it would get populated by
fromhost if all else fails.
Which is what the RFC says should happen if the priority or header is invalid, but I get that it's useful to try to parse a valid timestamp and hostname header if just the priority part is missing.
yeah, again the RFC is what's supposed to happen in theory and "in theory,
theory and practice are the same, in practice they are not". there are a lot of
things that forget to send the PRI tag, but do send a timestamp, or are missing
both and do send a hostname.
> fix the configuration of whatever is sending the malformed data so that it's sending legitimate messages instead
Ideally, but not always practical. E.g. internal vulnerability scanners hit the TCP or UDP ports for syslog and, while trying to fingerprint the service, generate bogus hostnames. The dataset from rsyslog gets polluted with thousands of bogus hostnames, which is in fact the reason I logged this issue.
I suppose one workaround could be fiddling with iptables to blacklist the known scanning hosts.
well, if it's an internal scanner, that should be easy (and it should not be
trying that many things on your syslog port, so a few 'extra' hostnames would
show up, but not lots)
If it's external things scanning you, you really should block them via packet
filters, the syslog protocol has no protection against forgery and other nasty
things being transported in the log messages.
David Lang
|
@davidelang agree with your advice. Nonetheless, rsyslog
The above two would default to off, but if on, not try parse the hostname from the message, and instead use the Albeit my C/C++ skills are limited, I tried forking and working on a PR for this, but there's a few external source dependencies ( |
Problem statement and example
Given a poorly formed, lazy, unconventional message:
pmrfc3164 (with `force.tagEndingByColon="on"`) will parse it and populate `$hostname` with "Poor" and set `$msg` as " form RFC3164 without syslog header". This "pollutes" the `$hostname` field and cuts information out of the `$msg` field. Without `force.tagEndingByColon="on"`, the result is even worse...
While RFC3164 does permit input without any priority header, date, hostname, or syslog tag, it's poor form and considered 'unconventional'.
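To make the effect concrete, here is a toy model in Python of a best-effort parse. It is only a sketch under assumptions: the function name `lenient_parse`, the hostname character set, and the sample input are all hypothetical (the input is reconstructed from the parsed values quoted above), tag handling is left out, and this is not how pmrfc3164 is actually implemented.

```python
import re

# Assumed, simplified hostname criteria: letters, digits, '.', '-', '_' only.
_HOSTNAME_RE = re.compile(r"^[A-Za-z0-9._-]+$")
# RFC3164 timestamp shape "Mmm dd hh:mm:ss" followed by a space.
_TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def lenient_parse(raw: str) -> dict:
    """Toy best-effort parse of a message that carries no PRI part."""
    rest = raw
    timestamp = None
    m = _TIMESTAMP_RE.match(rest)
    if m:
        timestamp = m.group(0).rstrip()
        rest = rest[m.end():]
    # Whether or not a timestamp was found, the next word is accepted
    # as the hostname as long as it merely looks like one.
    first, sep, remainder = rest.partition(" ")
    hostname = None
    if sep and _HOSTNAME_RE.match(first):
        hostname = first
        rest = sep + remainder  # the leading space stays in the message
    return {"timestamp": timestamp, "hostname": hostname, "msg": rest}


if __name__ == "__main__":
    print(lenient_parse("Poor form RFC3164 without syslog header"))
```

With the assumed sample input, the first word "Poor" passes the hostname check, so it lands in `hostname` and the message loses its first word, which is the pollution described above.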
pmrfc3164 follows the RFC and accepts such 'malformed' / 'lazy' messages, as it should, but then also assumes they are well formed and parses content into the hostname and syslog tags. By current default (v8.29), invalid "junk" ends up in the following message properties:
`$hostname`
`$syslogtag`
`$app-name` / `$programname`
I'm tempted to call the above default assumptions bugs. RFC3164 pretty much states these assumptions should not be made.
Relevant RFC3164 extracts
4.3.2 Valid PRI but no TIMESTAMP or invalid TIMESTAMP
4.3.3 No PRI or Unidentifiable PRI
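As I read section 4.3.3, a relay that receives a packet without a usable PRI inserts a PRI of 13 plus a TIMESTAMP, should also insert a HOSTNAME (the sender's name as known to the relay, per 4.3.2), and treats the entire received packet as the CONTENT. A rough Python sketch of that rule follows; the function and parameter names are hypothetical and this is not rsyslog code.

```python
from datetime import datetime

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc3164_timestamp(now: datetime) -> str:
    """Format 'Mmm dd hh:mm:ss'; the day is space-padded, not zero-padded."""
    return f"{_MONTHS[now.month - 1]} {now.day:2d} {now:%H:%M:%S}"


def relay_without_pri(packet: str, sender_name: str, now: datetime) -> str:
    """Rewrite a packet that has no PRI (or an unidentifiable one).

    sender_name is the device name (or IP address) as known to the relay;
    the whole received packet is appended unchanged as the CONTENT.
    """
    return f"<13>{rfc3164_timestamp(now)} {sender_name} {packet}"


if __name__ == "__main__":
    when = datetime(2017, 9, 21, 8, 5, 3)
    print(relay_without_pri("Poor form RFC3164 without syslog header",
                            "10.0.0.5", when))
```

Note that nothing from the packet itself is promoted into the header in this reading, which is the contrast with the current default.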
Current work-around methods
There is no option to avoid the junk in the `hostname` property. The `force.tagEndingByColon` option solves the issue for invalid tags, but not hostnames.
The `parseHostnameAndTag` option is halfway there, but not ideal since most users might well want these fields parsed if the message did conform to RFC3164 conventions.
Users might be tempted to work around this with other config tricks such as custom local message variables with templating, but it usually amounts to something cumbersome and comes at some performance cost.
My first thought was to use regex matches, but I know those might be expensive, so I tried a very simple string match instead to see whether `$rawmsg` or `$rawmsg-after-pri` begins with `$hostname`. I got this far and realised this should rather be fixed in the source of pmrfc3164:
Then a custom template can include the metadata.
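The string-match idea can be sketched outside rsyslog like this. It is a Python illustration only, with a hypothetical function name; it is not the config snippet from the original post, and it assumes the PRI part, when present, looks like `<NNN>`.

```python
import re

# PRI part at the very start of the raw message, e.g. "<13>".
_PRI_RE = re.compile(r"^<\d{1,3}>")


def hostname_taken_from_first_word(rawmsg: str, hostname: str) -> bool:
    """Return True if the parsed hostname is simply the first word of the
    message (after any PRI), i.e. no timestamp came before it."""
    after_pri = _PRI_RE.sub("", rawmsg, count=1)
    return bool(hostname) and after_pri.startswith(hostname + " ")


if __name__ == "__main__":
    print(hostname_taken_from_first_word(
        "Poor form RFC3164 without syslog header", "Poor"))
    print(hostname_taken_from_first_word(
        "<13>Sep 21 10:15:00 myhost app: hi", "myhost"))
```

A well-formed message starts with the timestamp, so the check only fires when the hostname was lifted straight off the front of the payload. A plain prefix comparison like this avoids running a regex against the hostname itself.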
Possible solution
Add an option `force.validHeaderHostname` which can default to "off" (backward compatibility) but, when "on", sets `$hostname` to a null value if the syslog header is invalid (e.g. no date detected) and assumes `$msg = $rawmsg-after-pri`.
There is some trickiness with RFC3164 relay/output: one can either pass along the "malformed" message verbatim or, to avoid pain on the other end, set:
`$hostname = $fromhost` (not done at present)
`$timestamp = $timegenerated` (this is already done I assume)
For relaying as RFC5424 or JSON, one could be truthful and set `$hostname = "-"` (nil value), but in trying to be overly correct this loses the benefit of guessing the log source host via `$fromhost`.
In my case, I like the option to have metadata injected that indicates if the message was malformed and without a syslog header. Nice properties to complement this approach might be these two extra properties to indicate what happened during parsing (e.g. in the JSON object):
That could let downstream systems know if the facility, hostname and/or date was guessed/added instead of being from the raw message.
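A minimal Python sketch of how the proposed behaviour plus the extra metadata might fit together. Everything here is an assumption for illustration: the function name `strict_parse` and the flag names `pri_valid` and `header_valid` are hypothetical, and the sketch takes the `$hostname = $fromhost` variant rather than a null hostname.

```python
import re

_PRI_RE = re.compile(r"^<(\d{1,3})>")
_TIMESTAMP_RE = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) ")


def strict_parse(raw: str, fromhost: str) -> dict:
    """Only trust the hostname when a valid timestamp precedes it."""
    props = {"pri_valid": False, "header_valid": False, "pri": 13}
    rest = raw
    m = _PRI_RE.match(rest)
    if m and int(m.group(1)) <= 191:  # highest PRI: facility 23, severity 7
        props["pri_valid"] = True
        props["pri"] = int(m.group(1))
        rest = rest[m.end():]
    rawmsg_after_pri = rest
    m = _TIMESTAMP_RE.match(rest)
    if m:
        host, sep, remainder = rest[m.end():].partition(" ")
        if sep:
            props.update(header_valid=True, timestamp=m.group(1),
                         hostname=host, msg=remainder)
            return props
    # Invalid header: keep the whole payload and fall back to the sender.
    props.update(hostname=fromhost, msg=rawmsg_after_pri)
    return props


if __name__ == "__main__":
    print(strict_parse("Poor form RFC3164 without syslog header", "10.0.0.5"))
```

With flags like these in the output, a downstream consumer could tell a guessed hostname from one that was actually present in the header.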
Related issues
I searched the issues and was surprised to not find this raised already: https://github.com/rsyslog/rsyslog/issues?utf8=%E2%9C%93&q=is%3Aissue%20hostname%20rfc3164
These are related, but not the same thing:
`parseHostnameAndTag` option to disable parsing hostname and syslog tag, but this is normally useful and desirable for valid messages.