pmrfc3164 blindly swallows the first word of an invalid syslog header as the hostname #1789

JPvRiel · 2017-09-21T23:29:55Z

Problem statement and example

Given a poorly formed, lazy, unconventional message:

Poor form RFC3164 without syslog header

pmrfc3164 (with force.tagEndingByColon="on") will parse it and populate $hostname with "Poor" and set $msg as " form RFC3164 without syslog header". This "pollutes" the $hostname field and cuts information out of the $msg field. Without force.tagEndingByColon="on", the result is even worse...

While RFC3164 does permit input without any priority header, date, hostname, or syslog tag, it's poor form and considered 'unconventional'.

pmrfc3164 follows the RFC and accepts such 'malformed' / 'lazy' messages, as it should, but then also assumes they are well formed and parses content into the hostname and syslog tags. By current default (v8.29), invalid "junk" ends up in the following message properties:

$hostname
$syslogtag
$app-name / $programname

I'm tempted to call the above default assumptions bugs. RFC3164 pretty much states these assumptions should not be made.

Relevant RFC3164 extracts

4.3.2 Valid PRI but no TIMESTAMP or invalid TIMESTAMP

If a relay does not find a valid TIMESTAMP in a received syslog
packet, then it MUST add a TIMESTAMP and a space character
immediately after the closing angle bracket of the PRI part. It
SHOULD additionally add a HOSTNAME and a space character after the
TIMESTAMP.

4.3.3 No PRI or Unidentifiable PRI

If the relay receives a syslog message without a PRI, or with an
unidentifiable PRI, then it MUST insert a PRI with a Priority value
of 13 as well as a TIMESTAMP as described in Section 4.3.2. The
relay SHOULD also insert a HOSTNAME as described in Section 4.3.2.
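
To make the two quoted sections concrete, here is a minimal Python sketch of what a relay following 4.3.2 and 4.3.3 might do. It is only an illustration of the RFC text, not how pmrfc3164 is implemented. The names `relay_normalize`, `rfc3164_timestamp` and the `sender` argument (standing in for whatever the relay knows about the source, e.g. its IP address) are made up for this example, and the timestamp check is deliberately simplified.

```python
import re
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
PRI_RE = re.compile(r"^<(\d{1,3})>")
# "Mmm dd hh:mm:ss " where a single-digit day is padded with a space
TS_RE = re.compile(r"^(?:%s) [ \d]\d \d{2}:\d{2}:\d{2} " % "|".join(MONTHS))


def rfc3164_timestamp(now):
    """Format a datetime in the RFC3164 'Mmm dd hh:mm:ss' style."""
    return f"{MONTHS[now.month - 1]} {now.day:2d} {now:%H:%M:%S}"


def relay_normalize(raw, sender, now):
    """Hypothetical relay step: add whatever header parts are missing."""
    m = PRI_RE.match(raw)
    if m and int(m.group(1)) <= 191:
        pri, rest = m.group(0), raw[m.end():]
        if TS_RE.match(rest):
            return raw  # PRI and TIMESTAMP found: pass through unchanged
        # 4.3.2: valid PRI but no valid TIMESTAMP -> add TIMESTAMP and HOSTNAME
        return f"{pri}{rfc3164_timestamp(now)} {sender} {rest}"
    # 4.3.3: no usable PRI -> PRI 13, TIMESTAMP, HOSTNAME; whole packet kept as content
    return f"<13>{rfc3164_timestamp(now)} {sender} {raw}"
```

With the example message from the top of this issue, the first word stays in the message content and the hostname comes from the sender rather than from the payload.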

Current work-around methods

There is no option to avoid the junk in the hostname property. The force.tagEndingByColon option solves the issue for invalid tags, but not for hostnames.

The parseHostnameAndTag option is halfway there, but not ideal since most users might well want these fields parsed if the message did conform to RFC3164 conventions.

Users might be tempted to work around this with other config tricks such as custom local message variables with templating, but it usually amounts to something cumbersome that comes at some performance cost.

My first thought was to use regex matches, but I know those might be expensive, so I also used a very simple string match to check whether $rawmsg-after-pri begins with $hostname. I got this far and realised this should rather be fixed in the source of pmrfc3164:

  # append meta-data
  if ($protocol-version == "1") then {
    set $!format = "RFC5424";
  } else {
    # assume protocol-version == 0
    # check priority
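    # 0-191 are valid priority encodings, set to 192 > 191 to indicate invalid
    # submatch 1 is the digit group; submatch 0 would return the whole match including the angle brackets
    set $.pri-test = cnum(re_extract($rawmsg, "^<([0-9]{1,3})>", 0, 1, "192"));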
    if ($.pri-test > 191) then {
      set $!pri-valid = "false";
      set $!format = "RFC3164_malformed";
    } else {
      set $!pri-valid = "true";
    }
    # Check syslog header (date and hostname)
    # - Regex is costly, so skim for the 'Mmm' pattern of a date (but don't match actual month names)
    # - Also use a trick to see if rsyslog assumed the first word was the hostname after failing to parse a syslog header date
    if (not re_match($rawmsg-after-pri, "^[A-Z][a-z]{2}") or $rawmsg-after-pri startswith $hostname) then {
      # rsyslog assumed first word of malformed message was the hostname, so no valid header
      set $!format = "RFC3164_malformed";
      set $!header-valid = "false";
      set $.hostname = $hostname;
      set $.msg = $rawmsg-after-pri;
    } else {
      set $!format = "RFC3164";
      # the above checks are not 100% precise (to avoid performance cost), so we are only reasonably confident a good header was found
      set $!header-valid = "true";
      set $.hostname = $hostname;
      set $.msg = $msg;
    }
  }

Then a custom template can include the metadata.

Possible solution

Add an option force.validHeaderHostname which can default to "off" (backward compatibility) but when "on", sets $hostname to a null value if the syslog header is invalid (e.g. no date detected) and assume $msg = $rawmsg-after-pri.

There is some trickiness with RFC3164 relay/output: one can either pass along the "malformed" message verbatim or, to avoid pain on the other end, set:

$hostname = $fromhost (not done at present)
$timestamp = $timegenerated (this is already done I assume)

For relaying as RFC5424 or JSON, one could be truthful and set $hostname = "-" (nil value), but this loses the benefit of guessing the log source host via $fromhost in trying to be overly correct.

In my case, I like the option to have metadata injected that indicates if the message was malformed and without a syslog header. A nice complement to this approach might be two extra properties that indicate what happened during parsing (e.g. in the JSON object):

{
  "pri-valid": false,
  "syslogheader-valid": false
}

That could let downstream systems know if the facility, hostname and/or date was guessed/added instead of being from the raw message.
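
As a rough illustration of how a downstream consumer could use such flags, here is a small Python sketch. The property names `pri-valid` and `syslogheader-valid` are the ones proposed above; the function name `trusted_fields` and the returned field names are assumptions made up for this example.

```python
import json


def trusted_fields(event_json):
    """Return the set of fields that came from the raw message rather than being guessed."""
    event = json.loads(event_json)
    trusted = set()
    # Facility and severity are both encoded in the PRI value.
    if event.get("pri-valid", False):
        trusted |= {"facility", "severity"}
    # Hostname and date come from the syslog header.
    if event.get("syslogheader-valid", False):
        trusted |= {"hostname", "date"}
    return trusted
```

A missing flag is treated as "not valid" here, so events from senders that do not inject the metadata are handled conservatively.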

Related issues

I searched the issues and was surprised to not find this raised already: https://github.com/rsyslog/rsyslog/issues?utf8=%E2%9C%93&q=is%3Aissue%20hostname%20rfc3164

These are related, but not the same thing:

global(parser.parseHostnameAndTag="off") not honored #1190: parseHostnameAndTag option to disable parsing hostname and syslog tag, but this is normally useful and desirable for valid messages.
Add Option to Verify Hostname in Syslog Message Against the TLS Client Certficate's CN when in Auth Name Mode #436 : suggested enhancement to validate hostname in message matches client cert when TLS is in use.
incorrectly inserts hostnames #99 : confusion between RFC3164 and RFC5424, where they wanted the nil value '-' for the hostname from an RFC3164 message.
allow overriding of built-in properties #327 : allow overwriting default properties parsed (which is a hook to work around issues like this one).


davidelang · 2017-09-22T00:11:11Z

The rfc3164 parser is making a 'best effort' attempt to produce something sane when presented with a message that doesn't conform to the RFC. It's a pile of heuristics that have evolved over the years. Unfortunately there are systems that include hostname, but don't include PRI or timestamp, and for those systems, the current behavior is correct. Unless the first word fails the criteria for being a hostname (special characters, for example), how is rsyslog supposed to know that it's not a hostname? Leaving hostname null is never the correct answer, it would get populated by fromhost if all else fails. It would be possible to define a dontallowhostnamewithoutpriortimestamp option, but that is getting very ugly. It would be better to fix the configuration of whatever is sending the malformed data so that it's sending legitimate messages instead.

JPvRiel · 2017-09-22T02:16:15Z

Unfortunately there are systems that include hostname, but don't include PRI or timestamp, and for those systems, the current behavior is correct.

Fair enough, I can understand the messy evolution of syslog and leaving the current default to accept any string that looks like a hostname. Having something like force.validHeaderHostname or, as you suggested, dontallowhostnamewithoutpriortimestamp as an option for the parser would still be valuable.

leaving hostname null is never the correct answer, it would get populated by
fromhost if all else fails.

Which is what the RFC says should happen if the priority or header is invalid, but I get that it's useful to try to parse a valid timestamp and hostname header if just the priority part is missing.

fix the configuration of whatever is sending the malformed data so that it's sending legitimate messages instead

Ideally, but not always practical. E.g. internal vulnerability scanners hit the TCP or UDP ports for syslog and, while trying to fingerprint the service, generate bogus hostnames. The dataset from rsyslog gets polluted with thousands of bogus hostnames - this is in fact the reason I logged this issue.

I suppose one workaround could be fiddling with iptables to blacklist the known scanning hosts.

davidelang · 2017-09-22T04:14:52Z

On Thu, 21 Sep 2017, JPvRiel wrote:

> Unfortunately there are systems that include hostname, but don't include PRI or timestamp, and for those systems, the current behavior is correct.
>
> Fair enough, I can understand the messy evolution of syslog and leaving the current default to accept any string that looks like a hostname. Having something like `force.validHeaderHostname` or as you suggested `dontallowhostnamewithoutpriortimestamp` as an option for the parser would still be valuable.

Well, it already only accepts the word as the hostname if it is a valid hostname.

> leaving hostname null is never the correct answer, it would get populated by fromhost if all else fails.
>
> Which is what the RFC says should happen if the priority or header is invalid, but I get that it's useful to try to parse a valid timestamp and hostname header if just the priority part is missing.

Yeah, again the RFC is what's supposed to happen in theory and "in theory, theory and practice are the same, in practice they are not". There are a lot of things that forget to send the PRI tag, but do send a timestamp, or are missing both and do send a hostname.

> fix the configuration of whatever is sending the malformed data so that it's sending legitimate messages instead
>
> Ideally, but not always practical. E.g. internal vulnerability scanners hit the TCP or UDP ports for syslog and, while trying to fingerprint the service, generate bogus hostnames. The dataset from rsyslog gets polluted with thousands of bogus hostnames - this is in fact the reason I logged this issue.
>
> I suppose one workaround could be fiddling with iptables to blacklist the known scanning hosts.

Well, if it's an internal scanner, that should be easy (and it should not be trying that many things on your syslog port, so a few 'extra' hostnames would show up, but not lots). If it's external things scanning you, you really should block them via packet filters; the syslog protocol has no protection against forgery and other nasty things being transported in the log messages.

David Lang

JPvRiel · 2017-09-26T14:38:20Z

@davidelang agree with your advice.

Nonetheless, rsyslog pmrfc3164 would still be more feature complete if it provided the following two options (tried to name similar to force.tagEndingByColon):

force.parseHostnameRequiresPriorityFirst
force.parseHostnameRequiresDateFirst

The above two would default to off, but if on, the parser would not try to parse the hostname from the message and would instead use the $fromhost value.
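
To show the intended semantics, here is a hedged Python sketch of how the two proposed options could behave. It is not based on the pmrfc3164 source: the function `pick_hostname` and its keyword arguments are hypothetical stand-ins for the proposed options, the date check is a simplified 'Mmm dd hh:mm:ss' pattern, and tag parsing is left out entirely.

```python
import re

PRI_RE = re.compile(r"^<\d{1,3}>")
# Simplified RFC3164 date check: 'Mmm dd hh:mm:ss ' (month names are not validated)
TS_RE = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def pick_hostname(rawmsg, fromhost,
                  requires_priority_first=False,
                  requires_date_first=False):
    """Return (hostname, remainder) under the two proposed options."""
    pri = PRI_RE.match(rawmsg)
    rest = rawmsg[pri.end():] if pri else rawmsg
    ts = TS_RE.match(rest)
    # Option on and prerequisite missing: do not guess, fall back to the sender.
    if (requires_priority_first and not pri) or (requires_date_first and not ts):
        return fromhost, rest
    # Otherwise keep the current behaviour: the first word is taken as the hostname.
    body = rest[ts.end():] if ts else rest
    first_word, _, remainder = body.partition(" ")
    return first_word, remainder
```

With both options off, the example message from the top of this issue still yields "Poor" as the hostname. With either option on, the hostname falls back to the sender address and the message text is left intact, while a well-formed message is parsed as before.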

Although my C/C++ skills are limited, I tried forking and working on a PR for this, but there are a few external source dependencies (libestr and liblogging), which meant it's not a quick thing to set up and test.

JPvRiel mentioned this issue Sep 21, 2017

allow overriding of built-in properties #327

Open

JPvRiel mentioned this issue Jun 7, 2022

RSYSLOG_TraditionalForwardFormat does not end the syslog tag with a colon if the source message was RFC5424 #4891

Open

bmagistro mentioned this issue Mar 8, 2023

RFC3164 4.3.2 -- Handling messages with pri but without timestamp or host #5098

Closed
