pmrfc3164 blindly swallows the first word of an invalid syslog header as the hostname #1789

Open
JPvRiel opened this issue Sep 21, 2017 · 4 comments


JPvRiel commented Sep 21, 2017

Problem statement and example

Given a poorly formed, lazy, unconventional message:

Poor form RFC3164 without syslog header

pmrfc3164 (with force.tagEndingByColon="on") will parse it and populate $hostname with "Poor" and set $msg as " form RFC3164 without syslog header". This "pollutes" the $hostname field and cuts information out of the $msg field. Without force.tagEndingByColon="on", the result is even worse...

While RFC3164 does permit input without any priority header, date, hostname, or syslog tag, it's poor form and considered 'unconventional'.

pmrfc3164 follows the RFC and accepts such 'malformed' / 'lazy' messages, as it should, but then also assumes they are well formed and parses content into the hostname and syslog tags. By current default (v8.29), invalid "junk" ends up in the following message properties:

  1. $hostname
  2. $syslogtag
  3. $app-name / $programname

I'm tempted to call the above default assumptions bugs. RFC3164 pretty much states these assumptions should not be made.
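To reproduce the behaviour described above, a minimal configuration sketch along these lines could be used. The parser name, ruleset name, port, and output path are illustrative assumptions, not taken from an existing setup; only `force.tagEndingByColon` comes from the example.

```
# Load a UDP input; the port and all names below are illustrative.
module(load="imudp")

# A pmrfc3164 instance with the tag option from the example enabled.
parser(name="custom.rfc3164" type="pmrfc3164" force.tagEndingByColon="on")

# Debug template that prints the parsed properties next to the raw message.
template(name="showParse" type="string"
  string="raw=[%rawmsg%] hostname=[%hostname%] tag=[%syslogtag%] programname=[%programname%] msg=[%msg%]\n")

# Bind the custom parser to a ruleset and write the result to a scratch file.
ruleset(name="repro" parser="custom.rfc3164") {
  action(type="omfile" file="/tmp/pmrfc3164-repro.log" template="showParse")
}

input(type="imudp" port="10514" ruleset="repro")
```

Sending the example line to that port and reading the scratch file should show which properties received the first word of the message.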

Relevant RFC3164 extracts

4.3.2 Valid PRI but no TIMESTAMP or invalid TIMESTAMP

If a relay does not find a valid TIMESTAMP in a received syslog
packet, then it MUST add a TIMESTAMP and a space character
immediately after the closing angle bracket of the PRI part. It
SHOULD additionally add a HOSTNAME and a space character after the
TIMESTAMP.

4.3.3 No PRI or Unidentifiable PRI

If the relay receives a syslog message without a PRI, or with an
unidentifiable PRI, then it MUST insert a PRI with a Priority value
of 13 as well as a TIMESTAMP as described in Section 4.3.2. The
relay SHOULD also insert a HOSTNAME as described in Section 4.3.2.

Current work-around methods

There is no option to avoid the junk in the hostname property. The force.tagEndingByColon solves the issue for invalid tags, but not hostnames.

The parseHostnameAndTag option is halfway there, but not ideal since most users might well want these fields parsed if the message did conform to RFC3164 conventions.

Users might be tempted to work around this with other config tricks, such as custom local message variables with templating, but that usually amounts to something cumbersome and comes at some performance cost.

My first thought was to use regex matches, but I know those might be expensive, so I also used a very simple string match to see whether $rawmsg or $rawmsg-after-pri begins with $hostname. I got this far and realised this should rather be fixed in the source of pmrfc3164:

  # append meta-data
  if ($protocol-version == "1") then {
    set $!format = "RFC5424";
  } else {
    # assume protocol-version == 0
    # check priority
    # 0-191 are valid priority encodings, set to 192 > 191 to indicate invalid
    # submatch 1 selects the captured digits; submatch 0 would return the whole "<NNN>" match
    set $.pri-test = cnum(re_extract($rawmsg, "^<([0-9]{1,3})>", 0, 1, "192"));
    if ($.pri-test > 191) then {
      set $!pri-valid = "false";
      set $!format = "RFC3164_malformed";
    } else {
      set $!pri-valid = "true";
    }
    # Check syslog header (date and hostname)
    # - Regex is costly, so skim for 'Mmm' pattern of date (but don't match to actual months)
    # - Also use a trick to see if rsyslog assumed the first word was the hostname after failing to parse a syslog header date
    if (not re_match($rawmsg-after-pri, "^[A-Z][a-z]{2}") or $rawmsg-after-pri startswith $hostname) then {
      # rsyslog assumed first word of malformed message was the hostname, so no valid header
      set $!format = "RFC3164_malformed";
      set $!header-valid = "false";
      set $.hostname = $hostname;
      set $.msg = $rawmsg-after-pri;
    } else {
      set $!format = "RFC3164";
      # above checks are not 100% precise to avoid performance cost, so only reasonably confident a good header was found
      set $!header-valid = "true";
      set $.hostname = $hostname;
      set $.msg = $msg;
    }
  }

Then a custom template can include the metadata.
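As a rough sketch of what such a template might look like, the list template below writes the local variables set above together with the whole `$!` tree as one JSON line. The template name, field names, and file path are assumptions for illustration.

```
# Sketch: emit the chosen hostname and message plus the $! metadata as one JSON line.
template(name="jsonWithMeta" type="list") {
  constant(value="{\"host\":\"")
  # format="json" escapes characters that are special inside a JSON string
  property(name="$.hostname" format="json")
  constant(value="\",\"message\":\"")
  property(name="$.msg" format="json")
  constant(value="\",\"meta\":")
  # $!all-json renders the whole $! tree (format, pri-valid, header-valid)
  property(name="$!all-json")
  constant(value="}\n")
}

# The output path is illustrative.
action(type="omfile" file="/var/log/remote-meta.json" template="jsonWithMeta")
```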

Possible solution

Add an option force.validHeaderHostname which can default to "off" (backward compatibility) but, when "on", sets $hostname to a null value if the syslog header is invalid (e.g. no date detected) and assumes $msg = $rawmsg-after-pri.

RFC3164 relay/output is trickier: one can either pass along the "malformed" message verbatim or, to avoid pain on the other end, set:

  • $hostname = $fromhost (not done at present)
  • $timestamp = $timegenerated (this is already done I assume)
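Until such an option exists, the first bullet can be approximated in configuration. The sketch below assumes `$!header-valid` was set by an earlier check like the one shown above. Because message properties such as `$hostname` cannot be assigned with `set`, it stores the result in the local variables `$.hostname` and `$.msg`, which an output template would then have to use.

```
# Assumes $!header-valid was set earlier by a header check.
if ($!header-valid == "false") then {
  # No usable header: fall back to the sender as seen by the receiving input.
  set $.hostname = $fromhost;
  set $.msg = $rawmsg-after-pri;
} else {
  # Header looked plausible: keep what the parser extracted.
  set $.hostname = $hostname;
  set $.msg = $msg;
}
```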

For relaying as RFC5424 or JSON, one could be truthful and set $hostname = "-" (nil value), but this loses the benefit of guessing the log source host via $fromhost in trying to be overly correct.

In my case, I like the option to have metadata injected that indicates whether the message was malformed and lacked a syslog header. A nice complement to this approach would be two extra properties that indicate what happened during parsing (e.g. in the JSON object):

{
  "pri-valid": false,
  "syslogheader-valid": false
}

That could let downstream systems know if the facility, hostname and/or date was guessed/added instead of being from the raw message.

Related issues

I searched the issues and was surprised to not find this raised already: https://github.com/rsyslog/rsyslog/issues?utf8=%E2%9C%93&q=is%3Aissue%20hostname%20rfc3164

These are related, but not the same thing:

Contributor

davidelang commented Sep 22, 2017 via email

Author

JPvRiel commented Sep 22, 2017

> Unfortunately there are systems that include hostname, but don't include PRI or timestamp, and for those systems, the current behavior is correct.

Fair enough, I can understand the messy evolution of syslog and leaving the current default to accept any string that looks like a hostname. Having something like force.validHeaderHostname or, as you suggested, dontallowhostnamewithoutpriortimestamp as an option for the parser would still be valuable.

> leaving hostname null is never the correct answer, it would get populated by fromhost if all else fails.

Which is what the RFC says should happen if the priority or header is invalid, but I get that it's useful to try to parse a valid timestamp and hostname header if just the priority part is missing.

> fix the configuration of whatever is sending the malformed data so that it's sending legitimate messages instead

Ideally, but not always practical. E.g. internal vulnerability scanners hit the TCP or UDP ports for syslog and, while trying to fingerprint the service, generate bogus hostnames. The dataset from rsyslog gets polluted with thousands of bogus hostnames; this is in fact the reason I logged this issue.

I suppose one workaround could be fiddling with iptables to blacklist the known scanning hosts.

Contributor

davidelang commented Sep 22, 2017 via email

Author

JPvRiel commented Sep 26, 2017

@davidelang agree with your advice.

Nonetheless, rsyslog pmrfc3164 would still be more feature complete if it provided the following two options (I tried to name them similarly to force.tagEndingByColon):

  • force.parseHostnameRequiresPriorityFirst
  • force.parseHostnameRequiresDateFirst

Both would default to off but, when on, the parser would not try to parse the hostname from the message and would use the $fromhost value instead.

Although my C/C++ skills are limited, I tried forking and working on a PR for this, but there are a few external source dependencies (libestr and liblogging), which meant it's not a quick thing to set up and test.
