RFC3164 4.3.2 -- Handling messages with pri but without timestamp or host #5098

bmagistro · 2023-03-08T21:03:19Z

This may be related to #1789, but I don't want to mix the threads if it is not.

Personally, I would consider this a partial/broken implementation. However, it is generated by a commercially available device and does seem to have a path within RFC 3164 to be considered valid. I am working on making contact with the vendor to discuss this (and some other feature requests), but the time to a fix, if any, once that contact is made is unknown. The vendor does not expose the transmission port either, so short of using a separate instance I am not sure how easily one could differentiate this traffic from other syslog traffic coming in (unless all other traffic had its port changed).

Based on this input data, I would expect this to be handled as RFC 3164 data and then further handled according to section 4.3.2 (a valid PRI but no/invalid timestamp). The behavior lines assume output in RFC 5424 format.

Input data

Sender 10.0.37.16

<38>User logged into the appliance. User: bmagistro.
<6>The appliance is rebooting. Command issued by user: bmagistro.
<30>Network link has gone down on interface 1.
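
For readers less familiar with the PRI part, here is a minimal Python sketch of how the three values above decompose, assuming the usual syslog rule that PRI is facility * 8 + severity. The function name `decode_pri` is purely illustrative and is not part of rsyslog.

```python
def decode_pri(pri: int) -> tuple[int, int]:
    """Split a syslog PRI value into (facility, severity).

    Assumes PRI = facility * 8 + severity, with facility 0..23 and
    severity 0..7, so the valid range is 0..191.
    """
    if not 0 <= pri <= 191:
        raise ValueError(f"PRI out of range: {pri}")
    return divmod(pri, 8)


# The three PRI values from the sample input.
for pri in (38, 6, 30):
    facility, severity = decode_pri(pri)
    print(f"<{pri}> facility={facility} severity={severity}")
```

All three samples carry severity 6 (informational); the facilities are 4, 0, and 3 respectively.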

Actual behavior

<38>1 2023-03-08T19:41:38.089271+00:00 User logged - - -  into the appliance. User: bmagistro.
<6>1 2023-03-08T19:41:47.262415+00:00 The appliance - - -  is rebooting. Command issued by user: bmagistro.
<30>1 2023-03-08T19:41:59.470277+00:00 Network link - - -  has gone down on interface 1.

Expected behavior

This is generated by hand and I think I counted the fields correctly. At the end of the day I only expect pri, version, timestamp, hostname, and msg to be populated. To achieve this, I would expect the timestamp and hostname (subject to DNS resolution, so the IP is inserted here for discussion purposes) to be added based on RFC 3164 section 4.3.2. This is probably where the gray areas start: per comments on #1789, there are implementations that may send a valid PRI but no hostname, or send a message starting with a hostname, so parsing/processing needs to account for these too. If trying to handle all of these, one would likely need multiple inputs to set the different parameters for parsing behavior. The lines below fill in the first part of the header based on the receive information from rsyslog. Following RFC 3164 for tag detection, the tag would be part of the message, but it is not present here, so I would not expect the remaining 5424 header fields to be populated; the same goes for structured data.

<38>1 2023-03-08T19:41:38.089271+00:00 10.0.37.16 - - - - User logged into the appliance. User: bmagistro.
<6>1 2023-03-08T19:41:47.262415+00:00 10.0.37.16 - - - - The appliance is rebooting. Command issued by user: bmagistro.
<30>1 2023-03-08T19:41:59.470277+00:00 10.0.37.16 - - - - Network link has gone down on interface 1.
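
As a rough illustration of the expected output above, the following Python sketch builds such a line from a headerless message, the sender address, and the receive time. It is not rsyslog code; `relay_headerless` and its parameters are hypothetical names, and it assumes everything after the PRI is message content.

```python
import re
from datetime import datetime, timezone

# PRI part followed by arbitrary content (assumed to contain no header).
_PRI_RE = re.compile(r"<(\d{1,3})>(.*)", re.DOTALL)


def relay_headerless(raw: str, sender: str, received_at: datetime) -> str:
    """Render a PRI-only message as an RFC 5424 style line.

    The timestamp comes from the receive time and the hostname from the
    sender address; APP-NAME, PROCID, MSGID and STRUCTURED-DATA stay "-".
    """
    match = _PRI_RE.fullmatch(raw)
    if match is None:
        raise ValueError("no PRI part found")
    if received_at.tzinfo is None:
        raise ValueError("received_at must be timezone-aware")
    pri, content = int(match.group(1)), match.group(2)
    timestamp = received_at.isoformat(timespec="microseconds")
    return f"<{pri}>1 {timestamp} {sender} - - - - {content}"


received = datetime(2023, 3, 8, 19, 41, 38, 89271, tzinfo=timezone.utc)
print(relay_headerless(
    "<38>User logged into the appliance. User: bmagistro.",
    "10.0.37.16",
    received,
))
```

With the first sample message and its receive time, this reproduces the first expected line shown above.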

I am open to suggestions on how best to handle this. Right now the easiest answer seems to be something like allocating a second IP to the host, listening for this "broken" format there, and handling it with some custom parsing logic.

Environment

rsyslog version: 8.2212.0
platform: Alma 9.1

Relevant configuration snippet

# Provides UDP syslog reception
$ModLoad imudp

ruleset(name="remote") {
    action(type="omfile" file="/tmp/relay.log" template="RSYSLOG_SyslogProtocol23Format")
    action(type="omfile" file="/tmp/relaydebug.log" template="RSYSLOG_DebugFormat")
}
input( type="imudp" port="514" ruleset="remote")

pcap (zip'd for github upload)


davidelang · 2023-03-08T22:01:17Z

The problem is that there are several things that can be sent:

<pri>msg
<pri>syslogtag msg
<pri>hostname syslogtag msg

and the 'correct'

<pri>date hostname syslogtag msg

How can we tell the first three apart? Using your examples:

<38>User logged into the appliance. User: bmagistro.
<6>The appliance is rebooting. Command issued by user: bmagistro.
<30>Network link has gone down on interface 1.

How do we know that the hostnames aren't 'User', 'The', and 'Network'?

There are characters that are not allowed in a hostname or syslogtag. rsyslog does look for those and, when it sees one, treats it and anything after it as msg (and we get complaints about that too :-) ).

Because of this, the rsyslog behavior can't be changed without breaking existing configs.

What you can do is filter on fromhost-ip (the IP of the system that sent the logs to you) or fromhost (a name lookup of fromhost-ip), add logic that looks at/parses rawmsg to get what you want, and then use a template to craft a properly formatted message for further processing.

David Lang
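
To make the ambiguity concrete, here is a small Python sketch that lists the three headerless readings for the text following the PRI. It only illustrates the problem and is not how rsyslog's parser is implemented; `candidate_readings` is a hypothetical helper name.

```python
def candidate_readings(text: str) -> list[dict]:
    """Return the plausible field splits for text that follows a PRI
    and carries no timestamp: msg only, tag + msg, hostname + tag + msg."""
    readings = [{"hostname": None, "tag": None, "msg": text}]
    first, sep, rest = text.partition(" ")
    if sep:
        readings.append({"hostname": None, "tag": first, "msg": rest})
        second, sep2, rest2 = rest.partition(" ")
        if sep2:
            readings.append({"hostname": first, "tag": second, "msg": rest2})
    return readings


for reading in candidate_readings(
    "User logged into the appliance. User: bmagistro."
):
    print(reading)
```

Nothing in the text itself says which reading is right. The third one (hostname 'User', tag 'logged') matches the field split seen in the actual behavior reported at the top of this issue.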

bmagistro · 2023-03-09T04:37:55Z

The RFC also says in section 4:

The payload of any IP packet that has a UDP destination port of 514 MUST be treated as a syslog message. There MAY be differences between the format of an originally transmitted syslog message and the format of a relayed message. In essence, it is RECOMMENDED to transmit a syslog message in the format specified in this document, but it is not required. If a relay is able to recognize the message as adhering to that format then it MUST retransmit the message without making any changes to it. However, if a relay receives a message but cannot discern the proper implementation of the format, it is REQUIRED to modify the message so that it conforms to that format before it retransmits it. Section 4.1 will describe the RECOMMENDED format for syslog messages. Section 4.2 will describe the requirements for originally transmitted messages and Section 4.3 will describe the requirements for relayed messages.

Continuing on, this is where I said there is some gray area, and it is definitely open to interpretation and experience. Based on your comment, the implication is that there was a need to deviate from the RFC due to real-world implementations. Ideally those needs/reasons would be captured somewhere; as a developer, I know that doesn't always happen and isn't always possible or practical. As users, though, we are then left with what the maintainers/contributors chose and minimal controls to adjust this behavior today.

how can we tell the first three apart? using your examples:
and the 'correct'

The way I read RFC 3164 makes some of this very straightforward. I'll start with the "correct" (full?) format: this processing path is based on the presence of a valid timestamp. If the timestamp is present, it should be assumed that the rest of the header and tag are also present. However, if the timestamp is not present or is invalid, we hit the three scenarios you mention. Following the RFC, this is exactly the scenario described by section 4.3.2, under which the relay MUST insert a timestamp and SHOULD add a hostname. It also states that "the remainder of the packet MUST be treated as the content field of the msg and appended". It goes on to state "the TAG value cannot be determined and will not be included". As suggested in the referenced ticket, adding a flag to toggle this behavior, allowing users to choose the existing behavior or a "new" one (strictRfc3164s43?), seems like it could be a valid option for more than just this scenario.
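
The branch described here could be sketched in Python roughly as follows. This is only my reading of the RFC expressed as code, not existing rsyslog behavior; the names `has_rfc3164_timestamp` and `classify` are made up for illustration, and the pattern assumes the "Mmm dd hh:mm:ss" layout with a space-padded day.

```python
import re

# "Mmm dd hh:mm:ss " with the day padded by a space when below 10.
# Deliberately strict: a zero-padded day such as "Mar 08" does not match.
_TIMESTAMP_RE = re.compile(
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"(?: [1-9]|[12]\d|3[01]) "
    r"(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d "
)


def has_rfc3164_timestamp(after_pri: str) -> bool:
    """True if the text right after the PRI starts with a timestamp."""
    return _TIMESTAMP_RE.match(after_pri) is not None


def classify(after_pri: str) -> str:
    """Pick the processing path based only on the timestamp check."""
    if has_rfc3164_timestamp(after_pri):
        return "header-present"  # parse hostname and tag as usual
    return "section-4.3.2"  # whole remainder is content, no tag


print(classify("Mar  8 19:41:38 appliance login: ok"))
print(classify("User logged into the appliance. User: bmagistro."))
```

Under this reading, the three sample messages all take the second path, since none of them begins with a timestamp.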

Personally, I think vendors should have been able to migrate to well-formed 5424 for currently maintained equipment by now... (insert laughter here). But vendors have rarely taken my opinion(s) into consideration, so like others, I am left trying to normalize (hammer) what they provide into a consistent format that can be consumed meaningfully. I'd like to try and do the same with this message format without adding too much overhead for anyone, this project included. I would go so far as to say we would be willing to help develop a patch if some consensus around behavior, and a reasonable expectation that it would be accepted, can be established.

davidelang · 2023-03-09T08:08:23Z

It is being treated as a syslog message, and it is being processed. It's not following the RFC, and there is a history of senders leaving out the timestamp but including the hostname and syslog tag.

Thus, decades ago, the 'best' heuristic was to try and guess what was meant, and the current behavior is the result. This has been in place for a couple of decades (and may even have been in sysklogd before rsyslog started).

I _REALLY_ don't think that we are going to be willing to break existing behavior on some messages that aren't fully following the RFC to do slightly better on some other messages that aren't fully following the RFC.

If we were to do so, we would get people arguing to change it back to match their reading of possibly legal interpretations.

David Lang

rgerhards · 2023-03-09T08:16:12Z

Unfortunately, I can't delve deep into the details, but RFC 3164 syslog is far from being well defined. In any case, I am with David here: we will not change the normal 3164 parser heuristic - it works fairly well. Messages without a header are garbage in the first place, and there is no reliable way to detect what they do and do not contain.

That said, rsyslog is modular. So you are free to write your own parser module, based on pmrfc3164 for example, and make it parse this type of message exactly like you want. I would suggest using a different port number for these devices then, because if other devices use that parser, you'll probably end up in real-world format hell ;-)

HTH
Rainer

rgerhards · 2023-03-09T08:18:01Z

Let me add (I forgot): there may be a point that RFC 3164 suggests (it's informational, so everything is just a suggestion) treating some fields differently than we do. What we do, however, is what works best in practice. I think the *real* solution is indeed a special parser.

Rainer

bmagistro · 2023-03-09T16:08:33Z

Thanks for the discussion. While I disagree with the stance that there is no room for a user-configurable flag to change this behavior, I do appreciate the work that has been done to get us and this app this far.

Cheers

bmagistro closed this as completed Mar 9, 2023
