RFC3164 4.3.2 -- Handling messages with pri but without timestamp or host #5098
Comments
The problem is that there are several things that can be sent
<pri>msg
<pri>syslogtag msg
<pri>hostname syslogtag msg
and the 'correct'
<pri>date hostname syslogtag msg
How can we tell the first three apart? Using your examples:
<38>User logged into the appliance. User: bmagistro.
<6>The appliance is rebooting. Command issued by user: bmagistro.
<30>Network link has gone down on interface 1.
how do we know that the hostnames aren't 'User', 'The', and 'Network'?
There are characters that are not allowed in a hostname or syslogtag. rsyslog
does look for those, and when it sees one, it treats that character and
everything after it as msg (and we get complaints about that too :-) ).
Because of this, the rsyslog behavior can't be changed without breaking existing
configs.
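To make the ambiguity concrete, here is a minimal Python sketch (not rsyslog's actual parser) that lists the three readings of a timestamp-less message. The name `candidate_parses` and the dictionary layout are made up for illustration.

```python
import re

# A PRI is one to three digits in angle brackets at the very start.
PRI_RE = re.compile(r"^<(\d{1,3})>")


def candidate_parses(raw: str) -> list[dict]:
    """Return every header-less reading of a message that has no timestamp."""
    m = PRI_RE.match(raw)
    if m is None:
        raise ValueError("no PRI found")
    rest = raw[m.end():]
    words = rest.split(" ", 2)

    # Reading 1: <pri>msg
    candidates = [{"hostname": None, "tag": None, "msg": rest}]
    # Reading 2: <pri>syslogtag msg
    if len(words) >= 2:
        candidates.append(
            {"hostname": None, "tag": words[0], "msg": rest.split(" ", 1)[1]}
        )
    # Reading 3: <pri>hostname syslogtag msg
    if len(words) == 3:
        candidates.append({"hostname": words[0], "tag": words[1], "msg": words[2]})
    return candidates


for reading in candidate_parses("<30>Network link has gone down on interface 1."):
    print(reading)
```

All three readings are syntactically plausible, which is why a parser has to fall back on a heuristic here.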
What you can do is filter on fromhost-ip (the IP of the system that sent the
logs to you) or fromhost (a name lookup of fromhost-ip), and then add logic
that parses rawmsg to get what you want. After that you can use a
template to craft a properly formatted message for further processing.
David Lang
|
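The RFC also says in section 4
> The payload of any IP packet that has a UDP destination port of 514 MUST be treated as a syslog message. There MAY be differences between the format of an originally transmitted syslog message and the format of a relayed message. In essence, it is RECOMMENDED to transmit a syslog message in the format specified in this document, but it is not required. If a relay is able to recognize the message as adhering to that format then it MUST retransmit the message without making any changes to it. However, if a relay receives a message but cannot discern the proper implementation of the format, it is REQUIRED to modify the message so that it conforms to that format before it retransmits it. Section 4.1 will describe the RECOMMENDED format for syslog messages. Section 4.2 will describe the requirements for originally transmitted messages and Section 4.3 will describe the requirements for relayed messages.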
Continuing on, this is where I said there is some gray area that is definitely open to interpretation and experience. Based on your comment, the implication is that there was a need to deviate from the RFC due to real-world implementations. Ideally those needs and reasons would be captured somewhere; as a developer I know that doesn't always happen and isn't always possible or practical. As a user, though, we are then left with what the maintainers and contributors chose and minimal controls to adjust this behavior today.
The way I read RFC3164 makes some of this very straightforward. I'll start with the "correct" (full?) format: this processing path is based on the presence of a valid timestamp. If the timestamp is present, it should be assumed that the rest of the header and the tag are also present. However, if the timestamp is not present or is invalid, we hit the three scenarios you mention. Following the RFC, this is exactly the scenario described by section 4.3.2, under which the relay MUST insert a timestamp and SHOULD add a hostname. It also states that "the remainder of the packet MUST be treated as the content field of the msg and appended". It goes on to state "the TAG value cannot be determined and will not be included". As suggested in the referenced ticket, adding a flag to toggle this behavior, allowing users to choose the existing behavior or a "new" one (strictRfc3164s43?), seems like it could be a valid option for more than just this scenario.
Personally I think vendors should have been able to migrate to well-formed 5424 for currently maintained equipment by now... (insert laughter here). But vendors have rarely taken my opinion(s) into consideration, so like others, I am left trying to normalize (hammer) what they provide into a consistent format that can be consumed meaningfully. I'd like to try and do the same with this message format without adding too much overhead for anyone, this project included. I would go so far as to say we would be willing to help develop a patch if some consensus around behavior, and a reasonable expectation that it would be accepted, can be established.
|
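For reference, the section 4.3.2 reading described above can be sketched in a few lines of Python. This is only an illustration of that interpretation, not rsyslog behavior; `relay_4_3_2` and `rfc3164_timestamp` are invented names, and the sender address stands in for the hostname because no name lookup is done.

```python
import re
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
PRI_RE = re.compile(r"^<(\d{1,3})>")
# RFC 3164 timestamp: "Mmm dd hh:mm:ss", day padded with a space when < 10.
TS_RE = re.compile(r"(?:%s) [ \d]\d \d{2}:\d{2}:\d{2} " % "|".join(MONTHS))


def rfc3164_timestamp(now: datetime) -> str:
    return f"{MONTHS[now.month - 1]} {now.day:2d} {now:%H:%M:%S}"


def relay_4_3_2(raw: str, sender: str, now: datetime) -> str:
    """Valid PRI but no timestamp: add timestamp and hostname, no TAG."""
    m = PRI_RE.match(raw)
    if m is None or int(m.group(1)) > 191:
        raise ValueError("no valid PRI; section 4.3.2 does not apply")
    rest = raw[m.end():]
    if TS_RE.match(rest):
        return raw  # timestamp present: assume a full header, change nothing
    # The whole remainder becomes CONTENT; the TAG cannot be determined.
    return f"{m.group(0)}{rfc3164_timestamp(now)} {sender} {rest}"


print(relay_4_3_2(
    "<6>The appliance is rebooting. Command issued by user: bmagistro.",
    "10.0.37.16",
    datetime(2023, 3, 8, 9, 5, 7),
))
```

The sketch never tries to guess a hostname or tag from the first words of the message, which is the difference from the heuristic discussed in this thread.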
It is being treated as a syslog message, and it is being processed. It's not
following the RFC, and there is a history of senders leaving out the timestamp
but including the hostname and syslog tag.
Thus, decades ago, the 'best' heuristic was to try to guess what was meant, and
the current behavior is the result. As this has been in place for a couple of
decades (and may even have been in sysklogd before rsyslog started),
I _REALLY_ don't think that we are going to be willing to break existing
behavior on some messages that aren't fully following the RFC to do slightly
better on some other messages that aren't fully following the RFC.
If we were to do so, we would get people arguing to change it back to match
their reading of possibly legal interpretations.
David Lang
On Wed, 8 Mar 2023, bmagistro wrote:
The RFC also says in section 4
> The payload of any IP packet that has a UDP destination port of 514 MUST be treated as a syslog message. There MAY be differences between the format of an originally transmitted syslog message and the format of a relayed message. In essence, it is RECOMMENDED to transmit a syslog message in the format specified in this document, but it is not required. If a relay is able to recognize the message as adhering to that format then it MUST retransmit the message without making any changes to it. However, if a relay receives a message but cannot discern the proper implementation of the format, it is REQUIRED to modify the message so that it conforms to that format before it retransmits it. [Section 4.1](https://www.rfc-editor.org/rfc/rfc3164.html#section-4.1) will describe the RECOMMENDED format for syslog messages. [Section 4.2](https://www.rfc-editor.org/rfc/rfc3164.html#section-4.2) will describe the requirements for originally transmitted messages and [Section 4.3](https://www.rfc-editor.org/rfc/rfc3164.html#section-4.3) will describe the requirements for relayed messages.
|
Unfortunately, I can't delve deep into the details, but RFC 3164 syslog is
far from being well defined. In any case, I am with David here: we will not
change the normal 3164 parser heuristic, as it works quite well. Messages
without a header are garbage in the first place, and there is no reliable
way to detect what they do and do not contain.
That said, rsyslog is modular. So you are free to write your own parser
module, based on pmrfc3164 for example, and make it parse this type of message
exactly the way you want. I would suggest using a different port number for
these devices then, because if other devices use that parser, you'll
probably end up in real-world format hell ;-)
HTH
Rainer
|
Let me add something I forgot. There may be a point in that RFC 3164 suggests
(it's informational, so everything is just a suggestion) treating some fields
differently than we do. What we do, however, is what works best in practice. I
think the *real* solution is indeed a special parser.
Rainer
|
Thanks for the discussion. While I disagree with the stance that there is no room for a user-configurable flag to change this behavior, I do appreciate the work that has been done to get us and this app this far. Cheers
|
This may be related to #1789, but I don't want to mix the threads if it is not.
Personally I would consider this a partial/broken implementation; however, it is generated by a commercially available device and does seem to have a path within RFC 3164 to be considered valid. I am working on making contact with the vendor to discuss this (and some other feature requests), but the time to a fix, if one happens at all once that contact is made, is unknown. The vendor does not expose the transmission port either, so short of using a separate instance I am not sure how easily one could differentiate this traffic from other syslog traffic coming in (unless all other traffic had its port changed).
Based on this input data I would expect this to be handled as RFC 3164 data and then further handled according to section 4.3.2, "A valid PRI but no/invalid timestamp". The behavior lines assume output in RFC 5424 format.
Input data
Sender 10.0.37.16
Actual behavior
Expected behavior
This is generated by hand, and I think I counted fields correctly; at the end of the day I am only expecting pri, version, timestamp, hostname, and msg to be populated. To achieve this I would expect the timestamp and hostname (subject to DNS resolution, so the IP is inserted here for discussion purposes) to be added based on RFC 3164 section 4.3.2. This is probably where the gray areas start: per comments on #1789, there are implementations that may send a valid PRI but no hostname, or send a message starting with a hostname, so parsing/processing needs to account for these too. If trying to handle all of these, one would likely need multiple inputs to set the different parameters for parsing behavior. The below fills in the first part of the header based on the receive information from rsyslog. Following RFC 3164 for tag detection, the tag would be part of the message, but it is not present here, so I would not expect the remaining header fields in 5424 to be populated; the same goes for structured data.
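As an illustration of the field layout described here, the following Python sketch renders a header-less message as an RFC 5424 line with only pri, version, timestamp, hostname, and msg filled in. It is a hand-written approximation, not output captured from rsyslog; `to_rfc5424` is a made-up name, the receive time is supplied by the caller, and the sender IP stands in for the hostname.

```python
import re
from datetime import datetime, timezone

PRI_RE = re.compile(r"^<(\d{1,3})>")
NIL = "-"  # RFC 5424 NILVALUE for fields that cannot be determined


def to_rfc5424(raw: str, sender: str, received: datetime) -> str:
    """Render a <pri>msg packet as an RFC 5424 line using receive-side data."""
    m = PRI_RE.match(raw)
    if m is None or int(m.group(1)) > 191:
        raise ValueError("no valid PRI")
    # Assumption: `received` is timezone-aware; it is normalised to UTC here.
    ts = received.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    msg = raw[m.end():]
    # Header order: PRI VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID,
    # followed by STRUCTURED-DATA and MSG.
    return f"<{int(m.group(1))}>1 {ts} {sender} {NIL} {NIL} {NIL} {NIL} {msg}"


print(to_rfc5424(
    "<30>Network link has gone down on interface 1.",
    "10.0.37.16",
    datetime(2023, 3, 8, 12, 0, 0, tzinfo=timezone.utc),
))
```

APP-NAME, PROCID, MSGID, and STRUCTURED-DATA are all left as `-` because nothing in the packet identifies them.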
I am open to suggestions on how best to handle this. Right now the easiest answer seems to be something like allocating a second IP to the host, listening for this "broken" format there, and handling it with some custom parsing logic.
Environment
Relevant configuration snippet
pcap (zip'd for github upload)