Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMPROVEMENT] add separator character option to 'name-value-list' parser #335

Merged
merged 4 commits into from
Nov 11, 2021

Conversation

frikilax
Copy link
Contributor

@frikilax frikilax commented Mar 17, 2020

Description

An extradata parameter is now available for the experimental parser name-value-list, this parameter allows to set a different character as separator between key/value couples.

What's new

Before, this parser could only handle logs like
a=1 b=2 c=text_without_spaces

But with this update, a parameter 'separator' can be specified to act as the matching separator between key/value couples, and a parameter 'assignator' as a separator between key and value.

for example:

rule=:%keyval:name-value-list:|%

or

rule=:%{"name": "keyval", "type": "name-value-list", "separator": "|"}%

are equivalent (by default, legacy 'extradata' is assigned to the 'separator' parameter in the first case)
the matching samples will then be:
a=1|b=2|c=text_without_spaces|d=text with spaces
key=42|other_key=hello there|yet-another.key=i'm a value|

When 'assignator' is left by default, it will match '=' and will check for valid naming of keys (allowing only alphanumeric characters, '.', '-' and '_').
In the case someone wants to match another character by setting 'assignator', the parser will loosen the restriction and only cut keys and values by separating between 'assignator' and 'separator'. However, key name validation will NOT occur !

Rulebase with 'assignator' defined:

rule=:%{"name": ".", "type": "name-value-list", "separator": "|", "assignator":"="}%

examples of matching sample:
a=1|key_two=hello world|thirdkey:=a valid value containing an \| escaped pipe||fourth#key=42
and result:

{
  "a": "1",
  "key_two": "hello world",
  "thirdkey:": "a valid value containing an \\| escaped pipe",
  "fourth#key": "42"
}

One can note character escaping will still be respected.

Credits

thanks to @KGuillemot for the help
thanks to @bdolez for the original work on quoting (see #234)

@davidelang
Copy link
Contributor

davidelang commented Mar 17, 2020 via email

@frikilax
Copy link
Contributor Author

frikilax commented Mar 17, 2020

Hey @davidelang!
Thanks for the insights!

We were going to give this as a simple improvement as we didn't need more yet, but after coming across some other problems, I think we'll still add support for a separator and an assignement character.

I think I will also add errors when those fields are more than 1 char, to give some feedback to the users (as it's easy/fast to add).

We also encountered problems with escaped characters, so I will also add support to ignored matched characters that are escaped.

However, quoted strings and multibytes separators might need a bit more work ,which we don't have at the moment sorry...

@frikilax
Copy link
Contributor Author

quotes support has been added

@bdolez
Copy link

bdolez commented Jul 6, 2020

Hi,

I did a PR (#234) a long time ago on the same topic but I can no longer update it.
I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?

Benoit

@frikilax
Copy link
Contributor Author

frikilax commented Jul 6, 2020

Hi,

I did a PR (#234) a long time ago on the same topic but I can no longer update it.
I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?

Benoit

This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.

@bdolez
Copy link

bdolez commented Jul 6, 2020

Hi,
I did a PR (#234) a long time ago on the same topic but I can no longer update it.
I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?
Benoit

This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.

My PR is probably dead because I can't update it.
I propose you take the only commit (cherry-pick) and add your feature (change sep, change assignator & config) before PR. The commit was successuly accepted&compiled and your propositions do not change algo. They will be accepted more easily.

Benoit

@frikilax
Copy link
Contributor Author

frikilax commented Jul 6, 2020

Hi,
I did a PR (#234) a long time ago on the same topic but I can no longer update it.
I think my main commit is simplier and may be you could have a look at it. I think the diff I made is shorter and your features could be rebased after it. What do you think ?
Benoit

This PR tries to achieve more than just quoting detection (the main goal wasn't quoting to begin with) so I'm not sure it's relevant to rebase this work on your PR.

My PR is probably dead because I can't update it.
I propose you take the only commit (cherry-pick) and add your feature (change sep, change assignator & config) before PR. The commit was successuly accepted&compiled and your propositions do not change algo. They will be accepted more easily.

Benoit

Frankly, what is done in your PR has been added more or less in this PR, you can check this commit
However, I realise we may have more or less been inspired by your PR while creating this one at the time, so I should credit you in this one

@bdolez
Copy link

bdolez commented Jul 6, 2020

My goal is not credit but having such feature merged into main branch and I wish it in the next release.

You treat the '\' and allow changing separator, it's better than mine.

To make it accepted, I recommand send only clean commits (merge errors & repush with --force). There is some coding-style errors. I think you sould replace "ass" variable.

In the last commit, you can group all specific "quoting" code into one "if" terminated with "goto". It make a more lisible commit.

@rgerhards rgerhards self-assigned this Jul 7, 2020
@bdolez
Copy link

bdolez commented Jul 17, 2020

Hi Rainer,

As you tell us in #234 ("if you don't hear back pls ping me"), do you have some new about this PR & state of your centos7 buildbot ?

Regards

Benoit

@bdolez
Copy link

bdolez commented Jul 30, 2020

Hi @frikilax , any updates about this PR ?

@frikilax frikilax force-pushed the PR-NameValueSeparator branch 2 times, most recently from 6755059 to 8113a7e Compare August 3, 2020 13:56
@bdolez
Copy link

bdolez commented Mar 25, 2021

Hi, any updates ?

@rgerhards
Copy link
Member

That PR needs a fresh push to restart the current CI. If this will not happen, I'll branch it off to a new branch, submit that as PR and see what happens. Feel free to remind me next Tuesday if nothing happend until then.

…TING

- new parameter 'separator': defines a custom character to look for between key/value pairs (replaces whitepsace when defined)
- new parameter 'assignator': defines a custom character to look for between key and value (replaces '=' when defined, but DISABLES key name validation)
- add support for (simple and double) quoting in values
- add support for escaped characters in values
@frikilax
Copy link
Contributor Author

Sorry, didn't have time to come back to this PR and completely forgot about it, I will try to get back to it before next Monday

…lue is not valid - to allow parsing rest of line
@frikilax frikilax mentioned this pull request Oct 12, 2021
Closed
@rgerhards rgerhards merged commit 28be845 into rsyslog:master Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants