Skip to content

Matches as an array (4.0)#3948

Merged
gaborznagy merged 45 commits intosyslog-ng:masterfrom
bazsi:matches-as-an-array
Jun 1, 2022
Merged

Matches as an array (4.0)#3948
gaborznagy merged 45 commits intosyslog-ng:masterfrom
bazsi:matches-as-an-array

Conversation

@bazsi
Copy link
Collaborator

@bazsi bazsi commented Mar 12, 2022

This builds on top of #3888 and implements $* as a list and other match related improvements.

These are the new features:

  • $* is a new macro that generates a list from $1, $2, $3 ... name-value pairs
  • the number of such name value pairs are tracked internally. Any new match operation (e.g. regexp matching) would reset the entire match array, while previously "stale" values could remain, if the new match operation would generate less matches than the previous one
  • added set-matches() and unset-matches() rewrite operations, the first expecting a list, while the second just resets the matches array to be empty.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

1 similar comment
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@bazsi bazsi force-pushed the matches-as-an-array branch from 7c1cd66 to 255d394 Compare March 21, 2022 06:30
@kira-syslogng
Copy link
Contributor

Build FAILURE

@bazsi
Copy link
Collaborator Author

bazsi commented Mar 21, 2022

@kira-syslogng test this please;

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@bazsi bazsi changed the title Matches as an array Matches as an array (4.0) Mar 24, 2022
@bazsi bazsi force-pushed the matches-as-an-array branch from 255d394 to 124b385 Compare April 1, 2022 09:10
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@gaborznagy gaborznagy self-requested a review April 7, 2022 12:18
@gaborznagy
Copy link
Collaborator

Just a note from our discussion: investigate the possibility of unsetting the stale/obsolete matches from earlier (i.e. when we have fewer matches in a subsequent matching than before).

@bazsi bazsi force-pushed the matches-as-an-array branch from 124b385 to 6c792a6 Compare April 8, 2022 11:28
@bazsi
Copy link
Collaborator Author

bazsi commented Apr 8, 2022

Just a note from our discussion: investigate the possibility of unsetting the stale/obsolete matches from earlier (i.e. when we have fewer matches in a subsequent matching than before).

It seems that I have implemented this already and then forgot about it. It is this patch: c67006b63d05d60530f1a435750b423281935ffe

This means that if we set non-consecutive match values then anything between the old range and the new one are unset explicitly.

I've now noticed that there's a function to reset the number of matches: log_msg_clear_matches() so I've committed changes to
use that function instead of open-coding the same.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

Copy link
Collaborator

@gaborznagy gaborznagy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked it, and it looks good to me!

I have only some minor remarks about commit history, about a unit test, and some general remarks.

I don't have a better, more general alternative to the term "matches" in set-matches or unset-matches.
I'm not sure of their general nature either: users of $1 or set-matches() should be careful of regex capture groups that could overwrite their explicitly set matches.

In Bash, $*, $1, etc. all refer to positional parameters.
In Perl it was related to multiline matching in regular expressions, but not since Perl 5.10.

@gaborznagy
Copy link
Collaborator

I've sent a PR about the news entries.

@kira-syslogng
Copy link
Contributor

Build FAILURE

@bazsi
Copy link
Collaborator Author

bazsi commented Apr 19, 2022

@kira-syslogng retest this please;

@kira-syslogng
Copy link
Contributor

Build FAILURE

@gaborznagy
Copy link
Collaborator

@bazsi Kira fails due to the git security patch to CVE-2022-24765 (https://github.blog/2022-04-12-git-security-vulnerability-announced/)

@gaborznagy
Copy link
Collaborator

We'll be adding the suggested safe.directory git config workaround as I've had some test issues running the docker image as the repository's owner (some journald tests fail)

@bazsi bazsi force-pushed the matches-as-an-array branch 2 times, most recently from 2d7925e to 1089ad1 Compare April 24, 2022 13:00
@bazsi
Copy link
Collaborator Author

bazsi commented Apr 24, 2022

Thanks @gaborznagy for the good feedback. I've now addressed all your review notes and pushed out another iteration. There seems to be some style-check related issue in the CI about a missing Python command. Otherwise things seem to be green (unless I am breaking something with the last force-push :)

Reviewing this one would be appreciated.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

gaborznagy
gaborznagy previously approved these changes Apr 28, 2022
Copy link
Collaborator

@gaborznagy gaborznagy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for resolving my comments.

@gaborznagy
Copy link
Collaborator

@bazsi this needs one last rebase to fix the Light style-check issue in Github actions.

@gaborznagy gaborznagy requested a review from OverOrion April 29, 2022 08:52
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log {
  source(s_src);
  rewrite {
    set("foo bar" value('1'));
    unset-matches();
  };
  destination { file("/dev/stdout" template("$(format-json --scope nv-pairs --key 1)\n")); };
};

With this patch we can still query the $1 value after the unset-matches(); is it an intended behaviour?
I would argue that after unset-matches() is called every $n should be empty (""), or at least that is what I expect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The log_msg_values_foreach() did not filter out out-of-range matches. I'm pushing a fix for this, however I am starting to feel that it'd be much better to explicitly unset all matches explitictly, rather than trying to track the number of matches and then checking their number in all the code paths that query name-value pairs.

Might even be faster doing it only once, instead of at every log_msg_values_foreach() iteration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the fix for this: e234146b1ae44af0cdfff06e65917d40fa14c275

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to be nitpicking especially since I've approved this PR once, but this fix in e234146 only fixes the case when a cleared match is referenced through value-pairs, e.g. through format-json.

If the match (e.g. $1) is used later as an input in a parser or rewrite rule, then it would still hold the previously (cleared) value.
Maybe this scenario is rather a programming error, e.g. the writer of an SCL block should be aware of this.
A dummy example SCL:

parser example-parser() {
....
  set("foo bar" value('1'));
  unset-matches();
  syslog-parser(template('$1'));
};

TLDR: how about using log_msg_unset_match() instead of log_msg_clear_matches()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last iteration, I have changed the approach of resetting values, so the branch now contains a patch that changes things - again.

With that said, this case has worked correctly before, '$1' was unset because of this branch in log_msg_get_match():

+const gchar *
+log_msg_get_match_if_set_with_type(const LogMessage *self, gint index_, gssize *value_len,
+                                   LogMessageValueType *type)
+{
+  g_assert(index_ >= 0 && index_ < LOGMSG_MAX_MATCHES);
+
+  if (index_ >= self->num_matches)
+    return NULL;
+
+  return nv_table_get_value_if_set(self->payload, match_handles[index_], value_len, type);
+}
+

e.g. if a match number was outside the range of num_matches we always considered it to be unset (NOTE the NULL return). This worked correctly even before. The value-pairs code path had a bug as @OverOrion noticed correctly, as it didn't use log_msg_get_match(), rather it iterated over all name-value pairs using log_msg_values_foreach(), which did not filter out $1 in this case. That's what my last patch fixed.

With all this iterations however, at the end I've decided to go back to the original concept and call log_msg_unset() all matches whenever the num_matches would change. This eliminates the need to handle matches separately from other name-value pairs in the query APIs.

This is this patch that reverses most of the above.

commit 855e0c4174ebbba5db05d9395a2d33efb6b1cc4f
Author: Balazs Scheidler bazsi77@gmail.com
Date: Tue May 3 13:48:12 2022 +0200

logmsg: unset matches proactively

With that we are failing the cisco-parser() testcase in light, so I am now going to find that issue.

@bazsi bazsi force-pushed the matches-as-an-array branch from 425227b to 68dbb05 Compare May 18, 2022 14:34
@kira-syslogng
Copy link
Contributor

Build SUCCESS

Gabor Nagy and others added 3 commits May 21, 2022 11:22
Signed-off-by: Gabor Nagy <gabor.nagy@oneidentity.com>
…hes_too

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
The input parameters might refer to a borrowed reference we might clobber
with log_msg_set_value() calls. In those cases, we save the original
value into result.source_value so use that instead of the parameter.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
@bazsi bazsi force-pushed the matches-as-an-array branch from 2b67be9 to 16f1c76 Compare May 21, 2022 09:22
@kira-syslogng
Copy link
Contributor

Build SUCCESS

bazsi added 4 commits May 22, 2022 09:17
We only have up to $255 so don't set any more than that.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
LogMessage has traditionally aborted whenever we tried to access match
variables that are out-of-range, however with the adoption of set-matches()
and the use of match variables for array-like use-cases this is not
right choice anymore, ignore the set/get operation in these cases instead.

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
@bazsi bazsi force-pushed the matches-as-an-array branch from 16f1c76 to 7b9dd74 Compare May 22, 2022 07:17
@kira-syslogng
Copy link
Contributor

Build SUCCESS

@kira-syslogng
Copy link
Contributor

Build SUCCESS

bazsi added 3 commits May 22, 2022 19:16
Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
…obbering()

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
…ndle that as well

Signed-off-by: Balazs Scheidler <bazsi77@gmail.com>
@bazsi bazsi force-pushed the matches-as-an-array branch from 0c0c8b1 to 30ad5b6 Compare May 22, 2022 17:30
@bazsi
Copy link
Collaborator Author

bazsi commented May 22, 2022

I have found yet another case that clobbers the borrowed input of LogMatcher instances, which caused the change in
#3934 to fail. Added some more unit tests and a fix too.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

Copy link
Collaborator

@OverOrion OverOrion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Also thank your for not only addressing the review notes, but writing additional tests for those as well!

@MrAnno MrAnno mentioned this pull request May 30, 2022
8 tasks
@gaborznagy
Copy link
Collaborator

regexp-parser() still uses log_matcher_match directly instead of the new wrappers (buffer, value, and template).
I've checked it and it's fine this way: it doesn't have to use log_matcher_match_value, as the NVTable reference counting happens in LogParser layer, can't leverage the advantages of log_matcher_match_template either, as the template is formatted already in the LogParser layer.

Maybe log_matcher_match_buffer could be used to emphasize that the caller already provided the functionalities and requirements supplied by those wrappers otherwise.

Copy link
Collaborator

@gaborznagy gaborznagy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. This was a very long time and lot of efforts for all of us, but I think it really worth it.
Sorry for taking it so long. :)

Otherwise I would ask to squash some patches (especially that some feature took opposite directions), but it would require a lot of effort again from all of us.
The patches are self-contained (not fixup patches), so we can merge it in this state.

@gaborznagy gaborznagy merged commit 3ac3176 into syslog-ng:master Jun 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants