Set counter consistently so zero always indicates all records #7801

bgirardeau-figma · 2022-10-27T00:47:35Z

This change sets the counter to zero only when all records are being logged as part of an initial run or new epoch.

Previously in some cases, the counter would be zero for a differential log (specifically events tables or if the query changed). For these cases, the counter is now reset to 1 instead. This change also does a small refactor to ensure the epoch in the database is updated for events tables, so that it will be set accurately if the same query is later changed to run over a non-evented table.

More context and motivation for this change is in #7799

directionless

I like the intent here.

One ask -- can something about the meaning of 0 be added to the docs/wiki?

bgirardeau-figma · 2023-01-06T17:55:38Z

One ask -- can something about the meaning of 0 be added to the docs/wiki?

Yeah! There is documentation about counter, so I updated it to be a little more explicit about when it starts at 0 vs 1.

directionless · 2023-02-10T01:54:20Z

osquery/core/query.cpp

+uint64_t Query::getQueryCounter(bool all_records, bool new_query) const {
  uint64_t counter = 0;
-  if (new_query) {
+  if (all_records) {
    return counter;
  }


I can roll with this, but I think it's a bit confusing.

For events, this calls: incrementCounter(false, new_epoch || new_query, counter);
but for normal queries: incrementCounter(new_epoch, new_query, counter);

And that distinction feels weird. I don't really understand why it's not something like:

uint64_t Query::getQueryCounter(bool start_at_zero, bool new_query) const { if (new_query) { if (all_records) { return 0; } return 1; } uint64_t counter = 0;

which would allow a more consistent call semantics.

But I might be missing something?

Yeah I found making a clear distinction between if we are returning all records vs something about the query changed was the least confusing option for me, but I agree it's tricky.

The issue is that events and normal queries are different in that events never have a snapshot/all records available, so they never start at 0 and always reset to 1, while normal queries have a distinction between epoch change or just query SQL changing when they are resetting the counter (to 0 vs 1).

So normal queries would still need to change start_at_zero based on new_epoch. ie: incrementCounter(new_epoch, new_query || new_epoch, counter), which means it doesn't really change the logic in a meaningful way.

Yeah I found making a clear distinction between if we are returning all records vs something about the query changed was the least confusing option for me, but I agree it's tricky.

Yeah -- I wasn't suggesting a logic change, just a naming/signature change. But I think you're right -- bool start_at_zero is no better than bool all_records. They're both just describing the behavior. Mostly this pairs with the other comment.

directionless · 2023-02-10T01:57:14Z

osquery/core/query.cpp

-  if (update_db || fresh_results || new_query) {
-    auto status = incrementCounter(fresh_results || new_query, counter);
+  if (update_db || new_epoch || new_query) {
+    auto status = incrementCounter(new_epoch, new_query, counter);


Why not auto status = incrementCounter(true, new_epoch || new_query, counter);

Will this do the right thing with the same epoch and a new query?

Edit: summary is it won't do the right thing because we actually want to have the behavior to return 1 in the new query same epoch case (since it does not return all records then).

The first argument should be false if it's the same epoch because that means the results are a diff, rather than including all records (even if the query changes). In the same epoch but new query case, counter resets to 1 instead of 0.

auto status = incrementCounter(new_epoch, new_epoch || new_query, counter) would work here though. It won't have different behavior since new_epoch always forces counter of 0 anyway, happy to change if you think it's clearer.

I am very confused here.

I thought new_epoch referred to a new epoch, eg, the epoch has been changed. And that new_query referred to the first execution of a query. (I get this from reading the comments in the code).

As I understand the goal, either a new_query OR a new_epoch should reset the counter. And then what we reset the counter to depends on whether it's evented or not. Put differently, I think if there's a new_epoch, every query should be a new_query. And that we should be calling this as incrementCounter(<event-or-regular>, new_epoch || new_query, counter);

Splitting it up as incrementCounter(new_epoch, new_query, counter); feels weird. I think when a new config is loaded, and there are new queries, this would return 1 which seems wrong?

Ah yes I see the confusion, I had been using "new query" to mean the query SQL changed, not necessarily when the query is brand new overall. And "new epoch" to mean a new epoch for this query, including its first epoch overall.

I think it could be clearer if I rename these variables instead to new_query_epoch and new_query_sql -- I'll try that now but happy to revert or tweak if you prefer other names.

Here's another way to describe the behaviors I was going for in this PR:

non-evented table

query is brand new -> reset counter to 0, include all records

epoch changed -> reset counter to 0, include all records

epoch same, query sql changed -> reset counter to 1 (formerly 0), do a normal diff

epoch same, query sql same -> increment counter by 1, do a normal diff

evented table

query is brand new -> reset counter to 1 (formerly 0), stream events

epoch changed -> reset counter to 1 (formerly 0), stream events

epoch same, query sql changed -> reset counter to 1 (formerly 0), stream events

epoch same, query sql same -> increment counter by 1, stream events

The bolded parts are what this PR changes vs what osquery does today. To me the new behavior better matches the intent of the documentation to be able "to skip the initial added records" by ignoring when counter = 0 described here.

(updated the PR, hopefully the naming makes more sense!)

Oh, your table form is helpful. I am confused, because I disagree with one of the lines:

non-evented table

query is brand new -> reset counter to 0, include all records

epoch changed -> reset counter to 0, include all records

epoch same, query sql changed -> reset counter to 1 (formerly 0), do a normal diff

epoch same, query sql same -> increment counter by 1, do a normal diff

I think (1), (2), and (3) should have the same behavior. They should reset to 0 and includes all records.

I think a SQL change is functionally a new query. I'm curious why not?

Wait, maybe we're both wrong?

If the query name is unchanged, we should not be resetting the the counter. [query name, epoch, counter] should be unique. Changing the query sql, but not the name, should not change the counter because it would invalidate that. It's also probably a bad idea.

But I'm not sure what the underlying system actually does.

Yeah I think there are good arguments either way for whether to return all records for a SQL change, so I was hesitant to change the current behavior of it doing a diff - it seemed like a larger/potentially more disruptive change than changing the counter behavior.

If the query name is unchanged, we should not be resetting the the counter.

I like this idea! While I was most bothered by the counter being 0 in this case, I'm not actually sure why resetting at all there is helpful -- I agree violating the uniqueness of that triple is annoying.

I'm happy to make this update. If anyone really relied on knowing when results are due to a query SQL update for some reason, in theory they could do that in a more clear/direct way by looking for the status line or joining on the osquery_schedule table in the query itself.

Yeah I think there are good arguments either way for whether to return all records for a SQL change, so I was hesitant to change the current behavior of it doing a diff - it seemed like a larger/potentially more disruptive change than changing the counter behavior.

Yes, I think that's the correct judgement -- This PR should preserve the existing behavior. Not that we can't fix it, but we should be deliberate about doing so.

In that vein, this PR feels reasonable. It's not a deep change. Can you open a blueprint issue with a proposal for better counter behavior on changing SQL? I imagine there might be discussion to be had (I can think of 2 solutions)

Sounds good! I opened one here #7945

…te_diff param)

directionless

I'm somewhat trepidatious -- as discussed in the comment threads, I think this exposes the corner case where we have poor behavior on a sql change. (osquery resets the counter, thus violating the idea that [queryname, epoch, counter] is immutable.)

Set epoch and counter consistently

2fe2a27

bgirardeau-figma requested review from a team as code owners October 27, 2022 00:47

Fix query name changed counter test

fd741a0

bgirardeau-figma mentioned this pull request Oct 28, 2022

Make epoch and counter fields more consistent #7799

Open

mike-myers-tob added ready for review Pull requests that are ready to be reviewed by a maintainer core labels Nov 9, 2022

directionless reviewed Dec 20, 2022

View reviewed changes

Update documentation

8dd9c7b

directionless reviewed Feb 10, 2023

View reviewed changes

directionless added this to the 5.8.0 milestone Feb 15, 2023

bgirardeau-figma added 4 commits February 22, 2023 21:15

Rename variables to clarify behavior

d32bbb0

Update header function comment

d4b86da

Fix query tests

65b95da

Don't reset counter when query SQL changes (and remove unused calcula…

14dd2b7

…te_diff param)

directionless approved these changes Feb 24, 2023

View reviewed changes

bgirardeau-figma mentioned this pull request Feb 24, 2023

Better counter behavior on changing SQL #7945

Closed

directionless merged commit 756c680 into osquery:master Feb 24, 2023

directionless mentioned this pull request Mar 21, 2023

Fix empty batch result set reporting #7958

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set counter consistently so zero always indicates all records #7801

Set counter consistently so zero always indicates all records #7801

bgirardeau-figma commented Oct 27, 2022

directionless left a comment

bgirardeau-figma commented Jan 6, 2023

directionless Feb 10, 2023

bgirardeau-figma Feb 10, 2023

directionless Feb 23, 2023

directionless Feb 10, 2023

bgirardeau-figma Feb 10, 2023 •

edited

directionless Feb 23, 2023 •

edited

bgirardeau-figma Feb 23, 2023 •

edited

bgirardeau-figma Feb 23, 2023

directionless Feb 23, 2023

directionless Feb 23, 2023 •

edited

bgirardeau-figma Feb 23, 2023

directionless Feb 24, 2023

bgirardeau-figma Feb 24, 2023

directionless left a comment

Set counter consistently so zero always indicates all records #7801

Set counter consistently so zero always indicates all records #7801

Conversation

bgirardeau-figma commented Oct 27, 2022

directionless left a comment

Choose a reason for hiding this comment

bgirardeau-figma commented Jan 6, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bgirardeau-figma Feb 10, 2023 • edited

Choose a reason for hiding this comment

directionless Feb 23, 2023 • edited

Choose a reason for hiding this comment

bgirardeau-figma Feb 23, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

directionless Feb 23, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

directionless left a comment

Choose a reason for hiding this comment

bgirardeau-figma Feb 10, 2023 •

edited

directionless Feb 23, 2023 •

edited

bgirardeau-figma Feb 23, 2023 •

edited

directionless Feb 23, 2023 •

edited