Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: make blocking queries for non-existent items more efficient #12110

Merged
merged 3 commits into from Feb 17, 2022

Conversation

dnephin
Copy link
Contributor

@dnephin dnephin commented Jan 17, 2022

Branched from #12109, fixes #9449

Full details about this change are in the commit messages. This is a problem that has been reported a few times. It is particularly problematic when connect is enabled, because agents will do many blocking queries for config-entries that often do not exist.

This PR improves the behaviour of blocking queries when the result of the query is "no items found" (no modify index to use as the QueryMeta.Index). This change makes blockingQuery more efficient, but there is still possibility for improvement. Currently we have to return on the first iteration if there was a write between the time the previous RPC call was returned and the new one was received (because the table max index would have changed). We can fix that case by adding a new query option, which I will do in a follow up PR.

TODO:

  • more test cases for state changes between existing->not-existing and not-existing->existing
  • return errNotFound from all the RPC endpoint functions passed to blockingQuery (where appropriate), so that all queries benefit from this enhancement. Done for all "get" endpoints. List endpoints will be left for a future PR as those are more involved.
  • changelog entry

@dnephin dnephin added type/enhancement Proposed improvement or new feature theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics labels Jan 17, 2022
@dnephin dnephin changed the title rpc: make blocking queries for non-existent items more efficient. rpc: make blocking queries for non-existent items more efficient Jan 17, 2022
@github-actions github-actions bot removed the theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics label Jan 17, 2022
@dnephin dnephin force-pushed the dnephin/blocking-queries-not-found branch from 6c81389 to ce99688 Compare January 17, 2022 22:44
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging January 17, 2022 22:44 Inactive
@vercel vercel bot temporarily deployed to Preview – consul January 17, 2022 22:44 Inactive
@@ -142,7 +142,7 @@ func (c *ConfigEntry) Get(args *structs.ConfigEntryQuery, reply *structs.ConfigE

reply.Index = index
if entry == nil {
return nil
return errNotFound
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the change that needs to be applied to all the other RPC endpoints to take advantage of this enhancement. Currently it is only implemented on this one ConfigEntry.Get endpoint.

@banks
Copy link
Member

banks commented Jan 25, 2022

Nice Daniel! This is a really smart way to save a lot of CPU time in this common case without invasive changes to how we store things!

If I understand correctly, this saves on the CPU overhead of building a response, serializing it, writing to the wire, completing the request, and then accepting a new request from the client, deserialising etc.

The one part that remains that potentially costs something is the re-query of the state store for every change. But I expect that to be a very small portion of the CPU usage end to end here since in this case the query should be a single index lookup that returns no results and so a quick exit.

Did you see any data while debugging that indicated the proportion of CPU that could be saved here vs the internal query parts? No worries if not just interested.

For comparison, where this query work was an issue in the past it was for service health with a large number of instances because then there was O(N) work to do joining all the node, check and instance information to build the result, so when N got large, and many watchers were all doing this in a semi-hot loop (on every service update) it got really expensive. (That's different from the empty case but amounted to the same thing thanks to our limit of 2048 WatchSet chans after which we fell back to whole-table index watching, we fixed this with per-service indexes specifically for Health queries.)

So yeah this seems like a really simple change that should be a very significant CPU saving in this common case 👏

@dnephin
Copy link
Contributor Author

dnephin commented Jan 25, 2022

Yes, exactly! I have not measured the cost of the extra query, but I don't expect that to be significant. I think it's worth mentioning that this extra cost already exists in all of our blocking queries. I believe the WatchSet can fire any time a node is modified because of a delete or insert. The frequency of these fires is hard to predict, but it should be fair to say it happens often enough.

Any time the WatchSet fires due to a node update, we'd have the same behaviour as you describe. There would be an extra state store query, but the returned index would not have changed (because it should generally be the ModifyIndex), so we would never notice those queries in our logs or metrics.

I think given that this already happens today in normal blocking-queries, there's no real concern that it will continue to happen for blocking-queries for non-existent entries.

@dnephin dnephin added the theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics label Jan 25, 2022
@dnephin dnephin force-pushed the dnephin/blocking-queries-not-found branch from ce99688 to 4457d6a Compare January 25, 2022 23:51
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging January 25, 2022 23:51 Inactive
@vercel vercel bot temporarily deployed to Preview – consul January 25, 2022 23:51 Inactive
@dnephin dnephin marked this pull request as ready for review January 25, 2022 23:51
Base automatically changed from dnephin/blocking-query-1 to main January 26, 2022 23:13
} else {
reply.Index = index
}
reply.Index = index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this part of the change was intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, this is now handled generally in Server.blockingQuery by calling Server.setQueryMeta, and setting the value to 1 if it is still 0. I guess this logic in kvs_endpoint predated the general fix in blockingQuery.

@dnephin dnephin force-pushed the dnephin/blocking-queries-not-found branch from 4457d6a to ae10e60 Compare February 15, 2022 18:44
@vercel vercel bot temporarily deployed to Preview – consul February 15, 2022 18:45 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 15, 2022 18:45 Inactive
if err != nil {
switch {
case errors.Is(err, errNotFound):
if notFound {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need var notFound bool? It seems like it moots the whole point, because the first errNotFound skips the conditional here, and proceeds exactly like the prior scenario, never executing line 978.

Copy link
Contributor Author

@dnephin dnephin Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question!

I believe what you are calling out is that on the first iteration of this loop we still rely on the index not having changed. If a write happens between the time the last blocking query returned and when the next blocking query request happens (which I think should generally only be a few milliseconds), then this optimization does not work. We continue to have the old behaviour of returning unnecessarily. Generally that should be ok because it only limits the optimization, should not impact correctness, and there is a very small window where a write must happen to cause this problem.

There is a way to work around this limitation (could be a follow up), but it introduces some risk, so I thought it would be better to avoid. I haven't documented that anywhere yet (just some discussions in old slack threads), but I can do that here.

The work around for this limitation would be to add a new X-Consul-Index-Not-Found query parameter (as a sibling to X-Consul-Index). This parameter would be used to set the initial value of var notFound bool. That way when a new query comes in, we'll already know if we're in a "not found" state, and only the creation of an item (or a timeout) will cause blockingQuery to return. The risk here is that if the user incorrectly sets this header, then they could miss a delete that happened, but I haven't thought enough about this case to know if it's actually a problem for correctness. Maybe we should do this as a follow up.

Why do we need var notFound bool?

To answer the other part of this question, this is essential to the optimization. I think this is explained briefly in 9abb49da51ad87312a5f34364d8e6a3c0e35c328, but I'm happy to add more detail since this is the main reason for the change. As documented in #12343, there are two things to consider here: the "last modified index" set on responseMeta.Index, and the WatchSet blocking.

We can't fix the WatchSet blocking without turning every read operation into a write that modifies the memdb index (I'm pretty sure this is not an approach we want to investigation). Maybe there are other ways of making WatchSet block correctly for not found items? I have not found any.

So that leaves the index comparison (responseMeta.Index > query.MinIndex). Generally we know this comparison will return true in these "not found" cases, because responseMeta.Index will be the max index of the table, and it changes on every write.

What we need is a comparison we can use instead of responseMeta.Index > query.MinIndex.

If we think about this comparison more abstractly, I think we'll see that the question this is asking is "has the result of the query changed from the last result sent to the client". There are plenty of ways that question could be answered, using the modify index is a very cost effective one, so that's generally what we use.

However as we've noticed, using that modify index doesn't work for cases where an item does not exist, because we don't have a stable modify index to compare against. In this case we use "not found" to indicate if the result has changed.

Since we don't know the state from the client perspective (X-Consul-Index-Not-Found would give us that), we have to perform the query once. If the result of the query is the same "last modified index", then we know at that index the query returned no results (which tells us the client's last view was "not found"). Until the query returns some result, the state has not changed, so we can continue to block until that time.

In the line below this, when we see "state has not changed, the result is still not found", we update the minQueryIndex to the latest responseMeta.Index, so that the comparison later keeps us in the for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #12363 for that potential future optimization.

Copy link
Contributor Author

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rebase this PR, it confliced with some renames from the docs PR.

I'll also update the godoc to document this special errNotFound that the query function can return.

if err != nil {
switch {
case errors.Is(err, errNotFound):
if notFound {
Copy link
Contributor Author

@dnephin dnephin Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question!

I believe what you are calling out is that on the first iteration of this loop we still rely on the index not having changed. If a write happens between the time the last blocking query returned and when the next blocking query request happens (which I think should generally only be a few milliseconds), then this optimization does not work. We continue to have the old behaviour of returning unnecessarily. Generally that should be ok because it only limits the optimization, should not impact correctness, and there is a very small window where a write must happen to cause this problem.

There is a way to work around this limitation (could be a follow up), but it introduces some risk, so I thought it would be better to avoid. I haven't documented that anywhere yet (just some discussions in old slack threads), but I can do that here.

The work around for this limitation would be to add a new X-Consul-Index-Not-Found query parameter (as a sibling to X-Consul-Index). This parameter would be used to set the initial value of var notFound bool. That way when a new query comes in, we'll already know if we're in a "not found" state, and only the creation of an item (or a timeout) will cause blockingQuery to return. The risk here is that if the user incorrectly sets this header, then they could miss a delete that happened, but I haven't thought enough about this case to know if it's actually a problem for correctness. Maybe we should do this as a follow up.

Why do we need var notFound bool?

To answer the other part of this question, this is essential to the optimization. I think this is explained briefly in 9abb49da51ad87312a5f34364d8e6a3c0e35c328, but I'm happy to add more detail since this is the main reason for the change. As documented in #12343, there are two things to consider here: the "last modified index" set on responseMeta.Index, and the WatchSet blocking.

We can't fix the WatchSet blocking without turning every read operation into a write that modifies the memdb index (I'm pretty sure this is not an approach we want to investigation). Maybe there are other ways of making WatchSet block correctly for not found items? I have not found any.

So that leaves the index comparison (responseMeta.Index > query.MinIndex). Generally we know this comparison will return true in these "not found" cases, because responseMeta.Index will be the max index of the table, and it changes on every write.

What we need is a comparison we can use instead of responseMeta.Index > query.MinIndex.

If we think about this comparison more abstractly, I think we'll see that the question this is asking is "has the result of the query changed from the last result sent to the client". There are plenty of ways that question could be answered, using the modify index is a very cost effective one, so that's generally what we use.

However as we've noticed, using that modify index doesn't work for cases where an item does not exist, because we don't have a stable modify index to compare against. In this case we use "not found" to indicate if the result has changed.

Since we don't know the state from the client perspective (X-Consul-Index-Not-Found would give us that), we have to perform the query once. If the result of the query is the same "last modified index", then we know at that index the query returned no results (which tells us the client's last view was "not found"). Until the query returns some result, the state has not changed, so we can continue to block until that time.

In the line below this, when we see "state has not changed, the result is still not found", we update the minQueryIndex to the latest responseMeta.Index, so that the comparison later keeps us in the for loop.

This test shows how blocking queries are not efficient when the query
returns no results.  The test fails with 100+ calls instead of the
expected 2.

This test is still a bit flaky because it depends on the timing of the
writes. It can sometimes return 3 calls.

A future commit should fix this and make blocking queries even more
optimal for not-found results.
By using the query results as state.

Blocking queries are efficient when the query matches some results,
because the ModifyIndex of those results, returned as queryMeta.Mindex,
will never change unless the items themselves change.

Blocking queries for non-existent items are not efficient because the
queryMeta.Index can (and often does) change when other entities are
written.

This commit reduces the churn of these queries by using a different
comparison for "has changed". Instead of using the modified index, we
use the existence of the results. If the previous result was "not found"
and the new result is still "not found", we know we can ignore the
modified index and continue to block.

This is done by setting the minQueryIndex to the returned
queryMeta.Index, which prevents the query from returning before a state
change is observed.
Any query that returns a list of items is not part of this commit.
@dnephin dnephin force-pushed the dnephin/blocking-queries-not-found branch from ae10e60 to 8a6e75a Compare February 15, 2022 23:24
@vercel vercel bot temporarily deployed to Preview – consul February 15, 2022 23:24 Inactive
@vercel vercel bot temporarily deployed to Preview – consul-ui-staging February 15, 2022 23:24 Inactive
Copy link
Member

@rboyer rboyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dnephin
Copy link
Contributor Author

dnephin commented Feb 17, 2022

I'm going to merge without any backport tags for now. I'll need to backport #12109 before this one, otherwise it will never apply.

@dnephin dnephin merged commit 85ecbaf into main Feb 17, 2022
@dnephin dnephin deleted the dnephin/blocking-queries-not-found branch February 17, 2022 23:09
@hc-github-team-consul-core
Copy link
Collaborator

🍒 If backport labels were added before merging, cherry-picking will start automatically.

To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/588613.

rboyer added a commit that referenced this pull request Feb 18, 2022
…g entries are involved

Starting from and extending the mechanism introduced in #12110 we can
specially handle the following special Consul RPC endpoints that react to
many config entries in a single blocking query in Connect:

- DiscoveryChain.Get
- ConfigEntry.ResolveServiceConfig
- Intentions.Match
- ConfigEntry.List
- Internal.IntentionUpstreams

All of these will internally watch for many config entries, and at least
one of those will likely be not found in any given query. Because these
are blends of multiple reads the exact solution from #12110 isn't
perfectly aligned, but we can tweak the approach slightly and regain the
utility of that mechanism.

(A) No Config Entries Found

In this case, despite looking for many config entries none may be found
at all. Unlike #12110 in this scenario we do not return an empty reply
to the caller, but instead synthesize a struct from default values to
return. This can be handled nearly identically to #12110 with the first
1-2 replies being non-empty payloads followed by the standard spurious
wakeup suppression mechanism from #12110.

(B) No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at
least once, there is a further optimization we can make here to detect
if any of the config entries that were present at specific versions for
the prior execution of the loop are identical for the loop we just woke
up for. In that scenario we can return a slightly different internal
sentinel error and basically externally handle it similar to #12110.

This would mean that even if 20 discovery chain read RPC handling
goroutines wakeup due to the creation of an unrelated config entry, the
only ones that will terminate and reply with a blob of data are those
that genuinely have new data to report.
rboyer added a commit that referenced this pull request Feb 22, 2022
…g entries are involved

Starting from and extending the mechanism introduced in #12110 we can
specially handle the following special Consul RPC endpoints that react to
many config entries in a single blocking query in Connect:

- DiscoveryChain.Get
- ConfigEntry.ResolveServiceConfig
- Intentions.Match
- ConfigEntry.List
- Internal.IntentionUpstreams

All of these will internally watch for many config entries, and at least
one of those will likely be not found in any given query. Because these
are blends of multiple reads the exact solution from #12110 isn't
perfectly aligned, but we can tweak the approach slightly and regain the
utility of that mechanism.

(A) No Config Entries Found

In this case, despite looking for many config entries none may be found
at all. Unlike #12110 in this scenario we do not return an empty reply
to the caller, but instead synthesize a struct from default values to
return. This can be handled nearly identically to #12110 with the first
1-2 replies being non-empty payloads followed by the standard spurious
wakeup suppression mechanism from #12110.

(B) No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at
least once, there is a further optimization we can make here to detect
if any of the config entries that were present at specific versions for
the prior execution of the loop are identical for the loop we just woke
up for. In that scenario we can return a slightly different internal
sentinel error and basically externally handle it similar to #12110.

This would mean that even if 20 discovery chain read RPC handling
goroutines wakeup due to the creation of an unrelated config entry, the
only ones that will terminate and reply with a blob of data are those
that genuinely have new data to report.
rboyer added a commit that referenced this pull request Feb 25, 2022
…g entries are involved (#12362)

Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:

- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`

All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.

### No Config Entries Found

In this case, despite looking for many config entries none may be found at all. Unlike #12110 in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110 with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.

### No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here to detect if any of the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically externally handle it similar to #12110.

This would mean that even if 20 discovery chain read RPC handling goroutines wakeup due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.

### Extra Endpoints

Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` also were updated:

- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/internals Serf, Raft, SWIM, Lifeguard, Anti-Entropy, locking topics type/enhancement Proposed improvement or new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

blocking k/v read on non-existent key returns once any key is written
4 participants