rpc: make blocking queries for non-existent items more efficient #12110
Conversation
Force-pushed from 6c81389 to ce99688.
@@ -142,7 +142,7 @@ func (c *ConfigEntry) Get(args *structs.ConfigEntryQuery, reply *structs.ConfigE
 	reply.Index = index
 	if entry == nil {
-		return nil
+		return errNotFound
This is the change that needs to be applied to all the other RPC endpoints to take advantage of this enhancement. Currently it is only implemented on this one ConfigEntry.Get endpoint.
Nice Daniel! This is a really smart way to save a lot of CPU time in this common case without invasive changes to how we store things! If I understand correctly, this saves on the CPU overhead of building a response, serializing it, writing to the wire, completing the request, and then accepting a new request from the client, deserialising, etc.

The one part that remains that potentially costs something is the re-query of the state store for every change. But I expect that to be a very small portion of the CPU usage end to end here, since in this case the query should be a single index lookup that returns no results and so a quick exit. Did you see any data while debugging that indicated the proportion of CPU that could be saved here vs the internal query parts? No worries if not, just interested.

For comparison, where this query work was an issue in the past it was for service health with a large number of instances, because then there was O(N) work to do joining all the node, check and instance information to build the result. So when N got large, and many watchers were all doing this in a semi-hot loop (on every service update), it got really expensive. (That's different from the empty case but amounted to the same thing thanks to our limit of 2048 WatchSet chans, after which we fell back to whole-table index watching; we fixed this with per-service indexes specifically for Health queries.)

So yeah, this seems like a really simple change that should be a very significant CPU saving in this common case 👏
Yes, exactly! I have not measured the cost of the extra query, but I don't expect that to be significant. It's worth mentioning that this extra cost already exists in all of our blocking queries. Given that this already happens today in normal blocking queries, there's no real concern that it will continue to happen for blocking queries for non-existent entries.
Force-pushed from ce99688 to 4457d6a.
-	} else {
-		reply.Index = index
-	}
+	reply.Index = index
Presumably this part of the change was intentional?
Ya, this is now handled generally in Server.blockingQuery by calling Server.setQueryMeta, and setting the value to 1 if it is still 0. I guess this logic in kvs_endpoint predated the general fix in blockingQuery.
Force-pushed from 4457d6a to ae10e60.
if err != nil {
	switch {
	case errors.Is(err, errNotFound):
		if notFound {
Why do we need var notFound bool? It seems like it moots the whole point, because the first errNotFound skips the conditional here and proceeds exactly like the prior scenario, never executing line 978.
Good question!
I believe what you are calling out is that on the first iteration of this loop we still rely on the index not having changed. If a write happens between the time the last blocking query returned and when the next blocking query request happens (which I think should generally only be a few milliseconds), then this optimization does not work. We continue to have the old behaviour of returning unnecessarily. Generally that should be ok because it only limits the optimization, should not impact correctness, and there is a very small window where a write must happen to cause this problem.
There is a way to work around this limitation (could be a follow up), but it introduces some risk, so I thought it would be better to avoid. I haven't documented that anywhere yet (just some discussions in old slack threads), but I can do that here.
The workaround for this limitation would be to add a new X-Consul-Index-Not-Found query parameter (as a sibling to X-Consul-Index). This parameter would be used to set the initial value of var notFound bool. That way, when a new query comes in, we'll already know if we're in a "not found" state, and only the creation of an item (or a timeout) will cause blockingQuery to return. The risk here is that if the user incorrectly sets this header, then they could miss a delete that happened, but I haven't thought enough about this case to know if it's actually a problem for correctness. Maybe we should do this as a follow up.
Why do we need var notFound bool?

To answer the other part of this question: this is essential to the optimization. I think this is explained briefly in 9abb49da51ad87312a5f34364d8e6a3c0e35c328, but I'm happy to add more detail since this is the main reason for the change. As documented in #12343, there are two things to consider here: the "last modified index" set on responseMeta.Index, and the WatchSet blocking.
We can't fix the WatchSet blocking without turning every read operation into a write that modifies the memdb index (I'm pretty sure this is not an approach we want to investigate). Maybe there are other ways of making WatchSet block correctly for not-found items? I have not found any.
So that leaves the index comparison (responseMeta.Index > query.MinIndex). Generally we know this comparison will return true in these "not found" cases, because responseMeta.Index will be the max index of the table, and it changes on every write. What we need is a comparison we can use instead of responseMeta.Index > query.MinIndex.
If we think about this comparison more abstractly, I think we'll see that the question it is asking is "has the result of the query changed from the last result sent to the client?" There are plenty of ways that question could be answered; using the modify index is a very cost-effective one, so that's generally what we use. However, as we've noticed, the modify index doesn't work for cases where an item does not exist, because we don't have a stable modify index to compare against. In this case we use "not found" to indicate if the result has changed.
Since we don't know the state from the client's perspective (X-Consul-Index-Not-Found would give us that), we have to perform the query once. If the result of the query is the same "last modified index", then we know that at that index the query returned no results (which tells us the client's last view was "not found"). Until the query returns some result, the state has not changed, so we can continue to block until that time. In the line below this, when we see "state has not changed, the result is still not found", we update the minQueryIndex to the latest responseMeta.Index, so that the comparison later keeps us in the for loop.
I opened #12363 for that potential future optimization.
I'll rebase this PR; it conflicted with some renames from the docs PR.
I'll also update the godoc to document this special errNotFound that the query function can return.
This test shows how blocking queries are not efficient when the query returns no results. The test fails with 100+ calls instead of the expected 2. This test is still a bit flaky because it depends on the timing of the writes. It can sometimes return 3 calls. A future commit should fix this and make blocking queries even more optimal for not-found results.
By using the query results as state. Blocking queries are efficient when the query matches some results, because the ModifyIndex of those results, returned as queryMeta.Index, will never change unless the items themselves change. Blocking queries for non-existent items are not efficient because the queryMeta.Index can (and often does) change when other entities are written.

This commit reduces the churn of these queries by using a different comparison for "has changed". Instead of using the modified index, we use the existence of results. If the previous result was "not found" and the new result is still "not found", we know we can ignore the modified index and continue to block. This is done by setting the minQueryIndex to the returned queryMeta.Index, which prevents the query from returning before a state change is observed.
Any query that returns a list of items is not part of this commit.
Force-pushed from ae10e60 to 8a6e75a.
LGTM
I'm going to merge without any backport tags for now. I'll need to backport #12109 before this one, otherwise it will never apply.
🍒 If backport labels were added before merging, cherry-picking will start automatically. To retroactively trigger a backport after merging, add backport labels and re-run https://circleci.com/gh/hashicorp/consul/588613.
…g entries are involved

Starting from and extending the mechanism introduced in #12110 we can specially handle the following special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:

- DiscoveryChain.Get
- ConfigEntry.ResolveServiceConfig
- Intentions.Match
- ConfigEntry.List
- Internal.IntentionUpstreams

All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads, the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.

(A) No Config Entries Found

In this case, despite looking for many config entries, none may be found at all. Unlike #12110, in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110, with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.

(B) No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here: detect if the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically handle it externally similar to #12110. This would mean that even if 20 discovery chain read RPC handling goroutines wake up due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.
…g entries are involved (#12362)

Starting from and extending the mechanism introduced in #12110 we can specially handle the 3 main special Consul RPC endpoints that react to many config entries in a single blocking query in Connect:

- `DiscoveryChain.Get`
- `ConfigEntry.ResolveServiceConfig`
- `Intentions.Match`

All of these will internally watch for many config entries, and at least one of those will likely be not found in any given query. Because these are blends of multiple reads, the exact solution from #12110 isn't perfectly aligned, but we can tweak the approach slightly and regain the utility of that mechanism.

### No Config Entries Found

In this case, despite looking for many config entries, none may be found at all. Unlike #12110, in this scenario we do not return an empty reply to the caller, but instead synthesize a struct from default values to return. This can be handled nearly identically to #12110, with the first 1-2 replies being non-empty payloads followed by the standard spurious wakeup suppression mechanism from #12110.

### No Change Since Last Wakeup

Once a blocking query loop on the server has completed and slept at least once, there is a further optimization we can make here: detect if the config entries that were present at specific versions for the prior execution of the loop are identical for the loop we just woke up for. In that scenario we can return a slightly different internal sentinel error and basically handle it externally similar to #12110. This would mean that even if 20 discovery chain read RPC handling goroutines wake up due to the creation of an unrelated config entry, the only ones that will terminate and reply with a blob of data are those that genuinely have new data to report.

### Extra Endpoints

Since this pattern is pretty reusable, other key config-entry-adjacent endpoints used by `agent/proxycfg` were also updated:

- `ConfigEntry.List`
- `Internal.IntentionUpstreams` (tproxy)
Branched from #12109, fixes #9449
Full details about this change are in the commit messages. This is a problem that has been reported a few times. It is particularly problematic when connect is enabled, because agents will do many blocking queries for config-entries that often do not exist.
This PR improves the behaviour of blocking queries when the result of the query is "no items found" (no modify index to use as the QueryMeta.Index). This change makes blockingQuery more efficient, but there is still room for improvement. Currently we have to return on the first iteration if there was a write between the time the previous RPC call returned and the new one was received (because the table max index would have changed). We can fix that case by adding a new query option, which I will do in a follow-up PR.

TODO:
- Return errNotFound from all the RPC endpoint functions passed to blockingQuery (where appropriate), so that all queries benefit from this enhancement. Done for all "get" endpoints. List endpoints will be left for a future PR, as those are more involved.