Streamline AsyncShardFetch#getNumberOfInFlightFetches #93632
Pinging @elastic/es-distributed (Team:Distributed)
@elasticmachine generate changelog
Thanks @luyuncheng for spotting this and suggesting a fix. I think there's a better approach, however: we can make it so that
LGTM, let me try.
@DaveCTurner at commit 6080f60
Nice. I left a few small comments.
I'd also like to introduce a method to verify that the count is consistent when not running in prod:
    private boolean assertFetchingCountConsistent() {
        assert Thread.holdsLock(this);
        assert fetchingCount.get() == cache.values().stream().filter(NodeEntry::isFetching).count();
        return true;
    }
and then we can say assert assertFetchingCountConsistent(); after we've changed the cache contents. That way we should pick up any mistakes in this area quickly.
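As a rough illustration of how that assertion could sit next to the counter updates, here is a minimal sketch. `FetchTracker`, `Entry`, `startFetch` and `finishFetch` are hypothetical stand-ins invented for this example, not the real `AsyncShardFetch` or `NodeEntry` code; only the shape of `assertFetchingCountConsistent` follows the comment above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for AsyncShardFetch (assumption, not the real class).
class FetchTracker {
    // Hypothetical stand-in for NodeEntry: it only tracks the fetching flag.
    static final class Entry {
        private boolean fetching;

        boolean isFetching() {
            return fetching;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private int fetchingCount; // only touched while holding the lock on this

    synchronized void startFetch(String nodeId) {
        Entry entry = cache.computeIfAbsent(nodeId, id -> new Entry());
        if (entry.fetching == false) {
            entry.fetching = true;
            fetchingCount++;
        }
        assert assertFetchingCountConsistent();
    }

    synchronized void finishFetch(String nodeId) {
        Entry entry = cache.get(nodeId);
        if (entry != null && entry.fetching) {
            entry.fetching = false;
            fetchingCount--;
        }
        assert assertFetchingCountConsistent();
    }

    synchronized int getNumberOfInFlightFetches() {
        return fetchingCount;
    }

    // Returns true so callers can wrap it in `assert`; the O(#nodes) scan
    // then only runs when assertions are enabled (tests), not in production.
    private boolean assertFetchingCountConsistent() {
        assert Thread.holdsLock(this);
        assert fetchingCount == cache.values().stream().filter(Entry::isFetching).count();
        return true;
    }
}
```

Because the check is reached only through `assert`, a JVM started without `-ea` skips the scan entirely, so the production path keeps the O(1) read.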
}
return count;
public int getNumberOfInFlightFetches() {
    return fetchingCount.get();
Much nicer 😄
@@ -57,6 +58,7 @@
private final Set<String> nodesToIgnore = new HashSet<>();
private final AtomicLong round = new AtomicLong();
private boolean closed;
private final AtomicInteger fetchingCount = new AtomicInteger();
I think this could just be a volatile int because it's only ever updated within synchronized methods.
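A minimal sketch of that suggestion, assuming every write really does happen while holding the object's lock. `InFlightCounter` and its method names are made up for illustration and do not appear in the PR.

```java
// Hypothetical illustration of a counter that is written only under a lock
// but read without one.
class InFlightCounter {
    // `++` and `--` on a volatile are not atomic by themselves, but every
    // writer holds the lock, so writers never interleave. `volatile` is
    // what makes the latest value visible to readers that take no lock.
    private volatile int fetchingCount;

    synchronized void increment() {
        fetchingCount++;
    }

    synchronized void decrement() {
        fetchingCount--;
    }

    // Lock-free read: cheap enough to call from a hot path.
    int get() {
        return fetchingCount;
    }
}
```

If any update were ever made outside a synchronized method, this would no longer be safe and an `AtomicInteger` would be the more defensive choice.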
}
}
return false;
// visible for testing
I'm ok with keeping this private, I don't think it needs testing.
++ modified
I have pushed a commit (5beddb8) which adds the required changelog YAML file. You'll need to pull this branch and merge or rebase your changes on top of my commit.
@elasticmachine ok to test
LGTM, thanks @luyuncheng
(it's Friday evening here, I will not merge this before Monday)
Avoids an O(#nodes) iteration by tracking the number of fetches directly.
Avoids an O(#nodes) iteration by tracking the number of fetches directly. Backport of elastic#93632 to 7.17
Avoids an O(#nodes) iteration by tracking the number of fetches directly. Backport of #93632 to 7.17 Co-authored-by: luyuncheng <luyuncheng@bytedance.com>
When restarting an ES 7.10 cluster, we found that hot_threads on the master node showed most of the CPU time being spent in
org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction.getResponse
-> org.elasticsearch.cluster.routing.allocation.AllocationService.getNumberOfInFlightFetches
Tracing getNumberOfInFlightFetches showed that all of that time is spent in GatewayAllocator.getNumberOfInFlightFetches:
elasticsearch/server/src/main/java/org/elasticsearch/gateway/GatewayAllocator.java
Lines 85 to 93 in b29399e
In some cases it scans all shards * nodes on every call:
elasticsearch/server/src/main/java/org/elasticsearch/gateway/AsyncShardFetch.java
Lines 77 to 84 in b29399e
However, in most cases every fetching flag is already false after the first scan, so there is no need to iterate over the nodes on every call. This PR adds a cached fetching status to skip that iteration.
ISSUE: #93631
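To make the shards * nodes cost concrete, here is a small sketch of the two shapes. The referenced line ranges are not reproduced on this page, so `ScanCostSketch`, its method names and its data layout are assumptions chosen for illustration, not the actual `GatewayAllocator` or `AsyncShardFetch` code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model: each shard fetch is a map from node id to its "fetching" flag.
class ScanCostSketch {
    // Scanning shape: every call visits every node entry of every shard,
    // so the cost is O(#shards * #nodes) even when nothing is fetching.
    static int countByScanning(List<Map<String, Boolean>> fetchingByNodePerShard) {
        int count = 0;
        for (Map<String, Boolean> perNode : fetchingByNodePerShard) {
            for (boolean fetching : perNode.values()) {
                if (fetching) {
                    count++;
                }
            }
        }
        return count;
    }

    // Counter shape: each shard fetch already tracks its own in-flight count,
    // so summing them costs O(#shards) with no per-node work.
    static int countFromCounters(int[] perShardCounts) {
        int count = 0;
        for (int perShard : perShardCounts) {
            count += perShard;
        }
        return count;
    }
}
```

Both methods return the same number for the same state; the difference is only how much work each health request has to do to get it.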