Reduce Number of List Calls During Snapshot Create and Delete #44088
Some obvious cleanups I found when investigating the API call count metering:
* No need to get the latest generation id after loading the latest repository data
* Loading RepositoryData already requires fetching the latest generation, so we can reuse it
* Also, reuse the list of all root blobs when fetching the latest repo generation during snapshot delete, like we do for shard folders
* Lastly, don't try to load `index--1` (N = -1) repository data; it doesn't exist -> just return the empty repo data initially
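A minimal sketch of the idea in the list above, not the actual `BlobStoreRepository` code: a single listing of the root blobs yields the latest generation, the loaded data carries that generation with it, and generation -1 short-circuits to an empty instance instead of attempting to read `index--1`. The class names `RepoDataSketch` and `RepoLoaderSketch`, the `EMPTY_GEN` constant, and the `index-N` parsing are assumptions made for illustration.

```java
import java.util.List;

/** Simplified stand-in for RepositoryData; only the generation is modelled (hypothetical). */
final class RepoDataSketch {
    /** Assumed sentinel meaning "no index-N blob has been written yet". */
    static final long EMPTY_GEN = -1L;
    static final RepoDataSketch EMPTY = new RepoDataSketch(EMPTY_GEN);

    private final long genId;

    RepoDataSketch(long genId) {
        this.genId = genId;
    }

    long getGenId() {
        return genId;
    }
}

/** Hypothetical loader that derives everything from one listing of the root blobs. */
final class RepoLoaderSketch {
    private static final String INDEX_PREFIX = "index-";

    /** Returns the highest N among "index-N" blob names, or EMPTY_GEN if there is none. */
    static long latestGeneration(List<String> rootBlobNames) {
        long latest = RepoDataSketch.EMPTY_GEN;
        for (String name : rootBlobNames) {
            if (name.startsWith(INDEX_PREFIX)) {
                try {
                    long gen = Long.parseLong(name.substring(INDEX_PREFIX.length()));
                    latest = Math.max(latest, gen);
                } catch (NumberFormatException e) {
                    // Not an index-N blob; skip it.
                }
            }
        }
        return latest;
    }

    /** Loads repository data, reusing the generation found while listing. */
    static RepoDataSketch load(List<String> rootBlobNames) {
        long gen = latestGeneration(rootBlobNames);
        if (gen == RepoDataSketch.EMPTY_GEN) {
            // Nothing was ever written, so there is no "index--1" blob to read.
            return RepoDataSketch.EMPTY;
        }
        // A real implementation would read and parse the index-<gen> blob here.
        return new RepoDataSketch(gen);
    }
}
```

A caller that already holds the result of `load(...)` can read `getGenId()` from it rather than listing the blobs a second time.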
Pinging @elastic/es-distributed
 * @param newGeneration New Generation
 * @return New instance
 */
public RepositoryData withGenId(long newGeneration) {
Had to add this for the test, but I think it can be used in a follow-up to clean up the handling of the generation on repository data instances a little as well (currently the generation on these instances doesn't always correspond to the actual generation it will be written as).
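For illustration, a copy-on-write method with this signature could look like the sketch below. Only the name `withGenId(long newGeneration)` comes from the diff above; the class `GenTaggedData`, its `snapshotNames` field, and the method body are assumptions, not the real `RepositoryData` implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable holder that pairs repository contents with a generation. */
final class GenTaggedData {
    private final long genId;
    private final List<String> snapshotNames;

    GenTaggedData(long genId, List<String> snapshotNames) {
        this.genId = genId;
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.snapshotNames = Collections.unmodifiableList(new ArrayList<>(snapshotNames));
    }

    long getGenId() {
        return genId;
    }

    List<String> getSnapshotNames() {
        return snapshotNames;
    }

    /**
     * Returns an instance with the same contents but a different generation.
     * The receiver is never modified.
     */
    GenTaggedData withGenId(long newGeneration) {
        if (newGeneration == genId) {
            return this;
        }
        return new GenTaggedData(newGeneration, snapshotNames);
    }
}
```

Because the original instance is left untouched, a test can hold on to the old generation and the new one at the same time.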
server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
- final long currentGen = latestIndexBlobId();
- if (currentGen != repositoryStateId) {
+ final long currentGen = repositoryData.getGenId();
+ if (currentGen != expectedGen) {
Ok, so the behavior of re-reading the last index-N file before writing a new one is kept, but it is now the responsibility of the caller of writeIndexGen() to pass an up-to-date RepositoryData. While I'm OK with the change, I find it more difficult now to identify the "generation" of the RepositoryData to be written.
Maybe we could change RepositoryData so that it increments its own generation when a snapshot is removed or added, and here we check that repositoryData.getGenId() == (expectedGen + 1)?
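The suggestion above could be sketched roughly as follows. This is a hypothetical illustration of the proposed follow-up, not code from this PR: the classes `SelfIncrementingRepoData` and `IndexGenCheckSketch` and all of their methods except the `getGenId()` name are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical RepositoryData variant that bumps its own generation on every change. */
final class SelfIncrementingRepoData {
    private final long genId;
    private final List<String> snapshots;

    SelfIncrementingRepoData(long genId, List<String> snapshots) {
        this.genId = genId;
        this.snapshots = Collections.unmodifiableList(new ArrayList<>(snapshots));
    }

    long getGenId() {
        return genId;
    }

    List<String> getSnapshots() {
        return snapshots;
    }

    /** Adding a snapshot yields a new instance at generation + 1. */
    SelfIncrementingRepoData addSnapshot(String name) {
        List<String> updated = new ArrayList<>(snapshots);
        updated.add(name);
        return new SelfIncrementingRepoData(genId + 1, updated);
    }

    /** Removing a snapshot also yields a new instance at generation + 1. */
    SelfIncrementingRepoData removeSnapshot(String name) {
        List<String> updated = new ArrayList<>(snapshots);
        updated.remove(name);
        return new SelfIncrementingRepoData(genId + 1, updated);
    }
}

/** Hypothetical pre-write check in the spirit of the review comment. */
final class IndexGenCheckSketch {
    static void ensureNextGeneration(SelfIncrementingRepoData toWrite, long expectedGen) {
        if (toWrite.getGenId() != expectedGen + 1) {
            throw new IllegalStateException("expected generation " + (expectedGen + 1)
                + " but repository data is at generation " + toWrite.getGenId());
        }
    }
}
```

With this shape the generation to be written can be read directly off the instance, which is the readability concern raised in the comment.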
@tlrx yea, I think that would be a nice refactoring. Unfortunately, when I last tried doing that (the auto-increment) it ran into a huge number of failing tests (there are literally more than 10 that somehow indirectly rely on the fact that the repository gen id stays the same on these operations). I think it's worthwhile, especially with this change, but probably best left to another PR?
I think it's worthwhile, especially with this change, but probably best left to another PR?
I'm OK if it's done in a follow up PR
Thanks @tlrx all points addressed :)
LGTM
thanks @tlrx
…c#44088) * Reduce Number of List Calls During Snapshot Create and Delete
#44209) * Reduce Number of List Calls During Snapshot Create and Delete