Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce Number of List Calls During Snapshot Create and Delete #44088

Merged

Conversation

original-brownbear
Copy link
Member

Some obvious cleanups I found when investigation the API call count
metering:

  • No need to get the latest generation id after loading latest
    repository data
    • Loading RepositoryData already requires fetching the latest
      generation so we can reuse it
  • Also, reuse list of all root blobs when fetching latest repo
    generation during snapshot delete like we do for shard folders
  • Lastly, don't try and load index--1 (N = -1) repository data, it
    doesn't exist -> just return the empty repo data initially

Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
@original-brownbear original-brownbear added >non-issue :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v8.0.0 v7.4.0 labels Jul 8, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

* @param newGeneration New Generation
* @return New instance
*/
public RepositoryData withGenId(long newGeneration) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to add this for the test, but I think it can be used in a follow up to clean up handling of the generation on repository data instances a little as well (currently the generation on these instances doesn't always correspond the the actual generation it will be written as).

@original-brownbear original-brownbear changed the title Reduce Numebr of List Calls During Snapshot Create and Delete Reduce Number of List Calls During Snapshot Create and Delete Jul 8, 2019
final long currentGen = latestIndexBlobId();
if (currentGen != repositoryStateId) {
final long currentGen = repositoryData.getGenId();
if (currentGen != expectedGen) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the behavior of re-reading the last index-N file before writing a new one is kept but it is now the responsibility of the caller of writeIndexGen() to pass an up to date RepositoryData. While I'm OK with the change I find it more difficult now to identify the "generation" of the RepositoryData to be written.

Maybe we could change RepositoryData so that it increments its own generation when a snapshot is removed or added and here we check that repositoryData.getGenId() == (expectedGen + 1)?

Copy link
Member Author

@original-brownbear original-brownbear Jul 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlrx yea, I think that would be a nice refactoring. Unfortunately, when I last tried doing that (the auto-increment) it ran into a huge number of failing tests (there's literally more than 10 that somehow indirectly rely on the fact the the repository gen id stays the same on these operations). I think it's worthwhile, especially with this change, but probably best left to another PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worthwhile, especially with this change, but probably best left to another PR?

I'm OK if it's done in a follow up PR

@original-brownbear
Copy link
Member Author

Thanks @tlrx all points addressed :)

Copy link
Member

@tlrx tlrx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@original-brownbear
Copy link
Member Author

thanks @tlrx

@original-brownbear original-brownbear merged commit 4849c3e into elastic:master Jul 9, 2019
@original-brownbear original-brownbear deleted the save-some-rpc-calls branch July 9, 2019 11:06
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jul 11, 2019
…c#44088)

* Reduce Numebr of List Calls During Snapshot Create and Delete

Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
original-brownbear added a commit that referenced this pull request Jul 11, 2019
#44209)

* Reduce Number of List Calls During Snapshot Create and Delete

Some obvious cleanups I found when investigation the API call count
metering:
* No need to get the latest generation id after loading latest
repository data
   * Loading RepositoryData already requires fetching the latest
generation so we can reuse it
* Also, reuse list of all root blobs when fetching latest repo
generation during snapshot delete like we do for shard folders
* Lastly, don't try and load `index--1` (N = -1) repository data, it
doesn't exist -> just return the empty repo data initially
@original-brownbear original-brownbear restored the save-some-rpc-calls branch August 6, 2020 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >non-issue v7.4.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants