Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snapshot : azure module - accelerate the listing of files (used in delete snapshot) #25710

Merged

Conversation

etiennecarriere
Copy link
Contributor

@etiennecarriere etiennecarriere commented Jul 13, 2017

Close #25424 .

Current behaviour of the Listing of a container in azure plugin :

  • List all the blob
  • For each blob, request the medata of blob
    => it is ineffective as we have N+1 requests for N blobs

Proposed behaviour

  • List all blob and retrieving the metadata in the same request

@elasticmachine
Copy link
Collaborator

Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?

@etiennecarriere
Copy link
Contributor Author

@karmi , My company signed a CLA ("A company wanting its employees to contribute") with my Github account in the annex ". Do I have to sign an other one (" An individual contributing under an existing company contributor agreement") ?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I left some comments. In general it looks like the idea is to call a different listBlobs impl which can return all the metadata of the keys along with the key itself. Can you please update the description to explain that?

CloudBlobClient client = this.getSelectedClient(account, mode);
CloudBlobContainer blobContainer = client.getContainerReference(container);

SocketAccess.doPrivilegedVoidException(() -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doPrivileged needs to stay.

// uri.getPath is of the form /container/keyPath.* and we want to strip off the /container/
// this requires 1 + container.length() + 1, with each 1 corresponding to one of the /
String blobPath = uri.getPath().substring(1 + container.length() + 1);
if (!(blobItem instanceof CloudBlockBlob)){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this check, it should be an error if we don't get back CloudBlockBlob (which is what we expect for all keys here). I think it can be a hard assert.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it is the assumption on the previous version of code but I prefer to be defensive : if the azure storage container has been modified by an outside component, it could have other type of blob (Append/Page) so I propose ignoring it and put a warning.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am +1 on what @rjernst said - silently ignoring is not the way to go here, very few read log messages, its more important to fail hard if something changed that alters our underlying assumptions of what object type is returned there.

logger.trace("blob url [{}], name [{}], size [{}]", uri, name, properties.getLength());
blobsBuilder.put(name, new PlainBlobMetaData(name, properties.getLength()));
if (blobContainer.exists()) {
for (ListBlobItem blobItem : blobContainer.listBlobs(keyPath + (prefix == null ? "" : prefix),false,enumBlobListingDetails,null,null)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please use spaces after the commas

Also, check the length of this line, it looks close to the 140 char limit

*/
String testName = "snapshot-itest-"
.concat(RandomizedTest.getContext().getRunnerSeedAsString().toLowerCase(Locale.ROOT))
.concat(new Long(RandomizedTest.getContext().getRandom().nextLong()).toString());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can simply be randomLong(). However, I don't think this will work because the cleanup in wipeAzureRepositories() would then not know about the generated name. If these need to be unique, then this method needs to stash the generated name so that the repository can be cleaned up after the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right : I rollback as it will not work in wipeAzureRepository .

return testName.contains(" ") ? Strings.split(testName, " ")[0] : testName;
}

@Override
public Settings indexSettings() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these should be removed, but if they should, then please explain why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reflexion, ok to keep it but we have to add the MockFSIndexStore.TestPlugin so that the configuration variable are known. If we don't have, we have random error as MockFSIndexStore.TestPlugin is added randomly in ESIntegTestCase superclass

@rjernst
Copy link
Member

rjernst commented Jul 13, 2017

@etiennecarriere Regarding the CLA, I believe you need to associate your github profile with the company signed CLA. If you follow the link to the CLA, I think that is the second option An individual contributing under an existing company contributor agreement.

@karmi
Copy link
Contributor

karmi commented Jul 14, 2017

Hi @etiennecarriere, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@etiennecarriere etiennecarriere force-pushed the 25424_azure_improve_delete_snapshot branch from 6ad3fb6 to e6478a6 Compare July 14, 2017 02:28
@etiennecarriere
Copy link
Contributor Author

@rjernst, @abeyad , I applied your comments for the code on the "main" class so the gradle check / test works.
On the tests, I had to modify 2 weeks ago in order to have functional integration tests. Unfortunately, I don't achieve to reproduce them today even on master without my modifications.

@etiennecarriere
Copy link
Contributor Author

@karmi, my fault, I commited the corrections with my personal account and not my professional account.

@karmi
Copy link
Contributor

karmi commented Jul 17, 2017

@Etienne-Carriere, the check is now green, thanks for fixing the e-mail problem!

@dakrone dakrone requested a review from rjernst August 15, 2017 21:23
Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Etienne-Carriere Thanks for the updates and sorry for the delay in reviewing again. I left a couple more minor comments. If you can address them, I will merge this.

SocketAccess.doPrivilegedVoidException(() -> {
if (blobContainer.exists()) {
for (ListBlobItem blobItem : blobContainer.listBlobs(keyPath + (prefix == null ? "" : prefix))) {
for (ListBlobItem blobItem : blobContainer.listBlobs(keyPath + (prefix==null ? "" : prefix),false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please keep the spacing around == and between arguments (after ,)

private String getRepositoryPath() {
String testName = "it-" + getTestName();
return testName.contains(" ") ? Strings.split(testName, " ")[0] : testName;
}

public static String getContainerName() {
String testName = "snapshot-itest-".concat(RandomizedTest.getContext().getRunnerSeedAsString().toLowerCase(Locale.ROOT));
/* Have a different name per test so that there is no possible race condition. As the long can be negative,
* there mustn't be an hyphen between the 2 concat numbers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: an -> a, concat -> concatenated

String testName = "snapshot-itest-".concat(RandomizedTest.getContext().getRunnerSeedAsString().toLowerCase(Locale.ROOT));
/* Have a different name per test so that there is no possible race condition. As the long can be negative,
* there mustn't be an hyphen between the 2 concat numbers
* (can't have 2 consecutives hypens on Azure containers)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: hypens -> hyphens

@etiennecarriere
Copy link
Contributor Author

Changes made. For the resolution of the conflicts, I don't know what is the Elasticsearch method :

  • Use the github resolver (as I have done)
  • Made a rebase then force push to the branch (I prepared it on my local repo).

@rjernst
Copy link
Member

rjernst commented Aug 23, 2017

@elasticmachine test this please

@rjernst
Copy link
Member

rjernst commented Aug 23, 2017

Thanks @Etienne-Carriere, I will get this merged once CI completes.

@rjernst
Copy link
Member

rjernst commented Aug 23, 2017

@Etienne-Carriere This branch no longer compiles against master. Can you bring it up to date? I think the ctor for azure impl has been changed.

@etiennecarriere
Copy link
Contributor Author

I change the API call . I made the following tests :
gradle assemble on the root directory : OK
gradle check on the plugins/repository-azure : OK

but the gradle check the root directory failed on an error that seems not to be linked with this PR.

@rjernst
Copy link
Member

rjernst commented Aug 24, 2017

@elasticmachine please test this

@etiennecarriere
Copy link
Contributor Author

etiennecarriere commented Sep 7, 2017

@rjernst , do you need additionnals elements to validate the PR ?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for your patience @Etienne-Carriere

@rjernst rjernst merged commit 7060672 into elastic:master Sep 11, 2017
rjernst pushed a commit that referenced this pull request Sep 11, 2017
…pshot) (#25710)

This commit reworks the azure listing of snapshot files to do a single listing, instead of once per blob.

closes #25424
rjernst pushed a commit that referenced this pull request Sep 11, 2017
…pshot) (#25710)

This commit reworks the azure listing of snapshot files to do a single listing, instead of once per blob.

closes #25424
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Sep 12, 2017
…rflow

* origin/master: (59 commits)
  Fix Lucene version of 5.6.1.
  Remove azure deprecated settings (elastic#26099)
  Handle the 5.6.0 release
  Allow plugins to validate cluster-state on join (elastic#26595)
  Remove index mapper dynamic settings (elastic#25734)
  update AWS SDK for ECS Task IAM support in discovery-ec2 (elastic#26479)
  Azure repository: Accelerate the listing of files (used in delete snapshot) (elastic#25710)
  Build: Remove norelease from forbidden patterns (elastic#26592)
  Fix reference to painless inside expression engine (elastic#26528)
  Build: Move javadoc linking to root build.gradle (elastic#26529)
  Test: Remove leftover static bwc test case (elastic#26584)
  Docs: Remove remaining references to file and native scripts (elastic#26580)
  Snapshot fallback should consider build.snapshot
  elastic#26496: Set the correct bwc version after backport to 6.x
  Fix the MapperFieldType.rangeQuery API. (elastic#26552)
  Deduplicate `_field_names`. (elastic#26550)
  [Docs] Update method setSource(byte[] source) (elastic#26561)
  [Docs] Fix typo in javadocs (elastic#26556)
  Allow multiple digits in Vagrant 2.x minor versions
  Support Vagrant 2.x
  ...
jasontedor added a commit to jpountz/elasticsearch that referenced this pull request Sep 13, 2017
* master: (21 commits)
  Ensure module is bundled before installing in tests
  Add boolean similarity to built in similarity types (elastic#26613)
  [Tests] Remove skip tests in search/30_limits.yml
  Let search phases override max concurrent requests
  Add a soft limit for the number of requested doc-value fields (elastic#26574)
  Support for accessing Azure repositories through a proxy (elastic#23518)
  Add beta tag to MSI Windows Installer (elastic#26616)
  Fix Lucene version of 5.6.1.
  Remove azure deprecated settings (elastic#26099)
  Handle the 5.6.0 release
  Allow plugins to validate cluster-state on join (elastic#26595)
  Remove index mapper dynamic settings (elastic#25734)
  update AWS SDK for ECS Task IAM support in discovery-ec2 (elastic#26479)
  Azure repository: Accelerate the listing of files (used in delete snapshot) (elastic#25710)
  Build: Remove norelease from forbidden patterns (elastic#26592)
  Fix reference to painless inside expression engine (elastic#26528)
  Build: Move javadoc linking to root build.gradle (elastic#26529)
  Test: Remove leftover static bwc test case (elastic#26584)
  Docs: Remove remaining references to file and native scripts (elastic#26580)
  Snapshot fallback should consider build.snapshot
  ...
@lcawl lcawl removed the v6.1.0 label Dec 12, 2017
@clintongormley clintongormley added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs and removed :Plugin Repository Azure labels Feb 14, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v6.0.0-rc1 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants