New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhances get snapshots API to allow retrieving repository index only #24477

Merged
merged 5 commits into from May 10, 2017

Conversation

Projects
None yet
4 participants
@abeyad
Contributor

abeyad commented May 4, 2017

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc. In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (snap-{snapshot_uuid}.dat) for
every snapshot. In cloud-based repositories, this can be expensive,
both from a cost and performance perspective. Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot. This minimal information can be
retrieved from the repository index blob (index-N) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional verbose
parameter. If verbose is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288

...n/java/org/elasticsearch/action/admin/cluster/snapshots/get/TransportGetSnapshotsAction.java Outdated
final List<SnapshotInfo> currentSnapshots) {
List<SnapshotInfo> snapshotInfos = new ArrayList<>();
for (SnapshotInfo snapshotInfo : currentSnapshots) {
if (toResolve.contains(snapshotInfo.snapshotId())) {

This comment has been minimized.

@StefanSchmidtOz

StefanSchmidtOz May 4, 2017

Idea:

if (toResolve.remove(snapshotInfo.snapshotId())) {
        snapshotInfos.add(snapshotInfo.basic());
}

This comment has been minimized.

@abeyad

abeyad May 10, 2017

Contributor

thank you @StefanSchmidtOz, i made the suggested change

@imotov

Code looks good. One concern that I have is that the state is such an important part of the snapshot information that omitting state can lead to the situation when a user might thing that they have the latest snapshot for a particular index, while this snapshot actually failed. Could we add this information to the index file? What do you think?

Since we are modifying toXContent, I think it might be useful to add another get snapshot info test with verbose=false to rest tests. There is also some non-trivial bwc work going on here. So, I think having a bwc test might be helpful as well.

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 4, 2017

@imotov I looked at adding the state to the index blob and I believe this will be relatively straight forward, and I agree with you its only a little bit of added information to the index blob, but the benefits are great as state is the most important thing one would care about for a given snapshot. I will work on adding this and push commits, and will add the tests you suggested as well.

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 9, 2017

@imotov I pushed baa2055 that addresses your feedback and enables storing snapshot states in the index file

@imotov

This comment has been minimized.

Member

imotov commented May 9, 2017

Looks good! Any chance we can add reading non-verbose listings to one of snapshot/restore bwc tests?

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 9, 2017

@imotov the different repository index formats are implicitly executed in the backwards-5.0 and rolling upgrade tests - since we have different versioned nodes, with some of them on the old format, sometimes we get nodes that read/write the old format and sometimes we get nodes that read/write the new format. In fact, running those tests helped me find a bug in reading the old format from a new node.

I can add a test which just writes the old x-content and reads it in the newer version, and vice versa (a unit test where I'd be copying/pasting the old code in the test to test old serialization/deserialization with the new).

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 9, 2017

retest this please

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 10, 2017

@imotov i added the old snapshot bwc tests in 9ca8c06e635d3e246c70a288e43a0525f37695cf

@imotov

imotov approved these changes May 10, 2017

Thanks! LGTM

@imotov

This comment has been minimized.

Member

imotov commented May 10, 2017

This is going only to 6.0, right? Should we remove 5.5 label?

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 10, 2017

@imotov I thought we could backport it too 5.5 too, do you feel its too risky?

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 10, 2017

retest this please

@imotov

This comment has been minimized.

Member

imotov commented May 10, 2017

@abeyad I don't think it's too risky. I just noticed this

VERBOSE_INTRODUCED = Version.V_6_0_0_alpha1_UNRELEASED;

but I guess you did it temporary to make sure BWC tests pass.

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 10, 2017

VERBOSE_INTRODUCED = Version.V_6_0_0_alpha1_UNRELEASED;

Yes exactly, once I backport the change to 5.5, I will change the constant as well on both 5.5 and master.

Ali Beyad added some commits May 4, 2017

Ali Beyad
Enhances get snapshots API to allow retrieving repository index only
Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288
Ali Beyad

@abeyad abeyad merged commit 743217a into elastic:master May 10, 2017

1 of 2 checks passed

elasticsearch-ci Build finished.
Details
CLA Commit author is a member of Elasticsearch
Details

@abeyad abeyad deleted the abeyad:simple_get_snapshots branch May 10, 2017

abeyad pushed a commit that referenced this pull request May 10, 2017

Ali Beyad
Enhances get snapshots API to allow retrieving repository index only (#…
…24477)

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes #24288
@abeyad

This comment has been minimized.

Contributor

abeyad commented May 10, 2017

5.x commit: 8fc3f1c

@abeyad

This comment has been minimized.

Contributor

abeyad commented May 10, 2017

thanks for the review @imotov

debadair added a commit to debadair/elasticsearch that referenced this pull request May 16, 2017

Enhances get snapshots API to allow retrieving repository index only (e…
…lastic#24477)

Currently, the get snapshots API (e.g. /_snapshot/{repositoryName}/_all)
provides information about snapshots in the repository, including the
snapshot state, number of shards snapshotted, failures, etc.  In order
to provide information about each snapshot in the repository, the call
must read the snapshot metadata blob (`snap-{snapshot_uuid}.dat`) for
every snapshot.  In cloud-based repositories, this can be expensive,
both from a cost and performance perspective.  Sometimes, all the user
wants is to retrieve all the names/uuids of each snapshot, and the
indices that went into each snapshot, without any of the other status
information about the snapshot.  This minimal information can be
retrieved from the repository index blob (`index-N`) without needing to
read each snapshot metadata blob.

This commit enhances the get snapshots API with an optional `verbose`
parameter.  If `verbose` is set to false on the request, then the get
snapshots API will only retrieve the minimal information about each
snapshot (the name, uuid, and indices in the snapshot), and only read
this information from the repository index blob, thereby giving users
the option to retrieve the snapshots in a repository in a more
cost-effective and efficient manner.

Closes elastic#24288

@clintongormley clintongormley added v6.0.0-alpha2 and removed v6.0.0 labels Jun 6, 2017

abeyad pushed a commit to abeyad/elasticsearch that referenced this pull request Jun 28, 2017

Ali Beyad
Output all SnapshotInfo fields in returned x-content
In elastic#24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, in which case they will be represented as an
empty array/map or 0 values.

abeyad pushed a commit to abeyad/elasticsearch that referenced this pull request Jun 28, 2017

Ali Beyad
Output all empty snapshot info fields if in verbose mode
In elastic#24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.

abeyad pushed a commit that referenced this pull request Jun 28, 2017

Ali Beyad
Output all empty snapshot info fields if in verbose mode (#25455)
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.

abeyad pushed a commit that referenced this pull request Jun 28, 2017

Ali Beyad
Output all empty snapshot info fields if in verbose mode (#25455)
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.

abeyad pushed a commit that referenced this pull request Jun 28, 2017

Ali Beyad
Output all empty snapshot info fields if in verbose mode (#25455)
In #24477, a less verbose option was added to retrieve snapshot info via
GET /_snapshot/{repo}/{snapshots}.  The point of adding this less
verbose option was so that if the repository is a cloud based one, and
there are many snapshots for which the snapshot info needed to be
retrieved, then each snapshot would require reading a separate snapshot
metadata file to pull out the necessary information.  This can be costly
(performance and cost) on cloud based repositories, so a less verbose
option was added that only retrieves very basic information about each
snapshot that is all available in the index-N blob - requiring only one
read!

In order to display this less verbose snapshot info appropriately, logic
was added to not display those fields which could not be populated.
However, this broke integrators (e.g. ECE) that required these fields to
be present, even if empty.  This commit is to return these fields in the
response, even if empty, if the verbose option is set.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Apr 25, 2018

Fix NullPointerException when creating or deleting a snapshot
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been used by a pre-5.5 cluster
and a post-5.5 cluster. The way snapshots are formatted in the
repository snapshots index file changed in elastic#24477 and it can cause
a NPE if a very specific set of operations is written in the repository.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Apr 25, 2018

Fix NullPointerException when creating or deleting a snapshot
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been used by a pre-5.5 cluster
and a post-5.5 cluster. The way snapshots are formatted in the
repository snapshots index file changed in elastic#24477 and it can cause
a NPE if a very specific set of operations is written in the repository.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Apr 26, 2018

Fix NullPointerException when creating or deleting a snapshot
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been used by a pre-5.5 cluster
and a post-5.5 cluster. The way snapshots are formatted in the
repository snapshots index file changed in elastic#24477 and it can cause
a NPE if a very specific set of operations is written in the repository.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Apr 27, 2018

Fix NullPointerException when creating or deleting a snapshot
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been used by a pre-5.5 cluster
and a post-5.5 cluster. The way snapshots are formatted in the
repository snapshots index file changed in elastic#24477 and it can cause
a NPE if a very specific set of operations is written in the repository.

tlrx added a commit that referenced this pull request Apr 27, 2018

Fail snapshot operations early on repository corruption (#30140)
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been written to by an older 
Elasticsearch after writing to it with a newer Elasticsearch version.

This is because the way snapshots are formatted in the repository 
snapshots index file changed in #24477.

This commit changes the parsing of the repository index file so that 
it now detects a corrupted index file and fails early the snapshot 
operation.

closes #29052

tlrx added a commit that referenced this pull request Apr 27, 2018

Fail snapshot operations early on repository corruption (#30140)
A NullPointerException is thrown when trying to create or delete
a snapshot in a repository that has been written to by an older
Elasticsearch after writing to it with a newer Elasticsearch version.

This is because the way snapshots are formatted in the repository
snapshots index file changed in #24477.

This commit changes the parsing of the repository index file so that
it now detects a corrupted index file and fails early the snapshot
operation.

closes #29052
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment