Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prioritize listing index-N blobs over index.latest in reading snapshots #23333

Merged
merged 1 commit into from
Feb 23, 2017

Conversation

abeyad
Copy link

@abeyad abeyad commented Feb 23, 2017

There are two ways to determine the latest index-N blob that contains
the truth of the contents of the repository: (1) list all index-N blobs
and figure out what the latest value of N is, and (2) read the
index.latest blob, which contains the latest value of N explicitely.
Note that the index.latest blob is not written atomically and can be
re-written, as opposed to the index-N blobs which are never re-written
(to create an updated index blob, index-{N+1} is written).

Previously, the latest index-N was determined by first trying to read
the index.latest blob and if that blob was missing (it was deleted
before being re-written and in between deleting it and re-writing it,
the system crashed), then all index-N blobs were listed to pick the
highest N value.

For non-read-only repositories, this could produce race conditions with
the file system. In particular, it is possible that the index.latest
blob is being read in order to serve a read request (e.g. get snapshots)
and while doing so, an attempt is made to delete the index.latest blob
and re-write it in order to finalize a snapshot operation. On some file
systems (e.g. Windows), it is forbidden to delete a file while it is
open for reading by another process/thread.

This commit changes the priority so that figuring out the latest index-N
blob is first done by listing all index-N blobs and determining the
latest N value. If that values because the repository does not
support listing blobs (e.g. the URL repository), then the index.latest
blob is read. This is safe because in read-only repositories that do
not support listing blobs, the index.latest blob is never deleted and
then re-written, so the aforementioned issue does not arise.

There are two ways to determine the latest index-N blob that contains
the truth of the contents of the repository: (1) list all index-N blobs
and figure out what the latest value of N is, and (2) read the
index.latest blob, which contains the latest value of N explicitely.
Note that the index.latest blob is not written atomically and can be
re-written, as opposed to the index-N blobs which are never re-written
(to create an updated index blob, index-{N+1} is written).

Previously, the latest index-N was determined by first trying to read
the index.latest blob and if that blob was missing (it was deleted
before being re-written and in between deleting it and re-writing it,
the system crashed), then all index-N blobs were listed to pick the
highest N value.

For non-read-only repositories, this could produce race conditions with
the file system.  In particular, it is possible that the index.latest
blob is being read in order to serve a read request (e.g. get snapshots)
and while doing so, an attempt is made to delete the index.latest blob
and re-write it in order to finalize a snapshot operation.  On some file
systems (e.g. Windows), it is forbidden to delete a file while it is
open for reading by another process/thread.

This commit changes the priority so that figuring out the latest index-N
blob is first done by listing all index-N blobs and determining the
latest N value.  If that values because the repository does not
support listing blobs (e.g. the URL repository), then the index.latest
blob is read.  This is safe because in read-only repositories that do
not support listing blobs, the index.latest blob is never deleted and
then re-written, so the aforementioned issue does not arise.
@abeyad abeyad added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >bug v5.4.0 v6.0.0-alpha1 labels Feb 23, 2017
@abeyad abeyad requested a review from imotov February 23, 2017 17:22
Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@abeyad
Copy link
Author

abeyad commented Feb 23, 2017

retest this please

@abeyad abeyad merged commit 25a9a7e into elastic:master Feb 23, 2017
@abeyad
Copy link
Author

abeyad commented Feb 23, 2017

thanks @s1monw

abeyad pushed a commit that referenced this pull request Feb 23, 2017
…ts (#23333)

There are two ways to determine the latest index-N blob that contains
the truth of the contents of the repository: (1) list all index-N blobs
and figure out what the latest value of N is, and (2) read the
index.latest blob, which contains the latest value of N explicitely.
Note that the index.latest blob is not written atomically and can be
re-written, as opposed to the index-N blobs which are never re-written
(to create an updated index blob, index-{N+1} is written).

Previously, the latest index-N was determined by first trying to read
the index.latest blob and if that blob was missing (it was deleted
before being re-written and in between deleting it and re-writing it,
the system crashed), then all index-N blobs were listed to pick the
highest N value.

For non-read-only repositories, this could produce race conditions with
the file system.  In particular, it is possible that the index.latest
blob is being read in order to serve a read request (e.g. get snapshots)
and while doing so, an attempt is made to delete the index.latest blob
and re-write it in order to finalize a snapshot operation.  On some file
systems (e.g. Windows), it is forbidden to delete a file while it is
open for reading by another process/thread.

This commit changes the priority so that figuring out the latest index-N
blob is first done by listing all index-N blobs and determining the
latest N value.  If that values because the repository does not
support listing blobs (e.g. the URL repository), then the index.latest
blob is read.  This is safe because in read-only repositories that do
not support listing blobs, the index.latest blob is never deleted and
then re-written, so the aforementioned issue does not arise.
@abeyad
Copy link
Author

abeyad commented Feb 23, 2017

5.x commit: 3811a6a

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 25, 2017
* master: (26 commits)
  CLI: Fix prompting for yes/no to handle console returning null (elastic#23320)
  Tests: Fix reproduce line for packagingTest (elastic#23365)
  Build: Remove extra copies of netty license (elastic#23361)
  [TEST] Removes timeout based wait_for_active_shards REST test (elastic#23360)
  [TEST] increase timeout slightly in wait_for_active_shards test to allow for index creation cluster state update to be processed before ensuring the wait times out
  Handle snapshot repository's missing index.latest
  Adding equals/hashCode to MainResponse (elastic#23352)
  Always restore the ThreadContext for operations delayed due to a block (elastic#23349)
  Add support for named xcontent parsers to high level REST client (elastic#23328)
  Add unit tests for ParentToChildAggregator (elastic#23305)
  Fix after last merge with master and apply last comments
  [INGEST] Lazy load the geoip databases.
  disable BWC tests for the highlighters, need a new 5.x build to make it work
  Expose WordDelimiterGraphTokenFilter (elastic#23327)
  Test that buildCredentials returns correct clazz (elastic#23334)
  Add BreakIteratorBoundaryScanner support for FVH (elastic#23248)
  Prioritize listing index-N blobs over index.latest in reading snapshots (elastic#23333)
  Test: Fix hdfs test fixture setup on windows
  delete and index tests can share some part of the code
  Remove createSampleDocument method and use the sync'ed index method
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs v5.4.0 v6.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants