
Do not perform cleanup if Manifest write fails with dirty exception #40519

Merged
merged 2 commits into elastic:master on Apr 1, 2019

Conversation

@andrershov (Contributor) commented Mar 27, 2019

Currently, if the Manifest write is unsuccessful (i.e. a WriteStateException is thrown), we perform cleanup of the newly created metadata files. However, this is wrong.
Consider the following sequence (caught by CI here #39077):

  • cluster global data is written successfully
  • the associated manifest write fails (during the fsync, i.e. the files have been written)
  • deleting (reverting) the manifest file fails, so the manifest is persisted
  • deleting (reverting) the cluster global data succeeds

In this case, when trying to load metadata (after a node restart because of the dirty WriteStateException), the following exception occurs:

```
java.io.IOException: failed to find global metadata [generation: 0]
```

because the manifest file references a missing global metadata file.

This commit checks whether the thrown WriteStateException is dirty, and if it is we don't perform any cleanup, because a new Manifest file might have been created but its deletion has failed.
In the future, we might add a more fine-grained check: perform the cleanup if the WriteStateException is dirty but the Manifest deletion was successful.
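
To make the intended control flow concrete, here is a minimal, self-contained sketch of that guard. The names (ManifestWriteSketch, MetadataStore, writeManifestAndCleanup, cleanupNewFiles) are illustrative assumptions, not the actual Elasticsearch code; only the isDirty() check mirrors the flag discussed in this PR.

```java
// Minimal, self-contained sketch of the cleanup guard described in this PR.
// All names below are illustrative stand-ins, not the actual Elasticsearch classes.
public class ManifestWriteSketch {

    /** Stand-in for WriteStateException and its dirty flag. */
    static class WriteStateException extends Exception {
        private final boolean dirty;

        WriteStateException(boolean dirty, String message) {
            super(message);
            this.dirty = dirty;
        }

        /** True if the failure may have left new files on disk (e.g. the fsync or a revert failed). */
        boolean isDirty() {
            return dirty;
        }
    }

    /** Stand-in for the metadata writer used during cluster state persistence. */
    interface MetadataStore {
        void writeManifest() throws WriteStateException;

        void cleanupNewFiles(); // delete the metadata files created for this write
    }

    static void writeManifestAndCleanup(MetadataStore store) throws WriteStateException {
        try {
            store.writeManifest();
        } catch (WriteStateException e) {
            // If the exception is dirty, the new manifest may already be on disk and its
            // deletion may have failed; removing the metadata files it references would
            // leave a manifest pointing at missing files (the "failed to find global
            // metadata" error above). So only clean up when the exception is not dirty.
            if (e.isDirty() == false) {
                store.cleanupNewFiles();
            }
            throw e;
        }
    }
}
```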

Closes #39077


Contributor left a comment:

Also unmute testAtomicityWithFailures?

Contributor left a comment:

LGTM

andrershov merged commit 1fac569 into elastic:master on Apr 1, 2019
8 checks passed

CLA: All commits in pull request signed
elasticsearch-ci/1: Build finished.
elasticsearch-ci/2: Build finished.
elasticsearch-ci/bwc: Build finished.
elasticsearch-ci/default-distro: Build finished.
elasticsearch-ci/docbldesx: Build finished.
elasticsearch-ci/oss-distro-docs: Build finished.
elasticsearch-ci/packaging-sample: Build finished.
andrershov added a commit that referenced this pull request Apr 1, 2019
Do not perform cleanup if Manifest write fails with dirty exception (#40519)

(cherry picked from commit 1fac569)
andrershov added a commit that referenced this pull request Apr 1, 2019
Do not perform cleanup if Manifest write fails with dirty exception (#40519)

(cherry picked from commit 1fac569)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 1, 2019
Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519)

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 1, 2019
* elastic/7.0:
  [TEST] Mute WebhookHttpsIntegrationTests.testHttps
  [DOCS] Add 'time value' links to several monitor settings (elastic#40633) (elastic#40687)
  Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519)
  Remove mention of soft deletes from getting started (elastic#40668)
  Fix bug in detecting use of bundled JDK on macOS
  Reindex conflicts clarification (docs) (elastic#40442)
  SQL: [Tests] Enable integration tests for fixed issues (elastic#40664)
  Add information about the default sort mode (elastic#40657)
  SQL: [Docs] Fix example for CURDATE
  SQL: [Docs] Fix doc errors regarding CURRENT_DATE. (elastic#40649)
  Clarify using time_zone and date math in range query (elastic#40655)
  Add notice for bundled jdk (elastic#40576)
  disable kerberos test until kerberos fixture is working again
  [DOCS] Use "source" instead of "inline" in ML docs (elastic#40635)
  Unmute and fix testSubParserArray (elastic#40626)
  Geo Point parse error fix (elastic#40447)
  Increase suite timeout to 30 minutes for docs tests (elastic#40521)
  Fix repository-hdfs when no docker and unnecesary fixture
  Avoid building hdfs-fixure use an image that works instead
@jakelandis jakelandis added v7.0.0-rc2 and removed v7.0.0 labels Apr 3, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Do not perform cleanup if Manifest write fails with dirty exception (elastic#40519)
