Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing historyUUID in peer recovery when rolling upgrade 5.x to 6.3 #31506

Merged
merged 4 commits into from
Jun 22, 2018

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Jun 21, 2018

Today we make sure that a 5.x index commit should have all required
commit tags in RecoveryTarget#cleanFiles method. The reason we do this
in RecoveryTarget#cleanFiles method because this is only needed in a
file-based recovery and we assume that #cleanFiles should be called in a
file-based recovery. However, this assumption is not valid if the index
is sealed (.i.e synced-flush). This incorrect assumption would prevent
users from rolling upgrade from 5.x to 6.3 if their index were sealed.

Closes #31482

Today we make sure that a 5.x index commit should have all required
commit tags in RecoveryTarget#cleanFiles method. The reason we do this
in RecoveryTarget#cleanFiles method because this is only needed in a
file-based recovery and we assume that #cleanFiles should be called in a
file-based recovery. However, this assumption is not valid if the index
is sealed (.i.e synced-flush). This incorrect assumption would prevent
users from rolling upgrade from 5.x to 6.3 if their index were sealed.
@dnhatn dnhatn added >bug >upgrade :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. v6.4.0 v6.3.1 labels Jun 21, 2018
@dnhatn dnhatn requested review from bleskes and ywelsch June 21, 2018 14:32
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits but looks good o.w.

createIndex(index, settings.build());
indexRandomDocuments(count, randomBoolean(), randomBoolean(),
n -> JsonXContent.contentBuilder().startObject().field("key", "value").endObject());
assertBusy(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this assertBusy by an ensureGreen(index)?

*/
public void testRecoverySealedIndex() throws Exception {
int count;
if (runningAgainstOldCluster) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe fold this if-else into the next one.

@@ -892,6 +932,7 @@ public void testHistoryUUIDIsAdded() throws Exception {
mappingsAndSettings.endObject();
client().performRequest("PUT", "/" + index, Collections.emptyMap(),
new StringEntity(Strings.toString(mappingsAndSettings), ContentType.APPLICATION_JSON));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove extra line

* Tests that a synced-flushed index is correctly recovered.
* This might be an edge-case from 5.x to 6.x since a 5.x index commit does not have all required 6.x commit tags.
*/
public void testRecoverySealedIndex() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testRecoverSyncedFlushIndex

@@ -285,4 +286,27 @@ public void testSearchGeoPoints() throws Exception {
}
}

public void testRecoverySealedIndex() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testRecoverSyncedFlushIndex

@@ -285,4 +286,27 @@ public void testSearchGeoPoints() throws Exception {
}
}

public void testRecoverySealedIndex() throws Exception {
final String index = "recovery_a_sealed_index";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synced_flush_index

.put(SETTING_ALLOCATION_MAX_RETRY.getKey(), "0"); // fail faster
createIndex(index, settings.build());
indexDocs(index, 0, randomInt(5));
assertBusy(() -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, replace by ensureGreen()?

@dnhatn
Copy link
Member Author

dnhatn commented Jun 21, 2018

@ywelsch Thanks for looking.

Can we replace this assertBusy by an ensureGreen(index)?

Unfortunately, we have to spin here. We fire a global checkpoint sync after the last op is completed. A synced-flush will fail if the global checkpoint is happening. I added a comment to explain this.

Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I wonder if #31291 is relevant (it is different because it doesn't fail to recover). I think it's worth while double checking that.

* Tests that a synced-flushed index is correctly recovered.
* This might be an edge-case from 5.x to 6.x since a 5.x index commit does not have all required 6.x commit tags.
*/
public void testRecoverSyncedFlushIndex() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we fold this into testRecovery and randomly do a synced flush instead of a normal flush?

@dnhatn
Copy link
Member Author

dnhatn commented Jun 21, 2018

I wonder if #31291 is relevant (it is different because it doesn't fail to recover). I think it's worth while double checking that.

Sure, will do.

@dnhatn
Copy link
Member Author

dnhatn commented Jun 22, 2018

Thanks @ywelsch and @bleskes for reviewing!

@dnhatn dnhatn merged commit 9100a70 into elastic:6.x Jun 22, 2018
@dnhatn dnhatn deleted the fix-missing-historyuuid branch June 22, 2018 03:29
dnhatn added a commit that referenced this pull request Jun 22, 2018
…6.3 (#31506)

Today we make sure that a 5.x index commit should have all required
commit tags in RecoveryTarget#cleanFiles method. The reason we do this
in RecoveryTarget#cleanFiles method because this is only needed in a
file-based recovery and we assume that #cleanFiles should be called in a
file-based recovery. However, this assumption is not valid if the index
is sealed (.i.e synced-flush). This incorrect assumption would prevent
users from rolling upgrade from 5.x to 6.3 if their index were sealed.

Closes #31482
dnhatn added a commit that referenced this pull request Jun 23, 2018
Although the master branch does not affect by #31482, it's helpful to
have BWC tests that verify the peer recovery with a synced-flush index.
This commit adds the bwc tests from #31506 to the master branch.

Relates #31482
Relates #31506
@dnhatn
Copy link
Member Author

dnhatn commented Jun 23, 2018

Although the master branch is not affected by this issue, I think it's nice to have these BWC tests in the master branch. I added these tests to the master branch in 5115102.

dnhatn added a commit that referenced this pull request Jun 23, 2018
* 6.x:
  Avoid sending duplicate remote failed shard requests (#31313)
  Add get field mappings to High Level REST API Client Relates to #27205
  [DOCS] Updates Watcher examples for code testing (#31152)
  [DOCS] Move monitoring to docs folder (#31477)
  [DOCS] Fixes SQL docs in nav
  [DOCS] Move sql to docs
  IndexShard should not return null stats - empty stats or AlreadyCloseException if it's closed is better
  Clarify that IP range data can be specified in CIDR notation. (#31374)
  Remove some cases in FieldTypeLookupTests that are no longer relevant. (#31381)
  In NumberFieldType equals and hashCode, make sure that NumberType is taken into account. (#31514)
  fix repository update with the same settings but different type
  Revert "AwaitsFix FullClusterRestartIT#testRecovery"
  Upgrade to Lucene 7.4.0. (#31529)
  Avoid deprecation warning when running the ML datafeed extractor. (#31463)
  Retry synced-flush in FullClusterRestartIT#testRecovery
  Allow multiple unicast host providers (#31509)
  [ML] Add ML filter update API (#31437)
  AwaitsFix FullClusterRestartIT#testRecovery
  Fix missing historyUUID in peer recovery when rolling upgrade 5.x to 6.3 (#31506)
  Remove QueryCachingPolicy#ALWAYS_CACHE (#31451)
  Rename createNewTranslog to fileBasedRecovery (#31508)
  [DOCS] Add code snippet testing in more ML APIs (#31339)
  [DOCS] Remove fixed file from build.gradle
  [DOCS] Creates field and document level security overview (#30937)
  Test: Skip assertion on windows
  [DOCS] Move migration APIs to docs (#31473)
  Add a known issue for upgrading from 5.x to 6.3.0 (#31501)
  Return transport addresses from UnicastHostsProvider (#31426)
  Add Delete Snapshot High Level REST API
  Reload secure settings for plugins (#31481)
  [DOCS] Fix JDBC Maven client group/artifact ID
colings86 pushed a commit that referenced this pull request Jun 25, 2018
Although the master branch does not affect by #31482, it's helpful to
have BWC tests that verify the peer recovery with a synced-flush index.
This commit adds the bwc tests from #31506 to the master branch.

Relates #31482
Relates #31506
@bleskes
Copy link
Contributor

bleskes commented Jun 25, 2018

I added these tests to the master branch in 5115102.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Recovery Anything around constructing a new shard, either from a local or a remote source. >upgrade v6.3.1 v6.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants