
Peer recovery should flush at the end #41660

Merged · 14 commits into elastic:master from peer-recovery-flush · May 22, 2019

Conversation

@dnhatn (Member) commented Apr 30, 2019

Flushing at the end of a peer recovery (if needed) can bring these benefits (a rough sketch follows below):

  • Closing an index won't end up in a red state: a recovering replica will always be ready to be closed, whether it performs the verifying-before-close step or not.
  • It is a good opportunity to compact the store (i.e., flush and merge Lucene segments, and trim the translog).

Closes #40024
Closes #39588
Relates #33888
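
For orientation, here is a minimal Java sketch of where such a flush could hook in, assuming a RecoveryTarget-style class with an indexShard field and a hypothetical hasUncommittedOperations() helper; it is an illustration of the idea, not the PR's verbatim diff:

    import java.io.IOException;
    import org.elasticsearch.action.admin.indices.flush.FlushRequest;

    // Sketch: when the recovery target finalizes, flush the shard if its last
    // Lucene commit is missing operations, so a freshly recovered copy ends up
    // in a clean, committed state.
    public void finalizeRecovery(final long globalCheckpoint) throws IOException {
        indexShard.updateGlobalCheckpointOnReplica(globalCheckpoint, "finalizing recovery");
        if (hasUncommittedOperations()) { // hypothetical helper; see the inline thread below
            indexShard.flush(new FlushRequest().force(true).waitIfOngoing(true));
        }
    }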

@dnhatn added the >enhancement, :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source), v8.0.0, and v7.2.0 labels on Apr 30, 2019
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

@ywelsch (Contributor) left a comment


I've left a question.

    // Do not flush if all those uncommitted operations have baked into the
    // existing Lucene index commit already.
    final SequenceNumbers.CommitInfo commitInfo = SequenceNumbers.loadSeqNoInfoFromLuceneCommit(
        indexShard.commitStats().getUserData().entrySet());
    return commitInfo.maxSeqNo != commitInfo.localCheckpoint;
@ywelsch (Contributor) commented inline:

I wonder if the condition above about the translog is sufficient. What situation is the condition here addressing that's not addressed by the above one?

@dnhatn (Member, Author) replied:

If a file-based recovery occurs, the primary also sends its translog to the replica. These operations are uncommitted on the replica even though they are already baked into the copied commit. We need this condition to avoid flushing in that case, to keep the syncId. I pushed 07c3a7c to use another check (sketched below).
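
A hedged sketch of such a check, again assuming an indexShard field (it may not match 07c3a7c line for line): rather than comparing max_seq_no with the commit's local checkpoint, count the operations above the last commit's local checkpoint, which stays at zero when a file-based recovery has already baked the replayed operations into the copied commit:

    import java.io.IOException;
    import org.elasticsearch.index.seqno.SequenceNumbers;

    // Sketch: flush only when there are operations above the last commit's
    // local checkpoint, i.e. operations not yet baked into that commit.
    private boolean hasUncommittedOperations() throws IOException {
        final long localCheckpointOfCommit = Long.parseLong(
            indexShard.commitStats().getUserData().get(SequenceNumbers.LOCAL_CHECKPOINT_KEY));
        return indexShard.estimateNumberOfHistoryOperations(
            "peer-recovery", localCheckpointOfCommit + 1) > 0;
    }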

@dnhatn requested a review from ywelsch on April 30, 2019 14:22
@dnhatn (Member, Author) commented Apr 30, 2019

@elasticmachine test this please

@henningandersen (Contributor) left a comment


I think this could solve the issue and have other benefits as described.

But I am a bit worried about the implications, especially for future maintenance. If we ever add anything into VerifyShardBeforeClose, we need to also ensure the same holds at the end of a recovery. Also, I am not 100% sure recovery is the only place to ensure this (though I have no concrete cases).

I would find it more intuitive to (maybe in addition to this) add a check in MetaDataIndexStateService.closeRoutingTable to fail closing the index if the routing table contains unvalidated shard copies (meaning we would have to collect more info in the previous steps).
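
Purely as an illustration of this suggestion (it was not implemented in this PR), a hypothetical guard along these lines could live near MetaDataIndexStateService.closeRoutingTable; ensureAllCopiesVerified and verifiedShards are made-up names for the sketch:

    import java.util.Set;
    import org.elasticsearch.cluster.routing.IndexRoutingTable;
    import org.elasticsearch.cluster.routing.ShardRouting;
    import org.elasticsearch.cluster.routing.ShardRoutingState;

    // Hypothetical: refuse to close an index while it still has started shard
    // copies that never went through the verify-shard-before-close step.
    static void ensureAllCopiesVerified(IndexRoutingTable routingTable, Set<ShardRouting> verifiedShards) {
        for (ShardRouting shard : routingTable.shardsWithState(ShardRoutingState.STARTED)) {
            if (verifiedShards.contains(shard) == false) {
                throw new IllegalStateException("cannot close index: shard copy [" + shard + "] is not verified");
            }
        }
    }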

@ywelsch (Contributor) commented May 2, 2019

Good point @henningandersen, but failing the closing operation would also not be very user-friendly, as shards are free to move around based on rebalancing decisions. Let's consider more options here.

@dnhatn (Member, Author) commented May 6, 2019

@henningandersen found that, with this change, we can always validate that max_seq_no equals the global checkpoint in ReadOnlyEngine. I pushed 6e952c5 to enable it.
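
A sketch of the kind of validation being enabled, modeled on how ReadOnlyEngine checks the commit it opens; the exact method name and wiring in 6e952c5 may differ:

    import org.elasticsearch.index.seqno.SeqNoStats;

    // Sketch: a read-only engine can insist that its last commit contains every
    // acknowledged operation, i.e. max_seq_no == global checkpoint.
    protected void ensureMaxSeqNoEqualsToGlobalCheckpoint(final SeqNoStats seqNoStats) {
        if (seqNoStats.getMaxSeqNo() != seqNoStats.getGlobalCheckpoint()) {
            throw new IllegalStateException("max_seq_no [" + seqNoStats.getMaxSeqNo()
                + "] from last commit does not match global checkpoint ["
                + seqNoStats.getGlobalCheckpoint() + "]");
        }
    }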

@tlrx mentioned this pull request on May 7, 2019
@henningandersen (Contributor) replied, quoting the above:

> @henningandersen found that, with this change, we can always validate that max_seq_no equals the global checkpoint in ReadOnlyEngine. I pushed 6e952c5 to enable it.

I tend to think I was wrong about this, since FrozenEngine extends ReadOnlyEngine. If an index was frozen on 6.7 or 7.0, it might not obey the invariant if it was affected by #41041?

@ywelsch (Contributor) left a comment


LGTM

@tlrx (Member) left a comment


LGTM

@dnhatn (Member, Author) commented May 22, 2019

Thanks everyone!

@dnhatn merged commit 75be2a6 into elastic:master on May 22, 2019
@dnhatn deleted the peer-recovery-flush branch on May 22, 2019 02:35
dnhatn added a commit that referenced this pull request May 22, 2019
Flushing at the end of a peer recovery (if needed) can bring these
benefits:

1. Closing an index won't end up in a red state: a recovering replica
will always be ready to be closed, whether it performs the
verifying-before-close step or not.

2. It is a good opportunity to compact the store (i.e., flush and merge
Lucene segments, and trim the translog).

Closes #40024
Closes #39588
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
Labels: >enhancement · :Distributed/Recovery (anything around constructing a new shard, either from a local or a remote source) · v7.2.0 · v8.0.0-alpha1