Use index for peer recovery instead of translog #45136
Commits on Jun 19, 2019
Create peer-recovery retention leases (elastic#43190)
This creates a peer-recovery retention lease for every shard during recovery, ensuring that the replication group retains history for future peer recoveries. It also ensures that leases for active shard copies do not expire, and that leases for inactive shard copies expire immediately if the shard is fully allocated. Relates elastic#41536
Commit 2ec1483
Commits on Jun 21, 2019
Commit dfa22bc
Commits on Jun 24, 2019
Commit f68fac4
Commits on Jun 26, 2019
Commit cb39840
Commit f5fdb75
Commits on Jun 28, 2019
Commit 00145cd
Commits on Jul 1, 2019
Commit cb6b0a9
Commit b328478
Commit 7f7f84b
Commit 6bac16a
Commit 9941eb6
Commit f3fbb33
Advance PRRLs to match GCP of tracked shards (elastic#43751)
This commit adjusts the behaviour of the retention lease sync to first renew any peer-recovery retention leases where either:
- the corresponding shard's global checkpoint has advanced, or
- the lease is older than half of its expiry time
Relates elastic#41536
Commit fbc4477
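The renewal rule described in elastic#43751 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `PrrlRenewalSketch`, its `Lease` type and the `shouldRenew` method are hypothetical names, not Elasticsearch APIs, and the sketch assumes that a lease for a copy whose global checkpoint is G retains history from G + 1.

```java
// Illustrative sketch only: these names are hypothetical, not Elasticsearch APIs.
final class PrrlRenewalSketch {

    /** Minimal stand-in for a peer-recovery retention lease. */
    static final class Lease {
        final long retainingSeqNo;   // first sequence number whose history the lease retains
        final long timestampMillis;  // when the lease was created or last renewed

        Lease(long retainingSeqNo, long timestampMillis) {
            this.retainingSeqNo = retainingSeqNo;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Returns true if the lease should be renewed during a retention lease sync.
     * Assumption: a lease for a copy with global checkpoint G retains from G + 1.
     */
    static boolean shouldRenew(Lease lease, long globalCheckpoint, long nowMillis, long expiryMillis) {
        // Condition 1: the corresponding shard's global checkpoint has advanced.
        boolean checkpointAdvanced = globalCheckpoint + 1 > lease.retainingSeqNo;
        // Condition 2: the lease is older than half of its expiry time.
        boolean olderThanHalfExpiry = nowMillis - lease.timestampMillis > expiryMillis / 2;
        return checkpointAdvanced || olderThanHalfExpiry;
    }
}
```

Renewing at half of the expiry time leaves a margin so that a lease for a healthy but idle copy is refreshed well before it could expire.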
Commits on Jul 3, 2019
Commit b1be151
Commits on Jul 4, 2019
Commit 291ff8d
Remove PRRLs before performing file-based recovery (elastic#43928)
If the primary performs a file-based recovery to a node that has (or recently had) a copy of the shard, then it is possible that the persisted global checkpoint of the new copy is behind that of the old copy, since file-based recoveries are somewhat destructive operations. Today we leave that node's PRRL in place during the recovery with the expectation that it can be used by the new copy. However, this isn't the case if the new copy needs more history to be retained, because retention leases may only advance and never retreat.
This commit addresses this by removing any existing PRRL during a file-based recovery: since we are performing a file-based recovery we have already determined that there isn't enough history for an ops-based recovery, so there is little point in keeping the old lease in place.
Caught by [a failure of `RecoveryWhileUnderLoadIT.testRecoverWhileRelocating`](https://scans.gradle.com/s/wxccfrtfgjj3g/console-log?task=:server:integTest#L14)
Relates elastic#41536
Commit 389c625
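To illustrate why elastic#43928 removes the old lease rather than reusing it, here is a minimal sketch of a lease registry in which renewals may only advance. The class `LeaseRegistrySketch` and all of its methods are hypothetical and much simpler than the real retention lease machinery; the sketch only shows that a lease which needs to retain more history (a lower sequence number) cannot be obtained by renewal, but can be obtained by removing the old lease and adding a new one.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a hypothetical registry, not the Elasticsearch implementation.
final class LeaseRegistrySketch {
    // Maps a node id to the first sequence number its lease retains.
    private final Map<String, Long> retainingSeqNoByNode = new HashMap<>();

    void addLease(String nodeId, long retainingSeqNo) {
        if (retainingSeqNoByNode.putIfAbsent(nodeId, retainingSeqNo) != null) {
            throw new IllegalStateException("lease already exists for " + nodeId);
        }
    }

    void renewLease(String nodeId, long retainingSeqNo) {
        Long current = retainingSeqNoByNode.get(nodeId);
        if (current == null) {
            throw new IllegalStateException("no lease for " + nodeId);
        }
        if (retainingSeqNo < current) {
            // Leases may only advance and never retreat.
            throw new IllegalArgumentException("cannot move lease backwards from " + current + " to " + retainingSeqNo);
        }
        retainingSeqNoByNode.put(nodeId, retainingSeqNo);
    }

    void removeLease(String nodeId) {
        retainingSeqNoByNode.remove(nodeId);
    }

    /** Returns the retained sequence number for the node, or -1 if it has no lease. */
    long retainingSeqNo(String nodeId) {
        return retainingSeqNoByNode.getOrDefault(nodeId, -1L);
    }
}
```

In this model, a node whose stale lease retains from 100 cannot be given a lease retaining from 50 by renewal; the stale lease has to be removed first, which mirrors the reasoning in the commit message.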
Commits on Jul 5, 2019
Update BWC version for PRRLs (elastic#43958)
This commit updates the version in which PRRLs are expected to exist to 7.4.0.
Commit ac2da33
Return recovery to generic thread post-PRRL action (elastic#44000)
Today we perform `TransportReplicationAction` derivatives during recovery, and these actions call their response handlers on the transport thread. This change moves the continued execution of the recovery back onto the generic threadpool.
Commit d016e79
Commit 76ff6e8
Skip PRRL renewal on UNASSIGNED_SEQ_NO (elastic#44019)
Today when renewing PRRLs we assert that any invalid "backwards" renewals must be because we are recovering the shard. In fact it's also possible to have `checkpointState.globalCheckpoint == SequenceNumbers.UNASSIGNED_SEQ_NO` on a tracked shard copy if the primary was just promoted and hasn't yet received checkpoints from all of its peers. This commit weakens the assertion to match.
Caught by a [failure of the full cluster restart tests](https://scans.gradle.com/s/5lllzgqtuegty/console-log#L8605)
Relates elastic#41536
Commit da3c901
Commits on Jul 8, 2019
Only call assertNotTransportThread if asserts on (elastic#44028)
In elastic#44000 we introduced some calls to `assertNotTransportThread` that are executed whether assertions are enabled or not. Although they have no effect if assertions are disabled, they should only be called when assertions are enabled; this commit makes that change.
Commit c5ed201
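The usual Java idiom for "only call the check if asserts are on" is to make the check return `boolean` and invoke it from inside an `assert` statement, so the JVM skips the call entirely when assertions are disabled. The sketch below illustrates that idiom. The class name, the `transport_worker` thread-name marker and the `continueRecovery` method are assumptions made for this example and may not match the actual Elasticsearch code.

```java
// Illustrative sketch only: hypothetical names, not the Elasticsearch implementation.
final class TransportThreadAssertSketch {
    // Assumed marker for transport thread names; the real check may differ.
    static final String TRANSPORT_THREAD_MARKER = "transport_worker";

    /** Fails (when assertions are enabled) if called on a transport thread; always returns true otherwise. */
    static boolean assertNotTransportThread(String reason) {
        String name = Thread.currentThread().getName();
        assert name.contains(TRANSPORT_THREAD_MARKER) == false
            : "expected not to be on a transport thread for [" + reason + "] but was on [" + name + "]";
        return true;
    }

    static void continueRecovery() {
        // Because the call sits inside an assert statement, it is evaluated
        // only when the JVM runs with -ea and is skipped otherwise.
        assert assertNotTransportThread("continuing recovery");
        // ... the rest of the recovery would run here ...
    }
}
```

Calling `assertNotTransportThread("...")` as a plain statement would still work, but it would pay for the thread-name lookup even when assertions are off, which is the situation the commit message describes.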
Commit 9523445
Create missing PRRLs after primary activation (elastic#44009)
Today peer recovery retention leases (PRRLs) are created when starting a replication group from scratch and during peer recovery. However, if the replication group was migrated from nodes running a version which does not create PRRLs (e.g. 7.3 and earlier) then it's possible that the primary was relocated or promoted without first establishing all the expected leases.
It's not possible to establish these leases before or during primary activation, so we must create them as soon as possible afterwards. This gives weaker guarantees about history retention, since there's a possibility that history will be discarded before it can be used. In practice such situations are expected to occur only rarely.
This commit adds the machinery to create missing leases after primary activation, and strengthens the assertions about the existence of such leases in order to ensure that once all the leases do exist we never again enter a state where there's a missing lease.
Relates elastic#41536
Commit bea2627
Reduce number of replicas in cluster restart test
The cluster in the full-cluster restart test only has 2 nodes, so we cannot fully allocate an index with 2 replicas.
Commit 11e9880
Only create missing PRRLs when appropriate
Today PRRLs are not supported on closed indices or indices where soft deletes are disabled, but (confusingly) nor are they actively forbidden. This commit avoids creating them unnecessarily in unsupported situations.
Commit d7f7ebc
Commits on Jul 9, 2019
Commit ba7c4be
Commits on Jul 11, 2019
Commit e12bde6
Commits on Jul 15, 2019
Commit b8bcc0b
Commits on Jul 20, 2019
Commit 40ea029
Commits on Jul 23, 2019
Commit 69c94f4
Use global checkpoint as starting seq in ops-based recovery (elastic#43463)
Today we use the local checkpoint of the safe commit on replicas as the starting sequence number of operation-based peer recovery. While this is a good choice due to its simplicity, we need to share this information between copies if we use retention leases in peer recovery. We can avoid this extra work if we use the global checkpoint as the starting sequence number. With this change, we will try to recover the replica locally up to the global checkpoint before performing peer recovery. This commit should also increase the chance of operation-based recovery.
Commit d15684d
Commit 06d9be6
Commits on Jul 24, 2019
Do not load global checkpoint to ReplicationTracker in local recovery step (elastic#44781)
If we force allocate an empty or stale primary, the global checkpoint on replicas might be higher than the primary's, as the local recovery step (introduced in elastic#43463) loads the previous (stale) global checkpoint into ReplicationTracker. There's no issue with the retention leases, because a new lease with a higher term will supersede the stale one. Relates elastic#43463
Commit 6275cd7
Commits on Jul 25, 2019
Commit 96dd543
Commits on Jul 29, 2019
Commit 417a2ac
Commit b5c897c
Commit 446ebf0
Commits on Jul 30, 2019
Commit 7a247e5
Skip local recovery for closed or frozen indices (elastic#44887)
For closed and frozen indices, we should not recover the shard locally up to the global checkpoint before performing peer recovery, because that copy might have been offline when the index was closed or frozen. Relates elastic#43463 Closes elastic#44855
Commit 907bb55
Commits on Jul 31, 2019
Commit 0b066d3
Commit 2bae406
Commits on Aug 1, 2019
Commit 6960cf7
Recover peers using history from Lucene (elastic#44853)
Thanks to peer recovery retention leases we now retain the history needed to perform peer recoveries from the index instead of from the translog. This commit adjusts the peer recovery process to do so, and also adjusts it to use the existence of a retention lease to decide whether or not to attempt an operations-based recovery. Reverts elastic#38904 and elastic#42211 Relates elastic#41536
Commit 5322b00
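The decision described in elastic#44853, using the existence of a retention lease to choose between an operations-based and a file-based recovery, might look roughly like the sketch below. Everything here is a simplified assumption for illustration: the class `RecoveryPlanSketch`, its method and the representation of leases as a set of node ids are hypothetical, and the real source-side check is likely to consider more conditions. The value -2 for an unassigned sequence number is the one mentioned elsewhere in this commit list.

```java
import java.util.Set;

// Illustrative sketch only: hypothetical names, not the Elasticsearch implementation.
final class RecoveryPlanSketch {
    // Sentinel meaning "no starting sequence number is known".
    static final long UNASSIGNED_SEQ_NO = -2L;

    /**
     * Returns true if an operations-based recovery should be attempted for the target node.
     * Assumption: a recovery can replay operations only if the target supplied a starting
     * sequence number and the primary holds a peer-recovery retention lease for that node.
     */
    static boolean canAttemptOpsBasedRecovery(long startingSeqNo, String targetNodeId, Set<String> nodesWithLease) {
        boolean hasStartingSeqNo = startingSeqNo != UNASSIGNED_SEQ_NO;
        boolean hasLease = nodesWithLease.contains(targetNodeId);
        return hasStartingSeqNo && hasLease;
    }
}
```

When the method returns false, the primary would fall back to a file-based recovery, copying segment files instead of replaying history from the index.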
Reset starting seqno if fail to read last commit (elastic#45106)
Previously, if the metadata snapshot was empty (either because no commit was found or because of an error), we did not compute the starting sequence number and used -2 to opt out of operation-based recovery. With elastic#43463, we have a starting sequence number before reading the last commit. Thus, we need to reset it if we fail to snapshot the store. Closes elastic#45072
Commit 77720e8
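The fix in elastic#45106 amounts to resetting an already-computed starting sequence number when the last commit cannot be read. A minimal sketch of that control flow follows, assuming a hypothetical `CommitReader` callback that stands in for snapshotting the store; none of these names come from the Elasticsearch code base.

```java
import java.io.IOException;

// Illustrative sketch only: hypothetical names, not the Elasticsearch implementation.
final class StartRecoverySketch {
    // Sentinel that opts out of operation-based recovery.
    static final long UNASSIGNED_SEQ_NO = -2L;

    /** Stand-in for snapshotting the store; returns the number of files in the last commit. */
    interface CommitReader {
        int countCommitFiles() throws IOException;
    }

    /**
     * Returns the starting sequence number to send in the recovery request.
     * The number computed by the local recovery step is kept only if the
     * last commit can be read and is not empty.
     */
    static long startingSeqNoFor(long seqNoFromLocalRecovery, CommitReader reader) {
        long startingSeqNo = seqNoFromLocalRecovery;
        try {
            if (reader.countCommitFiles() == 0) {
                // No commit found: fall back to a file-based recovery.
                startingSeqNo = UNASSIGNED_SEQ_NO;
            }
        } catch (IOException e) {
            // Failed to read the last commit: reset rather than trust the earlier value.
            startingSeqNo = UNASSIGNED_SEQ_NO;
        }
        return startingSeqNo;
    }
}
```

For example, `startingSeqNoFor(42L, () -> 3)` keeps the locally recovered value, while a reader that reports no files or throws an `IOException` yields the sentinel instead.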
Commits on Aug 2, 2019
Commit 51778da
Commit aea938b
Commit 09ae1e6
Commit 89b6a3b