
Use index for peer recovery instead of translog #45136

Merged

48 commits merged on Aug 2, 2019
Conversation

DaveCTurner (Contributor)

Today we recover a replica by copying operations from the primary's translog.
However, we also retain some historical operations in the index itself, as long
as soft-deletes are enabled. This commit adjusts peer recovery to use the
operations in the index for recovery rather than those in the translog, and
ensures that the replication group retains enough history for use in peer
recovery by means of retention leases.

Reverts #38904 and #42211
Relates #41536
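To make the core decision concrete, here is a minimal sketch of when an operations-based recovery from the index is possible; `RecoveryPlanner`, `retainedSeqNo`, and `canRecoverFromIndex` are hypothetical names for illustration, not the PR's actual API:

```java
// Hypothetical sketch of the recovery-source decision this PR makes: recover
// from Lucene history (soft deletes) retained under a peer-recovery retention
// lease, rather than replaying the primary's translog.
public class RecoveryPlanner {

    /** Lowest sequence number still retained in the index by the retention lease. */
    private final long retainedSeqNo;
    private final boolean softDeletesEnabled;

    public RecoveryPlanner(long retainedSeqNo, boolean softDeletesEnabled) {
        this.retainedSeqNo = retainedSeqNo;
        this.softDeletesEnabled = softDeletesEnabled;
    }

    /**
     * An operations-based recovery is possible if soft deletes are enabled and
     * every operation above the replica's local checkpoint is still retained
     * in the index (i.e. covered by the retention lease).
     */
    public boolean canRecoverFromIndex(long replicaLocalCheckpoint) {
        return softDeletesEnabled && retainedSeqNo <= replicaLocalCheckpoint + 1;
    }

    public static void main(String[] args) {
        RecoveryPlanner planner = new RecoveryPlanner(43L, true);
        System.out.println(planner.canRecoverFromIndex(50L)); // true: history retained
        System.out.println(planner.canRecoverFromIndex(10L)); // false: fall back to file-based
    }
}
```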

DaveCTurner and others added 30 commits June 19, 2019 17:39
This creates a peer-recovery retention lease for every shard during recovery,
ensuring that the replication group retains history for future peer recoveries.
It also ensures that leases for active shard copies do not expire, and leases
for inactive shard copies expire immediately if the shard is fully allocated.

Relates elastic#41536
This commit adjusts the behaviour of the retention lease sync to first renew
any peer-recovery retention leases where either:

- the corresponding shard's global checkpoint has advanced, or

- the lease is older than half of its expiry time

Relates elastic#41536
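As an illustration of the two renewal conditions above, a minimal sketch (all names hypothetical, not the actual Elasticsearch API):

```java
// Illustrative sketch of the renewal rule: renew a peer-recovery retention
// lease if the shard's global checkpoint has advanced past the lease's
// retaining sequence number, or if the lease has passed half of its expiry time.
public final class LeaseRenewal {

    static boolean shouldRenew(long globalCheckpoint, long leaseRetainingSeqNo,
                               long leaseTimestampMillis, long nowMillis, long expiryMillis) {
        // The retaining sequence number tracks the global checkpoint plus one.
        boolean checkpointAdvanced = globalCheckpoint + 1 > leaseRetainingSeqNo;
        boolean olderThanHalfExpiry = nowMillis - leaseTimestampMillis > expiryMillis / 2;
        return checkpointAdvanced || olderThanHalfExpiry;
    }

    public static void main(String[] args) {
        // Checkpoint advanced past the retained point -> renew.
        System.out.println(shouldRenew(120L, 100L, 0L, 1_000L, 10_000L)); // true
        // No advance, but the lease is past half of its 10s expiry -> renew anyway.
        System.out.println(shouldRenew(99L, 100L, 0L, 6_000L, 10_000L));  // true
        // No advance and the lease is still fresh -> skip the renewal.
        System.out.println(shouldRenew(99L, 100L, 0L, 1_000L, 10_000L));  // false
    }
}
```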
If the primary performs a file-based recovery to a node that has (or recently
had) a copy of the shard then it is possible that the persisted global
checkpoint of the new copy is behind that of the old copy since file-based
recoveries are somewhat destructive operations.

Today we leave that node's PRRL in place during the recovery with the
expectation that it can be used by the new copy. However this isn't the case if
the new copy needs more history to be retained, because retention leases may
only advance and never retreat.

This commit addresses this by removing any existing PRRL during a file-based
recovery: since we are performing a file-based recovery we have already
determined that there isn't enough history for an ops-based recovery, so there
is little point in keeping the old lease in place.

Caught by [a failure of `RecoveryWhileUnderLoadIT.testRecoverWhileRelocating`](https://scans.gradle.com/s/wxccfrtfgjj3g/console-log?task=:server:integTest#L14)

Relates elastic#41536
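The remove-then-recreate step follows from leases only advancing; a minimal sketch under that assumption (the `LeaseStore` API here is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of why a file-based recovery must discard the old lease: retention
// leases may only advance, so a copy that now needs *older* history cannot
// reuse its existing lease and gets a fresh one instead.
public final class LeaseStore {
    private final Map<String, Long> retainingSeqNoById = new HashMap<>();

    void addOrRenew(String id, long retainingSeqNo) {
        Long existing = retainingSeqNoById.get(id);
        if (existing != null && retainingSeqNo < existing) {
            throw new IllegalArgumentException(
                "leases may only advance: " + existing + " -> " + retainingSeqNo);
        }
        retainingSeqNoById.put(id, retainingSeqNo);
    }

    void remove(String id) {
        retainingSeqNoById.remove(id);
    }

    public static void main(String[] args) {
        LeaseStore leases = new LeaseStore();
        leases.addOrRenew("peer_recovery/node-1", 100L);
        // A file-based recovery needs history from seq no 40, but renewing
        // backwards is forbidden, so the old lease is removed and re-created.
        leases.remove("peer_recovery/node-1");
        leases.addOrRenew("peer_recovery/node-1", 40L);
    }
}
```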
This commit updates the version in which PRRLs are expected to exist to 7.4.0.

Today we perform `TransportReplicationAction` derivatives during recovery, and
these actions call their response handlers on the transport thread. This change
moves the continued execution of the recovery back onto the generic threadpool.

Today when renewing PRRLs we assert that any invalid "backwards" renewals must
be because we are recovering the shard. In fact it's also possible to have
`checkpointState.globalCheckpoint == SequenceNumbers.UNASSIGNED_SEQ_NO` on a
tracked shard copy if the primary was just promoted and has not yet received
checkpoints from all of its peers.

This commit weakens the assertion to match.

Caught by a [failure of the full cluster restart
tests](https://scans.gradle.com/s/5lllzgqtuegty/console-log#L8605)

Relates elastic#41536
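A sketch of the weakened assertion (the parameter names are hypothetical; the `-2` sentinel matches Elasticsearch's `SequenceNumbers.UNASSIGNED_SEQ_NO`):

```java
// A "backwards" renewal is tolerated while the shard is recovering, and also
// when a freshly-promoted primary has not yet received a checkpoint from a
// tracked copy, which it represents as UNASSIGNED_SEQ_NO.
final class RenewalAssertion {
    static final long UNASSIGNED_SEQ_NO = -2L; // matches the Elasticsearch constant

    static void assertValidRenewal(long newRetainedSeqNo, long oldRetainedSeqNo,
                                   boolean shardIsRecovering, long trackedGlobalCheckpoint) {
        assert newRetainedSeqNo >= oldRetainedSeqNo
                || shardIsRecovering
                || trackedGlobalCheckpoint == UNASSIGNED_SEQ_NO
                : "backwards renewal with no recovery in progress";
    }

    public static void main(String[] args) {
        // Run with `java -ea ...` to enable assertions.
        assertValidRenewal(90L, 100L, false, UNASSIGNED_SEQ_NO); // tolerated: no checkpoint yet
        assertValidRenewal(110L, 100L, false, 105L);             // normal forwards renewal
    }
}
```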
In elastic#44000 we introduced some calls to `assertNotTransportThread` that are
executed whether assertions are enabled or not. Although they have no effect if
assertions are disabled, this commit reworks them so that they only run when
assertions are enabled.
Today peer recovery retention leases (PRRLs) are created when starting a
replication group from scratch and during peer recovery. However, if the
replication group was migrated from nodes running a version which does not
create PRRLs (e.g. 7.3 and earlier) then it's possible that the primary was
relocated or promoted without first establishing all the expected leases.

It's not possible to establish these leases before or during primary
activation, so we must create them as soon as possible afterwards. This gives
weaker guarantees about history retention, since there's a possibility that
history will be discarded before it can be used. In practice such situations
are expected to occur only rarely.

This commit adds the machinery to create missing leases after primary
activation, and strengthens the assertions about the existence of such leases
in order to ensure that once all the leases do exist we never again enter a
state where there's a missing lease.

Relates elastic#41536
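A minimal sketch of the back-filling step (names hypothetical): after activation, the new primary creates a lease for every tracked copy that is still missing one.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of creating missing leases after primary activation: compare the set
// of tracked shard copies against the set of copies that already hold a
// peer-recovery retention lease.
final class MissingLeaseBackfill {
    static Set<String> leasesToCreate(Set<String> trackedCopies, Set<String> existingLeaseIds) {
        Set<String> missing = new HashSet<>(trackedCopies);
        missing.removeAll(existingLeaseIds);
        return missing; // create a lease for each of these as soon as possible
    }

    public static void main(String[] args) {
        Set<String> tracked = Set.of("node-1", "node-2", "node-3");
        Set<String> leased = Set.of("node-1"); // pre-7.4 copies carried no PRRLs
        System.out.println(leasesToCreate(tracked, leased)); // [node-2, node-3] (order may vary)
    }
}
```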
The cluster in the full-cluster restart test only has 2 nodes, so we cannot
fully allocate an index with 2 replicas.

Today PRRLs are not supported on closed indices or indices where soft deletes
are disabled, yet (confusingly) they are not actively forbidden either. This
commit avoids creating them unnecessarily in unsupported situations.
dnhatn pushed a commit to dnhatn/elasticsearch that referenced this pull request Aug 12, 2019
Now that elastic#45136 means we perform recoveries from the index rather than the
translog (if soft deletes are enabled), there is no need to retain extra
translog for performing peer recoveries. This commit reduces the default
translog retention to zero so that it can be discarded more quickly.
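A sketch of the retention-policy change described above (function and parameter names hypothetical):

```java
// With soft deletes enabled the translog no longer needs to retain history for
// peer recoveries, so the configured retention is effectively treated as zero.
final class TranslogRetention {
    static long effectiveRetentionBytes(boolean softDeletesEnabled, long configuredBytes) {
        return softDeletesEnabled ? 0L : configuredBytes;
    }

    public static void main(String[] args) {
        System.out.println(effectiveRetentionBytes(true, 512L * 1024 * 1024));  // 0
        System.out.println(effectiveRetentionBytes(false, 512L * 1024 * 1024)); // 536870912
    }
}
```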
dnhatn added a commit that referenced this pull request Aug 21, 2019
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase the chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft deletes are enabled so we can discard the translog more
quickly.

Co-authored-by: David Turner <david.turner@elastic.co>

Relates #45136
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Aug 22, 2019
Since elastic#45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase the chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft deletes are enabled so we can discard the translog more
quickly.

Co-authored-by: David Turner <david.turner@elastic.co>

Relates elastic#45136
dnhatn added a commit that referenced this pull request Aug 22, 2019
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase the chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft deletes are enabled so we can discard the translog more
quickly.

Backport of #45473
Relates #45136
dnhatn added a commit that referenced this pull request Nov 21, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
adds that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates #45136

Co-authored-by: David Turner <david.turner@elastic.co>
jimczi pushed a commit to jimczi/elasticsearch that referenced this pull request Nov 22, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
adds that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates elastic#45136

Co-authored-by: David Turner <david.turner@elastic.co>
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Nov 24, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
adds that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates elastic#45136

Co-authored-by: David Turner <david.turner@elastic.co>
dnhatn added a commit that referenced this pull request Dec 13, 2019
Since 7.4, we switched from the translog to Lucene as the source of history
for peer recoveries. However, this reduces the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4, because existing copies do not have PRRLs.

To remedy this issue, we fall back to using the translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease
and the replication group hasn't fully migrated to PRRLs.

Relates #45136
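An illustrative sketch of the fallback decision (names hypothetical, not the actual Elasticsearch API):

```java
// Prefer Lucene history, but replay from the translog when the recovering
// replica has no peer-recovery retention lease and the replication group has
// not yet fully migrated to PRRLs.
final class HistorySourceChooser {
    enum Source { LUCENE_INDEX, TRANSLOG }

    static Source chooseHistorySource(boolean replicaHasLease, boolean groupFullyMigratedToPrrl) {
        if (replicaHasLease == false && groupFullyMigratedToPrrl == false) {
            return Source.TRANSLOG; // pre-7.4 copy: its history may only exist in the translog
        }
        return Source.LUCENE_INDEX;
    }

    public static void main(String[] args) {
        System.out.println(chooseHistorySource(false, false)); // TRANSLOG
        System.out.println(chooseHistorySource(true, false));  // LUCENE_INDEX
    }
}
```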
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Dec 15, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
adds that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates elastic#45136

Co-authored-by: David Turner <david.turner@elastic.co>
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Dec 15, 2019
Since 7.4, we switched from the translog to Lucene as the source of history
for peer recoveries. However, this reduces the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4, because existing copies do not have PRRLs.

To remedy this issue, we fall back to using the translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease
and the replication group hasn't fully migrated to PRRLs.

Relates elastic#45136
dnhatn added a commit that referenced this pull request Dec 15, 2019
Today we do not use retention leases in peer recovery for closed indices
because we can't sync retention leases on closed indices. This change
adds that ability and adjusts peer recovery to use retention leases
for all indices with soft-deletes enabled.

Relates #45136

Co-authored-by: David Turner <david.turner@elastic.co>
dnhatn added a commit that referenced this pull request Dec 15, 2019
Since 7.4, we switched from the translog to Lucene as the source of history
for peer recoveries. However, this reduces the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4, because existing copies do not have PRRLs.

To remedy this issue, we fall back to using the translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease
and the replication group hasn't fully migrated to PRRLs.

Relates #45136
dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Dec 16, 2019
Since 7.4, we switched from the translog to Lucene as the source of history
for peer recoveries. However, this reduces the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4, because existing copies do not have PRRLs.

To remedy this issue, we fall back to using the translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease
and the replication group hasn't fully migrated to PRRLs.

Relates elastic#45136
dnhatn added a commit that referenced this pull request Dec 16, 2019
Since 7.4, we switched from the translog to Lucene as the source of history
for peer recoveries. However, this reduces the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4, because existing copies do not have PRRLs.

To remedy this issue, we fall back to using the translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease
and the replication group hasn't fully migrated to PRRLs.

Relates #45136
dnhatn added a commit that referenced this pull request Dec 20, 2019
…50351)

Today, the replica allocator uses peer recovery retention leases to
select the best-matched copies when allocating replicas of indices with
soft-deletes. We can employ this mechanism for indices without
soft-deletes because the retaining sequence number of a PRRL is the
persisted global checkpoint (plus one) of that copy. If the primary and
replica have the same retaining sequence number, then we should be able
to perform a noop recovery. The reason is that we must be retaining
translog up to the local checkpoint of the safe commit, which is at most
the global checkpoint of either copy. The only limitation is that we
might not cancel ongoing file-based recoveries with PRRLs for noop
recoveries. We can't make the translog retention policy comply with
PRRLs. We also have this problem with soft-deletes if a PRRL is about to
expire.

Relates #45136
Relates #46959
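A minimal sketch of the allocator check described above (names hypothetical): a PRRL retains from the copy's persisted global checkpoint plus one, so matching retaining sequence numbers identify an in-sync copy.

```java
// If the primary's and a candidate replica's leases retain from the same
// sequence number, the copies hold the same operations and a no-op recovery
// is possible.
final class ReplicaMatcher {
    static long retainingSeqNo(long persistedGlobalCheckpoint) {
        return persistedGlobalCheckpoint + 1;
    }

    static boolean noopRecoveryPossible(long primaryGlobalCheckpoint, long replicaGlobalCheckpoint) {
        return retainingSeqNo(primaryGlobalCheckpoint) == retainingSeqNo(replicaGlobalCheckpoint);
    }

    public static void main(String[] args) {
        System.out.println(noopRecoveryPossible(100L, 100L)); // true: pick this copy
        System.out.println(noopRecoveryPossible(100L, 80L));  // false: needs ops or files
    }
}
```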
dnhatn added a commit that referenced this pull request Dec 24, 2019
…50351)

Today, the replica allocator uses peer recovery retention leases to
select the best-matched copies when allocating replicas of indices with
soft-deletes. We can employ this mechanism for indices without
soft-deletes because the retaining sequence number of a PRRL is the
persisted global checkpoint (plus one) of that copy. If the primary and
replica have the same retaining sequence number, then we should be able
to perform a noop recovery. The reason is that we must be retaining
translog up to the local checkpoint of the safe commit, which is at most
the global checkpoint of either copy. The only limitation is that we
might not cancel ongoing file-based recoveries with PRRLs for noop
recoveries. We can't make the translog retention policy comply with
PRRLs. We also have this problem with soft-deletes if a PRRL is about to
expire.

Relates #45136
Relates #46959
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
Since 7.4, we switched from the translog to Lucene as the source of history
for peer recoveries. However, this reduces the likelihood of
operation-based recoveries when performing a full cluster restart from
pre-7.4, because existing copies do not have PRRLs.

To remedy this issue, we fall back to using the translog in peer recoveries if
the recovering replica does not have a peer recovery retention lease
and the replication group hasn't fully migrated to PRRLs.

Relates elastic#45136
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
…lastic#50351)

Today, the replica allocator uses peer recovery retention leases to
select the best-matched copies when allocating replicas of indices with
soft-deletes. We can employ this mechanism for indices without
soft-deletes because the retaining sequence number of a PRRL is the
persisted global checkpoint (plus one) of that copy. If the primary and
replica have the same retaining sequence number, then we should be able
to perform a noop recovery. The reason is that we must be retaining
translog up to the local checkpoint of the safe commit, which is at most
the global checkpoint of either copy. The only limitation is that we
might not cancel ongoing file-based recoveries with PRRLs for noop
recoveries. We can't make the translog retention policy comply with
PRRLs. We also have this problem with soft-deletes if a PRRL is about to
expire.

Relates elastic#45136
Relates elastic#46959
mfussenegger mentioned this pull request Mar 24, 2020
Labels

:Distributed/Recovery (Anything around constructing a new shard, either from a local or a remote source.), >enhancement, v7.4.0, v8.0.0-alpha1