Bug #1497912: Too weak condition (no < scanned_no) in log copying #110

gl-sergei · 2015-10-12T17:21:30Z


There can be three reasons behind a log block mismatch in the
InnoDB redo log copying loop:

1. We used an incorrect last checkpoint LSN offset in our
   calculations. MySQL / Percona Server 5.6 and Percona Server 5.5
   store this offset in different places in the InnoDB system
   tablespace header.
2. We came across an old log block. We must wait until InnoDB
   flushes this block.
3. The InnoDB redo log is corrupted.

We cannot distinguish between cases 1 and 2 once we are inside the
loop. Instead of deciding which LSN offset to use when we are
already inside the loop, we make this decision before starting to
copy the InnoDB redo log. Since InnoDB flushes the redo log first
and only then updates the tablespace header, the log block pointed
to by the last checkpoint LSN must already be written, so we assume
an old log block cannot be encountered there.

We try to decide which LSN offset to use right after we read the
last checkpoint LSN from the InnoDB system tablespace header. This
happens twice: just before we start log copying and when we are
about to stop it.
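
The description above does not spell out the exact test used to pick the
offset. One plausible approach, sketched below, is to check which candidate
offset lands on a log block whose header number matches the block number
derived from the checkpoint LSN. The function names, the `candidate_offsets`
argument, and the idea of passing the log as an in-memory byte string are
illustrative assumptions, not XtraBackup's actual code; the 512-byte block
size and the block number formula follow InnoDB's 5.5/5.6 log format.

```python
LOG_BLOCK_SIZE = 512  # InnoDB redo log block size in bytes


def lsn_to_block_no(lsn):
    """Block number expected in the header of the block containing lsn."""
    return ((lsn // LOG_BLOCK_SIZE) & 0x3FFFFFFF) + 1


def header_block_no(block):
    """Block number stored in the first 4 bytes, with the flush bit cleared."""
    return int.from_bytes(block[:4], "big") & 0x7FFFFFFF


def choose_checkpoint_offset(log, checkpoint_lsn, candidate_offsets):
    """Return the first candidate offset whose block matches checkpoint_lsn.

    `log` is the raw redo log as bytes; `candidate_offsets` are the byte
    offsets read from the different header locations (hypothetical inputs).
    """
    expected = lsn_to_block_no(checkpoint_lsn)
    for offset in candidate_offsets:
        start = offset - offset % LOG_BLOCK_SIZE  # align to block boundary
        block = log[start:start + LOG_BLOCK_SIZE]
        if len(block) == LOG_BLOCK_SIZE and header_block_no(block) == expected:
            return offset
    return None  # no candidate matches: the log may be corrupted
```

Because the block at the checkpoint LSN must already be on disk, a mismatch
at this point cannot be explained by an old block, which is what makes the
decision safe to take before the copying loop starts.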

The redo log copying loop was too large to be easily maintainable,
so I extracted the inner loop into a separate function.

Bug #1505017: False positive error "The log was not applied to the
intended LSN"

XtraBackup copies the InnoDB redo log up to the point where it has
been written. Sometimes a log record is not yet fully written by the
server when XtraBackup reads it, so the backup might end up with a
partially written last log record.

XtraBackup stores the last checkpoint LSN and the last copied LSN in
the 'xtrabackup_checkpoints' file.
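
For illustration, the two LSNs can be read back with a few lines of Python.
This is a minimal sketch that assumes the file uses a plain `key = value`
layout in which `to_lsn` holds the last checkpoint LSN and `last_lsn` holds
the last copied LSN; the helper name and the sample values are made up.

```python
def parse_checkpoints(text):
    """Parse 'key = value' lines into a dict, converting *_lsn values to int."""
    meta = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = (part.strip() for part in line.split("=", 1))
        meta[key] = int(value) if key.endswith("_lsn") else value
    return meta


# Sample content with made-up LSN values.
sample = """backup_type = full-backuped
from_lsn = 0
to_lsn = 1626007
last_lsn = 1626120
"""
meta = parse_checkpoints(sample)
checkpoint_lsn, last_copied_lsn = meta["to_lsn"], meta["last_lsn"]
```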

At the apply-log stage XtraBackup compares the LSN up to which the
redo log has been applied with the LSN stored in the
'xtrabackup_checkpoints' file.

If we compare the last applied LSN to the last copied LSN stored in
'xtrabackup_checkpoints', the latter might actually be greater
because of a partially written log record at the end of
'xtrabackup_logfile'.

We don't really need to apply redo log up to that LSN. What we want
is to apply redo log up to the last checkpoint LSN.

On the other hand, it would not be easy for us to detect a partially
written log record when we copy the InnoDB redo log, because that
would require parsing records as we copy them, which we don't do.

Given all this, the fix I came up with is to verify that
'xtrabackup_logfile' has been applied at least up to the last
checkpoint LSN instead of the last copied LSN.
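
The relaxed check can be sketched as follows. This is an illustration of the
comparison described above, not the actual patch: the function name, its
arguments, and the informational messages other than the quoted error text
are hypothetical.

```python
def check_applied_lsn(applied_lsn, checkpoint_lsn, last_copied_lsn):
    """Return (ok, message) for the apply-log sanity check.

    The old check effectively required applied_lsn to reach last_copied_lsn;
    the relaxed check only requires it to reach checkpoint_lsn.
    """
    if applied_lsn < checkpoint_lsn:
        # Recovery stopped before the checkpoint: this is a real error.
        return False, "The log was not applied to the intended LSN"
    if applied_lsn < last_copied_lsn:
        # Recovery stopped short of the copied tail, most likely because
        # the last record in the copied log was only partially written.
        return True, f"log tail past LSN {applied_lsn} was not applied"
    return True, "log applied up to the last copied LSN"
```

With `checkpoint_lsn=1000` and `last_copied_lsn=1200`, an applied LSN of 1100
passes this check, whereas a strict comparison against the last copied LSN
would have reported a false positive.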


gl-sergei · 2015-10-12T17:21:56Z

http://jenkins.percona.com/view/PXB%202.2/job/percona-xtrabackup-2.2-param/343/

Bug #1497912: Too weak condition (no < scanned_no) in log copying

gl-sergei added a commit that referenced this pull request Oct 14, 2015

Merge pull request #110 from gl-sergei/2.2-xb-bug1497912

d7d1e7d

Bug #1497912: Too weak condition (no < scanned_no) in log copying

gl-sergei merged commit d7d1e7d into percona:2.2 Oct 14, 2015

gl-sergei deleted the 2.2-xb-bug1497912 branch November 24, 2016 01:18
