Bug #1497912: Too weak condition (no < scanned_no) in log copying #110

Merged
merged 1 commit into from Oct 14, 2015

gl-sergei

There can be three reasons behind the log block mismatch in the
InnoDB redo log copying loop.

  1. We used an incorrect last checkpoint LSN offset in our
    calculations. MySQL / Percona Server 5.6 and Percona Server 5.5
    store this offset in different places in the InnoDB system
    tablespace header.
  2. We came across an old log block. We must wait until InnoDB
    flushes this block.
  3. The InnoDB redo log is corrupted.

We cannot distinguish between cases 1 and 2 when we are inside the
loop. Instead of deciding which LSN offset to use when we are already
inside the loop, we make this decision before starting to copy the
InnoDB redo log. Since InnoDB flushes the redo log first and only
then updates the tablespace header, the log block pointed to by the
last checkpoint LSN must already be written, so we assume that an old
log block cannot be encountered there.

We decide which LSN offset to use right after we read the last
checkpoint LSN from the InnoDB system tablespace header. This
happens twice: just before we start log copying and when we are about
to stop it.
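
The decision described above can be sketched roughly as follows. This
is only an illustration, not the code from this patch: `choose_offset`,
`read_block` and the candidate-offset list are hypothetical names, and
the sketch assumes the usual InnoDB layout of 512-byte log blocks whose
first four header bytes hold the block number with the high bit used
as a flush flag.

```python
LOG_BLOCK_SIZE = 512         # assumed InnoDB log block size
BLOCK_NO_MASK = 0x7FFFFFFF   # clears the flush flag in the header number


def expected_block_no(lsn):
    """Block number expected on the block that contains `lsn`."""
    return ((lsn // LOG_BLOCK_SIZE) & 0x3FFFFFFF) + 1


def stored_block_no(block):
    """Block number read from the first 4 header bytes of a log block."""
    return int.from_bytes(block[:4], "big") & BLOCK_NO_MASK


def choose_offset(read_block, checkpoint_lsn, candidate_offsets):
    """Pick the candidate offset whose block matches the checkpoint LSN.

    `read_block(pos)` is a hypothetical callback returning the log block
    that starts at byte position `pos`. Returns None when no candidate
    matches, which the caller would treat as a corrupted log.
    """
    want = expected_block_no(checkpoint_lsn)
    for offset in candidate_offsets:
        block_start = offset - offset % LOG_BLOCK_SIZE
        if stored_block_no(read_block(block_start)) == want:
            return offset
    return None
```

Because the block at the checkpoint LSN is already on disk, a mismatch
at this point can only mean a wrong offset or corruption, which is why
the choice is made here and not inside the copying loop.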

The redo log copying loop was too large to be easily maintainable. I
extracted the inner loop into a separate function.

Bug #1505017: False positive error "The log was not applied to the
intended LSN"

XtraBackup copies the InnoDB redo log up to where it has been
written. Sometimes a log record isn't fully written by the server by
the time XtraBackup accesses it, and the backup might end up with the
last log record partially written.

XtraBackup stores the last checkpoint LSN and the last copied LSN in
the 'xtrabackup_checkpoints' file.

At the apply-log stage XtraBackup compares the LSN up to which the
redo log has been applied with the LSN stored in the
'xtrabackup_checkpoints' file.

If we compare the last applied LSN to the last copied LSN stored in
'xtrabackup_checkpoints', the latter might actually be greater because
of a partially written log record at the end of the
'xtrabackup_logfile'.

We don't really need to apply the redo log up to that LSN. What we
want is to apply the redo log up to the last checkpoint LSN.

On the other hand, it would not be easy for us to detect a partially
written log record when we copy the InnoDB redo log, because it would
require us to parse records as we copy them, which we don't do.

Given all this, the fix I came up with is to verify that the
'xtrabackup_logfile' has been applied at least up to the last
checkpoint LSN instead of the last copied LSN.
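
A minimal sketch of the relaxed check, for illustration only. The
helper names are hypothetical, and the sketch assumes that
'xtrabackup_checkpoints' holds `key = value` lines in which `to_lsn` is
the last checkpoint LSN and `last_lsn` is the last copied LSN.

```python
def parse_checkpoints(text):
    """Parse `key = value` lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def log_applied_far_enough(applied_lsn, checkpoint_lsn):
    """Relaxed check: applying up to the last checkpoint LSN is enough."""
    return applied_lsn >= checkpoint_lsn


# Hypothetical values: the last record in the copied log is only
# partially written, so recovery stops before the last copied LSN.
meta = parse_checkpoints("from_lsn = 0\nto_lsn = 1000\nlast_lsn = 1040\n")
applied_lsn = 1012

strict_ok = applied_lsn == int(meta["last_lsn"])   # the false positive
relaxed_ok = log_applied_far_enough(applied_lsn, int(meta["to_lsn"]))
```

With these made-up numbers the strict comparison rejects a backup that
the relaxed comparison accepts, which is the false positive described
in this bug.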

gl-sergei added a commit that referenced this pull request Oct 14, 2015
Bug #1497912: Too weak condition (no < scanned_no) in log copying
@gl-sergei gl-sergei merged commit d7d1e7d into percona:2.2 Oct 14, 2015
@gl-sergei gl-sergei deleted the 2.2-xb-bug1497912 branch November 24, 2016 01:18