BUG#17943188 SHOW SLAVE STATUS/RETRIEVED_GTID_SET MAY HAVE PARTIAL TRX
             OR MISS COMPLETE TRX

Problem:
=======

The SHOW SLAVE STATUS command contains the column RETRIEVED_GTID_SET.
This is supposed to contain the set of GTIDs that exist in the relay
log. However, the field is updated when the slave receiver thread
(I/O thread) receives a Gtid_log_event, which happens at the beginning
of the transaction.

If the I/O thread gets disconnected in the middle of a transaction,
RETRIEVED_GTID_SET can contain the GTID of a transaction that was only
partially written to the relay log. Such a transaction is
subsequently rolled back, so reporting it as retrieved is wrong.

Typical fail-over algorithms use RETRIEVED_GTID_SET to determine which
slave has received the most transactions and should be promoted to
master. The mysqlfailover utility, for example, works this way.

When RETRIEVED_GTID_SET can contain partially transmitted transactions,
the fail-over utility can choose the wrong slave to promote. This can
lead to data corruption later.
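
To illustrate the failure mode, here is a minimal sketch of a fail-over
choice that simply counts retrieved GTIDs. The function pick_candidate
and the slave names are hypothetical, and this is not the actual
mysqlfailover algorithm. It only shows how one partially received
transaction can make two slaves look equally up to date.

```python
def pick_candidate(retrieved_by_slave):
    """Return the slave whose reported GTID set is largest.

    retrieved_by_slave maps a (hypothetical) slave name to the set of
    transaction numbers it reports as retrieved. On a tie, max() keeps
    the first slave in iteration order.
    """
    return max(retrieved_by_slave, key=lambda name: len(retrieved_by_slave[name]))


# What each slave reports: slave_b counts transaction 3 although only
# part of its events reached the relay log before the disconnect.
reported = {"slave_b": {1, 2, 3}, "slave_a": {1, 2, 3}}

# What each relay log really holds as complete transactions.
complete = {"slave_b": {1, 2}, "slave_a": {1, 2, 3}}

print(pick_candidate(reported))  # slave_b: the tie hides the partial transaction
print(pick_candidate(complete))  # slave_a: the slave that should be promoted
```

If slave_b is promoted in this scenario, transaction 3 is lost even
though slave_a received it completely.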

This means that even if semi-sync is enabled, transactions that have
been acknowledged by one slave can be lost.

Fix:
===

A transaction boundary parser was implemented. It reports the
transaction boundaries of an event stream based on the event types
and, for Query_log_event, on the query text.
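
The idea can be sketched as a small state machine. This is a simplified
illustration, not the code in sql/rpl_trx_boundary_parser.cc: the state
names, the event-type strings and the functions next_state and run are
assumptions made for the example, and several event types the real
parser must handle are left out.

```python
from enum import Enum


class State(Enum):
    NONE = "none"  # outside any transaction
    GTID = "gtid"  # a GTID event was seen, the body has not started yet
    DML = "dml"    # inside a BEGIN ... COMMIT/XID block


def next_state(state, event_type, query=None):
    """Return the parser state after consuming one event."""
    if event_type == "gtid":
        return State.GTID
    if event_type == "xid":
        return State.NONE
    if event_type == "query":
        text = (query or "").strip().upper()
        if text == "BEGIN":
            return State.DML
        if text in ("COMMIT", "ROLLBACK"):
            return State.NONE
        # Any other statement: inside BEGIN it belongs to the body,
        # otherwise (e.g. a DDL right after its GTID) it is complete.
        return State.DML if state is State.DML else State.NONE
    # Other events (table map, rows, user variables, ...) are assumed
    # not to move a boundary in this sketch.
    return state


def run(events, state=State.NONE):
    """Feed a list of (event_type[, query]) tuples through the parser."""
    for event in events:
        state = next_state(state, *event)
    return state


partial = [("gtid",), ("query", "BEGIN"), ("table_map",)]
print(run(partial))               # State.DML: still inside the transaction
print(run(partial + [("xid",)]))  # State.NONE: transaction complete
```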

As the I/O thread queues events, it feeds them to the transaction
boundary parser in Master_info. The slave I/O recovery also uses the
transaction parser to determine whether a given GTID can be added to
the Retrieved_Gtid_Set.

When the event parser is in GTID state because a Gtid_log_event was
queued, the event's GTID isn't added to the retrieved list yet.
It is stored in an auxiliary GTID variable.

After flushing an event into the relay log, the I/O thread checks
whether the transaction parser is no longer inside a transaction
(meaning that the last event of the transaction has been flushed).

If the transaction parser is outside a transaction, the I/O thread
checks whether a GTID was stored at the start of the transaction and,
if so, adds it to the retrieved list. This ensures that the whole
transaction has arrived and was flushed to the relay log.
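
The deferred update described above can be sketched as follows. The
class IoThreadSketch and its attributes are hypothetical, and the
boundary detection is reduced to "a gtid event opens a transaction, an
xid event closes it", which is an assumption of this example rather
than the behaviour of the real parser.

```python
class IoThreadSketch:
    """Toy model of the receiver thread's GTID bookkeeping."""

    def __init__(self):
        self.relay_log = []       # events flushed so far
        self.retrieved = []       # GTIDs of completely received transactions
        self.pending_gtid = None  # auxiliary GTID of the transaction in flight
        self.in_trx = False       # stand-in for the boundary parser's answer

    def queue_event(self, kind, gtid=None):
        # Update the (simplified) boundary state first.
        if kind == "gtid":
            self.pending_gtid = gtid  # remember it, do not publish it yet
            self.in_trx = True
        elif kind == "xid":
            self.in_trx = False       # last event of the transaction

        # "Flush" the event into the relay log.
        self.relay_log.append(kind)

        # Only after the flush, and only outside a transaction, publish.
        if not self.in_trx and self.pending_gtid is not None:
            self.retrieved.append(self.pending_gtid)
            self.pending_gtid = None


io = IoThreadSketch()
for kind, gtid in [("gtid", "uuid:1"), ("begin", None), ("rows", None)]:
    io.queue_event(kind, gtid)
print(io.retrieved)  # []: the transaction is only partially received

# After a reconnect the whole transaction is received again.
for kind, gtid in [("gtid", "uuid:1"), ("begin", None), ("rows", None), ("xid", None)]:
    io.queue_event(kind, gtid)
print(io.retrieved)  # ['uuid:1']
```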

Also, before this patch, after the I/O thread flushed a single received
event into the relay log, the relay log could be rotated if the
current relay log file size exceeded max_binlog_size/max_relay_log_size.
After this patch, when GTIDs are enabled, this rotation by size is only
allowed if the transaction parser is not in the middle of a transaction.
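
As a rough sketch of the new rotation rule (the function
may_rotate_by_size and its parameters are hypothetical names for this
example, not identifiers from the patch):

```python
def may_rotate_by_size(current_size, max_size, gtid_mode_on, in_transaction):
    """Decide whether the relay log may be rotated after a flush."""
    if current_size <= max_size:
        return False  # size limit not exceeded, nothing to do
    if gtid_mode_on and in_transaction:
        return False  # keep the whole transaction in one relay log file
    return True


# Size limit exceeded while a transaction is still being received.
print(may_rotate_by_size(2048, 1024, gtid_mode_on=True, in_transaction=True))   # False
# Same size, but the last event of the transaction has been flushed.
print(may_rotate_by_size(2048, 1024, gtid_mode_on=True, in_transaction=False))  # True
```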

Note: This patch removes the changes made for BUG#17280176, which
      dealt with a similar problem in a different way.
Joao Gramacho committed Jan 16, 2015
1 parent 25d1855 commit 9dab9da
Showing 42 changed files with 5,686 additions and 110 deletions.
1 change: 1 addition & 0 deletions libmysqld/CMakeLists.txt
@@ -95,6 +95,7 @@ SET(SQL_EMBEDDED_SOURCES
   ../sql/rpl_gtid_persist.cc
   ../sql/rpl_table_access.cc
   ../sql/rpl_context.cc
+  ../sql/rpl_trx_boundary_parser.cc
   ${IMPORTED_SOURCES}
 )

28 changes: 23 additions & 5 deletions mysql-test/extra/rpl_tests/grep_pattern.inc
@@ -3,23 +3,41 @@
 --perl
   use strict;
   my $verbose= $ENV{'GREP_PRINT_NOT_VERBOSE'} ? 0 : 1;
+  my $last_sequence_after= $ENV{'GREP_LAST_SEQUENCE_AFTER'};
   my $file= $ENV{'GREP_FILE'} or die "grep file not set";
   my $pattern= $ENV{'GREP_PATTERN'} or die "pattern is not set";
   open(FILE, "$file") or die("Unable to open $file: $!\n");
   my $count = 0;
-  print "Matching lines are:\n";
+  my $output = "";
   while (<FILE>) {
     my $line = $_;
+    if ($last_sequence_after && $line =~ /$last_sequence_after/) {
+      $output = "";
+      $count = 0;
+    }
     if ($line =~ /$pattern/) {
       if ($verbose == 1) {
-        print "$line\n";
+        if ($ENV{'GREP_FIND'}) {
+          $line=~ s/$ENV{'GREP_FIND'}/$ENV{'GREP_REPLACE'}/;
+        }
+        if ($ENV{'GREP_NO_NEWLINE'}) {
+          $output= $output . $line;
+        } else {
+          $output= $output . $line . "\n";
+        }
       }
       $count++;
     }
   }
-  if ($count == 0) {
-    print "None\n";
+  unless($ENV{'GREP_SKIP_HEADER'}) {
+    print "Matching lines are:\n";
+  }
+  print $output;
+  unless($ENV{'GREP_SKIP_FOOTER'}) {
+    if ($count == 0) {
+      print "None\n";
+    }
+    print "Occurrences of the $pattern in the input file : $count\n";
   }
-  print "Occurrences of the $pattern in the input file : $count\n";
   close(FILE);
 EOF
44 changes: 44 additions & 0 deletions mysql-test/extra/rpl_tests/rpl_trx_boundary_parser.inc
@@ -0,0 +1,44 @@
# ==== Purpose ====
#
# This include inserts data into a table on the master while varying the
# debug sync point on the slave that is used to stop the IO thread in the
# middle of a transaction's event stream (trying to leave partial
# transactions in the relay log).
#
# It will do this task (insert some data) twice.
#
# The first time with the SQL thread stopped, letting the IO thread do its job
# until all data is replicated, starting the SQL only at the end of the test.
#
# The second time, the SQL thread will be running all the time, syncing on each
# step of the test.
#
# ==== Usage ====
#
# [--let $storage_engine= InnoDB | MyISAM]
# --source extra/rpl_tests/rpl_trx_boundary_parser.inc
#
# Parameters:
# $storage_engine
# The storage engine that will be used in the CREATE TABLE statement.
# If not specified, InnoDB will be used.
#

if ( `SELECT '$storage_engine' != '' AND UPPER('$storage_engine') <> 'INNODB' AND UPPER('$storage_engine') <> 'MYISAM'` )
{
--die ERROR IN TEST: invalid value for mysqltest variable 'storage_engine': $storage_engine
}

--echo ## Running the test with the SQL thread stopped
--source include/rpl_connection_slave.inc
--source include/stop_slave_sql.inc
--source extra/rpl_tests/rpl_trx_boundary_parser_all_steps.inc

--echo ## Starting and syncing the SQL thread before next round
--source include/rpl_connection_slave.inc
--source include/start_slave_sql.inc
--source include/rpl_connection_master.inc
--source include/sync_slave_sql_with_master.inc

--echo ## Running the test with the SQL thread started
--source extra/rpl_tests/rpl_trx_boundary_parser_all_steps.inc
252 changes: 252 additions & 0 deletions mysql-test/extra/rpl_tests/rpl_trx_boundary_parser_all_steps.inc
@@ -0,0 +1,252 @@
# ==== Purpose ====
#
# This include inserts data into a table on the master while varying the
# debug sync point on the slave that is used to stop the IO thread in the
# middle of a transaction's event stream (trying to leave partial
# transactions in the relay log).
#
# ==== Usage ====
#
# [--let $storage_engine= InnoDB | MyISAM]
# --source extra/rpl_tests/rpl_trx_boundary_parser_all_steps.inc
#
# Parameters:
# $storage_engine
# The storage engine that will be used in the CREATE TABLE statement.
# If not specified, InnoDB will be used.
#

if (!$storage_engine)
{
--let $_storage_engine= INNODB
}
if ($storage_engine)
{
--let $_storage_engine= `SELECT UPPER('$storage_engine')`
}
if ( `SELECT '$_storage_engine' <> 'INNODB' AND '$_storage_engine' <> 'MYISAM'` )
{
--die ERROR IN TEST: invalid value for mysqltest variable 'storage_engine': $storage_engine
}

# Check if SQL thread is running
--source include/rpl_connection_slave.inc
--let $_is_sql_thread_running= query_get_value(SHOW SLAVE STATUS, Slave_SQL_Running, 1)

# If the SQL thread is stopped, we will assert GTIDs based on
# Retrieved_Gtid_Set
if ( $_is_sql_thread_running == No )
{
--let $assert_on_retrieved_gtid_set= 1
--let $gtid_step_assert_include=include/gtid_step_assert_on_retrieved.inc
--let $gtid_step_reset_include=include/gtid_step_reset_on_retrieved.inc
}
if ( $_is_sql_thread_running == Yes )
{
--let $assert_on_retrieved_gtid_set= 0
--let $gtid_step_assert_include=include/gtid_step_assert.inc
--let $gtid_step_reset_include=include/gtid_step_reset.inc
}

--source include/rpl_connection_master.inc
# GTID steps will be based on master's UUID
--let $gtid_step_uuid= `SELECT @@GLOBAL.SERVER_UUID`
--source include/rpl_connection_slave.inc
--source $gtid_step_reset_include

# Creating tables t1 and t2 using $_storage_engine
# Table t1 will log the testcase activity
# Table t2 will be used to insert data to be tested
--source include/rpl_connection_master.inc
--eval CREATE TABLE t1 (i INT NOT NULL AUTO_INCREMENT PRIMARY KEY, info VARCHAR(64)) ENGINE=$_storage_engine
--eval CREATE TABLE t2 (i INT) ENGINE=$_storage_engine

#
# First, we insert some data, restart the slave IO thread and
# sync slave SQL thread (if it is running) with master
# as a normal case just for control.
#

# Insert data without splitting transactions in the relay log
INSERT INTO t1 (info) VALUE ('Insert data without splitting transactions in the relay log');

BEGIN;
INSERT INTO t2 (i) VALUES (-6);
INSERT INTO t2 (i) VALUES (-5);
INSERT INTO t2 (i) VALUES (-4);
COMMIT;

# Check if SQL thread was running before to sync it
if ( $_is_sql_thread_running == Yes )
{
# Sync SQL thread
--source include/rpl_connection_master.inc
--source include/sync_slave_sql_with_master.inc
--let diff_tables= master:t1, slave:t1
--source include/diff_tables.inc
}
# Else we sync only the IO thread
if ( $_is_sql_thread_running == No )
{
# Sync IO thread
--source include/rpl_connection_master.inc
--source include/sync_slave_io_with_master.inc
}

# Restart the IO thread not in the middle of transaction
--source include/rpl_connection_slave.inc
--source include/stop_slave_io.inc
--source include/start_slave_io.inc

# Check if the IO thread retrieved the correct amount of GTIDs
--source include/rpl_connection_slave.inc
--let $gtid_step_count= 4
if ($_storage_engine == 'MYISAM')
{
--let $gtid_step_count= 6
}
--source $gtid_step_assert_include

#
# Second, we make master rotate its binlog
#

# Insert data rotating master binlog between two transactions
--source include/rpl_connection_master.inc
INSERT INTO t1 (info) VALUE ('Insert data rotating master binlog between two transactions');

BEGIN;
INSERT INTO t2 (i) VALUES (-3);
INSERT INTO t2 (i) VALUES (-2);
COMMIT;
FLUSH LOGS;
INSERT INTO t1 (info) VALUE ('After FLUSH LOGS at master');
BEGIN;
INSERT INTO t2 (i) VALUES (-1);
INSERT INTO t2 (i) VALUES (0);
COMMIT;

# Check if SQL thread was running before to sync it
if ( $_is_sql_thread_running == Yes )
{
# Sync SQL thread
--source include/rpl_connection_master.inc
--source include/sync_slave_sql_with_master.inc
--let diff_tables= master:t1, slave:t1
--source include/diff_tables.inc
}
# Else we sync only the IO thread
if ( $_is_sql_thread_running == No )
{
# Sync IO thread
--source include/rpl_connection_master.inc
--source include/sync_slave_io_with_master.inc
}

# Restart the IO thread again, not in the middle of transaction
--source include/rpl_connection_slave.inc
--source include/stop_slave_io.inc
--source include/start_slave_io.inc

# Check if the IO thread retrieved the correct amount of GTIDs
--source include/rpl_connection_slave.inc
--let $gtid_step_count= 4
if ($_storage_engine == 'MYISAM')
{
# We will expect a different amount of GTIDs, as the non-transactional
# storage engine will "ignore" the BEGIN/COMMIT boundaries and will
# generate one transaction for each INSERT statement.
--let $gtid_step_count= 6
}
--source $gtid_step_assert_include

#
# Third, let's go with splitting transactions
#

--let $info_table= t1
--let $table= t2
--let $counter= 0

# Stop after GTID, just if GTIDs are enabled
--inc $counter
--let $debug_point= stop_io_after_reading_gtid_log_event
--let $gtids_after_stop= 1
--let $gtids_after_sync= 2
if ($_storage_engine == 'MYISAM')
{
--let $gtids_after_sync= 3
}
--source extra/rpl_tests/rpl_trx_boundary_parser_one_step.inc

# Stop after BEGIN query
--inc $counter
--let $debug_point= stop_io_after_reading_query_log_event
--let $gtids_after_stop= 1
--let $gtids_after_sync= 2
if ($_storage_engine == 'MYISAM')
{
--let $gtids_after_sync= 3
}
--source extra/rpl_tests/rpl_trx_boundary_parser_one_step.inc

# Stop after USER_VAR, just for SBR
if ( `SELECT @@GLOBAL.binlog_format = 'STATEMENT'` )
{
--inc $counter
--let $debug_point= stop_io_after_reading_user_var_log_event
--let $gtids_after_stop= 1
--let $gtids_after_sync= 2
if ($_storage_engine == 'MYISAM')
{
--let $gtids_after_stop= 2
--let $gtids_after_sync= 2
}
--source extra/rpl_tests/rpl_trx_boundary_parser_one_step.inc
}

# Stop after TABLE_MAP, just for RBR
if ( `SELECT @@GLOBAL.binlog_format = 'ROW'` )
{
--inc $counter
--let $debug_point= stop_io_after_reading_table_map_event
--let $gtids_after_stop= 1
--let $gtids_after_sync= 2
if ($_storage_engine == 'MYISAM')
{
--let $gtids_after_stop= 1
--let $gtids_after_sync= 3
}
--source extra/rpl_tests/rpl_trx_boundary_parser_one_step.inc
}

# Stop after XID, just for InnoDB tables
if ( $_storage_engine == 'INNODB' )
{
--inc $counter
--let $debug_point= stop_io_after_reading_xid_log_event
--let $gtids_after_stop= 2
--let $gtids_after_sync= 1
--source extra/rpl_tests/rpl_trx_boundary_parser_one_step.inc
}

# Check if SQL thread was running before to sync it
if ( $_is_sql_thread_running == Yes )
{
# Sync SQL thread
--source include/rpl_connection_master.inc
--source include/sync_slave_sql_with_master.inc
--let diff_tables= master:t1, slave:t1
--source include/diff_tables.inc
}

# Dropping tables t1 and t2
--source include/rpl_connection_master.inc
DROP TABLE t1,t2;

# Check if SQL thread was running before to sync it
if ( $_is_sql_thread_running == Yes )
{
# Let the slave sync with the master before exiting the include
--source include/sync_slave_sql_with_master.inc
}