fix missing big transaction with GTIDs
Summary:
This diff ports Oracle's patch for missing big transactions
with GTIDs. It also includes a fix for the same problem
when parallel replication is enabled.

The I/O thread may receive only part of a transaction before
it is stopped with STOP SLAVE. This leaves a partial transaction
with a GTID logged in the relay log. When the slave is restarted,
it misses the transaction because the GTID protocol assumes that every GTID
logged in the relay log belongs to a complete transaction. The fix removes the
last GTID in the relay log from gtid_retrieved_set, which causes the master to
resend the whole transaction.
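
The effect of trimming the retrieved set can be illustrated with a small model. This is only a sketch, not the server code: the function names, the `trim_last` flag, and the `uuid:N` strings are hypothetical, and it assumes the master simply sends every transaction whose GTID the slave did not report as known.

```python
def transactions_to_send(master_binlog, slave_known):
    """Master side (simplified): send every GTID the slave did not report."""
    return [gtid for gtid in master_binlog if gtid not in slave_known]


def retrieved_set_for_restart(retrieved, last_retrieved, trim_last):
    """Slave side (simplified): optionally drop the last retrieved GTID."""
    known = set(retrieved)
    if trim_last:
        known.discard(last_retrieved)
    return known


master_binlog = ["uuid:1", "uuid:2", "uuid:3"]
# uuid:2 was only partially written to the relay log before STOP SLAVE,
# but its GTID event is already there, so it counts as "retrieved".
retrieved = {"uuid:1", "uuid:2"}

before_fix = transactions_to_send(
    master_binlog, retrieved_set_for_restart(retrieved, "uuid:2", trim_last=False))
after_fix = transactions_to_send(
    master_binlog, retrieved_set_for_restart(retrieved, "uuid:2", trim_last=True))

print(before_fix)  # ['uuid:3'], so uuid:2 is never completed on the slave
print(after_fix)   # ['uuid:2', 'uuid:3'], so the whole transaction is resent
```

Without the trim, the master skips uuid:2 and the slave never sees the rest of it; with the trim, the transaction arrives again in full.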

Possible cases:
1) If there is a partial transaction, the whole transaction is retrieved
again into the next relay log, which will be executed by the SQL thread. The
SQL thread rolls back the partial transaction after seeing the FDE (format
description event) in the next relay log and starts executing the same
transaction, which was retrieved again. In MTS mode, the SQL thread appends a
ROLLBACK query to the queue of the slave worker that got the partial
transaction.

2) The I/O thread has already retrieved the full transaction and the SQL thread
has already executed it. In that case, we do not remove the last
retrieved GTID from "Retrieved_gtid_set"; otherwise we would see gaps in
the retrieved set.

3) The I/O thread retrieved the full transaction the first time, and the SQL
thread had not applied it yet when the dump was requested but applied
it after the I/O thread started receiving events from the master. In this case,
retrieving the same transaction again causes no problem because the GTID
is the same, so the SQL thread will not commit it again.
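
Cases 1 and 3 above can be sketched from the SQL thread's point of view. This is a simplified single-threaded model under stated assumptions, not the actual applier: the event kinds ("FDE", "GTID", "ROW", "COMMIT") and the function name are hypothetical, and the MTS path that queues a ROLLBACK for a worker is not modeled. Case 2 needs no applier logic, because the GTID stays in the retrieved set and the master does not resend the transaction.

```python
def apply_relay_logs(events, executed=None):
    """Replay (kind, value) events and return (rows, executed GTID set).

    "FDE" starts a new relay log, "GTID" begins a transaction,
    "ROW" is a pending insert, "COMMIT" ends the transaction.
    """
    executed = set(executed or ())
    rows, pending, current, skipping = [], [], None, False
    for kind, value in events:
        if kind == "FDE":
            # New relay log while a transaction is open: roll it back.
            pending, current, skipping = [], None, False
        elif kind == "GTID":
            current, pending = value, []
            # A GTID that is already committed is skipped, not re-applied.
            skipping = value in executed
        elif kind == "ROW":
            if not skipping:
                pending.append(value)
        elif kind == "COMMIT":
            if not skipping and current is not None:
                rows.extend(pending)
                executed.add(current)
            pending, current, skipping = [], None, False
    return rows, executed


# Case 1: uuid:2 is cut off, then retrieved again in the next relay log.
case1 = apply_relay_logs([
    ("FDE", None),
    ("GTID", "uuid:1"), ("ROW", 1), ("COMMIT", None),
    ("GTID", "uuid:2"), ("ROW", 2),            # partial, no COMMIT
    ("FDE", None),
    ("GTID", "uuid:2"), ("ROW", 2), ("COMMIT", None),
])

# Case 3: uuid:1 is complete in the first relay log and retrieved again.
case3 = apply_relay_logs([
    ("FDE", None),
    ("GTID", "uuid:1"), ("ROW", 1), ("COMMIT", None),
    ("FDE", None),
    ("GTID", "uuid:1"), ("ROW", 1), ("COMMIT", None),
])

print(case1[0])  # [1, 2]
print(case3[0])  # [1]
```

In both cases each row ends up applied exactly once, which matches the expected `select * from t1` output in the test scenarios.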

Please note that partial transactions will still be written to the relay log,
but they cause no problem for transactional tables. For non-transactional
tables, however, a partial transaction will create an inconsistency
between master and slave, and users need to check manually.
This is not a problem for us since we use transactional tables.

Test Plan:
Added a test to verify all the scenarios with and without MTS. Also
ran mysqltest.sh --parallel=32 with and without valgrind.

Reviewers: steaphan, jtolmer

Reviewed By: steaphan
santoshbanda authored and steaphangreene committed Jun 10, 2014
1 parent 97c9324 commit 9bcc118
Showing 14 changed files with 558 additions and 49 deletions.
150 changes: 150 additions & 0 deletions mysql-test/suite/rpl/r/rpl_gtid_missing_big_event.result
@@ -0,0 +1,150 @@
include/master-slave.inc
Warnings:
Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
Note #### Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
[connection master]
create table t1(a int) engine = innodb;
include/stop_slave.inc
change master to master_auto_position = 1;
include/start_slave.inc
== Testing scenario1 where a partial transaction is written in ==
== relay log and a stop slave; start slave; are executed ==
** Test scenario1 without MTS **
insert into t1 values(1);
set global debug = "d,partial_relay_log_transaction";
insert into t1 values (2);
include/wait_for_slave_io_to_stop.inc
select * from t1;
a
1
set global debug = ``;
include/start_slave_io.inc
select * from t1;
a
1
2
** Test scenario1 with MTS **
include/stop_slave.inc
set @@global.slave_parallel_workers = 2;
include/start_slave.inc
delete from t1;
insert into t1 values(1);
set global debug = "d,partial_relay_log_transaction";
insert into t1 values(2);
include/wait_for_slave_io_to_stop.inc
select * from t1;
a
1
set global debug = ``;
include/start_slave_io.inc
select * from t1;
a
1
2
include/stop_slave.inc
set @@global.slave_parallel_workers = 0;
include/start_slave.inc
== Testing scenario2 where a complete transaction is ==
== retrieved by i/o thread and sql thread executed it ==
** Test scenario2 without MTS **
delete from t1;
insert into t1 values(1);
include/stop_slave.inc
include/start_slave.inc
insert into t1 values(2);
select * from t1;
a
1
2
** Test scenario2 with MTS **
include/stop_slave.inc
set @@global.slave_parallel_workers = 2;
include/start_slave.inc
delete from t1;
insert into t1 values(1);
include/stop_slave.inc
include/start_slave.inc
insert into t1 values(2);
select * from t1;
a
1
2
include/stop_slave.inc
set @@global.slave_parallel_workers = 0;
include/start_slave.inc
== Testing scenario3 where a complete transaction is ==
== retrieved by i/o thread but sql thread didn't execute it ==
== retrieving same transaction here is not a problem since ==
== sql thread just skips if a GTID is already committed ==
** Test scenario3 without MTS **
delete from t1;
include/stop_slave_sql.inc
insert into t1 values(1);
include/sync_slave_io_with_master.inc
include/stop_slave_io.inc
include/start_slave.inc
insert into t1 values(2);
select * from t1;
a
1
2
** Test scenario3 with MTS **
include/stop_slave.inc
set @@global.slave_parallel_workers = 2;
include/start_slave.inc
delete from t1;
include/stop_slave_sql.inc
insert into t1 values(1);
include/sync_slave_io_with_master.inc
include/stop_slave_io.inc
include/start_slave.inc
insert into t1 values(2);
select * from t1;
a
1
2
include/stop_slave.inc
set @@global.slave_parallel_workers = 0;
include/start_slave.inc
== Testing scenario4 where a gtid event is written in ==
== relay log and a stop slave; start slave; are executed ==
** Test scenario4 without MTS **
delete from t1;
insert into t1 values(1);
set global debug = "d,partial_relay_log_transaction_with_only_gtid";
insert into t1 values (2);
include/wait_for_slave_io_to_stop.inc
select * from t1;
a
1
set global debug = ``;
include/start_slave_io.inc
select * from t1;
a
1
2
** Test scenario4 with MTS **
include/stop_slave.inc
set @@global.slave_parallel_workers=2;
include/start_slave.inc
delete from t1;
insert into t1 values(1);
set global debug = "d,partial_relay_log_transaction_with_only_gtid";
insert into t1 values (2);
include/wait_for_slave_io_to_stop.inc
select * from t1;
a
1
set global debug = ``;
include/start_slave_io.inc
select * from t1;
a
1
2
** Clean up **
include/stop_slave.inc
set @@global.slave_parallel_workers = 0;
change master to master_auto_position=0;
include/start_slave.inc
drop table t1;
include/rpl_end.inc
@@ -0,0 +1 @@
--gtid_mode=ON --enforce_gtid_consistency --log_slave_updates
@@ -0,0 +1 @@
--gtid_mode=ON --enforce_gtid_consistency --log_slave_updates
204 changes: 204 additions & 0 deletions mysql-test/suite/rpl/t/rpl_gtid_missing_big_event.test
@@ -0,0 +1,204 @@
source include/master-slave.inc;
source include/have_gtid.inc;
source include/have_debug.inc;
source include/have_innodb.inc;
source include/have_binlog_format_statement.inc;

let $old_debug = `select @@global.debug;`;
connection master;
create table t1(a int) engine = innodb;
sync_slave_with_master;
source include/stop_slave.inc;
change master to master_auto_position = 1;
source include/start_slave.inc;

--echo == Testing scenario1 where a partial transaction is written in ==
--echo == relay log and a stop slave; start slave; are executed ==
--echo ** Test scenario1 without MTS **
connection master;
insert into t1 values(1);
sync_slave_with_master;
set global debug = "d,partial_relay_log_transaction";

connection master;
insert into t1 values (2);
connection slave;

source include/wait_for_slave_io_to_stop.inc;
select * from t1;
eval set global debug = `$old_debug`;
source include/start_slave_io.inc;

connection master;
sync_slave_with_master;
select * from t1;

--echo ** Test scenario1 with MTS **
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 2;
source include/start_slave.inc;

connection master;
delete from t1;
insert into t1 values(1);
sync_slave_with_master;
set global debug = "d,partial_relay_log_transaction";

connection master;
insert into t1 values(2);
connection slave;

source include/wait_for_slave_io_to_stop.inc;
select * from t1;
eval set global debug = `$old_debug`;
source include/start_slave_io.inc;

let $count=2;
let $table=t1;
let $wait_timeout= 300;
source include/wait_until_rows_count.inc;
select * from t1;
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 0;
source include/start_slave.inc;

--echo == Testing scenario2 where a complete transaction is ==
--echo == retrieved by i/o thread and sql thread executed it ==
--echo ** Test scenario2 without MTS **
connection master;
delete from t1;
insert into t1 values(1);
sync_slave_with_master;

source include/stop_slave.inc;
source include/start_slave.inc;

connection master;
insert into t1 values(2);
sync_slave_with_master;
select * from t1;

--echo ** Test scenario2 with MTS **
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 2;
source include/start_slave.inc;
connection master;
delete from t1;
insert into t1 values(1);
sync_slave_with_master;

source include/stop_slave.inc;
source include/start_slave.inc;

connection master;
insert into t1 values(2);
sync_slave_with_master;
select * from t1;
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 0;
source include/start_slave.inc;

--echo == Testing scenario3 where a complete transaction is ==
--echo == retrieved by i/o thread but sql thread didn't execute it ==
--echo == retrieving same transaction here is not a problem since ==
--echo == sql thread just skips if a GTID is already committed ==
--echo ** Test scenario3 without MTS **
connection master;
delete from t1;
sync_slave_with_master;
source include/stop_slave_sql.inc;

connection master;
insert into t1 values(1);
--let $use_gtids = 0
source include/sync_slave_io_with_master.inc;
source include/stop_slave_io.inc;
source include/start_slave.inc;

connection master;
insert into t1 values(2);
sync_slave_with_master;
select * from t1;

--echo ** Test scenario3 with MTS **
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 2;
source include/start_slave.inc;

connection master;
delete from t1;
sync_slave_with_master;
source include/stop_slave_sql.inc;

connection master;
insert into t1 values(1);
--let $use_gtids = 0
source include/sync_slave_io_with_master.inc;
source include/stop_slave_io.inc;
source include/start_slave.inc;

connection master;
insert into t1 values(2);
sync_slave_with_master;
select * from t1;
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 0;
source include/start_slave.inc;


--echo == Testing scenario4 where a gtid event is written in ==
--echo == relay log and a stop slave; start slave; are executed ==
--echo ** Test scenario4 without MTS **
connection master;
delete from t1;
insert into t1 values(1);
sync_slave_with_master;
set global debug = "d,partial_relay_log_transaction_with_only_gtid";

connection master;
insert into t1 values (2);
connection slave;

source include/wait_for_slave_io_to_stop.inc;
select * from t1;
eval set global debug = `$old_debug`;
source include/start_slave_io.inc;

connection master;
sync_slave_with_master;
select * from t1;

--echo ** Test scenario4 with MTS **
source include/stop_slave.inc;
set @@global.slave_parallel_workers=2;
source include/start_slave.inc;

connection master;
delete from t1;
insert into t1 values(1);
sync_slave_with_master;
set global debug = "d,partial_relay_log_transaction_with_only_gtid";

connection master;
insert into t1 values (2);
connection slave;

source include/wait_for_slave_io_to_stop.inc;
select * from t1;
eval set global debug = `$old_debug`;
source include/start_slave_io.inc;

let $count=2;
let $table=t1;
let $wait_timeout= 300;
source include/wait_until_rows_count.inc;
select * from t1;

--echo ** Clean up **
source include/stop_slave.inc;
set @@global.slave_parallel_workers = 0;
change master to master_auto_position=0;
source include/start_slave.inc;
connection master;
drop table t1;
source include/rpl_end.inc;