Skip to content
This repository has been archived by the owner on Nov 5, 2022. It is now read-only.

Unable to set up mysql semi sync with ripple #23

Open
moumouwan opened this issue Feb 25, 2020 · 9 comments
Open

Unable to set up mysql semi sync with ripple #23

moumouwan opened this issue Feb 25, 2020 · 9 comments

Comments

@moumouwan
Copy link

Hi, I am trying to set up semi sync between master and ripple, but semi sync always switched off automatically when I am running sysbench, the errors on both sides are like below:

Master error.log:

2020-02-25T09:40:06.868855Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-02-25T09:40:06.868902Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-02-25T09:40:06.868932Z 0 [ERROR] /usr/sbin/mysqld: Couldn't uncompress communication packet
2020-02-25T09:40:06.868941Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-02-25T09:40:06.868948Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-02-25T09:40:06.870024Z 0 [ERROR] /usr/sbin/mysqld: Got timeout reading communication packets
2020-02-25T09:40:06.875397Z 0 [ERROR] /usr/sbin/mysqld: Got an error reading communication packets
2020-02-25T09:40:06.882687Z 29 [Note] While initializing dump thread for slave with server_id <112211>, found a zombie dump thread with the same server_id. Master is killing the zombie dump thread(3).
2020-02-25T09:40:06.882770Z 3 [Note] Stop semi-sync binlog_dump to slave (server_id: 112211)
2020-02-25T09:40:06.882815Z 29 [Note] Start binlog_dump to master_thread_id(29) slave_server(112211), pos(, 4)
2020-02-25T09:40:06.882837Z 29 [Note] Start semi-sync binlog_dump to slave (server_id: 112211), pos(, 4)
2020-02-25T09:40:06.989646Z 30 [Note] While initializing dump thread for slave with server_id <112211>, found a zombie dump thread with the same server_id. Master is killing the zombie dump thread(29).
2020-02-25T09:40:06.989714Z 29 [Note] Stop semi-sync binlog_dump to slave (server_id: 112211)
2020-02-25T09:40:06.989770Z 30 [Note] Start binlog_dump to master_thread_id(30) slave_server(112211), pos(, 4)
2020-02-25T09:40:06.989793Z 30 [Note] Start semi-sync binlog_dump to slave (server_id: 112211), pos(, 4)
2020-02-25T09:41:46.868547Z 23 [Warning] Timeout waiting for reply of binlog (file: mysql-bin.000001, pos: 596), semi-sync up to file , position 4.
2020-02-25T09:41:46.868601Z 23 [Note] Semi-sync replication switched OFF.

Rpl_semi_sync_master_status also turned OFF during the run:

mysql> show status like '%rpl%';
+--------------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients | 1 |
| Rpl_semi_sync_master_net_avg_wait_time | 0 |
| Rpl_semi_sync_master_net_wait_time | 0 |
| Rpl_semi_sync_master_net_waits | 3 |
| Rpl_semi_sync_master_no_times | 1 |
| Rpl_semi_sync_master_no_tx | 2 |
| Rpl_semi_sync_master_status | OFF |
| Rpl_semi_sync_master_timefunc_failures | 0 |
| Rpl_semi_sync_master_tx_avg_wait_time | 0 |
| Rpl_semi_sync_master_tx_wait_time | 0 |
| Rpl_semi_sync_master_tx_waits | 0 |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0 |
| Rpl_semi_sync_master_wait_sessions | 0 |
| Rpl_semi_sync_master_yes_tx | 0 |
+--------------------------------------------+-------+

Ripple log:

I0225 09:40:06.894847 24855 mysql_protocol.cc:550] Send event, type: Xid, timestamp: 1582623606, length: 31, nextpos: 14648
I0225 09:40:06.894914 24855 mysql_protocol.cc:550] Send event, type: Gtid, timestamp: 1582623606, length: 65, nextpos: 14729
I0225 09:40:06.894984 24855 mysql_protocol.cc:550] Send event, type: Query, timestamp: 1582623606, length: 74, nextpos: 14819
I0225 09:40:06.895058 24855 mysql_protocol.cc:550] Send event, type: Unknown event: 19, timestamp: 1582623606, length: 90, nextpos: 14925
I0225 09:40:06.895129 24855 mysql_protocol.cc:550] Send event, type: Unknown event: 30, timestamp: 1582623606, length: 290, nextpos: 15231
I0225 09:40:06.895203 24855 mysql_protocol.cc:550] Send event, type: Xid, timestamp: 1582623606, length: 31, nextpos: 15278

E0225 09:40:06.986371 24811 mysql_master_session.cc:298] Failed to read packet: Got error reading packet from server: Lost connection to MySQL server during query

I0225 09:40:06.986407 24811 binlog.cc:675] Connection closed last position binlog file: binlog.000000:15278, gtid: 0-0-25
I0225 09:40:06.986430 24811 mysql_master_session.cc:273] Disconnecting from master
I0225 09:40:06.987946 24811 mysql_client_connection.cc:148] connected to host: xxx, port: 3306
I0225 09:40:06.990568 24811 mysql_master_session.cc:137] Connected to host: xxx, port: 3306, server_id: 1, server_name:
I0225 09:40:06.991778 24811 mysql_master_session.cc:202] master has semi sync enabled
I0225 09:40:06.991804 24811 mysql_master_session.cc:207] start replicating from '02b13200-56ca-11ea-945b-000d3aa16eab:0-0-25'
I0225 09:40:06.992206 24811 mysql_master_session.cc:230] Master session entering main loop
I0225 09:40:06.992585 24811 binlog.cc:626] Update binlog position to end_pos: binlog.000000:15341, gtid: 0-0-25
I0225 09:40:06.992621 24811 binlog.cc:616] Skip writing event [ Previous_gtids len = 27 ]
I0225 09:40:06.992635 24811 binlog.cc:626] Update binlog position to end_pos: binlog.000000:15341, gtid: 0-0-25

Background:
Mysql version: 5.7
instances on Azure VM Ubuntu, master and ripple are on different 2 VMs,

Commands I am running:
./bazel-bin/rippled -ripple_datadir=/mnt/mysql-log -ripple_master_address=xxx -ripple_master_port=3306 -ripple_master_user=repl_user -ripple_master_password=root123 -ripple_semi_sync_slave_enabled=true

I also verified that:

  1. Async replica works fine between Master & Ripple and slave (ripple & slave on same machine)
  2. Semi sync also works fine with Master & slave (without using ripple)
  3. Increase max_allowed_packet or rpl-semi-sync-master-timeout don't help

Could someone help provide guidance on what I I did wrong or missing ?

Thanks

@pivanof
Copy link
Contributor

pivanof commented Mar 1, 2020

The MySQL logs that you posted seem to indicate that Ripple doesn't send semi-sync replies, but Ripple logs don't indicate why. So it's hard to say what's the problem. Maybe you can add more logging to the code here https://github.com/google/mysql-ripple/blob/master/mysql_master_session.cc#L253 to understand what's happening and why Ripple doesn't send semi-sync replies?

@moumouwan
Copy link
Author

Thank you for your reply !!!

I add logging as you suggested, looks like semi_sync_reply always 0, GetSemiSyncSlaveReplyActive() return 1 => bool reply always 0 in my case

So I delete semi_sync_reply like below just for testing :

  bool reply = GetSemiSyncSlaveReplyActive();
  //LOG(INFO) << "semi_sync_reply is: " << semi_sync_reply;
  LOG(INFO) << "method: " << GetSemiSyncSlaveReplyActive();
  LOG(INFO) << "reply from semi sync is: " << reply;
  if (!binlog_->AddEvent(event, reply)) {
    LOG(ERROR) << "Failed to add event to binlog";
    break;
  }

  if (reply) {
    FilePosition file_pos =
        binlog_->GetBinlogPosition().latest_master_position;
    if (!SendSemiSyncReply(file_pos)) {
      LOG(WARNING) << "Failed to send semi sync reply";
      break;
    } else {
      LOG(INFO) << "SendSemiSyncReply is true: send smi sync reply succeed";
    }
  }

Ripple logs:

r_port=3306 -ripple_master_user=repl_user -ripple_master_password=root123 -ripple_server_ports=15000 -ripple_semi_sync_slave_enabled=true
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0302 05:46:51.246495 114714 rippled.cc:48] InitPlugins
I0302 05:46:51.246569 114714 rippled.cc:60] Setup
I0302 05:46:51.246594 114714 binlog.cc:307] Starting binlog recovery
I0302 05:46:51.246752 114714 binlog.cc:350] Scanning binlog file: binlog.000000
I0302 05:46:51.784989 114714 binlog.cc:417] Binlog recovery complete
binlog file: binlog.000000, offset: 11591202, gtid: 02b13200-56ca-11ea-945b-000d3aa16eab:0-0-1
I0302 05:46:51.785027 114714 rippled.cc:121] Recovered binlog
I0302 05:46:51.785503 114714 mysql_server_port_tcpip.cc:150] Listen on host: localhost, port: 15000
I0302 05:46:51.785544 114714 rippled.cc:62] Start
I0302 05:46:51.785815 114716 mysql_master_session.cc:182] Master session starting
I0302 05:46:51.789144 114716 mysql_client_connection.cc:148] connected to host: 172.16.4.4, port: 3306
I0302 05:46:51.791885 114716 mysql_master_session.cc:137] Connected to host: 172.16.4.4, port: 3306, server_id: 1, server_name:
I0302 05:46:51.793169 114716 mysql_master_session.cc:202] master has semi sync enabled
I0302 05:46:51.793195 114716 mysql_master_session.cc:207] start replicating from '02b13200-56ca-11ea-945b-000d3aa16eab:0-0-1'
I0302 05:46:51.793565 114716 mysql_master_session.cc:230] Master session entering main loop
I0302 05:46:51.793905 114716 binlog.cc:626] Update binlog position to end_pos: binlog.000000:11591265, gtid: 0-0-1

I0302 05:46:51.793931 114716 mysql_master_session.cc:255] method: 1
I0302 05:46:51.793941 114716 mysql_master_session.cc:256] reply from semi sync is: 1

I0302 05:46:51.793952 114716 binlog.cc:616] Skip writing event [ Previous_gtids len = 27 ]
I0302 05:46:51.793967 114716 binlog.cc:626] Update binlog position to end_pos: binlog.000000:11591265, gtid: 0-0-1

I0302 05:46:51.793999 114716 mysql_master_session.cc:269] SendSemiSyncReply is true: send smi sync reply succeed

I0302 05:46:51.794016 114716 mysql_master_session.cc:255] method: 1
I0302 05:46:51.794021 114716 mysql_master_session.cc:256] reply from semi sync is: 1
I0302 05:46:51.794045 114716 mysql_master_session.cc:269] SendSemiSyncReply is true: send smi sync reply succeed
E0302 05:46:51.794075 114716 mysql_master_session.cc:303] Failed to read packet: Got error reading packet from server: Lost connection to MySQL server during query
I0302 05:46:51.794091 114716 binlog.cc:675] Connection closed last position binlog file: binlog.000000:11591265, gtid: 0-0-1
I0302 05:46:51.794113 114716 mysql_master_session.cc:278] Disconnecting from master
I0302 05:46:51.795538 114716 mysql_client_connection.cc:148] connected to host: 172.16.4.4, port: 3306
I0302 05:46:51.798288 114716 mysql_master_session.cc:137] Connected to host: 172.16.4.4, port: 3306, server_id: 1, server_name:
I0302 05:46:51.799501 114716 mysql_master_session.cc:202] master has semi sync enabled
I0302 05:46:51.799525 114716 mysql_master_session.cc:207] start replicating from '02b13200-56ca-11ea-945b-000d3aa16eab:0-0-1'
I0302 05:46:51.799897 114716 mysql_master_session.cc:230] Master session entering main loop
....

Seems running ??? But never stop...

Does that means ripple are sending semi-sync replies back ?

@pivanof
Copy link
Contributor

pivanof commented Mar 2, 2020

If semi_sync_reply is always 0, it means that master doesn't request for the semi-sync ack. It sounds a little strange, why would master do that. Are you using vanilla MySQL?

@moumouwan
Copy link
Author

No, I don't even know what is vanilla MySQL, I reran again, with the logging, so looks like semi_sync_reply do change to 1 at some time,

code:
bool reply = semi_sync_reply && GetSemiSyncSlaveReplyActive();

  LOG(INFO) << "reply from semi sync is: " << reply;
  if (!binlog_->AddEvent(event, reply)) {
    LOG(ERROR) << "Failed to add event to binlog";
    break;
  }
  //LOG(INFO) << "Before if reply, current value of reply is : " << reply;
  if (reply) {
    FilePosition file_pos =
        binlog_->GetBinlogPosition().latest_master_position;
    if (!SendSemiSyncReply(file_pos)) {
      LOG(WARNING) << "Failed to send semi sync reply";
      break;
    } else {
      LOG(INFO) << "SendSemiSyncReply is true: send smi sync reply succeed";
    }
  }

``
Before running:

Master
mysql> show variables like '%Rpl%';
+-------------------------------------------+------------+
| Variable_name | Value |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled | ON |
| rpl_semi_sync_master_timeout | 100000 |
| rpl_semi_sync_master_trace_level | 32 |
| rpl_semi_sync_master_wait_for_slave_count | 1 |
| rpl_semi_sync_master_wait_no_slave | ON |
| rpl_semi_sync_master_wait_point | AFTER_SYNC |
| rpl_stop_slave_timeout | 31536000 |
+-------------------------------------------+------------+:

Start running a short 60s sys bench test:

Master log from beginning:

2020-03-03T03:58:45.134848Z 5 [Note] Start binlog_dump to master_thread_id(5) slave_server(112211), pos(, 4)
2020-03-03T03:58:45.134905Z 5 [Note] Start semi-sync binlog_dump to slave (server_id: 112211), pos(, 4)
2020-03-03T03:59:27.940465Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-03-03T03:59:27.940518Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-03-03T03:59:27.940552Z 0 [ERROR] /usr/sbin/mysqld: Couldn't uncompress communication packet
2020-03-03T03:59:27.940567Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-03-03T03:59:27.940586Z 0 [ERROR] /usr/sbin/mysqld: Got packets out of order
2020-03-03T03:59:27.940601Z 0 [ERROR] /usr/sbin/mysqld: Got an error reading communication packets

2020-03-03T03:59:27.940804Z 5 [ERROR] Semi-sync master failed on net_flush() before waiting for slave reply

2020-03-03T03:59:27.940834Z 5 [Note] Stop semi-sync binlog_dump to slave (server_id: 112211)
2020-03-03T03:59:27.940854Z 5 [Note] Aborted connection 5 to db: 'unconnected' user: 'repl_user' host: 'perf-slave.internal.cloudapp.net' (Found net error)
2020-03-03T03:59:27.946870Z 20 [Note] Start binlog_dump to master_thread_id(20) slave_server(112211), pos(, 4)
2020-03-03T03:59:27.946910Z 20 [Note] Start semi-sync binlog_dump to slave (server_id: 112211), pos(, 4)
2020-03-03T03:59:27.947564Z 20 [ERROR] Semi-sync master failed on net_flush() before waiting for slave reply
2020-03-03T03:59:27.947590Z 20 [Note] Stop semi-sync binlog_dump to slave (server_id: 112211)
2020-03-03T03:59:27.947609Z 20 [Note] Aborted connection 20 to db: 'unconnected' user: 'repl_user' host: 'perf-slave.internal.cloudapp.net' (Found net error)

…...

2020-03-03T03:59:28.107392Z 29 [Note] Start binlog_dump to master_thread_id(29) slave_server(112211), pos(, 4)
2020-03-03T03:59:28.107436Z 29 [Note] Start semi-sync binlog_dump to slave (server_id: 112211), pos(, 4)

2020-03-03T04:01:07.939990Z 10 [Warning] Timeout waiting for reply of binlog (file: mysql-bin.000003, pos: 1000), semi-sync up to file , position 4.

2020-03-03T04:01:07.940046Z 10 [Note] Semi-sync replication switched OFF.

Ripple:

WARNING: Logging before InitGoogleLogging() is written to STDERR)
I0303 03:58:45.077606 58650 rippled.cc:48] InitPlugins
I0303 03:58:45.077679 58650 rippled.cc:60] Setup
I0303 03:58:45.077705 58650 binlog.cc:307] Starting binlog recovery
I0303 03:58:45.077872 58650 binlog.cc:350] Scanning binlog file: binlog.000000
I0303 03:58:45.078395 58650 binlog.cc:417] Binlog recovery complete
binlog file: binlog.000000, offset: 13645, gtid: 02b13200-56ca-11ea-945b-000d3aa16eab:0-0-20
I0303 03:58:45.078423 58650 rippled.cc:121] Recovered binlog
I0303 03:58:45.078850 58650 mysql_server_port_tcpip.cc:150] Listen on host: localhost, port: 15000
I0303 03:58:45.078891 58650 rippled.cc:62] Start
I0303 03:58:45.079146 58652 mysql_master_session.cc:182] Master session starting_datadir=/mnt/data -ripple_master_address=172.16.4.4 -ripple_masterI0303 03:58:45.123920 58652 mysql_client_connection.cc:148] connected to host: 172.16.4.4, port: 3306
I0303 03:58:45.127638 58652 mysql_master_session.cc:137] Connected to host: 172.16.4.4, port: 3306, server_id: 1, server_name:
I0303 03:58:45.128952 58652 mysql_master_session.cc:202] master has semi sync enabled
I0303 03:58:45.128983 58652 mysql_master_session.cc:207] start replicating from '02b13200-56ca-11ea-945b-000d3aa16eab:0-0-20'
I0303 03:58:45.129371 58652 mysql_master_session.cc:230] Master session entering main loop
I0303 03:58:45.129760 58652 binlog.cc:626] Update binlog position to end_pos: binlog.000000:13708, gtid: 0-0-20
I0303 03:58:45.129787 58652 mysql_master_session.cc:256] reply from semi sync is: 0
I0303 03:58:45.129814 58652 binlog.cc:616] Skip writing event [ Previous_gtids len = 67 ]
I0303 03:58:45.129837 58652 binlog.cc:626] Update binlog position to end_pos: binlog.000000:13708, gtid: 0-0-20
………

I0303 03:58:45.229821 58652 mysql_master_session.cc:256] reply from semi sync is: 0
I0303 03:59:27.934952 58652 mysql_master_session.cc:256] reply from semi sync is: 0
I0303 03:59:27.934984 58652 mysql_master_session.cc:256] reply from semi sync is: 1
I0303 03:59:27.935034 58652 binlog.cc:626] Update binlog position to end_pos: binlog.000000:14594, gtid: 0-0-21

I0303 03:59:27.935078 58652 mysql_master_session.cc:269] SendSemiSyncReply is true: send smi sync reply succeed

E0303 03:59:27.935127 58652 mysql_master_session.cc:303] Failed to read packet: Got error reading packet from server: Lost connection to MySQL server during query
I0303 03:59:27.935153 58652 binlog.cc:675] Connection closed last position binlog file: binlog.000000:14594, gtid: 0-0-21
I0303 03:59:27.935178 58652 mysql_master_session.cc:278] Disconnecting from master
I0303 03:59:27.936650 58652 mysql_client_connection.cc:148] connected to host: 172.16.4.4, port: 3306
I0303 03:59:27.939680 58652 mysql_master_session.cc:137] Connected to host: 172.16.4.4, port: 3306, server_id: 1, server_name:
I0303 03:59:27.940980 58652 mysql_master_session.cc:202] master has semi sync enabled
I0303 03:59:27.941006 58652 mysql_master_session.cc:207] start replicating from '02b13200-56ca-11ea-945b-000d3aa16eab:0-0-21'
I0303 03:59:27.941398 58652 mysql_master_session.cc:230] Master session entering main loop
I0303 03:59:27.941757 58652 binlog.cc:626] Update binlog position to end_pos: binlog.000000:14657, gtid: 0-0-21
I0303 03:59:27.941782 58652 mysql_master_session.cc:256] reply from semi sync is: 0

I0303 03:59:27.941793 58652 binlog.cc:616] Skip writing event [ Previous_gtids len = 67 ]
I0303 03:59:27.941807 58652 binlog.cc:626] Update binlog position to end_pos: binlog.000000:14657, gtid: 0-0-21

I0303 03:59:27.941815 58652 mysql_master_session.cc:256] reply from semi sync is: 0
I0303 03:59:27.941835 58652 mysql_master_session.cc:256] reply from semi sync is: 0
......

I0303 03:59:27.942019 58652 mysql_master_session.cc:256] reply from semi sync is: 1
I0303 03:59:27.942057 58652 binlog.cc:626] Update binlog position to end_pos: binlog.000000:15543, gtid: 0-0-22

I0303 03:59:27.942097 58652 mysql_master_session.cc:269] SendSemiSyncReply is true: send smi sync reply succeed

E0303 03:59:27.942142 58652 mysql_master_session.cc:303] Failed to read packet: Got error reading packet from server: Lost connection to MySQL server during query
I0303 03:59:27.942167 58652 binlog.cc:675] Connection closed last position binlog file: binlog.000000:15543, gtid: 0-0-22

@moumouwan
Copy link
Author

So far
Master:

  1. rpl_semi_sync_master_wait_for_slave_count =1 since mysql start
  2. Those "[ERROR] /usr/sbin/mysqld: Got packets out of order" on master are the first line of log when I am start running sys bench.
  3. I set rpl_semi_sync_master_timeout to 100000 which I think should be enough for now ? (As running semi sync without ripple and I set to to default 1s don't have any issues)

Ripple side,

  1. semi_sync_reply change to 1 and sending reply back succeed
    https://github.com/google/mysql-ripple/blob/master/mysql_master_session.cc#L261
  2. Not sure why it's throwing this "E0303 03:59:27.942142 58652 mysql_master_session.cc:303] Failed to read packet: Got error reading packet from server: Lost connection to MySQL server during query"

@moumouwan
Copy link
Author

One more thing, master & ripple are on 2 VMs under same vnet/region/OS/disk.. etc

@moumouwan
Copy link
Author

Any updates ?

@pivanof
Copy link
Contributor

pivanof commented Mar 15, 2020

Sorry for the late reply...
Looking through the logs I have an impression that things work properly initially. First some binlog events don't need sem-sync reply, because they are in the middle of a transaction, thus master doesn't request the reply. Then COMMIT event requires reply and master requests it. Ripple successfully sends the reply, but then the master for some reason doesn't understand this reply and breaks the connection from Ripple. Then Ripple reconnects, again receives a few binlog events, but when it tries to send semi-sync reply, master breaks the connection.
Why master doesn't understand what Ripple sends to it, I'm not sure. I guess some more debugging inside MySQL code needs to be done to understand that... Which exact MySQL version are you using? Did you just downloaded it from MySQL website?

@moumouwan
Copy link
Author

Thank you for you reply, i am using mysql Ver 14.14 Distrib 5.7.29, for Linux (x86_64), I installed on ubuntu linux Azure VM and I only ran two commands: sudo apt update, sudo apt install mysql-server

mysql> show variables like '%version%';
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.7.29 |
| protocol_version | 10 |
| slave_type_conversions | |
| tls_version | TLSv1,TLSv1.1,TLSv1.2 |
| version | 5.7.29-0ubuntu0.18.04.1 |
| version_comment | (Ubuntu) |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+-------------------------+-------------------------+

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants