[clone] Persist synchronization gtid from P_S.log_status #1450

Closed
wants to merge 2 commits into from

Conversation

sunxiayi

Summary:

Summary

When synchronizing engines, get the synchronization coordinates from the P_S.log_status table and send them from the server plugin to the client plugin. On receipt, the client plugin writes them to the file #clone/#synchronization_coordinates.

Approach

Protocol
Add a new response type COM_RES_GTID_V4 and bump the latest protocol version to CLONE_PROTOCOL_VERSION_V4.

Common
The synchronize_engines() call lived in Ha_clone_common_cbk, but it is only ever called from the server or local paths, never from the client. I moved the function into the child classes (client, server, and local; the client implementation does no work and only reports an error). The reason for this change is that I want to send the GTID from the plugin layer without calling into the engine again: in Server_Cbk I can get the server handle, while in Ha_clone_common_cbk I cannot.

The log_status query and set_log_stop steps are moved into a new function synchronize_logs under Ha_clone_common_cbk.
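The split described above can be sketched roughly as follows. This is a simplified illustration, not the plugin's code: the class and method names come from this PR, but every body, the fake coordinate strings, the `sent()` accessor, and the `kNotSupported` placeholder (standing in for ER_NOT_SUPPORTED_YET) are assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the plugin classes; all bodies are illustrative.
class Ha_clone_common_cbk {
 public:
  virtual ~Ha_clone_common_cbk() = default;

  // Each role (server, local, client) now provides its own implementation.
  [[nodiscard]] virtual int synchronize_engines() = 0;

 protected:
  // Shared step kept in the base class. The real function would query
  // log_status and stop the logs; here it only fills in fake coordinates.
  int synchronize_logs(std::vector<std::string> &coordinates) {
    coordinates = {"gtid", "binlog_file", "binlog_offset"};
    return 0;
  }
};

class Server_Cbk : public Ha_clone_common_cbk {
 public:
  [[nodiscard]] int synchronize_engines() override {
    std::vector<std::string> coordinates;
    const int err = synchronize_logs(coordinates);
    if (err != 0) return err;
    assert(!coordinates.empty());
    // The server callback can reach the server handle, so it sends each
    // coordinate itself. Sending is modelled as appending to a vector.
    for (const auto &coordinate : coordinates) m_sent.push_back(coordinate);
    return 0;
  }

  const std::vector<std::string> &sent() const { return m_sent; }

 private:
  std::vector<std::string> m_sent;
};

class Client_Cbk : public Ha_clone_common_cbk {
 public:
  static constexpr int kNotSupported = 1;  // placeholder error code

  // Never expected to run on the client; report an error if it does.
  [[nodiscard]] int synchronize_engines() override { return kNotSupported; }
};
```

Keeping synchronize_logs() in the base class means the server and local paths share the log-stopping step, while each subclass decides what to do with the resulting coordinates.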

populate_synchronization_coordinates() populates a Key_Values data structure with:

  • gtid (from the log_status table)
  • binlog_file
  • binlog_offset
  • gtid (derived from binlog_file and binlog_offset)

We record the GTID both from log_status and from binlog_file/offset because the two appear to be out of sync in production, and we need a way to confirm this.
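A minimal sketch of how such a structure could be filled is shown below. The key names are taken from the #synchronization_coordinates file contents shown in the test plan; the function signature, the use of std::optional for the binlog-derived GTID, and the Key_Values alias as a vector of string pairs are assumptions for illustration and may differ from the plugin's definitions.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the plugin's key/value types.
using Key_Value = std::pair<std::string, std::string>;
using Key_Values = std::vector<Key_Value>;

// Hypothetical signature: the caller has already read log_status and, when a
// binlog is available, derived the GTID from the binlog file and offset.
Key_Values populate_synchronization_coordinates(
    const std::string &gtid_from_log_status, const std::string &binlog_file,
    unsigned long long binlog_offset,
    const std::optional<std::string> &gtid_from_binlog) {
  Key_Values coordinates;
  coordinates.emplace_back("gtid_from_log_status", gtid_from_log_status);
  coordinates.emplace_back("binary_log_file", binlog_file);
  coordinates.emplace_back("binary_log_position",
                           std::to_string(binlog_offset));
  // The fourth pair is only recorded when the binlog could be consulted.
  if (gtid_from_binlog.has_value()) {
    coordinates.emplace_back("gtid_from_binlog_file_offset",
                             *gtid_from_binlog);
  }
  assert(coordinates.size() == 3 || coordinates.size() == 4);
  return coordinates;
}
```

Keeping the two GTID values under separate keys lets a later step compare them without re-reading either source.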

Client
synchronize_engines() is not supported on the client: it does no work and returns an error (ER_NOT_SUPPORTED_YET) if called.

Upon receiving COM_RES_GTID_V4, deserialize the coordinates and persist them in #clone/#synchronization_coordinates.
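The persist step might look roughly like the sketch below. It only covers writing the pairs out, not packet deserialization. The layout (a key line followed by a value line) matches the file contents shown in the test plan; the function names and the Key_Values alias are assumptions, and the real code writes to a file rather than a generic stream.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the plugin's key/value types.
using Key_Value = std::pair<std::string, std::string>;
using Key_Values = std::vector<Key_Value>;

// Writes each pair as a key line followed by a value line.
void write_synchronization_coordinates(std::ostream &out,
                                       const Key_Values &coordinates) {
  for (const auto &[key, value] : coordinates) {
    // A newline inside a key or value would corrupt the line-based layout.
    assert(key.find('\n') == std::string::npos);
    assert(value.find('\n') == std::string::npos);
    out << key << '\n' << value << '\n';
  }
}

// Convenience helper that renders the same layout into a string.
std::string render_synchronization_coordinates(const Key_Values &coordinates) {
  std::ostringstream out;
  write_synchronization_coordinates(out, coordinates);
  return out.str();
}
```

Writing through a std::ostream keeps the layout logic independent of where the data ends up, which also makes it easy to check in a unit test.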

Server
synchronize_engines() calls synchronize_logs(), then uses the server handle to send the coordinates one by one with the existing helper functions.

Local
synchronize_engines() calls synchronize_logs(), then uses the client handle to persist the coordinates in #clone/#synchronization_coordinates.

Handle version mismatch
On the server, send the coordinates only if the negotiated version is >= V4.
On the client, the #synchronization_coordinates file is removed at the start of each clone.
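The server-side gate can be sketched as follows. This is an illustration under stated assumptions: the numeric values of the version constants are placeholders, negotiate_version is a hypothetical helper that assumes both sides settle on the lower of their maximum versions, and the body of should_send_synchronization_coordinates is a guess at what the function named in this PR checks.

```cpp
#include <algorithm>
#include <cstdint>

// Placeholder values; the real constants are defined by the clone plugin.
constexpr std::uint32_t CLONE_PROTOCOL_VERSION_V3 = 3;
constexpr std::uint32_t CLONE_PROTOCOL_VERSION_V4 = 4;

// Assumption: both sides settle on the highest version each understands.
constexpr std::uint32_t negotiate_version(std::uint32_t donor_max,
                                          std::uint32_t recipient_max) {
  return std::min(donor_max, recipient_max);
}

// Coordinates are only sent when the peer can parse COM_RES_GTID_V4.
constexpr bool should_send_synchronization_coordinates(
    std::uint32_t negotiated_version) {
  return negotiated_version >= CLONE_PROTOCOL_VERSION_V4;
}
```

Under these assumptions, upgrading only one side leaves the negotiated version below V4, so no coordinates are sent and the recipient ends up without the file, which is the behaviour the version-mismatch tests check.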

Test Plan:

MTR

local_create_synchronization_coordinates
remote_create_synchronization_coordinates

Local clone

Tested on my debug build on a devserver, after running install plugin clone SONAME 'mysql_clone.so';
1/ mysql> CLONE LOCAL DATA DIRECTORY = '/home/sunxiayi/mysql/mysql-fork/_build-8.0-Debug/mysql-test/var/tmp/mysqld.1/data_new';
2/ Check that the synchronization coordinates from log_status match those in the file:

mysql> select local from performance_schema.log_status;
+------------------------------------------------------------------------------------------------------------------------------------+
| local                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------+
| {"gtid_executed": "e0e4654f-00fb-11ef-815f-95ead878d1b0:1-3", "binary_log_file": "master-bin.000001", "binary_log_position": 1019} |
+------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

2024-03-29T21:35:31.654256Z 13 [Note] [MY-013273] [Clone] Plugin Clone reported: 'Server: w_local: {"gtid_executed": "279dd150-f06c-11ee-abb5-60fac8fc94f1:1-4", "binary_log_file": "master-bin.000001", "binary_log_position": 1019}.'
# with binlog
[sunxiayi@devvm7592.atn0 ~/mysql/mysql-fork/_build-8.0-Debug/mysql-test/var/tmp/mysqld.1/data_new_1/#clone (128f0405)]$ cat '#synchronization_coordinates'
gtid_from_log_status
2287bbe2-02be-11ef-aef5-9d0d68e44093:1-2
binary_log_file
master-bin.000001
binary_log_position
736
gtid_from_binlog_file_offset
2287bbe2-02be-11ef-aef5-9d0d68e44093:1-2

# without binlog
[sunxiayi@devvm7592.atn0 ~/mysql/mysql-fork/_build-8.0-Debug/mysql-test/var/tmp/mysqld.1/data_new_2/#clone (128f0405)]$ cat '#synchronization_coordinates'
gtid_from_log_status
2287bbe2-02be-11ef-aef5-9d0d68e44093:1-2
binary_log_file
master-bin.000001
binary_log_position
736
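
A tool that consumes this file could read it back along the lines of the sketch below. The alternating key/value line layout is taken from the output above; the function names, the Key_Values alias, and the choice to reject empty keys and dangling keys are assumptions for illustration, not part of the PR.

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the plugin's key/value types.
using Key_Value = std::pair<std::string, std::string>;
using Key_Values = std::vector<Key_Value>;

// Reads alternating key and value lines. Returns std::nullopt when a key
// line is empty or a key has no value line after it.
std::optional<Key_Values> read_synchronization_coordinates(std::istream &in) {
  Key_Values coordinates;
  std::string key;
  std::string value;
  while (std::getline(in, key)) {
    if (key.empty() || !std::getline(in, value)) return std::nullopt;
    coordinates.emplace_back(key, value);
  }
  // The loop only ends once the stream stops yielding lines.
  assert(!in.good());
  return coordinates;
}

// Convenience wrapper for text that is already in memory.
std::optional<Key_Values> parse_synchronization_coordinates(
    const std::string &text) {
  std::istringstream in(text);
  return read_synchronization_coordinates(in);
}
```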

Remote clone, normal

1/ Take udb35350.ftw5:3301 as the test server, install my debug build. Take udb12221.atn5:3301 as the client server, install my debug build. Install plugin on both servers.
2/ On udb12221.atn5:3301, do

(admin:sys.database@udb12221.atn5 3301) [(none)]> SET GLOBAL clone_valid_donor_list = 'udb35350.ftw5.facebook.com:3301';
(admin:sys.database@udb12221.atn5 3301) [(none)]> set global clone_enable_compression="on";
(admin:sys.database@udb12221.atn5 3301) [(none)]> set enable_block_stale_hlc_read=0;
(admin:sys.database@udb12221.atn5 3301) [(none)]> set allow_noncurrent_db_rw=on;
(admin:sys.database@udb12221.atn5 3301) [(none)]> CLONE INSTANCE FROM 'dba_scripts:sys.database'@'udb35350.ftw5.facebook.com':3301 IDENTIFIED BY '' REQUIRE SSL;

3/ Check that the new file is written correctly.
4/ Repeat the clone and check that the file is overwritten correctly.

Remote clone, sev scenario, apply-logs

Issue a clone command in the Raft setup, using as donor an instance whose mysql.gtid_executed table has a hole. This also copies from secondary to secondary, which means the binlog is an apply-log. Check that the file on the recipient is correct.

Remote clone, donor and client version mismatch

Only update the recipient. The clone finishes and the file is not there.

[root@udb12221.atn5 /data/mysql/3304/#clone]# ls
'#old_files'  '#replace_files'  '#status_fix'  '#view_progress'  '#view_status'

Only update the donor. The clone finishes and the file is not there.

[root@udb12221.atn5 /data/mysql/3305/#clone]# ls
'#old_files'  '#replace_files'  '#status_fix'  '#view_progress'  '#view_status'

Update both and do a successful clone. Then update only the client and clone again. The file is not there.

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: https://phabricator.intern.facebook.com/D55614528

@laurynas-biveinis (Contributor) left a comment

I am sorry for a partial review, but a full review depends on how some of the current comments are going to be resolved.

What is the plan with the two sets of binlog positions, once the mismatch is fixed or its absence is confirmed?

# The synchronization_coordinates file contains 4 key/value pairs; only 3 are examined here,
# excluding the GTID obtained from binlog_file/offset.

--source include/have_example_plugin.inc
Contributor

Why?

Contributor

Resolved

@@ -0,0 +1,104 @@
# Test after local clone command, synchronization_coordinates file is created
Contributor

Placing the tests in rocksdb suite tests synchronization between binlog and InnoDB. Consider moving it to rocksdb_clone and adding MyRocks tables too.

Contributor

Comment still relevant

DROP TABLE t1;
UNINSTALL PLUGIN clone;
--force-rmdir $CLONE_DATADIR
--exec rm -f $MYSQLTEST_VARDIR/tmp/v_local.json;
Contributor

Use the MTR command instead, IIRC remove_file

Contributor

Resolved


int Client::set_syncronization_coordinate(const uchar *packet, size_t length) {
  Key_Value syncronization_coordinate;
  auto err = extract_key_value(packet, length, syncronization_coordinate);
Contributor

Suggested change
- auto err = extract_key_value(packet, length, syncronization_coordinate);
+ const auto err = extract_key_value(packet, length, syncronization_coordinate);

  if (err == 0) {
    persist_syncronization_coordinate(syncronization_coordinate);
  }
  return (err);
Contributor

Suggested change
- return (err);
+ return err;

@@ -1962,4 +1982,9 @@ int Client_Cbk::apply_cbk(Ha_clone_file to_file, bool apply_file,
  return (err);
}

[[nodiscard]] int Client_Cbk::synchronize_engines() {
  my_error(ER_NOT_SUPPORTED_YET, MYF(0), "Remote Clone Client");
  return (ER_NOT_SUPPORTED_YET);
Contributor

Suggested change
- return (ER_NOT_SUPPORTED_YET);
+ return ER_NOT_SUPPORTED_YET;

plugin/clone/src/clone_common.cc (resolved)

  }
  auto server = get_clone_server();
  if (server->should_send_synchronization_coordinates()) {
    for (auto &coordinate : synchronization_coordinates) {
Contributor

Suggested change
- for (auto &coordinate : synchronization_coordinates) {
+ for (const auto &coordinate : synchronization_coordinates) {

@sunxiayi
Author

sunxiayi commented May 7, 2024

> I am sorry for a partial review, but a full review depends on how some of the current comments are going to be resolved.
>
> What is the plan with the two sets of binlog positions, once the mismatch is fixed or its absence is confirmed?

On the client side, we can reset the replica and set the global gtid_purged variable so that replication resumes from that GTID.

@laurynas-biveinis
Contributor

laurynas-biveinis commented May 8, 2024 via email

@sunxiayi
Author

sunxiayi commented May 8, 2024

> sunxiayi @.***> writes:
>> What is the plan with the two sets of binlog positions, once the mismatch is fixed or its absence is confirmed?
>> in the client side, we can reset replica and set global gtid_purged so it can resume replication from that gtid.
>
> Right, but I am asking specifically about the two sets of binlog positions that should be equal: one from pfs.log_status and one from the binlog itself.

If the two sets match, we would use that GTID to resume replication on the client.
If they differ, we would abort the clone, log the mismatch details, and then make plans to fix the bug.
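
That decision could be sketched as below. This is a hedged illustration rather than planned code: the key names follow the #synchronization_coordinates file shown in the test plan, while Gtid_check, find_value, and compare_gtids are hypothetical names. The comparison is plain string equality, which assumes both GTID sets are printed in the same normalized form; a real check would compare the sets themselves.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumed shape of the plugin's key/value types.
using Key_Value = std::pair<std::string, std::string>;
using Key_Values = std::vector<Key_Value>;

enum class Gtid_check { MATCH, MISMATCH, NOT_COMPARABLE };

// Returns a pointer to the value stored under key, or nullptr if absent.
const std::string *find_value(const Key_Values &coordinates,
                              const std::string &key) {
  assert(!key.empty());
  for (const auto &entry : coordinates) {
    if (entry.first == key) return &entry.second;
  }
  return nullptr;
}

// MATCH: resume replication from the GTID. MISMATCH: abort and log details.
// NOT_COMPARABLE: one of the two values was never recorded.
Gtid_check compare_gtids(const Key_Values &coordinates) {
  const std::string *from_log_status =
      find_value(coordinates, "gtid_from_log_status");
  const std::string *from_binlog =
      find_value(coordinates, "gtid_from_binlog_file_offset");
  if (from_log_status == nullptr || from_binlog == nullptr) {
    return Gtid_check::NOT_COMPARABLE;
  }
  return *from_log_status == *from_binlog ? Gtid_check::MATCH
                                          : Gtid_check::MISMATCH;
}
```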

@laurynas-biveinis (Contributor) left a comment

Overall looks good, minor comments only

plugin/clone/include/clone_client.h (outdated, resolved)
plugin/clone/include/clone_common.h (resolved)
plugin/clone/src/clone_common.cc (outdated, resolved)
plugin/clone/include/clone_common.h (outdated, resolved)
plugin/clone/include/clone_common.h (outdated, resolved)
plugin/clone/src/clone_status.cc (outdated, resolved)
plugin/clone/src/clone_status.cc (outdated, resolved)
plugin/clone/src/clone_status.cc (outdated, resolved)
plugin/clone/include/clone_common.h (resolved)
plugin/clone/include/clone_status.h (outdated, resolved)
@@ -17,6 +17,7 @@
#ifndef CLONE_COMMON_H
#define CLONE_COMMON_H

#include "clone.h"
#include "sql/handler.h"
Contributor

#include <string_view>

Author

what is it used for?

Contributor

std::string_view for a get_json_object parameter

Author

Weird, I can still use string_view without including the header?

Contributor

Probably one of the other headers pulls it in, but

  1. transitive includes are brittle as the other headers (including standard library ones) may stop including it at any time
  2. source files and headers should include their dependencies directly - this is also something that tooling like include-what-you-use and clang-tidy enforce
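
As a small illustration of the second point, a header that declares a function taking std::string_view would include <string_view> itself rather than rely on another header pulling it in. The helper below is hypothetical and is not the get_json_object function from this PR.

```cpp
#include <string_view>  // included directly because std::string_view is used

// Hypothetical helper: reports whether a JSON text mentions the given key.
// This is a plain substring search, not a JSON parser.
constexpr bool mentions_key(std::string_view json, std::string_view key) {
  return json.find(key) != std::string_view::npos;
}
```

With the direct include in place, the header keeps compiling even if an unrelated header later stops including <string_view>.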

plugin/clone/src/clone_client.cc (resolved)
plugin/clone/src/clone_common.cc (outdated, resolved)
plugin/clone/src/clone_common.cc (resolved)
plugin/clone/include/clone_status.h (outdated, resolved)
plugin/clone/src/clone_status.cc (outdated, resolved)
plugin/clone/src/clone_status.cc (outdated, resolved)
@facebook-github-bot
@sunxiayi has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
