Release notes for Percona XtraBackup 2.2.9 #2
Conversation
akopytov
added a commit
that referenced
this pull request
Feb 16, 2015
Release notes for Percona XtraBackup 2.2.9
roidelapluie
added a commit
to roidelapluie/percona-xtrabackup
that referenced
this pull request
Jun 9, 2015
The latest version of XtraBackup issues a FLUSH TABLES before the FTWRL. While this helps backups in some situations, it also means that FLUSH TABLES is written to the binary log. That is usually fine, but not in MariaDB 10.0 with GTID enabled. If the backup is taken on the slave, the FLUSH TABLES statement is still written to the binary log. This alters the GTID of that slave, and XtraBackup no longer sees the "correct" GTID. It then becomes impossible to know the GTID of the backup, because we would only see the "wrong" GTID that was written to the binary log. To avoid that, we have to issue the flush command with the NO_WRITE_TO_BINLOG option.

Side note #1: Our setup: MariaDB 10.0; GTID replication with GTID strict mode and log-slave-updates=0.

Side note percona#2: There is no problem with the FTWRL itself, because it is never written to the binlog.
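The statement sequence described above can be sketched as follows (a minimal Python sketch; the function name and boolean switch are illustrative, not XtraBackup's actual API):

```python
def backup_flush_statements(no_binlog: bool = True) -> list:
    """Return the flush sequence a backup tool would issue.

    With NO_WRITE_TO_BINLOG, the table flush is not replicated via the
    binary log, so a slave's GTID state is left untouched."""
    flush = "FLUSH NO_WRITE_TO_BINLOG TABLES" if no_binlog else "FLUSH TABLES"
    # The FTWRL itself is never written to the binlog (side note #2),
    # so it needs no such modifier.
    return [flush, "FLUSH TABLES WITH READ LOCK"]
```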
gl-sergei
pushed a commit
that referenced
this pull request
Mar 1, 2016
Problem: The binary log group commit sync fails when committing a group of transactions into a non-transactional storage engine while another thread is rotating the binary log.

Analysis: The binary log group commit procedure (ordered_commit) acquires LOCK_log during the #1 stage (flush). While it holds LOCK_log, a binary log rotation has to wait for the flush stage to finish before actually rotating the binary log. For the #2 stage (sync), the binary log group commit only holds LOCK_log if sync_binlog=1; in that case the rotation also has to wait for the sync stage to finish. When sync_binlog>1, the sync stage releases LOCK_log (to let other groups enter the flush stage), holding only LOCK_sync, so the rotation can acquire LOCK_log in parallel with the sync stage.

For commits into a transactional storage engine, the binary log rotation checks a counter of "flushed but not yet committed" transactions, waiting for this counter to reach zero before closing the current binary log file. As the commit of the transactions happens in the #3 stage of the binary log group commit, the sync of the binary log in stage #2 always succeeds. For commits into a non-transactional storage engine, the rotation checks the same counter, but it is zero because it only counts transactions that contain XIDs. So the rotation is allowed to take place in parallel with the #2 stage of the binary log group commit. When the sync is called at the moment the rotation has closed the old binary log file but not yet opened the new one, the sync fails with the following error: 'Can't sync file 'UNOPENED' to disk (Errcode: 9 - Bad file descriptor)'.

Fix: For a non-transactional-only workload, the binary log group commit will keep LOCK_log when entering the #2 stage (sync) if the current group is supposed to be synced to the binary log file.
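The lock-retention rule in the fix can be sketched like this (a hedged Python model of the decision only; the function name and parameters are illustrative, not the actual server code):

```python
def hold_lock_log_during_sync(sync_binlog: int, group_has_xids: bool,
                              group_needs_sync: bool) -> bool:
    """Decide whether a binlog group commit keeps LOCK_log through the
    sync stage, so a rotation cannot close the file mid-sync."""
    if sync_binlog == 1:
        # Original behaviour: LOCK_log is held through the sync stage.
        return True
    # The fix: a non-transactional-only group has no XIDs, so the
    # rotation's "flushed but not committed" counter cannot protect it.
    # Keep LOCK_log whenever such a group is due to be synced.
    return (not group_has_xids) and group_needs_sync
```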
VasilyNemkov
referenced
this pull request
in VasilyNemkov/percona-xtrabackup
Sep 29, 2017
TO DISAPPEAR"

Problem
-------
The test case fails to make the slave server "disappear".

Analysis
--------
The "crash_in_a_worker" debug sync point relies on the workload being parallelized and reaching MTS worker #2, but on slow systems the parallelization does not happen and the server fails to "disappear".

Fix
---
Ensure that the workload is distributed to all the workers even on slow systems.
EvgeniyPatlan
added a commit
that referenced
this pull request
Apr 18, 2018
PXB-1530 addendum - fix parentheses
gl-sergei
pushed a commit
that referenced
this pull request
Jul 19, 2019
…E TO A SERVER

Problem
========================================================================
Running the GCS tests with ASAN seldom reports a use-after-free of the server reference that the acceptor_learner_task uses. Here is an excerpt of ASAN's output:

==43936==ERROR: AddressSanitizer: heap-use-after-free on address 0x63100021c840 at pc 0x000000530ff8 bp 0x7fc0427e8530 sp 0x7fc0427e8520
WRITE of size 8 at 0x63100021c840 thread T3
#0 0x530ff7 in server_detected /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:962
#1 0x533814 in buffered_read_bytes /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1249
#2 0x5481af in buffered_read_msg /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1399
#3 0x51e171 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4690
#4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
#5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
#6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
#7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
#8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)
#9 0x7fc047ff2bfc in __clone (/lib64/libc.so.6+0xfebfc)

0x63100021c840 is located 64 bytes inside of 65688-byte region [0x63100021c800,0x63100022c898)
freed by thread T3 here:
#0 0x7fc04a5d7508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
#1 0x52cf86 in freesrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:836
#2 0x52ea78 in srv_unref /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:868
#3 0x524c30 in reply_handler_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4914
#4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
#5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
#6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
#7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
#8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)

previously allocated by thread T3 here:
#0 0x7fc04a5d7a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
#1 0x543604 in mksrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:721
#2 0x543b4c in addsrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:755
#3 0x54af61 in update_servers /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1747
#4 0x501082 in site_install_action /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1572
#5 0x55447c in import_config /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/site_def.c:486
#6 0x506dfc in handle_x_snapshot /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5257
#7 0x50c444 in xcom_fsm /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5325
#8 0x516c36 in dispatch_op /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4510
#9 0x521997 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4772
#10 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
#11 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
#12 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
#13 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
#14 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)

Analysis
========================================================================
The server structure is reference counted by the associated sender_task and reply_handler_task. When they finish, they unreference the server, which leads to its memory being freed. However, the acceptor_learner_task keeps a "naked" reference to the server structure. Under the right ordering of operations (the sender_task and reply_handler_task terminating after the acceptor_learner_task acquires, but before it uses, the reference to the server structure), the acceptor_learner_task accesses the server structure after it has been freed.

Solution
========================================================================
Let the acceptor_learner_task also reference count the server structure so it is not freed while still in use.

Reviewed-by: André Negrão <andre.negrao@oracle.com>
Reviewed-by: Venkatesh Venugopal <venkatesh.venugopal@oracle.com>
RB: 21209
rahulmalik87
added a commit
that referenced
this pull request
Aug 11, 2020
PXB-2243 undo tablespaces are corrupted if undo truncation happens between full backup and incremental
rahulmalik87
pushed a commit
that referenced
this pull request
Feb 4, 2021
To call a service implementation one needs to:
1. query the registry to get a reference to the needed service
2. call the service via the reference
3. call the registry to release the reference

While #2 is very fast (just a function pointer call), #1 and #3 can be expensive, since they need to interact with the registry's global structure in a read/write fashion. Hence, if the above sequence is to be repeated in quick succession, it is beneficial to do steps #1 and #3 just once and aggregate as many #2 steps as possible into a single sequence. This will usually mean caching the service reference received in #1 and delaying step #3 for as long as possible. But since an active reference is held to the service implementation until step #3 is taken, special handling is needed to make sure that:
- The references are released at regular intervals so changes in the registry can become effective.
- There is a way to mark a service implementation as "inactive" ("dying") so that no new references are possible until all of the active references to it are released.

All of the above is part of the current audit API machinery, but needs to be isolated into a separate service suite and made generally available to all services. This is what this worklog aims to implement. RB#24806
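The caching pattern the worklog describes can be sketched as follows (a toy Python model; `Registry`, `acquire`, and `release` are stand-ins, not the real component registry API):

```python
class Registry:
    """Toy stand-in for the service registry; counts the expensive steps."""
    def __init__(self):
        self.acquires = 0
        self.releases = 0

    def acquire(self, name):
        self.acquires += 1      # step #1: expensive registry lookup
        return object()         # opaque service reference

    def release(self, ref):
        self.releases += 1      # step #3: expensive registry interaction

def call_service_many(registry, name, calls, release_every=100):
    """Aggregate many cheap step #2 calls per acquire/release pair,
    re-acquiring at regular intervals so registry changes (e.g. a
    "dying" implementation) can take effect."""
    done = 0
    while done < calls:
        ref = registry.acquire(name)
        for _ in range(min(release_every, calls - done)):
            done += 1           # step #2: cheap call via the reference
        registry.release(ref)
    return done
```

For 250 calls with a release interval of 100, the registry is touched only three times on each side instead of 250.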
rahulmalik87
pushed a commit
that referenced
this pull request
Feb 4, 2021
HENCE ABORTING THE SERVER.

Description:
------------
When 'gtid_purged' is set to its max value, the server stops after executing the next transaction with the error: 'ERROR 1598 (HY000): Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.'

Analysis:
---------
The server stops because the GTID's integer component (GNO) maxes out while assigning a new automatic GTID.
- When gtid_purged is set to CONCAT(@@GLOBAL.server_uuid,':1-9223372036854775805'), the server updates gtid_executed with the same value.
- During the second transaction, when assigning a new automatic GTID, the GTID (GNO) hits the max limit (9223372036854775807).
- The server returns an error from get_automatic_gno(), then acts on binlog_error_action=ABORT_SERVER.
- The server then prints the error message and triggers an abort signal.
- It is documented that the server shuts down immediately if the binary log cannot be written: https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_error_action

Hence, the server shutdown is intentional and the default behavior. The error message text "An error occurred during flush stage of the commit" is imprecise and somewhat internal. It would be better to mention that the limit for generated GTIDs has been reached, and suggest how to fix the problem. There is also no warning message when the system gets close to the GTID max limit.

Fix:
----
1. Give a better error message when exhausting the range and acting according to binlog_error_action=ABORT_SERVER.
2. Set a GTID threshold at 99% of the max GTID limit and generate a warning message in the error log when:
   - an auto-generated GTID is above the threshold;
   - a GTID is set above the threshold using SET gtid_purged.

Point #2 is only implemented for mysql-8.0 onwards. RB#25130
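The numbers involved can be sketched as follows (a Python sketch of the GNO ceiling and the 99% warning threshold; the function and constant names are illustrative, not the server's):

```python
MAX_GNO = 9223372036854775807            # 2**63 - 1, GTID integer component ceiling
GNO_WARN_THRESHOLD = MAX_GNO // 100 * 99  # roughly 99% of the limit

def next_automatic_gno(current_gno):
    """Return (next_gno, warn) or raise once the range is exhausted,
    mirroring the behaviour described above."""
    if current_gno + 1 >= MAX_GNO:
        # With binlog_error_action=ABORT_SERVER this is fatal.
        raise RuntimeError("GTID range exhausted for this server_uuid")
    nxt = current_gno + 1
    return nxt, nxt > GNO_WARN_THRESHOLD
```

With gtid_purged at ':1-9223372036854775805', the first new transaction still gets a GNO (now with a warning), and the second one hits the ceiling, matching the failure described above.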
rahulmalik87
pushed a commit
that referenced
this pull request
Feb 4, 2021
A heap-buffer-overflow in libmysqlxclient when:
- the auth-method is MYSQL41
- the "server" sends a nonce that is shorter than 20 bytes.

==2466857==ERROR: AddressSanitizer: heap-buffer-overflow on address
#0 0x4a7b76 in memcpy (routertest_component_routing_splicer+0x4a7b76)
#1 0x7fd3a1d89052 in SHA1_Update (/libcrypto.so.1.1+0x1c2052)
#2 0x63409c in compute_mysql41_hash_multi(unsigned char*, char const*, unsigned int, char const*, unsigned int)
...

RB: 25305
Reviewed-by: Lukasz Kotula <lukasz.kotula@oracle.com>
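The defensive check the fix implies can be sketched in Python (the scramble below follows the well-known MySQL 4.1 scheme; names and the exact API are illustrative, not the libmysqlxclient code):

```python
import hashlib

NONCE_LEN = 20  # scramble bytes the MYSQL41 method expects from the server

def mysql41_scramble(password: bytes, nonce: bytes) -> bytes:
    """Reject a short nonce up front instead of letting SHA1_Update
    read past the buffer, which is what the overflow amounted to."""
    if len(nonce) < NONCE_LEN:
        raise ValueError("server nonce shorter than 20 bytes")
    h1 = hashlib.sha1(password).digest()
    h2 = hashlib.sha1(h1).digest()
    mask = hashlib.sha1(nonce[:NONCE_LEN] + h2).digest()
    return bytes(a ^ b for a, b in zip(h1, mask))
```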
rahulmalik87
pushed a commit
that referenced
this pull request
Feb 4, 2021
TABLESPACE STATE DOES NOT CHANGE THE SPACE TO EMPTY

After the commit for Bug#31991688, it was found that an idle system may never get around to truncating an undo tablespace when it is SET INACTIVE. In fact, it takes about 128 seconds before the undo tablespace is finally truncated.

There are three main tasks for the function trx_purge():
1) Process the undo logs and apply changes to the data files. (May be multiple threads.)
2) Clean up the history list by freeing old undo logs and rollback segments.
3) Truncate undo tablespaces that have grown too big or are SET INACTIVE explicitly.

Bug#31991688 made sure that steps 2 & 3 are not done too often. Concentrating this effort keeps the purge lag from growing too large. By default, trx_purge() does step #1 128 times before attempting steps #2 & #3, which are called 'truncate' steps. This is controlled by the setting innodb_purge_rseg_truncate_frequency.

On an idle system, trx_purge() is called once per second if it has nothing to do in step 1. After 128 seconds, it will finally do step 2 (truncating the undo logs and rollback segments, which reduces the history list to zero) and step 3 (truncating any undo tablespaces that need it).

The function that the purge coordinator thread uses to make these repeated calls to trx_purge() is called srv_do_purge(). When trx_purge() returns having done nothing, srv_do_purge() returns to srv_purge_coordinator_thread(), which puts the purge thread to sleep. It is woken up again once per second by the master thread in srv_master_do_idle_tasks(), if not sooner by any of several other threads and activities. This is how an idle system can wait 128 seconds before the truncate steps are done and an undo tablespace that was SET INACTIVE can finally become 'empty'.

The solution in this patch is to modify srv_do_purge() so that if trx_purge() did nothing and there is an undo space that was explicitly set to inactive, it will immediately call trx_purge() again with do_truncate=true so that steps #2 and #3 are done. This does not affect the effort by Bug#31991688 to keep the purge lag from growing too big on sysbench UPDATE NO_KEY. With this change, the purge lag has to be zero and there must be a pending explicit undo space truncate before this extra call to trx_purge() is done.

Approved by Sunny in RB#25311
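The fixed decision can be sketched as follows (a simplified Python model of when a trx_purge() call should run the truncate steps; the function and parameter names are illustrative, not the InnoDB code):

```python
TRUNCATE_FREQUENCY = 128  # innodb_purge_rseg_truncate_frequency default

def should_truncate(call_count, pages_purged, undo_set_inactive):
    """Return True when this trx_purge() call should run the truncate
    steps (#2 and #3) described above."""
    if call_count % TRUNCATE_FREQUENCY == 0:
        return True  # the regular every-128th call
    # The fix: purge lag is zero (nothing purged) and an undo space was
    # explicitly SET INACTIVE, so truncate immediately instead of
    # waiting up to ~128 seconds on an idle system.
    return pages_purged == 0 and undo_set_inactive
```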
rahulmalik87
pushed a commit
that referenced
this pull request
Feb 4, 2021
…TH VS 2019 [#2] [noclose] storage\ndb\src\kernel\blocks\backup\Backup.cpp(2807,37): warning C4805: '==': unsafe mix of type 'Uint32' and type 'bool' in operation Change-Id: I0582c4e40bcfc69cdf3288ed84ad3ac62c9e4b80
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Apr 20, 2021
Use --cluster-config-suffix in mtr. Change-Id: I667984cfe01c597510c81f80802532df490bf5e6
altmannmarcelo
pushed a commit
that referenced
this pull request
Jul 24, 2021
…ING TABLESPACES

The occurrence of this message is a minor issue fixed by change #1 below. But during testing, I found that if mysqld is restarted while remote and local tablespaces are discarded, especially if the tablespaces to be imported are already in place at startup, then many things can go wrong. There were various asserts that occurred depending on timing. During all the testing and debugging, the following changes were made.

1. Prevent the stats thread from complaining about a missing tablespace. See dict_stats_update().
2. Prevent a discarded tablespace from being opened at startup, even if the table to be imported is already in place. See Validate_files::check().
3. dd_tablespace_get_state_enum() was refactored to separate the normal way to do it in v8.0, which is to use the "state" key in dd::tablespaces::se_private_data, from the non-standard way, which is to check undo::spaces or look for the old key-value pair "discarded=true". This allowed the new call to this routine by the change in fix #2 above.
4. Change thd_tablespace_op() in sql/sql_thd_api.cc so that instead of returning 1 if the DDL requires an implicit tablespace, it returns the DDL operation flag. This can still be interpreted as a boolean, but it can also be used to determine whether the op is an IMPORT or a DISCARD.
5. With that change, the annoying message that a space is discarded can be avoided during an import when it needs to be discarded.
6. Several test cases were corrected now that the useless "is discarded" warning is no longer being written.
7. Two places where dd_tablespace_set_state() was called to set the state to either "discard" or "normal" were consolidated into a new version of dd_tablespace_set_state(thd, dd_space_id, space_name, dd_state).
8. This new version of dd_tablespace_set_state() was used in dd_commit_inplace_alter_table() to make sure that in all three places where the dd is changed to identify a discarded tablespace, it is identified in dd::Tablespace::se_private_data as well as dd::Table::se_private_data or dd::Partition::se_private_data. The reason it is necessary to record this in dd::Tablespace is that during startup, boot_tablespaces() and Validate_files::check() only traverse dd::Tablespace. And that is where fix #2 is done!
9. One of the asserts that occurred was during IMPORT TABLESPACE after a restart that found a discarded 5.7 tablespace in the v8.0 discarded location. This assert occurred in Fil_shard::get_file_size() just after ER_IB_MSG_272. The 5.7 file did not have the SDI flag, but the v8.0 space that was discarded did have that flag, so the flags did not match. That crash was fixed by setting fil_space_t::flags to what is in the tablespace header page. A descriptive comment was added.
10. There was a section in fil_ibd_open() that checked `if (space != nullptr) {` and, if true, would close and free stuff and then immediately crash. I think I remember adding that assert many years ago because I did not think it actually occurred. Well, it did occur during my testing before I added fix #2 above. This made fil_ibd_open() assume that the file was NOT already open. So fil_ibd_open() is now changed to allow for that possibility by adding `if (space != nullptr) {return DB_SUCCESS}` further down. Since fil_ibd_open() can be called with a `validate` boolean, the routine now attempts to do all the validation whether or not the tablespace is already open.

The following are non-functional changes:
- Many code documentation lines were added or improved.
- dict_sys_t::s_space_id was renamed to dict_sys_t::s_dict_space_id in order to clarify which space_id it refers to.
- For the same reason, s_dd_space_id was changed to s_dd_dict_space_id.
- Replaced `table->flags2 & DICT_TF2_DISCARDED` with `dict_table_is_discarded(table)` in dict0load.cc.
- A redundant call to ibuf_delete_for_discarded_space(space_id) was deleted from fil_discard_tablespace() because it is also called higher up in the call stack in row_import_for_mysql().
- Deleted the declaration of `row_import_update_discarded_flag()` since the definition no longer exists. It was deleted when we switched from `discarded=true` to 'state=discarded' in dd::Tablespace::se_private_data early in v8.0 development.

Approved by Mateusz in RB#26077
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 29, 2021
Memory leaks detected when running testMgm with an ASAN build.

bld_asan$> mtr test_mgm
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x3004ed in malloc (trunk/bld_asan/runtime_output_directory/testMgm+0x3004ed)
#1 0x7f794d6b0b46 in ndb_mgm_create_logevent_handle trunk/bld_asan/../storage/ndb/src/mgmapi/ndb_logevent.cpp:85:24
percona#2 0x335b4b in runTestMgmApiReadErrorRestart(NDBT_Context*, NDBT_Step*) trunk/bld_asan/../storage/ndb/test/ndbapi/testMgm.cpp:652:32

Add support for using unique_ptr for all functions in mgmapi that return a pointer to something that needs to be released. Move existing functionality for ndb_mgm_configuration to the same new file. Use the ndb_mgm namespace for the new functions and remove implementation details from both the new and old functionality. Use the new functionality to properly release allocated memory.

Change-Id: Id455234077c4ed6756e93bf7f40a1e93179af1a0
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 29, 2021
Remove the unused "ndb_table_statistics_row" struct Change-Id: I62982d005d50a0ece7d92b3861ecfa8462a05661
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 29, 2021
Patch percona#2: Support multi-valued indexes for prepared statements. Parameters to prepared statements are not denoted as constant, but they are constant for the duration of statement execution; however, only constant values were considered for use with multi-valued indexes. Replace const_item() with const_for_execution() to enable the use of such parameters with multi-valued indexes. This is a contribution by Yubao Liu. Change-Id: I8cf843a95d2657e5fcc67a04df65815f9ad3154a
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Jan 26, 2022
This error happens for queries such as:

SELECT ( SELECT 1 FROM t1 ) AS a,
  ( SELECT a FROM ( SELECT x FROM t1 ORDER BY a ) AS d1 );

Query_block::prepare() for query block percona#4 (corresponding to the 4th SELECT in the query above) calls setup_order(), which in turn calls find_order_in_list(). That function replaces an Item_ident for 'a' in Query_block.order_list with an Item_ref pointing to query block percona#2. Then Query_block::merge_derived() merges query block percona#4 into query block percona#3. The Item_ref mentioned above is then moved to the order_list of query block percona#3.

In the next step, find_order_in_list() is called for query block percona#3. At this point, 'a' in the select list has been resolved to another Item_ref, also pointing to query block percona#2. find_order_in_list() detects that the Item_ref in the order_list is equivalent to the Item_ref in the select list, and therefore decides to replace the former with the latter. find_order_in_list() then calls Item::clean_up_after_removal() recursively (via Item::walk()) for the order_list Item_ref, since it is no longer needed.

When calling clean_up_after_removal(), no Cleanup_after_removal_context object is passed. This is the actual error, as there should be a context pointing to query block percona#3 that ensures that clean_up_after_removal() only purges Item_subselect.unit if both of the following conditions hold:
1) The Item_subselect is not in any of the Item trees in the select list of query block percona#3.
2) Item_subselect.unit is a descendant of query block percona#3.

These conditions ensure that we only purge Item_subselect.unit if we are sure that it is not needed elsewhere. But without the right context, query block percona#2 gets purged even though it is used in the select lists of query blocks #1 and percona#3.

The fix is to pass a context (for query block percona#3) to clean_up_after_removal(). Both of the above conditions then become false, and Item_subselect.unit is not purged. As an additional shortcut, find_order_in_list() will not call clean_up_after_removal() if real_item() of the order item and the select list item are identical.

In addition, this commit changes clean_up_after_removal() so that it requires the context to be non-null, to prevent similar errors. It also simplifies Item_sum::clean_up_after_removal() by removing window functions unconditionally (and adds a corresponding test case).

Change-Id: I449be15d369dba97b23900d1a9742e9f6bad4355
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Jan 26, 2022
percona#2] If the schema distribution client detects a timeout, but the coordinator receives the schema event before the client frees the schema object, then the coordinator, instead of returning from the function, will process the stale schema event. The coordinator does not know whether the schema distribution timeout was detected by the client; it starts processing the schema event whenever the schema object is valid. So, introduce a new variable to indicate the state of the schema object, and change the state when the client detects the schema distribution timeout or when the schema event is received by the coordinator, so that the coordinator and client stay in sync. Change-Id: Ic0149aa9a1ae787c7799a675f2cd085f0ac0c4bb
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Apr 26, 2022
…ON COMPILER WARNINGS Remove some stringop-truncation warning using cstrbuf. Change-Id: I3ab43f6dd8c8b0b784d919211b041ac3ad4fad40
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Apr 26, 2022
Patch #1 caused several problems in mysql-trunk related to ndbinfo initialization and upgrade, including the failure of the test ndb_76_inplace_upgrade and the failure of all NDB MTR tests in Pushbuild on Windows. This patch fixes these issues, including fixes for bug#33726826 and bug#33730799. In ndbinfo, revert the removal of ndb$blocks and ndb$index_stats and the change of blocks and index_stats from views to tables. Improve the ndbinfo schema upgrade & initialization logic to better handle such a change in the future. This logic now runs in two passes: first it drops the known tables and views from current and previous versions, then it creates the tables and views for the current version. Add a new class method NdbDictionary::printColumnTypeDescription(). This is needed for the ndbinfo.columns table in patch percona#2 but was missing from patch #1. Add boilerplate index lookup initialization code that was also missing. Fix ndbinfo prefix determination on Windows. Change-Id: I422856bcad4baf5ae9b14c1e3a1f2871bd6c5f59
altmannmarcelo
added a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Jun 3, 2022
Compression using `xtrabackup --compress=zstd` & `xtrabackup --decompress` works fine, as we do everything in one go (read the entire file and pass it to the zstd client). Xbstream is a bit different: there is no alignment between the buffers we deal with in compression and streaming (read buffer, compression, and xbstream chunk).

Let's take the example of a hypothetical File 1. We will have 3 layers of data:

1. Raw file. We read the raw file in read_buffer_size chunks (default 10Mb):

+------------------------------------------------------------+
| File 1                                                      |
+------------------------------------------------------------+
^
|
10 Mb

2. Compressed data. The read buffer is passed into ds_compress_zstd and gets compressed into one or multiple frames:

+------------------+----------------+
| File 1 Frame 1   | File 1 Frame 2 |
+------------------+----------------+

3. XBStream data. xbstream will then write the data into multiple chunks of read_buffer_size:

+------------------$---+-------------+
| XBS chunk 1      $   | XBS chunk 2 |
+------------------$---+-------------+
                   |
             End of F1F1

XBStream chunk 1 will have the complete data of File 1 Frame 1 (F1F1) and part of the data of File 1 Frame 2 (F1F2), while the remaining data of that frame will only be available at XBStream chunk 2. Thus we need to parse the xbstream data, reading it according to the ZSTD compression format (https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md), in order to know when we should send the data of F1F1 to the ZSTD decompress functions. We also need to take into consideration that part of the buffer from chunk 1 needs to be saved and appended at the beginning of XBS chunk 2 in order to see the full F1F2 data (reassembly item percona#2). Thus we need to create a ring buffer to parse each xbstream chunk.

We start by reading the ZSTD frame; there should be at least 9 bytes in a frame (magic: 4 bytes, header: at least 2 bytes, 1 block header: 3 bytes).
altmannmarcelo
added a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Jun 3, 2022
Compression using `xtrabackup --compress=zstd` & `xtrabackup --decompress` works fine as we do everything in one go (read the entire file and pass it to the zstd client). Xbstream is a bit different: there is no alignment between the buffers we deal with in compression and streaming (read buffer, compression frame and xbstream chunk).

Let's take the example of a hypothetical File 1. We will have 3 layers of data:

1. Raw file

We read the raw file in read_buffer_size chunks (default 10MB):

+------------------------------------------------------------+
|                           File 1                           |
+------------------------------------------------------------+
^
|
10 Mb

2. Compressed data

The read buffer is passed into ds_compress_zstd and gets compressed into one or multiple frames:

+------------------+----------------+
|  File 1 Frame 1  | File 1 Frame 2 |
+------------------+----------------+

3. XBStream data

xbstream will then write the data into multiple chunks of read_buffer_size:

+------------------$---+-------------+
|   XBS chunk 1    $   | XBS chunk 2 |
+------------------$---+-------------+
                   |
                   End of F1F1

XBStream chunk 1 will have the complete data of File 1 Frame 1 and part of the data of File 1 Frame 2, while the remaining data of that frame will only be available at XBStream chunk 2. Thus we need to parse the xbstream data, reading it with the ZSTD compression format https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md in order to validate when we should send the data of F1F1 to the ZSTD decompress functions. We also need to take into consideration that part of the buffer from chunk 1 needs to be saved and appended at the beginning of XBS chunk 2 in order to see the full F1F2 data (reassembly item percona#2). Thus we need to create a ring buffer to parse each xbstream chunk.

The ring buffer works as follows:

- When we receive a new xb chunk we save it as a new buffer in the ring buffer at ds_istream.h.
- Before starting to parse anything meaningful (frame/block header) we save the current position of the ring buffer, as we might need to advance it depending on the type of data we will attempt to parse. If we find any issue while parsing the data, like asking for more bytes than the buffer capacity, or being unable to parse the frame, we declare the chunk as ZSTD_INCOMPLETE, restore the saved position and move to the next xbstream chunk. Next time we will have the remaining data from the previous chunk plus the data from the next chunk, readable in a contiguous way.

Parsing:

We start by reading the ZSTD frame; there should be at least 9 bytes in a frame (magic: 4 bytes, header: at least 2 bytes, 1 block header: 3 bytes).

WIP
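The 9-byte minimum check described above can be sketched as follows. This is illustrative only, assuming nothing beyond the ZSTD frame layout stated in the message; `probe_zstd_frame` and `FrameProbe` are hypothetical names, not the actual ds_istream.h API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A ZSTD frame starts with a 4-byte little-endian magic number (0xFD2FB528),
// then a frame header of at least 2 bytes and a first block header of
// 3 bytes: hence the 9-byte minimum. With fewer bytes buffered, the chunk
// must be declared incomplete so parsing can resume once the next xbstream
// chunk has been appended to the ring buffer.
enum class FrameProbe { Ok, Incomplete, BadMagic };

constexpr uint32_t kZstdMagic = 0xFD2FB528;
constexpr std::size_t kMinFrameBytes = 4 + 2 + 3;  // magic + header + block header

FrameProbe probe_zstd_frame(const std::vector<uint8_t> &buf, std::size_t pos) {
  if (buf.size() < pos + kMinFrameBytes) return FrameProbe::Incomplete;
  // Assemble the little-endian magic byte by byte (endian-independent).
  const uint32_t magic = static_cast<uint32_t>(buf[pos]) |
                         static_cast<uint32_t>(buf[pos + 1]) << 8 |
                         static_cast<uint32_t>(buf[pos + 2]) << 16 |
                         static_cast<uint32_t>(buf[pos + 3]) << 24;
  return magic == kZstdMagic ? FrameProbe::Ok : FrameProbe::BadMagic;
}
```

A chunk probed as `Incomplete` corresponds to the ZSTD_INCOMPLETE state above: the saved position is restored and parsing resumes with the next chunk.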
patrickbirch
pushed a commit
to patrickbirch/percona-xtrabackup
that referenced
this pull request
Jul 26, 2022
PXB-2817 Add README doc to a new doc repository
satya-bodapati
pushed a commit
that referenced
this pull request
Aug 18, 2022
When creating an NdbEventOperationImpl it needs a reference to a NdbDictionary::Event. Creating a NdbDictionary::Event involves a roundtrip to NDB in order to "open" the Event and return the Event instance. This may fail and is not suitable for doing in a constructor.

Fix by moving the opening of the NdbDictionary::Event out of the NdbEventOperationImpl constructor.

Change-Id: I5752f8b636ddd31672ac95f59b8f272a41cddfa9
satya-bodapati
pushed a commit
that referenced
this pull request
Aug 18, 2022
* PROBLEM

The test "ndb.ndb_bug17624736" was constantly failing in [daily|weekly]-8.0-cluster branches in PB2, whether on `ndb-ps` or `ndb-default-big` profile test runs. The high-level reason for the failure was the installation of a duplicate entry in the Data Dictionary in respect to the `engine`-`se_private_id` pair, even when the previous table definition should have been dropped.

* LOW-LEVEL EXPLANATION

When data nodes fail and need to reorganize, the MySQL servers connected start to synchronize the schema definitions in their own Data Dictionary. The `se_private_id` for NDB tables installed in the DD is the same as the NDB table ID, hereafter referred to as just ID, and thus a pair `engine`-`se_private_id` is installed in `tables.engine`. It is common for tables to be updated with different IDs, such as when an ALTER TABLE or a DROP/CREATE occurs. The previous table definition, fetched by the table's fully qualified name ("schema.table" format), is usually sufficient to be dropped and hence the new table to be installed with the new ID, since it is assumed that no other table definition is installed with that ID.

However, in the synchronization phase, if the data node failure caused a previous table definition *of a different table than the one to be installed* to still exist with the ID to be installed, then that old definition won't be dropped and thus a duplicate entry warning will be logged on the THD. Example:

t1 - id=13,version=1
t2 - id=15,version=1
<failures and synchronization>
t1 = id=9,version=2
t2 = id=13,version=2 (previous def=15, but ndbcluster-13 still exists)

One of the reasons for the error is that in `Ndb_dd_client::install_table` the name is used to fetch the previous definition, while in `Ndb_dd_client::store_table` the ID is used instead. Also, `Ndb_dd_client::install_table` should be able to drop the required table definitions in the DD in order to install the new one, as dictated by the data nodes. It was just dropping the one found by the name of the table to be installed.

* SOLUTION

The solution was to add procedures to check if the ID to be installed is different than the previous one; if so, it must be checked whether an old table definition already exists with that ID. If it does, drop it as well. Additionally, some renaming (`object_id` to `spi`, referring to `se_private_id`) and a new struct were employed to make it simpler to keep the pair (ID-VERSION) together and respectively install these in the new table definition's SE fields.

Change-Id: Ie671a5fc58646e02c21ef1299309303f33173e95
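The check-then-drop logic of the solution can be sketched with a toy in-memory dictionary. All names below (`MiniDD`, `TableDef`, `name_with_spi`) are hypothetical stand-ins; the real fix lives in `Ndb_dd_client::install_table`:

```cpp
#include <map>
#include <optional>
#include <string>

// Toy model of the DD: definitions keyed by fully qualified name, each
// carrying its se_private_id (spi == NDB table ID) and version.
struct TableDef {
  std::string name;  // "schema.table"
  int spi;           // se_private_id == NDB table ID
  int version;
};

struct MiniDD {
  std::map<std::string, TableDef> by_name;

  // Find which table, if any, currently occupies the given spi.
  std::optional<std::string> name_with_spi(int spi) const {
    for (const auto &entry : by_name)
      if (entry.second.spi == spi) return entry.first;
    return std::nullopt;
  }

  void install_table(const TableDef &def) {
    by_name.erase(def.name);  // drop previous definition of the same table
    // The fix: also drop a *different* table left behind with the same spi,
    // so no duplicate engine-se_private_id pair can be installed.
    if (auto stale = name_with_spi(def.spi)) by_name.erase(*stale);
    by_name[def.name] = def;
  }
};
```

With the example from the message, installing t2 with ID 13 while the stale t1 (ID 13) still exists drops the stale definition instead of logging a duplicate entry warning.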
satya-bodapati
pushed a commit
that referenced
this pull request
Aug 18, 2022
-- Patch #1: Persist secondary load information --

Problem:
We need a way of knowing which tables were loaded to HeatWave after MySQL restarts due to a crash or a planned shutdown.

Solution:
Add a new "secondary_load" flag to the `options` column of mysql.tables. This flag is toggled after a successful secondary load or unload. The information about this flag is also reflected in INFORMATION_SCHEMA.TABLES.CREATE_OPTIONS.

-- Patch #2 --

The second patch in this worklog triggers the table reload from InnoDB after MySQL restart.

The recovery framework recognizes that the system restarted by checking whether tables are present in the Global State. If there are no tables present, the framework will access the Data Dictionary and find which tables were loaded before the restart.

This patch introduces the "Data Dictionary Worker" - a MySQL service recovery worker whose task is to query the INFORMATION_SCHEMA.TABLES table from a separate thread and find all tables whose secondary_load flag is set to 1. All tables that were found in the Data Dictionary will be appended to the list of tables that have to be reloaded by the framework from InnoDB.

If an error occurs during restart recovery we will not mark the recovery as failed. This is done because the types of failures that can occur when the tables are reloaded after a restart are less critical compared to previously existing recovery situations. Additionally, this code will soon have to be adapted for the next worklog in this area, so we are proceeding with the simplest solution that makes sense.

A Global Context variable m_globalStateEmpty is added which indicates whether the Global State should be recovered from an external source.

-- Patch #3 --

This patch adds the "rapid_reload_on_restart" system variable. This variable is used to control whether tables should be reloaded after a restart of mysqld or the HeatWave plugin. This variable is persistable (i.e., SET PERSIST RAPID_RELOAD_ON_RESTART = TRUE/FALSE). The default value of this variable is set to false.

The variable can be modified in OFF, IDLE, and SUSPENDED states.

-- Patch #4 --

This patch refactors the recovery code by removing all recovery-related code from ha_rpd.cc and moving it to separate files:

- ha_rpd_session_factory.h/cc: These files contain the MySQLAdminSessionFactory class, which is used to create admin sessions in separate threads that can be used to issue SQL queries.

- ha_rpd_recovery.h/cc: These files contain the MySQLServiceRecoveryWorker, MySQLServiceRecoveryJob and ObjectStoreRecoveryJob classes which were previously defined in ha_rpd.cc. This file also contains a function that creates the RecoveryWorkerFactory object. This object is passed to the constructor of the Recovery Framework and is used to communicate with the other section of the code located in rpdrecoveryfwk.h/cc.

This patch also renames rpdrecvryfwk to rpdrecoveryfwk for better readability.

The include relationship between the files is shown on the following diagram:

rpdrecoveryfwk.h◄──────────────rpdrecoveryfwk.cc
    ▲    ▲
    │    │
    │    │
    │    └──────────────────────────┐
    │                               │
ha_rpd_recovery.h◄─────────────ha_rpd_recovery.cc──┐
    ▲                               │              │
    │                               │              │
    │                               ▼              │
ha_rpd.cc───────────────────────►ha_rpd.h          │
    ▲                               │
    │                               │
    │    ┌──────────────────────────┘
    │    │
    │    ▼
ha_rpd_session_factory.cc──────►ha_rpd_session_factory.h

Other changes:

- In agreement with Control Plane, the external Global State is now invalidated during recovery framework startup if:
  1) the Recovery framework recognizes that it should load the Global State from an external source AND,
  2) rapid_reload_on_restart is set to OFF.

- Addressed review comments for Patch #3; rapid_reload_on_restart is now also settable while the plugin is ON.

- Provide a single entry point for processing the external Global State before starting the recovery framework loop.

- Change when the Data Dictionary is read. Now we will no longer wait for the HeatWave nodes to connect before querying the Data Dictionary. We will query it when the recovery framework starts, before accepting any actions in the recovery loop.

- Change the reload flow by inserting fake global state entries for tables that need to be reloaded instead of manually adding them to a list of tables scheduled for reload. This method will be used for the next phase where we will recover from Object Storage, so both recovery methods will now follow the same flow.

- Update secondary_load_dd_flag added in Patch #1.

- Increase timeout in wait_for_server_bootup to 300s to account for long MySQL version upgrades.

- Add reload_on_restart and reload_on_restart_dbg tests to the rapid suite.

- Add PLUGIN_VAR_PERSIST_AS_READ_ONLY flag to "rapid_net_orma_port" and "rapid_reload_on_restart" definitions, enabling their initialization from persisted values along with "rapid_bootstrap" when it is persisted as ON.

- Fix numerous clang-tidy warnings in recovery code.

- Prevent suspended_basic and secondary_load_dd_flag tests from running on ASAN builds due to an existing issue when reinstalling the RAPID plugin.

-- Bug#33752387 --

Problem:
A shutdown of MySQL causes a crash in queries fired by the DD worker.

Solution:
Prevent MySQL from killing the DD worker's queries by instantiating a DD_kill_immunizer before the queries are fired.

-- Patch #5 --

Problem:
A table can be loaded before the DD Worker queries the Data Dictionary. This means that the table will be wrongly processed as part of the external global state.

Solution:
If the table is present in the current in-memory global state we will not consider it as part of the external global state and we will not process it by the recovery framework.

-- Bug#34197659 --

Problem:
If a table reload after restart causes OOM the cluster will go into RECOVERYFAILED state.

Solution:
Recognize when the tables are being reloaded after restart and do not move the cluster into RECOVERYFAILED. In that case only the current reload will fail and the reload for other tables will be attempted.

Change-Id: Ic0c2a763bc338ea1ae6a7121ff3d55b456271bf0
rahulmalik87
pushed a commit
that referenced
this pull request
Jan 24, 2023
Add various json fields in the new JSON format.

Have a json field "access_type" with value "index" for the many scans that use one form of index or another. Plans with "access_type=index" have additional fields such as index_access_type, covering, lookup_condition, index_name, etc. The value of index_access_type further tells us what specific type of index scan it is, like Index range scan, Index lookup scan, etc.

Join plan nodes have access_type=join. Such plans will, again, have additional json fields that tell us whether it's a hash join or merge join, and whether it is an antijoin, semijoin, etc.

If a plan node is the root of a subquery subtree, it additionally has the field 'subquery' with value "true". Such plan nodes will also have fields like "location=projection", "dependent=true", corresponding to the TREE format synopsis: Select #2 (subquery in projection; dependent)

If a json field is absent, its value should be interpreted as either 0, empty, or false, depending on its type.

A side effect of this commit is that for AccessPath::REF, the phrase "iterate backwards" is changed to "reverse".

New test file added to test format=JSON with the hypergraph optimizer.

Change-Id: I816af3ec546c893d4fc0c77298ef17d49cff7427
rahulmalik87
pushed a commit
that referenced
this pull request
Jan 24, 2023
Enh#34350907 - [Nvidia] Allow DDLs when tables are loaded to HeatWave
Bug#34433145 - WL#15129: mysqld crash Assertion `column_count == static_cast<int64_t>(cp_table-
Bug#34446287 - WL#15129: mysqld crash at rapid::data::RapidNetChunkCtx::consolidateEncodingsDic
Bug#34520634 - MYSQLD CRASH : Sql_cmd_secondary_load_unload::mysql_secondary_load_or_unload
Bug#34520630 - Failed Condition: "table_id != InvalidTableId"

Currently, DDL statements such as ALTER TABLE*, RENAME TABLE, and TRUNCATE TABLE are not allowed if a table has a secondary engine defined. The statements fail with the following error: "DDLs on a table with a secondary engine defined are not allowed." This worklog lifts this restriction for tables whose secondary engine is RAPID.

A secondary engine hook is called at the beginning (pre-hook) and at the end (post-hook) of a DDL statement execution. If the DDL statement succeeds, the post-hook will direct the recovery framework to reload the table in order to reflect that change in HeatWave.

Currently all DDL statements that were previously disallowed will trigger a reload. This can be improved in the future by checking whether the DDL operation has an impact on HeatWave or not. However, detecting all edge cases in this behavior is not straightforward, so this has been left as a future improvement. Additionally, if a DDL modifies the table schema in a way that makes it incompatible with HeatWave (e.g., dropping a primary key column) the reload will fail silently. There is no easy way to recognize in a pre-hook whether the table schema will become incompatible with HeatWave.

List of changes:
1) [MySQL] Add new HTON_SECONDARY_ENGINE_SUPPORTS_DDL flag to indicate whether a secondary engine supports DDLs.
2) [MySQL] Add RAII hooks for RENAME TABLE and TRUNCATE TABLE, modeled on the ALTER TABLE hook.
3) Define HeatWave hooks for ALTER TABLE, RENAME TABLE, and TRUNCATE TABLE statements.
4) If a table reload is necessary, trigger it by marking the table as stale (WL#14914).
5) Move all change propagation & DDL hooks to ha_rpd_hooks.cc.
6) Adjust existing tests to support table reload upon DDL execution.
7) Extract code related to RapidOpSyncCtx into ha_rpd_sync_ctx.cc, and the PluginState enum into ha_rpd_fsm.h.

* Note: ALTER TABLE statements related to secondary engine setting and loading were allowed before:
- ALTER TABLE <TABLE> SECONDARY_UNLOAD,
- ALTER TABLE SECONDARY_ENGINE = NULL.

-- Bug#34433145 --
-- Bug#34446287 --

--Problem #1--
Crashes in Change Propagation when the CP thread tries to apply DMLs of tables with the new schema to the not-yet-reloaded table in HeatWave.

--Solution #1--
Remove the table from Change Propagation before marking it as stale, and revert the original change from rpd_binlog_parser.cc where we were checking if the table was stale before continuing with binlog parsing. The original change is no longer necessary since the table is removed from CP before being marked as stale.

--Problem #2--
In case of a failed reload, tables are not removed from the Global State.

--Solution #2--
Keep track of whether the table was reloaded because it was marked as STALE. In that case we do not want the Recovery Framework to retry the reload, and therefore we can remove the table from the Global State.

-- Bug#34520634 --

Problem:
Allowing the change of primary engine for tables with a defined secondary engine hits an assertion in mysql_secondary_load_or_unload(). Example:

CREATE TABLE t1 (col1 INT PRIMARY KEY) SECONDARY_ENGINE = RAPID;
ALTER TABLE t1 ENGINE = BLACKHOLE;
ALTER TABLE t1 SECONDARY_LOAD; <- assertion hit here

Solution:
Disallow changing the primary engine for tables with a defined secondary engine.

-- Bug#34520630 --

Problem:
A debug assert is being hit in rapid_gs_is_table_reloading_from_stale because the table was dropped in the meantime.

Solution:
Instead of asserting, just return false if the table is not present in the Global State. This patch also changes rapid_gs_is_table_reloading_from_stale to a more specific check (inlined the logic in load_table()). This check now also covers the case when a table was dropped/unloaded before the Recovery Framework marked it as INRECOVERY. In that case, if the reload fails we should not have an entry for that table in the Global State.

The patch also adjusts the dict_types MTR test, where we no longer expect tables to be in UNAVAIL state after a failed reload. Additionally, recovery2_ddls.test is adjusted to not try to offload queries running on Performance Schema.

Change-Id: I6ee390b1f418120925f5359d5e9365f0a6a415ee
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
May 4, 2023
The SSL socket table is a big table of pointers to SSL objects, indexed by socket number. When an NdbSocket is initialized from an ndb_socket_t, it will fetch any saved SSL * associated with the socket from the SSL socket table.

If SSL_TABLE_ABORT_ON_HIT in ssl_socket_table.h is set to 1, debug builds will abort when an SSL pointer is found but not expected. This is intended as a tool to help find and fix code that might lose the association between a socket and its SSL.

The table is protected by a read-write lock. As the code base evolves toward use of NdbSocket with correct life cycles, the contention on the lock should become less and less. Once SecureSockets are correctly used everywhere, the table can be disabled.

Change-Id: Ibb9b858146b8d983ca99f05b61ba0abee11b3ee6
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
May 4, 2023
Enable the NDB binlog injector to calculate transaction dependencies for changes written to the binlog. This is done by extending rpl_injector to populate the "THD::writeset" array with 64 bit hash values representing the key(s) of each binlogged row in the current transaction. Those values are then, at binlog commit time, compared with the historical writeset to find the oldest transaction for which there is no change to the same key(s).

The same mechanism is already available for other storage engines since WL#9556 and the intention is that it should now work in an identical fashion, and with the same limitations, also for transactions binlogged for changes done in NDB.

The transaction writeset hash value calculation is enabled by using --ndb-log-transaction-dependency=[ON|OFF], thus enabling the use of WRITESET dependency tracking mode when the ndb_binlog thread writes to the binlog.

The expected result is basically that the "last_committed" value for each binlogged transaction will be set to the sequence number of the transaction that previously modified the same row(s), or to the last serial synchronization point.

Change-Id: I7b50365ce13d26a70eb5e93a3d703c0fe82ba8a4
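The dependency calculation described above follows the WRITESET scheme of WL#9556 and can be sketched as a simplified model (not the actual rpl_injector code; `WritesetHistory` and its members are illustrative names):

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Each binlogged transaction carries 64-bit hashes of its row keys. At
// commit time, last_committed becomes the newest sequence number among
// earlier writers of any of the same keys, falling back to the last serial
// synchronization point when no key conflicts.
struct WritesetHistory {
  std::unordered_map<uint64_t, int64_t> last_writer;  // key hash -> seq no
  int64_t last_serial_sync = 0;

  // Returns last_committed for a transaction with the given key hashes and
  // records it as the latest writer with sequence number seq.
  int64_t commit(const std::vector<uint64_t> &keys, int64_t seq) {
    int64_t last_committed = last_serial_sync;
    for (uint64_t k : keys) {
      auto it = last_writer.find(k);
      if (it != last_writer.end())
        last_committed = std::max(last_committed, it->second);
    }
    for (uint64_t k : keys) last_writer[k] = seq;
    return last_committed;
  }
};
```

Transactions whose last_committed is older than another transaction's sequence number touched disjoint keys and can be applied in parallel by a replica.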
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
May 4, 2023
Fix static analysis warnings for variables that are assigned but never used.

storage/ndb/test/src/UtilTransactions.cpp:286:16: warning: Although the value stored to 'eof' is used in the enclosing expression, the value is never actually read from 'eof' [clang-analyzer-deadcode.DeadStores]

Change-Id: I6d7b1fbae691a8d9750f57a8a11c2b12ff65041d
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
May 4, 2023
# This is the 1st commit message:

WL#15280: HEATWAVE SUPPORT FOR MDS HA

Problem Statement
-----------------
Currently customers cannot enable the heatwave analytics service on their HA DBSystem, or enable HA if they are using a Heatwave enabled DBSystem. In this change, we attempt to remove this limitation and provide failover support for heatwave in an HA enabled DBSystem.

High Level Overview
-------------------
To support heatwave with HA, we extended the existing feature of auto-reloading of tables to heatwave on MySQL server restart (WL-14396). To provide seamless failover functionality for tables loaded to heatwave, each node in the HA cluster (group replication) must have the latest view of tables which are currently loaded to the heatwave cluster attached to the primary, i.e., the secondary_load flag should always be in sync.

To achieve this, we made the following changes -
1. replicate secondary load/unload DDL statements to all the active secondary nodes by writing the DDL into the binlog, and
2. control how secondary load/unload is executed when the heatwave cluster is not attached to the node executing the command.

Implementation Details
----------------------
The current implementation depends on two key assumptions -
1. All MDS DBSystems will have the RAPID plugin installed.
2. No non-MDS system will have the RAPID plugin installed.

Based on these assumptions, we made certain changes w.r.t. how the server handles execution of secondary load/unload statements.
1. If a secondary load/unload command is executed from a mysql client session on a system without the RAPID plugin installed (i.e., non-MDS), instead of an error, a warning message will be shown to the user, and the DDL is allowed to commit.
2. If a secondary load/unload command is executed from a replication connection on an MDS system without a heatwave cluster attached, instead of throwing an error, the DDL is allowed to commit.
3. If no error is thrown from the secondary engine, then the DDL will update the secondary_load metadata and write a binlog entry.

Writing to the binlog implies that all consumers of the binlog now need to handle this DDL gracefully. This has an adverse effect on Point-in-time Recovery. If the PITR backup is taken from a DBSystem with heatwave, it may contain traces of secondary load/unload statements in its binlog. If such a backup is used to restore a new DBSystem, it will cause failure while trying to execute statements from its binlog because a) the DBSystem will not have a heatwave cluster attached at this time, and b) statements from the binlog are executed from a standard mysql client connection, thus making them indistinguishable from user executed commands. Customers will be prevented (by the control plane) from using PITR functionality on a heatwave enabled DBSystem until there is a solution for this.

Testing
-------
This commit changes the behavior of secondary load/unload statements, so it
- adjusts existing tests' expectations, and
- adds a new test validating the new DDL behavior under different scenarios

Change-Id: Ief7e9b3d4878748b832c366da02892917dc47d83

# This is the commit message percona#2:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (PITR SUPPORT)

Problem
-------
A PITR backup taken from a heatwave enabled system could have traces of secondary load or unload statements in the binlog. When such a backup is used to restore another system, it can cause failure for the following two reasons:

1. Currently, even if the target system is heatwave enabled, the heatwave cluster is attached only after the PITR restore phase completes.
2. When entries from binlogs are applied, a standard mysql client connection is used. This makes them indistinguishable from other user sessions.

Since secondary load (or unload) statements are meant to throw an error when they are executed by a user in the absence of a healthy heatwave cluster, the PITR restore workflow will fail if binlogs from the backup have any secondary load (or unload) statements in them.

Solution
--------
To avoid PITR failure, we are introducing a new system variable rapid_enable_delayed_secondary_ops. It controls how load or unload commands are to be processed by the rapid plugin.

- When turned ON, the plugin silently skips the secondary engine operation (load/unload) and returns success to the caller. This allows secondary load (or unload) statements to be executed by the server in the absence of any heatwave cluster.
- When turned OFF, it follows the existing behavior.
- The default value is OFF.
- The value can only be changed when rapid_bootstrap is IDLE or OFF.
- This variable cannot be persisted.

In the PITR workflow, the Control Plane would set the variable at the start of the PITR restore and then reset it at the end of the workflow. This allows the workflow to complete without failure even when the heatwave cluster is not attached. Since metadata is always updated when secondary load/unload DDLs are executed, when a heatwave cluster is attached at a later point in time, the respective tables get reloaded to heatwave automatically.

Change-Id: I42e984910da23a0e416edb09d3949989159ef707

# This is the commit message percona#3:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (TEST CHANGES)

This commit adds new functional tests for the MDS HA + HW integration.

Change-Id: Ic818331a4ca04b16998155efd77ac95da08deaa1

# This is the commit message percona#4:

WL#15280: HEATWAVE SUPPORT FOR MDS HA
BUG#34776485: RESTRICT DEFAULT VALUE FOR rapid_enable_delayed_secondary_ops

This commit does two things:
1. Adds a basic test for the newly introduced system variable rapid_enable_delayed_secondary_ops, which controls the behavior of alter table secondary load/unload ddl statements when the rapid cluster is not available.
2. Restricts the DEFAULT value setting for the system variable. So, the following is not allowed:

SET GLOBAL rapid_enable_delayed_secondary_ops = default

This variable is to be used in restricted scenarios and the control plane only sets it to ON/OFF before and after PITR apply. Allowing set to default has no practical use.

Change-Id: I85c84dfaa0f868dbfc7b1a88792a89ffd2e81da2

# This is the commit message percona#5:

Bug#34726490: ADD DIAGNOSTICS FOR SECONDARY LOAD / UNLOAD DDL

Problem:
--------
If a secondary load or unload DDL gets rolled back due to some error after it had loaded/unloaded the table in the heatwave cluster, there is no undo of the secondary engine action. Only the secondary_load flag update is reverted and no binlog is written. From the user's perspective, the table is loaded and can be seen in performance_schema. There are also no error messages printed to notify that the ddl didn't commit. This makes it hard to debug any issue in this area.

Solution:
---------
The partial undo of secondary load/unload ddl will be handled in bug#34592922. In this commit, we add diagnostics to reveal if the ddl failed to commit, and from what stage.

Change-Id: I46c04dd5dbc07fc17beb8aa2a8d0b15ddfa171af

# This is the commit message percona#6:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (TEST FIX)

Since ALTER TABLE SECONDARY LOAD / UNLOAD DDL statements now write to the binlog, from Heatwave's perspective the SCN is bumped up. In this commit, we are adjusting the expected SCN values in certain tests which do secondary load/unload and expect the SCN to match.

Change-Id: I9635b3cd588d01148d763d703c72cf50a0c0bb98

# This is the commit message percona#7:

Adding MTR tests for ML in rapid group_replication suite

Added MTR tests with Heatwave ML queries within an HA setup.

Change-Id: I386a3530b5bbe6aea551610b6e739ab1cf366439

# This is the commit message percona#8:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (MTR TEST ADJUSTMENT)

In this commit we have adjusted the existing tests to work with the new MTR test infrastructure which extends the functionality to the HA landscape. With this change, a lot of manual settings have become redundant and are thus removed in this commit.

Change-Id: Ie1f4fcfdf047bfe8638feaa9f54313d509cbad7e

# This is the commit message percona#9:

WL#15280: HEATWAVE SUPPORT FOR MDS HA (CLANG-TIDY FIX)

Fix clang-tidy warnings found in previous change#16530, patch#20

Change-Id: I15d25df135694c2f6a3a9146feebe2b981637662

Change-Id: I3f3223a85bb52343a4619b0c2387856b09438265
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
May 4, 2023
…::register_variable

Several problems stacked up together:

1. The component initialization, when failing, should clean up after itself. Fixed the validate_password component's init method to properly clean up in case of failures.

2. The validate_password component had a REQUIRES_SERVICE(registry). While this is not wrong per se, it collided with the implicit REQUIRES_SERVICE(registry) done by the BEGIN_COMPONENT_REQUIRES() macro in that it was using the same placeholder global variable. So the same service handle was released twice on error or component unload. Fixed by removing the second REQUIRES_SERVICE(registry).

3. The dynamic loader releases the newly acquired service references for the required services on initialization error. However, after doing that it was actually setting the service handle placeholder to NULL. This is not wrong, but combined with problem percona#2 it was causing a reference to the registry service to be acquired twice, stored into the same placeholder and then released just once, since after the first release the placeholder was set to null and thus the second release is a no-op. Fixed by not resetting the handle placeholder after releasing the service reference.

4. The system variable registration service wouldn't release the intermediate memory slots it was allocating on error. Fixed by using std::unique_ptr to handle the proper releasing.

Change-Id: Ib2c7ae80736c591838af8c182fda1980be1e1f0e
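The std::unique_ptr technique from item 4 can be sketched as follows. `Registry` and `acquire_service` are toy stand-ins, not the real component services API; the point is that a handle owned by a unique_ptr with a releasing deleter is released exactly once on every path, including early error returns:

```cpp
#include <functional>
#include <memory>

// Toy registry that counts outstanding references, standing in for the
// service registry whose handles must be acquired and released in pairs.
struct Registry {
  int refs = 0;
  int *acquire() {
    ++refs;
    return &refs;
  }
  void release(int *) { --refs; }
};

// The deleter calls release(), so the reference cannot leak and cannot be
// double-released: ownership lives in exactly one place.
using ServiceHandle = std::unique_ptr<int, std::function<void(int *)>>;

ServiceHandle acquire_service(Registry &r) {
  return ServiceHandle(r.acquire(), [&r](int *h) { r.release(h); });
}
```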
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 26, 2023
Part of WL#15135 Certificate Architecture This patch introduces class TlsKeyManager, containing all TLS authentication and key management logic. A single instance of TlsKeyManager in each node owns the local NodeCertificate, an SSL_CTX, and a table holding the serial numbers and expiration dates of all peer certificates. A large set of TLS-related error codes is introduced in the file TlsKeyErrors.h. The unit test testTlsKeyManager-t tests TLS authentication over client/server connections on localhost. Change-Id: I2ee42efc268219639691f73a1d7638a336844d88
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 26, 2023
Implement ndb$certificates base table and certificates view. Update results for tests ndbinfo and ndbinfo plans. Change-Id: Iab1b89f5eb82ac1b3e0c049dd55eb7d07394070a
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 26, 2023
Move client_authenticate() out of SocketClient::connect() (which returns void) into a separate SocketClient::authenticate() method which can return a value.

In SocketAuthenticator, change the signature of the authentication routines to return an int (which can represent a result code) rather than a bool. Results less than AuthOk represent failure, and results greater than or equal to AuthOk represent success.

Remove the username and password variables from SocketAuthSimple; make them constant strings in the implementation.

There are no functional changes.

Change-Id: I4c25e99f1b9b692db39213dfa63352da8993a8fb
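The result-code convention can be sketched like this. Only `AuthOk` and the below/at-or-above comparison rule come from the description above; the other code names and values are hypothetical:

```cpp
// Result codes for authentication routines: any value below AuthOk means
// failure, AuthOk or above means success. This leaves room for distinct
// failure reasons and for "success with extra info" codes, which a plain
// bool could not express.
enum AuthResult : int {
  AuthHandshakeFailure = -2,  // hypothetical distinct failure reason
  AuthFailure = -1,           // generic failure
  AuthOk = 0,                 // plain success
  AuthOkWithInfo = 1,         // hypothetical success variant
};

inline bool auth_succeeded(int result) { return result >= AuthOk; }
```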
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 26, 2023
This changes TransporterRegistry::connect_ndb_mgmd() to return NdbSocket rather than ndb_socket_t. It extends the StartTls test in testMgmd to test upgrading the TLS MGM protocol socket to a transporter. Change-Id: Ic3b9ccf39ec78ed25705a4bbbdc5ac2953a35611
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 26, 2023
Post-push fix. The NdbSocket::copy method duplicated the mutex pointer, leaving two objects referring to one mutex. Typically the source will destroy its mutex, making it unusable for the target object. Fix by using the transfer method instead.

Change-Id: I199c04b870049498463903f6358f79a38649f543
altmannmarcelo
pushed a commit
to altmannmarcelo/percona-xtrabackup
that referenced
this pull request
Oct 26, 2023
Post-push fix. The NdbSocket::copy method duplicated the mutex pointer, leaving two objects referring to one mutex. Typically the source will destroy its mutex, making it unusable for the target object. Remove the copy method.

Change-Id: I2cc36128c343c7bab08d96651b12946ecd87210c
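The ownership bug and the transfer-based fix can be sketched with a move-only toy type (hypothetical, not the real NdbSocket): copying would leave two objects referring to one mutex, so copying is deleted and the move operations transfer ownership, guaranteeing a single owner at all times.

```cpp
#include <utility>

struct OwnedMutex {
  int *mutex = nullptr;  // stand-in for the owned NdbMutex*

  OwnedMutex() = default;
  explicit OwnedMutex(int *m) : mutex(m) {}

  // Copying removed: it would duplicate the pointer, and destroying one
  // object would invalidate the mutex still referenced by the other.
  OwnedMutex(const OwnedMutex &) = delete;
  OwnedMutex &operator=(const OwnedMutex &) = delete;

  // Transfer: the source gives up its pointer, so only one owner remains.
  OwnedMutex(OwnedMutex &&other) noexcept : mutex(other.mutex) {
    other.mutex = nullptr;
  }
  OwnedMutex &operator=(OwnedMutex &&other) noexcept {
    std::swap(mutex, other.mutex);
    return *this;
  }
};
```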
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
… THE SQL

If the argument to a window function contains a subquery, the access path of that subquery would be printed twice when doing 'EXPLAIN FORMAT=TREE'. When using the Hypergraph optimizer, the subquery path was not printed at all, whether using FORMAT=TREE or FORMAT=JSON.

This commit fixes this by ensuring that we ignore duplicate paths, and (for Hypergraph) by traversing the structures needed to find the relevant Item_subselect objects.

Change-Id: I2abedcf690294f98ce169b74e53f042f46c47a45
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
… THE SQL Post-push fix: Cherry-picking the fix onto mysql-trunk introduced an unintended duplication of a code block, causing a shadowing-warning when building with g++. This commit corrects that. Change-Id: I1b279818ca0d30e32fc8dabb76c647120b531e8f
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
Problem
================================
Group Replication ASAN run failing without any symptom of a leak, but with shutdown issues:

worker[6] Shutdown report from /dev/shm/mtr-3771884/var-gr-debug/6/log/mysqld.1.err after tests:
 group_replication.gr_flush_logs
 group_replication.gr_delayed_initialization_thread_handler_error
 group_replication.gr_sbr_verifications
 group_replication.gr_server_uuid_matches_group_name_bootstrap
 group_replication.gr_stop_async_on_stop_gr
 group_replication.gr_certifier_message_same_member
 group_replication.gr_ssl_mode_verify_identity_error_xcom

Analysis and Fix
================================
It ended up being a leak on the gr_ssl_mode_verify_identity_error_xcom test:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7f1709fbe1c7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
#1 0x7f16ea0df799 in xcom_tcp_server_startup(Xcom_network_provider*) (/export/home/tmp/BUG35594709/mysql-trunk/BIN-ASAN/plugin_output_directory/group_replication.so+0x65d799)
percona#2 0x7f170751e2b2 (/lib/x86_64-linux-gnu/libstdc++.so.6+0xdc2b2)

This happens because we delegated incoming connections cleanup to the external consumer in incoming_connection_task. Since it calls incoming_connection() from Network_provider_manager, in case of a concurrent stop, a connection could be left orphan in the shared atomic due to the lack of an Active Provider, thus creating a memory leak. The solution is to make this cleanup on Network_provider_manager, in both the stop_provider() and stop_all_providers() methods, thus ensuring that no incoming connection leaks. Change-Id: I2367c37608ad075dee63785e9f908af5e81374ca
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
Post push fix. In the test program testTlsKeyManager-t a struct sockaddr pointer was passed to inet_ntop instead of struct in_addr for AF_INET and struct in6_addr for AF_INET6. That caused wrong addresses to be printed on error:

not ok 26 - Client cert for test hostname is OK
>>> Test of address 2.0.0.0 for msdn.microsoft.com returned error authorization failure: bad hostname
not ok 27 - Client cert for test hostname is OK
>>> Test of address a00::2620:1ec:46:0 for msdn.microsoft.com returned error authorization failure: bad hostname
not ok 28 - Client cert for test hostname is OK
>>> Test of address a00::2620:1ec:bdf:0 for msdn.microsoft.com returned error authorization failure: bad hostname

Should be 13.107.x.53 or 2620:1ec:x::53. Changed to use ndb_sockaddr and Ndb_inet_ntop instead. Change-Id: Iae4bebca26462f9b65c3232e9768c574e767b380
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
Move client_authenticate() out of SocketClient::connect() (which returns void) into a separate SocketClient::authenticate() method which can return a value. In SocketAuthenticator, change the signature of the authentication routines to return an int (which can represent a result code) rather than a bool. Results less than AuthOk represent failure, and results greater than or equal to AuthOk represent success. Remove the username and password variables from SocketAuthSimple; make them constant strings in the implementation. There are no functional changes. Change-Id: I4c25e99f1b9b692db39213dfa63352da8993a8fb
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
This changes TransporterRegistry::connect_ndb_mgmd() to return NdbSocket rather than ndb_socket_t. Back-ported from mysql-trunk. Change-Id: Ic3b9ccf39ec78ed25705a4bbbdc5ac2953a35611
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
…N THE SQL Post-push fix. ASan reported memory leaks from some EXPLAIN tests, such as main.explain_tree. The reason was that the Json_dom objects that were discarded to avoid describing a subquery twice, were not properly destroyed. The EXPLAIN code uses unique_ptr to make sure the Json_dom objects are destroyed, but there are windows in which the objects only exist as unmanaged raw pointers. This patch closes the window which caused this memory leak by changing ExplainChild::obj from a raw pointer to a unique_ptr, so that it gets destroyed even if it doesn't make it into the final tree that describes the full plan. Change-Id: I0f0885da867e8a34335ff11f3ae9da883a878ba4
satya-bodapati
pushed a commit
to satya-bodapati/percona-xtrabackup
that referenced
this pull request
Jan 24, 2024
BUG#35949017 Schema dist setup lockup
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [percona#2]
Bug#35948153 Problem setting up events due to stale NdbApi dictionary cache [#1]
Bug#32550019 Missing check for ndb_schema_result leads to schema dist timeout
Change-Id: I4a32197992bf8b6899892f21587580788f828f34
aybek
pushed a commit
to aybek/percona-xtrabackup
that referenced
this pull request
May 30, 2024
cache [percona#2] This is the second patch, solving the problem of inefficient cache invalidation when invalidating a table which is known to be invalid but which may or may not be in the cache.

Problem: Currently the only way to invalidate a table in the NdbApi dictionary cache is to open the table and then mark it as invalid. If the table does not exist in the cache, it will still have to be opened and thus fetched from NDB. This means that in order to get the latest table definition it has to be fetched two times, even though the table definition is not already in the cache. This is inefficient.

Analysis: In order to avoid the double roundtrip there needs to be a function which marks the table as invalid only if it exists in the cache.

Fix: Implement an NdbApi function that invalidates a table by name if it exists in the cache. Replace the old pattern of opening the table in order to invalidate it with the new function. The old pattern is still a valid use case for invalidating a table after having worked with it. Change-Id: I20f275f1fed76d991330348bea4ae72548366467
aybek
pushed a commit
to aybek/percona-xtrabackup
that referenced
this pull request
May 30, 2024
…nt on Windows and posix [percona#2] The posix version of NdbProcess::start_process assumed the arguments were quoted using " and \ in a way that resembles POSIX sh quoting, and unquoted spaces were treated as argument separators, splitting one argument into several. But the Windows version of NdbProcess::start_process did not treat options in the same way, and the Windows C runtime (CRT) parses arguments differently from POSIX sh. Note that if a program does not use the CRT, it may treat the command line in its own way, and the quoting done for the CRT will mess up the command line. On Windows, NdbProcess::start_process should only be used for CRT-compatible programs with respect to argument quoting on the command line, or one should make sure the given arguments will not trigger unwanted quoting. This may be relevant for ndb_sign_keys and --CA-tool=<batch-file>. Instead, this patch changes the intention of start_process to pass arguments without modification from the caller to the called C program's argument vector in its main entry function. In the posix path that is easy: just pass the incoming C strings to execvp. On Windows one needs to quote for the Windows CRT when composing the command line. Note that the command part of the command line has different quoting rules than the following arguments. Change-Id: I763530c634d3ea460b24e6e01061bbb5f3321ad4
aybek
pushed a commit
to aybek/percona-xtrabackup
that referenced
this pull request
May 30, 2024
Problem: Starting `ndb_mgmd --bind-address` may cause abnormal program termination in the MgmtSrvr destructor when ndb_mgmd restarts itself.

Core was generated by `ndb_mgmd --defa'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f8ce4066b8f in raise () from /lib64/libc.so.6
#1 0x00007f8ce4039ea5 in abort () from /lib64/libc.so.6
percona#2 0x00007f8ce40a7d97 in __libc_message () from /lib64/libc.so.6
percona#3 0x00007f8ce40af08c in malloc_printerr () from /lib64/libc.so.6
percona#4 0x00007f8ce40b132d in _int_free () from /lib64/libc.so.6
percona#5 0x00000000006e9ffe in MgmtSrvr::~MgmtSrvr (this=0x28de4b0) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:890
percona#6 0x00000000006ea09e in MgmtSrvr::~MgmtSrvr (this=0x2) at mysql/8.0/storage/ndb/src/mgmsrv/MgmtSrvr.cpp:849
percona#7 0x0000000000700d94 in mgmd_run () at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:260
percona#8 0x0000000000700775 in mgmd_main (argc=<optimized out>, argv=0x28041d0) at mysql/8.0/storage/ndb/src/mgmsrv/main.cpp:479

Analysis: While starting up, ndb_mgmd allocates memory for bind_address in order to potentially rewrite the parameter. When ndb_mgmd restarts itself the memory is released, and the dangling pointer causes a double free.

Fix: Drop support for bind_address=[::]; it is not documented anywhere, is not useful, and doesn't work. This means the need to rewrite bind_address is gone, and the bind_address argument needs neither alloc nor free. Change-Id: I7797109b9d8391394587188d64d4b1f398887e94
release notes for pxb-2.2.9