Skip to content

Commit

Permalink
[upstream] PS-9018: Replica stalls completely when log_replica_update…
Browse files Browse the repository at this point in the history
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    percona#4  Commit_stage_manager::enroll_for
    percona#5  MYSQL_BIN_LOG::change_stage
    percona#6  MYSQL_BIN_LOG::ordered_commit
    percona#7  MYSQL_BIN_LOG::commit
    percona#8  ha_commit_trans
    percona#9  trans_commit_implicit
    percona#10 mysql_create_like_table
    percona#11 Sql_cmd_create_table::execute
    percona#12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    percona#4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    percona#5  Gtid_state::update_commit_group
    percona#6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    percona#7  Commit_order_manager::finish
    percona#8  Commit_order_manager::wait_and_finish
    percona#9  ha_commit_low
    percona#10 trx_coordinator::commit_in_engines
    percona#11 MYSQL_BIN_LOG::commit
    percona#12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
  • Loading branch information
venkatesh-prasad-v authored and oleksandr-kachan committed Apr 16, 2024
1 parent f832eb7 commit 0fdfe90
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 0 deletions.
7 changes: 7 additions & 0 deletions sql/rpl_commit_stage_manager.cc
Original file line number Diff line number Diff line change
Expand Up @@ -476,7 +476,14 @@ THD *Commit_stage_manager::fetch_queue_skip_acquire_lock(StageID stage) {
void Commit_stage_manager::process_final_stage_for_ordered_commit_group(
THD *first) {
if (first != nullptr) {
/*
The below call to update_commit_group() function accesses the array
`commit_group_sidnos` and needs to be protected with
MYSQL_BIN_LOG::LOCK_commit.
*/
mysql_mutex_lock(mysql_bin_log.get_commit_lock());
gtid_state->update_commit_group(first);
mysql_mutex_unlock(mysql_bin_log.get_commit_lock());
signal_done(first, Commit_stage_manager::COMMIT_ORDER_FLUSH_STAGE);
}
}
Expand Down
3 changes: 3 additions & 0 deletions sql/rpl_gtid_state.cc
Original file line number Diff line number Diff line change
Expand Up @@ -157,6 +157,9 @@ void Gtid_state::broadcast_owned_sidnos(const THD *thd) {
void Gtid_state::update_commit_group(THD *first_thd) {
DBUG_TRACE;

// Assert that we already hold MYSQL_BIN_LOG::LOCK_commit here
mysql_mutex_assert_owner(mysql_bin_log.get_commit_lock());

bool gtid_threshold_breach = false;
/*
We are going to loop in all sessions of the group commit in order to avoid
Expand Down

0 comments on commit 0fdfe90

Please sign in to comment.