Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PS-8854: Move Percona's MTR tests to suite/percona and suite/percona_innodb #11

Merged
merged 5 commits into from
Aug 28, 2023

Conversation

inikep
Copy link
Owner

@inikep inikep commented Aug 23, 2023

No description provided.

inikep added a commit that referenced this pull request Aug 24, 2023
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.
@inikep inikep force-pushed the internal-8.1.0-linear branch 3 times, most recently from 346c764 to 6072e36 Compare August 25, 2023 10:27
Copy link

@percona-ysorokin percona-ysorokin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@inikep Just a suggestion
If we are already doing this, to me it also makes sense to extract 'percona_encryption_udf_*' tests into a separate suite. Probably component_encryption_udf (following existing conventions)

Copy link

@percona-ysorokin percona-ysorokin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@inikep inikep merged commit 10eb6c3 into internal-8.1.0-linear Aug 28, 2023
17 checks passed
@inikep inikep deleted the PS-8854-percona_innodb branch August 28, 2023 19:18
inikep added a commit that referenced this pull request Oct 16, 2023
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep added a commit that referenced this pull request Oct 16, 2023
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep added a commit that referenced this pull request Oct 25, 2023
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep added a commit that referenced this pull request Oct 26, 2023
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep added a commit that referenced this pull request Oct 26, 2023
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep added a commit that referenced this pull request Jan 17, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep added a commit that referenced this pull request Jan 17, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep pushed a commit that referenced this pull request Feb 8, 2024
TransporterRegistry and MgmtSrvr both open some MGM connections
to management servers. Use MGM TLS for these, and extend the
init_tls() method to add the MGM TLS requirement level.

Change-Id: Iea6d776eeb676bd979267b03461bf62d59ca25b1
inikep added a commit that referenced this pull request Feb 8, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep pushed a commit that referenced this pull request Mar 15, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep pushed a commit that referenced this pull request Apr 9, 2024
inikep pushed a commit that referenced this pull request Apr 9, 2024
inikep pushed a commit that referenced this pull request Jun 14, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Merge remote-tracking branch 'venki/PS-9018-8.0-gca' into HEAD

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep pushed a commit that referenced this pull request Sep 11, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep added a commit that referenced this pull request Sep 11, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
inikep pushed a commit that referenced this pull request Sep 12, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep added a commit that referenced this pull request Sep 12, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
inikep pushed a commit that referenced this pull request Sep 17, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep added a commit that referenced this pull request Sep 17, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
inikep added a commit that referenced this pull request Sep 25, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.
inikep pushed a commit that referenced this pull request Sep 25, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep added a commit that referenced this pull request Oct 28, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9369: Fix currently processed query comparison in audit_log

https://perconadev.atlassian.net/browse/PS-9369

The audit_log uses stack to keep track of table access operations being
performed in scope of one query. It compares last known table access query
string stored on top of this stack with actual query in audit event being
processed at the moment to decide if new record should be pushed to stack
or it is time to clean records from the stack.

Currently audit_log simply compares char* variables to decide if this is
the same query string. This approach doesn't work. As a result plugin looses
control of the stack size and it starts growing with the time consuming
memory. This issue is not noticable on short term server connections
as memory is freed once connection is closed. At the same time this
leads to extra memory consumption for long running server connections.

The following is done to fix the issue:
- Query is sent along with audit event as MYSQL_LEX_CSTRING structure.
  It is not correct to ignore MYSQL_LEX_CSTRING.length comparison as
  sometimes MYSQL_LEX_CSTRING.str pointer may be not iniialised
  properly. Added string length check to make sure structure contains
  any valid string.
- Used strncmp to compare actual strings instead of comparing char*
  variables.
inikep pushed a commit that referenced this pull request Oct 28, 2024
…ocal DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep pushed a commit that referenced this pull request Nov 6, 2024
… for connection xxx'.

The new iterator based explains are not impacted.

The issue here is a race condition. More than one thread is using the
query term iterator at the same time (whoch is neithe threas safe nor
reantrant), and part of its state is in the query terms being visited
which leads to interference/race conditions.

a) the explain thread

uses an iterator here:

   Sql_cmd_explain_other_thread::execute

is inspecting the Query_expression of the running query
calling master_query_expression()->find_blocks_query_term which uses
an iterator over the query terms in the query expression:

   for (auto qt : query_terms<>()) {
       if (qt->query_block() == qb) {
           return qt;
       }
   }

the above search fails to find qb due to the interference of the
thread b), see below, and then tries to access a nullpointer:

    * thread percona#36, name = ‘connection’, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  frame #0: 0x000000010bb3cf0d mysqld`Query_block::type(this=0x00007f8f82719088) const at sql_lex.cc:4441:11
  frame #1: 0x000000010b83763e mysqld`(anonymous namespace)::Explain::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:792:50
  frame #2: 0x000000010b83cc4d mysqld`(anonymous namespace)::Explain_join::explain_select_type(this=0x00007000020611b8) at opt_explain.cc:1487:21
  frame #3: 0x000000010b837c34 mysqld`(anonymous namespace)::Explain::prepare_columns(this=0x00007000020611b8) at opt_explain.cc:744:26
  frame #4: 0x000000010b83ea0e mysqld`(anonymous namespace)::Explain_join::explain_qep_tab(this=0x00007000020611b8, tabnum=0) at opt_explain.cc:1415:32
  frame #5: 0x000000010b83ca0a mysqld`(anonymous namespace)::Explain_join::shallow_explain(this=0x00007000020611b8) at opt_explain.cc:1364:9
  frame #6: 0x000000010b83379b mysqld`(anonymous namespace)::Explain::send(this=0x00007000020611b8) at opt_explain.cc:770:14
  frame #7: 0x000000010b834147 mysqld`explain_query_specification(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, query_term=0x00007f8f82719088, ctx=CTX_JOIN) at opt_explain.cc:2088:20
  frame #8: 0x000000010bd36b91 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f82719088) at sql_union.cc:1519:11
  frame #9: 0x000000010bd36c68 mysqld`Query_expression::explain_query_term(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, qt=0x00007f8f8271d748) at sql_union.cc:1526:13
  frame #10: 0x000000010bd373f7 mysqld`Query_expression::explain(this=0x00007f8f7a090360, explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00) at sql_union.cc:1591:7
  frame #11: 0x000000010b835820 mysqld`mysql_explain_query_expression(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2392:17
  frame #12: 0x000000010b835400 mysqld`explain_query(explain_thd=0x00007f8fbb111e00, query_thd=0x00007f8fbb919c00, unit=0x00007f8f7a090360) at opt_explain.cc:2353:13
 * frame percona#13: 0x000000010b8363e4 mysqld`Sql_cmd_explain_other_thread::execute(this=0x00007f8fba585b68, thd=0x00007f8fbb111e00) at opt_explain.cc:2531:11
  frame percona#14: 0x000000010bba7d8b mysqld`mysql_execute_command(thd=0x00007f8fbb111e00, first_level=true) at sql_parse.cc:4648:29
  frame percona#15: 0x000000010bb9e230 mysqld`dispatch_sql_command(thd=0x00007f8fbb111e00, parser_state=0x0000700002065de8) at sql_parse.cc:5303:19
  frame percona#16: 0x000000010bb9a4cb mysqld`dispatch_command(thd=0x00007f8fbb111e00, com_data=0x0000700002066e38, command=COM_QUERY) at sql_parse.cc:2135:7
  frame percona#17: 0x000000010bb9c846 mysqld`do_command(thd=0x00007f8fbb111e00) at sql_parse.cc:1464:18
  frame percona#18: 0x000000010b2f2574 mysqld`handle_connection(arg=0x0000600000e34200) at connection_handler_per_thread.cc:304:13
  frame percona#19: 0x000000010e072fc4 mysqld`pfs_spawn_thread(arg=0x00007f8fba8160b0) at pfs.cc:3051:3
  frame percona#20: 0x00007ff806c2b202 libsystem_pthread.dylib`_pthread_start + 99
  frame percona#21: 0x00007ff806c26bab libsystem_pthread.dylib`thread_start + 15

b) the query thread being explained is itself performing LEX::cleanup
and as part of the iterates over the query terms, but still allows
EXPLAIN of the query plan since

   thd->query_plan.set_query_plan(SQLCOM_END, ...)

hasn't been called yet.

     20:frame: Query_terms<(Visit_order)1, (Visit_leaves)0>::Query_term_iterator::operator++() (in mysqld) (query_term.h:613)
     21:frame: Query_expression::cleanup(bool) (in mysqld) (sql_union.cc:1861)
     22:frame: LEX::cleanup(bool) (in mysqld) (sql_lex.h:4286)
     30:frame: Sql_cmd_dml::execute(THD*) (in mysqld) (sql_select.cc:799)
     31:frame: mysql_execute_command(THD*, bool) (in mysqld) (sql_parse.cc:4648)
     32:frame: dispatch_sql_command(THD*, Parser_state*) (in mysqld) (sql_parse.cc:5303)
     33:frame: dispatch_command(THD*, COM_DATA const*, enum_server_command) (in mysqld) (sql_parse.cc:2135)
     34:frame: do_command(THD*) (in mysqld) (sql_parse.cc:1464)
     57:frame: handle_connection(void*) (in mysqld) (connection_handler_per_thread.cc:304)
     58:frame: pfs_spawn_thread(void*) (in mysqld) (pfs.cc:3053)
     65:frame: _pthread_start (in libsystem_pthread.dylib) + 99
     66:frame: thread_start (in libsystem_pthread.dylib) + 15

Solution:

This patch solves the issue by removing iterator state from
Query_term, making the query_term iterators thread safe. This solution
labels every child query_term with its index in its parent's
m_children vector.  The iterator can therefore easily compute the next
child to visit based on Query_term::m_sibling_idx.

A unit test case is added to check reentrancy.

One can also manually verify that we have no remaining race condition
by running two client connections files (with \. <file>) with a big
number of copies of the repro query in one connection and a big number
of EXPLAIN format=json FOR <connection>, e.g.

    EXPLAIN FORMAT=json FOR CONNECTION 8\G

in the other. The actual connection number would need to verified
in connection one, of course.

Change-Id: Ie7d56610914738ccbbecf399ccc4f465f7d26ea7
inikep pushed a commit that referenced this pull request Nov 11, 2024
…s=0 and a local DDL

         executed

https://perconadev.atlassian.net/browse/PS-9018

Problem
-------
In high concurrency scenarios, MySQL replica can enter into a deadlock due to a
race condition between the replica applier thread and the client thread
performing a binlog group commit.

Analysis
--------
It needs at least 3 threads for this deadlock to happen

1. One client thread
2. Two replica applier threads

How this deadlock happens?
--------------------------
0. Binlog is enabled on replica, but log_replica_updates is disabled.

1. Initially, both "Commit Order" and "Binlog Flush" queues are empty.

2. Replica applier thread 1 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

3. Since both "Commit Order" and "Binlog Flush" queues are empty, the applier
   thread 1

   3.1. Becomes leader (In Commit_stage_manager::enroll_for()).

   3.2. Registers in the commit order queue.

   3.3. Acquires the lock MYSQL_BIN_LOG::LOCK_log.

   3.4. Commit Order queue is emptied, but the lock MYSQL_BIN_LOG::LOCK_log is
        not yet released.

   NOTE: SE commit for applier thread is already done by the time it reaches
         here.

4. Replica applier thread 2 enters the group commit pipeline to register in the
   "Commit Order" queue since `log-replica-updates` is disabled on the replica
   node.

5. Since the "Commit Order" queue is empty (emptied by applier thread 1 in 3.4), the
   applier thread 2

   5.1. Becomes leader (In Commit_stage_manager::enroll_for())

   5.2. Registers in the commit order queue.

   5.3. Tries to acquire the lock MYSQL_BIN_LOG::LOCK_log. Since it is held by applier
        thread 1 it will wait until the lock is released.

6. Client thread enters the group commit pipeline to register in the
   "Binlog Flush" queue.

7. Since "Commit Order" queue is not empty (there is applier thread 2 in the
   queue), it enters the conditional wait `m_stage_cond_leader` with an
   intention to become the leader for both the "Binlog Flush" and
   "Commit Order" queues.

8. Applier thread 1 releases the lock MYSQL_BIN_LOG::LOCK_log and proceeds to update
   the GTID by calling gtid_state->update_commit_group() from
   Commit_order_manager::flush_engine_and_signal_threads().

9. Applier thread 2 acquires the lock MYSQL_BIN_LOG::LOCK_log.

   9.1. It checks if there is any thread waiting in the "Binlog Flush" queue
        to become the leader. Here it finds the client thread waiting to be
        the leader.

   9.2. It releases the lock MYSQL_BIN_LOG::LOCK_log and signals on the
        cond_var `m_stage_cond_leader` and enters a conditional wait until the
        thread's `tx_commit_pending` is set to false by the client thread
       (will be done in the
       Commit_stage_manager::process_final_stage_for_ordered_commit_group()
       called by client thread from fetch_and_process_flush_stage_queue()).

10. The client thread wakes up from the cond_var `m_stage_cond_leader`.  The
    thread has now become a leader and it is its responsibility to update GTID
    of applier thread 2.

    10.1. It acquires the lock MYSQL_BIN_LOG::LOCK_log.

    10.2. Returns from `enroll_for()` and proceeds to process the
          "Commit Order" and "Binlog Flush" queues.

    10.3. Fetches the "Commit Order" and "Binlog Flush" queues.

    10.4. Performs the storage engine flush by calling ha_flush_logs() from
          fetch_and_process_flush_stage_queue().

    10.5. Proceeds to update the GTID of threads in "Commit Order" queue by
          calling gtid_state->update_commit_group() from
          Commit_stage_manager::process_final_stage_for_ordered_commit_group().

11. At this point, we will have

    - Client thread performing GTID update on behalf if applier thread 2 (from step 10.5), and
    - Applier thread 1 performing GTID update for itself (from step 8).

    Due to the lack of proper synchronization between the above two threads,
    there exists a time window where both threads can call
    gtid_state->update_commit_group() concurrently.

    In subsequent steps, both threads simultaneously try to modify the contents
    of the array `commit_group_sidnos` which is used to track the lock status of
    sidnos. This concurrent access to `update_commit_group()` can cause a
    lock-leak resulting in one thread acquiring the sidno lock and not
    releasing at all.

-----------------------------------------------------------------------------------------------------------
Client thread                                           Applier Thread 1
-----------------------------------------------------------------------------------------------------------
update_commit_group() => global_sid_lock->rdlock();     update_commit_group() => global_sid_lock->rdlock();

calls update_gtids_impl_lock_sidnos()                   calls update_gtids_impl_lock_sidnos()

set commit_group_sidno[2] = true                        set commit_group_sidno[2] = true

                                                        lock_sidno(2) -> successful

lock_sidno(2) -> waits

                                                        update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

                                                        if (commit_group_sidnos[2]) {
                                                          unlock_sidno(2);
                                                          commit_group_sidnos[2] = false;
                                                        }

                                                        Applier thread continues..

lock_sidno(2) -> successful

update_gtids_impl_own_gtid() -> Add the thd->owned_gtid in `executed_gtids()`

if (commit_group_sidnos[2]) { <=== this check fails and lock is not released.
  unlock_sidno(2);
  commit_group_sidnos[2] = false;
}

Client thread continues without releasing the lock
-----------------------------------------------------------------------------------------------------------

12. As the above lock-leak can also happen the other way i.e, the applier
    thread fails to unlock, there can be different consequences hereafter.

13. If the client thread continues without releasing the lock, then at a later
    stage, it can enter into a deadlock with the applier thread performing a
    GTID update with stack trace.

    Client_thread
    -------------
    #1  __GI___lll_lock_wait
    #2  ___pthread_mutex_lock
    #3  native_mutex_lock                                       <= waits for commit lock while holding sidno lock
    #4  Commit_stage_manager::enroll_for
    #5  MYSQL_BIN_LOG::change_stage
    #6  MYSQL_BIN_LOG::ordered_commit
    #7  MYSQL_BIN_LOG::commit
    #8  ha_commit_trans
    #9  trans_commit_implicit
    #10 mysql_create_like_table
    #11 Sql_cmd_create_table::execute
    #12 mysql_execute_command
    percona#13 dispatch_sql_command

    Applier thread
    --------------
    #1  ___pthread_mutex_lock
    #2  native_mutex_lock
    #3  safe_mutex_lock
    #4  Gtid_state::update_gtids_impl_lock_sidnos               <= waits for sidno lock
    #5  Gtid_state::update_commit_group
    #6  Commit_order_manager::flush_engine_and_signal_threads   <= acquires commit lock here
    #7  Commit_order_manager::finish
    #8  Commit_order_manager::wait_and_finish
    #9  ha_commit_low
    #10 trx_coordinator::commit_in_engines
    #11 MYSQL_BIN_LOG::commit
    #12 ha_commit_trans
    percona#13 trans_commit
    percona#14 Xid_log_event::do_commit
    percona#15 Xid_apply_log_event::do_apply_event_worker
    percona#16 Slave_worker::slave_worker_exec_event
    percona#17 slave_worker_exec_job_group
    percona#18 handle_slave_worker

14. If the applier thread continues without releasing the lock, then at a later
    stage, it can perform recursive locking while setting the GTID for the next
    transaction (in set_gtid_next()).

    In debug builds the above case hits the assertion
    `safe_mutex_assert_not_owner()` meaning the lock is already acquired by the
    replica applier thread when it tries to re-acquire the lock.

Solution
--------
In the above problematic example, when seen from each thread
individually, we can conclude that there is no problem in the order of lock
acquisition, thus there is no need to change the lock order.

However, the root cause for this problem is that multiple threads can
concurrently access to the array `Gtid_state::commit_group_sidnos`.

In its initial implementation, it was expected that threads should
hold the `MYSQL_BIN_LOG::LOCK_commit` before modifying its contents. But it
was not considered when upstream implemented WL#7846 (MTS:
slave-preserve-commit-order when log-slave-updates/binlog is disabled).

With this patch, we now ensure that `MYSQL_BIN_LOG::LOCK_commit` is acquired
when the client thread (binlog flush leader) when it tries to perform GTID
update on behalf of threads waiting in "Commit Order" queue, thus providing a
guarantee that `Gtid_state::commit_group_sidnos` array is never accessed
without the protection of `MYSQL_BIN_LOG::LOCK_commit`.
inikep added a commit that referenced this pull request Nov 11, 2024
PS-5741: Incorrect use of memset_s in keyring_vault.

Fixed the usage of memset_s. The arguments should be:
void memset_s(void *dest, size_t dest_max, int c, size_t n)
where the 2nd argument is size of buffer and the 3rd is
argument is character to fill.

---------------------------------------------------------------------------

PS-7769 - Fix use-after-return error in audit_log_exclude_accounts_validate

---

*Problem:*

`st_mysql_value::val_str` might return a pointer to `buf` which after
the function called is deleted. Therefore the value in `save`, after
reuturnin from the function, is invalid.

In this particular case, the error is not manifesting as val_str`
returns memory allocated with `thd_strmake` and it does not use `buf`.

*Solution:*

Allocate memory with `thd_strmake` so the memory in `save` is not local.

---------------------------------------------------------------------------

Fix test main.bug12969156 when WITH_ASAN=ON

*Problem:*

ASAN complains about stack-buffer-overflow on function `mysql_heartbeat`:

```
==90890==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fe746d06d14 at pc 0x7fe760f5b017 bp 0x7fe746d06cd0 sp 0x7fe746d06478
WRITE of size 24 at 0x7fe746d06d14 thread T16777215

Address 0x7fe746d06d14 is located in stack of thread T26 at offset 340 in frame
    #0 0x7fe746d0a55c in mysql_heartbeat(void*) /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:62

  This frame has 4 object(s):
    [48, 56) 'result' (line 66)
    [80, 112) '_db_stack_frame_' (line 63)
    [144, 200) 'tm_tmp' (line 67)
    [240, 340) 'buffer' (line 65) <== Memory access at offset 340 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
Thread T26 created by T25 here:
    #0 0x7fe760f5f6d5 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:216
    #1 0x557ccbbcb857 in my_thread_create /home/yura/ws/percona-server/mysys/my_thread.c:104
    #2 0x7fe746d0b21a in daemon_example_plugin_init /home/yura/ws/percona-server/plugin/daemon_example/daemon_example.cc:148
    #3 0x557ccb4c69c7 in plugin_initialize /home/yura/ws/percona-server/sql/sql_plugin.cc:1279
    #4 0x557ccb4d19cd in mysql_install_plugin /home/yura/ws/percona-server/sql/sql_plugin.cc:2279
    #5 0x557ccb4d218f in Sql_cmd_install_plugin::execute(THD*) /home/yura/ws/percona-server/sql/sql_plugin.cc:4664
    #6 0x557ccb47695e in mysql_execute_command(THD*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5160
    #7 0x557ccb47977c in mysql_parse(THD*, Parser_state*, bool) /home/yura/ws/percona-server/sql/sql_parse.cc:5952
    #8 0x557ccb47b6c2 in dispatch_command(THD*, COM_DATA const*, enum_server_command) /home/yura/ws/percona-server/sql/sql_parse.cc:1544
    #9 0x557ccb47de1d in do_command(THD*) /home/yura/ws/percona-server/sql/sql_parse.cc:1065
    #10 0x557ccb6ac294 in handle_connection /home/yura/ws/percona-server/sql/conn_handler/connection_handler_per_thread.cc:325
    #11 0x557ccbbfabb0 in pfs_spawn_thread /home/yura/ws/percona-server/storage/perfschema/pfs.cc:2198
    #12 0x7fe760ab544f in start_thread nptl/pthread_create.c:473
```

The reason is that `my_thread_cancel` is used to finish the daemon thread. This is not and orderly way of finishing the thread. ASAN does not register the stack variables are not used anymore which generates the error above.

This is a benign error as all the variables are on the stack.

*Solution*:

Finish the thread in orderly way by using a signalling variable.

---------------------------------------------------------------------------

PS-8204: Fix XML escape rules for audit plugin

https://jira.percona.com/browse/PS-8204

There was a wrong length specified for some XML
escape rules. As a result of this terminating null symbol from
replacement rule was copied into resulting string. This lead to
quer text truncation in audit log file.
In addition added empty replacement rules for '\b' and 'f' symbols
which just remove them from resulting string. These symboles are
not supported in XML 1.0.

---------------------------------------------------------------------------

PS-8854: Add main.percona_udf MTR test

Add a test to check FNV1A_64, FNV_64, and MURMUR_HASH user-defined functions.

---------------------------------------------------------------------------

PS-9218: Merge MySQL 8.4.0 (fix gcc-14 build)

https://perconadev.atlassian.net/browse/PS-9218
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants