Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vats 5.7 #3

Merged
merged 27 commits into from
Dec 8, 2016
Merged

Vats 5.7 #3

merged 27 commits into from
Dec 8, 2016

Conversation

sensssz
Copy link

@sensssz sensssz commented Dec 7, 2016

This patch uses a new implementation of VATS for Percona 5.7.

laurynas-biveinis and others added 27 commits October 12, 2016 15:53
@laurynas-biveinis laurynas-biveinis merged commit 15352a4 into laurynas-biveinis:vats-5.7 Dec 8, 2016
laurynas-biveinis pushed a commit that referenced this pull request Jul 21, 2017
In WL-included builds ASAN run witnessed missed ~Query_log_event invocation.
The destruct-or was not called due to the WL's changes in the error propagation
that specifically affect LC MTS.
The failure is exposed in particular by rpl_trigger as the following
stack:

  #0 0x9ecd98 in __interceptor_malloc (/export/home/pb2/test/sb_2-22611026-1489061390.32/mysql-commercial-8.0.1-dmr-linux-x86_64-asan/bin/mysqld+0x9ecd98)
  #1 0x2b1a245 in my_raw_malloc(unsigned long, int) obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:209:12
  #2 0x2b1a245 in my_malloc obj/mysys/../../mysqlcom-pro-8.0.1-dmr/mysys/my_malloc.cc:72
  #3 0x2940590 in Query_log_event::Query_log_event(char const*, unsigned int, binary_log::Format_description_event const*, binary_log::Log_event_type) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:4343:46
  #4 0x293d235 in Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*, bool) obj/sql/../../mysqlcom-pro-8.0.1-dmr/sql/log_event.cc:1686:17
  #5 0x293b96f in Log_event::read_log_event()
  #6 0x2a2a1c9 in next_event(Relay_log_info*)

Previously before the WL
Mts_submode_logical_clock::wait_for_workers_to_finish() had not
returned any error even when Coordinator thread is killed.

The WL patch needed to refine such behavior, but at doing so
it also had to attend log_event.cc::schedule_next_event() to register
an error to follow an existing pattern.
While my_error() does not take place the killed Coordinator continued
scheduling, ineffectively though - no Worker gets engaged (legal case
of deferred scheduling), and without noticing its killed status up to
a point when it resets the event pointer in
apply_event_and_update_pos():

  *ptr_ev= NULL; // announcing the event is passed to w-worker

The reset was intended for an assigned Worker to perform the event
destruction or by Coordinator itself when the event is deferred.
As neither is the current case the event gets unattended for its termination.

In contrast in the pre-WL sources the killed Coordinator does find a Worker.
However such Worker could be already down (errored out and exited), in
which case apply_event_and_update_pos() reasonably returns an error and executes

  delete ev

in exec_relay_log_event() error branch.

**Fixed** with deploying my_error() call in log_event.cc::schedule_next_event()
error branch which fits to the existing pattern.
THD::is_error() has been always checked by Coordinator before any attempt to
reset *ptr_ev= NULL. In the errored case Coordinator does not reset and
destroys the event itself in the exec_relay_log_event() error branch pretty similarly to
how the pre-WL sources do.

Tested against rpl_trigger and rpl suites to pass.

Approved on rb#15667.
laurynas-biveinis pushed a commit that referenced this pull request Jul 21, 2017
Some character sets are designated as MY_CS_STRNXFRM, meaning that sorting
needs to go through my_strnxfrm() (implemented by the charset), and some are
not, meaning that a client can do the strnxfrm itself based on
cs->sort_order. However, most of the logic related to the latter has been
removed already (e.g. filesort always uses my_strnxfrm() since 2003), and now
it's mostly in the way. The three main uses left are:

 1. A microoptimization for constructing sort keys in filesort.
 2. A home-grown implementation of Boyer-Moore for accelerating certain
    LIKE patterns that should probably be handled through FTS.
 3. Some optimizations to MyISAM prefix keys.

Given that our default collation (utf8mb4_0900_ai_ci) now is a strnxfrm-based
collation, the benefits of keeping these around for a narrow range of
single-byte locales (like latin1_swedish_ci, cp850 and a bunch of more
obscure locales) seems dubious. We seemingly can't remove the flag entirely
due to #3 seemingly affecting the on-disk MyISAM structure, but we can remove
the code for #1 and #2.

Change-Id: If974e490d451b7278355e33ab1fca993f446b792
laurynas-biveinis pushed a commit that referenced this pull request Oct 9, 2017
Patch #3:

find_child_doms() has two kinds of duplicate elimination. One for
removing duplicates that occur due to multiple ellipses, and one for
removing duplicates that occur due to auto-wrapping on results
returned by an ellipsis.

The first kind of duplicate elimination is performed by maintaining a
sorted set of results. The second kind performs a linear search of the
results to see if the value is already in the result vector.

This patch consolidates this code so that they both use the first kind
of duplicate elimination. It also makes sure that duplicate
elimination for auto-wrapping only happens for paths that could
produce duplicates (only if the auto-wrapping path leg comes after an
ellipsis path leg).

This is just a code cleanup. The microbenchmark results are
indistinguishable from noise.

Microbenchmarks (64-bit, Intel Core i7-4770 3.4 GHz, GCC 6.3):

BM_JsonDomSearchEllipsis              25443 ns/iter [+1.0%]
BM_JsonDomSearchEllipsis_OnlyOne      17757 ns/iter [+0.7%]
BM_JsonDomSearchKey                     128 ns/iter [ 0.0%]
BM_JsonBinarySearchEllipsis          233469 ns/iter [-0.9%]
BM_JsonBinarySearchEllipsis_OnlyOne  226089 ns/iter [-1.5%]
BM_JsonBinarySearchKey                   86 ns/iter [ 0.0%]

Change-Id: Ia62916098096032adf9f2ecc70a42c845f625c1c
laurynas-biveinis pushed a commit that referenced this pull request Jan 30, 2018
…TABLE_UPGRADE_GUARD

To repeat: cmake -DWITH_ASAN=1 -DWITH_ASAN_SCOPE=1
./mtr --mem --sanitize main.dd_upgrade_error

A few dd tests fail with:
==26861==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7000063bf5e8 at pc 0x00010d4dbe8b bp 0x7000063bda40 sp 0x7000063bda38
READ of size 8 at 0x7000063bf5e8 thread T2
    #0 0x10d4dbe8a in Prealloced_array<st_plugin_int**, 16ul>::empty() const prealloced_array.h:186
    #1 0x10d406a8b in lex_end(LEX*) sql_lex.cc:560
    #2 0x10dae4b6d in dd::upgrade::Table_upgrade_guard::~Table_upgrade_guard() (mysqld:x86_64+0x100f87b6d)
    #3 0x10dadc557 in dd::upgrade::migrate_table_to_dd(THD*, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, Stateless_allocator<char, dd::String_type_alloc, My_free_functor> > const&, bool) (mysqld:x86_64+0x100f7f557)
    #4 0x10dad7e85 in dd::upgrade::migrate_plugin_table_to_dd(THD*) (mysqld:x86_64+0x100f7ae85)
    #5 0x10daec6a1 in dd::upgrade::do_pre_checks_and_initialize_dd(THD*) upgrade.cc:1216
    #6 0x10cd0a5c0 in bootstrap::handle_bootstrap(void*) bootstrap.cc:336

Change-Id: I265ec6dd97ee8076aaf03763840c0cdf9e20325b
Fix: increase lifetime of 'LEX lex;' which is used by 'table_guard'
laurynas-biveinis added a commit that referenced this pull request Oct 18, 2018
A subset of binlog encryption tests was crashing with:

* thread percona#39, stop reason = signal SIGSTOP
    frame #0: 0x00007fff56063b66 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff5622e080 libsystem_pthread.dylib`pthread_kill + 333
    frame #2: 0x000000010657442b mysqld-debug`my_write_core(sig=11) at stacktrace.cc:278
    frame #3: 0x0000000104d84334 mysqld-debug`::handle_fatal_signal(sig=11) at signal_handler.cc:254
    frame #4: 0x00007fff56221f5a libsystem_platform.dylib`_sigtramp + 26
    frame #5: 0x00007fff5622934d libsystem_pthread.dylib`pthread_mutex_lock + 1
    frame #6: 0x0000000106578d05 mysqld-debug`native_mutex_lock(mutex=0x0000000000000000) at thr_mutex.h:93
    frame #7: 0x0000000106578a57 mysqld-debug`safe_mutex_lock(mp=0x0000000000000000, try_lock=false, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.cc:70
    frame #8: 0x000000010653cd3a mysqld-debug`my_mutex_lock(mp=0x00007ffb6b215038, file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", line=113) at thr_mutex.h:180
    frame #9: 0x000000010653b2cc mysqld-debug`inline_mysql_mutex_lock(that=0x00007ffb6b215038, src_file="/Users/laurynas/percona/mysql-server/mysys/mf_iocache2.cc", src_line=113) at mysql_mutex.h:267
  * frame #10: 0x000000010653b0d8 mysqld-debug`my_b_append_tell(info=0x00007ffb6b214fd8) at mf_iocache2.cc:113
    frame #11: 0x0000000105ed6a96 mysqld-debug`MYSQL_BIN_LOG::write_buffer(this=0x00007ffb6b214cb8, buf="", len=47, mi=0x00007ffb6b1f6a00) at binlog.cc:7128
    frame #12: 0x0000000105f4d54b mysqld-debug`queue_event(mi=0x00007ffb6b1f6a00, buf="", event_len=47, do_flush_mi=true) at rpl_slave.cc:7756
    frame #13: 0x0000000105f3a243 mysqld-debug`::handle_slave_io(arg=0x00007ffb6b1f6a00) at rpl_slave.cc:5382
    frame #14: 0x00000001065b87a5 mysqld-debug`pfs_spawn_thread(arg=0x00007ffb6a543af0) at pfs.cc:2836
    frame #15: 0x00007fff5622b661 libsystem_pthread.dylib`_pthread_body + 340
    frame #16: 0x00007fff5622b50d libsystem_pthread.dylib`_pthread_start + 377
    frame #17: 0x00007fff5622abf9 libsystem_pthread.dylib`thread_start + 13

This was caused by my_b_append_tell trying to lock a nullptr
IO_CACHE::append_buffer_lock. The lock was nullptr, because it's only
initialized for SEQ_READ_APPEND IO_CACHEs, whereas we have
WRITE_CACHE. This mismatch was introduced by WL#8599 [1] changing the
IO_CACHE type from the former to the latter.

Fix by using the correct API for the new IO_CACHE type: my_b_tell
instead of my_b_append_tell.

[1]:

commit dbd2ca2
Author: Joao Gramacho <joao.gramacho@oracle.com>
Date:   Tue Nov 1 06:45:39 2016 +0000

    WL#8599: Reduce contention in IO and SQL threads
    (...)
laurynas-biveinis added a commit that referenced this pull request Oct 18, 2018
create_table_info_t::create_table_def leaked memory in the case
enable_encryption(table) call failed:

worker[5] Sanitizer report from /tmp/results/PS/mysql-test/var/5/log/mysqld.2.err after tests:
 binlog_encryption.binlog_encryption_without_keyring group_replication.gr_change_master_hidden group_replication.gr_server_uuid_matches_group_name group_replication.gr_perfschema_connect_status group_replication.gr_single_primary_and_leader_election_on_error group_replication.gr_without_perfschema rpl.rpl_key_rotation
--------------------------------------------------------------------------
==14131==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1136 byte(s) in 1 object(s) allocated from:
    #0 0x7fe9233f1602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0xc692483 in ut_allocator<unsigned char>::allocate(unsigned long, unsigned char const*, unsigned int, bool, bool) storage/innobase/include/ut0new.h:608
    #2 0xc692483 in mem_heap_create_block_func(mem_block_info_t*, unsigned long, unsigned long) storage/innobase/mem/memory.cc:281
    #3 0xb99ff96 in mem_heap_create_func storage/innobase/include/mem0mem.ic:464
    #4 0xbae8604 in create_table_info_t::create_table_def(dd::Table const*) storage/innobase/handler/ha_innodb.cc:10349
    #5 0xbaee018 in create_table_info_t::create_table(dd::Table const*) storage/innobase/handler/ha_innodb.cc:12420
    #6 0xbaf1aba in int innobase_basic_ddl::create_impl<dd::Table>(THD*, char const*, TABLE*, HA_CREATE_INFO*, dd::Table*, bool, bool, bool, unsigned long, unsigned long) storage/innobase/handler/ha_innodb.cc:12805
    #7 0xbaf7e6a in ha_innobase::create(char const*, TABLE*, HA_CREATE_INFO*, dd::Table*) storage/innobase/handler/ha_innodb.cc:13756
    #8 0x2857f7a in ha_create_table(THD*, char const*, char const*, char const*, HA_CREATE_INFO*, List<Create_field> const*, bool, bool, dd::Table*) sql/handler.cc:5156
    #9 0x19d0d9f in rea_create_base_table sql/sql_table.cc:991
    #10 0x19d0d9f in create_table_impl sql/sql_table.cc:7118
    #11 0x19d37cf in mysql_create_table_no_lock(THD*, char const*, char const*, HA_CREATE_INFO*, Alter_info*, unsigned int, bool, bool*, handlerton**) sql/sql_table.cc:7200
    #12 0x19dffb2 in mysql_create_table(THD*, TABLE_LIST*, HA_CREATE_INFO*, Alter_info*) sql/sql_table.cc:7950
    #13 0x3b58b9b in Sql_cmd_create_table::execute(THD*) sql/sql_cmd_ddl_table.cc:319
    #14 0x15917c1 in mysql_execute_command(THD*, bool) sql/sql_parse.cc:4417
    #15 0x15b086e in mysql_parse(THD*, Parser_state*, bool) sql/sql_parse.cc:5139
    #16 0x8efc7fd in Query_log_event::do_apply_event(Relay_log_info const*, char const*, unsigned long) sql/log_event.cc:5295
    #17 0x8f7ea48 in Log_event::apply_event(Relay_log_info*) sql/log_event.cc:3882
    #18 0x91cb682 in apply_event_and_update_pos sql/rpl_slave.cc:4352
    #19 0x9215e69 in exec_relay_log_event sql/rpl_slave.cc:4812
    #20 0x9254685 in handle_slave_sql sql/rpl_slave.cc:6912
    #21 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #22 0x7fe9231436b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)

Fix by adding the missing mem_heap_free(heap) call.
laurynas-biveinis added a commit that referenced this pull request Oct 18, 2018
Avoid undefined behavior in audit_log_update_thd_local by avoiding
passing NULL as source pointer to memcpy, even with zero length.

The UBSan report fixed is

/usr/include/x86_64-linux-gnu/bits/string3.h:53:71: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x7fe5aad56fb1 in memcpy /usr/include/x86_64-linux-gnu/bits/string3.h:53
    #1 0x7fe5aad56fb1 in audit_log_update_thd_local plugin/audit_log/audit_log.cc:987
    #2 0x7fe5aad56fb1 in audit_log_notify plugin/audit_log/audit_log.cc:1105
    #3 0x1ecac37 in plugins_dispatch sql/sql_audit.cc:1284
    #4 0x1ecac37 in event_class_dispatch sql/sql_audit.cc:1322
    #5 0x1ecb311 in event_class_dispatch_error sql/sql_audit.cc:1340
    #6 0x1ed21b1 in mysql_audit_notify(THD*, mysql_event_connection_subclass_t, char const*, int) sql/sql_audit.cc:438
    #7 0x1350071 in check_connection sql/sql_connect.cc:868
    #8 0x1350071 in login_connection sql/sql_connect.cc:929
    #9 0x1357881 in thd_prepare_connection(THD*, bool) sql/sql_connect.cc:1084
    #10 0x1e66347 in handle_connection sql/conn_handler/connection_handler_per_thread.cc:313
    #11 0xb1913a3 in pfs_spawn_thread storage/perfschema/pfs.cc:2836
    #12 0x7fe5d352f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
    #13 0x7fe5d0bd741c in clone (/lib/x86_64-linux-gnu/libc.so.6+0x10741c)
laurynas-biveinis pushed a commit that referenced this pull request Jun 21, 2019
Upstream commit ID : fb-mysql-5.6.35/8cb1dc836b68f1f13e8b2655b2b8cb2d57f400b3
PS-5217 : Merge fb-prod201803

Summary:
Original report: https://jira.mariadb.org/browse/MDEV-15816

To reproduce this bug just following below steps,

client 1:
USE test;
CREATE TABLE t1 (i INT) ENGINE=MyISAM;
HANDLER t1 OPEN h;
CREATE TABLE t2 (i INT) ENGINE=RocksDB;
LOCK TABLES t2 WRITE;

client 2:
FLUSH TABLES WITH READ LOCK;

client 1:
INSERT INTO t2 VALUES (1);

So client 1 acquired the lock and set m_lock_rows = RDB_LOCK_WRITE.
Then client 2 calls store_lock(TL_IGNORE) and m_lock_rows was wrongly
set to RDB_LOCK_NONE, as below

```
 #0  myrocks::ha_rocksdb::store_lock (this=0x7fffbc03c7c8, thd=0x7fffc0000ba0, to=0x7fffc0011220, lock_type=TL_IGNORE)
 #1  get_lock_data (thd=0x7fffc0000ba0, table_ptr=0x7fffe84b7d20, count=1, flags=2)
 #2  mysql_lock_abort_for_thread (thd=0x7fffc0000ba0, table=0x7fffbc03bbc0)
 #3  THD::notify_shared_lock (this=0x7fffc0000ba0, ctx_in_use=0x7fffbc000bd8, needs_thr_lock_abort=true)
 #4  MDL_lock::notify_conflicting_locks (this=0x555557a82380, ctx=0x7fffc0000cc8)
 #5  MDL_context::acquire_lock (this=0x7fffc0000cc8, mdl_request=0x7fffe84b8350, lock_wait_timeout=2)
 #6  Global_read_lock::lock_global_read_lock (this=0x7fffc0003fe0, thd=0x7fffc0000ba0)
```

Finally, client 1 "INSERT INTO..." hits the Assertion 'm_lock_rows == RDB_LOCK_WRITE'
failed in myrocks::ha_rocksdb::write_row()

Fix this bug by not setting m_locks_rows if lock_type == TL_IGNORE.

Closes facebook/mysql-5.6#838
Pull Request resolved: facebook/mysql-5.6#871

Differential Revision: D9417382

Pulled By: lth

fbshipit-source-id: c36c164e06c
laurynas-biveinis pushed a commit that referenced this pull request Jul 18, 2019
…E TO A SERVER

Problem
========================================================================
Running the GCS tests with ASAN seldomly reports a user-after-free of
the server reference that the acceptor_learner_task uses.

Here is an excerpt of ASAN's output:

==43936==ERROR: AddressSanitizer: heap-use-after-free on address 0x63100021c840 at pc 0x000000530ff8 bp 0x7fc0427e8530 sp 0x7fc0427e8520
WRITE of size 8 at 0x63100021c840 thread T3
    #0 0x530ff7 in server_detected /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:962
    #1 0x533814 in buffered_read_bytes /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1249
    #2 0x5481af in buffered_read_msg /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1399
    #3 0x51e171 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4690
    #4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
    #5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
    #6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
    #7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
    #8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)
    #9 0x7fc047ff2bfc in __clone (/lib64/libc.so.6+0xfebfc)

0x63100021c840 is located 64 bytes inside of 65688-byte region [0x63100021c800,0x63100022c898)
freed by thread T3 here:
    #0 0x7fc04a5d7508 in __interceptor_free (/lib64/libasan.so.4+0xde508)
    #1 0x52cf86 in freesrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:836
    #2 0x52ea78 in srv_unref /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:868
    #3 0x524c30 in reply_handler_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4914
    #4 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
    #5 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
    #6 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
    #7 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
    #8 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)

previously allocated by thread T3 here:
    #0 0x7fc04a5d7a88 in __interceptor_calloc (/lib64/libasan.so.4+0xdea88)
    #1 0x543604 in mksrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:721
    #2 0x543b4c in addsrv /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:755
    #3 0x54af61 in update_servers /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_transport.c:1747
    #4 0x501082 in site_install_action /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1572
    #5 0x55447c in import_config /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/site_def.c:486
    #6 0x506dfc in handle_x_snapshot /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5257
    #7 0x50c444 in xcom_fsm /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:5325
    #8 0x516c36 in dispatch_op /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4510
    #9 0x521997 in acceptor_learner_task /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:4772
    #10 0x562357 in task_loop /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/task.c:1140
    #11 0x5003b2 in xcom_taskmain2 /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/xcom/xcom_base.c:1324
    #12 0x6a278a in Gcs_xcom_proxy_impl::xcom_init(unsigned short, node_address*) /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_proxy.cc:164
    #13 0x59b3c1 in xcom_taskmain_startup /home/tvale/mysql/plugin/group_replication/libmysqlgcs/src/bindings/xcom/gcs_xcom_control_interface.cc:107
    #14 0x7fc04a2e4dd4 in start_thread (/lib64/libpthread.so.0+0x7dd4)

Analysis
========================================================================
The server structure is reference counted by the associated sender_task
and reply_handler_task.
When they finish, they unreference the server, which leads to its memory
being freed.

However, the acceptor_learner_task keeps a "naked" reference to the
server structure.
Under the right ordering of operations, i.e. the sender_task and
reply_handler_task terminating after the acceptor_learner_task acquires,
but before it uses, the reference to the server structure, leads to the
acceptor_learner_task accessing the server structure after it has been
freed.

Solution
========================================================================
Let the acceptor_learner_task also reference count the server structure
so it is not freed while still in use.

Reviewed-by: André Negrão <andre.negrao@oracle.com>
Reviewed-by: Venkatesh Venugopal <venkatesh.venugopal@oracle.com>
RB: 21209
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants