Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MyRocks 8.0.28 hangs during the insert benchmark #1291

Open
mdcallag opened this issue Apr 5, 2023 · 6 comments
Open

MyRocks 8.0.28 hangs during the insert benchmark #1291

mdcallag opened this issue Apr 5, 2023 · 6 comments

Comments

@mdcallag
Copy link
Contributor

mdcallag commented Apr 5, 2023

I have run it many times (maybe 30) and there was a hang once. From PMP this looks like a hang in RocksDB

I compiled 8.0.28 from source at git hash f335788

commit f33578867c9d38e7ba8d8d9ccf71cbea78b7db47 (HEAD)
Author: Chun Ni <chni@fb.com>
Date:   Tue Mar 14 15:00:06 2023 -0700

RocksDB is at

commit 9502856edd77260bf8a12a66f2a232078ddb2d60 (HEAD)
Author: Hui Xiao <huixiao@fb.com>
Date:   Thu Dec 29 15:05:36 2022 -0800

The server is a c2-standard-60 on GCP with 240G RAM and 30 cores (hyperthreading disabled). More details on the setup are here and this is with the rel_o2_lto build.

The benchmark is run with 20 clients (threads) and the hang occurs during a phase where all clients are doing inserts. It has been stuck for ~5 days.

SHOW PROCESSLIST is

Id      User    Host    db      Command Time    State   Info    Tid
5       event_scheduler localhost       NULL    Daemon  472337  Waiting on empty queue  NULL    1093556
1016    root    localhost:33474 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1018    root    localhost:33490 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1020    root    localhost:33514 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1022    root    localhost:33542 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100616
1024    root    localhost:33564 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1026    root    localhost:33584 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093651
1029    root    localhost:33610 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1031    root    localhost:33620 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1033    root    localhost:33770 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1035    root    localhost:33790 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100574
1037    root    localhost:33796 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1040    root    localhost:33838 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1042    root    localhost:33854 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1044    root    localhost:33864 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100602
1046    root    localhost:33892 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1048    root    localhost:33908 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1100574
1051    root    localhost:33922 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093628
1053    root    localhost:33946 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1055    root    localhost:33464 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1057    root    localhost:33468 ib      Query   466391  waiting for handler commit      insert into pi1 (dateandtime,cashregisterid,customerid,productid,price,data) values ('2023-03-31 05:    1093642
1300    root    localhost:51802 test    Query   466390  executing       SHOW GLOBAL STATUS      1100647
1304    root    localhost       NULL    Query   0       init    show processlist        1303632

my.cnf is

[mysqld]
sql_mode=no_engine_substitution,no_unsigned_subtraction
default_tmp_storage_engine=MyISAM
tmpdir=/data/m/fbmy/data
#query_cache_size=0
#query_cache_type=0
# slow_query_log_file=/data/m/fbmy/slow.log
long_query_time=5
# skip_log_bin
log_bin=/data/m/fbmy/binlogs/bl
max_binlog_size=256M
server_id=77
datadir=/data/m/fbmy/data

slow_query_log=1
performance_schema=1
ssl=0
default_authentication_plugin=mysql_native_password
character_set_server=latin1
collation_server=latin1_swedish_ci
binlog_format=ROW
binlog_checksum=crc32
eq_range_index_dive_limit=10

rocksdb
default-storage-engine=rocksdb
skip-innodb
default-tmp-storage-engine=MyISAM
rocksdb_wal_dir=/data/m/fbmy/txlogs
transaction-isolation=READ-COMMITTED

sync_binlog=0
rocksdb_flush_log_at_trx_commit=2

# x1
rocksdb_block_cache_size=180G

# x2
rocksdb_max_background_jobs=12

# x6b
rocksdb_default_cf_options=level_compaction_dynamic_level_bytes=true;block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};optimize_filters_for_hits=true;max_write_buffer_number=4;write_buffer_size=32m;max_bytes_for_level_base=256m;compression_per_level=kNoCompression;bottommost_compression=kLZ4Compression

# y9c
rocksdb_block_size=8192
rocksdb_cache_index_and_filter_with_high_priority=ON
rocksdb_cache_high_pri_pool_ratio=0.5

[mysqld_safe]
malloc-lib=/usr/lib/x86_64-linux-gnu/libjemalloc.so
@mdcallag
Copy link
Contributor Author

mdcallag commented Apr 5, 2023

From PMP, hierarchical is here, flat is here

I saved the RocksDB LOG file but it is 1.6M compressed and I don't know where to put it. It is saved on my laptop.

Nothing useful in the MySQL error log

2023-03-31T03:55:40.158936Z mysqld_safe Logging to '/data/m/fbmy/data/c2-60-3.err'.
2023-03-31T03:55:40.179300Z mysqld_safe Starting mysqld daemon with databases from /data/m/fbmy/data
2023-03-31T03:55:40.575208Z 0 [Warning] [MY-011068] [Server] The syntax '--ssl=off' is deprecated and will be removed in a future release. Please use --tls-version=invalid instead.
2023-03-31T03:55:40.575243Z 0 [Warning] [MY-010080] [Server] The use of InnoDB is mandatory since MySQL 5.7. The former options like '--innodb=0/1/OFF/ON' or '--skip-innodb' are ignored.
2023-03-31T03:55:40.575402Z 0 [Warning] [MY-010918] [Server] 'default_authentication_plugin' is deprecated and will be removed in a future release. Please use authentication_policy instead.
2023-03-31T03:55:40.575414Z 0 [System] [MY-010116] [Server] /home/mcallaghan/d/fbmy80/bin/mysqld (mysqld 8.0.28) starting as process 1093467
2023-03-31T03:55:40.584687Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2023-03-31T03:55:40.672863Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2023-03-31T03:55:40.798897Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2023-03-31T03:55:40.810645Z 0 [System] [MY-010232] [Server] XA crash recovery finished.
2023-03-31T03:55:40.900373Z 0 [System] [MY-010931] [Server] /home/mcallaghan/d/fbmy80/bin/mysqld: ready for connections. Version: '8.0.28'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution.

@mdcallag
Copy link
Contributor Author

mdcallag commented Apr 5, 2023

From PMP, and starting with hierarchical output after removing lines with abstimed_wait (output is here)

Many threads stuck in RocksDB, maybe trying to write the WAL

      9 __futex_abstimed_wait_common64,__futex_abstimed_wait_common,__GI___futex_abstimed_wait_cancelable64,__pthread_cond_wait_common,___pthread_cond_wait,wait<rocksdb::WriteThread::BlockingAwaitState(rocksdb::WriteThread::Writer*,,rocksdb::WriteThread::BlockingAwaitState,rocksdb::WriteThread::AwaitState,rocksdb::DBImpl::WriteImplWALOnly,rocksdb::DBImpl::WriteImpl,rocksdb::WriteCommittedTxn::PrepareInternal,rocksdb::PessimisticTransaction::Prepare,myrocks::Rdb_transaction_impl::prepare,rocksdb_prepare,ha_prepare_low,ha_commit_trans,trans_commit_stmt,mysql_execute_command,dispatch_sql_command,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone3

Compaction threads are stuck

      3 futex_wait,__GI___lll_lock_wait,lll_mutex_lock_optimized,___pthread_mutex_lock,rocksdb::port::Mutex::Lock,rocksdb::InstrumentedMutex::LockInternal,rocksdb::InstrumentedMutex::Lock,rocksdb::DBImpl::BackgroundCompaction,rocksdb::DBImpl::BackgroundCallCompaction,rocksdb::DBImpl::BGWorkCompaction,rocksdb::ThreadPoolImpl::Impl::BGThread,rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper,??,start_thread,clone3

RocksDB memory monitoring thread is stuck

      1 futex_wait,__GI___lll_lock_wait,lll_mutex_lock_optimized,___pthread_mutex_lock,rocksdb::port::Mutex::Lock,rocksdb::InstrumentedMutex::LockInternal,rocksdb::InstrumentedMutex::Lock,rocksdb::InstrumentedMutexLock::InstrumentedMutexLock,rocksdb::DBImpl::GetAggregatedIntProperty,rocksdb::MemoryUtil::GetApproximateMemoryUsageByType,myrocks_update_memory_status,show_myrocks_vars,PFS_status_variable_cache::manifest,PFS_status_variable_cache::do_materialize_global,PFS_variable_cache<Status_variable>::materialize_global,PFS_variable_cache<Status_variable>::materialize_global,table_global_status::rnd_init,ha_perfschema::rnd_init,handler::ha_rnd_init,TableScanIterator::Init,MaterializeIterator::MaterializeQueryBlock,MaterializeIterator::Init,Query_expression::ExecuteIteratorQuery,Query_expression::execute,Sql_cmd_dml::execute_inner,Sql_cmd_dml::execute,Sql_cmd_show::execute,Sql_cmd_show_status::execute,mysql_execute_command,dispatch_sql_command,dispatch_command,do_command,handle_connection,pfs_spawn_thread,start_thread,clone3

Memtable flush is stuck

      1 futex_wait,__GI___lll_lock_wait,lll_mutex_lock_optimized,___pthread_mutex_lock,rocksdb::port::Mutex::Lock,rocksdb::InstrumentedMutex::LockInternal,rocksdb::InstrumentedMutex::Lock,rocksdb::FlushJob::WriteLevel0Table,rocksdb::FlushJob::Run,rocksdb::DBImpl::FlushMemTableToOutputFile,rocksdb::DBImpl::FlushMemTablesToOutputFiles,rocksdb::DBImpl::BackgroundFlush,rocksdb::DBImpl::BackgroundCallFlush,rocksdb::ThreadPoolImpl::Impl::BGThread,rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper,??,start_thread,clone3

@mdcallag
Copy link
Contributor Author

Another hang is here
This one from a build that uses CMAKE_BUILD_TYPE=RelWithDebInfo. The previous hang uses a build with CMAKE_BUILD_TYPE=Relase.

@kolinfluence
Copy link

@mdcallag

  1. is this hang possible to unhang by restart?
  2. is the database still functional or corrupted?

@mdcallag
Copy link
Contributor Author

It happened again. This is very intermittent. I will soon switch to a more recent build and hope it is gone. After kill -9 the database seems to be OK.

@mdcallag
Copy link
Contributor Author

From the hang today, which is with the same binary as the first hang:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants