
Mempurge instead of flush is initiated even if only one memtable is picked by flush job #9151

Open
riversand963 opened this issue Nov 10, 2021 · 4 comments
Labels: bug (Confirmed RocksDB bugs)

@riversand963 (Contributor) commented Nov 10, 2021

This can become an actual bug once we merge #9142, which fixes a legitimate bug that causes DB::Open failure; before that fix lands, the bug described here stays hidden.

In https://github.com/facebook/rocksdb/blob/6.26.fb/db/flush_job.cc#L233, a flush job will initiate a mempurge instead of a flush even if mems_.size() is 1. Consequently, the flush job does not reduce the number of immutable memtables, which increases the chance of a write stall.
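
For illustration only, here is a minimal sketch of the kind of guard the decision point could use; ChooseFlushAction and its parameters are hypothetical names, not RocksDB's actual API:

#include <cstddef>

// Hypothetical sketch, not the actual flush_job.cc logic: choose mempurge
// only when it can actually reduce the number of immutable memtables.
enum class FlushAction { kMemPurge, kRegularFlush };

FlushAction ChooseFlushAction(bool mempurge_enabled, std::size_t num_picked) {
  // Mempurge rewrites the picked memtables into one new memtable, so with a
  // single picked memtable the immutable count would stay the same and the
  // write stall would not be relieved; fall back to a regular flush instead.
  if (mempurge_enabled && num_picked > 1) {
    return FlushAction::kMemPurge;
  }
  return FlushAction::kRegularFlush;
}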

Expected behavior

When the number of immutable memtables reaches the threshold, a flush is scheduled and executed, reducing the number of immutable memtables. The DB eventually gets out of the write stall, even under heavy write load.

Actual behavior

Currently, when the number of immutable memtables reaches the threshold, a mempurge may be scheduled even if only one memtable is picked. The new memtable is added back, so the write-stall condition is not mitigated. No further flush may be scheduled, because a flush is normally scheduled after an insertion, and insertions are currently stalled.
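
For context, writes stall once the total number of memtables (the mutable one plus the immutables) reaches max_write_buffer_number, which is why a mempurge that leaves the immutable count unchanged can never clear the condition. A rough sketch of that check, with a hypothetical helper name:

#include <cstddef>

// Illustrative only: a stand-in for RocksDB's internal stall check.
// If mempurge replaces one immutable memtable with another, num_memtables
// stays the same, this stays true, and writes remain stalled; since further
// flushes are only scheduled from the (stalled) write path, the DB hangs.
bool WritesStalled(std::size_t num_memtables,
                   std::size_t max_write_buffer_number) {
  return num_memtables >= max_write_buffer_number;
}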

Steps to reproduce the behavior

Using #9150, restart the CI job "build-linux-non-shm-1" with SSH access, then manually run the following:

./db_flush_test --gtest_filter=DBFlushTest.MemPurgeWALSupport

It will hang.

@ajkr (Contributor) commented Nov 18, 2021

Should it be assigned or up-for-grabs?

ajkr added the bug (Confirmed RocksDB bugs) and up-for-grabs (Up for grabs) labels on Nov 18, 2021
facebook-github-bot pushed a commit that referenced this issue Nov 19, 2021
Summary:
After RocksDB 6.19 and before this PR, RocksDB FlushJob may pick memtables to flush whose data is beyond the synced WALs. This can be problematic if there are multiple column families, since it can prematurely advance the flushed column family's log_number. Should subsequent attempts fail to sync the latest WALs and the database go through a recovery, it may detect a corrupted WAL whose number is below the flushed column family's log number and complain about column family inconsistency. To fix this, we record the maximum memtable ID of the column family being flushed, then call SyncClosedLogs() so that all WALs closed at the time the memtable ID was recorded are synced.
I also temporarily disabled a unit test for the reasons described in #9151.

Pull Request resolved: #9142

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D32299956

Pulled By: riversand963

fbshipit-source-id: 0da75888177d91905cf8c9d00605b73afb5970a7
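
The commit summary above describes an ordering: record the flushed column family's maximum memtable ID, sync the WALs closed at that moment, and only then pick memtables. To make that concrete, a minimal sketch under simplified, hypothetical types (not RocksDB's internal API):

#include <cstdint>
#include <vector>

// Hypothetical stand-ins for RocksDB internals, for illustration only.
struct Memtable {
  uint64_t id;
};

// Pick only memtables at or below the recorded maximum ID, so the flush
// never covers memtables whose WAL data has not yet been synced. In the
// real fix, SyncClosedLogs() runs after the maximum ID is recorded and
// before memtables are picked.
std::vector<Memtable> PickMemtablesUpTo(const std::vector<Memtable>& immutables,
                                        uint64_t max_memtable_id) {
  std::vector<Memtable> picked;
  for (const Memtable& m : immutables) {
    if (m.id <= max_memtable_id) {
      picked.push_back(m);
    }
  }
  return picked;
}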
@briantkim93 commented

@ajkr I'm happy to get this assigned to me.

ajkr removed the up-for-grabs (Up for grabs) label on Aug 2, 2022
@briantkim93 commented

@ajkr Regarding this bug, I have two questions:

  1. Why are we limiting the mempurge output to only one memtable?
  2. I tried following the code but did not see where the old memtables are destroyed after the new memtable is created.

@ajkr (Contributor) commented Aug 11, 2022

Sorry, I'm not currently familiar with mempurge. @riversand963, are you able to help answer these questions?
