
Mempurge instead of flush is initiated even if only one memtable is picked by flush job #9151

Open
riversand963 opened this issue Nov 10, 2021 · 4 comments
Labels: bug (Confirmed RocksDB bugs)

@riversand963 (Contributor) commented Nov 10, 2021

This can become an actual bug once we merge #9142, which fixes a legitimate bug that causes DB::Open failure; before that fix lands, the bug described here stays hidden.

In https://github.com/facebook/rocksdb/blob/6.26.fb/db/flush_job.cc#L233, a flush job will initiate a mempurge instead of a flush even if mems_.size() is 1. Consequently, the flush job does not reduce the number of immutable memtables, which increases the chance of a write stall.
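
For illustration only, here is a minimal sketch of the kind of guard the decision point could use; ChooseFlushAction and its parameters are hypothetical names, not RocksDB's actual API:

#include <cstddef>

// Hypothetical sketch, not the actual flush_job.cc logic: choose mempurge
// only when it can actually reduce the number of immutable memtables.
enum class FlushAction { kMemPurge, kRegularFlush };

FlushAction ChooseFlushAction(bool mempurge_enabled, std::size_t num_picked) {
  // Mempurge rewrites the picked memtables into one new memtable, so with a
  // single picked memtable the immutable count would stay the same and the
  // write stall would not be relieved; fall back to a regular flush instead.
  if (mempurge_enabled && num_picked > 1) {
    return FlushAction::kMemPurge;
  }
  return FlushAction::kRegularFlush;
}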

Expected behavior

When the number of immutable memtables reaches the threshold, a flush is scheduled and executed, reducing the number of immutable memtables. The DB eventually gets out of the write stall, even under heavy write load.

Actual behavior

Currently, when the number of immutable memtables reaches the threshold, a mempurge may be scheduled even if only one memtable is picked. The new memtable is added back, so the write-stall condition is not mitigated. No further flush may be scheduled, because a flush is normally scheduled after an insertion, and insertions are currently stalled.
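
For context, writes stall once the total number of memtables (the mutable one plus the immutables) reaches max_write_buffer_number, which is why a mempurge that leaves the immutable count unchanged can never clear the condition. A rough sketch of that check, with a hypothetical helper name:

#include <cstddef>

// Illustrative only: a stand-in for RocksDB's internal stall check.
// If mempurge replaces one immutable memtable with another, num_memtables
// stays the same, this stays true, and writes remain stalled; since further
// flushes are only scheduled from the (stalled) write path, the DB hangs.
bool WritesStalled(std::size_t num_memtables,
                   std::size_t max_write_buffer_number) {
  return num_memtables >= max_write_buffer_number;
}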

Steps to reproduce the behavior

Using #9150, restart the CI job "build-linux-non-shm-1" with SSH access, then manually run the following:

./db_flush_test --gtest_filter=DBFlushTest.MemPurgeWALSupport

It will hang.

@ajkr (Contributor) commented Nov 18, 2021

Should it be assigned or up-for-grabs?

ajkr added the bug (Confirmed RocksDB bugs) and up-for-grabs (Up for grabs) labels on Nov 18, 2021
facebook-github-bot pushed a commit that referenced this issue Nov 19, 2021
Summary:
After RocksDB 6.19 and before this PR, RocksDB FlushJob may pick memtables to flush whose data is beyond the synced WALs. This can be problematic if there are multiple column families, since it can prematurely advance the flushed column family's log_number. Should subsequent attempts fail to sync the latest WALs and the database go through a recovery, it may detect a corrupted WAL whose number is below the flushed column family's log number and complain about column family inconsistency. To fix this, we record the maximum memtable ID of the column family being flushed, then call SyncClosedLogs() so that all WALs closed at the time the memtable ID was recorded are synced.
I also temporarily disabled a unit test for the reasons described in #9151.

Pull Request resolved: #9142

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D32299956

Pulled By: riversand963

fbshipit-source-id: 0da75888177d91905cf8c9d00605b73afb5970a7
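
The commit summary above describes an ordering: record the flushed column family's maximum memtable ID, sync the WALs closed at that moment, and only then pick memtables. To make that concrete, a minimal sketch under simplified, hypothetical types (not RocksDB's internal API):

#include <cstdint>
#include <vector>

// Hypothetical stand-ins for RocksDB internals, for illustration only.
struct Memtable {
  uint64_t id;
};

// Pick only memtables at or below the recorded maximum ID, so the flush
// never covers memtables whose WAL data has not yet been synced. In the
// real fix, SyncClosedLogs() runs after the maximum ID is recorded and
// before memtables are picked.
std::vector<Memtable> PickMemtablesUpTo(const std::vector<Memtable>& immutables,
                                        uint64_t max_memtable_id) {
  std::vector<Memtable> picked;
  for (const Memtable& m : immutables) {
    if (m.id <= max_memtable_id) {
      picked.push_back(m);
    }
  }
  return picked;
}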
@briantkim93 commented

@ajkr I'm happy to get this assigned to me.

ajkr removed the up-for-grabs (Up for grabs) label on Aug 2, 2022
@briantkim93 commented

@ajkr Regarding this bug, I have two questions:

  1. Why are we limiting the mempurge output to only one memtable?
  2. I tried following the code but did not see where the old memtables are destroyed after the new memtable is created.

@ajkr (Contributor) commented Aug 11, 2022

Sorry, I'm not currently familiar with mempurge. @riversand963, are you able to help answer these questions?
