Occasional performance drop on MongoRocks #3198
And this is the rocksdb storage engine status during the performance drop.
Thanks for the detailed report; I have reshared it. There are compaction stalls, and I am not sure whether that is the issue here. Using lz4 compression for L1, L2, and maybe L3 will make the stalls worse compared to not using compression for L0/L1/L2. Internally, a query is done to confirm the PK is unique, but that should be in-memory only because inserts are not in random order per the PK value. I am curious what the user thread is doing when processing inserts. Does the overhead and work from the background compaction threads make it stall? My insert-heavy workload has been the insert benchmark, but I didn't pay enough attention to stalls and p99 insert rates to state whether or not it was a problem for me. Back in the day, write rates of 1000/second were a strong signal that write throttling was in progress. The code has changed, and perhaps 2000/second means the same thing today. If that is the problem, then there are my.cnf changes you can make. Do you have thread stacks that show where user threads are doing work or blocked in mongod? In the trace you provided I only saw background threads. In this case, there would be at most one thread busy for the user, because prepare is single-threaded.
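For reference, per-level compression in RocksDB is controlled by ColumnFamilyOptions::compression_per_level. Below is a minimal sketch of the layout suggested above, assuming the default 7-level LSM; this is not MongoRocks' actual configuration:

```cpp
#include "rocksdb/options.h"

rocksdb::ColumnFamilyOptions MakeCompressionLayout() {
  rocksdb::ColumnFamilyOptions cf_opts;
  // No compression on the small, hot levels; LZ4 only further down.
  // Assumes num_levels = 7 (the RocksDB default).
  cf_opts.compression_per_level = {
      rocksdb::kNoCompression,   // L0
      rocksdb::kNoCompression,   // L1
      rocksdb::kNoCompression,   // L2
      rocksdb::kLZ4Compression,  // L3
      rocksdb::kLZ4Compression,  // L4
      rocksdb::kLZ4Compression,  // L5
      rocksdb::kLZ4Compression,  // L6
  };
  return cf_opts;
}
```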
Thanks for the great report!
Thanks for your attention. Sorry, I forgot to check the rocksdb LOG.
I also evaluated the AddLiveFiles method, but there is nothing buggy there. As @mdcallag mentioned, this is a compaction issue: the problem is periodic bursts of too-frequent compactions. @mdcallag I am testing MongoRocks, not MyRocks. And there is only a single insert thread, no query threads. I have increased the memtable size to 128MB and the max compaction threads to 20; the insert-stall periods are shorter than before, but stalls still happen. @igorcanadi No, there is only a single insert thread, so CPU usage is not that high. With the default options (max compaction threads = 8, memtable size 64MB), CPU usage is only 11%. These are the sysbench collection's indexes and a sample document.
I think there are not that many indexes, and the documents are smaller than in a normal use case, so I expected no compaction stalls (because there is L0 -> L0 compaction). @mdcallag @igorcanadi One more question about the MongoRocks default options.
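For context, the tuning described above corresponds roughly to the following stock RocksDB options (a sketch using the values from the comment; MongoRocks sets these through its own configuration layer, so the plain-RocksDB form is an assumption):

```cpp
#include "rocksdb/env.h"
#include "rocksdb/options.h"

rocksdb::Options MakeTunedOptions() {
  rocksdb::Options options;
  options.write_buffer_size = 128 * 1024 * 1024;  // memtable size: 128MB
  options.max_background_compactions = 20;        // up to 20 concurrent compactions
  // The LOW-priority background thread pool must be sized to match,
  // otherwise the extra compaction slots cannot actually run in parallel.
  options.env->SetBackgroundThreads(20, rocksdb::Env::LOW);
  return options;
}
```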
These settings will indeed cause a large number of files for bigger databases, but in practice we have not seen many problems related to a large number of files. The biggest benefit of smaller files is that they increase the likelihood of parallel compactions. The index and filter blocks are also smaller, which is generally a good thing. However, feel free to increase the file size, especially if your database is on the bigger side (500GB-1TB).
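In stock RocksDB terms, increasing the file size would look roughly like this (a sketch; 256MB is an illustrative value, not a recommendation from this thread):

```cpp
#include "rocksdb/options.h"

rocksdb::Options MakeLargerFileOptions() {
  rocksdb::Options options;
  // Larger SST files mean fewer files overall, at the cost of fewer
  // parallel compactions and bigger index/filter blocks, per the
  // trade-off described above.
  options.target_file_size_base = 256 * 1024 * 1024;
  options.target_file_size_multiplier = 1;  // same target on every level
  return options;
}
```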
You must have thought about this already. Of course, this may not be the best solution.
A dedicated thread for L0->L0 compactions (or even L0->L1) sounds like a good idea. It would be great to construct an experiment to see if it would speed up the ingest.
After examining this insert throughput drop, I have found a possible cause. MongoRocks calls manual compaction periodically, and this prevents parallel compaction. It looks like all automatic compactions are blocked while a manual compaction is running. I am not sure about this scenario (I don't know much about the rocksdb and mongorocks code). Here is what I am curious about:
@SunguckLee Great findings!
Thanks @igorcanadi. Then what happens if I skip the manual compaction (rangeCompact) in mongo-rocks?
No, it will just be cleaned up later, when compaction decides to compact that range.
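For reference, the periodic manual compaction mongo-rocks issues boils down to a rocksdb::DB::CompactRange() call. A minimal sketch follows; the key bounds and the function name are illustrative, not mongo-rocks' actual code:

```cpp
#include <cassert>
#include "rocksdb/db.h"

// Illustrative only: mongo-rocks' real code computes the dropped
// collection's key-prefix bounds; "prefix_begin"/"prefix_end" stand in.
void CompactDroppedRange(rocksdb::DB* db) {
  rocksdb::Slice begin("prefix_begin");
  rocksdb::Slice end("prefix_end");
  rocksdb::CompactRangeOptions cro;
  // When true (the default), conflicting automatic compactions are held
  // back while this manual compaction is pending or running.
  cro.exclusive_manual_compaction = true;
  rocksdb::Status s = db->CompactRange(cro, &begin, &end);
  assert(s.ok());
}
```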
I was wrong; exclusive compaction is not the cause of this performance drop. It looks like an auto-scheduled compaction can't run when a manual compaction (neither in-progress nor in done status) is in the queue. The auto-scheduled compaction can't do its job, so it is put off until later. But BackgroundCallCompaction() schedules a new compaction right after BackgroundCompaction() returns. This loop is very frequent and takes the DBImpl::mutex_ lock. Because of this loop, DBImpl::mutex_ gets very hot, and eventually user document inserts are delayed. About this code block (https://github.com/facebook/rocksdb/blob/v5.9.2/db/db_impl_compaction_flush.cc#L1506-L1513):
What do you think about removing the HaveManualCompaction() check from this code block?
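For readers following along, the referenced block reads approximately as follows (paraphrased from RocksDB v5.9.2; see the link above for the exact lines):

```cpp
// db/db_impl_compaction_flush.cc, inside DBImpl::BackgroundCompaction()
// (approximate reproduction of v5.9.2):
if (HaveManualCompaction(compaction_queue_.front())) {
  // Can't compact right now, but try again later
  TEST_SYNC_POINT("DBImpl::BackgroundCompaction()::Conflict");

  // Stay in the compaction queue.
  unscheduled_compactions_++;

  return Status::OK();
}
```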
Agreed, I don't know why that check is there as we mark the files as "being compacted" when manual is picked so subsequent automatic compactions can't step on its toes. Want to try changing it to check for exclusive manual compaction and see if anything breaks?
Cool, looks like this might fix it! @ajkr seems like the loop causing the mutex to go hot would still be a problem when running exclusive manual compaction. Is that something worth fixing, too?
That's true. Maybe we can keep a list of column families that reached the front of the compaction queue already but were rejected due to exclusive manual. Scheduling will first check whether any of these CFs have finished their exclusive manual before trying the regular compaction queue.
It's pretty tricky, though. Retiring the exclusive manual compaction option would be my preference.
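A hypothetical sketch of that bookkeeping (not actual RocksDB code; the member and function names here are invented for illustration):

```cpp
#include <deque>

class ColumnFamilyData;  // RocksDB-internal type, declaration only here

// Invented member, for illustration only (would be guarded by DBImpl::mutex_).
std::deque<ColumnFamilyData*> rejected_by_exclusive_manual_;

// When a CF popped from the compaction queue conflicts with an exclusive
// manual compaction, park it instead of re-appending it to the back.
void ParkRejectedCF(ColumnFamilyData* cfd) {
  rejected_by_exclusive_manual_.push_back(cfd);
}

// When the exclusive manual compaction finishes, retry the parked CFs
// ahead of the regular queue, preserving their original order.
void RequeueParkedCFs(std::deque<ColumnFamilyData*>* compaction_queue) {
  while (!rejected_by_exclusive_manual_.empty()) {
    compaction_queue->push_front(rejected_by_exclusive_manual_.back());
    rejected_by_exclusive_manual_.pop_back();
  }
}
```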
That is not an exclusive manual compaction. And I have changed the code at https://github.com/facebook/rocksdb/blob/v5.9.2/db/db_impl_compaction_flush.cc#L1506 so that the check applies only to exclusive manual compactions.
After this change, the slowdown is gone and insert performance is really stable. I am not sure whether this change is safe (whether it harms data consistency or anything else).
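For clarity, here is a sketch of the before/after being described (paraphrased, not the exact diff; the change was later submitted as #3366):

```cpp
// Before (v5.9.2): any queued manual compaction involving the front CF
// defers the automatic compaction, which is then immediately rescheduled,
// producing the loop that heats up DBImpl::mutex_.
if (HaveManualCompaction(compaction_queue_.front())) {
  unscheduled_compactions_++;  // stay in the compaction queue, retry later
  return Status::OK();
}

// After: only an exclusive manual compaction defers automatic work.
if (HasExclusiveManualCompaction()) {
  unscheduled_compactions_++;  // stay in the compaction queue, retry later
  return Status::OK();
}
```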
@SunguckLee I think @ajkr said above that this fix should be okay. Would you mind sending a PR?
Yeah, or remove them from the compaction queue and make it manual compaction's responsibility to put them back in the queue once the manual compaction is done. It should be a small code change, but it will probably need to be built carefully.
@igorcanadi My concern with that approach was a fairness issue, where CFs that reach the front of the queue during an exclusive manual compaction are penalized. Those CFs would have to wait in the queue again after the manual compaction finished, whereas CFs that hadn't yet reached the front would get no additional wait. Thinking about it more, a similar fairness issue already existed, since we put CFs at the back of the queue when they're popped during an exclusive manual compaction, which meant reaching the front earlier could result in a longer wait. So I am ok with the solution you suggested. We could also address the fairness issue by inserting at the front of the compaction queue at the end of an exclusive manual compaction if we know the CF has already reached the front but been rejected due to the conflict.
I have made a pull request with a simple patch for the DBImpl::mutex_ contention.
Summary: If there's manual compaction in the queue, then "HaveManualCompaction(compaction_queue_.front())" will return true, and this cause too frequent MaybeScheduleFlushOrCompaction(). #3198
Closes #3366
Differential Revision: D6729575
Pulled By: ajkr
fbshipit-source-id: 96da04f8fd33297b1ccaec3badd9090403da29b0
Hey @ajkr , there were a lot of commits and reverts flying around for this issue, so I'm a bit confused. :) Is this finally resolved?
Hi,
EDIT: deleted this comment, as it was wrong.
It's fixed in 5.12 and later.
@ajkr, the most stable higher version I was able to use is 5.9.3. All versions above it, including 5.12, failed to sync normally with their primary and fail with a "[replExecDBWorker-0] Failed to load timestamp of most recently applied operation: NoMatchingDocument: Did not find any entries in local.oplog.rs" error. By any chance, can you create a new branch, such as 5.9.4, to include this change?
I'd prefer to fix the issue preventing upgrade to 5.12 instead of backporting non-critical fixes to arbitrary old versions. Want to report the issue you see with upgrading? |
Expected behavior
I have prepared mongodb 3.4.10 + RocksDB 5.9.
And I ran a sysbench (https://github.com/Percona-Lab/sysbench/tree/dev-mongodb-support-1.0) prepare for the test with these options.
I expected stable insert throughput, because there are no queries (Find); only a single thread inserts documents into the test collection (sbtest).
Actual behavior
For the first few hours, everything is peaceful (thanks to L0 -> L0 compaction).
But after about 0.4 billion documents are inserted into sbtest1 (the first collection), insert performance drops drastically: from 30K inserts/second at normal status down to 2K/sec. And once this happens, the low throughput continues for about 1~2 minutes.
During this performance drop, I ran a Linux profiler and pstack on the mongod process.
And I found something interesting: during the performance drop, "rocksdb::Version::AddLiveFiles" took a lot of CPU time. I have not checked the source code yet, but this is not expected (maybe not even by you). And I am worried about the drop from 30K/sec to 2K/sec.
Profile report(During performance drop) : https://gist.github.com/SunguckLee/e68a02981f6edece6214b32db8b37513#file-perf_report_slow-txt
Profile report(During normal 30K insert) : https://gist.github.com/SunguckLee/9fd03e1397eb2446e858e024200cc2ec#file-perf_report_normal-txt
Stacktrace : https://gist.github.com/SunguckLee/a0fb86b182c259360450b8528433a8f0#file-stacetace-txt
Steps to reproduce the behavior
Build rocksdb with "USE_RTTI=1 CFLAGS=-fPIC make static_lib".
Build mongodb with "scons CPPPATH="/mongodb-rocksdb-3.4/rocksdb/include" LIBPATH="/mongodb-rocksdb-3.4/rocksdb" LIBS="lz4 zstd" -j5 mongod mongo".
And run mongodb with the mongod configuration below.
And run the sysbench prepare command.
I have built mongorocks with devtoolset-6 and tested it on ...