MV background thread -- Causing system overload #2206
Comments
@goranschwarz, sorry to hear about all those problems, but please understand that a 30-35 GB database with 2400 tables/indexes is not a typical application for H2. I am sure we are going to fix those inefficiencies, but please understand that you are raising the bar here.

Database file maintenance is a necessary evil and it never comes for free. There are two basic actions: chunk movement (defragmentation) and chunk rewriting (compaction). A chunk is a collection of pages (up to 1-2 MB in size) that was written at once, and it serves as a unit of "garbage collection". In order to make contiguous free file space to write a new chunk, existing chunks have to be moved around (as whole units). A chunk movement is a single read and a single write. Chunk rewriting, on the other hand, is the elimination of a sparsely populated (due to page attrition) chunk. Here the individual still-live pages are read, collected, and written out as one new chunk, so it's multiple (possibly hundreds of) reads and one write. But the worst part is that in order to figure out which pages are dead, all maps have to be scanned to find the live pages (somewhat similar to the "mark" stage of Java GC), and that involves tons of page reads, blowing away any page caches. That's why the number of read requests is so high, but the majority of them are very small (4-10 KB), so iostat is not that terrible.

Please understand that H2 was designed with SSDs in mind, under the assumption that random small reads are cheap. That dead/live page determination definitely does not scale well and has to be reworked, and I have some ideas/plans in that area. For now, to disable chunk rewriting, you can use an undocumented way: add ";MAX_COMPACT_COUNT=100" to the URL. You should definitely see a reduction in read request count (but not necessarily in the amount of data read).

What surprises me a lot, and what I was not able to reproduce in my experiments (I've been playing with 3-10 GB databases containing just a few tables), is the fact that a plain "SHUTDOWN" (without defrag) takes so long. What is also surprising is your complaint about high CPU usage. A single background thread can't take more than one core, so where is this coming from? BTW, what is your heap size? MVStore's autoCommitBufferSize is calculated based on it, and it seems to be very high, judging by the output you provided, like "set.size()=2021", which means it tries to rewrite 2021 chunks at once.
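To make the cost model described above concrete, here is a simplified, hypothetical sketch of sparse-chunk rewriting. The class and method names are mine, not MVStore's; the point is only that rewriting a chunk costs one read per surviving page, on top of the map scans needed to learn which pages survive at all.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

public class CompactionSketch {
    // A chunk: a batch of pages written at once; pages die over time.
    static class Chunk {
        final BitSet livePages; // which of the original pages are still live
        final int totalPages;
        Chunk(int totalPages, BitSet livePages) {
            this.totalPages = totalPages;
            this.livePages = livePages;
        }
        double fillRate() {
            return (double) livePages.cardinality() / totalPages;
        }
    }

    // Rewrite up to writeLimit sparse chunks and return how many page reads
    // that costs. In the real store, even finding the live sets requires
    // scanning all maps (the "mark"-like phase), which is the expensive part.
    static int rewriteSparseChunks(List<Chunk> chunks, double threshold, int writeLimit) {
        int pageReads = 0;
        int rewritten = 0;
        for (Chunk c : chunks) {
            if (rewritten >= writeLimit) {
                break;
            }
            if (c.fillRate() < threshold) {
                pageReads += c.livePages.cardinality(); // one read per surviving page
                rewritten++;
            }
        }
        return pageReads;
    }

    public static void main(String[] args) {
        // 100 chunks of 256 pages each, roughly 40% of pages still live.
        List<Chunk> chunks = new ArrayList<>();
        Random rnd = new Random(42);
        for (int i = 0; i < 100; i++) {
            BitSet live = new BitSet(256);
            for (int p = 0; p < 256; p++) {
                if (rnd.nextDouble() < 0.4) {
                    live.set(p);
                }
            }
            chunks.add(new Chunk(256, live));
        }
        System.out.println("page reads to rewrite 10 sparse chunks: "
                + rewriteSparseChunks(chunks, 0.5, 10));
    }
}
```

This also shows why "set.size()=2021" is alarming: the read cost grows with the number of chunks selected per pass times the live pages per chunk.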
After I installed 1.4.200 I got the following issues:
Note: I closed case #1820 (which was getting a bit too fragmented) and continued here.
I think that everything is connected to the MV background thread, and this is how I drew that conclusion:

MVStore$BackgroundWriterThread -> writeInBackground -> doMaintenance -> rewriteChunks -> compactRewrite -> ...

I added a probe to method compactRewrite to write out timing and I/O statistics (see code snippet later). And here are some of the results:
Execution time: varies from 20 minutes up to above 3 hours
Number of reads: averaging around 4M reads (yes, 4,000,000)

I guess this is not how things were intended to work!
This will cause:
wa (iowait) if the pages are not in the H2 cache or the OS file system cache
us (user time) if the pages are in cache... but it still eats a lot of CPU

This problem is mainly on larger databases.
my 'DBXTUNE_CENTRAL_DB' database is around 30-35 GB
NOTE: On smaller databases (2-4 GB)
On smaller databases (which I have a couple of; at the moment 9 collector databases, where a new database is created after 24 hours as an archive mechanism) the execution time for method compactRewrite is typically around 2-10 seconds. But it still does a lot of I/Os: between 50K and 200K on every invocation.
The file system cache might hold many of those IO requests (because I don't see that many IO reads from iostat), but the MVStore FileStore fileStore.readCount is still incremented. Meaning: it's not idle... it's using a lot of CPU cycles.
URL
typically:
jdbc:h2:file:/home/sybase/.dbxtune/dbxc/data/DBXTUNE_CENTRAL_DB;DATABASE_TO_UPPER=false;RETENTION_TIME=1000;MAX_COMPACT_TIME=2000;COMPRESS=TRUE;WRITE_DELAY=30000;DB_CLOSE_ON_EXIT=FALSE
Short snippet from the startup phase (where the above URL was grabbed from)
Includes all settings from the information_schema
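For reference, a URL like the one above, with the undocumented MAX_COMPACT_COUNT workaround from the comment at the top of this thread appended, could be assembled like this. The option names are copied from this issue; whether a given H2 version still honors MAX_COMPACT_COUNT is an assumption.

```java
public class UrlSketch {
    // Builds the JDBC URL used by DbxTune Central, plus the undocumented
    // MAX_COMPACT_COUNT setting suggested as a workaround in this issue.
    static String buildUrl() {
        return "jdbc:h2:file:/home/sybase/.dbxtune/dbxc/data/DBXTUNE_CENTRAL_DB"
                + ";DATABASE_TO_UPPER=false"
                + ";RETENTION_TIME=1000"
                + ";MAX_COMPACT_TIME=2000"
                + ";COMPRESS=TRUE"
                + ";WRITE_DELAY=30000"
                + ";DB_CLOSE_ON_EXIT=FALSE"
                + ";MAX_COMPACT_COUNT=100"; // reportedly disables chunk rewriting
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```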
Possible Solution
Not sure what the best solution is, but some part of it might be:
For example, in a shutdown scenario we should honor MAX_COMPACT_TIME or similar, so that we do not spend ages in shutdown, causing apps to do unclean shutdowns.
BG_MAINT_INTERVALL=###, where we can set that the background thread's cleanup part is only executed every ### seconds (so some kind of mechanism to not run it as often as it does today).
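A rough sketch of these two ideas, assuming a hypothetical BG_MAINT_INTERVALL setting and a MAX_COMPACT_TIME-style time budget; neither class maps to actual MVStore code.

```java
import java.util.concurrent.TimeUnit;

public class MaintenanceThrottle {
    private final long intervalMillis; // the hypothetical BG_MAINT_INTERVALL, in ms
    private long lastRunMillis = Long.MIN_VALUE / 2; // avoids overflow on first check

    MaintenanceThrottle(long intervalSeconds) {
        this.intervalMillis = TimeUnit.SECONDS.toMillis(intervalSeconds);
    }

    // Background-thread side: only allow the expensive cleanup (chunk
    // rewriting) once per configured interval, instead of on every pass.
    boolean shouldRunCleanup(long nowMillis) {
        if (nowMillis - lastRunMillis < intervalMillis) {
            return false;
        }
        lastRunMillis = nowMillis;
        return true;
    }

    // Shutdown side: stop rewriting chunks once the time budget is spent,
    // instead of blocking shutdown for hours. Returns chunks actually done.
    static int compactWithBudget(long budgetMillis, int chunkCount, Runnable rewriteOneChunk) {
        long deadline = System.currentTimeMillis() + budgetMillis;
        int done = 0;
        for (int i = 0; i < chunkCount; i++) {
            if (System.currentTimeMillis() >= deadline) {
                break;
            }
            rewriteOneChunk.run();
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        MaintenanceThrottle t = new MaintenanceThrottle(60);
        System.out.println(t.shouldRunCleanup(0));      // first call: allowed
        System.out.println(t.shouldRunCleanup(30_000)); // too soon: skipped
        System.out.println(t.shouldRunCleanup(60_000)); // interval elapsed: allowed
    }
}
```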
Normal stacktrace
Code changes to probe compactRewrite
The code without indentation is the added code.
I also added a small probe to method rewriteChunks, to probe the writeLimit parameter and to see when the method was called. But I removed that from the output shown above, because it was too messy...
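The kind of probe described here could look roughly like the following. readCount is a stand-in for MVStore's fileStore.readCount, and this wrapper is not the actual code added to H2; it only illustrates measuring elapsed time and the read-count delta around a maintenance step.

```java
public class ProbeSketch {
    // Stand-in for MVStore's fileStore.readCount counter.
    static long readCount = 0;

    // Wrap a maintenance step and report elapsed time and reads issued,
    // the same idea as the probe added around compactRewrite in this issue.
    static String probe(String name, Runnable step) {
        long startReads = readCount;
        long startNanos = System.nanoTime();
        step.run();
        long execMs = (System.nanoTime() - startNanos) / 1_000_000;
        return name + ": execTimeMs=" + execMs + ", reads=" + (readCount - startReads);
    }

    public static void main(String[] args) {
        // Simulate a compactRewrite pass that issues 123 reads.
        System.out.println(probe("compactRewrite", () -> readCount += 123));
    }
}
```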