lob cleaner issue #3583

wburzyns · 2022-07-29T20:19:41Z

Unfortunately this happened to me while closing my DB after removal of a large number of LOBs:

Exception in thread "H2-lob-cleaner": email.com.gmail.wburzyns.org.h2.mvstore.MVStoreException: Reading from file split:30:W:/__db/FinSys DB/storage/matfile.mv.db failed at 41526190657 (length 45768523776), read 384, remaining 0 [2.1.214/1]
at email.com.gmail.wburzyns.org.h2.mvstore.DataUtils.newMVStoreException(DataUtils.java:1004)
at email.com.gmail.wburzyns.org.h2.mvstore.DataUtils.readFully(DataUtils.java:470)
at email.com.gmail.wburzyns.org.h2.mvstore.FileStore.readFully(FileStore.java:98)
at email.com.gmail.wburzyns.org.h2.mvstore.Chunk.readBufferForPage(Chunk.java:422)
at email.com.gmail.wburzyns.org.h2.mvstore.MVStore.readPage(MVStore.java:2569)
at email.com.gmail.wburzyns.org.h2.mvstore.MVMap.readPage(MVMap.java:633)
at email.com.gmail.wburzyns.org.h2.mvstore.Page$NonLeaf.getChildPage(Page.java:1125)
at email.com.gmail.wburzyns.org.h2.mvstore.CursorPos.traverseDown(CursorPos.java:61)
at email.com.gmail.wburzyns.org.h2.mvstore.MVMap.operate(MVMap.java:1770)
at email.com.gmail.wburzyns.org.h2.mvstore.MVMap.remove(MVMap.java:518)
at email.com.gmail.wburzyns.org.h2.mvstore.StreamStore.remove(StreamStore.java:301)
at email.com.gmail.wburzyns.org.h2.mvstore.db.LobStorageMap.doRemoveLob(LobStorageMap.java:493)
at email.com.gmail.wburzyns.org.h2.mvstore.db.LobStorageMap.cleanup(LobStorageMap.java:443)
at email.com.gmail.wburzyns.org.h2.mvstore.db.LobStorageMap.lambda$null$0(LobStorageMap.java:122)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:740)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:718)
at email.com.gmail.wburzyns.org.h2.store.fs.split.FileSplit.read(FileSplit.java:63)
at email.com.gmail.wburzyns.org.h2.mvstore.DataUtils.readFully(DataUtils.java:456)
... 15 more

So it looks like a race condition is still lurking in the code. This is 100% reproducible but does not happen if the number of LOBs removed is greatly reduced. BTW the removal goes fine as long as the DB is still opened. This happens only on close.

grandinj · 2022-07-29T20:23:33Z

Something called interrupt() on the thread that as doing the cleanup.

Perhaps some watchdog type timer you have in your application?

wburzyns · 2022-07-29T20:40:55Z

Nope, I don't have anything like that. I'd rather guess it's related to automatic DB compaction on close (200 ms by default, IIRC).

katzyn · 2022-07-30T02:46:25Z

H2 never interrupts own threads performing disk I/O by itself (unless you modified something in sources of H2 in your custom build).

So you need to figure out which part of your application (including libraries and web/application server, if any) does that and fix it somehow.

If you cannot prevent these interrupts, you can use a separate H2 Server process and use remote connections to it from your application or you can use the async: file system, if you need the embedded database.

https://h2database.com/html/features.html#connection_modes

wburzyns · 2022-07-30T08:01:51Z

In the same environment the DB compaction via SHUTDOWN COMPACT always succeeds even though it lasts about 40 minutes. Isn't this suggesting that the interruption comes from H2? The crash is almost instant after CLOSE if prior to the closing a large number of LOBs has been removed (and time that elapsed between removal of LOBs and CLOSE doesn't affect the occurence of the crash).

katzyn · 2022-07-30T09:53:59Z

Isn't this suggesting that the interruption comes from H2?

No, H2 never interrupts its own threads excluding the watchdog thread that doesn't perform disk I/O.

wburzyns · 2022-07-30T10:19:48Z

So how do you explain SHUTDOWN COMPACT always successfully finishing and ordinary DB closing after massive LOBs removal always failing?
If I develop an isolated testcase (it would be the only thing running in JVM) and reproduce the crash on my DB this way would you agree that the problem is on H2's side? Unfortunately I cannot afford the time to develop a fully standalone testcase.

katzyn · 2022-07-30T10:58:02Z

First of all, you should test the original H2 instead of your own custom build.

To debug this issue you need to start your application with a debugger. Set up a breakpoint on Thread.interrupt() method to catch callers of this method.

wburzyns · 2022-07-30T11:19:09Z

The build has not been recompiled. The only change is that is has been repackaged with JarJar.

Could you share some insight on point 1?

katzyn · 2022-07-30T12:56:31Z

Maybe your application (or web / application server, if you use it) waits some time and interrupts threads only after that when it tries to finish its work and close the database as a part of application termination? When cleanup tasks are fast connections are closed before the timeout, but when database needs more time to finish its work your application (or application server) tries to interrupt its threads?

Anyway, you can find callers of Thread.interrupt() only by yourself. If you don't want to debug it, use the async: file system abstraction layer as suggested in documentation of H2 for applications when they use thread interruptions.

wburzyns · 2022-07-30T13:21:06Z

My environment is single threaded. It waits 40 minutes for SHUTDOWN COMPACT to complete and I see no reason why - in exactly the same setting - would it interrupt normal close, but only after a massive LOB removal.

I managed to remove all the LOBs by doing this operation in chunks: remove some LOBs (30000 of them; 50000 causes the problem to appear), close the DB, reopen the DB, repeat.

A race condition is evident here. You suspect that it's caused by my environment, I suspect H2 itself. If SHUTDOWN COMPACT were failing in a similar manner then I'd agree with you. Unfortunately the LOB subsystem has a long history of problems and I believe this is yet another.

katzyn · 2022-07-30T13:47:43Z

I think we're talking about different issues.

If you complain about slow SHUTDOWN COMPACT this is an issue about LOB storage and database compaction and ClosedByInterruptException in the provided stack trace is completely irrelevant to it.

andreitokar · 2022-07-30T14:51:27Z

Is it possible, that cleanup process takes about 10 sec. or more? Because it is asynchronous and thread (ThreadPoolExecutor) is forcibly shutdown after 10 sec. (may issue Thread.interrupt() in the process). Probably it wasn't the best idea to use such procedure, or time-out should be a way higher. Some cooperation from LobStorageMap is required. Also, I just realized that BLOB clean-up absolutely ignores the time allotted for a clean-up on shutdown.

wburzyns · 2022-07-30T21:46:35Z

@andreitokar

Yes, it does take more than 10 seconds. After your remarks I think I know why limiting the amount of LOBs deleted before DB closing to 30000 (on my quite old machine) remedies the issue (BTW if the LOBs are smaller, more of them can be removed before causing a crash).

As long as I'm concerned, I don't mind if all deleted LOBs are removed on normal shutdown. A periodic (say once a week) SHUTDOWN COMPACT is acceptable. However, a crash on normal DB closing is a no-go.

katzyn · 2022-07-31T05:58:29Z

@wburzyns
Sorry, I really missed that new cleanup logic introduced in H2 2.1.214 uses a ThreadPoolExecutor and invokes its shutdownNow() method on timeout. 2.1.212 and older versions didn't have this code.

@andreitokar
I think we need to stop execution in some different way, for example, check value of some volatile flag in LobStorageMap.cleanup(long) in its loop.

wburzyns · 2022-07-31T07:23:04Z

@katzyn
No problem, I'm glad that we've came to a meaningful conclusion :)

andreitokar · 2022-08-01T02:43:28Z

To stop execution without interrupt is an easy part. The problem is that, as of now, LOB removal fact is not persisted, just queued for processing into in-memory queue, so premature interruption will leave LOBs as live forever.

andreitokar · 2022-08-11T02:43:10Z

I guess, for now just eliminate interrupt and live with long shutdown time in case of massive LOB removal.

andreitokar mentioned this issue Aug 11, 2022

Eliminate use of interrupt in executor shutdown #3595

Merged

andreitokar closed this as completed in #3595 Aug 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lob cleaner issue #3583

lob cleaner issue #3583

wburzyns commented Jul 29, 2022

grandinj commented Jul 29, 2022

wburzyns commented Jul 29, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

andreitokar commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 31, 2022

wburzyns commented Jul 31, 2022

andreitokar commented Aug 1, 2022

andreitokar commented Aug 11, 2022

lob cleaner issue #3583

lob cleaner issue #3583

Comments

wburzyns commented Jul 29, 2022

grandinj commented Jul 29, 2022

wburzyns commented Jul 29, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 30, 2022

andreitokar commented Jul 30, 2022

wburzyns commented Jul 30, 2022

katzyn commented Jul 31, 2022

wburzyns commented Jul 31, 2022

andreitokar commented Aug 1, 2022

andreitokar commented Aug 11, 2022