Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lob cleaner issue #3583

Closed
wburzyns opened this issue Jul 29, 2022 · 17 comments · Fixed by #3595
Closed

lob cleaner issue #3583

wburzyns opened this issue Jul 29, 2022 · 17 comments · Fixed by #3595

Comments

@wburzyns
Copy link

@andreitokar

Unfortunately this happened to me while closing my DB after removal of a large number of LOBs:

Exception in thread "H2-lob-cleaner": email.com.gmail.wburzyns.org.h2.mvstore.MVStoreException: Reading from file split:30:W:/__db/FinSys DB/storage/matfile.mv.db failed at 41526190657 (length 45768523776), read 384, remaining 0 [2.1.214/1]
at email.com.gmail.wburzyns.org.h2.mvstore.DataUtils.newMVStoreException(DataUtils.java:1004)
at email.com.gmail.wburzyns.org.h2.mvstore.DataUtils.readFully(DataUtils.java:470)
at email.com.gmail.wburzyns.org.h2.mvstore.FileStore.readFully(FileStore.java:98)
at email.com.gmail.wburzyns.org.h2.mvstore.Chunk.readBufferForPage(Chunk.java:422)
at email.com.gmail.wburzyns.org.h2.mvstore.MVStore.readPage(MVStore.java:2569)
at email.com.gmail.wburzyns.org.h2.mvstore.MVMap.readPage(MVMap.java:633)
at email.com.gmail.wburzyns.org.h2.mvstore.Page$NonLeaf.getChildPage(Page.java:1125)
at email.com.gmail.wburzyns.org.h2.mvstore.CursorPos.traverseDown(CursorPos.java:61)
at email.com.gmail.wburzyns.org.h2.mvstore.MVMap.operate(MVMap.java:1770)
at email.com.gmail.wburzyns.org.h2.mvstore.MVMap.remove(MVMap.java:518)
at email.com.gmail.wburzyns.org.h2.mvstore.StreamStore.remove(StreamStore.java:301)
at email.com.gmail.wburzyns.org.h2.mvstore.db.LobStorageMap.doRemoveLob(LobStorageMap.java:493)
at email.com.gmail.wburzyns.org.h2.mvstore.db.LobStorageMap.cleanup(LobStorageMap.java:443)
at email.com.gmail.wburzyns.org.h2.mvstore.db.LobStorageMap.lambda$null$0(LobStorageMap.java:122)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:740)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:718)
at email.com.gmail.wburzyns.org.h2.store.fs.split.FileSplit.read(FileSplit.java:63)
at email.com.gmail.wburzyns.org.h2.mvstore.DataUtils.readFully(DataUtils.java:456)
... 15 more

So it looks like a race condition is still lurking in the code. This is 100% reproducible but does not happen if the number of LOBs removed is greatly reduced. BTW the removal goes fine as long as the DB is still opened. This happens only on close.

@grandinj
Copy link
Contributor

Something called interrupt() on the thread that as doing the cleanup.

Perhaps some watchdog type timer you have in your application?

@wburzyns
Copy link
Author

Nope, I don't have anything like that. I'd rather guess it's related to automatic DB compaction on close (200 ms by default, IIRC).

@katzyn
Copy link
Contributor

katzyn commented Jul 30, 2022

H2 never interrupts own threads performing disk I/O by itself (unless you modified something in sources of H2 in your custom build).

So you need to figure out which part of your application (including libraries and web/application server, if any) does that and fix it somehow.

If you cannot prevent these interrupts, you can use a separate H2 Server process and use remote connections to it from your application or you can use the async: file system, if you need the embedded database.

https://h2database.com/html/features.html#connection_modes

@wburzyns
Copy link
Author

In the same environment the DB compaction via SHUTDOWN COMPACT always succeeds even though it lasts about 40 minutes. Isn't this suggesting that the interruption comes from H2? The crash is almost instant after CLOSE if prior to the closing a large number of LOBs has been removed (and time that elapsed between removal of LOBs and CLOSE doesn't affect the occurence of the crash).

@katzyn
Copy link
Contributor

katzyn commented Jul 30, 2022

Isn't this suggesting that the interruption comes from H2?

No, H2 never interrupts its own threads excluding the watchdog thread that doesn't perform disk I/O.

@wburzyns
Copy link
Author

  1. So how do you explain SHUTDOWN COMPACT always successfully finishing and ordinary DB closing after massive LOBs removal always failing?

  2. If I develop an isolated testcase (it would be the only thing running in JVM) and reproduce the crash on my DB this way would you agree that the problem is on H2's side? Unfortunately I cannot afford the time to develop a fully standalone testcase.

@katzyn
Copy link
Contributor

katzyn commented Jul 30, 2022

First of all, you should test the original H2 instead of your own custom build.

To debug this issue you need to start your application with a debugger. Set up a breakpoint on Thread.interrupt() method to catch callers of this method.

@wburzyns
Copy link
Author

The build has not been recompiled. The only change is that is has been repackaged with JarJar.

Could you share some insight on point 1?

@katzyn
Copy link
Contributor

katzyn commented Jul 30, 2022

Maybe your application (or web / application server, if you use it) waits some time and interrupts threads only after that when it tries to finish its work and close the database as a part of application termination? When cleanup tasks are fast connections are closed before the timeout, but when database needs more time to finish its work your application (or application server) tries to interrupt its threads?

Anyway, you can find callers of Thread.interrupt() only by yourself. If you don't want to debug it, use the async: file system abstraction layer as suggested in documentation of H2 for applications when they use thread interruptions.

@wburzyns
Copy link
Author

My environment is single threaded. It waits 40 minutes for SHUTDOWN COMPACT to complete and I see no reason why - in exactly the same setting - would it interrupt normal close, but only after a massive LOB removal.

I managed to remove all the LOBs by doing this operation in chunks: remove some LOBs (30000 of them; 50000 causes the problem to appear), close the DB, reopen the DB, repeat.

A race condition is evident here. You suspect that it's caused by my environment, I suspect H2 itself. If SHUTDOWN COMPACT were failing in a similar manner then I'd agree with you. Unfortunately the LOB subsystem has a long history of problems and I believe this is yet another.

@katzyn
Copy link
Contributor

katzyn commented Jul 30, 2022

I think we're talking about different issues.

If you complain about slow SHUTDOWN COMPACT this is an issue about LOB storage and database compaction and ClosedByInterruptException in the provided stack trace is completely irrelevant to it.

@andreitokar
Copy link
Contributor

Is it possible, that cleanup process takes about 10 sec. or more? Because it is asynchronous and thread (ThreadPoolExecutor) is forcibly shutdown after 10 sec. (may issue Thread.interrupt() in the process). Probably it wasn't the best idea to use such procedure, or time-out should be a way higher. Some cooperation from LobStorageMap is required. Also, I just realized that BLOB clean-up absolutely ignores the time allotted for a clean-up on shutdown.

@wburzyns
Copy link
Author

@andreitokar

Yes, it does take more than 10 seconds. After your remarks I think I know why limiting the amount of LOBs deleted before DB closing to 30000 (on my quite old machine) remedies the issue (BTW if the LOBs are smaller, more of them can be removed before causing a crash).

As long as I'm concerned, I don't mind if all deleted LOBs are removed on normal shutdown. A periodic (say once a week) SHUTDOWN COMPACT is acceptable. However, a crash on normal DB closing is a no-go.

@katzyn
Copy link
Contributor

katzyn commented Jul 31, 2022

@wburzyns
Sorry, I really missed that new cleanup logic introduced in H2 2.1.214 uses a ThreadPoolExecutor and invokes its shutdownNow() method on timeout. 2.1.212 and older versions didn't have this code.

@andreitokar
I think we need to stop execution in some different way, for example, check value of some volatile flag in LobStorageMap.cleanup(long) in its loop.

@wburzyns
Copy link
Author

@katzyn
No problem, I'm glad that we've came to a meaningful conclusion :)

@andreitokar
Copy link
Contributor

To stop execution without interrupt is an easy part. The problem is that, as of now, LOB removal fact is not persisted, just queued for processing into in-memory queue, so premature interruption will leave LOBs as live forever.

@andreitokar
Copy link
Contributor

I guess, for now just eliminate interrupt and live with long shutdown time in case of massive LOB removal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants