store()/storeAll() causes "Storage is shut down." exception when garbage collection is involved #250
Comments
Hello, The Logs above don’t show any cause why the storage has been shut down. The first idea to narrow down the problem would be to log/check the startup of the storage. |
No, that was/is the only available log.
Is there a reason why there is no information about this in the console/logs? Would be very informative to know.
I did it in the try / catch block, but as you pointed out, no exception occurred on startup. Please note that the problem with the storage only occurs in connection with the GC. If this is deactivated, the storage can be used normally again. I noticed that the LazyReferenceManager cannot always be stopped when shutting down the Tomcat. The stopping is carried out in the class MicroStreamContextDestroyer (lazyReferenceManager.stop()).
Can the starting / use of the LazyReferenceManager be deactivated if no LazyReferences are used to circumvent such cases? I don't use lazy references in the DB. Could the incorrect shutdown of the LazyReferenceManager possibly trigger the problems with the GC? |
I made a bad typo in my last post, for which I apologize: I wrote But it should be |
You can’t prevent the LazyReferenceManager from starting but it’s ok to stop it immediately after starting the storage:
If the LazyReferenceManager is stopped Lazy-References won’t be cleaned automatically but this should be no problem if you don’t use them. The LazyReferenceManager’s related shutdown issue should not have an impact on the GC. LazyReferenceManager.get().stop() just does not wait for the task to be closed. |
Thanks for the hint.
Do you have any other suggestions what I can do?
Where can / should I catch this exception? For logging I'm using Log4j in combination with SLF4J. I adjusted my Log4j config and set the log level from "Debug" to "Trace".
Are further settings advisable to get better logging of the db? Sorry, I am not so familiar with logging. Thanks for your help! |
Nearly every MicroStream call can throw exceptions. The “store” calls in particular may be interesting.
MicroStream itself has no logging that could be enabled.
As you don’t have any other logs, I have run out of ideas. I would have expected your webserver to log any uncaught exception. |
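To make the two suggestions above concrete (catch exceptions around the store calls, and make sure uncaught exceptions end up in a log), here is a minimal sketch using only the JDK. The class `StorageCallLogging` and its method names are hypothetical and not part of MicroStream. It uses `java.util.logging` to stay self-contained; with SLF4J/Log4j the same structure should apply.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of MicroStream): keeps storage-related exceptions from going unlogged. */
final class StorageCallLogging {
    private static final Logger LOG = Logger.getLogger(StorageCallLogging.class.getName());

    private StorageCallLogging() {
    }

    /**
     * Installs a JVM-wide default handler. It is invoked when a thread terminates
     * because of an uncaught exception and no more specific handler is set.
     */
    static void installUncaughtExceptionLogging() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
            LOG.log(Level.SEVERE, "Uncaught exception in thread " + thread.getName(), error));
    }

    /** Runs a storage call, logs any RuntimeException together with a description, then rethrows it. */
    static void logged(String description, Runnable storageCall) {
        try {
            storageCall.run();
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "Storage call failed: " + description, e);
            throw e;
        }
    }
}
```

Assuming `storageManager` is your storage manager instance, a store call could be wrapped as `StorageCallLogging.logged("store root", () -> storageManager.storeRoot());`, and `installUncaughtExceptionLogging()` would be called once, before the storage is started. Whether the default handler actually sees exceptions from the storage's own threads is an assumption that would have to be checked.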
The problem doesn't seem to be primarily with the store()/storeAll() methods, but with the garbage collection. I have again added output that shows the status of the storage. Please pay attention to the status "isShutdown". In the first log file, garbage collection is not activated. Storage tests containing store()/storeAll() are also performed. There are no problems with the storage.
In the second log file, garbage collection is activated (see the source code in the problem description at the beginning). The storage starts without errors. Shortly thereafter, the garbage collection is started, and 2 seconds later the storage is shut down. No StorageStoreTests (with store()/storeAll()) were performed. The activation, execution and deactivation of the garbage collection were wrapped in a try/catch block to catch any exceptions, but no exceptions occurred.
Have you already made such observations? Have you been able to reproduce the behavior? How do you rate this behavior? Have you already received this feedback from others or am I alone? Could the error be caused by some data type that the GC cannot handle? Or maybe the problem from the ticket #240 has something to do with this problem?
Why is/was no logging implemented? Does this have any particular reason? That would be very helpful for debugging.
I checked my code; there are no catch clauses in it that swallow exceptions. I still have to look into the logging of the web server. |
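For anyone who wants to record status changes such as the "isShutdown" flag mentioned above without flooding the log, a minimal sketch could look like this. `StatusChangeWatcher` is a hypothetical helper, not a MicroStream class. It only assumes that the status can be read through a `BooleanSupplier`, and it reports a line only when the value differs from the previous poll.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical helper (not part of MicroStream): reports whenever a polled boolean status changes. */
final class StatusChangeWatcher {
    private final String name;
    private final BooleanSupplier status;
    private final Consumer<String> sink;
    private Boolean last; // null until the first poll

    StatusChangeWatcher(String name, BooleanSupplier status, Consumer<String> sink) {
        this.name = name;
        this.status = status;
        this.sink = sink;
    }

    /** Polls once. Returns true if the value differs from the previous poll (the first poll always reports). */
    synchronized boolean poll() {
        boolean current = status.getAsBoolean();
        boolean changed = last == null || last.booleanValue() != current;
        if (changed) {
            sink.accept(name + " is now " + current);
        }
        last = current;
        return changed;
    }
}
```

The watcher could be polled periodically, for example with `Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(watcher::poll, 0, 500, TimeUnit.MILLISECONDS)`, with the supplier reading the storage status and the sink pointing at your logger. The 500 ms interval is an arbitrary choice for illustration.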
Many thanks for the detailed logging of the storage’s state. With that and the previous logs you already provided, I must admit that this is a new bug that we had not observed until now.
A storage shutdown without an exception in interaction with the GC is new to us. With simpler scenarios we have not been able to reproduce this so far, but we are still trying to.
I doubt that this is related to issue #240. Issue #240 resulted in 2 threads blocking each other but not in a storage shutdown. If such a blocked thread is detected and killed by your webserver, I would expect some kind of related entry in the server logs.
If there are types that cannot be handled, there should be an exception the first time such a type gets persisted, but this is not related to the GC.
There is one additional option to add more logging. You could add a custom implementation of the one.microstream.storage.types.StorageEventLogger. (Maybe the StorageEventLogger.Default implementation is sufficient too.) With that you can log a few internals of the storage channels' behavior.
One other option might be to shut down the storage after the manual GC run and start it again without the GC enabled. |
My approach is to start the GC manually at a much later point in time. Because apparently the crux of the matter is that the problem occurs when the GC is executed directly at the start of the application and the DB. If the GC runs later then the problem does not seem to occur. I tried to start the storage again after it was shut down when the GC started and got the following output. This may help you further:
Furthermore, I was able to observe that when the GC is started at the start of the application and the DB, the storage is shut down after processing a file (default: 8MB). |
Many thanks for the additional log. |
You are welcome. |
I think that was a typo and you must mean StorageEventLogger.Debug(). I implemented it and got the following output, which you have probably already seen in connection with the "StorageIsShutdown" exception:
|
I have to correct myself. This exception "StorageIsShutdown" also occurs if the GC is started manually later, not only at the start of the application / database. Is this bug a high priority for you, and are you currently working on a solution? |
The NullPointer "Cannot read field file because is null" indicates that this is the same bug that we can reproduce and are currently analyzing. |
Yes, we're currently working on that, but I can't estimate how long it will take. |
commit bac8f5b
Author: hg-ms <53219833+hg-ms@users.noreply.github.com>
Date: Fri Oct 22 10:27:59 2021 +0200

Missed a very important line

commit ab35386
Author: hg-ms <53219833+hg-ms@users.noreply.github.com>
Date: Fri Oct 22 08:30:38 2021 +0200

Fixing Nullpointer in StorageEntityChache#internalCacheCheck
this fix is related to #250
microstream-one/microstream-private#588
microstream-one/microstream-private#585
It's nice to hear that the cause has been found and resolved! |
Fixing Nullpointer in StorageEntityChache#internalCacheCheck
NewBackupFileValidator for every channel
this fix is related to #250
microstream-one/microstream-private#588
microstream-one/microstream-private#585
We plan to do a minor release containing that fix. I apologize that I can't say at the moment when it will be available. |
Hello, |
Environment Details
Describe the bug
Since I synchronize larger amounts of data at night, I activated the MicroStream GarbageCollector, carried out the GarbageCollection and deactivated the MicroStream GarbageCollector again.
After an indefinite period of time, the already known problem with store()/storeAll() and the exception "Storage is shut down" occurs. This also occurs after restarting the application and without a shutdown of the storage that I initiated or intended. No exceptions were shown to me while starting the storage. Reading the db is still partially possible.
I noticed that if this exception occurred when saving to the storage and then the application is started without the following garbage collection, then the storage can be used again without problems, i.e. store()/storeAll() is also possible again.
However, if the GarbageCollection is activated again the next time it is started, the store()/storeAll() problem occurs again.
Are there possibilities / methods to better document the start, the behavior or the state of the storage or the GC, or to receive specific outputs? Could you give me some code examples in this regard?
I might be able to provide further / helpful information to solve this issue. It would also be possible for me to create a ThreadDump in the db after the unsuccessful write attempt, shortly after the db started.
I've attached the exceptions and the code for starting and shutting down.
Additional context