Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convenience start methods must shutdown on problems #44

Closed
tm-ms opened this issue Nov 26, 2019 · 8 comments
Closed

Convenience start methods must shutdown on problems #44

tm-ms opened this issue Nov 26, 2019 · 8 comments

Comments

@tm-ms
Copy link
Contributor

tm-ms commented Nov 26, 2019

Bane of convenience methods:
If complex logic is hidden in a short methode, there is no way to add a try-catch and care for cleanup (e.g. shutdown a started storage instance). So the convenience methods have to handle such cases.

Especially ensuring that all file handles are closed.
Maybe even with an internal "forceAbort" or something like that, in case task processing does not work anymore.

@tm-ms
Copy link
Contributor Author

tm-ms commented Nov 27, 2019

Wrapped the additional startup logic in EmbeddedStorageManager#start after calling StorageManager#start with a try-catch and an exception handling that terminates all active threads and closes open files.

@tm-ms
Copy link
Contributor Author

tm-ms commented Nov 29, 2019

As described in #4, there is still a difference between the queryable information of being shutdown and the state of actually being completely inactive and having no more files opened.
Since this is by design (the backup thread processing its remaining items even after shutdown), additional internal logic and API functionality has to be provided to get it right.
For example:

  • registering "users" for a file (channels, backup handler, etc.) and only really closing it when the user count is 0. This prevents race conditions on preliminary closed but reopened files.
  • providing an "isInactive()" method to the storage manager that actually queries if all its active parts have really become inactive. This is different from "isShutdown()", which only queries if the internal mode is set to be (or shortly become) inactive.
  • maybe also providing a "waitForInactivity()" method or something like that that blocks until the storage is truly inactive.

Probably all 3 of these. Then, the problem should be gone and the complex cases of shutting down, being shut down and being really inactive should be conveniently handleable.

@tm-ms
Copy link
Contributor Author

tm-ms commented Nov 29, 2019

StorageLockedFile already has a user-count mechanism, albeit a too simplistic one. The user count is a simple int. Necessary is a "usages" count per user, e.g. the storage channel instance is a user, the backup handler is another user. Shutting down the storage clears all usages of the channel instance but not those of the backup handler. Only when the total usage count is 0 can a file really be closed.

Of course this mechanism has to be properly synchronized since multiple threads are calling it with arbitrary timing. This also means that there must be a "closingDecrement" method that decrements the usage count and immediately closes the file if it removed the last usage.
In turn, this means that the simple close() call must throw an exception if there are still usages.

@tm-ms
Copy link
Contributor Author

tm-ms commented Dec 2, 2019

Implemented a proper "user" registration, including a checking close() method, a corresponding tryClose() method, lock-protected unregisterClosing variants, including closing logic.
Also, a StorageManager#isActive querying method that properly checks all its active parts internally.

Testing showed some forgotten details (initial the file manager being initially registered as a user at data file instance creation time). Those have been fixed, now.
But it still does not work properly.

The current reason is:
java.nio.file.Files#deleteIfExists is being called, returns true, but the file is still not deleted.
I am not sure if that means the file is still opened (but then they return a success status, geniuses ...) or that the method does not work properly in the first place. This has to be further researched.

The test already waits for the storage to be inactive.
All files are closed prior to deleting them.
There isn't even a backup thread.
But still the deleteIfExists method fails.

This is going to be ... "fun"...

@tm-ms
Copy link
Contributor Author

tm-ms commented Dec 2, 2019

The Windows ressource monitor shows that the file handle is indeed released. Albeit after a suprisingly long time (~30 seconds) while it still writes to the file with the tiniest bandwith (slowly dropping from ~500 to ~150 B/s).
But even when calling the delete after the handle definitely disappeared, it reports true but does not delete the file.
Calling Files#delete does nothing (void, no exception), which indirectly indicates success, but still the file is NOT deleted.
If Files#deleteIfExists is called after that, a wird AccessDeniedException is thrown with no usable information.
When the java process is ended after that, THEN the file is deleted.

This is all very, very wird.

@tm-ms
Copy link
Contributor Author

tm-ms commented Dec 2, 2019

After more research:
The file is actually deleted, but seems to be immediately recreated.
This applies only to the channel data files, not to the transactions files.
I can rule out that this has anything to do with microstream code for the following simple reason:
VisualVM shows that:

  • no thread even remotely related to microstream is still running
  • no instance of anything remotely related to microstream still exists
    IT IS ALL GONE.

Yet, still, the file keeps reappearing.
It can even be deleted manually in the file explorer. But on the next refresh, it is still there.

Also weird:
The transactions file and the data files are created and handled by EXACTELY the same logic.
Still, only the data files show this behavior, not the transactions files.

@tm-ms
Copy link
Contributor Author

tm-ms commented Dec 2, 2019

The reason was a still existing user count, but swallowed by "closeSilent" logic.
The weird thing was that the files were still deleted at jvm shutdown and were deleteable via explorer.

Nevertheless: closeSilent is a naive pestilence that has to be exterminated sooner rather than later.
Replaced the occuring 2 patterns with:
1.) try-catch-finally with Throwable suppressed linking.
2.) Newly invented DisruptionCollector and MultiCauseException pattern for loops over multiple to be closed items where the first exception should not prevent the others.

Tested. Works.

@tm-ms
Copy link
Contributor Author

tm-ms commented Dec 3, 2019

Works in JUnit tests as well, so this issue can be closed.

@tm-ms tm-ms closed this as completed Dec 3, 2019
@fh-ms fh-ms transferred this issue from another repository Apr 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant