Refactor metadata concept #50
Implementation looks decent already. Latest commit (
As expected: this speeds things up quite a lot (the memory management improvements were mostly byproducts and not a direct consequence of the new metadata concept per se). @els0r I assume the effects could be even more pronounced on a system with a spinning disk, so whenever you've got time, feel free to give it a shot on a host of your choice (obviously the DB has to be converted from scratch - luckily conversion is now about 5-10x faster than before as well). Things done:
@els0r As for the legacy / additional metadata we still have:
My proposal about those:
So from my point of view that leaves us with only the number of dropped packets on capture level (per block) to keep. What do you think? And should I take care of adding the
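If that's the direction, the remaining per-block metadata would be fairly small; a rough sketch of what such a block descriptor could look like (field names are purely illustrative, not the actual goProbe types):

```go
// Illustrative only: if the per-block capture metadata is trimmed down to the
// number of dropped packets, the remaining block descriptor could look roughly
// like this. Field names are assumptions, not the actual goProbe types.
type BlockMetadata struct {
	Timestamp      int64  // block timestamp (epoch seconds)
	Offset         uint64 // byte offset of the block within the GPFile
	Len            uint32 // compressed block length in bytes
	PacketsDropped uint32 // capture-level drops for this block (the only legacy counter kept)
}
```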
Some additional minor improvements to memory management (LZ4 cgo interface typing and some trivial zero-allocation string handling when dealing with paths) yield a further reduction in the number of allocated objects (the rest isn't affected within uncertainties, since these objects are small and don't incur any significant CPU load):
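For context, "zero-allocation string handling" for paths usually boils down to assembling the path bytes in a reusable buffer instead of calling fmt.Sprintf for every file access. A minimal sketch of the idea, with purely illustrative names (not the actual goProbe code):

```go
// Hypothetical sketch of zero-allocation path construction: instead of
// fmt.Sprintf (which allocates a new string per call), an epoch-based
// file path is appended into a reusable byte buffer.
package main

import (
	"fmt"
	"strconv"
)

// pathBuilder reuses its internal buffer across calls, so building a path
// only allocates when the buffer actually has to grow.
type pathBuilder struct {
	buf []byte
}

// DayPath appends "<base>/<epoch>.gpf" into the reused buffer.
func (p *pathBuilder) DayPath(base string, epoch int64) []byte {
	p.buf = p.buf[:0]
	p.buf = append(p.buf, base...)
	p.buf = append(p.buf, '/')
	p.buf = strconv.AppendInt(p.buf, epoch, 10)
	p.buf = append(p.buf, ".gpf"...)
	return p.buf
}

func main() {
	var pb pathBuilder
	fmt.Println(string(pb.DayPath("/usr/local/goprobe/db/eth0", 1601510400)))
}
```

Since the buffer is reused across calls, the hot path stops churning out a short-lived string per file access.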
Alright. Running the tests as we speak. Still in the process, since the conversion tool craps out for some days of data:
I suspect this is due to faulty data inside of the origin database, but am wondering whether we should just skip and continue writing. As for the tests, I also ran into some other panics while querying:
Not sure if the latter is related to the former in the sense that the conversion failed altogether.
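To make the "skip and continue" option concrete, it could look roughly like this in the conversion loop (all helper names and types below are stand-ins for illustration, not the actual conversion tool code):

```go
// Hypothetical sketch of the "skip and continue" conversion strategy being
// discussed: a corrupt source block is logged and skipped instead of
// aborting the whole conversion run.
package main

import (
	"errors"
	"fmt"
	"log"
)

// block and the read/write helpers below are purely illustrative stand-ins
// for the conversion tool's internals, not the actual goProbe API.
type block struct {
	Timestamp int64
	corrupt   bool
}

func readLegacyBlock(b block) ([]byte, error) {
	if b.corrupt {
		return nil, errors.New("decompression failed")
	}
	return []byte("payload"), nil
}

func writeNewBlock(data []byte) error { return nil }

func convert(blocks []block) error {
	for _, b := range blocks {
		data, err := readLegacyBlock(b)
		if err != nil {
			// Skip the faulty source block and keep going instead of
			// aborting the whole conversion run.
			log.Printf("skipping corrupt block %d: %v", b.Timestamp, err)
			continue
		}
		if err := writeNewBlock(data); err != nil {
			return fmt.Errorf("write failed for block %d: %w", b.Timestamp, err)
		}
	}
	return nil
}

func main() {
	_ = convert([]block{{Timestamp: 1601510400}, {Timestamp: 1601596800, corrupt: true}})
}
```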
Thanks for testing! There are a few things that come to mind:
Aside from those two obvious things I'd like to make sure that the source DB is really corrupt and that there's no error on our end. Recent commit(s) contain fixes for the above issues. Could you maybe try the following to make sure this is a corruption on the source side: run a legacy goQuery on the pre-conversion DB in console logging mode; if those blocks are indeed corrupt it should throw an error on all of them. If that's the case, all good; otherwise we're probably missing something on the conversion side (e.g. a mismatch with the cloned lz4cust code or similar)...
And something to sweeten your weekend. Conversion of a 19G DB in 5 minutes.
Let's take the heaviest interface we can find:
(see what I did there ;)). That should be enough to get goQuery to sweat a little bit.
With the new version that boils down to
You essentially chopped query times in half. <3 seconds to crawl through the data.

Toodles,
Nice, like it! Thanks a lot for confirming! I hope it's going to be even better once the changes from #47 are completed and merged. I would assume that the reduced GC load is even more pronounced the bigger the underlying DB is... Stay tuned! Aside from that, the latest commit on this branch removed the legacy metadata files. The DB got even smaller by another 8% w.r.t. the results posted above (probably less in your scenario because you have a better raw-to-metadata ratio to begin with, but for a "commodity" host / server like mine it's substantial):
I'll finish up, extend the
Currently, GPFile block metadata is stored in .gpf.meta files, one metadata file per GPFile. On the interface level this is invisible to the caller (GPFile essentially handles its respective header under the hood). Unfortunately there are a few issues with the whole concept that need to be tackled:

DoD
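Roughly speaking, the current per-file sidecar concept described above amounts to something like the following sketch (field names, the JSON encoding and the helper methods are placeholders for illustration, not the actual GPFile implementation):

```go
// Minimal sketch of the per-file metadata concept: every GPFile carries a
// sidecar ".meta" file holding its block metadata, read when the file is
// opened and written back when it is closed. JSON stands in for the real
// (binary) encoding here; all names are illustrative.
package main

import (
	"encoding/json"
	"os"
)

// blockMeta describes a single block inside a GPFile (illustrative fields).
type blockMeta struct {
	Timestamp int64 `json:"timestamp"`
	Offset    int64 `json:"offset"`
	Len       int64 `json:"len"`
}

// gpFile hides the sidecar metadata from the caller: open reads it,
// close writes it back.
type gpFile struct {
	path   string
	blocks []blockMeta
}

func open(path string) (*gpFile, error) {
	g := &gpFile{path: path}
	raw, err := os.ReadFile(path + ".meta")
	if err != nil {
		if os.IsNotExist(err) {
			return g, nil // new file, no sidecar metadata yet
		}
		return nil, err
	}
	if err := json.Unmarshal(raw, &g.blocks); err != nil {
		return nil, err
	}
	return g, nil
}

func (g *gpFile) close() error {
	raw, err := json.Marshal(g.blocks)
	if err != nil {
		return err
	}
	return os.WriteFile(g.path+".meta", raw, 0644)
}

func main() {
	g, err := open("/tmp/example.gpf")
	if err != nil {
		panic(err)
	}
	g.blocks = append(g.blocks, blockMeta{Timestamp: 1601510400, Offset: 0, Len: 128})
	if err := g.close(); err != nil {
		panic(err)
	}
}
```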