Brm save improvement #785
Conversation
bitmap in one swipe vs small bits at a time.
checking for errors right, or creating the dir right.
ACK! Wait a min, just realized I left something undone.
OK. I made it also not save a snapshot after every transaction.
    size_t progress = 0, writeSize = emNumElements * sizeof(EMEntry);
    int err;
    char *writePos = (char *) fExtentMap;
    while (progress < writeSize)
Could there ever be a case, such as corrupted file, where this condition is never met, causing an infinite loop?
Right now there's not much protection against corruption, but if it gets an early EOF or a read error, the if statement below will catch it (and so it won't loop infinitely). Garbled data is something it can't handle ATM. If we want, we could add a checksum to the header w/o too much pain though, or checksums throughout, etc. It's easy to change the file format for these.
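For reference, the header-checksum idea could be as small as the sketch below. This is an illustration only: `SaveFileHeader` is hypothetical and the actual BRM save-file layout differs; FNV-1a is just one reasonable choice of fast non-cryptographic hash.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical header; the real BRM save-file layout differs.
struct SaveFileHeader
{
    uint32_t magic;         // file-type marker
    uint32_t elementCount;  // number of entries in the payload
    uint64_t checksum;      // hash of the payload that follows
};

// FNV-1a: a simple, fast non-cryptographic hash, good enough to detect
// garbled data on load (on mismatch, refuse the snapshot and rebuild).
uint64_t fnv1a(const void* data, size_t len)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 1099511628211ULL;             // FNV-1a 64-bit prime
    }
    return h;
}
```

The writer fills in `checksum` over the payload before the first write; the loader recomputes it after the single bulk read() and rejects the file on mismatch.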
This is for MCOL-3242. Want to get this in before I go on vacation.
This replaces the element-by-element writing pattern of the BRM structures with a clustered writing pattern of like types. E.g., the EM/VBBM/VSS only write allocated entries. The new version scans for groups of allocated/unallocated entries and writes each group of consecutive allocated entries in one write() op. This greatly reduces the # of write() calls. On the read() side, I have it read the whole thing in one read(), and do whatever per-element processing needs to be done on it afterward.
I modded the OID server code the same way, because I saw the same access pattern when it would init the oid bitmap.
I don't know who did the IDB* conversion (it was originally fstream) for these classes, but what they did was copy the original code and flip between the two versions depending on whether HDFS was being used. So the IO code all looks like:
if (HDFS)
{
original code using IDBDataFiles
}
else
{
original code using fstreams
}
There's no need for that, or at least there should be no need; hard to say whether the original author had a reason for it or not. What I've done for this patch is force the decision to always go down the IDBDataFile version, which is the path with the optimization. After an appropriate period where we see things are working (maybe 2 weeks after merging), I will follow up and delete the fstream version.
To verify correctness, I hacked my local copy to go down both the IDBDataFile path that I've modified and the original fstream version, which produced 2 files for each BRM struct. I ran many tests to populate them, saved at random points, and used a script to verify the 2 files were identical. I could have written code to check automatically after every execution, but figured this was good enough b/c the changes aren't that complicated. That said, the reviewer should go over it with a fine-tooth comb.