-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use serde instead of hand rolled serialization #18
Conversation
This is interesting! What does the ultimate serialization end up as? (I read it in a hex editor to debug it sometimes.) What is the performance impact? (I don't have a good way of measuring it, hmmmm.) |
cbor. The format was fairly arbitrarily chosen, the only constraint being that it needs to store its own serialized length so we can serialize multiple entries back-to-back . You can pick any other listed on https://serde.rs/ if you wish.
If that's a concern, you could pick yaml or json.
I haven't measured it but I'm certain it's negligible (esp. when using a binary format like cbor). Disassembled it for fun: https://gist.github.com/piscisaureus/d09b68b083d680295fa5dfa00ab95dbd |
I'm mildly positive on this. I had considered serde initially but I was reluctant to take on dependencies I don't well understand, and in particular here I don't understand what the actual serialization ends up being, but I can read about cbor I guess. I think JSON/YAML aren't acceptable for perf reasons. Reading this is in the startup critical path and in my recollection the comparable file in Ninja ends up being megabytes. I will look into a way to evaluating the perf of this and get back to you. |
I tried this:
This measures the no-op time to check that clang-format is up to date, which involves reading this db. With the in-tree impl I got With this patch (and main merged, to make the comparison fair) I got Some other notes: the binary size grew by like 200kb which is like 15% which is no problem. Based on my understanding of the serde docs, it could be used to implement a serialization more similar to the current serialization (which was chosen for performance reasons). In other words I think we could still use serde with some more custom logic instead of cbor and maybe get most of the benefit? Not sure, I'm not too familiar with serde. |
That's possible, albeit of somewhat debatable benefit. We could also try bincode.
I suspect that it is not the serialization itself that causes the slowdown, but the fact that it "flushes to disk" on every write. That's expected because: serde_cbor::ser::to_writer(&mut self.w, &entry)?; // <-- self.w is a `File`. I think if we put a buffered writer there it might get a little better. |
I've updated the PR to use a BufReader/BufWriter, hopefully that will reduce time spent in the kernel. |
OK I'm convinced now that (lack of) buffering was indeed the issue. Hand rolled:
Serde without buffering:
Serde with BufReader/BufWriter:
|
804ac03
to
d3756e0
Compare
I think to resummarize this one, it adds a dependency, grows the binary by a bit, makes the code a bit slower, and saves ~140 lines of serialization code. I'm not sure it's really worth it. I wonder if using bincode directly would be better? Its encoding looks pretty similar to what I wrote. |
03de3da
to
afe500f
Compare
I tried porting this on top of #58: https://github.com/GrigorenkoPV/n2/tree/serde Comparison between 05d11b9 (baseline) & f827571 (serde):
Surprisingly, it is ~4% faster. The binary did, however, get bigger. Stripped versions, size in bytes:
Another thing worth mentioning is that this allows to get rid of some dubious unsafe code like unsafe { MaybeUninit::uninit().assume_init() }, from here. Should I create a PR? I feel like I should mention @piscisaureus's authorship in my comment, because it is mostly his code, but I don't actually know how to do it. Probably there's some git command for that. |
Closing in favor of #59 |
No description provided.