use deserialize_from for faster deserialization #2128
Conversation
So we tested this out (with the bincode float patch applied) on the gmail recording in wrench, and it looks like this drops BuiltDisplayListIter::next from 12% of our time to 4%. Nice!
I feel really sad that we are working around a problem that shouldn't be there in an ideal world (with the compiler optimizing out all the redundant moves). But within the current situation, I think we should take this in and move on. Great work @gankro!
I'd like to wait on landing this for a bit while I negotiate with upstream about the details. The only urgency on this (since it's an almost completely transparent change) is shaping up for the all-hands?
Correct.
bump byteorder version to get transmuting float deserialization

This is the float change that was referenced in #2128. Before, every deserialize of a float did a branch to canonicalize NaNs; now it just feeds the bits through. The effect can be seen most dramatically when deserializing the `[f32; 16]` matrices in PushStackingContext, where the cost of reading a float is reduced enough to convince llvm to unroll the entire loop.
Status update: convinced the serde devs that the core stuff is basically good in concept (impl details still need review). Currently refactoring so the extra derive stuff can be cfg'd out with a feature flag, for people who don't want/need it, since it does blow up compile times a bit (worth it for us).
@staktrace has figured out how to get this to work upstream, so we should be able to land it if y'all are ready? #2181 should land first tho
By "upstream" you really mean "downstream" :) (or less ambiguously, "in gecko")

Try pushes that include this PR and the modified serde stuff: (There are some libudev-sys changes in the patches which I'm landing separately beforehand in bug 1423236, so you can ignore those if you're looking through the patches)
Slight hitch: it turns out the solution only works for Rust >= 1.22, and we don't have that on all platforms yet. Our minimum build requirement is still 1.21. Bug 1421097 is tracking the update to 1.22, and according to the schedule it's supposed to happen tomorrow, but it might be delayed due to issues.
☔ The latest upstream changes (presumably #2192) made this pull request unmergeable. Please resolve the merge conflicts.
The update to Rust 1.22 just landed on autoland (bug 1421097). Assuming that doesn't bounce, we should be good to merge this PR now.
rebased
This soft-forks serde_derive to add a deserialize_from derivation for repr(u8) enums.
@bors-servo r=kvark
📌 Commit 36864e9 has been approved by |
use deserialize_from for faster deserialization

It seems to at least be correct! Heading out for the day, but @glennw or @jrmuizel might be interested in profiling this, so here it is.

What's going on here is an incredibly deep yak shave that I don't have the spirit to enumerate, but in essence this changes our deserialize pattern from `deserialize() -> Result<T, E>` to `deserialize(dest: &mut T) -> Result<(), E>` (which is recursively invoked on fields). This seems to significantly improve the codegen, as Rust is Very Bad at optimizing away all the copies from using Results. Currently this relies on my own fork of serde/serde_derive for the new trait method and all the derivation work. I hope to upstream most of this.

The Hard Part of this entire thing, which I doubt will be upstreamable, is handling enums. There is no way in Rust to build an enum up "in place". This is a pretty serious problem, because SpecificDisplayItem, the thing we're trying to deserialize, is an enum!

To handle this, I rely on the [deterministic layout](rust-lang/rfcs#2195) of a `repr(u8)` enum to transmute into its repr and manipulate the tag and payload separately. Specifically, I can produce a more optimized enum deserialization iff:

* The enum has a repr(int) annotation
* The enum has a nonelike (payloadless) variant
* The enum has a somelike (payloadfull) variant
* The enum is Copy

We then generate roughly this code:

```rust
fn deserialize_enum(&mut self) {
    let repr = transmute::<&mut Self, &mut Repr>(self);

    // First mark the enum as empty
    repr.somelike.tag = Tag::nonelike;

    match deserialize_tag() {
        Tag::A => {
            try!(repr.A.field1.deserialize_from());
            try!(repr.A.field2.deserialize_from());
            try!(repr.A.field3.deserialize_from());

            // NOW mark the enum as having data!
            repr.somelike.tag = Tag::A;
            Ok(())
        }
        Tag::B => { ... }
    }
}
```

The point of this is that it prevents any memory-unsafety from observing a "partially" deserialized enum.

The problem with it is that it would cause a leak if one of the cases contained a type with a destructor. However this is doubly fine for us: SpecificDisplayItem is completely POD, and we hard abort the whole process if we find an error while deserializing. Also note that currently unions forbid non-Copy types, so destructors aren't *actually* a concern, but also we have literally no way to check for trait impls, so this just leads to a cryptic error pointing to the `#[derive(Deserialize)]` annotation. So either this design leaks, or it leads to cryptic errors. Either way, this will probably block the enum derive code from ever being properly upstreamed.

Missing optimizations from this PR that are also possible:

* Bincode still branches+masks on every f32 read due to some safety scare. This should be fixed upstream Soon™️. Without this, deserializing f32's is bloated enough that it actually produces a loop, instead of straight-line code. With this, we get straight-line code, but there's still a branch for length checking on each f32 read, instead of a single check and a memcpy. It's hard for us to fix this at our abstraction level; ideally rustc/llvm would fix it.
* AuxIter still uses the old deserialization method because it needs to return an `Option<T>` anyway. It could be changed to be a "fake" iterator like BuiltDisplayListIter too if it's desirable.
☀️ Test successful - status-appveyor, status-travis