use deserialize_from for faster deserialization #2128
Conversation
So we tested this out (with the bincode float patch applied) on the gmail recording in wrench, and it looks like this drops BuiltDisplayListIter::next from 12% of our time to 4%. Nice!
I feel really sad that we are working around a problem that shouldn't be there in an ideal world (with the compiler optimizing out all the redundant moves). But within the current situation, I think we should take this in and move on. Great work @gankro!
I'd like to wait on landing this for a bit while I negotiate with upstream about the details. The only urgency on this (since it's an almost completely transparent change) is shaping up for the all-hands?
Correct.
bump byteorder version to get transmuting float deserialization

This is the float change that was referenced in #2128. Before, every deserialize of a float did a branch to canonicalize NaNs; now it just feeds the bits through. The effect can be seen most dramatically when deserializing the `[f32; 16]` matrices in PushStackingContext, where the cost of reading a float is reduced enough to convince llvm to unroll the entire loop.
Status update: convinced the serde devs that the core stuff is basically good in concept (impl details still need review). Currently refactoring so the extra derive stuff can be cfg'd out with a feature flag, for people who don't want/need it, since it does blow up compile times a bit (worth it for us).
@staktrace has figured out how to get this to work upstream, so we should be able to land it if y'all are ready? #2181 should land first tho
By "upstream" you really mean "downstream" :) (or less ambiguously, "in gecko")

Try pushes that include this PR and the modified serde stuff: (There are some libudev-sys changes in the patches which I'm landing separately beforehand in bug 1423236, so you can ignore those if you're looking through the patches)
Slight hitch: it turns out the solution only works for Rust >= 1.22, and we don't have that on all platforms yet. Our minimum build requirement is still 1.21. Bug 1421097 is tracking the update to 1.22, and according to the schedule it's supposed to happen tomorrow, but it might be delayed due to issues.
☔ The latest upstream changes (presumably #2192) made this pull request unmergeable. Please resolve the merge conflicts.
The update to Rust 1.22 just landed on autoland (bug 1421097). Assuming that doesn't bounce, we should be good to merge this PR now.
rebased
This soft-forks serde_derive to add a deserialize_from derivation for repr(u8) enums.
@bors-servo r=kvark
📌 Commit 36864e9 has been approved by |
use deserialize_from for faster deserialization

It seems to at least be correct! Heading out for the day, but @glennw or @jrmuizel might be interested in profiling this, so here it is.

What's going on here is an incredibly deep yak shave that I don't have the spirit to enumerate, but in essence this changes our deserialize pattern from `deserialize() -> Result<T, E>` to `deserialize(dest: &mut T) -> Result<(), E>` (which is recursively invoked on fields). This seems to significantly improve the codegen, as Rust is Very Bad at optimizing away all the copies from using Results. Currently this relies on my own fork of serde/serde_derive for the new trait method and all the derivation work. I hope to upstream most of this.

The Hard Part of this entire thing, which I doubt will be upstreamable, is handling enums. There is no way in Rust to build an enum up "in place". This is a pretty serious problem, because SpecificDisplayItem, the thing we're trying to deserialize, is an enum!

To handle this, I rely on the [deterministic layout](rust-lang/rfcs#2195) of a `repr(u8)` enum to transmute into its repr and manipulate the tag and payload separately. Specifically, I can produce a more optimized enum deserialization iff:

* The enum has a repr(int) annotation
* The enum has a nonelike (payloadless) variant
* The enum has a somelike (payloadfull) variant
* The enum is Copy

We then generate roughly this code:

```rust
fn deserialize_enum(&mut self) {
    let repr = transmute::<&mut Self, &mut Repr>(self);

    // First mark the enum as empty
    repr.somelike.tag = Tag::nonelike;

    match deserialize_tag() {
        Tag::A => {
            try!(repr.A.field1.deserialize_from());
            try!(repr.A.field2.deserialize_from());
            try!(repr.A.field3.deserialize_from());

            // NOW mark the enum as having data!
            repr.somelike.tag = Tag::A;
            Ok(())
        }
        Tag::B => { ... }
    }
}
```

The point of this is that it prevents any memory-unsafety from observing a "partially" deserialized enum.

The problem with it is that it would cause a leak if one of the cases contained a type with a destructor. However this is doubly fine for us: SpecificDisplayItem is completely POD, and we hard abort the whole process if we find an error while deserializing. Also note that currently unions forbid non-Copy types, so destructors aren't *actually* a concern, but also we have literally no way to check for trait impls, so this just leads to a cryptic error pointing to the `#[derive(Deserialize)]` annotation. So either this design leaks, or it leads to cryptic errors. Either way, this will probably block the enum derive code from ever being properly upstreamed.

Missing optimizations from this PR that are also possible:

* Bincode still branches+masks on every f32 read due to some safety scare. This should be fixed upstream Soon™️. Without this, deserializing f32's is bloated enough that it actually produces a loop, instead of straight-line code. With this, we get straight-line code, but there's still a branch for length checking on each f32 read, instead of a single check and a memcpy. It's hard for us to fix this at our abstraction level; ideally rustc/llvm would fix it.
* AuxIter still uses the old deserialization method because it needs to return an `Option<T>` anyway. It could be changed to be a "fake" iterator like BuiltDisplayListIter too if it's desirable.
☀️ Test successful - status-appveyor, status-travis