Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use deserialize_from for faster deserialization #2128

Merged
merged 2 commits into from Dec 8, 2017

Conversation

Gankra
Copy link
Contributor

@Gankra Gankra commented Nov 28, 2017

It seems to at least be correct!

Heading out for the day, but @glennw or @jrmuizel might be interested in profiling this, so here it is.

What's going on here is an incredibly deep yak shave that I don't have the spirit to enumerate, but in essence this changes our deserialize pattern from: deserialize() -> Result<T, E> to deserialize(dest: &mut T) -> Result<(), E> (which is recursively invoked on fields).

This seems to significantly improve the codegen, as Rust is Very Bad at optimizing away all the copies from using Results. Currently this relies on my own fork of serde/serde_derive for the new trait method, and all the derivation work. I hope to upstream most of this.

The Hard Part of this entire thing, which I doubt will be upstreamable, will be handling enums. There is no way in Rust to build an enum up "in place". This is a pretty serious problem, because SpecificDisplayItem, the thing we're trying to deserialize, is an enum!

To handle this, I rely on the deterministic layout of a repr(u8) enum to transmute into its repr and manipulate the tag and payload seperately. Specifically I can produce a more optimized enum deserialization iff:

  • The enum has a repr(int) annotation
  • The enum has a nonelike (payloadless) variant
  • The enum has a somelike (payloadfull) variant
  • The enum is Copy

We then generate roughly this code:

fn deserialize_enum(&mut self) {
  let repr = transmute<&mut Self, &mut Repr>(self);
  
  // First mark the enum as empty
  repr.somelike.tag = Tag::nonelike;

  match deserialize_tag() {
     Tag::A => {
       try!(repr.A.field1.deserialize_from());
       try!(repr.A.field2.deserialize_from());
       try!(repr.A.field3.deserialize_from());

       // NOW mark the enum as having data!
       repr.somelike.tag = Tag::A;
       Ok(())
     }
     Tag::B => { ... }
  }
}

The point of this is that it prevents any memory-unsafety from observing a "partially" deserialized enum. The problem with it is that it would cause a leak if one of the cases contained a type with a destructor. However this is doubly fine for us: SpecificDisplayItem is completely POD, and we hard abort the whole process if we find an error while deserializing. Also note that currently unions forbid non-Copy types, so destructors aren't actually a concern, but also we have literally no way to check for trait impls so this just leads to a cryptic error pointing to the #[derive(Deserialize)] annotation.

So either this design leaks, or it leads to cryptic errors. Either way, this will probably block the enum derive code from ever being properly upstreamed.

Missing optimizations from this PR that are also possible:

  • Bincode still branches+masks on every f32 read due to some safety scare. This should be fixed upstream Soon™️. Without this, deserializing f32's is bloated enough that it actually produces a loop, instead of straight-line code. With this, we get straight-line code, but there's still a branch for length checking on each f32 read, instead of a single check and a memcpy. It's hard for us to fix this at our abstraction level; ideally rustc/llvm would fix it.

  • AuxIter still uses the old deserialization method because it needs to return an Option<T> anyway. It could be changed to be a "fake" iterator like BuiltDisplayListIter too if it's desirable.


This change is Reviewable

@Gankra
Copy link
Contributor Author

Gankra commented Nov 29, 2017

So we tested this out(with the bincode float patch applied) on the gmail recording in wrench, and it looks like this drops BuiltDisplayListIter::next from 12% of our time to 4%. Nice!

Copy link
Member

@kvark kvark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel really sad that we are working around a problem that shouldn't be there in an ideal world (with compiler optimizing out all the redundant move). But within the current situation, I think we should take this in and move on. Great work @gankro !

@Gankra
Copy link
Contributor Author

Gankra commented Nov 29, 2017

I'd like to wait on landing this for a bit while I negotiate with upstream about details. The only urgency on this (since it's an almost completely transparent change) is shaping up for the all-hands?

@jrmuizel
Copy link
Collaborator

Correct.

bors-servo pushed a commit that referenced this pull request Nov 30, 2017
bump byteorder version to get transmuting float deserialization

This is the float change that was referenced in #2128

Before every deserialize of a float did a branch to canonicalize NaNs, now it just feeds the bits through. The effect can be seen most dramatically when deserializing the `[f32; 16]` matrices in PushStackingContext, where the cost of reading a float is reduced enough to convince llvm to unroll the entire loop.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/2138)
<!-- Reviewable:end -->
@Gankra
Copy link
Contributor Author

Gankra commented Dec 4, 2017

Status update: convinced the serde devs that the core stuff is basically good, in concept (impl details still need review). Currently refactoring so the extra derive stuff can be cfg'd out with a feature flag, for people who don't want/need it, since it does blow up compile times a bit (worth it for us).

@Gankra Gankra changed the title WIP: use deserialize_from for faster deserialization use deserialize_from for faster deserialization Dec 4, 2017
@Gankra
Copy link
Contributor Author

Gankra commented Dec 6, 2017

@staktrace has figured out how to get this to work upstream, so we should be able to land it if y'all are ready? #2181 should land first tho

@staktrace
Copy link
Contributor

By "upstream" you really mean "downstream" :) (or less ambiguously, "in gecko")

Try pushes that include this PR and the modified serde stuff:
Linux - https://treeherder.mozilla.org/#/jobs?repo=try&revision=564675d02666501b448090a8c9076f24edf3f94c
Windows - https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba8013b6b617948cfeab67f372839b0edecd7119

(There's some libudev-sys changes in the patches which I'm landing separately beforehand in bug 1423236, so you can ignore that if you're looking through the patches)

@staktrace
Copy link
Contributor

Slight hitch: turns out the solution only works for Rust >= 1.22, and we don't have that on all platforms yet. Our minimum build requirement is still 1.21. Bug 1421097 is tracking the update to 1.22 and according to the schedule it's supposed to happen tomorrow but it might be delayed due to issues.

@bors-servo
Copy link
Contributor

☔ The latest upstream changes (presumably #2192) made this pull request unmergeable. Please resolve the merge conflicts.

@staktrace
Copy link
Contributor

The update to Rust 1.22 just landed on autoland (bug 1421097). Assuming that doesn't bounce we should be good to merge this PR now.

@Gankra Gankra force-pushed the deserialize_from branch 2 times, most recently from e139835 to 9a9c9a8 Compare December 8, 2017 19:48
@Gankra
Copy link
Contributor Author

Gankra commented Dec 8, 2017

rebased

This soft-forks serde_derive to add a deserialize_from derivation for
repr(u8) enums.
@Gankra
Copy link
Contributor Author

Gankra commented Dec 8, 2017

@bors-servo r=kvark

@bors-servo
Copy link
Contributor

📌 Commit 36864e9 has been approved by kvark

@bors-servo
Copy link
Contributor

⌛ Testing commit 36864e9 with merge 822edff...

bors-servo pushed a commit that referenced this pull request Dec 8, 2017
use deserialize_from for faster deserialization

It seems to at least be correct!

Heading out for the day, but @glennw or @jrmuizel might be interested in profiling this, so here it is.

What's going on here is an incredibly deep yak shave that I don't have the spirit to enumerate, but in essence this changes our deserialize pattern from: `deserialize() -> Result<T, E>` to `deserialize(dest: &mut T) -> Result<(), E>` (which is recursively invoked on fields).

This seems to significantly improve the codegen, as Rust is Very Bad at optimizing away all the copies from using Results. Currently this relies on my own fork of serde/serde_derive for the new trait method, and all the derivation work. I hope to upstream most of this.

The Hard Part of this entire thing, which I doubt will be upstreamable, will be handling enums. There is no way in Rust to build an enum up "in place". This is a pretty serious problem, because SpecificDisplayItem, the thing we're trying to deserialize, is an enum!

To handle this, I rely on the [deterministic layout](rust-lang/rfcs#2195) of a `repr(u8)` enum to transmute into its repr and manipulate the tag and payload seperately. Specifically I can produce a more optimized enum deserialization iff:

* The enum has a repr(int) annotation
* The enum has a nonelike (payloadless) variant
* The enum has a somelike (payloadfull) variant
* The enum is Copy

We then generate roughly this code:

```rust
fn deserialize_enum(&mut self) {
  let repr = transmute<&mut Self, &mut Repr>(self);

  // First mark the enum as empty
  repr.somelike.tag = Tag::nonelike;

  match deserialize_tag() {
     Tag::A => {
       try!(repr.A.field1.deserialize_from());
       try!(repr.A.field2.deserialize_from());
       try!(repr.A.field3.deserialize_from());

       // NOW mark the enum as having data!
       repr.somelike.tag = Tag::A;
       Ok(())
     }
     Tag::B => { ... }
  }
}
```

The point of this is that it prevents any memory-unsafety from observing a "partially" deserialized enum. The problem with it is that it would cause a leak if one of the cases contained a type with a destructor. However this is doubly fine for us: SpecificDisplayItem is completely POD, and we hard abort the whole process if we find an error while deserializing. Also note that currently unions forbid non-Copy types, so destructors aren't *actually* a concern, but also we have literally no way to check for trait impls so this just leads to a cryptic error pointing to the `#[derive(Deserialize)]` annotation.

So either this design leaks, or it leads to cryptic errors. Either way, this will probably block the enum derive code from ever being properly upstreamed.

Missing optimizations from this PR that are also possible:

* Bincode still branches+masks on every f32 read due to some safety scare. This should be fixed upstream Soon™️. Without this, deserializing f32's is bloated enough that it actually produces a loop, instead of straight-line code. With this, we get straight-line code, but there's still a branch for length checking on each f32 read, instead of a single check and a memcpy. It's hard for us to fix this at our abstraction level; ideally rustc/llvm would fix it.

* AuxIter still uses the old deserialization method because it needs to return an `Option<T>` anyway. It could be changed to be a "fake" iterator like BuiltDisplayListIter too if it's desirable.

<!-- Reviewable:start -->
---
This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/webrender/2128)
<!-- Reviewable:end -->
@bors-servo
Copy link
Contributor

☀️ Test successful - status-appveyor, status-travis
Approved by: kvark
Pushing 822edff to master...

@bors-servo bors-servo merged commit 36864e9 into servo:master Dec 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants