
WIP: bincode IPC #1124

Closed
wants to merge 2 commits into from

Conversation

@Gankra
Contributor

Gankra commented Apr 17, 2017

(based on my other PR that adds profiling; posting this now to discuss results)

Note: this adds a bincode dep to webrender-traits

This replaces the unsound transmute-based IPC with sound bincode-based IPC (results in subsequent comments).



Gankra added 2 commits Apr 13, 2017
Note: this adds a dependency on `time` to `webrender_traits`.

The purpose of this is to evaluate the overhead of different IPC implementations and serialization strategies. In particular, we need to replace the current implementation, as it is unsound; it just blindly transmutes, leaving room for a malicious process to create Undefined representations of e.g. enums.

We take 4 time stamps:

* before serialization to `Vec<u8>` happens
* after serialization
* before deserialization
* after deserialization

Because some of these time stamps happen in webrender_traits and cross-process, we need to take them manually and send the first two over IPC. Currently I'm just stuffing them in one of the descriptors, since that's easy and minimally invasive (but Firefox's headers will need to be updated, I think).

We consider the time to perform the message send cross-process to be the time between serialization ending and deserialization starting. Overall IPC overhead is the whole process. We also record how large the two sent buffers are, as this may change significantly with a Proper Serialization Mechanism.
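The four-stamp scheme above can be sketched with `std::time` alone. This is an illustrative sketch, not webrender's actual code: `measure_round_trip` and the `Vec<u8>` copies standing in for the real (de)serializer are hypothetical names, and the real first two stamps travel across the process boundary alongside the payload.

```rust
use std::time::Instant;

/// Returns (serialize, deserialize, total) times in milliseconds.
/// The first two stamps would be taken in the sending process and
/// shipped over IPC; the last two in the receiving process.
fn measure_round_trip(payload: &[u8]) -> (f64, f64, f64) {
    let t0 = Instant::now();                 // before serialization
    let bytes: Vec<u8> = payload.to_vec();   // stand-in for serialize
    let t1 = Instant::now();                 // after serialization

    // ... bytes cross the process boundary here; send time is t2 - t1 ...

    let t2 = Instant::now();                 // before deserialization
    let _decoded: Vec<u8> = bytes.clone();   // stand-in for deserialize
    let t3 = Instant::now();                 // after deserialization

    let ms = |a: Instant, b: Instant| b.duration_since(a).as_secs_f64() * 1e3;
    // t1..t2 is the send window; t0..t3 is total IPC overhead.
    let _send = ms(t1, t2);
    (ms(t0, t1), ms(t2, t3), ms(t0, t3))
}

fn main() {
    let (ser, de, total) = measure_round_trip(&[0u8; 1024]);
    println!("serial: {:.3}ms deserial: {:.3}ms total: {:.3}ms", ser, de, total);
}
```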

In my preliminary testing basically all overhead is currently in the actual message send process (about 0.5ms on Mac in Servo). There's sometimes some overhead from deserialization, which is probably stuffing the auxlist in a map. About 200k is being sent for the display list, and 100k for the aux list.
@Gankra
Contributor Author

Gankra commented Apr 17, 2017

I've only profiled Servo so far. Summary:

Mean IPC time goes from 0.5ms to 0.9ms with this change. Most of these losses are in deserialization time; some of it is in serialization time. It looks like most of this is when processing DisplayItems, and not AuxiliaryLists (need to dig into this, but it makes sense since AuxiliaryLists are relatively simple).

Note however that (de)serialization costs are suppressed by a surprising reduction in message send time, which appears to be the result of a reduction in the size of the message sent. Currently DisplayItems are ~twice as big as they could be in memory, and bincode manages to realize this size reduction. In principle this means transmute-based IPC could be reduced to ~0.3ms by just optimizing the layout of DisplayItems? However this may improve (de)serialization time too.
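The layout point is easy to demonstrate: an in-memory Rust enum pays for its largest variant (plus tag and padding) on every element, while a bincode-style encoding pays only the tag plus the live variant's payload. `Item` and `compact_len` below are hypothetical stand-ins, not webrender's actual types:

```rust
use std::mem::size_of;

// Hypothetical stand-in for a display item; real webrender types differ.
#[derive(Clone, Copy)]
enum Item {
    Rect { x: f32, y: f32, w: f32, h: f32 }, // 16-byte payload
    Line { len: f32 },                       //  4-byte payload
}

// A bincode-style compact encoding: one tag byte + only the live payload.
fn compact_len(item: &Item) -> usize {
    1 + match item {
        Item::Rect { .. } => 4 * size_of::<f32>(),
        Item::Line { .. } => size_of::<f32>(),
    }
}

fn main() {
    let rect = Item::Rect { x: 0.0, y: 0.0, w: 1.0, h: 1.0 };
    let line = Item::Line { len: 2.0 };
    // In memory, every element pays for the largest payload plus tag and
    // padding; the compact encoding pays only for what each variant uses.
    println!("in-memory: {} bytes/item", size_of::<Item>());
    println!("compact rect: {} bytes, compact line: {} bytes",
             compact_len(&rect), compact_len(&line));
}
```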

AuxiliaryLists seem to already be perfectly compact.

All in all this is pretty promising, considering the current implementation is completely naive. There's probably interesting wins to find in bincode::deserialize and DisplayItem layout.

Data:

transmute:

total IPC time:
    min:  0.35
    mean: 0.54
    max:  1.0

serial:   0.00
send:     0.5
deserial: 0.01

display-len: 211680
aux-len:    106300

bincode:

total IPC time:
    min:  0.70
    mean: 0.94
    max:  1.38

serial:   0.17
send:     0.3
deserial: 0.50

display-len: 111879
aux-len:     106688

(edit: note that the numbers don't perfectly sum because I was mostly eye-balling the serial/send/deserial means -- close enough for horse grenades)

@TyOverby
Contributor

TyOverby commented Apr 17, 2017

There are a few optimizations that I'm going to be adding to bincode in the nearish future that might help webrender out:

  • Variable-length number encoding for enum discriminants. Most enum discriminants will go from being u32 to u8, saving some space.
  • The zero-copy API for serde will allow bincode to keep references to &str and &[u8] that are located in the original buffer. If you are serializing large strings or byte slices, this could prevent an extra memcpy.
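The zero-copy idea can be sketched by hand: the returned `&str` borrows straight out of the receive buffer instead of being copied into a fresh `String`. The length-prefixed layout and `read_str` here are assumptions for illustration; serde's borrowed deserialization automates this pattern.

```rust
// Assumed wire layout for this sketch: a 4-byte little-endian length
// prefix followed by that many UTF-8 bytes.
fn read_str(buf: &[u8]) -> Option<&str> {
    if buf.len() < 4 {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[..4]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    let bytes = buf.get(4..)?.get(..len)?;
    // The &str points into `buf`: no allocation, no memcpy.
    std::str::from_utf8(bytes).ok()
}

fn main() {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(5u32).to_le_bytes());
    buf.extend_from_slice(b"hello");
    let s = read_str(&buf).unwrap();
    assert_eq!(s, "hello");
    // Same allocation as the buffer itself:
    assert_eq!(s.as_ptr(), buf[4..].as_ptr());
    println!("borrowed: {}", s);
}
```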
@kvark
Member

kvark commented Apr 17, 2017

@gankro thanks for the experiment and analysis!

Machine performance: transmute looks superior.
Human performance: Gecko folks would probably prefer leaving the transmute on.

Stability/portability: possible undefined behavior?
Code quality: bincode is cleaner, but then you'd just be shoving a transmute down the dependency line

@Gankra
Contributor Author

Gankra commented Apr 17, 2017

@kvark it's not so much a stability/portability issue as a "blatant hole in the multi-process security model". If the content process is hijacked, it can feed arbitrary bytes into the buffer, leading to undefined enum discriminants. This can in turn lead to trying to execute non-existent jump table slots (to name one obvious result of UB).
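To make the hazard concrete, here is a minimal sketch; `Kind` is a hypothetical stand-in for a real item tag, not a webrender type. With `#[repr(u8)]` only two byte values are valid representations, and transmuting any other byte is immediate undefined behavior, so this sketch deliberately never executes that case:

```rust
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Rect = 0,
    Text = 1,
}

/// Sound ONLY if `b` is a valid discriminant (0 or 1 here); any other
/// byte is instant undefined behavior -- exactly the hole described above.
unsafe fn kind_from_wire_unchecked(b: u8) -> Kind {
    std::mem::transmute(b)
}

fn main() {
    let wire_byte: u8 = 1; // pretend this arrived from another process
    // Fine here only because we happen to know 1 is in range:
    let k = unsafe { kind_from_wire_unchecked(wire_byte) };
    assert_eq!(k, Kind::Text);
    // A hijacked sender could put 7 in wire_byte; transmuting 7 would be
    // UB (a match on Kind may compile to a jump table with no slot for 7).
    println!("{:?}", k);
}
```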

@TyOverby
Contributor

TyOverby commented Apr 17, 2017

you'd just be shoving a transmute down the dependency line

@kvark: what do you mean by this?

@kvark
Member

kvark commented Apr 17, 2017

@TyOverby
I mean that from the code quality point of view, it would make WR cleaner by removing one more unsafe spot, but bincode probably has a few unsafe spots inside anyway, so this unsafety can be seen as just moving down from WR into the dependency (bincode).

@gankro ok, makes sense. Would it be reasonable to look into asserting on the enum discriminants (as in assert!(transmuted_value < target_enum_variant_count))? Or do you consider bincode inevitable?
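That suggested check can be sketched as follows (`Kind`, `VARIANT_COUNT`, and `kind_from_byte` are illustrative names, not webrender's types); validating the discriminant before the transmute is effectively the range check a sound deserializer performs per variant:

```rust
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Rect = 0,
    Text = 1,
    Border = 2,
}

// Must be kept in sync with the enum by hand -- one maintenance cost
// of this approach compared to a derived deserializer.
const VARIANT_COUNT: u8 = 3;

fn kind_from_byte(b: u8) -> Option<Kind> {
    if b < VARIANT_COUNT {
        // In range, so the transmute cannot produce an invalid value.
        Some(unsafe { std::mem::transmute(b) })
    } else {
        None // hostile or corrupt bytes: reject instead of UB
    }
}

fn main() {
    assert_eq!(kind_from_byte(2), Some(Kind::Border));
    assert_eq!(kind_from_byte(7), None);
    println!("ok");
}
```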

@TyOverby
Contributor

TyOverby commented Apr 17, 2017

bincode probably has a few unsafe spots inside anyway

That would be news to me!

As far as I can tell, the only unsafe is this one which is just an optimization avoiding zeroing out a Vec<u8>. Looking at the servo structs that are being serialized, I'm fairly certain that you never hit this code path to begin with, but I'd be happy to introduce a crate feature that doesn't include the optimization.

On top of this, bincode has been through several days of fuzzing, never hitting any vulnerabilities (it found a few panics that were immediately fixed). I'm using bincode in networked applications, so I'm pretty concerned about the security aspect of the project.

@kvark
Member

kvark commented Apr 17, 2017

@TyOverby oh, right! My mistake then, thanks for correcting me :)

@Gankra
Contributor Author

Gankra commented Apr 17, 2017

@kvark bincode would be strongly preferred because we don't need to maintain it, but if I can't get the overhead low enough, the next avenue of exploration would be a fork of bincode that "happens" to match the Rust ABI exactly so that LLVM hopefully optimizes everything into transmute + validation.

@bors-servo
Contributor

bors-servo commented Apr 17, 2017

The latest upstream changes (presumably #1123) made this pull request unmergeable. Please resolve the merge conflicts.

@TyOverby TyOverby mentioned this pull request Apr 18, 2017
@Gankra
Contributor Author

Gankra commented Apr 28, 2017

Replacing with new PR.

Gankra closed this Apr 28, 2017