Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upAvoid unnecessary FD cloning #127
Merged
+383
−271
Conversation
Keep tests for the internal platform back-ends separate from tests for the official high-level API. This should hopefully make the various tests a bit more manageable. Also changing the names of exisitng low-level tests while at it, to make them more descriptive. The existing `receiver_set` submodule moves into the new `ipc` module as is. I'm not sure we really need to keep the `receiver_set` tests separate from (upcoming) other high-level tests -- but it shouldn't hurt; and I think it's the least invasive option for now...
No point in a helper function that is used only once; and is performing an action actually not more complex than invoking that helper function...
…ceiver sets Having only 5 and 10 as data points makes it hard to extrapolate anything useful. A set of 1, 10, and 100 should be more informative I believe.
…tion Since `add_n_rxs()` was always being invoked right after creating a set, we can actually move the set creation into the helper function. (And rename it accordingly.) While at it, also moving the helper function closer to usage sites; and using it in more places.
In the `add_and_remove_closed_receivers()` test, do not add a new receiver to the set after adding and dropping the original one. This extra operation didn't add anything illuminating to the test -- since we are already measuring the performance of adding receivers seperately -- and only further obscured differences in the actual remove performance. (Even with this, it's still not measuring the pure remove performance, since we still have to add the original receiver before closing it -- but it's much easier now to observe the difference agaist the test that just adds receivers without dropping...)
Since the parametric `gen_select_test` benchmark function is not a macro, a name with `gen_` is actually confusing. Just using a name with `bench_` instead, like for the other parametric benchmarks we already have.
Explicitly declaring `-> ()` for functions not returning anything is just noise.
Various small changes to make the code simpler, while preserving functionality. There is indeed one minor functional change though: this version no longer actually iterates the results vector. In theory, this should have a minor performance impact -- however, I expect it to be insignificant.
While we do not really need them here to work around huge thread spawning overhead like for the transfer tests, honouring the iteration count in all tests makes the results less confusing. Some of the `receiver_set` tests actually show pretty large random fluctutions on my system, in the range of almost 10% -- however, doing multiple iterations unfortunately doesn't seem to help much with this...
The whole point of this type is that it only has a single instance, shared by means of an `Arc<>`...
Don't intermix the definitions and impls of `SharedFileDescriptor` and `OsIpcReceiver`.
This is an internal helper type, which shouldn't ever be exposed outside this module.
While the result was already being checked for `OsIpcSender`, it was being ignored for `OsIpcReceiver` and `OsOpaqueIpcChannel`. This check can help catching FD double-close errors, which are serious errors that can have major consequences. (Both in themselves, and as an indicator for potential FD use-after-close.)
…cReceiver`
To track the effect of upcoming channel passing optimisations, adding some tests to measure the performance of sender and receiver transfer. These new tests use the official high-level API like the `receiver_set` tests, rather than calling the platform primitives like the transfer tests, because the unnecessary FD cloning is actually happening mostly during serialisation/deserialisation etc. Additional low-level benchmarks for descriptor passing might be useful as well though for other purposes... The benchmarks go up to 64 senders or receivers transferred in one message -- which is unlikely in practice, but makes optimisations stand out better. (64 is currently the maximum the `unix` implementation can handle.) Here are the results for all the `ipc` tests (including the new ones, as well as the previously added `receiver_set` tests) from a typical run on my (pretty old) system. Note that some of the `receiver_set` tests fluctuate randomly quite a lot -- especially those about creating/populating sets. test ipc::receiver_set::add_and_remove_100_closed_receivers ... bench: 669,602 ns/iter (+/- 135,160) test ipc::receiver_set::add_and_remove_10_closed_receivers ... bench: 66,834 ns/iter (+/- 12,690) test ipc::receiver_set::add_and_remove_1_closed_receivers ... bench: 7,823 ns/iter (+/- 1,441) test ipc::receiver_set::create_and_destroy_empty_set ... bench: 27 ns/iter (+/- 0) test ipc::receiver_set::create_and_destroy_set_of_1 ... bench: 5,222 ns/iter (+/- 906) test ipc::receiver_set::create_and_destroy_set_of_10 ... bench: 50,648 ns/iter (+/- 17,547) test ipc::receiver_set::create_and_destroy_set_of_100 ... bench: 522,241 ns/iter (+/- 205,798) test ipc::receiver_set::send_on_100_of_100 ... bench: 422,804 ns/iter (+/- 20,692) test ipc::receiver_set::send_on_1_of_1 ... bench: 3,243 ns/iter (+/- 53) test ipc::receiver_set::send_on_1_of_100 ... bench: 9,568 ns/iter (+/- 123) test ipc::receiver_set::send_on_1_of_20 ... bench: 4,082 ns/iter (+/- 137) test ipc::receiver_set::send_on_1_of_5 ... bench: 3,370 ns/iter (+/- 60) test ipc::receiver_set::send_on_20_of_100 ... bench: 64,136 ns/iter (+/- 1,455) test ipc::receiver_set::send_on_20_of_20 ... bench: 58,150 ns/iter (+/- 1,156) test ipc::receiver_set::send_on_2_of_5 ... bench: 5,689 ns/iter (+/- 110) test ipc::receiver_set::send_on_5_of_100 ... bench: 19,552 ns/iter (+/- 212) test ipc::receiver_set::send_on_5_of_20 ... bench: 13,903 ns/iter (+/- 446) test ipc::receiver_set::send_on_5_of_5 ... bench: 12,832 ns/iter (+/- 216) test ipc::transfer_empty ... bench: 2,429 ns/iter (+/- 41) test ipc::transfer_receivers_00 ... bench: 2,608 ns/iter (+/- 46) test ipc::transfer_receivers_01 ... bench: 5,293 ns/iter (+/- 111) test ipc::transfer_receivers_08 ... bench: 16,182 ns/iter (+/- 218) test ipc::transfer_receivers_64 ... bench: 100,242 ns/iter (+/- 2,537) test ipc::transfer_senders_00 ... bench: 2,623 ns/iter (+/- 70) test ipc::transfer_senders_01 ... bench: 5,058 ns/iter (+/- 82) test ipc::transfer_senders_08 ... bench: 13,831 ns/iter (+/- 303) test ipc::transfer_senders_64 ... bench: 82,403 ns/iter (+/- 2,345)
In line with the other back-ends, add interior mutability for the receiver handle (by wrapping it in a `Cell<>`), to enable truly "consuming" the receiver, even when all we have is just a shared reference to the original receiver. (I.e. allow actually unsetting the field in the original object, so we get proper move semantics.) This is necessary, since `consume()` is called from our `serialize()` implementation (for Serde), which only gets passed a shared reference to the source data as well. Without being able to properly move it out, we had to resort to temporarily cloning the FD (i.e. calling `libc::dup()` on it), just for the original instance to be dropped a moment later at the end of the `send()` method -- since otherwise we would get shared ownership of the FD, thus making the handling fragile against leaking or premature dropping. Now that we can actually unset the original handle, we just transfer full responsibility for the FD to the new owner -- i.e. the new owner can safely `close()` the FD while dropping the data structure when it is done with it; whereas dropping the original receiver instance won't do anything, as it doesn't have access to the FD anymore. Another place this affects is `OsIpcReceiverSet.add()`, where the FD was being cloned temporarily as well -- though strictly speaking this one could have been fixed even without interior mutability, as this method actually gets the receiver by value... These changes improve performance of passing many receivers by almost a third. (Abount 350 ns each, which is fairly significant even in the more likely case of passing only a single receiver.) test ipc::transfer_empty ... bench: 2,434 ns/iter (+/- 56) test ipc::transfer_receivers_00 ... bench: 2,645 ns/iter (+/- 49) test ipc::transfer_receivers_01 ... bench: 5,013 ns/iter (+/- 113) test ipc::transfer_receivers_08 ... bench: 13,234 ns/iter (+/- 192) test ipc::transfer_receivers_64 ... bench: 76,912 ns/iter (+/- 2,048) test ipc::transfer_senders_00 ... bench: 2,705 ns/iter (+/- 48) test ipc::transfer_senders_01 ... bench: 5,137 ns/iter (+/- 98) test ipc::transfer_senders_08 ... bench: 13,919 ns/iter (+/- 294) test ipc::transfer_senders_64 ... bench: 83,491 ns/iter (+/- 1,805) Performance of adding receivers to a set is also affected significantly, improving by some 8% or so -- though it's hard to say exactly, with the strong fluctuations these tests are seeing... test ipc::receiver_set::add_and_remove_100_closed_receivers ... bench: 641,313 ns/iter (+/- 97,666) test ipc::receiver_set::add_and_remove_10_closed_receivers ... bench: 63,397 ns/iter (+/- 9,943) test ipc::receiver_set::add_and_remove_1_closed_receivers ... bench: 7,905 ns/iter (+/- 491) test ipc::receiver_set::create_and_destroy_empty_set ... bench: 27 ns/iter (+/- 0) test ipc::receiver_set::create_and_destroy_set_of_1 ... bench: 4,733 ns/iter (+/- 1,119) test ipc::receiver_set::create_and_destroy_set_of_10 ... bench: 47,696 ns/iter (+/- 16,627) test ipc::receiver_set::create_and_destroy_set_of_100 ... bench: 470,423 ns/iter (+/- 110,249)
The `to_sender()` and `to_receiver()` methods on `OsOpaqueIpcChannel` were cloning the FD unnecessarily, as the original object is not supposed to be used other than for the one-time conversion, and gets dropped at the end of de-serialisation anyway. Since we have a mutable reference, we can just set it to an invalid value instead, to make sure it's not accidentally reused -- just like what the other back-ends are doing. Note: it would be more elegant for these methods to actually take ownership of the original object, and get rid of it after the conversion. This would however require non-trivial changes to the handling of `OS_IPC_CHANNELS_FOR_DESERIALIZATION` (in `ipc.rs`), and it's not clear this would actually be a win... This improves transfer of many senders by about 40% (about 350 ns each); and also gives receiver transfer another substantial boost -- together with the previous optimisation, almost doubling the performance compared to the original version. The improvement in sender transfer in particular should boost total performance of a simple RPC roundtrip (as the most common use case) by about 5%. (As an additional bonus, `strace` on `ipc-channel` becomes more readable without the unnecessary cloning...) test ipc::transfer_empty ... bench: 2,442 ns/iter (+/- 44) test ipc::transfer_receivers_00 ... bench: 2,545 ns/iter (+/- 89) test ipc::transfer_receivers_01 ... bench: 4,441 ns/iter (+/- 67) test ipc::transfer_receivers_08 ... bench: 10,015 ns/iter (+/- 121) test ipc::transfer_receivers_64 ... bench: 53,257 ns/iter (+/- 1,665) test ipc::transfer_senders_00 ... bench: 2,556 ns/iter (+/- 65) test ipc::transfer_senders_01 ... bench: 4,520 ns/iter (+/- 89) test ipc::transfer_senders_08 ... bench: 10,666 ns/iter (+/- 214) test ipc::transfer_senders_64 ... bench: 59,287 ns/iter (+/- 1,413)
|
Updated to not include the "stolen" benchmark commit, which landed in #94 now. |
|
This looks good to me. Haven't reviewed the benchmarks ultra-carefully, but I think they're sound. @bors-servo r+ |
|
|
bors-servo
added a commit
that referenced
this pull request
Dec 25, 2016
Avoid unnecessary FD cloning This makes a bunch of changes to how FDs are handled in the Unix back-end: most notably, adding inner mutability for the FD in `OsIpcReceiver` by wrapping it in a `Cell<>`, just like the other back-ends do -- thus enabling us to avoid unnecessary `dup()` calls when passing around FDs, without introducing potential FD leaks or premature closing. This also contains a couple of related cleanups and minor fixes. Furthermore, it adds benchmarks for FD passing, to track the effect of the optimisations; and a bunch of improvements to the existing benchmarks for receiver set handling.
|
BTW, awesome commit separation and description, made reviewing this
straight-forward :)
|
|
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
antrik commentedDec 2, 2016
•
edited
This makes a bunch of changes to how FDs are handled in the Unix back-end: most notably, adding inner mutability for the FD in
OsIpcReceiverby wrapping it in aCell<>, just like the other back-ends do -- thus enabling us to avoid unnecessarydup()calls when passing around FDs, without introducing potential FD leaks or premature closing.This also contains a couple of related cleanups and minor fixes. Furthermore, it adds benchmarks for FD passing, to track the effect of the optimisations; and a bunch of improvements to the existing benchmarks for receiver set handling.