Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unnecessary FD cloning #127

Merged
merged 19 commits into from Dec 25, 2016
Merged

Avoid unnecessary FD cloning #127

merged 19 commits into from Dec 25, 2016

Conversation

@antrik
Copy link
Contributor

antrik commented Dec 2, 2016

This makes a bunch of changes to how FDs are handled in the Unix back-end: most notably, adding inner mutability for the FD in OsIpcReceiver by wrapping it in a Cell<>, just like the other back-ends do -- thus enabling us to avoid unnecessary dup() calls when passing around FDs, without introducing potential FD leaks or premature closing.

This also contains a couple of related cleanups and minor fixes. Furthermore, it adds benchmarks for FD passing, to track the effect of the optimisations; and a bunch of improvements to the existing benchmarks for receiver set handling.

antrik added 19 commits Nov 22, 2016
Keep tests for the internal platform back-ends separate from tests for
the official high-level API. This should hopefully make the various
tests a bit more manageable.

Also changing the names of exisitng low-level tests while at it, to make
them more descriptive.

The existing `receiver_set` submodule moves into the new `ipc` module as
is. I'm not sure we really need to keep the `receiver_set` tests
separate from (upcoming) other high-level tests -- but it shouldn't
hurt; and I think it's the least invasive option for now...
No point in a helper function that is used only once; and is performing
an action actually not more complex than invoking that helper function...
…ceiver sets

Having only 5 and 10 as data points makes it hard to extrapolate
anything useful. A set of 1, 10, and 100 should be more informative I
believe.
…tion

Since `add_n_rxs()` was always being invoked right after creating a set,
we can actually move the set creation into the helper function. (And
rename it accordingly.)

While at it, also moving the helper function closer to usage sites; and
using it in more places.
In the `add_and_remove_closed_receivers()` test, do not add a new
receiver to the set after adding and dropping the original one.

This extra operation didn't add anything illuminating to the test --
since we are already measuring the performance of adding receivers
seperately -- and only further obscured differences in the actual remove
performance.

(Even with this, it's still not measuring the pure remove performance,
since we still have to add the original receiver before closing it --
but it's much easier now to observe the difference agaist the test that
just adds receivers without dropping...)
Since the parametric `gen_select_test` benchmark function is not a
macro, a name with `gen_` is actually confusing.

Just using a name with `bench_` instead, like for the other parametric
benchmarks we already have.
Explicitly declaring `-> ()` for functions not returning anything is
just noise.
Various small changes to make the code simpler, while preserving
functionality.

There is indeed one minor functional change though: this version no
longer actually iterates the results vector. In theory, this should have
a minor performance impact -- however, I expect it to be insignificant.
While we do not really need them here to work around huge thread
spawning overhead like for the transfer tests, honouring the iteration
count in all tests makes the results less confusing.

Some of the `receiver_set` tests actually show pretty large random
fluctutions on my system, in the range of almost 10% -- however, doing
multiple iterations unfortunately doesn't seem to help much with this...
The whole point of this type is that it only has a single instance,
shared by means of an `Arc<>`...
Don't intermix the definitions and impls of `SharedFileDescriptor` and
`OsIpcReceiver`.
This is an internal helper type, which shouldn't ever be exposed outside
this module.
While the result was already being checked for `OsIpcSender`, it was
being ignored for `OsIpcReceiver` and `OsOpaqueIpcChannel`.

This check can help catching FD double-close errors, which are serious
errors that can have major consequences. (Both in themselves, and as an
indicator for potential FD use-after-close.)
To track the effect of upcoming channel passing optimisations, adding
some tests to measure the performance of sender and receiver transfer.

These new tests use the official high-level API like the `receiver_set`
tests, rather than calling the platform primitives like the transfer
tests, because the unnecessary FD cloning is actually happening mostly
during serialisation/deserialisation etc.

Additional low-level benchmarks for descriptor passing might be useful
as well though for other purposes...

The benchmarks go up to 64 senders or receivers transferred in one
message -- which is unlikely in practice, but makes optimisations stand
out better. (64 is currently the maximum the `unix` implementation can
handle.)

Here are the results for all the `ipc` tests (including the new ones, as
well as the previously added `receiver_set` tests) from a typical run on
my (pretty old) system. Note that some of the `receiver_set` tests
fluctuate randomly quite a lot -- especially those about
creating/populating sets.

test ipc::receiver_set::add_and_remove_100_closed_receivers ... bench:     669,602 ns/iter (+/- 135,160)
test ipc::receiver_set::add_and_remove_10_closed_receivers  ... bench:      66,834 ns/iter (+/- 12,690)
test ipc::receiver_set::add_and_remove_1_closed_receivers   ... bench:       7,823 ns/iter (+/- 1,441)
test ipc::receiver_set::create_and_destroy_empty_set        ... bench:          27 ns/iter (+/- 0)
test ipc::receiver_set::create_and_destroy_set_of_1         ... bench:       5,222 ns/iter (+/- 906)
test ipc::receiver_set::create_and_destroy_set_of_10        ... bench:      50,648 ns/iter (+/- 17,547)
test ipc::receiver_set::create_and_destroy_set_of_100       ... bench:     522,241 ns/iter (+/- 205,798)
test ipc::receiver_set::send_on_100_of_100                  ... bench:     422,804 ns/iter (+/- 20,692)
test ipc::receiver_set::send_on_1_of_1                      ... bench:       3,243 ns/iter (+/- 53)
test ipc::receiver_set::send_on_1_of_100                    ... bench:       9,568 ns/iter (+/- 123)
test ipc::receiver_set::send_on_1_of_20                     ... bench:       4,082 ns/iter (+/- 137)
test ipc::receiver_set::send_on_1_of_5                      ... bench:       3,370 ns/iter (+/- 60)
test ipc::receiver_set::send_on_20_of_100                   ... bench:      64,136 ns/iter (+/- 1,455)
test ipc::receiver_set::send_on_20_of_20                    ... bench:      58,150 ns/iter (+/- 1,156)
test ipc::receiver_set::send_on_2_of_5                      ... bench:       5,689 ns/iter (+/- 110)
test ipc::receiver_set::send_on_5_of_100                    ... bench:      19,552 ns/iter (+/- 212)
test ipc::receiver_set::send_on_5_of_20                     ... bench:      13,903 ns/iter (+/- 446)
test ipc::receiver_set::send_on_5_of_5                      ... bench:      12,832 ns/iter (+/- 216)
test ipc::transfer_empty                                    ... bench:       2,429 ns/iter (+/- 41)
test ipc::transfer_receivers_00                             ... bench:       2,608 ns/iter (+/- 46)
test ipc::transfer_receivers_01                             ... bench:       5,293 ns/iter (+/- 111)
test ipc::transfer_receivers_08                             ... bench:      16,182 ns/iter (+/- 218)
test ipc::transfer_receivers_64                             ... bench:     100,242 ns/iter (+/- 2,537)
test ipc::transfer_senders_00                               ... bench:       2,623 ns/iter (+/- 70)
test ipc::transfer_senders_01                               ... bench:       5,058 ns/iter (+/- 82)
test ipc::transfer_senders_08                               ... bench:      13,831 ns/iter (+/- 303)
test ipc::transfer_senders_64                               ... bench:      82,403 ns/iter (+/- 2,345)
In line with the other back-ends, add interior mutability for the
receiver handle (by wrapping it in a `Cell<>`), to enable truly
"consuming" the receiver, even when all we have is just a shared
reference to the original receiver. (I.e. allow actually unsetting the
field in the original object, so we get proper move semantics.)

This is necessary, since `consume()` is called from our `serialize()`
implementation (for Serde), which only gets passed a shared reference to
the source data as well. Without being able to properly move it out, we
had to resort to temporarily cloning the FD (i.e. calling `libc::dup()`
on it), just for the original instance to be dropped a moment later at
the end of the `send()` method -- since otherwise we would get shared
ownership of the FD, thus making the handling fragile against leaking or
premature dropping.

Now that we can actually unset the original handle, we just transfer
full responsibility for the FD to the new owner -- i.e. the new owner
can safely `close()` the FD while dropping the data structure when it is
done with it; whereas dropping the original receiver instance won't do
anything, as it doesn't have access to the FD anymore.

Another place this affects is `OsIpcReceiverSet.add()`, where the FD was
being cloned temporarily as well -- though strictly speaking this one
could have been fixed even without interior mutability, as this method
actually gets the receiver by value...

These changes improve performance of passing many receivers by almost a
third. (Abount 350 ns each, which is fairly significant even in the
more likely case of passing only a single receiver.)

test ipc::transfer_empty                                    ... bench:       2,434 ns/iter (+/- 56)
test ipc::transfer_receivers_00                             ... bench:       2,645 ns/iter (+/- 49)
test ipc::transfer_receivers_01                             ... bench:       5,013 ns/iter (+/- 113)
test ipc::transfer_receivers_08                             ... bench:      13,234 ns/iter (+/- 192)
test ipc::transfer_receivers_64                             ... bench:      76,912 ns/iter (+/- 2,048)
test ipc::transfer_senders_00                               ... bench:       2,705 ns/iter (+/- 48)
test ipc::transfer_senders_01                               ... bench:       5,137 ns/iter (+/- 98)
test ipc::transfer_senders_08                               ... bench:      13,919 ns/iter (+/- 294)
test ipc::transfer_senders_64                               ... bench:      83,491 ns/iter (+/- 1,805)

Performance of adding receivers to a set is also affected significantly,
improving by some 8% or so -- though it's hard to say exactly, with the strong
fluctuations these tests are seeing...

test ipc::receiver_set::add_and_remove_100_closed_receivers ... bench:     641,313 ns/iter (+/- 97,666)
test ipc::receiver_set::add_and_remove_10_closed_receivers  ... bench:      63,397 ns/iter (+/- 9,943)
test ipc::receiver_set::add_and_remove_1_closed_receivers   ... bench:       7,905 ns/iter (+/- 491)
test ipc::receiver_set::create_and_destroy_empty_set        ... bench:          27 ns/iter (+/- 0)
test ipc::receiver_set::create_and_destroy_set_of_1         ... bench:       4,733 ns/iter (+/- 1,119)
test ipc::receiver_set::create_and_destroy_set_of_10        ... bench:      47,696 ns/iter (+/- 16,627)
test ipc::receiver_set::create_and_destroy_set_of_100       ... bench:     470,423 ns/iter (+/- 110,249)
The `to_sender()` and `to_receiver()` methods on `OsOpaqueIpcChannel`
were cloning the FD unnecessarily, as the original object is not
supposed to be used other than for the one-time conversion, and gets
dropped at the end of de-serialisation anyway.

Since we have a mutable reference, we can just set it to an invalid
value instead, to make sure it's not accidentally reused -- just like
what the other back-ends are doing.

Note: it would be more elegant for these methods to actually take
ownership of the original object, and get rid of it after the
conversion. This would however require non-trivial changes to the
handling of `OS_IPC_CHANNELS_FOR_DESERIALIZATION` (in `ipc.rs`), and
it's not clear this would actually be a win...

This improves transfer of many senders by about 40% (about 350 ns each);
and also gives receiver transfer another substantial boost -- together
with the previous optimisation, almost doubling the performance compared
to the original version. The improvement in sender transfer in
particular should boost total performance of a simple RPC roundtrip (as
the most common use case) by about 5%.

(As an additional bonus, `strace` on `ipc-channel` becomes more readable
without the unnecessary cloning...)

test ipc::transfer_empty             ... bench:       2,442 ns/iter (+/- 44)
test ipc::transfer_receivers_00      ... bench:       2,545 ns/iter (+/- 89)
test ipc::transfer_receivers_01      ... bench:       4,441 ns/iter (+/- 67)
test ipc::transfer_receivers_08      ... bench:      10,015 ns/iter (+/- 121)
test ipc::transfer_receivers_64      ... bench:      53,257 ns/iter (+/- 1,665)
test ipc::transfer_senders_00        ... bench:       2,556 ns/iter (+/- 65)
test ipc::transfer_senders_01        ... bench:       4,520 ns/iter (+/- 89)
test ipc::transfer_senders_08        ... bench:      10,666 ns/iter (+/- 214)
test ipc::transfer_senders_64        ... bench:      59,287 ns/iter (+/- 1,413)
@antrik antrik force-pushed the antrik:avoid-FD-cloning branch from 30b5a69 to de52e79 Dec 25, 2016
@antrik
Copy link
Contributor Author

antrik commented Dec 25, 2016

Updated to not include the "stolen" benchmark commit, which landed in #94 now.

@emilio
Copy link
Member

emilio commented Dec 25, 2016

This looks good to me. Haven't reviewed the benchmarks ultra-carefully, but I think they're sound.

@bors-servo r+

@bors-servo
Copy link
Contributor

bors-servo commented Dec 25, 2016

📌 Commit de52e79 has been approved by emilio

@bors-servo
Copy link
Contributor

bors-servo commented Dec 25, 2016

Testing commit de52e79 with merge e0e5e73...

bors-servo added a commit that referenced this pull request Dec 25, 2016
Avoid unnecessary FD cloning

This makes a bunch of changes to how FDs are handled in the Unix back-end: most notably, adding inner mutability for the FD in `OsIpcReceiver` by wrapping it in a `Cell<>`, just like the other back-ends do -- thus enabling us to avoid unnecessary `dup()` calls when passing around FDs, without introducing potential FD leaks or premature closing.

This also contains a couple of related cleanups and minor fixes. Furthermore, it adds benchmarks for FD passing, to track the effect of the optimisations; and a bunch of improvements to the existing benchmarks for receiver set handling.
@emilio
Copy link
Member

emilio commented Dec 25, 2016

@bors-servo
Copy link
Contributor

bors-servo commented Dec 25, 2016

☀️ Test successful - status-appveyor, status-travis

@bors-servo bors-servo merged commit de52e79 into servo:master Dec 25, 2016
3 checks passed
3 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.