Use OOL transfers for macOS payloads #199
Conversation
Seems fine, r=me. We can fix this later if it turns out to be a perf problem.
```diff
@@ -428,3 +428,12 @@ fn test_reentrant() {
     sender.send(null.clone()).unwrap();
     assert_eq!(null, receiver.recv().unwrap());
 }
+
+#[test]
+fn test_large_send() {
```

antrik (Contributor) · May 29, 2018

We don't use test_ prefixes in Rust...

BTW, is there some specific reason why you put this test in src/test.rs rather than src/platform/test.rs, where the existing big data tests are?

(This would make it easier to just copy the existing big_data() test as huge_data() or so, thus addressing the problems pointed out below...)
```rust
#[test]
fn test_large_send() {
    let payload = vec![5u32; 1024 * 1024 * 12];
```
antrik (Contributor) · May 29, 2018

Such a simplistic "pattern" doesn't ensure that the data was actually transferred correctly. IIRC one of the first bugs I fixed in ipc-channel was obscured by that, which is why I modified the existing tests in src/platform/test.rs...
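A sketch of what less trivial test data could look like, in the spirit of this comment. The helper names are illustrative, not taken from ipc-channel's actual test suite; the point is that a position-dependent pattern catches reordered, shifted, or truncated bytes that a constant fill like `vec![5u32; N]` would miss.

```rust
// Hypothetical helpers; names are not from ipc-channel's tests.

// Fill the buffer with a position-dependent pattern. 251 is prime, so the
// pattern's period doesn't line up with power-of-two page or buffer sizes.
fn patterned(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i % 251) as u8).collect()
}

// Verify every byte, so any corruption, reordering, or truncation is caught.
fn verify(data: &[u8]) -> bool {
    data.iter().enumerate().all(|(i, &b)| b == (i % 251) as u8)
}
```

A constant fill passes a length check but fails `verify`, which is exactly the class of bug the comment describes being masked.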
```rust
let payload = vec![5u32; 1024 * 1024 * 12];
let (tx, rx) = ipc::channel().unwrap();
tx.send(payload.clone()).unwrap();
let received_payload = rx.recv().unwrap();
```

antrik (Contributor) · May 29, 2018

This won't fly on platforms that block while sending large messages (including unix, as well as the upcoming windows back-end). See the existing big_data() test...
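A sketch of the shape such a test needs on blocking platforms. Here a zero-capacity `std::sync::mpsc::sync_channel` simulates a transport whose send blocks until the receiver drains it; `ipc::channel` itself is not used.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Send and receive must run on separate threads: send() on a zero-capacity
// channel blocks until the receiver is ready, which is the same failure
// mode as a large blocking IPC send performed on a single thread.
fn blocking_roundtrip(payload: Vec<u32>) -> Vec<u32> {
    let (tx, rx) = sync_channel(0);
    let sent = payload.clone();
    let sender = thread::spawn(move || tx.send(sent).unwrap());
    let received = rx.recv().unwrap();
    sender.join().unwrap();
    received
}
```

Done on one thread, the same code would deadlock at `tx.send`, which is what the existing big_data() test guards against.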
@jdm can you please run the benchmarks, and post before/after results? Intuitively, I would expect messages smaller than a page size to become significantly slower...
Oh good, benchmarks yield this on master:
@jdm well, now that you mention it, it's kinda expected that benchmarks would fail, since they measure sizes up to 16 MiB... I don't understand though why it panics at this particular point. Maybe the output is misleading due to buffering or something?... Just try removing the tests that are too big, and check the remaining results I guess? (Or remove any other problematic tests. Though in that case it would be very desirable to create a proper test case to reproduce the problem...)
With fixes to the library to allow the benchmarks to complete:
Seeing these results, is anyone still in favour of going forward with this approach?... I can think of a bunch of avenues for improving efficiency here. One potential way would be to attempt optimising the … Instead of changing the …

Either way, I'm pretty sure performance for small messages would still be very poor -- so regardless of any out-of-line transfer improvements, I believe that method must be used only conditionally. Since the code for in-line transfers is already there, I hope adding the conditional shouldn't be too hard... The most interesting question there is how the receiver recognises whether out-of-line transfer was used or not. (Having a zero in-line data size might work? Not sure.)
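A minimal sketch of the "zero in-line data size signals out-of-line transfer" idea floated here. All names are hypothetical, and a `Vec<u8>` stands in for an `OsIpcSharedMemory` region.

```rust
// Returns (size_field, inline_bytes, ool_bytes). A zero size field tells
// the receiver that the payload travelled out of line. Note the ambiguity
// worried about above: a genuinely empty message must also be routed
// through the OOL branch so that a size of 0 stays unambiguous.
fn encode(data: &[u8], max_inline: usize) -> (usize, Vec<u8>, Option<Vec<u8>>) {
    if !data.is_empty() && data.len() <= max_inline {
        (data.len(), data.to_vec(), None)
    } else {
        (0, Vec::new(), Some(data.to_vec()))
    }
}

fn decode(size_field: usize, inline: &[u8], ool: Option<&[u8]>) -> Vec<u8> {
    if size_field == 0 {
        ool.expect("zero size implies OOL data").to_vec()
    } else {
        inline[..size_field].to_vec()
    }
}
```

The receiver needs no extra flag field in this scheme, at the cost of forcing empty messages down the OOL path.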
@jdm BTW, out of curiosity: what kind of machine did you run this on? The results are significantly worse across the board than on my current GNU/Linux system; and some are even worse than on my previous, much slower machine... Mach IPC is certainly not known to be very efficient, compared to newer micro-kernels -- but I seriously wouldn't have expected it to do worse than Unix sockets...
My original change supported both inline and out-of-line transfers, but I couldn't figure out how to automatically determine if an inline transfer would exceed the undocumented port send limits. I'll try just choosing a size and adding a boolean to the message payload that indicates which kind of transfer is in use.
As for the machine in use, this is a 2015 MBP.
Does the kernel return a clear error when attempting to send an overly big message? If so, maybe you could just fall back to OOL in that case... (IIRC the …)

If you keep track across calls of the size at which this starts happening, this wouldn't even cause significant overhead on repeated huge sends...
So I believe the relevant kernel check is: …

Which references the limit initialised in https://github.com/apple/darwin-xnu/blob/master/osfmk/ipc/ipc_init.c#L122

That value calculates to … I think it'd be best to not rely on this and to instead dynamically determine the maximum size. Maybe we could have a global atomic "max safe size" variable that starts at, say, 45 MB, and if we get MACH_SEND_TOO_LARGE we halve it and retry.
@pcwalton that's more or less what I had in mind -- except that instead of starting with a fixed limit and halving on failure, I'd just record "that's the smallest size yet that failed"; so future sends at or above that size will go for OOL directly.
Benchmarks for a static cutoff of 45 MB:
It's not really clear to me why so many benchmarks would claim to get faster when the actual code changes mean that they are branching a bit more and allocating a bit more space for every message.

These benchmarks are unfortunately quite noisy -- the discrepancies seem in line with random variations I have seen myself. You'd probably get the same amount of variation between runs with the same code. Having said that, I have occasionally seen some unexpected variations when doing code changes... My best guess is compiler optimisation anomalies (including interactions with the test bench).
Just a quick remark for now. Properly reviewing all the fragile pointer wrangling will take longer, I'm afraid...
```diff
@@ -309,11 +310,13 @@ enum SendData<'a> {
     OutOfLine(Option<OsIpcSharedMemory>),
 }
 
-const MAX_INLINE_SIZE: usize = 45 * 1024 * 1024;
+lazy_static! {
+    static ref MAX_INLINE_SIZE: AtomicUsize = AtomicUsize::new(45 * 1024 * 1024);
```
antrik (Contributor) · May 31, 2018
I'd rather just go with usize::max_value() for the starting value... Not least to make sure the auto-shrinking code actually gets exercised.
(On that note, to really test this, the test case would need to do a sequence of sends at sizes above and below 48 MiB...)
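A sketch of the suggested test shape: a sequence of sends at sizes straddling the threshold, so both the in-line and OOL paths get exercised. Here `std::sync::mpsc` stands in for `ipc::channel`, and the limit is scaled down so the sketch runs quickly; the real test would straddle the ~48 MiB kernel limit.

```rust
use std::sync::mpsc;

// Round-trip one patterned payload and check it arrived intact.
fn roundtrip(len: usize) -> bool {
    let payload: Vec<u8> = (0..len).map(|i| (i % 251) as u8).collect();
    let (tx, rx) = mpsc::channel();
    tx.send(payload.clone()).unwrap();
    rx.recv().unwrap() == payload
}

// Sends below, at, and above the limit, ending with a small send again so
// the auto-shrunk threshold is exercised in both directions.
fn straddle_test(limit: usize) -> bool {
    [limit / 2, limit - 1, limit, limit + 1, limit * 2, limit - 1]
        .iter()
        .all(|&len| roundtrip(len))
}
```

The final below-limit send matters: it checks that falling back to OOL once doesn't break subsequent in-line transfers.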
Fixups look fine; please autosquash, so I can r+ once I'm done reviewing the pointer wrangling. (Probably won't be today, though...)
As written, this is causing surprising behaviour in Servo that I'm investigating. Please hold off merging.
@jdm I just realised that the performance difference for large transfers might actually be real: since you are sending an extra flag with every message now, the alignment of the payload may have changed.
Another thing I just realised is that we do not seem to have any tests for transferring large amounts of data along with channels/SHM regions in the same message... This could be quite relevant here?
Actually, that's not entirely true: there is …
The benchmarks look a little more even after changing the bool to a usize:
The code is ready for consideration.
Nice, thanks for doing all this work @jdm!
@jdm doesn't really look more even to me... Would need several runs to get any meaningful comparison though, since random fluctuations between runs are clearly much larger than any actual performance changes there might be :-( (You might get somewhat more consistent results by temporarily increasing …)
Just to be clear: I don't think it's actually necessary to run more benchmarks, unless you are curious. The current PR shouldn't have any measurable performance impact really as far as I can tell.
```rust
let data_size_dest = data_dest as *mut usize;
*data_size_dest = data_size;
let is_inline_dest = data_dest as *mut usize;
*is_inline_dest = data.is_inline() as usize;
```
antrik (Contributor) · Jun 7, 2018

AIUI, you are using usize now to work around a compiler bug? I'd say that deserves a comment...
antrik (Contributor) · Jun 7, 2018

I think there was a bit of a misunderstanding here. I only brought up alignment changes as a possible explanation for differences in particular benchmark results -- I didn't mean to suggest that you should try to force alignment. For that, using usize for the flag field wouldn't be enough: the Mach port types use 32-bit values, and we have a variable amount of them in the header -- so while using bool does change alignment, whether the change is for worse or for better depends on the situation. To actually force optimal alignment, we'd have to do some sort of adaptive padding.

(Also, usize wouldn't preserve 8-byte alignment on 32-bit systems -- but that probably doesn't matter for the macos back-end?...)
```rust
data_dest = data_dest.offset(mem::size_of::<usize>() as isize);
ptr::copy_nonoverlapping(data.as_ptr(), data_dest, data_size);
data_dest = data_dest.offset(mem::size_of::<usize>() as isize);
```
antrik (Contributor) · Jun 7, 2018

While this is not wrong, I think it could be pretty hard to read, and even more fragile than necessary... Why not increment is_inline_dest before converting it to data_dest, thus keeping it consistent with the other offset operations?

Or at least add a comment pointing out that this increment is for the "is inline" flag, since that's somewhat non-obvious here...
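A sketch of the suggested restructuring, with buffer handling simplified and names mirroring the diff: apply the offset while the pointer is still typed `*mut usize`, then convert back to a byte pointer for the copy.

```rust
use std::ptr;

// Write the "is inline" flag, then the payload bytes right after it.
// The offset is applied to is_inline_dest while it is still *mut usize,
// keeping it consistent with the other typed offset operations.
// Safety: base must point to a usize-aligned buffer of at least
// size_of::<usize>() + data.len() bytes.
unsafe fn write_flag_and_data(base: *mut u8, data: &[u8], is_inline: bool) -> *mut u8 {
    let is_inline_dest = base as *mut usize;
    *is_inline_dest = is_inline as usize;
    let data_dest = is_inline_dest.offset(1) as *mut u8;
    ptr::copy_nonoverlapping(data.as_ptr(), data_dest, data.len());
    data_dest.offset(data.len() as isize)
}
```

Compared with incrementing the raw byte pointer by `mem::size_of::<usize>()` twice, each step here advances past exactly one typed field, so the layout is visible in the types.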
```diff
@@ -483,9 +538,19 @@ impl OsIpcSender {
                        MACH_MSG_TIMEOUT_NONE,
                        MACH_PORT_NULL);
     libc::free(message as *mut _);
+    if os_result == MACH_SEND_TOO_LARGE && data.is_inline() {
+        MAX_INLINE_SIZE.store(data.inline_data().len(), Ordering::Relaxed);
```
antrik (Contributor) · Jun 7, 2018

It seems to me there is a bit of a race condition here: if another thread stored a smaller size in the meantime, we would increase it again here...

(Which in turn might confuse the other thread while it's doing its resend.)
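A hypothetical fix for this race, sketched as a compare-exchange loop that only ever moves the threshold downward, so a slow thread can never overwrite a smaller value another thread has already recorded. (On newer Rust, `AtomicUsize::fetch_min` expresses the same thing in one call.)

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static MAX_INLINE_SIZE: AtomicUsize = AtomicUsize::new(usize::max_value());

// Record a size that failed to send in-line. The threshold only decreases:
// if another thread already recorded a smaller failing size, we keep it.
fn shrink_max_inline(failed_len: usize) {
    let mut current = MAX_INLINE_SIZE.load(Ordering::Relaxed);
    while failed_len < current {
        match MAX_INLINE_SIZE.compare_exchange_weak(
            current, failed_len, Ordering::Relaxed, Ordering::Relaxed)
        {
            Ok(_) => break,
            // Another thread updated the value; retry against what it wrote.
            Err(observed) => current = observed,
        }
    }
}
```

An unconditional `store()` has the problem antrik describes; the loop above makes the update monotonic regardless of thread interleaving.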
```rust
if os_result != MACH_MSG_SUCCESS {
    return Err(MachError::from(os_result))
}
for outgoing_port in ports {
    mem::forget(outgoing_port);
```
antrik (Contributor) · Jun 7, 2018

I believe this changes error handling behaviour: previously, mem::forget() was applied regardless of whether the send was successful or not -- now, it's only done for a successful send, while an unsuccessful one will attempt to drop the ports.

If I'm reading the documentation right, neither approach is strictly correct, since depending on the exact error, the kernel might have either freed the ports or not... Yet attempting to free ports that have already been freed should be unproblematic I think, while the other option means potentially leaking ports -- so this change is probably for the better.

However, since it's orthogonal to the purpose of the commit, and it might have undesirable side effects, I don't think it's a good idea to silently do this along with other changes. It should go into a separate commit at least, I'd say...
```rust
assert!(payload_size <= max_payload_size);
payload_ptr = payload_ptr.offset(mem::size_of::<usize>() as isize);
let payload = slice::from_raw_parts(payload_ptr, payload_size).to_vec();
let has_inline_data_ptr = shared_memory_descriptor as *mut u8;
```
antrik (Contributor) · Jun 7, 2018

Similar remark as for the sender side: I think it would be cleaner to cast it to usize, and then cast it again after we are done with this field.
@pcwalton Can you sign off on these changes?
Looks fine to me.
@bors-servo r=pcwalton
Merged fe429b8 into master
jdm commented on May 29, 2018:
Fixes #98. This was much easier than figuring out how to calculate if a message would exceed the maximum limit, and all tests continue to pass.