Join GitHub today
GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
Sign upIgnore EINTR on unix::OsIpcReceiverSet::select #137
Conversation
| @@ -472,7 +472,12 @@ impl OsIpcReceiverSet { | |||
| let mut selection_results = Vec::new(); | |||
| match self.poll.poll(&mut self.events, None) { | |||
| Ok(sz) if sz > 0 => {}, | |||
| _ => { return Err(UnixError::last()); } | |||
| Err(ref e) if e.kind() != ErrorKind::Interrupted => { | |||
| return Ok(selection_results); | |||
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 2, 2017
Author
Collaborator
Debated about what to return here. I think a empty vec makes the most sense.
This comment has been minimized.
This comment has been minimized.
antrik
Feb 2, 2017
Contributor
Well, the canonical way of handling EINTR is using a retry loop.
In most cases it's probably not strictly necessary to handle it here, since typical users calling select() in a loop and iterating over the result each time will generally just work -- not sure though we should rely on this, as we never return an empty result otherwise as far as I can tell. On a more conceptual level, it seems less ideal to pass control back to the caller unnecessarily -- better just handle it transparently I'd say.
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 2, 2017
Author
Collaborator
Good point... I like the retry better. The update will include this
| @@ -472,7 +472,12 @@ impl OsIpcReceiverSet { | |||
| let mut selection_results = Vec::new(); | |||
| match self.poll.poll(&mut self.events, None) { | |||
| Ok(sz) if sz > 0 => {}, | |||
| _ => { return Err(UnixError::last()); } | |||
| Err(ref e) if e.kind() != ErrorKind::Interrupted => { | |||
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 2, 2017
Author
Collaborator
Good catch. I changed the control flow right before committing and forgot to change this.
This comment has been minimized.
This comment has been minimized.
antrik
Feb 2, 2017
Contributor
I wonder whether it would be realistic to add a test case for this? :-)
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 3, 2017
Author
Collaborator
Another good point. This one is definitely difficult to reproduce as a unit test. I made my best attempt at it with b03e9db. Parts of it feel a bit hacky, like the "kill sled", but it does cause older versions to panic. It also caught a bug in the code! Hence the new changes to OsIpcReceiverSet::select
|
While the solution is probably fine, what I'd like to understand is why this only came up with |
579c77d
to
b03e9db
Agreed. This is really interesting. I'm not really sure that I'll be able to definitively answer this. It may have been just a timing issue? Pre-mio versions do fail the unit test I just added, which causes me to lean towards the timing theory. |
7e94d53
to
56b66bb
| _ => { return Err(UnixError::last()); } | ||
| Ok(sz) => { | ||
| if sz < 1 { | ||
| return self.select(); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 5, 2017
Author
Collaborator
In testing poll always returned 0 after each call that returned EINTR. I need to do some more testing and debugging around this.
| }, | ||
| Err(ref e) => { | ||
| if e.kind() == ErrorKind::Interrupted { | ||
| return self.select(); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 3, 2017
Contributor
Sneaky ;-)
I think I'd actually be more comfortable with an explicit loop -- not sure how others feel about this...
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
antrik
Feb 4, 2017
Contributor
Indeed this is not even a real tail call (as @mbrubeck pointed out on IRC), since the Vec has a destructor -- so even in optimised builds this clutters the stack... Not much of an issue here, as it's a rare exceptional situation, so the worst thing I guess is the bloated backtrace -- but it's a good thing to keep in mind in general I guess :-)
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 5, 2017
Author
Collaborator
Good call. I replaced the recursive calls with a while loop.
| let parent_pid = unsafe { | ||
| libc::getpid() | ||
| }; | ||
| unsafe { |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 3, 2017
Contributor
Not sure it's a good idea to have consecutive unsafe blocks... Generally we just use a larger enclosing block, if there are no (non-trivial) safe operations in between.
| libc::signal(libc::SIGTERM, | ||
| sigterm_handler as fn(libc::c_int) as *mut libc::c_void as libc::sighandler_t); | ||
| } | ||
| let _ = unsafe { fork(|| { |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 3, 2017
Contributor
No need to explicitly ignore the return value here.
BTW, I wonder whether you could send the signal the other way instead, so only the child process is affected, avoiding problematic interaction with other tests? (Admittedly, this would complicate the code, as the child would have to report back the result I guess...)
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 5, 2017
Author
Collaborator
I wonder whether you could send the signal the other way instead
Overall, I like sending the signal to the child better. It could make debugging a little harder if an unexpected panic occurs in the select. Overall much better though.
| } | ||
| tx0.send(b"A", vec![], vec![]).unwrap(); | ||
| })}; | ||
| for _ in rx_set.select().unwrap() {} |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 3, 2017
Contributor
Not sure how useful it is to iterate the results without actually checking them... I'd suggest invoking next() to get the first result, like some other receiver_set() tests are already doing. (Possibly followed by another next() to assert that it's None.)
This comment has been minimized.
This comment has been minimized.
|
I was thinking of a slightly different test: rather then sending libc::kill(parent_pid, libc::SIGSTOP);
thread::sleep(Duration::from_millis(3));
libc::kill(parent_pid, libc::SIGCONT);(Possibly with another wait before the first signal -- though it doesn't seem necessary in my testing...) AIUI, that's more or less what happens when running in a debugger. Unlike the (The only way I got a failure with the old implementation when running multi-threaded is by using a humongous number of Another option I considered is using BTW, did you see that the test timed out on the MacOS builder? I wonder whether it actually hangs there for some reason -- or sending many signals just takes very long on MacOS... |
e2500fd
to
9dae7c1
I like this a lot better. The current implementation uses this.
I did notice this, and it does appear to still be happening. I'll try to reproduce it locally on my mac.
I also found this to be true.
Hadn't really thought about using |
| Ok(_) => tx0.send(b"\x00", vec![], vec![]).unwrap(), | ||
| Err(e) => { | ||
| let err = format!("{:?}", e); | ||
| tx0.send(err.into_bytes().as_slice(), vec![], vec![]).unwrap(); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
You can just use as_bytes() instead of into_bytes().as_slice().
Note that this is not exactly the same, since into_bytes() consumes the original string, and the resulting temporary byte string is dropped at the end of the expression; while as_bytes() keeps the original string in place. In this case (and many others), this doesn't really make a difference though :-)
This comment has been minimized.
This comment has been minimized.
| } | ||
| unsafe { | ||
| // reap the child | ||
| libc::waitpid(child_pid, &mut stat, 0); |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| tx0.send(b"B", vec![], vec![]).unwrap(); | ||
| // Send the result of the select back to the parent | ||
| match rx_set.select() { | ||
| Ok(_) => tx0.send(b"\x00", vec![], vec![]).unwrap(), |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
You are not checking at all any more whether the select() actually yields the expected result?
This comment has been minimized.
This comment has been minimized.
| // Receive the results from the child and panic if the the | ||
| // returned value from select was an error. | ||
| result_set.add(rx0).unwrap(); | ||
| for result in result_set.select().unwrap() { |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
I don't see a need to use a set here: this channel is only for getting back the result; the actual select test is done when sending to the child...
This comment has been minimized.
This comment has been minimized.
| Ok(sz) if sz > 0 => {}, | ||
| _ => { return Err(UnixError::last()); } | ||
| }; | ||
| let mut sz = 0; |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
Nitpick: I generally prefer more meaningful variable names, especially when they have a non-trivial scope -- which sz now has after moving it out of the match.
This comment has been minimized.
This comment has been minimized.
| _ => { return Err(UnixError::last()); } | ||
| }; | ||
| let mut sz = 0; | ||
| while sz <=0 { |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
Nitpick: missing blank after <=.
BTW, since this is usize, I'd say it would be less misleading to just check == 0?
This comment has been minimized.
This comment has been minimized.
| Ok(num_events) => { | ||
| if num_events > 0 { | ||
| sz = num_events; | ||
| } |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
There is no need for this if: you are checking the same thing in the while condition.
(Or you could use a loop instead of while, and do an explicit break when getting results; thus keeping the condition here. Not sure though whether this would be clearer, or less clear...)
a1f7daf
to
b5c702b
|
Cleaned up the test a bit and addressed comments in the review. I also went back and found the minimal example needed to trigger the mac hang. The following should cause us to hang on mac #[test]
fn basic() {
let (tx0, rx0) = platform::channel().unwrap();
let child_pid = unsafe {
fork(|| {
// child panics
tx0.send(b"\x00", vec![], vec![]).unwrap();
})
};
let (res, _, _) = rx0.recv().unwrap();
assert!(res == b"\x00");
child_pid.wait();
} |
|
Looks good now apart from one remaining issue with the test case. |
| Ok(result) => { | ||
| for evt in result { | ||
| if let OsIpcSelectionResult::DataReceived(_, data, _, _) = evt { | ||
| assert!(data == b"\x00"); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 6, 2017
Contributor
I'm sorry to say that the error checking is both overly complicated and incomplete: it doesn't check for wrong number of results (including 0); nor for unexpected ChannelClosed instead of DataReceived...
As I mentioned before, you can just do let (id, data, _, _) = result.into_iter().next().unwrap().unwrap(), like existing test cases.
(In case it's not clear, the first unwrap() is for the next() returning Some(...), while the second one is a custom method of OsIpcSelectionResult.)
Or being really thorough:
let mut result_iter = result.into_iter();
let (id, data, _, _) = result_iter.next().unwrap().unwrap();
assert!(id == rx_id);
assert!(data == b"\x00");
assert!(result_iter.next().is_none());
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 7, 2017
Author
Collaborator
I tried to follow this, but instead of using asserts I formatted strings to send back to the parent so that we get informative error messages if we hit one of these.
This comment has been minimized.
This comment has been minimized.
antrik
Feb 8, 2017
Contributor
Well, I see your point -- but if you really want to go down this road, you'd have to match all the result values manually in the child process, rather than using any asserts/unwraps etc. Or maybe you could use something with set_hook() and/or catch_unwind(), which would allow sticking with the usual assert/unwrap stuff -- but that seems tricky to get right as well.
To be honest, I'm not sure either is worth the effort. Just send back the correct result I'd say, and panic the child process otherwise -- like the other multi-threaded/multi-process tests do. Running into a test failure is not the common case; when someone does, they kinda just have to figure out that they need to run the tests with -- --nocapture to see the additional error output... Not ideal, but good enough for the tests.
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 9, 2017
Author
Collaborator
Ah! Another good point... I'll just stick with the assert!
|
@danlrobertson the other cross-process tests are using (This is actually quite unsurprising -- we have the problem with raw Mach ports in GNU Hurd...) |
Interesting. I changed the cfg for the test so it doesn't run on mac. |
|
@danlrobertson well, it shouldn't be too hard to change the test to use Though I just realised that there is also another problem there: any messages sent at the platform level will be padded to a multiple of 4 bytes on MacOS -- so you need to avoid odd lengths, or truncate the results, like other platform tests do... (Or maybe you could consider implementing this as a top-level test instead of a platform test? Automatic serialisation of the messages would avoid such pitfalls, while otherwise not changing much in this case I think; and I'd don't see any obvious obstacles from the top of my head...) |
Ignore EINTR on Poll::poll in unix::OsIpcReceiverSet::select. Returning EINTR as an error on select causes panics in servo.
|
Updated the unit test to use |
| fn receiver_set_eintr() { | ||
| let (server, name) = OsIpcOneShotServer::new().unwrap(); | ||
| let (tx1, rx1) = platform::channel().unwrap(); | ||
| let mut rx_set = OsIpcReceiverSet::new().unwrap(); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 9, 2017
Contributor
Are you sure rx_set actually survives the fork() on MacOS? I had assumed that you would have to create the main channel and receiver set in the child process, and send tx1 to the parent process through tx0 along with the "ready" message...
(In fact I don't think forking a receiver or receiver set is a good idea even with the Unix back-end using sockets, since that breaks the guarantee of having a unique receiver I believe.)
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 11, 2017
Author
Collaborator
create the main channel and receiver set in the child process, and send tx1 to the parent process through tx0 along with the "ready" message
Good point. The update does this.
| assert_eq!(rx_id, id); | ||
| assert_eq!(vec![0, 0, 0, 0], data); | ||
| assert!(result_iter.next().is_none()); | ||
| }) |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 9, 2017
Contributor
Uh... Seems I have been unclear again: you still need to send an indicator to the parent that the child received the expected result successfully before it exits -- otherwise the parent (and thus test harness) never knows whether the test actually succeeded or not... My point was just that you don't really have to send anything back when the child detects some failure, since the parent never getting the success indication is sufficient to make the test case fail (by timing out).
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 11, 2017
Author
Collaborator
Your right, my bad... I wasn't thinking when I removed the last send
| let mut result_iter = result.into_iter(); | ||
| let (id, data, _, _) = result_iter.next().unwrap().unwrap(); | ||
| assert_eq!(rx_id, id); | ||
| assert_eq!(vec![0, 0, 0, 0], data); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 9, 2017
Contributor
Nitpick: it's rather inconsistent to use a vector here, while you use a byte string slice in send()... You could use assert_eq!(b"\x00\x00\x00\x00".as_ref().to_owned(), data) or assert_eq!(b"\x00\x00\x00\x00".as_ref(), &*data).
(And while at it, I'd say [0; 4] would be more readable than b"\x00\x00\x00\x00", if you want to stick with the 0s. Probably better though to use byte string slices of four actual characters instead -- preferably different one in each send(), to tell them apart...)
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 11, 2017
Author
Collaborator
assert_eq!(b"\x00\x00\x00\x00".as_ref(), &*data) I like this best. The update uses this and uses more descriptive strings.
| #[cfg(not(any(feature = "force-inprocess", target_os = "windows", target_os = "android")))] | ||
| fn receiver_set_eintr() { | ||
| let (start_server, start_name) = OsIpcOneShotServer::new().unwrap(); | ||
| let (end_server, end_name) = OsIpcOneShotServer::new().unwrap(); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 11, 2017
Contributor
Why are you using a separate server (channel) for the final result, rather than just sending the success message over the first channel as well?
Note that this isn't really wrong; so I wouldn't mind merging it as is -- it just makes the test more complicated than necessary...
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 11, 2017
Author
Collaborator
Why are you using a separate server (channel) for the final result, rather than just sending the success message over the first channel as well?
Sorry, just hadn't used OneShotServers before and didn't read the code close enough. Updated in the latest changes.
| // Wait until the child is ready | ||
| let (_, res, mut channels, _) = start_server.accept().unwrap(); | ||
| assert!(res == b" Ready! "); | ||
| assert!(channels.len() == 1); |
This comment has been minimized.
This comment has been minimized.
antrik
Feb 11, 2017
Contributor
I don't think that last assert is really necessary: as this receive is not really part of the stuff we are testing here (rather just needed for the test harness), there is no need to be too thorough here. If we don't get the channel we require for the test, the next unwrap() will panic anyway.
This comment has been minimized.
This comment has been minimized.
dlrobertson
Feb 11, 2017
Author
Collaborator
True, this is a complicated enough test, so IMO any removal of unnecessary asserts or code is a win.
If Poll::poll is interrupted by a signal, it returns EINTR. A test should be created that sends a signal that will be caught while blocking on OsIpcReceiverSet::select.
|
@bors-servo r+ |
|
|
|
|
dlrobertson commentedFeb 2, 2017
Ignore
EINTRon Poll::poll inunix::OsIpcReceiverSet::select. ReturningEINTRas an error on select causes panics in servo.Resolves: #131