Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix notification handling tests and `unix` implementation #197

Merged
merged 3 commits into from May 30, 2018

Conversation

@antrik
Copy link
Contributor

antrik commented May 17, 2018

The first commit attempts to make notification tests more robust. (Hopefully fixing intermittent failures on Linux.) The other commits add some additional related tests, and fix a (minor) issue uncovered by them.

@metajack
Copy link

metajack commented May 17, 2018

Are there no other solutions? Sleeps feel like a bit of a hack.

@antrik antrik force-pushed the antrik:fix-notification-tests branch from 26541eb to f5090ee May 30, 2018
antrik added 3 commits May 17, 2018
`no_receiver_notification()`, and especially
`no_senders_notification_try_recv()`, fail quite frequently on CI.
The only reason I can think of is that apparently the kernel sometimes
doesn't process the close event immediately, but rather continues
running the user thread, so the next read or write call doesn't get the
notification yet.

Attempting to fix this by repeatedly polling for the notifications to
arrive. Let's hope this doesn't have any undesired side effects...

Note that `no_senders_notification()` should not be affected, since that
one isn't async. (The regular `recv()` method waits for a result to
arrive.)
Turn `UnixError` into an enum with an explicit variant for the
"connection closed" status -- like other back-ends do -- rather than
overloading `ECONNRESET`.

The overloading was causing misleading error reporting, especially when
unwrapping a result: the panic message would report it like an error
produced by the system -- while in fact Unix doesn't report this
condition (attempting to receive from a socket with no senders left) as
a system error.

What's worse, Linux actually reports `ECONNRESET` in some cases when
attempting a *send* after the *receiver* end was closed (specifically,
when the receiver was closed while there were unreceived messages
pending) -- so the overloaded error could cause serious confusion.
@antrik antrik force-pushed the antrik:fix-notification-tests branch from f5090ee to 1c01539 May 30, 2018
@antrik antrik changed the title Attempt fixing intermittents in async notification tests Fix notification handling tests and `unix` implementation May 30, 2018
@antrik
Copy link
Contributor Author

antrik commented May 30, 2018

@metajack my assumption was that if it's a thread scheduling issue, a sleep of any duration should make it reliable, as it explicitly hands off scheduling... But it seems my assumption was wrong, as the sleep didn't fix the intermittents.

I have implemented polling in a loop now instead -- let's hope this works better.

In order to verify that the loop indeed handles delayed notifications, I added separate tests forcing a delay -- and uncovered an actual issue in the notification handling on Linux in the process... So the PR grew in scope :-)

@jdm
Copy link
Member

jdm commented May 30, 2018

One of the TravisCI jobs seems displeased: https://travis-ci.org/servo/ipc-channel/jobs/385864860

@antrik
Copy link
Contributor Author

antrik commented May 30, 2018

@jdm that's a known intermittent (problem in test cases rather than implementation), unrelated to this PR.

(I have known about this problem, and what causes it, for quite some time -- but only recently it started getting triggered quite a lot on Travis... So I'm moving it way up on my list.)

@jdm
Copy link
Member

jdm commented May 30, 2018

@bors-servo
Copy link
Contributor

bors-servo commented May 30, 2018

📌 Commit 1c01539 has been approved by jdm

@highfive highfive assigned jdm and unassigned metajack May 30, 2018
@bors-servo
Copy link
Contributor

bors-servo commented May 30, 2018

Testing commit 1c01539 with merge eadc963...

bors-servo added a commit that referenced this pull request May 30, 2018
Fix notification handling tests and `unix` implementation

The first commit attempts to make notification tests more robust. (Hopefully fixing intermittent failures on Linux.) The other commits add some additional related tests, and fix a (minor) issue uncovered by them.
@bors-servo
Copy link
Contributor

bors-servo commented May 30, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: jdm
Pushing eadc963 to master...

@bors-servo bors-servo merged commit 1c01539 into servo:master May 30, 2018
2 of 3 checks passed
2 of 3 checks passed
continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
homu Test successful
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants
You can’t perform that action at this time.