
Linux: Always use blocking receive for followup fragments #48

Merged
merged 2 commits into servo:master from antrik:fix-try_recv_large on Mar 9, 2016

Conversation

@antrik
Contributor

antrik commented Mar 8, 2016

Fixes an issue (see commit message for details) that I discovered in the code while working on other stuff. I don't know how likely it is to manifest in real life, but it seems entirely possible that it's causing some spurious failures now and again...

The PR also includes a commit with improvements to several unrelated test cases, which I didn't want to untangle from the main commit with the newly added tests. It should do no harm.

antrik added 2 commits Feb 27, 2016
Rather than just sending a large block of the same byte, send an
alternating sequence. This way we should be able to detect when parts of
the message have been shuffled, even when the total length hasn't
changed.
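The test data change can be sketched roughly as follows. This is a hypothetical illustration (the function name `alternating_data` and the modulus are assumptions, not necessarily the test's exact code): a repeating but non-uniform byte pattern makes reordered fragments detectable by content comparison, where a uniform block would compare equal after any shuffle of equal-length pieces.

```rust
// Hypothetical sketch: build a repeating, non-uniform byte pattern instead
// of a block of identical bytes. Using a modulus like 251 (a prime that
// doesn't divide typical fragment sizes) keeps the pattern out of phase
// with fragment boundaries.
fn alternating_data(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i % 251) as u8).collect()
}

fn main() {
    let data = alternating_data(1 << 16);
    // A shuffled copy of the same length no longer compares equal.
    let mut shuffled = data.clone();
    shuffled.rotate_left(4096); // simulate two fragments swapping places
    assert_eq!(data.len(), shuffled.len());
    assert_ne!(data, shuffled);
    println!("shuffle detected");
}
```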
While large messages need to use fragmentation internally, from the
user's point of view a receive should still be atomic. Thus, once we have
received the initial fragment of a multi-fragment message, the receive
operations for the followup fragments should always block -- even if the
caller requested non-blocking receive. Otherwise, a receive attempt on a
followup fragment before it is available will cause the entire message
receive invocation to abort -- so the message gets lost on the receiver
side, while the sender blocks forever trying to send the remaining
fragments.

Note that blocking for follow-up fragments shouldn't actually cause
major delays in the receiver, as the sender thread doesn't do any other
slow or blocking operations between the individual fragment sends.
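The logic described above can be sketched as follows. This is a minimal model, not the actual ipc-channel code: `std::sync::mpsc` stands in for the Unix-socket transport, `recv_fragmented` and `FRAGMENT_SIZE` are illustrative names, and the total message length is passed in explicitly rather than carried in the initial fragment's header. The key point it demonstrates is that the caller's non-blocking request applies only to the initial fragment; followups always use a blocking receive, so a momentarily empty queue cannot abort a half-read message.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

const FRAGMENT_SIZE: usize = 4; // assumed, for illustration only

// Hypothetical sketch of the fix: honour `blocking = false` only for the
// initial fragment. Followup fragments always block -- aborting mid-message
// would lose the fragments already consumed and leave the sender stuck.
fn recv_fragmented(rx: &Receiver<Vec<u8>>, total_len: usize, blocking: bool)
                   -> Result<Vec<u8>, TryRecvError> {
    let mut data = if blocking {
        rx.recv().map_err(|_| TryRecvError::Disconnected)?
    } else {
        rx.try_recv()? // may return Empty -- fine, nothing consumed yet
    };
    while data.len() < total_len {
        // Always a blocking receive here, even for a non-blocking caller.
        let frag = rx.recv().map_err(|_| TryRecvError::Disconnected)?;
        data.extend(frag);
    }
    Ok(data)
}

fn main() {
    let (tx, rx) = channel();
    let msg: Vec<u8> = (0..10).collect();
    let chunks: Vec<Vec<u8>> =
        msg.chunks(FRAGMENT_SIZE).map(|c| c.to_vec()).collect();
    let sender = thread::spawn(move || {
        for c in chunks {
            tx.send(c).unwrap();
            thread::sleep(Duration::from_millis(10)); // delay between fragments
        }
    });
    // Non-blocking caller: spin on try_recv for the *initial* fragment only.
    let received = loop {
        match recv_fragmented(&rx, msg.len(), false) {
            Ok(d) => break d,
            Err(TryRecvError::Empty) => thread::sleep(Duration::from_millis(1)),
            Err(e) => panic!("{:?}", e),
        }
    };
    sender.join().unwrap();
    assert_eq!(received, msg);
    println!("message reassembled intact");
}
```

Note how the delays between fragment sends are harmless here: the receiver simply blocks on each followup, rather than treating the gap as "no message available".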

The issue turns out to be rather tricky to demonstrate (at least on my Linux
system): apparently the way the scheduler works, once the sender starts
pushing out fragments, it usually doesn't get preempted; but rather just
sends packets until the queue gets full, causing the send call to block
-- until the receiver picks up the pending fragment, thus making space,
and consequently enabling the sender to resume and immediately send the
next fragment.

The somewhat complex new try_recv_large_delayed() test (using multiple
threads with artificial delays) manages to get around this on my system;
but only with a release build -- I haven't found a sane way to reproduce
the problem on non-release builds...
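The failure mode that the delayed test provokes can be modelled in miniature. This is a hedged sketch, not the real test: `std::sync::mpsc` again stands in for the socket transport, and the 50 ms sleep plays the role of the artificial delay that defeats the scheduler behaviour described above.

```rust
use std::sync::mpsc::{channel, TryRecvError};
use std::thread;
use std::time::Duration;

// Hypothetical sketch of the bug: if the receiver used a non-blocking
// receive for *every* fragment, a delay between fragment sends would make
// the queue momentarily empty mid-message, and the whole receive would
// abort -- losing the message and leaving the sender wedged.
fn main() {
    let (tx, rx) = channel::<Vec<u8>>();
    let sender = thread::spawn(move || {
        tx.send(vec![0u8; 4]).unwrap();
        thread::sleep(Duration::from_millis(50)); // delay before the followup
        tx.send(vec![1u8; 4]).unwrap();
    });
    let _first = rx.recv().unwrap(); // initial fragment arrives fine
    // A naive non-blocking receive for the followup usually hits the gap:
    let hit_gap = matches!(rx.try_recv(), Err(TryRecvError::Empty));
    if hit_gap {
        // The fix: fetch the followup with a blocking recv() instead.
        let _second = rx.recv().unwrap();
    }
    sender.join().unwrap();
    println!("mid-message gap observed: {}", hit_gap);
}
```

As the commit message notes, in the real code the gap is hard to hit without such an injected delay, since the sender normally pushes fragments back-to-back until the socket queue fills.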
@antrik force-pushed the antrik:fix-try_recv_large branch from 61af077 to 78c1be4 on Mar 8, 2016
@pcwalton
Collaborator

pcwalton commented Mar 9, 2016

@bors-servo
Contributor

bors-servo commented Mar 9, 2016

📌 Commit 78c1be4 has been approved by pcwalton

@bors-servo
Contributor

bors-servo commented Mar 9, 2016

Testing commit 78c1be4 with merge 6ad3885...

bors-servo added a commit that referenced this pull request Mar 9, 2016
Linux: Always use blocking receive for followup fragments

@bors-servo
Contributor

bors-servo commented Mar 9, 2016

☀️ Test successful - travis

@bors-servo merged commit 78c1be4 into servo:master on Mar 9, 2016
0 of 3 checks passed:
continuous-integration/appveyor/pr: AppVeyor build failed
continuous-integration/travis-ci/pr: The Travis CI build failed
homu: Testing commit 78c1be4 with merge 6ad3885...

3 participants