-
Notifications
You must be signed in to change notification settings - Fork 48
EndToEnd tests failing(hanging) on latest 4.0 snapshots on Linux #22
Comments
I updated BlueSockets to the latest 4.0 toolchain (see https://github.com/IBM-Swift/BlueSocket/tree/UnsupportedWIP/) to see if that was the problem, and the BlueSocket tests run clean on Linux under the latest snapshots, so that's not it. The investigation continues. |
It appears that the issue is happening because, in the test that passes, the attempt to close the socket fails (although the test passes). Here's the working output (with some extra prints) on the Mac:
And here's the same (failing) call on Linux:
The two diverge at: I can rework things so that this doesn't happen in these tests (and probably will), but that leaves open the possibility that it can happen in the wild, causing a server hang, so it needs to get fixed. Next step to see if I'm correct, is to replicate this error in a much simpler sample project. |
Here is the output of two test runs on the same system, one from Swift3.1.1 and the other from last week's 4.0 Snapshot: In both cases, the code is doing a blocking call to I'm also attaching a tar.gz of both source trees, to make it easier for other people to reproduce: |
Possibilities:
|
Hi Carl. Am I reading it correctly that this test used to work on Swift 4 development snapshots until about a week ago? Asking because Apple merged several of their own PRs into dispatch during the July 13-July 17 time period. I'm trying to figure out if it is worth selectively backing those out to see if one of them was buggy. They touched bits of code dealing with Linux signal handling, ioctls, and processing of epoll. |
As far as i know, it's never worked on Swift 4. It certainly hasn't worked on any of the 4.0-DEVELOPMENT snapshots that I've tried. It has worked on all the 3.x ones. (Sorry about the "on the last few 4.0 snapshots" language. I was trying to indicate that I hadn't tested it on earlier ones, not that it had worked on earlier ones). One of the things I will do if I run out of other things to try is to go back in time snapshot by snapshot until I find the one where it stopped working. That will be a lot of downloading snapshots, though, so I was hoping to figure it out some other way. |
ok, thanks. Then I will look from first principles instead of focusing on what Apple changed in their PRs last week. |
Ok, so good news and bad news. The good news is, I've discovered via brute force that the regression happens between the 2017-05-30 and 2017-05-31 snapshots (e.g. the tests pass with the May 30th snapshot and hang/fail with the May 31st one). The bad news is there are no libDispatch changes between the The worse news is that of the changes between those tags, the biggest change is the addition of |
sounds like its over to the Foundation team then. |
Quite possibly. I'm just really not sure how a change to |
I've discovered can fix this problem by applying a change to I think this is probably the right thing to do in any case (well, really I think BlueSocket should have exposed a The other problem is: that "fix" in no way answers the question about why the previous version works fine on the Mac and on pre-May31st/pre-_HTTPURLProtocol versions of Swift, but fails on post-May30th/post-_HTTPURLProtocol versions. So I'm not sure if I should apply the BlueSocket fix and just move on, or keep trying to answer the question about what changed, expecting that this won't be the only place that the post-_HTTPURLProtocol behavior might be causing people issues once they move to Swift4. Opinions? @ianpartridge ? @pushkarnk ? |
I tried to get the tests running, but the
|
Any idea what I am missing here? |
You have to remove the See the |
There's also a tar file in a comment above (#22 (comment)) that has the whole directory structure with everything checked out, if that's easier. |
Thanks @carlbrown I can see the failures now. I'll start the investigation. |
Any status on this @pushkarnk @ianpartridge ? |
@carlbrown I see that before the failing tests this message is printed: |
@pushkarnk No, that means the socket was shutdown while the You'll see that message on both the May 30th and May 31st snapshots. |
The |
What snapshot are you seeing them passing with? What changed since 29 days ago when you said you saw the failures? |
@carlbrown I don't see them pass when simply run as |
swiftlang/swift-corelibs-foundation#1195 has fixed this problem. URLSession holds references to URLSessionTask. URLSessionTask holds a reference to an EasyHandle, which is a wrapper around a curl easy handle. Per this doc, to close a connection pertaining to an easy handle, curl_easy_cleanup needs to be called. Now, we call curl_easy_cleanup in the deinit for EasyHandle. In the early days of URLSession, to keep URLSessionTasks alive beyond the function in which they were locally held (because, the responses are async), we had introduced a retain cycle between URLSession and URLSessionTask, and we broke it after the task completed. Unfortunately, this cycle breaking was missed in the refactoring exercise. As a result, URLSessionTask objects accumulated in memory, holding EasyHandles along with them. EasyHandle.deinit() and hence curl_easy_cleanup were never invoked, and hence connections were kept alive. Hence this problem in swift-server/http. The client end did not initiate a connection shutdown and the server was stuck in read() blocking the queue on which reads() for the subsequent connections waited indefinitely.
|
With the last few swift-4.0 development snapshots, all but one of the
XCTests
with names that end withEndToEnd
fail on Linux with timeouts. These are the tests that start a server on a random port and create aURLSession
to send requests to it Those tests pass fine on Darwin (and on Linux under Swift 3.1.1).If you comment all but one of the EndToEnd tests out, then the one you leave in (which ever one you pick) will pass. With two EndToEnd tests in, regardless of the order you choose, the first one will complete, and the second one will time out.
I'm guessing something isn't being cleaned up correctly, but I'm not sure what. Will update as I learn more.
The text was updated successfully, but these errors were encountered: