Intermittent crash/hang with TLS on Windows under poor network conditions #191
Comments
I recently heard about haskell-tls/hs-tls#124. (On Thu, Apr 21, 2016, 7:11 AM, Echo Nolan notifications@github.com wrote:)
Still happens on nightly-2016-04-20 with tls-1.3.5 :(
I don't have ready access to a Windows machine to try this on. Can you test whether this kind of problem occurs at all when using the connection package directly? Pinging @vincenthz for any other thoughts on narrowing down the problem.
It does smell like a …
@enolan, can you test whether you can reproduce this at all with an insecure (non-TLS) connection?
I tried pretty hard and couldn't. Debug printfs indicate the crash happens in hGetBuf. I'm trying to narrow it down further.
Wow, good catch. Is there any potential workaround outside of GHC?
It only happens when sockets are converted to handles, so if tls used … (On Tue, May 3, 2016, 8:20 PM, Michael Snoyman notifications@github.com)
I don't know the details of how this works internally in tls/connection. @vincenthz, is there any reasonable way to move away from …
Good catch, @enolan! I've pushed support to …
Are there any changes that need to be made on the http-client-tls side to take advantage of this support, or is it just a matter of recompiling against master?
It should just work as is.
Awesome, thank you :)
@vincenthz Sorry! I read …
:( Could …
The crash happens before control returns from … (On Sun, May 8, 2016, 11:55 AM, Vincent Hanquez notifications@github.com)
Darn :\ . We could probably have network call a different foreign import, one that returns the right thing, because fixing …
Summary: They return signed 32 bit ints on Windows, even on a 64 bit OS, rather than Linux's 64 bit ssize_t. This means when recv() returned -1 to signal an error we thought it was 4294967295. It was converted to an int, -1 and the buffer was memcpy'd which caused a segfault. Other bad stuff happened with send()s. See also note CSsize in System.Posix.Internals.

Add a test for #12010

Test Plan:
- GHC testsuite (T12010)
- http-conduit test (snoyberg/http-client#191)

Reviewers: austin, hvr, bgamari, Phyx
Reviewed By: Phyx
Subscribers: thomie
Differential Revision: https://phabricator.haskell.org/D2170
GHC Trac Issues: #12010
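A minimal C sketch of the type mismatch the commit message describes, assuming an x86-64 target. The calling convention leaves the upper 32 bits of the 64-bit return register unspecified for an int result; in practice, writing a 32-bit register clears them, so a caller that reads the register as a 64-bit ssize_t sees the 32-bit bit pattern zero-extended. The two helper names are hypothetical and only model the two ways of reading the result; they are not code from GHC or network.

```c
#include <stdint.h>

/* Hypothetical helper: models a caller that expects a 64-bit ssize_t while
   the callee (e.g. Winsock recv()) only produced a 32-bit int. The upper
   32 bits are assumed to be zero, so the bit pattern is zero-extended and
   -1 turns into 4294967295. */
static int64_t read_as_64bit_ssize(int32_t int_result)
{
    return (int64_t)(uint32_t)int_result;
}

/* Hypothetical helper: models a caller that correctly treats the result as
   a 32-bit int and sign-extends it, so -1 stays -1. */
static int64_t read_as_32bit_int(int32_t int_result)
{
    return (int64_t)int_result;
}
```

For example, read_as_64bit_ssize(-1) yields 4294967295 while read_as_32bit_int(-1) yields -1, matching the "we thought it was 4294967295" symptom. Non-negative byte counts come out the same either way, which is why the bug only surfaced when recv() failed.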
This is the rabbit hole with no bottom. It's actually fixable in … @vincenthz, your change to …
This is fixed upstream in …
Awesome, thank you! (On Thu, Aug 4, 2016, 6:51 AM, Echo Nolan notifications@github.com wrote:)
(The same commit message was referenced again, cherry picked from commit 1ee47c1.)
I can get a simple http-conduit program to crash or hang fairly often when the connection has occasional data corruption. I've also observed the hang when only simulating packet loss. Here's a repo with my test case. The program is tiny:
Using Clumsy in tamper mode set to 20%, it'll crash or hang maybe half the time.
When I instrument API calls with API Monitor, I can see that the crashes happen immediately after a call to recv() fails. recv() returns SOCKET_ERROR, which is a constant equal to -1. (When it succeeds, recv() returns the number of bytes received.) Then memcpy() is called with the length set to -1. This causes the Windows equivalent of a segfault. Somewhere, the return value of recv() isn't getting checked.

Running the same test over a cleartext HTTP connection causes no problem. I also set up a simple cleartext HTTP server on a Linux machine and injected RSTs with tcpkill. That doesn't make it crash either.

I found this while debugging commercialhaskell/stack#1689 (in the course of getting Idris building on AppVeyor). Stack hasn't ever crashed on me, but it seems likely the hangs and crashes in the simple http-conduit program are related to the hangs in Stack.
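The unchecked recv() result described in this report can be illustrated with a portable C sketch of the check that was missing. copy_received is a hypothetical helper, not code from network, tls, or GHC: it takes the value a recv()-style call returned (-1, i.e. SOCKET_ERROR, on failure) and refuses to copy unless that value is a non-negative length that fits the destination buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `received` bytes from `src` to `dst` only if
   `received` is a valid byte count. A recv()-style result of -1 (Winsock's
   SOCKET_ERROR) must never reach memcpy(), because converting it to the
   size_t length parameter produces SIZE_MAX. Returns the number of bytes
   copied, or -1 if nothing was copied because of an error. */
static int copy_received(char *dst, size_t dst_cap, const char *src, int received)
{
    if (received < 0) {
        return -1;   /* recv() failed: the caller should inspect the error code */
    }
    if ((size_t)received > dst_cap) {
        return -1;   /* copying would overflow the destination buffer */
    }
    memcpy(dst, src, (size_t)received);
    return received; /* 0 means the peer closed the connection */
}
```

The design choice is to validate the length at the single point where a raw return value becomes a copy size, so an error code can never be reinterpreted as a byte count further down.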
To reproduce:
stack build
stack exec bad-memcpy-crash
You may have to run it several times. When it crashes you should see the Windows crash dialog and if you have Visual Studio installed, an offer to open a debugger.
Let me know if there's more I can do to figure this out.