TCP Send Mangles ByteStrings #51

@andrewthad

I'm using socket-0.7 with GHC-8.0, building on NixOS. The changelog for 0.8 does not indicate that this was addressed in the new release, but if upgrading will fix it, let me know.

The application we are writing sends data to carbon-cache (part of the Graphite stack) using its plaintext protocol. We open a TCP socket, make a large number of sendAll calls (no recvs), and then close the socket. The problem is that the haskell-socket library occasionally mangles the bytes it sends across the wire. I have confirmed this with tcpflow. I cannot give an actual example because the data contains confidential information, but here is something similar to what is happening to the TCP stream.

Expected:

foo.bar.baz.node1.metric 1.98 1490804745
foo.bar.baz.node2.metric 2.16 1490804745
foo.bar.baz.node3.metric 2.04 1490804745

Actual:

foo.bar.baz.node1.metric 1.98 1490804745
foo.bar.baz.node2.metfoo.bar.baz.node56 2.43 14908047360804745
foo.bar.baz.node3.metric 2.04 1490804745

Just for extra clarity, the second line has had a fragment deleted from it, and another line has replaced that fragment:

foo.bar.baz.node2.met[[[foo.bar.baz.node56 2.43 1490804736]]]0804745

Basically, another line from somewhere else in the TCP stream shows up in the middle of the line. Here is some additional information that may be helpful:

  • This happens regardless of whether we use sendAll or sendAllBuilder.
  • This only happens when using a real network interface. The issue never manifests on the loopback interface. In the application my team works on, we have one TCP connection to localhost and one to a remote host, and we send the same metrics to both. Only the metrics going to the remote host get mangled. This makes me suspect a subtle concurrency issue that the loopback interface is probably fast enough to hide.
  • The application is multi-threaded and makes concurrent calls to sendAll.
  • Mangled lines occur about thirty times per hour (out of the 5 million lines sent every hour).
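
To make the concurrency suspicion concrete, here is a minimal sketch of how concurrent senders could be serialized with a lock. It assumes (unconfirmed) that the mangling comes from two threads' chunks interleaving when a send is split into several writes. Nothing here is taken from our application or from the socket library: `sendLocked` and `chunkedSend` are hypothetical names, and `chunkedSend` is only a stand-in that writes each line in two pieces to an in-memory sink instead of a real socket.

```haskell
import Control.Concurrent (forkIO, yield)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical helper: hold a lock for the duration of one whole-line
-- send, so chunks from different threads cannot interleave.
sendLocked :: MVar () -> (line -> IO ()) -> line -> IO ()
sendLocked lock send line = withMVar lock (\_ -> send line)

-- Stand-in for a socket send (assumption, not the library's code):
-- writes the line in two chunks and yields in between, imitating a
-- send loop that continues after a partial write.
chunkedSend :: IORef [String] -> String -> IO ()
chunkedSend sink line = do
  let (a, b) = splitAt (length line `div` 2) line
  append a
  yield
  append b
  where
    -- Chunks are consed on, so the sink holds them newest first.
    append s = atomicModifyIORef' sink (\xs -> (s : xs, ()))

main :: IO ()
main = do
  lock <- newMVar ()
  sink <- newIORef []
  done <- newEmptyMVar
  let metrics =
        [ "foo.bar.baz.node" ++ show n ++ ".metric 1.0 1490804745\n"
        | n <- [1 .. 4 :: Int]
        ]
  -- One thread per metric line, each sending its line 100 times.
  forM_ metrics $ \l -> forkIO $ do
    replicateM_ 100 (sendLocked lock (chunkedSend sink) l)
    putMVar done ()
  replicateM_ (length metrics) (takeMVar done)
  out <- concat . reverse <$> readIORef sink
  -- With the lock held per line, every received line is an original line.
  print (all (`elem` map init metrics) (lines out))
```

Calling `chunkedSend sink` directly instead of going through `sendLocked` removes the guarantee, since the two chunks of one line may then be separated by another thread's chunk.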

That's everything I know. I've looked through the code a little, and I cannot see any obvious issues. If there's any additional information that I could provide, let me know.
