Net::TCPClient#socket_write corrupts data on short writes #10
Comments
@SpComb did you fix it?
Looks like the implementation should be similar to lann/tcp-timeout-ruby@a167299#diff-783b62709d9f59031b907690bb165c6cR66
@Fivell since it appears you have the code fix, do you want to submit a PR to merge your changes?
@reidmorrison yes, we tested it in production, however I have no specs for it.
@reidmorrison seems travis.yml needs to be fixed.
Merged the fix and will research Travis failures. Thank you for the PR.
I think this also occurs on short reads...
`Net::TCPClient#socket_write` calls `socket.write_nonblock(data)`. However, it does not check the returned number of bytes written.

I believe that when the socket's send buffer is full, a call to `write` with a buffer larger than the kernel accepts in a single `write` syscall can lose data, because the syscall only sends part of the buffer. The corruption appears once that `write` returns and the next `write` is called.

Here's a repro case demonstrating this issue: https://gist.github.com/SpComb/c8b857fc5bb575a1f859f7ea57603a29
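To illustrate the kind of handling that seems to be missing, here is a minimal sketch of a write loop that checks the return value of `write_nonblock` and retries with the unsent remainder. The helper name `write_all` and its `timeout` parameter are hypothetical and are not part of net_tcp_client; this is not necessarily how the merged fix works.

```ruby
require 'socket'

# Hypothetical helper (not part of net_tcp_client): keep calling
# write_nonblock until every byte of data has been accepted by the kernel.
def write_all(socket, data, timeout: 5)
  remaining = data.b # binary copy, so byte offsets are accurate
  total = 0
  until remaining.empty?
    result = socket.write_nonblock(remaining, exception: false)
    if result == :wait_writable
      # Send buffer is full: wait until the socket becomes writable again.
      ready = IO.select(nil, [socket], nil, timeout)
      raise IOError, 'write timed out' if ready.nil?
    else
      # Short write: drop only the bytes that were actually sent.
      total += result
      remaining = remaining.byteslice(result..-1)
    end
  end
  total
end
```

With a loop like this, a short write only advances past the bytes the kernel accepted, so the next call resumes at the correct offset instead of skipping the unsent tail.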
`test-client.rb` sends ~1M lines of sequential integers formatted as '%07d\n' (0000001..0998001). It sends them 1k lines at a time, in 8 kB `write` calls. It logs the return value from `Net::TCPClient#write`, which seems to be the value returned by `socket.write_nonblock`.

`test-server.rb` reads one line at a time, printing as it goes. It sleeps 1s per 10k lines for the first 100k lines, and then 1s per 100k lines for the rest. It parses each line as an integer and compares it to the previous line, logging a `!` message if the lines are not strictly sequential.

During the first slow read period, the sending client gets blocked on a full send buffer, which shows up as short writes on the client after about 400-500 kB. It blocks for a short period and then continues, as the server empties its recv buffer and makes room for more data.

Each of these short writes results in corruption on the server.
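The server-side check described above can be sketched roughly as follows. This is not the code from the gist; `sequence_gaps` is a hypothetical name, and it assumes each line holds a zero-padded decimal integer as produced by '%07d\n'.

```ruby
# Hypothetical checker (not the gist's test-server.rb): returns the
# [previous, current] pairs where a line does not follow its predecessor.
def sequence_gaps(lines)
  gaps = []
  previous = nil
  lines.each do |line|
    current = line.to_i # String#to_i parses base 10, so leading zeros are fine
    gaps << [previous, current] if previous && current != previous + 1
    previous = current
  end
  gaps
end
```

An empty result means the stream arrived intact; each reported pair marks a point where bytes were dropped or merged.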