
net, net/http: unpredictable slowness of TestSendfile and TestOnlyWriteTimeout w/ squeezed TCP windows #19032

Description

@mikioh

This is probably not an issue in the standard library itself: not in the runtime package, which retrieves data readable/writable notifications from the kernel; not in the net package, which issues the read/write system calls; and not in the net/http package, which adds, drops and manages in-flight connections and their IO for user traffic. As far as I can observe, this happens on any combination of FreeBSD 10 through 12 and go1.7 through tip.
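
For reference, the path these packages take in TestSendfile can also be exercised outside the test harness. The following is a rough, hypothetical sketch (not the actual test): it serves a temporary file over a loopback TCP connection with io.Copy, which on FreeBSD is expected to reach sendfile via (*net.TCPConn).ReadFrom, and reports how long the transfer takes under whatever recvspace/sendspace values are currently set. The payload size and loopback address are arbitrary choices.

// A rough, hypothetical sketch; not the actual TestSendfile.
package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"
	"log"
	"net"
	"os"
	"time"
)

func main() {
	// Payload: a temporary file of a few MiB (the size is arbitrary).
	f, err := ioutil.TempFile("", "sendfile-payload")
	if err != nil {
		log.Fatal(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(bytes.Repeat([]byte("0123456789abcdef"), 1<<18)); err != nil { // 4 MiB
		log.Fatal(err)
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		log.Fatal(err)
	}

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	defer ln.Close()

	// Server: io.Copy from an *os.File to a *net.TCPConn goes through
	// ReadFrom, which on FreeBSD should take the sendfile fast path.
	go func() {
		c, err := ln.Accept()
		if err != nil {
			log.Print(err)
			return
		}
		defer c.Close()
		if _, err := io.Copy(c, f); err != nil {
			log.Print(err)
		}
	}()

	// Client: drain the connection over loopback and time the transfer.
	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	start := time.Now()
	n, err := io.Copy(ioutil.Discard, c)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("copied %d bytes in %v\n", n, time.Since(start))
}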

When we run a script like the following:

#!/usr/bin/env bash

# Restore the original socket buffer sysctls and exit, preserving the exit
# status of the command that triggered the call.
function die() {
        rc=$?
        sudo sysctl -w net.inet.tcp.recvspace=65536
        sudo sysctl -w net.inet.tcp.sendspace=32768
        exit $rc
}

# Halve the TCP receive/send buffer sizes from 64 KiB down to 1 KiB and run
# the test at each step.
for (( wnd=65536; wnd >= 1024; wnd /= 2 )); do
        sudo sysctl -w net.inet.tcp.recvspace=$wnd
        sudo sysctl -w net.inet.tcp.sendspace=$wnd
        go test -v -run TestSendfile net || die
#        go test -v -run TestOnlyWriteTimeout net/http || die
done

die

TestSendfile takes quite a long time at some window sizes:

net.inet.tcp.recvspace: 65536 -> 65536
net.inet.tcp.sendspace: 32768 -> 65536
=== RUN   TestSendfile
--- PASS: TestSendfile (0.11s)
net.inet.tcp.recvspace: 65536 -> 32768
net.inet.tcp.sendspace: 65536 -> 32768
=== RUN   TestSendfile
--- PASS: TestSendfile (15.24s)
net.inet.tcp.recvspace: 32768 -> 16384
net.inet.tcp.sendspace: 32768 -> 16384
=== RUN   TestSendfile
--- PASS: TestSendfile (0.21s)
net.inet.tcp.recvspace: 16384 -> 8192
net.inet.tcp.sendspace: 16384 -> 8192
=== RUN   TestSendfile
--- PASS: TestSendfile (83.76s)
net.inet.tcp.recvspace: 8192 -> 4096
net.inet.tcp.sendspace: 8192 -> 4096
=== RUN   TestSendfile
--- PASS: TestSendfile (14.93s)
net.inet.tcp.recvspace: 4096 -> 2048
net.inet.tcp.sendspace: 4096 -> 2048
=== RUN   TestSendfile
--- PASS: TestSendfile (25.12s)
net.inet.tcp.recvspace: 2048 -> 1024
net.inet.tcp.sendspace: 2048 -> 1024
=== RUN   TestSendfile
--- PASS: TestSendfile (45.75s)

A quick glance at a system call trace suggests that the kevent system call sometimes takes a long time to return.

I'm not sure of the reason. Does sb{r,w}wakeup miss some important condition? Does tcp_input miscalculate some threshold for sb{r,w}wakeup? Does TCP over IP over the loopback interface do some special short-circuiting/fusing? Or is the runtime-integrated network poller not being invoked appropriately? (The last one seems unlikely.)

Note that I ran the test above in a VM on my 2009 laptop, so the results may vary depending on the environment.
