
Possible performance regression in 1.9 - recvfrom failing with EAGAIN even after all the data is read #168

Open
gopalakrishna opened this issue Jul 14, 2019 · 8 comments

@gopalakrishna

commented Jul 14, 2019

Output of haproxy -vv and uname -a

$ ./bin/haproxy -vv
HA-Proxy version 1.9.5 2019/03/19 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.1b 26 Feb 2019
Running on OpenSSL version : OpenSSL 1.1.1b 26 Feb 2019
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes

What's the configuration?

global
  stats timeout 2m
  maxconn 40
  cpu-map atts
  #tune.recv_enough 200
  daemon

defaults
  mode tcp
  log global
  option dontlognull
  option splice-auto
  retries 1
  timeout queue 1m
  timeout connect 10s
  timeout client 1m
  timeout server 1m
  timeout http-keep-alive 10s
  maxconn 30
  option tcplog

frontend main1
  bind <IP1>:44444
  use_backend test1

backend test1
  balance first
  server server1 <IP2>:55555

Steps to reproduce the behavior

  1. I was upgrading my system's haproxy from 1.5 to 1.9.5 and found a performance regression with the same config: TCP pass-through forwarding took more time than it did in 1.5.

Actual behavior

I have an application that pumps in messages and measures the time taken (a self-written ping-pong test). There was a regression in latency between 1.5 and 1.9.5.
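For reference, a minimal sketch of the kind of ping-pong test described (hypothetical code, not the actual tool; the address, iteration count, and message size are placeholders taken from this report, and it assumes the backend simply echoes each message back):

/* Hypothetical sketch of a TCP ping-pong latency test: send a fixed-size
 * message through the proxy, read the full echo back, time the round trip.
 * Compile with: gcc -o pingpong pingpong.c */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char msg[205] = {0};                      /* 205-byte message, as in the report */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(44444);             /* frontend port from the config */
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);   /* placeholder for <IP1> */

    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return 1;

    for (int i = 0; i < 10000; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg))  /* ping */
            break;
        size_t got = 0;
        while (got < sizeof(msg)) {           /* pong: read the echo back in full */
            ssize_t n = read(fd, msg + got, sizeof(msg) - got);
            if (n <= 0)
                return 1;
            got += (size_t)n;
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        long us = (t1.tv_sec - t0.tv_sec) * 1000000L
                + (t1.tv_nsec - t0.tv_nsec) / 1000L;
        printf("rtt %ld us\n", us);           /* per-message round-trip time */
    }
    close(fd);
    return 0;
}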

Expected behavior

There should be no performance regression.

Do you have any idea what may have caused this?

I ran strace under both 1.5 and 1.9.5 to understand what might be causing the issue. The difference is that in 1.9.5, recvfrom is called twice for each read even when the first recvfrom returns all the data (205 bytes is my message size), and the second call always fails with EAGAIN. Below are the traces for 1.5 and 1.9.5.

Strace for 1.5:

socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 5
fcntl(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(44444), sin_addr=inet_addr("<IP1>")}, 16) = 0
getsockopt(5, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
listen(5, 30) = 0
accept4(5, {sa_family=AF_INET, sin_port=htons(37999), sin_addr=inet_addr("<IP2>")}, [16], SOCK_NONBLOCK) = 7
setsockopt(7, SOL_TCP, TCP_NODELAY, [1], 4) = 0
accept4(5, 0x7ffe564edf50, 0x7ffe564edf4c, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(7, 0x55e0a5cc39f4, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(7, "<MSG1>"..., 8192, 0, NULL, NULL) = 98
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 8
fcntl(8, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(8, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(8, {sa_family=AF_INET, sin_port=htons(55555), sin_addr=inet_addr("<IP2>")}, 16) = -1 EINPROGRESS (Operation now in progress)
sendto(8, "<MSG1>"..., 98, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 98
recvfrom(8, 0x55e0a5cc7a84, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(8, "<MSG1>"..., 16384, 0, NULL, NULL) = 93
sendto(7, "<MSG1>"..., 93, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 93
recvfrom(7, "<MSG2>"..., 16384, 0, NULL, NULL) = 205
sendto(8, "<MSG2>"..., 205, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 205
recvfrom(7, "<MSG3>"..., 16384, 0, NULL, NULL) = 205
sendto(8, "<MSG3>"..., 205, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 205

Strace for 1.9.5:

getsockopt(6, SOL_SOCKET, SO_LINGER, {onoff=0, linger=0}, [8]) = 0
setsockopt(6, SOL_SOCKET, SO_LINGER, {onoff=0, linger=0}, 8) = 0
setsockopt(6, SOL_SOCKET, SO_REUSEPORT, [1], 4) = 0
setsockopt(6, SOL_TCP, TCP_DEFER_ACCEPT, [0], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(44444), sin_addr=inet_addr("<IP1>")}, 16) = 0
getsockopt(6, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
listen(6, 30) = 0
setsockopt(6, SOL_TCP, TCP_QUICKACK, [1], 4) = 0
accept4(6, {sa_family=AF_INET, sin_port=htons(36439), sin_addr=inet_addr("<IP2>")}, [16], SOCK_NONBLOCK) = 10
setsockopt(10, SOL_TCP, TCP_NODELAY, [1], 4) = 0
accept4(6, 0x7fffde07de50, 0x7fffde07de4c, SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(10, 0x192b630, 15360, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(10, "<MSG1>"..., 15360, 0, NULL, NULL) = 98
recvfrom(10, 0x1927682, 15262, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 11
fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
setsockopt(11, SOL_TCP, TCP_NODELAY, [1], 4) = 0
connect(11, {sa_family=AF_INET, sin_port=htons(55555), sin_addr=inet_addr("<IP2>")}, 16) = -1 EINPROGRESS (Operation now in progress)
sendto(11, "<MSG1>"..., 98, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 98
recvfrom(11, 0x192b630, 16384, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(11, "<MSG1>"..., 16384, 0, NULL, NULL) = 93
recvfrom(11, 0x192b68d, 16291, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendto(10, "<MSG1>"..., 93, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 93
recvfrom(10, "<MSG2>"..., 16384, 0, NULL, NULL) = 205
recvfrom(10, 0x192b6fd, 16179, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
sendto(11, "<MSG2>"..., 205, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 205
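In other words, the 1.9.5 trace looks like a receive loop that keeps calling recv() until the kernel reports EAGAIN. A minimal sketch of that pattern (illustration only, not haproxy's actual code):

/* Illustration only, not haproxy source: keep reading until the socket
 * reports EAGAIN. Even when the first recv() already returned the whole
 * pending message, one more call is issued and fails with EAGAIN; that is
 * the extra syscall visible in the 1.9.5 trace above. 1.5 instead stopped
 * after a short read, treating it as proof the socket was drained. */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t read_until_eagain(int fd, char *buf, size_t size)
{
    size_t total = 0;

    while (total < size) {
        ssize_t n = recv(fd, buf + total, size - total, 0);

        if (n > 0) {
            total += (size_t)n;  /* got data: loop again, e.g. to catch a pending close */
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;               /* drained for now: this is the "extra" syscall */
        break;                   /* n == 0 (peer closed) or a real error */
    }
    return (ssize_t)total;
}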

I tried setting tune.recv_enough to 200, which is less than the message size of 205, hoping that this would solve the issue. It turns out it did not.
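For reference, this is the form of what I tried in the global section (directive name written out in full; 200 is just below my 205-byte message size):

global
  tune.recv_enough 200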

Do you have an idea how to solve the issue?

I am not an expert in networking, so I have no idea how to solve this myself.

@wtarreau

commented Jul 15, 2019

I am not surprised at all, to be honest. In 1.5, all I/O operations were performed directly from the upper layers, where we know what strategy to apply depending on what is expected to be done with the data, e.g. "read as short as possible, it's for low-latency TCP forwarding". Now, with the layering necessary to support multiple protocols, we need to progressively extend the internal API and propagate that knowledge down the stack. There are plans to improve this for 2.1 and 2.2, and maybe some of those parts could later be backported if they prove trivial enough.

@wtarreau

commented Jul 15, 2019

I'm leaving it open as a bug even though it's not really one but rather a performance regression, and tagging it as "works as designed". This way we can keep track of it for future versions.

@wtarreau

commented Jul 15, 2019

Upon deeper code inspection, the reason for this is that we need to read one extra time to have a chance to detect the close that may come with a potential response. Sometimes the poller can report this (EPOLL_RDHUP), but the information is lost because this flag is not kept in the FD cache. You can actually emulate the older behaviour by building with DEFINE="-DMAX_READ_POLL_LOOPS=1", which will prevent the subsequent calls from being made, but on HTTP close traffic this results in a 10% performance loss.
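For example (illustrative, adjust to your own build), a rebuild along these lines, reusing the flags from the -vv output above, applies the define:

$ make TARGET=linux2628 CPU=generic USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 DEFINE="-DMAX_READ_POLL_LOOPS=1"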

@gopalakrishna
Author

commented Jul 15, 2019

Thanks for the detailed response. The make flag helped get the numbers back down to exactly where they were in 1.5.

My use case is a long-running TCP connection that needs to be tunneled, so this penalty is acceptable.


@gopalakrishna
Author

commented Jul 16, 2019

The performance definitely improved compared to the stock build. Now it is the same as in 1.5.

@lukastribus
Contributor

commented Aug 12, 2019

Also confirmed on discourse in:

https://discourse.haproxy.org/t/increased-latency-when-migrating-from-1-8-to-1-9-or-2-0/4066/7

Our service “normally” reports a p99 response time to a downstream service with 600ms. After the upgrade to HAProxy, this p99 latency went up to 1200ms (yes, 1.2 seconds!!). So, it appears that HAProxy 2.0.2 introduces extra latency of up to 600ms! The “average” processing time is measured as 55ms and with HAProxy 2.0.2 it went up to 68ms. So the average latency increase was 13ms!

So in certain configurations it appears to affect the latency negatively by 100%.

@wtarreau

commented Aug 13, 2019

As mentioned on discourse, we're already speaking about extremely high numbers and I'm wondering what this is really hiding. I suspect a polling issue or something. It's not possible that this single syscall is responsible for anything measurable, let alone numbers measured in milliseconds!
