Stale HAProxy configurations remain listening #71

Closed
ejether opened this issue Feb 16, 2016 · 19 comments

@ejether

ejether commented Feb 16, 2016

We are seeing an interesting issue with v1.0.1 of marathon-lb running in SSE mode.
On reconfiguration, the older HAProxy process occasionally remains listening, which leaves a stale configuration in service and causes the newer process to fail.

I have not been able to reliably reproduce it, but it has caused several issues for us recently. It seems to occur when a service is flapping, or when many deployments are occurring at once. I have a theory that there is a race condition between the moment the new process has successfully started and the moment the pidfile is read by the next starting process. (https://github.com/mesosphere/marathon-lb/blob/master/service/haproxy/run#L30)

I'm interested in any ideas anyone might have on how to reliably reproduce this, or how to guarantee it doesn't bite us anymore. We are not ready to upgrade to v1.1.0 because we don't currently have an up-front list of ports.
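
Editor's sketch: the pidfile race theorized above can be narrowed by reading the pidfile and starting the new process under one exclusive lock, so that two reloads never read the same old PID. The following is a minimal Python illustration under stated assumptions, not the logic of the marathon-lb `run` script. The function names and the lock path are hypothetical; the `-p`, `-f`, `-D` and `-sf` flags are the ones visible in the `ps` output in this thread.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def reload_lock(lock_path):
    """Hold an exclusive advisory lock so only one reload runs at a time."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks while another reload holds the lock
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def read_pids(pidfile):
    """Return the PIDs listed in the pidfile, or [] if it does not exist yet."""
    try:
        with open(pidfile) as f:
            return [int(tok) for tok in f.read().split()]
    except FileNotFoundError:
        return []


def build_reload_command(config_path, pidfile):
    """Build the haproxy argv for a soft reload; nothing is executed here."""
    cmd = ["haproxy", "-p", pidfile, "-f", config_path, "-D"]
    old_pids = read_pids(pidfile)
    if old_pids:
        # Ask the processes recorded in the pidfile to finish once we are up.
        cmd += ["-sf"] + [str(pid) for pid in old_pids]
    return cmd
```

A caller would wrap both the `build_reload_command` call and the actual process launch in `with reload_lock(...)`, releasing the lock only after the new process has rewritten the pidfile. Whether that fully closes the race depends on when HAProxy writes the pidfile relative to the launcher returning, which this sketch does not verify.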

@flosell
Contributor

flosell commented Feb 16, 2016

Sounds a lot like issues we've had in the past. HAProxy, when doing a soft-reload, was spawning a new process with the new config and kept the old process around to finish processing requests currently in progress. But it was also accepting new requests, so the old and new config were both staying alive. Updating HAProxy to the most recent version fixed that issue for us.

Along with that, at least on systemd, haproxy seems to have more issues reloading properly:

@ejether
Author

ejether commented Feb 16, 2016

We are using the marathon-lb docker container which is running HAProxy 1.5.8.
What version did you find fixed your problem @flosell?

@flosell
Contributor

flosell commented Feb 17, 2016

We are running HAProxy 1.5.14 right now.

@brndnmtthws
Contributor

When this occurs, can you check how many instances of HAProxy are running? As in, do a ps or pidof haproxy. Can you also check the output of docker ps? I think this may be similar to #72.
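
Editor's sketch: one way to collect the count requested above is to parse `ps aux` text and list only the processes whose executable is `haproxy`, then compare them with the PIDs in the pidfile. This is a hypothetical Python helper written for illustration, assuming the standard 11-column `ps aux` layout; the function names are not part of marathon-lb.

```python
import os


def haproxy_processes(ps_aux_text):
    """Return (pid, start, command) for every process whose argv[0] is haproxy."""
    found = []
    for line in ps_aux_text.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something that is not a process row
        command = fields[10]
        if os.path.basename(command.split()[0]) != "haproxy":
            continue  # skips runsv and marathon_lb.py, which only mention haproxy
        found.append((int(fields[1]), fields[8], command))
    return found


def not_in_pidfile(processes, pidfile_pids):
    """Return PIDs of haproxy processes that the pidfile no longer lists."""
    current = set(pidfile_pids)
    return [pid for pid, _start, _command in processes if pid not in current]
```

The text should come from inside the container (for example `docker exec marathon-lb ps aux`), because the host sees the same processes under different PIDs than the ones stored in `/tmp/haproxy.pid`.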

@ejether
Author

ejether commented Feb 18, 2016

I will do that, @brndnmtthws.
I don't know for sure whether the problem is occurring on this node (it's hard to find until something in particular starts throwing errors), but this is typical output:

ejetherington@mesos-master-5vzd:~$ docker ps
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS               NAMES
65c0b9df1fb6        mesosphere/marathon-lb:v1.0.1   "/marathon-lb/run sse"   7 days ago          Up 7 days                               marathon-lb
ejetherington@mesos-master-5vzd:~$ docker exec marathon-lb ps  aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  20052  2988 ?        Ss   Feb10   0:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
root        12  0.0  0.0   4092   700 ?        S    Feb10   0:00 /usr/bin/runsv /marathon-lb/service/haproxy
root        13  0.7  0.1  65372 21924 ?        S    Feb10  86:42 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
root        14  0.0  0.0  21084  3936 ?        S    Feb10   4:17 /bin/bash ./run
root        27  0.0  0.0  26756  6452 ?        Ss   Feb10   7:06 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf
root      5782  0.1  0.0  26000  5708 ?        Ss   Feb12  11:53 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 5770
root     13859  0.0  0.0  26000  5760 ?        Ss   Feb14   1:42 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 13365
root     17276  0.0  0.0  26080  5840 ?        Ss   Feb11   0:58 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 17264
root     17915  0.0  0.0  26080  4572 ?        Ss   15:02   0:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 17905
root     20407  0.0  0.0   4228   680 ?        S    15:43   0:00 sleep 1
root     20408  0.0  0.0  17492  2120 ?        Rs   15:43   0:00 ps aux
root     22709  0.0  0.0  26348  5992 ?        Ss   Feb17   0:42 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 9208
ejetherington@mesos-master-5vzd:~$ ps aux | grep haproxy
root      1105  0.0  0.0   4092   700 ?        S    Feb10   0:00 /usr/bin/runsv /marathon-lb/service/haproxy
root      1106  0.7  0.1  65372 21924 ?        S    Feb10  86:42 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
root      1122  0.0  0.0  26756  6452 ?        Ss   Feb10   7:06 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf
root      8271  0.0  0.0  26080  5840 ?        Ss   Feb11   0:58 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 17264
root     15134  0.0  0.0  26000  5760 ?        Ss   Feb14   1:42 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 13365
root     18360  0.0  0.0  26348  5992 ?        Ss   Feb17   0:42 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 9208
root     21907  0.0  0.0  26080  4572 ?        Ss   15:02   0:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 17905
ejether+ 26252  0.0  0.0  10472  2116 pts/7    S+   15:43   0:00 grep --color=auto haproxy
root     32450  0.1  0.0  26000  5708 ?        Ss   Feb12  11:53 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 5770
ejetherington@mesos-master-5vzd:~$

@brndnmtthws
Contributor

Interesting. You definitely have some extra haproxy instances there. I wonder if the use of flock isn't working as one might expect. Do you have any long-lived TCP connections going through haproxy?

Any chance you can test with the current master code? I made a few changes to how the reloads are handled, and I haven't seen the same behaviour in my recent testing.

@ejether
Author

ejether commented Feb 18, 2016

I think it will help, and it's on my roadmap, but we currently don't have a nice way of getting the information required for the $PORTS variable in an automated fashion, so it will take a bit of time. I'll try to get it wrapped up next week and report back. Thanks for looking into it.

@brndnmtthws
Contributor

It would be sufficient to specify some subset of ports (or even just one port, for that matter) to test. At the very least, it wouldn't be any worse than what you're using now.

@ejether
Author

ejether commented Feb 18, 2016

Ok, I was under the impression (only because I didn't investigate too deeply) that it wouldn't work if I didn't have all the $PORTS configured. That makes it a lot easier to test. Thanks

@brndnmtthws
Contributor

The only limitation is that reloads will not be completely 'zero-downtime' unless you supply the ports ahead of time.

@ejether
Author

ejether commented Feb 23, 2016

I have upgraded the marathon-lb Docker image to the 1.1.1 tag in our development environment. I haven't noticed the problem so far, but as it was fairly unpredictable to begin with, it may not come up for some time. I'll roll with this for a while and hope it keeps working. Here is the same output as before, for comparison:

ejetherington@sandbox-mesos-slave-4h1m:~$ docker ps | grep marathon-lb
81edbad16176        mesosphere/marathon-lb:v1.1.1         "/marathon-lb/run sse"   15 hours ago        Up 15 hours                                                            marathon-lb
ejetherington@sandbox-mesos-slave-4h1m:~$ docker exec marathon-lb ps  aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  20064  2956 ?        Ss   00:34   0:00 /bin/bash /marathon-lb/run sse --marathon http://sandbox-mesos-master-1:8080 --dont-bind-http-https --group *
root        19  0.0  0.0   4100   648 ?        S    00:34   0:00 /usr/bin/runsv /marathon-lb/service/haproxy
root        20  1.5  4.0 120023316 1259348 ?   Sl   00:34  14:22 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://sandbox-mesos-master-1:8080 --dont-bind-http-https --group *
root        21  0.1  0.0  20916  3756 ?        S    00:34   1:22 /bin/bash ./run
root        54  0.0  0.0  28972  5500 ?        Ss   00:34   0:37 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf
root       999  0.0  0.0  29124  5636 ?        Ss   00:40   0:17 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 918
root      1075  0.0  0.0  29644  6164 ?        Ss   00:40   0:20 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 1036
root      3540  0.1  0.0  29248  5616 ?        Ss   00:50   1:02 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 3507
root     17900  0.0  0.0   4236   664 ?        S    16:04   0:00 sleep 0.5
root     17902  0.0  0.0  17500  1976 ?        Rs   16:04   0:00 ps aux
ejetherington@sandbox-mesos-slave-4h1m:~$ ps -aux | grep haproxy
root     12957  0.0  0.0  20864   752 ?        Ss   Feb18   1:20 /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D -sf 34
ejether+ 21382  0.0  0.0  10472  2116 pts/0    S+   16:04   0:00 grep --color=auto haproxy
root     22282  0.0  0.0   4100   648 ?        S    00:34   0:00 /usr/bin/runsv /marathon-lb/service/haproxy
root     22283  1.5  4.0 120023316 1259348 ?   Sl   00:34  14:22 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://sandbox-mesos-master-1:8080 --dont-bind-http-https --group *
root     22320  0.0  0.0  28972  5500 ?        Ss   00:34   0:37 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf
root     24338  0.0  0.0  29124  5636 ?        Ss   00:40   0:17 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 918
root     24418  0.0  0.0  29644  6164 ?        Ss   00:40   0:20 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 1036
root     28406  0.0  0.0  19656   752 ?        Ss   Feb19   1:09 haproxy -f /run/haproxy.cfg -p /run/haproxy.pid -D
root     29659  0.1  0.0  29248  5616 ?        Ss   00:50   1:02 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 3507
ejetherington@sandbox-mesos-slave-4h1m:~$

@brndnmtthws
Contributor

Cool, that looks better. I'm going to close the issue for now, but please reopen it if it comes up again.

@ejether
Author

ejether commented Feb 24, 2016

We had an issue with this in production last night. I wasn't the one troubleshooting it, so my information is limited, but there were many haproxy instances running.

On the host:

suhas@mesos-slave-ch9u:~$ ps -ef | grep haproxy | grep marathon-lb
root      5358 12951  1 01:31 ?        00:02:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 8664
root      9005 12951  1 03:56 ?        00:00:44 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 17417
root     12973 12951  0 Feb23 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
root     12974 12951  2 Feb23 ?        00:07:53 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
root     12977 12951  0 Feb23 ?        00:00:49 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg
root     14274 12951  0 Feb23 ?        00:00:23 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 21
root     16243 12951  1 03:04 ?        00:01:03 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 8681
root     21166 12951  0 Feb23 ?        00:00:06 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 240
root     23608 12951  0 Feb23 ?        00:03:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 1286
root     28843 12951  1 04:34 ?        00:00:04 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -sf 19749

@ejether
Author

ejether commented Feb 24, 2016

Never mind, I hadn't upgraded our production environment yet. My mistake.

@ejether
Author

ejether commented Feb 25, 2016

Well, we are still having the issue even after upgrading in production.
Some output from ps -ef in the Docker containers:

mesos-slave-jtr6:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  7 16:59 ?        00:12:05 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:10 /bin/bash ./run
    root        35     1  0 16:59 ?        00:00:28 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 15526
    root     17444     1  0 18:43 ?        00:00:09 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 17406
    root     19002     1  0 18:52 ?        00:00:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 18970
    root     19959     1  1 18:56 ?        00:00:39 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 19911
    root     22529     1  0 19:13 ?        00:00:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 19959
    root     22579     1  0 19:13 ?        00:00:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22529
    root     22696     1  0 19:13 ?        00:00:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22634
    root     23173     1  3 19:16 ?        00:00:39 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22964
    root     25663     0  7 19:32 ?        00:00:00 ps -ef
mesos-slave-ch9u:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  8 16:59 ?        00:12:27 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:09 /bin/bash ./run
    root        35     1  1 16:59 ?        00:01:37 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 15766
    root      1828     1  0 17:08 ?        00:00:03 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 1272
    root      2134     1  0 17:09 ?        00:00:37 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 2098
    root      5367     1  0 17:28 ?        00:00:06 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 5314
    root     11608     1  1 18:06 ?        00:01:11 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 9963
    root     22644     1  0 19:13 ?        00:00:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22591
    root     22706     1  0 19:13 ?        00:00:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22644
    root     22808     1  0 19:13 ?        00:00:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22742
    root     22974     1  0 19:14 ?        00:00:03 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22910
    root     23179     1  3 19:15 ?        00:00:34 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22974
    root     25675    20  0 19:32 ?        00:00:00 sleep 0.5
    root     25676     0  0 19:32 ?        00:00:00 ps -ef
mesos-slave-j5q6:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  9 16:59 ?        00:14:11 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:10 /bin/bash ./run
    root     11597     1  1 18:06 ?        00:01:43 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 9957
    root     19976     1  1 18:56 ?        00:00:35 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 19929
    root     23236     1  3 19:16 ?        00:00:37 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22993
    root     25688    20  0 19:32 ?        00:00:00 sleep 0.5
    root     25689     0  0 19:32 ?        00:00:00 ps -ef
mesos-slave-12r1:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 19:31 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 19:31 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  5 19:31 ?        00:00:05 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 19:31 ?        00:00:00 /bin/bash ./run
    root        51     1  4 19:31 ?        00:00:04 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf
    root       336    20  0 19:32 ?        00:00:00 sleep 0.5
    root       337     0  0 19:32 ?        00:00:00 ps -ef
mesos-slave-bsww:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  9 16:59 ?        00:14:28 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:11 /bin/bash ./run
    root        35     1  1 16:59 ?        00:02:11 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 15713
    root     17424     1  2 18:43 ?        00:01:03 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 17384
    root     18855     1  0 18:52 ?        00:00:02 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 18805
    root     22750     1  0 19:13 ?        00:00:02 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22709
    root     22816     1  0 19:13 ?        00:00:00 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22750
    root     22980     1  0 19:14 ?        00:00:05 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22914
    root     23233     1  4 19:16 ?        00:00:49 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22980
    root     25673    20  0 19:32 ?        00:00:00 sleep 0.5
    root     25674     0  0 19:32 ?        00:00:00 ps -ef
mesos-slave-5o6w:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  8 16:59 ?        00:13:44 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:11 /bin/bash ./run
    root     11583     1  2 18:06 ?        00:01:45 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 9948
    root     17282     1  0 18:43 ?        00:00:08 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 11583
    root     17315     1  0 18:43 ?        00:00:04 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 17282
    root     17344     1  0 18:43 ?        00:00:04 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 17315
    root     22826     1  0 19:14 ?        00:00:01 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22788
    root     22886     1  0 19:14 ?        00:00:02 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22826
    root     22954     1  0 19:14 ?        00:00:04 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22886
    root     23185     1  4 19:16 ?        00:00:40 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22954
    root     25649     0  8 19:32 ?        00:00:00 ps -ef
    root     25659    20  0 19:32 ?        00:00:00 sleep 0.5
mesos-slave-pj18:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  8 16:59 ?        00:13:12 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:12 /bin/bash ./run
    root        34     1  0 16:59 ?        00:00:36 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 15462
    root     17430     1  0 18:43 ?        00:00:11 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 17395
    root     23275     1  3 19:16 ?        00:00:39 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 23023
    root     25712    20  0 19:32 ?        00:00:00 sleep 0.5
    root     25713     0  0 19:32 ?        00:00:00 ps -ef
mesos-slave-4t3b:
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 16:59 ?        00:00:00 /bin/bash /marathon-lb/run sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        18     1  0 16:59 ?        00:00:00 /usr/bin/runsv /marathon-lb/service/haproxy
    root        19     1  9 16:59 ?        00:15:06 python3 /marathon-lb/marathon_lb.py --syslog-socket /dev/null --haproxy-config /marathon-lb/haproxy.cfg -c sv reload /marathon-lb/service/haproxy --sse --marathon http://mesos-master-hzfk:8080 --marathon http://mesos-master-5vzd:8080 --marathon http://mesos-master-4fb0:8080 --dont-bind-http-https --group *
    root        20    18  0 16:59 ?        00:00:13 /bin/bash ./run
    root        34     1  0 16:59 ?        00:00:29 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 15336
    root     17357     1  0 18:43 ?        00:00:11 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 17330
    root     19965     1  1 18:56 ?        00:00:41 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 19915
    root     23209     1  4 19:16 ?        00:00:41 haproxy -p /tmp/haproxy.pid -f /marathon-lb/haproxy.cfg -D -sf 22954
    root     25642    20  0 19:32 ?        00:00:00 sleep 0.5
    root     25643     0  0 19:32 ?        00:00:00 ps -ef

@brndnmtthws
Contributor

Strange. Are you sure there are no long-lived TCP connections? Are you using anything like websockets?

@ejether
Author

ejether commented Feb 26, 2016

I know that there are long-running connections, and yes, that is probably the real source of our problem. Today I'll be working to stop those long-running connections from being proxied through marathon-lb. In general, do you have any suggestions for proxying long-running connections with Mesos/Marathon?
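
Editor's sketch: in the `ps -ef` listings earlier in this thread, some live haproxy PIDs also appear in the `-sf` list of another live haproxy process (for example 11583 under mesos-slave-5o6w). A process in that state was asked to finish and has not exited, which is consistent with it still serving long-lived connections, although the listing alone does not prove that. The hypothetical Python helper below picks out such PIDs from `ps -ef` text, assuming the standard 8-column layout; it is an illustration, not part of marathon-lb.

```python
def signalled_but_alive(ps_ef_text):
    """Return sorted PIDs of live haproxy processes named in another's -sf list."""
    live = {}  # pid -> set of PIDs that this process passed to -sf
    for line in ps_ef_text.splitlines():
        # UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line, host-name line, or blank line
        argv = fields[7].split()
        if argv[0].rsplit("/", 1)[-1] != "haproxy":
            continue
        targets = set()
        if "-sf" in argv:
            # -sf takes a list of PIDs; stop at the first non-numeric token.
            for tok in argv[argv.index("-sf") + 1:]:
                if not tok.isdigit():
                    break
                targets.add(int(tok))
        live[int(fields[1])] = targets
    asked_to_finish = set().union(*live.values())
    return sorted(pid for pid in live if pid in asked_to_finish)
```

Processes whose `-sf` target no longer exists are not reported by this check; those need the pidfile comparison instead.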

@malterb

malterb commented Mar 16, 2016

I am seeing similar issues at QubitProducts/bamboo#200

Have you found a solution to stale processes? There should not be any long-running connections in our setup.

@ejether
Author

ejether commented Mar 16, 2016

We upgraded Marathon and marathon-lb and removed long-running connections from Marathon and marathon-lb. Since then, the number of problems related to stale connections has dropped to almost zero. Once in a while, we still get some 502 errors from haproxy. Usually this coincides with a flapping service in Marathon.

I think if you can slow down the rate of reconfigurations you are likely to avoid issues. Good luck!
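
Editor's sketch: slowing down reconfigurations, as suggested above, can be done by allowing at most one reload per interval and deferring any changes that arrive in between. The class below is a minimal Python illustration; `ReloadThrottle`, its method names and the interval are hypothetical and are not marathon-lb options.

```python
import time


class ReloadThrottle:
    """Allow at most one reload per min_interval seconds; remember skipped ones."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_reload = None
        self.pending = False

    def _ready(self):
        return (self.last_reload is None
                or self.clock() - self.last_reload >= self.min_interval)

    def request(self):
        """Call when the config changes. True means: reload now."""
        if self._ready():
            self.last_reload = self.clock()
            self.pending = False
            return True
        self.pending = True  # too soon; defer until poll() says it is due
        return False

    def poll(self):
        """Call periodically. True means: a deferred reload is now due."""
        if self.pending and self._ready():
            self.last_reload = self.clock()
            self.pending = False
            return True
        return False
```

With a flapping service, many `request()` calls inside one interval collapse into a single deferred reload, which always picks up the latest configuration because the config file is regenerated before each reload. The trade-off is that backends can stay stale for up to `min_interval` seconds.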

brndnmtthws added a commit that referenced this issue Sep 26, 2016
This is to address issues #5, #71, #267, #276, and #318.
brndnmtthws added a commit that referenced this issue Sep 29, 2016
This is to address issues #5, #71, #267, #276, and #318.