Envoy Crashes at 300+ req/s using hot-restarter and systemd #2685

Closed
phanama opened this issue Feb 28, 2018 · 4 comments
@phanama

phanama commented Feb 28, 2018

Description:
I created a systemd service to call hot-restarter.py to start envoy. Load tested it with hundreds of RPS.
Envoy crashed.

Envoy version: 1.5.0

Repro steps:
Enable systemd
Start envoy.service using systemd
Give it loads of traffic

My envoy.service:

[Unit]
Description=Envoy meeeen
After=network.target
[Service]
User=root
Type=simple
ExecStart=/etc/envoy/hot-restarter.py /etc/envoy/start-envoy.sh
ExecStartPre=/etc/envoy/check_envoy.sh
ExecReload=/etc/envoy/reload_envoy.sh $MAINPID
ExecStop=/bin/kill -15 $MAINPID
TimeoutStopSec=10
KillMode=process
[Install]
WantedBy=multi-user.target

start-envoy.sh:

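#!/bin/bash
set -e
# Validate the config first; with `set -e` a failed validation aborts the script here,
# so no separate exit-status check is needed.
/usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate --base-id 6969
# Replace the shell with Envoy; hot-restarter.py exports RESTART_EPOCH.
exec /usr/sbin/envoy -c /etc/envoy/config.yaml --restart-epoch "$RESTART_EPOCH"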

check_envoy.sh:

#!/bin/bash
set -e
if [ -s /etc/envoy/config.yaml ]; then
/usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate;
else
echo "File /etc/envoy/config.yaml is empty!"
exit 1;
fi

reload_envoy.sh:

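#!/bin/bash
set -e
# PID of hot-restarter.py, passed by systemd as the first argument.
MAIN_PID=$1
/usr/sbin/envoy -c /etc/envoy/config.yaml --mode validate --base-id 6969
# SIGHUP asks hot-restarter.py to hot restart Envoy.
kill -HUP "$MAIN_PID"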

Config:
envoy.yaml:

static_resources:
  listeners:
  - address: #http-address
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: AUTO
          stat_prefix: ingress_http
          access_log:
          - name: envoy.file_access_log
            config:
              path: /var/log/envoy/http-access.log
          http_filters:
          - name: envoy.router
          route_config:
            virtual_hosts: #http-hosts
            - name: redirect-https
              require_tls: all
              domains:
              - example.com
            - name: example
              domains:
              - example.com
              routes:
              - match:
                  prefix: ""
                route:
                  cluster: example
  - address: #https-address
      socket_address:
        address: 0.0.0.0
        port_value: 443
    filter_chains:
    - filters:
      - name: envoy.http_connection_manager
        config:
          codec_type: AUTO
          stat_prefix: ingress_http
          access_log:
          - name: envoy.file_access_log
            config:
              path: /var/log/envoy/http-access.log
          http_filters:
          - name: envoy.router
          route_config:
            virtual_hosts: #https-hosts
            - name: example
              domains:
              - example.com
              routes:
              - match:
                  prefix: ""
                route:
                  cluster: example
  clusters:
  - name: example
    type: STRICT_DNS
    connect_timeout:
      seconds: 60
      nanos: 0
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: example-backend.com
        port_value: 80
admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8001

Call Stack: (syslog)

Feb 28 07:53:18 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:53:18.281][6748][info][config] source/server/listener_manager_impl.cc:482] all dependencies initialized. starting workers
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.610][6751][critical][assert] source/common/network/address_impl.cc:112] assert failure: fd != -1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.611][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:101] Caught Aborted, suspect faulting address 0x1a5c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.611][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:85] Backtrace obj</lib/x86_64-linux-gnu/libc.so.6> thr<6751> (use tools/stack_decode.py):
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #0 0x7fbbd94a2428
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #1 0x7fbbd94a4029
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj</usr/sbin/envoy>
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #2 0x9acf51
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.612][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #3 0x9ad503
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #4 0x6fec76
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #5 0x5f1e6c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #6 0x691149
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #7 0x690f50
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #8 0x684d42
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.613][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #9 0x68320c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #10 0x683534
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #11 0x8898a4
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #12 0x885fbb
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #13 0x77cac1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #14 0x77c225
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.614][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #15 0x7a58fd
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #16 0x7a3908
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #17 0x7a392c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #18 0x7aeae7
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #19 0x7a43e0
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #20 0x7a42ca
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #21 0x779a66
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #22 0x703589
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #23 0x703605
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #24 0x6fd5a9
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #25 0x6fe15f
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #26 0x6fdf2a
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.615][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #27 0x6fbf28
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #28 0x6ff269
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #29 0x5f84ed
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #30 0x5f74fd
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #31 0x5f752d
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #32 0xa344d1
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #33 0xa34c2e
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #34 0x5f28c7
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #35 0x5e5007
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #36 0x5e4b97
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #37 0x5e56e6
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #38 0x4a1d31
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #39 0xa3eb9f
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #40 0xa3ebc4
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj</lib/x86_64-linux-gnu/libpthread.so.0>
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #41 0x7fbbd9b476b9
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:93] thr<6751> obj</lib/x86_64-linux-gnu/libc.so.6>
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:95] thr<6751> #42 0x7fbbd957441c
Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.616][6751][critical][backtrace] bazel-out/local-fastbuild/bin/source/server/_virtual_includes/backtrace_lib/server/backtrace.h:97] end backtrace thread 6751

@dio
Member

dio commented Feb 28, 2018

Feb 28 07:55:33 envoy-machine hot-restarter.py[6744]: [2018-02-28 07:55:33.610][6751][critical][assert] source/common/network/address_impl.cc:112] assert failure: fd != -1

Seems it is related to your ulimit settings?
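One way to test the ulimit hypothesis is to watch how many file descriptors the process holds while load is applied. The assert `fd != -1` suggests that a call expected to return a descriptor failed, which is consistent with (but does not prove) descriptor exhaustion. The sketch below is Linux-only and assumes `/proc` is mounted; the function name `open_fd_count` is made up for illustration.

```shell
# Count the file descriptors a process currently has open (Linux /proc only).
# Reading another user's /proc/<pid>/fd requires root or the same user.
open_fd_count() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}

# Demo: descriptors held by the current shell.
open_fd_count "$$"
```

Running it repeatedly against the Envoy PID during the load test would show whether the count climbs toward the process limit before the crash.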

@phanama
Author

phanama commented Feb 28, 2018

ulimit -n
65536

That's the ulimit setting.
Is there any requirement from envoy to bump it up?
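Note that `ulimit -n` reports the limit of the shell it is run in, which is not necessarily the limit a systemd-started process runs with. A minimal sketch for reading the limit of a specific running process on Linux follows; the function name `soft_nofile_limit` is hypothetical.

```shell
# Print the soft "Max open files" limit of a running process (Linux /proc only).
# The fourth whitespace-separated field on that line is the soft limit.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Demo: the current shell's limit, which should match `ulimit -n`.
soft_nofile_limit "$$"

# Against the service (assumes the unit is named envoy.service):
#   pid="$(systemctl show -p MainPID envoy.service | cut -d= -f2)"
#   soft_nofile_limit "$pid"
```

The main PID here is `hot-restarter.py` rather than Envoy itself, but child processes inherit resource limits from their parent, so the value should be the same for the Envoy process it spawns.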

@mattklein123
Member

@yudiandreanp can you provide a core dump or a fully resolved stack trace if you can repro this? It's hard to tell what is happening from the report.

@phanama
Author

phanama commented Feb 28, 2018

It turned out that it really is an open file limit problem.
Systemd doesn't respect the global ulimit config in /etc/security/limits.conf and has its own defaults.

I have to add

LimitNOFILE=65536

in the systemd [Service] section to bump its limit up

That resolved the problem. Thanks!
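For anyone who would rather not edit the unit file directly, the same setting can go in a systemd drop-in. This is a sketch, not the exact change made above: the helper name `write_nofile_dropin` and the file name `limits.conf` are made up for illustration, and the demo writes to a scratch directory instead of `/etc`.

```shell
# Write a systemd drop-in that sets the open-file limit for one unit.
#   $1: drop-in directory, e.g. /etc/systemd/system/envoy.service.d
#   $2: limit value, e.g. 65536
write_nofile_dropin() {
    mkdir -p "$1" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$2" > "$1/limits.conf"
}

# Demo against a temporary directory so nothing on the system is touched.
demo_dir="$(mktemp -d)"
write_nofile_dropin "$demo_dir/envoy.service.d" 65536
cat "$demo_dir/envoy.service.d/limits.conf"

# On a real host (as root), after writing to /etc/systemd/system/envoy.service.d:
#   systemctl daemon-reload
#   systemctl restart envoy.service
#   systemctl show envoy.service -p LimitNOFILE
```

The last command in the comments prints the limit systemd will apply to the unit, which is a quick way to confirm the setting was picked up.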

@phanama phanama closed this as completed Feb 28, 2018