Uneven Load Distribution Across Envoy Pods in Kubernetes During Stress Testing #33024

jaosn60810 · 2024-03-21T07:36:19Z

Questions:

What could be the possible reasons for this uneven load distribution across the Envoy pods specifically during stress testing with K6?
How should we adjust our configuration or setup to ensure even load distribution during high traffic conditions prompted by stress testing?

We have set up a Kubernetes (k8s) environment with 2 pods that communicate via gRPC. For load balancing, we've deployed Envoy with 2 replicas, aiming to distribute the traffic evenly between these pods, following an architecture similar to the one described in a GCP article [1].

Everything works as expected under normal conditions. However, when we run stress tests with K6 and monitor resource usage, only one of the Envoy pods shows significant resource consumption. The other Envoy pod shows minimal resource usage, almost as if it is not handling traffic at all. This uneven load distribution occurs only during K6 stress testing and is not observed under normal operating conditions.

This uneven load distribution is unexpected and seems to undermine our load-balancing setup. We have attached our Envoy configuration file below for reference.

Relevant Details:

Envoy version: latest
Configuration file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-conf
data:
  envoy.yaml: |
    admin:
      access_log_path: /dev/null
      address:
        socket_address: 
          protocol: TCP
          address: 0.0.0.0
          port_value: 8090

    static_resources:
      listeners:
      - name: login_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5000
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
              stat_prefix: ingress_http              
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: login_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: notification_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5001
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog              
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: notification_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: recharge_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5002
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: recharge_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: payment_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5003
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog              
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: payment_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: check
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5100
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              # access_log:
              # - name: envoy.access_loggers.stdout
              #   typed_config:
              #     "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog              
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

      clusters:
      - name: login_server_cluster
        connect_timeout: 0.5s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: login_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-login-server-service
                    port_value: 5000
      - name: notification_server_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: notification_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-notification-server-service
                    port_value: 5001
      - name: recharge_server_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: recharge_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-recharge-server-service
                    port_value: 5002
      - name: payment_server_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: payment_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-payment-server-service
                    port_value: 5003

Stress testing tool: K6

Thank you for your assistance in resolving this issue.

htuch · 2024-03-22T04:50:47Z

I think the issue here is likely less about Envoy behavior and more about the setup of the pass-through network LB on GKE / k8s.
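
A sketch of one possible mitigation, under an assumption that this thread does not confirm: the load generator holds a small number of long-lived HTTP/2 connections, and a connection-level pass-through LB pins each of them to a single Envoy pod. In that case, capping connection lifetime and requests per connection on the listener would make clients reconnect periodically, which gives the LB more chances to spread connections across pods. The fragment below reuses the `login_server` listener's HttpConnectionManager from the config above; the two limit values are placeholders, not recommendations.

```yaml
# Fragment of an HttpConnectionManager typed_config (e.g. the login_server listener).
# Assumption: the skew comes from long-lived client connections; values are illustrative.
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
  stat_prefix: ingress_http
  common_http_protocol_options:
    # Drain and close each downstream connection once it has been open this long.
    max_connection_duration: 60s
    # Close a connection after roughly this many requests; the limit is
    # approximate for HTTP/2 because streams are processed concurrently.
    max_requests_per_connection: 1000
  # route_config and http_filters stay as in the original listener.
```

Whether this helps depends on the client reconnecting cleanly after a GOAWAY and on how the LB in front of the Envoy pods selects a backend for each new connection, so it is worth comparing per-pod connection counts before and after.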

github-actions · 2024-04-21T08:01:08Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions · 2024-04-28T08:01:27Z

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

jaosn60810 added the "triage" label (Issue requires triage) on Mar 21, 2024

htuch added the "question" (Questions that are neither investigations, bugs, nor enhancements) and "area/perf" labels and removed the "triage" label on Mar 22, 2024

github-actions bot added the "stale" label (stalebot believes this issue/PR has not been touched recently) on Apr 21, 2024

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Apr 28, 2024
