Uneven Load Distribution Across Envoy Pods in Kubernetes During Stress Testing #33024

Closed
jaosn60810 opened this issue Mar 21, 2024 · 3 comments
Labels: area/perf, question, stale

jaosn60810 commented Mar 21, 2024

Questions:

  1. What could be the possible reasons for this uneven load distribution across the Envoy pods specifically during stress testing with K6?
  2. How should we adjust our configuration or setup to ensure even load distribution during high traffic conditions prompted by stress testing?

We have set up a Kubernetes (k8s) environment with 2 pods that communicate via gRPC. For load balancing, we've deployed Envoy with 2 replicas, aiming to distribute the traffic evenly between these pods, following an architecture similar to the one described in a GCP article [1].

Everything works as expected under normal conditions. However, when we conduct stress tests using K6 and monitor resource usage, we observe that only one of the Envoy pods shows significant resource consumption. The other Envoy pod shows minimal resource usage, almost as if it's not handling traffic at all. Note that this uneven load distribution occurs only during K6 stress testing and is not observed under normal operating conditions.

This uneven load distribution is unexpected and seems to undermine our load-balancing setup. We have attached our Envoy configuration file below for reference.

Relevant Details:

  • Envoy version: latest
  • Configuration file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-conf
data:
  envoy.yaml:  |
    admin:
      access_log_path: /dev/null
      address:
        socket_address: 
          protocol: TCP
          address: 0.0.0.0
          port_value: 8090

    static_resources:
      listeners:
      - name: login_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5000
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
              stat_prefix: ingress_http              
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: login_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: notification_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5001
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog              
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: notification_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: recharge_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5002
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: recharge_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: payment_server
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5003
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog              
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
                  - match: { prefix: "/" }
                    route: { cluster: payment_server_cluster }
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      - name: check
        address:
          socket_address: 
            protocol: TCP
            address: 0.0.0.0
            port_value: 5100
        filter_chains:
        - filters:
          - name: envoy.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              # access_log:
              # - name: envoy.access_loggers.stdout
              #   typed_config:
              #     "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog              
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: default # ingress check
                  domains: ["*"]
                  routes:
                  - match: {path: "/healthz"}
                    direct_response:
                      status: 200
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

      clusters:
      - name: login_server_cluster
        connect_timeout: 0.5s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: login_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-login-server-service
                    port_value: 5000
      - name: notification_server_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: notification_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-notification-server-service
                    port_value: 5001
      - name: recharge_server_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: recharge_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-recharge-server-service
                    port_value: 5002
      - name: payment_server_cluster
        connect_timeout: 0.25s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        dns_lookup_family: V4_ONLY
        load_assignment:
          cluster_name: payment_server_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: envoy-payment-server-service
                    port_value: 5003
  • Stress testing tool: K6
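
For context on the `STRICT_DNS` clusters above: with that cluster type, Envoy resolves the configured name and treats every returned IP address as a separate host, so `ROUND_ROBIN` across backend pods only happens if the Service name resolves to the pod IPs. The issue does not show the Service manifests, so the following is only a sketch of what a headless Service for `envoy-login-server-service` might look like; the `app: login-server` selector and the `grpc` port name are hypothetical.

```yaml
# Sketch only: assumes the backend Service is headless, which this issue does not show.
apiVersion: v1
kind: Service
metadata:
  name: envoy-login-server-service   # name taken from the cluster address in the config
spec:
  clusterIP: None                    # headless: DNS returns pod IPs instead of one virtual IP
  selector:
    app: login-server                # hypothetical label, not from the issue
  ports:
  - name: grpc                       # hypothetical port name
    port: 5000                       # matches port_value in login_server_cluster
    targetPort: 5000
```

This concerns balancing from Envoy to the backend pods. It does not by itself explain why one Envoy pod receives most of the traffic, which depends on whatever sits in front of the Envoy replicas.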

Thank you for your assistance in resolving this issue.

jaosn60810 added the triage label on Mar 21, 2024

htuch (Member) commented Mar 22, 2024

I think the issue here is likely less about Envoy behavior and more related to setup of the pass-thru network LB on GKE / k8s.
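
One way to read this, offered as an assumption and not something confirmed in the thread: a pass-through (L4) load balancer picks an Envoy pod per connection, not per request. If the K6 run sends its traffic over a small number of long-lived HTTP/2 (gRPC) connections, most requests can land on a single Envoy pod no matter how the Envoy clusters are configured. A possible mitigation on the Envoy side is to bound the lifetime of downstream connections so clients reconnect and get balanced again. The fragment below is a sketch of that idea using `common_http_protocol_options` on the HTTP connection manager; the values `60s` and `1000` are illustrative only.

```yaml
# Sketch only: fragment of an HttpConnectionManager typed_config, not a full listener.
# The limits are illustrative values, not taken from this thread.
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
  stat_prefix: ingress_http
  common_http_protocol_options:
    # Drain a downstream connection after this long (GOAWAY for HTTP/2), so the
    # client reconnects and the L4 load balancer chooses an Envoy pod again.
    max_connection_duration: 60s
    # Optionally also cap the number of requests served per connection.
    # For HTTP/2 the limit is approximate because streams are concurrent.
    max_requests_per_connection: 1000
```

Whether this evens out the load depends on how many connections the test opens. Having K6 use more client connections is the other side of the same lever.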

htuch added the question and area/perf labels and removed the triage label on Mar 22, 2024

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions bot added the stale label on Apr 21, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions bot closed this as not planned on Apr 28, 2024