Skip to content

Memory leak & haproxy.cfg stopped syncing #312

@aaronbbrown

Description

@aaronbbrown

👋🏻 from GitHub

Today, we ran into an issue where the controller started began leaking memory and then stopped updating /etc/haproxy/haproxy.cfg leading to a situation where haproxy had an incorrect set of backends. Memory usage had a spike right around the time haproxy.cfg stopped updating and then kept slowly creeping up.

image
The light green line is the problematic Pod. Red is the CPU request, and the other lines are the other Pods from this Deployment of the ingress controller which were functioning normally.

Attached is an SVG flame graph of the process captured with perf several hours after the initial spike, while the memory was staircasing its way up as well as a stack trace captured by sending the process SIGQUIT.

Kubernetes Deployment YAML
---
# Source: kubernetes-ingress/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-unicorn-api-kubernetes-ingress
  namespace: haproxytech-ingress
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    app.kubernetes.io/name: kubernetes-ingress
    helm.sh/chart: kubernetes-ingress-1.14.2
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: ingress-unicorn-api
    app.kubernetes.io/version: 1.6.0
data:
  syslog-server: address:stdout, facility:daemon, format:raw, 
  nbthread: "1"
  timeout-client: 150s
  timeout-client-fin: 5s
  timeout-connect: 100ms
  timeout-queue: 90s
  timeout-server: 150s
  timeout-server-fin: 5s
  timeout-tunnel: 150s


---
# Source: kubernetes-ingress/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-unicorn-api-kubernetes-ingress
  namespace: haproxytech-ingress
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    app.kubernetes.io/name: kubernetes-ingress
    helm.sh/chart: kubernetes-ingress-1.14.2
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: ingress-unicorn-api
    app.kubernetes.io/version: 1.6.0
spec:
  replicas: 4
  selector:
    matchLabels:
      app.kubernetes.io/name: kubernetes-ingress
      app.kubernetes.io/instance: ingress-unicorn-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kubernetes-ingress
        app.kubernetes.io/instance: ingress-unicorn-api
    spec:
      serviceAccountName: ingress-unicorn-api-kubernetes-ingress
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      priorityClassName: system-cluster-critical

      # The helm chart doesn't have a configurable for securityContext, so this is manually added
      securityContext:
        sysctls:
        - name: net.core.somaxconn
          value: "10240"
        - name: net.ipv4.tcp_tw_reuse
          value: "1"

      containers:
        - name: kubernetes-ingress-controller
          image: "haproxytech/kubernetes-ingress:1.6.0"
          imagePullPolicy: IfNotPresent
          args:
          - --configmap=haproxytech-ingress/ingress-unicorn-api-kubernetes-ingress
          - --ingress.class=haproxytech-unicorn-api
          - --publish-service=haproxytech-ingress/ingress-unicorn-api-kubernetes-ingress
          - --log=info
          - --configmap-tcp-services=haproxytech-ingress/tcpservices-unicorn-api
          - --disable-http
          - --disable-https
          - --disable-ipv6
          ports:
            - name: stat
              containerPort: 1024
              protocol: TCP
            - name: http-tcp
              containerPort: 8080
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 1024
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          startupProbe:
            failureThreshold: 20
            httpGet:
              path: /healthz
              port: 1042
              scheme: HTTP
            initialDelaySeconds: 0
            periodSeconds: 1
            successThreshold: 1
            timeoutSeconds: 1
          env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: GOMAXPROCS
            value: "1"
          resources:
            requests:
              cpu: 1
              memory: 256Mi
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - -c
                - sleep 5
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - kubernetes-ingress
            topologyKey: kubernetes.io/hostname

Versions:

  • Ingress controller v1.6.0
  • Kubernetes 1.20.2
  • Kernel 4.19

❤️ for any help you can provide/suggestions. Happy to provide more information.

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions