Memory leak & haproxy.cfg stopped syncing

👋🏻 from GitHub

Today, we ran into an issue where the controller started began leaking memory and then stopped updating `/etc/haproxy/haproxy.cfg` leading to a situation where haproxy had an incorrect set of backends. Memory usage had a spike right around the time `haproxy.cfg` stopped updating and then kept slowly creeping up.

![image](https://user-images.githubusercontent.com/469234/117711414-3a3f2580-b1a1-11eb-81fe-10255eee8e90.png)
The light green line is the problematic Pod. Red is the CPU request, and the other lines are the other Pods from this Deployment of the ingress controller which were functioning normally.

Attached is an SVG flame graph of the process captured with `perf` several hours after the initial spike, while the memory was staircasing its way up as well as a stack trace captured by sending the process `SIGQUIT`.

- [stacktrace.log](https://github.com/haproxytech/kubernetes-ingress/files/6454449/stacktrace.log)
- [controller.svg.gz](https://github.com/haproxytech/kubernetes-ingress/files/6454456/controller.svg.gz)

<details><summary>Kubernetes Deployment YAML</summary>

```yaml
---
# Source: kubernetes-ingress/templates/controller-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-unicorn-api-kubernetes-ingress
  namespace: haproxytech-ingress
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    app.kubernetes.io/name: kubernetes-ingress
    helm.sh/chart: kubernetes-ingress-1.14.2
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: ingress-unicorn-api
    app.kubernetes.io/version: 1.6.0
data:
  syslog-server: address:stdout, facility:daemon, format:raw, 
  nbthread: "1"
  timeout-client: 150s
  timeout-client-fin: 5s
  timeout-connect: 100ms
  timeout-queue: 90s
  timeout-server: 150s
  timeout-server-fin: 5s
  timeout-tunnel: 150s


---
# Source: kubernetes-ingress/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-unicorn-api-kubernetes-ingress
  namespace: haproxytech-ingress
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
    app.kubernetes.io/name: kubernetes-ingress
    helm.sh/chart: kubernetes-ingress-1.14.2
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: ingress-unicorn-api
    app.kubernetes.io/version: 1.6.0
spec:
  replicas: 4
  selector:
    matchLabels:
      app.kubernetes.io/name: kubernetes-ingress
      app.kubernetes.io/instance: ingress-unicorn-api
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kubernetes-ingress
        app.kubernetes.io/instance: ingress-unicorn-api
    spec:
      serviceAccountName: ingress-unicorn-api-kubernetes-ingress
      terminationGracePeriodSeconds: 60
      dnsPolicy: ClusterFirst
      priorityClassName: system-cluster-critical

      # The helm chart doesn't have a configurable for securityContext, so this is manually added
      securityContext:
        sysctls:
        - name: net.core.somaxconn
          value: "10240"
        - name: net.ipv4.tcp_tw_reuse
          value: "1"

      containers:
        - name: kubernetes-ingress-controller
          image: "haproxytech/kubernetes-ingress:1.6.0"
          imagePullPolicy: IfNotPresent
          args:
          - --configmap=haproxytech-ingress/ingress-unicorn-api-kubernetes-ingress
          - --ingress.class=haproxytech-unicorn-api
          - --publish-service=haproxytech-ingress/ingress-unicorn-api-kubernetes-ingress
          - --log=info
          - --configmap-tcp-services=haproxytech-ingress/tcpservices-unicorn-api
          - --disable-http
          - --disable-https
          - --disable-ipv6
          ports:
            - name: stat
              containerPort: 1024
              protocol: TCP
            - name: http-tcp
              containerPort: 8080
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: 1024
              scheme: HTTP
            initialDelaySeconds: 5
            periodSeconds: 10
            successThreshold: 1
            timeoutSeconds: 1
          startupProbe:
            failureThreshold: 20
            httpGet:
              path: /healthz
              port: 1042
              scheme: HTTP
            initialDelaySeconds: 0
            periodSeconds: 1
            successThreshold: 1
            timeoutSeconds: 1
          env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: GOMAXPROCS
            value: "1"
          resources:
            requests:
              cpu: 1
              memory: 256Mi
          lifecycle:
            preStop:
              exec:
                command:
                - /bin/sh
                - -c
                - sleep 5
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - kubernetes-ingress
            topologyKey: kubernetes.io/hostname

```

</details>

Versions:
- Ingress controller v1.6.0
- Kubernetes 1.20.2
- Kernel 4.19

❤️ for any help you can provide/suggestions.  Happy to provide more information.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Memory leak & haproxy.cfg stopped syncing #312

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Memory leak & haproxy.cfg stopped syncing #312

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions