Helm Chart - Probe Webhook Retry Logic #2872

eshaanm25 · 2023-07-12T19:08:08Z

What steps did you take and what happened:

The Gatekeeper post-install probe hook exists to check that the Gatekeeper webhook API is available. It is designed to retry probe requests until the timeout defined in postInstall.probeWebhook.waitTimeout elapses. However, the probe exits early when the connection is refused, which happens while the Gatekeeper webhook Pods are not yet listening on the service port.

During installation, if postInstall.labelNamespace.enabled is set to false and postInstall.probeWebhook.enabled is set to true, the installation of Gatekeeper may fail because the gatekeeper-probe-webhook-post-install Job exits immediately rather than probing until the specified timeout. Once the Job has failed 6 times, it reaches the default backoff limit and the Helm installation fails.

What did you expect to happen:

The post-install probe hook should continue to probe the Gatekeeper webhook API even if the connection is refused because the Gatekeeper webhook Pods are not listening on the service port yet. To do this, the --retry-connrefused flag should be added to the probe curl command.
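For context, curl's --retry only covers errors it treats as transient (timeouts and certain HTTP status codes); a refused connection is not retried unless --retry-connrefused is also passed. A minimal sketch of what the probe container's arguments could look like with the flag added is shown here. This is not copied from the chart: the container name, retry count, service URL, namespace, and CA path are assumptions for illustration, while the image and the 60 and 2 second timeouts mirror the values reported in this issue.

```yaml
# Hypothetical excerpt of the probe Job's container spec (illustrative only).
containers:
  - name: probe-webhook                 # assumed container name
    image: curlimages/curl:7.83.1       # image from the values in this issue
    args:
      - "--retry"
      - "99999"                         # assumed: a count high enough that the time limit ends the loop
      - "--retry-connrefused"           # also retry when the connection is refused
      - "--retry-max-time"
      - "60"                            # postInstall.probeWebhook.waitTimeout
      - "--retry-delay"
      - "1"                             # assumed delay between attempts, in seconds
      - "--max-time"
      - "2"                             # postInstall.probeWebhook.httpTimeout
      - "--cacert"
      - "/certs/ca.crt"                 # assumed CA bundle mount path
      - "https://gatekeeper-webhook-service.gatekeeper-system.svc/v1/admitlabel?timeout=2s"  # assumed URL
```

With arguments along these lines, a single Job attempt keeps probing for up to waitTimeout seconds instead of exiting on the first refused connection.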

Anything else you would like to add:

We should also consider setting the backoffLimit of the post-install probe Job to 0. In its current state, each Job attempt retries probe requests until the timeout defined in postInstall.probeWebhook.waitTimeout, but the Job itself is retried 6 times before the Helm installation actually fails, so the effective wait is several times longer than the configured timeout.
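As a rough sketch of that suggestion, a Job with backoffLimit set to 0 is marked failed after its first failed Pod, so the overall wait stays close to waitTimeout. The manifest below is a hypothetical, trimmed-down example rather than the chart's actual template: the hook annotation, container name, and curl arguments are assumptions for illustration.

```yaml
# Hypothetical minimal Job manifest (not the chart's real template).
apiVersion: batch/v1
kind: Job
metadata:
  name: gatekeeper-probe-webhook-post-install
  annotations:
    "helm.sh/hook": post-install        # assumed hook annotation
spec:
  backoffLimit: 0                       # fail the Job after the first failed Pod (Kubernetes default is 6)
  template:
    spec:
      restartPolicy: Never              # do not restart the container in place
      containers:
        - name: probe-webhook           # assumed container name
          image: curlimages/curl:7.83.1
          args:                         # assumed, shortened curl arguments
            - "--retry-connrefused"
            - "--retry"
            - "99999"
            - "--retry-max-time"
            - "60"
            - "https://gatekeeper-webhook-service.gatekeeper-system.svc/v1/admitlabel?timeout=2s"
```

The trade-off is that a single transient Pod failure (for example, an image pull problem) would then fail the installation immediately, which may or may not be desirable.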

Images

Probe Webhook Retrying 6 times

Log Output of Probe Webhook

Helm Installation Failure

PostInstall Helm Configurations

postInstall:
  labelNamespace:
    enabled: false
    extraRules: []
    image:
      repository: openpolicyagent/gatekeeper-crds
      tag: v3.12.0
      pullPolicy: IfNotPresent
      pullSecrets: []
    extraNamespaces: []
    podSecurity: ["pod-security.kubernetes.io/audit=restricted",
      "pod-security.kubernetes.io/audit-version=latest",
      "pod-security.kubernetes.io/warn=restricted",
      "pod-security.kubernetes.io/warn-version=latest",
      "pod-security.kubernetes.io/enforce=restricted",
      "pod-security.kubernetes.io/enforce-version=v1.24"]
    extraAnnotations: {}
  probeWebhook:
    enabled: true
    image:
      repository: curlimages/curl
      tag: 7.83.1
      pullPolicy: IfNotPresent
      pullSecrets: []
    waitTimeout: 60
    httpTimeout: 2
    insecureHTTPS: false
  affinity: {}
  tolerations: []
  nodeSelector: {kubernetes.io/os: linux}
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsGroup: 999
    runAsNonRoot: true
    runAsUser: 1000

Environment:

Gatekeeper version: 3.12.0
Kubernetes version: v1.27.3
Helm version: v3.12.1

eshaanm25 · 2023-07-12T19:08:26Z

PR Incoming!

eshaanm25 added the bug label Jul 12, 2023

eshaanm25 mentioned this issue Jul 12, 2023

fix: helm probe webhook retry logic #2873

Merged

ritazh closed this as completed in #2873 Jul 19, 2023
