Helm Chart - Probe Webhook Retry Logic #2872

Closed
eshaanm25 opened this issue Jul 12, 2023 · 1 comment · Fixed by #2873
Labels: bug

Comments

eshaanm25 (Contributor) commented Jul 12, 2023

What steps did you take and what happened:

The purpose of the Gatekeeper post-install webhook is to probe the Gatekeeper Webhook API for availability. The post-install webhook is designed to retry probe requests until the timeout defined in postInstall.probeWebhook.waitTimeout elapses. However, the probe exits early when the connection is refused, which happens while the Gatekeeper Webhook Pods are not yet listening on the service port.

During the installation of Gatekeeper, if postInstall.labelNamespace.enabled is set to false and postInstall.probeWebhook.enabled is set to true, the installation may fail because the gatekeeper-probe-webhook-post-install Job exits immediately rather than probing until the specified timeout. Once the Job fails 6 times, it reaches the default backoff limit and the Helm installation fails.

What did you expect to happen:

The post-install webhook should continue to probe the Gatekeeper Webhook API even if the connection is refused because the Gatekeeper Webhook Pods are not yet listening on the service port. To do this, the --retry-connrefused flag should be added to the probe webhook command.
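
As a rough sketch of that change, the container arguments of the probe Job could look like the following. Only the added --retry-connrefused line is the point here. The container name, the other flags and their mapping to the chart values, the CA path, and the URL are assumptions for illustration and may not match the chart's actual template. By default, curl's --retry only retries errors it considers transient (timeouts and certain HTTP response codes), so a refused connection ends the run unless this flag is present.

```yaml
# Sketch only: names, paths, and the URL below are illustrative assumptions,
# not copied from the chart.
containers:
  - name: probe-webhook            # hypothetical container name
    image: curlimages/curl:7.83.1  # image from the values in this issue
    args:
      - "--retry"
      - "99999"                    # assumed: large count, bounded by --retry-max-time
      - "--retry-connrefused"      # proposed: treat ECONNREFUSED as retryable
      - "--retry-max-time"
      - "60"                       # assumed mapping: postInstall.probeWebhook.waitTimeout
      - "--retry-delay"
      - "1"                        # assumed delay between attempts, in seconds
      - "--max-time"
      - "2"                        # assumed mapping: postInstall.probeWebhook.httpTimeout
      - "--cacert"
      - "/certs/ca.crt"            # hypothetical CA bundle path
      - "https://<webhook-service>.<namespace>.svc/<path>"  # placeholder URL
```

The curlimages/curl tag listed in the values (7.83.1) is newer than curl 7.52.0, the release that introduced --retry-connrefused, so no image change should be needed for the flag itself.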

Anything else you would like to add:

We should also consider setting the backoffLimit of the post-install webhook Job to 0. In its current state, the webhook retries probe requests until the timeout defined in postInstall.probeWebhook.waitTimeout, but this whole cycle repeats 6 times (the default Job backoff limit) before the Helm installation actually fails.
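
For illustration, a Job with retries disabled at the Kubernetes level might be declared roughly as follows. This is a minimal sketch, not the chart's template: the container details are placeholders, and the Helm hook annotation is an assumption about how the Job is registered. With backoffLimit set to 0, a single failed Pod marks the Job as failed, so the only retry window left is the one curl enforces through the wait timeout.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gatekeeper-probe-webhook-post-install  # Job name from this issue
  annotations:
    "helm.sh/hook": post-install               # assumed hook annotation
spec:
  backoffLimit: 0            # proposed: fail the Job after the first failed Pod (default is 6)
  template:
    spec:
      restartPolicy: Never   # do not restart the container in place
      containers:
        - name: probe-webhook                  # hypothetical container name
          image: curlimages/curl:7.83.1        # image from the values in this issue
          args: ["--version"]                  # placeholder; real probe arguments omitted
```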

Images

Probe Webhook Retrying 6 times

Log Output of Probe Webhook

Helm Installation Failure

PostInstall Helm Configurations
postInstall:
  labelNamespace:
    enabled: false
    extraRules: []
    image:
      repository: openpolicyagent/gatekeeper-crds
      tag: v3.12.0
      pullPolicy: IfNotPresent
      pullSecrets: []
    extraNamespaces: []
    podSecurity: ["pod-security.kubernetes.io/audit=restricted",
      "pod-security.kubernetes.io/audit-version=latest",
      "pod-security.kubernetes.io/warn=restricted",
      "pod-security.kubernetes.io/warn-version=latest",
      "pod-security.kubernetes.io/enforce=restricted",
      "pod-security.kubernetes.io/enforce-version=v1.24"]
    extraAnnotations: {}
  probeWebhook:
    enabled: true
    image:
      repository: curlimages/curl
      tag: 7.83.1
      pullPolicy: IfNotPresent
      pullSecrets: []
    waitTimeout: 60
    httpTimeout: 2
    insecureHTTPS: false
  affinity: {}
  tolerations: []
  nodeSelector: {kubernetes.io/os: linux}
  securityContext:
    allowPrivilegeEscalation: false
    capabilities:
      drop:
      - ALL
    readOnlyRootFilesystem: true
    runAsGroup: 999
    runAsNonRoot: true
    runAsUser: 1000

Environment:

  • Gatekeeper version: 3.12.0
  • Kubernetes version: v1.27.3
  • Helm version: v3.12.1

eshaanm25 added the bug label Jul 12, 2023

eshaanm25 (Contributor, Author) commented

PR Incoming!
