
adding pod anti-affinity to Kyverno #1985

Merged
merged 10 commits into from Sep 20, 2021
Conversation

RinkiyaKeDad
Contributor

@RinkiyaKeDad RinkiyaKeDad commented Jun 9, 2021

Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>

Related issue

Fixes #1966

What type of PR is this

/kind feature

Proposed Changes

Added pod anti-affinity to Kyverno.
This can be enabled or disabled by editing the following flag in charts/kyverno/values.yaml:

antiAffinity:
   # This can be disabled in a 1 node cluster.
   enable: true

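A minimal sketch of how this flag could gate the rule in charts/kyverno/templates/deployment.yaml (not necessarily the exact merged template; indentation and placement are assumptions, and the rule itself mirrors the manifest discussed later in this thread):

{{- if .Values.antiAffinity.enable }}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - kyverno
            topologyKey: kubernetes.io/hostname
{{- end }}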
Proof Manifests

Checklist

Further Comments

@RinkiyaKeDad
Contributor Author

@realshuting after making changes to deployment.yaml and running make kustomize-crd, a bunch of formatting changes were introduced automatically to the YAML files. Is that okay?

@@ -28,7 +28,16 @@ spec:
 securityContext: {{ tpl (toYaml .) $ | nindent 8 }}
 {{- end }}
 {{- with .Values.affinity }}
-affinity: {{ tpl (toYaml .) $ | nindent 8 }}
+affinity:
Contributor Author

Addressing this review - I couldn't find how to add pod anti-affinity to Helm charts, so I'm not sure if what I've done is correct. @realshuting could you please point me to the correct way of doing this? Thanks.

Initially I saw how Elasticsearch had implemented this in their charts, so I did the same, but I'm now confused about whether the change is required in the deployment.yaml or the values.yaml file.

@realshuting realshuting added the wip work in progress label Jun 10, 2021
@RinkiyaKeDad RinkiyaKeDad removed the wip work in progress label Jun 16, 2021
@treydock
Member

@RinkiyaKeDad The Helm failure has this message:

0/1 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity rules, 1 node(s) didn't satisfy existing pods anti-affinity rules.

If I had to guess, the label kubernetes.io/hostname might not be getting set for the Kind cluster where Helm executes? My local Kind cluster has that label set. I would expect it to work, given you made an identical change to the YAML manifests, and if I recall correctly those are also applied to the Kind cluster.

I'll try to run the Helm tests on my local system to see if I can reproduce and maybe probe things a bit to see what's broken.

@treydock
Member

I found the problem. During the Helm testing, a test pod is launched to run some wget calls to validate that the Helm deployment is healthy. That pod has the same labels as the main Kyverno pod, so the anti-affinity rule gets in the way, since only one node is available in Kind clusters. The error message I posted is actually from the kyverno-test pod defined here: https://github.com/kyverno/kyverno/blob/main/charts/kyverno/templates/tests/test.yaml

Here is one possible solution I verified on my local system:

diff --git a/charts/kyverno/templates/_helpers.tpl b/charts/kyverno/templates/_helpers.tpl
index 05934cdb..4be734b2 100644
--- a/charts/kyverno/templates/_helpers.tpl
+++ b/charts/kyverno/templates/_helpers.tpl
@@ -42,6 +42,17 @@ helm.sh/chart: {{ template "kyverno.chart" . }}
 {{- end }}
 {{- end -}}
 
+{{/* Helm required labels */}}
+{{- define "kyverno.test-labels" -}}
+app.kubernetes.io/component: kyverno
+app.kubernetes.io/instance: {{ .Release.Name }}
+app.kubernetes.io/managed-by: {{ .Release.Service }}
+app.kubernetes.io/name: {{ template "kyverno.name" . }}-test
+app.kubernetes.io/part-of: {{ template "kyverno.name" . }}
+app.kubernetes.io/version: "{{ .Chart.Version }}"
+helm.sh/chart: {{ template "kyverno.chart" . }}
+{{- end -}}
+
 {{/* matchLabels */}}
 {{- define "kyverno.matchLabels" -}}
 app.kubernetes.io/name: {{ template "kyverno.name" . }}
diff --git a/charts/kyverno/templates/tests/test.yaml b/charts/kyverno/templates/tests/test.yaml
index f176cdd4..3f548657 100644
--- a/charts/kyverno/templates/tests/test.yaml
+++ b/charts/kyverno/templates/tests/test.yaml
@@ -3,7 +3,7 @@ kind: Pod
 metadata:
   name: "{{ template "kyverno.fullname" . }}-test"
   labels:
-    {{- include "kyverno.labels" . | nindent 4 }}
+    {{- include "kyverno.test-labels" . | nindent 4 }}
   annotations:
     "helm.sh/hook": test
 spec:

This pretty much just duplicates the main label set but changes app.kubernetes.io/name to have a -test suffix, which lets the test pod avoid the anti-affinity rules.

@RinkiyaKeDad
Contributor Author

It worked! Thank you so much for helping @treydock! :)

@treydock
Member

One issue I see with forcing affinity is that if someone sets replicaCount=2 and they have, say, a 2-node dev cluster, it becomes impossible to do a rolling restart during updates. I ran into this on my dev cluster when doing a 2-replica deployment. I solved it with this:

spec:
  selector:
    matchLabels:
      app: web
  replicas: 2
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate

So it might be good to expose strategy as a freeform hash.
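For illustration only (the value name updateStrategy and the template wiring are assumptions, not something the chart ships), a freeform strategy value could look like this in values.yaml:

updateStrategy:
  rollingUpdate:
    maxUnavailable: 1
  type: RollingUpdate

and be rendered in the Deployment template with:

  {{- with .Values.updateStrategy }}
  strategy: {{ toYaml . | nindent 4 }}
  {{- end }}

This follows the same toYaml/nindent pattern the chart already uses for other freeform values such as affinity.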

@NoSkillGirl
Contributor

dependent on - #2006

@NoSkillGirl
Contributor

@treydock - can you please check the PR and let me know if anything else is required.

@treydock
Member

Looks fine to me. The only thing I'd recommend changing is making it possible via the Helm chart to turn off the anti-affinity, so that in simpler 1-node Kubernetes clusters it's possible to test HA without needing a second node.

Maybe something like a Helm value of antiAffinityEnable: true, so that someone can set antiAffinityEnable: false and not have anti-affinity enabled.

@treydock
Member

Another option would be to make the Helm values look like this:

antiAffinity:
  enable: true
  topologyKey: "kubernetes.io/hostname"
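With values shaped like that, only the topologyKey line of the template sketch shown earlier under Proposed Changes would need to change (again an assumption about the wiring, not necessarily the merged template):

            topologyKey: {{ .Values.antiAffinity.topologyKey }}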

@NoSkillGirl NoSkillGirl removed the hold label Sep 11, 2021
@NoSkillGirl
Contributor

@treydock - I have made the changes. Please check and let me know if I need to make any further changes.

@treydock
Member

Looks good to me

RinkiyaKeDad and others added 8 commits September 15, 2021 21:11
Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>
Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>
Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>
Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>
Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>
Signed-off-by: RinkiyaKeDad <arshsharma461@gmail.com>
Signed-off-by: NoSkillGirl <singhpooja240393@gmail.com>
Signed-off-by: NoSkillGirl <singhpooja240393@gmail.com>
Signed-off-by: NoSkillGirl <singhpooja240393@gmail.com>
Signed-off-by: NoSkillGirl <singhpooja240393@gmail.com>
@NoSkillGirl NoSkillGirl merged commit 42d4948 into kyverno:main Sep 20, 2021
@NoSkillGirl NoSkillGirl added this to the Kyverno Release 1.5.0 milestone Sep 20, 2021
@realshuting realshuting added the milestone 1.5.0 Issues and PRs for the 1.5.0 release. label Sep 20, 2021
@realshuting realshuting removed this from the Kyverno Release 1.5.0 milestone Sep 20, 2021
@realshuting
Member

@RinkiyaKeDad @NoSkillGirl - I'm not able to update Kyverno running in a single-node cluster when it is installed with the direct manifest. Should we remove the affinity rule from the direct manifest?

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values:
          - kyverno
      topologyKey: kubernetes.io/hostname

@NoSkillGirl
Contributor

NoSkillGirl commented Sep 22, 2021

After removing it, does it work as expected?

And what about when we run multiple Kyverno pods? If we remove the above podAntiAffinity, will it still work as expected?

@treydock
Member

I've run into similar issues on regular deployments when pod anti-affinity is enabled along with a rolling update strategy. So I think the issue is that pod anti-affinity combined with a strategy like the one below can cause problems with single-node deployments and updates, because one pod is terminating and won't let another start on that same node.

  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 40%
    type: RollingUpdate

@NoSkillGirl
Contributor

I have not worked on this PR, so I don't have much understanding of the above changes.

I connected with @RinkiyaKeDad - he says he has lost a lot of the context and doesn't remember much.

@treydock @realshuting - should I revert this PR?

@realshuting
Member

Thanks @NoSkillGirl!

No, we should update the anti-affinity to a soft limit, see #1982 (comment). @anushkamittal20 is working on that (let's create a separate issue to track it).
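For reference, a soft (preferred) version of the same rule would look roughly like this - a sketch of the direction described in #1982, not the final change:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - kyverno
        topologyKey: kubernetes.io/hostname

Unlike the required form, the scheduler treats this as a preference, so a single-node cluster can still schedule a replacement pod during updates.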
