verbose debugging #95
Hi @davem-git, at first glance I see no issues with your ModRule or the way you are using it. Can you remove the idempotency check? I wonder if:
|
Also, I don't see a difference between your two ModRules. |
Oops, I removed the bottom part of the selection - I just didn't copy and paste it properly. I corrected the sample above. |
Ok, so KubeMod does not detect the Node-related event. |
I'm causing the nodes to scale out and in. I've even deleted the user node pool and added a brand new one back. I haven't tried rebooting. Ideally this will pick up nodes as they come online through autoscaling events. |
Got it, thanks. |
Thank you! If there's any way I can help debug this, or any flags I can enable for more verbose logging, please let me know. |
Logging every CREATE and UPDATE event that passes through KubeMod won't be reasonable, as KubeMod's interception is quite promiscuous and there may be hundreds of events per second while large application stacks are being stood up, torn down, or scaled up/down. It may make sense to add flags to log events for specific resources (for example, nodes). Let me see what I can find in my node ModRule tests. |
Great! Sounds like a good plan. Thanks again! |
Hi @davem-git, I have an update. I tried your ModRule (first version above) on my Docker for Windows deployment of Kubernetes. Then I triggered an UPDATE event on the
This triggered your ModRule and applied the taint on the node as expected, issuing the following log line in
Can you try the same on your end? See if that triggers the ModRule patch. This is obviously not your use case - I am only trying to test if node-related events are being intercepted by KubeMod in your AKS cluster.
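If it helps, one generic way to force an UPDATE event on a node (not necessarily the exact trigger I used above) is to add and then remove a label:

# Adding or removing a label generates an UPDATE admission event for the node.
# <node-name> is a placeholder - substitute a node from your cluster.
kubectl label node <node-name> kubemod-test=true
kubectl label node <node-name> kubemod-test-
|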
I'll give it a try. Thanks! |
Still nothing. I uploaded my YAML as a .md file since the .yaml format isn't supported as an attachment, in case I have something just wrong, but I don't think I do, as there wasn't much to customize for my environment.

Labels: agentpool=pool1az0
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=Standard_E4ds_v5
beta.kubernetes.io/os=linux
color=blue

❯ kubectl -n kubemod-system logs kubemod-operator-d595864b8-dz25h
{"level":"info","ts":"2022-10-14 20:30:56.834Z","logger":"webapp-setup","msg":"web app server is starting to listen","addr":":8081"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8082"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"operator-setup","msg":"health server is starting to listen","addr":":8083"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.builder","msg":"Registering a mutating webhook","GVK":"api.kubemod.io/v1beta1, Kind=ModRule","path":"/mutate-api-kubemod-io-v1beta1-modrule"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.webhook","msg":"registering webhook","path":"/mutate-api-kubemod-io-v1beta1-modrule"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.builder","msg":"Registering a validating webhook","GVK":"api.kubemod.io/v1beta1, Kind=ModRule","path":"/validate-api-kubemod-io-v1beta1-modrule"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.webhook","msg":"registering webhook","path":"/validate-api-kubemod-io-v1beta1-modrule"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"operator-setup","msg":"registering core mutating webhook"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.webhook","msg":"registering webhook","path":"/dragnet-webhook"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"operator-setup","msg":"starting manager"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller","msg":"Starting EventSource","reconcilerGroup":"api.kubemod.io","reconcilerKind":"ModRule","controller":"modrule","source":"kind source: /, Kind="}
{"level":"info","ts":"2022-10-14 20:30:57.149Z","logger":"controller-runtime.webhook.webhooks","msg":"starting webhook server"}
{"level":"info","ts":"2022-10-14 20:30:57.150Z","logger":"controller-runtime.certwatcher","msg":"Updated current TLS certificate"}
{"level":"info","ts":"2022-10-14 20:30:57.150Z","logger":"controller-runtime.webhook","msg":"serving webhook server","host":"","port":9443}
{"level":"info","ts":"2022-10-14 20:30:57.150Z","logger":"controller-runtime.certwatcher","msg":"Starting certificate watcher"}
{"level":"info","ts":"2022-10-14 20:30:57.250Z","logger":"controller","msg":"Starting Controller","reconcilerGroup":"api.kubemod.io","reconcilerKind":"ModRule","controller":"modrule"}
{"level":"info","ts":"2022-10-14 20:30:57.250Z","logger":"controller","msg":"Starting workers","reconcilerGroup":"api.kubemod.io","reconcilerKind":"ModRule","controller":"modrule","worker count":1} |
I wonder if there is some timing issue involved. Please make sure that your test steps are in this order:
If this is the order in which you've performed your test, and you still don't see any activity in the log, then I think there might be some configuration in your environment that prevents KubeMod from receiving node-related admission webhook requests. Can you test if KubeMod works against other resources in your cluster? For example, you can test by deploying the following sample ModRule to any namespace...

kubectl apply -f https://raw.githubusercontent.com/kubemod/kubemod/master/samples/modrules/modrule-1.yaml

... and then deploy the following sample NGINX deployment in the same namespace:
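A minimal deployment along these lines should do - I'm sketching this from memory, so the exact sample manifest may differ slightly:

# Hypothetical reconstruction of the sample NGINX deployment - adjust as needed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80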
The above should produce a KubeMod log item like this:
You can then clean up by deleting the modrule and deployment:
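For example (assuming the sample ModRule file above and a test deployment named nginx):

# Delete the sample ModRule using the same manifest it was created from,
# then remove the test deployment.
kubectl delete -f https://raw.githubusercontent.com/kubemod/kubemod/master/samples/modrules/modrule-1.yaml
kubectl delete deployment nginx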
Please let me know if the above produces logs. |
No luck. That was the order of operations I was using. I didn't have any luck with the nginx rule test either. I'm going to remove KubeMod and try again to see if I missed something in the setup. |
After doing that, the ModRule kicked off for the nginx test! I'll go back and try the node rules again.

{"level":"info","ts":"2022-10-17 15:04:04.010Z","logger":"modrule-webhook","msg":"Applying ModRule patch","request uid":"9e7e0c3b-90bd-468b-b474-0683620f05d7","namespace":"default","resource":"deployments/nginx","operation":"CREATE","patch":[{"op":"replace","path":"/metadata/annotations/kubectl.kubernetes.io~1last-applied-configuration","value":"{\"apiVersion\":\"apps/v1\",\"kind\":\"Deployment\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"nginx\",\"color\":\"whatever\"},\"name\":\"nginx\",\"namespace\":\"default\"},\"spec\":{\"replicas\":1,\"selector\":{\"matchLabels\":{\"app\":\"nginx\"}},\"template\":{\"metadata\":{\"labels\":{\"app\":\"nginx\"}},\"spec\":{\"containers\":[{\"image\":\"bitnami/nginx:1.14.2\",\"name\":\"nginx\",\"ports\":[{\"containerPort\":8080,\"protocol\":\"TCP\"}],\"resources\":{\"limits\":{\"cpu\":\"500m\",\"memory\":\"1Gi\"}}},{\"command\":[\"sh\",\"-c\",\"while true; do sleep 5; done;\"],\"image\":\"alpine:3\",\"name\":\"injected\"}]}}}}"},{"op":"add","path":"/metadata/labels/color","value":"whatever"},{"op":"add","path":"/spec/template/spec/containers/1","value":{"command":["sh","-c","while true; do sleep 5; done;"],"image":"alpine:3","name":"injected"}},{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"bitnami/nginx:1.14.2"},{"op":"replace","path":"/spec/template/spec/containers/0/ports/0/containerPort","value":8080}]} |
No luck. It doesn't seem to be seeing any node rules. At least adding a label hasn't triggered it to run. |
I'm not sure what I did, but it seems to be working?
{"level":"info","ts":"2022-10-17 15:31:13.070Z","logger":"modrule-webhook","msg":"Applying ModRule patch","request uid":"8deefcc6-6235-40dc-9720-e7313e0fe876","namespace":"","resource":"nodes/aks-pool0-38869568-vmss000000","operation":"UPDATE","patch":[{"op":"add","path":"/spec/taints","value":[{"effect":"NoSchedule","key":"node.cilium.io/agent-not-ready","value":"true"}]}]}
{"level":"info","ts":"2022-10-17 15:31:13.108Z","logger":"modrule-webhook","msg":"Applying ModRule patch","request uid":"3c0bbb55-be4e-4152-993d-efb72d865d56","namespace":"","resource":"nodes/aks-pool1az0-57119049-vmss000003","operation":"UPDATE","patch":[{"op":"add","path":"/spec/taints","value":[{"effect":"NoSchedule","key":"node.cilium.io/agent-not-ready","value":"true"}]}]}
{"level":"info","ts":"2022-10-17 15:31:13.142Z","logger":"modrule-webhook","msg":"Applying ModRule patch","request uid":"a33e82fb-65ba-4312-8ef4-bd6c31c10f89","namespace":"","resource":"nodes/aks-pool1az0-57119049-vmss000004","operation":"UPDATE","patch":[{"op":"add","path":"/spec/taints","value":[{"effect":"NoSchedule","key":"node.cilium.io/agent-not-ready","value":"true"}]}]}
{"level":"info","ts":"2022-10-17 15:31:13.174Z","logger":"modrule-webhook","msg":"Applying ModRule patch","request uid":"f43b0106-6fa9-4091-aa43-1ef2b3a3edde","namespace":"","resource":"nodes/aks-pool1az0-57119049-vmss000005","operation":"UPDATE","patch":[{"op":"add","path":"/spec/taints","value":[{"effect":"NoSchedule","key":"node.cilium.io/agent-not-ready","value":"true"}]}] |
Hmm, it does feel (again) like timing has something to do with it.
I wonder if we can reset and retest by following these steps:
If this works, then test your ModRule by scaling your node pool up and down.
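On AKS that could be something like the following - the resource group, cluster name, and node counts are placeholders to adjust for your environment:

# Hypothetical AKS scaling example; pool name taken from the node names in your logs.
az aks nodepool scale --resource-group <resource-group> --cluster-name <cluster-name> --name pool1az0 --node-count 4
az aks nodepool scale --resource-group <resource-group> --cluster-name <cluster-name> --name pool1az0 --node-count 2
|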
Seems like things are mostly working - everything besides trying to get the rule to apply only once. It seems to be running over and over again. I'll play around with this check:

- select: '$.spec.taints[*].key'
  matchValue: node.cilium.io/agent-not-ready
  negate: true
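Roughly, that check would slot into the full rule like this - still experimenting, so treat it as a sketch:

apiVersion: api.kubemod.io/v1beta1
kind: ModRule
metadata:
  name: add-cilium-taint
  namespace: kubemod-system
spec:
  type: Patch
  match:
    # Only target Node resources.
    - select: '$.kind'
      matchValue: Node
    # Skip nodes that already carry the cilium taint.
    - select: '$.spec.taints[*].key'
      matchValue: node.cilium.io/agent-not-ready
      negate: true
  patch:
    - op: add
      path: /spec/taints/-1
      value: |-
        effect: NoSchedule
        key: node.cilium.io/agent-not-ready
        value: "true"
|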
I think I found this to work better. Still running some tests:

apiVersion: api.kubemod.io/v1beta1
kind: ModRule
metadata:
  name: add-cilium-taint
  namespace: kubemod-system
spec:
  type: Patch
  match:
    - select: '$.kind'
      matchValue: Node
    - select: '$.metadata.labels["kube-moderule-applied"]'
      negate: true
  patch:
    - op: add
      path: /metadata/labels/kube-moderule-applied
    - op: add
      path: /spec/taints/-1
      value: |-
        effect: NoSchedule
        key: node.cilium.io/agent-not-ready
        value: "true"
|
@davem-git any update? |
That rule seems to have fixed it. I think this is safe to close. I'm going to delete my cluster and start again to ensure my results are repeatable. I was pulled away onto something else I should be finishing today; I had hoped to test this again by now. |
Great - thanks! |
Hmm, weird - the re-install didn't work again on a new cluster. It seems inconsistent; I'm trying to figure out what's different. |
It seems to be an order-of-operations issue. My company uses kustomize, so I've broken the deployment down to fit our standard. Kustomize seems to try to deploy the ModRule before its CRD exists on the cluster. It errors, in which case I normally run it again and it applies cleanly.

❯ kb workload/kube-mod/base | kubectl apply -f -
namespace/kubemod-system created
customresourcedefinition.apiextensions.k8s.io/modrules.api.kubemod.io created
role.rbac.authorization.k8s.io/kubemod-crt created
clusterrole.rbac.authorization.k8s.io/kubemod-crt created
clusterrole.rbac.authorization.k8s.io/kubemod-manager created
rolebinding.rbac.authorization.k8s.io/kubemod-crt created
clusterrolebinding.rbac.authorization.k8s.io/kubemod-crt created
clusterrolebinding.rbac.authorization.k8s.io/kubemod-manager created
service/kubemod-webapp-service created
service/kubemod-webhook-service created
deployment.apps/kubemod-operator created
cronjob.batch/kubemod-crt-cron-job created
job.batch/kubemod-crt-job created
mutatingwebhookconfiguration.admissionregistration.k8s.io/kubemod-mutating-webhook-configuration created
validatingwebhookconfiguration.admissionregistration.k8s.io/kubemod-validating-webhook-configuration created
error: unable to recognize "STDIN": no matches for kind "ModRule" in version "api.kubemod.io/v1beta1"
❯ kb workload/kube-mod/base | kubectl apply -f -
namespace/kubemod-system unchanged
customresourcedefinition.apiextensions.k8s.io/modrules.api.kubemod.io configured
role.rbac.authorization.k8s.io/kubemod-crt unchanged
clusterrole.rbac.authorization.k8s.io/kubemod-crt unchanged
clusterrole.rbac.authorization.k8s.io/kubemod-manager configured
rolebinding.rbac.authorization.k8s.io/kubemod-crt unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubemod-crt unchanged
clusterrolebinding.rbac.authorization.k8s.io/kubemod-manager unchanged
service/kubemod-webapp-service unchanged
service/kubemod-webhook-service unchanged
deployment.apps/kubemod-operator unchanged
cronjob.batch/kubemod-crt-cron-job unchanged
modrule.api.kubemod.io/add-cilium-taint created
job.batch/kubemod-crt-job unchanged
mutatingwebhookconfiguration.admissionregistration.k8s.io/kubemod-mutating-webhook-configuration configured
validatingwebhookconfiguration.admissionregistration.k8s.io/kubemod-validating-webhook-configuration configured

Some order-of-operations issue with this error causes KubeMod to not get any events - at least from nodes; my testing suggests from anything at this point. If I remove KubeMod and re-install it without the rule, then add the rule afterwards, it works fine. Not sure what's going on. I'm working on a way to ensure the rule always runs last to see if that fixes it. |
Ah, this makes sense. So what's happening is, you are installing KubeMod and an instance of a ModRule (add-cilium-taint) in the same apply. Part of the KubeMod installation registers the ModRule custom resource definition, so the ModRule instance cannot be created until that registration has completed.

Please see this as a description of your issue: kubernetes/kubectl#1117

It makes sense to split your installation into three parts:
And to be 100% safe, you can wait for the kubemod operator to be up and running before you apply your modrule.
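A rough sketch of that sequencing, assuming the ModRule is moved out of the kustomize base into its own manifest (the manifest file name here is hypothetical; the base path and deployment name are taken from your output above):

# 1. Apply the KubeMod base (CRD, operator, webhooks) without the ModRule.
kb workload/kube-mod/base | kubectl apply -f -
# 2. Wait for the ModRule CRD to be established and the operator to be ready.
kubectl wait --for condition=established --timeout=60s crd/modrules.api.kubemod.io
kubectl -n kubemod-system rollout status deployment/kubemod-operator --timeout=120s
# 3. Only then apply the ModRule manifest.
kubectl apply -f add-cilium-taint.yaml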
|
Apparently that's easier said than done with the tools I have at my disposal. Ideally this would be something we can deploy with ArgoCD, as the rest of our applications use it. The method above really doesn't fit that format - it doesn't seem to have a way to wait. Normally this isn't a problem, as ArgoCD will try to deploy again and apply what's missing. That doesn't work in this case, as it seems to permanently break KubeMod with no logs, which requires manual removal and redeployment without the rule. Since this will be used as part of provisioning, I think I can remove the Argo aspect and find some way of manually installing it when new clusters are provisioned. It would be nice to know why this order-of-operations issue can't be fixed without a reinstall. |
I think it is clear that you cannot deploy an instance of a CRD before the CRD itself has been registered with the cluster. ArgoCD provides the following solution: argoproj/argo-cd#1999
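For example, one way ArgoCD expresses this ordering is through sync annotations on the ModRule. A sketch with assumed values - the wave number is arbitrary, and whether this is the exact mechanism discussed in the linked issue is an assumption:

apiVersion: api.kubemod.io/v1beta1
kind: ModRule
metadata:
  name: add-cilium-taint
  namespace: kubemod-system
  annotations:
    # Assumed: apply this resource in a later sync wave than the CRD/operator,
    # and skip the dry run while the ModRule CRD is not yet on the cluster.
    argocd.argoproj.io/sync-wave: "1"
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  type: Patch
  match:
    - select: '$.kind'
      matchValue: Node
    - select: '$.metadata.labels["kube-moderule-applied"]'
      negate: true
  patch:
    - op: add
      path: /metadata/labels/kube-moderule-applied
    - op: add
      path: /spec/taints/-1
      value: |-
        effect: NoSchedule
        key: node.cilium.io/agent-not-ready
        value: "true"
|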
Oh, good to know ArgoCD has a solution. Thanks for all your help. This can be closed out! |
Thanks! Closing :) |
I've installed KubeMod with a basic setup. I'd like to be able to add taints to nodes as they come online. So far I haven't gotten it to do anything. I don't see any errors or any indication of what's going on at all. I've reverted to a basic, simple match. It's possible that's still incorrect.
I've run the full setup, including the patch with extra objects, though that seems to be the default now.
I've tried this
I've even tried a more basic
I don't get anything useful in the logs.
Any help will be appreciated.
I'm on AKS:
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.6", GitCommit:"b39bf148cd654599a52e867485c02c4f9d28b312", GitTreeState:"clean", BuildDate:"2022-09-21T21:46:51Z", GoVersion:"go1.18.6", Compiler:"gc", Platform:"linux/amd64"}
with the latest install of KubeMod.