TLS errors when updating kube-fledged helm chart #200

Open
robwittman opened this issue Feb 7, 2023 · 4 comments
Labels: bug (Something isn't working), done (Code pushed to develop branch)

robwittman commented Feb 7, 2023

When the kube-fledged Helm chart is redeployed and the changes don't cause the webhook-server component to restart, all ImageCache operations start failing with:

 Internal error occurred: failed calling webhook "validate-image-cache.kubefledged.io": failed to call webhook: Post "https://kube-fledged-webhook-server.kube-system.svc:3443/validate-image-cache?timeout=1s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubefledged.io")

This appears to happen because the webhook CA bundle is hardcoded in the Helm chart, while init-server generates a new CA bundle when the webhook server starts and patches it into the webhook configuration. On the next deployment, Helm reapplies the original CA bundle, and webhook requests fail until the webhook component is restarted and patches the bundle again.
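The failure mode described above can be reproduced offline with openssl. This is an illustrative sketch, not kube-fledged's actual certificate code: it assumes two self-signed CAs that share the subject kubefledged.io (one standing in for the bundle shipped in the chart, one for the bundle init-server regenerates) and a serving certificate signed by the regenerated CA. All file names are made up for the example. The setup fits the wording of the error: the stale bundle contains a candidate CA named kubefledged.io, but its key does not verify the serving certificate's signature.

```shell
#!/bin/sh
# Offline illustration: two CAs with the same subject but different keys.
# The serving cert is signed by the regenerated CA; the chart still ships the old one.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create a self-signed CA with a throwaway RSA key; $1 is the file prefix.
make_ca() {
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=kubefledged.io" -keyout "$1.key" -out "$1.crt" 2>/dev/null
}
make_ca ca-from-chart   # stands in for the caBundle hardcoded in the chart
make_ca ca-from-init    # stands in for the CA regenerated at webhook start-up

# Serving certificate for the webhook Service, signed by the regenerated CA.
openssl req -newkey rsa:2048 -nodes \
  -subj "/CN=kube-fledged-webhook-server.kube-system.svc" \
  -keyout server.key -out server.csr 2>/dev/null
openssl x509 -req -in server.csr -CA ca-from-init.crt -CAkey ca-from-init.key \
  -CAcreateserial -days 1 -out server.crt 2>/dev/null

# Roughly what the API server does with a caBundle: does it verify the served cert?
check() {
  if openssl verify -CAfile "$1" server.crt >/dev/null 2>&1; then echo ok; else echo fail; fi
}
STALE_RESULT=$(check ca-from-chart.crt)
FRESH_RESULT=$(check ca-from-init.crt)
echo "chart-shipped bundle: $STALE_RESULT"   # expected: fail
echo "regenerated bundle:   $FRESH_RESULT"   # expected: ok
```

Restarting the webhook component corresponds to swapping the stale bundle for the regenerated one, which is why the errors go away until the next `helm upgrade`.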

Is there a best practice for keeping that CA bundle in sync with the certificate the webhook server actually uses? Would support for an existing Certificate secret make sense?

Steps to reproduce

Install the base Helm chart:

helm install kube-fledged kubefledged-charts/kube-fledged -n kube-system --wait

Deploy a simple ImageCache:

echo 'apiVersion: kubefledged.io/v1alpha2
kind: ImageCache
metadata:
  name: vault
  namespace: kube-system
spec:
  cacheSpec:
  - images:
    - vault:1.9.5
' | kubectl apply -f - 

Upgrade the Helm release with a value change that doesn't restart the webhook server:

helm upgrade --install kube-fledged kubefledged-charts/kube-fledged -n kube-system --set 'args.controllerImagePullPolicy=IfNotPresent' --wait

Any subsequent update to the ImageCache above now fails with the webhook error shown earlier. After the webhook component is restarted, updates succeed again.
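Before restarting the webhook component, it can help to confirm that the failure really is stale-bundle drift and not some other TLS problem. A minimal sketch, assuming POSIX sh, `base64` and `openssl` are available: the hypothetical helper `bundle_trusts_cert` decodes a caBundle value (base64-encoded PEM, as stored in a webhook's `clientConfig.caBundle`) and checks whether it verifies a given serving certificate. The cluster commands in the trailing comments reuse the service name and port from the error message; the ValidatingWebhookConfiguration name is a placeholder, because the chart's resource naming isn't shown in this issue.

```shell
#!/bin/sh
# Hypothetical helper: does a base64-encoded caBundle trust the given serving cert?
# Returns 0 if it does, 1 if verification fails, 2 if the bundle can't be decoded.
bundle_trusts_cert() {
  bundle_b64=$1   # base64 PEM, exactly as it appears in clientConfig.caBundle
  cert_file=$2    # PEM file holding the webhook server's serving certificate
  bundle_file=$(mktemp)
  # caBundle is the base64 encoding of the PEM-encoded CA certificate(s).
  if ! printf '%s' "$bundle_b64" | base64 -d > "$bundle_file" 2>/dev/null; then
    rm -f "$bundle_file"
    return 2
  fi
  if openssl verify -CAfile "$bundle_file" "$cert_file" >/dev/null 2>&1; then
    status=0
  else
    status=1
  fi
  rm -f "$bundle_file"
  return "$status"
}

# Usage sketch against a live cluster (<webhook-config-name> is a placeholder):
#   kubectl -n kube-system port-forward svc/kube-fledged-webhook-server 3443:3443 &
#   openssl s_client -connect localhost:3443 </dev/null 2>/dev/null | openssl x509 > served.crt
#   bundle=$(kubectl get validatingwebhookconfiguration <webhook-config-name> \
#              -o jsonpath='{.webhooks[0].clientConfig.caBundle}')
#   bundle_trusts_cert "$bundle" served.crt && echo in-sync || echo stale
```

If the check reports `stale` right after a `helm upgrade` and `in-sync` after a restart, that points to the chart reapplying its hardcoded bundle rather than to a problem with the serving certificate itself.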

@senthilrch senthilrch self-assigned this Feb 12, 2023
@senthilrch senthilrch added the bug Something isn't working label Feb 12, 2023
@senthilrch senthilrch added this to the v0.11.0 milestone Mar 5, 2023
@senthilrch senthilrch added the done Code pushed to develop branch label Mar 5, 2023
anbrsap commented Apr 3, 2023

@senthilrch In 9259975 the service account for the webhook server was removed. When I ran helm upgrade from v0.10.0 to (my fork of) v0.10.0 with 9259975 cherry-picked (using Helm v3.11.2), the webhook server deployment still contained the fields serviceAccount and serviceAccountName with their former value. This happens because Helm patches the existing deployment manifest (unless --force is used). The patch does not remove the field serviceAccount (deprecated, but kept in sync with serviceAccountName), so K8s repopulates serviceAccountName from serviceAccount. Consequently, the new pod could not be started, because the service account no longer existed.

There are two options to fix this:

  1. Add the following to the Helm template:

          # ensure helm upgrade deletes the formerly used fields
          serviceAccount: ""
          serviceAccountName: ""
  2. Revert the removal of the service account. In general, it's good practice to use a dedicated service account instead of the default one.

I suggest option 2.

Chili-Man (Contributor) commented
@senthilrch any updates on this?

aledeulo commented Jul 31, 2023

Hi there. @senthilrch, any updates on a fix for this issue?
I'm using zarf to deploy and redeploy kube-fledged packages. In principle, that gives a fresh installation of the Helm charts: the previous installation is removed with zarf, and a new installation package is then created and deployed with zarf, which should deploy everything related to kube-fledged from scratch, including the webhook server. I'm still seeing the same issue.
Thanks

aledeulo commented Aug 1, 2023

I've found a simple workaround to this issue:
Add the following to values.yaml to disable the webhook server and the validating webhook:

# Disable webhook server and validation webhook
webhookServer:
  enable: false
validatingWebhook:
  # Specifies whether a validating webhook configuration should be created
  create: false

This is probably not the best solution, but judging from the code and from the deploy-using-yaml option in the Makefile, this is a well-known issue, and the validation is probably not strictly required. With the webhook disabled, the ImageCache reports success:

status:
    completionTime: "2023-08-01T14:59:51Z"
    message: All requested images pulled succesfully to respective nodes
    reason: ImageCacheCreate
    startTime: "2023-08-01T14:59:42Z"
    status: Succeeded
