
Automatically roll karpenter deployment on logging or global settings configuration change #726

Closed
sidewinder12s opened this issue Feb 24, 2023 · 8 comments · Fixed by #821
Labels
  • good-first-issue: Good for newcomers
  • kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • v1.x: Issues prioritized for post-1.0

Comments

@sidewinder12s

Tell us about your request

Use the Helm sha256 checksum trick to automatically roll the Karpenter controller pods when any of their config changes.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Currently you need to manually restart Karpenter after certain configuration changes.

Are you currently working around this issue?

Manually.

Additional Context

https://helm.sh/docs/howto/charts_tips_and_tricks/#automatically-roll-deployments
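For reference, the checksum-annotation pattern from the Helm docs looks roughly like the sketch below. This is only a minimal illustration, assuming the chart renders its settings into a template at templates/configmap-logging.yaml; that file name is an assumption for the example, not the actual chart layout.

```yaml
# Deployment pod-template excerpt: the sha256 of the rendered ConfigMap template
# is stamped into a pod annotation, so any change to that ConfigMap changes the
# pod template and triggers a rolling update of the controller pods.
# NOTE: templates/configmap-logging.yaml is a hypothetical path used for illustration.
spec:
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap-logging.yaml") . | sha256sum }}
```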

Attachments

No response

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@sidewinder12s sidewinder12s added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 24, 2023
@bwagner5
Contributor

Thanks for the feature request! We recently moved away from dynamically reloading the configmap to loading it statically. #174

Doing this kind of thing in our Helm deployment seems like a good idea to make sure configmap changes are applied properly on an update.

@bwagner5 bwagner5 added the good-first-issue Good for newcomers label Feb 26, 2023
@jonathan-innis
Member

This change already exists for the karpenter-global-settings configmap. I'm confused by the ask: are you not observing the correct behavior, or are you looking for the same change to be applied to the logging config?

@sidewinder12s
Author

Oh, I didn't realize the global settings had it. The ask, then, is to add it to the logging config as well. I'll confirm that the deployment rolls the next time I change a global setting, but when I changed logging settings it had not rolled.

@jonathan-innis
Member

> but when I changed logging settings it had not rolled.

Yeah, we expected that logging settings would be changed more frequently, so we decided to dynamically update those at runtime as opposed to restarting the deployment and causing unnecessary disruptions. Did you observe any issues when you updated the logging config with it updating dynamically?

@sidewinder12s
Author

sidewinder12s commented Feb 27, 2023

> > but when I changed logging settings it had not rolled.
>
> Yeah, we expected that logging settings would be changed more frequently, so we decided to dynamically update those at runtime as opposed to restarting the deployment and causing unnecessary disruptions. Did you observe any issues when you updated the logging config with it updating dynamically?

I didn't notice it change after updating, but I had also read that you had changed this behavior for settings in ~v0.24/v0.25, so I did not think it updated dynamically, and I forced a restart.

I have generally disliked every dynamic config update process I have used. It throws out Kubernetes' deployment rolling-update features and introduces failure modes that are not obvious to users, such as:

  • The dynamic config system needs to handle incorrect settings by not applying the new settings. Usually this means the signal that the deployed config was wrong can be totally lost.
  • Scaling/rescheduling can fail to run, since any new pods will not have an old config to fall back on.
  • The time until the new config takes effect can be unpredictable.

@jonathan-innis
Member

> • The dynamic config system needs to handle incorrect settings by not applying the new settings. Usually this means the signal that the deployed config was wrong can be totally lost.
> • Scaling/rescheduling can fail to run, since any new pods will not have an old config to fall back on.
> • The time until the new config takes effect can be unpredictable.

I think the last two here are critical, particularly the second point. Do you find that you often get the logging config wrong when you make changes?

@sidewinder12s
Author

I've run into problem 2 before in other apps that rely on dynamic config reloading plus falling back to a working config. Then, when new pods start without the old config already loaded, they just crash.

@jonathan-innis jonathan-innis added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Feb 28, 2023
@ellistarn ellistarn added the v1 Issues requiring resolution by the v1 milestone label Apr 18, 2023
@jonathan-innis jonathan-innis added v1.x Issues prioritized for post-1.0 and removed v1 Issues requiring resolution by the v1 milestone labels Aug 31, 2023
@njtran
Contributor

njtran commented Oct 31, 2023

We've deprecated the configmap with the v1beta1 APIs as part of v0.32+. You can now use environment variables for settings; configmap support will be removed as of v0.33.0+. More information about the environment variables is available in the settings documentation.
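As a rough sketch of what the environment-variable approach looks like (the variable names below are assumptions based on the settings docs and should be verified against the chart version in use):

```yaml
# Karpenter controller container spec excerpt: settings are provided as
# environment variables on the Deployment instead of the deprecated ConfigMaps.
# Variable names here (LOG_LEVEL, BATCH_MAX_DURATION, BATCH_IDLE_DURATION) are
# assumptions taken from the v1beta1 settings documentation; verify them for your version.
env:
  - name: LOG_LEVEL            # replaces the logging configmap's log level
    value: info
  - name: BATCH_MAX_DURATION   # replaces the batchMaxDuration global setting
    value: 10s
  - name: BATCH_IDLE_DURATION  # replaces the batchIdleDuration global setting
    value: 1s
```

Because these settings live on the Deployment's pod template itself, changing them through Helm values rolls the controller pods as part of a normal rolling update, which is the behavior this issue originally asked for.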

@njtran njtran transferred this issue from aws/karpenter-provider-aws Nov 2, 2023