New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider automatic ENABLE_NATIVE_SIDECARS support #48794
Comments
some additional suggestions in addition to proposed modes might be:
Finally there is an option we discussed at SIG Node Working Group meeting to have a universal sidecar injector that inject both types of sidecars and they coordinate things in runtime. Here the link on notes. But this universal injector has some sideeffects. If universal injector something you would like to explore more - please let us know. |
How would it break users if we make N=1.29 and ensure it's an install-time flag, and not a runtime flag? It wouldn't, correct? Ideally I'd expect it to be "if installing new Istio version and Kube apiserver => 1.29 then use native sidecars". Unless people upgrade their k8s and Istio at the same time, that shouldn't create any problems? Might be missing something basic tho. |
Testing the API server version is easy. But this feature needs the nodes to also be 1.29+ |
Would the k8s nodes and the k8s API server be out of sync at any point, other than during a K8S upgrade? If enabling native sidecars is an Istio install option and not a runtime option, it should be extremely hard to footgun yourself. -> Upgrade K8S? Your existing version of Istio was not installed with native sidecar support, so you are fine. |
Sadly not really... I mean in broad strokes I suppose probably yes, but for many many users we are talking on the order of years to upgrade the nodes, with up to 3+ version skew. Based on some anecdotal data, not sure exact statistics here. |
sigh aight nevermind. |
To be fair, I think your suggestion of basically "just have the user make sure its safe" (or install time check) is still viable in spite of that as well. I am not the fence if we need to do more beyond that. |
Yeah and if people are running nodes that skewed, the "aggregate all nodes and only allow if all nodes >= 1.29" won't help them anyway, they won't get this feature anytime soon. I am inclined to just let it be an install time flag with an install time check (+ aggregate node check at install time if we want, if you're badly skewed at install time, turn off the feature or go sort yourself out). It's well-within |
This will finally solve the running jobs on Istio which is a main method of running AI/ML training. I recommend implementing it by default as this seems to be the right approach. #11659 |
not stale |
Currently, we have ENABLE_NATIVE_SIDECARS={true,false}. A user can set this to true IFF they know the API server is on 1.29 and all nodes that run Istio containers are on 1.29. (Technically, it can still be turned off in 1.29 with a feature flag, but its default on in k8s 1.29).
This makes automatic enablement challenging, as we currently only know the API server version.
If we want to automatically enable this flag, we have two options:
Detect API server version is at least N, if so turn it on. If we make N=1.29 it will certainly break users just upgrading to 1.29. If we make N=1.33, it will only break users outside the OSS supported skew range but will also delay this by 15 months.
Add support to read all node versions in the injector; turn on only if all nodes are >=1.29. We already have a node informer elsewhere, so we just need to pass this through to the injector. Note at injection time we do not know the node a pod will run on, so we need to aggregate all nodes, not just a specific one on each request. There may be false negatives here, as users could have some node pools with 1.29 and ensure Istio pods only run on those (via nodeSelector, etc); I don't think this is safe (or simple) to try to support, as we may not be the last webhook to run. We would need all nodes to be 1.29. Of course, users could always explicitly set this to enabled. This has a theoretical race condition if a new 1.28 node comes up and we haven't seen it yet, but this seems exceptionally unlikely.
Alternatively, we can just not do this at all and make users manually opt in. Keeping in mind this is a "Beta" feature in Kubernetes, it seems reasonable to not automatically turn it on for all users yet. However, the "automatic" mode could still be useful even if its off-by-default, so that users can opt into it but with safeguards.
The text was updated successfully, but these errors were encountered: