Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider automatic ENABLE_NATIVE_SIDECARS support #48794

Open
howardjohn opened this issue Jan 12, 2024 · 10 comments · May be fixed by #49570
Open

Consider automatic ENABLE_NATIVE_SIDECARS support #48794

howardjohn opened this issue Jan 12, 2024 · 10 comments · May be fixed by #49570

Comments

@howardjohn
Copy link
Member

Currently, we have ENABLE_NATIVE_SIDECARS={true,false}. A user can set this to true IFF they know the API server is on 1.29 and all nodes that run Istio containers are on 1.29. (Technically, it can still be turned off in 1.29 with a feature flag, but its default on in k8s 1.29).

This makes automatic enablement challenging, as we currently only know the API server version.

If we want to automatically enable this flag, we have two options:

  1. Detect API server version is at least N, if so turn it on. If we make N=1.29 it will certainly break users just upgrading to 1.29. If we make N=1.33, it will only break users outside the OSS supported skew range but will also delay this by 15 months.

  2. Add support to read all node versions in the injector; turn on only if all nodes are >=1.29. We already have a node informer elsewhere, so we just need to pass this through to the injector. Note at injection time we do not know the node a pod will run on, so we need to aggregate all nodes, not just a specific one on each request. There may be false negatives here, as users could have some node pools with 1.29 and ensure Istio pods only run on those (via nodeSelector, etc); I don't think this is safe (or simple) to try to support, as we may not be the last webhook to run. We would need all nodes to be 1.29. Of course, users could always explicitly set this to enabled. This has a theoretical race condition if a new 1.28 node comes up and we haven't seen it yet, but this seems exceptionally unlikely.

Alternatively, we can just not do this at all and make users manually opt in. Keeping in mind this is a "Beta" feature in Kubernetes, it seems reasonable to not automatically turn it on for all users yet. However, the "automatic" mode could still be useful even if its off-by-default, so that users can opt into it but with safeguards.

@SergeyKanzhelev
Copy link

some additional suggestions in addition to proposed modes might be:

  • Inject using sidecar if a Pod already has another sidecar container declared.
  • control the injection per pod via Pod attributes. So one can force to use built-in sidecar on a specific Pod.

Finally there is an option we discussed at SIG Node Working Group meeting to have a universal sidecar injector that inject both types of sidecars and they coordinate things in runtime. Here the link on notes. But this universal injector has some sideeffects. If universal injector something you would like to explore more - please let us know.

@bleggett
Copy link
Contributor

bleggett commented Jan 16, 2024

How would it break users if we make N=1.29 and ensure it's an install-time flag, and not a runtime flag? It wouldn't, correct?

Ideally I'd expect it to be "if installing new Istio version and Kube apiserver => 1.29 then use native sidecars". Unless people upgrade their k8s and Istio at the same time, that shouldn't create any problems? Might be missing something basic tho.

@howardjohn
Copy link
Member Author

Testing the API server version is easy. But this feature needs the nodes to also be 1.29+

@bleggett
Copy link
Contributor

bleggett commented Jan 16, 2024

Testing the API server version is easy. But this feature needs the nodes to also be 1.29+

Would the k8s nodes and the k8s API server be out of sync at any point, other than during a K8S upgrade?

If enabling native sidecars is an Istio install option and not a runtime option, it should be extremely hard to footgun yourself.

-> Upgrade K8S? Your existing version of Istio was not installed with native sidecar support, so you are fine.
-> Done upgrading K8S, all nodes updated? Install a new revision/upgrade of Istio and explicitly enable native sidecar support, we can check the API server to make extra-sure.
-> Upgrade K8S and try to install a new Istio instance on top of it mid-upgrade before all the nodes are transitioned? Well, that's a mess, don't do that. No one would do that anyway, in the real world, right? AFAIK we don't support or recommend that in any of our operational guides anyway.

@howardjohn
Copy link
Member Author

Would the k8s nodes and the k8s API server be out of sync at any point, other than during a K8S upgrade?

Sadly not really... I mean in broad strokes I suppose probably yes, but for many many users we are talking on the order of years to upgrade the nodes, with up to 3+ version skew. Based on some anecdotal data, not sure exact statistics here.

@bleggett
Copy link
Contributor

Would the k8s nodes and the k8s API server be out of sync at any point, other than during a K8S upgrade?

Sadly not really... I mean in broad strokes I suppose probably yes, but for many many users we are talking on the order of years to upgrade the nodes, with up to 3+ version skew. Based on some anecdotal data, not sure exact statistics here.

sigh aight nevermind.

@howardjohn
Copy link
Member Author

To be fair, I think your suggestion of basically "just have the user make sure its safe" (or install time check) is still viable in spite of that as well. I am not the fence if we need to do more beyond that.

@bleggett
Copy link
Contributor

bleggett commented Jan 16, 2024

Yeah and if people are running nodes that skewed, the "aggregate all nodes and only allow if all nodes >= 1.29" won't help them anyway, they won't get this feature anytime soon.

I am inclined to just let it be an install time flag with an install time check (+ aggregate node check at install time if we want, if you're badly skewed at install time, turn off the feature or go sort yourself out).

It's well-within istioctl's realm of responsibility to perform a preinstall node check, I feel.

@yasensim
Copy link

This will finally solve the running jobs on Istio which is a main method of running AI/ML training. I recommend implementing it by default as this seems to be the right approach. #11659

@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 17, 2024
@mikemorris mikemorris removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 22, 2024
@istio-policy-bot istio-policy-bot added the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 22, 2024
@mikemorris
Copy link
Member

not stale

@istio-policy-bot istio-policy-bot removed the lifecycle/stale Indicates a PR or issue hasn't been manipulated by an Istio team member for a while label Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants