Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial merge of Solo out-of-tree inpod CNI #48253

Merged
merged 56 commits into from Jan 26, 2024

Conversation

bleggett
Copy link
Contributor

@bleggett bleggett commented Dec 9, 2023

CNI node agent changes that go with @yuval-k's istio/ztunnel#747

For context see: https://docs.google.com/document/d/1dynKlnNgIOywv3cwuw_2RCk_SKFRs7IrESaC8_r-sm0/edit

Basically, this reworks the ambient parts of the current Istio CNI agent, and fully replaces the old ambient CNI components.

Reviewers

Key code to review is:

  • cni/nodeagent
  • cni/plugin
  • cni/zdsapi

the rest is tests and integrating with existing code, and deletion of now-unused code.

Internally, we actually had built this as a pure-ambient agent and had stripped out all of the sidecar-related code from the plugin/installer/agent, running it as a separate container in the istio-cni daemonset and chaining with the Istio plugin.

For this PR I've re-combined them since that's not likely to be acceptable in-tree, so note that the old sidecar iptables stuff and tests (and related messiness, TODOs, and non-mockability) has not been touched for the most part (other than renaming it to more clearly segregate it from the ambient stuff.

TL;DR

Old code:

  • If ambient was enabled, the CNI plugin installed on the node by the node-agent install server would no-op.
  • If ambient was enabled, the CNI server would start an ambient cluster watch server, which would configure host-netns networking for newly observed pods.

New code:

  • If ambient is enabled (for CNI agent itself, and the observed pod), the CNI plugin installed on the node by the node-agent install server will forward an Add event to the ambient watch server over UDS (but will otherwise do nothing).
  • If ambient is enabled for the CNI agent itself, the CNI server will start an ambient cluster watch server, which will listen for plugin events over UDS and/or K8S control plane events for relevant pods. The CNI agent will obtain the netns of the relevant pods, jump into the pods, and set up iptables redirection rules. It will then push a message to the node ztunnel over another UDS socket with the pod netns file descriptor, so that ztunnel can establish listeners inside the pod's netns.

Goals:

  • Redirection/enrollment happens at the earliest possible time - when the CNI is informed of the pod creation, before K8S itself is aware or the pod is ready.
  • CNI agent on the node is not in the pod datapath, and can safely be restarted (though ambient-enrolled pods will not be schedulable if the CNI agent on the node they are scheduled on is non-responsible, as the CNI plugin will error out - this is A Good Thing)
  • ztunnel on the node can also be safely restarted - on restart it will re-announce itself to the node's CNI agent and get a snapshot of current workloads.

TODOs:

  • As part of this we added a more mockable/testable/robust iptables impl than the one that was previously used by the plugin for sidecar redirection - Ideally we could replace the sidecar iptables impl with this new one, rather than have two. The sidecar one makes use of several not-very-flexible plugin-level global var overrides to support mocking and testing and it would be good to remove it in favor of the new one, but I have not done that.

  • This replaces the old ambient CNI server, which renders the existing ambient eBPF impl (which assumes a host-netns redirection model among other things) non-working.

  • The re-merge might have broken some of the test suites, I'll fix those shortly - however note that integ tests will not pass until a new ztunnel build is cut from inpod redirection mode ztunnel#747 and the SHA is bumped.

@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 9, 2023
cni/README.md Show resolved Hide resolved
cni/README.md Outdated Show resolved Hide resolved
@PlatformLC
Copy link
Contributor

A general question: will we plan to remove/cleanup the old redirection implementation eventually or support both mode?

Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you split it into several prs or commits, i think it is difficult to review

@bleggett
Copy link
Contributor Author

bleggett commented Dec 11, 2023

A general question: will we plan to remove/cleanup the old redirection implementation eventually or support both mode?

This PR fully replaces the old impl, dropping it, and I would suggest we continue with that tack.

The CNI agent is IMO (with ambient + sidecar + alt/optional flows for each, 2-3 listening servers spawned) already too complex and multipurpose. Keeping the old impl around in what is already a nonstable mode just adds more spaghetti, so I don't think we should.

That's partially why in the internal impl we split the ambient CNI out into a separate container that was optionally enabled if ambient was enabled and disabled ambient mode in the existing container, such that it only handled sidecar CNI and the additional container handled only ambient CNI, effectively shipping 2 separate containers inside the Istio CNI daemonset.

That is very clean operationally and testing-wise but it introduces quite a bit of code duplication (not a bad thing in a private fork as it nicely buffers you from upstream) that I don't think would be acceptable to the Istio maintainers.

If I'm wrong about that I'd be happy to go back to the two-container model, where we have one sidecar CNI impl and one ambient CNI impl and they live in different trees and are built as 2 separate containers managing different sets of pods, deployed in the same daemonset - but keeping everything as 1 container and adding yet-another-flag on top of the ones we have seems like a needless maintenance mess when we have no back compat requirements for ambient.

@bleggett
Copy link
Contributor Author

bleggett commented Dec 11, 2023

can you split it into several prs or commits, i think it is difficult to review

I agree and can, but not in a way that would be functional, or would be mergable/pass CI.

I can do what we had done internally and PR a net-new standalone container that only does ambient CNI, and disables the existing ambient handling in the current container so that it only manages sidecars. That would still be a lot of (duplicated) code but it would be all net-new - that might be easier to review, it's hard for me to judge.

@bleggett bleggett force-pushed the bleggett/inpod-cni-agent branch 2 times, most recently from fc56f58 to 1f83e31 Compare December 11, 2023 19:15
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Dec 11, 2023
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Dec 11, 2023
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@bleggett
Copy link
Contributor Author

@howardjohn and @keithmattix I think I addressed/resolved most of the simple stuff, thanks - there are a few unresolved comments where I opened an issue to track as a followup, or there's an open question.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@bleggett
Copy link
Contributor Author

@howardjohn when you're ready to remove the DNM I'll cherry-pick to 1.21 - thanks!

@bleggett bleggett added the cherrypick/release-1.21 Set this label on a PR to auto-merge it to the release-1.21 branch label Jan 26, 2024
@howardjohn howardjohn removed the do-not-merge/hold Block automatic merging of a PR. label Jan 26, 2024
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
This reverts commit 316bd69.
@bleggett
Copy link
Contributor Author

bleggett commented Jan 26, 2024

Reverting 316bd69 - I can reproduce test flakes due to missed pod events with the mock kube client the tests use without this.

This is jogging my memory - this is why @yuval-k did the queueing in this manner - to work around the fact that the mock client in pkg/kube/client.go has slightly different behavior than the real thing, and otherwise in tests will miss events.

With the revert, the flake is gone (after ~500 iterations anyway) and this tracks with why we didn't see it internally for the past 6 months.

Digging into why the existing test rig kube client drops events without this manual requeue, but the above revert should be fine either way.

This is (we think) related to kubernetes/kubernetes#95372

Threw up #49027 - the double-queue we have here will work, but we might be able to arrive at a solution for tests that doesn't require that by either forcing more cache syncs, or fixing bugs in the fake kube client.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@istio-testing
Copy link
Collaborator

In response to a cherrypick label: new pull request created: #49028

liwenhao0810 pushed a commit to liwenhao0810/istio that referenced this pull request Feb 1, 2024
* Initial merge of out-of-tree inpod CNI

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Note

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove these, they're unused now that this is in-tree.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Some review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Doc capabilities for ztunnel

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove stray message type

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* More comments for iptables

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Switch back to in-tree assert lib and fix dupe queue

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comments/fixes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Minor fixes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comment - allowlist all node-local traffic

Do not selectively allowlist it by pod healthcheck ports.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Update cni/pkg/zdsapi/zds.proto

Co-authored-by: John Howard <howardjohn@google.com>

* Review comments

* Review comment cleanups

* Review: shallow-copy cache to avoid long lock

* Drop this - has special reqs+redundant with other tests

* Update cni/pkg/nodeagent/cni-watcher.go

Co-authored-by: Zhonghu Xu <xuzhonghu@huawei.com>

* Revert "Review comment - allowlist all node-local traffic"

Applying the updated iptables rules to the host breaks the host. will follow-up with a different suggestion.

This reverts commit 66bcbab.

* fix proto-gen

* remove port from health check redirection

The previous approach of not using pod ips at all cause our iptables rules to apply to host traffic.
This brings back the ipset, but only adds pod ips, not pod ports. to simplify the logic and make all host -> ambient pod traffic consistent.

* fix merge issue

* tidy-up

* don't enabled dns by default

* remove DNS_PROXY_ADDR from integration test values

* default mark to 0b1 instead of 0b11

* make redirect dns configurable

* avoid capturing pod ip to itself traffic

* read hello msg from ztunnel

* set EnableCNI to true in the ambient integration tests

This will cause some tests that are not supported in CNI mode to be skipped.

* fix iptables unit tests

* change mark to 1337

* Lints and fixups

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove full CmdAddEvent

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Add a relnote

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Quell gen

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* DNS redirection flags are not being respected

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove now-unused ambient redirectMode

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* small review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove unary "mode" switch

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Comment fixes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Update cni/README.md

Co-authored-by: Keith Mattix II <keithmattix2@gmail.com>

* Comment

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* nits, securityContext

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Attempt to handle slightly stale pod cache edge case

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Drop null checks

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Add proper knobs for DNS capture

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixup README

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Use maps.clone

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* no double queue

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Note to self: look at correct file

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Do not need to server.Start here

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Revert "no double queue"

This reverts commit 316bd69.

* Add a better explainer + wait for cache sync

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

---------

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Co-authored-by: John Howard <howardjohn@google.com>
Co-authored-by: Zhonghu Xu <xuzhonghu@huawei.com>
Co-authored-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Co-authored-by: Keith Mattix II <keithmattix2@gmail.com>
thedebugger pushed a commit to thedebugger/istio that referenced this pull request Mar 5, 2024
* Initial merge of out-of-tree inpod CNI

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Note

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixup

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove these, they're unused now that this is in-tree.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Some review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Doc capabilities for ztunnel

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove stray message type

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* More comments for iptables

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Switch back to in-tree assert lib and fix dupe queue

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comments/fixes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Minor fixes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comment - allowlist all node-local traffic

Do not selectively allowlist it by pod healthcheck ports.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Update cni/pkg/zdsapi/zds.proto

Co-authored-by: John Howard <howardjohn@google.com>

* Review comments

* Review comment cleanups

* Review: shallow-copy cache to avoid long lock

* Drop this - has special reqs+redundant with other tests

* Update cni/pkg/nodeagent/cni-watcher.go

Co-authored-by: Zhonghu Xu <xuzhonghu@huawei.com>

* Revert "Review comment - allowlist all node-local traffic"

Applying the updated iptables rules to the host breaks the host. will follow-up with a different suggestion.

This reverts commit 66bcbab.

* fix proto-gen

* remove port from health check redirection

The previous approach of not using pod ips at all cause our iptables rules to apply to host traffic.
This brings back the ipset, but only adds pod ips, not pod ports. to simplify the logic and make all host -> ambient pod traffic consistent.

* fix merge issue

* tidy-up

* don't enabled dns by default

* remove DNS_PROXY_ADDR from integration test values

* default mark to 0b1 instead of 0b11

* make redirect dns configurable

* avoid capturing pod ip to itself traffic

* read hello msg from ztunnel

* set EnableCNI to true in the ambient integration tests

This will cause some tests that are not supported in CNI mode to be skipped.

* fix iptables unit tests

* change mark to 1337

* Lints and fixups

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove full CmdAddEvent

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Add a relnote

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Quell gen

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* DNS redirection flags are not being respected

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove now-unused ambient redirectMode

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* small review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Remove unary "mode" switch

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Comment fixes

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Review comments

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Update cni/README.md

Co-authored-by: Keith Mattix II <keithmattix2@gmail.com>

* Comment

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* nits, securityContext

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Attempt to handle slightly stale pod cache edge case

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Drop null checks

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Add proper knobs for DNS capture

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Fixup README

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Use maps.clone

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* no double queue

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Note to self: look at correct file

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Do not need to server.Start here

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

* Revert "no double queue"

This reverts commit 316bd69.

* Add a better explainer + wait for cache sync

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>

---------

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Co-authored-by: John Howard <howardjohn@google.com>
Co-authored-by: Zhonghu Xu <xuzhonghu@huawei.com>
Co-authored-by: Yuval Kohavi <yuval.kohavi@gmail.com>
Co-authored-by: Keith Mattix II <keithmattix2@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh area/networking area/security cherrypick/release-1.21 Set this label on a PR to auto-merge it to the release-1.21 branch size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet