
Fix for wrong iptables binary (#49279) + general iptables wrapper correctness pass #49703

Merged: 23 commits into istio:master, Mar 11, 2024

Conversation

bleggett (Contributor) commented Mar 5, 2024:


Background

While looking at a fix for #49279, I realized that our iptables binary selection logic across ambient and sidecar was just fundamentally incorrect, and the effort to fix that bled into the correctness rework I had wanted to do on both the inpod and sidecar iptables wrappers since merging inpod.

For context, there are 3 basic iptables binaries we use.

  • iptables
  • iptables-save
  • iptables-restore

But with the kernel legacy/nft split, there are parallel binaries for each of those, and again for IPv6 tables, so the actual potential binary set on any given Linux machine can be some combination of:

  • iptables-legacy
  • iptables-nft
  • ip6tables
  • iptables-legacy-save
  • iptables-nft-save
  • iptables-legacy-restore
  • iptables-nft-restore
  • ip6tables-legacy
    ...and on and on - the full matrix is potentially around 12 different binaries. Which is ridiculous, but common. (The sketch just below shows one way to see what's actually present.)
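
To make the matrix concrete, here is a minimal, standalone Go sketch (illustrative only, not the wrapper code itself) that probes `$PATH` for every variant:

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// The full potential matrix: base, legacy, and nft variants, for both
	// IPv4 and IPv6, each with save/restore siblings.
	variants := []string{
		"iptables", "iptables-save", "iptables-restore",
		"iptables-legacy", "iptables-legacy-save", "iptables-legacy-restore",
		"iptables-nft", "iptables-nft-save", "iptables-nft-restore",
		"ip6tables", "ip6tables-legacy", "ip6tables-nft",
	}
	for _, bin := range variants {
		if path, err := exec.LookPath(bin); err == nil {
			fmt.Printf("found %s at %s\n", bin, path)
		}
	}
}
```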

We use our iptables wrapper lib in 4 different ways across sidecar and ambient, and in each case "which iptables binary should we use?" is a question with a different answer:

| Usage | Using iptables binaries from | Networking context to select correct binary against |
| --- | --- | --- |
| from CNI plugin (sidecar) | host `$PATH` | pod netns context |
| from init container (sidecar) | init container `$PATH` | pod netns context |
| from CNI agent (ambient) | CNI container `$PATH` | pod netns context |
| from CNI agent (ambient) | CNI container `$PATH` | host netns context |

If, for instance, the host has both iptables-legacy and iptables-nft in $PATH, which should we use? We should check whether any rules are defined in the nft tables at all and prefer that; if no rules are in the nft tables, but the legacy binary is available and rules are defined in the legacy tables, we should use the legacy binary. If no rules are defined in either, we should use the system default.

If we are running directly on the host (not in a container), we can just assume that whatever binary (legacy or nft) is aliased to iptables is correct, use that, and be done. This is what Kubernetes does for the kubelet.

However, we do something a little weirder than Kubernetes: we also have a CNI, and that runs iptables from inside a container against the host netns. The CNI container ships its own iptables binaries, which may differ from the host's. So we have to match the rules present in the host netns (which we can see) with the correct binary among those we actually have access to (only our own, baked into the container) - we cannot rely purely on the Kubernetes logic here, because we are not only concerned with a kubelet-style context.

So basically, selecting the right binary for one of the above cases is a heuristic with 2 inputs (see the sketch after this list):

  1. Does the particular binary variant (legacy, nft, ip6tables) even exist in $PATH?
  2. In the current netns context, are there already rules defined in one of (legacy, nft) tables? If so, prefer that one.
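
A minimal Go sketch of that two-input heuristic, assuming that counting `-A` lines in `*-save` output is a good-enough proxy for "rules are defined here" (function names are illustrative, not the actual wrapper API):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// ruleCount runs a *-save binary in the current netns context and counts
// rule lines ("-A ..." entries); a nonzero count means rules are already
// defined for that variant in this context.
func ruleCount(saveBin string) int {
	out, err := exec.Command(saveBin).CombinedOutput()
	if err != nil {
		return 0
	}
	n := 0
	for _, line := range strings.Split(string(out), "\n") {
		if strings.HasPrefix(line, "-A ") {
			n++
		}
	}
	return n
}

// selectIptables combines input 1 (does the variant exist in $PATH?) with
// input 2 (does that variant already have rules defined in this netns?).
func selectIptables() string {
	if _, err := exec.LookPath("iptables-nft-save"); err == nil && ruleCount("iptables-nft-save") > 0 {
		return "iptables-nft"
	}
	if _, err := exec.LookPath("iptables-legacy-save"); err == nil && ruleCount("iptables-legacy-save") > 0 {
		return "iptables-legacy"
	}
	// No rules in either table set: fall back to the system default.
	return "iptables"
}

func main() {
	fmt.Println("selected:", selectIptables())
}
```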

This was the cause of #49279 - we were using the in-container default binary, without inspecting host rules to see if that was actually the one we should use in the host context.

Really, calling code should never assume which binary variant of iptables is selected; it should simply declare the type of operation (insert/save/restore) and let the detection logic pick the correct binary for the environmental context, roughly as sketched below.
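
A rough sketch of that calling convention (the struct fields and method here are invented for illustration; only the IptablesVersion name appears in the actual change):

```go
package main

import "os/exec"

// Operation is the only thing callers specify; they never name a binary.
type Operation int

const (
	OpInsert Operation = iota
	OpSave
	OpRestore
)

// iptablesVersion mirrors the idea of a detected-version struct; these
// field names are hypothetical, chosen for the sketch.
type iptablesVersion struct {
	binary        string // e.g. "iptables-nft", chosen by detection
	saveBinary    string // e.g. "iptables-nft-save"
	restoreBinary string // e.g. "iptables-nft-restore"
}

// run maps the declared operation onto whichever concrete binary the
// detection logic selected for this netns context.
func (v iptablesVersion) run(op Operation, args ...string) error {
	bin := v.binary
	switch op {
	case OpSave:
		bin = v.saveBinary
	case OpRestore:
		bin = v.restoreBinary
	}
	return exec.Command(bin, args...).Run()
}

func main() {
	v := iptablesVersion{binary: "iptables", saveBinary: "iptables-save", restoreBinary: "iptables-restore"}
	// The caller declares intent only; the binary choice stays inside the wrapper.
	_ = v.run(OpSave)
}
```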

But a lot of our code just assumes binary names, and when used in some contexts the result is simply incorrect.

What this does

This PR fixes that by reworking the wrapper lib's binary detection logic so it works in all contexts where we use it, relying fully on binary autodetection in all spots, properly handling the iptables/ip6tables variants (which, because of successive refactorings, we were not handling correctly, as the updated test goldens show), and fixing some incomplete or simply incorrect iptables tests.

The end result is that wherever our iptables wrapper is used, the correct iptables binary should be selected regardless of whether the context is a pod, the host, a container, or the host filesystem, and regardless of which "default" binary is configured in either the container or the host.

Improves but does not fix

  • Dual-stack iptables rules around ambient and sidecar. In ambient we always create ip6tables rules in parallel in all spots. In sidecar it's a bit messier; I think there are a few spots where we insert IPv6 rules for some feature flags but not others, and either way there's not a lot of testing to check that. In the interest of not ballooning the PR further, I left this largely as-is - but we are explicitly detecting the IPv6 binary variant now.

@bleggett bleggett added the do-not-merge/work-in-progress label Mar 5, 2024
@bleggett bleggett requested a review from a team as a code owner Mar 5, 2024 01:13
@istio-policy-bot istio-policy-bot added the area/ambient, area/dual-stack, area/networking, and release-notes-none labels Mar 5, 2024
@istio-testing istio-testing added the size/XXL label Mar 5, 2024
@bleggett bleggett changed the title (WIP) Fix for wrong iptables binary (#49279) + a bit of iptables cleanup between sidecar and inpod. (WIP) Fix for wrong iptables binary (#49279) + general iptables wrapper correctness pass Mar 5, 2024
@bleggett bleggett changed the title (WIP) Fix for wrong iptables binary (#49279) + general iptables wrapper correctness pass Fix for wrong iptables binary (#49279) + general iptables wrapper correctness pass Mar 6, 2024
@istio-testing istio-testing removed the do-not-merge/work-in-progress label Mar 6, 2024
```go
		ExistingRules: existingRules,
	}, nil
}
return IptablesVersion{}, fmt.Errorf("iptables save binary: %s not present", iptablesSaveBin)
```
sridhargaddam (Contributor) commented:

We can capture the err in Line 75 and use the following log message.

Suggested change:

```diff
-return IptablesVersion{}, fmt.Errorf("iptables save binary: %s not present", iptablesSaveBin)
+return IptablesVersion{}, fmt.Errorf("error executing the %s binary: %v", iptablesSaveBin, err)
```

bleggett (Contributor, Author) replied Mar 8, 2024:

Yeah, the reason I did not capture the err on 75 is that it's redundant:

if `iptables-save --version` fails, `iptables-save` will fail, and vice versa, so I only check and return for the first failure.

There is no real scenario where the first would not fail but the second would, so there is no need to capture both.

sridhargaddam (Contributor) replied:

Ok. Let's change the log message to include the execErr.

The current message could be misleading to users, i.e., the failure need not be due to a missing binary. As you know, there could be many other reasons for the command to fail.

bleggett (Contributor, Author) replied:

@sridhargaddam you were right! There is a difference :D

#49850

A contributor commented:

@bleggett thanks for the PR. I noticed a small typo in the log message and pushed the following PR to address it.
#49848

CC: @howardjohn

```go
// binary is `legacy`, or vice-versa - we must match the binaries we have in our $PATH to what rules are actually defined
// in our current netns context.
//
// Q: Why not simply "use the host default binary" at $PATH/iptables?
```
sridhargaddam (Contributor) commented:

An alternate way to select between the two modes at run time is the approach explained in the following repo: https://github.com/kubernetes-sigs/iptables-wrappers

sridhargaddam (Contributor) commented:

I'm not completely familiar with the Istio code base. Should we update this code as well? (It's forcing iptables to always use legacy.)

```dockerfile
&& update-alternatives --set iptables /usr/sbin/iptables-legacy \
```

bleggett (Contributor, Author) replied Mar 8, 2024:

> An alternate way to select between the two modes at run time is the approach explained in the following repo: https://github.com/kubernetes-sigs/iptables-wrappers

Yes - as mentioned in the PR description, this will not work for us because we do not always have access to the host binaries, since we are not always running in the host FS. kubelet does, so k8s is fine doing it this way. Their heuristic for detecting which binary to use (checking for existing rules) is similar to what we are doing here, however.

> I'm not completely familiar with the Istio code base. Should we update this code as well? (It's forcing iptables to always use legacy.)
>
> `&& update-alternatives --set iptables /usr/sbin/iptables-legacy \`

It doesn't matter: with this PR we prefer the legacy or nft binaries explicitly, so the "container default" binary is moot - we can't rely on the container default being the same as the host default, so the container default isn't relevant to our selection heuristic.

With this PR we no longer need to force a container default, so we can remove that line in a followup if we wish - it won't change anything either way.

sridhargaddam (Contributor) replied Mar 11, 2024:

> Yes - as mentioned in the PR description, this will not work for us because we do not always have access to the host binaries, since we are not always running in the host FS.

We don't need access to the host binaries inside the pod. We can add the logic to the Dockerfile. As an example, you can see the following file: https://github.com/submariner-io/submariner/blob/3248d61fd6e20a6cb1d877ce75e30ba2a226faa6/package/Dockerfile.submariner-route-agent#L39

However, one requirement is that the pod should have privileges to run the iptables command (I guess the pod will have them, as it needs to program some iptables rules).

> It doesn't matter: with this PR we prefer the legacy or nft binaries explicitly, so the "container default" binary is moot - we can't rely on the container default being the same as the host default, so the container default isn't relevant to our selection heuristic.

Right, I understand that. If the container default points to the appropriate binary, then we don't have to make any changes in the Istio Go code.
Anyway, I don't have any strong opinions on the approach, and even the approach of selecting iptables-nft/legacy inside the pod would work.

bleggett (Contributor, Author) replied Mar 11, 2024:

> Yes - as mentioned in the PR description, this will not work for us because we do not always have access to the host binaries, since we are not always running in the host FS.

> We don't need access to the host binaries inside the pod. We can add the logic to the Dockerfile. As an example, you can see the following file: https://github.com/submariner-io/submariner/blob/3248d61fd6e20a6cb1d877ce75e30ba2a226faa6/package/Dockerfile.submariner-route-agent#L39

Since we are jumping into arbitrary pod network namespaces at runtime in some contexts, we cannot select the correct binary except dynamically, at that point. So it is not something we can accurately detect in the Dockerfile, even if we wanted to.

> Right, I understand that. If the container default points to the appropriate binary, then we don't have to make any changes in the Istio Go code. Anyway, I don't have any strong opinions on the approach, and even the approach of selecting iptables-nft/legacy inside the pod would work.

The problem is that the container default can never point to the appropriate binary in all cases where the wrapper is invoked, no matter what we do with it. So we cannot rely on the system default binary under any circumstances.

```go
// 3. If so, use `nft` binary set
// 4. Otherwise, see if we have `legacy` binary set, and use that.
// 5. Otherwise, see if we have `iptables` binary set, and use that (detecting whether it's nft or legacy).
func (r *RealDependencies) DetectIptablesVersion(ipV6 bool) (IptablesVersion, error) {
```
A member commented:

should we have some logging around this? (I may have missed it)

bleggett (Contributor, Author) replied:

The only thing that stopped me was that the logger wasn't already imported here, but I might just add it and put in some logging, yeah.

@istio-testing istio-testing merged commit 8592312 into istio:master Mar 11, 2024
28 checks passed
bleggett added a commit to bleggett/istio that referenced this pull request Mar 11, 2024