Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CNI] CNI binary for 1.19.6 crashes if sh is not on the host #48746

Closed
2 tasks done
Smeb opened this issue Jan 10, 2024 · 19 comments · Fixed by #48757
Closed
2 tasks done

[CNI] CNI binary for 1.19.6 crashes if sh is not on the host #48746

Smeb opened this issue Jan 10, 2024 · 19 comments · Fixed by #48757
Assignees

Comments

@Smeb
Copy link
Contributor

Smeb commented Jan 10, 2024

Is this the right place to submit this?

  • This is not a security vulnerability or a crashing bug
  • This is not a question about how to use Istio

Bug Description

We're running into an issue when we try to use Istio 1.19.6 on our EKS clusters. 1.19.5 works. We think it's because of this patch #48422. We're running on bottlerocket, which doesn't include a shell - that change means that exec.Command("sh" ...) is called on both code paths, while previously it was only run for older versions of iptables. This blocks us from upgrading to 1.19.6 (and presumably 1.20.2 which includes that logic).

The error we get (kubelet logs):

Warning   FailedCreatePodSandBox   Pod/httpbin-1-94bf7fddd-5m4zh     Failed to create pod sandbox: rpc error: code = Unknown
desc = failed to setup network for sandbox "809561b2f41924b90f7299a5b26c67189fa6c4527dcc7ab0cb7f59c5266a933b": plugin
type="istio-cni" name="istio-cni" failed (add): exec: "sh": executable file not found in $PATH

Not sure on the best fix if this is the issue. One (slightly hacky) option would be for the CNI binary to call itself to set up the mini-container environment, since that way we know the binary exists and can run.

Version

$ istioctl version --context sandbox20 (sandbox 20 is a testing cluster)
client version: 1.17.2
control plane version: 1.19.6
data plane version: 1.18.5 (6 proxies), 1.19.6 (2 proxies)

-> the mix of proxies is because 2 1.19.6 proxies were able to run since they run in excluded namespaces and don't reach the issue in the CNI logic.

Additional Information

No response

@howardjohn
Copy link
Member

@Smeb any chance you can try out #48757?

TAG=test-sh HUB=localhost:5000
CGO_ENABLED=0 BUILD_WITH_CONTAINER=0 BUILD_ALL=false DOCKER_TARGETS=docker.install-cni make dockerx.pushx
kubectl -n istio-system set image daemonset/istio-cni-node install-cni=${HUB}/install-cni:${TAG}

can do it (replace hub, probably)

@Smeb
Copy link
Contributor Author

Smeb commented Jan 10, 2024

Awesome turnaround! Can do tomorrow (GMT). Will get back to you when I've had a chance to try it.

@Smeb
Copy link
Contributor Author

Smeb commented Jan 11, 2024

Runs into a permission issue when changing just the image and nothing else:

Warning  FailedCreatePodSandBox  4m45s    kubelet  Failed to create pod sandbox: rpc error: code = Unknown
desc = failed to setup network for sandbox "250d4f5b075fbbbe9c8f29c67abd1073417782508dd0bded96c97aea6d787619": 
plugin type="istio-cni" name="istio-cni" failed (add): fork/exec /opt/cni/bin/istio-cni: permission denied

Here are some logs from running on the node (just to show the binary is there, running the patched version, and executable):

bash-5.1# /opt/cni/bin/istio-cni
CNI plugin istio-cni 1.21-dev
CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0
bash-5.1# /opt/cni/bin/istio-cni unshare
usage: unshare --lock-file=file -- <command> [args...]
bash-5.1# /opt/cni/bin/istio-cni unshare ls
bin  boot  dev  etc  home  lib  lib64  local  lost+found  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var  x86_64-bottlerocket-linux-gnu

Which looking in the kubelet logs is a result of

AVC avc:  denied  { mounton } for  pid=2545767 comm="istio-cni" path="/" dev="dm-0" ino=2 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:os_t:s0 tclass=dir permissive=0

Something we can fix, but indicates that it may cause issues for other folks.

@Smeb
Copy link
Contributor Author

Smeb commented Jan 11, 2024

Something else I saw is that in the case where we get an error, the logging doesn't work from inside cmdAdd. If I use the patch but just force it to work for our environment (by not doing the mount), then the logging works. I think this might be the fix.

@diranged
Copy link
Contributor

For reference, this also occurs on 1.20.2. We're running on BottleRocket nodes. Thanks for reporting this so quickly.

@howardjohn
Copy link
Member

Which looking in the kubelet logs is a result of

So this is SELinux blocking any mount calls, right? Do you have custom SELinux rules or is this default on bottlerocket?

@Smeb
Copy link
Contributor Author

Smeb commented Jan 12, 2024

We don't have any custom SELinux rules set, so it's the default (we're using aws-k8s-1.28/x86_64/latest). The policies are here. I do see some CNI related rules here and one exception where they allow writes to /etc/resolve.conf here - so maybe a change would be possible. I guess that might look something like (cni_exec_t etc_t (files (mount))) (though there is also a rule that prevents writing dynamic files explicitly).

I can try to take a look at the configuration today. One other option might be using chroot for the sandbox, because I think it should be possible to mount the files iptables needs in another context. I'll try and iterate a bit on your patch on our environment and see if I can get something that'll co-operate.

@Smeb
Copy link
Contributor Author

Smeb commented Jan 17, 2024

Small follow up - we also escalated this with AWS who are investigating. Might be they have a solution for the AWS Bottlerocket case that works for everyone.

I'm a little concerned that we don't have a solution (yet). Is this a behaviour we could gate behind a flag? Obviously the ideal would be to find something that we know works for all environments, but I'm a little concerned that might be tricky given the diversity of environments out there. I don't like the idea of complicating the configuration, but currently Bottlerocket users can't easily use the latest supported patch releases of the currently supported versions as is.

@bcressey
Copy link

Regarding this error:

AVC avc:  denied  { mounton } for  pid=2545767 comm="istio-cni" path="/" dev="dm-0" ino=2 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:os_t:s0 tclass=dir permissive=0

This is saying that istio-cni is not allowed to mount something directly on the host's /. (dm-0 refers to the dm-verity root filesystem which has the os_t label in most places.) I don't think that's actually what you're trying to accomplish so this particular denial may not be worth exploring.

However, if you did try to mount over /etc/nsswitch.conf, I expect you'd see a failure like this:

# made up example
AVC avc:  denied  { mounton } for  pid=??? comm="istio-cni" path="nsswitch.conf" dev="tmpfs" ino=??? scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0

From a security policy standpoint, it doesn't make sense to allow a process from an unprivileged container (the container_t bit in this output indicates unprivileged) to overmount a sensitive host file like /etc/nsswitch.conf. Bottlerocket could transition to a more privileged label when executing CNI plugins but that creates a different set of security concerns.

If the mount attempt were treated as optional, that would work for Bottlerocket, which has a very generic /etc/nsswitch.conf that won't trigger network calls for UID lookups. Otherwise, ideally you'd be able to point glibc to a specific NSS config via environment variable or something, but I'm not aware of a way to do that except via LD_PRELOAD which carries its own set of drawbacks.

@howardjohn
Copy link
Member

From a security policy standpoint, it doesn't make sense to allow a process from an unprivileged container

Ignoring selinux, is this true given we are unsharing the mount namespace? With this, the mount shouldn't impact other processes?

Even if thats true, selinux still may not take that into consideration, though.

Otherwise, ideally you'd be able to point glibc to a specific NSS config via environment variable or something, but I'm not aware of a way to do that except via LD_PRELOAD which carries its own set of drawbacks.

This is my top preference but I couldn't find a great solution (LD_PRELOAD didn't seem great).


Here is my current options in preference order:

  1. Come up with a way to make the mount work universally. It seems plausible this isn't going to happen
  2. Come up with a way to detect if we need the mount, and only use it if so. This seems fairly straightforward, "just" parse nsswitch.conf and see if its doing anything weird?
  3. Explicit user configuration to control whether we need to mess with nsswitch.conf as well. This is my least favorite option because it will mean bottlerocket (or similar) users are broken and will need to follow some document to configure things manually, which is a pain.

I'll explore these a bit, but input from others welcome as well. I am trying to setup a bottlerocket env so I can reliably test each of these as well (if someone knows a way to run bottlerocket in docker like kind -- let me know!).

In the meantime, I would recommend users on bottlerocket do not upgrade to 1.19.6/1.20.2. While I cannot make promises, I am confident we will implement at least option 3 by the next patch release (so you shouldn't need to worry about being stuck on an old version and some security patch comes out that is incompatible, for instance).

@howardjohn
Copy link
Member

I got a bottlerocket setup and can reproduce

However, if you did try to mount over /etc/nsswitch.conf, I expect you'd see a failure like this:

type=1400 audit(1705537456.860:6): avc: denied { mounton } for pid=30072 comm="istio-cni" path="/etc/nsswitch.conf" dev="tmpfs" ino=117 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0 is the real log we end up seeing.


Parsing nsswitch.conf seems like the type of thing that might be error prone, so we could change "Come up with a way to detect if we need the mount, and only use it if so" to "Try to overwrite it, but do not exit on error". The downside is we will get these audit messages though

howardjohn added a commit to howardjohn/istio that referenced this issue Jan 18, 2024
@howardjohn
Copy link
Member

One big question I had was why can we not unshare(CLONE_NEWNS); mount ... -- this is what containers do, and obviously containers run on bottlerocket.

I believe here we restrict CNI plugins: https://github.com/bottlerocket-os/bottlerocket/blob/7452c37e0b1abf19797a23bb7ab4d70bb0aef1cf/packages/selinux-policy/rules.cil#L90-L93 while the container runtime runs as another group (runtime_t) which has more access (maybe a bit off, first time dealing with SELinux rules...). So the CNI plugins essentially have less privilege than the container runtime?

@bcressey
Copy link

One big question I had was why can we not unshare(CLONE_NEWNS); mount ... -- this is what containers do, and obviously containers run on bottlerocket. ... So the CNI plugins essentially have less privilege than the container runtime?

Yes, the container runtime is relatively privileged - it needs to be able to change the label of processes it launches, which is a bit like sudo for mandatory access control mechanisms.

The answer to your first question is that SELinux is not namespace-aware, so it doesn't have a way to take the namespace into account when answering the question "does subject have permission to perform action on target ?" For this case, we might want to say "yes, as long as that action doesn't happen in the initial mount namespace" but there's no way to express that constraint.

"just" parse nsswitch.conf and see if its doing anything weird?

Go does this for "hosts" to determine which DNS resolver it should use; if it finds a mechanism it doesn't understand, it switches to the cgo resolver so that the system library is used instead.

In practice I expect you'll find that most Linux distros include an entry more complex than Bottlerocket's passwd: files. On my Fedora development machine, it's passwd: files sss systemd where sss could trigger network-based lookups and systemd wouldn't. There's no way to tell just from the name, so you'd need to maintain a list of known problematic cases that should trigger the overmount.

"Try to overwrite it, but do not exit on error". The downside is we will get these audit messages though

I wouldn't worry about the audit message very much if it's just one AVC denial when the pod starts up. That is pretty common in my experience, and relatively harmless if the software has a fallback path.

howardjohn added a commit to howardjohn/istio that referenced this issue Jan 18, 2024
@howardjohn
Copy link
Member

New iteration pushed up to #48757. This seems to work on GKE and Bottlerocket (where I have tested so far). Basically if we cannot setup the mounts we just continue anyways

@Smeb
Copy link
Contributor Author

Smeb commented Jan 22, 2024

Thanks for all the follow up work. I just ran the latest version in our environment and can confirm it works. We get a lot of audit messages (as expected), but otherwise everything looks good.

howardjohn added a commit to howardjohn/istio that referenced this issue Jan 22, 2024
istio-testing pushed a commit that referenced this issue Jan 22, 2024
* cni: allow running in environments without sh/mount

Fixes #48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note
howardjohn added a commit to howardjohn/istio that referenced this issue Jan 23, 2024
* cni: allow running in environments without sh/mount

Fixes istio#48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note

(cherry picked from commit 0b32950)
howardjohn added a commit to howardjohn/istio that referenced this issue Jan 23, 2024
* cni: allow running in environments without sh/mount

Fixes istio#48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note

(cherry picked from commit 0b32950)
(cherry picked from commit 4a81bd6)
@howardjohn
Copy link
Member

One warning to folks - between the regression in 1.18 and the fix, 1.18 is now EOL https://istio.io/latest/news/support/announcing-1.18-eol-final/. As such, this fix (and any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (couple of weeks)

istio-testing pushed a commit that referenced this issue Jan 23, 2024
* iptables: avoid `nsenter`ing twice (#48423)

We are already in the network namespace as this point, no need to enter
again (see `plugin.Program`)

(cherry picked from commit e93c47d)
(cherry picked from commit 7e8a5bd)

* cni: allow running in environments without sh/mount (#48757)

* cni: allow running in environments without sh/mount

Fixes #48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note

(cherry picked from commit 0b32950)
(cherry picked from commit 4a81bd6)
istio-testing pushed a commit that referenced this issue Jan 23, 2024
* iptables: avoid `nsenter`ing twice (#48423)

We are already in the network namespace as this point, no need to enter
again (see `plugin.Program`)

(cherry picked from commit e93c47d)

* cni: allow running in environments without sh/mount (#48757)

* cni: allow running in environments without sh/mount

Fixes #48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note

(cherry picked from commit 0b32950)
liwenhao0810 pushed a commit to liwenhao0810/istio that referenced this issue Feb 1, 2024
* cni: allow running in environments without sh/mount

Fixes istio#48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note
thedebugger pushed a commit to thedebugger/istio that referenced this issue Mar 5, 2024
* cni: allow running in environments without sh/mount

Fixes istio#48746

* wip

* working

* wip

* Finally looking reasonabler

* add release-note
@ajaykumarmandapati
Copy link

ajaykumarmandapati commented Apr 4, 2024

@howardjohn

One warning to folks - between the regression in 1.18 and the fix, 1.18 is now EOL https://istio.io/latest/news/support/announcing-1.18-eol-final/. As such, this fix (and any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (couple of weeks)

Hi, can you link in which patch release versions does this fix exist?
I did try with istio v1.20.4 and with 1.21.0 and still have the same issue

2024-04-04T13:13:49.120967Z	warn	cni	failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120970Z	info	cni	Running ip6tables-restore with the following input:
2024-04-04T13:13:49.120973Z	info	cni	Running command (without lock by env and nss): ip6tables-restore --noflush
2024-04-04T13:13:49.120976Z	warn	cni	failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120978Z	info	cni	Running command (without nss): iptables-save
2024-04-04T13:13:49.120981Z	warn	cni	failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120985Z	info	cni	Command output: 
# Generated by iptables-save v1.8.9 on Thu Apr  4 13:13:49 2024
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006

@ajaykumarmandapati
Copy link

@howardjohn

One warning to folks - between the regression in 1.18 and the fix, 1.18 is now EOL https://istio.io/latest/news/support/announcing-1.18-eol-final/. As such, this fix (and any other bugs or CVEs in Istio) will only be applied to 1.19+. 1.19 and 1.20 will be fixed in the next patch release (couple of weeks)

Hi, can you link in which patch release versions does this fix exist? I did try with istio v1.20.4 and with 1.21.0 and still have the same issue

2024-04-04T13:13:49.120967Z	warn	cni	failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120970Z	info	cni	Running ip6tables-restore with the following input:
2024-04-04T13:13:49.120973Z	info	cni	Running command (without lock by env and nss): ip6tables-restore --noflush
2024-04-04T13:13:49.120976Z	warn	cni	failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120978Z	info	cni	Running command (without nss): iptables-save
2024-04-04T13:13:49.120981Z	warn	cni	failed to setup execution environment, attempting to continue anyways: failed to remount /: permission denied
2024-04-04T13:13:49.120985Z	info	cni	Command output: 
# Generated by iptables-save v1.8.9 on Thu Apr  4 13:13:49 2024
*nat
:PREROUTING ACCEPT [0:0]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
:ISTIO_INBOUND - [0:0]
:ISTIO_IN_REDIRECT - [0:0]
:ISTIO_OUTPUT - [0:0]
:ISTIO_REDIRECT - [0:0]
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 15008 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15020 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15021 -j RETURN
-A ISTIO_INBOUND -p tcp -m tcp --dport 15090 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006

@howardjohn Is there a release with the fix? I see there are quite a few releases in the last weeks.

@howardjohn
Copy link
Member

#48757

fixed in 1.21.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants