Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linkerd without CNI as non-root #49

Merged
merged 2 commits into from
Nov 2, 2021

Conversation

chrischdi
Copy link
Contributor

Changes the image for being able to run linkerd without CNI as non-root

  • adds libcap and sets capabilities on binaries required for non-root
  • set USER to uid of nobody to run as non-root per default
  • replace usage of iptables-save by iptables -t nat -L

New PR to continue work initially done at #31.

Credits to @pankajmt which did the initial implementation.

I set the default user to the uid of nobody.
I did use the uid because otherwise kubernetes requires us to set runAsUser explicitly because a non-numeric user is not proven to not being root.

Christian Schlotter , Daimler TSS GmbH, Provider Information

Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chrischdi
Unfortunately the ConfigureFirewall() function in iptables/iptables.go relies on the format returned by iptables-save to determine whether the rules have already been setup. This is why the integration test is failing. I think for now the easiest would be to revert that particular change and then we can test how this all integrates with linkerd. Would be lovely if you could also submit the corresponding PR in the linkerd2 repo changing the manifests to run the init container as non-root 😉

@chrischdi
Copy link
Contributor Author

Hi @alpeb,

thank you for your response.

I'll take a look if I get iptables-save also to run as non root in this case. In my first try adding capabilities to the BusyBox binary did not work but I'll take another look :-)

If this works out I of course will also open the other PR, thanks :-)

@chrischdi chrischdi force-pushed the schloc/capabilities-for-nonroot branch from 880c09b to 863d13c Compare October 27, 2021 10:41
@chrischdi
Copy link
Contributor Author

I reverted to iptables-save, however I had add -t nat as arguments to iptables-save.

Otherwise it tries to read the wanted table from /proc/net/ip_tables_names which is only readable by root:

/ # cat /proc/net/ip_tables_names
nat
/ # ls -lh /proc/net/ip_tables_names
-r--r-----    1 root     root           0 Oct 27 10:43 /proc/net/ip_tables_names

However this should be safe because we only use -t nat when adjusting rules via iptables.

@chrischdi chrischdi force-pushed the schloc/capabilities-for-nonroot branch from 863d13c to 099dd70 Compare October 27, 2021 10:43
* adds libcap and sets capabilities on binaries required for non-root
* set USER to uid of nobody to run as non-root per default
* replace usage of `iptables-save` by `iptables-save -t nat`

Signed-off-by: Schlotter, Christian <christian.schlotter@daimler.com>
Signed-off-by: Schlotter, Christian <christian.schlotter@daimler.com>
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked great, thanks @chrischdi !
I tested with runAsNonRoot: true and runAsUser: 65534 on the init container securityContext, and it worked perfectly 👍

Looking forward for the followup PR in the linkerd2 repo 😉

@chrischdi
Copy link
Contributor Author

Sounds great :-) I opened the PR here: linkerd/linkerd2#7162 , however I assume this requires this PR to get merged first or at least I don't know how the proxy-init image gets updated there :-)

@alpeb
Copy link
Member

alpeb commented Oct 28, 2021

Well, it turns out we were also leveraging root privileges when using the --timeout-close-wait-secs flag, enabled through linkerd's proxyInit.closeWaitTimeoutSecs config. This was added to address an edge case about long-running requests (see #linkerd/linkerd2#4276). If that flag is enabled and the container isn't privileged, it'll crash. The offending call is done by sysctl. @chrischdi could you see if there's a chance to add extra capabilities to the sysctl binary in order to avoid this? Otherwise, one more complicated solution would be to create two separate linkerd2-proxy-init images, root vs non-root, to be used depending on the value of that flag...

@part-time-githubber
Copy link

Many thanks @chrischdi for re-surrecting and following this up. Our priorities have been focussed on other things for some time now.

@chrischdi
Copy link
Contributor Author

Well, it turns out we were also leveraging root privileges when using the --timeout-close-wait-secs flag, enabled through linkerd's proxyInit.closeWaitTimeoutSecs config. This was added to address an edge case about long-running requests (see #linkerd/linkerd2#4276). If that flag is enabled and the container isn't privileged, it'll crash. The offending call is done by sysctl. @chrischdi could you see if there's a chance to add extra capabilities to the sysctl binary in order to avoid this? Otherwise, one more complicated solution would be to create two separate linkerd2-proxy-init images, root vs non-root, to be used depending on the value of that flag...

I only see two options when proxyInit.closeWaitTimeoutSecs is configured:

  • instead of using sysctl in proxy-init, set the proper sysctl value for net.netfilter.nf_conntrack_tcp_timeout_close_wait via pod.spec.securityContext.sysctls.

  • or configure the pods to run as root with the following values, this even works using the same image as for non-root:

    {{- if .Values.proxyInit.closeWaitTimeoutSecs }}
    privileged: true
    runAsNonRoot: false
    runAsUser: 0
    {{- else }}
    privileged: false
    runAsNonRoot: true
    {{- end }}

@alpeb
Copy link
Member

alpeb commented Oct 29, 2021

Using securityContext.sysctls sounded promising, but I'm realizing it's only available at the pod level, not at the container level, so the proxy injector would have to mutate the pod instead of just adding metadata and (init-)containers, which would complicate things too much just for this edge case. If your second solution does the trick, I think that's the way to go! :-)

@chrischdi
Copy link
Contributor Author

I agree, the second option is way less complicated :-) I will adjust the other PR the next days according to option 2 👍

Copy link
Member

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @chrischdi, thanks for working on this! I'm still going through everything that has been discussed so far. I left a few comments, mostly out of curiosity on how this change will impact our future efforts to add tproxy support to iptables. You can read about it here if you're curious: linkerd/linkerd2#7089.

I'll give this a more in-depth review, my only concern is a bit superficial here: adding -t nat might get in the way of changing other tables.

As far as sysctl goes, I think your second option sounds good, it would be great to have the ability to run the init container as root in very specific scenarios. Here's what I think (aside from @alpeb's comments):
I know with sysctl we can also only configure some kernel parameters that are deemed "unsafe" by k8s. At the moment, we don't plan on making use of any unsafe params. I'm a bit wary of relying on securityContext.sysctl (and hence the first option suggested) because in the future, our requirements might change. It will be easier to make use of sysctl in code as opposed to enabling unsafe params on a per node basis (which would fall on the operator and kind of break our "works out of the box" philosophy, it would also just complicate everything).

My random thoughts aside, this looks really good!

@@ -336,7 +335,7 @@ func makeJumpFromChainToAnotherForAllProtocols(
}

func makeShowAllRules() *exec.Cmd {
return exec.Command("iptables-save")
return exec.Command("iptables-save", "-t", "nat")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted to iptables-save, however I had add -t nat as arguments to iptables-save.
Otherwise it tries to read the wanted table from /proc/net/ip_tables_names which is only readable by root:

We might be looking at operating on the mangle table to add support for tproxy in the coming weeks. Do you anticipate this getting more complicated? How would our ability to add rules in different tables be affected by this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding rules is just as easy as iptabels -t mangle ... However the makeShowAllRules should then also include the output of iptables-save -t mangle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, sounds good!

@chrischdi
Copy link
Contributor Author

Hey @chrischdi, thanks for working on this! I'm still going through everything that has been discussed so far. I left a few comments, mostly out of curiosity on how this change will impact our future efforts to add tproxy support to iptables. You can read about it here if you're curious: linkerd/linkerd2#7089.

I'll give this a more in-depth review, my only concern is a bit superficial here: adding -t nat might get in the way of changing other tables.

As mentioned above: the changes here should still allow adding rules to the mangle table via iptables -t mangle .... However, it would require to concatenate the output of iptables-save -t mangle in the makeShowAllRules function to the output of iptables-save -t nat in this function to still see all added rules.

As far as sysctl goes, I think your second option sounds good, it would be great to have the ability to run the init container as root in very specific scenarios. Here's what I think (aside from @alpeb's comments): I know with sysctl we can also only configure some kernel parameters that are deemed "unsafe" by k8s. At the moment, we don't plan on making use of any unsafe params. I'm a bit wary of relying on securityContext.sysctl (and hence the first option suggested) because in the future, our requirements might change. It will be easier to make use of sysctl in code as opposed to enabling unsafe params on a per node basis (which would fall on the operator and kind of break our "works out of the box" philosophy, it would also just complicate everything).

Setting sysctls directly from inside the container always requires the root user. This behavior is not adjustable via a container image because it is a mounted filesystem and only the root user / uid 0 has write access to the files (sysctl -w is just a shortcut to do a echo "value" >> /proc/sys/my/path) and if I'm right always requires privileged mode to be even writeable for a root user:

$ ls /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait -lh
-rw-r--r-- 1 root root 0 Nov  2 11:04 /proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait
$ mount | grep '/proc/sys '
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)

So setting safe or unsafe sysctls the "non-root" way would always result in using the pods securityContext.sysctl.

To help on the "works out of the box" philosophy: for me it would also be okay to "opt-in" for the non-root part on the linkerd chart.

So to only enable nonroot for the proxy-init container as an edge case.
For doing this the changes here would be still required.
But on the linkerd/linkerd2 part it could look like something like this:

{{- if .Values.proxyInit.closeWaitTimeoutSecs }}
privileged: true
runAsNonRoot: false
runAsUser: 0
{{- else }}
{{- if .Values.proxyInit.nonRoot }}
privileged: false
runAsNonRoot: true
{{- else }}
runAsNonRoot: false
runAsUser: 0
{{- end }}

This would keep the current behaviour, but allow everyone to opt-in to a nonRoot working proxyInit container (expect closeWaitTimeoutSecs is wanted.

My random thoughts aside, this looks really good!

Thanks a lot :-)

Copy link
Member

@mateiidavid mateiidavid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @chrischdi! The sysctl part makes a lot of sense, thanks for confirming. From my side, this is good to 🚢 📦!

@@ -336,7 +335,7 @@ func makeJumpFromChainToAnotherForAllProtocols(
}

func makeShowAllRules() *exec.Cmd {
return exec.Command("iptables-save")
return exec.Command("iptables-save", "-t", "nat")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, sounds good!

@alpeb
Copy link
Member

alpeb commented Nov 2, 2021

I'll get this merged and release v1.5.0. I think this should be launched as non-root by default, but we can discuss that in the linkerd2 PR 🙂

@alpeb alpeb merged commit 593abd1 into linkerd:master Nov 2, 2021
alpeb added a commit to linkerd/linkerd2 that referenced this pull request Nov 2, 2021
To include the changes from linkerd/linkerd2-proxy-init#49 (allow the
proxy-init image to be run as non-root)
alpeb added a commit to linkerd/linkerd2 that referenced this pull request Nov 3, 2021
To include the changes from linkerd/linkerd2-proxy-init#49 (allow the
proxy-init image to be run as non-root)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants