
kube_proxy running under a static kernel reports false-positive module load failures #69006

Closed
kcgen opened this issue Sep 24, 2018 · 7 comments
Labels
area/kube-proxy kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments


kcgen commented Sep 24, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
Building a fully static Linux kernel with kube_proxy's desired IP_VS (IP Virtual Server) features linked in statically:

CONFIG_IP_VS=y
CONFIG_IP_VS_RR=y
CONFIG_IP_VS_WRR=y
CONFIG_IP_VS_SH=y
CONFIG_NF_CONNTRACK_IPV4=y

causes kube_proxy to conclude that these features don't exist, because it can't find loadable modules for them:

W0923 18:20:43.979259       1 proxier.go:469] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0923 18:20:43.979848       1 proxier.go:469] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0923 18:20:43.980415       1 proxier.go:469] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0923 18:20:43.981481       1 proxier.go:469] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W0923 18:20:43.982052       1 proxier.go:469] Failed to load kernel module nf_conntrack_ipv4 with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules

What you expected to happen:

If kube_proxy's desired kernel features are linked statically into the kernel, kube_proxy should detect them and carry on, rather than attempting module loads and throwing error messages about modules not being found.

Under the hood, kube_proxy should first check for the existence of the desired kernel features in the /proc tree (e.g. /proc/sys/net/ipv4/vs, among others) before attempting to look for and load modules. Only if that check and the subsequent module loads both fail should it throw the error messages and switch to iptables. A sketch of this check order follows.
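
Here's a minimal sketch of that check order, assuming a bare probe of /proc/sys/net/ipv4/vs is sufficient to detect built-in IPVS (helper names like ensureModule are hypothetical, not kube_proxy's actual API):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// ipvsBuiltIn reports whether IPVS support is already present in the
// running kernel: a built-in (or already-loaded) ip_vs exposes this
// /proc directory. A single probe is a simplification; the scheduler
// variants (rr/wrr/sh) would need their own checks.
func ipvsBuiltIn() bool {
	_, err := os.Stat("/proc/sys/net/ipv4/vs")
	return err == nil
}

// ensureModule falls back to modprobe only when the /proc probe fails.
func ensureModule(name string) error {
	if ipvsBuiltIn() {
		return nil // feature is compiled in; nothing to load
	}
	if out, err := exec.Command("modprobe", name).CombinedOutput(); err != nil {
		return fmt.Errorf("failed to load %s: %v (%s)", name, err, out)
	}
	return nil
}

func main() {
	for _, m := range []string{"ip_vs", "ip_vs_rr", "ip_vs_wrr", "ip_vs_sh", "nf_conntrack_ipv4"} {
		if err := ensureModule(m); err != nil {
			// Only now warn and let the caller fall back to iptables.
			fmt.Fprintln(os.Stderr, err)
		}
	}
}
```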

How to reproduce it (as minimally and precisely as possible):

  1. Download and extract the current stable kernel source version (today it's 4.18.9) from kernel.org
  2. Copy the running system's /boot/config-<old_kernel_version> into the source tree as .config
  3. Run make oldconfig and answer any questions
  4. Run make localyesconfig to switch to a fully static kernel
  5. Modify .config and ensure the following exist, and are not =m:
    CONFIG_IP_VS=y
    CONFIG_IP_VS_RR=y
    CONFIG_IP_VS_WRR=y
    CONFIG_IP_VS_SH=y
    CONFIG_NF_CONNTRACK_IPV4=y
    
  6. Run make menuconfig and deselect [ ] Enable loadable module support (box is empty)
  7. make -j$(nproc)
  8. cp arch/x86/boot/bzImage /boot/vmlinuz-4.18.9-kube_proxy_mods_static
  9. cp .config /boot/config-4.18.9-kube_proxy_mods_static
  10. Regenerate your boot loader's config (grub2-mkconfig -o /boot/grub2/grub.cfg) or (grub-mkconfig -o /boot/grub/grub.cfg) depending on your distribution.
  11. Reboot and select your new kernel with the kube_proxy modules linked statically.
  12. Start kube_proxy and watch it throw the above errors about not finding the modules.

Anything else we need to know?:

Fully static kernels will not generate /lib/modules/$kernel_version/modules.builtin, so inspecting that file (which kube_proxy's source also does) will fail. The gold standard is to simply check for the features in the live /proc/... tree first, and consider those "built-in".

The next-best option is checking /boot/config-$(uname -r) for CONFIG_<FEATURE>=y entries, which tell you the feature is built in. Note that /boot/config-<version> files are almost universally installed by OS maintainers.

Finally, you could check /proc/config.gz, a gzip-compressed copy of the kernel's .config, in the same manner as above; however, it only exists if the maintainer built that feature into the kernel (CONFIG_IKCONFIG_PROC=y), so it may be absent. A combined sketch of these two fallbacks follows.
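
A combined sketch of these two fallbacks, assuming a simple line-match against CONFIG_<FEATURE>=y is enough (function names are illustrative, not kube_proxy's actual code):

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"strings"
)

// kernelConfig returns a reader over the running kernel's build config,
// preferring /boot/config-<release> and falling back to /proc/config.gz
// (which exists only when the kernel was built with CONFIG_IKCONFIG_PROC=y).
func kernelConfig() (io.Reader, func() error, error) {
	if release, err := os.ReadFile("/proc/sys/kernel/osrelease"); err == nil {
		path := "/boot/config-" + strings.TrimSpace(string(release))
		if f, err := os.Open(path); err == nil {
			return f, f.Close, nil
		}
	}
	f, err := os.Open("/proc/config.gz")
	if err != nil {
		return nil, nil, err
	}
	gz, err := gzip.NewReader(f)
	if err != nil {
		f.Close()
		return nil, nil, err
	}
	return gz, f.Close, nil
}

// builtIn reports whether an option such as "CONFIG_IP_VS" appears as
// CONFIG_<FEATURE>=y, i.e. is compiled into the kernel.
func builtIn(option string) bool {
	r, closeFn, err := kernelConfig()
	if err != nil {
		return false // no config available; caller should try modprobe next
	}
	defer closeFn()
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		if scanner.Text() == option+"=y" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println("IPVS built in:", builtIn("CONFIG_IP_VS"))
}
```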

For reference, here's the list of modules kube_proxy wants: https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L161

Here is kube_proxy's errant module-centric code that needs to be made compatible with static kernels:
https://github.com/kubernetes/kubernetes/blob/master/pkg/proxy/ipvs/proxier.go#L433

Environment:

  • Kubernetes version (use kubectl version): 1.11.3
  • Cloud provider or hardware configuration: On premise
  • OS (e.g. from /etc/os-release): Leap 15.0
  • Kernel (e.g. uname -a): 4.18.9 statically linked
  • Install tools: gcc 8.1
  • Others:
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 24, 2018
@shubheksha
Contributor

/sig network
/area kube-proxy

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. area/kube-proxy and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 24, 2018

kasisnu commented Sep 28, 2018

Hi

If there isn't a hurry, can I volunteer to look into this? I don't have much experience in this area so it'll probably take me a while to figure things out.

New to the k8s codebase.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 1, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 31, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrewrynhard
Contributor

Any way we can get this reopened, or should I create a new issue? We are running into this in https://github.com/talos-systems/talos.
