[WSL2] Sync failed errors in kube-proxy for Service with SessionAffinity: ClientIP #1740

Closed
valeneiko opened this issue Jul 20, 2020 · 17 comments · Fixed by #2337
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/external upstream bugs kind/support Categorizes issue or PR as a support question.

Comments

@valeneiko
Contributor

What happened:
iptables rules fail to be updated on the nodes after a Service with sessionAffinity: ClientIP is created.
The issue manifests as requests being dropped for any Services that were created after the Service with session affinity.

kube-proxy pod is logging the following error:

E0720 14:29:10.934607       1 proxier.go:1507] Failed to execute iptables-restore: exit status 2 (iptables-restore v1.8.3 (legacy): Couldn't load match `recent':No such file or directory

Error occurred at line: 96
Try `iptables-restore -h' or 'iptables-restore --help' for more information.
)
I0720 14:29:10.934636       1 proxier.go:779] Sync failed; retrying in 30s

What you expected to happen:
iptables to be updated correctly so that requests could be routed to any Service in the cluster.

How to reproduce it (as minimally and precisely as possible):
Create a Service with sessionAffinity: ClientIP

apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9093
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
  sessionAffinity: ClientIP
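
A quick way to confirm you are hitting this specific failure is to scan the kube-proxy logs for the "Couldn't load match `recent'" error quoted above. The sketch below is a minimal example under a few assumptions: kube-proxy runs as pods labelled k8s-app=kube-proxy in kube-system (the kubeadm default, which kind uses), and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting the xt_recent failure described in this issue.

# Reads kube-proxy log text on stdin; succeeds if it contains the
# iptables-restore error about the missing `recent` match.
has_recent_match_error() {
  grep -Eq "Couldn't load match .recent'"
}

# Assumes kube-proxy pods carry the label k8s-app=kube-proxy in kube-system.
check_kube_proxy() {
  if kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=500 \
      | has_recent_match_error; then
    echo "kube-proxy cannot load the iptables 'recent' match (xt_recent missing?)"
    return 1
  fi
  echo "no 'recent' match errors found in recent kube-proxy logs"
}
```

After applying the Service above and giving kube-proxy time for a sync, run check_kube_proxy. A non-zero exit suggests the node kernel cannot load the `recent` match.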

Anything else we need to know?:
The issue is reproducible with both kubeProxyMode: iptables (default) and kubeProxyMode: ipvs.

Environment:

  • kind version:
    • kind v0.8.1 go1.13.8 linux/amd64
    • kind v0.9.0-alpha+95753c11434213 go1.15beta1 linux/amd64
  • Kubernetes version:
    • Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.5", GitCommit:"e0fccafd69541e3750d460ba0f9743b90336f24f", GitTreeState:"clean", BuildDate:"2020-05-01T02:11:15Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
    • also tried v1.18.2 (default with kind v0.8.1) and v1.18.6 (default with kind v0.9.0-alpha)
  • Docker version: Docker Desktop with WSL2
Client:
 Debug Mode: false

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 19.03.8
 Storage Driver: overlay2
  Backing Filesystem: <unknown>
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.104-microsoft-standard
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 25GiB
 Name: docker-desktop
 ID: D4I2:L4Y5:PGPS:CEUY:H3TU:C33L:HASQ:VZKB:53SE:SHQG:OOQV:BZMQ
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 48
  Goroutines: 57
  System Time: 2020-07-20T15:35:34.5632783Z
  EventsListeners: 3
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
  • OS: Windows 10 (Build: 19041.388)
@valeneiko valeneiko added the kind/bug Categorizes issue or PR as related to a bug. label Jul 20, 2020
@BenTheElder BenTheElder changed the title Sync failed errors in kube-proxy for Service with SessionAffinity: ClientIP [WSL2] Sync failed errors in kube-proxy for Service with SessionAffinity: ClientIP Jul 20, 2020
@aojea
Contributor

aojea commented Jul 20, 2020

hmm, I think a kernel module is missing; if I'm correct, it should be xt_recent.
@PatrickLang you are the WSL2 expert, how is it possible to include this module?
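
One way to test the xt_recent hypothesis directly, without touching live rules, is to feed a throwaway ruleset that uses `-m recent` to `iptables-restore --test`, which parses the rules without committing them. This is a sketch: the chain name XT-RECENT-PROBE and list name xt_recent_probe are made up for the probe, it needs to run as root on the node (or inside the WSL2 distro), and the exact error text may vary between iptables versions.

```shell
#!/usr/bin/env bash
# Hypothetical probe: can iptables load the `recent` match on this kernel?

# Prints a tiny filter-table ruleset that uses `-m recent` in a scratch chain.
# The chain and list names are invented for this probe.
recent_probe_ruleset() {
  cat <<'EOF'
*filter
:XT-RECENT-PROBE - [0:0]
-A XT-RECENT-PROBE -m recent --name xt_recent_probe --rcheck --seconds 60 -j RETURN
COMMIT
EOF
}

# Run as root. --test parses the ruleset without committing it and --noflush
# leaves existing rules alone; a non-zero exit (with a "Couldn't load match"
# message) points at missing kernel support for xt_recent.
probe_recent_match() {
  recent_probe_ruleset | iptables-restore --test --noflush
}
```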

@BenTheElder BenTheElder removed the kind/bug Categorizes issue or PR as related to a bug. label Jul 20, 2020
@BenTheElder
Member

kind is not going to mess with your kernel modules so bug => support

If Docker Desktop is missing a module, that's probably hard to fix as an end user, but they might be willing to fix it, seeing as they also offer running the Docker Desktop VM as a single fixed-version Kubernetes node instead of just dockerd.

@BenTheElder BenTheElder added the kind/support Categorizes issue or PR as a support question. label Jul 20, 2020
@BenTheElder
Member

/kind external

@k8s-ci-robot k8s-ci-robot added the kind/external upstream bugs label Jul 21, 2020
@BenTheElder BenTheElder added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 27, 2020
@aojea aojea removed their assignment Sep 9, 2020
@tallaxes

tallaxes commented Oct 2, 2020

Yes, it looks like the current WSL2 Kernel is built without xt_recent, needed by iptables -m recent ... which kube-proxy uses to implement sessionAffinity: ClientIP. Custom Kernel built with CONFIG_NETFILTER_XT_MATCH_RECENT=y fixed it for me. Submitted microsoft/WSL2-Linux-Kernel#198 (4.19.y) and microsoft/WSL2-Linux-Kernel#199 (5.4.y)
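
To see whether a given kernel was built with this option, you can grep its config. A minimal sketch follows. It assumes the running kernel exposes its config at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC), and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a kernel .config on stdin and succeeds if
# xt_recent is built in (=y) or available as a module (=m).
has_xt_recent() {
  grep -Eq '^CONFIG_NETFILTER_XT_MATCH_RECENT=(y|m)$'
}

# Checks the running kernel, assuming it exposes its config at /proc/config.gz.
check_running_kernel() {
  if [ ! -r /proc/config.gz ]; then
    echo "/proc/config.gz not available; cannot inspect kernel config" >&2
    return 2
  fi
  if zcat /proc/config.gz | has_xt_recent; then
    echo "xt_recent: enabled"
  else
    echo "xt_recent: missing (kube-proxy cannot program sessionAffinity: ClientIP)"
    return 1
  fi
}
```

The same helper works on a source tree before building, e.g. `has_xt_recent < Microsoft/config-wsl`.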

@BenTheElder
Copy link
Member

thanks @tallaxes !

@WSLUser

WSLUser commented Oct 6, 2020

If someone wants to compile the 5.10 LTS kernel for WSL2 with this option enabled, take a look at https://github.com/WSLUser/WSL2-Linux-Kernel/blob/linux-msft-wsl-5.10.y/Microsoft/config-wsl. Follow https://wsl.dev/wsl2-kernel-zfs/ for the steps to compile your own kernel.

@hawk29

hawk29 commented Dec 2, 2020

CONFIG_NETFILTER_XT_MATCH_RECENT=y

Sorry, a newbie question: I have come across the same issue using Docker Desktop. I have downloaded and installed the latest Docker Desktop, but to no avail. Is there a release where this will be included for end users, or do we have to compile it ourselves?

Client:
 Debug Mode: false
 Plugins:
  scan: Docker Scan (Docker Inc., v0.3.4)

Server:
 Containers: 86
  Running: 80
  Paused: 0
  Stopped: 6
 Images: 24
 Server Version: 19.03.13
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 4.19.128-microsoft-standard
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 24.77GiB
 Name: docker-desktop
 ID: 4IEZ:4LGJ:N7EI:P4FA:XJYC:5TTB:X7HG:FCPV:BTFP:YTO2:M75E:QDKH
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled

@tallaxes

tallaxes commented Dec 18, 2020

@hawk29 - that would be a question for the WSL2 maintainers; as far as I can tell it is not included in any recent releases. (And I don't see any PR merging activity at microsoft/WSL2-Linux-Kernel, so maybe they just don't accept contributions ...)

FWIW, in the tallaxes/WSL2-Linux-Kernel fork I have configured a GitHub Action to build it, so you should be able to get a built kernel image from there without worrying about downloading/running "mystery meat" bits, since the build process is transparent. The kernel image is captured as a build artifact: click on a build run, scroll to Artifacts, and look for bzImage. Then follow the instructions for configuring global options in .wslconfig, setting the kernel key to point to the custom kernel. (Obviously, use at your own risk, #include <disclaimer.h> ...)
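
For reference, the .wslconfig change amounts to a [wsl2] section with a kernel key; Microsoft's documentation shows Windows paths in that file with doubled backslashes. The helper below is a hypothetical sketch that prints such a file for a given kernel path. The example path C:\Users\me\bzImage is an assumption, not a required location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a .wslconfig pointing WSL2 at a custom kernel.
# Backslashes in the Windows path are doubled, as .wslconfig expects.
wslconfig_for_kernel() {
  local win_path="$1"   # e.g. C:\Users\me\bzImage (hypothetical location)
  local escaped
  escaped=$(printf '%s\n' "$win_path" | sed 's/\\/\\\\/g')
  printf '[wsl2]\nkernel=%s\n' "$escaped"
}
```

The output belongs in %UserProfile%\.wslconfig (note that redirecting into it overwrites any existing settings). Run `wsl --shutdown` from Windows afterwards so the next WSL2 start picks up the new kernel.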

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2021
@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 17, 2021
@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thavlik

thavlik commented Jun 18, 2021

@tallaxes FYI your build artifact was removed due to age.

@tallaxes

@thavlik Rebuilt

@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@BenTheElder BenTheElder removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 24, 2021
@BenTheElder
Member

I don't recall if this existed then, but https://kind.sigs.k8s.io/docs/user/using-wsl2/ is where we document what we know needs to be done for WSL2. Since the maintainers don't use WSL2, we could really use any missing bits contributed there: https://kind.sigs.k8s.io/docs/contributing/development/#documentation

thanks!

OP: if your issue is not resolved, please file a new one. I've eliminated that bot from this repo, but I think maybe this issue is now stale anyhow 🤔

@valeneiko
Contributor Author

I haven't needed SessionAffinity for a while now, so I'm not sure if the issue is resolved. I can try to check when I get some time to do so.

@thavlik Did you run into this issue recently? Is it still reproducible?

If so, it might be worth adding the information about the custom kernel to the WSL2 docs.

@BenTheElder BenTheElder reopened this Jun 25, 2021
@thavlik

thavlik commented Jun 26, 2021

@thavlik Did you run into this issue recently? Is it still reproducible?

Yes, on both WSL2 and Hyper-V backends I have an issue where a microservice that issues a token is a few seconds ahead of the test code, and the golang JWT library will error if you use a token before it's issued. I worked around it by catching the error in development environments only.

@valeneiko
Contributor Author

valeneiko commented Jun 27, 2021

I can confirm. The issue is still reproducible. The solution with custom kernel works. I compiled 5.4.72 to check (the version currently used by WSL2).

The solution:

  1. Build a kernel with the xt_recent kernel module enabled
    # On the host: start a throwaway build container (--rm deletes it when you exit the shell)
    docker run --name wsl-kernel-builder --rm -it ubuntu:latest bash
    
    # Everything below, up to the docker cp step, runs inside the container
    WSL_COMMIT_REF=linux-msft-5.4.72 # change this line to the version you want to build
    
    # Install dependencies
    apt update
    apt install -y git build-essential flex bison libssl-dev libelf-dev bc
    
    # Check out the WSL2 kernel repo into /src
    mkdir /src
    cd /src
    git init
    git remote add origin https://github.com/microsoft/WSL2-Linux-Kernel.git
    git config --local gc.auto 0
    git -c protocol.version=2 fetch --no-tags --prune --progress --no-recurse-submodules --depth=1 origin +${WSL_COMMIT_REF}:refs/remotes/origin/build/linux-msft-wsl-5.4.y
    git checkout --progress --force -B build/linux-msft-wsl-5.4.y refs/remotes/origin/build/linux-msft-wsl-5.4.y
    
    # Enable the xt_recent kernel module and confirm the change took effect
    sed -i 's/# CONFIG_NETFILTER_XT_MATCH_RECENT is not set/CONFIG_NETFILTER_XT_MATCH_RECENT=y/' Microsoft/config-wsl
    grep -x 'CONFIG_NETFILTER_XT_MATCH_RECENT=y' Microsoft/config-wsl
    
    # Compile the kernel using all available cores
    make -j"$(nproc)" KCONFIG_CONFIG=Microsoft/config-wsl
    
    # From a second host terminal (the container must still be running), copy the built kernel out
    docker cp wsl-kernel-builder:/src/arch/x86/boot/bzImage .
  2. Configure WSL to use the newly built kernel: https://docs.microsoft.com/en-us/windows/wsl/wsl-config#configure-global-options-with-wslconfig
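
The sed one-liner in step 1 only changes something if config-wsl contains the exact "# CONFIG_NETFILTER_XT_MATCH_RECENT is not set" line; on a branch where the option is absent or already set to =m, it leaves the file unchanged. A more defensive variant is sketched below as a hypothetical helper that assumes GNU sed. The kernel tree's own scripts/config (e.g. `scripts/config --file Microsoft/config-wsl --enable NETFILTER_XT_MATCH_RECENT`) is an existing alternative. Either way, Kconfig may still drop the option during make if its dependencies are disabled, so grep the config again after building.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force CONFIG_NETFILTER_XT_MATCH_RECENT=y in a kernel
# config file, whether the option is commented out, set to n/m, or missing.
enable_xt_recent() {
  local cfg="$1"
  local opt=CONFIG_NETFILTER_XT_MATCH_RECENT
  if grep -Eq "^(# )?${opt}([= ]|$)" "$cfg"; then
    # Rewrite the existing line ("# ... is not set", "=n" or "=m") in place.
    sed -Ei "s/^(# )?${opt}([= ].*)?$/${opt}=y/" "$cfg"
  else
    # Option not mentioned at all: append it.
    echo "${opt}=y" >> "$cfg"
  fi
}
```

Usage inside the build container: `enable_xt_recent Microsoft/config-wsl` in place of the sed line.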

@BenTheElder
Member

this is at least documented now, thanks @anyname2.
also thanks for microsoft/WSL#7124 to track upstream.

@Itnotf

Itnotf commented Feb 23, 2023

thanks @valeneiko

9 participants