quickwit helm chart doesn't work on IPv6 EKS cluster in dualstack VPC #5913

@neoakris

Describe the bug

  • Quickwit doesn't seem to work in IPv6 environments.
    • Let's use an IPv4 environment as a point of comparison to establish the expected behavior:
      • I have a rancher-desktop provided k3s Kubernetes cluster running on my M3 MacBook, which gives me an IPv4 environment to compare against.
      • I installed quickwit via helm on the rancher-desktop Kubernetes cluster with default settings.
      • kubectl exec -it qw-quickwit-searcher-0 -- curl localhost:7280/health/livez
        ^-- This returns true
      • If I edit the statefulset to allow running as root (and force a restart so it takes effect)
        and then run these debug commands, I get the following:
        kubectl exec -it qw-quickwit-searcher-0 -- /bin/bash
        apt update
        
        apt install netcat-traditional
        nc -zv localhost 7280
        # localhost [127.0.0.1] 7280 (?) open
        
        apt install net-tools
        netstat -plant | grep LISTEN
        # tcp        0      0 0.0.0.0:7280            0.0.0.0:*               LISTEN      1/quickwit          
        # tcp        0      0 0.0.0.0:7281            0.0.0.0:*               LISTEN      1/quickwit   
        
        apt install -y iproute2
        ss -plant | grep LISTEN
        # LISTEN    0      128          0.0.0.0:7280         0.0.0.0:*     users:(("quickwit",pid=1,fd=11))
        # LISTEN    0      128          0.0.0.0:7281         0.0.0.0:*     users:(("quickwit",pid=1,fd=13))
        
        apt install -y lsof
        lsof -i -P -n | grep LISTEN
        # quickwit   1 root   11u  IPv4  49029      0t0  TCP *:7280 (LISTEN)
        # quickwit   1 root   13u  IPv4  47988      0t0  TCP *:7281 (LISTEN)

When I install the helm chart in an IPv6 Kubernetes cluster:

  • The Kubernetes startup probe fails instantly.
  • I temporarily disabled all probes and configured the pod of the searcher statefulset to run as root (so I could install debug tools and take a look). From what I can tell, it's not serving traffic on port 7280:
kubectl exec -it qw-quickwit-searcher-0 -- curl localhost:7280/health/livez
# curl: (7) Failed to connect to localhost port 7280 after 0 ms: Couldn't connect to server

kubectl exec -it qw-quickwit-searcher-0 -- /bin/bash
apt update

apt install netcat-traditional
nc -zv localhost 7280
# localhost [127.0.0.1] 7280 (?) : Connection refused

apt install -y net-tools
netstat -plant | grep LISTEN
# (blank no listening ports found)

apt install -y iproute2
ss -plant | grep LISTEN
# (blank no listening ports found)

apt install -y lsof
lsof -i -P -n | grep LISTEN
# (blank no listening ports found)
lsof -i -P -n
# COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
# quickwit   1 root   10u  IPv6 481325      0t0  UDP [2600:1f11:8ca:fa04:f3b5::4]:7282 
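Note that `nc -zv localhost 7280` above resolved `localhost` to 127.0.0.1, so it only exercised the IPv4 loopback. A minimal sketch of a probe that checks both loopback addresses separately follows. It assumes Python 3 is available (or can be installed) in the pod, and `probe_loopback` is a hypothetical helper name, not part of Quickwit or the helm chart.

```python
import socket


def probe_loopback(port, timeout=1.0):
    """Try a TCP connect to `port` on each loopback address separately.

    Returns a dict such as {"127.0.0.1": True, "::1": False}, so that an
    IPv4-only listener, an IPv6-only listener, and no listener at all
    can be told apart.
    """
    results = {}
    targets = ((socket.AF_INET, "127.0.0.1"), (socket.AF_INET6, "::1"))
    for family, addr in targets:
        try:
            with socket.socket(family, socket.SOCK_STREAM) as sock:
                sock.settimeout(timeout)
                sock.connect((addr, port))
            results[addr] = True
        except OSError:
            # Covers connection refused, timeouts, and hosts where the
            # address family is not available at all.
            results[addr] = False
    return results
```

Calling `probe_loopback(7280)` inside the pod would show whether anything answers on either loopback. Given the empty `ss` and `lsof` output above, both entries would presumably come back `False`, but that is an expectation, not something verified here.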

Steps to reproduce (if applicable)
Steps to reproduce the behavior:

  1. Deploy a dual stack VPC on AWS
  2. Deploy an IPv6 EKS cluster (the worker nodes get IPv4 and IPv6 addresses, while Kubernetes pods and services are IPv6 only)
  3. Deploy the quickwit helm chart
  4. To install debug tools, I manually edited the Kubernetes statefulset to remove the probes and set:
        securityContext:
          runAsNonRoot: false
          runAsUser: 0
...
      securityContext:
        fsGroup: 0

Expected behavior

  • As described above, I expect quickwit to listen on port 7280, as it does in an IPv4 environment.
  • Quickwit's public docs mention that IPv6 is supported, so I was expecting quickwit to run correctly in an IPv6 environment.
  • At first I thought it was a config issue, where things are optimized for IPv4 by default and IPv6 requires additional config. But I tried various combinations of setting the environment variable and the config file to IPv6 values and could never get IPv6 to work, which makes me question how well IPv6 support has been tested.
    • Note: '::' is the IPv6 equivalent of 0.0.0.0. I also tried other syntax variations, but they made the app crash. This one was at least accepted, but it didn't work.
    • env var "QW_LISTEN_ADDRESS": '::'
    • config file "listen_address": '::'
    • both set to '::'
    • Note: The helm chart makes env var POD_IP available by default
    • env var "QW_LISTEN_ADDRESS": '$(POD_IP)'
    • config file "listen_address": '${POD_IP}'
    • both set to (dynamic value of pod IP)
  • When quickwit is installed in an IPv6-only environment, I expect curl localhost:7280/health/livez to work.
    (I couldn't test anything else, as I failed to get the app to run in an IPv6-only environment, specifically an IPv6 EKS cluster where the Kubernetes worker nodes are dual-stack and pods and services are IPv6 only.)
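To make the address values above concrete, here is a small sketch using Python's standard `ipaddress` module. It shows that '::' is the IPv6 unspecified (wildcard) address, just as '0.0.0.0' is for IPv4, and that an IPv6 literal is conventionally wrapped in brackets when combined with a port. The helper names are hypothetical, and this says nothing about how Quickwit itself parses `listen_address`.

```python
import ipaddress


def describe_listen_address(value):
    """Return (ip_version, is_wildcard) for a literal IP address string."""
    addr = ipaddress.ip_address(value)
    return addr.version, addr.is_unspecified


def format_host_port(host, port):
    """Join host and port, bracketing IPv6 literals as in '[::]:7280'."""
    addr = ipaddress.ip_address(host)
    if addr.version == 6:
        return f"[{addr}]:{port}"
    return f"{addr}:{port}"


if __name__ == "__main__":
    for candidate in ("0.0.0.0", "::"):
        print(candidate, describe_listen_address(candidate),
              format_host_port(candidate, 7280))
```

Running this against a value before putting it in the helm values is a cheap way to rule out a malformed literal as the reason for a crash on startup.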

Configuration:
I tried multiple configurations

  • By default I run on ARM64 Bottlerocket AMI nodes. I saw the docs mention that OS dependencies are sometimes needed and that ARM support is technically experimental.
  • So I also tested on AL2023 x86_64, but got the same error.
  • It also worked fine on rancher-desktop hosted on my ARM-based M3 MacBook (representing ARM running in an IPv4 environment), so I'm confident the issue is specific to IPv6.
  • I'm confident the config file is being picked up correctly, because:
kubectl exec -it qw-quickwit-searcher-0 -- /bin/bash

apt update
apt install -y procps
ps aux
# USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
# root           1  0.0  0.8 459024 33364 ?        Ssl  19:28   0:00 quickwit run --service searcher

quickwit run --service searcher --help
      --config <config>
          Config file location
          
          [env: QW_CONFIG=/quickwit/node.yaml]
          [default: config/quickwit.yaml]

echo $QW_CONFIG
/quickwit/node.yaml

  • ^-- I verified that this file was correctly updated based on the values I added to the helm values.

  • I tested many variations; these are the ones worth mentioning:

    • Helm chart defaults, where both are set to '0.0.0.0':
      • config file "listen_address": '0.0.0.0'
      • env var "QW_LISTEN_ADDRESS": '0.0.0.0'
    • Note: '::' is the IPv6 equivalent of 0.0.0.0. I also tried other syntax variations, but they made the app crash. This one was at least accepted, but it didn't work.
      • env var "QW_LISTEN_ADDRESS": '::'
      • config file "listen_address": '::'
      • both set to '::'
    • Note: The helm chart makes the env var POD_IP available by default.
      • env var "QW_LISTEN_ADDRESS": '$(POD_IP)'
      • config file "listen_address": '${POD_IP}'
      • both set to (dynamic value of pod IP)
  • All of these resulted in the app never running correctly (the port never served traffic).
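For reference, this is roughly what I would expect a listener bound to one of the addresses above to do at the socket level. It is a sketch in plain Python, not Quickwit's actual code: `open_listener` is a hypothetical helper, and the dual-stack behavior of '::' depends on the OS honoring `IPV6_V6ONLY=0`.

```python
import ipaddress
import socket


def open_listener(host, port):
    """Open a TCP listening socket, picking the family from the address."""
    addr = ipaddress.ip_address(host)
    family = socket.AF_INET6 if addr.version == 6 else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        if family == socket.AF_INET6:
            # Request a dual-stack socket so that '::' can also accept
            # IPv4 clients, where the OS allows it.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.bind((str(addr), port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

A socket opened with `open_listener("::", 7280)` should show up as a LISTEN entry in `ss -plant`, which is the entry that is missing for quickwit in the IPv6 cluster.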

  1. Output of quickwit --version
    Quickwit 0.8.2 (aarch64-unknown-linux-gnu 2024-06-17T16:36:47Z 42766b8)
