Explicit protocol selection breaks DS HTTP egress #35315

Closed
emike922 opened this issue Sep 22, 2021 · 2 comments


@emike922

emike922 commented Sep 22, 2021

Bug Description

We are currently testing a dual-stack solution for Istio, loosely based on #29076, i.e. by using :: ipv4_compat listeners serving both IP families for any/all wildcards (0.0.0.0 or ::). For testing we rebuilt pilot-agent with (only) the #35310 changes to achieve IPv6 traffic interception and plugged it into the official istio/proxyv2:1.11.2 image.

This had already been working quite well on the 1.10 release. After resuming the testing on 1.11, one direct HTTP egress case has started failing whenever explicit protocol selection is used (i.e. name=http-port or appProtocol=http in the k8s Service). Requests that would previously route to the correct endpoint now end up in BlackHoleCluster (with the REGISTRY_ONLY setting; with ALLOW_ANY they end up in PassthroughCluster instead).
What is more, an ip6tables-enabled 1.10-based client running against the 1.11 control plane still manages to connect to the external HTTP service.
Removing the port name/appProtocol fields from the k8s Service immediately restores connectivity to the external service for the 1.11-based client.
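
For reference, either of the following counts as explicit protocol selection here (a sketch showing the two alternative Service port definitions; either variant triggers the failure):

ports:
- name: http-port      ## variant 1: protocol taken from the "http" name prefix
  port: 80
  protocol: TCP
  targetPort: 80

ports:
- name: some-port      ## variant 2: arbitrary name ...
  appProtocol: http    ## ... protocol taken from appProtocol (Kubernetes 1.18+)
  port: 80
  protocol: TCP
  targetPort: 80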

It appears that enabling explicit protocol selection removes the HttpProtocolOptions setting from the outbound service clusters and replaces the separate <SVC_IPv4>_80 / <SVC_IPv6>_80 listeners with a single wildcard 0.0.0.0_80 listener. This behavior does not seem to be new, though, and had previously worked once the listener was modified to :: ipv4_compat, as described.
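
For illustration, this is the shape of the listener address in question (a sketch of the Envoy config, reduced to the address fields; the MERGE patch referred to here is the port 80 variant shown in the comments below):

## Wildcard listener generated by istiod once the protocol is explicit:
name: 0.0.0.0_80
address:
  socket_address:
    address: 0.0.0.0
    port_value: 80

## After the ipv4_compat MERGE patch, intended to serve both IP families:
name: 0.0.0.0_80
address:
  socket_address:
    address: "::"
    ipv4_compat: true
    port_value: 80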

Version

$ istioctl version
client version: 1.11.2
control plane version: 1.11.2
data plane version: 1.10.4 (1 proxies), 1.11.2 (1 proxies)
$ kubectl version --short
Client Version: v1.18.2
Server Version: v1.21.2

Additional Information

DS cluster
Set up using kind and the following config file:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: dual

Istio
Deployed using helm:

$ helm install --namespace istio-system --create-namespace istio-base manifests/charts/base
$ helm install --namespace istio-system istiod manifests/charts/istio-control/istio-discovery --set meshConfig.accessLogFile=/dev/stdout --set meshConfig.outboundTrafficPolicy.mode=REGISTRY_ONLY
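
The --set flags above should be equivalent to the following helm values file (for reference; pass it with -f instead of the --set flags):

meshConfig:
  accessLogFile: /dev/stdout
  outboundTrafficPolicy:
    mode: REGISTRY_ONLY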

Test resources
EnvoyFilter replacing the outbound 15001 listener (minimal solution, but we have actually replaced ALL wildcards with the same results):

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: sidecar-outbound-ipv4compat-listener
spec:
  configPatches:
  - applyTo: LISTENER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        portNumber: 15001
    patch:
      operation: MERGE
      value:
        address:
          socket_address:
            address: "::"
            ipv4_compat: true

httpbin backend, separately exposed via an IPv4 and an IPv6 Service:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: httpbin
  name: httpbin
spec:
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "false"
      labels:
        app: httpbin
        sidecar.istio.io/inject: "false"
    spec:
      containers:
      - args:
        - -b
        - '[::]:80'
        - --access-logfile
        - '-'
        - httpbin:app
        command:
        - gunicorn
        image: kennethreitz/httpbin
        imagePullPolicy: Always
        name: httpbin
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: httpbin4
  name: httpbin4
spec:
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-port ## this works fine in the current test
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: httpbin6
  name: httpbin6
spec:
  ipFamilies:
  - IPv6
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-port ## this causes the problem
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: httpbin

ip6tables-enabled clients (1.10 and 1.11):

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: client
  name: client
spec:
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
        sidecar.istio.io/proxyImage: <PRIVATE_REGISTRY>/proxyv2:1.11.2-h9554646
      labels:
        app: client
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - args:
        - 3650d
        command:
        - sleep
        image: curlimages/curl
        imagePullPolicy: Always
        name: curl
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: clientold
  name: clientold
spec:
  selector:
    matchLabels:
      app: clientold
  template:
    metadata:
      annotations:
        sidecar.istio.io/inject: "true"
        sidecar.istio.io/proxyImage: <PRIVATE_REGISTRY>/proxyv2:1.10.4-h50704d3
      labels:
        app: clientold
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - args:
        - 3650d
        command:
        - sleep
        image: curlimages/curl
        imagePullPolicy: Always
        name: curl

Reproduction (k below is an alias for kubectl)

NAMESPACE=test
CLIENT110=$(k get po -n "${NAMESPACE}" -l app=clientold -o name)
CLIENT111=$(k get po -n "${NAMESPACE}" -l app=client -o name)

k logs -n "${NAMESPACE}" "${CLIENT111}" -c istio-proxy --since=1s -f &
k logs -n "${NAMESPACE}" "${CLIENT110}" -c istio-proxy --since=1s -f &

k exec -n "${NAMESPACE}" "${CLIENT111}" -c curl -- curl -s http://httpbin4/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:31:35.170Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 3 2 "-" "curl/7.78.0-DEV" "a7f4fd3f-7fa4-4530-aaf8-ad535728d38e" "httpbin4" "10.244.0.10:80" outbound|80||httpbin4.test.svc.cluster.local 10.244.0.18:40414 10.96.187.16:80 10.244.0.18:43840 - default
k exec -n "${NAMESPACE}" "${CLIENT111}" -c curl -- curl -s http://httpbin6/status/418
command terminated with exit code 56
2021-09-22T15:57:40.796066Z	debug	envoy filter	original_dst: New connection accepted
2021-09-22T15:57:40.796182Z	debug	envoy filter	[C8339] new tcp proxy session
2021-09-22T15:57:40.796222Z	debug	envoy filter	[C8339] Creating connection to cluster BlackHoleCluster
[2021-09-22T15:57:40.796Z] "- - -" 0 UH - - "-" 0 0 0 - "-" "-" "-" "-" "-" BlackHoleCluster - [fd00:10:96::c6a5]:80 [fd00:10:244::12]:35630 - -
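
(For reference: UH is Envoy's "no healthy upstream" response flag, expected here since BlackHoleCluster has no endpoints by design.)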

k exec -n "${NAMESPACE}" "${CLIENT110}" -c curl -- curl -s http://httpbin4/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:33:42.301Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 1 1 "-" "curl/7.78.0-DEV" "b4618d63-dde1-44a9-a8db-cff2dc9dd25e" "httpbin4" "10.244.0.10:80" outbound|80||httpbin4.test.svc.cluster.local 10.244.0.19:33772 10.96.187.16:80 10.244.0.19:37750 - default
k exec -n "${NAMESPACE}" "${CLIENT110}" -c curl -- curl -s http://httpbin6/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:33:49.741Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 1 0 "-" "curl/7.78.0-DEV" "19c67766-b786-4b76-900a-01b00549b4b0" "httpbin6" "[fd00:10:244::a]:80" outbound|80||httpbin6.test.svc.cluster.local [fd00:10:244::13]:56000 [fd00:10:96::c6a5]:80 [fd00:10:244::13]:60424 - default

k edit svc -n "${NAMESPACE}" httpbin4 # remove port name
k edit svc -n "${NAMESPACE}" httpbin6 # both MUST be changed to see the problem!!

k exec -n "${NAMESPACE}" "${CLIENT111}" -c curl -- curl -s http://httpbin4/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:36:53.316Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 1 1 "-" "curl/7.78.0-DEV" "d9d49c3f-1fe1-4459-ade2-a62344c2ad98" "httpbin4" "10.244.0.10:80" outbound|80||httpbin4.test.svc.cluster.local 10.244.0.18:54818 10.96.187.16:80 10.244.0.18:58244 - default
k exec -n "${NAMESPACE}" "${CLIENT111}" -c curl -- curl -s http://httpbin6/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:36:59.511Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 3 2 "-" "curl/7.78.0-DEV" "4d1f75ea-ca9b-4da1-8887-ad7707fd2c8a" "httpbin6" "[fd00:10:244::a]:80" outbound|80||httpbin6.test.svc.cluster.local [fd00:10:244::12]:57726 [fd00:10:96::c6a5]:80 [fd00:10:244::12]:35988 - default


k exec -n "${NAMESPACE}" "${CLIENT110}" -c curl -- curl -s http://httpbin4/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:37:05.464Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 3 2 "-" "curl/7.78.0-DEV" "19a6f324-e791-4829-8cf3-eccc4a8b79f1" "httpbin4" "10.244.0.10:80" outbound|80||httpbin4.test.svc.cluster.local 10.244.0.19:42986 10.96.187.16:80 10.244.0.19:46964 - default
k exec -n "${NAMESPACE}" "${CLIENT110}" -c curl -- curl -s http://httpbin6/status/418
    -=[ teapot ]=-
...
[2021-09-22T15:37:10.349Z] "GET /status/418 HTTP/1.1" 418 - via_upstream - "-" 0 135 4 2 "-" "curl/7.78.0-DEV" "65a39f61-f58a-4257-9c84-7f57bfcb5985" "httpbin6" "[fd00:10:244::a]:80" outbound|80||httpbin6.test.svc.cluster.local [fd00:10:244::13]:36888 [fd00:10:96::c6a5]:80 [fd00:10:244::13]:41312 - default

Proxy config dumps
named-port-config-1112.txt
unnamed-port-config-1104.txt
unnamed-port-config-1112.txt
named-port-config-1104.txt

@emike922 (Author)

Patching the port 80 wildcard listener as well:

  - applyTo: LISTENER
    match:
      context: SIDECAR_OUTBOUND
      listener:
        portNumber: 80
    patch:
      operation: MERGE
      value:
        address:
          socket_address:
            address: "::"
            ipv4_compat: true

This switches the roles: IPv4 now fails while IPv6 works.

@emike922 (Author)

The behavior is a result of the new Envoy runtime guard envoy.reloadable_features.listener_wildcard_match_ip_family, which seems to filter wildcard listeners by the IP family of the incoming request. But :: listeners using the ipv4_compat socket option appear to be discarded when processing IPv4 requests, despite the fact that they are perfectly capable of handling them...
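
For anyone who wants to experiment, the guard should in principle be reversible through the layered runtime in the Envoy bootstrap (a sketch, assuming a custom bootstrap override is supplied, e.g. via the sidecar.istio.io/bootstrapOverride annotation; not verified in this setup):

layered_runtime:
  layers:
  - name: disable-wildcard-ip-family-match
    static_layer:
      envoy.reloadable_features.listener_wildcard_match_ip_family: false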

I don't know whether there is any interest on the Istio side in using this information to potentially disable the runtime guard? Probably not while there is no dual-stack support, and maybe not even when/if that comes!?
