
AWS NLB's health check fails if nodeports are updated #35348

Closed
psibi opened this issue Sep 24, 2021 · 11 comments
Labels: area/networking, lifecycle/automatically-closed, lifecycle/stale

Comments


psibi commented Sep 24, 2021

Bug Description

As the issue title explains, if I update the ingress gateway's node ports, the network load balancer's health checks start failing.

Steps to reproduce:

  1. Install Istio using the Istio operator.
  2. Apply this manifest and wait until the ingress is up:
```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane
  namespace: istio-system
spec:
  components:
    egressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 5
          minReplicas: 2
      name: istio-egressgateway
    ingressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 5
          minReplicas: 2
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "600"
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          service.beta.kubernetes.io/aws-load-balancer-type: nlb
      name: istio-ingressgateway
    pilot:
      enabled: true
  hub: gcr.io/istio-release
  meshConfig:
    defaultConfig:
      tracing:
        zipkin: null
    enablePrometheusMerge: true
  profile: default
  values:
    global:
      proxy:
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
```
  3. Wait until the network load balancer and the pods are up, then update the manifest as follows to change the node ports:
```yaml
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane
  namespace: istio-system
spec:
  components:
    egressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 5
          minReplicas: 2
      name: istio-egressgateway
    ingressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 5
          minReplicas: 2
        service:
          ports:
          - name: status-port
            nodePort: 30644
            port: 15021
          - name: http2
            nodePort: 32085
            port: 80
          - name: https
            nodePort: 32082
            port: 443
        serviceAnnotations:
          service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
          service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "600"
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          service.beta.kubernetes.io/aws-load-balancer-type: nlb
      name: istio-ingressgateway
    pilot:
      enabled: true
  hub: gcr.io/istio-release
  meshConfig:
    defaultConfig:
      tracing:
        zipkin: null
    enablePrometheusMerge: true
  profile: default
  values:
    global:
      proxy:
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 100m
            memory: 100Mi
```
  4. Now observe the health of your NLB target groups; the health checks will fail for both the https and http ports. Snapshot:

(screenshot: unhealth_snap — target group health checks showing unhealthy targets)

Surprisingly, the health checks for the status-port are fine.

Note that if I change the node ports of my NGINX controllers (also backed by an NLB), they keep working fine.

There is also a similar-looking issue that was opened previously but closed with the statement that it should be fixed in 1.8+: #28856
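One way to see the mismatch described above is to compare the node ports the ingress gateway Service now uses against the ports the NLB target groups still health-check. This is only a sketch, written as a dry run (the commands are echoed rather than executed) since it needs a live cluster and AWS account; the load balancer ARN is a placeholder.

```shell
#!/bin/sh
RUN="echo"   # set RUN="" to execute against a real cluster/account

# Node ports currently configured on the ingress gateway Service.
$RUN kubectl -n istio-system get svc istio-ingressgateway \
  -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.nodePort}{"\n"}{end}'

# Ports the NLB target groups actually register and health-check
# (placeholder ARN -- substitute your own NLB's ARN).
LB_ARN="arn:aws:elasticloadbalancing:us-east-1:111111111111:loadbalancer/net/istio-nlb/aaaa1111"
$RUN aws elbv2 describe-target-groups --load-balancer-arn "$LB_ARN" \
  --query 'TargetGroups[].[TargetGroupName,Port,HealthCheckPort]' --output table
```

If the target groups still show the old node ports after the Service was updated, the health-check failure is explained.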

Version

```
❯ istioctl version
client version: 1.10.3
control plane version: 1.10.3
data plane version: 1.10.3 (7 proxies)
❯ kubectl version --short
Client Version: v1.19.0
Server Version: v1.19.13-eks-8df270
```
@psibi psibi changed the title AWS NLB's health check fails if the nodeports are updated AWS NLB's health check fails if nodeports are updated Sep 24, 2021
@howardjohn (Member) commented:

I don't see how this can be triggered by Istio specifically. I am not an AWS expert, though, so it may need to be translated into Istio concepts rather than just "NLB health checks" (i.e. a curl reproducer, etc.).

In general, the ingress gateway doesn't know or care about node ports. It just accepts the traffic sent to it.


psibi commented Sep 24, 2021

@howardjohn Note that if I don't specify any explicit node port (or remove the explicit node port that causes this issue), the new ingress pods come alive and the NLB returns to a healthy state.

Note that after the NLB goes into an unhealthy state, the ingress doesn't receive any traffic. What's also surprising is that the status-port's node port works, while the http2 and https ports do not.

Looking at #28856, I believe this is a similar issue, since that one also seems reproducible when the node ports differ. And judging from a recent comment on that issue (#28856 (comment)), something similar appears to happen during upgrades too.

Let me know if you need more details and I can gather the relevant logs etc. Thank you!

@howardjohn (Member) commented:

#28856 was about the operator unintentionally changing the port. It seems like you are changing it intentionally, though?


psibi commented Sep 24, 2021

@howardjohn Yeah, it was an intentional change for some of the use cases we have.


shichangzhang commented Dec 6, 2021

Hi @psibi, I've just tried to expose another port on an existing NLB installed via the Istio operator. After updating the IstioOperator resource, I saw that an AWS target group was created for me, but with no targets.

To get it working, I did the following:

  • Manually added targets to the target group
  • Added the target group as a listener on the NLB
  • Updated the security group to allow traffic on the newly specified node port (before doing this, the NLB health checks would fail)
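The manual remediation steps above can be sketched with the AWS CLI. This is a dry-run sketch only: the target group ARN, load balancer ARN, instance IDs, security group ID, and port values are all placeholders to substitute from your own account, so the script just echoes the commands it would run.

```shell
#!/bin/sh
# Hypothetical placeholders -- substitute values from your own AWS account.
TG_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/istio-https/abc123"
LB_ARN="arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/istio-nlb/def456"
SG_ID="sg-0123456789abcdef0"
NODE_PORT=32082   # the new nodePort set in the IstioOperator manifest
RUN="echo"        # set RUN="" to actually execute the commands

# 1. Register the worker nodes with the (empty) target group.
$RUN aws elbv2 register-targets --target-group-arn "$TG_ARN" \
  --targets Id=i-0aaa1111bbb2222cc Id=i-0ddd3333eee4444ff

# 2. Attach the target group to the NLB as a TCP listener.
$RUN aws elbv2 create-listener --load-balancer-arn "$LB_ARN" \
  --protocol TCP --port 443 \
  --default-actions Type=forward,TargetGroupArn="$TG_ARN"

# 3. Open the new node port in the worker nodes' security group --
#    without this, the NLB health checks keep failing.
$RUN aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port "$NODE_PORT" --cidr 0.0.0.0/0
```

With an AWS Load Balancer Controller or cloud provider integration doing its job, these steps should not be necessary; the sketch only mirrors the manual workaround described in the comment.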

@istio-policy-bot added the lifecycle/stale label Dec 24, 2021
@istio-policy-bot added the lifecycle/automatically-closed label Jan 8, 2022

psibi commented Jan 8, 2022

> If you feel this issue or pull request deserves attention, please reopen the issue.

This issue is still present and I don't have enough access to reopen the issue.

@zirain removed the lifecycle/stale and lifecycle/automatically-closed labels Jan 8, 2022
@zirain reopened this Jan 8, 2022
@istio-policy-bot added the lifecycle/stale label Apr 8, 2022
@istio-policy-bot added the lifecycle/automatically-closed label Apr 23, 2022

psibi commented Apr 27, 2022

> If you feel this issue or pull request deserves attention, please reopen the issue.

This issue is still present and I don't have enough access to reopen the issue.

@howardjohn (Member) commented:

I can reopen it, but I don't see how this is an Istio issue:

  1. The AWS NLB (not controlled by Istio) does not update its health check when a node port changes.
  2. The user explicitly changed the node ports.
  3. The NLB health checks broke.

Wouldn't the same happen without Istio?


psibi commented Apr 27, 2022

> Same would happen without Istio?

When I follow the same steps for the NGINX controller (modifying the ports in the Service manifest), it doesn't happen; the NLB checks keep working fine.

@howardjohn howardjohn reopened this Apr 27, 2022
@istio-policy-bot removed the lifecycle/stale label Apr 27, 2022
@istio-policy-bot added the lifecycle/stale label Jul 27, 2022
@istio-policy-bot commented:

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2022-04-27. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.


psibi commented Aug 11, 2022

> If you feel this issue or pull request deserves attention, please reopen the issue.

This issue is still present and I don't have enough access to reopen the issue.
