
Getting empty reply from server when trying to request from one pod to another within the cluster #6758

Closed
manuasir opened this issue Jan 15, 2021 · 26 comments
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@manuasir

manuasir commented Jan 15, 2021

NGINX Ingress controller version:
0.32.0
Kubernetes version (use kubectl version):
1.18

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Amazon Linux
  • Kernel (e.g. uname -a): Linux ip-10-2-5-110.eu-west-1.compute.internal 4.14.209-160.335.amzn2.x86_64 #1 SMP Wed Dec 2 23:31:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

What happened:
I'm currently in an EKS environment, using a private node group (three private subnets within a VPC that I connect to through a VPN). After setting up the IC, everything seems to work fine. When connecting to the VPC, requests to my services behind the LB work: there is no loss; I use curl for testing purposes and all requests resolve successfully. The problem comes when curling from one pod to another within the cluster. Eventually, I get this "Empty reply from server" error.

[screenshot: curl output showing the "Empty reply from server" error]

Let's say that ~10% of the requests fail.

My architecture is very simple, like the following:

  • 3 private subnets (eu-west-1a, eu-west-1b, eu-west-1c).
  • 4 nodes.
  • NGINX Ingress controller (Network load balancer) as a DaemonSet, so each node has a pod of the IC running on it. This was also reproduced using a Deployment.

How to reproduce it (as minimally and precisely as possible):

Just attach a shell to any pod within your cluster and curl another pod.
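
For example, a minimal sketch of the reproduction (the pod name is illustrative; the host and namespace are taken from the manifests below):

    # open a shell in any application pod (pod name is an example)
    kubectl -n api exec -it some-api-pod -- sh
    # from inside the pod, curl the ingress hostname; repeat a few times
    curl -vvv https://stgapi.myhost.com/
    # roughly 1 in 10 attempts ends with: curl: (52) Empty reply from server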

Anything else we need to know?:

I'm attaching several configuration files we're currently using:

  • Ingress
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api-stg
  namespace: api
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true" 
spec:
  rules:
    - host: stgapi.myhost.com
      http:
        paths:
          - path: "/"
            backend:
              serviceName: api-stg-service
              servicePort: 80
  • Service
apiVersion: v1
kind: Service
metadata:
  namespace: api
  name: api-stg-service
spec:
  type: ClusterIP
  ports:
  - port: 80
    name: http
    targetPort: 80
  selector:
    app: api-stg-pod

Environment:

EKS

What you expected to happen:
All requests to be resolved without loss.

Any help would be much appreciated, don't hesitate to ask for any details.

Cheers

@manuasir added the kind/bug label (Categorizes issue or PR as related to a bug.) Jan 15, 2021
@aledbf
Member

aledbf commented Jan 15, 2021

@manuasir please complete the issue template when opening an issue.

Please check and post the ingress-nginx pod log.

@manuasir
Author

Apologies for that, I've already updated the issue with more information from the template.

As each node runs a pod, I've checked the logs of all ingress-nginx pods. They're all like this:

[screenshot: ingress-nginx pod access logs]

The requests that go through fine are logged; the failing ones don't even appear in the logs, as if they never reached the pod.

@aledbf
Member

aledbf commented Jan 15, 2021

NGINX Ingress controller version:
0.32.0

Please update to the latest version v0.43.0

@aledbf
Member

aledbf commented Jan 15, 2021

The requests that go through fine are logged; the failing ones don't even appear in the logs, as if they never reached the pod.

That looks correct.

How did you install the ingress controller?
How are you testing this?
Can you test from the cluster itself (to rule out the VPN)?
Are you using an ELB or NLB?

@manuasir
Author

How did you install the ingress controller?

I followed this guide.

How are you testing this?

I'm performing curls from one pod to another.

Can you test from the cluster itself (to rule out the VPN)?
Are you using an ELB or NLB?

I'm curling the service's DNS from within the cluster. I have an NLB and a Route 53 record pointing to it.

I'm going to update to the latest version and then try, thanks!

@manuasir
Author

manuasir commented Jan 15, 2021

Updated to version 0.43.0; still having the problem:

[screenshot: curl output, still showing "Empty reply from server"]

@aledbf
Member

aledbf commented Jan 15, 2021

@manuasir please add -vvv to the curl command and check the output when it returns an error.

@manuasir
Author

[screenshot: curl -vvv output]

@aledbf
Member

aledbf commented Jan 15, 2021

@manuasir from the provided information I don't see anything related to the ingress controller itself:

  • no errors in the pod logs
  • TLS termination is being done in the LB, not ingress-nginx, and returns valid responses
  • the times in the log are consistent.

Maybe this is related to a networking issue between the LB and the node/s where the ingress controller pod is running.

To check this, please enable the LB access logs (S3 bucket) by adding the following annotations to the ingress-nginx service:
https://kubernetes.io/docs/concepts/services-networking/service/#elb-access-logs-on-aws

    service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "<some bucket>"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "ingress-nginx"

Wait until the LB configuration is updated and then run the test.

From the logs in S3, you should be able to see whether the issue happens on a particular target (node) and also get more details about the error.

@JihadMotii-REISys

JihadMotii-REISys commented Mar 24, 2021

@manuasir were you able to find a solution for this issue? I'm facing the same issue as you have described.

I don't want to open another ticket since this ticket is still open and it is related to my current issue as well.

Here is my current configuration values.yaml:

imagePullSecrets:
  - name: my-docker-secret
controller:
  kind: DaemonSet
  podAnnotations:
    linkerd.io/inject: enabled
  containerPort:
    http: 80
    https: 443
    tohttps: 2443
  service:
    externalTrafficPolicy: Local
    targetPorts:
      http: tohttps
      https: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "ARN_ACM"
      service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '60'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "MY_BUCKET_NAME"
      service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "MY_BUCKET_PREFIX"
      service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "5"
  extraArgs:
    enable-ssl-passthrough: "true"
  config:
    force-ssl-redirect: 'false'
    use-proxy-protocol: 'false'
    proxy-body-size: "0"
    proxy-buffer-size: "16k"
    proxy-real-ip-cidr: "xx.xx.xx.xx/xx"
    use-forwarded-headers: "true"
    ssl-redirect: 'false' # SSL redirection is taken care of by the snippet below that forwards 80 -> 2443, which is HTTPS. Enabling this would cause all ingresses to end up in a 308 redirect loop
    http-snippet: |
      server {
        listen 2443;
        return 308 https://$host$request_uri;
      }

@aledbf any guidance would be much appreciated.

@manuasir
Author

We haven't been able to solve this yet.

@kay-ramme

This seems to be related to use-proxy-protocol, which needs to be turned on for requests coming through the NLB, but fails for requests from a pod to itself, even if they go through the NLB.

@kay-ramme

Could it be that the NLB fails to pass proxy-protocol data when the request comes from the same host the pod is running on? 🤔

@longwuyuan
Contributor

longwuyuan commented Jun 8, 2021

If you are facing this problem, can you do the following steps and provide the data suggested in them:

  • Make sure you install the latest release of the ingress-nginx controller
  • Make sure that you install the YAML as per the docs, AWS section
  • Don't use a DaemonSet for this test if possible. Try to run just a normal Deployment with a single replica. If that's not possible, adapt the information reporting below accordingly
  • Show kubectl get po,svc,ing -A -o wide
  • Show kubectl -n <namespace> describe po of the pod that should respond to curl
  • Show kubectl -n <namespace> describe svc of the svc that is expected to get used for your curl command
  • Show kubectl -n <namespace> describe ing <ingressname> of the ingress that should get used for your curl command
  • Start tailing the logs of the ingress controller pod: kubectl -n <namespace> logs -f <ingresscontrollerpodname>
  • Show where you're going to run curl from; describe the host from which you are going to run curl
  • Show the complete raw curl command, with -vvv, and the output
  • Show the logs of the ingress controller pod related to the curl
  • Any other related information

Or you can adapt the above steps for the test below:

  • kubectl run test0 --image nginx:alpine --port 80
  • kubectl run test1 --image nginx:alpine --port 80
  • kubectl expose test0 --port 80
  • kubectl expose test1 --port 80
  • kubectl exec -ti test0 -- sh
  • From the shell inside the test0 pod, run curl test1

@iamNoah1
Contributor

Hi @manuasir @JihadMotii-REISys can you guys confirm that the issue still exists with newer versions of ingress-nginx?

@Kevin-Molina

Anyone end up figuring this one out?

@sharms
Contributor

sharms commented Jul 15, 2021

We resolved this by disabling NLB Client IP Preservation

@longwuyuan
Contributor

Hi, can you close this issue and come talk about it on kubernetes.slack.com in the ingress-nginx-users channel? There are many developers and engineers who may have insight, and not all of them would be looking here.

/remove-kind bug
/triage needs-information

@k8s-ci-robot added the triage/needs-information label (Indicates an issue needs more information in order to work on it.) and removed the kind/bug label (Categorizes issue or PR as related to a bug.) Jul 16, 2021
@iamNoah1
Contributor

/close

Feel free to reopen this one or submit a new issue.

@k8s-ci-robot
Contributor

@iamNoah1: Closing this issue.

In response to this:

/close

Feel free to reopen this one or submit a new issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@benbonnet

@sharms if I may ask here: how do you disable NLB Client IP Preservation? Ingress annotations? LB config?

@sharms
Contributor

sharms commented Dec 16, 2021

@benbonnet - Here was the code from the time:

    nginx-ingress:
      controller:
        service:
          externalTrafficPolicy: Cluster
          enableHttp: false
          targetPorts:
            https: http
          annotations:
            service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: Name=gitlab-nginx,class=nginx,role=test,vpc=test
            service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
            service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
            service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
            service.beta.kubernetes.io/aws-load-balancer-type: external
            service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "preserve_client_ip.enabled=false"

@pparthesh

Deploy the ingress-nginx pods on a different Kubernetes node group.
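
For reference, a minimal sketch of what that could look like in the Helm values, assuming a dedicated node group whose nodes are labeled role: ingress (the label and taint names are illustrative, not from this thread):

    controller:
      kind: DaemonSet
      # schedule the controller pods only on the dedicated ingress node group
      nodeSelector:
        role: ingress
      # tolerate the taint that keeps other workloads off that node group
      tolerations:
        - key: dedicated
          operator: Equal
          value: ingress
          effect: NoSchedule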

@9you

9you commented Jan 18, 2023

Thank you so much @sharms, we were stuck on this problem for months.

@rapyder-user

Thanks sharms for the info.

@ftasbasi

ftasbasi commented Jun 8, 2024

We resolved this by enabling Proxy protocol v2
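
For reference, a minimal sketch of that approach, reusing annotations already seen in this thread: proxy protocol is enabled towards the NLB targets and ingress-nginx is told to expect it (which annotation applies depends on whether the in-tree provider or the AWS Load Balancer Controller manages the NLB; adapt to your setup):

    controller:
      service:
        annotations:
          # in-tree provider: enable proxy protocol (v2 on an NLB) towards the backends
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
          # AWS Load Balancer Controller alternative:
          # service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "proxy_protocol_v2.enabled=true"
      config:
        # nginx must parse the PROXY header once the LB sends it, otherwise requests fail
        use-proxy-protocol: "true"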
