
Getting empty reply from server when trying to request from one pod to another within the cluster #6758

Closed
manuasir opened this issue Jan 15, 2021 · 26 comments
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@manuasir

manuasir commented Jan 15, 2021

NGINX Ingress controller version:
0.32.0
Kubernetes version (use kubectl version):
1.18

Environment:

  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release): Amazon Linux
  • Kernel (e.g. uname -a): Linux ip-10-2-5-110.eu-west-1.compute.internal 4.14.209-160.335.amzn2.x86_64 #1 SMP Wed Dec 2 23:31:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

What happened:
I'm currently in an EKS environment, using a private node group (three private subnets within a VPC that I connect to through a VPN). After setting up the IC, everything seems to work fine. When connecting to the VPC, requests to my services behind the LB work: there is no loss; I use curl for testing purposes and all requests resolve successfully. The problem comes when curling from one pod to another within the cluster. Eventually, I get this "Empty reply from server" error.

[screenshot: curl output showing the "Empty reply from server" error]

Let's say that ~10% of the requests fail.

My architecture is very simple, like the following:

  • 3 private subnets (eu-west-1a, eu-west-1b, eu-west-1c).
  • 4 nodes.
  • NGINX Ingress controller (Network load balancer) as a DaemonSet, so each node has a pod of the IC running on it. This was also reproduced using a Deployment.

How to reproduce it (as minimally and precisely as possible):

Just attach a shell to any pod within your cluster and curl another pod.
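
For example, a minimal sketch of the reproduction (the pod name is illustrative; the host and namespace are taken from the manifests below):

    # open a shell in any application pod (pod name is an example)
    kubectl -n api exec -it some-api-pod -- sh
    # from inside the pod, curl the ingress hostname; repeat a few times
    curl -vvv https://stgapi.myhost.com/
    # roughly 1 in 10 attempts ends with: curl: (52) Empty reply from server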

Anything else we need to know?:

I'm attaching several configuration files we're currently using:

  • Ingress
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api-stg
  namespace: api
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/ssl-redirect: "true" 
spec:
  rules:
    - host: stgapi.myhost.com
      http:
        paths:
          - path: "/"
            backend:
              serviceName: api-stg-service
              servicePort: 80
  • Service
apiVersion: v1
kind: Service
metadata:
  namespace: api
  name: api-stg-service
spec:
  type: ClusterIP
  ports:
  - port: 80
    name: http
    targetPort: 80
  selector:
    app: api-stg-pod

Environment:

EKS

What you expected to happen:
All requests to be resolved without loss.

Any help would be much appreciated, don't hesitate to ask for any details.

Cheers

@manuasir added the kind/bug label (Categorizes issue or PR as related to a bug.) Jan 15, 2021
@aledbf
Member

aledbf commented Jan 15, 2021

@manuasir please complete the issue template when opening an issue.

Please check and post the ingress-nginx pod log.

@manuasir
Author

Apologies for that, I've already updated the issue with more information from the template.

As each node runs a pod, I've checked the logs of all ingress-nginx pods. They're all like this:

[screenshot: ingress-nginx pod access logs]

The requests that go through fine are logged; the failing ones don't even appear in the logs, as if they never reached the pod.

@aledbf
Member

aledbf commented Jan 15, 2021

NGINX Ingress controller version:
0.32.0

Please update to the latest version v0.43.0

@aledbf
Member

aledbf commented Jan 15, 2021

The requests that go through fine are logged; the failing ones don't even appear in the logs, as if they never reached the pod.

That looks correct.

How did you install the ingress controller?
How are you testing this?
Can you test from the cluster itself (to rule out the VPN)?
Are you using an ELB or NLB?

@manuasir
Author

How did you install the ingress controller?

I followed this guide.

How are you testing this?

I'm performing curls from one pod to another.

Can you test from the cluster itself (to rule out the VPN)?
Are you using an ELB or NLB?

I'm curling the service's DNS from within the cluster. I have an NLB and a Route 53 record pointing to it.

I'm going to update to the latest version and then try, thanks!

@manuasir
Author

manuasir commented Jan 15, 2021

Updated to version 0.43.0; still having the problem:

[screenshot: curl output, still showing "Empty reply from server"]

@aledbf
Member

aledbf commented Jan 15, 2021

@manuasir please add -vvv to the curl command and check the output when it returns an error.

@manuasir
Author

[screenshot: curl -vvv output]

@aledbf
Member

aledbf commented Jan 15, 2021

@manuasir from the provided information I don't see anything related to the ingress controller itself:

  • no errors in the pod logs
  • TLS termination is being done in the LB, not ingress-nginx, and returns valid responses
  • the times in the log are consistent.

Maybe this is related to a networking issue between the LB and the node/s where the ingress controller pod is running.

To check this, please enable the LB access logs (S3 bucket) by adding the following annotations to the ingress-nginx service:
https://kubernetes.io/docs/concepts/services-networking/service/#elb-access-logs-on-aws

    service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "<some bucket>"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "ingress-nginx"

Wait until the LB configuration is updated and then run the test.

From the logs in S3, you should be able to see whether the issue happens on a particular target (node) and also get more details about the error.

@JihadMotii-REISys

JihadMotii-REISys commented Mar 24, 2021

@manuasir were you able to find a solution for this issue? I'm facing the same issue as you have described.

I don't want to open another ticket since this ticket is still open and it is related to my current issue as well.

Here is my current configuration values.yaml:

imagePullSecrets:
  - name: my-docker-secret
controller:
  kind: DaemonSet
  podAnnotations:
    linkerd.io/inject: enabled
  containerPort:
    http: 80
    https: 443
    tohttps: 2443
  service:
    externalTrafficPolicy: Local
    targetPorts:
      http: tohttps
      https: http
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "ARN_ACM"
      service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: ELBSecurityPolicy-FS-1-2-Res-2020-10
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: '60'
      service.beta.kubernetes.io/aws-load-balancer-type: nlb
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: "MY_BUCKET_NAME"
      service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix: "MY_BUCKET_PREFIX"
      service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval: "5"
  extraArgs:
    enable-ssl-passthrough: "true"
  config:
    force-ssl-redirect: 'false'
    use-proxy-protocol: 'false'
    proxy-body-size: "0"
    proxy-buffer-size: "16k"
    proxy-real-ip-cidr: "xx.xx.xx.xx/xx"
    use-forwarded-headers: "true"
    ssl-redirect: 'false' # SSL redirection is taken care of by the snippet below that forwards 80 -> 2443, which is HTTPS. Enabling this would cause all ingresses to end up in a 308 redirect loop
    http-snippet: |
      server {
        listen 2443;
        return 308 https://$host$request_uri;
      }

@aledbf any guidance would be much appreciated.

@manuasir
Author

We haven't been able to solve this yet.

@kay-ramme

This seems to be related to use-proxy-protocol, which needs to be turned on for requests coming through the NLB, but fails for requests from a pod to itself, even if they go through the NLB.

@kay-ramme

Could it be that the NLB fails to pass proxy-protocol data when the request comes from the same host the pod is running on? 🤔

@longwuyuan
Contributor

longwuyuan commented Jun 8, 2021

If you are facing this problem, can you do the following steps and provide the data suggested in them:

  • Make sure you install the latest release of the ingress-nginx controller
  • Make sure that you install the YAML as per the docs, AWS section
  • Don't use a DaemonSet for this test if possible. Try to run just a normal Deployment with a single replica. If that's not possible, adapt the information reporting below accordingly
  • Show kubectl get po,svc,ing -A -o wide
  • Show kubectl -n <namespace> describe po of the pod that should respond to curl
  • Show kubectl -n <namespace> describe svc of the svc that is expected to get used for your curl command
  • Show kubectl -n <namespace> describe ing <ingressname> of the ingress that should get used for your curl command
  • Start tailing the logs of the ingress controller pod: kubectl -n <namespace> logs -f <ingresscontrollerpodname>
  • Show where you're going to run curl from; describe the host from which you are going to run curl
  • Show the complete raw curl command, with -vvv, and the output
  • Show the logs of the ingress controller pod related to the curl
  • Any other related information

Or you can adapt the above steps for the test below:

  • kubectl run test0 --image nginx:alpine --port 80
  • kubectl run test1 --image nginx:alpine --port 80
  • kubectl expose test0 --port 80
  • kubectl expose test1 --port 80
  • kubectl exec -ti test0 -- sh
  • From the shell inside the test0 pod, run curl test1

@iamNoah1
Contributor

Hi @manuasir @JihadMotii-REISys can you guys confirm that the issue still exists with newer versions of ingress-nginx?

@Kevin-Molina

Anyone end up figuring this one out?

@sharms
Contributor

sharms commented Jul 15, 2021

We resolved this by disabling NLB Client IP Preservation

@longwuyuan
Contributor

Hi, can you close this issue and come talk about it on kubernetes.slack.com in the ingress-nginx-users channel? There are many developers and engineers who may have insight, and not all of them would be looking here.

/remove-kind bug
/triage needs-information

@k8s-ci-robot added the triage/needs-information label (Indicates an issue needs more information in order to work on it.) and removed the kind/bug label (Categorizes issue or PR as related to a bug.) Jul 16, 2021
@iamNoah1
Contributor

/close

Feel free to reopen this one or submit a new issue.

@k8s-ci-robot
Contributor

@iamNoah1: Closing this issue.

In response to this:

/close

Feel free to reopen this one or submit a new issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@benbonnet

@sharms if I may ask here: how do you disable NLB Client IP Preservation? Ingress annotations? LB config?

@sharms
Contributor

sharms commented Dec 16, 2021

@benbonnet - Here was the code from the time:

    nginx-ingress:
      controller:
        service:
          externalTrafficPolicy: Cluster
          enableHttp: false
          targetPorts:
            https: http
          annotations:
            service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: Name=gitlab-nginx,class=nginx,role=test,vpc=test
            service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "tcp"
            service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
            service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
            service.beta.kubernetes.io/aws-load-balancer-type: external
            service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "preserve_client_ip.enabled=false"

@pparthesh

Deploy the ingress-nginx pods on a different Kubernetes node group.
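
For reference, a minimal sketch of what that could look like in the Helm values, assuming a dedicated node group whose nodes are labeled role: ingress (the label and taint names are illustrative, not from this thread):

    controller:
      kind: DaemonSet
      # schedule the controller pods only on the dedicated ingress node group
      nodeSelector:
        role: ingress
      # tolerate the taint that keeps other workloads off that node group
      tolerations:
        - key: dedicated
          operator: Equal
          value: ingress
          effect: NoSchedule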

@9you

9you commented Jan 18, 2023

Thank you so much @sharms, we were stuck on this problem for months.

@rapyder-user

Thanks sharms for the info.

@ftasbasi

ftasbasi commented Jun 8, 2024

We resolved this by enabling Proxy protocol v2
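
For reference, a minimal sketch of that approach, reusing annotations already seen in this thread: proxy protocol is enabled towards the NLB targets and ingress-nginx is told to expect it (which annotation applies depends on whether the in-tree provider or the AWS Load Balancer Controller manages the NLB; adapt to your setup):

    controller:
      service:
        annotations:
          # in-tree provider: enable proxy protocol (v2 on an NLB) towards the backends
          service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
          # AWS Load Balancer Controller alternative:
          # service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: "proxy_protocol_v2.enabled=true"
      config:
        # nginx must parse the PROXY header once the LB sends it, otherwise requests fail
        use-proxy-protocol: "true"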
