helm: connection failure #648

Closed
HofmannZ opened this issue Apr 14, 2021 · 6 comments
Labels
waiting-reply Waiting on the issue creator for a response before taking further action

Comments

@HofmannZ

Overnight, some of our services started throwing the following error:

upstream connect error or disconnect/reset before headers. reset reason: connection failure

The odd thing is that the behaviour persists on a fresh cluster.

Allow me to describe the current setup:

We have one deployment (A) that proxies traffic from the internet through a gateway (G). The deployment (A) calls another deployment (B) over HTTP. Both deployments have the protocol in the respective ServiceDefaults set to http2. And the ServiceIntentions are set up so that G can call A and A can call B.

These two deployments and gateway work fine.
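
For completeness, deployment (A) follows the same pattern as the full configs below. A simplified sketch of its ServiceDefaults and ServiceIntentions (the source name ingress-gateway is just a placeholder for the gateway (G), not our actual gateway name):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: deployment-a
  namespace: foo
spec:
  protocol: http2
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: deployment-a
  namespace: foo
spec:
  destination:
    name: deployment-a
  sources:
    - name: ingress-gateway # placeholder: the gateway (G) that proxies traffic from the internet
      action: allow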

We also have another deployment (C), which is called by deployment (B) over gRPC. This deployment has the protocol in the ServiceDefaults set to grpc. And the ServiceIntentions are set up so that B can call C.

This used to work, but broke overnight. Any idea what may have caused this?

Config of deployment (B)

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: deployment-b
  namespace: foo
spec:
  protocol: http2
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: deployment-b
  namespace: foo
spec:
  destination:
    name: deployment-b
  sources:
    - name: deployment-a
      action: allow
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-b
  namespace: foo
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: deployment-b
  name: deployment-b
  namespace: foo
spec:
  type: ClusterIP
  selector:
    app: deployment-b
  ports:
    - name: http
      port: 80
      targetPort: 3000
      protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-b
  name: deployment-b-deployment
  namespace: foo
spec:
  selector:
    matchLabels:
      app: deployment-b
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 75%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service: "deployment-b"
        consul.hashicorp.com/connect-service-upstreams: deployment-c:3001
      labels:
        app: deployment-b
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: deployment-b
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: deployment-b
        image: deployment-b:v1.0.0
        env:
        - name: DEPLOYMENT_C_ADDRESS
          value: localhost:3001
        livenessProbe:
          ...
        readinessProbe:
          ...
        resources:
          limits:
            cpu: 160m
            memory: 400Mi
          requests:
            cpu: 80m
            memory: 200Mi
      serviceAccountName: deployment-b
      terminationGracePeriodSeconds: 30

Config of deployment (C)

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: deployment-c
  namespace: foo
spec:
  protocol: grpc
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: deployment-c
  namespace: foo
spec:
  destination:
    name: deployment-c
  sources:
    - name: deployment-b
      action: allow
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-c
  namespace: foo
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: deployment-c
  name: deployment-c
  namespace: foo
spec:
  type: ClusterIP
  selector:
    app: deployment-c
  ports:
    - name: grpc
      port: 3000
      targetPort: 3000
      protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-c
  name: deployment-c-deployment
  namespace: foo
spec:
  selector:
    matchLabels:
      app: deployment-c
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 75%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service: "deployment-c"
      labels:
        app: deployment-c
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: deployment-c
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: deployment-c
        image: deployment-c:v1.0.0
        livenessProbe:
          ...
        readinessProbe:
          ...
        resources:
          limits:
            cpu: 160m
            memory: 400Mi
          requests:
            cpu: 80m
            memory: 200Mi
      serviceAccountName: deployment-c
      terminationGracePeriodSeconds: 30
@ndhanushkodi
Contributor

Hi @HofmannZ, I tried to reproduce this by deploying a grpc-client (B) ---grpc--> grpc-server (C), and I saw the requests go through successfully using a very similar setup. I omitted just the Kubernetes Services and used my own images for the grpc-client and grpc-server. See the details below for the exact configuration.

First I deployed using the following `consul-values.yaml` with consul-helm 0.31.1, then I did `kubectl apply -f` on both the grpc service files, looked at the logs for deployment (B), and saw that it was able to reach deployment (C).

consul-values.yaml

global:
  domain: consul
  datacenter: dc1
  metrics:
    enabled: true
    enableAgentMetrics: true

acls:
  manageSystemACLs: true

server:
  replicas: 1
  bootstrapExpect: 1
client:
  enabled: true
  grpc: true
controller:
  enabled: true

# Installs Prometheus and Grafana
prometheus:
  enabled: true

grafana:
  enabled: true

# UI metrics values
ui:
  enabled: true
  metrics:
    enabled: true
    baseURL: http://prometheus-server
    provider: "prometheus"

# Connect service metrics and metrics merging values
connectInject:
  enabled: true
  metrics:
    defaultEnabled: true

grpc-server.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-c
  namespace: default
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: deployment-c
  namespace: default
spec:
  protocol: grpc
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: deployment-c
  namespace: default
spec:
  destination:
    name: deployment-c
  sources:
    - name: deployment-b
      action: allow
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-c
  name: deployment-c-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: deployment-c
  template:
    metadata:
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service: "deployment-c"
      labels:
        app: deployment-c
    spec:
      serviceAccountName: deployment-c
      containers:
      - name: deployment-c
        image: gcr.io/nitya-293720/grpc-demo-server
        imagePullPolicy: Always
        ports:
          - containerPort: 50051

and grpc-client.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-b
  namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: deployment-b
  name: deployment-b-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: deployment-b
  template:
    metadata:
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service: "deployment-b"
        consul.hashicorp.com/connect-service-upstreams: deployment-c:50051
      labels:
        app: deployment-b
    spec:
      serviceAccountName: deployment-b
      containers:
      - name: deployment-b
        image: gcr.io/nitya-293720/grpc-demo-client
        imagePullPolicy: Always

In testing, I originally did not have any container ports exposed on (C), which caused an upstream connect error. Once I added the container port, that went away.

Can you confirm that deployment-c has container port 3001 exposed? In your config, deployment-b is trying to reach deployment-c on that port, and I don't see that port in your deployment-c yaml. (I know you had this working before and then it stopped working, but can you check?)

      containers:
      - name: deployment-c
        image: deployment-c:v1.0.0
        ports:
          - containerPort: 3001

If that is not the issue, maybe we can narrow this down to the environment. Would you be able to deploy the grpc-client and grpc-server files to your Kubernetes cluster with your setup and see if the grpc-client is logging "Greeting: Hello World" successfully? If it fails in your environment, we can continue to debug further.

@HofmannZ
Author

Hi @ndhanushkodi! Thanks for having a look.

The client code seems to be missing one important part of our case: it takes the server address as an environment variable:

        env:
        - name: DEPLOYMENT_C_ADDRESS
          value: localhost:3001

With hardcoded addresses, we were already able to get it to work 😅.

Regarding the containerPort: that is set in Docker when the container image is built. In K8s, as far as I'm aware, containerPort is simply a way to override the port set in the container.

@thisisnotashwin
Contributor

Hey @HofmannZ! I'm glad you were able to get it to work! Which address did you have to hardcode in this case in order to get things working?

@HofmannZ
Author

@thisisnotashwin - What I meant to say is that we already had it working with hardcoded ports before opening the issue. The problem occurs when we use Consul for service discovery for localhost:3001 (which used to work fine and stopped working overnight 🤷).

We're happy to move back to K8s service discovery once we get our hands on the new transparent proxy 🎉.

Nevertheless, I thought it was a good idea to report the issue and dig into why this is happening suddenly.
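
For reference, my understanding is that the transparent proxy variant of deployment (B) would look roughly like the excerpt below: no connect-service-upstreams annotation, and the client dials the Kubernetes Service for deployment (C) directly. The annotation name is taken from the consul-k8s docs and is an assumption on my part; we haven't verified it yet.

  template:
    metadata:
      annotations:
        consul.hashicorp.com/connect-inject: "true"
        consul.hashicorp.com/connect-service: "deployment-b"
        # Assumption: transparent proxy is enabled per pod with this annotation,
        # and no connect-service-upstreams entry is needed in that mode.
        consul.hashicorp.com/transparent-proxy: "true"
      labels:
        app: deployment-b
    spec:
      containers:
      - name: deployment-b
        image: deployment-b:v1.0.0
        env:
        - name: DEPLOYMENT_C_ADDRESS
          # Dial the Kubernetes Service instead of the localhost upstream listener.
          value: deployment-c.foo.svc.cluster.local:3000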

t-eckert changed the title from "upstream connect error or disconnect/reset before headers. reset reason: connection failure" to "helm: connection failure" Aug 24, 2021
t-eckert transferred this issue from hashicorp/consul-helm Aug 24, 2021
@lkysow
Member

lkysow commented Nov 4, 2021

Hi @HofmannZ, I know this is pretty old so sorry for bugging you, but I'm wondering if you've tried out transparent proxy and whether the issues described here are still occurring?

lkysow added the waiting-reply label Nov 4, 2021
@lkysow
Member

lkysow commented Nov 17, 2021

I'm going to close this for now but please ping us if we should re-open.

lkysow closed this as completed Nov 17, 2021