Canary ingress sharing the same backend service as the main ingress causes 503 Service Unavailable (when in the same namespace) #3952

Closed
kidlj opened this issue Apr 1, 2019 · 6 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

kidlj commented Apr 1, 2019

Is this a BUG REPORT or FEATURE REQUEST? (choose one):

BUG REPORT

NGINX Ingress controller version:

image: quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.23.0

Kubernetes version (use kubectl version):

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-04T04:48:03Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"darwin/amd64"}

Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.6", GitCommit:"ab91afd7062d4240e95e51ac00a18bd58fddd365", GitTreeState:"clean", BuildDate:"2019-02-26T12:49:28Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
$ cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

What happened:

When a canary ingress (weight, header, or cookie based) is configured in the same namespace as the main ingress and points to the same backend service as the main ingress, the controller responds with 503 Service Unavailable for 100% of requests to the configured ingress path.

What you expected to happen:

Normally a canary ingress would point at a different version of the service, but pointing it at the same service as the main ingress should not cause 503 responses.
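
For context, a typical canary setup points the canary ingress at a separate Service backed by the new version of the workload, so the controller builds a distinct upstream for it. A minimal sketch (the v2 names and the second Deployment they imply are illustrative, not part of this issue):

apiVersion: v1
kind: Service
metadata:
  name: http-svc-v2            # hypothetical Service for the canary version
  namespace: echo-production
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: http-svc-v2           # pods from a second Deployment running the new image

The canary ingress would then reference serviceName: http-svc-v2 instead of http-svc.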

How to reproduce it (as minimally and precisely as possible):

The two ingresses are in the same namespace:

$ kubectl get ingress -n=echo-production
NAME              HOSTS      ADDRESS   PORTS   AGE
http-svc          echo.com             80      6h45m
http-svc-canary   echo.com             80      163m

The main ingress:

$ kubectl get ingress -n=echo-production http-svc -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"kubernetes.io/ingress.class":"nginx"},"name":"http-svc","namespace":"echo-production"},"spec":{"rules":[{"host":"echo.com","http":{"paths":[{"backend":{"serviceName":"http-svc","servicePort":80}}]}}]}}
    kubernetes.io/ingress.class: nginx
  creationTimestamp: "2019-04-01T03:31:54Z"
  generation: 1
  name: http-svc
  namespace: echo-production
  resourceVersion: "2909034"
  selfLink: /apis/extensions/v1beta1/namespaces/echo-production/ingresses/http-svc
  uid: b1e085cb-542e-11e9-a0a9-fa163e7b2db1
spec:
  rules:
  - host: echo.com
    http:
      paths:
      - backend:
          serviceName: http-svc
          servicePort: 80
status:
  loadBalancer: {}

The http-svc-canary ingress is in the same namespace as the http-svc ingress and uses the same backend service as the main ingress:

$ kubectl get ingress -n=echo-production http-svc-canary -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "30"
  creationTimestamp: "2019-04-01T07:34:05Z"
  generation: 1
  name: http-svc-canary
  namespace: echo-production
  resourceVersion: "2929466"
  selfLink: /apis/extensions/v1beta1/namespaces/echo-production/ingresses/http-svc-canary
  uid: 872f3ff7-5450-11e9-a0a9-fa163e7b2db1
spec:
  rules:
  - host: echo.com
    http:
      paths:
      - backend:
          serviceName: http-svc
          servicePort: 80
status:
  loadBalancer: {}

With echo.com pointed at the controller IP in /etc/hosts:

$ curl -s http://echo.com
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.15.9</center>
</body>
</html>
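
To confirm the 503 comes from nginx rather than from the echoserver pod, one quick check (assuming the Service and pod themselves are healthy) is to bypass the ingress and hit the Service directly:

# forward a local port to the http-svc Service and request it without going through nginx
$ kubectl -n echo-production port-forward svc/http-svc 8080:80 &
$ curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8080/

If the backend is healthy this returns 200, which would indicate the 503 above is generated by the ingress controller itself.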

The http-svc yaml:

$ cat demo-echo-service.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: http-svc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-svc
  template:
    metadata:
      labels:
        app: http-svc
    spec:
      containers:
      - name: http-svc
        image: gcr.io/kubernetes-e2e-test-images/echoserver:2.1
        ports:
        - containerPort: 8080
        env:
          - name: NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: spec.nodeName
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP

---

apiVersion: v1
kind: Service
metadata:
  name: http-svc
  labels:
    app: http-svc
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: http-svc

Anything else we need to know:

  1. The main ingress and the canary ingress are in the same namespace;
  2. The main ingress and the canary ingress use the same backend service;
  3. 100% of requests return 503;
  4. The ingress controller pod is running normally.
ElvinEfendi (Member) commented:

Yeah this is not good - patches are welcome :) Otherwise we will address it sometime later.

joshsouza commented:

Just to add some confirmation/information here that may be helpful (I'm still troubleshooting a slightly different problem, but I believe it's related):

It appears to me that the canary settings are tied to the upstream/backend that they use, and the first ingress rule that generates the upstream's config will "win" in defining that upstream. This prevents you from using the same backend for multiple ingresses when any one of them uses canary (since the upstream will be created without the canary traffic-shaping settings); conversely, I imagine that if the canary created the upstream first, you might inadvertently propagate traffic-shaping rules to non-canary ingresses. (I haven't confirmed that yet; it just seems logical.)

I don't know if there's a fix short of creating a separate "canary" upstream for canary ingresses, but I haven't looked far enough into the code to know whether that's appropriate.

Hopefully this information is useful for others, or for investigating this further.
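
If that reading is correct, one possible workaround (untested here; the alias Service name is illustrative) is to give the canary ingress its own Service object selecting the same pods, so the controller generates a separate upstream for the canary instead of reusing the main one:

apiVersion: v1
kind: Service
metadata:
  name: http-svc-canary        # hypothetical alias Service used only by the canary ingress
  namespace: echo-production
spec:
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP
    name: http
  selector:
    app: http-svc              # same pods as the main http-svc Service

The canary ingress would then use serviceName: http-svc-canary while still routing to the same pods.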

fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label Sep 25, 2019
fejta-bot commented:

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 26, 2019
fejta-bot commented:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot (Contributor) commented:

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
