
pod rolling update causes error in request flow #76580

Closed
3 tasks done
luweiv9988 opened this issue Apr 15, 2019 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@luweiv9988

What happened:
When Kubernetes kills off an old pod during a rolling update, a small amount of traffic still goes through the old pod and causes HTTP 502 errors (PHP).

What you expected to happen:

Here is my configuration:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-chongzhi
  namespace: juhe-test
  labels: 
    app: php-chongzhi
spec:
  selector:
    matchLabels:
      app: php-chongzhi
  replicas: 8
  minReadySeconds: 10
  strategy:
    # indicate which strategy we want for rolling update
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  # limit rollback history to 5 revisions
  revisionHistoryLimit: 5
  template:
    metadata:
      labels:
        app: php-chongzhi
    spec:
      initContainers:
      - name: copywebdata
        image: registry.cn-hangzhou.aliyuncs.com/kubernetes_hub/chongzhi.juhe.cn:f705bedb 
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Copy web-data to tmp folder
          cp -rf /data/www/nginx/chongzhi.juhe.cn/ /mnt/
        volumeMounts:
        - name: chongzhi-data
          mountPath: /mnt/
        - name: localtime
          mountPath: /etc/localtime

      imagePullSecrets:
      - name: pullpass

      containers:
      - name: phpfpm
        image: registry.cn-hangzhou.aliyuncs.com/kubernetes_hub/chongzhi.juhe.cn:f705bedb 
        ports:
        - name: php
          containerPort: 9000
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: "500m"
            memory: "512Mi"
        readinessProbe:
          periodSeconds: 1
          timeoutSeconds: 1
          tcpSocket: 
            port: 9000
        env:
        - name: aliyun_logs_php-chongzhi
          value: "stdout"
        - name: aliyun_logs_php-chongzhi_tag
          value: app=php-chongzhi

      volumes:
      - name: localtime
        hostPath: 
          path: /etc/localtime
      - name: chongzhi-data
        persistentVolumeClaim: 
          claimName: website-data

---
apiVersion: v1
kind: Service
metadata:
  name: php-chongzhi
  namespace: juhe-test
  labels:
    app: php-chongzhi
spec:
  ports:
  - name: php
    port: 9000
  selector:
    app: php-chongzhi

How to reproduce it (as minimally and precisely as possible):

$ kubectl apply -f deployment.yaml --record
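
For illustration only (these exact commands are not part of the original report), the rolling update can be triggered by pushing a new image tag and the 502s observed while the rollout progresses; <new-tag> is a placeholder:

$ kubectl -n juhe-test set image deployment/php-chongzhi phpfpm=registry.cn-hangzhou.aliyuncs.com/kubernetes_hub/chongzhi.juhe.cn:<new-tag>
$ kubectl -n juhe-test rollout status deployment/php-chongzhi
$ kubectl -n juhe-test get endpoints php-chongzhi -w   # watch old pod IPs being removed from the Service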

Anything else we need to know?:

  • php application
  • http 502

While the old pod is terminating, Kubernetes should mark it as dead, but user traffic can still be routed to the terminating pod, causing application errors :(
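
A commonly used mitigation for this race (not something proposed in this issue, only a hedged sketch) is to delay shutdown with a preStop hook so endpoint removal can propagate before php-fpm receives SIGTERM; the sleep length and the assumption that the image ships a shell are hypothetical:

      containers:
      - name: phpfpm
        lifecycle:
          preStop:
            exec:
              # Keep serving briefly while kube-proxy/ingress drop this pod from the endpoints
              command: ["sh", "-c", "sleep 10"]
      # Must cover the preStop sleep plus php-fpm's own shutdown time
      terminationGracePeriodSeconds: 30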

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:16Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.5", GitCommit:"753b2dbc622f5cc417845f0ff8a77f539a4213ea", GitTreeState:"clean", BuildDate:"2018-11-26T14:31:35Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: Alicloud
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@luweiv9988 luweiv9988 added the kind/bug Categorizes issue or PR as related to a bug. label Apr 15, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 15, 2019
@luweiv9988
Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 16, 2019
@huguesalary

Try setting strategy.rollingUpdate.maxUnavailable=0; this ensures that a new pod is always running (and ready) before the old one is terminated.
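
Applied to the Deployment posted above, that suggestion would look roughly like this (a sketch based on the posted manifest, not a config verified in this thread):

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0   # never remove an old pod until a replacement is Ready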

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 2, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
