
Pod is deleted after job is failed, with restartPolicy: Never #83999

Closed
mofirouz opened this issue Oct 16, 2019 · 3 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
sig/node: Categorizes an issue or PR as relevant to SIG Node.

Comments


mofirouz commented Oct 16, 2019

What happened:
A Job is created with a single init container and a single main container, and the pod's restart policy is set to "Never". If the job fails, the pod is sometimes deleted; this does not happen on every run.

Most importantly, we did not observe this issue in Kubernetes 1.12.9-gke.15, but we are observing it now in 1.14.6-gke.1 - we do not have a Kubernetes 1.13 cluster.

What you expected to happen:
The pod should remain indefinitely, for as long as the Job object remains on the system, unless it is explicitly deleted.

How to reproduce it (as minimally and precisely as possible):

apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
  namespace: test
spec:
  activeDeadlineSeconds: 300
  backoffLimit: 0
  completions: 1
  parallelism: 1
  template:
    spec:
      terminationGracePeriodSeconds: 30
      restartPolicy: Never
      automountServiceAccountToken: false
      containers:
      - image: perl
        name: pi
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      initContainers:
      - image: alpine/git:latest
        name: git
        command:
        - /bin/sh
        - -ec
        - git clone git@github.com:test/badrepo
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - kill
            - sys_chroot
            - mknod
            - net_raw
            - chown
            - dac_override
            - fowner
            - fsetid
            - setgid
            - setuid
            - setpcap
            - net_bind_service
            - audit_write
            - setfcap
          readOnlyRootFilesystem: false
          runAsNonRoot: true
          runAsUser: 1001
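
To reproduce, apply the manifest and watch the pod after the job fails. The commands below are only a sketch: they assume the manifest is saved locally as job.yaml and that the test namespace does not already exist.

# create the namespace referenced by the manifest (skip if it already exists)
kubectl create namespace test
# submit the Job, then watch the pod: after the init container fails the pod
# should remain in Init:Error, but sometimes it disappears entirely
kubectl apply -f job.yaml
kubectl -n test get pods --watch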

Anything else we need to know?:

I have a sneaking suspicion that this may be related to issue #79398 / PR #79451; hopefully I'm not completely off-base here.

Environment: GKE

  • Kubernetes version (use kubectl version):
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.6-gke.1", GitCommit:"61c30f98599ad5309185df308962054d9670bafa", GitTreeState:"clean", BuildDate:"2019-08-28T11:06:42Z", GoVersion:"go1.12.9b4", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: GKE
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@mofirouz mofirouz added the kind/bug Categorizes issue or PR as related to a bug. label Oct 16, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 16, 2019
@mofirouz (Author)

@kubernetes/sig-node-bugs

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 16, 2019
@k8s-ci-robot (Contributor)

@mofirouz: Reiterating the mentions to trigger a notification:
@kubernetes/sig-node-bugs

In response to this:

@kubernetes/sig-node-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


mofirouz commented Dec 1, 2019

I've figured this out: it's caused by auto-resizing of node pools in GKE. After ~15 minutes the underlying node that was hosting the pod is scaled away, and Kubernetes removes everything that was bound to that node, including the failed pod.
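
One way to confirm this (a rough sketch; exact event reasons and output vary by Kubernetes version and autoscaler behaviour) is to note which node the pod lands on, then watch the nodes and cluster events around the time the pod disappears:

# find which node the failed pod is scheduled on
kubectl -n test get pods -o wide
# watch nodes; the node hosting the pod should be removed when the pool scales down
kubectl get nodes --watch
# look for node-removal / scale-down events around the same time
kubectl get events --all-namespaces | grep -i -E 'node|scale'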

@mofirouz mofirouz closed this as completed Dec 1, 2019