job fails before the .spec.backoffLimit #93559

Closed
georgettica opened this issue Jul 30, 2020 · 7 comments · Fixed by #93779
Labels
kind/bug sig/apps

Comments

@georgettica

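What happened:
The Job was marked Failed with reason BackoffLimitExceeded after only 5 failed pods, although .spec.backoffLimit is 6.

What you expected to happen:
The Job keeps retrying until .spec.backoffLimit is reached before it is marked Failed.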

How to reproduce it (as minimally and precisely as possible):
$ oc create -f job.yaml
$ oc get job jobname -ojson | jq .status
Output of the second command, run repeatedly while the Job retried:
{
  "active": 1,
  "failed": 1,
  "startTime": "2020-07-30T06:45:37Z"
}
{
  "active": 1,
  "failed": 2,
  "startTime": "2020-07-30T06:45:37Z"
}
{
  "active": 1,
  "failed": 3,
  "startTime": "2020-07-30T06:45:37Z"
}
{
  "active": 1,
  "failed": 4,
  "startTime": "2020-07-30T06:45:37Z"
}
{
  "conditions": [
    {
      "lastProbeTime": "2020-07-30T06:53:10Z",
      "lastTransitionTime": "2020-07-30T06:53:10Z",
      "message": "Job has reached the specified backoff limit",
      "reason": "BackoffLimitExceeded",
      "status": "True",
      "type": "Failed"
    }
  ],
  "failed": 5,
  "startTime": "2020-07-30T06:45:37Z"
}
$ oc get job jobname -ojson | jq .spec.backoffLimit
6
Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    Client Version: 4.5.3
    Server Version: 4.4.13
    Kubernetes Version: v1.17.1+b8568b

  • Cloud provider or hardware configuration:

  • OS (e.g: cat /etc/os-release):
    NAME="Red Hat Enterprise Linux CoreOS"
    VERSION_ID="4.4"
    OPENSHIFT_VERSION="4.4"
    RHEL_VERSION="8.2"
    ID="rhcos"
    ID_LIKE="rhel fedora"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:redhat:enterprise_linux:8::coreos"
    HOME_URL="https://www.redhat.com/"
    BUG_REPORT_URL="https://bugzilla.redhat.com/"
    REDHAT_BUGZILLA_PRODUCT="OpenShift Container Platform"
    REDHAT_BUGZILLA_PRODUCT_VERSION="4.4"
    REDHAT_SUPPORT_PRODUCT="OpenShift Container Platform"
    REDHAT_SUPPORT_PRODUCT_VERSION="4.4"
    OSTREE_VERSION='44.82.202007141430-0'

  • Kernel (e.g. uname -a):
    Linux ip-10-0-192-232 4.18.0-193.13.2.el8_2.x86_64 #1 SMP Mon Jul 13 23:17:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Install tools:

  • Others:
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: sre-build-test-fail
      namespace: openshift-build-test
    spec:
      backoffLimit: 6
      completions: 1
      parallelism: 1
      template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: node-role.kubernetes.io
                    operator: In
                    values:
                    - infra
          containers:
          - command:
            - /bin/bash
            - -c
            - |
              # ensure we fail if something exits non-zero
              set -o errexit
              set -o nounset
              set -o pipefail
              exit 1
            imagePullPolicy: Always
            name: sre-build-test
          restartPolicy: Never
          terminationGracePeriodSeconds: 30
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io
            operator: Equal
            value: infra
    
georgettica added the kind/bug label Jul 30, 2020
k8s-ci-robot added the needs-sig label Jul 30, 2020
@georgettica
Author

/sig apps

k8s-ci-robot added the sig/apps label and removed the needs-sig label Jul 30, 2020
@georgettica
Author

Not sure, but that is the best SIG I can think of at this time.

@xinxinh2020

I have noticed this problem too, although sometimes it seems to work normally.

@georgettica
Author

georgettica commented Aug 5, 2020

So I found that the check here uses previousRetry + 1, which can cause the issue. It has been there since the code's inception.

WDYT? @hongxu-jia
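The effect of a `previousRetry + 1 > backoffLimit` check can be illustrated with a small model. This is a hypothetical sketch, not the job controller's code: it assumes `previousRetry` counts every requeue of the Job, that the Job is requeued once per pod failure, and that it can occasionally be requeued for other reasons. The function name `simulate_failed_at` and its arguments are made up for illustration. Under those assumptions, each extra requeue makes the check trip one pod failure earlier.

```shell
#!/usr/bin/env bash
# Hypothetical model of a retry-limit check of the form
#   previousRetry + 1 > backoffLimit
# ASSUMPTION: previousRetry counts every requeue of the Job, not only pod failures.
set -euo pipefail

# simulate_failed_at LIMIT [N ...]
# Prints the failed-pod count at which the check first trips.
# Each loop iteration models one pod failure followed by one requeue.
# Every N after LIMIT is a failure count after which one extra
# (non-failure) requeue is also recorded.
simulate_failed_at() {
  local limit=$1; shift
  local extras=" $* "
  local previous_retry=0 failed=0
  while true; do
    failed=$((failed + 1))
    # The check runs when a new pod failure is observed.
    if (( previous_retry + 1 > limit )); then
      echo "$failed"
      return
    fi
    # Normal requeue caused by this failure.
    previous_retry=$((previous_retry + 1))
    # Optional extra requeue not caused by a pod failure.
    if [[ $extras == *" $failed "* ]]; then
      previous_retry=$((previous_retry + 1))
    fi
  done
}
```

In this model, `simulate_failed_at 6` prints 7, while `simulate_failed_at 6 1 3` (extra requeues after the 1st and 3rd failures) prints 5, the failed count at which the Job in this report was marked Failed. Whether the real controller requeues in this way is not established by this sketch.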

@kow3ns
Member

kow3ns commented Aug 10, 2020

/assign @soltysh

@soltysh
Contributor

soltysh commented Aug 10, 2020

Duplicate of #92245.
/close

@k8s-ci-robot
Contributor

@soltysh: Closing this issue.

In response to this:

Duplicate of #92245.
/close



5 participants