Pipelines never marked as finished (neither failed / success) after lighthouse / jx upgrade #7444

Closed
axsaucedo opened this issue Jul 16, 2020 · 1 comment · Fixed by #7463
Labels: area/gc, kind/bug, priority/important-soon, ready-for-review

axsaucedo commented Jul 16, 2020

Summary

We've upgraded our cluster to the latest version of JX. We have several jobs that run for 120+ minutes, but when they terminate the GitHub status never changes (neither to failed nor to successful). The pods themselves do terminate, which leads me to assume that the issue could be with containers that run for too long, or perhaps that the return value had a format that didn't register with the jenkins-x-controllerbuild. I can see that the containers exited with exit code 0 and with status Completed / Terminated.

Below I've added the output of get and describe for the pod, which shows that it completed, yet the GitHub status is still not updated. Looking at the jenkins-x-controller, I also can't see any logs that show the completion of the pod, so it seems the controllerbuild is not being notified.

Is there any known issue about this? Is there further information that could be provided to understand what may be causing this?

The output of kubectl get pod <jx-pipeline-pod>:

seldonio-seldon-core-pr-2145-no-l5pkw-3-end-to-end-prc5j-pod-0917d3         0/4     Completed   0          109m

The output of kubectl describe pod <jx-pipeline-pod>:

Name:           seldonio-seldon-core-pr-2145-no-l5pkw-3-end-to-end-prc5j-pod-0917d3
Namespace:      jx
Priority:       0
Node:           gke-jx-production-cluster-pool-2-9a7402c1-hg42/10.154.15.209
Start Time:     Thu, 16 Jul 2020 12:58:46 +0100
Labels:         app.kubernetes.io/managed-by=tekton-pipelines
                branch=PR-2145
                build=3
                context=notebooks
                jenkins.io/pipelineType=build
                jenkins.io/task-stage-name=end-to-end
                owner=SeldonIO
                repository=seldon-core
                tekton.dev/pipeline=seldonio-seldon-core-pr-2145-no-l5pkw-3
                tekton.dev/pipelineRun=seldonio-seldon-core-pr-2145-no-l5pkw-3
                tekton.dev/pipelineTask=end-to-end
                tekton.dev/task=seldonio-seldon-core-pr-2145-no-l5pkw-end-to-end-3
                tekton.dev/taskRun=seldonio-seldon-core-pr-2145-no-l5pkw-3-end-to-end-prc5j
Annotations:    tekton.dev/ready: READY
Status:         Succeeded
IP:             10.12.0.5
IPs:            <none>
Controlled By:  TaskRun/seldonio-seldon-core-pr-2145-no-l5pkw-3-end-to-end-prc5j
Init Containers:
  step-credential-initializer-c9hqr:
    Container ID:  docker://d4be97f48a07adbbd2b160d46a8ade70752e819589e7dbd9b49b7c68865f9c2e
    Image:         gcr.io/abayer-pipeline-crd/tekton-for-jx/creds-init:v20200414-2b72e7c6
    Image ID:      docker-pullable://gcr.io/abayer-pipeline-crd/tekton-for-jx/creds-init@sha256:97f2b5dfa382700a93aabdaf45f95353007c6936d0cf799be5ce5266f9793e4c
    Port:          <none>
    Host Port:     <none>
    Command:
      /ko-app/creds-init
    Args:
      -basic-git=knative-git-user-pass=https://github.com
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:04 +0100
      Finished:     Thu, 16 Jul 2020 12:59:05 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      HOME:  /builder/home
    Mounts:
      /builder/home from home (rw)
      /var/build-secrets/knative-git-user-pass from secret-volume-knative-git-user-pass-dwzdk (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
  step-working-dir-initializer-pkhdv:
    Container ID:  docker://46112cb5a258357a81f64052d98cd52ec2ec76e538be6b333af9c476c10a37c3
    Image:         gcr.io/abayer-pipeline-crd/tekton-for-jx/bash:v20200414-2b72e7c6
    Image ID:      docker-pullable://gcr.io/abayer-pipeline-crd/tekton-for-jx/bash@sha256:e513906182438b3b17221cebfbc7f657cd0db982ec4c302fbfcb8f58f7c3ebac
    Port:          <none>
    Host Port:     <none>
    Command:
      /ko-app/bash
    Args:
      -args
      mkdir -p /workspace/source
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:06 +0100
      Finished:     Thu, 16 Jul 2020 12:59:06 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      HOME:  /builder/home
    Mounts:
      /builder/home from home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
  step-place-tools:
    Container ID:  docker://47351bc18b74fda401d3e149b6cac91459519faaa38ff2446d7e99fe64177750
    Image:         gcr.io/abayer-pipeline-crd/tekton-for-jx/entrypoint:v20200414-2b72e7c6
    Image ID:      docker-pullable://gcr.io/abayer-pipeline-crd/tekton-for-jx/entrypoint@sha256:c74f9b3a76ab2ad4c9f9730073161a89311e9d9233836fbe8930066325153373
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
    Args:
      -c
      cp /ko-app/entrypoint /builder/tools/entrypoint
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:07 +0100
      Finished:     Thu, 16 Jul 2020 12:59:07 +0100
    Ready:          True
    Restart Count:  0
    Environment:
      HOME:  /builder/home
    Mounts:
      /builder/home from home (rw)
      /builder/tools from tools (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
Containers:
  step-create-dir-workspace-2gg5g:
    Container ID:  docker://242859151e57e223b6bc72300ccad570fb5cc0a5f3cf39a758d25100995dd17c
    Image:         gcr.io/abayer-pipeline-crd/tekton-for-jx/bash:v20200414-2b72e7c6
    Image ID:      docker-pullable://gcr.io/abayer-pipeline-crd/tekton-for-jx/bash@sha256:e513906182438b3b17221cebfbc7f657cd0db982ec4c302fbfcb8f58f7c3ebac
    Port:          <none>
    Host Port:     <none>
    Command:
      /builder/tools/entrypoint
    Args:
      -wait_file
      /builder/downward/ready
      -post_file
      /builder/tools/0
      -wait_file_content
      -entrypoint
      /ko-app/bash
      --
      -args
      mkdir -p /workspace/source
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:08 +0100
      Finished:     Thu, 16 Jul 2020 12:59:14 +0100
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:                0
      ephemeral-storage:  0
      memory:             0
    Environment:
      HOME:  /builder/home
    Mounts:
      /builder/downward from downward (rw)
      /builder/home from home (rw)
      /builder/tools from tools (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
  step-source-copy-workspace-qspz4:
    Container ID:  docker://cc19d4360f6e391ec9cdf88f113bf7b2eea71a7ef8f45c6f96467f1a72a47182
    Image:         gcr.io/abayer-pipeline-crd/tekton-for-jx/bash:v20200414-2b72e7c6
    Image ID:      docker-pullable://gcr.io/abayer-pipeline-crd/tekton-for-jx/bash@sha256:e513906182438b3b17221cebfbc7f657cd0db982ec4c302fbfcb8f58f7c3ebac
    Port:          <none>
    Host Port:     <none>
    Command:
      /builder/tools/entrypoint
    Args:
      -wait_file
      /builder/tools/0
      -post_file
      /builder/tools/1
      -entrypoint
      /ko-app/bash
      --
      -args
      cp -r /pvc/pr-build-comment/workspace/. /workspace/source
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:08 +0100
      Finished:     Thu, 16 Jul 2020 12:59:17 +0100
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:                0
      ephemeral-storage:  0
      memory:             0
    Environment:
      HOME:  /builder/home
    Mounts:
      /builder/home from home (rw)
      /builder/tools from tools (rw)
      /pvc from seldonio-seldon-core-pr-2145-no-l5pkw-3-pvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
  step-setup-builder-home:
    Container ID:  docker://44366f6fb527de98549db560c099cfc7681231c522fee1a4385d6ff905a4c44f
    Image:         gcr.io/jenkinsxio/builder-jx:2.1.94-721
    Image ID:      docker-pullable://gcr.io/jenkinsxio/builder-jx@sha256:8bf7a59c70e9ec241685381a4ecec6d1df7ccfa2729b28dce4d2052ece75be90
    Port:          <none>
    Host Port:     <none>
    Command:
      /builder/tools/entrypoint
    Args:
      -wait_file
      /builder/tools/1
      -post_file
      /builder/tools/2
      -entrypoint
      /bin/sh
      --
      -c
      [ -d /builder/home ] || mkdir -p /builder && ln -s /tekton/home /builder/home
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:09 +0100
      Finished:     Thu, 16 Jul 2020 12:59:19 +0100
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:                4
      ephemeral-storage:  6Gi
      memory:             8000Mi
    Environment:
      HOME:                              /builder/home
      APP_NAME:                          seldon-core
      BRANCH_NAME:                       PR-2145
      BUILD_NUMBER:                      3
      JOB_NAME:                          SeldonIO/seldon-core/PR-2145
      JOB_SPEC:                          type:presubmit
      JOB_TYPE:                          presubmit
      PIPELINE_CONTEXT:                  notebooks
      PIPELINE_KIND:                     pullrequest
      PULL_BASE_REF:                     master
      PULL_BASE_SHA:                     bbe0fefc1a80556189736e19a4c94fb88319e361
      PULL_NUMBER:                       2145
      PULL_PULL_SHA:                     47d8c4beb9b2d033703aa4f2ab65328250f99445
      PULL_REFS:                         master:bbe0fefc1a80556189736e19a4c94fb88319e361,2145:47d8c4beb9b2d033703aa4f2ab65328250f99445
      REPO_NAME:                         seldon-core
      REPO_OWNER:                        SeldonIO
      SELDON_E2E_TESTS_POD_INFORMATION:  true
      SELDON_E2E_TESTS_TO_RUN:           notebooks
      SOURCE_URL:                        https://github.com/SeldonIO/seldon-core.git
      DOCKER_REGISTRY:
      GIT_AUTHOR_NAME:                   seldondev
      GIT_AUTHOR_EMAIL:                  devext@seldon.io
      GIT_COMMITTER_NAME:                seldondev
      GIT_COMMITTER_EMAIL:               devext@seldon.io
      JX_BATCH_MODE:                     true
      VERSION:                           0.0.0-SNAPSHOT-PR-2145-3
      BUILD_ID:                          3
      PREVIEW_VERSION:                   0.0.0-SNAPSHOT-PR-2145-3
    Mounts:
      /builder/home from home (rw)
      /builder/tools from tools (rw)
      /etc/podinfo from podinfo (ro)
      /lib/modules from modules (ro)
      /sys/fs/cgroup from cgroup (rw)
      /var/lib/docker from dind-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
  step-test-end-to-end:
    Container ID:  docker://8436b80672a22683bddff76e4afb8094ad39ad894b3d5d9eae6a0b93133f723d
    Image:         seldonio/core-builder:0.15
    Image ID:      docker-pullable://seldonio/core-builder@sha256:a75c2ffb034fdb83f0c71a2902d1c41179e04fef87485159351565dadb18c87b
    Port:          <none>
    Host Port:     <none>
    Command:
      /builder/tools/entrypoint
    Args:
      -wait_file
      /builder/tools/2
      -post_file
      /builder/tools/3
      -entrypoint
      /bin/sh
      --
      -c
      cd testing/scripts && bash kind_test_all.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 16 Jul 2020 12:59:11 +0100
      Finished:     Thu, 16 Jul 2020 14:42:08 +0100
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:                0
      ephemeral-storage:  0
      memory:             0
    Environment:
      HOME:                              /builder/home
      APP_NAME:                          seldon-core
      BRANCH_NAME:                       PR-2145
      BUILD_NUMBER:                      3
      JOB_NAME:                          SeldonIO/seldon-core/PR-2145
      JOB_SPEC:                          type:presubmit
      JOB_TYPE:                          presubmit
      PIPELINE_CONTEXT:                  notebooks
      PIPELINE_KIND:                     pullrequest
      PULL_BASE_REF:                     master
      PULL_BASE_SHA:                     bbe0fefc1a80556189736e19a4c94fb88319e361
      PULL_NUMBER:                       2145
      PULL_PULL_SHA:                     47d8c4beb9b2d033703aa4f2ab65328250f99445
      PULL_REFS:                         master:bbe0fefc1a80556189736e19a4c94fb88319e361,2145:47d8c4beb9b2d033703aa4f2ab65328250f99445
      REPO_NAME:                         seldon-core
      REPO_OWNER:                        SeldonIO
      SELDON_E2E_TESTS_POD_INFORMATION:  true
      SELDON_E2E_TESTS_TO_RUN:           notebooks
      SOURCE_URL:                        https://github.com/SeldonIO/seldon-core.git
      DOCKER_REGISTRY:
      GIT_AUTHOR_NAME:                   seldondev
      GIT_AUTHOR_EMAIL:                  devext@seldon.io
      GIT_COMMITTER_NAME:                seldondev
      GIT_COMMITTER_EMAIL:               devext@seldon.io
      JX_BATCH_MODE:                     true
      VERSION:                           0.0.0-SNAPSHOT-PR-2145-3
      BUILD_ID:                          3
      PREVIEW_VERSION:                   0.0.0-SNAPSHOT-PR-2145-3
    Mounts:
      /builder/home from home (rw)
      /builder/tools from tools (rw)
      /etc/podinfo from podinfo (ro)
      /lib/modules from modules (ro)
      /sys/fs/cgroup from cgroup (rw)
      /var/lib/docker from dind-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from tekton-bot-token-gqjdp (ro)
      /workspace from workspace (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:  Directory
  dind-storage:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  Directory
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
  seldonio-seldon-core-pr-2145-no-l5pkw-3-pvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  seldonio-seldon-core-pr-2145-no-l5pkw-3-pvc
    ReadOnly:   false
  tools:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  downward:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations['tekton.dev/ready'] -> ready
  workspace:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  home:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  secret-volume-knative-git-user-pass-dwzdk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  knative-git-user-pass
    Optional:    false
  tekton-bot-token-gqjdp:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  tekton-bot-token-gqjdp
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
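
The manual check made against the describe output above (pod phase plus every container's exit code) can also be scripted. The following is a minimal sketch in Python, assuming the pod is fetched with `kubectl get pod <jx-pipeline-pod> -o json`; the `summarize_pod` helper and the sample data are hypothetical and hand-written for illustration, not captured from the cluster.

```python
import json


def summarize_pod(pod: dict) -> dict:
    """Summarize phase and per-container exit codes from `kubectl get pod -o json` output."""
    status = pod.get("status", {})
    exit_codes = {}
    for cs in status.get("containerStatuses", []):
        terminated = cs.get("state", {}).get("terminated")
        # None means the container has not terminated yet.
        exit_codes[cs["name"]] = None if terminated is None else terminated.get("exitCode")
    return {
        "phase": status.get("phase"),
        "exit_codes": exit_codes,
        # True only if there is at least one container and all of them exited with 0.
        "all_succeeded": bool(exit_codes) and all(c == 0 for c in exit_codes.values()),
    }


# Hand-written sample mirroring two of the step containers shown above
# (an assumption for illustration, not real captured JSON).
SAMPLE = json.loads("""
{
  "status": {
    "phase": "Succeeded",
    "containerStatuses": [
      {"name": "step-setup-builder-home",
       "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}},
      {"name": "step-test-end-to-end",
       "state": {"terminated": {"exitCode": 0, "reason": "Completed"}}}
    ]
  }
}
""")

summary = summarize_pod(SAMPLE)
print(summary["phase"], summary["all_succeeded"])  # Succeeded True
```

A pod that passes this check while the GitHub status stays pending suggests the problem lies in how the result is recorded rather than in the pod itself.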

Jx version

The output of jx version is:

Version        2.1.95
Commit         b486665
Build date     2020-07-02T10:28:34Z
Go version     1.13.8
Git tree state clean

Diagnostic information

The output of jx diagnose version is:

Running in namespace: jx
Version        2.1.95
Commit         b486665
Build date     2020-07-02T10:28:34Z
Go version     1.13.8
Git tree state clean
NAME                          VERSION
Kubernetes cluster            v1.14.10-gke.36
kubectl (installed in JX_BIN) v1.16.6-beta.0
helm client                   2.12.2
git                           2.17.1
Operating System              Ubuntu 18.04.3 LTS

Please visit https://jenkins-x.io/faq/issues/ for any known issues.

Finished printing diagnostic information.

Kubernetes cluster

Kubectl version

The output of kubectl version --client is:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.4", GitCommit:"224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba", GitTreeState:"clean", BuildDate:"2019-12-11T12:47:40Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.36", GitCommit:"34a615f32e9a0c9e97cdb9f749adb392758349a6", GitTreeState:"clean", BuildDate:"2020-04-06T16:33:17Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

Operating system / Environment

Linux ubuntu 18.04

axsaucedo commented

We've done some exploration and have been looking at whether we could notify the controllerbuild manually. Is there a jx command to explicitly tell the controllerbuild that a pipeline has finished? Could this be possible with something like jx update webhook?

abayer self-assigned this Jul 20, 2020
abayer added the area/gc, kind/bug, priority/important-soon, and ready-for-review labels Jul 20, 2020
abayer added a commit to abayer/jx that referenced this issue Jul 20, 2020
In practice, this results in builds that take longer than the max age
sometimes (though not always) failing to get their status actually
recorded properly. So let's increase that age, and also add a check to
make sure the `PipelineRun` is in a terminal state before we even
consider deleting it.

fixes jenkins-x#7444

Signed-off-by: Andrew Bayer <andrew.bayer@gmail.com>
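
The guard described in the commit message (only consider a `PipelineRun` for deletion once it is in a terminal state, and only after it exceeds the max age) could look roughly like the sketch below. It is written in Python for illustration and is not the actual Go change in jx. The `PipelineRunInfo` type, the `should_gc` helper, and the one-hour max age are all hypothetical. The only Tekton detail assumed is that a run's `Succeeded` condition has status `True` or `False` once it has finished and `Unknown` while it is still running.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class PipelineRunInfo:
    """Hypothetical, simplified view of a Tekton PipelineRun for this sketch."""
    name: str
    created: datetime
    # Status of the "Succeeded" condition: "True", "False", or "Unknown"/None while running.
    succeeded_status: Optional[str] = None


def is_terminal(run: PipelineRunInfo) -> bool:
    """A run is terminal once its Succeeded condition is True or False."""
    return run.succeeded_status in ("True", "False")


def should_gc(run: PipelineRunInfo, now: datetime, max_age: timedelta) -> bool:
    """Delete only runs that have finished AND are older than max_age."""
    if not is_terminal(run):
        return False
    return now - run.created > max_age


# Illustrative values only: a run started at 12:58 that is still running at 14:45.
now = datetime(2020, 7, 16, 14, 45, tzinfo=timezone.utc)
started = datetime(2020, 7, 16, 12, 58, tzinfo=timezone.utc)
still_running = PipelineRunInfo("pr-2145-3", started, "Unknown")
print(should_gc(still_running, now, timedelta(hours=1)))  # False
```

Without the terminal-state check, the same run would be eligible for deletion as soon as it outlived the max age, before its result could be recorded.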
jenkins-x-bot pushed a commit with the same message that referenced this issue Jul 21, 2020