missing builds for CRI-O jobs #1547

Closed
runcom opened this issue Aug 7, 2018 · 20 comments

Comments

@runcom
Contributor

runcom commented Aug 7, 2018

We don't have anything in here https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/kubernetes-incubator_cri-o/1724/test_pull_request_crio_e2e_rhel/1239/

We should have something that looks like the PR builds in Origin. We're very likely missing logs/junits from GCE storage. For instance, in this job we're getting (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_crio_e2e_rhel/1235/consoleFull#-202735258358b6e51eb7608a5981914356):

+ scp -F /var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace/.config/origin-ci-tool/inventory/.ssh_config -r /var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace/gcs openshiftdevel:/data
scp: /data/gcs: Permission denied

I'm not familiar with GCE and the CI. Can someone assist me with this? We need build info for the CI jobs in CRI-O to show e2e results in the upstream k8s test-grid.

/cc @mrunalp @smarterclayton @stevekuznetsov

@stevekuznetsov
Contributor

Did this change recently? I don't understand why we would get permissions denied on the target host, we are root on that machine. Is this reproducible?

@runcom
Contributor Author

runcom commented Aug 8, 2018

looks like every job is failing on gcs artifacts 😕

@runcom
Contributor Author

runcom commented Aug 27, 2018

> Did this change recently? I don't understand why we would get permissions denied on the target host, we are root on that machine. Is this reproducible?

@stevekuznetsov we're still not able to collect logs/artifacts for CRI-O builds. We can see everything is working properly for builds in gcp-crio for origin, but when it comes to the crio repo itself, we don't have anything:

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/kubernetes-incubator_cri-o/1724/test_pull_request_crio_e2e_rhel/1239/ ---> any of these links will show "Missing build ..."

@runcom
Contributor Author

runcom commented Aug 27, 2018

By contrast, this looks fine, as mentioned: https://openshift-gce-devel.appspot.com/builds/origin-ci-test/pr-logs/directory/pull-ci-origin-e2e-gcp-crio/ (this is the origin repo, not the cri-o repo)

@runcom
Contributor Author

runcom commented Aug 27, 2018

I believe this PR is now breaking everything: kubernetes/test-infra#5943 -> https://k8s-testgrid.appspot.com/sig-node-cri-o#Summary

@stevekuznetsov
Contributor

Looks like there is no /data directory on the remote host.
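If the copy fails because the target directory is absent, one way to rule that out is to create the directory before copying into it. The following is a minimal sketch under assumptions, not the change that was actually merged for this issue: `ensure_staging_dir` and `SSH_CONFIG` are hypothetical names introduced only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not the change from #1563): create the staging
# directory if needed and confirm it is writable before copying into it.
ensure_staging_dir() {
  local dir="$1"
  mkdir -p "${dir}" || return 1
  [[ -d "${dir}" && -w "${dir}" ]]
}

# On the CI side this would run on the remote host before the scp step,
# for example (SSH_CONFIG is an assumed variable name):
#   ssh -F "${SSH_CONFIG}" openshiftdevel 'mkdir -p /data/gcs'
#   scp -F "${SSH_CONFIG}" -r gcs openshiftdevel:/data
```

Checking writability as well as existence separates the "directory missing" case from a genuine permissions problem.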

@runcom
Contributor Author

runcom commented Aug 28, 2018

@stevekuznetsov looks like the e2e jobs in cri-o/cri-o#1715 are still missing /data/gcs even though you merged #1563. Are the changes in that PR already in place? Do we need to update the job definition in Jenkins or anything else?

The failing jobs are:

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/kubernetes-incubator_cri-o/1715/test_pull_request_crio_e2e_fedora/1153/
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/kubernetes-incubator_cri-o/1715/test_pull_request_crio_e2e_rhel/1295/

They should have run with the changes to create /data/gcs, but if you jump on the VMs, there is no /data/gcs.

@runcom
Contributor Author

runcom commented Aug 28, 2018

On top of the above, it looks like the e2e_fedora job linked above never completes: https://ci.openshift.redhat.com/jenkins/job/test_pull_request_crio_e2e_fedora/1153/console - it's stuck gathering info

@runcom
Contributor Author

runcom commented Aug 28, 2018

@runcom
Contributor Author

runcom commented Aug 28, 2018

Further update: the /data/gcs change isn't taking effect on the jobs. Neither VM has /data/gcs.

@runcom
Contributor Author

runcom commented Aug 28, 2018

@stevekuznetsov
Contributor

Looks like something is causing a no-op on the push?

########## STARTING STAGE: PUSH THE ARTIFACTS AND METADATA ##########
+ [[ -s /var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace@2/activate ]]
+ source /var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace@2/activate
++ export VIRTUAL_ENV=/var/lib/jenkins/origin-ci-tool/dae8b1fdd92e4c6b040802a9f6893334ae0660fd
++ VIRTUAL_ENV=/var/lib/jenkins/origin-ci-tool/dae8b1fdd92e4c6b040802a9f6893334ae0660fd
++ export PATH=/var/lib/jenkins/origin-ci-tool/dae8b1fdd92e4c6b040802a9f6893334ae0660fd/bin:/sbin:/usr/sbin:/bin:/usr/bin
++ PATH=/var/lib/jenkins/origin-ci-tool/dae8b1fdd92e4c6b040802a9f6893334ae0660fd/bin:/sbin:/usr/sbin:/bin:/usr/bin
++ unset PYTHON_HOME
++ export OCT_CONFIG_HOME=/var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace@2/.config
++ OCT_CONFIG_HOME=/var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace@2/.config
++ mktemp
+ script=/tmp/tmp.36LzkUloCQ
+ cat
+ chmod +x /tmp/tmp.36LzkUloCQ
+ scp -F /var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace@2/.config/origin-ci-tool/inventory/.ssh_config /tmp/tmp.36LzkUloCQ openshiftdevel:/tmp/tmp.36LzkUloCQ
+ ssh -F /var/lib/jenkins/jobs/test_pull_request_crio_e2e_rhel/workspace@2/.config/origin-ci-tool/inventory/.ssh_config -t openshiftdevel 'bash -l -c "timeout 300 /tmp/tmp.36LzkUloCQ"'
+ cd /home/origin
+ trap 'exit 0' EXIT
+ [[ -n '' ]]
+ exit 0
+ set +o xtrace
########## FINISHED STAGE: SUCCESS: PUSH THE ARTIFACTS AND METADATA [00h 00m 01s] ##########

@stevekuznetsov
Contributor

The script is:

      trap 'exit 0' EXIT
      if [[ -n "${JOB_SPEC:-}" ]]; then
        JOB_SPEC="$( jq --compact-output ".buildid |= \"${BUILD_NUMBER}\"" <<<"${JOB_SPEC}" )"
        docker run -e JOB_SPEC="${JOB_SPEC}" -v "/data:/data:z" registry.svc.ci.openshift.org/ci/gcsupload:latest --dry-run=false --gcs-path=gs://origin-federated-results --gcs-credentials-file=/data/credentials.json /data/gcs/*
      fi
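Read against the trace above, the `[[ -n '' ]]` line suggests that `JOB_SPEC` was empty inside the remote script, so the upload branch was skipped, and `trap 'exit 0' EXIT` means the stage reports SUCCESS whatever happens inside it. The sketch below illustrates both effects in isolation; `check_job_spec` and `masked_status` are hypothetical helper names, and this is not the fix from #1567.

```shell
#!/usr/bin/env bash

# Hypothetical guard: say why the upload would be skipped instead of
# falling through silently.
check_job_spec() {
  if [[ -z "${JOB_SPEC:-}" ]]; then
    echo "JOB_SPEC is empty or unset; nothing will be uploaded" >&2
    return 1
  fi
}

# Demonstrates the masking: the child shell asks to exit with status 3,
# but the EXIT trap calls 'exit 0', so the caller observes status 0.
masked_status() {
  bash -c "trap 'exit 0' EXIT; exit 3"
  echo "$?"
}
```

Logging the skip reason keeps the "always succeed" behaviour of the stage while making a no-op visible in the console output.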

@stevekuznetsov
Contributor

fixed here #1567

@runcom
Contributor Author

runcom commented Aug 28, 2018

I'm kicking off two new jobs to see how they go

@runcom
Contributor Author

runcom commented Aug 28, 2018

e2e builds kicked off; we're still getting "Missing build ..." while the job is running: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/kubernetes-incubator_cri-o/1715/test_pull_request_crio_e2e_fedora/1157/ (we should see something like "job is running", right? It should point to where we pushed started.json and the like)

@runcom
Contributor Author

runcom commented Aug 29, 2018

@stevekuznetsov
Contributor

yeah the AMIs are missing jq
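Since the push script calls `jq` and `docker`, a missing binary on the image could be surfaced up front rather than discovered from an empty upload. A minimal sketch, assuming a hypothetical `require_tools` helper that is not part of the CI tooling discussed here:

```shell
#!/usr/bin/env bash

# Hypothetical helper: report every required tool that is not on PATH
# and return non-zero if any is missing.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "${tool}" >/dev/null 2>&1; then
      echo "missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example call at the top of the push stage:
#   require_tools jq docker || echo "upload prerequisites not met" >&2
```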

@runcom runcom closed this as completed Sep 3, 2018