ci-operator/templates/openshift/installer/cluster-launch-installer-e2e: Error-catching for Google OAuth pokes #6190

Merged
merged 1 commit into from Dec 3, 2019

Conversation

wking
Member

@wking wking commented Dec 2, 2019

Add some echos to the pokes that initially landed in 0ec2cd9 (#5720) to make it easier to rule out that code when debugging mysterious failures like:

Container setup exited with code 6, reason Error
---
Lease acquired, installing...
Installing from release registry.svc.ci.openshift.org/ci-op-r6dy480t/release@sha256:284ff92845dbfc3ca1be73159acc58b36cbfe03aed05d0f79582ea4207035da9
---

@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 2, 2019
@wking wking changed the title ci-operator/templates/openshift/installer/cluster-launch-installer-e2e: Debug logging for Google OAuth pokes ci-operator/templates/openshift/installer/cluster-launch-installer-e2e: Error-catching for Google OAuth pokes Dec 2, 2019
@smarterclayton
Contributor

I’m not sure what you’re trying to achieve with this change. Describe why you think it’s related?

@smarterclayton
Contributor

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2019
@wking
Member Author

wking commented Dec 3, 2019

#6190 (comment) is why DNS failures are giving us Container setup exited with code 6, reason Error and the echos would make the relationship very obvious, because the last line in the logs would mention the OAuth poke and non-400 response code.
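A minimal sketch of the kind of response-code check this comment describes, assuming the poke expects a 400 from the token endpoint. The helper name `check_poke_code` and the message wording are hypothetical and not taken from the template. When curl cannot reach the host at all, `-w '%{http_code}'` reports `000`, which this check would also flag:

```shell
#!/bin/bash
# Hypothetical helper (not from the template): compare the HTTP status returned
# by a poke against the expected one and log a mismatch to stderr.
check_poke_code() {
  local url="$1" code="$2" expected="${3:-400}"  # assumption: 400 is the expected status
  if [ "${code}" != "${expected}" ]; then
    echo "Unexpected response from ${url}: got ${code}, expected ${expected}" 1>&2
    return 1
  fi
}
```

For example, `check_poke_code https://oauth2.googleapis.com/token 000` would return 1 and leave a line in the log naming the endpoint and the status it saw.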

@smarterclayton
Contributor

/hold cancel

But want some changes

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2019
…e: Error-catching for Google OAuth pokes

Catch non-zero exit codes in the poke that initially landed in 0ec2cd9
(template: Try to poke the GCP auth endpoint in the container,
2019-10-31, openshift#5720) to make it easier to rule out that code when
debugging mysterious failures like [1]:

  Container setup exited with code 6, reason Error
  ---
  Lease acquired, installing...
  Installing from release registry.svc.ci.openshift.org/ci-op-r6dy480t/release@sha256:284ff92845dbfc3ca1be73159acc58b36cbfe03aed05d0f79582ea4207035da9
  ---

From curl(1), exit 6 is:

  Couldn't resolve host. The given remote host was not resolved.

Clayton suggested including the exit code in the non-zero exit log
entry [2].  Testing locally:

  $ echo $BASH_VERSION
  4.2.46(2)-release
  $ code="$( curl -s -o /dev/null -w "%{http_code}" https://does-not-exist.example.com -X POST -d '' || echo "Failed to POST https://oauth2.googleapis.com/token with $?" 1>&2)"
  Failed to POST https://oauth2.googleapis.com/token with 6

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/pr-logs/pull/openshift_installer/2724/pull-ci-openshift-installer-release-4.3-e2e-gcp/8
[2]: openshift#6190 (comment)
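
The error-catching pattern tested in the commit message above can be sketched as a small function. This is an illustration under stated assumptions, not the template's actual code: the function name `poke` is hypothetical, and only the curl flags and the log message format come from the local test shown above.

```shell
#!/bin/bash
# Hypothetical wrapper (not from the template): POST an empty body to a URL,
# print the HTTP status code on success, and log curl's exit code on failure.
poke() {
  local url="$1" code rc=0
  # Capture the status code; remember curl's exit code if it fails
  # (for example 6, "Couldn't resolve host").
  code="$(curl -s -o /dev/null -w '%{http_code}' "${url}" -X POST -d '')" || rc=$?
  if [ "${rc}" -ne 0 ]; then
    echo "Failed to POST ${url} with ${rc}" 1>&2
    return "${rc}"
  fi
  echo "${code}"
}
```

Calling `poke https://does-not-exist.example.com` should then leave a line in the log that names both the URL and the curl exit code, which is what makes a bare "exited with code 6" traceable to this step.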
@smarterclayton
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 3, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smarterclayton, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 782eb7d into openshift:master Dec 3, 2019
@openshift-ci-robot
Contributor

@wking: Updated the following 3 configmaps:

  • prow-job-cluster-launch-installer-e2e configmap in namespace ci-stg at cluster default using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci at cluster ci/api-build01-ci-devcluster-openshift-com:6443 using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci at cluster default using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the gcp-token-poke-debug branch December 3, 2019 01:06
@openshift-ci-robot
Contributor

@wking: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/rehearse/openshift/cloud-credential-operator/master/e2e-gcp | 8cbef5e | link | /test pj-rehearse |
| ci/rehearse/openshift/cloud-credential-operator/master/e2e-azure | 8cbef5e | link | /test pj-rehearse |
| ci/prow/pj-rehearse | 8cbef5e | link | /test pj-rehearse |


@wking
Member Author

wking commented Dec 3, 2019

Hrm, there's a recent, mysterious exit 6 here. Did I miss something?

wking added a commit to wking/ci-tools that referenced this pull request Dec 20, 2019
Bringing over a number of changes which have landed in
ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
as of openshift/release@016eb4ed27 (Merge pull request
openshift/release#6505 from hongkailiu/clusterReaders, 2019-12-19).
One series was improved kill logic:

* openshift/release@9cd158adf3 (template: Use a more correct kill
  command, 2019-12-03, openshift/release#6223).
* openshift/release@d0744e520d (exit with 0 even if kill failed,
  2019-12-09, openshift/release#6295)

Another series was around AWS instance console logs:

* openshift/release@e102a16d89
  (ci-operator/templates/openshift/installer/cluster-launch-installer-e2e:
  Gather node console logs on AWS, 2019-12-02,
  openshift/release#6189).
* openshift/release@26fde70045
  (ci-operator/templates/openshift/installer/cluster-launch-installer-e2e:
  Set AWS_DEFAULT_REGION, 2019-12-04, openshift/release#6249).

And there was also:

* openshift/release@cdf97164aa (templates: Add large and xlarge
  variants, 2019-11-25, openshift/release#6081).
* openshift/release@8cbef5e4a7
  (ci-operator/templates/openshift/installer/cluster-launch-installer-e2e:
  Error-catching for Google OAuth pokes, 2019-12-02,
  openshift/release#6190).
* openshift/release@ad29eda8dd (template: Gather the prometheus target
  metadata during teardown, 2019-12-12, openshift/release#6379).