
Bug 1802534: gcp-routes: move to MCO, implement downfile, tweak timing #1670

Merged: 3 commits merged into openshift:master on May 20, 2020

Conversation

@squeed (Contributor) commented Apr 21, 2020

Move gcp-routes to the MCO, call it openshift-gcp-routes. Also, improve the behavior significantly:

  1. always accept connections from external sources
  2. use downfiles instead of stopping the service (see the sketch after this list)
  3. implement the equivalent in gcp-routes-controller.
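For illustration, here is a minimal Go sketch of the downfile approach from points 2 and 3. It is not the actual gcp-routes-controller code: the downfile path, the localhost:6443/readyz probe, and the timings are assumptions made up for the example; only the overall idea (toggle a marker file instead of stopping the service) comes from this description.

```go
// Minimal sketch of a downfile-based health gate. All paths and endpoints
// below are hypothetical, not taken from gcp-routes-controller.
package main

import (
	"crypto/tls"
	"net/http"
	"os"
	"time"
)

// downFile is an illustrative marker path; while it exists, the route script
// would withdraw the load-balancer VIP rules but keep running.
const downFile = "/run/gcp-routes/vip.down"

// apiserverHealthy is an illustrative local probe; the real controller's
// check and TLS configuration differ.
func apiserverHealthy() bool {
	client := &http.Client{
		Timeout: 2 * time.Second,
		Transport: &http.Transport{
			// Sketch only: skip certificate verification.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
	}
	resp, err := client.Get("https://localhost:6443/readyz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	for {
		if apiserverHealthy() {
			// Healthy: remove the downfile so the script restores the VIP
			// rules and external connections are accepted again.
			os.Remove(downFile)
		} else {
			// Unhealthy: create the downfile; the script drops the VIP rules
			// on its next pass, without the service itself being stopped.
			if f, err := os.Create(downFile); err == nil {
				f.Close()
			}
		}
		time.Sleep(5 * time.Second)
	}
}
```

Presumably the appeal of a downfile over stopping the unit outright is that the route script keeps running and can restore the VIP rules quickly once the file disappears.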

@squeed (Contributor, Author) commented Apr 21, 2020

cc @sttts

@squeed (Contributor, Author) commented Apr 21, 2020

PR adding downfile support to gcp-routes.sh: https://gitlab.cee.redhat.com/coreos/redhat-coreos/merge_requests/899

@squeed changed the title from "cmd/gcp-routes-controller: use downfile if available, tweak timing" to "Bug 1802534: cmd/gcp-routes-controller: use downfile if available, tweak timing" on Apr 21, 2020
@openshift-ci-robot (Contributor)

@squeed: This pull request references Bugzilla bug 1802534, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1802534: cmd/gcp-routes-controller: use downfile if available, tweak timing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the bugzilla/valid-bug label (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) on Apr 21, 2020
@squeed (Contributor, Author) commented Apr 21, 2020

/hold
until the GCP PR is merged and "deployed" into CI.

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Apr 21, 2020
@LorbusChris (Member)

/cherry-pick fcos
/cc @vrutkovs

@openshift-cherrypick-robot

@LorbusChris: once the present PR merges, I will cherry-pick it on top of fcos in a new PR and assign it to you.

In response to this:

/cherry-pick fcos
/cc @vrutkovs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@squeed (Contributor, Author) commented Apr 21, 2020

Seems the unit test scaffolding is out-of-date, so this is not going to pass unit tests. Investigating.

@umohnani8 (Contributor)

/test e2e-gcp-op

@kikisdeliveryservice (Contributor)

seeing:
E0421 19:17:08.844158 1 leaderelection.go:331] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.ci-op-jd83nzz1-1354f.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.0.2:6443: i/o timeout

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1670/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1940/artifacts/e2e-gcp-op/pods/openshift-sdn_sdn-controller-2t46q_sdn-controller.log

@squeed (Contributor, Author) commented Apr 22, 2020

seeing:
E0421 19:17:08.844158 1 leaderelection.go:331] error retrieving resource lock openshift-sdn/openshift-network-controller: Get https://api-int.ci-op-jd83nzz1-1354f.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/openshift-sdn/configmaps/openshift-network-controller: dial tcp 10.0.0.2:6443: i/o timeout

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1670/pull-ci-openshift-machine-config-operator-master-e2e-gcp-op/1940/artifacts/e2e-gcp-op/pods/openshift-sdn_sdn-controller-2t46q_sdn-controller.log

Nuts. Investigating.

@squeed (Contributor, Author) commented Apr 22, 2020

Those error messages are red herrings; the node maintained access to the internal api server at that time. I suspect they're just from masters rebooting. Continuing the investigation.

@squeed (Contributor, Author) commented Apr 22, 2020

Okay, this is weird: tests are failing because one of the worker nodes failed to start the MCD after a reboot - because CRI-O won't create the sandbox.

Apr 21 21:21:32.525864 ci-op-d4rtc-w-b-xrmnd.c.openshift-gce-devel-ci.internal hyperkube[1630]: I0421 21:21:32.525753 1630 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"openshift-machine-config-operator", Name:"machine-config-daemon-4kr98", UID:"fb999c1d-c1ac-4e8f-bf2e-98e2b785acf4", APIVersion:"v1", ResourceVersion:"27971", FieldPath:""}): type: 'Warning' reason: 'FailedCreatePodSandBox' Failed to create pod sandbox: rpc error: code = Unknown desc = reserving pod sandbox name: error reserving pod name k8s_machine-config-daemon-4kr98_openshift-machine-config-operator_fb999c1d-c1ac-4e8f-bf2e-98e2b785acf4_1 for id 6bfb87bd099b9954b3ee5654c35e9b6fc8403df3f06f6f1a57331195f94dc28e: name is reserved

This definitely isn't caused by my change - mine only touches the master nodes. But it's not good. I wonder if the problem lies with kubelet, crio, or libpod?

@squeed (Contributor, Author) commented Apr 22, 2020

Known (and very scary) bug: 1785399
/retest

@squeed (Contributor, Author) commented Apr 22, 2020

Updated, fixed unit test scaffolding. This job is pretty prone to triggering the crio leak, so it might be a while before this goes green. AFAICT it's not a bug in the PR.

@knobunc (Contributor) commented Apr 22, 2020

/retest

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

10 similar comments

@squeed (Contributor, Author) commented May 19, 2020

This seems to be a horrid test issue.
/hold
to stop the retests until it clears up

@openshift-ci-robot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 19, 2020
@squeed (Contributor, Author) commented May 19, 2020

/test e2e-aws

1 similar comment

@squeed (Contributor, Author) commented May 20, 2020

/hold cancel
now that quay is back
/retest

@openshift-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on May 20, 2020
@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments

@squeed (Contributor, Author) commented May 20, 2020

It looks like whatever problem is behind https://bugzilla.redhat.com/show_bug.cgi?id=1828606 is biting us.

Given that this PR doesn't touch AWS at all, would this be a candidate for an /override?

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 50bc7b4 into openshift:master on May 20, 2020
@openshift-ci-robot (Contributor)

@squeed: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1670. Bugzilla bug 1802534 has been moved to the MODIFIED state.

In response to this:

Bug 1802534: gcp-routes: move to MCO, implement downfile, tweak timing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@LorbusChris: new pull request created: #1741

In response to this:

/cherry-pick fcos
/cc @vrutkovs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sttts (Contributor) commented Jun 4, 2020

/cherry-pick release-4.4

@openshift-cherrypick-robot

@sttts: new pull request created: #1780

In response to this:

/cherry-pick release-4.4

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: approved, bugzilla/severity-high, bugzilla/valid-bug, lgtm