Bug 1813062: Updates Status Reconciliation to Support DNS Service #187

Merged
openshift-merge-robot merged 2 commits into openshift:master on Aug 25, 2020

Conversation


@danehans danehans commented Aug 5, 2020

Previously, the operator would surface status conditions indicating an issue with the DNS daemonset if the DNS service could not be created. This PR does the following:

  1. Updates the Makefile to add support for running the operator locally.
  2. Updates the Makefile so the test-e2e target only runs e2e tests.
  3. Creates a separate controller for status (similar to the ingress operator). Operator status is now based on the availability of the dns itself rather than on the dns's dependent resources.
  4. Updates unit tests.
  5. Updates e2e tests.
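
The idea behind item 3 can be sketched roughly as follows. This is a minimal illustration, not the operator's actual code: the `Condition` and `DNS` types and the `operatorAvailable` helper are hypothetical, and the rule "available only when at least one dns exists and every dns is available" is an assumption made for the example.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified status condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// DNS is a hypothetical stand-in for a dns resource and its reported status.
type DNS struct {
	Name       string
	Conditions []Condition
}

// dnsAvailable reports whether a dns carries an Available=True condition.
func dnsAvailable(d DNS) bool {
	for _, c := range d.Conditions {
		if c.Type == "Available" {
			return c.Status == "True"
		}
	}
	return false
}

// operatorAvailable derives the operator-level Available condition from the
// dns resources alone. Assumption: the operator is available only when at
// least one dns exists and every dns is available.
func operatorAvailable(dnses []DNS) Condition {
	if len(dnses) == 0 {
		return Condition{Type: "Available", Status: "False"}
	}
	for _, d := range dnses {
		if !dnsAvailable(d) {
			return Condition{Type: "Available", Status: "False"}
		}
	}
	return Condition{Type: "Available", Status: "True"}
}

func main() {
	dnses := []DNS{{
		Name:       "default",
		Conditions: []Condition{{Type: "Available", Status: "True"}},
	}}
	fmt.Printf("%+v\n", operatorAvailable(dnses))
}
```

With this shape, a failure to create a dependent resource such as the service only affects operator status through the dns's own conditions, rather than being reported as a daemonset problem.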

/assign @Miciah @knobunc
/cc @frobware @sgreene570

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Aug 5, 2020
@openshift-ci-robot

@danehans: This pull request references Bugzilla bug 1813062, which is invalid:

  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is ON_QA instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1813062: Updates Status Reconciliation to Support DNS Service

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 5, 2020
@danehans

danehans commented Aug 5, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 5, 2020
@openshift-ci-robot

@danehans: This pull request references Bugzilla bug 1813062, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh



@sgreene570 sgreene570 left a comment

Looks great! One small comment.

return wait.PollImmediate(1*time.Second, timeout, func() (bool, error) {
	co := &configv1.ClusterOperator{}
	if err := cl.Get(context.TODO(), controller.DNSClusterOperatorName(), co); err != nil {
		return false, err

Thoughts on returning false, nil here and logging the error instead of returning false, err?

return wait.PollImmediate(1*time.Second, timeout, func() (bool, error) {
	dns := &operatorv1.DNS{}
	if err := cl.Get(context.TODO(), name, dns); err != nil {
		return false, err

return false, err => return false, nil?
(Log the err instead)


@sgreene570 L259 mirrors the ingress operator here. I'd prefer to keep the 2 operators consistent and address both in a follow-on PR.
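
The trade-off discussed in this thread can be illustrated with a small sketch. It assumes a hypothetical `pollImmediate` helper that imitates the callback contract of `wait.PollImmediate` (a non-nil error aborts the wait, `false, nil` means "try again") using only the standard library; `flakyGet` is likewise an invented stand-in for `cl.Get` that fails twice before succeeding.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// conditionFunc has the same shape as the callback given to wait.PollImmediate:
// done=true ends the wait successfully, a non-nil error aborts it.
type conditionFunc func() (done bool, err error)

// pollImmediate is a hypothetical stand-in for wait.PollImmediate, written with
// only the standard library so the example is self-contained.
func pollImmediate(interval, timeout time.Duration, cond conditionFunc) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err // any error ends the wait immediately
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the condition")
		}
		time.Sleep(interval)
	}
}

// flakyGet simulates a client Get that fails twice before succeeding.
func flakyGet(calls *int) error {
	*calls++
	if *calls < 3 {
		return errors.New("transient API error")
	}
	return nil
}

// strictWait propagates the Get error, so the first failure aborts the wait.
func strictWait() (int, error) {
	calls := 0
	err := pollImmediate(time.Millisecond, time.Second, func() (bool, error) {
		if err := flakyGet(&calls); err != nil {
			return false, err
		}
		return true, nil
	})
	return calls, err
}

// tolerantWait logs the Get error and returns false, nil so polling continues.
func tolerantWait() (int, error) {
	calls := 0
	err := pollImmediate(time.Millisecond, time.Second, func() (bool, error) {
		if err := flakyGet(&calls); err != nil {
			fmt.Printf("get failed, retrying: %v\n", err)
			return false, nil
		}
		return true, nil
	})
	return calls, err
}

func main() {
	calls, err := strictWait()
	fmt.Printf("strict: calls=%d err=%v\n", calls, err)
	calls, err = tolerantWait()
	fmt.Printf("tolerant: calls=%d err=%v\n", calls, err)
}
```

In the strict variant a single transient failure fails the test helper outright; the tolerant variant keeps polling until the timeout, at the cost of hiding persistent errors unless they are logged.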

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2020
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 19, 2020
@danehans

The e2e-aws job failed because it exceeded the AWS throttling rate:

2020-08-19T21:16:05.721092959Z 2020-08-19T21:16:05.721Z	ERROR	operator.dns_controller	dns/controller.go:181	failed to publish DNS record to zone	{"record": {"dnsName":"*.apps.ci-op-x0729h3t-bcd5b.origin-ci-int-aws.dev.rhcloud.com.","targets":["a767b6002f7ce4760a6e2bb4fe68426a-899735716.us-east-1.elb.amazonaws.com"],"recordType":"CNAME","recordTTL":30}, "dnszone": {"tags":{"Name":"ci-op-x0729h3t-bcd5b-p7cq2-int","kubernetes.io/cluster/ci-op-x0729h3t-bcd5b-p7cq2":"owned"}}, "error": "failed to update alias in zone Z04249512EGDZ53SW9JG0: couldn't update DNS record in zone Z04249512EGDZ53SW9JG0: Throttling: Rate exceeded\n\tstatus code: 400, request id: 3d698c07-3d1c-41cf-b666-bcc72d7fde71"}

The good thing is that the operator surfaced a Degraded=True condition. I don't see how this PR causes the throttling issue.

/test e2e-aws

@sgreene570

CI is green.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 24, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [danehans,sgreene570]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.


@sgreene570

"Pod openshift-marketplace/community-operators-f7gt8 is not healthy: Back-off pulling image \"quay.io/openshift-community-operators/catalog:latest\"",

/retest

@openshift-merge-robot openshift-merge-robot merged commit 924a03d into openshift:master Aug 25, 2020
@openshift-ci-robot

@danehans: All pull requests linked via external trackers have merged: openshift/cluster-dns-operator#187, openshift/cluster-dns-operator#182. Bugzilla bug 1813062 has been moved to the MODIFIED state.

In response to this:

Bug 1813062: Updates Status Reconciliation to Support DNS Service


sgreene570 added a commit to sgreene570/cluster-dns-operator that referenced this pull request Sep 25, 2020
The dns cluster operator is fed the DNS operator's namespace in
pkg/operator/controller/status/controller.go. This commit fixes a
regression brought in by openshift#187
in which the cluster operator is given a reference to the "dns"
namespace, which does not exist. This commit instead gives the cluster
operator a reference to the correct operator namespace set at run
time.
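
The kind of fix this commit message describes might look like the sketch below. It is only an illustration under stated assumptions: `ObjectReference`, `Config`, and `relatedObjects` are hypothetical simplified names, not the contents of pkg/operator/controller/status/controller.go, and the namespace value used in `main` is an example.

```go
package main

import "fmt"

// ObjectReference is a hypothetical, simplified reference to a related object.
type ObjectReference struct {
	Resource string
	Name     string
}

// Config is a hypothetical status controller configuration. The operator
// namespace is supplied at run time instead of being hardcoded.
type Config struct {
	OperatorNamespace string
}

// relatedObjects builds the references reported on the cluster operator.
// The namespace comes from the configuration, so it always names a namespace
// the operator actually runs in, rather than a fixed string such as "dns".
func relatedObjects(cfg Config) []ObjectReference {
	return []ObjectReference{
		{Resource: "namespaces", Name: cfg.OperatorNamespace},
	}
}

func main() {
	// Example value only; the real namespace is whatever is set at run time.
	cfg := Config{OperatorNamespace: "openshift-dns-operator"}
	for _, ref := range relatedObjects(cfg) {
		fmt.Printf("%s/%s\n", ref.Resource, ref.Name)
	}
}
```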
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.