
[release-4.6] Bug 1937090: Configure CoreDNS to shut down gracefully #248


Miciah
Contributor

@Miciah Miciah commented Mar 9, 2021

test/e2e: Re-enable TestCoreDNSImageUpgrade

test/e2e/operator_test.go: Re-enable TestCoreDNSImageUpgrade by adding a cluster version override for the DNS operator deployment. This override allows TestCoreDNSImageUpgrade to safely modify the DNS deployment (specifically the operator's container image field) without contention from the CVO. The cluster version override is removed at the end of the test via a defer block.
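A minimal sketch of the override-then-defer pattern described above. The types are hypothetical local stand-ins modeled on the `overrides` list in the ClusterVersion spec, not the openshift/api types, and the `openshift-dns-operator`/`dns-operator` namespace and name are assumptions. A real test would update the ClusterVersion object through the API client rather than mutate a struct in memory.

```go
package main

import "fmt"

// ComponentOverride is a hypothetical stand-in modeled on ClusterVersion
// override entries: it identifies a resource the CVO should leave alone.
type ComponentOverride struct {
	Kind      string
	Group     string
	Namespace string
	Name      string
	Unmanaged bool
}

// ClusterVersionSpec is a trimmed, hypothetical spec holding only overrides.
type ClusterVersionSpec struct {
	Overrides []ComponentOverride
}

// sameComponent reports whether two overrides target the same resource.
func sameComponent(a, b ComponentOverride) bool {
	return a.Kind == b.Kind && a.Group == b.Group &&
		a.Namespace == b.Namespace && a.Name == b.Name
}

// addOverride marks a component unmanaged and returns a function that
// removes that override again, suitable for use with defer.
func addOverride(spec *ClusterVersionSpec, o ComponentOverride) func() {
	spec.Overrides = append(spec.Overrides, o)
	return func() {
		kept := spec.Overrides[:0]
		for _, existing := range spec.Overrides {
			if !sameComponent(existing, o) {
				kept = append(kept, existing)
			}
		}
		spec.Overrides = kept
	}
}

func runWithOverride(spec *ClusterVersionSpec) {
	restore := addOverride(spec, ComponentOverride{
		Kind: "Deployment", Group: "apps",
		Namespace: "openshift-dns-operator", Name: "dns-operator",
		Unmanaged: true,
	})
	// Deferred calls also run if a test calls t.Fatal, because t.Fatal
	// exits the test goroutine via runtime.Goexit.
	defer restore()
	fmt.Println("overrides during test:", len(spec.Overrides))
}

func main() {
	spec := &ClusterVersionSpec{}
	runWithOverride(spec)
	fmt.Println("overrides after test:", len(spec.Overrides))
}
```

Returning a cleanup closure keeps the removal logic next to the code that added the override, so the test cannot forget to hand the deployment back to the CVO.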

test/e2e: Block on TestCoreDNSImageUpgrade image revert

TestCoreDNSImageUpgrade: Ensure that the CoreDNS image change for default DNS is completely reverted before moving onto the next test.

TestDNSForwarding: Ensure that DNS pods are all available before verifying Corefile contents from each DNS pod. Also, log pod status if a given pod's Corefile doesn't meet the test's expectations.

This commit enhances the DNS operator tests to resolve BZ#1908891, in which TestDNSForwarding is noted as very flaky because TestCoreDNSImageUpgrade's image revert rollout did not block the premature execution of TestDNSForwarding.

Configure CoreDNS to shut down gracefully

This PR is the same as #205, which was reverted with #213, except that this PR does not change DNS pods' termination grace period.

  • assets/dns/daemonset.yaml: Change the readiness probe to use :8181/ready.
  • pkg/manifests/bindata.go: Regenerate.
  • pkg/operator/controller/controller_dns_configmap.go (corefileTemplate): Configure CoreDNS's health plugin to sleep 20 seconds when CoreDNS is shut down. Enable CoreDNS's ready plugin in order to provide a readiness endpoint on :8181/ready, which doesn't report ready until all plugins are initialized and stops reporting ready when CoreDNS is shutting down.
  • pkg/operator/controller/controller_dns_configmap_test.go (TestDesiredDNSConfigmap): Adjust for changes to corefileTemplate.

Delete TestCoreDNSImageUpgrade

Delete the TestCoreDNSImageUpgrade CI test. This test is unreliable, and we can achieve sufficient test coverage without it.

  • test/e2e/operator_test.go (TestCoreDNSImageUpgrade, setVersion, setImage): Delete functions.

Add TestCoreDNSDaemonSetReconciliation

Add an end-to-end test that verifies that the operator reconciles changes to the dns-default daemonset. This new test adds a node selector to the daemonset and verifies that the operator reverts the change.

The operator already has unit tests to verify that the daemonset update logic handles changes to image pullspecs and other important fields. Together, the new end-to-end test and the existing unit tests should provide sufficient test coverage for reconciliation of daemonsets.

  • test/e2e/operator_test.go (TestCoreDNSDaemonSetReconciliation): New test. Verify that the operator reconciles the dns-default daemonset.
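A simplified sketch of the reconciliation that TestCoreDNSDaemonSetReconciliation exercises: compare the current daemonset spec with the desired one and revert managed fields that have drifted. `daemonSetSpec`, `reconcile`, and `equalSelectors` are hypothetical names for illustration, not the operator's actual update logic, and the `kubernetes.io/os: linux` selector is an assumed default.

```go
package main

import "fmt"

// daemonSetSpec is a hypothetical subset of the dns-default DaemonSet spec.
type daemonSetSpec struct {
	Image        string
	NodeSelector map[string]string
}

// equalSelectors compares two node selectors, treating nil and empty as equal.
func equalSelectors(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// reconcile returns the spec to write back and whether an update is needed.
// Only the fields this sketch manages are reset to their desired values.
func reconcile(current, desired daemonSetSpec) (daemonSetSpec, bool) {
	updated := current
	changed := false
	if current.Image != desired.Image {
		updated.Image = desired.Image
		changed = true
	}
	if !equalSelectors(current.NodeSelector, desired.NodeSelector) {
		updated.NodeSelector = desired.NodeSelector
		changed = true
	}
	return updated, changed
}

func main() {
	desired := daemonSetSpec{
		Image:        "example.invalid/coredns:desired",
		NodeSelector: map[string]string{"kubernetes.io/os": "linux"},
	}
	// Simulate the e2e test adding an extra node selector.
	current := desired
	current.NodeSelector = map[string]string{"kubernetes.io/os": "linux", "e2e-test": "true"}
	updated, changed := reconcile(current, desired)
	fmt.Println(changed, len(updated.NodeSelector))
}
```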

This is a manual cherry-pick of #237, #230, and #226.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 9, 2021
@openshift-ci-robot
Contributor

@Miciah: This pull request references Bugzilla bug 1937090, which is invalid:

  • expected dependent Bugzilla bug 1937089 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ASSIGNED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.7] Bug 1937090: Configure CoreDNS to shut down gracefully

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2021
@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2021
@Miciah Miciah changed the title [release-4.7] Bug 1937090: Configure CoreDNS to shut down gracefully [release-4.6] Bug 1937090: Configure CoreDNS to shut down gracefully Mar 9, 2021
@Miciah Miciah force-pushed the cherry-pick-237-to-release-4.6 branch from 47520d4 to 08015c7 March 9, 2021 19:33
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 9, 2021
@Miciah
Contributor Author

Miciah commented Mar 9, 2021

/retest

@Miciah
Contributor Author

Miciah commented Mar 10, 2021

/test e2e-aws-operator

sgreene570 and others added 5 commits March 9, 2021 23:00
test/e2e/operator_test.go: Re-enable TestCoreDNSImageUpgrade by adding a
cluster version override for the DNS operator deployment. This override
allows TestCoreDNSImageUpgrade to safely modify the DNS deployment
(specifically the operator's container image field) without contention
from the CVO. The cluster version override is removed at the end of the
test via a `defer` block.

TestCoreDNSImageUpgrade: Ensure that the CoreDNS image change
for default DNS is completely reverted before moving onto the next test.

TestDNSForwarding: Ensure that DNS pods are all available
before verifying Corefile contents from each DNS pod. Also,
log pod status if a given pod's Corefile doesn't meet the test's
expectations.

This commit enhances the DNS operator tests to resolve
BZ#1908891, in which TestDNSForwarding is noted as
very flaky because TestCoreDNSImageUpgrade's image revert rollout
did not block the premature execution of TestDNSForwarding.

This commit is the same as commit f094ddf,
which was reverted with commit a96c45e,
except that this commit does not change DNS pods' termination grace period.

This commit is related to bug 1884053.

https://bugzilla.redhat.com/show_bug.cgi?id=1884053

* assets/dns/daemonset.yaml: Change the readiness probe to use :8181/ready.
* pkg/manifests/bindata.go: Regenerate.
* pkg/operator/controller/controller_dns_configmap.go (corefileTemplate):
Configure CoreDNS's health plugin to sleep 20 seconds when CoreDNS is shut
down.  Enable CoreDNS's ready plugin in order to provide a readiness
endpoint on :8181/ready, which doesn't report ready until all plugins are
initialized and stops reporting ready when CoreDNS is shutting down.
* pkg/operator/controller/controller_dns_configmap_test.go
(TestDesiredDNSConfigmap): Adjust for changes to corefileTemplate.

Delete the TestCoreDNSImageUpgrade CI test.  This test is unreliable, and
we can achieve sufficient test coverage without it.

* test/e2e/operator_test.go (TestCoreDNSImageUpgrade)
(setVersion, setImage, checkCurrentDNSImage): Delete functions.

Add an end-to-end test that verifies that the operator reconciles changes
to the dns-default daemonset.  This new test adds a node selector to the
daemonset and verifies that the operator reverts the change.

The operator already has unit tests to verify that the daemonset update
logic handles changes to image pullspecs and other important fields.
Together, the new end-to-end test and the existing unit tests should
provide sufficient test coverage for reconciliation of daemonsets.

* test/e2e/operator_test.go (TestCoreDNSDaemonSetReconciliation): New
test.  Verify that the operator reconciles the dns-default daemonset.
@Miciah
Contributor Author

Miciah commented Mar 10, 2021

Latest push adds backports of #226 and #230 to fix the TestDNSForwarding failures.

@Miciah Miciah force-pushed the cherry-pick-237-to-release-4.6 branch from 08015c7 to 2d17b72 March 10, 2021 04:18
@openshift-ci-robot
Contributor

@Miciah: This pull request references Bugzilla bug 1937090, which is invalid:

  • expected dependent Bugzilla bug 1937089 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.6] Bug 1937090: Configure CoreDNS to shut down gracefully


@Miciah
Contributor Author

Miciah commented Mar 10, 2021

/test e2e-aws

@sgreene570
Contributor

/bugzilla refresh
/lgtm

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 31, 2021
@openshift-ci-robot
Contributor

@sgreene570: This pull request references Bugzilla bug 1937090, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.z) matches configured target release for branch (4.6.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1937089 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1937089 targets the "4.7.z" release, which is one of the valid target releases: 4.7.0, 4.7.z
  • bug has dependents

Requesting review from QA contact:
/cc @lihongan

In response to this:

/bugzilla refresh
/lgtm


@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah, sgreene570

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2021
@russellb russellb added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Apr 9, 2021
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 5d2c9b9 into openshift:release-4.6 Apr 12, 2021
@openshift-ci-robot
Contributor

@Miciah: All pull requests linked via external trackers have merged:

Bugzilla bug 1937090 has been moved to the MODIFIED state.

In response to this:

[release-4.6] Bug 1937090: Configure CoreDNS to shut down gracefully

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.

6 participants