
OCPBUGS-20024: Ignore max unavailable for status #386

Merged
merged 1 commit into from Oct 31, 2023

Conversation

candita
Contributor

@candita candita commented Oct 12, 2023

Followup to PR #384.

  • pkg/operator/controller/controller_dns_node_resolver_daemonset.go - small update to a comment
  • pkg/operator/controller/dns_status.go - hardcode maxUnavailable to 10% of desiredNumberScheduled and remove condition "invalid maxUnavailable value"
  • pkg/operator/controller/dns_status_test.go - remove maxUnavailable format testing
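
The fixed 10% threshold described above can be sketched with plain integer arithmetic. This is a minimal illustration, not the operator's code: the names `maxUnavailablePods` and `tooManyUnavailable` are hypothetical, and rounding the 10% value up is an assumption of the sketch that the PR description does not state.

```go
package main

import "fmt"

// maxUnavailablePercent is the fixed threshold named in the PR description.
const maxUnavailablePercent = 10

// maxUnavailablePods returns 10% of desired, rounded up to a whole pod.
// Rounding up is an assumption of this sketch.
func maxUnavailablePods(desired int) int {
	return (desired*maxUnavailablePercent + 99) / 100
}

// tooManyUnavailable reports whether more pods are unavailable than the
// threshold allows for the given desired and available counts.
func tooManyUnavailable(desired, available int) bool {
	return desired-available > maxUnavailablePods(desired)
}

func main() {
	fmt.Println(maxUnavailablePods(6))     // 1
	fmt.Println(tooManyUnavailable(10, 9)) // false: 1 unavailable, 1 allowed
	fmt.Println(tooManyUnavailable(10, 8)) // true: 2 unavailable, 1 allowed
}
```

Because the threshold is derived only from the desired count, the daemon set's own maxUnavailable field never enters the calculation, so there is no invalid value to report.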

@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. labels Oct 12, 2023
@openshift-ci-robot
Contributor

@candita: This pull request references Jira Issue OCPBUGS-20024, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 12, 2023
@candita
Contributor Author

candita commented Oct 12, 2023

fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:577]: Unexpected error:
<errors.aggregate | len:1, cap:1>:
promQL query returned unexpected results:

/test e2e-aws-ovn

@candita
Contributor Author

candita commented Oct 12, 2023

{"component":"entrypoint","file":"k8s.io/test-infra/prow/entrypoint/run.go:169","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2023-10-12T19:23:38Z"}

/test e2e-aws-ovn

@Miciah
Contributor

Miciah commented Oct 13, 2023

/assign

@candita
Contributor Author

candita commented Oct 13, 2023

could not initialize namespace: failed to wait for authentication cache to warm up after 15s: projects.project.openshift.io "ci-op-tbmipxqy" is forbidden: User "system:serviceaccount:ci:ci-operator" cannot get resource "projects" in API group "project.openshift.io" in the namespace "ci-op-tbmipxqy"

/test e2e-aws-ovn

@candita
Contributor Author

candita commented Oct 17, 2023

After 4 hours:
Received signal. signal=interrupt
INFO[2023-10-12T19:23:38Z] error: Process interrupted with signal interrupt, cancelling execution...
INFO[2023-10-12T19:23:38Z] cleanup: Deleting release pod release-latest

/test e2e-aws-ovn

@candita
Contributor Author

candita commented Oct 17, 2023

@Miciah PTAL when you get a chance.

Contributor

@Miciah Miciah left a comment

The fundamental change is good. I have some questions about tests and error handling.

Also, regarding the title, maxUnavailable was never modifiable by the cluster-admin. Maybe it would be clearer to say, "Ignore max unavailable for status" or, "Report degraded if >10% unavailable".

Comment on lines -233 to -267
{
    name: "node-resolver invalid MaxUnavailable is ok",
    clusterIP: "172.30.0.10",
    dnsDaemonset: makeDaemonSet(10, 9, intstr.FromString("10%")),
    nrDaemonset: makeDaemonSet(6, 0, intstr.FromString("TEST")),
    expected: operatorv1.ConditionFalse,
},
{
    name: "DNS invalid MaxUnavailable (string with digits without a percent sign)",
    clusterIP: "172.30.0.10",
    dnsDaemonset: makeDaemonSet(6, 6, intstr.IntOrString{Type: intstr.String, StrVal: "10"}),
    nrDaemonset: makeDaemonSet(6, 6, intstr.FromString("33%")),
    expected: operatorv1.ConditionUnknown,
},
{
    name: "node-resolver invalid MaxUnavailable (string with digits without a percent sign) is ok",
    clusterIP: "172.30.0.10",
    dnsDaemonset: makeDaemonSet(6, 6, intstr.FromString("10%")),
    nrDaemonset: makeDaemonSet(6, 6, intstr.IntOrString{Type: intstr.String, StrVal: "33"}),
    expected: operatorv1.ConditionFalse,
},
{
    name: "DNS invalid MaxUnavailable (string with letters)",
    clusterIP: "172.30.0.10",
    dnsDaemonset: makeDaemonSet(6, 6, intstr.IntOrString{Type: intstr.String, StrVal: "TEST"}),
    nrDaemonset: makeDaemonSet(6, 6, intstr.FromString("33%")),
    expected: operatorv1.ConditionUnknown,
},
{
    name: "node-resolver invalid MaxUnavailable (string with letters) is ok",
    clusterIP: "172.30.0.10",
    dnsDaemonset: makeDaemonSet(6, 6, intstr.FromString("10%")),
    nrDaemonset: makeDaemonSet(6, 6, intstr.IntOrString{Type: intstr.String, StrVal: "TEST"}),
    expected: operatorv1.ConditionFalse,
},
Contributor

It's interesting that we still had these test cases for invalid maxUnavailable values for the node-resolver daemonset. We kept these test cases when we changed the status reporting to ignore the node-resolver daemon set in #273. I would guess the reasoning was that we wanted the test cases to verify that the status reporting code really was not basing status on the daemon set's maxUnavailable parameter. Does the same reasoning still apply? If so, it would make sense to keep (some of) these test cases.

Contributor Author

It seems to be an oversight that I didn't clean up these unit tests then. We entirely removed the checking of the node resolver daemonset and its maxUnavailable parameter. It doesn't make sense anymore to check anything having to do with node resolver when computing the degraded condition: https://github.com/openshift/cluster-dns-operator/pull/273/files#diff-32495132facf7e0819a407af732514958b198d9e40236656bc43b093345d1539L111

Contributor Author

In fact, I should remove the nrDaemonset field from the test cases. Does that make more sense?

Contributor

It seems to be an oversight that I didn't clean up these unit tests then. We entirely removed the checking of the node resolver daemonset and its maxUnavailable parameter. It doesn't make sense anymore to check anything having to do with node resolver when computing the degraded condition

Are you sure it was an oversight? My speculation was that it was intentional: If parameter X did affect result Y, and you are changing the logic so that X doesn't affect Y, then it does make sense for tests to verify that X doesn't affect Y, right? More generally, I'd rather err on the side of keeping possibly redundant test cases over risking removing test cases that could possibly detect faults. However, I'll defer to your judgment if you decide these test cases really should be removed and the test simplified.
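
The kind of check described here, verifying that an input no longer affects a result, can be sketched as follows. Everything in this sketch is hypothetical: `status`, `degraded`, and `invariantUnder` are stand-ins invented for illustration and are not the operator's types or functions, and the degraded rule shown is deliberately trivial.

```go
package main

import "fmt"

// status is a stand-in for the daemon set fields a status check might read.
type status struct {
	desired, available int
	maxUnavailable     string // e.g. "10%", or an invalid value such as "TEST"
}

// degraded is a hypothetical status function. It looks only at the DNS
// daemon set and deliberately ignores the node-resolver daemon set.
func degraded(dns, nodeResolver status) bool {
	_ = nodeResolver // intentionally unused
	return dns.available == 0
}

// invariantUnder reports whether degraded returns the same answer for every
// node-resolver variant, that is, whether that input really has no effect.
func invariantUnder(dns status, variants []status) bool {
	if len(variants) == 0 {
		return true
	}
	want := degraded(dns, variants[0])
	for _, nr := range variants[1:] {
		if degraded(dns, nr) != want {
			return false
		}
	}
	return true
}

func main() {
	dns := status{desired: 6, available: 6, maxUnavailable: "10%"}
	variants := []status{
		{6, 6, "33%"},  // valid percentage
		{6, 6, "33"},   // digits without a percent sign
		{6, 0, "TEST"}, // letters, and no pods available
	}
	fmt.Println(invariantUnder(dns, variants))
}
```

A check like this only makes sense while the ignored input is still a parameter of the function under test. Once the parameter is removed from the signature, the compiler enforces the same property.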

Contributor Author

I am quite sure it was an oversight. I removed the parameters haveNodeResolverDaemonset and nodeResolverDaemonset from the signature of computeDNSDegradedCondition, because we no longer use them in the function. We still use them in computeDNSProgressingCondition, but not computeDNSDegradedCondition.

Contributor

All right, I can live with that. Test coverage is still good overall.

pkg/operator/controller/dns_status.go Outdated Show resolved Hide resolved
@candita candita changed the title OCPBUGS-20024: Remove modifiability of maxUnavailable OCPBUGS-20024: Ignore max unavailable for status Oct 18, 2023
@candita candita force-pushed the OCPBUGS-20024-UseMaxSurge branch 2 times, most recently from 0e2c2aa to 5759e70 Compare October 19, 2023 17:03
Contributor

@Miciah Miciah left a comment

A couple minor things.

Comment on lines 111 to 114
case intstrErr != nil:
    // This should not happen, but is included just to safeguard against future changes.
    degradedReasons = append(degradedReasons, "InvalidDNSMaxUnavailable")
    messages = append(messages, fmt.Sprintf("The DNS daemonset has an invalid MaxUnavailable value: %v", intstrErr))
Contributor

Was there a reason to move this case up?

Contributor Author

Fixed, forgot to push.
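
Case position matters in a switch like the one quoted above because Go evaluates the cases of an expressionless switch from top to bottom and runs only the first one that matches. The sketch below shows the effect. The function `degradedReason`, the case order, and the reason names other than `InvalidDNSMaxUnavailable` are hypothetical and do not reproduce the operator's actual switch.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadValue stands in for an error from parsing a maxUnavailable value.
var errBadValue = errors.New("bad value")

// degradedReason returns the reason from the first matching case.
// The case order here is illustrative only.
func degradedReason(desired, available, maxUnavailable int, err error) string {
	switch {
	case available == 0:
		return "NoPodsAvailable" // hypothetical reason name
	case err != nil:
		return "InvalidDNSMaxUnavailable"
	case desired-available > maxUnavailable:
		return "TooManyUnavailable" // hypothetical reason name
	}
	return ""
}

func main() {
	// Both the first and the second case are true; only the first runs.
	fmt.Println(degradedReason(6, 0, 1, errBadValue))
	// With pods available, the error case is the first match.
	fmt.Println(degradedReason(6, 6, 1, errBadValue))
}
```

Moving a case up or down therefore changes which reason is reported whenever more than one condition holds at the same time.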

@@ -14,6 +14,10 @@ import (
    "k8s.io/apimachinery/pkg/util/intstr"
)

var (
    maxUnavailable = intstr.FromInt(1)
Contributor

Is there a reason to declare maxUnavailable here instead of inside makeDaemonSet? We usually tend towards making tests and helpers self-contained.

Contributor Author

It is used in multiple functions and I removed it as a parameter to makeDaemonSet, so I declared it globally. It's used in TestDNSStatusConditions, TestComputeDNSDegradedCondition, TestComputeDNSProgressingCondition, and TestSkippingStatusUpdates.

For consistency, removed an extra declaration of it in TestDNSStatusConditions.

Contributor

All right, I won't insist on changing it.

@@ -201,7 +205,8 @@ func TestDNSStatusConditions(t *testing.T) {
// TestComputeDNSDegradedCondition verifies the computeDNSDegradedCondition has
// the expected behavior.
func TestComputeDNSDegradedCondition(t *testing.T) {
    makeDaemonSet := func(desired, available int, maxUnavailable intstr.IntOrString) *appsv1.DaemonSet {

Contributor

Extra blank line?

Contributor Author

Fixed. Forgot to push.

Followup to PR openshift#384.

- pkg/operator/controller/controller_dns_node_resolver_daemonset.go - small update to a comment
- pkg/operator/controller/dns_status.go - hardcode maxUnavailable to 10% of desiredNumberScheduled
- pkg/operator/controller/dns_status_test.go - remove maxUnavailable format testing and cleanup noderesolver testing
@Miciah
Contributor

Miciah commented Oct 31, 2023

Thanks!
/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 31, 2023
Contributor

openshift-ci bot commented Oct 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 31, 2023
Contributor

openshift-ci bot commented Oct 31, 2023

@candita: all tests passed!

Full PR test history. Your PR dashboard.


@openshift-ci openshift-ci bot merged commit 25020f4 into openshift:master Oct 31, 2023
9 checks passed
@openshift-ci-robot
Contributor

@candita: Jira Issue OCPBUGS-20024: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-20024 has been moved to the MODIFIED state.


@openshift-merge-robot
Contributor

Fix included in accepted release 4.15.0-0.nightly-2023-11-01-040931

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-cluster-dns-operator-container-v4.15.0-202311202349.p0.g25020f4.assembly.stream for distgit ose-cluster-dns-operator.
All builds following this will include this PR.

@candita
Contributor Author

candita commented Jan 3, 2024

/cherry-pick release-4.14

@openshift-cherrypick-robot

@candita: new pull request created: #400


@Miciah
Contributor

Miciah commented Jan 4, 2024

/jira refresh

@openshift-ci-robot
Contributor

@Miciah: Jira Issue OCPBUGS-20024: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-20024 has not been moved to the MODIFIED state.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@Miciah
Contributor

Miciah commented Jan 4, 2024

/jira refresh

@openshift-ci-robot
Contributor

@Miciah: Jira Issue OCPBUGS-20024: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-20024 has been moved to the MODIFIED state.


@openshift-merge-robot
Contributor

Fix included in accepted release 4.15.0-0.nightly-2024-01-05-151121


6 participants