OCPBUGS-16019: prevent creation of multiple cni-sysctl-allowlist-ds pods #1904

Merged on Aug 8, 2023 (3 commits)

Conversation

@mlguerrero12 (Member)

The cni-sysctl-allowlist-ds daemonset has BestEffort QoS, which means that in some cases its pods are never scheduled. The allowlist controller therefore no longer retries when allowlist pods are not ready, which also prevents unwanted retries when one or more pods are not ready because of issues with the cluster.
https://issues.redhat.com/browse/OCPBUGS-15818 (BestEffort Daemonset cni-sysctl-allowlist-ds)

When the cluster has issues such as the one in https://issues.redhat.com/browse/OCPBUGS-15284, some pods get stuck in the terminating state, which affects subsequent reconciliations of the allowlist controller. The controller now checks only pods owned by the current allowlist sysctl daemonset.
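
One way to keep pods stuck in the terminating state from affecting a later reconciliation is to look only at pods owned by the current daemonset. The Go sketch below illustrates that idea under stated assumptions and is not the controller's actual code: the Pod struct, its OwnerUID field, and the helpers ownedPods and countReady are hypothetical stand-ins for the Kubernetes API types, and only the standard library is used.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for a Kubernetes pod.
// It keeps only the fields this sketch needs.
type Pod struct {
	Name     string
	OwnerUID string // UID of the daemonset that controls this pod
	Ready    bool
}

// ownedPods returns only the pods controlled by the daemonset with the
// given UID. Pods left over from an earlier daemonset (for example ones
// stuck in the terminating state) carry a different owner UID and are
// therefore ignored.
func ownedPods(pods []Pod, dsUID string) []Pod {
	var owned []Pod
	for _, p := range pods {
		if p.OwnerUID == dsUID {
			owned = append(owned, p)
		}
	}
	return owned
}

// countReady reports how many of the given pods are ready.
func countReady(pods []Pod) int {
	n := 0
	for _, p := range pods {
		if p.Ready {
			n++
		}
	}
	return n
}

func main() {
	pods := []Pod{
		{Name: "stale-pod", OwnerUID: "ds-old", Ready: false}, // left over from a previous run
		{Name: "pod-a", OwnerUID: "ds-new", Ready: true},
		{Name: "pod-b", OwnerUID: "ds-new", Ready: false}, // e.g. never scheduled
	}

	current := ownedPods(pods, "ds-new")
	// Report readiness once instead of retrying until every pod is ready.
	fmt.Printf("%d of %d current pods ready\n", countReady(current), len(current))
}
```

Filtering by owner means the stale pod never enters the readiness count, so a reconciliation is judged only on the pods it created itself.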

openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels on Jul 17, 2023
@openshift-ci-robot (Contributor)

@mlguerrero12: This pull request references Jira Issue OCPBUGS-16019, which is invalid:

  • expected the bug to target the "4.14.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

@mlguerrero12 (Member Author)

/jira refresh

openshift-ci-robot added the jira/valid-bug label (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) on Jul 17, 2023
@openshift-ci-robot (Contributor)

@mlguerrero12: This pull request references Jira Issue OCPBUGS-16019, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @anuragthehatter

openshift-ci-robot removed the jira/invalid-bug label (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) on Jul 17, 2023
@mlguerrero12 (Member Author)

/retest-required

@mlguerrero12 (Member Author)

/retest

@mlguerrero12 (Member Author)

/retest-required

@mlguerrero12 (Member Author)

/retest

@mlguerrero12 (Member Author)

/retest-required

@mlguerrero12 (Member Author)

/retest

@@ -216,16 +241,16 @@ func deleteDeamonSet(ctx context.Context, client cnoclient.Client) error {
 	return nil
 }

-func daemonsetExists(ctx context.Context, client cnoclient.Client) (bool, error) {
+func getDeamonSet(ctx context.Context, client cnoclient.Client) (*appsv1.DaemonSet, error) {
Review comment (Member):

nit: getDaemonSet

Reply (Member Author):

Fixed. Thanks.
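
The hunk above swaps a boolean existence check for a getter that returns the DaemonSet itself. A rough Go sketch of that pattern follows. It assumes, without confirmation from the diff shown here, that a missing DaemonSet is reported as a nil pointer together with a nil error. The store type, errNotFound, and the simplified DaemonSet struct are hypothetical stand-ins for the cluster client and the appsv1 types.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes "not found" API error.
var errNotFound = errors.New("not found")

// DaemonSet is a simplified, hypothetical stand-in for appsv1.DaemonSet.
type DaemonSet struct {
	Name string
	UID  string
}

// store is a hypothetical in-memory stand-in for the cluster client.
type store map[string]*DaemonSet

// get looks up a DaemonSet by name and returns errNotFound when it is absent.
func (s store) get(name string) (*DaemonSet, error) {
	ds, ok := s[name]
	if !ok {
		return nil, errNotFound
	}
	return ds, nil
}

// getDaemonSet returns the named DaemonSet. Assumption in this sketch: a
// missing object is not an error, so the caller receives (nil, nil) and can
// still distinguish "absent" from a real failure.
func getDaemonSet(s store, name string) (*DaemonSet, error) {
	ds, err := s.get(name)
	if errors.Is(err, errNotFound) {
		return nil, nil
	}
	if err != nil {
		return nil, err
	}
	return ds, nil
}

func main() {
	s := store{"cni-sysctl-allowlist-ds": {Name: "cni-sysctl-allowlist-ds", UID: "uid-1"}}

	for _, name := range []string{"cni-sysctl-allowlist-ds", "missing"} {
		ds, err := getDaemonSet(s, name)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case ds == nil:
			fmt.Println(name, "does not exist")
		default:
			fmt.Println(name, "exists with UID", ds.UID)
		}
	}
}
```

Returning the object rather than a boolean gives the caller access to fields such as the UID, which an ownership check on pods would need.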

@dougbtv (Member) commented Jul 20, 2023

/approve

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Jul 20, 2023
@cgoncalves

/lgtm

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Jul 21, 2023
@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 3db0fc3 and 2 for PR HEAD ad8e30d in total

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD d4f68c0 and 1 for PR HEAD ad8e30d in total

@mlguerrero12 (Member Author)

/retest-required

3 similar comments by @mlguerrero12 (Member Author), each containing /retest-required

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD fc3e0e2 and 0 for PR HEAD ad8e30d in total

@openshift-ci-robot (Contributor)

/hold

Revision ad8e30d was retested 3 times: holding

openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Aug 2, 2023
@mlguerrero12 (Member Author)

/hold cancel
/retest-required

openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Aug 3, 2023
@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 4178020 and 2 for PR HEAD ad8e30d in total

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD bbd4854 and 1 for PR HEAD ad8e30d in total

@mlguerrero12 (Member Author)

/retest-required

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD ba2ab3a and 0 for PR HEAD ad8e30d in total

@openshift-ci-robot (Contributor)

/hold

Revision ad8e30d was retested 3 times: holding

openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Aug 4, 2023
Fix log level message when allowlist controller succeeds

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>

Do not retry when allowlist pods are not ready

The daemonset has a BestEffort QoS which means that
in some cases, the pods won't ever be scheduled. This
also prevents unwanted retries when one or more pods
are not ready due to issues with the cluster.
https://issues.redhat.com/browse/OCPBUGS-15818

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>

Check only pods owned by current allowlist sysctl DS

When there are issues with the cluster such as in
https://issues.redhat.com/browse/OCPBUGS-15284, some pods
are stuck in the terminating state which affects subsequent
reconciliations of the allowlist controller.

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
openshift-ci bot removed the lgtm label (Indicates that a PR is ready to be merged.) on Aug 7, 2023
@mlguerrero12 (Member Author)

/hold cancel

openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Aug 7, 2023
@cgoncalves

/lgtm

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) on Aug 7, 2023
@openshift-ci (Contributor) commented Aug 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgoncalves, dougbtv, mlguerrero12

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 829c548 and 2 for PR HEAD 98d6298 in total

@mlguerrero12 (Member Author)

/retest-required

@openshift-ci-robot (Contributor)

/retest-required

Remaining retests: 0 against base HEAD 5104647 and 1 for PR HEAD 98d6298 in total

@openshift-ci (Contributor) commented Aug 7, 2023

@mlguerrero12: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-ovn-upgrade | 98d6298 | link | false | /test e2e-gcp-ovn-upgrade |
| ci/prow/e2e-network-mtu-migration-ovn-ipv4 | 98d6298 | link | false | /test e2e-network-mtu-migration-ovn-ipv4 |
| ci/prow/e2e-openstack-ovn | 98d6298 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-network-mtu-migration-sdn-ipv4 | 98d6298 | link | false | /test e2e-network-mtu-migration-sdn-ipv4 |

@mlguerrero12 (Member Author)

/retest-required

openshift-merge-robot merged commit 2da32c1 into openshift:master on Aug 8, 2023
28 of 32 checks passed
@openshift-ci-robot (Contributor)

@mlguerrero12: Jira Issue OCPBUGS-16019: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-16019 has been moved to the MODIFIED state.

@mlguerrero12 (Member Author)

/cherry-pick release-4.13

@openshift-cherrypick-robot

@mlguerrero12: #1904 failed to apply on top of branch "release-4.13":

Applying: Fix log level message when allowlist controller succeeds
Applying: Do not retry when allowlist pods are not ready
Applying: Check only pods owned by current allowlist sysctl DS
Using index info to reconstruct a base tree...
M	pkg/controller/allowlist/allowlist_controller.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/allowlist/allowlist_controller.go
CONFLICT (content): Merge conflict in pkg/controller/allowlist/allowlist_controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0003 Check only pods owned by current allowlist sysctl DS
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.