[release-4.13] OCPBUGS-19894: remove prestop hooks for northd, sbdbd and nbdb #2001
When the prestop hook executes the ovn-ctl stop... command, the container's main script exits. That exit propagates to the main container process (PID 1), which also exits and takes the container down with it. This is not necessarily a problem, except that with recent changes in CRI-O [0] kubelet does not see this case correctly: the containers are Exited according to CRI-O but still seen as Ready from kubelet's perspective. Because of this, the pod gets stuck in Terminating state until it times out (~3m), and then the containers restart.

Removing the prestop hook lets the SIGTERM reach the configured trap (example [1]), which effectively does the same thing the prestop hook would have done. This prevents the pod from being stuck in Terminating state.

[0] cri-o/cri-o#7168
[1] https://github.com/openshift/cluster-network-operator/blob/c55f19132864a9d8fe347ad6a8280eaa2f60e839/bindata/network/ovn-kubernetes/self-hosted/single-zone-interconnect/ovnkube-master.yaml#L74-L82

Signed-off-by: Jamo Luhrsen <jluhrsen@gmail.com>
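For readers unfamiliar with the trap pattern referenced in [1], here is a minimal sketch of the idea. It is not the operator's actual script: `main_script`, `quit`, and the use of `sleep` as a stand-in for the OVN daemon are all hypothetical, and the real trap would call ovn-ctl rather than `kill`. It only illustrates that an entrypoint with a TERM trap can shut its daemon down cleanly on SIGTERM without any prestop hook.

```shell
#!/bin/bash
# Sketch only: main_script is a hypothetical stand-in for a container's
# entrypoint; "sleep" plays the role of the long-running daemon.
main_script() {
  quit() {
    echo "SIGTERM trapped: stopping daemon"
    # The real script would invoke its ovn-ctl stop command here (assumption).
    kill "$daemon_pid" 2>/dev/null
    wait "$daemon_pid" 2>/dev/null
    exit 0
  }
  trap quit TERM

  sleep 30 &
  daemon_pid=$!
  # wait is interruptible, so the trap runs as soon as SIGTERM arrives.
  wait "$daemon_pid"
}

# Simulate the container runtime: start the entrypoint, then send SIGTERM.
out_file=$(mktemp)
( main_script ) > "$out_file" &
pid=$!
sleep 1
kill -TERM "$pid"
wait "$pid"
status=$?

cat "$out_file"
echo "exit status: $status"
```

The point of the sketch is the ordering: the signal is handled by the script itself, so the script decides when PID 1 exits, rather than having a separate hook stop the daemon underneath it.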
@jluhrsen: This pull request references Jira Issue OCPBUGS-17391, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@jluhrsen: This pull request references Jira Issue OCPBUGS-19894, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/retest |
/jira refresh |
@jluhrsen: This pull request references Jira Issue OCPBUGS-19894, which is invalid:
Comment In response to this:
@jluhrsen: all tests passed! |
/jira refresh |
@jluhrsen: This pull request references Jira Issue OCPBUGS-19894, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
@jluhrsen: This pull request references Jira Issue OCPBUGS-19894, which is valid. 6 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jcaamano, jluhrsen The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label cherry-pick-approved |
/retest |
/retest |
/retest |
/retest |
/retest-required |
/retest |
/retest-required |
/retest |
/retest |
/retest |
/retest-required |
/override ci/prow/e2e-metal-ipi-ovn-ipv6 |
@sdodson: sdodson unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight. In response to this:
/override ci/prow/e2e-metal-ipi-ovn-ipv6 |
/override ci/prow/e2e-metal-ipi-ovn-ipv6 |
@dcbw: Overrode contexts on behalf of dcbw: ci/prow/e2e-metal-ipi-ovn-ipv6 In response to this:
@dougbtv: Overrode contexts on behalf of dougbtv: ci/prow/e2e-metal-ipi-ovn-ipv6 In response to this:
@jluhrsen: The following test failed, say
@jluhrsen: Jira Issue OCPBUGS-19894: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-19894 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.12 |
@jluhrsen: new pull request created: #2055 In response to this:
Fix included in accepted release 4.13.0-0.nightly-2023-10-07-050717 |