-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-11888: handle daemonSet pods restart #347
OCPBUGS-11888: handle daemonSet pods restart #347
Conversation
when delete daemonSet or daemon pods manually pods will get recreated but the interface will have older xdp attached to it Signed-off-by: msherif1234 <mmahmoud@redhat.com>
@msherif1234: This pull request references Jira Issue OCPBUGS-11888, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: msherif1234 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@msherif1234: This pull request references Jira Issue OCPBUGS-11888, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@msherif1234: This pull request references Jira Issue OCPBUGS-11888, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest |
/assign @martinkennelly |
@msherif1234: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@@ -66,15 +66,15 @@ func (r *IngressNodeFirewallNodeStateReconciler) Reconcile(ctx context.Context, | |||
// Request object not found, could have been deleted after reconcile request. | |||
// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers. | |||
// Return and don't requeue | |||
return r.reconcileResource(ctx, req, nodeState, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The code changes in ingressnofirewallnodestate_controller.go and in ebpfsyncer.go are not related, correct? So perhaps break this into 2 commits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah that was just unused arg
@@ -82,6 +87,15 @@ func (e *ebpfSingleton) SyncInterfaceIngressRules( | |||
}() | |||
} | |||
|
|||
signal.Notify(sigc, os.Interrupt, syscall.SIGTERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test if you can still sig TERM the process (if this is needed)?
In a standalone test process, I can't ctrl-c / sig TERM the process any more after doing this, but it might behave differently with the Operator SDK
I tested this quickly and kill -9 obviously works still, but the TERM signal doesn't shut down the process any more:
$ cat main.go
package main
import (
"fmt"
"os"
"os/signal"
"syscall"
"time"
)
type eStruct struct {
c interface{}
}
func (e eStruct) resetAll() {
fmt.Println("Reset: ", e.c)
}
func foo(e eStruct) {
fmt.Println("foo", e)
sigc := make(chan os.Signal, 1)
signal.Notify(sigc, os.Interrupt, syscall.SIGTERM)
go func(c chan os.Signal) {
// Wait for a SIGTERM
<-c
if e.c != nil {
e.resetAll()
}
}(sigc)
}
func main() {
e := eStruct{
c: "test",
}
foo(e)
time.Sleep(15 * time.Second)
fmt.Println("15 seconds are over, normal shutdown")
}
$ ./m & pid=$! ; sleep 2; kill $pid ; wait $pid
[1] 81616
foo {test}
Reset: test
15 seconds are over, normal shutdown
[1]+ Done ./m
When I add os.Exit(0)
, then the program will shut down upon receiving the signal:
19 func foo(e eStruct) {
20 fmt.Println("foo", e)
21
22 sigc := make(chan os.Signal, 1)
23 signal.Notify(sigc, os.Interrupt, syscall.SIGTERM)
24 go func(c chan os.Signal) {
25 // Wait for a SIGTERM
26 <-c
27 if e.c != nil {
28 e.resetAll()
29 }
30 os.Exit(0)
31 }(sigc)
32 }
33
[akaris@linux test-signal]$ ./m & pid=$! ; sleep 2; kill $pid ; wait $pid
[1] 82337
foo {test}
Reset: test
[1]+ Done
This is not a suggestion to add os.Exit, it's merely a question from my side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when u delete ds or delete the pods with something like
oc delete pod -l=app=ingress-node-firewall-daemon -n openshift-ingress-node-firewall
SIGTERM will be sent and the controller runtime will be bring down all processes so I don't need to exit the process when the signal is detected I just need to cleanup and left the normal bring down flow continue did this answer ur question ? or I missunderstood ur question ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah my question is for the opposite case: when you run kill $pid, will the application be torn down and will the pod restart. Or if you test the process standalone, can you still CTRL-C and it will shut down
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will try it out and see if I can do much in separate PR thanks for the review!!!
Also: an E2E test for this perhaps? |
Marin has an updated e2e that specially test this condition so will wait on e2e till his PR merged |
/lgtm |
@msherif1234: Jira Issue OCPBUGS-11888: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-11888 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
when delete daemonSet or daemon pods manually
pods will get recreated but the interface will
have older xdp attached to it
- What this PR does and why is it needed
fix an issue where we have stale XDP attached to interface(s) when daemonset restarts
- How to verify it
$ oc rsh sctpserver sh-4.2# ncat -l 30102 --sctp $ oc rsh sctpclient sh-4.2# nc -v 10.129.2.26 30102 --sctp Ncat: Version 7.50 ( https://nmap.org/ncat ) Ncat: Connection timed out. $ oc logs -n openshift-ingress-node-firewall ingress-node-firewall-daemon-lvq5x -c events --follow 2023-05-18 16:13:02 +0000 UTC ci-ln-6rzv9ik-72292-ctfxk-worker-b-n2x9r ruleId 10 action Drop len 82 if genev_sys_6081 2023-05-18 16:13:02 +0000 UTC ci-ln-6rzv9ik-72292-ctfxk-worker-b-n2x9r ipv4 src addr 10.128.2.24 dst addr 10.129.2.26 2023-05-18 16:13:02 +0000 UTC ci-ln-6rzv9ik-72292-ctfxk-worker-b-n2x9r sctp srcPort 45827 dstPort 30102
$ oc logs -n openshift-ingress-node-firewall ingress-node-firewall-daemon-wwh6b -c events --follow 2023-05-18 16:14:20 +0000 UTC ci-ln-6rzv9ik-72292-ctfxk-worker-b-n2x9r ruleId 10 action Drop len 82 if genev_sys_6081 2023-05-18 16:14:20 +0000 UTC ci-ln-6rzv9ik-72292-ctfxk-worker-b-n2x9r ipv4 src addr 10.128.2.24 dst addr 10.129.2.26 2023-05-18 16:14:20 +0000 UTC ci-ln-6rzv9ik-72292-ctfxk-worker-b-n2x9r sctp srcPort 36697 dstPort 30102