e2e ipv6 egress firewall fix #4385
Add to avoid locking main egress firewall handler on internal dns resolver lock. Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Update connectivity timeout to 3 seconds and allow 2 retries both for positive and negative cases Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
work with ipv6, since github runners don't have any routes for IPv6. Split current test that checks allow IP and allow CIDR+port into 2 tests to limit the amount of required external containers. Bonus: the only test that used external containers doesn't need to create them anymore, as they are created in beforeEach Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
avoid unneeded container creation. No extra changes Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
affected by deny all. Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
test/e2e/egress_firewall.go
@@ -36,6 +36,10 @@ var _ = ginkgo.Describe("e2e egress firewall policy validation", func() {
retryInterval = 1 * time.Second
retryTimeout = 30 * time.Second
ciNetworkName = "kind"
externalContainerName1 = "e2e-egress-fw-external-container1"
fyi @martinkennelly this may impact your work to convert tests for downstream
test/e2e/egress_firewall.go
ginkgo.It("Should validate the egress firewall DNS does not deadlock when adding many dnsNames", func() {
var egressFirewallConfig = fmt.Sprintf(`kind: EgressFirewall
table.DescribeTable("Should validate the egress firewall policy functionality against cluster nodes by using node selector",
func(checkDeadlock bool) {
it might be good, either in a comment or in ginkgo steps, to explain what kind of deadlock this test is testing for. Is it something specific in the code? Or what lock are you trying to exercise? Otherwise someone else looking at this test won't know what we are testing.
i tried to do that under the `if checkDeadlock` comment, any ideas on what else to add?
I meant specify which mutexes you are trying to exercise here in the code
ah, so as this is e2e, and not a unit test, I am not trying to reproduce a specific deadlock situation, but rather a "tricky e2e scenario" where multiple things can go wrong. Maybe the test name is not the best, but I just tried to adapt what was already there.
maybe rename it to chaos or stress test or something? The name deadlock tells me as a reader you are trying to test a mutex, but the name of the mutex isn't listed. I can see someone in the future trying to go and guess which lock in the code this test was trying to exercise.
srcPodName := "e2e-egress-fw-src-pod"
testContainer := fmt.Sprintf("%s-container", srcPodName)
testContainerFlag := fmt.Sprintf("--container=%s", testContainer)
// use random labels in case test runs again since it's a pain to remove the label from the node
why is it a pain? if everyone does this we could end up with flakes in the future.
lol, you tell me 34350bd#diff-65b73a6a62a4e5b8903b9f17f0e03a42365b127e7d86f006f9bec363b4be04b8R1276
I just moved that part :D
LOL! I vaguely remember writing that and had originally written code to remove labels... but I don't remember why I ended up doing it. Since you are just moving code we can ignore it for now.
@@ -25,8 +25,6 @@ should provide Internet connection continuously when ovnkube-node pod is killed|
should provide Internet connection continuously when pod running master instance of ovnkube-control-plane is killed|\
should provide Internet connection continuously when all pods are killed on node running master instance of ovnkube-control-plane|\
should provide Internet connection continuously when all ovnkube-control-plane pods are killed|\
Should validate the egress firewall policy functionality against remote hosts|\
yay!
To verify no deadlock, we need an intensive follow-up workload. Node-selector testing works the best, as node event handling includes iterating over all egress firewalls internally. Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
that needs it. Use defer to cleanup instead of afterEach, as afterEach should cleanup resources created by beforeEach. Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Remove unneeded external IPs from the deadlock test, as multiple unresolvable dns names are the main ingredient. Fix ip:port formatting for ipv6. Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
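On the ip:port formatting fix for ipv6 mentioned in the commits above: a plain `ip + ":" + port` concatenation is ambiguous for IPv6 literals, which have to be wrapped in square brackets. A minimal sketch using the standard library's `net.JoinHostPort` follows; `hostPort` is a hypothetical helper name, and this is not necessarily how the PR itself does it.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// hostPort is a hypothetical helper: it joins an IP and a port so that the
// result is valid for both IPv4 and IPv6 addresses.
func hostPort(ip string, port int) string {
	// net.JoinHostPort adds square brackets when the host contains a colon.
	return net.JoinHostPort(ip, strconv.Itoa(port))
}

func main() {
	fmt.Println(hostPort("10.0.0.1", 80)) // prints 10.0.0.1:80
	fmt.Println(hostPort("fd00::1", 80))  // prints [fd00::1]:80
}
```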
1 commit: call Delete dns name asynchronously cc @JacobTanenbaum please check if it looks safe, also ptal at e2e changes
other commits are restructuring ef e2es, cc @trozet to check that nodeSelector test changes make sense