Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e ipv6 egress firewall fix #4385

Merged
merged 12 commits into from
Jun 27, 2024
Merged

e2e ipv6 egress firewall fix #4385

merged 12 commits into from
Jun 27, 2024

Conversation

npinaeva
Copy link
Member

@npinaeva npinaeva commented May 23, 2024

1 commit: call Delete dns name asynchronously cc @JacobTanenbaum please check if it looks safe, also ptal at e2e changes
other commits are restructuring ef e2es, cc @trozet to check that nodeSelector test changes make sense

@coveralls
Copy link

coveralls commented May 23, 2024

Coverage Status

coverage: 52.732% (-0.002%) from 52.734%
when pulling 8c1e9ed on npinaeva:ef-e2e-ipv6
into 798eb14 on ovn-org:master.

@tssurya tssurya added ci-ipv6 Add support for IPV6 e2e's to run upstream area/e2e-testing feature/egress-firewall All issues related to egress firewall labels May 27, 2024
Add to avoid locking main egress firewall handler on internal dns
resolver lock.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Update connectivity timeout to 3 seconds and allow 2 retries both for
positive and negative cases

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
work with ipv6, since github runners don't have any routes for IPv6.
Split current test that checks allow IP and allow CIDR+port into 2
tests to limit the amount of required external containers.
Bonus: the only test that used external containers doesn't need to
create them anymore, as they are created in beforeEach

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
avoid unneeded container creation. No extra changes

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
affected by deny all.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
@npinaeva npinaeva marked this pull request as ready for review May 31, 2024 10:15
@npinaeva npinaeva requested a review from a team as a code owner May 31, 2024 10:15
JacobTanenbaum
JacobTanenbaum previously approved these changes May 31, 2024
@@ -36,6 +36,10 @@ var _ = ginkgo.Describe("e2e egress firewall policy validation", func() {
retryInterval = 1 * time.Second
retryTimeout = 30 * time.Second
ciNetworkName = "kind"
externalContainerName1 = "e2e-egress-fw-external-container1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi @martinkennelly may impact your work to convert tests for downstraem

ginkgo.It("Should validate the egress firewall DNS does not deadlock when adding many dnsNames", func() {
var egressFirewallConfig = fmt.Sprintf(`kind: EgressFirewall
table.DescribeTable("Should validate the egress firewall policy functionality against cluster nodes by using node selector",
func(checkDeadlock bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be good either in a comment or in ginkgo steps explain what kind of deadlock this test is testing for. Is it something specific in the code? or what lock are you trying to exercise. Otherwise someone else looking at this test wont know what we are testing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried to do that under if checkDeadlock comment, any ideas on what else to add?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant specify which mutexes you are trying to exercise here in the code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, so as this is e2e, and not a unit test, I am not trying to reproduce a specific deadlock situation, but rather a "tricky e2e scenario" where multiple things can go wrong. Maybe the test name is not the best, but I just tried to adapt what was already there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename it to chaos or stress test or something? The name deadlock tells me as a reader you are trying to test a mutex, but the name of the mutex isn't listed. I can see someone in the future trying to go and guess which lock in the code this test was trying to exercise.

srcPodName := "e2e-egress-fw-src-pod"
testContainer := fmt.Sprintf("%s-container", srcPodName)
testContainerFlag := fmt.Sprintf("--container=%s", testContainer)
// use random labels in case test runs again since it's a pain to remove the label from the node
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it a pain? if everyone does this we could end up with flakes in the future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL! I vaguely remember writing that and had originally written code to remove labels...but i dont remember why I ended up doing it. Since you are just moving code we can ignore it for now.

@@ -25,8 +25,6 @@ should provide Internet connection continuously when ovnkube-node pod is killed|
should provide Internet connection continuously when pod running master instance of ovnkube-control-plane is killed|\
should provide Internet connection continuously when all pods are killed on node running master instance of ovnkube-control-plane|\
should provide Internet connection continuously when all ovnkube-control-plane pods are killed|\
Should validate the egress firewall policy functionality against remote hosts|\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yay!

To verify no deadlock, we need an intensive follow up workload.
Node-selector testing work the best, as node events handling includes
iterating over all egress firewalls internally.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
that needs it. Use defer to cleanup instead of afterEach, as afterEach
should cleanup resources created by beforeEach.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Remove unneeded external IPs from the deadlock test, as multiple
unresolvable ds names is the main ingredient.
Fix ip:port formatting for ipv6.

Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
Signed-off-by: Nadia Pinaeva <n.m.pinaeva@gmail.com>
@trozet trozet merged commit 926f624 into ovn-org:master Jun 27, 2024
35 of 36 checks passed
@npinaeva npinaeva deleted the ef-e2e-ipv6 branch June 27, 2024 16:44
@kyrtapz kyrtapz mentioned this pull request Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/e2e-testing ci-ipv6 Add support for IPV6 e2e's to run upstream feature/egress-firewall All issues related to egress firewall
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants