Fixes panic with delete NAT operations #3714
Regression introduced by: 25d892c

Would end up appending a nil NAT and causing an NPE when trying to execute EquivalentNAT:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x78 pc=0x178ea7b]
goroutine 140 [running]:
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/libovsdbops.isEquivalentNAT(0x0, 0xc002e6cea0)
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/libovsdbops/router.go:935 +0x9b
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/libovsdbops.DeleteNATsOps({0x24aa068, 0xc0008c0fc0}, {0x0, 0x0, 0x0}, 0xc002e1a4e0, {0xc000130c48, 0x1, 0xc00106f6f8?})
/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/libovsdbops/router.go:1087 +0x728

Signed-off-by: Tim Rozet <trozet@redhat.com>
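The failure mode in this description can be reproduced in isolation. What follows is a minimal sketch under stated assumptions, not the project's code: `NAT`, `errNotFound`, `getNAT`, `collectBuggy`, and `collectFixed` are hypothetical stand-ins for `nbdb.NAT`, `libovsdbclient.ErrNotFound`, `GetNAT`, and the loop in `GetRouterNATs`. It shows how tolerating a not-found error without skipping the iteration appends a nil pointer that a later field access would dereference.

```go
package main

import (
	"errors"
	"fmt"
)

// NAT is a hypothetical stand-in for nbdb.NAT.
type NAT struct {
	UUID       string
	ExternalIP string
}

// errNotFound is a hypothetical stand-in for libovsdbclient.ErrNotFound.
var errNotFound = errors.New("object not found")

// getNAT is a hypothetical lookup that returns (nil, errNotFound) for unknown UUIDs.
func getNAT(db map[string]*NAT, uuid string) (*NAT, error) {
	nat, ok := db[uuid]
	if !ok {
		return nil, errNotFound
	}
	return nat, nil
}

// collectBuggy mirrors the regressed loop: a not-found error is tolerated,
// but the nil result is still appended.
func collectBuggy(db map[string]*NAT, uuids []string) ([]*NAT, error) {
	nats := []*NAT{}
	for _, uuid := range uuids {
		nat, err := getNAT(db, uuid)
		if err != nil && err != errNotFound {
			return nil, err
		}
		nats = append(nats, nat) // nat is nil when the lookup missed
	}
	return nats, nil
}

// collectFixed skips missing entries and fails on any other error.
func collectFixed(db map[string]*NAT, uuids []string) ([]*NAT, error) {
	nats := []*NAT{}
	for _, uuid := range uuids {
		nat, err := getNAT(db, uuid)
		if err == errNotFound {
			continue
		}
		if err != nil {
			return nil, fmt.Errorf("failed to look up NAT %s: %w", uuid, err)
		}
		nats = append(nats, nat)
	}
	return nats, nil
}

func main() {
	db := map[string]*NAT{"a": {UUID: "a", ExternalIP: "10.0.0.1"}}
	uuids := []string{"a", "stale"}

	buggy, _ := collectBuggy(db, uuids)
	fixed, _ := collectFixed(db, uuids)
	fmt.Println(len(buggy), buggy[1] == nil) // 2 true
	fmt.Println(len(fixed))                  // 1
}
```

Any caller that reads a field of `buggy[1]` would panic with a nil pointer dereference, which is the shape of the trace above; the fixed loop never hands a nil entry to its callers.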
We need to think more about why this panic wasn't caught upstream in our CI, but I won't hold the PR for that. Glad we caught it before it made it downstream.
@@ -1005,7 +1005,10 @@ func GetRouterNATs(nbClient libovsdbclient.Client, router *nbdb.LogicalRouter) (
     nats := []*nbdb.NAT{}
     for _, uuid := range r.Nat {
         nat, err := GetNAT(nbClient, &nbdb.NAT{UUID: uuid})
-        if err != nil && err != libovsdbclient.ErrNotFound {
+        if err == libovsdbclient.ErrNotFound {
IIUC, we were hitting another error, one that wasn't a "not found" error, and we weren't handling it?
Also, was this the only spot we need to fix? I am unsure whether I reviewed the initial fix, but we had other places where we made similar fixes, right @flavio-fernandes? Let's make sure we double-check those.
But based on the commit that introduced the regression (25d892c), this LGTM.
I briefly scanned #3647 last night and didn't see anything...but that was like after midnight soooo :)
hmm, egress IP flake:

Looks like an IP was not correctly removed from the external gateway address set, and then a pod for the egressip test was re-allocated that IP and did not match the node no-reroute policy (prio 102) because it matched the external gateway policy first (prio 501).
/lgtm
@@ -1005,7 +1005,10 @@ func GetRouterNATs(nbClient libovsdbclient.Client, router *nbdb.LogicalRouter) (
     nats := []*nbdb.NAT{}
     for _, uuid := range r.Nat {
         nat, err := GetNAT(nbClient, &nbdb.NAT{UUID: uuid})
-        if err != nil && err != libovsdbclient.ErrNotFound {
+        if err == libovsdbclient.ErrNotFound {
NIT: I would use errors.Is(err, libovsdbclient.ErrNotFound) in case we wrap the error in the future:
if errors.Is(err, libovsdbclient.ErrNotFound) {
@@ -1005,7 +1005,10 @@ func GetRouterNATs(nbClient libovsdbclient.Client, router *nbdb.LogicalRouter) (
     nats := []*nbdb.NAT{}
     for _, uuid := range r.Nat {
         nat, err := GetNAT(nbClient, &nbdb.NAT{UUID: uuid})
-        if err != nil && err != libovsdbclient.ErrNotFound {
+        if err == libovsdbclient.ErrNotFound {
+            continue
duh! 🤦 I can't believe I missed that in previous fix.
The second attempt fails because the host network pod is not able to open the port
This also seems wrong: https://github.com/ovn-org/ovn-kubernetes/actions/runs/5397299956/jobs/9802739319?pr=3714
/me taking a closer look at it right now.
e2e-dual-conversion (noHA, interconnect-single-node-zones, 3, 1)
We know the egress IP failure was due to an exgw issue fixed by another PR, and @flavio-fernandes and @tssurya identified that the dualstack conversion failure happens because IPv6 routes are missing on ovn_cluster_router. Neither failure has anything to do with this PR, so I will merge.
Note: The fix for the missing IPv6 route is in PR #3724
Seen downstream:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1726/pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-ipv6/1673751230423240704/artifacts/e2e-metal-ipi-ovn-ipv6/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-master-fb52f_ovnkube-master_previous.log