fix cleaning up workloadentry with same ip and network #43951

stevenctl · 2023-03-15T21:06:19Z

Could also use the autoregisteredWorkloadEntryName with less validation.

This keys the resource to be cleaned up as well as the input.

If there were already multiple same network/IP workloadentries, that's its own problem. This just keeps it from getting worse with stuck resources.

istio-policy-bot · 2023-03-15T21:06:22Z

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

linux-foundation-easycla · 2023-03-15T21:06:23Z

❌ - login: @stevenctl / name: Steven Landow. The commits (d605f4b, 7946e7d) are not authorized under a signed CLA. Please click here to be authorized. For further assistance with EasyCLA, please submit a support request ticket.

dhawton

Release note looks good

dhawton · 2023-03-15T22:02:28Z

/test integ-security

howardjohn

Seems like maybe if we have:

wg1-1.2.3.4

and we get a connection for

wg2-1.2.3.4

we should just remove wg1-1.2.3.4 immediately? It's not valid to have overlapping IPs in the same network; violating that may lead to strange behavior.

stevenctl · 2023-03-15T23:06:25Z

> we should just remove wg1-1.2.3.4 immediately? It's not valid to have overlapping IPs in the same network; violating that may lead to strange behavior.

Should we remove the existing one assuming that the new one is replacing it, and we're just waiting for the old one to go away? Or should we ignore the new one so we can't have something new kick an existing workload?

The former case sounds more realistic, I guess.

howardjohn · 2023-03-16T00:21:34Z

That (old one is stale and should be removed) would be my guess, but I am not sure I fully understand the cases that lead to this. I do know we make a strong assumption that "IP is unique within network" in a variety of places, though.

hzxuzhonghu · 2023-03-16T02:37:45Z

We do not have such a case. First, the #43950 experiment is not valid: we do not allow running multiple sidecars/ztunnels in a VM.

For VM workloadentry auto-registration, we already handle that reconnect gracefully.

linsun · 2023-03-16T13:19:31Z

Agreed with @hzxuzhonghu - what is the use case for running multiple docker containers, each with its own ztunnel?

stevenctl · 2023-03-16T18:08:40Z

The docker experiment was a way for me to test running zTunnel on "vms" or in "dedicated mode" locally.
The fact that they had the same IP was a mistake I made.

Regardless of the fact that having multiple things with the same IP is not a use case, we shouldn't have invalid workloadentries automatically created by istiod that never get cleaned up.

stevenctl · 2023-03-16T18:25:59Z

> That (old one is stale and should be removed) would be my guess, but I am not sure I fully understand the cases that lead to this. I do know we make a strong assumption that "IP is unique within network" in a variety of places though

The case I ran into would only occur if a proxy with the same network/ip connects but asks for a different auto-register group. So maybe the same IP ends up getting re-used to be part of a different app or something.

We do our check to see if there is already a workloadentry (this name includes the WorkloadGroup and network)

istio/pilot/pkg/autoregistration/controller.go

Line 233 in b034571

if wle != nil {

That check misses, so now we add the new check that lists all WorkloadEntries with the same IP/network. If they have the same workloadgroup, we should have seen it in the first check. Anything that exists here should be cleaned up since the IP has "moved" to a new workloadgroup.
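A rough sketch of the second check described above, using hypothetical types rather than the real controller code. The `WorkloadEntry` struct, the `entryName` format, and `staleDuplicates` are all assumptions made for illustration and are not taken from the Istio code base:

```go
package main

import "fmt"

// WorkloadEntry is a hypothetical, simplified stand-in for the real resource.
type WorkloadEntry struct {
	Name    string
	Group   string // owning WorkloadGroup
	Network string
	Address string
}

// entryName builds a name that includes the group, address and network.
// The exact format is an assumption; only the idea that the name embeds
// the WorkloadGroup and network comes from the discussion.
func entryName(group, address, network string) string {
	name := group + "-" + address
	if network != "" {
		name += "-" + network
	}
	return name
}

// staleDuplicates returns entries that share the connecting proxy's network
// and address but belong to a different WorkloadGroup. Entries in the same
// group are skipped because the name-based check would already have found them.
func staleDuplicates(entries []WorkloadEntry, group, network, address string) []WorkloadEntry {
	var stale []WorkloadEntry
	for _, e := range entries {
		if e.Network == network && e.Address == address && e.Group != group {
			stale = append(stale, e)
		}
	}
	return stale
}

func main() {
	existing := []WorkloadEntry{
		{Name: entryName("wg1", "1.2.3.4", "net1"), Group: "wg1", Network: "net1", Address: "1.2.3.4"},
		{Name: entryName("wg1", "1.2.3.5", "net1"), Group: "wg1", Network: "net1", Address: "1.2.3.5"},
	}
	// A proxy with the same network/IP connects but asks for group wg2.
	for _, e := range staleDuplicates(existing, "wg2", "net1", "1.2.3.4") {
		fmt.Println("would clean up:", e.Name) // prints "would clean up: wg1-1.2.3.4-net1"
	}
}
```

The sketch deliberately does not filter by namespace, so it does not settle whether only same-namespace entries should count as duplicates.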

Regardless of all that logic, I still think this PR makes things more robust against misconfiguration. A user installing things on VMs for the first time could make the same mistake I did. Having a manual step of cleaning up resources that are supposedly managed by istiod doesn't seem right.

One question I have about the new check: should we consider only same-namespace WorkloadEntries as duplicates?

hzxuzhonghu

I do not see any bad effect from having more metadata in the key.

stevenctl · 2023-03-17T18:27:56Z

Yeah I think the key should map 1:1 with the WLE. It could just be the WLE name. Even if we have some logic to prevent duplicates, there can still be races when the old instance and new instance with the same IP are on different istiod instances.

The most common case that could cause something like this is editing the metadata to move the VM onto a different workload group and restarting the proxy.
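To illustrate why the key should map 1:1 with the WLE, here is a minimal sketch that assumes pending cleanups are tracked in a map. The `pending` type and both key helpers are hypothetical and only show the collision, not how the controller actually stores its state:

```go
package main

import "fmt"

// pending maps a cleanup key to the name of the WorkloadEntry awaiting cleanup.
// This is an illustrative structure, not the controller's real bookkeeping.
type pending map[string]string

// keyByAddress keys only on network and IP, so two entries for the same
// address in different workload groups collide.
func keyByAddress(network, address string) string {
	return network + "/" + address
}

// keyByEntry keys on the WorkloadEntry itself, giving a 1:1 mapping.
func keyByEntry(namespace, name string) string {
	return namespace + "/" + name
}

func main() {
	byAddr := pending{}
	byAddr[keyByAddress("net1", "1.2.3.4")] = "wg1-1.2.3.4-net1"
	byAddr[keyByAddress("net1", "1.2.3.4")] = "wg2-1.2.3.4-net1" // overwrites the wg1 record

	byEntry := pending{}
	byEntry[keyByEntry("vms", "wg1-1.2.3.4-net1")] = "wg1-1.2.3.4-net1"
	byEntry[keyByEntry("vms", "wg2-1.2.3.4-net1")] = "wg2-1.2.3.4-net1"

	fmt.Println(len(byAddr), len(byEntry)) // prints "1 2"
}
```

With the address-only key, the record for the wg1 entry is lost, which is one way a resource could end up stuck.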

fix cleaning up workloadentry with same ip and network

d605f4b

stevenctl requested a review from a team as a code owner March 15, 2023 21:06

istio-testing added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 15, 2023

stevenctl requested review from a team as code owners March 15, 2023 21:13

istio-testing added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 15, 2023

release note

7946e7d

stevenctl force-pushed the fix-dupe-ip-wle-cleanup branch from bb4a4a4 to 7946e7d Compare March 15, 2023 21:15

dhawton approved these changes Mar 15, 2023

howardjohn reviewed Mar 15, 2023

hzxuzhonghu reviewed Mar 17, 2023

howardjohn approved these changes Mar 17, 2023

istio-testing merged commit e2706b1 into istio:master Mar 17, 2023
1 check passed

adiprerepa mentioned this pull request Mar 25, 2023

cleanup stale workloadentries on overlapping IPs #44113

Closed
