Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix cleaning up workloadentry with same ip and network #43951

Merged
merged 2 commits into from Mar 17, 2023

Conversation

stevenctl
Copy link
Contributor

@stevenctl stevenctl commented Mar 15, 2023

fixes #43950

Could also use the autoregisteredWorkloadEntryName with less validation.

This keys the resource to be cleaned up as well as the input.

If there were already multiple same network/IP workloadentries, that's it's own problem. This just keeps it from being worse with stuck resources.

@stevenctl stevenctl requested a review from a team as a code owner March 15, 2023 21:06
@istio-policy-bot
Copy link

🤔 🐛 You appear to be fixing a bug in Go code, yet your PR doesn't include updates to any test files. Did you forget to add a test?

Courtesy of your friendly test nag.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 15, 2023

CLA Not Signed

@istio-testing istio-testing added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Mar 15, 2023
@stevenctl stevenctl requested review from a team as code owners March 15, 2023 21:13
@istio-testing istio-testing added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 15, 2023
Copy link
Member

@dhawton dhawton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release note looks good

@dhawton
Copy link
Member

dhawton commented Mar 15, 2023

/test integ-security

Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like maybe if we have:

wg1-1.2.3.4

and we get a connection for

wg2-1.2.3.4

we should just remove wg1-1.2.3.4 immediately? Its not valid to have overlapping IP in the same network, violating that may lead to strange behavior

@stevenctl
Copy link
Contributor Author

we should just remove wg1-1.2.3.4 immediately? Its not valid to have overlapping IP in the same network, violating that may lead to strange behavior

Should we remove the existing one assuming that the new one is replacing it, and we're just waiting for the old one to go away? Or should we ignore the new one so we can't have something new kick an existing workload?

The former case sounds more realistic I guess

@howardjohn
Copy link
Member

That (old one is stale and should be removed) would be my guess, but I am not sure I fully understand the cases that lead to this. I do know we make a strong assumption that "IP is unique within network" in a variety of places though

@hzxuzhonghu
Copy link
Member

We donot have such case, first #43950 experiment is not valid, we do not allow running multi sidecar/ztunnel in a VM.

For VM workloadentry autoregister, we have gracefully handle that reconnect.

@linsun
Copy link
Member

linsun commented Mar 16, 2023

Agreed with @hzxuzhonghu - what is the user case of running multiple docker contains and each has its own ztunnel?

@stevenctl
Copy link
Contributor Author

stevenctl commented Mar 16, 2023

The docker experiment was a way for me to test running zTunnel on "vms" or in "dedicated mode" locally.
The fact that they had the same IP was a mistake I made.

Regardless of the fact that it's not a usecase to have multiple things with the same IP, we shouldn't have invalid workloadentries automatically created by istiod that never get cleaned up.

@stevenctl
Copy link
Contributor Author

That (old one is stale and should be removed) would be my guess, but I am not sure I fully understand the cases that lead to this. I do know we make a strong assumption that "IP is unique within network" in a variety of places though

The case I ran into would only occur if a proxy with the same network/ip connects but asks for a different auto-register group. So maybe the same IP ends up getting re-used to be part of a different app or something.

  1. We do our check to see if there is already a workloadentry (this name includes the WorkloadGroup and network)

  1. That check misses, so now we add the new check that lists all WorkloadEntries with the same IP/network. If they have the same workloadgroup, we should have seen it in the first check. Anything that exists here should be cleaned up since the IP has "moved" to a new workloadgroup.

Regardless of all that logic, I still think this PR makes things more robust against misconfiguration. A user installing things on VMs for the first time could make the same mistake I did. Having a manual step of cleaning up resources that supposedly are managed my istiod doesn't seem right.

One question I have is for the new check, should we consider only same-namespace WorkloadEntries as duplicates?

Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I donot see any bad effect with more meta data in key

@stevenctl
Copy link
Contributor Author

Yeah I think the key should map 1:1 with the WLE. It could just be the WLE name. Even if we have some logic to prevent duplicates, there can still be races when the old instance and new instance with the same IP are on different istiod instances.

The most common case that could cause something like this is editing the metadata to move the VM onto a different workload group and restarting the proxy.

@istio-testing istio-testing merged commit e2706b1 into istio:master Mar 17, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle workloadentries with overlapping IP and network
7 participants