What is the issue?
This is in relation to a feature added in #8076, which also started validating spec.podSelector to ensure that the field on a new Server doesn't cause a selector overlap with the pod selectors on existing Server resources.
Unfortunately, this check trips up when users rename a Server resource. Because a rename looks like a create-and-delete pair of operations to Kubernetes, the resource with the old name (which still exists in the cluster) and the resource with the new name (being submitted for creation) end up with overlapping selectors.
Assuming I haven't missed any special flags, the problem is also that some tools don't prune until after creation succeeds, while creation fails because the old resource hasn't been pruned yet:
Helm doesn't delete resources (the deleteResource call on line 459) until after creates (the createResource call on line 412) and updates (the updateResource call on line 427) succeed.
kubectl apply --prune doesn't prune (masked as the call to PostProcessorFn on line 477) until after all objects are applied (applyOneObject).
How can it be reproduced?
Create a new Server resource, e.g.:
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: http
  labels:
    app.kubernetes.io/name: web
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: web
  port: http
  proxyProtocol: HTTP/1
Apply the resource. Then change the resource name (the metadata.name field) to something new, e.g., web, apply again, and you'll get an error.
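For illustration, the renamed resource would look like the sketch below, identical to the manifest above except for metadata.name; applying it while the Server named http still exists in the cluster triggers the rejection shown in the logs section:

# Sketch: same manifest as above, with only metadata.name changed from "http" to "web"
apiVersion: policy.linkerd.io/v1beta1
kind: Server
metadata:
  name: web
  labels:
    app.kubernetes.io/name: web
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: web
  port: http
  proxyProtocol: HTTP/1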
Logs, error output, etc
The error is:
could not create object: admission webhook "linkerd-policy-validator.linkerd.io" denied the request: identical server spec already exists
output of linkerd check -o short
N/A
Environment
linkerd 2.12.3
Possible solution
We currently have to manually delete the resource with the old name before doing any resource creation, update, or apply. It was also not always obvious to users which resources were colliding, but #10187 seems to have alleviated that.
With GitOps-based controls, where users might not have write access to production systems, this sometimes requires intervention from cluster operators. With dozens of clusters and hundreds of microservices, this doesn't scale well. We could attempt to build something to handle it automatically, but deleting before creating means there could be downtime, because for a window no Server resource targets the set of pods.
Another possible solution is to add a flag that lets the cluster administrator disable pod selector overlap validation. A big question I haven't taken into account here (and don't know the answer to) is what the repercussions of overlapping selectors are, e.g., whether selector overlaps cause undefined behavior in the proxy.
Additional context
No response
Would you like to work on fixing this bug?
None
The admission webhook blocking the above resource renaming will be removed or relaxed as we continue to improve Status fields on Linkerd policy resources. We have work in flight to address updating Statuses and we are keeping this issue open to track resolving the bug you have identified.