Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Webhook does not process transient errors properly #2173

Closed
8 tasks done
oribon opened this issue Nov 19, 2023 · 2 comments · Fixed by #2202
Closed
8 tasks done

Webhook does not process transient errors properly #2173

oribon opened this issue Nov 19, 2023 · 2 comments · Fixed by #2202
Assignees
Labels

Comments

@oribon
Copy link
Member

oribon commented Nov 19, 2023

MetalLB Version

latest

Deployment method

Not relevant

Main CNI

N/A

Kubernetes Version

No response

Cluster Distribution

No response

Describe the bug

Using the webhooks, we do not want to block resources depending on the order they are received - e.g we don't want to block a peer specifying a bfdProfile that doesn't exist yet because it could eventually converge. today we achieve that using the TransientError type, returning it only when an error that relates to dependencies between resources occur and we ignore this type of err in the webhook's validator.
The problem with that approach is that in some cases this causes the webhook to miss an actual error, because in the webhook we pass multiple objects of the same type to the validation and return early when hitting any error, which for example if a TransientError happens for the first resource the rest of the resources are not being validated at all.
As an example, consider the following resources:

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: peer1
  namespace: metallb-system
spec:
  peerAddress: 172.16.30.2
  peerASN: 65433
  myASN: 65439
  bfdProfile: "non-existing"
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: peer2
  namespace: metallb-system
spec:
  peerAddress: 172.16.30.3
  peerASN: 0
  myASN: 65439

peer2 is invalid since it specifies peerASN: 0 so we would expect the webhook to block it anyways, but the outcome of the webhook changes as a result of the order: if peer2 is created without peer1 it is blocked, if it is created after peer1 it passes.
A possible solution is instead of ignoring transient errors in the webhook's validator, it would pass the resources to For without fields that can cause a transient error (resetting them), essentially not allowing the webhook to receive a transient error at all.
TransientErrors could stay in the codebase (serving as a "doc"), we would just not act on them in the webhooks.

To Reproduce

described above

Expected Behavior

described above

Additional Context

described above

I've read and agree with the following

  • I've checked all open and closed issues and my request is not there.
  • I've checked all open and closed pull requests and my request is not there.

I've read and agree with the following

  • I've checked all open and closed issues and my issue is not there.
  • This bug is reproducible when deploying MetalLB from the main branch
  • I have read the troubleshooting guide and I am still not able to make it work
  • I checked the logs and MetalLB is not discarding the configuration as not valid
  • I enabled the debug logs, collected the information required from the cluster using the collect script and will attach them to the issue
  • I will provide the definition of my service and the related endpoint slices and attach them to this issue
@oribon oribon added the bug label Nov 19, 2023
@oribon oribon self-assigned this Nov 19, 2023
@shimritproj
Copy link
Contributor

I would like to work on it.

@oribon oribon assigned shimritproj and unassigned oribon Nov 19, 2023
@oribon
Copy link
Member Author

oribon commented Nov 19, 2023

assigned, thanks!

shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
Fixed metallb#2173

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
Fixed metallb#2173

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
Fixed metallb#2173

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 6, 2023
Fixed metallb#2173

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
Fixed metallb#2173

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
Fixed metallb#2173

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
Resetting the fields that cause Transienterrors
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 10, 2023
Resetting the fields that cause Transienterrors

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
Resetting the fields that cause Transienterrors

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
Resetting the fields that cause Transienterrors

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 11, 2023
Resetting the fields that cause Transienterrors

Signed-off-by: shimritproj <shimritp74@gmail.com>
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 12, 2023
shimritproj added a commit to shimritproj/metallb that referenced this issue Dec 12, 2023
Resetting the fields that cause Transienterrors

Signed-off-by: shimritproj <shimritp74@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants