EIP: Fix CPIP updates on cloud environments #3942
Merged
- What this PR does and why it is needed
The EIP controller is event-driven, which means we don't control the order in which events arrive.
There is currently a bug around the ordering of event processing: we get the error
`"CloudPrivateIPConfig: %s already assigned to node: %s", cloudPrivateIPConfigName, cloudPrivateIPConfig.Spec.Node`
because deleting the CPIP takes time and the object is still in the `CloudResponsePending` ("Deleting IP address") phase.
This PR fixes this in two ways. First, it changes the CPIP add logic to explicitly fail if the object is currently being deleted, which is the same thing we already do for updates. Second, it fixes the retries around nodes going online and offline: today, if the node add/delete fails we do not retry, and I think we should.
- Special notes for reviewers
Currently I added retries only for when a node comes back online. Should we also add retries for when a node goes offline?
Long-term solution: make the controller level-driven.
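To illustrate what "level-driven" would mean here, a hypothetical Go sketch (the `reconcile` function and the plain egress-IP-to-node maps are assumptions, not existing ovn-kubernetes code). Each pass compares the full desired state with the observed cloud state and emits the next steps; when an IP must move, it only unassigns and leaves the assign to a later pass, once the cloud reports the IP as free. Because every pass reads the whole state, the order in which events arrived no longer matters.

```go
package main

import (
	"fmt"
	"sort"
)

// reconcile is a hypothetical level-driven pass over egress IP -> node maps.
// It returns the steps needed to move the observed (actual) state toward the
// desired state. An IP assigned to the wrong node is only unassigned here;
// the assign happens on a later pass once actual no longer lists it.
func reconcile(desired, actual map[string]string) []string {
	// Collect every IP mentioned in either map, sorted for deterministic output.
	seen := map[string]bool{}
	for ip := range desired {
		seen[ip] = true
	}
	for ip := range actual {
		seen[ip] = true
	}
	ips := make([]string, 0, len(seen))
	for ip := range seen {
		ips = append(ips, ip)
	}
	sort.Strings(ips)

	var ops []string
	for _, ip := range ips {
		want, wantOK := desired[ip]
		have, haveOK := actual[ip]
		switch {
		case haveOK && (!wantOK || have != want):
			ops = append(ops, fmt.Sprintf("unassign %s from %s", ip, have))
		case wantOK && !haveOK:
			ops = append(ops, fmt.Sprintf("assign %s to %s", ip, want))
		}
	}
	return ops
}

func main() {
	desired := map[string]string{"10.0.0.5": "node-b", "10.0.0.6": "node-a"}
	actual := map[string]string{"10.0.0.5": "node-a"}
	// Prints: [unassign 10.0.0.5 from node-a assign 10.0.0.6 to node-a]
	fmt.Println(reconcile(desired, actual))
}
```

Running the same pass again after the unassign completes would then emit `assign 10.0.0.5 to node-b`, without depending on which event triggered it.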
- How to verify it
Tested on a cloud environment; we are blind here upstream, since the ordering of events is not easy to reproduce in a test.
- Description for the changelog