Don't call DeleteHostSubnetRules on a replayed(?) Deleted event #18617
/lgtm
Thanks Dan!
/retest
@@ -103,6 +103,10 @@ func (node *OsdnNode) handleDeleteHostSubnet(obj interface{}) {
	if hs.HostIP == node.localIP {
		return
	}
	if _, exists := node.hostSubnetMap[string(hs.UID)]; !exists {
Can we still delete the rules for the subnet portion, and just skip the host part of the OpenFlow rules?
I am not sure those rules are really nonexistent, though we clearly do not want the host part to be deleted.
SubnetAllocator always returns the lowest-numbered available subnet, so deleted subnets get reused right away. That means you could get a replayed delete that mentions a subnet which is already in use again by a different node.
(And we can't easily check both HostIP and Subnet when doing the deletes, because the table 10 rules don't include the subnet, and the table 50/90 rules only include the hostIP in a non-matchable-against field.)
Yeah, I'm vaguely concerned that this patch will cause us to ignore some delete that we shouldn't ignore, but that really shouldn't be possible... if we added OVS flows, then we also added the subnet to hostSubnetMap, so if it's not in hostSubnetMap, then we didn't add OVS flows, so we shouldn't delete them...
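The invariant argued above (flows exist only if the subnet is in hostSubnetMap) can be sketched with a toy model. This is an illustration under stated assumptions, not the real OsdnNode code: the toyNode and hostSubnet types, the flowOwners map standing in for OVS state, and the placeholder host and subnet strings are all hypothetical.

```go
package main

import "fmt"

// hostSubnet is a hypothetical, trimmed-down stand-in for the real HostSubnet object.
type hostSubnet struct {
	UID    string
	HostIP string
	Subnet string
}

// toyNode is a hypothetical stand-in for the node's state.
type toyNode struct {
	hostSubnetMap map[string]hostSubnet // keyed by UID, as in the patch
	flowOwners    map[string]string     // HostIP -> UID whose flows are installed (toy OVS state)
}

func newToyNode() *toyNode {
	return &toyNode{
		hostSubnetMap: map[string]hostSubnet{},
		flowOwners:    map[string]string{},
	}
}

// handleAdd records the subnet and installs its flows together,
// so the map and the flow state never diverge.
func (n *toyNode) handleAdd(hs hostSubnet) {
	n.hostSubnetMap[hs.UID] = hs
	n.flowOwners[hs.HostIP] = hs.UID
}

// handleDelete reports whether it removed flows. An event whose UID is
// not in hostSubnetMap is treated as a replay and ignored.
func (n *toyNode) handleDelete(hs hostSubnet) bool {
	if _, exists := n.hostSubnetMap[hs.UID]; !exists {
		return false
	}
	delete(n.hostSubnetMap, hs.UID)
	delete(n.flowOwners, hs.HostIP)
	return true
}

func main() {
	n := newToyNode()
	old := hostSubnet{UID: "uid-1", HostIP: "host-A", Subnet: "subnet-10"}
	recreated := hostSubnet{UID: "uid-2", HostIP: "host-A", Subnet: "subnet-18"}

	n.handleAdd(old)
	fmt.Println("first delete processed:", n.handleDelete(old))
	n.handleAdd(recreated)
	fmt.Println("replayed delete processed:", n.handleDelete(old))
	_, stillThere := n.flowOwners["host-A"]
	fmt.Println("flows for host-A still installed:", stillThere)
}
```

Because the add path updates the map and the flows in the same step, a delete for an unknown UID cannot correspond to flows this node installed, which is the reasoning behind skipping it.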
Agreed. I feel that this fix makes things better. Maybe not ideal, but better than what we see today.
/lgtm
/lgtm though this will be interesting if/when we start supporting multiple hostsubnets per node, right?
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: danwinship, dcbw, knobunc, rajatchopra
/test gcp
/retest Please review the full test history for this PR and help us cut down flakes.
This may have more repercussions. We need to look at how replaying these events affects the other handleDeleteResource() methods. For example, consider handleDeleteNode() (code snippet below). If we get a replay of these node events:
Step 3 will delete the node created in step 2, which is incorrect (these are different nodes with the same name)?
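The snippet and event list referred to above are not reproduced here, so the following is only a toy illustration of the general concern, not the real handleDeleteNode() code. It assumes a sequence of delete, re-create under the same name, then a replayed delete, and compares a handler that matches on name alone with one that also checks the UID. The nodeEvent type and both helper functions are hypothetical.

```go
package main

import "fmt"

// nodeEvent is a hypothetical, minimal event payload.
type nodeEvent struct {
	Name string
	UID  string
}

// deleteByName removes whatever node currently holds ev.Name.
// It reports whether something was removed.
func deleteByName(known map[string]string, ev nodeEvent) bool {
	if _, ok := known[ev.Name]; !ok {
		return false
	}
	delete(known, ev.Name)
	return true
}

// deleteByUID removes the node only if the UID in the event matches
// the UID currently stored under that name.
func deleteByUID(known map[string]string, ev nodeEvent) bool {
	if uid, ok := known[ev.Name]; !ok || uid != ev.UID {
		return false
	}
	delete(known, ev.Name)
	return true
}

func main() {
	oldNode := nodeEvent{Name: "node-1", UID: "uid-1"}

	// State after the old node was deleted and a new node reused the name.
	byName := map[string]string{"node-1": "uid-2"}
	byUID := map[string]string{"node-1": "uid-2"}

	// A replayed delete for the old node arrives.
	fmt.Println("name-only handler removed the new node:", deleteByName(byName, oldNode))
	fmt.Println("UID-checking handler removed the new node:", deleteByUID(byUID, oldNode))
}
```

Whether any handler other than the HostSubnet one actually sees replayed events is a separate question; the sketch only shows why matching on name alone would be fragile if it did.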
Automatic merge from submit-queue (batch tested with PRs 18505, 18617, 18604).
/test all [submit-queue is verifying that this PR is safe to merge]
Yeah, there's more going on here, but I think the question is "why are we getting replayed HostSubnet delete events", not "what happens when other resource types get replayed delete events", because the evidence seems to suggest that they just don't.
In some as-yet-undiagnosed set of circumstances, HostSubnet Deleted events are being replayed:
(Note that when the HostSubnet was recreated, it got a different subnet (18 vs 10), but the second Deleted event references the old subnet again.)
Currently, when we receive the second Deleted event, we don't notice that it's a replay, and just call DeleteHostSubnetRules again. That does:
Since the HostIP is still the same, the first DeleteFlows call ends up deleting the rule that was added by the earlier AddHostSubnetRules call. (The second and third DeleteFlows calls are no-ops, because they're trying to delete flows to the 10 subnet, but the current OVS flows refer to the 18 subnet.) As a result, we end up with no rule to accept VXLAN traffic from that node, even though all the other rules for that node are still present.
Anyway, the fix is easy: don't run DeleteHostSubnetRules if the event doesn't correspond to a currently-known-about HostSubnet.
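To make the failure mode concrete, here is a toy Go model of the sequence described above, run without the guard. It is only a sketch: the flow type, the flowTable map, and the placeholder host and subnet strings are hypothetical stand-ins, not real OVS flow syntax or the actual AddHostSubnetRules/DeleteHostSubnetRules code. The only things taken from the description are that one rule is keyed on the host IP (table 10) and two are keyed on the subnet (tables 50 and 90).

```go
package main

import "fmt"

// flow is a toy OVS flow: a table number plus the single value it matches on.
type flow struct {
	Table int
	Match string
}

// flowTable is a toy stand-in for the set of installed flows.
type flowTable map[flow]bool

// addHostSubnetRules installs one host-keyed rule and two subnet-keyed rules.
func (ft flowTable) addHostSubnetRules(hostIP, subnet string) {
	ft[flow{Table: 10, Match: hostIP}] = true
	ft[flow{Table: 50, Match: subnet}] = true
	ft[flow{Table: 90, Match: subnet}] = true
}

// deleteHostSubnetRules removes the same three rules, with no replay check.
// Deleting a flow that is not present is a no-op.
func (ft flowTable) deleteHostSubnetRules(hostIP, subnet string) {
	delete(ft, flow{Table: 10, Match: hostIP})
	delete(ft, flow{Table: 50, Match: subnet})
	delete(ft, flow{Table: 90, Match: subnet})
}

func main() {
	ft := flowTable{}

	ft.addHostSubnetRules("host-A", "subnet-10")    // original HostSubnet
	ft.deleteHostSubnetRules("host-A", "subnet-10") // first Deleted event
	ft.addHostSubnetRules("host-A", "subnet-18")    // recreated with a new subnet
	ft.deleteHostSubnetRules("host-A", "subnet-10") // replayed Deleted event

	fmt.Println("table 10 rule for host-A present:", ft[flow{Table: 10, Match: "host-A"}])
	fmt.Println("table 50 rule for subnet-18 present:", ft[flow{Table: 50, Match: "subnet-18"}])
	fmt.Println("table 90 rule for subnet-18 present:", ft[flow{Table: 90, Match: "subnet-18"}])
}
```

In this model the replayed delete removes only the host-keyed rule and leaves the two subnet-keyed rules for the recreated subnet in place, which matches the half-deleted state described above.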
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1544903