Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure purging neighbor cache for stale deletes #1433

Merged
merged 1 commit into from
Sep 16, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
36 changes: 23 additions & 13 deletions drivers/overlay/peerdb.go
Original file line number Diff line number Diff line change
Expand Up @@ -168,14 +168,14 @@ func (d *driver) peerDbAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask
}

func (d *driver) peerDbDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMask,
peerMac net.HardwareAddr, vtep net.IP) bool {
peerMac net.HardwareAddr, vtep net.IP) peerEntry {
peerDbWg.Wait()

d.peerDb.Lock()
pMap, ok := d.peerDb.mp[nid]
if !ok {
d.peerDb.Unlock()
return false
return peerEntry{}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrjana The key for peerEntry in peerNetworkMap.mp is just the peer IP & mac

        pKey := peerKey{
                peerIP:  peerIP,
                peerMac: peerMac,
        }

In the case of out of order delete we will end up deleting the entry for the newly created peer entry. I think we need eid as part of the key because it will change in this case where we get an add before the delete of the old ep ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We won't delete it incorrectly. We do check for eid match below and that is needed in the caller as well.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, sorry... its right below.

d.peerDb.Unlock()

Expand All @@ -186,19 +186,20 @@ func (d *driver) peerDbDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPM

pMap.Lock()

if pEntry, ok := pMap.mp[pKey.String()]; ok {
pEntry, ok := pMap.mp[pKey.String()]
if ok {
// Mismatched endpoint ID(possibly outdated). Do not
// delete peerdb
if pEntry.eid != eid {
pMap.Unlock()
return false
return pEntry
}
}

delete(pMap.mp, pKey.String())
pMap.Unlock()

return true
return pEntry
}

func (d *driver) peerDbUpdateSandbox(nid string) {
Expand Down Expand Up @@ -312,10 +313,9 @@ func (d *driver) peerDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMas
return err
}

var pEntry peerEntry
if updateDb {
if !d.peerDbDelete(nid, eid, peerIP, peerIPMask, peerMac, vtep) {
return nil
}
pEntry = d.peerDbDelete(nid, eid, peerIP, peerIPMask, peerMac, vtep)
}

n := d.network(nid)
Expand All @@ -328,14 +328,24 @@ func (d *driver) peerDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMas
return nil
}

// Delete fdb entry to the bridge for the peer mac
if err := sbox.DeleteNeighbor(vtep, peerMac); err != nil {
return fmt.Errorf("could not delete fdb entry into the sandbox: %v", err)
// Delete fdb entry to the bridge for the peer mac only if the
// entry existed in local peerdb. If it is a stale delete
// request, still call DeleteNeighbor but only to cleanup any
// leftover sandbox neighbor cache and not actually delete the
// kernel state.
if (eid == pEntry.eid && vtep.Equal(pEntry.vtep)) ||
(eid != pEntry.eid && !vtep.Equal(pEntry.vtep)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a curiosity, this could be compacted to if (eid == pEntry.eid) == vtep.Equal(pEntry.vtep) { }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be clever but it will be awfully lot difficult to decipher the intent.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently peerDbDelete returns false in the error cases (ie: d.peerDb.mp[nid] not present or peerKey not present). Both are still valid error cases. I think we should retain the boolean return and add peerEntry as additional return type. So the existing !d.peerDbDelete check can be retained in peerDelete and the if check added above is not necessary. Its hard to follow the logic with the current check. We just need the eid == pEntry.eid && vtep.Equal(pEntry.vtep) in the DeleteNeighbor call. wdyt ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though the logic in the PR may be a little hard to follow it is absolutely necessary because even if endpoint Ids did not match we need to remove state in sandbox neighbor case (look for eid != pEntry.eid && !vtep.Equal(pEntry.vtep)) which might have existed from a previous neighbor add that did not get removed because we got another add before the delete.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. yeah, I see why we can't bail out on a false return from peerDbDelete. But I think the if check should also also include eid != pEntry.eid && vtep.Equal(pEntry.vtep) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it should not. If we do we will delete the vtep for the current valid entry which is specifically why we are doing the eid match to avoid a stale(out of order) delete from actually deleting anything. About the only thing that a stale delete should do is remove a stale sandbox neighbor cache.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was thinking of the case where the task gets rescheduled and placed on the same node. But in that case the neighbor entry is actually correct and not stale.

LGTM. Good that you caught this in the testing. It would have been impossible to debug in live setups.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

if err := sbox.DeleteNeighbor(vtep, peerMac,
eid == pEntry.eid && vtep.Equal(pEntry.vtep)); err != nil {
return fmt.Errorf("could not delete fdb entry into the sandbox: %v", err)
}
}

// Delete neighbor entry for the peer IP
if err := sbox.DeleteNeighbor(peerIP, peerMac); err != nil {
return fmt.Errorf("could not delete neighbor entry into the sandbox: %v", err)
if eid == pEntry.eid {
if err := sbox.DeleteNeighbor(peerIP, peerMac, true); err != nil {
return fmt.Errorf("could not delete neighbor entry into the sandbox: %v", err)
}
}

if err := d.checkEncryption(nid, vtep, 0, false, false); err != nil {
Expand Down
52 changes: 28 additions & 24 deletions osl/neigh_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ func (n *networkNamespace) findNeighbor(dstIP net.IP, dstMac net.HardwareAddr) *
return nil
}

func (n *networkNamespace) DeleteNeighbor(dstIP net.IP, dstMac net.HardwareAddr) error {
func (n *networkNamespace) DeleteNeighbor(dstIP net.IP, dstMac net.HardwareAddr, osDelete bool) error {
var (
iface netlink.Link
err error
Expand All @@ -43,42 +43,46 @@ func (n *networkNamespace) DeleteNeighbor(dstIP net.IP, dstMac net.HardwareAddr)
return fmt.Errorf("could not find the neighbor entry to delete")
}

n.Lock()
nlh := n.nlHandle
n.Unlock()
if osDelete {
n.Lock()
nlh := n.nlHandle
n.Unlock()

if nh.linkDst != "" {
iface, err = nlh.LinkByName(nh.linkDst)
if err != nil {
return fmt.Errorf("could not find interface with destination name %s: %v",
nh.linkDst, err)
if nh.linkDst != "" {
iface, err = nlh.LinkByName(nh.linkDst)
if err != nil {
return fmt.Errorf("could not find interface with destination name %s: %v",
nh.linkDst, err)
}
}
}

nlnh := &netlink.Neigh{
IP: dstIP,
State: netlink.NUD_PERMANENT,
Family: nh.family,
}
nlnh := &netlink.Neigh{
IP: dstIP,
State: netlink.NUD_PERMANENT,
Family: nh.family,
}

if nlnh.Family > 0 {
nlnh.HardwareAddr = dstMac
nlnh.Flags = netlink.NTF_SELF
}
if nlnh.Family > 0 {
nlnh.HardwareAddr = dstMac
nlnh.Flags = netlink.NTF_SELF
}

if nh.linkDst != "" {
nlnh.LinkIndex = iface.Attrs().Index
}
if nh.linkDst != "" {
nlnh.LinkIndex = iface.Attrs().Index
}

if err := nlh.NeighDel(nlnh); err != nil {
return fmt.Errorf("could not delete neighbor entry: %v", err)
if err := nlh.NeighDel(nlnh); err != nil {
return fmt.Errorf("could not delete neighbor entry: %v", err)
}
}

n.Lock()
for i, nh := range n.neighbors {
if nh.dstIP.Equal(dstIP) && bytes.Equal(nh.dstMac, dstMac) {
n.neighbors = append(n.neighbors[:i], n.neighbors[i+1:]...)
}
}
n.Unlock()

return nil
}
Expand Down
2 changes: 1 addition & 1 deletion osl/sandbox.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ type Sandbox interface {
AddNeighbor(dstIP net.IP, dstMac net.HardwareAddr, option ...NeighOption) error

// DeleteNeighbor deletes neighbor entry from the sandbox.
DeleteNeighbor(dstIP net.IP, dstMac net.HardwareAddr) error
DeleteNeighbor(dstIP net.IP, dstMac net.HardwareAddr, osDelete bool) error

// Returns an interface with methods to set neighbor options.
NeighborOptions() NeighborOptionSetter
Expand Down