Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added code to avoid aggresive update due to rolling update issue on any node. #162

Merged
merged 2 commits into from Mar 5, 2019

Conversation

Projects
None yet
3 participants
@tathagatachowdhury
Copy link
Contributor

tathagatachowdhury commented Mar 4, 2019

This is a bug fix to check BGP readiness and avoid aggressive update if during rolling update there is any issue to update a particular node.

Fix issue in BIRD readiness reporting which can result in multiple nodes restarting at once
@fasaxc
Copy link
Member

fasaxc left a comment

Just a few minor comments. I do think you should write a UT for the new logic, which will be easiest if you move it to a new utility function that doesn't do any IO of its own.

if peer.BGPState != "Established" {
if peer.BGPState == "Established" {
numEstablishedPeer += 1
} else if peer.BGPState != "Established" {

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

The if peer.BGPState != "Established" is redundant here since the only way to reach the else block is if peer.BGPState != "Established"


// Check for GR
gr, err := bird.GRInProgress(ipv)
fmt.Printf("Number of nodes with BGP peering established = %v", numEstablishedPeer)

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

Should use log.Debugf or log.Infof, I think; that seems to be consistent with what the rest of this code does.

This comment has been minimized.

@tathagatachowdhury

tathagatachowdhury Mar 5, 2019

Author Contributor

Acknowledged and updated.

return err
} else if gr {
return errors.New("graceful restart in progress")
return fmt.Errorf("Error in nodename file: %v", err)

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

How about "Failed to stat nodename file"

This comment has been minimized.

@tathagatachowdhury

tathagatachowdhury Mar 5, 2019

Author Contributor

Acknowledged and updated.

}

if time.Since(nodenameFileStat.ModTime()) < thresholdTime {
if len(s) > 0 {

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

Think this needs a comment to explain why we're doing a different check here.

// When we first start up, only report ready if all our peerings are established.
// This prevents rolling update from proceeding until BGP is back up.

This comment has been minimized.

@tathagatachowdhury

tathagatachowdhury Mar 5, 2019

Author Contributor

Acknowledged and added.

return errors.New("graceful restart in progress")
}
} else if numEstablishedPeer > 0 {
log.Debugf("There exist(s) %v calico node which has BGP peering established.", numEstablishedPeer)

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

Same here:

// After a while, only require a single peering to be up.  This prevents the whole mesh
// from reporting not-ready if some nodes go down.

This comment has been minimized.

@tathagatachowdhury

tathagatachowdhury Mar 5, 2019

Author Contributor

Acknowledged and added.

@@ -44,6 +45,9 @@ var birdReady = flagSet.Bool("bird-ready", false, "Run BIRD readiness checks")
var bird6Ready = flagSet.Bool("bird6-ready", false, "Run BIRD6 readiness checks")
var felixReady = flagSet.Bool("felix-ready", false, "Run felix readiness checks")

// thresholdTime is introduced for bird readiness check. Default value is 30 sec.
var thresholdTime = flagSet.Duration("thresholdTime", 30*time.Second, "Threshold time for bird readiness")

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

For consistency with other params above, "thresholdTime" should be "threshhold-time"

This comment has been minimized.

@tathagatachowdhury

tathagatachowdhury Mar 5, 2019

Author Contributor

Acknowledged and updated.

// Check for unestablished peers
peers, err := bird.GetPeers(ipv)
log.Debugf("peers: %v", peers)
if err != nil {
return err
}

// numEstablishedPeer keeps count of number of peers with bgp state established

This comment has been minimized.

@fasaxc

fasaxc Mar 4, 2019

Member

How about moving the Stat() up to this point itn he function, then you can move the rest of the function to a new utility function, which can be easily UTed.

I.e. the new utility function should take arguments for the BGP peers and the node startup time.

@fasaxc

fasaxc approved these changes Mar 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.