New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retrieve HAProxy members using KUBE-API #52
Retrieve HAProxy members using KUBE-API #52
Conversation
This PR should be tested together with openshift/machine-config-operator#1574 MCO PR |
e1c6698
to
ebb07b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lgtm. I'm going to hold off on giving it the label so Toni has a chance to look again if he wants, but I think his comments have all been addressed.
I left a couple of comments inline, but they're nothing I would block this over.
@@ -27,7 +26,7 @@ type RuntimeConfig struct { | |||
LBConfig *config.ApiLBConfig | |||
} | |||
|
|||
func Monitor(clusterName, clusterDomain, templatePath, cfgPath, apiVip string, apiPort, lbPort, statPort uint16, interval time.Duration) error { | |||
func Monitor(kubeconfigPath, clusterName, clusterDomain, templatePath, cfgPath, apiVip string, apiPort, lbPort, statPort uint16, interval time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it kind of sucks that this has to be a public member of the package just so the cli command can call it. Technically this is an API-breaking change.
I guess it doesn't matter since I doubt anyone else calls it, but it offends the API designer in me. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, it's an API change, but we should be safe with this change since we have only internal consumers (well it's a single consumer: cmd/monitor) and we updated both consumer and provider.
log.WithFields(logrus.Fields{ | ||
"kubeconfigPath": kubeconfigPath, | ||
}).Info("GetLBConfig failed, sleep half of interval and retry") | ||
time.Sleep(interval / 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for half the interval?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that haproxy-moniotr container exits (and restarted by kubelet) , the root cause was 'GetLBConfig' failed to read nodes via OCP API ( error was something like 'etcd leader changed').
So, since we implemented a de-bounce threshold in the monitor code, I think we should ignore this poll instead of exiting, and also it make sense to start the next poll faster. I set it to half the default interval, but I'm totally open for other ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I understand why a retry is needed here, I just wasn't sure why we needed a shorter interval. I'm not concerned about it, but I wanted to see if I was missing something.
/lgtm |
Update HAProxy monitor to retrieve LB backends details using KUBE-API instead of _etcd-server-ssl._tcp SRV record.
ebb07b8
to
d8dfe19
Compare
/hold |
/retest |
I checked it locally for IPV4 deployment - looks OK, but tests are still red for some reason (go vet test is red although it looks fine in my env) , Hmmmm |
I can see that tests fail also for other PRs, so I guess it isn't my fault :-) |
/hold cancel |
/retest |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: celebdor, cybertron, yboaron The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The utils.GetEtcdSRVMembers() is no longer used after openshift#52 Also removes the `chk_dns` script from the test keepalived config to remove all trace of etcd SRV records in baremetal-runtimecfg.
Update HAProxy monitor to retrieve LB backends details using
KUBE-API instead of _etcd-server-ssl._tcp SRV record.