Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrieve HAProxy members using KUBE-API #52

Merged
merged 1 commit into from Apr 17, 2020

Conversation

yboaron
Copy link
Contributor

@yboaron yboaron commented Mar 22, 2020

Update HAProxy monitor to retrieve LB backends details using
KUBE-API instead of _etcd-server-ssl._tcp SRV record.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 22, 2020
@yboaron
Copy link
Contributor Author

yboaron commented Mar 22, 2020

This PR should be tested together with openshift/machine-config-operator#1574 MCO PR

@yboaron
Copy link
Contributor Author

yboaron commented Mar 22, 2020

/cc @bcrochet @celebdor @cybertron

pkg/config/node.go Show resolved Hide resolved
pkg/config/node.go Show resolved Hide resolved
pkg/config/node.go Show resolved Hide resolved
@yboaron yboaron force-pushed the haproxy_kubeapi branch 2 times, most recently from e1c6698 to ebb07b8 Compare March 25, 2020 17:00
Copy link
Member

@cybertron cybertron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm. I'm going to hold off on giving it the label so Toni has a chance to look again if he wants, but I think his comments have all been addressed.

I left a couple of comments inline, but they're nothing I would block this over.

@@ -27,7 +26,7 @@ type RuntimeConfig struct {
LBConfig *config.ApiLBConfig
}

func Monitor(clusterName, clusterDomain, templatePath, cfgPath, apiVip string, apiPort, lbPort, statPort uint16, interval time.Duration) error {
func Monitor(kubeconfigPath, clusterName, clusterDomain, templatePath, cfgPath, apiVip string, apiPort, lbPort, statPort uint16, interval time.Duration) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it kind of sucks that this has to be a public member of the package just so the cli command can call it. Technically this is an API-breaking change.

I guess it doesn't matter since I doubt anyone else calls it, but it offends the API designer in me. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's an API change, but we should be safe with this change since we have only internal consumers (well it's a single consumer: cmd/monitor) and we updated both consumer and provider.

log.WithFields(logrus.Fields{
"kubeconfigPath": kubeconfigPath,
}).Info("GetLBConfig failed, sleep half of interval and retry")
time.Sleep(interval / 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for half the interval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that haproxy-moniotr container exits (and restarted by kubelet) , the root cause was 'GetLBConfig' failed to read nodes via OCP API ( error was something like 'etcd leader changed').
So, since we implemented a de-bounce threshold in the monitor code, I think we should ignore this poll instead of exiting, and also it make sense to start the next poll faster. I set it to half the default interval, but I'm totally open for other ideas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand why a retry is needed here, I just wasn't sure why we needed a shorter interval. I'm not concerned about it, but I wanted to see if I was missing something.

@celebdor
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 16, 2020
Update HAProxy monitor to retrieve LB backends details using
KUBE-API instead of  _etcd-server-ssl._tcp SRV record.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 16, 2020
@yboaron
Copy link
Contributor Author

yboaron commented Apr 16, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2020
@yboaron
Copy link
Contributor Author

yboaron commented Apr 16, 2020

/retest

@yboaron
Copy link
Contributor Author

yboaron commented Apr 16, 2020

I checked it locally for IPV4 deployment - looks OK, but tests are still red for some reason (go vet test is red although it looks fine in my env) , Hmmmm

@yboaron
Copy link
Contributor Author

yboaron commented Apr 16, 2020

I can see that tests fail also for other PRs, so I guess it isn't my fault :-)

@yboaron
Copy link
Contributor Author

yboaron commented Apr 16, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2020
@yboaron
Copy link
Contributor Author

yboaron commented Apr 17, 2020

/retest

@celebdor
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: celebdor, cybertron, yboaron

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [celebdor,cybertron,yboaron]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 5982ddb into openshift:master Apr 17, 2020
mandre added a commit to mandre/baremetal-runtimecfg that referenced this pull request Apr 20, 2020
The utils.GetEtcdSRVMembers() is no longer used after
openshift#52

Also removes the `chk_dns` script from the test keepalived config to
remove all trace of etcd SRV records in baremetal-runtimecfg.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants