Bug 1866347: monitoring keepalive wrong format #89

Conversation

YuviGold
Contributor

@YuviGold YuviGold commented Aug 17, 2020

  • Receive the MAC address from the monitoring configuration file, which is in YAML.
    When multiple assisted installers are deployed with the same cluster name and the user does not provide the VIP addresses, they end up competing for the same DHCP-leased address and conflicting.
    The format should be as follows:
    - name: api
      mac-address: ae:3e:38:f5:f8:15
      ip-address: 1.2.3.4
    - name: ingress
      mac-address: 80:32:53:4f:cf:d6
      ip-address: 1.2.3.5
  • Watch the lease file with an inotify implementation, https://github.com/fsnotify/fsnotify, in order to verify that dhclient renewed the existing lease instead of creating a new one.
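
For illustration, the file format described above could be read roughly as follows. This is a minimal sketch that uses only the Go standard library; `vipEntry` and `parseVIPEntries` are hypothetical names, and the line-based parsing handles only the flat list shown in the description, so a real implementation would more likely rely on a YAML library.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// vipEntry is a hypothetical type mirroring one list item of the format.
type vipEntry struct {
	Name string
	MAC  net.HardwareAddr
	IP   net.IP
}

// parseVIPEntries reads the flat "- name: ..." list from the description.
// It is not a general YAML parser.
func parseVIPEntries(text string) ([]vipEntry, error) {
	var entries []vipEntry
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		if strings.HasPrefix(line, "- ") {
			// A leading dash starts a new list item.
			entries = append(entries, vipEntry{})
			line = strings.TrimSpace(strings.TrimPrefix(line, "- "))
		}
		if len(entries) == 0 {
			return nil, fmt.Errorf("field outside of a list item: %q", line)
		}
		key, value, found := strings.Cut(line, ": ")
		if !found {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		value = strings.TrimSpace(value)
		cur := &entries[len(entries)-1]
		switch key {
		case "name":
			cur.Name = value
		case "mac-address":
			mac, err := net.ParseMAC(value)
			if err != nil {
				return nil, err
			}
			cur.MAC = mac
		case "ip-address":
			ip := net.ParseIP(value)
			if ip == nil {
				return nil, fmt.Errorf("invalid IP address: %q", value)
			}
			cur.IP = ip
		default:
			return nil, fmt.Errorf("unknown key: %q", key)
		}
	}
	return entries, scanner.Err()
}

func main() {
	const cfg = `- name: api
  mac-address: ae:3e:38:f5:f8:15
  ip-address: 1.2.3.4
- name: ingress
  mac-address: 80:32:53:4f:cf:d6
  ip-address: 1.2.3.5
`
	entries, err := parseVIPEntries(cfg)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s %s %s\n", e.Name, e.MAC, e.IP)
	}
}
```

Validating the MAC and IP while parsing means a malformed entry is rejected before any DHCP request is attempted.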

@cybertron
Member

Can you provide more details in the description on why these changes are being made?

@YuviGold
Contributor Author

YuviGold commented Aug 17, 2020

> Can you provide more details in the description on why these changes are being made?

@cybertron Added. The full description can be found in Bugzilla.

@romfreiman

I think it's worth mentioning the format of the expected file in the PR.

pkg/monitor/lease.go (outdated, resolved)
  cmd := exec.Command("dhclient", "-d", "--no-pid", "-sf", "/bin/true",
- 	"-lf", getLeaseFile(cfgPath, name), "-v", iface.Name, "-H", name)
+ 	"-lf", lease_file, "-v", iface.Name, "-H", name)

We are watching the file, but we will not know if the dhclient terminated for some reason. Shouldn't we monitor the process as well and maybe restart if it terminates?

pkg/monitor/lease.go (outdated, resolved)
@cybertron
Member

/retitle Bug 1866347: monitoring keepalive wrong format

@openshift-ci-robot openshift-ci-robot changed the title from "Bugfix/ocpbugsm 13811 monitoring keepalive wrong format" to "Bug 1866347: monitoring keepalive wrong format" Aug 18, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Aug 18, 2020
@openshift-ci-robot
Contributor

@YuviGold: This pull request references Bugzilla bug 1866347, which is invalid:

  • expected the bug to target the "4.6.0" release, but it targets "4.7.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1866347: monitoring keepalive wrong format

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Aug 18, 2020
@cybertron
Member

Is this even a valid use case? Duplicate cluster names in a single DHCP domain are going to cause other issues besides this. Shouldn't we avoid that in the first place?

@YuviGold YuviGold force-pushed the bugfix/OCPBUGSM-13811-monitoring-keepalive-wrong-format branch from f89ddd4 to 870cd2e Compare August 19, 2020 10:06
@YuviGold
Contributor Author

YuviGold commented Aug 19, 2020

> Is this even a valid use case? Duplicate cluster names in a single DHCP domain are going to cause other issues besides this. Shouldn't we avoid that in the first place?

@cybertron
In assisted-service, users can create multiple clusters, and we do not know which of them will be connected to the same DHCP server, so it would be difficult to make sure that we don't have duplicate names. Hence we use the clusterID.

@YuviGold YuviGold force-pushed the bugfix/OCPBUGSM-13811-monitoring-keepalive-wrong-format branch 3 times, most recently from 24c7c93 to 97aeb2b Compare August 19, 2020 12:19
@bcrochet
Member

Did all of the new vendoring really need to happen? Is that central to the fix?

@cybertron
Member

/retest

Failed on an infra issue.

Member

@cybertron cybertron left a comment

Okay, if I'm understanding this correctly then there are two primary changes here:
- Read a passed-in MAC instead of generating one. I guess this pushes the responsibility for generating the MAC to someone else, so fair enough.
- Parse the lease file to determine whether there is a conflict with another cluster. I guess this is fine too. I'm still not convinced two clusters with the same name on the same network is going to work, but that's someone else's problem. :-)

I do see a few issues which I've left comments on inline. The log ones are mostly nitpicks that could be addressed in followups, but I'd like fixes or explanations for the others.

pkg/monitor/lease.go (resolved)
pkg/monitor/lease.go (outdated, resolved)
  }

- func leaseVIPs(cfgPath string, clusterName string, vips []VIP) error {
+ func LeaseVIPs(log logrus.FieldLogger, cfgPath string, clusterName string, vipMasterIface string, vips []vip, runInfinite bool) error {
Member

Why are we passing a logger here? Can't this use the same one as the rest of the package?

Contributor Author

It can, and it actually does use the same one when it is called from dynkeepalived.go:

if err = LeaseVIPs(log, cfgPath, vipIface.Name, []vip{*vips.APIVip, *vips.IngressVip}, true); err != nil {

We want to use the leasing functions from the monitor package in the assisted-service as well.
The monitor package defines its own logger via var log = logrus.New().
If another package with its own logger wants to use these functions (or if the logger needs to be manipulated in tests), that would be impossible without a log parameter.
Hence a log parameter is more flexible.

Member

In the future, when adding a new use case for code like this it would be a good thing to highlight in the PR description. That's important context for reviewers.

Member

Personally, I would add new parameters to the end of the signature, not the beginning. Also, I'm not a huge fan of bare booleans. There is no context at the caller as to what a magic "true" or "false" might mean.

Contributor Author

> In the future, when adding a new use case for code like this it would be a good thing to highlight in the PR description. That's important context for reviewers.

@cybertron Agreed.

> Personally, I would add new parameters to the end of the signature, not the beginning. Also, I'm not a huge fan of bare booleans. There is no context at the caller as to what a magic "true" or "false" might mean.

@bcrochet Seems fair. The boolean parameter has been removed anyway. But how would you implement that? Enums?
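
One common Go idiom for this is a small named type with constants. This is only a sketch of the idea; `RunMode`, `RunOnce`, `RunForever`, and the simplified `leaseVIPs` signature are hypothetical and not part of this PR.

```go
package main

import "fmt"

// RunMode replaces a bare boolean so that call sites are self-describing.
type RunMode int

const (
	RunOnce    RunMode = iota // lease once and return
	RunForever                // keep renewing until stopped
)

// String makes the mode readable in logs.
func (m RunMode) String() string {
	switch m {
	case RunOnce:
		return "RunOnce"
	case RunForever:
		return "RunForever"
	default:
		return fmt.Sprintf("RunMode(%d)", int(m))
	}
}

// leaseVIPs is a simplified stand-in that only reports what it would do.
func leaseVIPs(clusterName string, mode RunMode) string {
	return fmt.Sprintf("leasing VIPs for %s in mode %s", clusterName, mode)
}

func main() {
	// Compare with leaseVIPs("demo", true): the intent is visible here.
	fmt.Println(leaseVIPs("demo", RunForever))
}
```

The named constants give the caller the context that a bare true or false lacks, and the compiler rejects a plain boolean passed by mistake.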

// --no-pid in order to allow running multiple dhclients
commandArgs := []string{"-v", iface.Name, "-H", name,
"-sf", "/bin/true", "-lf", leaseFile, "-d",
"--no-pid", "-pf", fmt.Sprintf("/var/run/dhclient.%s.pid", iface.Name)}
Member

It doesn't make sense to pass both --no-pid and -pf. I'd be surprised if this does what you want.

Contributor Author

  • --no-pid, in order to allow running multiple dhclient instances simultaneously
  • -pf, to allow killing the process

I couldn't make it work without both of them.

pkg/monitor/lease.go (outdated, resolved)
pkg/monitor/lease.go (outdated, resolved)
write := make(chan error)
defer close(write)

RunFiniteWatcher(log, watcher, leaseFile, iface.Name, ip, write)
@ori-amizur ori-amizur Aug 19, 2020

I think that if you run it with the -1 flag, it will run once and then terminate. After that, the lease file can be checked. That seems like a simpler flow, with no need for a goroutine, a channel, or inotify.

Contributor Author

It could be much simpler, but unfortunately I couldn't make it work.
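
Whichever way dhclient is run, the final check amounts to comparing the latest lease against the expected address. Below is a minimal sketch of that check, assuming the usual dhclient lease file layout with `fixed-address` statements and assuming that the last lease block is the most recent one; `lastLeasedIP` and `leaseMatches` are hypothetical helpers, not the functions in this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// lastLeasedIP returns the address of the last "fixed-address" statement in
// the given lease file contents, or nil if there is none.
func lastLeasedIP(leases string) net.IP {
	var last net.IP
	scanner := bufio.NewScanner(strings.NewReader(leases))
	for scanner.Scan() {
		line := strings.TrimSuffix(strings.TrimSpace(scanner.Text()), ";")
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[0] == "fixed-address" {
			if ip := net.ParseIP(fields[1]); ip != nil {
				last = ip
			}
		}
	}
	return last
}

// leaseMatches reports whether the most recent lease is for the expected IP.
func leaseMatches(leases, expected string) bool {
	want := net.ParseIP(expected)
	got := lastLeasedIP(leases)
	return want != nil && got != nil && got.Equal(want)
}

func main() {
	const leases = `lease {
  interface "eth0";
  fixed-address 1.2.3.4;
}
lease {
  interface "eth0";
  fixed-address 1.2.3.9;
}
`
	fmt.Println(leaseMatches(leases, "1.2.3.4"))
	fmt.Println(leaseMatches(leases, "1.2.3.9"))
}
```

A mismatch between the expected and the leased address would indicate that a new lease was created instead of the existing one being renewed.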

@YuviGold YuviGold force-pushed the bugfix/OCPBUGSM-13811-monitoring-keepalive-wrong-format branch from 97aeb2b to f036ccf Compare August 20, 2020 12:35
func CreateFileWatcher(log logrus.FieldLogger, fileName string) (*fsnotify.Watcher, error) {
watcher, err := fsnotify.NewWatcher()
if err != nil {
log.WithError(err).Error("Failed to add a create a new watcher")
Member

Nit: I think "add a" should be removed from this log message.

Member

@cybertron cybertron left a comment

/lgtm

One nit on a log message inline, but that can be fixed in a followup.

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 20, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cybertron, YuviGold

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2020
@cybertron
Member

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Aug 20, 2020
@openshift-ci-robot
Contributor

@cybertron: This pull request references Bugzilla bug 1866347, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh


@openshift-merge-robot openshift-merge-robot merged commit b2b74d7 into openshift:master Aug 20, 2020
@openshift-ci-robot
Contributor

@YuviGold: All pull requests linked via external trackers have merged: openshift/baremetal-runtimecfg#89. Bugzilla bug 1866347 has been moved to the MODIFIED state.

In response to this:

Bug 1866347: monitoring keepalive wrong format


@YuviGold YuviGold deleted the bugfix/OCPBUGSM-13811-monitoring-keepalive-wrong-format branch August 21, 2020 09:33
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

7 participants