Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1847705: Stop forcing dhclient in baremetal and friends #1840

Merged
merged 1 commit into from Jul 7, 2020

Conversation

cybertron
Copy link
Member

In 4.6 the RHCOS image no longer has dhclient. In order to maintain
the same behavior for DHCPv6 it is also necessary to set the
ipv6.dhcp-iaid option to mac (which is what dhclient always uses).

Note that I applied the same config for ovirt even though it was
only setting dhcp=dhclient before. I think making all of these
configs consistent is the right thing to do, even if a platform
doesn't care about IPv6.

Also, there was previously a dhclient.conf file that overrode the
search domain. Since we moved the resolv prepender to a dispatcher
script I don't think we need that setting anymore. The prepender
script already overrides the search domain. It looks like dhclient.conf
was also being used to prepend the DNS VIP on at least one platform.
I've still removed that file because I believe it has no effect on
the 4.6 images so if that breaks anything it will need to be fixed
separately anyway.

- What I did

- How to verify it

- Description for the changelog

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Jun 17, 2020
@openshift-ci-robot
Copy link
Contributor

@cybertron: This pull request references Bugzilla bug 1847705, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1847705: Stop forcing dhclient in baremetal and friends

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cybertron
Copy link
Member Author

Note that my baremetal deployments are still failing for other reasons, but this seems to have gotten DHCP working correctly.

/cc @mandre @rgolangh @patrickdillon

I doubt this will cause any new breakage since dhclient is not on the 4.6 images anyway, but please verify that this won't mess up your platforms.

@cgwalters
Copy link
Member

Hmm. But there's something tricky going on here because the Ignition here can only affect the real root. So what's happening in the initramfs isn't being configured here.

If this somehow fixes things then we have a drift between the initramfs and the real root.

BTW this is tangentially related to coreos/fedora-coreos-tracker#513 I think.

@cgwalters
Copy link
Member

Did you test this with the current scenario of pivoting from 4.5, or with openshift/installer#3763 ?

@cybertron
Copy link
Member Author

Yeah, this still gets the wrong address initially, but at least it gets the right one in the end so the deployment works. I'm not entirely sure that's a change in behavior since previously we'd have had the same issue with the duid. I'll have to look at a 4.5 deployment to see how that behaved.

So far I've only tested with the installer patch to use the 4.6 image. I can try it without, although we might be able to tell how it worked from the ci job. It won't pass, but it might get far enough to see how DHCP worked.

@hardys
Copy link
Contributor

hardys commented Jun 18, 2020

Tested locally and I can confirm this resolves the DHCP issue we were seeing with dev-scripts and ipv6

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2020
@hardys
Copy link
Contributor

hardys commented Jun 18, 2020

Did you test this with the current scenario of pivoting from 4.5, or with openshift/installer#3763 ?

I just tested both with and without openshift/installer#3763 - in both cases the masters now get their expected DHCPv6 reservations so I think this part of the fix is working OK.

@hardys
Copy link
Contributor

hardys commented Jun 18, 2020

Also looking at the e2e-metal-ipi CI output (libvirt serial console output for each master VM we see:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_machine-config-operator/1840/pull-ci-openshift-machine-config-operator-master-e2e-metal-ipi/1273351120147714048/artifacts/e2e-metal-ipi/baremetalds-devscripts-gather/

$ cat ostest_master_0-serial0.log | grep ^enp2s0 | uniq
enp2s0: fd2e:6f44:5dd8:c956::14
enp2s0: fd2e:6f44:5dd8:c956::4
$ cat ostest_master_1-serial0.log | grep ^enp2s0 | uniq
enp2s0: fd2e:6f44:5dd8:c956::15
$ cat ostest_master_2-serial0.log | grep ^enp2s0 | uniq
enp2s0: fd2e:6f44:5dd8:c956::16

This again shows the expected IPs, although I'm a little surprised to see the ingressVIP show up on master-0 - this is a deployment with 2 workers so the ingress VIP should only ever point to workers I think? /cc @yboaron @celebdor - that's unrelated to this fix though, it may be a separate issue we need to address.

@stbenjam
Copy link
Member

This again shows the expected IPs, although I'm a little surprised to see the ingressVIP show up on master-0 - this is a deployment with 2 workers so the ingress VIP should only ever point to workers I think? /cc @yboaron @celebdor - that's unrelated to this fix though, it may be a separate issue we need to address.

Did the build you're using have #1817? For a long time baremetal masters have been scheduable regardless of # of workers due to the old kubelet hack still being there.

@hardys
Copy link
Contributor

hardys commented Jun 18, 2020

This again shows the expected IPs, although I'm a little surprised to see the ingressVIP show up on master-0 - this is a deployment with 2 workers so the ingress VIP should only ever point to workers I think? /cc @yboaron @celebdor - that's unrelated to this fix though, it may be a separate issue we need to address.

Did the build you're using have #1817? For a long time baremetal masters have been scheduable regardless of # of workers due to the old kubelet hack still being there.

Yes I just fetched this PR branch which includes that commit, so I guess there must still be something that causes the ingressVIP to end up on a master - not yet looked into it in detail

Edit actually the results were from the CI run - there it's NUM_WORKERS=2 also AFAICS

@celebdor
Copy link
Contributor

This again shows the expected IPs, although I'm a little surprised to see the ingressVIP show up on master-0 - this is a deployment with 2 workers so the ingress VIP should only ever point to workers I think? /cc @yboaron @celebdor - that's unrelated to this fix though, it may be a separate issue we need to address.

Did the build you're using have #1817? For a long time baremetal masters have been scheduable regardless of # of workers due to the old kubelet hack still being there.

Yes I just fetched this PR branch which includes that commit, so I guess there must still be something that causes the ingressVIP to end up on a master - not yet looked into it in detail

Edit actually the results were from the CI run - there it's NUM_WORKERS=2 also AFAICS

Was ingress responsive?

@celebdor
Copy link
Contributor

/lgtm

In 4.6 the RHCOS image no longer has dhclient. In order to maintain
the same behavior for DHCPv6 it is also necessary to set the
ipv6.dhcp-iaid option to mac (which is what dhclient always uses).

Note that I applied the same config for ovirt even though it was
only setting dhcp=dhclient before. I think making all of these
configs consistent is the right thing to do, even if a platform
doesn't care about IPv6.

Also, there was previously a dhclient.conf file that overrode the
search domain. Since we moved the resolv prepender to a dispatcher
script I don't think we need that setting anymore. The prepender
script already overrides the search domain. It looks like dhclient.conf
was also being used to prepend the DNS VIP on at least one platform.
I've still removed that file because I believe it has no effect on
the 4.6 images so if that breaks anything it will need to be fixed
separately anyway.
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jun 18, 2020
@cybertron
Copy link
Member Author

Just rebased. Hopefully this will pass metal ci now.

@hardys
Copy link
Contributor

hardys commented Jun 18, 2020

/test e2e-metal-ipi

@stbenjam
Copy link
Member

Is metal-ipi expected to pass? I thought we still had OVN issues

@cybertron
Copy link
Member Author

#1830 was the fix for the OVN issue. The rebase was to pull that in so it should pass now.

@hardys
Copy link
Contributor

hardys commented Jun 19, 2020

/lgtm

@runcom
Copy link
Member

runcom commented Jul 6, 2020

/approve

1 similar comment
@kikisdeliveryservice
Copy link
Contributor

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: celebdor, cybertron, hardys, kikisdeliveryservice, runcom

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,runcom]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

21 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 7, 2020

@cybertron: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-ovn-step-registry e418b1f link /test e2e-ovn-step-registry
ci/prow/e2e-aws-scaleup-rhel7 e418b1f link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-vsphere e418b1f link /test e2e-vsphere

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit aabf4ed into openshift:master Jul 7, 2020
@openshift-ci-robot
Copy link
Contributor

@cybertron: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1840, openshift/installer#3763. Bugzilla bug 1847705 has been moved to the MODIFIED state.

In response to this:

Bug 1847705: Stop forcing dhclient in baremetal and friends

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet