New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug 1907872: Make dual stack bootstrapping more reliable #532
Conversation
@ironcladlou: This pull request references Bugzilla bug 1907872, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
clusterNetwork: | ||
- cidr: 10.128.0.0/14 | ||
hostPrefix: 23 | ||
machineCIDR: 10.0.0.0/16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No action required) What is the relationship between networking.machineCIDR
and networking.machineNetwork[x].cidr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the former is deprecated and used only in a fallback case
pkg/cmd/render/render_test.go
Outdated
|
||
for name, test := range tests { | ||
t.Logf("evaluating test %q", name) | ||
var installConfig map[string]interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(No action required) Huh, so rendering forces us to relinquish type safety? I would have thought the struct would be vendored into o/api for reuse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The install config isn't public API, and vendoring the installer to get the type drags in a huge transitive dependency tree. In other projects I've worked on which consume the install config, I've created a local struct for the subset of installconfig I wanted to deserialize to get some type safety, we could do that here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need for a huge transitive dependency tree. That's what go mod is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate? Maybe I'm misunderstanding.
Here's where the InstallConfig
struct is defined: https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go
Since that package isn't itself a nested module, getting the type in the etcd-operator tree requires (I believe) declaring a dependency on github.com/openshift/installer
(the module root). By doing that, and running go mod vendor
, I run into problems immediately as various dependencies I don't want fail to resolve for one reason or another:
go: github.com/openshift/installer@v0.9.0-master.0.20210120203138-27fd27be1a03 requires
github.com/kubevirt/terraform-provider-kubevirt@v0.0.0-00010101000000-000000000000: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*
I haven't debugged further to get to a point where go mod graph
will succeed, but even if go mod is smart enough to drag in only the dependencies rooted from that sub package in this case (not at all obvious to me), it looks like a lot (off the bat you can see that some field of InstallConfig is going to drag in terraform provider code).
Seems like a pretty deep rabbit hole to explore versus re-declaring a small subset of the struct or using interface{}
.
I wish this were easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's preventing the installer from moving the struct definition to a location that doesn't involve ancillary dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can think of several possible reasons, but a fundamental one is that (I think) the type is considered internal. And now that I've said it, I can imagine we'll be asked a question I actually can't answer yet: can we replace our usage of the internal type with a public API? I'm looking around on a bootstrap node right now to inspect rendered manifests during bootstrapping, and can't yet identify a public OpenShift config API which specifies the machine networks that come in through install config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danwinship do you have any idea if there's some API resource I'm missing we could be using instead of the install config to get the machine network CIDRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this is an overarching concern, shouldn't block merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ironcladlou I don't know; team-sdn doesn't own/use machineNetwork
(it's the odd man out in the networking
stanza of install-config).
I'd expect it to get copied into something machine-api related, but I'm not seeing it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still have the infra to do staging? The installer could move the install-config API to a staging directory and then let it get mirrored to an "install-api" repo for CEO (and other early-install-time operators?) to consume.
@danwinship PTAL |
overall LGTM I would like Networking to sign off this works as expected. /assign @danwinship |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/hold
non-urgent comments below; cancel either the lgtm or the hold
52ba43c
to
107e25e
Compare
/hold cancel |
/retest Please review the full test history for this PR and help us cut down flakes. |
2 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
Due to a parsing bug, the bootstrap rendering logic fails to detect a usable machine network CIDR when using IPv6 dual stack mode unless the IPv4 CIDR is the first element in the install config's machineNetwork array, which is brittle. This patch fixes the parsing mistake so that the IPv4 address can be located amongst the machineNetwork CIDRs in dual stack mode. Although ideally etcd would detect and bind to the "preferred" family and that family would be again located here during bootstrap CIDR detection, such a change would be more invasive. The scope of this fix is to get dual stack reliably working even if it means using IPv4 inconsistently during bootstrapping when IPv6 would be more consistent.
107e25e
to
7a0f166
Compare
Sorry, I used the wrong yaml package and inadvertently changed the module definition. I fixed the import to resolve the module errors. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, hexfusion, ironcladlou The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest Please review the full test history for this PR and help us cut down flakes. |
7 similar comments
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
/retest Please review the full test history for this PR and help us cut down flakes. |
@ironcladlou: All pull requests linked via external trackers have merged: Bugzilla bug 1907872 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Due to a parsing bug, the bootstrap rendering logic fails to detect a usable
machine network CIDR when using IPv6 dual stack mode unless the IPv4 CIDR is the
first element in the install config's machineNetwork array, which is brittle.
This patch fixes the parsing mistake so that the IPv4 address can be located
amongst the machineNetwork CIDRs in dual stack mode.
Although ideally etcd would detect and bind to the "preferred" family and that
family would be again located here during bootstrap CIDR detection, such a
change would be more invasive. The scope of this fix is to get dual stack
reliably working even if it means using IPv4 inconsistently during bootstrapping
when IPv6 would be more consistent.