Bug 1907872: Make dual stack bootstrapping more reliable #532

Merged (1 commit) on Jan 23, 2021

@ironcladlou (Contributor)

Due to a parsing bug, the bootstrap rendering logic fails to detect a usable
machine network CIDR when using IPv6 dual stack mode unless the IPv4 CIDR is the
first element in the install config's machineNetwork array, which is brittle.

This patch fixes the parsing mistake so that the IPv4 CIDR can be located
among the machineNetwork CIDRs in dual stack mode, whatever its position.

Although ideally etcd would detect and bind to the "preferred" family and that
family would be again located here during bootstrap CIDR detection, such a
change would be more invasive. The scope of this fix is to get dual stack
reliably working even if it means using IPv4 inconsistently during bootstrapping
when IPv6 would be more consistent.
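
A minimal sketch of the kind of lookup this description implies, not the actual patch: it scans a list of CIDR strings and returns the first IPv4 one wherever it appears. The function name firstIPv4CIDR and the sample addresses are hypothetical; only net.ParseCIDR and IP.To4 from the Go standard library are relied on.

```go
package main

import (
	"fmt"
	"net"
)

// firstIPv4CIDR (hypothetical helper) returns the first CIDR in cidrs whose
// address is IPv4, regardless of its position in the slice.
func firstIPv4CIDR(cidrs []string) (string, error) {
	for _, cidr := range cidrs {
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return "", fmt.Errorf("invalid machine network CIDR %q: %w", cidr, err)
		}
		// To4 returns nil for addresses that cannot be represented as IPv4.
		if ip.To4() != nil {
			return cidr, nil
		}
	}
	return "", fmt.Errorf("no IPv4 CIDR found in %v", cidrs)
}
```

With this shape, firstIPv4CIDR([]string{"fd00::/48", "10.0.0.0/16"}) yields 10.0.0.0/16 even though the IPv6 entry comes first.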

@openshift-ci-robot added the bugzilla/severity-high and bugzilla/valid-bug labels on Jan 20, 2021
@openshift-ci-robot

@ironcladlou: This pull request references Bugzilla bug 1907872, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.7.0) matches configured target release for branch (4.7.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

@openshift-ci-robot added the approved label on Jan 20, 2021

  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16

@marun (Contributor), Jan 20, 2021

(No action required) What is the relationship between networking.machineCIDR and networking.machineNetwork[x].cidr?

@ironcladlou (Contributor Author)

The former is deprecated and is used only as a fallback.
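
As a rough illustration of that fallback relationship, the sketch below reads networking.machineNetwork[x].cidr from an untyped install config and consults networking.machineCIDR only when no machineNetwork entries are found. The function name machineNetworkCIDRs is hypothetical, and the sketch assumes the config was decoded into map[string]interface{} values (as encoding/json does); it is not taken from the operator's code.

```go
package main

// machineNetworkCIDRs (hypothetical helper) collects
// networking.machineNetwork[*].cidr from an untyped install config and falls
// back to the deprecated networking.machineCIDR only when no machineNetwork
// entries are present.
func machineNetworkCIDRs(installConfig map[string]interface{}) []string {
	networking, ok := installConfig["networking"].(map[string]interface{})
	if !ok {
		return nil
	}

	var cidrs []string
	if entries, ok := networking["machineNetwork"].([]interface{}); ok {
		for _, entry := range entries {
			m, ok := entry.(map[string]interface{})
			if !ok {
				continue // skip entries that are not maps
			}
			if cidr, ok := m["cidr"].(string); ok && cidr != "" {
				cidrs = append(cidrs, cidr)
			}
		}
	}

	// Fallback: use the deprecated field only if nothing was found above.
	if len(cidrs) == 0 {
		if cidr, ok := networking["machineCIDR"].(string); ok && cidr != "" {
			cidrs = append(cidrs, cidr)
		}
	}
	return cidrs
}
```

Every type assertion is checked, so a missing or oddly shaped field results in an empty slice rather than a panic.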


for name, test := range tests {
	t.Logf("evaluating test %q", name)
	var installConfig map[string]interface{}

@marun (Contributor), Jan 20, 2021

(No action required) Huh, so rendering forces us to relinquish type safety? I would have thought the struct would be vendored into o/api for reuse.

@ironcladlou (Contributor Author)

The install config isn't public API, and vendoring the installer to get the type drags in a huge transitive dependency tree. In other projects I've worked on which consume the install config, I've created a local struct for the subset of the install config I wanted to deserialize, which gives some type safety. We could do that here as well.
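
For illustration only, a local-struct approach along those lines might look like the sketch below. The type installConfigSubset and the function parseInstallConfigSubset are hypothetical names, the field set is limited to the two networking fields discussed in this thread, and the input is assumed to be JSON (for example, install config YAML already converted to JSON) so that only the standard library is needed.

```go
package main

import "encoding/json"

// installConfigSubset is a hypothetical local mirror of just the install
// config fields needed for machine network detection. It is not the
// installer's InstallConfig type.
type installConfigSubset struct {
	Networking struct {
		MachineNetwork []struct {
			CIDR string `json:"cidr"`
		} `json:"machineNetwork"`
		// MachineCIDR is the deprecated fallback field.
		MachineCIDR string `json:"machineCIDR"`
	} `json:"networking"`
}

// parseInstallConfigSubset decodes JSON-encoded install config data into the
// local subset type. Fields not declared above are ignored by json.Unmarshal.
func parseInstallConfigSubset(data []byte) (*installConfigSubset, error) {
	var cfg installConfigSubset
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```

The trade-off is that the subset has to be kept in sync by hand if the relevant install config fields ever change shape.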

Contributor

There's no need for a huge transitive dependency tree. That's what go mod is for.

@ironcladlou (Contributor Author)

Can you elaborate? Maybe I'm misunderstanding.

Here's where the InstallConfig struct is defined: https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go

Since that package isn't itself a nested module, getting the type in the etcd-operator tree requires (I believe) declaring a dependency on github.com/openshift/installer (the module root). By doing that, and running go mod vendor, I run into problems immediately as various dependencies I don't want fail to resolve for one reason or another:

go: github.com/openshift/installer@v0.9.0-master.0.20210120203138-27fd27be1a03 requires
        github.com/kubevirt/terraform-provider-kubevirt@v0.0.0-00010101000000-000000000000: invalid version: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/*

I haven't debugged further to get to a point where go mod graph will succeed, but even if go mod is smart enough to drag in only the dependencies rooted from that sub package in this case (not at all obvious to me), it looks like a lot (off the bat you can see that some field of InstallConfig is going to drag in terraform provider code).

Seems like a pretty deep rabbit hole to explore versus re-declaring a small subset of the struct or using interface{}.

I wish this were easier.

Contributor

What's preventing the installer from moving the struct definition to a location that doesn't involve ancillary dependencies?

@ironcladlou (Contributor Author)

I can think of several possible reasons, but a fundamental one is that (I think) the type is considered internal. And now that I've said it, I can imagine we'll be asked a question I actually can't answer yet: can we replace our usage of the internal type with a public API? I'm looking around on a bootstrap node right now to inspect rendered manifests during bootstrapping, and can't yet identify a public OpenShift config API which specifies the machine networks that come in through install config.

@ironcladlou (Contributor Author)

@danwinship do you have any idea if there's some API resource I'm missing we could be using instead of the install config to get the machine network CIDRs?

Contributor

FWIW this is an overarching concern, shouldn't block merge.

Contributor

@ironcladlou I don't know; team-sdn doesn't own/use machineNetwork (it's the odd man out in the networking stanza of install-config).

I'd expect it to get copied into something machine-api related, but I'm not seeing it...

Contributor

Do we still have the infra to do staging? The installer could move the install-config API to a staging directory and then let it get mirrored to an "install-api" repo for CEO (and other early-install-time operators?) to consume.

@hexfusion (Contributor)

@danwinship PTAL

@hexfusion (Contributor)

Overall LGTM. I would like Networking to sign off that this works as expected.

/assign @danwinship

@danwinship (Contributor) left a comment

/lgtm
/hold
non-urgent comments below; cancel either the lgtm or the hold

pkg/cmd/render/render_test.go (resolved)
pkg/cmd/render/render_test.go (outdated, resolved)
@openshift-ci-robot added the do-not-merge/hold and lgtm labels on Jan 22, 2021
@openshift-ci-robot removed the lgtm label on Jan 22, 2021
@danwinship (Contributor)

/hold cancel
/lgtm

@openshift-ci-robot added the lgtm label and removed the do-not-merge/hold label on Jan 22, 2021
@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@ironcladlou (Contributor Author)

Sorry, I used the wrong yaml package and inadvertently changed the module definition. I fixed the import to resolve the module errors.

@openshift-ci-robot removed the lgtm label on Jan 22, 2021
@hexfusion (Contributor)

/lgtm

@openshift-ci-robot added the lgtm label on Jan 23, 2021
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, hexfusion, ironcladlou

Needs approval from an approver in each of these files:
  • OWNERS [hexfusion,ironcladlou]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 59dff92 into openshift:master on Jan 23, 2021
@openshift-ci-robot

@ironcladlou: All pull requests linked via external trackers have merged:

Bugzilla bug 1907872 has been moved to the MODIFIED state.

Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/severity-high: Referenced Bugzilla bug's severity is high for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.