etcd: Add initial support for an IPv6 control plane #1211

Merged
merged 4 commits into from Dec 20, 2019

Conversation

russellb
Member

@russellb russellb commented Oct 23, 2019

I'm working on bringing OpenShift up with an IPv6 control plane. With these changes, the etcd cluster forms and the bootstrap process continues. There are still some suspicious messages in the etcd-member log, but this seems like a good start.

The changes should have no impact on other installs unless the etcd DNS records are created with IPv6 addresses instead of IPv4. Otherwise, there should be no changes in the resulting behavior.

This is not intended to be complete IPv6 support, just a start. There's an IPV4_ADDRESS variable I left unchanged to cut down on the size of the patch. I also noticed similar issues still need to be fixed in the recovery tools.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Oct 23, 2019
@ashcrow ashcrow requested review from runcom and removed request for ashcrow October 23, 2019 20:40
@russellb
Member Author

russellb commented Nov 7, 2019

This has been rebased.

@runcom any thoughts on this? I think it should be safe. There shouldn't be any changes in behavior in the normal case.

@hexfusion
Contributor

I think we need to be careful about IPv6 until we get kube 1.16.3. There is a known bug[1] that we have resolved in etcd via 3.3.17[2], but the fix won't get into the client (apiserver) until 1.16.3. I believe this is a hole in CI, as AWS and GCP use IPv4 to my knowledge.

[1] kubernetes/kubernetes#83550
[2] https://github.com/etcd-io/etcd/pull/11211/files#diff-e51eb0fcc3e32b5460ba6ed83b7399dbL235

@russellb
Member Author

russellb commented Nov 7, 2019

> I think we need to be careful about IPv6 until we get kube 1.16.3. There is a known bug[1] that we have resolved in etcd via 3.3.17[2], but the fix won't get into the client (apiserver) until 1.16.3. I believe this is a hole in CI, as AWS and GCP use IPv4 to my knowledge.
>
> [1] kubernetes/kubernetes#83550
> [2] https://github.com/etcd-io/etcd/pull/11211/files#diff-e51eb0fcc3e32b5460ba6ed83b7399dbL235

Thanks for the pointer to the bug. I haven't hit that one (yet?).

re: CI, indeed, there's no IPv6 CI yet. That's on the todo list as soon as I can get enough of the cluster up and running. I've an IPv6 cluster coming up partially on AWS with a bunch of changes (including the ones in this PR). The etcd cluster forms using IPv6. Right now I'm working through cluster SDN issues. We'll have CI soon to help exercise this further.

@hexfusion
Contributor

hexfusion commented Nov 7, 2019

FTR 1.16.3 should land next week[1] so this won't be a prolonged issue for supporting IPv6.

[1] kubernetes/kubernetes#83550 (comment)

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 15, 2019
@cgwalters
Member

This...seems safer to do in 4.4 to me. But I know that's annoying as it makes development now harder.

@runcom
Member

runcom commented Nov 15, 2019

> This...seems safer to do in 4.4 to me. But I know that's annoying as it makes development now harder.

I agree with this, actually; we cannot take this PR at this point anyway. It looks pretty safe to me, though, so it can go in when 4.4 opens.

@kikisdeliveryservice
Contributor

/skip

@eparis
Member

eparis commented Dec 3, 2019

/retest

@kikisdeliveryservice
Contributor

@hexfusion @alaypatel07 PTAL

@kikisdeliveryservice
Contributor

/skip

@hexfusion
Contributor

I picked the 3.3.17 client into the apiserver, so I believe we are good here, since 1.16.3 won't actually land in 4.3.

@hexfusion
Contributor

/hold

etcd-team would like a little time to manually test this and review. We are moving on this now.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 3, 2019
@hexfusion
Contributor

cc @alaypatel07 @retroflexer

@@ -22,6 +22,10 @@ contents: |
--container-runtime=remote \
--container-runtime-endpoint=/var/run/crio/crio.sock \
--node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
{{- if .KubeletIPv6}}
--node-ip :: \
--address :: \
Contributor

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--address 0.0.0.0
The IP address for the Kubelet to serve on (set to 0.0.0.0 for all IPv4 interfaces and `::` for all IPv6 interfaces) (default 0.0.0.0) (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

Member Author

Thanks for pointing out that this was deprecated. I missed that.

@@ -22,6 +22,10 @@ contents: |
--container-runtime=remote \
--container-runtime-endpoint=/var/run/crio/crio.sock \
--node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
{{- if .KubeletIPv6}}
--node-ip :: \
Contributor

https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

--node-ip string
IP address of the node. If set, kubelet will use this IP address for the node

What does it mean to set the node-ip to `::`? Will this break the apiserver-to-node communication?

Member Author

Contributor

It requires a not-yet-merged upstream patch to kubelet. https://github.com/kubernetes/kubernetes/pull/85850/files
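
As background for the `--node-ip ::` question: `::` is the IPv6 unspecified address, the counterpart of `0.0.0.0`. As a listen address it means "all interfaces", but it does not identify any particular node address, which is presumably why kubelet needed a patch before accepting it. The sketch below only shows how Go's standard `net` package classifies such addresses; `describeAddr` is a hypothetical helper and is not taken from kubelet or from this PR.

```go
package main

import (
	"fmt"
	"net"
)

// describeAddr is a hypothetical helper that reports how Go's net package
// classifies an IP literal.
func describeAddr(s string) string {
	ip := net.ParseIP(s)
	switch {
	case ip == nil:
		return "invalid"
	case ip.IsUnspecified():
		// "::" and "0.0.0.0" land here: a wildcard, not a concrete address.
		return "unspecified"
	case ip.IsLoopback():
		return "loopback"
	default:
		return "specific"
	}
}

func main() {
	for _, s := range []string{"::", "0.0.0.0", "::1", "fd00::10"} {
		fmt.Printf("%-9s %s\n", s, describeAddr(s))
	}
}
```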

@hexfusion
Copy link
Contributor

@russellb general question, how can setup-etcd-env know isSingleStackIPv6? We need to be able to make that distinction before the control plane is up. Any ideas?
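
One conceivable approach, offered only as a sketch and not as what this PR or cluster-etcd-operator does: infer the answer from the addresses that are already known before the control plane is up, such as the ones the etcd DNS records resolve to. The helper name `isSingleStackIPv6` is borrowed from the question; its signature and behavior here are assumptions.

```go
package main

import (
	"fmt"
	"net"
)

// isSingleStackIPv6 is a hypothetical helper: it returns true only when the
// list is non-empty, every entry parses as an IP, and none of them is IPv4.
func isSingleStackIPv6(addrs []string) bool {
	if len(addrs) == 0 {
		return false
	}
	for _, s := range addrs {
		ip := net.ParseIP(s)
		// To4 returns non-nil for IPv4 (and IPv4-mapped IPv6) addresses.
		if ip == nil || ip.To4() != nil {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isSingleStackIPv6([]string{"fd00::1", "fd00::2"}))
	fmt.Println(isSingleStackIPv6([]string{"fd00::1", "10.0.0.1"}))
}
```

A check like this treats a mixed list as "not single-stack IPv6", so dual-stack setups would keep the existing IPv4 behavior.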

Contributor

@alaypatel07 alaypatel07 left a comment

@russellb Added comments inline. For context, we recently added IPv6-related changes to cluster-etcd-operator. If possible, could you change the naming convention of the env variables to something like openshift/cluster-etcd-operator@57825f6? This will help the etcd team in reading the code. Thanks.

"WILDCARD_DNS_NAME": fmt.Sprintf("*.%s", setupEnv.opts.discoverySRV),
// TODO This can actually be IPv6, so we should rename this ...
"IPV4_ADDRESS": setupEnv.etcdIP,
"ESCAPED_IP_ADDR": escapedIP,
Contributor

s/ESCAPED_IP_ADDR/IP_ADDRESS

Member Author

made it ESCAPED_IP_ADDRESS

// TODO This can actually be IPv6, so we should rename this ...
"IPV4_ADDRESS": setupEnv.etcdIP,
"ESCAPED_IP_ADDR": escapedIP,
"ESCAPED_ALL_IPS": escapedAllIPs,
Contributor

It would really help readability if instead of ESCAPED_ALL_IPS we have LISTEN_CLIENT_URLS and LISTEN_PEER_URLS

Member Author

We'd need LISTEN_CLIENT_URLS, LISTEN_PEER_URLS, LISTEN_METRIC_URLS, and METRICS_ADDR. Are you sure you want 4 variables instead of this 1?

Contributor

@russellb thanks for pointing that out. I thought it would make the naming more explicit, but having 4 variables instead of 1 would be overkill. Feel free to mark it resolved.
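
To illustrate the trade-off discussed in this thread, here is a minimal sketch of how one pre-escaped wildcard address can feed several etcd listen flags. The flag names are real etcd flags and 2379/2380 are etcd's default client and peer ports; the function `listenFlags`, the metrics port, and the `https` scheme on the metrics URL are assumptions made for the example, not values taken from the PR's manifest. The fourth use mentioned above (METRICS_ADDR) is left out.

```go
package main

import "fmt"

// listenFlags expands a single pre-escaped wildcard address into the etcd
// listen flags that need it. This is a hypothetical helper for illustration.
func listenFlags(escapedAllIPs string) []string {
	const metricsPort = "9978" // placeholder port, assumed for this sketch
	return []string{
		"--listen-client-urls=https://" + escapedAllIPs + ":2379",
		"--listen-peer-urls=https://" + escapedAllIPs + ":2380",
		"--listen-metrics-urls=https://" + escapedAllIPs + ":" + metricsPort,
	}
}

func main() {
	// "0.0.0.0" for an IPv4 control plane, "[::]" for an IPv6 one.
	for _, addr := range []string{"0.0.0.0", "[::]"} {
		for _, flag := range listenFlags(addr) {
			fmt.Println(flag)
		}
	}
}
```

Because only the host part differs between address families, a single ESCAPED_ALL_IPS-style value covers every flag, whereas per-flag variables would each repeat the same address.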

"ESCAPED_IP_ADDR": escapedIP,
"ESCAPED_ALL_IPS": escapedAllIPs,
"LOCALHOST_IP": localhostIP,
"ESCAPED_LOCALHOST_IP": escapedLocalhostIP,
Contributor

Confused why we need both. Is it possible to conditionally set LOCALHOST_IP and conditionally escape it if IPv6?

Member Author

It's kind of annoying: we only want to use it in its escaped form in some cases (in a URL), but not when it's provided as a raw IP. That's why it's there twice.
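
The distinction comes from URL syntax: an IPv6 literal must be wrapped in brackets when it appears as a URL host, while a bare address field takes it unbracketed. A minimal sketch of that escaping follows, assuming a hypothetical helper named `escapeIP`; the PR's actual implementation in `cmd/setup-etcd-environment/run.go` is not shown in this thread and may differ.

```go
package main

import (
	"fmt"
	"net"
)

// escapeIP is a hypothetical helper that returns the form of an IP literal
// usable as a URL host: IPv6 addresses are wrapped in brackets, IPv4
// addresses are returned unchanged.
func escapeIP(raw string) (string, error) {
	ip := net.ParseIP(raw)
	if ip == nil {
		return "", fmt.Errorf("not an IP address: %q", raw)
	}
	if ip.To4() != nil {
		return raw, nil
	}
	return "[" + raw + "]", nil
}

func main() {
	for _, raw := range []string{"127.0.0.1", "::1"} {
		escaped, err := escapeIP(raw)
		if err != nil {
			panic(err)
		}
		// The raw form suits fields that expect a bare address; the escaped
		// form is needed once a port is appended in a URL.
		fmt.Printf("LOCALHOST_IP=%s\n", raw)
		fmt.Printf("ESCAPED_LOCALHOST_IP=%s (e.g. https://%s:2379)\n", escaped, escaped)
	}
}
```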

@russellb
Member Author

> @russellb Added comments inline. For context, we recently added IPv6-related changes to cluster-etcd-operator. If possible, could you change the naming convention of the env variables to something like openshift/cluster-etcd-operator@57825f6? This will help the etcd team in reading the code. Thanks.

took a look at the PR and left a comment on one issue I spotted.

If you don't like the ESCAPED versions of variables in the PR, I could move that logic into shell code instead, I suppose.

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Dec 19, 2019
@russellb
Member Author

I've updated this PR to drop the kubelet config changes, which depended on a kubelet change to land first. We can address that with a follow-up PR. @danwinship

I think this should be fine to merge now ...

@kikisdeliveryservice
Contributor

@alaypatel07 @hexfusion PTAL

@openshift-ci-robot
Contributor

openshift-ci-robot commented Dec 19, 2019

@russellb: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-vsphere | 57589d7968e8233705084f504e2e1e266fd79e7b | link | /test e2e-vsphere |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

@alaypatel07 alaypatel07 left a comment

@russellb sorry for the delay in getting back and thanks for the PR, the changes look fine to me!

@retroflexer
Contributor

I reviewed the changes and looks good to me.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2019
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

looks ok to me.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 20, 2019
@kikisdeliveryservice
Contributor

@alaypatel07 @retroflexer please feel free to remove @hexfusion's hold. This is ready to go.

@kikisdeliveryservice
Contributor

To expedite, I'll just remove the hold myself. :)

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 20, 2019
@kikisdeliveryservice
Contributor

Not sure why tide says it still needs an approved label when it clearly has one? It's kind of wonky lately.

/approve

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, retroflexer, russellb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 66ad271 into openshift:master Dec 20, 2019
@russellb
Member Author

/cherry-pick release-4.3

@openshift-cherrypick-robot

@russellb: #1211 failed to apply on top of branch "release-4.3":

Applying: etcd: Use IPv6 IP addresses
error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	cmd/setup-etcd-environment/run.go
M	pkg/controller/template/render.go
M	templates/master/00-master/_base/files/etc-kubernetes-manifests-etcd-member.yaml
Falling back to patching base and 3-way merge...
Auto-merging templates/master/00-master/_base/files/etc-kubernetes-manifests-etcd-member.yaml
Auto-merging pkg/controller/template/render.go
CONFLICT (content): Merge conflict in pkg/controller/template/render.go
Auto-merging cmd/setup-etcd-environment/run.go
CONFLICT (content): Merge conflict in cmd/setup-etcd-environment/run.go
Patch failed at 0002 etcd: Use IPv6 IP addresses

In response to this:

/cherry-pick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@russellb
Member Author

well, not a clean cherry-pick, but I already have a 4.3 version of these changes in this branch: https://github.com/openshift-kni/machine-config-operator/commits/4.3-ipv6

russellb added a commit to russellb/machine-config-operator that referenced this pull request Dec 20, 2019
yazug pushed a commit to openshift-kni/machine-config-operator that referenced this pull request Jan 15, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.