kubelet: get IP based on service network IP mode for dual-stack support. #70659

Closed
wants to merge 1 commit into from

@pmichali pmichali commented Nov 5, 2018

What type of PR is this?
/kind feature

What this PR does / why we need it:
Allows the kubelet to obtain the correct IP from pods when operating in dual-stack mode.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #70653

Special notes for your reviewer:
The intent is to let IPv4-only and IPv6-only clusters work as they do today, while adapting the kubelet for running in a dual-stack cluster.

Does this PR introduce a user-facing change?:
NONE

/area ipv6
/sig network

@k8s-ci-robot
Contributor

@pmichali: Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the labels do-not-merge/work-in-progress, size/M, do-not-merge/release-note-label-needed, kind/feature, area/ipv6, sig/network, cncf-cla: yes, needs-priority, and needs-ok-to-test on Nov 5, 2018
@k8s-ci-robot
Contributor

Hi @pmichali. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

I understand the commands that are listed here.

@k8s-ci-robot added the labels area/kubeadm, area/kubelet, sig/cluster-lifecycle, and sig/node on Nov 5, 2018
@pmichali
Contributor Author

pmichali commented Nov 5, 2018

Commit message for this change...

This is WIP as it relies on bug fix 70633, which fixes a regression in etcd. Once
that commit merges, this can be rebased.

In GetPodIP, the kubelet code attempts to get the IPv4 address for the
pod, and if that fails, tries to get the IPv6 address.  That works fine
for IPv4 only and IPv6 only mode, but not for dual-stack, where each
pod will have both addresses.

In addition, since dual-stack will support only a single (selectable)
family for services, we need to also ensure that kubelet is using the
same family, when getting the IP. One way to do that is to request
the IP, based on the family used for services.

With the introduction of IPv6 only mode, a DNS_SVC_IP environment
variable was defined, so that DNS used an IP that was in the family
for services. The variable can be provided to the kubelet, via a
drop-in file, as is done for kubeadm-dind-cluster.

This change makes use of that information, reading the variable, and,
if set, will use the same family as the DNS IP in the request to obtain
the pod IP. Otherwise, it will fall back on trying to get the IPv4
address, and then trying to get the IPv6 address, if no IPv4 address
is available (to support backward compatibility).
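
A minimal sketch of the selection logic described in the commit message above. It assumes, for illustration only, that the pod's addresses are already available as a list of strings; the PR's actual code is not shown on this page and obtains the address differently. `preferredFamily` and `DNS_SVC_IP` come from the PR; `familyOf`, `selectPodIP`, and the family constants are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// Hypothetical family names; the values used by the PR are not shown here.
const (
	familyIPv4 = "IPv4"
	familyIPv6 = "IPv6"
)

// familyOf reports the address family of a textual IP,
// or "" if the string cannot be parsed as an IP address.
func familyOf(ip string) string {
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return ""
	}
	if parsed.To4() != nil {
		return familyIPv4
	}
	return familyIPv6
}

// preferredFamily derives the family from DNS_SVC_IP and returns ""
// when the variable is unset or does not hold a valid IP.
func preferredFamily() string {
	return familyOf(os.Getenv("DNS_SVC_IP"))
}

// selectPodIP returns the first pod address in the preferred family.
// With no preference it tries IPv4 first and then IPv6, which mirrors
// the backward-compatible behavior described in the commit message.
func selectPodIP(podIPs []string, preferred string) (string, error) {
	order := []string{familyIPv4, familyIPv6}
	if preferred != "" {
		order = []string{preferred}
	}
	for _, fam := range order {
		for _, ip := range podIPs {
			if familyOf(ip) == fam {
				return ip, nil
			}
		}
	}
	return "", fmt.Errorf("no pod IP found for families %v", order)
}

func main() {
	// A dual-stack pod has one address per family (example values).
	podIPs := []string{"10.244.1.5", "fd00:10:244::5"}
	os.Setenv("DNS_SVC_IP", "fd00:30::a")
	ip, err := selectPodIP(podIPs, preferredFamily())
	fmt.Println(ip, err)
}
```

With the variable unset, the same call falls through to the IPv4-then-IPv6 order, so single-family clusters would behave as before.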

@pmichali
Contributor Author

pmichali commented Nov 5, 2018

NOTE: this is WIP because a regression was found in etcd for IPv6 clusters, and this commit includes the cherry-picked fix from PR 70633. Once that is merged, this commit can be rebased.

Member

@neolit123 neolit123 left a comment

@pmichali thank you for working on this change.
added some minor comments.

this would need a release note instead of NONE.
@kubernetes/sig-cluster-lifecycle-pr-reviews
/priority important-longterm

if err != nil {
return nil, err
addrType := preferredFamily()
if addrType != "" {
Member

small whitespace issue.

Contributor Author

Will address when I rebase.

Member

this whitespace issue seems to be present still.

Contributor Author

Sorry, I missed it. I had spaces instead of a tab, so it looked fine in my editor. The fix will be in the next patch.

@@ -90,7 +92,7 @@ func CreateStackedEtcdStaticPodManifestFile(client clientset.Interface, manifest
}

// notifies the other members of the etcd cluster about the joining member
- etcdPeerAddress := fmt.Sprintf("https://%s:%d", cfg.APIEndpoint.AdvertiseAddress, kubeadmconstants.EtcdListenPeerPort)
+ etcdPeerAddress := fmt.Sprintf("https://%s", net.JoinHostPort(cfg.APIEndpoint.AdvertiseAddress, strconv.Itoa(kubeadmconstants.EtcdListenPeerPort)))
Member

Contributor Author

This and the following change are part of the PR 70633 commit (a fix for a regression) that this PR needs. That PR already has a request to add unit test changes. When it is merged, I'll rebase to pick up the latest.
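
For context, the two hunks in this thread swap a `"%s:%d"` format string for `net.JoinHostPort`. The sketch below shows why that matters for IPv6: a bare IPv6 literal followed by `:port` is ambiguous, while `net.JoinHostPort` wraps the literal in brackets. The helper name `peerURL` and the address and port values are illustrative assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// peerURL builds an https URL for host and port. net.JoinHostPort adds
// brackets around hosts that contain a colon, such as IPv6 literals.
func peerURL(host string, port int) string {
	return fmt.Sprintf("https://%s", net.JoinHostPort(host, strconv.Itoa(port)))
}

func main() {
	// Old style: the port cannot be told apart from the address groups.
	fmt.Printf("https://%s:%d\n", "fd00::1", 2380)

	// New style: IPv6 is bracketed, IPv4 is unchanged.
	fmt.Println(peerURL("fd00::1", 2380))
	fmt.Println(peerURL("10.0.0.1", 2380))
}
```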

@@ -282,7 +284,7 @@ func performEtcdStaticPodUpgrade(client clientset.Interface, waiter apiclient.Wa
if err != nil {
return true, errors.Wrap(err, "failed to retrieve the current etcd version")
}
- currentEtcdVersionStr, ok := currentEtcdVersions[fmt.Sprintf("https://%s:%d", cfg.APIEndpoint.AdvertiseAddress, constants.EtcdListenClientPort)]
+ currentEtcdVersionStr, ok := currentEtcdVersions[fmt.Sprintf("https://%s", net.JoinHostPort(cfg.APIEndpoint.AdvertiseAddress, strconv.Itoa(constants.EtcdListenClientPort)))]
Member

Contributor Author

see above.

@k8s-ci-robot added the label priority/important-longterm and removed needs-priority on Nov 5, 2018
@k8s-ci-robot added the label needs-rebase on Nov 10, 2018
@k8s-ci-robot removed the label needs-rebase on Nov 24, 2018
@pmichali changed the title from "WIP: kubelet: get IP based on service network IP mode for dual-stack support." to "kubelet: get IP based on service network IP mode for dual-stack support." on Nov 24, 2018
@k8s-ci-robot removed the label do-not-merge/work-in-progress on Nov 24, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pmichali
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: dchen1107

If they are not already assigned, you can assign the PR to them by writing /assign @dchen1107 in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pmichali
Contributor Author

Updated commit, removing dependent changes from PR #70633, which has since merged. This is ready for review.

Two questions...

This uses the DNS_SVC_IP environment variable, which was introduced earlier with the IPv6-only capability, to determine whether the IPv4 or IPv6 address is used for the pod. If it is not specified, the kubelet falls back to trying to find the IPv4 address on the pod first, and then the IPv6 address. Would the requirement of this environment variable constitute a "user facing" change?

If not, how do I remove the release note label? I accidentally deleted the text that is used to trigger this label, and only have the text "NONE", and I don't remember what the syntax is supposed to be.

@pmichali
Contributor Author

/assign @dchen1107

@pmichali
Contributor Author

Reviewers, PTAL, and let me know about the release note question I posed. Thanks!

@pmichali
Contributor Author

@neolit123 Can you take a look at the latest changes? This is ready for review. Thanks!

Member

@neolit123 neolit123 left a comment

@pmichali thanks for the update.
LGTM mostly. added a couple of minor comments.

sig-node and the kubelet maintainers would do the final LGTM / approve.
/ok-to-test

glog.V(3).Infof("DNS Service IP is %s", dnsServiceIP)
dnsIP := net.ParseIP(dnsServiceIP)
if dnsIP == nil {
glog.Warningf("Unable to parse DNS_SVC_IP (%s) to determine preferred family", dnsServiceIP)
Member

I think it makes sense to indicate that the fallback here is to -4 (IPv4).

simplest way is to use a goto to a label before:

fallback:
	glog.V(3).Infof("Using IPv4.....

yet i see that goto is not that widely used in the project.
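
One way to get a single fallback message without `goto` is to have the parsing step return an error and log in exactly one place. The sketch below is an assumption about how that could look, not the code in this PR: the function names are hypothetical, and it uses the standard `log` package in place of glog to stay self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
)

// familyFromDNSIP returns "IPv4" or "IPv6" for raw, or an error that
// explains why no family could be determined.
func familyFromDNSIP(raw string) (string, error) {
	if raw == "" {
		return "", errors.New("DNS_SVC_IP is not set")
	}
	ip := net.ParseIP(raw)
	if ip == nil {
		return "", fmt.Errorf("unable to parse DNS_SVC_IP (%s)", raw)
	}
	if ip.To4() != nil {
		return "IPv4", nil
	}
	return "IPv6", nil
}

// preferredFamilyOrFallback logs the fallback once, whatever the cause,
// and returns "" so the caller tries IPv4 and then IPv6.
func preferredFamilyOrFallback(raw string) string {
	family, err := familyFromDNSIP(raw)
	if err != nil {
		log.Printf("%v; falling back to trying IPv4, then IPv6", err)
		return ""
	}
	return family
}

func main() {
	for _, raw := range []string{"", "not-an-ip", "fd00:30::a"} {
		fmt.Printf("%q -> %q\n", raw, preferredFamilyOrFallback(raw))
	}
}
```

Every failure path produces the same fallback wording, which is the effect the label-based suggestion was after.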

Contributor Author

I'm changing the wording of the message. Let me know if that is OK.

@k8s-ci-robot added the label ok-to-test and removed needs-ok-to-test on Nov 28, 2018
@neolit123
Member

neolit123 commented Nov 28, 2018

also please add a release note as per the PR template:
https://github.com/kubernetes/kubernetes/blob/master/.github/PULL_REQUEST_TEMPLATE.md
(user-facing change)

@neolit123
Member

/remove-area kubeadm

Contributor Author

@pmichali pmichali left a comment

Will upload newer patch today.

@pmichali
Contributor Author

@neolit123 Thanks for looking. I was wondering if a release note is needed for this change. See my post 5 days ago... I'm thinking not, but need guidance here.

In GetPodIP, the kubelet code attempts to get the IPv4 address for the
pod, and if that fails, tries to get the IPv6 address.  That works fine
for IPv4 only and IPv6 only mode, but not for dual-stack, where each
pod will have both addresses.

In addition, since dual-stack will support only a single (selectable)
family for services, we need to also ensure that kubelet is using the
same family, when getting the IP. One way to do that is to request
the IP, based on the family used for services.

With the previous introduction of IPv6 only mode, a DNS_SVC_IP environment
variable was defined, so that DNS used an IP that was in the family
for services. The variable can be provided to the kubelet, via a
drop-in file, as is done for kubeadm-dind-cluster.

This change makes use of that information, reading the variable, and,
if set, will use the same family as the DNS IP in the request to obtain
the pod IP. Otherwise, it will fall back on trying to get the IPv4
address, and then trying to get the IPv6 address, if no IPv4 address
is available (to support backward compatibility).

Fixes Issue: kubernetes#70653

/area ipv6
/sig network
@pmichali
Contributor Author

/test pull-kubernetes-local-e2e-containerized
/test pull-kubernetes-e2e-gce

@pmichali
Contributor Author

@freehan @dchen1107 Can you PTAL at this kubelet change? Also, regarding my question above, is a release note needed? If so, I could use a bit of guidance on the wording. If not, how do I remove the release note label?

@pmichali
Contributor Author

pmichali commented Dec 7, 2018

@thockin could you help get some eyes on this? I'm running out of time to work on getting this merged (I'm taking on a new position at work with limited upstream involvement). Thanks!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the label lifecycle/stale on Mar 7, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the label lifecycle/rotten and removed lifecycle/stale on Apr 6, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closed this PR.

@internetionals

Still an issue on 1.14

/reopen
/remove-lifecycle-rotten
/remove-lifecycle-sta

@k8s-ci-robot
Contributor

@internetionals: You can't reopen an issue/PR unless you authored it or you are a collaborator.

Labels
area/ipv6, area/kubelet, cncf-cla: yes, do-not-merge/release-note-label-needed, kind/feature, lifecycle/rotten, ok-to-test, priority/important-longterm, sig/cluster-lifecycle, sig/network, sig/node, size/M
Development

Successfully merging this pull request may close these issues.

dual-stack: kubelet preferred family
6 participants