Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No InternalIP when upgrading to kubernetes >= 1.6.5 with VSphere Cloud Provider #48760

Closed
cbonte opened this issue Jul 11, 2017 · 9 comments · Fixed by #49202
Closed

No InternalIP when upgrading to kubernetes >= 1.6.5 with VSphere Cloud Provider #48760

cbonte opened this issue Jul 11, 2017 · 9 comments · Fixed by #49202
Labels
area/cloudprovider kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@cbonte
Copy link
Contributor

cbonte commented Jul 11, 2017

BUG REPORT:

/kind bug

What happened:
After upgrading the kubernetes cluster from 1.6.4 with the VSphere Cloud Provider enabled to 1.6.5/1.6.6 or 1.6.7, the cluster nodes don't have anymore InternalIPs

# kubectl get node debug.k8s.debug-01 -o yaml
...
status:
  addresses:
  - address: 10.9.65.204
    type: ExternalIP
  - address: debug.k8s.debug-01
    type: Hostname
...

Several issues occur at this step.
For example, it breaks heapster with such an error :

E0711 12:59:05.000192       1 kubelet.go:282] Node debug.k8s.debug-01 has no valid hostname and/or IP address: debug.k8s.debug-01

What you expected to happen:
The node should have an InternalIP.

How to reproduce it (as minimally and precisely as possible):
Deploy a 1 node kubernetes cluster in a VM, and enable the vsphere cloud provider.

Anything else we need to know?:
with kubernetes 1.6.4 we get :

status:
  addresses:
  - address: 10.9.65.204
    type: InternalIP
  - address: debug.k8s.debug-01
    type: Hostname

Downgrading to 1.6.4 fixes the issue.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.7+coreos.0", GitCommit:"c8c505ee26ac3ab4d1dff506c46bc5538bc66733", GitTreeState:"clean", BuildDate:"2017-07-06T17:38:33Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.7+coreos.0", GitCommit:"c8c505ee26ac3ab4d1dff506c46bc5538bc66733", GitTreeState:"clean", BuildDate:"2017-07-06T17:38:33Z", GoVersion:"go1.7.6", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: vsphere
  • OS (e.g. from /etc/os-release):
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
  • Kernel (e.g. uname -a):
Linux debug.k8s.debug-01 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26) x86_64 GNU/Linux

@kubernetes/sig-cluster-ops-bugs @kubernetes/sig-network-bugs
/area platform/vsphere cloudprovider

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Jul 11, 2017
@k8s-github-robot
Copy link

@cbonte There are no sig labels on this issue. Please add a sig label by:
(1) mentioning a sig: @kubernetes/sig-<team-name>-misc
e.g., @kubernetes/sig-api-machinery-* for API Machinery
(2) specifying the label manually: /sig <label>
e.g., /sig scalability for sig/scalability

Note: method (1) will trigger a notification to the team. You can find the team list here and label list here

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 11, 2017
@cbonte
Copy link
Contributor Author

cbonte commented Jul 11, 2017

/sig cluster-ops
/sig network
/area platform/vsphere
/area cloudprovider

@k8s-ci-robot k8s-ci-robot added sig/cluster-ops sig/network Categorizes an issue or PR as relevant to SIG Network. area/cloudprovider labels Jul 11, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 11, 2017
@cbonte
Copy link
Contributor Author

cbonte commented Jul 12, 2017

Performing some more tests on a mini instance outside of the production cluster, I think I've identified the issue.
The configuration required to start kubelet with the --node-ip option (otherwise the wrong IP/interface is used to communicate between nodes). The issue is triggered when this option is set.

The vsphere cloud provider looks OK. But the issue is located here :
https://github.com/kubernetes/kubernetes/blob/release-1.6/pkg/kubelet/kubelet_node_status.go#L407

When the nodeIP is not nil, the loop iterates on the node addresses and returns only the first one found. Since PR #45201, the list always returns an ExternalIP first.

I think the loop in kubelet_node_status.go should iterate on all the addresses to rebuild the list, so that it will include the ExternalIP and the InternalIP. Can someone confirm the analysis ?

@cbonte
Copy link
Contributor Author

cbonte commented Jul 12, 2017

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jul 12, 2017
@yastij
Copy link
Member

yastij commented Jul 12, 2017

ping @kerneltime

@cbonte
Copy link
Contributor Author

cbonte commented Jul 12, 2017

Modifying the code like this solved the issue :

diff --git a/pkg/kubelet/kubelet_node_status.go b/pkg/kubelet/kubelet_node_status.go
index a0da169f6b..ce347523e9 100644
--- a/pkg/kubelet/kubelet_node_status.go
+++ b/pkg/kubelet/kubelet_node_status.go
@@ -405,15 +405,16 @@ func (kl *Kubelet) setNodeAddress(node *v1.Node) error {
			return fmt.Errorf("failed to get node address from cloud provider: %v", err)
		}
		if kl.nodeIP != nil {
+			node.Status.Addresses = []v1.NodeAddress{}
			for _, nodeAddress := range nodeAddresses {
				if nodeAddress.Address == kl.nodeIP.String() {
-					node.Status.Addresses = []v1.NodeAddress{
-						{Type: nodeAddress.Type, Address: nodeAddress.Address},
-						{Type: v1.NodeHostName, Address: kl.GetHostname()},
-					}
-					return nil
+					node.Status.Addresses = append(node.Status.Addresses, v1.NodeAddress{Type: nodeAddress.Type, Address: nodeAddress.Address})
				}
			}
+			if len(node.Status.Addresses) > 0 {
+				node.Status.Addresses = append(node.Status.Addresses, v1.NodeAddress{Type: v1.NodeHostName, Address: kl.GetHostname()})
+				return nil
+			}
			return fmt.Errorf("failed to get node address from cloud provider that matches ip: %v", kl.nodeIP)
		}

which returns :

status:
  addresses:
  - address: debug.k8s.debug-01
    type: Hostname
  - address: 10.9.65.204
    type: ExternalIP
  - address: 10.9.65.204
    type: InternalIP

I guess it can be done in a cleaner way, but it was to quickly test the fix.

@divyenpatel
Copy link
Member

CC: @BaluDontu @luomiao @tusharnt

@luomiao
Copy link

luomiao commented Jul 13, 2017

@cbonte
Thanks for filing this issue related to vSphere cloud provider.
I agree with your analysis.
It seems the current setNodeAddress function assumes that the IP address specified by user can only be InternalIP.
Your fix to kubelet_node_status.go is a better solution than work-around in cloud provider.

cbonte added a commit to cbonte/kubernetes that referenced this issue Jul 19, 2017
When a node IP is set and a cloud provider returns the same address with
several types, on the first address was accepted. With the changes made
in PR kubernetes#45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

Issue kubernetes#48760
@cbonte
Copy link
Contributor Author

cbonte commented Jul 24, 2017

Hi, what are the next steps to discuss about the PR ?
@luomiao Are you able to make some tests ?

cbonte added a commit to cbonte/kubernetes that referenced this issue Aug 29, 2017
When a node IP is set and a cloud provider returns the same address with
several types, on the first address was accepted. With the changes made
in PR kubernetes#45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

Issue kubernetes#48760
vdemeester pushed a commit to vdemeester/kubernetes that referenced this issue Sep 7, 2017
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)

Fix setNodeAddress when a node IP and a cloud provider are set

**What this PR does / why we need it**:
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR kubernetes#45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

**Which issue this PR fixes**: fixes kubernetes#48760

**Special notes for your reviewer**:
* I'm not a golang expert, is it possible to mock `kubelet.validateNodeIP()` to avoid the need of real host interface addresses in the test ?
* It would be great to have it backported for a next 1.6.8 release.

**Release note**:
```release-note
NONE
```
cbonte added a commit to cbonte/kubernetes that referenced this issue Sep 8, 2017
When a node IP is set and a cloud provider returns the same address with
several types, on the first address was accepted. With the changes made
in PR kubernetes#45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

Issue kubernetes#48760
cbonte added a commit to cbonte/kubernetes that referenced this issue Sep 9, 2017
When a node IP is set and a cloud provider returns the same address with
several types, on the first address was accepted. With the changes made
in PR kubernetes#45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.

The behaviour is modified to return all the address types for the
specified node IP.

Issue kubernetes#48760
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloudprovider kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants