VSphere cloud provider settings with secret not working. #75175

Closed
krishn10 opened this issue Mar 8, 2019 · 27 comments · Fixed by #101028
Labels
area/provider/vmware Issues or PRs related to vmware provider kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@krishn10

krishn10 commented Mar 8, 2019

What happened:

vSphere cloud provider settings that use a secret do not work. The Kubernetes node goes into the "Not-ready" state when the "/etc/cfc/conf/vsphere_cloud_conf" file is modified to use a secret (as described under point 2 of the vSphere support document here: https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html ). As soon as a secret is created with a base64-encoded username and password and its details are added to the vsphere_cloud_conf file, the node status changes to "Not-ready".

What you expected to happen:

Once the "/etc/cfc/conf/vsphere_cloud_conf" file is updated with the details of the secret and the kubelet service is restarted (by running "systemctl daemon-reload" and "systemctl restart kubelet.service"), the node should return to the "ready" state and all vSphere operations should work.

How to reproduce it (as minimally and precisely as possible):

Create a vsphere secret file, secret.yaml, as follows:

apiVersion: v1
kind: Secret
metadata:
  name: vspsecret
type: Opaque
data:
  <vcenter IP 1.1.1.1>.username: <some base64 encoded username>
  1.1.1.1.password: abcdefgDBcvX==
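
For anyone reproducing this, here is a minimal Go sketch of how the two data entries above could be generated. It assumes the key layout shown in the Secret ("<vCenter host>.username" and "<vCenter host>.password"). The helper name `secretData` and the credentials in `main` are made up for illustration and are not part of the vSphere provider.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// secretData builds the entries for the Secret's "data" section, keyed as
// "<vCenter host>.username" and "<vCenter host>.password". Values are
// base64-encoded, as the "data" field of a Secret requires.
// (Hypothetical helper, for illustration only.)
func secretData(host, user, password string) map[string]string {
	enc := base64.StdEncoding
	return map[string]string{
		host + ".username": enc.EncodeToString([]byte(user)),
		host + ".password": enc.EncodeToString([]byte(password)),
	}
}

func main() {
	// Placeholder credentials, not taken from the issue.
	data := secretData("1.1.1.1", "administrator@vsphere.local", "example-password")
	for _, key := range []string{"1.1.1.1.username", "1.1.1.1.password"} {
		fmt.Printf("  %s: %s\n", key, data[key])
	}
}
```

The printed lines can be pasted under `data:` in secret.yaml. Encoding the exact bytes of each string avoids the stray trailing newline that a shell pipeline can add by accident.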

Create the secret in the kube-system namespace:

# kubectl create -f secret.yaml -n kube-system

Modify "/etc/cfc/conf/vsphere_cloud_conf" as follows:

[Global]
secret-name="vspsecret"
secret-namespace="kube-system"
port = "443"
insecure-flag = "1"
datacenters = <Name of your datacenter>

[VirtualCenter "1.1.1.1"]
datacenters = <Name of your datacenter>

[Workspace]
server = "1.1.1.1"
default-datastore = <any datastore name>
folder = "kubernetes"


[Disk]
scsicontrollertype = pvscsi

Restart the kubelet service as follows:

# systemctl daemon-reload

# systemctl restart kubelet.service

Verify by checking the node status:

# kubectl get nodes

Here are the kubelet logs corresponding to the issue observed (Please note that the username and password have been verified with a base64 decode):

Mar  8 11:30:11 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: I0308 11:30:11.404262    4940 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar  8 11:30:12 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: W0308 11:30:12.067606    4940 prober.go:103] No ref for container "docker://daf9ae0bc1331c1ab14e733790699a858d9112885e0d76c1824dc9a37d4eb7a1" (metrics-server-5bcdd5579-qtvjv_kube-system(a35f459a-3f0b-11e9-9d76-005056a814c4):metrics-server)
^C
root@krish-icp-vsan-ubt1804-master-vm-1:~# tail -f /var/log/syslog
Mar 12 05:30:32 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:32.451684    4940 vsphere.go:1330] Cannot connent to vsphere. Get zone for node krish-icp-vsan-ubt1804-master-vm-1 error
Mar 12 05:30:32 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:32.451700    4940 kubelet_node_status.go:66] Unable to construct v1.Node object for kubelet: failed to get zone from cloud provider: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Mar 12 05:30:34 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:34.230029    4940 pod_workers.go:186] Error syncing pod a35f459a-3f0b-11e9-9d76-005056a814c4 ("metrics-server-5bcdd5579-qtvjv_kube-system(a35f459a-3f0b-11e9-9d76-005056a814c4)"), skipping: failed to "StartContainer" for "metrics-server" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=metrics-server pod=metrics-server-5bcdd5579-qtvjv_kube-system(a35f459a-3f0b-11e9-9d76-005056a814c4)"
Mar 12 05:30:39 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: I0312 05:30:39.451846    4940 kubelet_node_status.go:276] Setting node annotation to enable volume controller attach/detach
Mar 12 05:30:43 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:43.785103    4940 certificate_manager.go:378] Certificate request was not signed: timed out waiting for the condition
Mar 12 05:30:44 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:44.470578    4940 connection.go:65] Failed to create govmomi client. err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Mar 12 05:30:44 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:44.470608    4940 nodemanager.go:382] Cannot connect to vCenter with err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Mar 12 05:30:44 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:44.470624    4940 vsphere.go:589] failed connecting to vcServer "9.42.99.254" with error ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Mar 12 05:30:44 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:44.470640    4940 vsphere.go:1330] Cannot connent to vsphere. Get zone for node krish-icp-vsan-ubt1804-master-vm-1 error
Mar 12 05:30:44 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:44.470656    4940 kubelet_node_status.go:66] Unable to construct v1.Node object for kubelet: failed to get zone from cloud provider: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Mar 12 05:30:45 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:45.228658    4940 pod_workers.go:186] Error syncing pod a35f459a-3f0b-11e9-9d76-005056a814c4 ("metrics-server-5bcdd5579-qtvjv_kube-system(a35f459a-3f0b-11e9-9d76-005056a814c4)"), skipping: failed to "StartContainer" for "metrics-server" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=metrics-server pod=metrics-server-5bcdd5579-qtvjv_kube-system(a35f459a-3f0b-11e9-9d76-005056a814c4)"

Anything else we need to know?:

The feature for using a vsphere secret was implemented in kubernetes version 1.11. I am using version 1.12.

Environment:

  • Kubernetes version (use kubectl version): 1.12
  • Cloud provider or hardware configuration: vSphere
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04
  • Kernel (e.g. uname -a): Linux 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: The installation was performed using IBM Cloud Private version 3.1.2.
  • Others:
@krishn10 krishn10 added the kind/bug Categorizes issue or PR as related to a bug. label Mar 8, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 8, 2019
@neolit123
Member

/sig vmware

@k8s-ci-robot k8s-ci-robot added area/provider/vmware Issues or PRs related to vmware provider and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 9, 2019
@neolit123
Member

@kubernetes/cloud-provider-vsphere-maintainers

@Ptyool

Ptyool commented Mar 22, 2019

Same issue here on K8s 1.13.4

@framegrace

Same issue here on K8s 1.12

@chandidas

@divyenpatel do you have any information on this?

@frapposelli
Member

/assign @SandeepPissay

PTAL

@SandeepPissay
Contributor

Mar 12 05:30:32 krish-icp-vsan-ubt1804-master-vm-1 hyperkube[4940]: E0312 05:30:32.451700 4940 kubelet_node_status.go:66] Unable to construct v1.Node object for kubelet: failed to get zone from cloud provider: ServerFaultCode: Cannot complete login due to an incorrect user name or password.

@krishn10 The above log indicates incorrect user name or password. Are you sure that the username/pwd was correct?

Btw, I'm wondering why kubelet is trying to connect to vSphere. You should not specify the vsphere.conf file in the kubelet params unless you are using the zones feature. Are you using the zones feature?

@alloran

alloran commented May 16, 2019

Same issue here on K8s 1.13.5
And yes, we want to use the zones feature.

@MaxCav666

MaxCav666 commented May 16, 2019

Same issue here on K8s 1.14.1 on CentOS 7, installed with kubeadm, using a vsphere.conf file without secrets.
I connect successfully to vCenter with the same username and password.

[Global]
user = "kubernetes@vsphere.local"
password = "kuBer$78rt"
port = "443"
insecure-flag = "1"

[VirtualCenter "192.168.2.12"]
datacenters = "Datacenter"

[Workspace]
server = "192.168.2.12"
datacenter = "Datacenter"
default-datastore = "ContentLibrary"
folder = "kubernetes"

[Disk]
scsicontrollertype = pvscsi

[Network]
public-network = "VM Network"

I have tried escaping the special character in the password, with the same result.

password = "kuBer$$78rt"

vCenter log: (screenshot not preserved)

@rleonardibm

Any word on this? It seems broken for ICP 3.1.0.

@BrandonStiff

I have the same issue. I was able to work around it by adding the username and password in plain text to the vsphere.conf file. The secret did work before, but definitely does not now.

I believe the cause for us was upgrading K8s from 1.11.2 to 1.13.4.

@divyenpatel
Member

We are looking into this issue.

@nikhita nikhita added the sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. label Aug 6, 2019
@andreacasini

It works for me on 1.15.2 with the following vsphere.conf file:

[Global]
secret-name = "vcp-secret"
secret-namespace = "kube-system"
port = "443"
insecure-flag = "1"

[VirtualCenter "vcsa-03.k8s.lab"]
datacenters = "LAB"

[Workspace]
server = "vcsa-03.k8s.lab"
datacenter = "LAB"
default-datastore = "DS-02"
resourcepool-path = "CLS-LAB/Resources"
folder = "k8s"

[Disk]
scsicontrollertype = pvscsi

Hope it helps.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 12, 2019
@frapposelli
Member

@krishn10 @BrandonStiff @rleonardibm @tiempososcuros @framegrace @Ptyool
Is this still an issue with more recent versions of k8s?

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 12, 2019
@Ptyool

Ptyool commented Nov 18, 2019

Hi, I can confirm that it's working on 1.15.3

@MnrGreg

MnrGreg commented Jan 6, 2020

I'm putting together a PR CAPV/699 to incorporate Regions and Zones into both the in-tree VCP and external CPI deployments. I've noticed that with the in-tree VCP, kubelet is not able to consume the secret data and authenticate.

Both the secret and vsphere.conf are correctly constructed by CAPV in the test environment.

[Global]
secret-name = "cloud-provider-vsphere-credentials"
secret-namespace = "kube-system"
datacenters = "RK_NonProd"
insecure-flag = "1"

[VirtualCenter "vcsrck-vdcn001.redacted.local"]

[Workspace]
server = "vcsrck-vdcn001.redacted.local"
datacenter = "RK_NonProd"
folder = "vm"
default-datastore = "RKQA_LUN01"
resourcepool-path = "'RK_NP_Infrastructure/Resources'"

[Disk]
scsicontrollertype = pvscsi

[Network]
public-network = "rck3236trunk"

[Labels]
region = "region"
zone = "zone"

apiVersion: v1
data:
  vcsrck-vdcn001.redacted.local.password: UGFzc3dvcmQtMTIz
  vcsrck-vdcn001.redacted.local.username: dmNwQHJrbnB2Yy5sb2NhbA==
kind: Secret
metadata:
  creationTimestamp: "2020-01-05T18:47:16Z"
  name: cloud-provider-vsphere-credentials
  namespace: kube-system
  resourceVersion: "12"
  selfLink: /api/v1/namespaces/kube-system/secrets/cloud-provider-vsphere-credentials
  uid: 5f9ec9b7-e351-4e46-ba58-d91841d60974
type: Opaque

The password in the Secret above decodes correctly to "Password-123", and the same credentials work when specified in clear text through "user = " and "password = ".
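
As a quick cross-check of that claim, the following Go sketch decodes the two values copied from the Secret above and flags stray whitespace, which is a common reason an otherwise correct credential gets rejected. The helper `decodeField` is hypothetical and exists only for this illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeField decodes one base64 value from a Secret's data section and
// reports whether the decoded text is free of leading or trailing whitespace.
// (Hypothetical helper, for illustration only.)
func decodeField(encoded string) (string, bool, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", false, err
	}
	value := string(raw)
	return value, value == strings.TrimSpace(value), nil
}

func main() {
	// Values copied verbatim from the Secret shown in this comment.
	fields := []struct{ key, encoded string }{
		{"username", "dmNwQHJrbnB2Yy5sb2NhbA=="},
		{"password", "UGFzc3dvcmQtMTIz"},
	}
	for _, f := range fields {
		value, clean, err := decodeField(f.encoded)
		if err != nil {
			fmt.Printf("%s: invalid base64: %v\n", f.key, err)
			continue
		}
		fmt.Printf("%s: %q (no stray whitespace: %t)\n", f.key, value, clean)
	}
}
```

Both values decode cleanly, which supports the reading that the Secret itself is well formed and the problem lies in how kubelet consumes it.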

What happened:

The kubelet logging was set to v=9:

KUBELET_KUBEADM_ARGS="--cloud-config=/etc/kubernetes/vsphere.conf --cloud-provider=vsphere --container-runtime=remote --container-runtime-endpoint=/var/run/containerd/containerd.sock --v=9"

The kubelet nodemanager.go logs output only a single message:

# systemctl restart kubelet; journalctl -f -u kubelet | grep -i nodeman
Jan 06 00:46:38 generic-cluster-1-controlplane-0 kubelet[21403]: E0106 00:46:38.576575   21403 nodemanager.go:398] Cannot connect to vCenter with err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.

When intercepting the kubelet call, one sees an empty username and password supplied:

POST https://vcsrck-vdcn001.redacted.local/sdk HTTP/1.1
Host:	vcsrck-vdcn001.redacted.local
User-Agent:	kubernetes-cloudprovider/v1.16.2
Transfer-Encoding:	chunked
Content-Type:	text/xml; charset="utf-8"
Soapaction:	urn:vim25/6.7
Accept-Encoding:	gzip
<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <Login xmlns="urn:vim25">
      <_this type="SessionManager">SessionManager</_this>
      <userName></userName>
      <password></password>
      <locale>en_US</locale>
    </Login>
  </Body>
</Envelope>
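
To make that check repeatable on other captures, here is a small Go sketch that parses an intercepted Login envelope and reports whether the credentials are empty. It relies only on the standard `encoding/xml` package. The struct and function names are invented for this example, and the embedded request body is the one captured above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// loginEnvelope mirrors only the parts of the SOAP request that matter here.
// Tags without a namespace match elements by local name.
type loginEnvelope struct {
	Body struct {
		Login struct {
			UserName string `xml:"userName"`
			Password string `xml:"password"`
		} `xml:"Login"`
	} `xml:"Body"`
}

// hasEmptyCredentials reports whether a SOAP Login request carries an empty
// userName or password. (Hypothetical helper, for illustration only.)
func hasEmptyCredentials(soap []byte) (bool, error) {
	var env loginEnvelope
	if err := xml.Unmarshal(soap, &env); err != nil {
		return false, err
	}
	login := env.Body.Login
	return login.UserName == "" || login.Password == "", nil
}

// captured is the request body intercepted from kubelet, as shown above.
const captured = `<?xml version="1.0" encoding="UTF-8"?>
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
  <Body>
    <Login xmlns="urn:vim25">
      <_this type="SessionManager">SessionManager</_this>
      <userName></userName>
      <password></password>
      <locale>en_US</locale>
    </Login>
  </Body>
</Envelope>`

func main() {
	empty, err := hasEmptyCredentials([]byte(captured))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("empty credentials in Login request:", empty)
}
```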

Next one sees the corresponding Response which matches the nodemanager.go log:

HTTP/1.1 500 Internal Server Error
Date:	Sun, 5 Jan 2020 19:09:53 GMT
Set-Cookie:	vmware_soap_session="a0b9e84c1f09d402c863be6e7746539facefc67d"; Path=/; HttpOnly; Secure;
Cache-Control:	no-cache
Connection:	Keep-Alive
Content-Type:	text/xml; charset=utf-8
Content-Length:	585
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:Body>
    <soapenv:Fault>
      <faultcode>ServerFaultCode</faultcode>
      <faultstring>Cannot complete login due to an incorrect user name or password.</faultstring>
      <detail>
        <InvalidLoginFault xmlns="urn:vim25" xsi:type="InvalidLogin"></InvalidLoginFault>
      </detail>
    </soapenv:Fault>
  </soapenv:Body>
</soapenv:Envelope>

What you expected to happen:

If I read vsphere.go correctly, the username and password are initially reset to empty strings when a secret is configured:

if isSecretInfoProvided {
	if cfg.Global.User != "" {
		glog.Warning("Global.User and Secret info provided. VCP will use secret to get credentials")
		cfg.Global.User = ""
	}
	if cfg.Global.Password != "" {
		glog.Warning("Global.Password and Secret info provided. VCP will use secret to get credentials")
		cfg.Global.Password = ""
	}
}

vSphereConn := vclib.VSphereConnection{
	Username: vcConfig.User,
	Password: vcConfig.Password,

Upon failure, nodemanager.go will then use credentialmanager to swap in the username and password from the secret:

credentialManager := nm.CredentialManager()
if !vclib.IsInvalidCredentialsError(err) || credentialManager == nil {
	klog.Errorf("Cannot connect to vCenter with err: %v", err)
	return err
}
klog.V(4).Infof("Invalid credentials. Cannot connect to server %q. Fetching credentials from secrets.", vsphereInstance.conn.Hostname)
// Get latest credentials from SecretCredentialManager
credentials, err := credentialManager.GetCredential(vsphereInstance.conn.Hostname)
if err != nil {
	klog.Errorf("Failed to get credentials from Secret Credential Manager with err: %v", err)
	return err
}
vsphereInstance.conn.UpdateCredentials(credentials.User, credentials.Password)

This isn't working correctly from what I can see and I haven't been able to debug further. I would expect to see the following log message from kubelet nodemanager.go but don't:

klog.V(4).Infof("Invalid credentials. Cannot connect to server %q. Fetching credentials from secrets.", vsphereInstance.conn.Hostname)

Anything else we need to know?:

  • OS images from kubernetes-sig/image-builder were used.

  • Tested with both kubernetes v1.15.3 (reportedly fixed) and v1.16.2

  • I've also noticed that when the Regions and Zones Labels are removed from vsphere.conf, kubelet does not try to authenticate to vsphere. This might be why some people are not seeing the reported authentication errors.

@Ptyool were your tests using the Region and Zone Labels and/or vSphere Storage?

Environment:
Kubernetes version (use kubectl version): 1.15.3 & 1.16.2
Cloud provider or hardware configuration: vSphere
OS (e.g: cat /etc/os-release): CentOS 7 & Photon OS 3 Rev2
Kernel (e.g. uname -a): 3.10.0-1062.7.1.el7.x86_64
Install tools: cluster-api-provider-vsphere

<name>VMware vCenter Server</name>
<fullName>VMware vCenter Server 6.7.0 build-13007421</fullName>
<apiType>VirtualCenter</apiType>
<apiVersion>6.7.2</apiVersion>

@MnrGreg

MnrGreg commented Jan 8, 2020

@abrarshivani from the commit history, it seems you're the closest to this. Do you have any thoughts?

++ @divyenpatel

@akutz
Member

akutz commented Jan 10, 2020

This is blocking the Cluster API Provider for vSphere on 6.5. Any updates? Thanks!

@SandeepPissay
Contributor

I do not have bandwidth to look into this issue.

@SandeepPissay SandeepPissay removed their assignment Jan 10, 2020
@akutz
Member

akutz commented Jan 10, 2020

cc @yastij

@uysalnet

Is there any update or fix regarding this issue?

@SandeepPissay
Contributor

cc @manojvs157

@duritong

duritong commented Nov 5, 2020

I can confirm this is still an issue in k8s 1.18.3.

Situation: We have a working in-tree cloud provider config with the credentials in a secret. Attaching volumes in the cluster works.

As soon as you follow the official VMware documentation https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/zones.html to enable Zone Support by adding the following 3 lines:

[Labels]
    region = "k8s-region"
    zone = "k8s-zone"

The kubelet then fails to start, with the following messages:

Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: E1105 14:13:08.415964 3910689 connection.go:65] Failed to create govmomi client. err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: E1105 14:13:08.416035 3910689 nodemanager.go:398] Cannot connect to vCenter with err: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: E1105 14:13:08.416050 3910689 vsphere.go:624] failed connecting to vcServer "vc.vmwarecluster.com" with error ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: E1105 14:13:08.416069 3910689 vsphere.go:1503] Cannot connect to vsphere. Get zone for node woker.vmwarecluster.com error
Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: E1105 14:13:08.416085 3910689 kubelet_node_status.go:66] Unable to construct v1.Node object for kubelet: failed to get zone from cloud provider: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: I1105 14:13:08.816342 3910689 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
Nov 05 14:13:08 woker.vmwarecluster.com hyperkube[3910689]: I1105 14:13:08.833196 3910689 connection.go:139] SessionManager.Login with username "" 

As mentioned above, as soon as we configure the user and password directly as plain text in the config file, it works and Zone Support is activated.

So at the moment you cannot use Zone Support safely, because you have to put your VMware credentials in plain text into the cloud provider config.

@pbertera

The issue is still present in 1.19.0

@dealboy

dealboy commented Nov 13, 2020

I have the same issue on k8s 1.18.3 (without zones)
(on an RKE setup)

@cheftako
Member

cheftako commented Feb 3, 2021

/assign @andrewsykim
/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Feb 3, 2021
lobziik added a commit to lobziik/kubernetes that referenced this issue Apr 6, 2021
… secret provided and no CredentialsManager was set up.

Partially solves kubernetes#75175. Kubelet does not stucking on startup.
k8s-ci-robot pushed a commit that referenced this issue Feb 7, 2022
… secret provided and no CredentialsManager was set up.

Partially solves #75175. Kubelet does not stucking on startup.
linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 7, 2023
… secret provided and no CredentialsManager was set up.

Partially solves kubernetes#75175. Kubelet does not stucking on startup.